Doc. no. WG21/N2105=06-0175
Date: 2006-10-23
Project: Programming Language C++
Reply to: Beman Dawes <bdawes@acm.org>
Introduction
Proposed Keywords
Do proposed names clearly denote semantics?
Are proposed names consistent with naming conventions?
How much do proposed names impact existing code?
Possible alternatives for problem keywords
Code-search web sites
Methodology
Acknowledgements
Existing keywords
This paper looks at new keywords proposed for C++0x, identifies several that cause concern, and proposes alternatives to eliminate or reduce those concerns.
Whenever the committee makes a decision that appears inconsistent or to fly in the face of reason, committee members (me included) get called stupid, arrogant, and unconcerned about ordinary users. That's OK as long as there is a compelling rationale for the decision, and alternatives have been considered and found wanting. The purpose of this paper is to discover if any proposed keywords are problematic, explicitly consider alternatives for problem keywords, and encourage development of rationale for the committee's final keyword choices.
Please don't shoot the messenger! I am in favor, often strongly, of all the proposals discussed in this paper. But someone has to point out the bad news about some of the proposed keywords.
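Possible concerns regarding proposed keywords include:
Do proposed names clearly denote semantics?
Are proposed names consistent with naming conventions?
How much do proposed names impact existing code?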
Each of those concerns is discussed below.
Proposed New C++ Keywords
Keyword | Proposal |
_Char16_t, _Char32_t | N1823 - New Character Types in C++ |
alignof, align_union | N1877 - Adding Alignment Support to the C++ Programming Language |
concept, concept_map, where, axiom, late_check | N2081 - Concepts |
constexpr | N1980 - Generalized Constant Expressions |
decltype | N1705 - Decltype (and auto) |
import | N2073 - Modules in C++ |
nullptr | N1601 - A name for the null pointer: nullptr |
static_assert | N1720 - static_assert |
The concern that a name fails to denote its semantics clearly has not been raised for any of the proposed keywords. It will be of interest, however, when considering alternatives to proposed keywords.
New keywords which do not follow current C++ Standard naming conventions are more difficult to learn and provide a lightning rod for criticism of C++. Indeed, there has already been criticism on comp.std.c++ and elsewhere of some of the new keywords for their inconsistency with current language and standard library naming conventions.
C++ language and standard library naming conventions for ordinary names are to use all-lowercase, begin with an alpha character, and separate multiple words with underscores. The underscore separator convention is sometimes not applied if the new name will become part of a set of existing keyword or library names that do not follow this convention.
The following proposed keywords do not follow the above conventions, and thus clash with similar current keywords and library names:
Proposed keyword | Similar existing keywords and library names |
_Char16_t, _Char32_t | wchar_t, size_t, and the many header <cstdint> types. |
constexpr | const_cast, const_iterator, const_pointer, const_reference, const_mem_fun_t, and so on. |
decltype | size_type, value_type, difference_type, argument_type, result_type, and so on. |
nullptr | auto_ptr, shared_ptr, weak_ptr and bad_weak_ptr. |
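The purely lexical part of the conventions described above (all-lowercase, leading alpha character, underscores as the only separator) can be checked mechanically. The following is a minimal sketch; the function name follows_lexical_convention is hypothetical and is not part of any proposal. Note its limit: it cannot tell that constexpr or nullptr is missing a word separator, since that requires knowing where the word boundaries are.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true if `name` is all-lowercase, begins with
// an alpha character, and contains only lowercase letters, digits, and
// underscores. It does NOT detect a missing underscore between words.
bool follows_lexical_convention(const std::string& name) {
    if (name.empty() || !std::islower(static_cast<unsigned char>(name[0]))) {
        return false;
    }
    for (char ch : name) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!(std::islower(c) || std::isdigit(c) || c == '_')) {
            return false;
        }
    }
    return true;
}
```

Under this check _Char16_t is rejected (leading underscore, uppercase letter), while wchar_t and constexpr are both accepted. The objection to constexpr, decltype, and nullptr is therefore one of consistency with similar existing names rather than of lexical form.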
Any new keyword has the potential to break existing code. In the past, it was not possible to quantify how much existing code would be broken, but with the advent of code-search web sites it is now possible to gain quantitative insights into the impact of a proposed new keyword. Although the searches that can be performed by these search engines are not yet sophisticated enough to make exact assertions about potential keywords, they do allow us to make hard-data predictions about the impact on existing code.
The impact of proposed keywords on 1,388,870 existing C++ source files is analyzed in the following table. See Methodology.
Keyword | # of files | Comments |
_Char16_t, _Char32_t, align_union, concept_map, late_check | 0 | <1 in 1,000,000 files |
alignof | 0 | There was one use in a gcc compiler file not counted since it appeared compatible. |
static_assert | 0 | A few uses of a static_assert macro not counted since they appeared compatible. |
constexpr | 1 | |
nullptr | 7 | |
decltype | 14 | |
axiom | 17 | |
concept | 110 | |
import | 190 | 1 in 7,300 files |
where | 11,678 | 1 in 119 files; ~ 1% of all files. Projects include: wxWidgets, Mozilla, KDE, etc. |
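The "1 in N files" figures in the Comments column follow directly from the file counts. A minimal sketch of the arithmetic, using the hypothetical helper names files_per_hit and percent_of_files:

```cpp
#include <cmath>

// Hypothetical helper: how many files, on average, contain one affected file.
// Rounds total_files / hit_files to the nearest whole number.
long files_per_hit(long total_files, long hit_files) {
    return std::lround(static_cast<double>(total_files) / hit_files);
}

// Hypothetical helper: affected files as a percentage of all files searched.
double percent_of_files(long total_files, long hit_files) {
    return 100.0 * static_cast<double>(hit_files) / total_files;
}
```

With the 1,388,870 files searched, files_per_hit(1388870, 11678) gives 119 for where, and files_per_hit(1388870, 190) gives 7310 for import, which the table rounds to 7,300. percent_of_files(1388870, 11678) is a little over 0.84, which the table reports as roughly 1%.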
The proposed where keyword stands out as a serious problem. It can't just be changed globally in a source file, because its use in comments is ambiguous; is "where" the English word "where" or a source entity name? The name where is used widely in third party libraries, including proprietary libraries. That means an organization can't upgrade to C++0x until after all of the third party libraries it depends on upgrade to C++0x. Because one of the several meanings of the word "where" is "location", applications such as Geographic Information Systems, Logistics Support, graphics, and some of the physical sciences make particularly heavy use of where in program code. Because where is a keyword in SQL, C++ code that composes SQL commands is particularly hard hit. Because industrial code is proprietary, it is not present in the code database that was searched, so the impact of adding where as a keyword may be even worse than indicated in the table above.
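To make the SQL case concrete, here is a small sketch of the kind of code at risk. The function make_query and its parameter names are invented for illustration and are not taken from any project found in the search.

```cpp
#include <string>

// Hypothetical helper: builds a SELECT statement from a table name and an
// optional filter. The parameter is named `where` after the SQL clause it
// holds, which is exactly the usage a `where` keyword would break.
std::string make_query(const std::string& table, const std::string& where) {
    std::string sql = "SELECT * FROM " + table;
    if (!where.empty()) {
        sql += " WHERE " + where;  // the literal is unaffected; the identifier is not
    }
    return sql;
}
```

A call such as make_query("orders", "total > 100") compiles as long as where is an ordinary identifier. If where were reserved, the parameter declaration and each use of it would become errors, while the "WHERE" inside the string literal would have to be left alone, which is why a blind global rename is not safe.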
Proposed keyword | Alternatives | Comments on Alternative |
_Char16_t | char16_t | N1823 proposes a typedef instead |
_Char32_t | char32_t | N1823 proposes a typedef instead |
constexpr | const_expr | Consistent with const_cast, const_iterator, etc. |
decltype | decl_type | Consistent with size_type, value_type, result_type, etc. The case is weaker, however, because other type related keywords (typedef, typeid, typename) do not have underscores. |
nullptr | null_ptr | Consistent with auto_ptr, shared_ptr, etc. |
import | ? | No useful alternative comes to mind. |
concept | type_concept, typeconcept | Although the case is weak since the impact on existing code is fairly minor, these would reduce impact to essentially zero, and avoid trampling on useful names. (0 files found). |
axiom | type_axiom, typeaxiom | |
where | requires | Cuts impact on existing code by factor of 64 (11,678 files to 183 files). Arguably does a better job of denoting semantics. |
Code-search web sites are starting to become available. These sites allow automated searches of publicly available source code. The files usually have some form of open-source license.
The ability to search vast amounts of source code allows quantitative analysis of the impact on existing code of a proposed C++ keyword.
Code-search sites include:
Google Code Search - www.google.com/codesearch - A search for "include" finds 809,000 C++ files and 4 million C files. Google is mistakenly classifying at least some C++ code as C code. For example, some Boost code is classified as C code. No apparent way to exclude comments from search.
codesearch.net - csourcesearch.net - 283 million lines of C/C++ code in 1.1 million files. Searches do not yield total hit counts, making the site less useful than it might otherwise be.
koders - www.koders.com - Claims 424 million lines of code, but doesn't appear to have as much C/C++ code as others. No apparent way to exclude comments from search.
krugle - www.krugle.com - Appears to have 1,388,870 files classified as C++ code. Allows comments to be excluded from search.
The krugle database was studied because it distinguishes between source code and comments, reports the number of files found, appears to have done a good job of identifying C++ code, and did not yield self-contradictory results for test queries.
For each keyword of interest, the automated search was run on the word, selecting C++ as the language and "Source code" as the search area.
For candidates reported in 100 or fewer files, each search hit was examined to determine the actual count. This determination included discarding uses in literals, as part of longer names, and anything else that could skew the results.
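The hits for this paper were examined by hand. As an illustration only, the same filtering rules (ignore comments, ignore literals, ignore occurrences inside longer names) could be approximated by a small scanner such as the sketch below. The function names are hypothetical, and the scanner is deliberately simplified: it does not handle preprocessor directives or other less common lexical forms.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that can appear inside a C++ identifier.
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: counts uses of `word` as a complete identifier in
// `src`, skipping string and character literals and both comment styles.
std::size_t count_identifier_uses(const std::string& src, const std::string& word) {
    std::size_t count = 0;
    std::size_t i = 0;
    const std::size_t n = src.size();
    while (i < n) {
        const char c = src[i];
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            // Line comment: skip to end of line.
            while (i < n && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            // Block comment: skip past the closing "*/" (or to end of input).
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        } else if (c == '"' || c == '\'') {
            // Literal: skip to the matching quote, honoring backslash escapes.
            const char quote = c;
            ++i;
            while (i < n && src[i] != quote) {
                if (src[i] == '\\' && i + 1 < n) ++i;
                ++i;
            }
            if (i < n) ++i;  // consume the closing quote
        } else if (is_ident_char(c)) {
            // Read one whole token so that "somewhere" never matches "where".
            const std::size_t start = i;
            while (i < n && is_ident_char(src[i])) ++i;
            if (src.compare(start, i - start, word) == 0) ++count;
        } else {
            ++i;
        }
    }
    return count;
}
```

For a source line that declares int where, mentions where in a trailing comment, stores "where" in a string literal, and later assigns where to a variable named somewhere, the function counts two uses: the declaration and the assignment.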
For candidates reported in more than 100 files, the first five search hits on the first 20 pages (100 total hits) were examined to determine a percentage of correct hits. This determination included discarding uses in literals, as part of longer names, and anything else that could skew the results. The percentage in the sample was then applied to the total file count to obtain the file count reported in the table.
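The extrapolation step amounts to scaling the reported file count by the fraction of sampled hits that turned out to be genuine uses. A minimal sketch, with the hypothetical name estimate_file_count:

```cpp
#include <cmath>

// Hypothetical helper: scales the file count reported by the search engine
// by the fraction of sampled hits that were genuine uses of the name.
long estimate_file_count(long reported_files, int sample_size, int genuine_in_sample) {
    const double fraction = static_cast<double>(genuine_in_sample) / sample_size;
    return std::lround(reported_files * fraction);
}
```

As a made-up example (these inputs are not the figures behind the table), a search reporting 20,000 files in which 58 of 100 sampled hits were genuine would give estimate_file_count(20000, 100, 58), an estimate of 11,600 files.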
Gennaro Prota pointed out the nullptr naming inconsistency in a comp.std.c++ posting.
Existing C++ Keywords
© Beman Dawes 2006
Revised 23 October 2006