NOTES ON THE ISO VULNERABILITY DOC, MAIN PART

This document and Part 1 are being reviewed by WG 23 members. This review is about 80% completed. This document reflects the review as of 24 May 2021.

-- General comments

The triggers or attacks described for the vulnerabilities fall into three types, which are not clearly discussed and separated:

1. Programmer mistakes, where the programmer makes a simple mistake of commission or omission, or misuses an obscure or complex language feature, or one with unspecified or implementation-defined effect, etc., but no malice is intended.

2. Input attacks ("black box" or "external" attacks): harm can be triggered by selected malicious input to the program (including environment conditions), without any malicious alteration of the program's source code or binary executable, but possibly with knowledge of the source code (e.g. open-source programs).

3. Code-altering attacks: by altering or extending the program's source code, the attacker plants some kind of trapdoor, Trojan, or bomb that can be activated later to cause some harm. The bad actor can be an insider with legitimate rights to alter the source code, or an intruder who obtains illegitimate access to the source code by cracking into the development or distribution system.

I believe it might be clarifying to identify which type(s) of trigger or attack are possible, or necessary, for each vulnerability.
Group - Postpone

-- Specific comments:

6.2.3: The Ada example of Celsius and Fahrenheit has a formatting problem. The example with convert_to_fahrenheit is formatted as "F = convert_to_fahrenheit" followed by the "copyright" symbol, no doubt because the three-character string "(C)" has been helpfully interpreted as a copyright symbol.
GROUP - Done

6.2.3: The example of using a sum of powers of 2 to construct a mask is formatted as "28 = 22 + 23 + 24", but the final digits on the right-hand side were no doubt intended to be exponents (superscripts), as in the Ada expression 28 = 2**2 + 2**3 + 2**4.
GROUP - Done

6.2.3: The sentence "Some computers or other devices store the bits left-to-right while others store them right-to-left" has no meaning -- there is no "left" or "right" in the electronics. Bit-endianness is significant only in descriptions of words and bits, either graphical or textual, but even there the universal graphical convention (in my experience) is to show the most significant bit on the left and the least significant on the right. The only variable aspect is bit numbering -- whether "bit number 0" means the most significant or the least significant bit. And that has no relevance to languages (like C) where shift/mask operations are the default for bit manipulation; it is relevant only to languages (like Ada) where bits can be identified by their "bit numbers" (as in Ada record-representation clauses; see the sketch below). There are some machines (microcontrollers, e.g. the 8051) where the same memory locations can be accessed both by bit addressing and by byte or word addressing, and in such machines bit-endianness ("bit ordering", i.e. bit numbering or bit addressing) can impact a program in any language, including C. However, few language standards define bit-addressable memory, the exception being "packed" arrays of Booleans in Ada and like languages.
GROUP - Humanly understandable - no change.
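To illustrate the bit-numbering point in the 6.2.3 comment above, here is a minimal Ada sketch (the package, record, and field names are invented for illustration only): a record representation clause identifies fields by bit numbers, and what those numbers denote depends on the chosen or default Bit_Order.

   with System;

   package Status_Regs is

      type Status is record
         Ready : Boolean;
         Error : Boolean;
         Count : Natural range 0 .. 63;
      end record;

      --  Under Low_Order_First, bit 0 is the least significant bit of the byte;
      --  under High_Order_First the same numbers would denote different bits.
      for Status'Bit_Order use System.Low_Order_First;

      for Status use record
         Ready at 0 range 0 .. 0;
         Error at 0 range 1 .. 1;
         Count at 0 range 2 .. 7;
      end record;
      for Status'Size use 8;

   end Status_Regs;

Shift/mask code, by contrast, never names individual bits, which is why bit numbering is irrelevant to it.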
6.3.3: Byte-endianness is of course an important issue, but the discussion of bytes in the two last sentences of 6.2.3 is strange because it focuses on word-crossing data, while byte-endianness problems arise when one views a word as a sequence of bytes, whether or not the data spans multiple words. To be sure, a next level of word-endianness problems can arise if data spans multiple words and "word" is not the largest scalar (integer) in the machine or the language. For example, when the machine has 64-bit integers stored as two 32-bit words, a big-endian machine puts the more significant word first, while a little-endian machine does the opposite.
GROUP - Agree in principle, but absent specific wording, postponed to next major edition.

6.3.5, 3rd bullet, recommending the use of bit fields: This may be misunderstood as favouring the use of "bit fields" in C, where they are not standardized enough (as I understand it) to be as safe/portable as using binary, octal, or hex representations (as masks). Even in Ada, using record representation clauses with bit fields can be less portable (because of Bit_Order and word-size differences) than using shift/mask or binary, octal, or hex representations. Therefore I disagree with this recommendation, although it depends on what is meant by "supported by the language". IMO, and unfortunately, neither C nor Ada provide enough support to make their "bit fields" always more portable than shift/mask, although bit fields will often work in both languages.
GROUP - Bullet removed

6.3.5: The last bullet ends "or bit representation". What is "bit representation"? It seems undefined, although it is used as the title of section 6.3. I would define it as "The representation of a value of a given type, such as an integer or a floating-point number, in the several binary bits of the memory bytes or words used to store that value. The meaning and position of each bit in those bytes or words."
GROUP - Made more precise by adding "of values"

6.3.5: In summary, I've come to understand that the "endianness" problem, whether at the level of bits, bytes, or words, is basically a case of type-violating aliasing: for example, taking (a pointer to) a word and converting/treating it as (a pointer to) an array of bytes, or vice versa. It only seems like its own kind of problem because in _most_ cases the results of such aliasing can be described by talking about the order of bytes in a word -- whether it is big-endian or little-endian -- while in principle there could be much weirder results from such aliasing, results that could not be described as simple endianness differences. For example there have been (and perhaps are) machines where the number of bits in a word is not a multiple of the number of bits in a byte, so each word has some bits that are not part of any byte of the word.
GROUP - no change implied

6.4.1: Perhaps also worth mentioning that, when IEEE formats are used, there are the special values Inf and NaN that can have surprising behaviour. -- Ok, they are addressed in 6.4.5.
GROUP - no change implied

6.5: I know what an "enumeration" and an "enumeration type" are, but what is an "enumerator" as in the title of this section? Ah, it probably refers to the "names" in the "set of names" in the first sentence, or what in Ada is called a "literal" of an enumeration type. However, in 6.5.3 these are called "items". Better to be consistent and call them "names" or "enumerators" there too.
GROUP - "enumerator" will be kept since it is a title used in all documents, it was an attempt to not favour one language over others. 6.5.1, 2nd paragraph: It speaks of "non-default representations", but does not say what a "default" representation is. A definition could be: "In most languages, enumerations have a default representation that maps the names (enumerators) to consecutive integers starting from zero." GROUP - not necessary 6.5.3, last paragraph: I fail to understand the meaning of the first sentence, "When enumerators are set and initialized explicitly, ...". What is an "enumerator"? What does it mean to "set and initialize" an enumerator? What is an "incomplete initializer"? If an "enumerator" is one of the names of the values in the enumeration type, I can perhaps understand this to mean e.g. a C definition of an enumeration where some, but not all, names are assigned ("set and initialized") integer values. Is that the meaning? An example could help. However, the assignment of a representation value to an enumerator should not be called "initialization", because the representation value cannot be dynamically changed later, as "initialization" would imply. This also applies to the second bullet of 6.5.6. GROUP - reject - consistent with retention of "enumerator" 6.5.4, first bullet: Here both "enumerator" and "literal" are used. Better to be consistent and use one term only. GROUP - changed "literal" to enumerator. We also address languages with no enumeration concept. 6.5.4, second bullet: Better say "enumeration type" instead of "enumerator type". GROUP - reject - same comment. 6.6: It seems to me that this section confuses three different kinds of conversion errors: - Conversions where the numerical value changes, in possibly unexpected or misunderstood ways. For example, bit-for-bit "unchecked" conversion of a signed integer to an unsigned integer type, where the value -1 can be converted to a very large (the largest possible) unsigned value. For another example, conversion (rounding or truncation) of a floating-point number to a type with less precision, or to a type with a different base (binary float to decimal float, for example). Such conversions can often be detected by compilers or static analyzers. Unfortunately, many such conversions are intentional and even well-defined in the respective languages, so static analysis will give many false-positive warnings. - Conversions that fail (exception, trap) because the original value cannot be represented in the new type (overflow failures). These can be vulnerabilities if the failure is unexpected by the programmer (Ariane 501 example) but not if proper exception handling is implemented. Compilers and static analyzers can often detect conversions where overflow could happen, and false-positive warnings can be rare if the analysis also checks for exception handlers. - Conversions where the numerical value is preserved, but the meaning of the value changes, for example conversion from a type specified to use "meters" to a type specified to use "feet", without applying the scaling factor. As most type systems do not let the programmer specify the physical units, these conversion errors usually cannot be detected by compilers or static analyzers. Moreover, most such "conversions" are not actual type conversions (e.g. both source and target are often "float") but just copies of a value to a new container/variable/use where this value has a different meaning (for example the failure of Mars Climate Orbiter, see below). 
This is not really a "conversion" error, but some kind of "logic" or "interfacing" error; perhaps a "missing conversion" error (error of omission). GROUP - this is correct, however, the document is not a programming tutorial. Requested detail is not necessary in this document. 6.6.3, 3rd paragraph: To avoid propagating the false belief that Ariane 5 is a failure, it would be nice to be more precise and say "The first launch of the Ariane 5 rocket [2][33] failed due to ...". GROUP - Accept 6.6.3, 4th paragraph: This example (attacker inputs aggressive values designed to trigger errors) is not a case of "conversion error", but a lack of input checking, or improper/insufficient input checking. Of course, the input value, if not checked, may then lead to a conversion error, but that is secondary, and it may as well lead to an out-of-bounds array access with no conversions involved (unless one thinks that there is an implicit conversion to the "array index" type). GROUP - Disagree. If the flaw is a conversion error that lets 2 parts of input data become decoupled, then this is simply the attacker taking advantage of a programming flaw. This particular attack has indeed happened. Say for example the attacker inputs a buffer of length 10**6 but correctly denotes its length as 10**6. The input validation would say "yup" but when the value 10**6 is stored into a 16 bit word, and the buffer allocated on that value, the attack will succeed. We reword slightly to show that conversions are involved. 6.6.3, last paragraph: The first Martian landers were the Soviet Mars 2 and Mars 3, of which Mars 2 impacted (no soft landing) and Mars 3 landed successfully but stopped transmitting shortly after landing. I have no reason to believe that either failure was due to faulty metric-imperial conversions. The first US lander attempts were Viking 1 and Viking 2, and both succeeded. The reference to the "first Martian lander" probably means the much later US Mars Climate Orbiter (https://en.wikipedia.org/wiki/Mars_Climate_Orbiter) which was not intended to land on Mars, but only to orbit Mars. It failed because a navigation error, caused by missing metric-imperial conversions in navigation operations on Earth, made it enter too deeply into the Martian atmosphere where it was probably destroyed by heating. I suggest to change the sentence to say "The Mars Climate Orbiter spacecraft failed to enter Mars orbit due to a missing conversion from imperial to metric units in the navigation computations." To be precise, the error was that one piece of ground SW produced its results in "pound-force seconds", while another expected its input in "newton seconds", but the corresponding conversion was not applied and the output from the first SW was fed unconverted into the second SW (reference [16] in the Wikipedia article). The fact that the missing conversion was not in the on-board SW, but in ground-control procedures, unfortunately weakens this example for this document's purposes. GROUP - Negative. It was the Mars Polar lander, and the failure is as stated in the document. 6.6.4, 4th bullet: I don't thinks shifts have anything to do with conversions. This bullet is not related to conversion errors, but to languages with operations that have undefined or implementation-defined behaviour in some cases (such as shifts of signed integers). GROUP - Disagree, a shift is an implicit conversion to and then back from a binary type. 
6.6.5, first two bullets: I don't think these bullets are related to conversions; they are general advice for avoiding out-of-range values and overflows in computations. The relevant advice is to check values of conversion arguments before the conversion to ensure that the conversion works as intended. For example, the Ariane 501 failure, where such checks were considered by the programmers, but were omitted because analysis of the Ariane 4 launch trajectory showed that the values would be in range, and processor time was tight. The code worked well for Ariane 4, but failed when it was reused for Ariane 5, because Ariane 5 has a different launch trajectory and the analysis was not repeated.
Group - Reject. Reading external values is always viewed as an implicit conversion to an internal type.

6.6.5, 3rd bullet: Add ".. and program handlers for such exceptions."
GROUP - No. It is self-evident that errors generated, either by exception or by setting a flag, must be handled.

6.6.5, 6th bullet: I don't see any connection between "plausible but wrong default values" and conversion problems.
GROUP - Reject. The advice is about recovery strategy after a failing conversion.

6.6.5, last bullet: Surely unit systems should be respected whenever a value is moved or converted from one unit system to another, whether this involves numeric type conversions or not, and whether those conversions are implicit or explicit. I would replace this with more specific advice to document carefully the unit systems used in all parts of a program, all variables and parameters, and especially all interfaces between different parts of a program or between different programs. (The failure of Mars Climate Orbiter was a failure to document interfaces, or a failure to use and respect the interface documentation.) Even more concretely, one could suggest that all textual output from a program should also display the units of each output value, and all textual input to a program should require the input to also specify the units of each input value, but this may be going a bit too far.
GROUP - The advice is correct. Adding a note about either using the language's capability to manage this or implementing a documentation system to keep the issue at the management review level is a good point.

6.6.6, 1st bullet: This is very general and abstract advice that does not address directly any of the failure mechanisms. How about: Language designers should extend the type system, or other declarative mechanisms, to let programmers specify the physical dimensions and physical units of variables, to be checked statically by the compiler (ideally). The GNAT Ada compiler from AdaCore already has a non-standard extension letting users specify the dimensions of subtypes, e.g. "length" or "speed", which the compiler then checks in expressions and assignments. However, as I understand it, it does not yet handle conversions between different unit systems, but assumes that a single unit system is used throughout a program.
GROUP - Reject. We tried the specific approaches initially but received push-back from language designers. We cannot tell them how to do their job.

6.6.6: Further: Language designers and standardization organisations should develop a standard system of textual notations to identify the physical units of output and input to computer programs, and extend languages to make it easy for programmers to generate such notations on output, and check them (against the expected units) on input.
Of course there are already standard abbreviations for many simple units ("m" for meter, etc.) but perhaps not for complex units such as acceleration -- "m/s/s", perhaps? Compare to the standard abbreviations for various currencies -- USD, EUR, etc.
GROUP - Reject. We tried the specific approaches initially but received push-back from language designers. We cannot tell them how to do their job.

6.8.5, Note 1: You might add that some languages support arrays whose lower bound is negative, in which case the index type must be signed.
GROUP - accept - added.

6.8.6, last bullet: This seems to be relevant only if the language also allows pointer arithmetic. Otherwise, if index bounds are checked when a pointer is set to point at an array element, the pointer remains valid without further index bounds checks when the pointer is used (assuming that the referenced array cannot change its index bounds, in which case this would probably become a dangling-reference error).
Group - accept.

6.9.6, 3rd bullet (and other references to "automatically extended arrays"): You would not want an unsigned-index underflow from zero to -1 = largest-possible-unsigned-integer to automatically extend the array to be large enough to include that index. That could become an attack point for crashing the program, or even the system on which the program runs. While I understand why automatically extending arrays are attractive, I would rather recommend leaving the low-level "array" type without that capability, and instead recommend that languages should provide, and programmers should use, higher-level "container" data structures that do extend automatically, but where "indexes" are replaced by "keys" which are easier to use and less error-prone, because key values are usually not produced by computations that are prone to underflow or overflow.
GROUP - accept - solved by narrowing the advice to upper bounds. This is provided in a number of languages.

6.10.5, first bullet: I suspect that this was intended to be two bullets, one sentence per bullet.
GROUP - Good catch - thx. Actually 3 bullets.

6.11.3: I would include: Accessing a data area with a pointer type that refers to smaller or larger units of data (bits, bytes, words) than the data area actually contains can make the operation dependent on the bit representation of the data, including its endianness, and so can make the program unportable. For example, accessing an array of words as an array of bytes will work differently depending on byte endianness.
GROUP - While I agree, there are many different ways that data can be manipulated once you use generalized pointers. This clause just highlights the general issue. A detailed discussion of one approach makes others seem less critical.

6.11.3: Further, calling a function through a pointer type that has been converted to specify a different parameter profile (for example, more or fewer parameters, or parameters of a different type) can trick the callee into performing normally illegal operations, possibly corrupting the call/return stack.
GROUP - Accept - added.

6.11.4, second bullet: There seems to be a superfluous "6.11." at the end of the bullet.
GROUP - ???

6.11.4: Why not include "Pointers to data can be converted to pointers to functions"? That sounds at least as dangerous as the opposite conversion, which is listed in the second bullet.
GROUP - change sentence to say pointers to functions converted to/from pointers to data

--------------------

6.12.5: I understand none of the motivation behind these bullets.
See following comments.
GROUP - I do not like this complete subclause. Committee should discuss it. Back in the 2006-2010 timeframe when the first edition was published, we had many pitched battles over such issues. I think that this subclause was a real compromise.

6.12.5, first bullet: Why say "composite types" instead of "array types"? The only other common composite type is the "structure", or "record", but I don't think that present-day programmers tend to use pointer arithmetic for selecting a component of a structure/record, although that may be necessary in assembly language programming.
Group - reject. People walk over records and multidimensional arrays as much as they do "classic" arrays.

6.12.5, second bullet: Using index arithmetic instead of pointer arithmetic just moves the possible failures into the array-indexing group, but the same harms can happen. Perhaps the point is that static analysis is easier to apply to index arithmetic than to pointer arithmetic? If so, that would be good to note as motivation for this bullet.
Group - Reject. Indexing lends itself to static checking or runtime checking. Pointer arithmetic does not.

6.12.5, third bullet: I've never heard of pointer arithmetic using other than integer addends/subtrahends. Perhaps this bullet intends to prohibit subtracting one pointer from another to give the distance (number of array elements) between the respective referenced data? That has some specific problems that are worth a separate section, I think.
Group - Reject, not true for multi-dimensional arrays and for length computations.

6.12.6: I think the advice in the last bullet of 6.8.6 would be very much in place here: implement pointer types that do array-bounds checks.
Group - good catch. Agreed.

----------------------------

6.13.5: Perhaps suggest the use of static analysis to ensure that the program will never dereference a null pointer.
GROUP - Accept

6.13.6: Note that Ada now allows the declaration of pointer types that exclude the null value, which means that this check can be applied already when a value is assigned to the pointer, and also means that the compiler can sometimes check that statically, for example by ensuring that every variable of such a type must be initialized (and not default-initialized to null). Perhaps similar types should be suggested for other and new languages in this section.
GROUP - Accept

6.14.3, 1st bullet: I don't understand what it means for an object "to become undefined" and how that is related to dangling references. Should it instead say "to be deallocated"?
GROUP - Agreed

6.14.4, 2nd bullet: I don't understand what is meant by "constructs that can be parametrized" and how that is related to dangling references.
GROUP - Take under consideration.

6.14.5, first two bullets: I consider this advice impractical, except for some rather special situations. For example, the advice in the second bullet may be appropriate for batch-oriented programs that need and use all the memory they allocate, until the end of the computation, when it can all be released implicitly when the program terminates.
GROUP - Disagree. First bullet is for lock-and-key allocation systems. Second bullet addresses many programs written that do everything statically, such as embedded applications.

6.14.5: I would add: Whenever pointers to heap objects are passed between modules or subprograms, document carefully, in the module/subprogram interface descriptions, which module, if either, is responsible for deallocating the heap object.
GROUP - Added in principle.

6.14.5: I might also add: Consider using a language (such as Rust) where the compiler tracks and checks the "ownership" of heap-allocated objects accessed through pointers.
GROUP - Added advice but stay away from recommending language choices.

6.14.6, second bullet: This advice is completely general and has no direct relation to dangling references. It should at least suggest what kind of "properties" would help to avoid dangling references.

6.16.6, first bullet: This is drastic advice. Would it not be enough to limit logical shifts to unsigned integer types?
SM - Yeah, maybe.

-- Stopped here!!!!!!!!

6.17.1: The discussion is hard to follow because it does not separate clearly between programming languages and natural languages. It would become clearer if "programming" or "natural" were added before each use of "language", as appropriate.
Group - OK

6.17.4, 2nd bullet: This is rather abstract. What is "redundant coding of subprogram signatures"? I also don't understand how pre/post-conditions help with the problem of mistaken names. Is the point that if the wrong subprogram is called, its pre/post-conditions are more likely to fail than if the correct subprogram were called? But the bullet seems to say something different ("... do nothing if different subprograms are called.")
Group - removed bullet because it does not appear to relate to the vulnerability.

6.17.5: Perhaps add: If the language allows the use of formal-parameter names in calls to associate formal parameters with actual parameters, use that facility, because it allows the compiler a further check that the intended subprogram is being called (by a check that the correct formal-parameter names are used). (Perhaps this is what 6.17.4 calls "redundant coding of subprogram signatures"?)
GROUP - Accept. Added sentence "Use language features such as preconditions and postconditions or named parameter passing to facilitate the detection of accidentally incorrect function names."

6.17.5: Perhaps add: If the language has a strong type system, define separate types for separate purposes to make it more likely that mistakes in similar variable names are detected as type errors at compile time.
GROUP - covered in 6.2 "Type system"

6.17.5: Coding standards sometimes suggest creating a project-specific or even company-wide glossary of terms and abbreviations, to be systematically used in names within that project or company. This seems like good advice, but is perhaps not sufficiently related to the present vulnerability to be included here.
GROUP - agreed that it is not sufficiently related.

6.17.6: Perhaps add: Consider requiring (if not already the case) that implementations use all the characters of a name when comparing names, instead of some fixed number of leading characters. (Failure to do so was mentioned as a language or implementation problem in 6.17.1.)
GROUP - Agreed, added.

6.18.4: I don't know if single-assignment, or definitional languages (functional "let" languages) count as "providing assignment", but surely they too can suffer this vulnerability: a "variable" (a name) is defined and bound to a value, but that value is never used.
GROUP - Accept.

6.18.5, last bullet: This advice _creates_ a dead assignment, instead of removing or detecting one. Why is this advice here? It could be listed as a legitimate reason for a dead assignment, in the 3rd paragraph of 6.18.3.
Although it is very likely that a compiler will remove such a dead assignment, unless the variable is marked volatile (which may slow the program down).
GROUP - Agreed and implemented.

6.18.6: Perhaps add: Consider providing a statement to "erase" the value of a variable, by overwriting its value with an innocuous value, and not to be considered a dead assignment even if the variable is not volatile. (This would eliminate the need for the "erasing" but dead assignment suggested in the last bullet of 6.18.5. However, perhaps the whole issue of leaking or erasing sensitive values should be in some other section.)
Group - While the idea is correct, it does not contribute to the avoidance of the vulnerability.

6.20.5, 3rd bullet: This seems to be the first place that uses the phrase "hide" for nested scopes that declare variables with the same name. Even the title of 6.20 calls it "reuse" of identifier names. I think "hide" is more descriptive than "reuse", so I suggest that "hide" should be used more, and at least defined in 6.20.1 or 6.20.3 as a synonym for "reuse".
GROUP - Reject - insufficiently wrong. Changing a section title affects all parts of 24772.

6.20.6: Sub-section 6.20.1 describes the problem correctly as concerning "entities" declared with the same name in nested scopes. But sub-section 6.20.6 speaks only of "variables", not "entities", at least in its first two bullets. Should not the term "entities" be used here too?
GROUP - Agreed, changed.

6.22.3: It could be useful and illustrative to list here some reasons why a variable may be left uninitialized, at least for some time after the creation (declaration) of the variable. These include: simple omission of the initialization by programmer mistake or forgetfulness; the lack of a sensible ("non-junk") value with which to initialize the variable in its declaration, with the sub-case where the type of the variable is "private" and all values of this type must be produced by some computation in calls to other modules; the presence of complex control flow (conditionals nested with or within loops) where it is hard to be sure that the variable is assigned a value before its first use.
GROUP - Reject - This is not a tutorial.

6.22.3, last sentence: The semicolon at the end should probably be a period.
SM - Fixed

6.22.5: The possibility of using dynamic checks for uses of uninitialized variables, with tools such as Valgrind or Purify, is not mentioned. Although static analysis is preferable when it works, sometimes it is hard to make it work, while dynamic checks are easy to enable in test runs.
GROUP - Agreed, bullet added.

6.22.5, 4th bullet: Initializing all variables in their declarations is not generally to be recommended because in many cases the initial values must then be dummy values, and dummy values can cause some of the same kinds of problems as uninitialized values cause. However, in some cases (such as null values for pointers) dummy values are safe (if dereferencing null pointers is checked). In fact, the advice in the 6th bullet, to avoid "junk initializations", contradicts the advice in this (4th) bullet. Perhaps this can be corrected by adding to the 4th bullet a condition that the initial values be "real", non-junk values. For example, initializing a "sum" or "counter" variable to zero is often the correct initial value.
GROUP - "No junk initialization" is bullet 6, weakened that bullet to Òconsider ÉÓ 6.22.5: Perhaps add: When a module provides a type for use by other modules, and especially when the type in question is private (opaque to the client modules), ensure that the type either has a default initialization that lets the program avoid the use of uninitialized values or uninitialized components of the value, or that the module also provides such an initial value as a constant with which a variable of this type can be explicitly initialized, or a function returning such an initial value if a constant cannot be provided. GROUP Ð Reject. Too specific. Constructors simply move where default values are set. 6.22.6: Perhaps add: Extending the language to make it easy for the programmer to defer variable declarations until a point in the execution where a good initial value can be assigned in the declaration itself or immediately after the declaration. For example, consider allowing the interleaving of statements and declarations, either with or without explicit syntax to create new declaration scopes within a sequence of statements. GROUP Ð Reject. Too contentious. 6.22.6: Perhaps add something about help or automation to define a valid order of initialization of modules, when the modules have dependencies? That is, the Ada elaboration-order problem. The point being that an order is wrong (at least) if it means that some initialization step tries to use something that has not yet been initialized, therefore an automatically generated correct order would avoid some of the "uninitialized variable" vulnerability. GROUP Ð Reject. This is a topic for PHD theses. 6.23.4: This applicability definition does not cover APL, where the rule is very simple, but very unusual and in conflict with normal conventions. So perhaps add "or unusual" as an alternative to "complex". GROUP Ð Reject. ÒunusualÓ but simple is not a problem. 6.24.1: Shouldn't "side effect" also include effects on the environment, such as producing some output, or commanding some mechanical activity (in a robot)? Unpredictable ordering of such external side effects is also undesirable. Consider something like the Ada: move_is_ok := turn_left_is_ok(angle) and drive_is_ok(distance); It makes a great difference if the robot first turns and then drives, or if it first drives and then turns. GROUP Ð Accept in principle. Text added. 6.24.1, 1st paragraph: The example in parentheses should say that it is taken from the C language. GROUP Ð fixed. 6.26.3, legitimate reasons for dead code: I would divide these reasons into two groups: first, real reasons for deliberately including dead code, such as the last bullet; second, practical reasons for not excluding dead code, such as unused parts of libraries. The first group is really legitimate; the second is less legitimate, because it results from short-comings of tools (e.g. linkers that do not omit uncalled functions) or other incidental reasons. GROUP Ð The net benefit of the suggested restructuring appears insufficient to justify such a massive rewrite. 6.26.3: Add a further legitimate reason: Code that is foreseen to possibly become useful if the program is modified in situ, e.g. by patching embedded SW in remote systems such as spacecraft where it is cumbersome to replace the entire SW or to add much new code to the SW. But perhaps that is covered by the 4th bullet (code that may be needed soon). GROUP Ð The net benefit of the suggested restructuring appears insufficient to justify such a massive rewrite. 
6.26.3, 3rd last paragraph: If there is code that the developer believes is never going to be used, this means that there are no tests that execute that code (with the exception of unit tests). That makes the inclusion of this code very risky and inadvisable.
GROUP - The net benefit of the suggested restructuring appears insufficient to justify such a massive rewrite.

6.26.5: The third bullet should be placed first, as it is the first thing to be done. Possibly include the static-analysis suggestion from the last bullet in this (first) bullet: "Identify any dead code in the application, perhaps with static analysis tools, and ...".
GROUP - disagree. Small order adjustments made to focus on static analysis, followed by justifications and code marking.

6.26.6: Perhaps: Consider including standardized annotations or other means to identify code that is or may be dead, but that is deliberately included and should remain in the executable program. This will help to reduce false positive warnings from static analysis tools or branch coverage measurements, and will enable such tools to warn if the code is not actually dead. It will also provide a standard way to document the deliberate inclusion of dead code.
GROUP - too far of a reach for this document.

6.27: Why is "and static analysis" included in the title? Is static analysis considered a vulnerability? :-) While section 6.27.5 does recommend static analysis to find errors, so do many other sections without having "static analysis" in their section titles.
GROUP - agreed, changed and actual vulnerabilities documented.

6.28.3, second sentence: I think "lay" should be "lie". The text is not talking about laying eggs.
GROUP - Agreed in principle, text rewritten.

6.28.5: The Pascal if-then-else example ends with a big, thick, unmatched closing parenthesis. A formatting problem?
GROUP - agreed - bad form - fixed.

6.28.6: Today, most programmers use language-sensitive IDEs. Such IDEs usually provide means to identify and display nested structures, such as automatic code indentation, controllable "folding" of control structures, and highlighting of matching opening and closing delimiters, and section 6.28.5 suggests that programmers should use them. Perhaps add to section 6.28.6 advice to ensure that IDEs have such facilities. Perhaps all sections on language design and evolution should generally include suggestions for IDE functions as well as for language features, because IDEs are becoming as important as languages and compilers in the daily work of programmers.
GROUP - Good suggestion, but this is not applicable to language design.

6.29.3: Perhaps add that a code reviewer may misunderstand the code, based on this assumption, which may make the reviewer fail to detect an error in the code (false negative), or to report a spurious error (false positive).
SM - Maybe. But this subclause is weak and needs improvement. It never documents the actual vulnerability. Formal analysis or human analysis depends on being able to separate out the loop control logic from the loop execution logic. Modifying the loop control variable effectively makes the loop "spaghetti code".
GROUP - comments led to a significant rewrite.

6.30.1, 2nd paragraph after the bullets: I fail to understand this sentence. The meanings of "relationships between components", "existence of bounds value", and "changes the conditions of the test" are all unclear. An example would be necessary to make it understandable.
GROUP - Agrees, removed.
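As a side note on the 6.29.3 discussion above (a minimal sketch with invented names, not taken from the reviewed document): Ada enforces the separation of loop control logic from loop body logic that the SM comment asks for, because the for-loop parameter is a constant view and cannot be assigned inside the body.

   procedure Loop_Control_Demo is
      Total : Integer := 0;
   begin
      for I in 1 .. 10 loop
         Total := Total + I;
         --  "I := I + 2;" here would be rejected by the compiler: the loop
         --  parameter is constant, so the body cannot modify the loop control.
      end loop;
   end Loop_Control_Demo;

Languages that allow assignment to the loop control variable leave this separation to coding discipline and to static analysis.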
6.30.1, last paragraph: Surely this error is more likely to lead to a buffer bounds violation, with all its possible security effects? As pointed out in the first bullet in 6.30.3.
GROUP - Agreed. Rewritten.

6.30.5: Add a bullet: Distinguish conceptually, and in design, code and comments, between offsets and lengths. For example, in an array with index bounds 0 and 5, the offset from element 0 to element 5 is 5, but the length of the array is 6. The offset from element 2 to element 4 is 2, but the length of that sub-array (the slice 2..4) is 3. Note that "distance" is a confusing word here, because depending on context it can mean an offset or a length. (See the small sketch below.)
GROUP - Reject - There is no language that confuses these items, just humans, which is why the final bullet suggests encapsulation.

6.31.5: How does the third bullet differ in meaning from the first bullet? They seem redundant to me.
GROUP - fixed.

6.32.3, 3rd paragraph, 3rd sentence: This paragraph is dedicated to call-by-copy, so this sentence, which discusses the control of changes to formal parameters by means of "in", "out", "in-out" labels, is out of place, because it applies as well to call-by-reference. The sentence should be put before the second paragraph, in a discussion that separates the parameter-passing method (pass by reference or pass by copy) from the data-flow direction (in, out, or in-out). The last sentence of the 4th paragraph should also be moved into that discussion (and the part "by constant pointers" should be changed to "by marking a parameter as a pointer to a constant, for the purposes of the subprogram").
SM - Good catch. Should have been a new paragraph. Also fixed "constant pointer".
GROUP - Agrees that this clause needs work. Left for a future revision.

6.32.3, 5th paragraph: Swapping two values with the exclusive-or method is an esoteric algorithm that is probably unfamiliar to most readers. A clearer example is needed. For example, consider a subprogram signed_sqrt(in x, out y) that returns y as the square root of |x|, provided with the sign of x. This could well be coded as:

   y := sqrt (abs (x)); if x < 0 then y := -y; end if;

If x and y are passed by reference, this would never return a negative value if called as signed_sqrt (x, x), because the condition "x < 0" would test the new value of x as set by the first assignment to y, which is never negative. Passing x by copy-in and y by copy-out (or by reference) would solve this problem.
GROUP - Disagree - short examples are preferred to longer examples.

6.32.3, 6th paragraph, last sentence: Assuming C-like call-by-value, a subprogram can "pass back pointers to anything" only through a parameter that is a pointer to a pointer; it is not enough to just have a pointer to a data structure as introduced in the preceding sentence. And even languages with complex parameter-passing mechanisms can have "out pointer" parameters through which the subprogram can "pass back" pointers. Whether those pointers can "point to anything whatsoever" depends on other language features, not on whether parameters are passed by value or reference. I think this criticism of C-style parameters is overstating the case; the main problem in C is that explicitly passing a pointer (by value) is the only way to get pass by reference and the only way to get "out" and "in out" data flow, but passing a pointer does not separate "in" from "out" from "in out". However, the "const" qualifier can deny the "out" direction.
GROUP - Agrees that this clause needs work. Left for a future revision.
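A small Ada illustration of the offset/length distinction suggested for 6.30.5 above (a sketch only; the array and its bounds are invented):

   procedure Offsets_And_Lengths is
      A : constant array (0 .. 5) of Integer := (others => 0);
   begin
      pragma Assert (A'Last - A'First = 5);     --  offset from first to last element
      pragma Assert (A'Length = 6);             --  length of the whole array
      pragma Assert (A (2 .. 4)'Length = 3);    --  the slice 2 .. 4 has length 3,
                                                --  although the offset from 2 to 4 is 2
      null;
   end Offsets_And_Lengths;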
6.32.3: Although 6.32.1 and 6.32.4 also talk about function return values, 6.32.3 does not mention return values at all. The same questions regarding "by reference" or "by value" apply to return values as to parameters, but the data-flow-direction questions do not apply; the direction is "out". Aliasing could happen in the case of return-by-reference if the function's return value is assigned to a variable that is also a call-by-reference parameter, for example x := signed_sqrt(x), using an obvious modification of the signed_sqrt example in an earlier comment. Section 6.32.3 should say something about potential problems in return-value passing.
GROUP - Agrees that this clause needs work. Left for a future revision.

6.32.5, second sub-bullet: I don't see how aliasing can be introduced by using expressions or functions (I assume this means "function calls") as actual arguments, unless the expression or the function value are themselves some kind of references to other objects that are non-local or also passed as reference parameters in the same call. In Ada, an array slice is, I believe, the only such "reference" expression that does not explicitly involve a pointer (access value). If a function returns a pointer, aliasing is not reduced by storing that pointer in a temporary and then using the temporary as the actual argument instead of the function call, as suggested in this bullet. In short, I think this sub-bullet should be reconsidered and clarified. If the present advice is followed blindly for parameter types that are not pointers (e.g. integers or floats), it will add unnecessary complication to the code.
GROUP - Fixed "functions" to "function calls" and changed "aliasing" to "aliasing effects". Consider in a future revision.

6.32.6: The sentence lacks a terminating period. Is the sentence complete, or has some part been truncated?
SM - Fixed. Thx.

6.33.1, first sentence: Should "treating" be "taking" or "using"? We are not assuming that the address is ill with something and that the illness should be treated?
SM - Agreed - Fixed.

6.33.1, second sentence: The character used before "Access" and "Address" seems to be a back-tick (`) and not an apostrophe (') as Ada requires.
SM - Thx. Fixed.

6.33.3, last paragraph: In addition to interrupts, it may happen (in C++ or the like) that returning from the subprogram implicitly invokes some destructors (finalizers) that may also use the stack for their own purposes, and may thus also overwrite the data. Furthermore, there are computer architectures in development where the stack "rubble" (anything that is no longer part of an active frame) is implicitly inaccessible, causing a protection-fault trap if the program tries to access it, making this idiom unusable and such code unportable to these architectures (but happily the first test would expose the problem).
GROUP - Disagree - Stack cannot be corrupted by correct constructors.

6.33.5, 2nd bullet: Perhaps this advice should be moderated so as not to apply to local variables that are statically allocated and not stack-allocated. Some languages allow such variables. Of course, such variables have their own problems, mainly the question of how long they retain their values in the face of concurrency and new calls of the same function.
GROUP - Disagree - We are not using "local variable" in the C style for local 'static' variables.

6.33.6, first bullet: Same comment as for 6.33.5, 2nd bullet.
GROUP - Disagree.
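For reference, a minimal Ada sketch of the dangling-reference-to-stack-frame problem behind the 6.33.x comments above (all names invented; the program compiles, but the captured address is dangling as soon as the function returns):

   with System;
   with Ada.Text_IO;

   procedure Dangling_Demo is

      function Address_Of_Local return System.Address is
         Local : aliased Integer := 42;   --  lives in the stack frame of this call
      begin
         return Local'Address;            --  dangling once the call returns
      end Address_Of_Local;

      Stale : constant System.Address := Address_Of_Local;

   begin
      --  Any dereference of Stale here would read stack "rubble": the frame of
      --  Address_Of_Local no longer exists, and its storage may have been reused
      --  (or, on some architectures, made inaccessible).
      Ada.Text_IO.Put_Line ("captured a stale stack address; using it would be erroneous");
   end Dangling_Demo;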
6.34.3, first paragraph: This seems to assume that arguments are pushed by the caller, but popped by the callee. That is not always the case.
GROUP - This is not assumed. The statement is "is pushed" with no allocation of responsibility.

6.34.3, first paragraph: In addition to a possible mismatch in the push and pop, equally likely and severe problems can occur when the callee accesses some part of the stack that the callee assumes to be an argument of a certain type, based on what the callee knows is its signature, but (because the caller uses a different signature) that part of the stack is not actually an argument of that type, but perhaps a local variable of the caller, or a parameter or local variable of a different type, or only a part of such a local variable or parameter, or a concatenation of (parts of) several adjacent local variables or parameters. The callee might then read some secret of the caller, or might alter some local data of the caller.
GROUP - Already implicitly covered

6.34.4, last paragraph: "For .. then" is not grammatical. Reword as "Parameter mismatches are particularly likely for functions that accept a variable number of parameters."
SM - Could not find.

6.35.3, first paragraph, last sentence: The word "and" seems wrong; the sentence starts with "if", but does not lead to a consequence. Perhaps "and" should be replaced by a comma: "If stack space is limited, the calculation of some values ... resulting in the program terminating".
SM - Thx. Fixed.

6.35.5: Add: Use static analysis to detect non-obvious recursive call paths such as indirect and long recursive call cycles.
SM - OK

6.35.6: Perhaps add: Consider providing language and implementation functions to (a) let the programmer specify the amount of stack space the program (or each thread) should be allocated; (b) let the program measure, during or after execution, how much stack space is/was actually used; (c) during execution, check for stack overflow and if it happens, signal a fault before any data are corrupted or other abnormal execution occurs, with (d) preferably a way for the program to handle stack overflow without aborting.
Group - changed "stack" to "memory" since some implementations do not use a stack, and heap exhaustion is also a possibility.

6.37.1: This seems to confuse together two different kinds of type breaking: (1) overlays of objects with different types in the same storage area, and (2) copying bits from an object, where they represent one type of data, verbatim to an object where they are interpreted as some other type of data. It seems to me that (1) is much more risky than (2), but they are not clearly separated in this section.
Group - reject. We do not want to preferentially single out one way that reinterpretation of data can happen.

6.37.3, 3rd to last paragraph: Aliasing of parameters is not a case of this vulnerability, because aliasing does not involve type-breaking reinterpretation of data (except for some corner cases involving changes to the discriminants of records/unions). Remove this paragraph.
Group - agree. Removed.

6.37.3, 2nd to last paragraph: In addition to Unchecked_Conversion, there are several other ways in Ada to do type-breaking reinterpretation of the overlaying kind, such as specifying the Address of a variable to be the same as that of another variable.
Group - Reject - language-specific mechanisms that can result in this vulnerability belong in the language-specific documents.
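For illustration only, a minimal Ada sketch of the address-overlay mechanism mentioned in the last 6.37.3 comment above (invented names; the result depends entirely on the sizes and representations of the two types, here assumed to be 32 bits each):

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Overlay_Demo is
      F : aliased Float := 1.0;

      I : Integer;
      for I'Address use F'Address;
      --  I now shares the storage of F, so reading I reinterprets the bits of
      --  the Float as an Integer.  For a type with default initialization, the
      --  initialization would also have to be suppressed (e.g. with pragma
      --  Import) so that elaborating I does not overwrite F.
   begin
      Put_Line ("Float 1.0 seen as Integer:" & Integer'Image (I));
   end Overlay_Demo;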
6.37.3, last paragraph: I would have expected some reference to the "Pointer type conversion" vulnerability [HFC], section 6.11.
Group - OK

6.37.5, 4th bullet: This advice, including the use of 'Valid in Ada, applies to all reinterpretations of data, not just when the mechanism is pointers with different underlying types as suggested in this bullet. For example, Unchecked_Conversion involves no pointers, but the common advice in Ada is to check the result with 'Valid. (A sketch of that check is given below.)
Group - agree in principle. Changed accordingly.

6.37.6: Perhaps add: Reducing the need for reinterpretation by providing standard functions to decompose and recompose values of standard types from their parts, for example, functions to extract the exponent or the mantissa of a floating point value, and functions to compose a floating point value from an exponent and a mantissa.
Group - Suggested advice is too specific.

6.38.1, next to last sentence: In principle, it is just as likely that a given algorithm _requires_ shallow copying, to ensure the desired sharing of referenced objects, and will fail or perform poorly if a deep copy is performed instead. This is the case, for example, in "incremental" or "functional" data structures where the sharing of sub-structures is desirable, and is in fact the point. This is reflected in the first bullet of 6.38.5, which is good; however, I would not call it "aliasing", unless all cases where there are several pointers to the same object should be called "aliasing" (which I doubt).
Group - disagree. Aliasing is usually defined as any way that two or more names can refer to the same object or field in an object.

6.39.4, 2nd bullet: Why is there no possibility of heap fragmentation in this case? It seems to me that this depends entirely on the garbage collection mechanism -- on whether that mechanism coalesces adjacent free blocks, or compacts the heap, or employs a two-space approach, etc.
Group - agree. Fixed.

6.39.5: Perhaps add: Use languages (such as Rust) that track the ownership of heap-allocated memory blocks and so make it simpler and surer to deallocate them at the proper time.
Group - agreed. Guidance added. Also added guidance about using reference counting.

6.39.6: Perhaps add: Adding mechanisms to control and track the ownership of heap-allocated memory blocks to ensure that deallocation happens at the right time, whether explicitly or implicitly.
Group - reject - Such advice is premature, especially considering the age and complexity of existing languages.

6.39.6: Perhaps add: Defining standard "container" data structures which encapsulate the management of dynamic memory.
Group - reject - storage pools cover language-provided containers, and container libraries are not considered to be a language issue.

6.40.3, last paragraph: My understanding of C++ templates is not complete, but I think there is an additional risk with special-case, explicit instantiations: If a compilation unit intends to make use of such a special-case instance, and #includes the file that defines the template itself (the general case), but mistakenly omits to #include the file that defines the desired special instance, the compilation will often succeed but the code will use the general case instead of the special case, and so may not work correctly. Advice to avoid this problem might be to ensure that all special-case instances are defined in the same header file that defines the template itself.
Group - Way too language-specific to include in the Part 1. Part 10 should address this.
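A small Ada sketch of the 'Valid check mentioned in the 6.37.5 comment above (the enumeration, its representation, and the raw input value are invented; the point is only that the converted value is checked before any other use):

   with Ada.Unchecked_Conversion;
   with Interfaces;
   with Ada.Text_IO; use Ada.Text_IO;

   procedure Valid_Check_Demo is

      type Command is (Stop, Go, Turn);
      for Command use (Stop => 1, Go => 2, Turn => 4);   --  not all byte values are valid
      for Command'Size use 8;

      function To_Command is
        new Ada.Unchecked_Conversion (Interfaces.Unsigned_8, Command);

      Raw : constant Interfaces.Unsigned_8 := 3;   --  e.g. read from a message
      Cmd : constant Command := To_Command (Raw);

   begin
      --  Check validity before Cmd is used in any other way.
      if Cmd'Valid then
         Put_Line ("valid command: " & Command'Image (Cmd));
      else
         Put_Line ("invalid command byte rejected");
      end if;
   end Valid_Check_Demo;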
6.41.1: 1st paragraph, 2nd sentence: I don't agree that object-oriented systems are designed to "separate ... code and data", quite the opposite; the object-oriented idea is to combine certain data (object components) with the related code (object operations) in the same "class" concept. Perhaps I misunderstand the sentence, and the intent is to say that object orientation aims to encapsulate closely related data and code (in a class) and also to separate that data and code from the rest of the program. Perhaps the sentence should be reworded.
Group - reject - do not understand comment. Text says "encapsulate", not "separate"

6.41.1, last paragraph, last sentence: I've never seen "object brokerage" used in this way, and web searches don't show up any such uses. The most common combination of "object" and "brokerage" is in CORBA, that is, network-scale object-oriented communication and service invocation. Consider using a different term, perhaps just "multiple inheritance".
Group - agreed, "object brokerage" changed to "languages"

6.41.3, last bullet: There are so many "-ing" words in this sentence that it is hard to understand. Perhaps reword as 'Directly reading or writing visible class members, instead of calling the corresponding "get" and "set" member functions which may include additional functionality that should be executed for every such read or write access.' Moreover, it is not easy to see how this problem is related to inheritance. Perhaps the explanation is that the "get" and "set" functions are implemented in a parent (ancestor) class, while the programmer who codes the direct accesses in some subclass method does not see those functions and is not aware of their existence. On the other hand, if the programmer knows about the class members that are directly accessed, why should the programmer not know about the "get" and "set" functions?
Group - agreed - rewritten.

6.41.5, 5th bullet: I fail to see how methods that provide "versioning information" help. Is the information meant to be used at run-time somehow? Or how is it meant to be used to help with this vulnerability? Same comment for the first bullet of 6.41.6 ("common versioning method").
Group - agreed - deleted.

6.41.6, 2nd bullet: This advice would be better addressed to IDE developers, because the information is of much more use if provided interactively. See my comment on 6.28.6, above. Alternatively, compilers could emit this information in the debugging info, for use by call-graph-display programs (some C++ compilers already do this).
Group - Language development could also include IDE development. We do not address IDEs.

6.42.3, first paragraph, first sentence: I think "the client has mechanism" should be "the client has no mechanism".
SM - Good catch - thx.

6.42.6: Perhaps this advice should also require that the language mechanism for pre/post-conditions should obey (and check) the rules for the Liskov substitution principle.
Group - reject - such checks are not feasible in general.

6.43.6: Perhaps add: Consider extending languages to allow the specification of "layers" of operations/methods so that the compiler can check that recursion cycles, as described in 6.43.1, cannot happen. In that example, the A and B methods would be in different layers, with the layer order specified to let any instance of A call any instance of B, but forbidding calls in the other direction. In effect, extend the language to support the second bullet in 6.43.5, with mandated checking that it is obeyed.
Group - reject - Too specific.
6.44.1, next to last paragraph: The relevance of section 6.11 is not evident; polymorphic variables are not necessarily pointers. A better formulation would be to say that "Unsafe casts allow arbitrary breaches of safety and security, similar to the breaches described in section 6.11...".
Group - agree - Implemented

6.44.3, first paragraph: This mechanism is not clear or not sufficiently explained. Why should an inconsistent state result in this scenario? In fact section 6.44.1 says that upcast + calling a parent method _avoids_ inconsistencies.
Group - disagree - mechanism is explained explicitly in the final sentence of the paragraph on upcasts in 6.44.1.

6.46.2: There is an unusual amount of vertical white-space around the second line.
SM - Thx.

6.47.1: There are many other possible differences and mismatches between code produced by compilers for different languages, or even different compilers for the same language. For example, different compilers may make different choices on register usage: which registers are caller-save, and which are callee-save; which register is used for the return address in jump-and-link instructions; and even in some cases which register is used for the stack pointer and in which direction the stack grows, with some compilers even using two stacks where others use one. In some cases the same processor may support two or more instruction encodings (e.g. 32-bit ARM and 16-bit THUMB). Different compilers may emit code with different encodings, requiring special care when linking such mixed-encoding object files into one program. The way to avoid such problems is for the processor vendor to define an Application Binary Interface (ABI) standard for the processor architecture. A compiler that follows the ABI can then produce code that is compatible with code produced by any compiler that follows the same ABI, at least for language features and processor features covered by the ABI. For example, the DEC VAX had a strong ABI defined by DEC.
Group - reject - That is all true, but far beyond what we consider reasonable for this document. Most of the issues mentioned are subsumed under "calling conventions"

6.47.3, 2nd paragraph: The discussion of identifiers should also cover the linking process, the possible compiler-specific decoration or mangling of identifiers into linker symbols, and the means that a language may have to specify the linker symbol when importing or exporting objects or functions from or to other languages.
Group - accept in principle. Text added.

6.47.3, 3rd paragraph: As far as I know, the Pascal standards do not define the representation of a "string" variable. The representation shown here, and its C equivalence, is produced by some specific Pascal compiler - perhaps some Turbo Pascal version. Moreover, the size of the "int" in the C form may depend on which C compiler is used. The paragraph should make clear that the details depend on which Pascal and C compilers are used. Ok, the very last sentence in 6.47.3 says something to that effect, but that comes very late, and can be understood to apply only to the numeric data types.
Group - Put in "may correspond"

6.47.5: Add bullet: Use compilers that follow the standard ABI for the target processor (and choose a target processor with a good and comprehensive standard ABI).
Group - reject - This document does not address ABI.
6.47.5: Add bullet: use the compilers' means to control the linker symbol used when exporting or importing objects or functions instead of relying on compiler-specific automatic mappings between source-code identifiers and linker symbols (see the sketch after this block of comments). Group - First bullet is intended to cover this.

6.47.5, 1st bullet: Note that the Fortran/Ada inter-language specifications for C only apply when a compatible C compiler is used, with compatible options. This again involves the processor ABI. Some C compilers have options to define the size of the "int" type, and whether "char" is signed or unsigned. The Fortran/Ada inter-language specifications cannot automatically cover such variations on the C side. Group - reject - we must assume implementations that conform to standards. We can consider a vulnerability associated with ABIs for a future revision.

6.47.5, 2nd bullet: Replace "languages" with "languages and compilers". Group - accept - used "languages and language processors".

6.47.5, 3rd bullet: All sub-bullets concerning identifiers should also involve the abilities of the linker to use long names and to distinguish upper-case from lower-case characters, as well as the compiler-specific decorations and manglings. Perhaps better general advice would be to always specify the linker symbol for all items of the inter-language interface, to do that on both sides of the interface, and to ensure that the chosen linker symbols are acceptable to the linker and cannot be confused with other symbols, for example symbols used in the run-time system or in the standard libraries (of any language or compiler involved). Group - Reject - Too specific and the advice is arguable.

6.47.6: The advice is very general. Perhaps some more specific advice could be added, such as to ensure that the language has standard ways to define the linker symbols ("external names") to be used for all imports and exports. There could also be advice for language implementors to follow the standard ABI for the target processor, where possible, and to document any deviations from the ABI, and perhaps to warn the programmer if any exported or imported entity relies on such deviations. Group - That does not work. We get too much push-back from more specific advice. (Should the document structure have some general provision for advice to language implementors? Currently it has dedicated sections for advice for language designers, and for language users (programmers), but that leaves a gap in between: the language implementors. There are some cases of advice to implementors, however, usually in the sections devoted to language design implications.) Group - Reject. This is out of scope of this document.

6.47.5 (not 6), 3rd sub-bullet: If interpreted literally, this would make it impossible for a callee, imported from another language, to return any kind of error indication to its caller. I suspect the intent here is to warn against trying to use "sophisticated" error-reporting mechanisms such as exceptions to report errors across languages. But if exception propagation is defined in the ABI, and both compilers obey the ABI, it should work. Better to just warn programmers to ensure that exception propagation across languages works, before they try to use it, and perhaps to warn them that exception propagation across languages may not be portable to different compilers or different target systems. Group - Agree in principle with the interpretation. Added explanatory words to 6.47.1 to justify the advice.
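To make the linker-symbol point concrete, here is a small C++ sketch (the function name is hypothetical). The extern "C" linkage specification fixes the linker-level symbol, independently of the C++ compiler's name mangling, so an Ada or Fortran import can name the plain symbol directly instead of relying on a compiler-specific mapping:

    // Exported with C linkage: the linker symbol is (essentially) the plain
    // name "checksum16", so a foreign-language caller can import it by name.
    extern "C" unsigned checksum16(const unsigned char *data,
                                   unsigned long length)
    {
        unsigned sum = 0;
        for (unsigned long i = 0; i < length; ++i)
            sum = (sum + data[i]) & 0xFFFFu;
        return sum;
    }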
6.48: I would divide this section into separate sections on dynamically linked code on the one hand, and self-modifying code on the other. The issues are very different in the two cases. For instance, almost all applications executing on typical PC operating systems use dynamic linking to OS libraries, while few PC applications implemented in ahead-of-time-compiled languages such as C or Ada (non-JIT languages) use self-modifying code. Dynamically linked libraries are a special case of a wider vulnerability concerned with how the execution environment affects programs -- for example, many applications depend very much on various program-specific configuration and option files, which they read at start-up, often using search paths similar to LD_LIBRARY_PATH. Attackers can try to insert their own harmful versions of those configuration files somewhere in those paths to override the benign files, or simple user mistakes can result in the wrong files being used (see the sketch after this block of comments). Group - reject - Distinguishing the cases in more detail does not change the fact that the attack mechanism is to thereby execute arbitrary code, hence we find no compelling reason to split for this release.

6.48.1, 2nd paragraph, 2nd sentence: Historically, I believe that self-modifying code was introduced for machines that lacked some fundamental features such as index registers, indirect-addressing modes, indirect branch instructions, or return-address stacks. I haven't seen small memory sizes blamed for self-modifying code, although the memories of those ancient machines were small, of course. Group - Added a general statement with small memory as a "such as"

6.48.1, 3rd sentence: By far the most common use of self-modifying code today is in JIT compilation. This is now so common that some people prefer not to use the negative-sounding term "self-modifying code" for it, but that is what it is, although the modification is limited to specific and benign cases, as it generally was in the ancient machines where it was first introduced. Group - accept - rewritten.

6.48.1, 3rd sentence: Self-modifying code is also used in many embedded systems to enable remote update of the software, either as a whole or by patching here and there. However, it could be argued that this is an "operating system" update feature, although the "operating system" in these cases is often completely fused with the "application". This use or mis-use of self-modifying code is both hugely important and risky for security, as SW in embedded systems such as network routers or other IoT devices is very much under attack. It is risky because attackers can use it for harm; it is important because SW maintainers can use it to patch vulnerabilities in fielded devices. I suggest that the section on self-modifying code should discuss cases where some degree or kind of self-modification is benign and useful, such as JITs or remote code updates. Group - Reject - Added explanation hides the original concepts.
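To illustrate the search-path exposure mentioned in the 6.48 comment above, here is a minimal POSIX/C++ sketch (the library path is hypothetical). Because the path given to dlopen contains a slash, the LD_LIBRARY_PATH search order is not consulted, so neither an attacker nor a mis-set environment can substitute a different library:

    #include <dlfcn.h>
    #include <cstdio>

    int main()
    {
        // Absolute, application-controlled path (hypothetical).
        void *handle = dlopen("/opt/example/lib/libcodec.so", RTLD_NOW);
        if (handle == nullptr) {
            std::fprintf(stderr, "cannot load plug-in: %s\n", dlerror());
            return 1;
        }
        // ... resolve entry points with dlsym(handle, "name"), use them ...
        dlclose(handle);
        return 0;
    }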
6.48.5: Perhaps add some advice for the benign uses of self-modifying code, in particular for code updates in embedded systems. The main advice could be to limit or avoid the "self" by separating the SW that can modify code (the update/patch functions) from the SW that can be modified (the application). This approach is commonly used in spacecraft on-board SW to reduce the amount of SW that has to be designed and qualified to higher Design Assurance Levels to just the update/patch functions. Those functions are often placed in a separate executable called the "Boot and Service SW", which is held in read-only memory and cannot be patched. Group - reject - We do not want advice to encourage dangerous constructs, no matter how benign the context may appear.

6.48.6: I don't think that this suggestion (checking library signatures) is an issue for language design. It could be an issue for language implementation (see my general comment on 6.47.6), for the definition of program formats like ELF, and for the implementors of dynamic linkers and operating systems. Group - Reject - clearly type checking at library boundaries IS a language design issue.

6.49: The points about the target-system ABI that I made in my comment on 6.47.1 are very relevant here too. Group - reject, as per earlier replies about ABIs.

6.50.6: Have you considered making the languages enforce exception handling by: (1) always including the "propagated exceptions" set in the signature of a subprogram, and checking that it matches the code of the subprogram; (2) checking that the signature of the "main" subprogram does not contain the propagation of any exceptions; in other words, checking that all exceptions are handled? (I understand that Java offers this feature, but that it has some problems that have limited the actual usage of the feature. There have also been discussions of such a feature for Ada, but it is not included in the Ada 202x proposal.) Then, these signatures should be included in the ELF or other such file, both on the export side (provided interface) and on the import side (expected interface), and checked by the linker, whether static or dynamic. Group - reject - This issue is being vigorously discussed in the Ada, Java, and C++ language communities, and there is no consensus.

6.51.1: Perhaps add: Library interface descriptions that include macro definitions may not be translatable to other languages, and so may prohibit or hamper the use of the library from other languages, and perhaps lead to interfacing errors (vide sections 6.47 and 6.49). Group - Reject - does not apply to this vulnerability.

6.52.5, last bullet: Asking programmers to consider arbitrary HW faults is non-productive, I fear. HW faults should be considered only where they could affect critical or irreversible actions. These and other HW failures should be mitigated or prevented by HW means (EDACs, check-sums, and redundancies). I don't have much hope for SW-implemented error-detection strategies. Group - accept. Point removed.

6.52.6: Perhaps suggest that languages/compilers which currently suppress checks by default should instead enable them by default. Of course this brings the risk that some programs which worked before (or seemed to) will now fail because some checks fail. And other programs may run more slowly than before, perhaps failing real-time deadlines. But still. Stopped here 24 May 2021. SM - for discussion.

6.53: This (quite short) section seems to be completely redundant with other sections. Why not delete it? SM - reject. The list of vulnerabilities is fixed for this iteration.

6.54.1: However, often it is hard to find a consensus on which language features are obscure, complex, difficult to understand, or hard to use. SM - Agreed, even within one language community, let alone across languages.

6.54.5, 2nd bullet: Better to move the parenthesis "(organizations)" to the start of the bullet, as in the 3rd bullet. SM - Good catch.
6.54.5, last bullet: The static analysis could also check that the coding standards are followed (no use of forbidden features). SM - OK

6.54.6: Perhaps add some positive suggestions: Improving the description of complex features in the language standard. Modifying features to make them simpler or easier to use, without removing them entirely. SM - Too unfocused

6.55.3, 1st paragraph: I find the term "unspecified behaviour" a little strange for something where the implementation is only allowed to choose between a limited number of specified behaviours. "Unspecified" sounds absolute (what is later called "undefined behaviour"). I would call it "alternative behaviours", or even "limited implementation-defined behaviour". Perhaps it would help linear readers of the document to collect the definitions of all three terms (unspecified, undefined, implementation-defined) into 6.55.1, with references to the discussion in 6.56 and 6.57. SM - The C/C++ and Ada notions of these terms are nearly opposite. This was written for the "C" notions.

In 6.57.3, the difference between "unspecified behaviour" and "implementation-defined behaviour" is said to be only that implementations are required to document the behaviour in the latter case. That is a very superficial difference. SM - The C/C++ and Ada notions of these terms are nearly opposite. This was written for the "C" notions.

6.55.5, 4th bullet: Idempotent behaviour is not enough to eliminate evaluation-order effects. Assume a global variable X and two Boolean functions, A and B, where A always sets X to 1 and B always sets X to 2, regardless of the previous value of X, in addition to returning some Boolean value. Both operations are idempotent, but after evaluating "A and B" the final value of X depends on which of A or B was evaluated last (a sketch follows after this block of comments). Mathematically speaking, the important property is commutation, that A and B "commute" in the sense that the result of applying A first, followed by B, is the same as when they are applied in the opposite order. Unfortunately, "commutation" is not a property of a single operation alone, such as "idempotency", but a property of sets of operations, making it both harder to define and harder to check. I suggest removing the alternative of "idempotent behaviour" from this bullet, and leaving only the "no side effects" case. SM - Needs some thought.

6.55.5, 5th bullet, 2nd sub-bullet: What does "be enumerated" mean? Enumerated by whom and where? In the coding guidelines or in the code?

6.56.5, 5th bullet: "language extensions" is mentioned here; if that is considered "undefined behaviour", it should be introduced earlier, in 6.56.3 or 6.56.1. Formally, it is covered by the definition in 6.56.1, but that is easy to overlook there, so an explicit mention is better. In fact, it seems to me that the term "programming language" would, in general, include extensions added by the implementation; for example, the "Turbo Pascal language". But perhaps the term is more narrowly used in ISO documents. SM - For consideration

6.56.5, 5th bullet: Surely all uses of language extensions should be documented, not just those uses that are "needed for correct operation"? For example, some extension may be used for programmer convenience, or for execution speed. SM - for consideration

6.56.5, 6th bullet: I think "documented" should be "document", and should perhaps be followed by a comma. SM - OK
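The 6.55.5 point can be made concrete with a short C++ sketch (the names a, b, x and use are hypothetical). The unspecified evaluation order of the two function-call arguments below stands in for the unspecified operand order of "A and B" in the comment above:

    #include <iostream>

    int x = 0;

    bool a() { x = 1; return true; }   // idempotent: a(); a(); leaves x == 1
    bool b() { x = 2; return true; }   // idempotent: b(); b(); leaves x == 2

    void use(bool, bool) {}

    int main()
    {
        use(a(), b());                 // the order in which a() and b() are
                                       // called is unspecified, so x may end
                                       // up 1 or 2: idempotency does not
                                       // remove the order dependence; a and b
                                       // would also have to commute.
        std::cout << "x = " << x << '\n';
    }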
The part "provided by and for changing its undefined behaviour" seems garbled. SM - For consideration 6.56.6: If use of language extensions is considered unspecified behaviour, add a bullet: Making compilers optionally report all uses of compiler-specific language extensions, and optionally consider such use an error that makes the compilation fail. SM - for consideration 6.57.4: If the only difference between "implementation-defined" and "unspecified" behaviour is documentation of the behaviour, there should be an additional language criterion here: that the language standard requires implementations to document their behaviour in these cases. SM - covered in bullet 2 6.57.5, 5th bullet: I think "documented" should be "document", and should perhaps be followed by a comma. And "enumerated" should be "enumerate". SM - good catch 6.57.5, next to last bullet: I don't understand what should be documented. The part "provided by and for changing its implementation-defined behaviour" seems garbled. 6.57.5, last bullet: This seems rather weak advice. For example, if the language allows two kinds of behaviour, the likelihood that the two different compilers will choose the same behaviour is high, especially if that behaviour is much more convenient for the compiler, or for the target architecture -- I assume the "different technologies" part does not mean using two different target architectures. SM - Disagree. That is why it says "at least" and different OS, different hardware architecture, etc. 6.57.6, 1st bullet: Why should only "common" implementation-defined behaviours be listed? Why not all? SM - Too many to list "all" 6.57.6: Perhaps add: Extending the language to include features that have the same function as the features with implementation-defined behaviour, even if the new features are more costly in compilation time, execution time, or other resources. Then, possibly deprecating the now redundant features that have implementation-defined behaviour. SM - for consideration 6.58.5, 1st bullet: "complier" should be "compiler". SM - Thx 6.58.6, 2nd bullet: Why should only "obscure" problematic features be removed? Should not all trouble-spots be removed? SM - Agree 6.59.2: Most of the references listed here are general works with no specific connection to this vulnerability. This is very different from other similar sections which list specific paragraph or sections in MISRA or other coding rules and guidelines. Are these general references really appropriate here? Such general references would fit better in a general "references" section, which could also have some general discussion, for example of static analysis tools and which tools might help with which vulnerabilities. SM - The "standard" other vulnerability documents fail to address concurrency adequately. 6.60.2: Same comments as for 6.59.2. SM - Same response 6.60.4: Formatting problem: the text is formatted as a bullet, but without the bullet symbol. The text is also subtly different from 6.59.4, which is surprising and perhaps not intended. SM - Thx 6.60.5, 3rd bullet: I think it is unlikely that formal, abstracted models would have the capability to warn about coding failures that might prevent timely termination, such as staying too long in an abort-deferred region. Also, shouldn't the aim here be to show that thread non-termination (or delayed termination) is properly handled? SM - Those are covered in the other bullets. This bullet address the interaction of concurrent entities at a higher level. 
6.60.6: I find this advice peculiar; there is no attempt to prevent the occurrence of the problem: late termination or non-termination of a thread. For example, the language could ensure that abort-deferred regions cannot take a long time to execute; or could insist that a thread is not allowed to ignore an abort request; or could place a time-out on thread termination, with an immediate forced abort if the time-out is exceeded (of course, preferably without losing any resources claimed by the thread). SM - For discussion.

6.61.5, 1st bullet: Surely "all data" is too much? It should be "data read or written by several threads". SM - No. Putting data in a shared memory region could result in concurrent access when it was not planned that way.

6.61.5, last bullet: I think it is dangerous to advise the use of "atomic" or "volatile" without explaining in more detail what they do and don't do. For example, I've seen advice that an update, such as K := K + 1, can be made thread-safe by marking K as "atomic", which is of course false (see the first sketch after this block of comments). It should be made clear that "atomic" and "volatile" are very limited in effect, and must be used together with a correct, lock-free access protocol, faithfully followed by all threads. SM - Agreed.

6.62.2: Same comment as for 6.59.2. SM - same response.

6.62.2, last paragraph: The initial quote character should be T. SM - Thx

6.62.4: Formatting problem: the text is formatted as a bullet, but without the bullet symbol. The text is also subtly different from 6.59.4, which is surprising and perhaps not intended. SM - Thx

6.62.5, foot-note 7: This is formatted as a bullet, but should not be. SM - Thx

6.63.3, 2nd paragraph, 1st bullet: Instead of every thread (in the application) stopping because of dead-lock, I think it is equally or more likely that some threads are dead-locked, but others continue running. I suggest changing "every thread" to "some or all threads" and adjusting the rest of the sentence accordingly. The system might still make progress in some of its jobs, while being stymied in others. SM - OK

6.63.5, 7th bullet: Should "calls and releases" perhaps be "locks and releases"? SM - OK

6.63.5, 7th bullet: Suggesting that the order of "calls and releases" should be "correct" is not very helpful unless there is some explanation of what is "correct". For example, that any locking of several objects, to hold locks on all those objects at the same time, should always acquire the locks in the same order of the objects (see the second sketch after this block of comments). Add some discussion of what is a "correct order" for this bullet. SM - for consideration

6.63.5, 8th bullet: Yes, static ceiling priorities can be statically checked (assuming that the call-graph can be statically constructed), but I know of no tool that does it automatically, starting from Ada source files, say. Does one exist? Can it be referenced? SM - It can be done in model checkers, but needs manual translation.

6.63.5, 9th bullet: What does it mean to treat a collection of tasks as "a separate independent entity"? In what ways should the collection be so treated? For scheduling, for locking, or what? This is obscure. SM - OK. This needs a footnote.
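First sketch, for the 6.61.5 comment above (C++, variable names hypothetical): even with std::atomic, a read-modify-write written as separate operations is not one atomic step, and "volatile" provides no atomicity at all.

    #include <atomic>

    std::atomic<int> k{0};
    volatile int     v = 0;

    void worker()                       // imagine several threads running this
    {
        v = v + 1;                      // NOT thread-safe: volatile does not
                                        // make the read-modify-write atomic.
        k.store(k.load() + 1);          // NOT thread-safe either: two atomic
                                        // operations, with a window between
                                        // the load and the store.
        k.fetch_add(1);                 // a single atomic read-modify-write,
                                        // but it only protects this variable,
                                        // not any wider access protocol.
    }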
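Second sketch, for the 6.63.5 comment on a "correct order" of locking (C++, the account type and function are hypothetical): whenever several objects must be locked at the same time, every thread acquires the locks in one agreed global order, here by account id, so that no two threads can each hold one lock while waiting for the other's.

    #include <mutex>

    struct Account {
        int        id;          // defines the global locking order
        long       balance = 0;
        std::mutex m;
    };

    // Assumes 'from' and 'to' are distinct accounts.
    void transfer(Account &from, Account &to, long amount)
    {
        Account &first  = (from.id < to.id) ? from : to;
        Account &second = (from.id < to.id) ? to   : from;

        std::lock_guard<std::mutex> lock1(first.m);   // always lower id first
        std::lock_guard<std::mutex> lock2(second.m);  // then the higher id

        from.balance -= amount;
        to.balance   += amount;
    }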
6.64.3, point 4, last sentence: The sentence contains the ungrammatical "an object that's address was...". I suggest: "If the intent was to output the value of the object to which that parameter points, but the intended control sequence is modified to %n, the value of that object will be changed, instead of the object's value being output." SM - Fixed

6.64.3, last paragraph, 2nd sentence: The parenthesis is unnecessary and distracting; remove it. SM - OK

6.65.5, first bullet: This advice rather introduces the vulnerability, instead of avoiding or mitigating it. Delete this bullet. SM - to be considered

6.65.6: Perhaps add: Introducing a type of constant (such as Ada's "named numbers") that exists only at compile time, is not allocated memory at run-time, and therefore cannot be altered at run-time. SM - To be considered

-- end