NOTES ON THE ISO VULNERABILITY DOC, MAIN PART

This document and Part 1 are being reviewed by WG 23 members. This review is about 80% completed. This document reflects the review as of 24 May 2021.

-- General comments

The triggers or attacks described for the vulnerabilities fall into three types, which are not clearly discussed and separated:

1. Programmer mistakes, where the programmer makes a simple mistake of commission or omission, or misuses an obscure or complex language feature, or one with unspecified or implementation-defined effect, etc., but no malice is intended.

2. Input attacks ("black box" or "external" attacks): harm can be triggered by selected malicious input to the program (including environment conditions), without any malicious alteration of the program's source code or binary executable, but possibly with knowledge of the source code (e.g. open-source programs).

3. Code-altering attacks: by altering or extending the program's source code, the attacker plants some kind of trapdoor, Trojan, or bomb that can be activated later to cause some harm. The bad actor can be an insider with legitimate rights to alter the source code, or an intruder who obtains illegitimate access to the source code by cracking into the development or distribution system.

I believe it might be clarifying to identify which type(s) of trigger or attack are possible, or necessary, for each vulnerability.
Group - Postpone

-- Specific comments:

6.2.3: The Ada example of Celsius and Fahrenheit has a formatting problem. The example with convert_to_fahrenheit is formatted as "F = convert_to_fahrenheit" followed by the "copyright" symbol, no doubt because the three-character string "(C)" has been helpfully interpreted as a copyright symbol.
GROUP - Done

6.2.3: The example of using a sum of powers of 2 to construct a mask is formatted as "28 = 22 + 23 + 24", but the final digits on the right-hand side were no doubt intended to be exponents (superscripts), as in the Ada expression 28 = 2**2 + 2**3 + 2**4.
GROUP - Done

6.2.3: The sentence "Some computers or other devices store the bits left-to-right while others store them right-to-left" has no meaning -- there is no "left" or "right" in the electronics. Bit-endianness is significant only in descriptions of words and bits, either graphical or textual, but even there the universal graphical convention (in my experience) is to show the most significant bit on the left and the least significant on the right. The only variable aspect is bit numbering -- whether "bit number 0" means the most significant or the least significant bit. And that has no relevance to languages (like C) where shift/mask operations are the default for bit manipulation; it is relevant only to languages (like Ada) where bits can be identified by their "bit numbers" (as in Ada record-representation clauses; see the sketch below). There are some machines (microcontrollers, e.g. the 8051) where the same memory locations can be accessed both by bit addressing and by byte or word addressing, and in such machines bit-endianness ("bit ordering", i.e. bit numbering or bit addressing) can impact a program in any language, including C. However, few language standards define bit-addressable memory, the exception being "packed" arrays of Booleans in Ada and like languages.
GROUP - Humanly understandable - no change.
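To illustrate the bit-numbering point in the 6.2.3 comment above, here is a minimal Ada sketch (the package, record, and field names are invented for illustration only): a record representation clause identifies fields by bit numbers, and what those numbers denote depends on the chosen or default Bit_Order.

   with System;

   package Status_Regs is

      type Status is record
         Ready : Boolean;
         Error : Boolean;
         Count : Natural range 0 .. 63;
      end record;

      --  Under Low_Order_First, bit 0 is the least significant bit of the byte;
      --  under High_Order_First the same numbers would denote different bits.
      for Status'Bit_Order use System.Low_Order_First;

      for Status use record
         Ready at 0 range 0 .. 0;
         Error at 0 range 1 .. 1;
         Count at 0 range 2 .. 7;
      end record;
      for Status'Size use 8;

   end Status_Regs;

Shift/mask code, by contrast, never names individual bits, which is why bit numbering is irrelevant to it.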
6.3.3: Byte-endianness is of course an important issue, but the discussion of bytes in the two last sentences of 6.2.3 is strange because it focuses on word-crossing data, while byte-endianness problems arise when one views a word as a sequence of bytes, whether or not the data spans multiple words. To be sure, a next level of word-endianness problems can arise if data spans multiple words and "word" is not the largest scalar (integer) in the machine or the language. For example, when the machine has 64-bit integers stored as two 32-bit words, a big-endian machine puts the more significant word first, while a little-endian machine does the opposite.
GROUP - Agree in principle, but absent specific wording, postponed to next major edition.

6.3.5, 3rd bullet, recommending the use of bit fields: This may be misunderstood as favouring the use of "bit fields" in C, where they are not standardized enough (as I understand it) to be as safe/portable as using binary, octal, or hex representations (as masks). Even in Ada, using record representation clauses with bit fields can be less portable (because of Bit_Order and word-size differences) than using shift/mask or binary, octal, or hex representations. Therefore I disagree with this recommendation, although it depends on what is meant by "supported by the language". IMO, and unfortunately, neither C nor Ada provide enough support to make their "bit fields" always more portable than shift/mask, although bit fields will often work in both languages.
GROUP - Bullet removed

6.3.5: The last bullet ends "or bit representation". What is "bit representation"? It seems undefined, although it is used as the title of section 6.3. I would define it as "The representation of a value of a given type, such as an integer or a floating-point number, in the several binary bits of the memory bytes or words used to store that value. The meaning and position of each bit in those bytes or words."
GROUP - Made more precise by adding "of values"

6.3.5: In summary, I've come to understand that the "endianness" problem, whether at the level of bits, bytes, or words, is basically a case of type-violating aliasing: for example, taking (a pointer to) a word and converting/treating it as (a pointer to) an array of bytes, or vice versa. It only seems like its own kind of problem because in _most_ cases the results of such aliasing can be described by talking about the order of bytes in a word -- whether it is big-endian or little-endian -- while in principle there could be much weirder results from such aliasing, results that could not be described as simple endianness differences. For example there have been (and perhaps are) machines where the number of bits in a word is not a multiple of the number of bits in a byte, so each word has some bits that are not part of any byte of the word.
GROUP - no change implied

6.4.1: Perhaps also worth mentioning that, when IEEE formats are used, there are the special values Inf and NaN that can have surprising behaviour. -- Ok, they are addressed in 6.4.5.
GROUP - no change implied

6.5: I know what an "enumeration" and an "enumeration type" are, but what is an "enumerator" as in the title of this section? Ah, it probably refers to the "names" in the "set of names" in the first sentence, or what in Ada is called a "literal" of an enumeration type. However, in 6.5.3 these are called "items". Better to be consistent and call them "names" or "enumerators" there too.
GROUP - "enumerator" will be kept since it is a title used in all documents, it was an attempt to not favour one language over others. 6.5.1, 2nd paragraph: It speaks of "non-default representations", but does not say what a "default" representation is. A definition could be: "In most languages, enumerations have a default representation that maps the names (enumerators) to consecutive integers starting from zero." GROUP - not necessary 6.5.3, last paragraph: I fail to understand the meaning of the first sentence, "When enumerators are set and initialized explicitly, ...". What is an "enumerator"? What does it mean to "set and initialize" an enumerator? What is an "incomplete initializer"? If an "enumerator" is one of the names of the values in the enumeration type, I can perhaps understand this to mean e.g. a C definition of an enumeration where some, but not all, names are assigned ("set and initialized") integer values. Is that the meaning? An example could help. However, the assignment of a representation value to an enumerator should not be called "initialization", because the representation value cannot be dynamically changed later, as "initialization" would imply. This also applies to the second bullet of 6.5.6. GROUP - reject - consistent with retention of "enumerator" 6.5.4, first bullet: Here both "enumerator" and "literal" are used. Better to be consistent and use one term only. GROUP - changed "literal" to enumerator. We also address languages with no enumeration concept. 6.5.4, second bullet: Better say "enumeration type" instead of "enumerator type". GROUP - reject - same comment. 6.6: It seems to me that this section confuses three different kinds of conversion errors: - Conversions where the numerical value changes, in possibly unexpected or misunderstood ways. For example, bit-for-bit "unchecked" conversion of a signed integer to an unsigned integer type, where the value -1 can be converted to a very large (the largest possible) unsigned value. For another example, conversion (rounding or truncation) of a floating-point number to a type with less precision, or to a type with a different base (binary float to decimal float, for example). Such conversions can often be detected by compilers or static analyzers. Unfortunately, many such conversions are intentional and even well-defined in the respective languages, so static analysis will give many false-positive warnings. - Conversions that fail (exception, trap) because the original value cannot be represented in the new type (overflow failures). These can be vulnerabilities if the failure is unexpected by the programmer (Ariane 501 example) but not if proper exception handling is implemented. Compilers and static analyzers can often detect conversions where overflow could happen, and false-positive warnings can be rare if the analysis also checks for exception handlers. - Conversions where the numerical value is preserved, but the meaning of the value changes, for example conversion from a type specified to use "meters" to a type specified to use "feet", without applying the scaling factor. As most type systems do not let the programmer specify the physical units, these conversion errors usually cannot be detected by compilers or static analyzers. Moreover, most such "conversions" are not actual type conversions (e.g. both source and target are often "float") but just copies of a value to a new container/variable/use where this value has a different meaning (for example the failure of Mars Climate Orbiter, see below). 
This is not really a "conversion" error, but some kind of "logic" or "interfacing" error; perhaps a "missing conversion" error (error of omission). GROUP - this is correct, however, the document is not a programming tutorial. Requested detail is not necessary in this document. 6.6.3, 3rd paragraph: To avoid propagating the false belief that Ariane 5 is a failure, it would be nice to be more precise and say "The first launch of the Ariane 5 rocket [2][33] failed due to ...". GROUP - Accept 6.6.3, 4th paragraph: This example (attacker inputs aggressive values designed to trigger errors) is not a case of "conversion error", but a lack of input checking, or improper/insufficient input checking. Of course, the input value, if not checked, may then lead to a conversion error, but that is secondary, and it may as well lead to an out-of-bounds array access with no conversions involved (unless one thinks that there is an implicit conversion to the "array index" type). GROUP - Disagree. If the flaw is a conversion error that lets 2 parts of input data become decoupled, then this is simply the attacker taking advantage of a programming flaw. This particular attack has indeed happened. Say for example the attacker inputs a buffer of length 10**6 but correctly denotes its length as 10**6. The input validation would say "yup" but when the value 10**6 is stored into a 16 bit word, and the buffer allocated on that value, the attack will succeed. We reword slightly to show that conversions are involved. 6.6.3, last paragraph: The first Martian landers were the Soviet Mars 2 and Mars 3, of which Mars 2 impacted (no soft landing) and Mars 3 landed successfully but stopped transmitting shortly after landing. I have no reason to believe that either failure was due to faulty metric-imperial conversions. The first US lander attempts were Viking 1 and Viking 2, and both succeeded. The reference to the "first Martian lander" probably means the much later US Mars Climate Orbiter (https://en.wikipedia.org/wiki/Mars_Climate_Orbiter) which was not intended to land on Mars, but only to orbit Mars. It failed because a navigation error, caused by missing metric-imperial conversions in navigation operations on Earth, made it enter too deeply into the Martian atmosphere where it was probably destroyed by heating. I suggest to change the sentence to say "The Mars Climate Orbiter spacecraft failed to enter Mars orbit due to a missing conversion from imperial to metric units in the navigation computations." To be precise, the error was that one piece of ground SW produced its results in "pound-force seconds", while another expected its input in "newton seconds", but the corresponding conversion was not applied and the output from the first SW was fed unconverted into the second SW (reference [16] in the Wikipedia article). The fact that the missing conversion was not in the on-board SW, but in ground-control procedures, unfortunately weakens this example for this document's purposes. GROUP - Negative. It was the Mars Polar lander, and the failure is as stated in the document. 6.6.4, 4th bullet: I don't thinks shifts have anything to do with conversions. This bullet is not related to conversion errors, but to languages with operations that have undefined or implementation-defined behaviour in some cases (such as shifts of signed integers). GROUP - Disagree, a shift is an implicit conversion to and then back from a binary type. 
6.6.5, first two bullets: I don't think these bullets are related to conversions; they are general advice for avoiding out-of-range values and overflows in computations. The relevant advice is to check values of conversion arguments before the conversion to ensure that the conversion works as intended. For example, the Ariane 501 failure, where such checks were considered by the programmers, but were omitted because analysis of the Ariane 4 launch trajectory showed that the values would be in range, and processor time was tight. The code worked well for Ariane 4, but failed when it was reused for Ariane 5, because Ariane 5 has a different launch trajectory and the analysis was not repeated.
Group - Reject. Reading external values is always viewed as an implicit conversion to an internal type.

6.6.5, 3rd bullet: Add ".. and program handlers for such exceptions."
GROUP - No. It is self-evident that errors generated, either by exception or by setting a flag, must be handled.

6.6.5, 6th bullet: I don't see any connection between "plausible but wrong default values" and conversion problems.
GROUP - Reject. The advice is about recovery strategy after a failing conversion.

6.6.5, last bullet: Surely unit systems should be respected whenever a value is moved or converted from one unit system to another, whether this involves numeric type conversions or not, and whether those conversions are implicit or explicit. I would replace this with more specific advice to document carefully the unit systems used in all parts of a program, all variables and parameters, and especially all interfaces between different parts of a program or between different programs. (The failure of Mars Climate Orbiter was a failure to document interfaces, or a failure to use and respect the interface documentation.) Even more concretely, one could suggest that all textual output from a program should also display the units of each output value, and all textual input to a program should require the input to also specify the units of each input value, but this may be going a bit too far.
GROUP - The advice is correct. Adding a note about either using the language's capability to manage this or implementing a documentation system to keep the issue at the management review level is a good point.

6.6.6, 1st bullet: This is very general and abstract advice that does not address directly any of the failure mechanisms. How about: Language designers should extend the type system, or other declarative mechanisms, to let programmers specify the physical dimensions and physical units of variables, to be checked statically by the compiler (ideally). The GNAT Ada compiler from AdaCore already has a non-standard extension letting users specify the dimensions of subtypes, e.g. "length" or "speed", which the compiler then checks in expressions and assignments. However, as I understand it, it does not yet handle conversions between different unit systems, but assumes that a single unit system is used throughout a program.
GROUP - Reject. We tried the specific approaches initially but received push-back from language designers. We cannot tell them how to do their job.

6.6.6: Further: Language designers and standardization organisations should develop a standard system of textual notations to identify the physical units of output and input to computer programs, and extend languages to make it easy for programmers to generate such notations on output, and check them (against the expected units) on input.
Of course there are already standard abbreviations for many simple units ("m" for meter, etc.) but perhaps not for complex units such as acceleration -- "m/s/s", perhaps? Compare to the standard abbreviations for various currencies -- USD, EUR, etc.
GROUP - Reject. We tried the specific approaches initially but received push-back from language designers. We cannot tell them how to do their job.

6.8.5, Note 1: You might add that some languages support arrays whose lower bound is negative, in which case the index type must be signed.
GROUP - accept - added.

6.8.6, last bullet: This seems to be relevant only if the language also allows pointer arithmetic. Otherwise, if index bounds are checked when a pointer is set to point at an array element, the pointer remains valid without further index bounds checks when the pointer is used (assuming that the referenced array cannot change its index bounds, in which case this would probably become a dangling-reference error).
Group - accept.

6.9.6, 3rd bullet (and other references to "automatically extended arrays"): You would not want an unsigned-index underflow from zero to -1 = largest-possible-unsigned-integer to automatically extend the array to be large enough to include that index. That could become an attack point for crashing the program, or even the system on which the program runs. While I understand why automatically extending arrays are attractive, I would rather recommend leaving the low-level "array" type without that capability, and instead recommend that languages should provide, and programmers should use, higher-level "container" data structures that do extend automatically, but where "indexes" are replaced by "keys" which are easier to use and less error-prone, because key values are usually not produced by computations that are prone to underflow or overflow.
GROUP - accept - solved by narrowing the advice to upper bounds. This is provided in a number of languages.

6.10.5, first bullet: I suspect that this was intended to be two bullets, one sentence per bullet.
GROUP - Good catch - thx. Actually 3 bullets.

6.11.3: I would include: Accessing a data area with a pointer type that refers to smaller or larger units of data (bits, bytes, words) than the data area actually contains can make the operation dependent on the bit representation of the data, including its endianness, and so can make the program unportable. For example, accessing an array of words as an array of bytes will work differently depending on byte endianness.
GROUP - While I agree, there are many different ways that data can be manipulated once you use generalized pointers. This clause just highlights the general issue. A detailed discussion of one approach makes others seem less critical.

6.11.3: Further, calling a function through a pointer type that has been converted to specify a different parameter profile (for example, more or fewer parameters, or parameters of a different type) can trick the callee into performing normally illegal operations, possibly corrupting the call/return stack.
GROUP - Accept - added.

6.11.4, second bullet: There seems to be a superfluous "6.11." at the end of the bullet.
GROUP - ???

6.11.4: Why not include "Pointers to data can be converted to pointers to functions"? That sounds at least as dangerous as the opposite conversion, which is listed in the second bullet.
GROUP - change sentence to say pointers to functions converted to/from pointers to data

--------------------

6.12.5: I understand none of the motivation behind these bullets.
See following comments.
GROUP - I do not like this complete subclause. Committee should discuss it. Back in the 2006-2010 timeframe when the first edition was published, we had many pitched battles over such issues. I think that this subclause was a real compromise.

6.12.5, first bullet: Why say "composite types" instead of "array types"? The only other common composite type is the "structure", or "record", but I don't think that present-day programmers tend to use pointer arithmetic for selecting a component of a structure/record, although that may be necessary in assembly language programming.
Group - reject. People walk over records and multidimensional arrays as much as they do "classic" arrays.

6.12.5, second bullet: Using index arithmetic instead of pointer arithmetic just moves the possible failures into the array-indexing group, but the same harms can happen. Perhaps the point is that static analysis is easier to apply to index arithmetic than to pointer arithmetic? If so, that would be good to note as motivation for this bullet.
Group - Reject. Indexing lends itself to static checking or runtime checking. Pointer arithmetic does not.

6.12.5, third bullet: I've never heard of pointer arithmetic using other than integer addends/subtrahends. Perhaps this bullet intends to prohibit subtracting one pointer from another to give the distance (number of array elements) between the respective referenced data? That has some specific problems that are worth a separate section, I think.
Group - Reject, not true for multi-dimensional arrays and for length computations.

6.12.6: I think the advice in the last bullet of 6.8.6 would be very much in place here: implement pointer types that do array-bounds checks.
Group - good catch. Agreed.

----------------------------

6.13.5: Perhaps suggest the use of static analysis to ensure that the program will never dereference a null pointer.
GROUP - Accept

6.13.6: Note that Ada now allows the declaration of pointer types that exclude the null value, which means that this check can be applied already when a value is assigned to the pointer, and also means that the compiler can sometimes check that statically, for example by ensuring that every variable of such a type must be initialized (and not default-initialized to null). Perhaps similar types should be suggested for other and new languages in this section.
GROUP - Accept

6.14.3, 1st bullet: I don't understand what it means for an object "to become undefined" and how that is related to dangling references. Should it instead say "to be deallocated"?
GROUP - Agreed

6.14.4, 2nd bullet: I don't understand what is meant by "constructs that can be parametrized" and how that is related to dangling references.
GROUP - Take under consideration.

6.14.5, first two bullets: I consider this advice impractical, except for some rather special situations. For example, the advice in the second bullet may be appropriate for batch-oriented programs that need and use all the memory they allocate, until the end of the computation, when it can all be released implicitly when the program terminates.
GROUP - Disagree. First bullet is for lock-and-key allocation systems. Second bullet addresses many programs written that do everything statically, such as embedded applications.

6.14.5: I would add: Whenever pointers to heap objects are passed between modules or subprograms, document carefully, in the module/subprogram interface descriptions, which module, if either, is responsible for deallocating the heap object.
GROUP - Added in principle.

6.14.5: I might also add: Consider using a language (such as Rust) where the compiler tracks and checks the "ownership" of heap-allocated objects accessed through pointers.
GROUP - Added advice but stay away from recommending language choices.

6.14.6, second bullet: This advice is completely general and has no direct relation to dangling references. It should at least suggest what kind of "properties" would help to avoid dangling references.

6.16.6, first bullet: This is drastic advice. Would it not be enough to limit logical shifts to unsigned integer types?
SM - Yeah, maybe.

-- Stopped here!!!!!!!!

6.17.1: The discussion is hard to follow because it does not separate clearly between programming languages and natural languages. It would become clearer if "programming" or "natural" were added before each use of "language", as appropriate.
Group - OK

6.17.4, 2nd bullet: This is rather abstract. What is "redundant coding of subprogram signatures"? I also don't understand how pre/post-conditions help with the problem of mistaken names. Is the point that if the wrong subprogram is called, its pre/post-conditions are more likely to fail than if the correct subprogram were called? But the bullet seems to say something different ("... do nothing if different subprograms are called.")
Group - removed bullet because it does not appear to relate to the vulnerability.

6.17.5: Perhaps add: If the language allows the use of formal-parameter names in calls to associate formal parameters with actual parameters, use that facility, because it allows the compiler a further check that the intended subprogram is being called (by a check that the correct formal-parameter names are used). (Perhaps this is what 6.17.4 calls "redundant coding of subprogram signatures"?)
GROUP - Accept. Added sentence "Use language features such as preconditions and postconditions or named parameter passing to facilitate the detection of accidentally incorrect function names."

6.17.5: Perhaps add: If the language has a strong type system, define separate types for separate purposes to make it more likely that mistakes in similar variable names are detected as type errors at compile time.
GROUP - covered in 6.2 "Type system"

6.17.5: Coding standards sometimes suggest creating a project-specific or even company-wide glossary of terms and abbreviations, to be systematically used in names within that project or company. This seems like good advice, but is perhaps not sufficiently related to the present vulnerability to be included here.
GROUP - agreed that it is not sufficiently related.

6.17.6: Perhaps add: Consider requiring (if not already the case) that implementations use all the characters of a name when comparing names, instead of some fixed number of leading characters. (Failure to do so was mentioned as a language or implementation problem in 6.17.1.)
GROUP - Agreed, added.

6.18.4: I don't know if single-assignment, or definitional languages (functional "let" languages) count as "providing assignment", but surely they too can suffer this vulnerability: a "variable" (a name) is defined and bound to a value, but that value is never used.
GROUP - Accept.

6.18.5, last bullet: This advice _creates_ a dead assignment, instead of removing or detecting one. Why is this advice here? It could be listed as a legitimate reason for a dead assignment, in the 3rd paragraph of 6.18.3.
Although it is very likely that a compiler will remove such a dead assignment, unless the variable is marked volatile (which may slow the program down).
GROUP - Agreed and implemented.

6.18.6: Perhaps add: Consider providing a statement to "erase" the value of a variable, by overwriting its value with an innocuous value, and not to be considered a dead assignment even if the variable is not volatile. (This would eliminate the need for the "erasing" but dead assignment suggested in the last bullet of 6.18.5. However, perhaps the whole issue of leaking or erasing sensitive values should be in some other section.)
Group - While the idea is correct, it does not contribute to the avoidance of the vulnerability.

6.20.5, 3rd bullet: This seems to be the first place that uses the phrase "hide" for nested scopes that declare variables with the same name. Even the title of 6.20 calls it "reuse" of identifier names. I think "hide" is more descriptive than "reuse", so I suggest that "hide" should be used more, and at least defined in 6.20.1 or 6.20.3 as a synonym for "reuse".
GROUP - Reject - insufficiently wrong. Changing a section title affects all parts of 24772.

6.20.6: Sub-section 6.20.1 describes the problem correctly as concerning "entities" declared with the same name in nested scopes. But sub-section 6.20.6 speaks only of "variables", not "entities", at least in its first two bullets. Should not the term "entities" be used here too?
GROUP - Agreed, changed.

6.22.3: It could be useful and illustrative to list here some reasons why a variable may be left uninitialized, at least for some time after the creation (declaration) of the variable. These include: simple omission of the initialization by programmer mistake or forgetfulness; the lack of a sensible ("non-junk") value with which to initialize the variable in its declaration, with the sub-case where the type of the variable is "private" and all values of this type must be produced by some computation in calls to other modules; the presence of complex control flow (conditionals nested with or within loops) where it is hard to be sure that the variable is assigned a value before its first use.
GROUP - Reject - This is not a tutorial.

6.22.3, last sentence: The semicolon at the end should probably be a period.
SM - Fixed

6.22.5: The possibility of using dynamic checks for uses of uninitialized variables, with tools such as Valgrind or Purify, is not mentioned. Although static analysis is preferable when it works, sometimes it is hard to make it work, while dynamic checks are easy to enable in test runs.
GROUP - Agreed, bullet added.

6.22.5, 4th bullet: Initializing all variables in their declarations is not generally to be recommended because in many cases the initial values must then be dummy values, and dummy values can cause some of the same kinds of problems as uninitialized values cause. However, in some cases (such as null values for pointers) dummy values are safe (if dereferencing null pointers is checked). In fact, the advice in the 6th bullet, to avoid "junk initializations", contradicts the advice in this (4th) bullet. Perhaps this can be corrected by adding to the 4th bullet a condition that the initial values be "real", non-junk values. For example, initializing a "sum" or "counter" variable to zero is often the correct initial value.
GROUP - "No junk initialization" is bullet 6, weakened that bullet to Òconsider ÉÓ 6.22.5: Perhaps add: When a module provides a type for use by other modules, and especially when the type in question is private (opaque to the client modules), ensure that the type either has a default initialization that lets the program avoid the use of uninitialized values or uninitialized components of the value, or that the module also provides such an initial value as a constant with which a variable of this type can be explicitly initialized, or a function returning such an initial value if a constant cannot be provided. GROUP Ð Reject. Too specific. Constructors simply move where default values are set. 6.22.6: Perhaps add: Extending the language to make it easy for the programmer to defer variable declarations until a point in the execution where a good initial value can be assigned in the declaration itself or immediately after the declaration. For example, consider allowing the interleaving of statements and declarations, either with or without explicit syntax to create new declaration scopes within a sequence of statements. GROUP Ð Reject. Too contentious. 6.22.6: Perhaps add something about help or automation to define a valid order of initialization of modules, when the modules have dependencies? That is, the Ada elaboration-order problem. The point being that an order is wrong (at least) if it means that some initialization step tries to use something that has not yet been initialized, therefore an automatically generated correct order would avoid some of the "uninitialized variable" vulnerability. GROUP Ð Reject. This is a topic for PHD theses. 6.23.4: This applicability definition does not cover APL, where the rule is very simple, but very unusual and in conflict with normal conventions. So perhaps add "or unusual" as an alternative to "complex". GROUP Ð Reject. ÒunusualÓ but simple is not a problem. 6.24.1: Shouldn't "side effect" also include effects on the environment, such as producing some output, or commanding some mechanical activity (in a robot)? Unpredictable ordering of such external side effects is also undesirable. Consider something like the Ada: move_is_ok := turn_left_is_ok(angle) and drive_is_ok(distance); It makes a great difference if the robot first turns and then drives, or if it first drives and then turns. GROUP Ð Accept in principle. Text added. 6.24.1, 1st paragraph: The example in parentheses should say that it is taken from the C language. GROUP Ð fixed. 6.26.3, legitimate reasons for dead code: I would divide these reasons into two groups: first, real reasons for deliberately including dead code, such as the last bullet; second, practical reasons for not excluding dead code, such as unused parts of libraries. The first group is really legitimate; the second is less legitimate, because it results from short-comings of tools (e.g. linkers that do not omit uncalled functions) or other incidental reasons. GROUP Ð The net benefit of the suggested restructuring appears insufficient to justify such a massive rewrite. 6.26.3: Add a further legitimate reason: Code that is foreseen to possibly become useful if the program is modified in situ, e.g. by patching embedded SW in remote systems such as spacecraft where it is cumbersome to replace the entire SW or to add much new code to the SW. But perhaps that is covered by the 4th bullet (code that may be needed soon). GROUP Ð The net benefit of the suggested restructuring appears insufficient to justify such a massive rewrite. 
6.26.3, 3rd last paragraph: If there is code that the developer believes is never going to be used, this means that there are no tests that execute that code (with the exception of unit tests). That makes the inclusion of this code very risky and inadvisable.
GROUP - The net benefit of the suggested restructuring appears insufficient to justify such a massive rewrite.

6.26.5: The third bullet should be placed first, as it is the first thing to be done. Possibly include the static-analysis suggestion from the last bullet in this (first) bullet: "Identify any dead code in the application, perhaps with static analysis tools, and ...".
GROUP - disagree. Small order adjustments made to focus on static analysis, followed by justifications and code marking.

6.26.6: Perhaps: Consider including standardized annotations or other means to identify code that is or may be dead, but that is deliberately included and should remain in the executable program. This will help to reduce false positive warnings from static analysis tools or branch coverage measurements, and will enable such tools to warn if the code is not actually dead. It will also provide a standard way to document the deliberate inclusion of dead code.
GROUP - too far of a reach for this document.

6.27: Why is "and static analysis" included in the title? Is static analysis considered a vulnerability? :-) While section 6.27.5 does recommend static analysis to find errors, so do many other sections without having "static analysis" in their section titles.
GROUP - agreed, changed and actual vulnerabilities documented.

6.28.3, second sentence: I think "lay" should be "lie". The text is not talking about laying eggs.
GROUP - Agreed in principle, text rewritten.

6.28.5: The Pascal if-then-else example ends with a big, thick, unmatched closing parenthesis. A formatting problem?
GROUP - agreed - bad form - fixed.

6.28.6: Today, most programmers use language-sensitive IDEs. Such IDEs usually provide means to identify and display nested structures, such as automatic code indentation, controllable "folding" of control structures, and highlighting of matching opening and closing delimiters, and section 6.28.5 suggests that programmers should use them. Perhaps add to section 6.28.6 advice to ensure that IDEs have such facilities. Perhaps all sections on language design and evolution should generally include suggestions for IDE functions as well as for language features, because IDEs are becoming as important as languages and compilers in the daily work of programmers.
GROUP - Good suggestion, but this is not applicable to language design.

6.29.3: Perhaps add that a code reviewer may misunderstand the code, based on this assumption, which may make the reviewer fail to detect an error in the code (false negative), or to report a spurious error (false positive).
SM - Maybe. But this subclause is weak and needs improvement. It never documents the actual vulnerability. Formal analysis or human analysis depends on being able to separate out the loop control logic from the loop execution logic. Modifying the loop control variable effectively makes the loop "spaghetti code".
GROUP - comments led to a significant rewrite.

6.30.1, 2nd paragraph after the bullets: I fail to understand this sentence. The meanings of "relationships between components", "existence of bounds value", and "changes the conditions of the test" are all unclear. An example would be necessary to make it understandable.
GROUP - Agrees, removed.
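As a side note on the 6.29.3 discussion above (a minimal sketch with invented names, not taken from the reviewed document): Ada enforces the separation of loop control logic from loop body logic that the SM comment asks for, because the for-loop parameter is a constant view and cannot be assigned inside the body.

   procedure Loop_Control_Demo is
      Total : Integer := 0;
   begin
      for I in 1 .. 10 loop
         Total := Total + I;
         --  "I := I + 2;" here would be rejected by the compiler: the loop
         --  parameter is constant, so the body cannot modify the loop control.
      end loop;
   end Loop_Control_Demo;

Languages that allow assignment to the loop control variable leave this separation to coding discipline and to static analysis.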
6.30.1, last paragraph: Surely this error is more likely to lead to a buffer bounds violation, with all its possible security effects? As pointed out in the first bullet in 6.30.3.
GROUP - Agreed. Rewritten.

6.30.5: Add a bullet: Distinguish conceptually, and in design, code and comments, between offsets and lengths. For example, in an array with index bounds 0 and 5, the offset from element 0 to element 5 is 5, but the length of the array is 6. The offset from element 2 to element 4 is 2, but the length of that sub-array (the slice 2..4) is 3. Note that "distance" is a confusing word here, because depending on context it can mean an offset or a length. (See the small sketch below.)
GROUP - Reject - There is no language that confuses these items, just humans, which is why the final bullet suggests encapsulation.

6.31.5: How does the third bullet differ in meaning from the first bullet? They seem redundant to me.
GROUP - fixed.

6.32.3, 3rd paragraph, 3rd sentence: This paragraph is dedicated to call-by-copy, so this sentence, which discusses the control of changes to formal parameters by means of "in", "out", "in-out" labels, is out of place, because it applies as well to call-by-reference. The sentence should be put before the second paragraph, in a discussion that separates the parameter-passing method (pass by reference or pass by copy) from the data-flow direction (in, out, or in-out). The last sentence of the 4th paragraph should also be moved into that discussion (and the part "by constant pointers" should be changed to "by marking a parameter as a pointer to a constant, for the purposes of the subprogram").
SM - Good catch. Should have been a new paragraph. Also fixed "constant pointer".
GROUP - Agrees that this clause needs work. Left for a future revision.

6.32.3, 5th paragraph: Swapping two values with the exclusive-or method is an esoteric algorithm that is probably unfamiliar to most readers. A clearer example is needed. For example, consider a subprogram signed_sqrt(in x, out y) that returns y as the square root of |x|, provided with the sign of x. This could well be coded as:

   y := sqrt (abs (x)); if x < 0 then y := -y; end if;

If x and y are passed by reference, this would never return a negative value if called as signed_sqrt (x, x), because the condition "x < 0" would test the new value of x as set by the first assignment to y, which is never negative. Passing x by copy-in and y by copy-out (or by reference) would solve this problem.
GROUP - Disagree - short examples are preferred to longer examples.

6.32.3, 6th paragraph, last sentence: Assuming C-like call-by-value, a subprogram can "pass back pointers to anything" only through a parameter that is a pointer to a pointer; it is not enough to just have a pointer to a data structure as introduced in the preceding sentence. And even languages with complex parameter-passing mechanisms can have "out pointer" parameters through which the subprogram can "pass back" pointers. Whether those pointers can "point to anything whatsoever" depends on other language features, not on whether parameters are passed by value or reference. I think this criticism of C-style parameters is overstating the case; the main problem in C is that explicitly passing a pointer (by value) is the only way to get pass by reference and the only way to get "out" and "in out" data flow, but passing a pointer does not separate "in" from "out" from "in out". However, the "const" qualifier can deny the "out" direction.
GROUP - Agrees that this clause needs work. Left for a future revision.
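A small Ada illustration of the offset/length distinction suggested for 6.30.5 above (a sketch only; the array and its bounds are invented):

   procedure Offsets_And_Lengths is
      A : constant array (0 .. 5) of Integer := (others => 0);
   begin
      pragma Assert (A'Last - A'First = 5);     --  offset from first to last element
      pragma Assert (A'Length = 6);             --  length of the whole array
      pragma Assert (A (2 .. 4)'Length = 3);    --  the slice 2 .. 4 has length 3,
                                                --  although the offset from 2 to 4 is 2
      null;
   end Offsets_And_Lengths;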
6.32.3: Although 6.32.1 and 6.32.4 also talk about function return values, 6.32.3 does not mention return values at all. The same questions regarding "by reference" or "by value" apply to return values as to parameters, but the data-flow-direction questions do not apply; the direction is "out". Aliasing could happen in the case of return-by-reference if the function's return value is assigned to a variable that is also a call-by-reference parameter, for example x := signed_sqrt(x), using an obvious modification of the signed_sqrt example in an earlier comment. Section 6.32.3 should say something about potential problems in return-value passing.
GROUP - Agrees that this clause needs work. Left for a future revision.

6.32.5, second sub-bullet: I don't see how aliasing can be introduced by using expressions or functions (I assume this means "function calls") as actual arguments, unless the expression or the function value are themselves some kind of references to other objects that are non-local or also passed as reference parameters in the same call. In Ada, an array slice is, I believe, the only such "reference" expression that does not explicitly involve a pointer (access value). If a function returns a pointer, aliasing is not reduced by storing that pointer in a temporary and then using the temporary as the actual argument instead of the function call, as suggested in this bullet. In short, I think this sub-bullet should be reconsidered and clarified. If the present advice is followed blindly for parameter types that are not pointers (e.g. integers or floats), it will add unnecessary complication to the code.
GROUP - Fixed "functions" to "function calls" and changed "aliasing" to "aliasing effects". Consider in a future revision.

6.32.6: The sentence lacks a terminating period. Is the sentence complete, or has some part been truncated?
SM - Fixed. Thx.

6.33.1, first sentence: Should "treating" be "taking" or "using"? We are not assuming that the address is ill with something and that the illness should be treated?
SM - Agreed - Fixed.

6.33.1, second sentence: The character used before "Access" and "Address" seems to be a back-tick (`) and not an apostrophe (') as Ada requires.
SM - Thx. Fixed.

6.33.3, last paragraph: In addition to interrupts, it may happen (in C++ or the like) that returning from the subprogram implicitly invokes some destructors (finalizers) that may also use the stack for their own purposes, and may thus also overwrite the data. Furthermore, there are computer architectures in development where the stack "rubble" (anything that is no longer part of an active frame) is implicitly inaccessible, causing a protection-fault trap if the program tries to access it, making this idiom unusable and such code unportable to these architectures (but happily the first test would expose the problem).
GROUP - Disagree - Stack cannot be corrupted by correct constructors.

6.33.5, 2nd bullet: Perhaps this advice should be moderated so as not to apply to local variables that are statically allocated and not stack-allocated. Some languages allow such variables. Of course, such variables have their own problems, mainly the question of how long they retain their values in the face of concurrency and new calls of the same function.
GROUP - Disagree - We are not using "local variable" in the C style for local 'static' variables.

6.33.6, first bullet: Same comment as for 6.33.5, 2nd bullet.
GROUP - Disagree.
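For reference, a minimal Ada sketch of the dangling-reference-to-stack-frame problem behind the 6.33.x comments above (all names invented; the program compiles, but the captured address is dangling as soon as the function returns):

   with System;
   with Ada.Text_IO;

   procedure Dangling_Demo is

      function Address_Of_Local return System.Address is
         Local : aliased Integer := 42;   --  lives in the stack frame of this call
      begin
         return Local'Address;            --  dangling once the call returns
      end Address_Of_Local;

      Stale : constant System.Address := Address_Of_Local;

   begin
      --  Any dereference of Stale here would read stack "rubble": the frame of
      --  Address_Of_Local no longer exists, and its storage may have been reused
      --  (or, on some architectures, made inaccessible).
      Ada.Text_IO.Put_Line ("captured a stale stack address; using it would be erroneous");
   end Dangling_Demo;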
6.34.3, first paragraph: This seems to assume that arguments are pushed by the caller, but popped by the callee. That is not always the case.
GROUP - This is not assumed. The statement is "is pushed" with no allocation of responsibility.

6.34.3, first paragraph: In addition to a possible mismatch in the push and pop, equally likely and severe problems can occur when the callee accesses some part of the stack that the callee assumes to be an argument of a certain type, based on what the callee knows is its signature, but (because the caller uses a different signature) that part of the stack is not actually an argument of that type, but perhaps a local variable of the caller, or a parameter or local variable of a different type, or only a part of such a local variable or parameter, or a concatenation of (parts of) several adjacent local variables or parameters. The callee might then read some secret of the caller, or might alter some local data of the caller.
GROUP - Already implicitly covered

6.34.4, last paragraph: "For .. then" is not grammatical. Reword as "Parameter mismatches are particularly likely for functions that accept a variable number of parameters."
SM - Could not find.

6.35.3, first paragraph, last sentence: The word "and" seems wrong; the sentence starts with "if", but does not lead to a consequence. Perhaps "and" should be replaced by a comma: "If stack space is limited, the calculation of some values ... resulting in the program terminating".
SM - Thx. Fixed.

6.35.5: Add: Use static analysis to detect non-obvious recursive call paths such as indirect and long recursive call cycles.
SM - OK

6.35.6: Perhaps add: Consider providing language and implementation functions to (a) let the programmer specify the amount of stack space the program (or each thread) should be allocated; (b) let the program measure, during or after execution, how much stack space is/was actually used; (c) during execution, check for stack overflow and if it happens, signal a fault before any data are corrupted or other abnormal execution occurs, with (d) preferably a way for the program to handle stack overflow without aborting.
Group - changed "stack" to "memory" since some implementations do not use a stack, and heap exhaustion is also a possibility.

6.37.1: This seems to confuse together two different kinds of type breaking: (1) overlays of objects with different types in the same storage area, and (2) copying bits from an object, where they represent one type of data, verbatim to an object where they are interpreted as some other type of data. It seems to me that (1) is much more risky than (2), but they are not clearly separated in this section.
Group - reject. We do not want to preferentially single out one way that reinterpretation of data can happen.

6.37.3, 3rd to last paragraph: Aliasing of parameters is not a case of this vulnerability, because aliasing does not involve type-breaking reinterpretation of data (except for some corner cases involving changes to the discriminants of records/unions). Remove this paragraph.
Group - agree. Removed.

6.37.3, 2nd to last paragraph: In addition to Unchecked_Conversion, there are several other ways in Ada to do type-breaking reinterpretation of the overlaying kind, such as specifying the Address of a variable to be the same as that of another variable.
Group - Reject - language-specific mechanisms that can result in this vulnerability belong in the language-specific documents.
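For illustration only, a minimal Ada sketch of the address-overlay mechanism mentioned in the last 6.37.3 comment above (invented names; the result depends entirely on the sizes and representations of the two types, here assumed to be 32 bits each):

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Overlay_Demo is
      F : aliased Float := 1.0;

      I : Integer;
      for I'Address use F'Address;
      --  I now shares the storage of F, so reading I reinterprets the bits of
      --  the Float as an Integer.  For a type with default initialization, the
      --  initialization would also have to be suppressed (e.g. with pragma
      --  Import) so that elaborating I does not overwrite F.
   begin
      Put_Line ("Float 1.0 seen as Integer:" & Integer'Image (I));
   end Overlay_Demo;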
6.37.3, last paragraph: I would have expected some reference to the "Pointer type conversion" vulnerability [HFC], section 6.11.
Group - OK

6.37.5, 4th bullet: This advice, including the use of 'Valid in Ada, applies to all reinterpretations of data, not just when the mechanism is pointers with different underlying types as suggested in this bullet. For example, Unchecked_Conversion involves no pointers, but the common advice in Ada is to check the result with 'Valid. (A sketch of that check is given below.)
Group - agree in principle. Changed accordingly.

6.37.6: Perhaps add: Reducing the need for reinterpretation by providing standard functions to decompose and recompose values of standard types from their parts, for example, functions to extract the exponent or the mantissa of a floating point value, and functions to compose a floating point value from an exponent and a mantissa.
Group - Suggested advice is too specific.

6.38.1, next to last sentence: In principle, it is just as likely that a given algorithm _requires_ shallow copying, to ensure the desired sharing of referenced objects, and will fail or perform poorly if a deep copy is performed instead. This is the case, for example, in "incremental" or "functional" data structures where the sharing of sub-structures is desirable, and is in fact the point. This is reflected in the first bullet of 6.38.5, which is good; however, I would not call it "aliasing", unless all cases where there are several pointers to the same object should be called "aliasing" (which I doubt).
Group - disagree. Aliasing is usually defined as any way that two or more names can refer to the same object or field in an object.

6.39.4, 2nd bullet: Why is there no possibility of heap fragmentation in this case? It seems to me that this depends entirely on the garbage collection mechanism -- on whether that mechanism coalesces adjacent free blocks, or compacts the heap, or employs a two-space approach, etc.
Group - agree. Fixed.

6.39.5: Perhaps add: Use languages (such as Rust) that track the ownership of heap-allocated memory blocks and so make it simpler and surer to deallocate them at the proper time.
Group - agreed. Guidance added. Also added guidance about using reference counting.

6.39.6: Perhaps add: Adding mechanisms to control and track the ownership of heap-allocated memory blocks to ensure that deallocation happens at the right time, whether explicitly or implicitly.
Group - reject - Such advice is premature, especially considering the age and complexity of existing languages.

6.39.6: Perhaps add: Defining standard "container" data structures which encapsulate the management of dynamic memory.
Group - reject - storage pools cover language-provided containers, and container libraries are not considered to be a language issue.

6.40.3, last paragraph: My understanding of C++ templates is not complete, but I think there is an additional risk with special-case, explicit instantiations: If a compilation unit intends to make use of such a special-case instance, and #includes the file that defines the template itself (the general case), but mistakenly omits to #include the file that defines the desired special instance, the compilation will often succeed but the code will use the general case instead of the special case, and so may not work correctly. Advice to avoid this problem might be to ensure that all special-case instances are defined in the same header file that defines the template itself.
Group - Way too language-specific to include in the Part 1. Part 10 should address this.
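A small Ada sketch of the 'Valid check mentioned in the 6.37.5 comment above (the enumeration, its representation, and the raw input value are invented; the point is only that the converted value is checked before any other use):

   with Ada.Unchecked_Conversion;
   with Interfaces;
   with Ada.Text_IO; use Ada.Text_IO;

   procedure Valid_Check_Demo is

      type Command is (Stop, Go, Turn);
      for Command use (Stop => 1, Go => 2, Turn => 4);   --  not all byte values are valid
      for Command'Size use 8;

      function To_Command is
        new Ada.Unchecked_Conversion (Interfaces.Unsigned_8, Command);

      Raw : constant Interfaces.Unsigned_8 := 3;   --  e.g. read from a message
      Cmd : constant Command := To_Command (Raw);

   begin
      --  Check validity before Cmd is used in any other way.
      if Cmd'Valid then
         Put_Line ("valid command: " & Command'Image (Cmd));
      else
         Put_Line ("invalid command byte rejected");
      end if;
   end Valid_Check_Demo;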
6.41.1: 1st paragraph, 2nd sentence: I don't agree that object-oriented systems are designed to "separate ... code and data", quite the opposite; the object-oriented idea is to combine certain data (object components) with the related code (object operations) in the same "class" concept. Perhaps I misunderstand the sentence, and the intent is to say that object orientation aims to encapsulate closely related data and code (in a class) and also to separate that data and code from the rest of the program. Perhaps the sentence should be reworded.
Group - reject - do not understand comment. Text says "encapsulate", not "separate"

6.41.1, last paragraph, last sentence: I've never seen "object brokerage" used in this way, and web searches don't show up any such uses. The most common combination of "object" and "brokerage" is in CORBA, that is, network-scale object-oriented communication and service invocation. Consider using a different term, perhaps just "multiple inheritance".
Group - agreed, "object brokerage" changed to "languages"

6.41.3, last bullet: There are so many "-ing" words in this sentence that it is hard to understand. Perhaps reword as 'Directly reading or writing visible class members, instead of calling the corresponding "get" and "set" member functions which may include additional functionality that should be executed for every such read or write access.' Moreover, it is not easy to see how this problem is related to inheritance. Perhaps the explanation is that the "get" and "set" functions are implemented in a parent (ancestor) class, while the programmer who codes the direct accesses in some subclass method does not see those functions and is not aware of their existence. On the other hand, if the programmer knows about the class members that are directly accessed, why should the programmer not know about the "get" and "set" functions?
Group - agreed - rewritten.

6.41.5, 5th bullet: I fail to see how methods that provide "versioning information" help. Is the information meant to be used at run-time somehow? Or how is it meant to be used to help with this vulnerability? Same comment for the first bullet of 6.41.6 ("common versioning method").
Group - agreed - deleted.

6.41.6, 2nd bullet: This advice would be better addressed to IDE developers, because the information is of much more use if provided interactively. See my comment on 6.28.6, above. Alternatively, compilers could emit this information in the debugging info, for use by call-graph-display programs (some C++ compilers already do this).
Group - Language development could also include IDE development. We do not address IDEs.

6.42.3, first paragraph, first sentence: I think "the client has mechanism" should be "the client has no mechanism".
SM - Good catch - thx.

6.42.6: Perhaps this advice should also require that the language mechanism for pre/post-conditions should obey (and check) the rules for the Liskov substitution principle.
Group - reject - such checks are not feasible in general.

6.43.6: Perhaps add: Consider extending languages to allow the specification of "layers" of operations/methods so that the compiler can check that recursion cycles, as described in 6.43.1, cannot happen. In that example, the A and B methods would be in different layers, with the layer order specified to let any instance of A call any instance of B, but forbidding calls in the other direction. In effect, extend the language to support the second bullet in 6.43.5, with mandated checking that it is obeyed.
Group - reject - Too specific.
6.44.1, next to last paragraph: The relevance of section 6.11 is not evident; polymorphic variables are not necessarily pointers. A better formulation would be to say that "Unsafe casts allow arbitrary breaches of safety and security, similar to the breaches described in section 6.11...".
Group - agree - Implemented

6.44.3, first paragraph: This mechanism is not clear or not sufficiently explained. Why should an inconsistent state result in this scenario? In fact section 6.44.1 says that upcast + calling a parent method _avoids_ inconsistencies.
Group - disagree - mechanism is explained explicitly in the final sentence of the paragraph on upcasts in 6.44.1.

6.46.2: There is an unusual amount of vertical white-space around the second line.
SM - Thx.

6.47.1: There are many other possible differences and mismatches between code produced by compilers for different languages, or even different compilers for the same language. For example, different compilers may make different choices on register usage: which registers are caller-save, and which are callee-save; which register is used for the return address in jump-and-link instructions; and even in some cases which register is used for the stack pointer and in which direction the stack grows, with some compilers even using two stacks where others use one. In some cases the same processor may support two or more instruction encodings (e.g. 32-bit ARM and 16-bit THUMB). Different compilers may emit code with different encodings, requiring special care when linking such mixed-encoding object files into one program. The way to avoid such problems is for the processor vendor to define an Application Binary Interface (ABI) standard for the processor architecture. A compiler that follows the ABI can then produce code that is compatible with code produced by any compiler that follows the same ABI, at least for language features and processor features covered by the ABI. For example, the DEC VAX had a strong ABI defined by DEC.
Group - reject - That is all true, but far beyond what we consider reasonable for this document. Most of the issues mentioned are subsumed under "calling conventions"

6.47.3, 2nd paragraph: The discussion of identifiers should also cover the linking process, the possible compiler-specific decoration or mangling of identifiers into linker symbols, and the means that a language may have to specify the linker symbol when importing or exporting objects or functions from or to other languages.
Group - accept in principle. Text added.

6.47.3, 3rd paragraph: As far as I know, the Pascal standards do not define the representation of a "string" variable. The representation shown here, and its C equivalence, is produced by some specific Pascal compiler - perhaps some Turbo Pascal version. Moreover, the size of the "int" in the C form may depend on which C compiler is used. The paragraph should make clear that the details depend on which Pascal and C compilers are used. Ok, the very last sentence in 6.47.3 says something to that effect, but that comes very late, and can be understood to apply only to the numeric data types.
Group - Put in "may correspond"

6.47.5: Add bullet: Use compilers that follow the standard ABI for the target processor (and choose a target processor with a good and comprehensive standard ABI).
Group - reject - This document does not address ABI.
6.47.5: Add bullet: use the compilers' means to control the linker symbol used when exporting or importing objects or functions instead of relying on compiler-specific automatic mappings between source-code identifiers and linker symbols (see the sketch after this block of comments). Group - First bullet is intended to cover this.

6.47.5, 1st bullet: Note that the Fortran/Ada inter-language specifications for C only apply when a compatible C compiler is used, with compatible options. This again involves the processor ABI. Some C compilers have options to define the size of the "int" type, and whether "char" is signed or unsigned. The Fortran/Ada inter-language specifications cannot automatically cover such variations on the C side. Group - reject - we must assume implementations that conform to standards. We can consider a vulnerability associated with ABIs for a future revision.

6.47.5, 2nd bullet: Replace "languages" with "languages and compilers". Group - accept - used "languages and language processors".

6.47.5, 3rd bullet: All sub-bullets concerning identifiers should also involve the abilities of the linker to use long names and to distinguish upper-case from lower-case characters, as well as the compiler-specific decorations and manglings. Perhaps better general advice would be to always specify the linker symbol for all items of the inter-language interface, to do that on both sides of the interface, and to ensure that the chosen linker symbols are acceptable to the linker and cannot be confused with other symbols, for example symbols used in the run-time system or in the standard libraries (of any language or compiler involved). Group - Reject - Too specific and the advice is arguable.

6.47.6: The advice is very general. Perhaps some more specific advice could be added, such as to ensure that the language has standard ways to define the linker symbols ("external names") to be used for all imports and exports. There could also be advice for language implementors to follow the standard ABI for the target processor, where possible, and to document any deviations from the ABI, and perhaps to warn the programmer if any exported or imported entity relies on such deviations. Group - That does not work. We get too much push-back from more specific advice. (Should the document structure have some general provision for advice to language implementors? Currently it has dedicated sections for advice for language designers, and for language users (programmers), but that leaves a gap in between: the language implementors. There are some cases of advice to implementors, however, usually in the sections devoted to language design implications.) Group - Reject. This is out of scope of this document.

6.47.5 (not 6), 3rd sub-bullet: If interpreted literally, this would make it impossible for a callee, imported from another language, to return any kind of error indication to its caller. I suspect the intent here is to warn against trying to use "sophisticated" error-reporting mechanisms such as exceptions to report errors across languages. But if exception propagation is defined in the ABI, and both compilers obey the ABI, it should work. Better to just warn programmers to ensure that exception propagation across languages works, before they try to use it, and perhaps to warn them that exception propagation across languages may not be portable to different compilers or different target systems. Group - Agree in principle with the interpretation. Added explanatory words to 6.47.1 to justify the advice.
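To make the linker-symbol point concrete, here is a small C++ sketch (the function name is hypothetical). The extern "C" linkage specification fixes the linker-level symbol, independently of the C++ compiler's name mangling, so an Ada or Fortran import can name the plain symbol directly instead of relying on a compiler-specific mapping:

    // Exported with C linkage: the linker symbol is (essentially) the plain
    // name "checksum16", so a foreign-language caller can import it by name.
    extern "C" unsigned checksum16(const unsigned char *data,
                                   unsigned long length)
    {
        unsigned sum = 0;
        for (unsigned long i = 0; i < length; ++i)
            sum = (sum + data[i]) & 0xFFFFu;
        return sum;
    }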
6.48: I would divide this section into separate sections on dynamically linked code on the one hand, and self-modifying code on the other. The issues are very different in the two cases. For instance, almost all applications executing on typical PC operating systems use dynamic linking to OS libraries, while few PC applications implemented in ahead-of-time-compiled languages such as C or Ada (non-JIT languages) use self-modifying code. Dynamically linked libraries are a special case of a wider vulnerability concerned with how the execution environment affects programs -- for example, many applications depend very much on various program-specific configuration and option files, which they read at start-up, often using search paths similar to LD_LIBRARY_PATH. Attackers can try to insert their own harmful versions of those configuration files somewhere in those paths to override the benign files, or simple user mistakes can result in the wrong files being used (see the sketch after this block of comments). Group - reject - Distinguishing the cases in more detail does not change the fact that the attack mechanism is to thereby execute arbitrary code, hence we find no compelling reason to split for this release.

6.48.1, 2nd paragraph, 2nd sentence: Historically, I believe that self-modifying code was introduced for machines that lacked some fundamental features such as index registers, indirect-addressing modes, indirect branch instructions, or return-address stacks. I haven't seen small memory sizes blamed for self-modifying code, although the memories of those ancient machines were small, of course. Group - Added a general statement with small memory as a "such as"

6.48.1, 3rd sentence: By far the most common use of self-modifying code today is in JIT compilation. This is now so common that some people prefer not to use the negative-sounding term "self-modifying code" for it, but that is what it is, although the modification is limited to specific and benign cases, as it generally was in the ancient machines where it was first introduced. Group - accept - rewritten.

6.48.1, 3rd sentence: Self-modifying code is also used in many embedded systems to enable remote update of the software, either as a whole or by patching here and there. However, it could be argued that this is an "operating system" update feature, although the "operating system" in these cases is often completely fused with the "application". This use or mis-use of self-modifying code is both hugely important and risky for security, as SW in embedded systems such as network routers or other IoT devices is very much under attack. It is risky because attackers can use it for harm; it is important because SW maintainers can use it to patch vulnerabilities in fielded devices. I suggest that the section on self-modifying code should discuss cases where some degree or kind of self-modification is benign and useful, such as JITs or remote code updates. Group - Reject - Added explanation hides the original concepts.
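To illustrate the search-path exposure mentioned in the 6.48 comment above, here is a minimal POSIX/C++ sketch (the library path is hypothetical). Because the path given to dlopen contains a slash, the LD_LIBRARY_PATH search order is not consulted, so neither an attacker nor a mis-set environment can substitute a different library:

    #include <dlfcn.h>
    #include <cstdio>

    int main()
    {
        // Absolute, application-controlled path (hypothetical).
        void *handle = dlopen("/opt/example/lib/libcodec.so", RTLD_NOW);
        if (handle == nullptr) {
            std::fprintf(stderr, "cannot load plug-in: %s\n", dlerror());
            return 1;
        }
        // ... resolve entry points with dlsym(handle, "name"), use them ...
        dlclose(handle);
        return 0;
    }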
6.48.5: Perhaps add some advice for the benign uses of self-modifying code, in particular for code updates in embedded systems. The main advice could be to limit or avoid the "self" by separating the SW that can modify code (the update/patch functions) from the SW that can be modified (the application). This approach is commonly used in spacecraft on-board SW to reduce the amount of SW that has to be designed and qualified to higher Design Assurance Levels to just the update/patch functions. Those functions are often placed in a separate executable called the "Boot and Service SW", which is held in read-only memory and cannot be patched. Group - reject - We do not want advice to encourage dangerous constructs, no matter how benign the context may appear.

6.48.6: I don't think that this suggestion (checking library signatures) is an issue for language design. It could be an issue for language implementation (see my general comment on 6.47.6), for the definition of program formats like ELF, and for the implementors of dynamic linkers and operating systems. Group - Reject - clearly type checking at library boundaries IS a language design issue.

6.49: The points about the target-system ABI that I made in my comment on 6.47.1 are very relevant here too. Group - reject, as per earlier replies about ABIs.

6.50.6: Have you considered making the languages enforce exception handling by: (1) always including the "propagated exceptions" set in the signature of a subprogram, and checking that it matches the code of the subprogram; (2) checking that the signature of the "main" subprogram does not contain the propagation of any exceptions; in other words, checking that all exceptions are handled? (I understand that Java offers this feature, but that it has some problems that have limited the actual usage of the feature. There have also been discussions of such a feature for Ada, but it is not included in the Ada 202x proposal.) Then, these signatures should be included in the ELF or other such file, both on the export side (provided interface) and on the import side (expected interface), and checked by the linker, whether static or dynamic. Group - reject - This issue is being vigorously discussed in the Ada, Java, and C++ language communities, and there is no consensus.

6.51.1: Perhaps add: Library interface descriptions that include macro definitions may not be translatable to other languages, and so may prohibit or hamper the use of the library from other languages, and perhaps lead to interfacing errors (vide sections 6.47 and 6.49). Group - Reject - does not apply to this vulnerability.

6.52.5, last bullet: Asking programmers to consider arbitrary HW faults is non-productive, I fear. HW faults should be considered only where they could affect critical or irreversible actions. These and other HW failures should be mitigated or prevented by HW means (EDACs, check-sums, and redundancies). I don't have much hope for SW-implemented error-detection strategies. Group - accept. Point removed.

6.52.6: Perhaps suggest that languages/compilers which currently suppress checks by default should instead enable them by default. Of course this brings the risk that some programs which worked before (or seemed to) will now fail because some checks fail. And other programs may run more slowly than before, perhaps failing real-time deadlines. But still. Stopped here 24 May 2021. SM - for discussion.

6.53: This (quite short) section seems to be completely redundant with other sections. Why not delete it? SM - reject. The list of vulnerabilities is fixed for this iteration.

6.54.1: However, often it is hard to find a consensus on which language features are obscure, complex, difficult to understand, or hard to use. SM - Agreed, even within one language community, let alone across languages.

6.54.5, 2nd bullet: Better to move the parenthesis "(organizations)" to the start of the bullet, as in the 3rd bullet. SM - Good catch.
6.54.5, last bullet: The static analysis could also check that the coding standards are followed (no use of forbidden features). SM - OK

6.54.6: Perhaps add some positive suggestions: Improving the description of complex features in the language standard. Modifying features to make them simpler or easier to use, without removing them entirely. SM - Too unfocused

6.55.3, 1st paragraph: I find the term "unspecified behaviour" a little strange for something where the implementation is only allowed to choose between a limited number of specified behaviours. "Unspecified" sounds absolute (what is later called "undefined behaviour"). I would call it "alternative behaviours", or even "limited implementation-defined behaviour". Perhaps it would help linear readers of the document to collect the definitions of all three terms (unspecified, undefined, implementation-defined) into 6.55.1, with references to the discussion in 6.56 and 6.57. SM - The C/C++ and Ada notions of these terms are nearly opposite. This was written for the "C" notions.

In 6.57.3, the difference between "unspecified behaviour" and "implementation-defined behaviour" is said to be only that implementations are required to document the behaviour in the latter case. That is a very superficial difference. SM - The C/C++ and Ada notions of these terms are nearly opposite. This was written for the "C" notions.

6.55.5, 4th bullet: Idempotent behaviour is not enough to eliminate evaluation-order effects. Assume a global variable X and two Boolean functions, A and B, where A always sets X to 1 and B always sets X to 2, regardless of the previous value of X, in addition to returning some Boolean value. Both operations are idempotent, but after evaluating "A and B" the final value of X depends on which of A or B was evaluated last (a sketch follows after this block of comments). Mathematically speaking, the important property is commutation, that A and B "commute" in the sense that the result of applying A first, followed by B, is the same as when they are applied in the opposite order. Unfortunately, "commutation" is not a property of a single operation alone, such as "idempotency", but a property of sets of operations, making it both harder to define and harder to check. I suggest removing the alternative of "idempotent behaviour" from this bullet, and leaving only the "no side effects" case. SM - Needs some thought.

6.55.5, 5th bullet, 2nd sub-bullet: What does "be enumerated" mean? Enumerated by whom and where? In the coding guidelines or in the code?

6.56.5, 5th bullet: "language extensions" is mentioned here; if that is considered "undefined behaviour", it should be introduced earlier, in 6.56.3 or 6.56.1. Formally, it is covered by the definition in 6.56.1, but that is easy to overlook there, so an explicit mention is better. In fact, it seems to me that the term "programming language" would, in general, include extensions added by the implementation; for example, the "Turbo Pascal language". But perhaps the term is more narrowly used in ISO documents. SM - For consideration

6.56.5, 5th bullet: Surely all uses of language extensions should be documented, not just those uses that are "needed for correct operation"? For example, some extension may be used for programmer convenience, or for execution speed. SM - for consideration

6.56.5, 6th bullet: I think "documented" should be "document", and should perhaps be followed by a comma. SM - OK
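The 6.55.5 point can be made concrete with a short C++ sketch (the names a, b, x and use are hypothetical). The unspecified evaluation order of the two function-call arguments below stands in for the unspecified operand order of "A and B" in the comment above:

    #include <iostream>

    int x = 0;

    bool a() { x = 1; return true; }   // idempotent: a(); a(); leaves x == 1
    bool b() { x = 2; return true; }   // idempotent: b(); b(); leaves x == 2

    void use(bool, bool) {}

    int main()
    {
        use(a(), b());                 // the order in which a() and b() are
                                       // called is unspecified, so x may end
                                       // up 1 or 2: idempotency does not
                                       // remove the order dependence; a and b
                                       // would also have to commute.
        std::cout << "x = " << x << '\n';
    }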
The part "provided by and for changing its undefined behaviour" seems garbled. SM - For consideration 6.56.6: If use of language extensions is considered unspecified behaviour, add a bullet: Making compilers optionally report all uses of compiler-specific language extensions, and optionally consider such use an error that makes the compilation fail. SM - for consideration 6.57.4: If the only difference between "implementation-defined" and "unspecified" behaviour is documentation of the behaviour, there should be an additional language criterion here: that the language standard requires implementations to document their behaviour in these cases. SM - covered in bullet 2 6.57.5, 5th bullet: I think "documented" should be "document", and should perhaps be followed by a comma. And "enumerated" should be "enumerate". SM - good catch 6.57.5, next to last bullet: I don't understand what should be documented. The part "provided by and for changing its implementation-defined behaviour" seems garbled. 6.57.5, last bullet: This seems rather weak advice. For example, if the language allows two kinds of behaviour, the likelihood that the two different compilers will choose the same behaviour is high, especially if that behaviour is much more convenient for the compiler, or for the target architecture -- I assume the "different technologies" part does not mean using two different target architectures. SM - Disagree. That is why it says "at least" and different OS, different hardware architecture, etc. 6.57.6, 1st bullet: Why should only "common" implementation-defined behaviours be listed? Why not all? SM - Too many to list "all" 6.57.6: Perhaps add: Extending the language to include features that have the same function as the features with implementation-defined behaviour, even if the new features are more costly in compilation time, execution time, or other resources. Then, possibly deprecating the now redundant features that have implementation-defined behaviour. SM - for consideration 6.58.5, 1st bullet: "complier" should be "compiler". SM - Thx 6.58.6, 2nd bullet: Why should only "obscure" problematic features be removed? Should not all trouble-spots be removed? SM - Agree 6.59.2: Most of the references listed here are general works with no specific connection to this vulnerability. This is very different from other similar sections which list specific paragraph or sections in MISRA or other coding rules and guidelines. Are these general references really appropriate here? Such general references would fit better in a general "references" section, which could also have some general discussion, for example of static analysis tools and which tools might help with which vulnerabilities. SM - The "standard" other vulnerability documents fail to address concurrency adequately. 6.60.2: Same comments as for 6.59.2. SM - Same response 6.60.4: Formatting problem: the text is formatted as a bullet, but without the bullet symbol. The text is also subtly different from 6.59.4, which is surprising and perhaps not intended. SM - Thx 6.60.5, 3rd bullet: I think it is unlikely that formal, abstracted models would have the capability to warn about coding failures that might prevent timely termination, such as staying too long in an abort-deferred region. Also, shouldn't the aim here be to show that thread non-termination (or delayed termination) is properly handled? SM - Those are covered in the other bullets. This bullet address the interaction of concurrent entities at a higher level. 
6.60.6: I find this advice peculiar; there is no attempt to prevent the occurrence of the problem: late termination or non-termination of a thread. For example, the language could ensure that abort-deferred regions cannot take a long time to execute; or could insist that a thread is not allowed to ignore an abort request; or could place a time-out on thread termination, with an immediate forced abort if the time-out is exceeded (of course, preferably without losing any resources claimed by the thread). SM - For discussion.

6.61.5, 1st bullet: Surely "all data" is too much? It should be "data read or written by several threads". SM - No. Putting data in a shared memory region could result in concurrent access when it was not planned that way.

6.61.5, last bullet: I think it is dangerous to advise the use of "atomic" or "volatile" without explaining in more detail what they do and don't do. For example, I've seen advice that an update, such as K := K + 1, can be made thread-safe by marking K as "atomic", which is of course false (see the first sketch after this block of comments). It should be made clear that "atomic" and "volatile" are very limited in effect, and must be used together with a correct, lock-free access protocol, faithfully followed by all threads. SM - Agreed.

6.62.2: Same comment as for 6.59.2. SM - same response.

6.62.2, last paragraph: The initial quote character should be T. SM - Thx

6.62.4: Formatting problem: the text is formatted as a bullet, but without the bullet symbol. The text is also subtly different from 6.59.4, which is surprising and perhaps not intended. SM - Thx

6.62.5, foot-note 7: This is formatted as a bullet, but should not be. SM - Thx

6.63.3, 2nd paragraph, 1st bullet: Instead of every thread (in the application) stopping because of dead-lock, I think it is equally or more likely that some threads are dead-locked, but others continue running. I suggest changing "every thread" to "some or all threads" and adjusting the rest of the sentence accordingly. The system might still make progress in some of its jobs, while being stymied in others. SM - OK

6.63.5, 7th bullet: Should "calls and releases" perhaps be "locks and releases"? SM - OK

6.63.5, 7th bullet: Suggesting that the order of "calls and releases" should be "correct" is not very helpful unless there is some explanation of what is "correct". For example, that any locking of several objects, to hold locks on all those objects at the same time, should always acquire the locks in the same order of the objects (see the second sketch after this block of comments). Add some discussion of what is a "correct order" for this bullet. SM - for consideration

6.63.5, 8th bullet: Yes, static ceiling priorities can be statically checked (assuming that the call-graph can be statically constructed), but I know of no tool that does it automatically, starting from Ada source files, say. Does one exist? Can it be referenced? SM - It can be done in model checkers, but needs manual translation.

6.63.5, 9th bullet: What does it mean to treat a collection of tasks as "a separate independent entity"? In what ways should the collection be so treated? For scheduling, for locking, or what? This is obscure. SM - OK. This needs a footnote.
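First sketch, for the 6.61.5 comment above (C++, variable names hypothetical): even with std::atomic, a read-modify-write written as separate operations is not one atomic step, and "volatile" provides no atomicity at all.

    #include <atomic>

    std::atomic<int> k{0};
    volatile int     v = 0;

    void worker()                       // imagine several threads running this
    {
        v = v + 1;                      // NOT thread-safe: volatile does not
                                        // make the read-modify-write atomic.
        k.store(k.load() + 1);          // NOT thread-safe either: two atomic
                                        // operations, with a window between
                                        // the load and the store.
        k.fetch_add(1);                 // a single atomic read-modify-write,
                                        // but it only protects this variable,
                                        // not any wider access protocol.
    }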
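Second sketch, for the 6.63.5 comment on a "correct order" of locking (C++, the account type and function are hypothetical): whenever several objects must be locked at the same time, every thread acquires the locks in one agreed global order, here by account id, so that no two threads can each hold one lock while waiting for the other's.

    #include <mutex>

    struct Account {
        int        id;          // defines the global locking order
        long       balance = 0;
        std::mutex m;
    };

    // Assumes 'from' and 'to' are distinct accounts.
    void transfer(Account &from, Account &to, long amount)
    {
        Account &first  = (from.id < to.id) ? from : to;
        Account &second = (from.id < to.id) ? to   : from;

        std::lock_guard<std::mutex> lock1(first.m);   // always lower id first
        std::lock_guard<std::mutex> lock2(second.m);  // then the higher id

        from.balance -= amount;
        to.balance   += amount;
    }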
6.64.3, point 4, last sentence: The sentence contains the ungrammatical "an object that's address was...". I suggest: "If the intent was to output the value of the object to which that parameter points, but the intended control sequence is modified to %n, the value of that object will be changed, instead of the object's value being output." SM - Fixed

6.64.3, last paragraph, 2nd sentence: The parenthesis is unnecessary and distracting; remove it. SM - OK

6.65.5, first bullet: This advice rather introduces the vulnerability, instead of avoiding or mitigating it. Delete this bullet. SM - to be considered

6.65.6: Perhaps add: Introducing a type of constant (such as Ada's "named numbers") that exists only at compile time, is not allocated memory at run-time, and therefore cannot be altered at run-time. SM - To be considered

-- end