N1533 Updates to memchr from POSIX

Nick Stoughton
2010-11-04

Background

The Austin Group, responsible for the maintenance of ISO/IEC 9945 POSIX, were made aware of a potential problem in the specification of memchr, as described in WG 14 paper N1529 issue 110.

This paper proposes that wording similar to that agreed for POSIX be applied to C1x, so that the two standards stay aligned. Without this change, POSIX would keep its new wording, but it would be marked as an extension to the C standard.

Description

Traditional implementations of memchr process the input in ascending order. This has the advantage that when the size of the object pointed to by s is not known, but c is known to occur within the object, the caller can pass a value of n that is larger than the actual object size without dereferencing inaccessible memory. However, while the POSIX and C99 standards are explicit that it is permissible to pass n smaller than the object size of s, they are silent on whether passing a larger n is well-defined.
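
As an illustration only, the traditional ascending-order behaviour can be sketched as below. The names ascending_memchr and demo_oversized_n are hypothetical and the code is not taken from any particular C library; it simply shows why a caller who knows that c occurs in the object can pass an oversized n to such an implementation without any byte beyond the match being read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference implementation (an assumption for illustration,
   not any libc's code): scan in ascending order, stop at the first match. */
static void *ascending_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char target = (unsigned char)c;

    for (size_t i = 0; i < n; i++) {
        if (p[i] == target)
            return (void *)(p + i);   /* no byte after the match is read */
    }
    return NULL;
}

/* The caller knows 'c' occurs in buf, but passes an n larger than buf. */
static void demo_oversized_n(void)
{
    static const char buf[4] = { 'a', 'b', 'c', 'd' };
    const char *hit = ascending_memchr(buf, 'c', 1000);

    assert(hit == buf + 2);           /* only buf[0..2] were examined */
}
```

An implementation that reads several bytes at a time, or scans from the end, returns the same result whenever n does not exceed the object size, but gives no such guarantee for the oversized call.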

In contrast, consider the wording for fprintf when dealing with the %.*s specifier, in POSIX from line 29938:

If the precision is not specified or is greater than the size of the array, the application shall ensure that the array contains a null byte.

Many implementations of the *printf family use memchr to implement this statement; for example, http://git.sv.gnu.org/cgit/gnulib.git/tree/lib/vasnprintf.c?id=d4ca645#n197.

However, if memchr does not have any strict requirement on evaluation order, then this invokes undefined behavior. Likewise, application writers have noticed that it is possible to write faster code for finding a NUL byte, if one is present within a bounded length, by using memchr rather than strnlen, since the former has fewer conditionals (bounds check and search for NUL) than the latter (bounds check, search for NUL, and search for c). But again, this usage is rendered unsafe unless memchr is specified to behave like strnlen and not dereference past the match.
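
The idiom discussed above might look roughly like the following sketch. The helper name bounded_len and the demonstration function are hypothetical, chosen for illustration, and are not the gnulib code referenced earlier. Every call shown keeps the bound within the object, so it is well-defined under the existing wording; the contested case is the same call with a bound larger than the object, which is safe only if memchr is guaranteed not to read past the NUL byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes before the first NUL in the first
   maxlen bytes of s, or maxlen if there is none (strnlen-like result). */
static size_t bounded_len(const char *s, size_t maxlen)
{
    const char *nul = memchr(s, '\0', maxlen);

    return nul ? (size_t)(nul - s) : maxlen;
}

static void demo_precision(void)
{
    /* Unterminated array, as permitted for "%.3s": the precision equals
       the array size, so no null byte is required. */
    static const char raw[3] = { 'a', 'b', 'c' };
    assert(bounded_len(raw, sizeof raw) == 3);

    /* Terminated string shorter than the bound: the NUL is found first. */
    static const char str[8] = "hi";
    assert(bounded_len(str, sizeof str) == 2);
}
```

A printf-style implementation would use the returned length as the number of bytes to output for the %.*s conversion, with maxlen set to the precision.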

POSIX Change

In the DESCRIPTION remove "of the object" from
The memchr( ) function shall locate the first occurrence of c (converted to an unsigned char) in the initial n bytes (each interpreted as unsigned char) of the object pointed to by s.
In the RETURN VALUE section change
The memchr( ) function shall return a pointer to the located byte, or a null pointer if the byte does not occur in the object.
to
The memchr( ) function shall return a pointer to the located byte, or a null pointer if the byte is not found.
Add to DESCRIPTION
Implementations shall behave as if they read the memory byte by byte from the beginning of the bytes pointed to by s and stop at the first occurrence of c (if it is found in the initial n bytes).

Proposed C1x Change

Add to 7.23.5.1 para 2 the following:
Implementations shall behave as if they read the memory byte by byte from the beginning of the bytes pointed to by s and stop at the first occurrence of c (if it is found in the initial n bytes).