Document number: N2351
Submitter: Martin Sebor
Submission Date: March 18, 2019
Subject: Add strnlen to C2X

Summary

The C library specifies a rich set of functions in the <string.h> header to manipulate strings. The strlen function, in particular, is heavily used by programs to compute the length of NUL-terminated character strings. However, not all character data that C programs commonly work with consists of properly NUL-terminated strings. In addition, although most string copying and concatenation functions take NUL-terminated strings as arguments and create strings on return, not all of them do. For example, the strncpy function takes as an argument an ordinary array that need not be NUL-terminated and appends NUL to the destination only when there is sufficient room. Similarly, the strncat function takes an ordinary array as its second argument, and both of strncmp's first two arguments can point to unterminated arrays.

Among the most common operations when manipulating character arrays is determining the length of a string. The strlen function exists to obtain this length. More generally, however, when dealing with character data, before its length can be computed, it may be necessary to first determine whether such data is, in fact, a properly NUL-terminated string. Only then can its length safely be computed using strlen. But determining whether a NUL exists in an array of bytes of some size also computes the length of such a string when one is stored there, so it would be convenient and more efficient to do both in one step rather than in two. Yet no function in the C library is provided to perform this basic query.

The memchr function can be used for this purpose but only in a roundabout way: rather than returning the number of characters in a sequence, it returns a pointer to the first occurrence of the character specified by one of its arguments. Thus, passing NUL as that character lets memchr locate the terminator. If the NUL exists in the source sequence, that is, if the sequence is, in fact, a string, the length can be computed in a subsequent step by subtracting the start of the sequence from the returned pointer. If the sequence is not a string the result is a null pointer. As a result, when the length or size of a character sequence that may or may not be a string needs to be computed, memchr might be used as follows.

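      #include <string.h>

      size_t string_length_or_size (const char *ptr, size_t max_size)
      {
        /* Look for a terminating NUL within the first max_size bytes.  */
        const char *end = memchr (ptr, '\0', max_size);
        if (end)
          return end - ptr;   /* A string: return its length.  */
        return max_size;      /* Not a string: return the array size.  */
      }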

While this isn't a terribly difficult function for programmers to implement, it is a very common one that one would expect to be provided by a string library. As it happens, a function just like it has been specified by another ISO standard, namely ISO/IEC 9945, also known as IEEE Std 1003.1, 2017 Edition, or for short, POSIX, since 2006 and provided by all implementations that conform to it:

      size_t strnlen (const char *s, size_t maxlen);
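
As a rough illustration of how the function tends to be used, the sketch below converts a fixed-size field that need not be NUL-terminated into a proper string. The helper name field_to_string and its error convention are hypothetical and not part of this proposal, and the _POSIX_C_SOURCE definition assumes a POSIX system that declares strnlen in <string.h>.

```c
#ifndef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L   /* Assumption: POSIX system; exposes strnlen.  */
#endif
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy a fixed-size field that may lack a terminating
   NUL into dst as a string.  Returns the number of characters stored, not
   counting the terminating NUL, or (size_t)-1 if dst is too small.  */
size_t field_to_string (char *dst, size_t dstsize,
                        const char *field, size_t fieldsize)
{
  assert (dst != NULL && field != NULL);

  /* strnlen never reads past the end of the field.  */
  size_t len = strnlen (field, fieldsize);
  if (len >= dstsize)
    return (size_t)-1;

  memcpy (dst, field, len);
  dst[len] = '\0';
  return len;
}
```

Calling strlen on such a field would read past its end whenever the field is completely full, which is the case the bound passed to strnlen guards against.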

Besides POSIX implementations, strnlen is available on a broad range of other systems, including but not limited to the following.

The latest GCC 9.0 source tree contains 31 calls to strnlen not counting the test suite, the Binutils/GDB tree contains 52 calls, and the Linux kernel tree 381 calls to it.

Since strnlen is commonly needed, widely available, and easy to implement, including as a compiler intrinsic function for efficiency (GCC, for example, provides it among its extensive list of built-in functions -- see Other Built-in Functions Provided by GCC), this proposal suggests adding it to C2X.

Why Is strnlen_s Not Enough?

The strnlen_s function specified in Annex K of C17 is nearly identical to strnlen, with one exception: it returns zero when its pointer argument is null. However, Annex K is an optional part of the C standard that virtually no implementation provides, and for which no compiler implements efficient intrinsics. In addition, in order to use strnlen_s a program must also define the __STDC_WANT_LIB_EXT1__ macro to a non-zero value before including the <string.h> header, which brings into scope all other Annex K functions. Conversely, defining the macro to zero prevents the declarations from being visible and makes the names of all Annex K symbols available to the program. These restrictions, while appropriate and useful for the Annex K APIs as a whole, would severely hamper the function's availability and portability. strnlen does not have any of these downsides.
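
The extra steps a program has to take to reach strnlen_s can be sketched as follows. The wrapper bounded_length and the self-check function are hypothetical names used only for illustration, and the fallback branch is an assumption about what a program might do on the many implementations that do not define __STDC_LIB_EXT1__.

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* Must precede the first #include of <string.h>.  */
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper, not part of the proposal.  */
size_t bounded_length (const char *s, size_t maxsize)
{
#ifdef __STDC_LIB_EXT1__
  /* Annex K is available: strnlen_s returns 0 for a null s.  */
  return strnlen_s (s, maxsize);
#else
  /* Assumed fallback for implementations without Annex K.  */
  if (s == NULL)
    return 0;
  const char *end = memchr (s, '\0', maxsize);
  return end ? (size_t) (end - s) : maxsize;
#endif
}

/* Usage: the null-pointer case is the one behavior that distinguishes
   strnlen_s from strnlen.  */
void check_bounded_length (void)
{
  char buf[8] = "abc";
  assert (bounded_length (NULL, sizeof buf) == 0);
  assert (bounded_length (buf, sizeof buf) == 3);
  assert (bounded_length (buf, 2) == 2);
}
```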


Suggested Change

Add the following subsection just after §7.24.6.3 The strlen function.

7.24.6.? The strnlen function

Synopsis

	#include <string.h>

	size_t strnlen(const char *s, size_t maxlen);

Description

The strnlen function computes the smaller of the number of characters in the array pointed to by s, not including any terminating null character, and the value of the maxlen argument. The strnlen function examines no more than maxlen characters of the array pointed to by s.

Returns

The strnlen function returns the number of characters that precede the first null character in the array pointed to by s, if that array contains a null character within its first maxlen characters; otherwise, it returns maxlen.
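
One possible reading of the wording above is given below as a minimal sketch; it is neither normative nor optimized. The name strnlen_sketch is hypothetical, chosen to avoid clashing with an existing strnlen declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference sketch of the proposed semantics.  */
size_t strnlen_sketch (const char *s, size_t maxlen)
{
  /* Unlike strnlen_s, a null pointer is not a valid argument.  */
  assert (s != NULL);

  size_t n = 0;
  /* Stop at the first null character or after maxlen characters,
     whichever comes first; s[maxlen] is never read.  */
  while (n < maxlen && s[n] != '\0')
    ++n;
  return n;
}
```

Because the bound is tested before each character is read, the sketch can be applied to an array of exactly maxlen characters that contains no null character.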