Name
	n3666, alx-0037r1 - add stpsep(), wcpsep()

Principles
	-  Codify existing practice to address evident deficiencies.
	-  Enable secure programming

Category
	Standardize higher-level string APIs

Author
	Alejandro Colomar

History
	r0 (2025-07-01):
	-  Initial draft.

	r1 (2025-07-27; n3666):
	-  Add note about DoS.

Rationale
	After calling fgets(3), it's common to want to trim the trailing
	newline character while at the same time rejecting an incomplete
	line (or a file that contains a line that is too long). This is
	currently cumbersome to do in standard C:

		while (fgets(buf, _Countof(buf), stdin) != NULL) {
			p = strchr(buf, '\n');
			if (p == NULL)
				goto fail;
			strcpy(p, "");

			...
		}

	I've found several bugs in code doing that, which often forgot
	to reject lines that are too long, and simply replaced the '\n'
	if present.

	It is useful to have an API that writes a '\0' over the found
	character (if any), and returns a pointer that, like the one
	returned by strchr(3), can be tested as a boolean to know
	whether the character was found.

	This is how it's used after fgets(3) to replace the code shown
	above:

		while (fgets(buf, _Countof(buf), stdin) != NULL) {
			if (stpsep(buf, "\n") == NULL)
				goto fail;  // line too long, or non-text file

			...
		}

	This function is still useful when using POSIX's getline(3), to
	reject lines that contain embedded null bytes:

		while (getline(&s, &n, stdin) != -1) {
			if (stpsep(s, "\n") == NULL)
				goto fail;  // line contains '\0'

			...
		}

	This API resembles strtok_r(3), but is stateless. It also
	resembles strsep(3) (GNU and BSD), but has the input and output
	switched.

	I've called it stpsep(), since it's very similar to strsep(3),
	except that it returns a pointer to one past the
	found-and-replaced delimiter character (so it's closer to the
	stp*() family of functions). By doing that, it doesn't update
	the input pointer in-place, and instead just takes a char*.
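
	To make the return-value convention concrete, here is a minimal
	sketch that splits a "key=value" string in place. The local
	stpsep() definition is only one possible ISO C implementation
	(via strcspn(3)), included so that the example is
	self-contained; the helper name split_kv() is hypothetical and
	not part of this proposal.

```c
#include <stddef.h>
#include <string.h>

/* One possible implementation, using only ISO C functions;
 * defined locally so that the example is self-contained.  */
static char *
stpsep(char *restrict s, const char *restrict delim)
{
	s += strcspn(s, delim);  /* skip to the first delimiter, or to the end */
	if (*s == '\0')
		return NULL;     /* no delimiter found; the string is untouched */

	*s = '\0';               /* terminate the first field */
	return s + 1;            /* one past the overwritten delimiter */
}

/* Hypothetical helper: split "key=value" in place.
 * Returns a pointer to the value, or NULL if there is no '='.  */
static char *
split_kv(char *kv)
{
	return stpsep(kv, "=");
}
```

	Given char kv[] = "user=alice", split_kv(kv) leaves "user" in kv
	and returns a pointer to "alice". Given "user" alone, it returns
	NULL and leaves the string unchanged.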

	This API is useful in places other than after fgets(3), but that
	is the main use case, and the one that led me to write it; only
	afterwards did I find it useful elsewhere. Because of that, and
	also because this API serves to implement strsep(3) itself (and
	is closely related to it), I decided to make it search for any
	of several characters, as strpbrk(3) does.

	Here's an example implementation in terms of strsep(3):

		char *
		stpsep(char *restrict s, const char *restrict delim)
		{
			strsep(&s, delim);

			return s;
		}

	Here's another implementation, in terms of more basic APIs:

		char *
		stpsep(char *restrict s, const char *restrict delim)
		{
			char  *p;

			p = strpbrk(s, delim);
			if (p == NULL)
				return NULL;

			return stpcpy(p, "") + 1;
		}

	which itself allows one to implement strsep(3) in a very simple
	way:

		char *
		strsep(char **restrict sp, const char *restrict delim)
		{
			char  *s;

			s = *sp;
			if (s == NULL)
				return NULL;

			*sp = stpsep(s, delim);

			return s;
		}

    DoS concerns
	One may be concerned that these APIs could promote DoS attacks,
	since they search in a loop within a loop. However, they are
	designed to be used with (short) string literals as the second
	argument. It would only be a concern if the second argument
	could be controlled by a user, which would be unusual.

Prior art
	stpsep() is defined and used extensively in the shadow-utils
	project.

		$ grep -rho 'stpsep([^)]*)' \
		| sed 's/([a-z]*,/(s,/' \
		| sort \
		| uniq -c \
		| sort;
		      1 stpsep(s, " \t")
		      1 stpsep(s, " \t\n")
		      1 stpsep(s, ",")
		      1 stpsep(s, "\"")
		      1 stpsep(tok + 1, "@")
		      2 stpsep(s, ":")
		      3 stpsep(char *s, const char *delim)
		      5 stpsep(s, "=")
		     21 stpsep(s, "\n")

Proposed wording
	Based on N3550.

    7.28.5  String handling :: Search functions
	## New subsection after 7.28.5.9 ("The strtok function")

	+7.28.5.9+1  The stpsep function
	+
	+Synopsis
	+1
	+	#include <string.h>
	+	char *stpsep(char *restrict s, const char *restrict delim);
	+
	+Description
	+2
	+	The stpsep function
	+	locates the first occurrence
	+	in the string pointed to by s
	+	of any character
	+	from the string pointed to by delim.
	+	If such a character is found,
	+	the function overwrites it with a null character.
	+
	+Returns
	+3
	+	The stpsep function
	+	returns a pointer to one past the overwritten character,
	+	or a null pointer
	+	if no character from delim occurs in s.

    7.33.4.6  General wide string utilities :: Wide string search functions
	## New section after 7.33.4.6.8 ("The wcstok function")

	+7.33.4.6.8+1  The wcpsep function
	+
	+Synopsis
	+1
	+	#include <wchar.h>
	+	wchar_t *wcpsep(wchar_t *restrict s, const wchar_t *restrict delim);
	+
	+Description
	+2
	+	The wcpsep function
	+	is equivalent to stpsep
	+	except that it handles wide strings.
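
	The wording above defines wcpsep only by reference to stpsep.
	As a non-normative illustration, it could be sketched with
	wcspbrk(3), mirroring the strpbrk(3)-based stpsep()
	implementation from the Rationale. This is an assumption about
	how an implementation might look, not part of the proposed text.

```c
#include <stddef.h>
#include <wchar.h>

/* Sketch of wcpsep(): the wide-string counterpart of stpsep().  */
wchar_t *
wcpsep(wchar_t *restrict s, const wchar_t *restrict delim)
{
	wchar_t  *p;

	p = wcspbrk(s, delim);  /* first wide character of s found in delim */
	if (p == NULL)
		return NULL;    /* none found; s is left untouched */

	*p = L'\0';             /* overwrite the delimiter */
	return p + 1;           /* one past the overwritten character */
}
```

	Given wchar_t line[] = L"name:value", the call
	wcpsep(line, L":") leaves L"name" in line and returns a pointer
	to L"value".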