N3438
#embed synchronization

Published Proposal,

Authors:
Latest:
https://thephd.dev/_vendor/future_cxx/papers/C%20-%20embed%20Synchronization.html
Previous Revisions:
None
Paper Source:
GitHub ThePhD/future_cxx
Project:
ISO/IEC 9899 Programming Languages — C, ISO/IEC JTC1/SC22/WG14

Abstract

Synchronizes and cleans up both intended semantics and terminology for how preprocessor embed is meant to work in C, to match a similar proposal going through its final standardization in C++.

1. Changelog

1.1. Revision 0 - December 23rd, 2024

2. Introduction and Motivation

Over the last year, during WG21's standardization discussion of #embed for [P1967] and [P3540], several adjustments to the behavior of #embed in niche cases were requested. This paper synchronizes what WG21 is going to see with what is contained in the current C23/C Working Draft.

The requested synchronizations are as follows:

The wording below attempts to accomplish all of these things.

3. Wording

This wording is relative to C’s latest working draft.

📝 Editor’s Note: The ✨ characters are intentional. They represent stand-ins to be replaced by the editor.

3.1. Modify §6.10.4.1 to change the expansion behavior of macros

1 A resource is a source of data accessible from the translation environment. An embed parameter is a single preprocessor parameter in the embed parameter sequence. It has an implementation resource width, which is the implementation-defined size in bits of the located resource. It also has a resource width, which is either:

Constraints

2 An embed parameter sequence is a whitespace-delimited list of preprocessor parameters which can modify the result of the replacement for the #embed preprocessing directive. Let embed element width be either:

Let implementation resource count be (resource width) / (embed element width). Let resource count initially be (implementation resource count). Let resource offset initially be zero. The result of (resource width) % (embed element width) shall be zero.
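As a rough illustration of the arithmetic in this paragraph, the sketch below computes the implementation resource count from a resource width and an embed element width, both given in bits. The function name, its signature, and the use of `size_t` are assumptions made for illustration only; none of them appear in the wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the wording): models
 *   implementation resource count = (resource width) / (embed element width)
 * and reports a violation of the constraint
 *   (resource width) % (embed element width) == 0
 * by returning false instead of producing a count. */
static bool implementation_resource_count(size_t resource_width_bits,
                                          size_t element_width_bits,
                                          size_t *count_out)
{
    assert(count_out != NULL);
    if (element_width_bits == 0 ||
        resource_width_bits % element_width_bits != 0) {
        return false; /* a translator would diagnose this case */
    }
    *count_out = resource_width_bits / element_width_bits;
    return true;
}
```

Under these assumptions, a 40-bit resource with an 8-bit embed element width yields a count of 5, while a 41-bit resource is rejected because the remainder is not zero.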

...

📝 IMPORTANT Editor’s Note: Replace all instances of "implementation resource width" with simply "resource width".

📝 IMPORTANT Editor’s Note: Delete all of ❡5 and ❡6, as their constraints have been moved up to just after ❡1 and redundant text has been eliminated, while a new definition for "empty" is given in the semantics.

Semantics

5✨ A resource is considered empty in one of the following cases:

76✨ The expansion replacement of a #embed directive is a preprocessor token sequence in the form of a comma-delimited list of integer constant expressions, unless otherwise modified by embed parameters. formed from the list of integer constant expressions described later in this subclause. The group of tokens for each integer constant expression in the list is separated in the token sequence from the group of tokens for the previous integer constant expression in the list by a comma. The sequence neither begins nor ends in a comma. If the list of integer constant expressions is empty, the token sequence is empty. The directive is replaced by its expansion and, with the presence of certain embed parameters, additional or replacement token sequences. If the resource is empty, then the directive is not replaced by the comma-delimited list of integer constant expressions representing the resource. Otherwise, the resource offset indicates the first (resource offset) values (which would have been placed in the comma-delimited list had the resource offset been equivalent to zero) are discarded, ignored, and not part of the list. There shall be $max(0, min((resource\ count), (implementation\ resource\ count) - (resource\ offset)))$ integer constant expressions in the comma-delimited list, where $max$ and $min$ select the maximum and minimum value between 2 provided values, respectively. The value of each integer constant expression is determined in an implementation-defined manner, and is in the range from $0$ to $2^{embed\ element\ width} − 1$, inclusive.FOOTNOTE(For example, an embed element width of 8 will yield a range of values from 0 to 255, inclusive.) If:

then the contents of the initialized elements of the array are as-if the resource’s binary data represented by the resource offset and the resource count, as a file, is fread (7.23.8.1) into the array at translation time.
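The element-count formula and the discarding of the first (resource offset) values can be sketched in ordinary C. This is only a model under stated assumptions: the resource is already in memory, the embed element width is 8 bits, and the function names are hypothetical rather than anything the wording defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of
 *   max(0, min(resource count, implementation resource count - resource offset)).
 * size_t cannot go negative, so the max(0, ...) clamp is expressed as a
 * comparison performed before the subtraction. */
static size_t embedded_element_count(size_t resource_count,
                                     size_t implementation_resource_count,
                                     size_t resource_offset)
{
    if (resource_offset >= implementation_resource_count) {
        return 0; /* nothing remains after skipping the offset */
    }
    size_t remaining = implementation_resource_count - resource_offset;
    return resource_count < remaining ? resource_count : remaining;
}

/* Copies the values that would appear in the comma-delimited list into
 * `out`, skipping the first `resource_offset` elements. Assumes an 8-bit
 * embed element width, so one element is one unsigned char. */
static size_t model_embed_list(const unsigned char *resource,
                               size_t implementation_resource_count,
                               size_t resource_offset,
                               size_t resource_count,
                               unsigned char *out, size_t out_capacity)
{
    size_t n = embedded_element_count(resource_count,
                                      implementation_resource_count,
                                      resource_offset);
    assert(n <= out_capacity);
    if (n > 0) {
        memcpy(out, resource + resource_offset, n);
    }
    return n;
}
```

With a six-element resource, a resource offset of 2 and a resource count of 3 select the third through fifth elements. With no limit applied, the resource count keeps its initial value of 6 and the same offset leaves 4 elements.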

...

1110✨ Either form of the #embed directive shall process the preprocessor balanced token sequence of any embed parameter in the optional embed parameter sequence as in normal text, unless otherwise specified further in this subclause. specified previously behaves as specified later in this subclause. The values of the integer constant expressions in the expanded sequence are determined by an implementation-defined mapping of the resource’s data. Each integer constant expression’s value is in the range from $0$ to $2^{embed\ element\ width} − 1$, inclusive (footnote 207). If:

then the contents of the initialized elements of the array are as-if the resource’s binary data is fread (7.23.8.1) into the array at translation time.

1211✨ ... The preprocessing tokens after embed in the directive are processed just as in normal text. (Each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens.) The directive resulting after all replacements shall match one of the two previous forms. If the directive matches one of the two previous forms after the directive is processed as in normal text, any further processing as in normal text described for the two previous forms is not performed. The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single resource name preprocessing token is implementation-defined.

12✨ NOTE If the directive is processed as in normal text because it doesn’t match the first two forms but matches the third, processing as in normal text happens once and only once for the entire directive, including its parameters.
13✨ EXAMPLE If the directive matches one of the first two forms, then processing as in normal text only applies to the preprocessor balanced token sequence of any embed parameters. If the directive matches the third form, then processing as in normal text applies to the entire directive:
#define offset(ARG) limit(ARG)
#define prefix(ARG) suffix(ARG)
#define THE_ADDITION "teehee"
#define THE_RESOURCE ":3c"
#embed ":3c"        offset(2) prefix(THE_ADDITION)
#embed THE_RESOURCE offset(2) prefix(THE_ADDITION)

is equivalent to:

#embed ":3c" offset(2) prefix("teehee")
#embed ":3c" limit(2)  suffix("teehee")

3.2. Modify §6.10.4.1 Semantics, ❡12 (now ❡13) to add a new embed parameter

An embed parameter with a preprocessor parameter token that is one of the following is a standard embed parameter:

limit prefix suffix if_empty offset

3.3. Modify §6.10.4.2 "limit parameter"'s macro expansion rules in Semantics, ❡3 and ❡4

...

3 The standard embed parameter with a preprocessor parameter token limit denotes a balanced preprocessing token sequence that will be used to compute the resource width. whose integer constant expression becomes the new value for the resource’s resource count defined in 6.10.4.1. The integer constant expression is evaluated using the rules specified for conditional inclusion (6.10.2), but without doing any further processing as in normal text. Independently of any macro replacement done previously (e.g. when matching the form of #embed), the constant expression is evaluated after the balanced preprocessing token sequence is processed as in normal text, using the rules specified for conditional inclusion (6.10.2), with the exception that any defined macro expressions are not permitted.

4 The resource width is: 4✨ The resource count is set to:

3.4. Add a new section §6.10.4.3 "offset parameter"

6.10.4.3 offset parameter
Constraints

The offset standard embed parameter may appear zero times or one time in the embed parameter sequence. Its preprocessor argument clause shall be present and have the form:

( constant-expression )

and shall be an integer constant expression. The integer constant expression shall not evaluate to a value less than 0.

The token defined shall not appear within the preprocessor balanced token sequence.

Semantics

The integer constant expression is evaluated using the rules specified for conditional inclusion (6.10.2), but without doing any further processing as in normal text.

The offset standard embed parameter denotes a balanced preprocessing token sequence whose integer constant expression becomes the value of the resource’s resource offset as defined in 6.10.4.1. The integer constant expression is evaluated using the rules specified for conditional inclusion (6.10.2), but without doing any further processing as in normal text.
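Because the initialization semantics are described as-if by fread, a run-time analogue of an offset parameter combined with a limit parameter can be sketched with fseek followed by fread. This is an illustrative sketch only: the function name is hypothetical, an 8-bit embed element width is assumed, and a real implementation performs this work at translation time rather than through a FILE stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical run-time analogue of
 *     #embed "resource" offset(OFFSET) limit(LIMIT)
 * assuming an 8-bit embed element width. The first `offset` elements are
 * skipped, then at most `limit` elements are read into `out`. Returns the
 * number of elements actually read. */
static size_t read_like_embed(FILE *resource, long offset, size_t limit,
                              unsigned char *out, size_t out_capacity)
{
    assert(resource != NULL);
    assert(offset >= 0);            /* mirrors: shall not evaluate to less than 0 */
    assert(limit <= out_capacity);
    if (fseek(resource, offset, SEEK_SET) != 0) {
        return 0;
    }
    /* fread stops early at end-of-file, which matches the count being
     * bounded by the elements that remain after the offset. */
    return fread(out, 1, limit, resource);
}
```

For a six-byte resource, an offset of 2 with a limit of 3 reads the third through fifth bytes, and an offset of 4 with a larger limit reads only the two bytes that remain.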

References

Informative References

[P1967]
JeanHeyd Meneide; Shepherd (Shepherd's Oasis). P1967 - Preprocessor Embed. December 15th, 2024. URL: https://thephd.dev/_vendor/future_cxx/papers/d1967.html
[P3540]
JeanHeyd Meneide; Shepherd (Shepherd's Oasis). P3540 - embed offset parameter. December 15th, 2024. URL: https://thephd.dev/_vendor/future_cxx/papers/d3540.html