Proposal for C2y
WG14 3316

Title:               The `void`-_which-binds_, v2: typesafe parametric polymorphism
Author, affiliation: Alex Celeste, Perforce
Date:                2024-08-19
Proposal category:   New feature
Target audience:     Compiler/tooling developers, library developers, application developers

Abstract

C has a built-in mechanism for handling generically-typed data: void *. Unfortunately, this has a problem: it achieves genericity by simply discarding all type information!

We propose a mechanism to define native functions and data structures that are both polymorphic and strongly-typed, as commonly seen in functional programming languages or those that have adopted functional idioms.

The mechanism does not impose any runtime or ABI burden.

The `void`-which-binds, v2: typesafe parametric polymorphism

Reply-to:     Alex Celeste (aceleste@perforce.com)
Document No:  N3316
Revises:      N2853
Date:         2024-08-30

Summary of Changes

N3316

add discussion of function pointer usability issues, proposal for conversion
add qualifier-generic binding in response to Committee request
add discussion of container types
small fixes, typo changes, completely reworked normative wording
rebase proposed changes against latest draft

N2853

original proposal

Introduction

C can define polymorphic functions by using void *, but this comes at the cost of complete type erasure.

This means that while a function can be defined to operate on any type of data, it trades the ability to do so for the lost ability to statically check that the operations it wants to perform on that data are actually valid for its concrete type. It cannot guarantee that two operands to be compared originate from the same kind of data, and it cannot even guarantee that the comparison operator intends to work with them if they are!

In many other languages, expressing the necessary constraints is easy. In many languages influenced by ML, "type variables" can be used in the function signature, and in C++, templates can be used to parameterize a function over types, although templates are not parametric in quite the same way (templates are not polymorphic functions, but instead instantiate to monomorphic functions).

We propose a matched pair of two attributes which can appertain to a pointed-to void, which force the pointer to either bind to a concrete type (for operator implementations and container type definitions), or to bind consistently with other void pointers within the same toplevel function signature. Because these are attributes, they impose no runtime cost, and if they are removed from a correct program, ensure it will remain binary-identical to the type-safe version. This has the convenient impact that implementation conformance can be achieved by doing nothing.

Rationale

Other languages, especially those descending from or influenced by ML, have the notion of "type variables", by which function (and other) types can be parameterized: at the point of use, an operand type is substituted into the type of the function, and if the same type variable appears in two places in the function type, it has to have the same binding to a type deduced from the operands, enforcing type safety.

So in an ML-like or Haskell-like language, a function can have a type like this:

map :: forall a. ([a], (a -> a)) -> [a]

...which we can understand to take some sequence of elements of type a, a callback which transforms objects of type a to other objects also of type a, and returns another sequence which (we will assume) contains the results of each transformation. We can't call this function with a callback that operates on objects of a type not compatible with the objects in the input buffer.

Informatively, a similar C++ signature might be:

template <typename A>
auto map (Seq <A> const &, Function <A (A)> const &) -> Seq <A>;

Though it is important to remember that C++ templates are not functions; they merely instantiate into monomorphic functions, so this is not the same. Polymorphic functions in Haskell and ML have a truly polymorphic type for a single identity; the same function can be referenced, passed around, and still used polymorphically at a distance. This property is also true of C functions that use void * - they can be referenced by pointer and still applied to different argument types - and is the one we wish to preserve.

To express this idiomatically in C, we could use void *, and the function signature would probably look something like:

void map (void * in
        , void * out
        , void (* op) (void const *, void *)
        , void * (* step) (void *));

(either the step function, or a step size and some crafty casts to char *, is required to actually navigate the lists in the absence of size information about the elements)

Nothing prevents us from providing a mistyped op - or even a mistyped step: not even navigating a polymorphic sequence is statically assured to make sense!

Surprisingly, this is actually possible to enforce (with a lot of supporting verbosity) in pure C11. However, the technique is too verbose and fragile for real-world use and does not scale at all. Examples are provided at the end of an untyped program which can miscompile, a typechecked C11 program, and an equivalent typechecked program using the void-which-binds, showing the relative verbosity and the improvements in readabilty of the full feature.

Proposal

We propose to add four new attributes which can appertain to the type void, when it appears as the referenced type of a pointer type used as either a function parameter or a function return. These are:

[[bind_var (id)]]
[[bind_type (type)]]
[[qual_var (id)]]
[[qual_with (id)]]

where id is an identifier that introduces a type variable, and type is a concrete type name other than void (it does not otherwise need to be complete).

The names of these attributes are tentative.

Syntactically, because bind_var and bind_type appertain to the type void itself, they appear after it in a pointer type specifier. They may also appertain to a typedef of void. qual_var and qual_with also appertain to the pointed-to type, but are not limited to void.

(note that the syntax requires them to appear at the end of the specifier list, so if the type is qualified-void e.g. void const, they have to appear void const [[here]] and not void [[here]] const, which forces the use of "west const" const void [[here]] if the void and the attribute originate from the same macro)

Both of the bind attributes add constraints which restrict which pointer types the given parameter or return value may be converted to without an explicit cast. Effectively both of these attributes add virtual qualifiers that associate with the pointed-to type and therefore cannot be discarded except by explicit cast-off.

qual_with adds a similar constraint based on the qualifier deduced by qual_var, enforcing that the pointer type annotated with it must only be converted as if it had a deduced qualifier.

`bind_type`

The simpler attribute is bind_type. This adds the constraints that the pointer with the annotated referenced type can only implicitly convert to or from a (suitably qualified) pointer to type, or be assigned to another (suitably qualified) pointer to void with the same annotation.

For example:

void const [[bind_type (float)]] *
float_op_impl (void const [[bind_type (float)]] * p1, void [[bind_type (float)]] * p2)
{
  float const * fp1 = p1; // OK, bound type
  float * fp2 = p2; // OK
  
  int  const * ip1 = p1; // new constraint violation
  void const * vp1 = p1; // new constraint violation
  int  const * ip2 = p2; // new constraint violation
  void const * vp2 = p2; // new constraint violation
  
  fp2 = p1; // (existing constraint violation)
  
  ip2 = (int  *)p2; // OK, explicit cast
  vp2 = (void *)p2; // OK, explicit cast-off of bind_type
  
  return fp1; // OK, bound type
  return vp1; // new constraint violation
  return (float const *)ip1; // ...OK at least by these rules
  
  return p1; // OK, same bind_type
}

This attribute is not really intended to be used in polymorphic functions themselves, but rather in the operators passed to the polymorphic function, which are monomorphic in themselves but usually will require a signature that explicitly works with void * (because int * (*) (int *) and void * (*) (void *) are not convertible or guaranteed to have similar representations), in order to be passed to the actual polymorphic function (but see below for more on both of these points).

`bind_var`

This attribute is used on the parameters and return type of the actual polymorphic function. It is also appertained to the parameters and return type of any callback operators accepted by the polymorphic function, within its own signature.

bind_var adds the constraint that whatever the referenced types of the pointers that are converted to or from (possibly-qualified) void * in the argument types and returned value conversion, must be consistent between all void pointers annotated with the same id, within each instance of a call to that function.

In other words it introduces a type variable and all void * annotated with the same type variable must convert to or from the same (ignoring qualifiers) pointer type.

For example:

void [[bind_var (A)]] *
select_first (void [[bind_var (A)]] * a, void [[bind_var (A)]] * b, void [[bind_var (B)]] * c)
{
  return a; // OK, a must have bound to an A*
  return b; // OK, b must have bound to an A*
  return c; // new constraint violation, c may have bound to a different type
}

int * ip;
float * fp;

ip = select_first (ip, ip, fp); // OK, a and b are both A
fp = select_first (fp, fp, ip); // same

ip = select_first (ip, ip, ip); // OK, A and B can both be the same, it just doesn't know that

ip = select_first (fp, fp, ip); // new constraint violation, mismatched A between return/param
select_first (ip, fp, fp);      // new constraint violation, mismatched A between params 1 & 2

The two attributes combine: bind_type can bind its type variable to the type introduced by a bind_var annotation, to allow operator callbacks to be type-checked against data arguments to a higher-order polymorphic function:

void map (
    void [[bind_var (A)]] * in
  , void [[bind_var (A)]] * out
  , void (* op) (void const [[bind_type (A)]] *, void [[bind_type (A)]] *)
  , void [[bind_type (A)]] * (* step) (void [[bind_type (A)]] *)
);

The distinction is that bind_var deduces a value for the type variable from context, while bind_type enforces that the conversion is consistent when there is no input context (it is for conversions that will occur within the function body, from typed-void to typed-void, and can therefore be used in declarations local to the definition).

bind_type therefore works with untyped callbacks and enforces a type for their arguments within the function's scope only. If the callback had typed-void parameters already, bind_var could also appear here and deduce a type from the callback's existing bind_type attributes (which would still have to be consistent with the type deduced from in and out in this particular example).

In the above declaration of a polymorphic map (implementation is in the Examples section), all of the void * type components have been annotated as having to bind to the same concrete object type. Therefore:

int i1[10], i2[10];
float f1[10], f2[10];

void int_op (void const [[bind_type (int)]] *, void [[bind_type (int)]] *);
void [[bind_type (int)]] * int_step (void [[bind_type (int)]] *);

void float_op (void const [[bind_type (float)]] *, void [[bind_type (float)]] *);
void [[bind_type (float)]] * float_step (void [[bind_type (float)]] *);

map (i1, i2, int_op, int_step); // OK, all void * bind to int *
map (f1, f2, float_op, float_step); // OK, all void * bind to float *

map (i1, i2, float_op, float_step); // new constraint violation - operator types do not match data
map (i1, f2, int_op, float_step); // new constraint violation - operator and data types mismatch

Finally, if a polymorphic function is called from within another polymorphic function, the type variables bind to the type variables of the void * pointers passed directly to it, rather than to void:

// yikes!
void map2 (
    void [[bind_var (A)]] * in1
  , void [[bind_var (A)]] * out1
  , void (* op1) (void const [[bind_type (A)]] *, void [[bind_type (A)]] *)
  , void [[bind_type (A)]] * (* step1) (void [[bind_type (A)]] *)

  , void [[bind_var (B)]] * in2
  , void [[bind_var (A)]] * out2
  , void (* op2) (void const [[bind_type (B)]] *, void [[bind_type (B)]] *)
  , void [[bind_type (B)]] * (* step2) (void [[bind_type (B)]] *)
) {
  map (in1, out1, op1, step1); // OK, all void * bind to A *
  map (in2, out2, op2, step2); // OK, all void * bind to B *

  map (in1, out1, op2, step2); // new constraint violation - operator types do not match data
  map (in1, out2, op1, step2); // new constraint violation - operator and data types mismatch
}

Within the function body, void-pointers annotated by bind_var may only implicitly convert to other void-pointers with the same type variable. Conversions to other types, including pointers to void without the same type variable (inc. no type variable), require an explicit cast.

A void-pointer annotated by bind_type with a type variable that was introduced by bind_var has the same constraint (consistent with the constraint if it had named a concrete type).

`qual_var` and `qual_with`

qual_var is very similar to bind_var, in that it introduces a type-level variable that binds to part of the type that the appertained pointed-to type was implicitly converted from. However, instead of introducing a variable for the pointed-to object type, it introduces a variable for the qualification of the pointed-to object type.

Like bind_var, qual_var can bind to the qualifiers of the type named in a bind_type attribute, or to the qualifier variable named by another qual_var attribute if one polymorphic function calls another polymorphic function. Just like bind_var, if two qual_var attributes declare the same qualifier variable in the same scope, they must bind to exactly the same qualifiers on the pre-conversion type.

Unlike bind_var, which works with void as the object type and therefore has no other object information to work with, it is possible for a pointer to be declared as pointing to a qualified type and also to qual_var. This is actually necessary: if a pointer is declared as pointing to unqualified void, only pointers to unqualified object types would be able to convert to it, so the listed qualifiers on the pointed-to type actually describe the maximum qualification that qual_var can bind (i.e. it can bind to less qualification than whatever is declared using qualifier specifiers).

This same restriction doesn't exist in the return type (or whatever other "output" type is annotated by qual_with), which can (and generally should) be a pointer to an unqualified object type, and must either be immediately implicitly converted back to a pointer with at-least as much qualification of the pointed-to type annotated by the corresponding qual_var, or discarded (in practice functions using this are likely to be logically nodiscard).

This attribute implements a feature which is already in C23 but currently uses ad-hoc signatures, namely the qualifier-generic search functions. These provide the best examples:

char [[qual_with (Q)]] * strchr (const char [[qual_var (Q)]] * s, int c);

The strchr function is already defined to return a pointer with the same qualification as its argument (not as its formal parameter), and this signature is able to capture that: if the argument applied to s was char const *, Q binds const, and therefore the return type char Q * must be converted to char const *; or if the argument to s was char *, Q binds nothing and char Q * is allowed to convert to char * or a pointer to a more qualified char. However, Q can never bind to volatile because the parameter is not a pointer to a volatile-qualified type, so no such pointer can be used as an argument value.

As is demonstrated by the example, qual_var can be used with complete object types, unlike bind_var, because it captures a different part of the pointed-to type.

The relationship between qual_var and qual_with is symmetric to the relationship between bind_var and bind_type: qual_with does not deduce any qualifiers from the conversion, because its purpose is to ensure that appropriate qualification is added by the "output" conversion.

Casting to generic function types

The "operator" functions that make use of the bind_type attribute suffer from an unfortunate usability problem, which is that although the concrete type of the pointed-to object is known by the function definition, the pointers cannot be dereferenced because they still have an incomplete C language type: the attribute does not change the meaning of the annotated C code, so there is no ability added to use the pointers "conveniently", without converting the pointer type.

This is hardly insurmountable but makes the code much less elegant to read:

// compar functions for qsort:

int cmp_float_void (void const [[bind_type(float)]] * lv
                  , void const [[bind_type(float)]] * rv) {
    float const * lhs = lv;
    float const * rhs = rv; // this verbosity isn't great
    return *lhs < *rhs ? -1
         : *lhs > *rhs ? +1
         :                0;
}

int cmp_float_simple (float const * lhs, float const * rhs) {
    return *lhs < *rhs ? -1
         : *lhs > *rhs ? +1
         :                0;
}

Consequently we propose that it should be permitted to cast a pointer to any given function type, to a pointer to a function type with the same signature but with some or all of the argument types or the return type substituted for (equivalently-qualified) pointers to void, annotated by bind_type with the substituted type name, if they were pointers to complete object types in the original signature. This would allow:

qsort (base, n, sz, (Compar)cmp_float_simple);

even though cmp_float_simple doesn't have the same signature as the compar for qsort and is not a compatible function type.

We propose that this permission should only be made available to (possibly parenthesized/generically selected) identifiers that declare a function, and not to generalized function pointers. The reason for the restriction is that the conversion is not guaranteed to be a no-op on targets where data pointers can have different representations, and therefore function types that differ only in parameter and return pointer types can potentially have different ABIs. This primarily affects the embedded space. A cast of an identifier which designates a specific function can be implemented by a compiler-generated shim function wrapping that specific function in the new signature, but this cannot implement generalized (non-constant) function pointers.

(Compar)cmp_float_simple

// compiler inserts the static shim:
static int @shim$cmp_float_simple (void const [[bind_type(float)]] * lv
                                 , void const [[bind_type(float)]] * rv) {
    return cmp_float_simple (lv, rv);
}

(Compar)some_dynamic_pointer

// static shim is not possible

There is an existing requirement in 6.3.2.3 that when a pointer to a function is converted to a pointer to a function of another type and back again, the result compares equal to the original pointer. This is achievable with compiler-inserted shims, since the compiler (or at least, the linker) would be able to construct a list of the generated shim functions and their originals to check for in a back-conversion. This is not free, but on platforms where the conversion is not trivial anyway, is potentially an acceptable cost as users are less likely to be freely converting between function types in that situation. (on common desktop platforms where the shim is not needed anyway, the conversion back will always be a no-op too)

Library changes

We would propose that the bind_var and qual_var attributes apply immediately to the functions qsort and bsearch in <stdlib.h> and to memchr in <string.h>; and that the qual_var attribute apply immediately to the functions strchr, strpbrk, strrchr and strstr from <string.h>; and to the functions wcschr, wcspbrk, wcsrchr, wcsstr and wmemchr from <wchar.h>:

void [[qual_with (Q), bind_type (T)]] * bsearch (void const * key
    , void const [[qual_var (Q), bind_var (T)]] * base
    , size_t nmemb
    , size_t size
    , int (*compar)(void const [[bind_type (T)]] *, void const [[bind_type (T)]] *));

void qsort (void [[bind_var (T)]] * base
    , size_t nmemb
    , size_t size
    , int (*compar)(void const [[bind_type (T)]] *, void const [[bind_type (T)]] *));

void [[qual_with (Q)]] * memchr  (void const [[qual_var (Q)]] * s,  int c, size_t n);
char [[qual_with (Q)]] * strchr  (char const [[qual_var (Q)]] * s,  int c);
char [[qual_with (Q)]] * strpbrk (char const [[qual_var (Q)]] * s1, const char * s2);
char [[qual_with (Q)]] * strrchr (char const [[qual_var (Q)]] * s,  int c);
char [[qual_with (Q)]] * strstr  (char const [[qual_var (Q)]] * s1, const char * s2);

wchar_t [[qual_with (Q)]] * wcschr  (wchar_t const [[qual_var (Q)]] * s,  wchar_t c);
wchar_t [[qual_with (Q)]] * wcspbrk (wchar_t const [[qual_var (Q)]] * s1, const wchar_t *s2);
wchar_t [[qual_with (Q)]] * wcsrchr (wchar_t const [[qual_var (Q)]] * s,  wchar_t c);
wchar_t [[qual_with (Q)]] * wcsstr  (wchar_t const [[qual_var (Q)]] * s1, const wchar_t *s2);
wchar_t [[qual_with (Q)]] * wmemchr (wchar_t const [[qual_var (Q)]] * s,  wchar_t c, size_t n);

(i.e. none of the functions deduce object type or qualification from the return conversion, and neither bsearch nor qsort deduce object type from compar, even though doing so is possible, only from base)

Alternatives

The names and exact placement of the attributes can be subject to further discussion. In practice we expect that users would wrap them inside macros anyway, for instance:

#define Auto(T) void [[bind_var (T)]]
#define Void(T) void [[bind_type (T)]]

void map (
    Auto (A) * in
  , Auto (A) * out
  , void (* op) (const Void (A) *, Void (A) *)
  , Void (A) * (* step) (Void (A) *)
);

void int_op (const Void (int) *, Void (int) *);

(again note that the choice to deduce from both in and op, but not out or step, is a design decision and all four could be deduced from, or even just one of the four; wherever the type information exists)

Overloads and _Generic do not provide a suitable alternative to this feature. The ad-hoc polymorphism provided by _Generic operates on a fixed set of known types. The operator name that it can be used to construct is a second-class language feature that does not have a single address. To pass such an operator around, it must be deconstructed somehow and a non-generic overload selected.

This functionality is completely orthogonal; where an ad-hoc polymorphic "overloaded" function provides different suitable implementations for a number of different types, a parametrically polymorphic function is completely unaware of the concrete type of its arguments and always provides the exact same implementation regardless of it.

Templates are similarly orthogonal: they do not create a first-class polymorphic language feature (remaining polymorphic when passed by value), but instantiate into separate monomorphic functions. They can also specialize, which betrays the principle of always providing the exact same body implementation down to the instruction level.

Dynamic types

A different proposal targeting C2y suggests adding runtime-polymorphic types to the language. This feature can be used to implement essentially any algorithm that could be expressed using parametric polymorphism, but does not perform any type-checking. Essentially it expands the functionality of void * itself to allow whole object footprints to become untyped. In general we feel like these proposals are best suited to target different use cases - N3212 is better suited to use cases where the object type really is intended to be dynamic, such as relying on a dynamic object footprint, rather than functions where it is simply abstracted and can be checked for consistency against the other arguments. It also does not propose any way to improve type safety, so does not help with any case where the existing void * feature is already enough to implement an algorithm but is not enough to be sure it is correct. The sort examples from N3212 are an area which we feel this proposal handles better, while the atomic examples are a use case that this proposal cannot handle at all and benefit very well from its dynamic approach.

Impact

The implementation impact is lower than it seems.

Firstly, there is no ABI impact whatsoever from this feature. Because the constraints are applied by attributes, they do not change the representation type of the pointers to generic data, which remain void * at all times. There is no binary difference between a program that chooses to make use of this typechecking feature and one that does not.

This is different from C++, where a template function instantiates into multiple implementations with semantically separate identities and which all have different signature types. There remains only one polymorphic function, with one type, and whose type remains polymorphic even after it is referenced indirectly.

This follows directly from the principle that a correct program with standard attributes remains correct and has the exact same observable meaning if the attributes are removed.

Secondly, if polymorphic type checking proves too difficult for a smaller implementation, since these constraints are only introduced by attribute - the implementation can choose to be unable to diagnose violations of the constraints and will still be conforming, because it will always compile a correct program to exactly the same meaning as a higher-end compiler that does understand the attributes. Therefore, implementations are eased-in to needing to support polymorphism, and do not need to provide full checking in order to conform.

It is possible and entirely plausible that only analysis tools and not primary compilers would ever bother to implement the constraints for these attributes.

Thirdly, implementation experience with Helix QAC suggests that ML-style type inference with unification is "not too difficult" for a small team (one) to implement in a short amount of time. We found that C could easily be extended with back-propagating inferred pointer types, which are used to drive a number of different type-based (proprietary) analyses. (In QAC these properties are mostly inferred from usage rather than specified explicitly by user attributes.)

ML subset compilers are frequently implemented as student projects, so this is a widely understood feature in the broader compiler community.

Future directions

For container types

A request that immediately arose from the first version of this proposal was that bind_var should also be available for use in structures, to make container types slightly more resilient.

If bind_var and friends are adopted for use with function types, it therefore follows that the logical next line of development would be to define the attributes in order to allow syntax along the lines of the following:

[[bind_var (T)]] struct LinkedList {
    struct LinkedList [[bind_type (T)]] * next;
    void [[bind_type (T)]] * data;
};

Such a structure definition is the same type as a generic linked list that can hold elements of any type, but enforces that the type held in data is consistent with the declaration of the pointer to the head, and that the type of the node pointed to by next will similarly be consistent with the type of this node. This would enforce that the list only contains elements of a single element type without having to instantiate or metaprogram-up a specific LinkedList for the member type. This has the advantage that containers of distinct (but internally-consistent) element types can be handled by algorithm functions (map etc.) that are themselves fully polymorphic; if LinkedList needed to be instantiated into multiple struct types, the handling functions would themselves also need to have a distinct instance for each concrete type, ending up with something like C++ templates instead of a proper parametrically-polymorphic system with only one generic function and one generic container type.

Note that unlike polymorphic functions, which deduce bindings for the type variable from use, polymorphic types would need to have the type explicitly declared:

struct LinkedList [[bind_type (int)]] * head = malloc (sizeof (struct LinkedList));

Note it is not possible to write auto [[bind_type (int)]] * head = in the current version of the language because of syntax limitations on auto (the star is an implementation-defined extension). This could be a separate improvement that composes well.

We do not propose wording for this feature at this time, though overall the development should be simpler than the basis established for function types, and the extension seems reasonable.

For dynamic types

Although the feature described by N3212 generally fits better against other use cases, it has room to compose well with void-which-binds in situations where a function accepts more than one dynamically-typed argument. For instance:

void generic_from_to (_Type T
    , _Var(T) const * from
    , _Var(T)       * to) {
    T temporary;
    // work using the temporary, from, and to
}

could avoid having, and needing to check, a dependent type in the signature by using void *:

void generic_from_to (_Type T
    , void const [[bind_var (A)]] * from
    , void       [[bind_var (A)]] * to) {
    T temporary;  // can still do this bit
    // same work because `_Var(X) *` is defined to be a `void *` in ABI
}

Proposed wording

The proposed changes are based on the latest public draft of C2y, which is N3220. Bolded text is new text when inlined into an existing sentence.

New attributes

Modify 6.7.13 "Attributes":

Modify 6.7.13.2 to add the four new standard attribute names:

The identifier in a standard attribute shall be one of:
bind_type
bind_var
deprecated
fallthrough
maybe_unused
nodiscard
noreturn
_Noreturn
qual_var
qual_with
unsequenced
reproducible

(ED: not in alphabetical order?)

Note that 6.7.13.2 paragraph 3 and footnote 173 remain true for the new attributes without changes.

Add two new sections after 6.7.13.8:

6.7.13.9 The bind_var and bind_type attributes

Constraints

The bind_var attribute shall be applied to the referenced type of a pointer in the return or parameter types of a function declaration or function pointer.

bind_var shall have an argument clause, which shall have the form
( identifier )

The bind_type attribute shall be applied to the referenced type of a pointer in the return or parameter types of a function declaration or function pointer, or to the referenced type of a pointer type specified within the scope of a function that uses bind_var in its signature.

bind_type shall have an argument clause, which shall have the form
( type-name )
or
( identifier )
where identifier is an identifier introduced by a bind_var attribute appearing in the function signature.

The bind_var and bind_type attributes shall not be applied to any type other than possibly-qualified void. ^footnote)

footnote) they can therefore apply to aliases for void created by typedef.

Semantics

The bind_var and bind_type attributes indicate that a pointer declared as a possibly-qualified pointer to void is intended to be used with statically-checked types.

A pointer that specifies the bind_var attribute is a binding pointer. A binding pointer with an identifier A as its attribute argument is binding to A.

A pointer that specifies the bind_type attribute is a constrained pointer. A constrained pointer with an identifier A as its attribute argument is constrained to A. A constrained pointer with a type name T as its attribute argument is constrained to T.

Within the scope of its declaration, a binding pointer shall only be converted to a pointer binding to the same identifier, or to a pointer constrained to the same identifier, except by means of an explicit cast.

Within the scope of its declaration, a constrained pointer shall only be converted to a pointer binding to the same identifier, or to a pointer constrained to the same identifier or type name, or to a pointer to (possibly-qualified) void, except by means of an explicit cast.

For a function call expression, if a parameter of the called function, or the return type, is or derives from a binding pointer; the corresponding type ignoring qualification in the type of the argument, or the type that the return type is converted or assigned to, determines the actual type. If the actual type is a binding pointer or constrained pointer in the calling scope, the actual type is the identifier or type name it is binding or constrained to; otherwise, the actual type is the referenced type; or void if the corresponding type in the calling context is not a pointer.

For each binding pointer in the type of the called function that binds to the same identifier, the actual type shall be the same ^footnote).

footnote) two types that are compatible might not be the same type.

For each constrained pointer in the type of the called function that is constrained to an identifier, the actual type shall be the same as the actual type of all binding pointers in the type of the called function that bind to that identifier.

For each constrained pointer in the type of the called function that is constrained to a type name, the actual type shall be compatible with the constrained type, or void.

The __has_c_attribute conditional inclusion expression (6.10.2) shall return the value ######L when given bind_var or bind_type as the pp-tokens operand.

If an identifier has been bound by bind_var in the current scope, and is also visible in the current scope as a typedef name, the bind_type attribute interprets it as the identifier bound by bind_var.

NOTE there is no restriction against two binding pointers with distinct identifiers binding to or from the same type outside the function, but internally the function treats them as unrelated.

Recommended Practice

The bind_var attribute should be used in the parameters or return type of a polymorphic function to indicate that two separate generic pointers in its signature are intended to work on the same concrete type, in any given invocation.

This attribute allows usage of the polymorphic function to be type-checked against the concrete types of its arguments and destination, as well as the expected types to be operated on by any callback operators passed to it.

An implementation is encouraged to emit a diagnostic if the identifier introduced by bind_var has also been declared in any name space in the current scope when the attribute is encountered.

An implementation is encouraged to emit a diagnostic if two declarations of a function do not make consistent use of the bind_var and bind_type attributes.

The bind_type attribute should be used with a type name in the parameters or return type of a callback operator that is declared with a signature that uses pointers to void because it is intended to work with polymorphic higher-order functions (like qsort), but that is itself intended to operate on values with a concrete type.

The bind_type attribute used with an identifier should be used to indicate when the concrete type of the pointed-to object is expected to be the same as the concrete type pointed to by a binding pointer. This can for instance communicate that a return type is the same as a parameter type, but that it doesn't contribute to "deduction" and be discarded or cast to some other type.

EXAMPLE 1 This polymorphic function returns a reference to an element within the first buffer. It cannot correctly return a reference to an element within the second buffer:
int i1[10];
float f1[10];

void [[bind_var (A)]] * first (void [[bind_var (A)]] * p1
                             , void [[bind_var (B)]] * p2) {
    return p1; // correct
    // return p2; // constraint violation
}

int   * ip = first (i1, f1); // OK
float * fp = first (f1, i1); // OK
ip = first (i1, i1); // also OK, the function just doesn't know A and B are compatible here
fp = first (i1, f1); // constraint violation, A and A do not match
EXAMPLE 2 qsort is a polymorphic function that needs to accept a compare function that will work on its base array, so the following signature can communicate this:
void my_qsort (void [[bind_var (T)]] * base
             , size_t nmemb
             , size_t size
             , int (*compar)(void const [[bind_var (T)]] *
                           , void const [[bind_var (T)]] *));)

my_qsort (f1, 10, sizeof (float), float_compare); // OK
my_qsort (i1, 10, sizeof (int), float_compare); // constraint violation, A and A do not match
EXAMPLE 3 This callback is intended to be passed to qsort to sort an array of float values. Its operating parameter types are float pointers, but the signature of qsort requires them to be declared as pointers to void:
int float_compare (const void [[bind_type (float)]] * l
                 , const void [[bind_type (float)]] * r) {
    const float * fl = l;
    const float * fr = r;
    
    return *fl < *fr ? -1
         : *fl > *fr ? +1
         :             0;
}
EXAMPLE 4 This polymorphic function only intends to deduce the type A from its in parameter, and requires out to match its type exactly:
void move_bytes (size_t sz
    , void const [[bind_var (A)]]  * in
    , void       [[bind_type (A)]] * out);

int const a[10] = { ... };
int b[10];
move_bytes (sizeof a, &a, &b); // Note that the actual type of A is int,
                               // not int const, so these match
This can be helpful for communicating intent more clearly. This function declares that the returned pointer type has the same concrete object type as the parameter, without deducing the binding from it, which (for instance) allows it to be cast immediately to another type or converted to void *:
void [[bind_type (A)]] * do_step (size_t sz, void [[bind_var (A)]] * in);

int i[10];
void * p = do_step (sizeof (int), &i[0]);  // OK - the converted result
                                           // doesn't contribute to deduction so
                                           // the (nop) conversion to void* is OK
                                           // even though the actual type is int*

6.7.13.10 The qual_var and qual_with attributes

Constraints

The qual_var attribute shall be applied to the referenced type of a pointer in the parameter types of a function declaration or function pointer.

The qual_with attribute shall be applied to the referenced type of a pointer in the return or parameter types of a function declaration or function pointer.

The qual_var and qual_with attributes shall have an argument clause, which shall have the form
( identifier )

The qual_with attribute shall only appear in the declaration of a function that also contains a qual_var attribute with the same argument clause.

Semantics

The qual_with attribute applied to a pointed-to type indicates that it is intended to be treated as if it had the same qualification as the type that is converted to the parameter type with the qual_var attribute with a matching identifier, within the arguments for a given function call expression.

If the return type of the function contains the qual_with attribute, its return value shall not be used in a way that would discard any qualification existing before implicit conversion, from the type at the corresponding position in the type of the argument value assigned to a parameter containing the qual_var attribute with a matching identifier.

If two qual_var attributes appear in the function's parameter types, the qualification associated with the identifier is the combined qualification of the types at the corresponding positions in both argument types, before implicit conversion.

The __has_c_attribute conditional inclusion expression (6.10.2) shall return the value ######L when given qual_var or qual_with as the pp-tokens operand.

Recommended practice

The qual_var and qual_type arguments should be used to indicate that the result of a function call has the same qualification as the input(s) before implicit conversion. This is useful for functions that intended to return a pointer to an unqualified element if given an unqualified array, and to a qualified element if given a qualified array.

An implementation is encouraged to emit a diagnostic if the identifier introduced by qual_var has also been declared in any name space in the current scope when the attribute is encountered.

EXAMPLE 1 The canonical example is the search functions from the Standard Library, which may be used to search both const and non-const arrays:
// declared in string.h

// if the input pointer is to const, the result is to const
void [[qual_with (Q)]] * memchr  (void const [[qual_var (Q)]] * s,  int c, size_t n);
char [[qual_with (Q)]] * strchr  (char const [[qual_var (Q)]] * s,  int c);

char const a1[100] = ...
char       a2[100] = ...

char const * pcc;
char       * pcm;
pcc = strchr (a1, '6');  // valid: a1 is pointer to const, result goes to pointer to const
pcm = strchr (a2, '6');  // valid: a2 is unqualified, result goes to unqualified
pcc = strchr (a2, '6');  // valid: a2 is unqualified, result goes to pointer to const
pcm = strchr (a1, '6');  // diagnostic: a1 is pointer to const, result goes to unqualified
EXAMPLE 2 Using qual_var on a pointer to an unqualified type doesn't make sense:
char [[qual_with (Q)]] * get_elem (char [[qual_var (Q)]] *, size_t at);

char const a1[100] = ...
char       a2[100] = ...

char const * p1 = get_elem (a1, 6);  // error: can't pass a1 to this parameter!
char       * p2 = get_elem (a2, 10); // valid, but the attributes don't add anything
EXAMPLE 3 When two parameters use the same identifier, the qualification associated with the identifier by qual_with is combined from both parameters:
void [[qual_with (Q)]] * one_of (void const volatile [[qual_var (Q)]] * l
                               , void const volatile [[qual_var (Q)]] * r);
void const          * pcv;
void volatile       * pvv;
void const volatile * pcvv;

pcv  = one_of (pcv, pcv); // valid, only const qualified
pcv  = one_of (pcv, pvv); // diagnostic, lost volatile qualification
pvv  = one_of (pcv, pvv); // diagnostic, lost const qualification
pcvv = one_of (pcv, pvv); // valid

Permissive function casts

Modify 6.3.2.3, paragraph 8:

A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If the return type or **any parameter type of the converted type is a pointer to possibly-qualified void, and the ** corresponding return type or parameter type of the type before conversion is a pointer to an identically-qualified object type, and the expression being converted is an identifier declared as having function type ^footnote), using the converted pointer to call the underlying **function has the same effect as a call through a pointer of the original type. Otherwise, ** if a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.

footnote) as opposed to an identifier declared as having pointer to function type.

Library changes

Modify 7.24.5.1 "The bsearch generic function", changing the signature in the synopsis:

#include <stdlib.h>
void [[qual_with (Q), bind_type (T)]] * bsearch (void const * key
    , void const [[qual_var (Q), bind_var (T)]] * base
    , size_t nmemb
    , size_t size
    , int (*compar)(void const [[bind_type (T)]] *, void const [[bind_type (T)]] *));

Replace "the bsearch generic function" with "the bsearch function" in paragraph 2 and paragraph 4, and the clause title.

Leave paragraph 5 as-is for clarity.

Remove paragraph 6 "The external declaration ... is visible" and footnote 355.

Modify 7.24.5.2 "The qsort function", changing the signature in the synopsis:

#include <stdlib.h>
void qsort (void [[bind_var (T)]] * base
    , size_t nmemb
    , size_t size
    , int (*compar)(void const [[bind_type (T)]] *, void const [[bind_type (T)]] *));

Remove 7.26.5 "Search functions", paragraph 2 "The external declarations ...", paragraph 3, and footnote 365.

Modify 7.26.5.2 "The memchrgeneric function", changing the signature in the synopsis:

#include <string.h>
void [[qual_with (Q)]] * memchr  (void const [[qual_var (Q)]] * s,  int c, size_t n);

Replace "the memchr generic function" with "the memchr function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.26.5.3 "The strchrgeneric function", changing the signature in the synopsis:

#include <string.h>
char [[qual_with (Q)]] * strchr  (char const [[qual_var (Q)]] * s,  int c);

Replace "the strchr generic function" with "the strchr function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.26.5.5 "The strpbrkgeneric function", changing the signature in the synopsis:

#include <string.h>
char [[qual_with (Q)]] * strpbrk (char const [[qual_var (Q)]] * s1, const char * s2);

Replace "the strpbrk generic function" with "the strpbrk function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.26.5.6 "The strrchrgeneric function", changing the signature in the synopsis:

#include <string.h>
char [[qual_with (Q)]] * strrchr (char const [[qual_var (Q)]] * s,  int c);

Replace "the strrchr generic function" with "the strrchr function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.26.5.8 "The strstrgeneric function", changing the signature in the synopsis:

#include <string.h>
char [[qual_with (Q)]] * strstr  (char const [[qual_var (Q)]] * s1, const char * s2);

Replace "the strstr generic function" with "the strstr function" in paragraph 2 and paragraph 3, and the clause title.

Remove 7.31.4.5 "Wide string search functions", paragraph 2 "The external declarations ...", paragraph 3, and footnote 413.

Modify 7.31.4.5.2 "The wcschr generic function", changing the signature in the synopsis:

#include <wchar.h>
wchar_t [[qual_with (Q)]] * wcschr (wchar_t const [[qual_var (Q)]] * s,  wchar_t c);

Replace "the wcschr generic function" with "the wcschr function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.31.4.5.4 "The wcspbrk generic function", changing the signature in the synopsis:

#include <wchar.h>
wchar_t [[qual_with (Q)]] * wcspbrk (wchar_t const [[qual_var (Q)]] * s1, const wchar_t *s2);

Replace "the wcspbrk generic function" with "the wcspbrk function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.31.4.5.5 "The wcsrchr generic function", changing the signature in the synopsis:

#include <wchar.h>
wchar_t [[qual_with (Q)]] * wcsrchr (wchar_t const [[qual_var (Q)]] * s, wchar_t c);

Replace "the wcsrchr generic function" with "the wcsrchr function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.31.4.5.7 "The wcsstr generic function", changing the signature in the synopsis:

#include <wchar.h>
wchar_t [[qual_with (Q)]] * wcsstr (wchar_t const [[qual_var (Q)]] * s1, const wchar_t *s2);

Replace "the wcsstr generic function" with "the wcsstr function" in paragraph 2 and paragraph 3, and the clause title.

Modify 7.31.4.5.5 "The wmemchr generic function", changing the signature in the synopsis:

#include <wchar.h>
wchar_t [[qual_with (Q)]] * wmemchr (wchar_t const [[qual_var (Q)]] * s, wchar_t c, size_t n);

Replace "the wmemchr generic function" with "the wmemchr function" in paragraph 2 and paragraph 3, and the clause title.

Questions for WG14

Would WG14 like to add something along the lines of the bind_type, bind_var, qual_var and qual_with attributes specified in N3316 to C2y?

Would WG14 like to allow function designator expressions to be cast to more-generic function pointers, as specified in N3316, in C2y?

Would WG14 like to change the specification of the generic search and sort functions bsearch, qsort, memchr, strchr, strpbrk, strrchr, strstr, wcschr, wcspbrk, wcsrchr, wcsstr and wmemchr to include the bind_var and qual_var attributes instead of QVoid et al.?

Would WG14 like to see something along the lines of bind_var and bind_type for structure or container types?

References

C2y latest public draft
Parametric polymorphism
Towards type-checked polymorphism
Polymorphic Types

Examples

Example 1: untyped

// basic array mapper
// we can use it with mis-typed arguments

typedef void (* Mutate) (void *, void *);
typedef void * (* Step) (void *);

void addOneInt   (void * in, void * out) {   *(int *)out =   *(int *)in + 1;    }
void addOneFloat (void * in, void * out) { *(float *)out = *(float *)in + 1.0f; }

void * step_float (void * p) { return (float *)p + 1; }
void * step_int   (void * p) { return   (int *)p + 1; }

int ia[10];
float fa[10];

void map (void * array_in, void * array_out, int size, Mutate mut, Step step) {
    void * in = array_in;
    void * out = array_out;
    for (int i = 0; i < size; ++ i, in = step (in), out = step (out)) {
        mut (in, out);
    }
}

void incrArrays (void) {
  map (ia, ia, 10, addOneInt, step_int);
  map (fa, fa, 10, addOneFloat, step_float);

  // oh no: this also compiles, because of void*
  map (fa, fa, 10, addOneInt, step_float);
  map (ia, fa, 10, addOneInt, step_float);
  map (ia, fa, 10, addOneInt, step_float);
  map (ia, ia, 10, addOneInt, step_float);
  map (fa, fa, 10, addOneInt, step_int);

  map (ia, ia, 10, addOneFloat, step_int);
  map (ia, fa, 10, addOneFloat, step_int);
  map (ia, ia, 10, addOneFloat, step_int);
  map (ia, ia, 10, addOneFloat, step_int);
  map (ia, ia, 10, addOneFloat, step_float);
}

Example 2: semiautomatic typechecking with brittle verbosity (C11)

// basic array mapper
// enhanced with type checking despite accepting arrays of any type - checks
// that the operand kind of the mapped function matches the array element type
//  i.e. map :: ([T], T -> T) -> [T]

typedef void (* Mutate) (void *, void *);
typedef void * (* Step) (void *);

void addOneInt   (void * in, void * out) {   *(int *)out =   *(int *)in + 1;    }
void addOneFloat (void * in, void * out) { *(float *)out = *(float *)in + 1.0f; }

void * step_float (void * p) { return (float *)p + 1; }
void * step_int   (void * p) { return   (int *)p + 1; }

int ia[10];
float fa[10];

void map_impl (void * array_in, void * array_out, int size, Mutate mut, Step step) {
  void * in = array_in;
  void * out = array_out;
  for (int i = 0; i < size; ++ i, in = step (in), out = step (out)) {
    mut (in, out);
  }
}


#define FunctionDescriptor(Type, Func) union { Func func; Type T; }

#define same_type(A, B) _Generic(1 ? (A) : (B)  \
  , void *: 0                                   \
  , void const *: 0                             \
  , void volatile *: 0                          \
  , void const volatile *: 0                    \
  , default: 1)
#define check_same_type(A, B) _Static_assert (same_type (A, B), "types must match");

#define check_array_size

#define map(in, out, size, mut, step) do {                 \
  check_same_type ((in), (out));                           \
                                                           \
  check_same_type ((in), &(mut).T);                        \
  check_same_type ((in), &(step).T);                       \
                                                           \
  map_impl ((in), (out), (size), (mut).func, (step).func); \
} while (0)


typedef FunctionDescriptor (int, Mutate) MutInt;
typedef FunctionDescriptor (int, Step) StepInt;
typedef FunctionDescriptor (float, Mutate) MutFloat;
typedef FunctionDescriptor (float, Step) StepFloat;

MutInt addOneInt_g;
StepInt stepInt_g;

MutFloat addOneFloat_g;
StepFloat stepFloat_g;


void incrArrays (void) {
  map (ia, ia, 10, addOneInt_g, stepInt_g);
  map (fa, fa, 10, addOneFloat_g, stepFloat_g);

  // no longer compile!
  // map (fa, fa, 10, addOneInt, step_float);
  // map (ia, fa, 10, addOneInt, step_float);
  // map (ia, fa, 10, addOneInt, step_float);
  // map (ia, ia, 10, addOneInt, step_float);
  // map (fa, fa, 10, addOneInt, step_int);

  // map (ia, ia, 10, addOneFloat, step_int);
  // map (ia, fa, 10, addOneFloat, step_int);
  // map (ia, ia, 10, addOneFloat, step_int);
  // map (ia, ia, 10, addOneFloat, step_int);
  // map (ia, ia, 10, addOneFloat, step_float);
}

Example 3: polymorphic typechecking with `void`-which-binds

// basic array mapper
// enhanced with type checking despite accepting arrays of any type - checks
// that the operand kind of the mapped function matches the array element type
//  i.e. map :: ([T], T -> T) -> [T]

#define Auto(T) void [[bind_var (T)]]
#define Void(T) void [[bind_type (T)]]

#define Mutate(T, N) void (* N) (Void (T) *, Void (T) *)
#define Step(T, N) Void (T) * (* N) (Void (T) *)

void addOneInt   (Void (int)   * in, Void (int)   * out) {   *(int *)out =   *(int *)in + 1;    }
void addOneFloat (Void (float) * in, Void (float) * out) { *(float *)out = *(float *)in + 1.0f; }

Void (float) * step_float (Void (float) * p) { return (float *)p + 1; }
Void (int)   * step_int   (Void (int)   * p) { return   (int *)p + 1; }

int ia[10];
float fa[10];

void map (Auto (A) * array_in, Auto (A) * array_out, int size, Mutate(A, mut), Step (A, step)) {
    Void (A) * in = array_in;
    Void (A) * out = array_out;
    for (int i = 0; i < size; ++ i, in = step (in), out = step (out)) {
        mut (in, out);
    }
}

void incrArrays (void) {
  map (ia, ia, 10, addOneInt, step_int);
  map (fa, fa, 10, addOneFloat, step_float);

  // no longer compile!
  // map (fa, fa, 10, addOneInt, step_float);
  // map (ia, fa, 10, addOneInt, step_float);
  // map (ia, fa, 10, addOneInt, step_float);
  // map (ia, ia, 10, addOneInt, step_float);
  // map (fa, fa, 10, addOneInt, step_int);

  // map (ia, ia, 10, addOneFloat, step_int);
  // map (ia, fa, 10, addOneFloat, step_int);
  // map (ia, ia, 10, addOneFloat, step_int);
  // map (ia, ia, 10, addOneFloat, step_int);
  // map (ia, ia, 10, addOneFloat, step_float);
}

Abstract

The void-which-binds, v2: typesafe parametric polymorphism

Summary of Changes

N3316

N2853

Introduction

Rationale

Proposal

bind_type

bind_var

qual_var and qual_with

Casting to generic function types

Library changes

Alternatives

Dynamic types

Impact

Future directions

For container types

For dynamic types

Proposed wording

New attributes

Permissive function casts

Library changes

Questions for WG14

References

Examples

Example 1: untyped

Example 2: semiautomatic typechecking with brittle verbosity (C11)

Example 3: polymorphic typechecking with void-which-binds

The `void`-which-binds, v2: typesafe parametric polymorphism

`bind_type`

`bind_var`

`qual_var` and `qual_with`

Example 3: polymorphic typechecking with `void`-which-binds