Document #: P3394R0
Date: 2024-10-14
Project: Programming Language C++
Audience: EWG
Reply-to: Wyatt Childers <wcc@edg.com>, Dan Katz <dkatz85@bloomberg.net>, Barry Revzin <barry.revzin@gmail.com>, Daveed Vandevoorde <daveed@edg.com>
Ever since writing [P1240R0] (“Scalable Reflection in C++”), but more so since [P2996R0] (“Reflection for C++26”), we have been requested to add a capability to annotate declarations in a way that reflection can observe. For example, Jeremy Ong presented compelling arguments in a post to the SG7 reflector: https://lists.isocpp.org/sg7/2023/10/0450.php. Corentin Jabot also noticed the need while P1240 was evolving and wrote [P1887R0] (“Reflection on Attributes”), which proposes syntax not entirely unlike what we present here.
In early versions of P2996 (and P1240 before that), a workaround was to encode properties in the template arguments of alias template specializations:
    template <typename T, auto... Annotations>
    using Noted = T;

    struct C {
        Noted<int, 1> a;
        Noted<int*, some, thing> b;
    };
It was expected that something like type_of(^C::a) would produce a reflection of Noted<int, 1>, which could then be taken apart with metafunctions like template_arguments_of. This both preserves the type as desired (a is still an int) and allows reflection queries to get at the desired annotations (the 1, some, and thing in this case).
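The type-preserving half of this workaround can be checked in today's C++ without any reflection support. The following is a minimal C++17 sketch; the values some and thing are placeholders invented here, since the paper does not define them. It also illustrates that the alias is completely transparent to ordinary code, so the extra arguments are only recoverable if the implementation remembers which alias was written.

```cpp
#include <type_traits>

// The alias from the text: extra template arguments ride along but do not
// change the aliased type.
template <typename T, auto... Annotations>
using Noted = T;

// Placeholder annotation values (hypothetical; not defined in the paper).
inline constexpr int some = 10;
inline constexpr char thing = 't';

struct C {
    Noted<int, 1> a;
    Noted<int*, some, thing> b;
};

// The members have exactly the underlying types.
static_assert(std::is_same_v<decltype(C::a), int>);
static_assert(std::is_same_v<decltype(C::b), int*>);

// Different "annotations" still name the same type, so ordinary code
// cannot tell them apart; only reflection on the alias itself could.
constexpr bool annotations_are_invisible =
    std::is_same_v<Noted<int, 1>, Noted<int, 2>>;
```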
There are problems with this approach, unfortunately:
In this paper, we propose simple mechanisms that more directly support the ability to annotate C++ constructs.
We’ll start with a few motivating examples for the feature. We’ll describe the details of the feature in the subsequent section.
These examples are inspired by libraries in other programming languages that provide some mechanism to annotate declarations.
Rust’s clap library provides a way to add annotations to declarations to help drive how the parser is declared. We can now do the same:
    struct Args {
        [[=clap::Help("Name of the person to greet")]]
        [[=clap::Short, =clap::Long]]
        std::string name;

        [[=clap::Help("Number of times to greet")]]
        [[=clap::Short, =clap::Long]]
        int count = 1;
    };

    int main(int argc, char** argv) {
        Args args = clap::parse<Args>(argc, argv);

        for (int i = 0; i < args.count; ++i) {
            std::cout << "Hello " << args.name << '\n';
        }
    }
Here, we provide three types (Short, Long, and Help) which help define how these member variables are intended to be used on the command-line. This is implemented on top of Lyra.
When run:
    $ demo -h
    USAGE:
      demo [-?|-h|--help] [-n|--name <name>] [-c|--count <count>]

    Display usage information.

    OPTIONS, ARGUMENTS:
      -?, -h, --help
      -n, --name <name>       Name of the person to greet
      -c, --count <count>     Number of times to greet

    $ demo -n wg21 --count 3
    Hello wg21
    Hello wg21
    Hello wg21
While Short and Long can take explicit values, by default they use the first letter and whole name of the member that they annotate.
The core of the implementation is that parse<Args> loops over all the non-static data members of Args, then finds all the clap-related annotations and invokes them:
    template <typename Args>
    auto parse(int argc, char** argv) -> Args {
        Args args;
        auto cli = lyra::cli();

        // ...

        template for (constexpr info M : nonstatic_data_members_of(^^Args)) {
            std::string id(identifier_of(M));
            auto opt = lyra::opt(args.[:M:], id);

            template for (constexpr info A : annotations_of(M)) {
                if constexpr (parent_of(type_of(A)) == ^^clap) {
                    // for those annotations that are in the clap namespace
                    // invoke them on our option
                    extract<[:type_of(A):]>(A).apply_annotation(opt, id);
                }
            }

            cli.add_argument(opt);
        }

        // ...
    }
So, for instance, Short would be implemented like this:

    namespace clap {
        struct ShortArg {
            // optional isn't structural yet but let's pretend
            optional<char> value;

            constexpr auto operator()(char c) const -> ShortArg {
                return {.value=c};
            }

            auto apply_annotation(lyra::opt& opt, std::string_view id) const -> void {
                char first = value.value_or(id[0]);
                opt[std::string("-") + first];
            }
        };

        inline constexpr auto Short = ShortArg();
    }
Overall, a fairly concise implementation for an extremely user-friendly approach to command-line argument parsing.
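The defaulting rule for Short and Long (explicit value if given, otherwise the first letter or the whole member name) can be sketched in plain C++17, without Lyra or reflection. Everything here is an assumption made for illustration: the namespace clap_sketch, the member function flag_for, and the shape of LongArg, which the paper does not show and which is modelled by analogy with ShortArg.

```cpp
#include <optional>
#include <string_view>

namespace clap_sketch {
    // Mirrors the paper's ShortArg: an optional explicit flag character.
    struct ShortArg {
        std::optional<char> value;

        constexpr ShortArg operator()(char c) const { return ShortArg{c}; }

        // Explicit value if given, otherwise the first letter of the member name.
        constexpr char flag_for(std::string_view id) const {
            return value.value_or(id[0]);
        }
    };

    // Assumed shape for Long, by analogy with ShortArg.
    struct LongArg {
        std::optional<std::string_view> value;

        constexpr LongArg operator()(std::string_view s) const { return LongArg{s}; }

        // Explicit value if given, otherwise the whole member name.
        constexpr std::string_view flag_for(std::string_view id) const {
            return value.value_or(id);
        }
    };

    inline constexpr ShortArg Short{};
    inline constexpr LongArg Long{};
}
```

With these definitions, clap_sketch::Short applied to a member named name selects 'n', while clap_sketch::Short('x') selects 'x' regardless of the member name.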
The pytest framework comes with a decorator to parametrize test functions. We can now do the same:
    namespace N {
        [[=parametrize({
            Tuple{1, 1, 2},
            Tuple{1, 2, 3}
            })]]
        void test_sum(int x, int y, int z) {
            std::println("Called test_sum(x={}, y={}, z={})", x, y, z);
        }

        struct Fixture {
            Fixture() {
                std::println("setup fixture");
            }

            ~Fixture() {
                std::println("teardown fixture");
            }

            [[=parametrize({Tuple{1}, Tuple{2}})]]
            void test_one(int x) {
                std::println("test one({})", x);
            }

            void test_two() {
                std::println("test two");
            }
        };
    }

    int main() {
        invoke_all<^^N>();
    }
When run, this prints:
    Called test_sum(x=1, y=1, z=2)
    Called test_sum(x=1, y=2, z=3)
    setup fixture
    test one(1)
    teardown fixture
    setup fixture
    test one(2)
    teardown fixture
    setup fixture
    test two
    teardown fixture
Here, parametrize returns a value that is some specialization of Parametrize (which is basically an array of tuples, except that std::tuple isn’t structural so the implementation rolls its own).
The rest of the implementation looks for all the free functions whose names start with test_, and all the non-static member functions of class types whose names start with test_, and invokes each of them either once or once per parameter tuple, depending on the presence of the annotation. That looks like this:
    consteval auto parametrization_of(std::meta::info M) -> std::meta::info {
        for (auto a : annotations_of(M)) {
            auto t = type_of(a);
            if (has_template_arguments(t) and template_of(t) == ^^Parametrize) {
                return a;
            }
        }
        return std::meta::info();
    }

    template <std::meta::info M, class F>
    void invoke_single_test(F f) {
        constexpr auto A = parametrization_of(M);

        if constexpr (A != std::meta::info()) {
            // if we are parametrized, pull out that value
            // and for each tuple, invoke the function
            // this is basically calling std::apply on an array of tuples
            constexpr auto Params = extract<[:type_of(A):]>(A);
            for (auto P : Params) {
                P.apply(f);
            }
        } else {
            f();
        }
    }

    template <std::meta::info Namespace>
    void invoke_all() {
        template for (constexpr std::meta::info M : members_of(Namespace)) {
            if constexpr (is_function(M) and identifier_of(M).starts_with("test_")) {
                invoke_single_test<M>([:M:]);
            } else if constexpr (is_type(M)) {
                template for (constexpr std::meta::info F : nonstatic_member_functions_of(M)) {
                    if constexpr (identifier_of(F).starts_with("test_")) {
                        invoke_single_test<F>([&](auto... args){
                            typename [:M:] fixture;
                            fixture.[:F:](args...);
                        });
                    }
                }
            }
        }
    }
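The "std::apply on an array of tuples" step can be tried out in isolation with standard C++17. This is only a sketch of that one step, not the paper's implementation: Triple is a hypothetical fixed-arity stand-in for the hand-rolled Tuple, and invoke_parametrized, sum_cases, all_sums_hold, and count_calls are names invented for the illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the paper's Tuple: a plain aggregate of three
// values with an apply() member, matching the P.apply(f) call in the text.
template <class A, class B, class C>
struct Triple {
    A a; B b; C c;

    template <class F>
    constexpr void apply(F&& f) const { f(a, b, c); }
};
template <class A, class B, class C>
Triple(A, B, C) -> Triple<A, B, C>;

// Calls f once per parameter tuple and reports how many calls were made.
template <class T, std::size_t N, class F>
constexpr std::size_t invoke_parametrized(std::array<T, N> const& params, F f) {
    std::size_t calls = 0;
    for (auto const& p : params) {
        p.apply(f);
        ++calls;
    }
    return calls;
}

// The two parameter sets from the test_sum example.
inline constexpr std::array sum_cases{Triple{1, 1, 2}, Triple{1, 2, 3}};

// Checks x + y == z for every case.
constexpr bool all_sums_hold() {
    bool ok = true;
    invoke_parametrized(sum_cases, [&ok](int x, int y, int z) {
        ok = ok && (x + y == z);
    });
    return ok;
}

// Counts how many times the callable is invoked for sum_cases.
constexpr std::size_t count_calls() {
    return invoke_parametrized(sum_cases, [](int, int, int) {});
}
```

Because Triple has only public members of structural type, a value such as sum_cases could also be carried as a template argument in C++20, which is the property the annotation mechanism relies on.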
Rust’s serde library is a framework for serialization and deserialization. It is easy enough with reflection to do member-wise serialization. But how do you opt into that? An annotation provides a cheap mechanism of doing just that (built on top of Boost.JSON):
    struct [[=serde::derive]] Point {
        int x, y;
    };
Allowing:
    // prints {"x":1,"y":2}
    std::cout << boost::json::value_from(Point{.x=1, .y=2});
But opting in is just the first thing you might want to do with serialization. You might also, for instance, want to change how fields are serialized. serde provides a lot of attributes to do so. The easiest to look at is rename, which uses the provided string instead of the name of the non-static data member:
    struct [[=serde::derive]] Person {
        [[=serde::rename("first name")]] std::string first;
        [[=serde::rename("last name")]] std::string last;
    };
Which leads to:
    // prints {"first name":"Peter","last name":"Dimov"}
    std::cout << boost::json::value_from(Person{.first="Peter", .last="Dimov"});
The implementation for these pieces is fairly straightforward. We provide an opt-in for the value conversion function when the serde::derive annotation is present. In that case, we walk all the non-static data members and write them into the boost::json::value output. If a serde::rename annotation is present, we use its string instead of the data member’s name:
    namespace serde {
        inline constexpr struct{} derive{};
        struct rename { char const* field; };
    }

    namespace boost::json {
        template <class T>
            requires (has_annotation(^^T, serde::derive))
        void tag_invoke(value_from_tag const&, value& v, T const& t) {
            auto& obj = v.emplace_object();
            // M must be constexpr so it can be spliced and queried below
            template for (constexpr auto M : nonstatic_data_members_of(^^T)) {
                constexpr auto field = annotation_of<serde::rename>(M)
                    .transform([](serde::rename r){
                        return std::string_view(r.field);
                    })
                    .value_or(identifier_of(M));

                obj[field] = boost::json::value_from(t.[:M:]);
            }
        }
    }
You can imagine extending this out to support a wide variety of other serialization-specific attributes that shouldn’t otherwise affect the C++ usage of the type. For instance, a more complex approach additionally supports the skip_serializing_if annotation while first collecting all serde annotations into a struct.
The core idea is that an annotation is a compile-time value that can be associated with a construct to which attributes can appertain. Annotations and attributes are somewhat related ideas, and we therefore propose a syntax for the former that builds on the existing syntax for the latter.
At its simplest:
struct C { [[=1]] int a; };
Syntactically, an annotation is an attribute of the form = expr where expr is a constant-expression (which syntactically excludes, e.g., a comma expression) to which the glvalue-to-prvalue conversion has been applied if the expression wasn’t a prvalue to start with.
Currently, we require that an annotation has structural type because we’re going to return annotations through std::meta::info, and currently all reflection values must be structural.
Attributes are very close in spirit to annotations. So it made sense to piggy-back on the attribute syntax to add annotations. Existing attributes are designed with a fairly open grammar and they can be ignored by implementations, which makes it difficult to connect them to user code. Given declarations like:
    [[nodiscard, gnu::always_inline]]
    [[deprecated("don't use me")]]
    void f();
What could reflecting on f return? Because attributes are ignorable, an implementation might simply ignore them. Additionally, there is no particular value associated with any of these attributes that would be sensible to return. We’re limited to returning either a sequence of strings or, with P3294, token sequences.
But it turns out to be quite helpful to preserve the actual values without requiring libraries to do additional parsing work. Thus, we need to distinguish annotations (whose values we need to preserve and return back to the user) from attributes (whose values we do not). We therefore looked for a sigil that introduces a general expression.
Originally, the plus sign (+) was considered (as in P1887), but it is not ideal because a prefix + has a meaning for some expressions and not for others, and that would not carry over to the attribute notation. A prefix = was found to be reasonably meaningful in the sense that the annotation “equals” the value on the right, while also being syntactically unambiguous. We also discussed using the reflection operator (^) as an introducer (which is attractive because the annotation ultimately comes back to the programmer as a reflection value), but that raised questions about an annotation itself being a reflection value (which is not entirely improbable).
As such, this paper proposes annotations as distinct from attributes, introduced with a prefix =.
We propose the following set of library functions to work with annotations:
    namespace std::meta {
        consteval bool is_annotation(info);

        consteval vector<info> annotations_of(info item);             // (1)
        consteval vector<info> annotations_of(info item, info type);  // (2)

        template<class T>
        consteval optional<T> annotation_of(info item);

        template<class T>
        consteval bool has_annotation(info item);                     // (3)
        template<class T>
        consteval bool has_annotation(info item, T const& value);     // (4)

        consteval info annotate(info item, info value,
                                source_location loc = source_location::current());
    }
is_annotation checks whether a particular reflection represents an annotation.
We provide two overloads of annotations_of to retrieve all the annotations on a particular item:

(1) Returns all the annotations on item.
(2) Returns all the annotations a on item such that type_of(a) == type.

And a singular version, annotation_of<T>, that returns the annotation a such that dealias(type_of(a)) == ^^T. If no such annotation exists, it returns nullopt. If more than one such annotation exists and the values are not all template-argument-equivalent, this call is not a constant expression.
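And then two overloads of has_annotation that simply check whether a given annotation exists:

(3) Returns true if there exists an annotation a such that dealias(type_of(a)) == ^^T.
(4) Returns true if there exists an annotation a such that value_of(a) == reflect_value(value).

Of these, four (the binary annotations_of, annotation_of, and both has_annotation overloads) can be directly implemented in terms of the unary annotations_of(item), but we think they’ll be common enough to merit inclusion.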
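To illustrate how the convenience queries fall out of a single "all annotations" query, here is a rough C++17 model in which the annotations on one item are represented as a std::tuple of values rather than as std::meta::info reflections. The names Annotations, Derive, Rename, and on_field are assumptions made for the sketch, and the duplicate handling is deliberately simplified: std::get<T> rejects any repeated type at compile time, whereas the proposal only rejects repeated annotations whose values differ.

```cpp
#include <optional>
#include <tuple>
#include <type_traits>

// Stand-in for the result of annotations_of(item): one value per annotation.
template <class... As>
using Annotations = std::tuple<As...>;

// Models has_annotation<T>(item): is there any annotation of type T?
template <class T, class... As>
constexpr bool has_annotation(Annotations<As...> const&) {
    return (std::is_same_v<T, As> || ...);
}

// Models annotation_of<T>(item): nullopt when no annotation has type T,
// otherwise the single annotation of that type.
template <class T, class... As>
constexpr std::optional<T> annotation_of(Annotations<As...> const& all) {
    if constexpr ((std::is_same_v<T, As> || ...)) {
        return std::get<T>(all);
    } else {
        return std::nullopt;
    }
}

// Hypothetical annotation types and one annotated "item".
struct Derive {};
struct Rename { int id; };

inline constexpr Annotations<Derive, Rename> on_field{Derive{}, Rename{7}};
```

In this model, has_annotation<Derive>(on_field) is true, annotation_of<Rename>(on_field) holds the Rename value, and annotation_of<int>(on_field) is empty.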
And lastly, annotate provides the ability to programmatically add an annotation to a declaration.
Annotations can be repeated:
    [[=42, =42]] int x;
    static_assert(annotations_of(^^x).size() == 2);
Annotations spread over multiple declarations of the same entity accumulate:
    [[=42]] int f();
    [[=24]] int f();
    static_assert(annotations_of(^^f).size() == 2);
Annotations follow appertainance rules like attributes, but shall not appear in the attribute-specifier-seq of a type-specifier-seq or an empty-declaration:
    struct [[=0]] S {};  // Okay: Appertains to S.
    [[=42]] int f();     // Okay: Appertains to f().
    int f [[=0]] ();     // Ditto.
    int [[=24]] f();     // Error: Cannot appertain to int.
    [[=123]];            // Error: No applicable construct.
To avoid confusion, annotations are not permitted after an attribute-using-prefix. So this is an error:
[[using clang: amdgpu_waves_per_eu, =nick("weapon")]] int select_footgun(int);
Instead, use:
[[using clang: amdgpu_waves_per_eu]] [[=nick("weapon")]] int select_footgun(int);
The core language feature and the basic query functions have been implemented in the EDG front end and in Bloomberg’s P2996 Clang fork (with option -freflection-latest), both available on Compiler Explorer.
As evidenced in the motivating examples earlier, there is a lot of value in this proposal even in this simple form. However, there is more to consider when it comes to annotations.
This proposal right now lets us unconditionally add an annotation to a type:
struct [[=X]] Always;
But it does not let us conditionally add an annotation to a type:
template <class T> struct /* X only for some T */ Sometimes;
Nor does it let us really generalize annotations. For instance, the clap example earlier showed usage with clap::Short and clap::Long. What if somebody wants to compose these into their own annotation that attaches both clap::Short and clap::Long to a declaration?
More broadly, there is clear value in having an annotation be able to be invoked by the declaration itself. Doing so allows the two uses above easily enough. An interesting question, though, is whether this callback (syntax to be determined) is invoked at the beginning of the declaration or at the end of the declaration. For annotations on classes, this would be before the class is complete or after the class is complete. Before completeness allows the class to observe the annotation during instantiation. After completeness allows the annotation callback to observe properties of the type. In some sense, Herb Sutter’s [P0707R4] (“Metaclass functions: Generative C++”) was adding annotations on classes, invoked on class completeness, that allow mutation of the class.
One concrete, simpler example. We can, with this proposal as-is, create a Debug annotation that a user can add to their type and a specialization of std::formatter for all types that have a Debug annotation as follows:
    template <auto V> struct Derive { };
    template <auto V> inline constexpr Derive<V> derive;

    inline constexpr struct{} Debug;

    template <class T> requires (has_annotation(^^T, derive<Debug>))
    struct std::formatter<T> {
        // ...
    };

    struct [[=derive<Debug>]] Point {
        int x;
        int y;
    };

    int main() {
        auto p = Point{.x=1, .y=2};
        // prints p=Point{.x=1, .y=2}
        std::println("p={}", p);
    }
This works, but it’s not really the ideal way of doing it. This could still run into potential issues with ambiguous specialization of std::formatter. Better would be to allow the Debug annotation to, at the point of completion of Point, inject an explicit specialization of std::formatter. This would rely both on the ability for the annotation to be called back and language support for such injection (see [P3294R1] (“Code Injection with Token Sequences”)).
There are still open questions as to how to handle such callbacks. Does an annotation that gets called back merit different syntax from an annotation that doesn’t? Can it mutate the entity that it is attached to? How do we name the potential callbacks? Should the callback be registered implicitly (e.g., if an annotation of type X with member X::annotate_declaration(...) appears, that member is automatically a callback invoked when an entity is first declared with an annotation of type X) or explicitly (e.g., calling annotated_declaration_callback(^^X, X_handler) would cause X_handler(...) to be invoked when an entity is first declared with an annotation of type X).