<meta> should minimize standard library dependencies

Document #: P4329R0
Date: 2024-10-15
Project: Programming Language C++
Audience: LEWG
Reply-to: Jonathan Müller (think-cell)

1 Abstract

[P2996R7 - Reflection for C++26] requires library support in the form of the <meta> header. As specified, this library API requires other standard library facilities such as std::vector, std::string_view, and std::optional. We propose minimizing these standard library dependencies to enable more widespread adoption. In our testing, the proposed changes have only minimal impact on user code.

2 Background

[P2996R7] adds reflection library support in the form of the <meta> header. The functions introduced there are essentially wrappers around compiler built-ins and thus cannot be implemented by users: people who want to use reflection have to use <meta> or directly reach for the compiler built-ins. <meta> thus has the same status as <type_traits>, <coroutine>, <initializer_list>, and <source_location>. Yet, unlike those headers, which carefully avoid standard library dependencies, <meta> does not.

Right now, it is specified to include:

In addition, the interface requires:

This means every implementation of <meta> is required to include the specified headers, and provide the definitions for the specified types. In addition, the compiler needs to be able to construct and manipulate objects of various standard library types at compile-time.

3 Motivation

Reflection is primarily a language feature that requires some standard library APIs. However, the C++ language has more users than the C++ standard library; developers in some domains (e.g. gamedev) make heavy use of C++ but avoid using the standard library. It is not for us in the committee to say whether they are justified in their reasons, but we have a duty to represent the interests of all our users and not just the subset of users that have representation in WG21. If we can better support everyone at minimal cost, we should do so or risk seeming even more out of touch.

To that end, the <meta> header should minimize standard library dependencies. This has the following advantages:

We therefore propose that <meta> should minimize standard library dependencies. In this paper, we exhaustively enumerate every dependency it has and suggest an alternative. Note that we do not propose forcing implementations to avoid standard library dependencies internally. We merely make it possible for them to do so.

All wording in this paper is relative to [P2996R7].

4 Implementation experience

For the purposes of testing the impact on user code, some of the proposed changes have been implemented by modifying the <meta> header of [bloomberg-clang]. These changes are:

Poll 1.1 has obvious user impact and does not need to be investigated; polls 1.2, 1.3, and 4.1 are non-breaking changes that widen the interface contract; poll 3 is potentially a breaking change but requires compiler support to implement; poll 5 is an obvious breaking change that provides equivalent convenience and does not need to be investigated.

The modified <meta> header is available here: [prototype-impl]

We investigated the following examples from [P2996R7] by replacing the #include <experimental/meta> with an #include of the above implementation:

Similarly, we investigated the following examples of [daveed-keynote]:

We also manually investigated the reflection implementation of [daw_json_link], which requires an additional std::ranges::to<std::vector> in the implementation of a reflection function that has since been added to [P2996R7].

5 Baseline: Depending on <initializer_list>, <compare> and std::size_t is perfectly fine

Those facilities are other compiler interface headers that are perfectly fine to use. We do not propose removing those dependencies.

6 Poll 0: Remove the list of includes from the wording

The wording requires an include of <ranges>, but the interface only requires a couple of concepts. Yet, because it is specified in the wording, every implementation, even one like libc++ that partitions its headers to avoid unnecessary dependencies, has to include all of <ranges> and not just the smaller internal header that defines the necessary concepts.

Removing the requirement gives implementations more freedom, at the cost of users having to include the headers themselves when they need those features. Users who heavily use e.g. <ranges> will likely have it included already anyway, while users who don’t use <ranges> no longer have to pay the extra cost of the include. As noted in the motivation, these includes alone caused a 40% increase in compile-time for [lexy].

Requiring the include of <initializer_list> is not a big problem and is common for other headers.

6.1 User impact

Users that included <meta> now also need to ensure they have <ranges>, <vector>, or <string_view> included if they want to use declarations from those headers that aren’t already used in <meta>’s interface.

6.2 Implementation impact

None. An implementation can still include whatever headers it wants.

6.3 Wording

Modify the new subsection in 21 [meta] after 21.3 [type.traits]:

Header <meta> synopsis

#include <initializer_list>
-#include <ranges>
-#include <string_view>
-#include <vector>

namespace std::meta {

7 Range concepts

7.1 Poll 1.1: Make the reflection_range concept exposition-only

The wording defines a concept reflection_range that is used to constrain member functions. It is modeled by input ranges whose value and reference types are meta::info (references).

It is unlikely that users want to refine this concept further in a way that requires subsumption, so it is not necessary to expose it as a concept. At best, exposing the concept is a convenience for users writing generic code that also requires input ranges whose value and reference types are meta::info (references). However, that need is better served by a generic ranges::input_range_of<T> concept, as the problem is not specific to reflection.

Exposing the ad-hoc concept as-is would also freeze it in place, so even if we later had a ranges::input_range_of<T> concept, reflection_range would not subsume it. Leaving it exposition-only gives us more leeway.

7.2 User impact

Users that want to constrain a function on a range of std::meta::info objects need to write a similar concept themselves.

7.3 Implementation impact

None. An implementation presumably will still add the concept, just under a different name.

7.3.1 Wording

Modify the new subsection in 21 [meta] after 21.3 [type.traits]:

Header <meta> synopsis


  // [meta.reflection.substitute], reflection substitution
  template <class R>
-    concept reflection_range = see below;
+    concept reflection-range = see below; // exposition-only

-  template <reflection_range R = initializer_list<info>>
+  template <reflection-range R = initializer_list<info>>
    consteval bool can_substitute(info templ, R&& arguments);
-  template <reflection_range R = initializer_list<info>>
+  template <reflection-range R = initializer_list<info>>
    consteval info substitute(info templ, R&& arguments);

And likewise replace all other occurrences of reflection_range with reflection-range.

7.4 Poll 1.2: Change the reflection_range concept to match language semantics

As specified, reflection_range is a refinement of ranges::input_range, which imposes additional requirements on the iterator type, like the existence of a value_type and difference_type or std::iterator_traits specialization. Those requirements are not necessary for the range-based for-loop.

People who don’t care about the standard library range concepts still want to use reflection. They might thus have range types that don’t model any of the standard library range concepts but are still supported by the range-based for-loop. For an interface that is supposed to be low-level and close to the compiler, it makes sense to follow the language semantics rather than the standard library semantics.

7.5 User impact

Positive, functions accept strictly more types than before.

Before:

class MyRange
{
public:
    class iterator
    {
    public:
        using value_type      = std::meta::info;
        using difference_type = std::ptrdiff_t;

        iterator();

        std::meta::info operator*() const;
        iterator& operator++();
        void operator++(int);

        friend bool operator==(iterator, iterator);
    };

    iterator begin();
    iterator end();
};

auto result = substitute(info, MyRange{});

After:

class MyRange
{
public:
    class iterator
    {
    public:
        std::meta::info operator*() const;
        iterator& operator++();

        friend bool operator==(iterator, iterator);
    };

    iterator begin();
    iterator end();
};

auto result = substitute(info, MyRange{});

7.6 Implementation impact

Implementations need to write a custom concept instead of using ranges::input_range.

7.6.1 Wording

Modify [meta.reflection.substitute] Reflection substitution

template <class R>
concept reflection_range =
-  ranges::input_range<R> &&
-  same_as<ranges::range_value_t<R>, info> &&
-  same_as<remove_cvref_t<ranges::range_reference_t<R>>, info>;
+  see-below; // exposition-only

A type R models the exposition-only concept reflection-range if a range-based for loop statement [stmt.ranged] for (std::same_as<info> auto _ : r) {} is well-formed for an expression r of type R.

[Note: This requires checking whether the begin-expr and end-expr as defined in [stmt.ranged] are well-formed and that the resulting types support !=, ++, and * as needed in the loop transformation. — end note]

7.7 Poll 1.3: Replace ranges::input_range

Similarly, the define_static_array function is constrained by ranges::input_range. With the same logic as above, this should be generalized to require only the range-based for loop to work.

7.8 User impact

Positive, functions accept strictly more types than before (see above).

7.9 Implementation impact

Implementations need to write a custom concept instead of using ranges::input_range.

7.9.1 Wording

Modify [meta.reflection.define_static] Static array generation

-template<ranges::input_range R>
+template<typename R>
-    consteval span<const ranges::range_value_t<R>> define_static_array(R&& r);
+    consteval see-below define_static_array(R&& r);

And change the wording as follows:

4 Constraints: A range-based for loop statement [stmt.ranged] for (auto&& x : r) {} is well-formed for an expression r of type R.

[Note: This requires checking whether the begin-expr and end-expr as defined in [stmt.ranged] are well-formed and that the resulting types support !=, ++, and * as needed in the loop transformation. — end note]

5 Given for (auto&& x : r) {}, let D be the number of times the range-based for loop is iterated, T be the type decay_t<decltype(x)>, and S be a constexpr variable of array type with static storage duration, whose elements are of type const T, for which there exists some k ≥ 0 such that S[k + i] == r_i for all 0 ≤ i < D, where r_i is the value of x on the i-th iteration.

6 Returns: span<const T>(addressof(S[k]), D)

7 Implementations are encouraged to return the same object whenever the function is called with the same argument.

And update the synopsis accordingly.

8 Poll 2: Replace std::vector by new std::meta::info_array

Functions that return a range of meta::info objects do it by returning a std::vector<std::meta::info> (for implementation reasons, it has to be an owning container and cannot be something like a std::span). For the reasons discussed above, this is not ideal.

Instead, all functions that return a std::vector<std::meta::info> should instead return a new type std::meta::info_array. This is a type that can only be constructed by the implementation and has whatever internal layout is most appropriate for the implementation. Unlike std::vector, the proposed std::meta::info_array is not mutable and cannot grow in size. This simplifies the implementation further.

For the vast majority of calls, this change is not noticeable: All they do is iterate over the result, compose it with other views, or apply range algorithms. For those users that do rely on having a mutable, growable container, all they need to do is call std::ranges::to<std::vector>.

The proposed wording adds all members of std::array, except for fill (which doesn’t make sense) and reverse_iterator (would introduce more standard library dependencies). That way it also provides all functions provided by std::ranges::view_interface for a contiguous range (except for operator bool, which is weird for a container). Alternatively, the minimum interface could model std::initializer_list and only provide begin/end and size.

8.1 User impact

Users that want to append elements to, or remove elements from, a range of info objects need to use either views::concat/views::filter or std::ranges::to.

In our testing, this affected the command-line argument parsing example of Daveed’s keynote:

Before:

static consteval auto clap_annotations_of(info dm) {
    auto notes = annotations_of(dm);
    std::erase_if(notes, [](info ann) {
        return parent_of(type_of(ann)) != ^^clap;
    });
    return notes;
}

After (option 1):

static consteval auto clap_annotations_of(info dm) {
    auto notes = annotations_of(dm) | std::ranges::to<std::vector>();
    std::erase_if(notes, [](info ann) {
        return parent_of(type_of(ann)) != ^^clap;
    });
    return notes;
}

After (option 2):

static consteval auto clap_annotations_of(info dm) {
    return annotations_of(dm)
        | std::views::filter([](info ann) {
            return parent_of(type_of(ann)) == ^^clap;
        });
}

A similar change would be needed for daw_json_link. However, their use case is served by std::meta::get_public_nonstatic_data_members().

For completeness, we also needed to update the implementation of e.g. std::meta::subobjects_of which called .append_range() internally:

Before:

consteval auto subobjects_of(info r) -> vector<info> {
    if (is_namespace(r))
        throw "Namespaces cannot have subobjects";

    auto subobjects = bases_of(r);
    subobjects.append_range(nonstatic_data_members_of(r));
    return subobjects;
}

After:

consteval auto subobjects_of(info r) -> info_array {
    if (is_namespace(r))
        throw "Namespaces cannot have subobjects";

    auto subobjects = bases_of(r) | ranges::to<vector>();
    subobjects.append_range(nonstatic_data_members_of(r));
    return subobjects;
}

An implementation that does not use std::vector internally would not need to change anything.

Users that do not wish to modify the container and only iterate over it are not affected.

8.2 Implementation impact

An implementation that does not care about decoupling dependencies just needs to provide a wrapper class over std::vector. In our prototype implementation, this took 30 lines of code. A production-ready implementation can then also optimize std::ranges::to to avoid additional memory allocations and copies when the user wants a std::vector.

8.3 Wording

Modify the new subsection in 21 [meta] after 21.3 [type.traits]:

Header <meta> synopsis


namespace std::meta {
    using info = decltype(^^::);

+    // [meta.reflection.info_array], info array
+    class info_array;



-    consteval vector<info> template_arguments_of(info r);
+    consteval info_array template_arguments_of(info r);

    // [meta.reflection.member.queries], reflection member queries
-    consteval vector<info> members_of(info r);
+    consteval info_array members_of(info r);
-    consteval vector<info> bases_of(info type);
+    consteval info_array bases_of(info type);
-    consteval vector<info> static_data_members_of(info type);
+    consteval info_array static_data_members_of(info type);
-    consteval vector<info> nonstatic_data_members_of(info type);
+    consteval info_array nonstatic_data_members_of(info type);
-    consteval vector<info> enumerators_of(info type_enum);
+    consteval info_array enumerators_of(info type_enum);

-    consteval vector<info> get_public_members(info type);
+    consteval info_array get_public_members(info type);
-    consteval vector<info> get_public_static_data_members(info type);
+    consteval info_array get_public_static_data_members(info type);
-    consteval vector<info> get_public_nonstatic_data_members(info type);
+    consteval info_array get_public_nonstatic_data_members(info type);
-    consteval vector<info> get_public_bases(info type);
+    consteval info_array get_public_bases(info type);
}

Add a new section [meta.reflection.info_array] Info array:

class info_array {
public:
    using value_type             = info;
    using pointer                = const info*;
    using const_pointer          = pointer;
    using reference              = const info&;
    using const_reference        = reference;
    using size_type              = size_t;
    using difference_type        = ptrdiff_t;
    using iterator               = pointer;

    info_array() = delete;
    constexpr ~info_array();
    constexpr info_array(const info_array&);
    constexpr info_array(info_array&&) noexcept;
    constexpr info_array& operator=(const info_array&);
    constexpr info_array& operator=(info_array&&) noexcept;

    constexpr void swap(info_array&) noexcept;

    constexpr iterator begin() const noexcept;
    constexpr iterator end() const noexcept;

    constexpr iterator cbegin() const noexcept { return begin(); }
    constexpr iterator cend() const noexcept { return end(); }

    constexpr bool empty() const noexcept { return begin() == end(); }
    constexpr size_type size() const noexcept { return end() - begin(); }
    constexpr reference operator[](size_type i) const noexcept { return begin()[i]; }
    constexpr reference front() const noexcept { return begin()[0]; }
    constexpr reference back() const noexcept { return end()[-1]; }
    constexpr pointer data() const noexcept { return begin(); }
};

1 The type info_array is a non-mutable, non-resizable container of info objects.

etc. etc.

Update the other sections accordingly to use info_array instead of vector<info>.

9 Poll 3: Replace std::span by std::initializer_list

define_static_array returns a std::span to the static array that is being defined. To reduce dependencies, this could be std::initializer_list. This has the additional benefit that you can pass it to std::initializer_list constructors.

It is not clear to us how define_static_array would be used in the first place, since you cannot actually initialize an array with it. We could imagine changing the language to allow initialization of an array from a consteval std::initializer_list object, but allowing initialization from a std::span (and presumably arbitrary ranges) seems more involved.

9.1 User impact

Users that want to e.g. index into the result of define_static_array need to use .begin()[i] instead or write std::span(define_static_array(…)). Users can now use define_static_array to more easily initialize containers. Users that merely iterate over the result are not affected.

9.2 Implementation impact

The implementation needs to add a way to construct a std::initializer_list from a pointer plus size, or change the compiler API to return std::initializer_list directly.

9.3 Wording

Modify [meta.reflection.define_static] Static array generation and update the synopsis accordingly

-template<ranges::input_range R>
+template<typename R>
-    consteval span<const ranges::range_value_t<R>> define_static_array(R&& r);
+    consteval initializer_list<ranges::range_value_t<R>> define_static_array(R&& r);

4 Constraints: is_constructible_v<ranges::range_value_t<R>, ranges::range_reference_t<R>> is true.

5 Let D be ranges::distance(r) and S be a constexpr variable of array type with static storage duration, whose elements are of type const ranges::range_value_t<R>, for which there exists some k ≥ 0 such that S[k + i] == r[i] for all 0 ≤ i < D.

-6 Returns: span(addressof(S[k]), D)
+6 Returns: An initializer_list object il where il.begin() == S + k and il.end() == S + k + D.

7 Implementations are encouraged to return the same object whenever the function is called with the same argument.

10 Replace std::[u8]string_view

10.1 Poll 4.1 Replace std::[u8]string_view in argument positions by a generic argument

define_static_string takes a std::[u8]string_view. The dependency can be avoided and the function be made more general if it instead accepts any range of characters.

10.1.1 User impact

Positive, functions accept strictly more types than before.

10.1.2 Implementation impact

Minimal. An implementation that does not care about minimizing dependencies can just construct a std::string with std::ranges::to and pass a std::string_view of that to the already existing compiler API.

10.1.3 Wording (if poll 1.3 has no consensus)

Update [meta.reflection.define_static] Static array generation as follows.

-consteval const char* define_static_string(string_view str);
+template<ranges::input_range R> requires same_as<ranges::range_value_t<R>, char> && same_as<remove_cvref_t<ranges::range_reference_t<R>>, char>
+consteval const char* define_static_string(R&& r);
-consteval const char8_t* define_static_string(u8string_view str);
+template<ranges::input_range R> requires same_as<ranges::range_value_t<R>, char8_t> && same_as<remove_cvref_t<ranges::range_reference_t<R>>, char8_t>
+consteval const char8_t* define_static_string(R&& r);

1 Let str be ranges::to<string>(forward<R>(r)). Let S be a constexpr variable of array type with static storage duration, whose elements are of type const char or const char8_t respectively, for which there exists some k ≥ 0 such that:

2 Returns: &S[k]

3 Implementations are encouraged to return the same object whenever the same variant of these functions is called with the same argument.

And update the synopsis accordingly.

10.1.4 Wording (if poll 1.3 has consensus)

Update [meta.reflection.define_static] Static array generation as follows.

-consteval const char* define_static_string(string_view str);
-consteval const char8_t* define_static_string(u8string_view str);
+template<typename R>
+consteval const see-below* define_static_string(R&& r);

1 Constraints: A range-based for loop statement [stmt.ranged] for (std::same_as<char> auto x : r) {} or for (std::same_as<char8_t> auto x : r) {} is well-formed for an expression r of type R.

[Note: This requires checking whether the begin-expr and end-expr as defined in [stmt.ranged] are well-formed and that the resulting types support !=, ++, and * as needed in the loop transformation. — end note]

2 Given for (auto&& x : r) {}, let D be the number of times the range-based for loop is iterated, T be the type decay_t<decltype(x)>, and S be a constexpr variable of array type with static storage duration, whose elements are of type const T, for which there exists some k ≥ 0 such that S[k + i] == r_i for all 0 ≤ i < D, where r_i is the value of x on the i-th iteration, and S[k + D] == '\0'.

3 Returns: &S[k].

4 Implementations are encouraged to return the same object whenever the function is called with the same argument.

And update the synopsis accordingly.

10.2 Poll 4.2: Replace std::[u8]string_view as return type with const char[8_t]*

[u8]identifier_of and [u8]display_string_of return a std::[u8]string_view. In addition to the dependency problems, the result is not guaranteed to be a null-terminated string, and even if it were, getting a null-terminated string out of a std::string_view requires an awkward .data() call and a comment explaining why it is null-terminated. Both problems are solved by returning a const char[8_t]* instead, just like std::source_location::file_name() does, which is a very similar function.

The downside is that users who want one of the gazillion member functions have to manually create a std::string_view first. However, most calls probably forward the resulting identifier unchanged and are unaffected.

10.2.1 User impact

Users that require a std::[u8]string_view as opposed to a const char[8_t]* need to construct it manually. In our testing we have not found any.

Before:

auto name = identifier_of(info);
… name.find()

After:

std::string_view name = identifier_of(info);
… name.find()

Users that require a null-terminated string get one directly.

10.2.2 Implementation impact

None, the compiler API on clang for example already returns a const char[8_t]*.

10.2.3 Wording

Modify the new subsection in 21 [meta] after 21.3 [type.traits]:

Header <meta> synopsis


    // [meta.reflection.names], reflection names and locations
    consteval bool has_identifier(info r);

-    consteval string_view identifier_of(info r);
+    consteval const char* identifier_of(info r);
-    consteval u8string_view u8identifier_of(info r);
+    consteval const char8_t* u8identifier_of(info r);

-    consteval string_view display_string_of(info r);
+    consteval const char* display_string_of(info r);
-    consteval u8string_view u8display_string_of(info r);
+    consteval const char8_t* u8display_string_of(info r);

    consteval source_location source_location_of(info r);

And update [meta.reflection.names] accordingly.

11 Poll 5: Re-design data_member_spec

data_member_spec defines a new data member. It only takes one mandatory attribute, the type, and optional options in an object of type data_member_options_t. This aggregate type has multiple standard library dependencies:

  1. The members name, alignment and width are std::optional.
  2. The nested type name_type is specified to be constructible from anything a std::string or std::u8string can be constructed from. This requires at least knowledge of the constructors, although an implementation could do heroics to avoid pulling in the header.

Ignoring the dependencies, the proposed design has multiple other problems that could be fixed:

A design that instead provides multiple creation functions combined with setters has none of those problems.

11.1 Proposed Design

class data_member_spec_t {
public:
    consteval data_member_spec_t& name(/* depends on polls 4.1 and 1.3 */ name); // set the name
    consteval data_member_spec_t& no_unique_address(bool enable = true); // set no_unique_address

    consteval operator info() const; // build the data member specification
};

consteval data_member_spec_t data_member_spec(info type); // unnamed, unaligned member with no attributes
consteval data_member_spec_t data_member_spec_aligned(info type, int alignment); // unnamed, aligned member with no attributes
consteval data_member_spec_t data_member_spec_bitfield(info type, int width); // unnamed bitfield with no attributes

We provide named functions to create the three different cases of data members. They return a builder object that can be further modified with setters and implicitly converted to an info object.

An implementation of data_member_spec_t that guards against ABI breaks can just store a single info object that represents the data already given, and modifies the compiler internal representation when calling .name() and .no_unique_address().

This also requires changes to define_class so that it accepts a range of types convertible to info.
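The builder shape described above can be mocked up without compiler support. In the following sketch, member_description is a hypothetical plain struct standing in for info, an int type_id stands in for the reflected type, and constexpr is used instead of consteval so that the result can be inspected more easily. A real implementation would instead update a compiler-internal representation.

```cpp
#include <string_view>

// Hypothetical stand-in for the compiler's description of a data member.
struct member_description {
    int type_id = 0;
    std::string_view name{};
    int alignment = 0;               // 0 means "not specified"
    int width = 0;                   // 0 means "not a bit-field"
    bool no_unique_address = false;
};

class data_member_spec_t {
public:
    // Setters return *this so calls can be chained.
    constexpr data_member_spec_t& name(std::string_view n) {
        description_.name = n;
        return *this;
    }
    constexpr data_member_spec_t& no_unique_address(bool enable = true) {
        description_.no_unique_address = enable;
        return *this;
    }

    // Implicit conversion "builds" the specification.
    constexpr operator member_description() const { return description_; }

private:
    constexpr explicit data_member_spec_t(member_description d) : description_(d) {}
    friend constexpr data_member_spec_t data_member_spec(int type_id);
    friend constexpr data_member_spec_t data_member_spec_aligned(int type_id, int alignment);
    friend constexpr data_member_spec_t data_member_spec_bitfield(int type_id, int width);

    member_description description_;
};

// One named creation function per kind of data member.
constexpr data_member_spec_t data_member_spec(int type_id) {
    return data_member_spec_t(member_description{.type_id = type_id});
}
constexpr data_member_spec_t data_member_spec_aligned(int type_id, int alignment) {
    return data_member_spec_t(member_description{.type_id = type_id, .alignment = alignment});
}
constexpr data_member_spec_t data_member_spec_bitfield(int type_id, int width) {
    return data_member_spec_t(member_description{.type_id = type_id, .width = width});
}

constexpr member_description example =
    data_member_spec(7).name("foo").no_unique_address();
static_assert(example.type_id == 7 && example.name == "foo" && example.no_unique_address);
```

Separate creation functions mean an aligned bit-field cannot even be expressed, which an options aggregate would have to reject at evaluation time.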

11.2 User impact

The API changes dramatically.

Before:

define_class(^^storage, {data_member_spec(^^T, {.name = "foo", .no_unique_address = true})})

After:

define_class(^^storage, {data_member_spec(^^T).name("foo").no_unique_address()})

11.3 Implementation impact

An implementation that does not care about minimizing dependencies can implement data_member_spec_t in terms of the current data_member_options_t.

11.4 Wording

TBD

12 References

[bloomberg-clang] Bloomberg’s clang implementation of P2996.
https://github.com/bloomberg/clang-p2996
[daveed-keynote] Daveed Vandevoorde’s closing CppCon2024 keynote.
http://vandevoorde.com/CppCon2024.pdf
[daw_json_link] daw_json_link.
https://github.com/beached/daw_json_link/blob/85f5f3f3d15a27fa000733f758b157d2267a74c8/include/daw/json/daw_json_reflection.h
[lexy] lexy.
https://github.com/foonathan/lexy
[P2996R7] Barry Revzin, Wyatt Childers, Peter Dimov, Andrew Sutton, Faisal Vali, Daveed Vandevoorde, and Dan Katz. Reflection for C++26.
https://wg21.link/P2996R7
[prototype-impl] prototype implementation.
https://gist.github.com/foonathan/457bd0073cfde568e446eb4d42ec87fe