match
ExpressionDocument #: | P2688R2 |
Date: | 2024-09-17 |
Project: | Programming Language C++ |
Audience: |
Evolution |
Reply-to: |
Michael Park <mcypark@gmail.com> |
match
Expressionmatch
match
This paper fleshes out the version of pattern matching that was
described in [P2688R0]. It introduces a unified
match
expression that can perform a single pattern match using the syntax:
match pattern expression
as well as the selection of pattern matches using the following syntax:
match {
expression => expression-statement
pattern /* ... */
}
A single pattern match yields a boolean, and therefore can be used in
other contexts such as
if
,
while
,
for
, and
requires
.
let
is
used to introduce new names within patterns, and are always “bindings”
like the ones introduced by structured bindings in C++17 [P0144R2].
The set of patterns are trimmed down to match the following entities:
42
,
"hello"
,
compute_value()
)T*
,
optional
)pair
,
tuple
)variant
,
expected
,
any
,
exception_ptr
, polymorphic
types)The goal and motivation of this paper is to make further progress on pattern matching for C++.
At the Kona meeting in November 2022, the previous version of this paper [P2688R0] and [P2392R2] was discussed over whether patterns should be composed vs chained. EWG clearly expressed the desire for patterns to be composable (i.e. nested patterns):
Poll: “EWG prefers composition over chaining in pattern matching syntax.”
Result: SF: 13, F: 9, N: 2, A: 1, SA: 0
This paper presents (as did [P1371R3]) a design that continues to offer composable patterns.
At the EWG Telecon July 7, 2021, EWG clearly expressed the desire for
pattern matching to be available outside of
inspect
:
Poll: “Should we spend more time on patmat expressions outside of
inspect
(as proposed in P2392 or otherwise), knowing that time is limited and we already have put in a lot of effort towards another patmat proposal?”Result: SF: 11, F: 12, N:4, A: 2, SA: 0
This paper offers single pattern matching via expression match pattern
which is similar to the
is
-expression
from [P2392R2]. See
Additionally, it aims to address the following pieces of feedback:
“Declaration of new names should have an introducer like most other places in the language.”
New names need the
let
introducer to introduce bindings, just like other new names in most
other places in the language.
“We shouldn’t bifurcate expressions like this.”
That is, expressions are just expressions without needing anything
everywhere else in the language. This is true in this design. That is,
x
by itself is an expression
referring to an existing name like it does everywhere else.
“I don’t want the documentation of pattern matching to have to mention a caveat that
x
is a new name and therefore shadows an existing variable.”
As mentioned above, x
is an
expression that refers to an existing variable.
Another contribution of this paper is Static and Dynamic Conditions, which aim to more clearly specify and discuss the framework of requirements for patterns. They determine how uses of patterns are checked and/or tested at compile-time and/or runtime, within template contexts and outside.
Features such as predicates, extractors, structured bindings with
designators, static type matching by type or concepts, and pattern
combinators
and
and
or
are
proposed to be deferred as future extensions.
The following is a list of key goals of the paper:
match
expression with
let
bindings.The following are 4-way comparison tables between C++23, [P1371R3], [P2392R2], and this paper.
C++23
|
P1371R3
|
---|---|
|
|
P2392R2
|
This Paper
|
---|---|
|
|
C++23
|
P1371R3
|
---|---|
|
|
P2392R2
|
This Paper
|
---|---|
|
|
C++23
|
P1371R3
|
---|---|
|
|
P2392R2
|
This Paper
|
---|---|
|
|
C++23
|
P1371R3
|
---|---|
|
|
P2392R2
|
This Paper
|
---|---|
|
|
This example is matching the variant alternatives using concepts.
C++23
|
P1371R3
|
---|---|
|
|
P2392R2
|
This Paper
|
---|---|
|
|
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { int radius; };
struct Rectangle : Shape { int width, height; };
C++23
|
P1371R3
|
---|---|
|
|
P2392R2
|
This Paper
|
---|---|
|
|
struct Rgb { int r, g, b; };
struct Hsv { int h, s, v; };
using Color = variant<Rgb, Hsv>;
struct Quit {};
struct Move { int x, y; };
struct Write { string s; };
struct ChangeColor { Color c; };
using Command = variant<Quit, Move, Write, ChangeColor>;
= ChangeColor { Hsv { 0, 160, 255 } }; Command cmd
C++23
|
P1371R3
|
---|---|
|
|
P2392R2
|
This Paper
|
---|---|
|
|
Example from Destructuring Nested Structs and Enums section from Rust documentation.
The overall idea is to introduce a single
match
construct that can be used to perform single pattern match, or a
selection of pattern matches.
match {
expression => expression-statement
pattern // ...
}
let
denotes that an identifier is a new name rather than an existing
name.
constexpr int x = 42;
match {
expression => ... // match against existing `x`
x let x => ... // introduce new x.
}
On the right of
=>
is an
expression-statement rather than a statement. This
means that only expressions are allowed, and I believe it will be best
to pursue do
expressions [P2806R2] to do statement
things.
The following is used to match a value against a single pattern.
match pattern expression
The following is the
match
expression being used within an
if
statement.
if (expr match [0, let foo]) {
// `foo` is available here
} else {
// but not here
}
A optional guard can be added for a single pattern match as well:
::pair<int, int> fetch(int id);
std
bool is_acceptable(int id, int abs_limit) {
return fetch(id) match let [min, max] if -abs_limit <= min && max <= abs_limit;
}
// Single pattern match
match constexpr
opt pattern guardopt
expr-or-braced-init-list
// Selection pattern match
match constexpr
opt trailing-return-typeopt {
expr-or-braced-init-list => expr-or-braced-init-list ;
pattern guardopt => break ;
pattern guardopt => continue ;
pattern guardopt => return expr-or-braced-init-listopt ;
pattern guardopt => co_return expr-or-braced-init-listopt ;
pattern guardopt }
:
guardif expression
:
pattern
match-patternlet binding-pattern
let binding-pattern
match-pattern ... // only in structured bindings pattern (P1061)
:
match-pattern// wildcard
_ // value
constant-expression ( pattern ) // grouping
? pattern // pointer-like
: pattern // variant-like, polymorphic, etc.
discriminator [ pattern-0
, … , pattern-N
] // tuple-like
:
binding-pattern
identifier[ binding-pattern-0
, … , binding-pattern-N
]
... identifier // only in structured bindings pattern (P1061)
:
discriminator
type-id type-constraint
_
A wildcard pattern always matches any subject.
int v = 42;
match {
v => std::print("ignored");
_ // ^ wildcard pattern
};
This paper reattempts for _
to be
the wildcard pattern. See Wildcard
Pattern Syntax for further discussion.
let binding-pattern
A let pattern always matches any subject. The binding-pattern is either an identifier or a structured bindings pattern.
int v = 42;
match {
v let x => std::print("ignored");
// ^^^^^ let pattern
};
let
can
be used to introduce new names individually, or all-in-one.
let x // x is new
[a, let y] // a is old, y is new
[let x, b] // x is new, b is old
let [x, y] // x and y are both new
let [x, [y, z]] // x, y, z are all new
match-pattern let binding-pattern
A let
pattern can appear after a match-pattern to create bindings to
the value that was matched with match-pattern.
int i = 42;
match {
i 42 => // match 42
let x => // bind name
42 let x => // match 42 and bind name at the same time
};
::pair p = {0, 0};
stdmatch {
p [0, let y] => // match and bind a piece
let whole => // bind whole pair
[0, let y] let whole => // do both
};
constant-expression
A constant pattern tests the value of
subject
against the value
of the constant pattern. The constant pattern can be any
constant-expression
, such
as literals,
constexpr
variables, or values of an
enum
.
bool(subject == constant-expression);
( pattern )
A parenthesized pattern is used to group undelimited patterns.
subject match pattern
Example:
void f(const Shape* s) {
match {
s ? (Circle: let c) => // ...
? (Rectangle: let r) => // ...
=> // ...
_ };
}
::optional<int> maybe_int();
std
void f() {
() match {
maybe_int(? let i) let o => // i is int, o is the whole optional
=> // ...
_ };
}
? pattern
An optional pattern tests pointer-like objects. It matches if
subject
contextually
converts to
true
and
*subject
matches pattern
.
bool(subject) && *subject match pattern
type-id : pattern
type-constraint : pattern
An alternative pattern tests sum type objects such as
variant
,
any
, and polymorphic types.
Let s
be subject,
S
be std::remove_cvref_t<decltype(subject)>
.
Case 1: Variant-like
An alternative pattern matches if the
variant
-like object stores a value
of type type-id or the value of type satisfies
type-constraint, and the stored value matches
pattern.
If std::variant_size<S>
is well-formed and std::variant_size<S>::value
is an integral, let I
be the value
of s.index()
.
An alternative pattern matches if std::variant_alternative_t<I, S>
is type-id or if it satisfies type-constraint, and
pattern matches get<I>(s)
.
Case 2: Casts
If auto* p = cast<S>::operator()<type-id>(s)
is well-formed, alternative pattern matches if
p
contextually converts to
true
and
std::forward_like<decltype(s)>(*p)
matches pattern.
A cast
customization point is
proposed, rather than using
any_cast
. Since
any
has an implicit constructor from
anything, overloading any_cast
which
takes const any&
will likely cause a problem. Moreover, [P2927R0] is in the process of
introducing std::try_cast
.
template <typename>
struct cast;
template <>
struct cast<std::any> {
template <typename T>
static const T* operator()(const std::any& a) noexcept {
return std::any_cast<T>(&a);
}
template <typename T>
static T* operator()(std::any& a) noexcept {
return std::any_cast<T>(&a);
}
};
template <>
struct cast<std::exception_ptr> {
template <typename T>
static const T* operator()(const std::exception_ptr& p) noexcept {
return std::try_cast<T>(p); // P2927R0
}
};
Case 3: Polymorphic Types
This is listed as a separate case in case it’s needed for
optimization flexibility. In principle though, the following
specialization of cast
should
provide the desired semantics.
template <typename Base>
requires requires { std::is_polymorphic_v<T>; }
struct cast<Base> {
template <typename T>
static const T* operator()(const Base& b) noexcept {
return dynamic_cast<const T*>(&b);
}
template <typename T>
static T* operator()(Base& b) noexcept {
return dynamic_cast<T*>(&b);
}
};
[ pattern-
0
, … , pattern-N
]
Given the following structured binding declaration:
auto&& [ e-
0
, …, e-N
] = subject ;
Let e-i be a unique exposition-only identifier if
pattern-i is a pattern and an ellipsis
(...
) if
pattern-i is an ellipsis
(...
).
Structured bindings pattern matches subject if e-i
matches pattern-i for all i where e-i is an
identifier.
The scope of the bindings introduced by
let
are as
follows:
=>
, the
scope of the binding is the corresponding expression statement.expression match pattern guardopt
,
the scope of the binding is the expression including the optional guard,
unless:if
statement, the scope of the binding is the then substatement of
the if
statement.for
, or
while
statement, the scope of the binding is the substatement of
for
or
while
statement.Example:
bool b1 = e1 match [0, let x] if x > 1;
// x not available here.
bool b2 = e2 match [let x]; // not a redeclaration
// x not available here.
if (e3 match (? let elem)) {
// elem available here
} else {
// elem not available here
}
while (queue.next() match (? let elem)) {
// elem available here
}
Every pattern has a corresponding condition which is tested against the subject to determine whether the pattern matches the subject.
For example, the constant pattern
0
has a
condition that it matches if subject == 0
is true. However, there are static and dynamic dimensions to which this
condition can be applied. These dimensions are defined here.
Static conditions are the static requirements of a pattern. The patterns being introduced in this paper have dynamic behavior, and therefore their static conditions are the validity of a pattern’s match condition.
See Static Type Checking with Constraint Pattern for an example where this isn’t the case.
The main question is, are these static requirements checked or
tested? Going back to the constant pattern
0
, its
static condition is whether subject == 0
is a valid expression.
void f1(int x) {
match {
x 0 => // ...
=> // ...
_ };
}
In this example, whether x == 0
is a valid expression is checked at compile-time. If
x
is a
std::string
for example, the program is ill-formed.
void f2(std::string x) {
match {
x 0 => // ill-formed
=> // ...
_ };
}
This behavior is likely to be pretty obvious to folks. But what if
x
were a templated parameter
instead?
void f3(auto x) {
match {
x 0 => // fine here
=> // ...
_ };
}
("hello"s); // proposed: ill-formed f3
This paper proposes that this example be ill-formed at the
instantiation site. While a model that treats
0
as a
no-match would be doable, I believe it’ll be better and safer as an
opt-in feature. For f3<std::string>
to have different type-checking behavior than
f2
would be novel and likely lead to
subtle bugs.
This means that static conditions of patterns are always checked and
enforced at compile-time. See More
on Static Conditions for further design discussions, and Testing the
Static Conditions with match requires
which suggests an extension to explicitly treat the static conditions as
compile-time tests rather than checks.
The semantics for this was not precisely defined in [P1371R3], and [P2392R2] proposes for f3("hello"s)
to be well-formed and
0
is a
no-match.
Dynamic conditions are more obvious and straight-forward. The
constant pattern
0
matches if
subject == 0
is true. But true when?
This paper proposes that
match
tests
the dynamic condition at runtime, (think
if
) and
match constexpr
tests it at compile-time (think if constexpr
).
match
|
match constexpr
|
---|---|
|
|
Add to 5.10 [lex.name]/2, Table 4:
2 The identifiers in Table 4 have a special meaning when appearing in a certain context. When referred to in the grammar, these identifiers are used explicitly rather than using the identifier grammar production. Unless otherwise specified, any ambiguity as to whether a given identifier has a special meaning is resolved to interpret the token as a regular identifier.
final
import
module
override
match
let
Add to 5.12 [lex.operators]/1:
1 The lexical representation of C++ programs includes a number of preprocessing tokens that are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators:
preprocessing-op-or-punc: preprocessing-operator operator-or-punctuator preprocessing-operator: one of # ## %: %:%: operator-or-punctuator: one of { } [ ] ( ) <: :> <% %> ; : ... ? :: . .* -> ->* ~ ! + - * / % ^ & | = += -= *= /= %= ^= &= |= == != < > <= >= <=> && ||=> << >> <<= >>= ++ -- , and or xor not bitand bitor compl and_eq or_eq xor_eq not_eq
Change 7.6.5 [expr.mul]/1:
1 The multiplicative operators
*
,/
, and%
group left-to-right.multiplicative-expression:- pm-expression + match-expression multiplicative-expression * pm-expression multiplicative-expression / pm-expression multiplicative-expression % pm-expression
Add a new section after 7.6.4 [expr.mptr.oper]:
+ match-expression:
+ pm-expression
+ match-test-expression
+ match-select-expression
+
+ match-test-expression
+ match-subject match constexpropt match-test-guardopt
+ match-subject match constexpropt match-test-guardopt
+
+ match-subject:
+ pm-expression
+ braced-init-list
+
+ match-test-pattern
+ let-pattern
+ match-test-matching-pattern let-patternopt
+
+ match-test-matching-pattern:
+ _
+ pm-expression
+ ( match-case-pattern )
+ ? match-test-pattern
+ discriminator : match-test-pattern
+ [ match-case-pattern-list ]
+
+ let-pattern:
+ let binding-pattern
+
+ binding-pattern:
+ identifier
+ [ binding-pattern-list ]
+
+ binding-pattern-list:
+ binding-pattern
+ binding-pattern-list , binding-pattern
+
+ match-test-guard:
+ if pm-expression
+
+ match-select-expression
+ match-subject match constexpropt trailing-return-typeopt { match-case-seq }
+
+ match-case-seq:
+ match-case match-case-seqopt
+
+ match-case:
+ match-case-pattern match-case-guardopt => expr-or-braced-init-list ;
+ match-case-pattern match-case-guardopt => escape-statement ;
+
+ match-case-guard:
+ if assignment-expression
+
+ match-case-pattern:
+ let-pattern
+ match-case-matching-pattern let-patternopt
+
+ match-case-matching-pattern:
+ _
+ constant-expression
+ ( match-case-pattern )
+ ? match-case-pattern
+ discriminator : match-case-pattern
+ [ match-case-pattern-list ]
+
+ discriminator:
+ type-id
+ type-constraint
+
+ match-case-pattern-list:
+ match-case-pattern
+ match-case-pattern-list , match-case-pattern
1 Jump statements unconditionally transfer control.
jump-statement:- break ;
- continue ;
- return expr-or-braced-init-listopt ;
- coroutine-return-statement
+ escape-statement
goto identifier ;
+ escape-statement:
+ break ;
+ continue ;
+ return expr-or-braced-init-listopt ;
+ coroutine-return-statement
match
ExpressionThe match
expression presented in this paper unifies the syntax for a single
pattern match and a selection of pattern matches. Namely, expr match pattern
and expr match { ... }
.
The single pattern match expr match pattern
is very similar to expr is pattern
introduced in [P2392R2].
Early attempts at pattern matching with
inspect
also
explored the idea of being a statement and an expression depending on
its context. In short, if it appears in an expression-only context
(e.g. int x = inspect { ... };
)
then it’s an expression. If it appears in a context where a statement or
an expression can appear (e.g. { inspect { ... } }
),
then it’s interpreted as a statement.
Having to differentiate between the statement-form and
expression-form was a novel situation with no other precedent in the
language. Additionally, whatever the keyword, it would’ve needed to be a
full keyword. Maybe
inspect
would’ve been okay, but something like
match
was
not even a possibility.
With this approach,
match
is
feasible as a context-sensitive keyword, and and there is only an
expression-form, which simplifies the design.
This paper proposes _
as the
syntax for wildcard patterns. Note that this is different from bindings
that are introduced with the name
_
.
For example,
match {
e => // ...
_ // ^ this is a wildcard
let [_, _] => // ...
// ^ ^ these are bindings
};
In the bindings case, the semantics are the same as [P2169R4], which was accepted for C++26.
That is, a single declaration of _
is usable but a use after a redeclaration is ill-formed.
In the wildcard case, it is a special rule in that
_
can be an existing variable. For
example,
int i = 101;
int _ = 202;
match {
i => // 101 != 202 but _ is a wildcard, so this matches.
_ };
The recommended workaround is to use a guard:
int i = 101;
int _ = 202;
match {
i let x if x == _ => // ...
};
__
which was also the syntax
recommended in [P1110R0]._
as an identifier in the context of
structured bindings, but this was rejected by EWG. as it’s not referred
to, and was accepted for C++26._
as well.This is a relatively small cost to get
_
as the wildcard pattern, given the
prevalence and scope of adoption of
_
across the industry. Languages
such as Python, Rust, Scala, Swift, C#, Erlang, Prolog, Haskell, OCaml
and many others already use _
.
Pattern matching facilities across different languages do vary, but I’m
not aware of any language that disagree on
_
.
If expressions are not supported at all, this would mean we couldn’t
do some of the most simple operations that
switch
can
handle. We should be able to at the very least match integrals, strings,
and enums.
So we need to allow expressions at least in some capacity.
Let’s say for example we only allow literals. This would give us
matching for integral and string literals, but we wouldn’t be able to
match against
constexpr
variables of integrals and strings.
It also doesn’t get us enums, since enum values are not literals. We
need unqualified names to be able to access
enum
values,
and qualified names to be able to access enum class
values.
At this point, we already basically have primary-expression. The question of how to handle referring to existing names vs introducing new names have to be addressed. Only allowing primary-expression rather than constant-expression might still be useful or needed to avoid further grammar complications, but the fundamental issue of existing vs new names I don’t think could nor should be avoided.
The proposed syntax in this paper is
: pattern
type-id : pattern type-constraint
Here’s a simple example:
::variant<int, bool, std::string> parse(std::string_view);
std
(some_input) match {
parseint: let i => // ...
bool: let b => // ...
::string: let s => // ...
std};
This looks more like
case
labels
where the alternatives are listed and the appropriate one is chosen. The
correponding value is then matched with a nested pattern.
The absolute minimal syntax would be std::string s
,
which is rather appealing but ultimately not what is proposed.
An example using this syntax might be something like:
::variant<int, bool, std::string> parse(std::string_view);
std
(some_input) match {
parseint i => // ...
bool b => // ...
::string s => // ...
std};
Question 1: What are
i
,
b
, and
s
?
They certainly look like variable declarations, and I think it’ll be too surprising for them to be anything else. So let’s for now assume that they are variable declarations. In this case, they should probably used as a general way to introduce new names within a pattern for binding purposes. We want patterns to compose, so this applies to nested patterns as well, but at the top-level this might look like:
int parse_int(std::string_view);
(some_input) match {
parse_int0 => // ...
1 => // ...
auto i => // use `i` which is `int` returned by `parse_int`
// not 0 or 1
}
Question 2: How do you disambiguate
auto x
between variant
itself vs the
alternative inside?
std::variant
is a very unique sum type, in that you are able to handle the
“catch-all” case where you can generically access the value inside of
it.
C++23
|
Variable Declaration Approach
|
---|---|
|
|
But what is x
? Is it the
unhandled alternatives of the variant, or is it the variant itself?
In the parse_int
example from
above,
auto i
was a
binding to the whole value!
Note that for polymorphic types we could actually make this work since there’s no way to generically operate on the runtime value of a polymorphic type anyway.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { int radius; };
struct Rectangle : Shape { int width, height; };
const Shape& get_shape();
() match {
get_shapeconst Circle& c => // runtime downcast to `Circle`.
const auto& s => // `s` can't generically be `Triangle` or `Rectangle` anyway.
};
This is what C# does for example:
get_shape();
Shape
get_shape() switch {
=> // runtime downcast to `Circle`
Circle c var s => // `s` is the whole shape.
};
While this syntax would work for polymorphic types specifically,
there is a general desire to unify the handling of sum types like
variant
and polymorphic types. For
example, [P2411R0] points out:
The ‘is’-and-‘as’ notation [P2392] is cleaner and more general than the [P1371] and successors notation. For example, it eliminates the need to use the different notation styles for variant, optional, and any access. Uniform notation is the backbone of generic programming.
[P1371R3] already had uniform notation at
least for variant
,
any
and polymorphic types, but
regardless, the point is that using syntax that works only for
polymophic types but not variant
is
not desired.
Question 3: Initialization? Conversions? First-match? Best-match?
Going back to the first example:
::variant<int, bool, std::string> parse(std::string_view);
std
(some_input) match {
parseint i => // ...
bool b => // ...
::string s => // ...
std};
Are these variable declarations initialized by direct-initialization, copy-initialization, list-initialization, something else? Having to answer this question isn’t necessarily a blocker, but one needs to be chosen.
Regardless of the answer though, there’s no initialization form that
disallows conversions in general. If these have first-match semantics
(the only form of matching that has been proposed so far),
int i
would
match if the variant is in the
bool
state,
since all of these are valid:
int i1(true); // direct
int i2 = true; // copy
int i3{true}; // list
int i4 = {true}; // copy-list
On the other hand, best-match semantics would introduce significant complexity. Value-matching needs to consider becoming best-match, and this would likely mean evaluating more than necessary in order to compute a score to best-match with. If value-matching remained first-match, then we would have best-match semantics weaved info first-match semantics. This is likely very difficult for users.
Note that even with best-match semantics, allowing conversions makes code like this difficult to diagnose missing cases:
(some_input) match {
parseint i => // ...
::string s => // ...
std// maybe missing bool case? it is covered by `int` though...
};
Question 4: How do we match against an existing value?
Variable declaration syntax isn’t conducive to referring to an
existing value. Suppose there is a constexpr value
batch_size
that we want to match
against. int batch_size
wouldn’t work since that would be introducing a new variable.
batch_size
could be annotated
somehow, but annotating existing names rather than the new names has
already been attempted.
More generally, variable declaration syntax isn’t conducive to composition.
With this paper, the first example would be written as:
::variant<int, bool, std::string> parse(std::string_view);
std
(some_input) match {
parseint: let i => // ...
bool: let b => // ...
::string: let s => // ...
std};
i
,
b
, and
s
are bindings, introduced by
let
.auto x
between variant
itself vs the
alternative inside?With this paper,
let x
binds
the whole value, whereas auto: let x
binds to the value inside the variant. The following is an example of
let x
binding the whole value:
int parse_int(std::string_view);
(some_input) match {
parse_int0 => // ...
1 => // ...
let i => // use `i` which is `int` returned by `parse_int`
// not 0 or 1
}
The following is an example of auto: let x
where we bind the alternative inside the variant.
|
|
Initialization and conversions are dictated by the rules and principles of bindings as introduced by structured bindings.
The problem of first-match vs best-match is solved by requiring an exact-match for alternative types. With exact-match, first-match and best-match become equivalent.
Variable Declaration Approach
|
This Paper
|
---|---|
|
|
To be precise, the type to the left of the
:
is used to
match an alternative as declared. This is similar to
how std::get
works. For example:
void f(std::variant<const int, std::string> v) {
match {
v const int: let i => // `const int` is required here.
::string: let s => // ...
std};
}
We have a few variant-like facilities:
optional
,
expected
, and
variant
. Type-based alternative
matching for
std::variant
seems pretty obvious.
void f(std::variant<int, std::string> v) {
match {
v int: let i => // ...
::string: let s => // ...
std};
}
The int
and string
are the states that a
variant<int, std::string>
can be in, and facilities such as holds_alternative<int>
and get<int>
clearly provide type-based access to
variant
.
Of course, in general there’s more to it. The
variant
could be in a
valueless-by-exception state, or we can have std::variant<T, T>
.
Let’s table these for now.
The ? pattern
specifically supports the pointer-like usage pattern, so we can
write:
void f(int* p) {
match {
p ? let i => // ...
nullptr => // ...
};
}
optional
and
expected
are
“variant
-like” in that they have
“one-of” states. However, their interfaces are not
std::variant
-like
at all. They carry much more semantic implications. optional<T>
behaves more like T
than variant<std::nullopt_t, T>
would. expected<T, E>
behaves more like T
than
E
, and again, more like
T
than variant<T, E>
would. Their interfaces are also pointer-like rather than
std::variant
-like.
Given this, it seems natural enough to match on an
optional
like this:
void f(std::optional<int> o) {
match {
o ? let i => // ...
::nullopt => // ...
std};
}
A
std::variant
-like
approach would look like this:
void f(std::optional<int> o) {
match {
o int: let i => // ...
::nullopt_t: _ => // ...
std};
}
Here, if we changed std::optional<int>
to say, a std::optional<double>
the int: let i
pattern would be ill-formed, whereas the
?
would
continue to work. This is consistent with the usage of
optional
today:
void f(std::optional<int> o) {
// no mention of `int` in the below usage.
if (o) {
(*o);
use} else {
// ...
}
}
Open Question: For exhaustiveness checking purposes,
matching with
?
then
_
will always be sufficient. But
this means ?
will need to be matched first. For types like
T*
and
unique_ptr
, it should be possible to
say matching with
?
and
nullptr
is
exhaustive, and
nullptr
can
be matched first as well. For
optional
though, the null state is
std::nullopt
.
To use
nullptr
for
this seems wrong, given that
optional
design explicitly
introduced nullopt
over using
nullptr
itself. The solution in [P2392R2] is to introduce is void
,
but this seems problematic at least for expected<void, error>
where the question becomes ambiguous.
But expected<T, E>
gets more tricky. The “no value” case is not just some sentinel
type/value, but is some type E
retrieved by .error()
.
void f(std::expected<int, parse_error> e) {
match {
e ? let i => // ...
// How do we match and access `.error()` ?
};
}
So perhaps a variant
-like
approach would be better here:
void f(std::expected<int, parse_error> e) {
match {
e int: let i => // ...
: let err => // ...
parse_error};
}
This seems simple and clean enough. Similar to
variant
however, we can have expected<T, T>
.
Unlike variant
though, it actually
goes out of its way to store a std::unexpected<T>
as the error state to distinguish the two. It’s conceivable to use this
unexpected type to support expected<T, T>
:
void f(std::expected<int, int> e) {
match {
e int: let i => // ...
::unexpected<int>: let err => // distinguish
std};
}
But that would really hinder the by-far more common use cases:
void f(std::expected<int, parse_error> e) {
match {
e int: let i => // ...
::unexpected<parse_error>: let err => // yuck
std};
}
It was considered to allow matching std::expected<T, T>
with T
and std::unexpected<T>
while matching std::expected<T, E>
with T
and
E
. But it’s a bit weird for std::unexpected<E>
to then not work at all, and also weird for
err
in std::unexpected<E>: let err
to not be a binding to a std::unexpected<E>
,
but rather a binding to a E
. A
reference to the underlying std::unexpected<E>
is also not an interface that std::expected
exposes. Furthermore, this wouldn’t solve the problem of variant<T, T>
in a consistent manner. At best it’d be a special case for std::expected
.
Ideally, value
and
error
would be
names associated to the types
T
and
E
, such that they can be used even
when T
and
E
are the same, and are stable even
when T
and
E
changes.
This is essentially how the
Result
type in Rust is defined, as
well as many other languages that provide similar functionalities.
enum Result<T, E> {
Ok(T),
Err(E),
}
This is matched like this:
match parse(some_input) {
Ok(v) => // use `v`
Err(err) => // use `err`
}
A few approaches were considered to emulate this “name-based” dispatch.
enum class
with the desired names.enum class expected_state { value, error };
template <typename T, typename E>
class expected {
// ...
() const {
expected_state indexreturn has_value() ? expected_state::value : expected_state::error;
}
template <expected_state S>
auto&& get(this auto&& self) {
if constexpr (S == expected_state::value) {
return *std::forward<decltype(self)>(self);
} else if constexpr (S == expected_state::error) {
return std::forward<decltype(self)>(self).error();
} else {
static_assert(false);
}
}
};
template <typename T, typename E>
struct variant_size<expected<T, E>> : std::integral_constant<std::size_t, 2> {};
template <typename T, typename E>
struct variant_alternative<(std::size_t)expected_state::value, expected<T, E>> {
using type = T;
};
template <typename T, typename E>
struct variant_alternative<(std::size_t)expected_state::error, expected<T, E>> {
using type = E;
};
The usage would need to be something along the lines of:
::expected<int, parse_error> parse(std::string_view sv);
std
void f() {
(some_input) match {
parseusing enum std::expected_state;
: let v => // ...
value: let err => // ...
error};
}
While the introduction of std::expected_state
seems a bit odd on first glance, it actually doesn’t seem any more odd
than other related helper types such as std::in_place_t
,
std::unexpect_t
,
std::unexpected
,
etc.
We already have tag types, and they roughly correspond with the
various states. For example, std::expected
uses
std::in_place_t
and std::unexpect_t
.
void f(std::expected<int, parse_error> e) {
match {
e ::in_place_t: let v => // ...
std::unexpect_t: let err => // ...
std};
}
The names std::in_place_t
and std::unexpect_t
are terrible substitute for value
and error
. We’d be better off with
just using the types directly, and not fully supporting the std::expected<T, T>
case.
This idea would be to come up with a new variant-like protocol using
reflection. If a type let’s say were to advertise its alternatives
through std::vector<std::meta::info>
,
and we use those as the tags for dispatching…
template <typename T, typename E>
struct expected {
static consteval std::vector<std::meta::info> alternatives() {
return { ^value, ^error };
}
constexpr const T& value() const&;
constexpr const E& error() const& noexcept;
// other qualified versions...
};
With this, perhaps we could pull off something like:
void f(std::expected<int, parse_error> e) {
match {
e .value: let v => // ...
e.error: let err => // ...
e};
}
I think this is a very interesting direction for both tuple-like and variant-like protocols, but I haven’t been able to flesh out the details. See Reflection-based Tuple-like and Variant-like Customization Points.
In the end, the suggested path for now is:
T*
|
std::optional<T>
|
---|---|
|
|
std::expected<T, E>
|
std::variant<T, U>
|
---|---|
|
|
“Tuple-like” customization today involves specializing std::tuple_size
,
std::tuple_element
,
and implementing a get<I>
function. Section 2.3.6 “Cleaner form for structured
bindings’”tuple-like” customization” from [P2392R2] has a good summary of the
problem.
It also also says:
If we want to go further, then as Bjarne Stroustrup points out, the logical minimum is something like this, which can be viewed as a jump table (similar to a vtbl) – the most general form, ideally provided by the class author:
(EncapsulatedRect) { topLeft, width, height }; structure_map
and as Bjarne Stroustup points out in [P2411R0]:
The mapping from an encapsulating type to a set of values used by pattern matching must be simple and declarative. The use of
get<>()
for structured binding is an expert-only mess. Any code-based, as opposed to declarative, mapping will have such problems in use and complicate optimization. We can do much better.
Perhaps this problem can be tackled with reflection.
struct EcapsulatedRect {
static consteval std::vector<std::meta::info> elements() {
return { ^topLeft, ^width, ^height };
};
() const;
Point topLeftint width() const;
int height() const;
};
The advantage of this is that we can put data members as well as
member functions into elements
as
opaque reflections and apply them when needed.
Similarly, it seems feasible for there to be a reflection-based variant-like protocol as well.
template <typename... Ts>
struct variant {
static consteval std::vector<std::meta::info> alternatives() {
return { ^Ts... };
};
// ...
};
Note that for tuple-like protocol, even if we were to come up with
something better, I think we’ll still have to continue supporting the
current protocol. There are types written that opted into that protocol
that use structured bindings and
std::apply
and other things today.
Variant-like protocol is actually a different story. Unlike
tuple-like protocol, The variant helpers such as std::variant_size
,
std::variant_alternative
are solely used by
std::variant
.
std::visit
,
the only thing that might already be using a “variant-like protocol”
does not support
non-std::variant
s.
It does support types that directly inherit from
std::variant
[P2162R2], but they work by being
converted into
std::variant
beforehand.
As such, there’s a bigger opportunity for variant-like protocol to not bless the existing set of facilities but to come up with something better.
Update R2: This paper proposes to go with the existing facilities for now. While this direction is interesting, it is likely better to gain experience with reflection in C++26 before really coming up with a solution to supersede the existing facilities.
This is an elaboration of the discussion from Static Conditions. The question is: how are the requirements and validity of patterns handled? The proposed solution in this paper is for the static conditions to always be checked. For templates, this means the they are checked at instantiation.
Another approach is for some patterns to allow to be invalid if the subject is a dependent value. Since in this case, the pattern can be valid under some instantiations.
This can be made to work, and would certainly useful. As the default behavior however, it seems like it will likely cause subtle bugs.
Consider an example like this:
template <typename Operator>
void f(const Operator& op) {
.kind() match {
op'+' => // ...
'-' => // ...
'*' => // ...
"/" => // ...
=> throw UnknownOperator{};
_ };
}
Let’s say op.kind()
returns a
char
, but we
can’t be sure of that since op
is
templated. With the approach in this proposal, the typo of "/"
(should be
'/'
!)
will be detected as a compile-time error. In a model where a pattern can
be invalid because the subject is dependent, this will likely
be well-formed, fallthrough to the _
case, and throw an exception at runtime.
It’s true that this function should probably be better constrained
using concepts, but the reality is that this kind of code is extremely
prevalent today. Note that just using
if
, we would
have been provided this safety:
template <typename Operator>
void f(const Operator& op) {
if (op.kind() == '+') {
// ...
} else if (op.kind() == '-') {
// ...
} else if (op.kind() == '*') {
// ...
} else if (op.kind() == "/") { // error: comparison between pointer and integer
// ...
} else {
throw UnknownOperator{};
}
}
Testing
the Static Conditions with match requires
is described as a future extension where users can explicitly opt in to
relax this requirement on static conditions.
match
The proposed solution in this paper is for
match
to
have a precedence between pointer-to-member operators 7.6.4
[expr.mptr.oper]
and multiplicative operators 7.6.5
[expr.mul]. This is
consistent with the approach proposed for the
is
operator
in [P2392R2], and similar to the precedence
of C# switch
expression.
Input
|
Parsed
|
---|---|
|
|
Input
|
Parsed
|
---|---|
|
|
The main advantage of this approach is that the model is quite
simple. The main idea is
“match
binds
tighter than any binary operator except pointer-to-member.”
Pointer-to-member operators are excluded since many folks expect it to
bind tighter than they actually do.
There are a couple of other advantages worth mentioning. One is that this approach still reads as expected with parentheses. Consider the following example:
* (b + c) match {
a 0 => // ...
1 => // ...
=> // ...
_ }
Here, I’d argue that (b + c)
looks like the match subject, and with this precedence it is indeed the
match subject.
Another advantage is that (though perhaps silly) it’s typically work
to add parentheses around the subject rather than around the whole
match
.
For example, given:
+ b match {
a 0 => // ...
1 => // ...
=> // ...
_ }
Parenthesizing the subject is typically less work:
(a + b) match {
0 => // ...
1 => // ...
=> // ...
_ }
Compared to having to add them around the whole
match
like
this:
+ (b match {
a 0 => // ...
1 => // ...
=> // ...
_ })
The disadvantages of this approach is basically that if the desired
semantics are to match against the result of a + b
,
parentheses are required.
(a + b) match {
0 => // ...
1 => // ...
=> // ...
_ }
Similarly, another use case is to match against the result of
<=>
,
which also need parentheses:
(a <=> b) match {
::strong_ordering::equal => // ...
std::strong_ordering::less => // ...
std::strong_ordering::greater => // ...
std}
Another disadvantage is the deviation from
==
in the
expr match expr
form:
Input
|
Parsed
|
---|---|
|
|
Despite the disadvantages, this paper proposes the simplified model, and users are expected to “parenthesize binary expressions”.
This approach is to have the
match
precedence between the three-way comparison operator 7.6.8
[expr.spaceship]
and relational operators 7.6.9
[expr.rel].
The main idea here is that
<=>
and above,
e.g. *
,
+
, etc yield
interesting values to match, whereas
<
and
below,
e.g. ==
,
&
, etc
yield a boolean values which are typically less interesting.
This addresses some of the disadvantages mentioned in the previous section.
+ b match {
a 0 => // ...
1 => // ...
=> // ...
_ }
<=> b match {
a ::strong_ordering::equal => // ...
std::strong_ordering::less => // ...
std::strong_ordering::greater => // ...
std}
The disadvantage here is that the above examples may give a “false
sense of security”. If one wants to match on the result of say, a < b
,
parentheses are still required:
(a < b) match {
true => ...
false => ...
}
However, these use cases are likely to be less common than the above
cases. Furthermore, we do not want to go much lower than this. Examples
such as a == b && x match { /* ... */ }
are relatively common, and the desired parsing is (a == b) && (x match { /* ... */ })
.
Especially for the expr match expr
form, given a && b match c
it’s more likely we want a && (b match c)
.
As mentioned in the previous section, this approach could cause some confusion in the face of parenthesized expressions. For example,
* (b + c) match {
a // ...
}
This would parse as
(a * (b + c)) match {
// ...
}
which could be surprising.
Input
|
Parsed
|
---|---|
|
|
Input
|
Parsed
|
---|---|
|
|
While this makes
match
more
consistent with
==
, e.g. for
+
, but there
remain deviations.
Input
|
Parsed
|
---|---|
|
|
The following lists patterns and features excluded from this paper, but could still be useful future extensions.
A constraint pattern could be used to perform static type checks.
type-constraint
The static condition of a constraint pattern would be that decltype(subject)
satisfies the type-constraint.
For example,
void f(auto p) {
match {
p [std::convertible_to<int>, 0] => // statically check that first elem converts to int.
// ...
};
}
If used with structured bindings, this becomes very similar to the static type checking proposed in [P0480R1].
auto [std::same_as<std::string> a, std::same_as<int> b] = f();
The syntax changes would be:
match-pattern
// ...+ type-constraint
binding-pattern
// ...+ type-constraint identifier
match requires
The sections Static Conditions and
More on Static Conditions
described what static conditions are. They also described why by
default,
match
and
match constexpr
should both always check the static conditions.
match requires
(or some other spelling) would offer a way to test the static conditions
instead.
match requires
|
if constexpr (requires { ... })
|
---|---|
|
|
Using the constraint pattern from Static Type Checking with Constraint Pattern, we can perform a sequence of static type tests.
void f(auto x) {
match requires { // not proposed
x ::same_as<bool> => // ...
std::integral => // ...
std::same_as<std::string_view> => // ...
std::range => // ...
std};
}
Using the
let
pattern,
we can even bind names to each of these:
void f(auto x) {
match requires { // not proposed
x ::same_as<bool> let b => // ...
std::integral let i => // ...
std::same_as<std::string_view> let sv => // ...
std::range let r => // ...
std};
}
Another example with structured bindings patterns:
void f(auto x) {
match requires { // not proposed
x let [x] => // ...
let [x, y] => // ...
let [x, y, z] => // ...
let [...xs] => // ...
};
}
Rather than the static condition (matching size requirement) of
structured bindings pattern being checked, they are if constexpr
tested instead.
match
|
match constexpr
|
---|---|
|
|
match requires
(not proposed)
|
match requires constexpr
(not proposed)
|
---|---|
|
|
Pattern combinators provide a way to succinctly combine multiple patterns.
or ( pattern-
0
, … , pattern-N
)
and ( pattern-
0
, … , pattern-N
)
Example:
This Paper
|
With
or :
|
---|---|
|
|
|
|
This would extend structured bindings to allow designators
(i.e. .field_name
)
to match on that field.
match-pattern
// ...+ [ designator-0
: pattern-0
, … , designator-N
: pattern-N
]
binding-pattern
// ...+ [ designator-0
: binding-pattern-0
, … designator-N
: binding-pattern-N ]
Example:
return scope match {
: _ => Cxx::Scope::global_();
GlobalScope: [.fact: let f] => Cxx::Scope::namespace_(f);
NamespaceScope: [.fact: let f] => Cxx::Scope::recordWithAccess(f, access(acs));
ClassScope: [.fact: let f] => Cxx::Scope::local(f);
LocalScope// ^^^^^^^^^^^^^^
};
This would extend the alternative pattern to allow value-based discriminators.
discriminator:
type-id
type-constraint+ constant-expression
From Discussion on
Variant-like Types, the example of
enum
values
value
and
error
:
enum class expected_state { value, error };
::expected<int, parse_error> parse(std::string_view sv);
std
void f() {
(some_input) match {
parseusing enum std::expected_state;
: let v => // ...
value: let err => // ...
error};
}
variant<T, T>
|
expected<T, T>
|
---|---|
|
|
A Clang-based implementation is available at https://github.com/mpark/llvm-project/tree/p2688-pattern-matching
(with option -fpattern-matching
)
and available on Compiler Explorer
under x86-64 clang (pattern matching - P2688)
.
The following is a rough overview of the current status:
=>
is
added as a new tokenmatch
,
let
, and
_
added as context-sensitive
keywordspm-expression match pattern
pm-expression match { pattern => expr; ... }
constexpr
:
expr match constexpr { ... }
expr match -> int { ... }
pattern => expr;
,
pattern => jump-stmt;
,
pattern => {a, b};
pattern if condition => expr;
pair match let [x, y] if (x == y)
{ a, b } match pattern
[0, 1] let whole
)type-id: pattern
is implemented but type-constraint: pattern
is not yet.auto f(char c) { return c match { 'a' => 0; 'b' => 1; }; }
if (expr match [0, let x]) { /* x available here */ }
break
,
continue
,
etc), scope management, object lifetimes, etc. Similar concerns as
do
expressions: [P2806R2].type-id: pattern
,
currently it only implements polymorphic types using
dynamic_cast
under the hood.match
operatorBroadly speaking, in Clang, expressions are first parsed as a
cast-expression with
ParseCastExpression
, then
ParseRHSOfBinaryExpression
looks
ahead to see if there is an infix operator. If it finds an operator, it
folds a sequence of RHS into LHS based on the associativity and
precedence of the found operator.
The implementation first adds a
prec::Match
into the operator precedence table, then there are changes made to
ParseRHSOfBinaryExpression
to detect
the upcoming
match
token.
After we encounter a
match
token,
if we have a
constexpr
,
->
or a
{
token, we
have a match select expression expr match { ... }
,
otherwise we assume match test expression expr match pattern
.
In the case of a match test expression, care is taken such that if
pattern
turns out to be an
expression, we correctly parse it as pm-expression. This mostly
just falls out of correct placement of
prec::Match
in the operator precedence table.
For other patterns, the proposed solution is such that a leading
pattern token takes us into pattern parsing. For example,
_
is a the wildcard pattern, and it
is so even if there is a variable named
_
in scope. On the other hand,
*_
is an
expression that dereferences a variable
_
, because
*
takes us
into expression parsing. Finally, _ + 1
produces an error along the lines of "expected '=>' after wildcard pattern"
.
The leading _
takes us into pattern
parsing, even though _ + 1
could be a valid expression.
match {
expr => // wildcard pattern
_ *_ => // dereference
+ 1 => // error: expected '=>' after wildcard pattern
_ let => // error: expected identifier or '[' after 'let'
let x => // let pattern
let [x] => // SB pattern
}
Now, without parenthesized pattern, a
(
would take
us into expression parsing. It was considered to drop the parenthesized
pattern for this simplicity. However, it’s difficult to ignore that
there is virtually no language that provide pattern matching without
parenthesized patterns. Having them now I believe will also be better
down the road in evolving the set of available patterns.
It turns out, the difficulty of parsing parenthesized pattern turns
out is not much harder than the difficulty of parsing parenthesized
expressions. We already have parenthesized expressions of the form ( expression )
,
but this is a bit of a simplification as we also have cast expressions
such as (int)x
.
We already can’t just recurse into a
ParseExpression
upon seeing a
(
.
Even worse, the expression parsing today requires looking
past the
)
to
determine how to parse the expression. Given (T())
,
we don’t yet know whether the
T()
is a
function type or a constructor call. In fact it changes based on what
comes next.
(T()) * x // T() is a function type. Cast of *x to T()
(T()) / x // T() is a constructor call. Same as T() / x
This involves blindly storing everything until the
)
,
tentatively parsing what follows as a cast expression, then deciding
what the T()
means based on whether the cast expression parsing attempt succeeded or
not. * x
is
a cast expression, so
T()
is a
type, / x
is
not a cast expression, so
T()
is an
expression.
Equipped with that monstrosity, Clang basically tries to parse an
expression that starts with
(
such as
cast expression and fold expression, along with extensions such as
statement expressions and compound literals. If that fails, it proceeds
to parse ( expression )
.
For parenthesized patterns, the steps are similar:
(
is a
_
,
?
,
let
, or
[
, parse it
a a pattern.(
.( pattern )
.(x) + y => // ...
so that we can proceed to parse the
+ y
with a
parenthesized expression of (x)
.Thank you to all of the following folks