1. Credits
`std::arguments` was initially proposed by Izzy Muerte in [P1275]. Corentin Jabot and Aaron Ballman also proposed an interface for accessing command line arguments outside of `main` to WG14 in [N2948]. This paper borrows wording, design elements, and good ideas from both.
2. Introduction
This paper aims to solve three problems: encoding and portability problems with command line arguments, the lack of an interface for accessing arguments outside of `main`, and the lack of a modern interface for accessing arguments. It does so by introducing a `std::arguments` class that provides a modern and encoding-friendly interface for arguments.
Encoding: The only standard means of accessing command-line arguments in C++ is via `argc` and `argv` in `main`. This is a staple of C and C++; however, it's not well-suited for portable applications because the encoding of `argv` varies from system to system [[What is the encoding of argv?]]. On Windows, the native encoding is UTF-16, and it's recommended to use `wmain` instead of `main` for portable code. In order to support `main`, UTF-16 arguments must be converted using legacy Windows code pages. The preferred ways to handle command line arguments on Windows are platform-specific facilities such as `wmain`, `GetCommandLineW`, or `__wargv`. Even on Unix-based systems the encoding of `argv` is not always clear. Tackling this problem more or less necessitates an interface for accessing command line arguments independent of `main`, as adding a new signature to `main` has been rejected by the committee.
Access outside main: It may be desirable to access command line arguments outside of `main`, and even to do so before `main`. Examples include logging diagnostic information in a crash handler and some designs for a command line argument parser. Currently, command line arguments are only available inside of `main`, which requires a programmer to manually pass arguments throughout the program or create their own global storage for arguments. This can add clutter and introduce unnecessary complexity, especially if argument handling doesn't happen "close" to `main`. There is precedent from other languages for global access: Python, Go, Rust, Swift, Ruby, C#, Haskell, Ada, and many others provide an interface for accessing arguments from anywhere in a program. Additionally, many C++ frameworks make arguments available outside `main`, such as Qt with `QCoreApplication::arguments()`.
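To make the status quo concrete, the following is a minimal sketch of the kind of hand-rolled global storage described above. The `app` namespace and the `save_arguments`/`saved_arguments` names are hypothetical, chosen only for illustration; real programs vary, but they share the requirement that `main` cooperates by storing its parameters first.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical hand-rolled global storage, the kind of boilerplate programs
// write today to make arguments reachable outside main.
namespace app {

inline int g_argc = 0;
inline char** g_argv = nullptr;

// Must be called at the top of main; anything that runs earlier (or after a
// forgotten call) silently sees no arguments.
inline void save_arguments(int argc, char** argv) noexcept {
    assert(argc >= 0);
    g_argc = argc;
    g_argv = argv;
}

// Reachable from anywhere, e.g. a crash handler or a logging routine.
inline std::vector<std::string_view> saved_arguments() {
    if (g_argv == nullptr) return {};
    return std::vector<std::string_view>(g_argv, g_argv + g_argc);
}

}  // namespace app
```

A program would call `app::save_arguments(argc, argv)` as the first statement of `main`; static initializers and anything else that runs before that call still see no arguments.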
Modernity: Passing arrays via a pointer and a length argument is an antiquated pattern rendered obsolete by modern solutions such as `std::span`. `main` is the one case where separate pointer and length arguments are still a requirement if command line arguments are desired. A modern signature for `main` taking its arguments as a standard container or view of strings was previously rejected by the committee due to concerns surrounding complexity, overhead, and encoding issues [P0781]. On top of new functionality and increased portability, a facility such as `std::arguments` provides a modern C++ solution for accessing arguments. An important benefit of this interface is teachability: currently, using command line arguments requires introducing pointers relatively early on in education, along with exposure to footguns and confusion about the difference between C strings and C++ strings. This adds steepness to an already hazardously steep learning curve.
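One of the footguns alluded to above, sketched with hypothetical helper names: with `argv`-style `char*` arguments, `==` against a string literal compares addresses rather than characters, and compilers typically only warn about it.

```cpp
#include <cassert>
#include <string_view>

// Pitfall: this compares two pointers (the argument's address and the
// literal's address), not the characters they point to.
inline bool is_help_flag_naive(const char* arg) {
    return arg == "--help";
}

// Correct: wrapping the C string in a std::string_view makes == compare contents.
inline bool is_help_flag(const char* arg) {
    assert(arg != nullptr);  // precondition: a valid C string
    return std::string_view(arg) == "--help";
}
```

Since a real argument never lives at the same address as the literal, the naive version returns `false` even when the user passed `--help`.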
3. Previous Straw Polls and Discussion
Early polling surrounding an alternative to `argc`/`argv` and a means of accessing arguments outside of `main` occurred during discussion of [P0781]:

POLL: A trivial library solution for iterating parameters?

| SF | F | N | A | SA |
|----|---|---|---|----|
| 2 | 12 | 14 | 2 | 1 |

POLL: A non-`main`-based way of fetching command line arguments?

| SF | F | N | A | SA |
|----|---|---|---|----|
| 7 | 9 | 9 | 1 | 2 |

Polls on [P1275] by LEWGI:

POLL: We should promise more committee time to the `std::arguments` part.

Unanimous consent. Attendance: 11

POLL: `std::arguments` should be available before `main`. Attendance: 11

| SF | F | N | A | SA |
|----|---|---|---|----|
| 6 | 0 | 3 | 1 | 0 |

Polls on [P1275] by SG16:

POLL: `std::arguments` and `std::environment` should follow the precedent set by `std::filesystem::path`. Attendance: 14

| SF | F | N | A | SA |
|----|---|---|---|----|
| 4 | 6 | 1 | 0 | 2 |

POLL: `std::arguments` and `std::environment` should return a bag-o-bytes and conversion is up to the user. Attendance: 14

| SF | F | N | A | SA |
|----|---|---|---|----|
| 3 | 4 | 2 | 1 | 2 |

Key concerns discussed included mutability of arguments, overhead of initializing data structures before `main`, and how to handle different encodings.
4. Implementability
On Windows, command line arguments can be accessed with `GetCommandLineW`. This function returns the command line as a single string, which must then be tokenized. It is called by the Windows CRT during startup to populate `argv` for `main`. The Windows CRT also provides the `__argv` and `__wargv` global variables but only populates one of them, depending on whether the program uses `main` or `wmain`. Additionally, neither may be populated if command line parsing is disabled via options tailored to applications trying to minimize startup time.

On macOS, `_NSGetArgc` and `_NSGetArgv` can be used to access `argc` and `argv` outside of `main`. These are both trivial functions that don't allocate.
Implementation on other Unix-based systems is more challenging. There are several options:
- Modify libc to store `argc` and `argv` globally, e.g. `__argc` and `__argv`, similar to `__environ` ([N2948] provides a reference implementation of this approach).
- Alternatively, store `argc` and `argv` from the program's entry point. This would only require compiler support instead of a libc change.
- Use `__dl_argv`, which exists in glibc. Unfortunately, absent a glibc change, looping through `__dl_argv` would be needed to determine `argc`, as `__dl_argc` is hidden.
- Read from and tokenize `/proc/self/cmdline`.
- Rely on glibc passing `argc` and `argv` to the entries in the `.init_array`.
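As an illustration of the `/proc/self/cmdline` option, the sketch below assumes a Linux system, where that file holds the arguments as consecutive NUL-terminated strings. The function names are hypothetical, and the sketch copies every argument into a `std::string`, which is exactly the kind of allocation discussed in the design considerations.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Splits a buffer of NUL-terminated strings (the /proc/<pid>/cmdline format)
// into individual arguments. Empty arguments are preserved.
inline std::vector<std::string> split_cmdline(const std::string& buffer) {
    std::vector<std::string> args;
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\0', start);
        if (end == std::string::npos) end = buffer.size();  // tolerate a missing final NUL
        args.emplace_back(buffer, start, end - start);
        start = end + 1;
    }
    return args;
}

// Linux-only: reads this process's command line. Returns an empty vector if
// the file cannot be opened (e.g. /proc is not mounted).
inline std::vector<std::string> read_proc_cmdline() {
    std::ifstream in("/proc/self/cmdline", std::ios::binary);
    if (!in) return {};
    std::string buffer((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
    return split_cmdline(buffer);
}
```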
5. Proposed Design
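This paper introduces two classes, `std::arguments` and `std::argument`, and a header, `<arguments>`.
`std::arguments` has the interface of a constant contiguous container (element access via `operator[]` and `at`, `size`, `empty`, and iterators, as shown in the synopsis in [arguments.view]), excluding the subview interface, modifiers, and most constructors. Its default constructor initializes the object to represent the program's command-line arguments and may perform allocation.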
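`std::argument` has an implementation-defined native `value_type` and mirrors the design of `std::filesystem::path` by providing observers that can convert to desired encodings (`string()`, `wstring()`, `u8string()`, `u16string()`, and `u32string()`). SG16 indicated a desire to follow the precedent of `std::filesystem::path`. Both paths and arguments can be encoded arbitrarily or even have no encoding: paths can be any sequence of bytes, and so can command line arguments. A `std::argument` may be a view of a string or may own an allocation.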
While modifying the contents of `argv` is not an uncommon practice, `std::arguments` is entirely read-only in order to avoid the dangers surrounding global mutable state. Whether changes made to `argv` in `main` are reflected in `std::arguments` is implementation-defined.
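To show the intended shape of the interface, here is a heavily simplified sketch for a POSIX-like system where the native `value_type` is `char`. It lives in a hypothetical `sketch` namespace, takes `argc`/`argv` explicitly instead of obtaining them from the environment, omits the allocator parameter and the encoding-converting observers, and is not a reference implementation of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Hypothetical stand-in for std::argument: a non-owning view of one argument.
class argument {
public:
    using value_type = char;  // assumption: POSIX-like native type
    using string_view_type = std::basic_string_view<value_type>;

    explicit argument(const value_type* s) noexcept : s_(s) {}

    string_view_type native() const noexcept { return s_; }  // strlen on each call
    const value_type* c_str() const noexcept { return s_; }
    std::string string() const { return std::string(native()); }  // no conversion for char

private:
    const value_type* s_;
};

// Hypothetical stand-in for std::arguments, with backing storage so that
// iterators are ordinary contiguous iterators into a vector.
class arguments {
public:
    using const_iterator = std::vector<argument>::const_iterator;

    arguments(int argc, const char* const* argv) {
        args_.reserve(static_cast<std::size_t>(argc));
        for (int i = 0; i < argc; ++i) args_.emplace_back(argv[i]);
    }

    const argument& operator[](std::size_t i) const noexcept {
        assert(i < size());  // precondition: i < size()
        return args_[i];
    }
    const argument& at(std::size_t i) const {
        if (i >= size()) throw std::out_of_range("sketch::arguments::at");
        return args_[i];
    }
    std::size_t size() const noexcept { return args_.size(); }
    bool empty() const noexcept { return args_.empty(); }
    const_iterator begin() const noexcept { return args_.begin(); }
    const_iterator end() const noexcept { return args_.end(); }

private:
    std::vector<argument> args_;
};

}  // namespace sketch
```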
5.1. Design Considerations
The main design considerations come down to allocation, when tokenization or other argument preprocessing happens, and whether modifications to `argv` in `main` are reflected in `std::arguments`.
Reflecting `argv` modifications from `main`: It is desirable for `std::arguments` to contain the same values throughout the lifetime of a program and to not reflect changes to `argv` in `main`. Unfortunately, this would require allocation and copying on some systems: on Unix-based systems, all means of accessing `argv` will reflect changes made to `argv` in `main`. Discussion on [P1275] and [P0781] made clear that any overhead before `main` is unacceptable for programs that don't use `std::arguments`. Unfortunately, copying the arguments in an initializer before `main` isn't an option, because shared libraries are not necessarily loaded before `main`. Additionally, such an initializer would translate to overhead before `main` that is not pay-for-what-you-use. Due to these implementation challenges, this paper leaves the behavior implementation-defined in the case of `argv` being modified in `main`.
Saving `strlen`: On Unix-based systems, producing string views for arguments will involve a `strlen`. It may be desirable to save the result of this computation; however, the issue of modification mostly rules this out. While the storage for the arguments from the system will always be there, the pointers in `argv` could be modified, and detecting this would be complicated, involve overhead, or may be impossible in general. Because of this, every access of an argument string view will require a `strlen` unless the implementation makes copies of the `argv` string entries. It would likely be undesirable to make it undefined behavior to use `std::arguments` after modifications in `main`, so this paper leaves the possibility of a `strlen` cost open.
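The trade-off in the two preceding considerations can be sketched with two hypothetical strategies for a POSIX-like system: one that keeps pointers into `argv` and pays a `strlen` on each access, and one that copies the strings once. The class names are illustrative only and are not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Strategy 1 (hypothetical): keep the argv pointer. No allocation, but every
// access recomputes the length (strlen inside string_view's constructor), and
// later changes to argv in main are observed.
class argv_view_strategy {
public:
    argv_view_strategy(int argc, char** argv) noexcept : argc_(argc), argv_(argv) {}
    int size() const noexcept { return argc_; }
    std::string_view operator[](int i) const noexcept {
        assert(i >= 0 && i < argc_);
        return argv_[i];
    }
private:
    int argc_;
    char** argv_;
};

// Strategy 2 (hypothetical): copy the strings once. Allocates, but lengths are
// computed a single time and later changes to argv in main are not observed.
class argv_snapshot_strategy {
public:
    argv_snapshot_strategy(int argc, char** argv) : args_(argv, argv + argc) {}
    int size() const noexcept { return static_cast<int>(args_.size()); }
    std::string_view operator[](int i) const noexcept {
        assert(i >= 0 && i < size());
        return args_[static_cast<std::size_t>(i)];
    }
private:
    std::vector<std::string> args_;
};
```

If `main` repoints `argv[1]`, the first strategy reports the new value and the second still reports the original one; the proposal leaves which behavior applies implementation-defined.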
Preprocessing: On Windows, `GetCommandLineW` returns a single string which needs to be split into individual arguments. It may be desirable in some use cases to split this string lazily with an input-iterator interface for arguments. This paper does not suggest any design constrained to input iteration, though, as most uses will want more general access and iteration abilities and will require having tokenized all arguments anyway, whether by looping through all the arguments or just by looking at the argument count.
Backing storage for `std::argument`s: On Unix-based systems it would be simple for `std::arguments` to not involve any allocation and simply provide iterators over `argv` that dereference to ephemeral `std::argument` objects. Unfortunately, this would prevent the iterator from satisfying the Cpp17RandomAccessIterator requirements and the container requirements, and it may be error-prone when trying to store a reference to a `std::argument`. The requirements proposed here therefore require backing storage.
Global singleton, a function returning a reference, or construction: `std::arguments` could be implemented as a global singleton object, a function returning a reference to a singleton, or an object that the user constructs. While an object the user constructs potentially results in allocation at multiple points in a program, as well as possibly seeing different values if `argv` is modified in `main`, it's also desirable to allow the allocation to be cleaned up. As such, this paper proposes a `std::arguments` class which may perform allocation and various preprocessing at construction.
Globs and `argv[0]`: On Unix-based systems, glob expansion is done by the shell. On Windows, it is done by neither the shell nor the Windows CRT. This paper proposes that `std::arguments` should correspond directly to `argv` in `main` without any additional glob expansion. This paper also does not propose any special handling for the first entry of `argv`.
Comparison with other performance-oriented languages: Rust's `std::env::args` function creates an `Args` object, which involves creating a vector of strings in the OS native encoding, copying from `argv` on Unix-based systems and tokenizing on Windows. Rust accesses `argc` and `argv` on most Unix-based systems by placing an initializer in the `.init_array`. Rust doesn't have to worry about modification of `argv` in `main`.
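The `.init_array` technique mentioned above can be sketched as follows. This assumes Linux with glibc, which calls each `.init_array` entry with `argc`, `argv`, and `envp`; other C libraries (musl, for example) call these entries without arguments, so the technique is not portable. All names here are hypothetical.

```cpp
#include <cassert>

namespace {

// Filled in by capture_arguments before main runs (on glibc).
int captured_argc = 0;
char** captured_argv = nullptr;

// glibc invokes every .init_array entry as entry(argc, argv, envp).
void capture_arguments(int argc, char** argv, char** /*envp*/) {
    captured_argc = argc;
    captured_argv = argv;
}

// A function pointer placed in .init_array; `used` keeps the compiler from
// discarding it even though nothing in the program references it.
__attribute__((used, section(".init_array")))
void (*capture_arguments_entry)(int, char**, char**) = &capture_arguments;

}  // namespace

// Accessors usable anywhere once the initializer has run.
int saved_argc() noexcept { return captured_argc; }

const char* saved_argument(int i) noexcept {
    assert(i >= 0 && i < captured_argc);
    return captured_argv[i];
}
```

Nothing is copied here, so, as with the other Unix options, later changes to `argv` made in `main` remain visible through these accessors.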
Because the design of this library feature involves a lot of tradeoffs, it is the goal of this paper to offer as much implementation flexibility as possible.
5.2. Future Interface Expansion
Author's note: While most large applications should probably use a library for argument parsing, it is my hope that, in the case of more ad-hoc argument parsing, it would be possible to portably write a check that compares an argument directly against a string literal such as `"--help"`, along with similar helpful operations. Unfortunately, encoding makes it challenging to do operations such as these portably. Because encoding will vary between systems and `std::argument::value_type` is implementation-defined, currently the only way to do this would involve the overhead of creating a string in a given encoding, or an ugly macro to create a platform-dependent string literal:
```cpp
std::arguments<> args;

// The overhead here is unfortunate but OK for 99% of uses
if (args.at(1).string() == "--help") {
    // ...
}

// or:
#ifdef _WIN32
#define ARG(str) L##str
#else
#define ARG(str) str
#endif

if (args.at(1).native() == ARG("--help")) {
    // ...
}
```
A UDL could also be considered; however, this is a more general problem that, in the author's opinion, should be addressed directly rather than through a bespoke solution. The problem of operations between strings of different encodings would best be tackled in another paper.
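As a narrower stopgap than a general cross-encoding facility, a comparison against an ASCII-only literal can be written generically over the native character type. The sketch below assumes that ASCII characters have the same code unit values in the native encoding (true of UTF-8, UTF-16, UTF-32, and common code pages, but not, for example, EBCDIC); `equals_ascii` is a hypothetical helper, not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: compares a native-encoded argument with a literal that
// is assumed to contain only 7-bit ASCII characters.
template <class CharT>
bool equals_ascii(std::basic_string_view<CharT> arg, std::string_view ascii) noexcept {
    if (arg.size() != ascii.size()) return false;
    for (std::size_t i = 0; i < arg.size(); ++i) {
        const auto a = static_cast<unsigned char>(ascii[i]);
        assert(a < 0x80);  // precondition: ASCII-only literal
        // Widen both sides to a common type before comparing code units.
        if (static_cast<char32_t>(arg[i]) != static_cast<char32_t>(a)) return false;
    }
    return true;
}
```

With such a helper, `equals_ascii(args.at(1).native(), "--help")` would work regardless of whether `value_type` is `char` or `wchar_t`, without a macro or an allocation.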
5.3. Bikeshedding
This paper uses the `std::arguments` naming from [P1275]; however, the name is subject to bikeshedding. One point brought up on the mailing list was that `std::arguments` is a very generic name and it might be desirable to reserve it for future use. Some names that could be considered instead include:
- `std::program_arguments`
- `std::command_line`
- `std::command_line::arguments`
- `std::program_options`
- `std::argv`
- `std::process::arguments`
Naming in other notable languages:
- Python: `sys.argv`
- Go: `os.Args`
- Rust: `std::env::args()`
- Swift: `CommandLine.arguments`
- Ruby: `ARGV`
- C#: `Environment.GetCommandLineArgs()`
- Haskell: `getArgs`
- Ada: `Ada.Command_Line.Argument`
In a very informal approval-voting-style poll on the Together C & C++ Discord server (participants were asked to vote for all options they found appealing), members showed a strong preference for either
or
with eight
and 17 votes, respectively. Other options received no more than two votes. N.b.: The last option, `std::process::arguments`, came up after the poll was started and thus wasn't captured in the poll.
6. Reference Implementation
A reference implementation / proof of concept is at https://github.com/jeremy-rifkin/arguments.
7. Proposed Wording
Wording is relative to [N4950] and borrows extensively from existing wording.
Insert into [headers] table 24: `<arguments>`.

Insert a new section [arguments]:

Header `<arguments>` synopsis [arguments.syn]

```cpp
namespace std {
  class argument;
  template<class Allocator = allocator<argument>>
    class arguments;
}
```
Class `arguments` [arguments.view]

Class `arguments` is a read-only container holding a contiguous range of `argument` objects corresponding to the arguments passed to the program.

All member functions of `arguments` have constant time complexity except the constructors.
```cpp
namespace std {
  template<class Allocator = allocator<argument>>
  class arguments {
  public:
    using value_type             = const argument;
    using size_type              = size_t;
    using difference_type        = ptrdiff_t;
    using pointer                = value_type*;
    using const_pointer          = value_type*;
    using reference              = value_type&;
    using const_reference        = value_type&;
    using const_iterator         = /* implementation-defined */; // see [arguments.view.iterators]
    using iterator               = const_iterator;
    using const_reverse_iterator = std::reverse_iterator<const_iterator>;
    using reverse_iterator       = const_reverse_iterator;

    // [arguments.view.cons], constructors
    arguments() noexcept(noexcept(Allocator())) : arguments(Allocator()) {}
    explicit arguments(const Allocator&);

    // [arguments.view.access], access
    reference operator[](size_type index) const noexcept;
    reference at(size_type index) const;

    // [arguments.view.obs], observers
    size_type size() const noexcept;
    bool empty() const noexcept;

    // [arguments.view.iterators], iterators
    const_iterator begin() const noexcept;
    const_iterator end() const noexcept;
    const_iterator cbegin() const noexcept;
    const_iterator cend() const noexcept;
    const_reverse_iterator rbegin() const noexcept;
    const_reverse_iterator rend() const noexcept;
    const_reverse_iterator crbegin() const noexcept;
    const_reverse_iterator crend() const noexcept;
  };
}
```
Constructors [arguments.view.cons]

`explicit arguments(const Allocator& a);`

Effects: Constructs an `arguments` object with the program's arguments using the specified allocator.

Throws: May throw if allocation via `a` throws.
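Access [arguments.view.access]

`reference operator[](size_type index) const noexcept;`

Preconditions: `index < size()` is `true`.

Returns: The argument at index `index` passed into the program from the environment. It is implementation-defined whether, in a function with signature `int main(int argc, char* argv[])`, any modifications to `argv` are reflected by the returned `argument`.

Throws: Nothing.

[Note 1: `operator[](i)` corresponds to `argv[i]` in `main`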
— end note].
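`reference at(size_type index) const;`

Effects: Equivalent to: `return operator[](index);` if `index < size()` is `true`.

Throws: `out_of_range` if `index >= size()` is `true`.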
Observers [arguments.view.obs]

`size_type size() const noexcept;`

Returns: The number of program arguments.

`bool empty() const noexcept;`

Effects: Equivalent to: `return size() == 0;`
Iterators [arguments.view.iterators]

`using const_iterator = implementation-defined;`

The type models `contiguous_iterator` ([iterator.concept.contiguous]) and meets the Cpp17RandomAccessIterator requirements ([random.access.iterators]); its value type is `argument` and its reference type is `const argument&`.

All requirements on container iterators ([container.reqmts]) apply to `arguments::const_iterator` as well.

`const_iterator begin() const noexcept;`

Returns: An iterator referring to the first program argument. If `empty()` is `true`, then it returns the same value as `end()`.

`const_iterator end() const noexcept;`

Returns: An iterator which is the past-the-end value.
Effects: Equivalent to:
Effects: Equivalent to:
Class `argument` [arguments.argument]

An object of class `argument` is a view of a character string argument passed to the program in an operating system-dependent format.

It is implementation-defined whether, in a function with signature `int main(int argc, char* argv[])`, any modifications to `argv` are reflected by an `argument`.
```cpp
namespace std {
  class argument {
  public:
    using value_type       = /* see below */;
    using string_type      = basic_string<value_type>;
    using string_view_type = basic_string_view<value_type>;

    // [arguments.argument.native], native observers
    const string_view_type native() const noexcept;
    const string_type native_string() const;
    const value_type* c_str() const noexcept;
    explicit operator string_type() const;
    explicit operator string_view_type() const noexcept;

    // [arguments.argument.obs], converting observers
    template<class EcharT, class traits = char_traits<EcharT>,
             class Allocator = allocator<EcharT>>
      basic_string<EcharT, traits, Allocator>
        string(const Allocator& a = Allocator()) const;
    std::string string() const;
    std::wstring wstring() const;
    std::u8string u8string() const;
    std::u16string u16string() const;
    std::u32string u32string() const;

    // [arguments.argument.compare], comparison
    friend bool operator==(const argument& lhs, const argument& rhs) noexcept;
    friend strong_ordering operator<=>(const argument& lhs, const argument& rhs) noexcept;

    // [arguments.argument.ins], inserter
    template<class charT, class traits>
      friend basic_ostream<charT, traits>&
        operator<<(basic_ostream<charT, traits>& os, const argument& a);
  };

  // [arguments.argument.fmt], formatter
  template<typename charT>
  struct formatter<argument, charT> : formatter<argument::string_view_type, charT> {
    template<class FormatContext>
      typename FormatContext::iterator
        format(const argument& argument, FormatContext& ctx) const;
  };
}
```
Conversion [arguments.argument.cvt]
The native encoding of an ordinary character string is the operating system dependent current encoding for arguments. The native encoding for wide character strings is the implementation-defined execution wide-character set encoding ([character.seq]).
For member functions returning strings, value type and encoding conversion is performed if the value type of the argument or return value differs from `argument::value_type`. For the return value, the method of conversion and the encoding to be converted to are determined by its value type:
- `char`: The encoding is the native ordinary encoding. The method of conversion, if any, is operating system dependent.
- `wchar_t`: The encoding is the native wide encoding. The method of conversion is unspecified.
- `char8_t`: The encoding is UTF-8. The method of conversion is unspecified.
- `char16_t`: The encoding is UTF-16. The method of conversion is unspecified.
- `char32_t`: The encoding is UTF-32. The method of conversion is unspecified.
If the encoding being converted to has no representation for source characters, the resulting converted characters, if any, are unspecified.
Native Observers [arguments.argument.native]
The string returned by all native observers is in the native default argument encoding ([arguments.argument.cvt]).
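`const string_view_type native() const noexcept;`

Returns: A `string_view_type` representing the argument.

`const string_type native_string() const;`

Returns: A `string_type` representing the argument.

`const value_type* c_str() const noexcept;`

Returns: A pointer to a null-terminated array of `value_type` representing the argument.

`explicit operator string_type() const;`

Returns: A `string_type` representing the argument.

`explicit operator string_view_type() const noexcept;`

Returns: A `string_view_type` representing the argument.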
Converting Observers [arguments.argument.obs]
Returns: A string representing the argument.
Remarks: All memory allocation, including for the return value, shall be performed by a. Conversion, if any, is specified by [arguments.argument.cvt].
Returns: A string representing the argument.
Remarks: Conversion, if any, is specified by [arguments.argument.cvt].
Comparison [arguments.argument.compare]
Effects: Equivalent to:
.
Effects: Equivalent to:
.
Inserter [arguments.argument.ins]
Effects: Equivalent to:
.
Formatter [arguments.argument.fmt]
Effects: Equivalent to:
.