"Your scientists were so preoccupied with whether or not they could, they
didn’t stop to think if they should."
— Dr. Ian Malcolm
1. Introduction
P1030 is a paper with a long and troubled history,
consistently falling short of its original goals and, in some cases, even
regressing. While recent revisions have removed some of the more questionable
parts of the design, such as the use of locales, numerous critical issues remain
unresolved. This paper highlights some of these issues and argues that
standardizing path_view in its current form would not only perpetuate past
design flaws but also make future fixes nearly impossible. Additionally, it
points out the severe lack of implementation and practical usage experience with
the latest design.
2. Problems
2.1. Encoding
A significant portion of the initial revision of the paper ([P1030R0]) was
devoted to examining the issues surrounding std::filesystem::path and the use
of ANSI encodings on Windows:
std::filesystem came originally from Boost.Filesystem, which in turn underwent three major revisions during the Boost peer review as it was such a lively debate. During those reviews, it was considered very important that paths were passed through, unmodified, to the system API. There are very good reasons for this, mainly that filesystems, for the most part, treat filenames as a bunch of bytes without interpreting them as anything. So any character reencoding could cause a path entered via copy-and-paste from the user to be unopenable, unless the bytes were passed through exactly.
This is a laudable aim, and it is preserved in this path view proposal. Unfortunately it has a most unfortunate side effect: on Microsoft Windows, std::filesystem::path when supplied with char not wchar_t, is considered to be in ANSI encoding. This is because the char accepting syscalls on Microsoft Windows consume ANSI for compatibility with Windows 3.1, and they simply thunk through to the UTF-16 accepting syscall after allocating a buffer and copying the input bytes into shorts. Therefore on Microsoft Windows, std::filesystem::path duly expands char input into its internal UTF-16 wchar_t storage via direct casting. It does not perform a UTF-8 to UTF-16 conversion.
Unfortunately any Microsoft Windows IDE or text editor that I have used recently defaults to creating C++ source files in UTF-8, exactly the same as on every other major platform including Linux and MacOS. This in turn means that source code with a char string literal such as "UTF♠stringΩliteral" makes a UTF-8 char string, not an ANSI char string, which is consistent across all the major platforms. Thus, std::filesystem::path’s behaviour on Microsoft Windows is quite surprising: your portable program will not work. What works on all the other platforms, without issue, does not work on Microsoft Windows, for no obvious reason to the uninitiated.
This author can only speak from his own personal experience, but what he has found over many years of practice in writing portable code based on std::filesystem::path is that one ends up inevitably using preprocessor macros to emit L"UTF♠stringΩliteral" when _WIN32 and _UNICODE are macro defined, and otherwise emit "UTF♠stringΩliteral". The reason is simple: the same string literal, with merely a L or not prefix, works identically on all platforms, no locale induced surprises, because we know that string literals in UTF source code will be in some UTF-x format. The side effect is spamming your ‘portable’ program code with string literal wrapper macros as if we were still writing for MFC, and/or #if defined(_WIN32) && defined(_UNICODE) all over your code. I do not find this welcome.
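The macro workaround described in the quoted passage can be sketched as follows. This is a minimal illustration, not code taken from P1030R0; the names PATH_LIT and config_path are hypothetical. It pastes an L prefix onto each literal only on Windows builds with _UNICODE defined, which is exactly the kind of wrapper-macro noise the quote complains about.

```cpp
#include <filesystem>

// PATH_LIT is a hypothetical wrapper macro of the kind the quoted passage
// describes. On Windows builds with _UNICODE defined it pastes an L prefix
// onto the literal, so std::filesystem::path receives wchar_t (UTF-16) data;
// everywhere else it leaves the char literal unchanged.
#if defined(_WIN32) && defined(_UNICODE)
#define PATH_LIT(s) L##s
#else
#define PATH_LIT(s) s
#endif

// Hypothetical example: the same source line compiles on every platform,
// at the cost of wrapping every path literal in the macro.
inline std::filesystem::path config_path() {
  return std::filesystem::path(PATH_LIT("config")) / PATH_LIT("settings.ini");
}
```

Every literal that reaches a path has to go through the macro, which is why the approach spreads through an otherwise portable code base.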
R0 goes as far as to switch to UTF-8 as the default encoding for path_view:
I propose that when char strings are supplied as a path string literal, and if and only if a conversion is needed, that we interpret those chars as UTF-8.
I know that this is a breaking change from std::filesystem::path, but I would argue that std::filesystem::path needs to be similarly changed. UTF-8 source code is very, very commonplace now, much more so than even a few years ago, and it is extremely likely that almost all new C++ written will be in UTF-8. So best to change std::filesystem::path appropriately, and if that is too great a breaking change, then these proposed path views are ‘fixed’ instead.
While this revision confuses source and literal encoding and presents an overly
ambitious solution, the problems described by the author are very real. In fact,
they have worsened as UTF-8 adoption has increased on Windows, particularly with
the ease of enabling UTF-8 via the /utf-8 compiler flag in MSVC.
Working with certain parts of std::filesystem::path is very error-prone in
the increasingly common case where the literal encoding is UTF-8. Unfortunately,
later revisions of P1030 not only dropped any attempt to address this problem
but exacerbated it by adopting the legacy ANSI encoding throughout the API.
Worse still, this encoding has been embedded in the internal representation,
making it part of the ABI: a major regression compared to
std::filesystem::path, where the use of ANSI encoding is far more limited and
rightfully avoided in the internal representation.
[P2319], which was recently approved by SG16 with strong support, proposes
to deprecate the most problematic (from the encoding standpoint) parts of
std::filesystem::path. [P1030R6] does the opposite and massively increases
the public API (and ABI) surface that relies on error-prone legacy code pages.
In addition to the problems described in P1030R0, the use of ANSI encoding makes
it hard for path_view to interoperate with modern facilities such as C++20
std::format and C++23 std::print (see 2.4. Formatting and output).
2.2. Implementation and usage experience
[P1030R6] claims:
If you wish to use an implementation right now, a highly-conforming reference implementation of the proposed path view can be found at https://github.com/ned14/llfio/blob/master/include/llfio/v2.0/path_view.hpp.
Unfortunately, at the time of writing, important parts of the proposal are
missing from that implementation. Specifically, more than 80 new overloads of
the filesystem operation functions remain unimplemented. Even worse, the paper
itself lacks wording for these functions:
Wording note: The definitions for the function declared in the synopsis above are not provided at this time. All of them delegate to the overload taking a path.
Additionally, there is no implementation of a path-view-like equivalent that
was designed on-the-fly during one of the LEWG reviews. As a result, there is no
way to evaluate the effects of switching to path-view-like parameters in these
functions on real-world user code.
At the time of writing, there are zero uses of
(referred to as
in the paper) on GitHub, aside from its
definition and a mention in a blog. Furthermore, there are no tests available,
despite it being one of the primary APIs.
2.3. Performance
path_view in its current form exacerbates encoding problems, but does it at
least offer performance improvements?
Unfortunately, path_view goes to great lengths to avoid providing any
performance benefits for existing users. This is achieved through obscure
path-view-like overloads, so that existing C++ code would need to ‘opt in’ to
using the path view overloads.
This stands in stark contrast to the common use of std::string_view, which
typically allows users to avoid allocations:
void f(std::string_view s);
f("foo");                                     // No allocation
std::filesystem::file_size("/path/to/file");  // Allocates std::filesystem::path
                                              // in P1030R6.
Additionally, due to lazy transcoding, path_view can be slower than
std::filesystem::path, which transcodes eagerly, when the same value is used
multiple times.
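The cost difference can be illustrated with a toy model, assuming a Windows-like setting where the native encoding is UTF-16 and the input is UTF-8. The names eager_path, lazy_view, utf8_to_utf16 and transcode_count are hypothetical; neither class is the actual std::filesystem::path or the path_view of [P1030R6], and the transcoder assumes well-formed UTF-8 without validating it.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Counts how many times transcoding has run, to make the cost visible.
inline std::size_t transcode_count = 0;

// Minimal UTF-8 to UTF-16 transcoder; assumes well-formed input.
inline std::u16string utf8_to_utf16(std::string_view in) {
  ++transcode_count;
  std::u16string out;
  for (std::size_t i = 0; i < in.size();) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    char32_t cp = c;
    std::size_t len = 1;
    if (c >= 0xF0) { cp &= 0x07; len = 4; }
    else if (c >= 0xE0) { cp &= 0x0F; len = 3; }
    else if (c >= 0x80) { cp &= 0x1F; len = 2; }
    for (std::size_t k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    i += len;
    if (cp >= 0x10000) {  // outside the BMP: encode as a surrogate pair
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      out.push_back(static_cast<char16_t>(cp));
    }
  }
  return out;
}

// Eager model: transcode once at construction and keep the result.
class eager_path {
 public:
  explicit eager_path(std::string_view utf8) : native_(utf8_to_utf16(utf8)) {}
  const std::u16string& native() const { return native_; }

 private:
  std::u16string native_;
};

// Lazy model: keep only the source and transcode on every use.
class lazy_view {
 public:
  explicit lazy_view(std::string_view utf8) : source_(utf8) {}
  std::u16string native() const { return utf8_to_utf16(source_); }

 private:
  std::string_view source_;
};
```

Calling native() three times transcodes once in the eager model and three times in the lazy one, so the lazy design can only break even or win when the converted result is needed at most once.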
2.4. Formatting and output
Unlike
,
proposed by [P1030R6] does not provide a formatter
so the following examples do not compile:
std::filesystem::path_view pv = ...;
std::string s = std::format("/tmp/{}", pv);  // Error: no formatter for path_view
std::print("{}", pv);                        // Error: no formatter for path_view
Implementing this functionality may be problematic due to unfortunate choices in the latest design.
One issue is related to encoding. The representation of std::filesystem::path
uses a single encoding that remains constant at runtime, making it feasible,
though not trivial, to specify a good formatter. In contrast, path_view
complicates matters by using multiple representations with different encodings,
one of which can be a legacy encoding that can change at runtime. As a result,
there is no way to determine, at the time of use, which encoding a path_view
was constructed with. This is conceptually similar to the Time of Check to Time
of Use ([TOCTOU]) class of problems common in filesystem operations, which in
this case can lead to mojibake, data corruption and other problems.
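A toy model of this hazard, assuming a process-wide code page that can be switched between UTF-8 and ISO-8859-1 (Latin-1) at runtime: the same stored bytes decode to different text depending on which code page is active when they are used, rather than when they were stored. The names codepage, active_codepage and decode_narrow are hypothetical and do not correspond to any Windows API; Latin-1 stands in for a legacy code page and agrees with Windows-1252 on the bytes used here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical process-wide "ANSI" code page that can change at runtime.
enum class codepage { utf8, latin1 };
inline codepage active_codepage = codepage::utf8;

// Decodes narrow bytes using whatever code page is active now, not the one
// that was active when the bytes were produced. UTF-8 input is assumed to be
// well-formed.
inline std::u32string decode_narrow(std::string_view bytes) {
  std::u32string out;
  if (active_codepage == codepage::latin1) {
    // Latin-1: every byte maps directly to the code point with that value.
    for (char ch : bytes) out.push_back(static_cast<unsigned char>(ch));
    return out;
  }
  for (std::size_t i = 0; i < bytes.size();) {
    unsigned char c = static_cast<unsigned char>(bytes[i]);
    char32_t cp = c;
    std::size_t len = 1;
    if (c >= 0xF0) { cp &= 0x07; len = 4; }
    else if (c >= 0xE0) { cp &= 0x0F; len = 3; }
    else if (c >= 0x80) { cp &= 0x1F; len = 2; }
    for (std::size_t k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
    out.push_back(cp);
    i += len;
  }
  return out;
}
```

If the bytes C3 A9 are stored while UTF-8 is active, they mean é; after a later switch to Latin-1 the same bytes decode to the mojibake Ã©, and nothing in the stored value records which interpretation was intended.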
Another issue is the binary representation, which is severely underspecified and may conflict with other representations, making output hard to round-trip, even within a single implementation. Speaking as an author of the path formatter ([P2845]), this author finds it unclear how a path_view formatter should be defined, and despite multiple requests, [P1030R6] has still failed to provide the necessary specifics.
is defined in terms of path-from-binary which is very vague and
appears to have the same problems.
2.5. Complexity
path_view roughly doubles the API surface area of std::filesystem, both in
terms of its own definition and by proposing to add an overload that takes
path-view-like arguments for every existing overload that takes
std::filesystem::path.
For example:
// Existing overloads:
bool equivalent(const path& p1, const path& p2);
bool equivalent(const path& p1, const path& p2, error_code& ec) noexcept;

// Additional overloads proposed by P1030R6:
bool equivalent(path-view-like p1, path-view-like p2);
bool equivalent(path-view-like p1, path-view-like p2, error_code& ec) noexcept;
Contrary to its name, the proposed path_view is not truly a view of
std::filesystem::path in the same way that std::string_view can be considered
a view of std::string. std::filesystem::path has a single representation that
is suitable for the current system. In contrast, path_view is effectively a
discriminated union of some (but not all) of the types from which
std::filesystem::path can be constructed, with a lazy conversion to path. It is
unclear what such an unusual API should be called, but it probably should not be
referred to as a "view."
2.6. Conclusion
In summary, the proposed path_view presents significant concerns that need to
be resolved before standardization. Its design exacerbates encoding problems and
adds unnecessary complexity to the API. The reliance on legacy ANSI encoding
undermines modern practices and complicates interoperability with other C++
facilities.
Additionally, the increased API surface area and the requirement for users to
opt in to specific overloads detract from its usability. To maximize the utility
of path_view, future revisions should focus on simplifying its design,
addressing encoding issues, enhancing compatibility with existing libraries, and
gaining actual implementation and usage experience. Standardizing the current
proposal risks introducing more problems than it solves.