[SG16-Unicode] SG16 approval for LEWG to review std::filesystem::path_view

Niall Douglas s_sourceforge at nedprod.com
Wed Jul 3 22:43:28 CEST 2019


On 03/07/2019 19:17, JeanHeyd Meneide wrote:
> Dear Niall Douglas,
> 
>     FYI I was completely unaware that SG16 had discussed P1030. Nobody
>     told me.
> 
> 
>       Apologies, that was my failing: I think I had only briefly
> mentioned it during my first Standards Meeting after I first met
> you in Rapperswil, Switzerland, but I likely should have delivered the
> feedback via e-mail rather than a brief gloss over in person in the most
> timid of fashions.

This rings a bell actually. Was it when we were discussing
implementation of std::embed? The problem with WG21 meetings is that
they are too intense, so recalling details afterwards is always a problem.

>     I should add that the killing off of char input was strongly requested
>     by Billy. I got the feeling it was a red line for him. I can understand
>     why, from a MSVC-implementer perspective, and I have witnessed first
>     hand the brokenness of char input to path on Windows.
> 
> 
>      I am very glad that `char` is not included here. My only potential
> concern is that Linux-exclusive users will cry out. But then again, so
> will MSVC users with L"" strings. "Everyone suffers equally" is a bit of
> a cold comfort, though, but it's completely understandable why.

For me personally, if it's a "fixing lots of unintentionally broken
code" kind of source incompatibility when someone ups the C++ standard
version in the compiler, I find that acceptable. Source breakage which
can be easily repaired with find and replace regex in files is in my
opinion very acceptable indeed.

>     The aim is for path_view to be usually no worse than path, nothing more.
>     If the input is in UTF-8, and the system API requires UTF-16, then you
>     need to convert, same as for path. Unless you want to push mandatory
>     #ifdef-ing onto the end user, which I don't think we want. 
> 
> 
>      Right, I think what was lost in the original example upon Tom's
> reading was that it _always_ converted. That's not the case for
> path_view: it will only convert if the passed-in encoding does not match
> the native file system's encoding, and only do such a conversion when
> necessary. If the user passes in UTF-8 on POSIX, no conversion will be done.

Correct.

There is one other situation where Unicode conversion may occur: when
two path views of differing character backing are compared. In this
situation, one of the path views gets converted in order to perform a
string comparison. My personal preference is that both get converted to
whatever is native on the platform if necessary, but I can see people
objecting to the potential inefficiency of that.
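
The two cases above (no conversion when the backing already matches what
the system wants, and converting one side when views of differing backing
are compared) can be sketched roughly as follows. This is only an
illustration under stated assumptions: toy_path_view, render_for_utf8_api
and same_path_text are names made up for this example and are not the
P1030 interface, narrow input is assumed to be UTF-8, and the "native"
encoding is taken to be UTF-8 as on POSIX.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical stand-in for a path view: a non-owning reference to either
// narrow (assumed UTF-8) or UTF-16 characters. Not the P1030 interface.
using toy_path_view = std::variant<std::string_view, std::u16string_view>;

// Append one code point to `out` as UTF-8.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Transcode UTF-16 to UTF-8; unpaired surrogates become U+FFFD.
inline std::string utf16_to_utf8(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine the pair.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}

// Produce what a UTF-8 system API would be handed. `scratch` is written
// to only when the backing is not already narrow.
inline std::string_view render_for_utf8_api(const toy_path_view& p,
                                            std::string& scratch) {
    if (const auto* narrow = std::get_if<std::string_view>(&p)) {
        return *narrow;  // encodings match: no copy, no conversion
    }
    scratch = utf16_to_utf8(std::get<std::u16string_view>(p));
    return scratch;
}

// Compare the text of two views. Same backing: plain comparison.
// Differing backing: only the UTF-16 side ends up being converted.
inline bool same_path_text(const toy_path_view& a, const toy_path_view& b) {
    if (a.index() == b.index()) {
        return a == b;
    }
    std::string scratch_a, scratch_b;
    return render_for_utf8_api(a, scratch_a) ==
           render_for_utf8_api(b, scratch_b);
}
```

Note that same_path_text compares code units after conversion and nothing
more; it does no normalisation and says nothing about whether two paths
name the same file.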

>      Finally, I have one more... thing? It's not really a concern or a
> nit with the paper, just a bit of sadness: had we standardized a
> c_string_view of some form, we wouldn't need what will probably amount
> to "/Expects: /ptr[size] == 0 is true" on all the specifications on the
> constructors for the string view / charX + pointer overloads. I agree
> with the reasoning in the paper that there is rarely a case where users
> expect that to be the case, but it's not exactly impossible: I have
> received a lovely crop of bug reports from people seeing
> "std::string_view" overloads in my libraries and passing
> non-null-terminated strings into them, because that's what string_view
> promises. Whether or not your wording has an "expects" clause, it's not
> hard to imagine substring or other similar pointer + size
> manipulations producing hell here.
> 
>       Then again, this is marked as the /path/_view type. Maybe that
> will be enough visual indication to the user they should think carefully
> and not just toss in random substrings. I certainly hope it is.
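
The substring hazard described above can be made concrete with a small
sketch. terminator_follows is a hypothetical helper written for this
example, not part of any proposal; it reads one character past the end of
the view, which is only safe here because both views point into the same
string literal.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical check of the precondition "ptr[size] == 0".
// Only valid when the caller knows data()[size()] is readable, as it is
// below because both views point into one string literal.
inline bool terminator_follows(std::string_view sv) {
    return sv.data()[sv.size()] == '\0';
}

// A view of a whole string literal: a terminator follows the last character.
inline constexpr std::string_view whole = "/var/log/app.txt";

// A substring of it: the character after the view is '/', not '\0',
// so handing dir.data() to a C API would read past the intended end.
inline constexpr std::string_view dir = whole.substr(0, 8);
```

With these definitions terminator_follows(whole) holds and
terminator_follows(dir) does not, even though both are perfectly valid
string_view values.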

I've not had any reports of problems from LLFIO users, most of whom are
no wizards in C++. Indeed they often haven't realised that paths weren't
being copied until I told them. For them, it all "just worked" silently
without being noticed, or rather, I get feedback along the lines of "OMG
I ported my native API based code to LLFIO and it's now 40% faster! How
is that possible?". When we dig into why, it turns out they were
previously using std::string for path manipulation, and that was gating
open file performance significantly, because opening a file can be
really fast on Linux, so a malloc + memcpy + free cycle actually matters
a lot.

Thanks for the feedback, and sorry for not remembering our conversation
at Rapperswil better. I'm getting old :(

Niall

