Given the increased importance of being able to develop portable, scalable network-aware applications, C++ developers are at a disadvantage in having no standard network implementation to use. One of the fundamental components of any network library is a URI, and this proposal is motivated by the desire to introduce a portable, efficient and internationalized implementation of a URI to C++ standard library users.
This proposal is based on original development done in the cpp-netlib project http://cpp-netlib.github.com/. This implementation is released using the Boost software license and will track the proposed library as it evolves. A standalone project can be found at https://github.com/cpp-netlib/uri/.
The scope of this proposal will include a single uri type, some specifications about how URIs are intended to be processed and extended, including some additional helper types and functions. It will include a type and functions to build a URI from its components. Finally, it will include types and functions for percent encoding, URI references and URI normalization.
This paper is a follow-up to the preliminary proposals described in documents N3420 and N3484 and includes updates based on feedback which are described in the appendix.
std::network::uri uri("http://www.example.com/glynos/?key=value#frag");
assert(uri.absolute());
assert(!uri.opaque());
assert(*uri.scheme() == "http");
assert(*uri.host() == "www.example.com");
assert(*uri.path() == "/glynos/");
assert(*uri.query() == "key=value");
assert(*uri.fragment() == "frag");
The code excerpt above shows a simple example of how the proposed uri will work. The URI string is parsed during object construction and broken down into its component parts. HTTP URIs are absolute and hierarchical (i.e. not opaque).
std::network::uri uri(U"xmpp:example-node@example.com?message;subject=Hello%20World");
assert(uri.absolute());
assert(uri.opaque());
assert(*uri.scheme() == "xmpp");
assert(*uri.path() == "example-node@example.com");
assert(*uri.query() == "message;subject=Hello%20World");
The uri in this proposal supports encoded strings and encoding conversion. The example above shows a uri object constructed using a std::u32string, which allows the parts to be accessed as std::string objects in UTF-8 encoding.
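The conversion from a std::u32string to UTF-8 mentioned above can be illustrated independently of the proposed class. The following is a minimal sketch, assuming well-formed UTF-32 input; to_utf8 is a hypothetical helper name and is not part of the proposed interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes a UTF-32 string as UTF-8.
// Assumes every element is a valid Unicode scalar value (no validation).
std::string to_utf8(const std::u32string& in) {
  std::string out;
  for (char32_t c : in) {
    if (c < 0x80) {                       // 1 byte: 0xxxxxxx
      out += static_cast<char>(c);
    } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xE0 | (c >> 12));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else {                              // 4 bytes: 11110xxx followed by three 10xxxxxx
      out += static_cast<char>(0xF0 | (c >> 18));
      out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}
```

ASCII-only input such as the XMPP example passes through unchanged, which is why its parts compare equal to narrow string literals.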
The document has been fixed so that the query part now starts after the question-mark separator.
All part accessors (scheme, user_info, host, port, path, query and fragment) now return std::optional<string_ref>.
normalize, relativize and resolve all accept as an argument the uri_comparison_level.
The uri(InputIterator, InputIterator) and uri(Source) constructors now can throw a std::system_error exception instead of uri_syntax_error. This makes the constructor exception consistent with the error code set by make_uri.
compare now returns an int instead of a bool. The return value of compare will be -1 if this (normalized) uri is lexicographically less than the other, 0 if they are equal and 1 if this is greater than the other (see comparison ladder).
The items in the appendix describing the revision history have been moved to this section. The appendix now contains a response to questions about existing practice and to the WhatWG URL Standard.
The updated uri class no longer specifies the underlying string type directly; it need not even be a std::basic_string. The std::network::uri::value_type is now defined as the value of the string iterator. Consequently, the uri class will no longer assume that the underlying string is in a contiguous block of memory, so the c_str() has been removed.
The uri(InputIterator, InputIterator) and uri(Source) constructors now can throw a uri_syntax_error exception. The is_valid member has been removed as it is now redundant.
This proposal in its current form only specifies tests for syntactic validity. To make this more explicit, the test for validity (now an exception) is named uri_syntax_error. This decision will influence others, especially related to normalization and equivalence. Future versions of this proposal may address the question of semantic validity. It would boil down to how complex the implementation of a generic URI class needs to be, and if it can be made extensible in the right ways to test scheme-specific semantics.
The accessors is_absolute and is_opaque have been renamed absolute and opaque in order to be more consistent.
The functions normalize, relativize and resolve have been removed and replaced with member functions in the class.
A function compare is provided which takes as a 3rd argument a uri_comparison_level enum. This can be used to determine the level of the comparison ladder to use when comparing URIs. A more complete description of what comparison and equivalence means can be discussed and elaborated in a future proposal.
The question of semantics of relational operators will also be addressed.
The functions pct_encode and pct_decode have been renamed encode and decode. A few more functions have been added to take into account the different types of encoding for different URI parts.
The changes required to the uri::builder have been taken into account, including renaming it uri_builder and providing a method uri() that returns a new uri object.
The generic syntax of a URI is defined in RFC 3986.
All URIs are of the form:
scheme ":" "hierarchical part" [ "?" query ] [ "#" fragment ]
The scheme is used to identify the specification needed to parse the rest of the URI. A generic syntax parser can parse any URI into its main parts. The scheme can then be used to identify whether further scheme-specific parsing can be performed.
The hierarchical part refers to the part of the URI that holds identification information that is hierarchical in nature. This may contain an authority (always prefixed with a double slash "//") and/or a path. The path part is required, though it may be empty. The authority part holds an optional user info part, ending with an at sign "@"; a host identifier; and an optional port number, preceded by a colon ":". The host may be an IP address or domain name. The bracketed format for IPv6 addresses was originally specified in RFC 2732 and is now part of RFC 3986.
The query is an optional part starting with a question mark "?" that contains information that is not hierarchical.
Finally, the fragment is an optional part, prefixed by a hash symbol "#", that is used to identify secondary resources.
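The breakdown into scheme, authority, path, query and fragment can be sketched with the regular expression given in RFC 3986, Appendix B. This is only an illustration of the generic syntax, not the parser of the proposed uri class; the names uri_parts and split_uri are hypothetical. Unlike the proposed accessors, this sketch cannot distinguish an absent part from an empty one.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the five generic parts.
struct uri_parts {
  std::string scheme, authority, path, query, fragment;
};

// Splits a URI reference using the regular expression from RFC 3986,
// Appendix B. The individual parts are not validated.
uri_parts split_uri(const std::string& s) {
  static const std::regex re(
      R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
  std::smatch m;
  std::regex_match(s, m, re);
  // Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query and fragment.
  return {m[2].str(), m[4].str(), m[5].str(), m[7].str(), m[9].str()};
}
```

Applied to "http://www.example.com/glynos/?key=value#frag", the sketch yields the scheme "http", the authority "www.example.com", the path "/glynos/", the query "key=value" and the fragment "frag".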
RFC 3987 specifies a new protocol element, the Internationalized Resource Identifier (IRI). The IRI complements a URI, and extends it to allow Unicode characters.
This proposal will define a uri type that will attempt to encompass all three of these RFCs.
URI percent encoding is described in RFC 3986, section 2.1 and RFC 3986, section 2.4.
Percent encoding is the mechanism used to encode reserved characters in a URI. According to RFC 3986, section 2.2, the set of reserved characters is:
Character | Percent-encoded |
---|---|
! | %21 |
# | %23 |
$ | %24 |
& | %26 |
' | %27 |
( | %28 |
) | %29 |
* | %2A |
+ | %2B |
, | %2C |
/ | %2F |
: | %3A |
; | %3B |
= | %3D |
? | %3F |
@ | %40 |
[ | %5B |
] | %5D |
Percent encoding is not limited to reserved characters. Any character data may be percent encoded:
Character | Percent-encoded |
---|---|
newline | %0A |
space | %20 |
" | %22 |
% | %25 |
- | %2D |
. | %2E |
< | %3C |
> | %3E |
\ | %5C |
^ | %5E |
_ | %5F |
` | %60 |
{ | %7B |
vertical bar | %7C |
} | %7D |
~ | %7E |
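The mapping shown in the tables can be sketched as a pair of functions. This is a minimal sketch, assuming the "C" locale and a deliberately conservative policy that encodes every octet outside the unreserved set of RFC 3986, section 2.3; the proposed encode_* functions would instead apply rules specific to each URI part. The names percent_encode and percent_decode are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Unreserved characters (RFC 3986, section 2.3) never need encoding.
bool is_unreserved(unsigned char c) {
  return std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical helper: percent-encodes every octet that is not unreserved.
std::string percent_encode(const std::string& in) {
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    if (is_unreserved(c)) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}

// Hypothetical helper: decodes %XX triplets. Anything that is not a
// well-formed triplet is copied through unchanged.
std::string percent_decode(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '%' && i + 2 < in.size() &&
        std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
        std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
      out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
      i += 2;
    } else {
      out += in[i];
    }
  }
  return out;
}
```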
URI normalization is described in RFC 3986, Section 6 and in RFC 3987, Section 5. Normalization is the process by which a URI is transformed in order to determine if two URIs are equivalent.
Different types of normalization may preserve semantics, and others may not. Normalization may also depend on the scheme.
The scheme and host are case-insensitive. The proposed normalization solution will convert these to lowercase.
HTTP://Example.com/ --> http://example.com/
The user info, path, query and fragment are case-sensitive and so must not be converted.
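Case normalization as described above can be sketched on a plain string. This is a minimal sketch, assuming a hierarchical URI of the form "scheme://[user-info@]host[:port]..." and ignoring percent-encoded triplets in the host; normalize_case is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: lower-cases the scheme and host of a hierarchical URI.
// The user info, path, query and fragment are left untouched.
std::string normalize_case(std::string uri) {
  auto lower = [&uri](std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i)
      uri[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(uri[i])));
  };
  const std::size_t sep = uri.find("://");
  if (sep == std::string::npos) return uri;  // no authority: leave unchanged
  lower(0, sep);                             // scheme
  std::size_t host_begin = sep + 3;
  std::size_t auth_end = uri.find_first_of("/?#", host_begin);
  if (auth_end == std::string::npos) auth_end = uri.size();
  const std::size_t at = uri.find('@', host_begin);
  if (at != std::string::npos && at < auth_end) host_begin = at + 1;  // skip user info
  lower(host_begin, auth_end);               // host (port digits are unaffected)
  return uri;
}
```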
Characters in a percent-encoded triplet are case-insensitive. The proposed normalization solution will convert these to uppercase.
http://example.com/%5b%5d --> http://example.com/%5B%5D
Unreserved characters that have been encoded will be decoded.
http://example.com/%7Eglynos/ --> http://example.com/~glynos/
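The two percent-encoding rules above (upper-case hexadecimal digits, and decoding of unreserved characters) can be combined in a single pass. The following is a minimal sketch, assuming the "C" locale; normalize_percent_encoding and is_unreserved_octet are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Unreserved characters according to RFC 3986, section 2.3.
bool is_unreserved_octet(unsigned char c) {
  return std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical helper: decodes triplets that encode unreserved characters
// and upper-cases the hexadecimal digits of all other triplets.
std::string normalize_percent_encoding(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool triplet = in[i] == '%' && i + 2 < in.size() &&
                         std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                         std::isxdigit(static_cast<unsigned char>(in[i + 2]));
    if (!triplet) {
      out += in[i];
      continue;
    }
    const unsigned char octet =
        static_cast<unsigned char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
    if (is_unreserved_octet(octet)) {
      out += static_cast<char>(octet);  // e.g. %7E becomes ~
    } else {
      out += '%';
      out += static_cast<char>(std::toupper(static_cast<unsigned char>(in[i + 1])));
      out += static_cast<char>(std::toupper(static_cast<unsigned char>(in[i + 2])));
    }
    i += 2;  // skip the two hexadecimal digits
  }
  return out;
}
```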
If a path refers to a directory, it should be indicated with a trailing slash.
http://example.com/glynos --> http://example.com/glynos/
But not if the path refers to a file.
http://example.com/glynos/page.html --> http://example.com/glynos/page.html
The segments ".." and "." can be removed according to the algorithm specified in RFC 3986, Section 5.2.4.
http://example.com/glynos/./proposals/../ --> http://example.com/glynos/
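The algorithm of RFC 3986, Section 5.2.4 operates on the path only. The following is a sketch of its steps A to E, written against a plain std::string rather than the proposed uri type.

```cpp
#include <cassert>
#include <string>

// Sketch of remove_dot_segments (RFC 3986, Section 5.2.4), applied to a path.
std::string remove_dot_segments(std::string in) {
  std::string out;
  auto starts_with = [&in](const std::string& p) {
    return in.compare(0, p.size(), p) == 0;
  };
  while (!in.empty()) {
    if (starts_with("../")) {                      // step A
      in.erase(0, 3);
    } else if (starts_with("./")) {                // step A
      in.erase(0, 2);
    } else if (starts_with("/./")) {               // step B: "/./x" -> "/x"
      in.erase(0, 2);
    } else if (in == "/.") {                       // step B
      in = "/";
    } else if (starts_with("/../") || in == "/..") {  // step C
      if (in == "/..") in = "/"; else in.erase(0, 3);
      const std::size_t last = out.rfind('/');     // drop the last output segment
      out.erase(last == std::string::npos ? 0 : last);
    } else if (in == "." || in == "..") {          // step D
      in.clear();
    } else {                                       // step E: move one segment
      const std::size_t next = in.find('/', 1);
      out.append(in, 0, next);
      in.erase(0, next);
    }
  }
  return out;
}
```

For the path of the example above, "/glynos/./proposals/../", the sketch returns "/glynos/".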
Some schemes may have a default port (for HTTP it is 80). The default port can be removed.
http://example.com:80/ --> http://example.com/
http://example.com:/ --> http://example.com/
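Both cases above (an explicit default port and an empty port) can be handled on the authority string. This is a minimal sketch, assuming the scheme has already been lower-cased; the table of default ports is a small illustrative subset and strip_default_port is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: removes an empty port, or a port equal to the
// scheme's default, from an authority such as "example.com:80".
std::string strip_default_port(const std::string& scheme,
                               const std::string& authority) {
  // Illustrative subset; a real implementation would cover more schemes.
  static const std::map<std::string, std::string> defaults = {
      {"http", "80"}, {"https", "443"}, {"ftp", "21"}};
  const std::size_t colon = authority.rfind(':');
  const std::size_t bracket = authority.rfind(']');
  // A colon inside an IPv6 literal ("[::1]") is not a port separator.
  if (colon == std::string::npos ||
      (bracket != std::string::npos && colon < bracket))
    return authority;
  const std::string port = authority.substr(colon + 1);
  const auto it = defaults.find(scheme);
  if (port.empty() || (it != defaults.end() && it->second == port))
    return authority.substr(0, colon);
  return authority;
}
```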
The Comparison Ladder is described in RFC 3986, Section 6.2. It explains that comparing URIs using normalization can be implemented in different ways according to the complexity of the method and the number of false negatives which may arise.
The final two steps in the Comparison Ladder require more information than can be provided within the limits of the proposal in order to be implemented comprehensively, and will not form part of the proposal at this stage.
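The effect of choosing a rung of the ladder can be sketched with two levels only. The following is an illustration on plain strings, not the proposed compare member; ladder_level, upper_case_triplets and compare_uris are hypothetical names, and the second level only upper-cases percent-encoded triplets.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical two-level subset of the comparison ladder.
enum class ladder_level { string_comparison, percent_encoding_normalization };

// Upper-cases the two characters following every '%'.
std::string upper_case_triplets(std::string s) {
  for (std::size_t i = 0; i + 2 < s.size(); ++i) {
    if (s[i] == '%') {
      s[i + 1] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i + 1])));
      s[i + 2] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i + 2])));
    }
  }
  return s;
}

// Returns -1, 0 or 1, after normalizing both sides to the requested level.
int compare_uris(const std::string& lhs, const std::string& rhs, ladder_level level) {
  const int r = level == ladder_level::string_comparison
                    ? lhs.compare(rhs)
                    : upper_case_triplets(lhs).compare(upper_case_triplets(rhs));
  return (r > 0) - (r < 0);
}
```

With plain string comparison, "http://example.com/%5b" and "http://example.com/%5B" are reported as different (a false negative); at the second level they compare equal.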
URI references are described in RFC 3986, section 4, RFC 3986, section 5 and RFC 3987, section 6.5. URI references are particularly useful when working on the server side when the base URI is always the same, and also when using URIs within the same document.
Two operations related to references are of use: acquiring the relative reference of a URI, and resolving a reference against a base URI.
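The first of these operations can be sketched for the simplest case, where the base and the target share the same scheme and authority. This is a minimal sketch on plain strings, assuming both arguments are absolute hierarchical URIs; it compares the prefixes byte for byte and does not attempt to compute "../" style references. The names end_of_authority and relativize_sketch are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns the offset one past "scheme://authority", or npos if the string
// contains no "://" separator.
std::size_t end_of_authority(const std::string& s) {
  const std::size_t sep = s.find("://");
  if (sep == std::string::npos) return std::string::npos;
  const std::size_t end = s.find_first_of("/?#", sep + 3);
  return end == std::string::npos ? s.size() : end;
}

// Hypothetical helper: if target has the same scheme and authority as base,
// returns the remainder of target; otherwise returns target unchanged.
std::string relativize_sketch(const std::string& base, const std::string& target) {
  const std::size_t b = end_of_authority(base);
  const std::size_t t = end_of_authority(target);
  if (b == std::string::npos || t == std::string::npos ||
      base.compare(0, b, target, 0, t) != 0)
    return target;
  const std::string rest = target.substr(t);
  return rest.empty() ? "/" : rest;
}
```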
#include <string> // std::basic_string
#include <system_error> // std::error_code
#include <iosfwd> // std::istream, std::ostream, std::wistream, std::wostream
#include <iterator> // std::iterator_traits
#include <optional> // std::optional - availability pending
#include <string_ref> // std::basic_string_ref - availability pending
namespace std {
namespace network {
// class declarations
class uri;
class uri_builder;
enum class uri_error {
invalid_syntax,
invalid_uri,
invalid_scheme,
invalid_user_info,
invalid_host,
invalid_port,
invalid_path,
invalid_query,
invalid_fragment,
};
enum class uri_comparison_level {
string_comparison,
case_normalization,
percent_encoding_normalization,
path_segment_normalization,
scheme_based_normalization,
protocol_based_normalization,
};
// factory functions
template <class String>
uri make_uri(const String &u, std::error_code &e) noexcept;
// swap functions
void swap(uri &lhs, uri &rhs);
// equality and comparison operators
bool operator == (const uri &lhs, const uri &rhs);
bool operator != (const uri &lhs, const uri &rhs);
bool operator < (const uri &lhs, const uri &rhs);
bool operator <= (const uri &lhs, const uri &rhs);
bool operator > (const uri &lhs, const uri &rhs);
bool operator >= (const uri &lhs, const uri &rhs);
// percent encoding and decoding
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_user_info(InputIterator first, InputIterator last, OutputIterator out);
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_host(InputIterator first, InputIterator last, OutputIterator out);
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_port(InputIterator first, InputIterator last, OutputIterator out);
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_path(InputIterator first, InputIterator last, OutputIterator out);
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_query(InputIterator first, InputIterator last, OutputIterator out);
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_fragment(InputIterator first, InputIterator last, OutputIterator out);
template <typename InputIterator, typename OutputIterator>
OutputIterator decode(InputIterator first, InputIterator last, OutputIterator out);
template <class String>
String encode_user_info(const String &user_info);
template <class String>
String encode_host(const String &host);
template <class String>
String encode_port(const String &port);
template <class String>
String encode_path(const String &path);
template <class String>
String encode_query(const String &query);
template <class String>
String encode_fragment(const String &fragment);
template <class String>
String decode(const String &source);
// stream operators
std::ostream &operator << (std::ostream &os, const uri &u);
std::wostream &operator << (std::wostream &os, const uri &u);
std::istream &operator >> (std::istream &is, uri &u);
std::wistream &operator >> (std::wistream &is, uri &u);
} // namespace network
} // namespace std
The <network/uri> header contains a declaration for a single uri class in the std::network namespace.
At this stage, the network sub-namespace should be regarded as a placeholder for whatever namespace is specified for network components during the standardization process (should such a sub-namespace be specified).
// factory functions
template <class String>
uri make_uri(const String &u, std::error_code &e) noexcept;
This factory function is provided in order to be able to construct a uri object without throwing an exception. The error code is stored in the std::error_code object, if there is a syntax error.
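The mechanism behind a non-throwing factory can be sketched with the facilities of <system_error>. Everything in the sketch namespace below is hypothetical: the stand-in make_uri returns a std::string rather than a uri, and its only check is that the input starts with a syntactically valid scheme. The enumerator is given the value 1 because std::error_code treats the value 0 as "no error".

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <system_error>
#include <type_traits>

namespace sketch {
enum class uri_error { invalid_syntax = 1 };

class uri_category_impl : public std::error_category {
 public:
  const char* name() const noexcept override { return "uri"; }
  std::string message(int ev) const override {
    return ev == static_cast<int>(uri_error::invalid_syntax) ? "invalid URI syntax"
                                                             : "unknown URI error";
  }
};

inline const std::error_category& uri_category() {
  static const uri_category_impl instance;
  return instance;
}

// Found by argument-dependent lookup when a uri_error is assigned to an error_code.
inline std::error_code make_error_code(uri_error e) {
  return std::error_code(static_cast<int>(e), uri_category());
}
}  // namespace sketch

namespace std {
template <>
struct is_error_code_enum<sketch::uri_error> : true_type {};
}  // namespace std

namespace sketch {
// Stand-in for the proposed factory: reports failure through ec.
inline std::string make_uri(const std::string& s, std::error_code& ec) {
  const std::size_t colon = s.find(':');
  bool ok = colon != std::string::npos && colon > 0 &&
            std::isalpha(static_cast<unsigned char>(s[0]));
  for (std::size_t i = 1; ok && i < colon; ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    ok = std::isalnum(c) || c == '+' || c == '-' || c == '.';
  }
  if (!ok) {
    ec = uri_error::invalid_syntax;
    return std::string();
  }
  ec.clear();
  return s;
}
}  // namespace sketch
```

A caller then tests the error code instead of catching an exception, for example `if (ec) { /* handle the syntax error */ }`.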
namespace std {
namespace network {
enum class uri_comparison_level { ... };
bool operator == (const uri &lhs, const uri &rhs);
bool operator != (const uri &lhs, const uri &rhs);
bool operator < (const uri &lhs, const uri &rhs);
bool operator <= (const uri &lhs, const uri &rhs);
bool operator > (const uri &lhs, const uri &rhs);
bool operator >= (const uri &lhs, const uri &rhs);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_user_info(InputIterator first, InputIterator last, OutputIterator out);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_host(InputIterator first, InputIterator last, OutputIterator out);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_port(InputIterator first, InputIterator last, OutputIterator out);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_path(InputIterator first, InputIterator last, OutputIterator out);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_query(InputIterator first, InputIterator last, OutputIterator out);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_fragment(InputIterator first, InputIterator last, OutputIterator out);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename InputIterator, typename OutputIterator>
OutputIterator decode(InputIterator first, InputIterator last, OutputIterator out);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <class String>
String decode(const String &source);
} // namespace network
} // namespace std
namespace std {
namespace network {
std::ostream &operator << (std::ostream &os, const uri &u);
std::wostream &operator << (std::wostream &os, const uri &u);
} // namespace network
} // namespace std
This proposal specifies output stream operators for character and wide character streams.
namespace std {
namespace network {
std::istream &operator >> (std::istream &is, uri &u);
std::wistream &operator >> (std::wistream &is, uri &u);
} // namespace network
} // namespace std
This proposal specifies input stream operators for character and wide character streams.
Below is the proposed interface for the uri class:
namespace std {
namespace network {
class uri {
public:
// typedefs
typedef ... string_type;
typedef string_type::const_iterator iterator;
typedef string_type::const_iterator const_iterator;
typedef std::iterator_traits<iterator>::value_type value_type;
typedef basic_string_ref<value_type> string_ref;
// constructors and destructor
uri();
template <typename InputIterator>
uri(const InputIterator &first, const InputIterator &last);
template <class Source>
explicit uri(const Source &source);
uri(const uri &other);
uri(uri &&other) noexcept;
~uri();
// assignment
uri &operator = (const uri &other);
uri &operator = (uri &&other);
// swap
void swap(uri &other) noexcept;
// iterators
const_iterator begin() const;
const_iterator end() const;
// accessors
std::optional<string_ref> scheme() const;
std::optional<string_ref> user_info() const;
std::optional<string_ref> host() const;
std::optional<string_ref> port() const;
std::optional<string_ref> path() const;
std::optional<string_ref> authority() const;
std::optional<string_ref> query() const;
std::optional<string_ref> fragment() const;
// query
bool empty() const noexcept;
bool absolute() const noexcept;
bool opaque() const noexcept;
// transformers
uri normalize(uri_comparison_level level) const;
uri relativize(const uri &other, uri_comparison_level level) const;
uri resolve(const uri &other, uri_comparison_level level) const;
// comparison
int compare(const uri &other, uri_comparison_level level) const;
// string accessors
string string() const;
wstring wstring() const;
u16string u16string() const;
u32string u32string() const;
};
} // namespace network
} // namespace std
The uri class itself is little more than a light-weight wrapper around a string, a parser and the uri's component parts. Parsing is performed upon construction and, if successfully parsed, the component parts are stored as iterator ranges that reference the original string. For example, consider the following URI:
http://www.example.com/path/?key=value#fragment
^   ^  ^              ^     ^^        ^^       ^
a   b  c              d     ef        gh       i
On parsing, the uri object will contain a set of range types corresponding to the ranges for scheme, user info, host, port, path, query and fragment. So the ranges corresponding to the example above will be:
URI part | Range | String |
---|---|---|
scheme | [a, b) | "http" |
user_info | nullopt | |
host | [c, d) | "www.example.com" |
port | nullopt | |
path | [d, e) | "/path/" |
query | [f, g) | "key=value" |
fragment | [h, i) | "fragment" |
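The idea of storing parts as ranges into a single string can be sketched with C++17 facilities. This is only an illustration under several assumptions: std::string_view stands in for the proposed string_ref, the class name uri_view_sketch is hypothetical, only the scheme, query and fragment are located, and the input is assumed to be an absolute URI. The sketch stores offsets rather than iterators so that copies of the object remain valid.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sketch: parts are (offset, length) pairs into one owned string.
class uri_view_sketch {
 public:
  explicit uri_view_sketch(std::string s) : text_(std::move(s)) {
    const std::size_t npos = std::string::npos;
    const std::size_t colon = text_.find(':');
    if (colon != npos) scheme_.emplace(0, colon);
    const std::size_t hash = text_.find('#');
    if (hash != npos) fragment_.emplace(hash + 1, text_.size() - hash - 1);
    const std::size_t qm = text_.find('?');
    if (qm != npos && (hash == npos || qm < hash)) {
      const std::size_t end = hash == npos ? text_.size() : hash;
      query_.emplace(qm + 1, end - qm - 1);  // starts after the '?'
    }
  }

  std::optional<std::string_view> scheme() const { return view(scheme_); }
  std::optional<std::string_view> query() const { return view(query_); }
  std::optional<std::string_view> fragment() const { return view(fragment_); }

 private:
  using range = std::pair<std::size_t, std::size_t>;

  // An absent part maps to nullopt; a present part is a view into text_.
  std::optional<std::string_view> view(const std::optional<range>& r) const {
    if (!r) return std::nullopt;
    return std::string_view(text_).substr(r->first, r->second);
  }

  std::string text_;
  std::optional<range> scheme_, query_, fragment_;
};
```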
Template parameters named InputIterator are required to meet the requirements for a C++ standard library RandomAccessIterator compliant iterator. The iterator's value type is required to be char, wchar_t, char16_t, or char32_t.
Template parameters named Source are required to be one of:
A container with a value type of char, wchar_t, char16_t, or char32_t.
An iterator for a null terminated byte-string. The value type is required to be char, wchar_t, char16_t, or char32_t.
A C-array. The value type is required to be char, wchar_t, char16_t, or char32_t.
This is identical wording to that found in the filesystem proposal (N3365).
typedef ... string_type;
typedef string_type::const_iterator iterator;
typedef string_type::const_iterator const_iterator;
typedef std::iterator_traits<iterator>::value_type value_type;
The string_type is left unspecified in this proposal and is intended to be implementation-defined.
uri();
template <typename InputIterator>
uri(const InputIterator &first, const InputIterator &last);
template <class Source>
explicit uri(const Source &source);
uri &operator = (const uri &other);
uri &operator = (uri &&other);
const_iterator begin() const;
const_iterator end() const;
std::optional<string_ref> scheme() const;
std::optional<string_ref> user_info() const;
std::optional<string_ref> host() const;
std::optional<string_ref> port() const;
std::optional<string_ref> path() const;
std::optional<string_ref> authority() const;
std::optional<string_ref> query() const;
std::optional<string_ref> fragment() const;
bool empty() const noexcept;
bool absolute() const noexcept;
bool opaque() const noexcept;
This proposal specifies three transformer functions: normalize, relativize and resolve.
uri normalize(uri_comparison_level level) const;
uri relativize(const uri &other, uri_comparison_level level) const;
std::network::uri base_uri("http://www.example.com/");
std::network::uri uri("http://www.example.com/glynos/?key=value#fragment");
std::network::uri rel_uri(base_uri.relativize(uri, std::network::uri_comparison_level::string_comparison));
assert(rel_uri.string() == "/glynos/?key=value#fragment");
uri resolve(const uri &other, uri_comparison_level level) const;
int compare(const uri &other, uri_comparison_level level) const;
string string() const;
wstring wstring() const;
u16string u16string() const;
u32string u32string() const;
The proposed uri_builder class is provided in order to construct uri objects more safely and more productively.
namespace std {
namespace network {
class uri_builder {
public:
uri_builder();
explicit uri_builder(const uri &base);
template <typename Source>
explicit uri_builder(const Source &base);
uri_builder(const uri_builder &) = delete;
uri_builder &operator = (const uri_builder &) = delete;
~uri_builder();
template <typename Source>
uri_builder &scheme(const Source &scheme);
template <typename Source>
uri_builder &user_info(const Source &user_info);
template <typename Source>
uri_builder &host(const Source &host);
template <typename Source>
uri_builder &port(const Source &port);
template <typename Source>
uri_builder &authority(const Source &authority);
template <typename Source>
uri_builder &authority(const Source &user_info, const Source &host, const Source &port);
template <typename Source>
uri_builder &path(const Source &path);
template <typename Source>
uri_builder &append_path(const Source &path);
template <typename Source>
uri_builder &query(const Source &query);
template <class Key, class Param>
uri_builder &query(const Key &key, const Param &param);
template <typename Source>
uri_builder &fragment(const Source &fragment);
std::network::uri uri() const;
};
} // namespace network
} // namespace std
The builder methods are templates. This can allow the implementation to provide specializations depending on the argument type in order to ensure that the resultant URI remains valid and consistent. This could mean performing encoding transformations or percent encoding on input strings where appropriate, and could allow, for example, the port to be provided as an integral type. More detailed examples are provided with the API description of each method below.
std::network::uri_builder builder;
builder.scheme("http")
.host("example.com")
.path("/glynos/")
.query("key", "value");
assert(builder.uri().string() == "http://example.com/glynos/?key=value");
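The chaining style used above relies on each setter returning a reference to the builder. The following is a minimal sketch of that pattern on plain strings; uri_builder_sketch and its str() member are hypothetical, only a few parts are supported, and no percent encoding or validation is performed. The port setter takes an integer to illustrate the remark about integral types.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of a chainable builder; not the proposed uri_builder.
class uri_builder_sketch {
 public:
  uri_builder_sketch& scheme(const std::string& s) { scheme_ = s; return *this; }
  uri_builder_sketch& host(const std::string& h) { host_ = h; return *this; }
  uri_builder_sketch& port(unsigned p) { port_ = std::to_string(p); return *this; }
  uri_builder_sketch& path(const std::string& p) { path_ = p; return *this; }

  // Appends "key=value", separating successive pairs with '&'.
  uri_builder_sketch& query(const std::string& key, const std::string& value) {
    if (!query_.empty()) query_ += '&';
    query_ += key + "=" + value;
    return *this;
  }

  // Assembles the parts in the order required by the generic syntax.
  std::string str() const {
    std::string out = scheme_ + "://" + host_;
    if (!port_.empty()) out += ":" + port_;
    out += path_;
    if (!query_.empty()) out += "?" + query_;
    return out;
  }

 private:
  std::string scheme_, host_, port_, path_, query_;
};
```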
namespace std {
namespace network {
uri_builder::uri_builder();
uri_builder::uri_builder(const uri &base);
template <typename Source>
uri_builder::uri_builder(const Source &base);
} // namespace network
} // namespace std
Constructs a uri_builder object from a base URI, where supplied.
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::scheme(const Source &scheme);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::user_info(const Source &user_info);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::host(const Source &host);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::port(const Source &port);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::authority(const Source &authority);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::authority(const Source &user_info, const Source &host, const Source &port);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::path(const Source &path);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::append_path(const Source &path);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::query(const Source &query);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <class Key, class Param>
uri_builder &uri_builder::query(const Key &key, const Param &param);
} // namespace network
} // namespace std
namespace std {
namespace network {
template <typename Source>
uri_builder &uri_builder::fragment(const Source &fragment);
} // namespace network
} // namespace std
namespace std {
namespace network {
uri uri_builder::uri() const;
} // namespace network
} // namespace std
The following is a list of issues that have not yet been addressed as part of this proposal.
The most important open issue for this proposal is how to deal with text encoding in a portable way. The main difficulty is that different platforms, libraries and applications use different encodings, Unicode or otherwise. Interoperability is therefore extremely hard.
With the advent of char16_t and char32_t, and of std::u16string and std::u32string, this problem has been acknowledged and encoding can be done more explicitly, but issues with interoperability have not been addressed.
This proposal currently partially resolves interoperability issues by using templates for functions and member functions that take string arguments, for example the uri constructor:
namespace std {
namespace network {
template <class Source>
uri::uri(const Source &source);
} // namespace network
} // namespace std
Internally, the uri constructor can handle strings in different encodings by using template specialization to perform the correct transformation depending on the source type.
This can help simplify the library interface but is not completely satisfactory. For example, template specialization is limited to only those types known to the standard - std::basic_string and its variants, plus character arrays. A proposal exists to add a string_ref type to the standard (N3334), which could provide better flexibility and performance. This would replace part_range in the current proposal.
This approach is limited when returning strings from functions. e.g.:
namespace std {
namespace network {
template <class Result>
Result scheme(const uri &uri_);
} // namespace network
} // namespace std
...
std::network::uri uri("http://example.com/");
auto scheme = std::network::scheme<std::u32string>(uri);
The above excerpt cannot work with character arrays, character pointers or string_ref, since memory allocation is required.
Secondly, RFC 3987 is not well supported, and Unicode text is often converted using percent encoding:
std::network::uri uri(
"http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8");
The encode and decode functions should take this into account.
Thirdly, the iterators returned by begin and end are not portable with the proposal in its current form. This proposal allows different internal character types, but the iterator types are completely unaware of the encoding. So the following examples are not portable:
std::network::uri uri(
"http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8");
std::u32string u32;
std::copy(std::begin(uri), std::end(uri), std::back_inserter(u32));
std::network::uri uri(
"http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8");
auto path_range = uri.path();
std::u32string path;
std::copy(std::begin(*path_range), std::end(*path_range), std::back_inserter(path));
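The underlying problem can be shown without the proposed class: an element-wise copy treats each UTF-8 code unit as if it were a code point. The following is a minimal sketch contrasting such a copy with a decoder; both function names are hypothetical, and the decoder assumes well-formed UTF-8 and performs no validation.

```cpp
#include <cassert>
#include <string>

// Copies each UTF-8 code unit into its own char32_t element, which is what an
// encoding-unaware element-wise copy produces (via unsigned char here, to
// avoid sign extension on platforms where char is signed).
std::u32string widen_code_units(const std::string& utf8) {
  std::u32string out;
  for (unsigned char c : utf8) out += static_cast<char32_t>(c);
  return out;
}

// Decodes well-formed UTF-8 into code points.
std::u32string decode_utf8(const std::string& utf8) {
  std::u32string out;
  for (std::size_t i = 0; i < utf8.size();) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    // Number of continuation bytes implied by the lead byte.
    const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    char32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
    for (int k = 1; k <= extra && i + k < utf8.size(); ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    out += cp;
    i += static_cast<std::size_t>(extra) + 1;
  }
  return out;
}
```

For the three-byte sequence E3 83 A1, the element-wise copy yields three elements, whereas the decoder yields the single code point U+30E1.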
This proposal as it stands needs better library or language support in order to offer better portability. One promising solution is given in another standard proposal, N3336.
At this stage it is important to note that there are many compatibility issues with URIs. If this proposal were to specify full standard compliance, it may not work in practice. For example, the uri parser implementation must correctly deal with escape control characters. Furthermore, no RFC fully specifies how to parse Windows file-system paths (e.g. file:///C:\path\to\a\file.txt). Finally, it is the belief of the author of this proposal that its success depends strongly on the correct portable behavior of the parser implementation. Without wishing to run the risk of over-specifying the parser, leaving it as "implementation defined" will not be sufficient. There will be more issues that may need to be specified in further detail in future revisions of this proposal.
This proposal includes internationalized URIs (RFC 3987). Since this RFC is not widely supported, and as many applications deal with Unicode characters through percent encoding, the parser implementation could be simplified if appropriate by removing RFC 3987 from this proposal.
There are several potential sources of errors:
As explained in the previous section on normalization, there is more than one way to test if two URIs are equivalent depending on different accuracy, performance and complexity trade-offs.
Providing a single operator == is insufficient to allow developers to choose different ways of testing URI equivalence, although choosing a good default comparison will be sufficient for the vast majority of cases. Future revisions of this proposal could provide functionality to allow a library to use different parts of the Comparison Ladder.
The interface can be extended to allow more flexibility for accessing parts of the URI. For example, an accessor could be provided which converts all query elements into a map:
namespace std {
namespace network {
template <class QueryMap>
QueryMap query(const uri &u);
} // namespace network
} // namespace std
...
typedef std::map<std::string, std::string> QueryMap;
QueryMap query = std::network::query<QueryMap>(uri);
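Such an accessor could be built on a small parsing routine. The following is a minimal sketch, assuming the common "key=value" convention with '&' separators (a convention of web forms, not something RFC 3986 mandates) and performing no percent decoding; parse_query is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: splits "k1=v1&k2=v2" into a map. A key without '='
// maps to an empty value; a repeated key keeps its last value.
std::map<std::string, std::string> parse_query(const std::string& query) {
  std::map<std::string, std::string> result;
  std::size_t pos = 0;
  while (pos < query.size()) {
    std::size_t amp = query.find('&', pos);
    if (amp == std::string::npos) amp = query.size();
    const std::string pair = query.substr(pos, amp - pos);
    if (!pair.empty()) {
      const std::size_t eq = pair.find('=');
      if (eq == std::string::npos)
        result[pair] = "";
      else
        result[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    pos = amp + 1;  // continue after the separator
  }
  return result;
}
```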
Additionally, this proposal can be extended to include factory functions for common operations, such as constructing a URI from a filesystem path:
namespace std {
namespace network {
uri uri::from_path(const filesystem::path &path);
} // namespace network
} // namespace std
...
std::filesystem::path path("/usr/bin/c++");
auto uri_path = std::network::uri::from_path(path);
assert(uri_path.string() == "file:///usr/bin/c++");
Furthermore, there may be new proposals for types that represent IP addresses. If such types can be accepted into the standard, it would be possible to accommodate them in future revisions of this proposal:
namespace std {
namespace network {
uri_builder &uri_builder::host(const std::network::address_ipv4 &host);
uri_builder &uri_builder::host(const std::network::address_ipv6 &host);
} // namespace network
} // namespace std
There exist several C++ implementations of URIs, including (but not limited to):
The cpp-netlib uri is the library that has motivated this proposal. It implements a generic URI, support for percent encoding, normalization and URI resolution. It also currently partially implements a URI builder.
QUrl supports much of the same functionality. It is fully internationalized, and supports URI resolution and implicit normalization. The part accessors differ from cpp-netlib, in that they return copies of the underlying parts, not references.
google-url is another existing library which is fast and robust. Its parsing strategy is such that it never fails (there is no validation), but always tries to find the best fit for the URL components. It supports normalization (but has called it canonicalization).
All of the implementations listed above do effectively the same thing - they parse a string and split it into components, either by storing separate copies of each part or by storing references to sub-strings in the original uri string.
In addition to RFC 3986 and RFC 3987, the WhatWG URL Standard is a proposal for a standard that sets out to make URLs more interoperable.
The most important features of this proposed standard seem to be:
It does not explicitly describe operations such as normalization, comparison or resolution.
As this proposal allows the URI parsing to be implementation defined, there is no reason why an implementor could not decide to use this standard. Apart from a few naming differences, the proposed uri interface maps closely to the WhatWG URL JavaScript API. If there is a strong interest in doing so, a future revision of this proposal could rename its class and member functions.
C++ Network Library users and mailing list
Kyle Kloepper and Niklas Gustafsson for providing valuable feedback and encouragement, and for presenting different versions of this proposal at committee meetings.
Beman Dawes and his Filesystem proposal, which strongly influenced the class design.
Wikipedia, for being there.