A URI Library for C++

Document Number: N3625
Date: 2013-04-30
Authors: Glyn Matthews <glyn.matthews@gmail.com>, Dean Michael Berris <dberris@google.com>

Motivation and Scope

Given the increased importance of being able to develop portable, scalable network-aware applications, C++ developers are at a disadvantage in having no standard network implementation to use. One of the fundamental components of any network library is a URI, and this proposal is motivated by the desire to introduce a portable, efficient and internationalized implementation of a URI to C++ standard library users.

This proposal is based on original development done in the cpp-netlib project http://cpp-netlib.org/. This implementation is released using the Boost software license and will track the proposed library as it evolves. A standalone project can be found at https://github.com/cpp-netlib/uri/.

The scope of this proposal will include a single uri type, some specifications about how URIs are intended to be processed and extended, including some additional helper types and functions. It will include a type and functions to build a URI from its components. Finally, it will include types and functions for percent encoding, URI references and URI normalization.

Example Usage

#include <experimental/uri>
#include <cassert>

int main() {
    std::experimental::uri uri("http://www.example.com/glynos/?key=value#frag");
    assert(uri.is_absolute());
    assert(!uri.is_opaque());
    assert(*uri.scheme() == "http");
    assert(*uri.host() == "www.example.com");
    assert(*uri.path() == "/glynos/");
    assert(*uri.query() == "key=value");
    assert(*uri.fragment() == "frag");
    return 0;
}

The code excerpt above shows a simple example of how the proposed uri class will work. The URI string is parsed during object construction and broken down into its component parts. HTTP URIs are absolute and hierarchical (i.e. not opaque).

#include <experimental/uri>
#include <cassert>

int main() {
    std::experimental::uri uri(U"xmpp:example-node@example.com?message;subject=Hello%20World");
    assert(uri.is_absolute());
    assert(uri.is_opaque());
    assert(*uri.scheme() == "xmpp");
    assert(*uri.path() == "example-node@example.com");
    assert(*uri.query() == "message;subject=Hello%20World");
    return 0;
}

The uri in this proposal supports encoded strings and encoding conversion. The example above shows a uri object constructed from a std::u32string, with the parts accessed as std::string_view objects.

Impact on the Standard

This proposal is a library extension that adds a header, two classes and some functions. It depends on the acceptance of other proposals (std::optional, std::string_view) for the API.

Revision History

Changes since Bristol

Namespace

The uri and uri_builder and all related free functions and operators now belong in the std::experimental namespace.

Input/Output Stream Operators

The input and output stream operators are now template operators and use std::basic_istream and std::basic_ostream.

Assignment

The assignment operators no longer need to parse the URI string.

Percent Encoding and Decoding

All functions associated with percent encoding and decoding are now static members of the std::experimental::uri class instead of free functions. They also throw exceptions on encoding failures.

Query Accessors

absolute and opaque become is_absolute and is_opaque. This is a reversal of a previous revision (see N3484) and brings the uri proposal into line with the file system path (N3505).

Transformers

Added noexcept overloads that take an additional error_code & argument. This applies to normalize, make_reference and resolve.

Allocators

Added overloads to the uri constructors and to std::experimental::make_uri to better support allocators.

Changes since N3507

Contact details

The authors’ contact details have been updated.

std::string_view rename

The API has been updated to use the name std::string_view instead of std::string_ref.

Namespace

The proposed uri class is now in the std namespace, after feedback from the BSI C++ Panel. As a consequence the names of the encode_* and decode free functions have been updated to encode_uri_* and decode_uri.

relativize

uri::relativize has been renamed uri::make_reference.

Sections

A section describing the impact of the proposal on the standard has been added and the issues section was removed.

Changes since N3484

Query Accessor Error

The document has been fixed so that query part now starts after the question-mark separator.

Return value of parts accessors

All parts accessors (scheme, user_info, host, port, path, query and fragment) now all return std::optional<std::string_view>.

Comparison Level

normalize, relativize and resolve all accept as an argument the uri_comparison_level.

Exception

The uri(InputIterator, InputIterator) and uri(Source) constructors now can throw a std::system_error exception instead of uri_syntax_error. This makes the constructor exception consistent with the error code set by make_uri.

Comparison and Equality Operator

compare now returns an int instead of a bool. The return value of compare will be -1 if this (normalized) uri is lexicographically less than the other, 0 if they are equal and 1 if this is greater than the other (see comparison ladder).

Updated Appendix

The items in the appendix describing the revision history have moved to this section. The appendix now contains a response to questions about existing practice and to the WhatWG URL Standard.

Changes since N3420

Underlying String Type

The updated uri class no longer specifies the underlying string type directly; it need not even be a std::basic_string. The std::uri::value_type is now defined as the value of the string iterator. Consequently, the uri class will no longer assume that the underlying string is in a contiguous block of memory, so the c_str() has been removed.

Exception

The uri(InputIterator, InputIterator) and uri(Source) constructors now can throw a uri_syntax_error exception. The is_valid member has been removed as it is now redundant.

Syntactic and Semantic Validity

This proposal in its current form only specifies tests for syntactic validity. To make this more explicit, the test for validity (now an exception) is named uri_syntax_error. This decision will influence others, especially related to normalization and equivalence. Future versions of this proposal may address the question of semantic validity. It would boil down to how complex the implementation of a generic URI class needs to be, and if it can be made extensible in the right ways to test scheme-specific semantics.

Accessor Name Consistency

The accessors is_absolute and is_opaque have been renamed absolute and opaque in order to be more consistent.

normalize as a Free Function

The functions normalize, relativize and resolve have been removed and replaced with member functions in the class.

Comparison and Equality Operator

A function compare is provided which takes as a 3rd argument a uri_comparison_level enum. This can be used to determine the level of the comparison ladder to use when comparing URIs. A more complete description of what comparison and equivalence means can be discussed and elaborated in a future proposal.

The question of semantics of relational operators will also be addressed.

encode and decode

The functions pct_encode and pct_decode have been renamed encode and decode. A few more functions have been added to take into account the different types of encoding for different URI parts.

uri_builder

The changes required to the uri::builder have been taken into account, including renaming it uri_builder and providing a method uri() that returns a new uri object.

Issues

Bikeshedding

As seems to be inevitable when developing a proposal for a standards committee, there are some differences of opinion regarding the names of certain functions. I will attempt to justify some of them now:

std::experimental::uri::resolve

There has been some concern about the ambiguity with this term and with name resolution. Nevertheless, RFC 3986, Section 5 calls this process ‘Reference Resolution’ and existing URI classes in other languages and frameworks also use this term (e.g. Java and Qt).

Query accessors

The names of the query accessors have changed at least twice (once to remove the is_ prefix and again to restore it). The current revision of the proposal is at least consistent with the std::filesystem::path interface.

Extension points

The current proposal does not provide any way to allow a library implementation or user to extend the API. This would be important to allow scheme or protocol specific implementations of normalize and compare. A future revision should address this.

Allocator support

A future revision of this proposal should acknowledge polymorphic allocators (N3525).

Generic Syntax

The generic syntax of a URI is defined in RFC 3986.

All URIs are of the form:

scheme ":" "hierarchical part" [ "?" query ] [ "#" fragment ]

The scheme is used to identify the specification needed to parse the rest of the URI. A generic syntax parser can parse any URI into its main parts. The scheme can then be used to identify whether further scheme-specific parsing can be performed.

The hierarchical part refers to the part of the URI that holds identification information that is hierarchical in nature. This may contain an authority (always prefixed with a double slash "//") and/or a path. The path part is required, though it may be empty. The authority part holds an optional user info part, ending with an at sign "@"; a host identifier and an optional port number, preceded by a colon ":". The host may be an IP address or domain name. RFC 3986 does not specify the format for IPv6 addresses, though RFC 2732 does.

The query is an optional part following a question mark "?" that contains information that is not hierarchical.

Finally, the fragment is an optional part, prefixed by a hash symbol "#" that is used to identify secondary sources.
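As a rough illustration of the generic syntax described above, the following sketch splits a URI string into its five main parts using the regular expression given in RFC 3986, Appendix B. The names uri_parts and split_uri are hypothetical and are not part of the proposed interface. Unlike the proposed uri class, the sketch copies each part into its own std::string and performs no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical holder for the five generic parts; not part of the proposal.
struct uri_parts {
    std::optional<std::string> scheme;
    std::optional<std::string> authority;
    std::string path;  // always present, though it may be empty
    std::optional<std::string> query;
    std::optional<std::string> fragment;
};

// Splits text with the regular expression from RFC 3986, Appendix B.
// The expression only separates the parts; it does not check that each
// part is well formed.
inline std::optional<uri_parts> split_uri(const std::string &text) {
    static const std::regex pattern(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) {
        return std::nullopt;
    }
    // A group that did not take part in the match means the part is absent,
    // which is different from a part that is present but empty.
    const auto group = [&m](std::size_t i) -> std::optional<std::string> {
        if (!m[i].matched) {
            return std::nullopt;
        }
        return m[i].str();
    };
    uri_parts parts;
    parts.scheme = group(2);
    parts.authority = group(4);
    parts.path = m[5].str();
    parts.query = group(7);
    parts.fragment = group(9);
    return parts;
}
```

For an opaque URI such as xmpp:example-node@example.com, the authority group does not match, so the whole scheme-specific part ends up in path.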

RFC 3987 specifies a new protocol element, the Internationalized Resource Identifier (IRI). The IRI complements a URI, and extends it to allow Unicode characters.

This proposal will define a uri type that will attempt to encompass all three of these RFCs.

Percent Encoding

URI percent encoding is described in RFC 3986, section 2.1 and RFC 3986, section 2.4.

Percent encoding is the mechanism used to encode reserved characters in a URI. According to RFC 3986, section 2.2, the set of reserved characters is:

Set of reserved characters and percent encoded strings
! # $ & ' ( ) * + , / : ; = ? @ [ ]
%21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D

Percent encoding is not limited to reserved characters. Any character data may be percent encoded:

Common characters and percent encoded strings
newline space " % - . < > \ ^ _ ` { | } ~
%0A %20 %22 %25 %2D %2E %3C %3E %5C %5E %5F %60 %7B %7C %7D %7E
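The mapping in the tables above can be sketched as a pair of functions. This is a minimal illustration, assuming byte-oriented input and a single rule for all URI parts: every byte outside the RFC 3986 unreserved set (letters, digits, "-", ".", "_" and "~") is encoded. The names percent_encode and percent_decode are hypothetical; the proposed encode_* functions are iterator based and apply part-specific rules.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// True for the RFC 3986 unreserved characters (ASCII only).
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' ||
           c == '~';
}

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Encodes every byte that is not unreserved as an upper-case triplet.
inline std::string percent_encode(std::string_view input) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : input) {
        if (is_unreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Decodes every triplet; returns nullopt if a '%' is not followed by two
// hexadecimal digits.
inline std::optional<std::string> percent_decode(std::string_view input) {
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] != '%') {
            out += input[i];
            continue;
        }
        if (i + 2 >= input.size()) {
            return std::nullopt;  // truncated triplet
        }
        const int hi = hex_value(input[i + 1]);
        const int lo = hex_value(input[i + 2]);
        if (hi < 0 || lo < 0) {
            return std::nullopt;  // not hexadecimal
        }
        out += static_cast<char>(hi * 16 + lo);
        i += 2;
    }
    return out;
}
```

The sketch reports a malformed triplet with an empty optional, whereas the proposal reports encoding failures with a uri_encoding_error exception.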

URI Normalization and Comparison

URI normalization is described in RFC 3986, Section 6 and in RFC 3987, Section 5. Normalization is the process by which a URI is transformed in order to determine if two URIs are equivalent.

Some types of normalization preserve semantics and others may not. Normalization may also depend on the scheme.

Converting the Scheme / Host to Lower Case

The scheme and host are case-insensitive. The proposed normalization solution will convert these to lowercase.

HTTP://Example.com/ --> http://example.com/

The user info, path, query and fragment are case-sensitive and so must not be converted.
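The following sketch shows one way this step could be applied to a URI held as a plain string. The function name lowercase_scheme_and_host is hypothetical, and the sketch assumes ASCII input and a syntactically valid URI. The user info, path, query and fragment are left untouched, as are percent-encoded triplets inside the host.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Lower-cases the scheme and, if an authority is present, the host.
inline std::string lowercase_scheme_and_host(std::string uri) {
    const std::size_t colon = uri.find(':');
    const std::size_t delim = uri.find_first_of("/?#");
    if (colon == std::string::npos || delim < colon) {
        return uri;  // no scheme: a relative reference is left as it is
    }
    for (std::size_t i = 0; i < colon; ++i) {
        uri[i] = ascii_lower(uri[i]);
    }

    // An authority is only present if "//" follows the scheme separator.
    if (uri.compare(colon + 1, 2, "//") != 0) {
        return uri;
    }
    std::size_t first = colon + 3;
    std::size_t last = uri.find_first_of("/?#", first);
    if (last == std::string::npos) {
        last = uri.size();
    }
    // Skip the user info, which is case-sensitive.
    const std::size_t at = uri.find('@', first);
    if (at != std::string::npos && at < last) {
        first = at + 1;
    }
    // The range [first, last) now holds the host and optional port.
    for (std::size_t i = first; i < last; ++i) {
        if (uri[i] == '%') {
            i += 2;  // leave percent-encoded triplets alone
            continue;
        }
        uri[i] = ascii_lower(uri[i]);
    }
    return uri;
}
```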

Capitalizing Characters in Escape Sequences

Characters in a percent-encoded triplet are case-insensitive. The proposed normalization solution will convert these to uppercase.

http://example.com/%5b%5d --> http://example.com/%5B%5D

Decoding Unreserved Characters

Unreserved characters that have been encoded will be decoded.

http://example.com/%7Eglynos/ --> http://example.com/~glynos/
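The two triplet-related steps above can be sketched as string transformations. The function names uppercase_triplets and decode_unreserved are hypothetical, and the sketch assumes ASCII input. Triplets that encode reserved characters, such as %2F, are deliberately left encoded because decoding them could change the meaning of the URI.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

inline char ascii_upper(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// True for the RFC 3986 unreserved characters (ASCII only).
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' ||
           c == '~';
}

// Case normalization: upper-cases the hex digits of each valid triplet.
inline std::string uppercase_triplets(std::string_view input) {
    std::string out(input);
    for (std::size_t i = 0; i + 2 < out.size(); ++i) {
        if (out[i] != '%') continue;
        if (hex_value(out[i + 1]) < 0 || hex_value(out[i + 2]) < 0) continue;
        out[i + 1] = ascii_upper(out[i + 1]);
        out[i + 2] = ascii_upper(out[i + 2]);
        i += 2;
    }
    return out;
}

// Percent-encoding normalization: decodes only those triplets that encode
// an unreserved character; everything else is copied unchanged.
inline std::string decode_unreserved(std::string_view input) {
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '%' && i + 2 < input.size()) {
            const int hi = hex_value(input[i + 1]);
            const int lo = hex_value(input[i + 2]);
            if (hi >= 0 && lo >= 0) {
                const unsigned char c =
                    static_cast<unsigned char>(hi * 16 + lo);
                if (is_unreserved(c)) {
                    out += static_cast<char>(c);
                    i += 2;
                    continue;
                }
            }
        }
        out += input[i];
    }
    return out;
}
```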

Adding Trailing /

If a path refers to a directory, it should be indicated with a trailing slash.

http://example.com/glynos --> http://example.com/glynos/

But not if the path refers to a file.

http://example.com/glynos/page.html --> http://example.com/glynos/page.html

Removing dot-segments from the Path

The segments ".." and "." can be removed according to the algorithm specified in RFC 3986, Section 5.2.4.

http://example.com/glynos/./proposals/../ --> http://example.com/glynos/
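The algorithm in RFC 3986, Section 5.2.4 works on an input buffer and an output buffer, and its five rules (labelled A to E in the RFC) can be sketched as follows. The function operates on the path only, and the name remove_dot_segments is borrowed from the RFC rather than from the proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of the remove_dot_segments routine from RFC 3986, Section 5.2.4.
inline std::string remove_dot_segments(std::string_view in) {
    std::string out;
    const auto starts_with = [&in](std::string_view prefix) {
        return in.substr(0, prefix.size()) == prefix;
    };
    // Removes the last segment of the output and its preceding "/", if any.
    const auto drop_last_segment = [&out] {
        const std::size_t slash = out.rfind('/');
        out.erase(slash == std::string::npos ? 0 : slash);
    };

    while (!in.empty()) {
        if (starts_with("../")) {            // rule A
            in.remove_prefix(3);
        } else if (starts_with("./")) {      // rule A
            in.remove_prefix(2);
        } else if (starts_with("/./")) {     // rule B: "/./" becomes "/"
            in.remove_prefix(2);
        } else if (in == "/.") {             // rule B
            in = "/";
        } else if (starts_with("/../")) {    // rule C: "/../" becomes "/"
            in.remove_prefix(3);
            drop_last_segment();
        } else if (in == "/..") {            // rule C
            in = "/";
            drop_last_segment();
        } else if (in == "." || in == "..") {  // rule D
            in.remove_prefix(in.size());
        } else {                             // rule E: move one segment over
            const std::size_t next = in.find('/', 1);
            const std::size_t len =
                next == std::string_view::npos ? in.size() : next;
            out.append(in.data(), len);
            in.remove_prefix(len);
        }
    }
    return out;
}
```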

Removing the default port

Some schemes may have a default port (for HTTP it is 80). The default port can be removed.

http://example.com:80/ --> http://example.com/
http://example.com:/ --> http://example.com/
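A minimal sketch of this step is shown below. Both function names are hypothetical, the scheme is assumed to have been converted to lowercase already, and the table of default ports is deliberately small; a real implementation would need a fuller and probably extensible registry.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Illustrative table of well-known default ports.
inline std::optional<std::string_view> default_port(std::string_view scheme) {
    if (scheme == "http") return std::string_view("80");
    if (scheme == "https") return std::string_view("443");
    if (scheme == "ftp") return std::string_view("21");
    return std::nullopt;
}

// Returns the port to keep after normalization, or nullopt if the port is
// absent, empty, or equal to the default for the scheme.
inline std::optional<std::string_view> normalize_port(
    std::string_view scheme, std::optional<std::string_view> port) {
    if (!port || port->empty()) {
        return std::nullopt;  // covers "http://example.com:/"
    }
    if (port == default_port(scheme)) {
        return std::nullopt;  // covers "http://example.com:80/"
    }
    return port;
}
```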

The Comparison Ladder

The Comparison Ladder is described in RFC 3986, Section 6.2. It explains that comparing URIs using normalization can be implemented in different ways according to the complexity of the method and the number of false negatives which may arise.

String comparison: The simplest and fastest method is to simply test the URI strings byte-for-byte.
Case normalization: The first step to reduce false negatives is to normalize the parts that are case-insensitive - the scheme and the host and any percent-encoded triplets.
Percent encoding normalization: Next, any percent-encoded triplets that correspond to unreserved characters can be decoded.
Path segment normalization: Any dot-segments can be removed from the path.
Scheme based normalization: Trailing slashes can be added and default ports can be removed. Additionally for HTTP, key/value pairs in the query can appear in any order.
Protocol based normalization: Finally, URI equivalence can be tested by testing the resources directly, e.g. using HTTP to see if one URI redirects to another.

The final two steps in the Comparison Ladder require more information than can be provided within the limits of the proposal in order to be implemented comprehensively, and will not form part of the proposal at this stage.

URI References

URI references are described in RFC 3986, section 4, RFC 3986, section 5 and RFC 3987, section 6.5. URI references are particularly useful when working on the server side when the base URI is always the same, and also when using URIs within the same document.

Two operations related to references are of use: acquiring the relative reference of a URI, and resolving a reference against a base URI.
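The first of these operations can be sketched for the simple case where both URIs are hierarchical and share a scheme and authority. The names authority_end and make_reference_sketch are hypothetical. The comparison is byte-for-byte, which corresponds to the string comparison level, and the sketch falls back to returning the target unchanged whenever it cannot produce a reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns the offset just past the authority of a hierarchical URI, or
// npos if uri does not have the form scheme "://" authority ...
inline std::size_t authority_end(std::string_view uri) {
    const std::size_t sep = uri.find("://");
    // The first "/", "?" or "#" must be the slash right after the colon;
    // otherwise the "://" belongs to a path, query or fragment.
    if (sep == std::string_view::npos ||
        uri.find_first_of("/?#") != sep + 1) {
        return std::string_view::npos;
    }
    const std::size_t end = uri.find_first_of("/?#", sep + 3);
    return end == std::string_view::npos ? uri.size() : end;
}

// If target has the same scheme and authority as base, returns the rest of
// target (path, query and fragment) as a relative reference.
inline std::string make_reference_sketch(std::string_view base,
                                         std::string_view target) {
    const std::size_t b = authority_end(base);
    const std::size_t t = authority_end(target);
    if (b == std::string_view::npos || t == std::string_view::npos ||
        base.substr(0, b) != target.substr(0, t)) {
        return std::string(target);  // nothing in common: keep it absolute
    }
    return std::string(target.substr(t));
}
```

Resolving a reference against a base is the inverse operation and is specified in full in RFC 3986, Section 5.2; it is not attempted in this sketch.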

Header <experimental/uri> Synopsis

#include <string>        // std::basic_string
#include <system_error>  // std::error_code
#include <iosfwd>        // std::basic_istream, std::basic_ostream
#include <iterator>      // std::iterator_traits
#include <memory>        // std::allocator
#include <optional>      // std::optional

namespace std {
namespace experimental {
// class declarations
class uri;
class uri_builder;
class uri_syntax_error;
class uri_builder_error;
class uri_encoding_error;

enum class uri_error {
 invalid_syntax,
 invalid_uri,
 invalid_scheme,
 invalid_user_info,
 invalid_host,
 invalid_port,
 invalid_path,
 invalid_query,
 invalid_fragment,
};

enum class uri_comparison_level {
 string_comparison,
 case_normalization,
 percent_encoding_normalization,
 path_segment_normalization,
 scheme_based_normalization,
 protocol_based_normalization,
};

// factory functions
template <class Source>
uri make_uri(const Source &source, std::error_code &e) noexcept;
template <class InputIter>
uri make_uri(InputIter first, InputIter last, std::error_code &e) noexcept;
template <class Source, class Alloc = std::allocator<uri::value_type>>
uri make_uri(const Source &source, const Alloc &alloc, std::error_code &e) noexcept;
template <class InputIter, class Alloc = std::allocator<uri::value_type>>
uri make_uri(InputIter first, InputIter last, const Alloc &alloc, std::error_code &e) noexcept;

// swap functions
void swap(uri &lhs, uri &rhs);

// equality and comparison operators
bool operator == (const uri &lhs, const uri &rhs);
bool operator != (const uri &lhs, const uri &rhs);
bool operator <  (const uri &lhs, const uri &rhs);
bool operator <= (const uri &lhs, const uri &rhs);
bool operator >  (const uri &lhs, const uri &rhs);
bool operator >= (const uri &lhs, const uri &rhs);

// stream operators
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_ostream<CharT, CharTraits> &
operator << (std::basic_ostream<CharT, CharTraits> &os, const uri &u);
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_istream<CharT, CharTraits> &
operator >> (std::basic_istream<CharT, CharTraits> &is, uri &u);
} // namespace experimental
} // namespace std

Declarations

The <experimental/uri> header contains a declaration for a single uri class in the std::experimental namespace.

Previous revisions of this proposal used a network sub-namespace, but following feedback this is no longer the case.

Factory functions

// factory functions
namespace std {
namespace experimental {
template <class Source>
uri make_uri(const Source &source, std::error_code &e) noexcept;
template <class InputIter>
uri make_uri(InputIter first, InputIter last, std::error_code &e) noexcept;
template <class Source, class Alloc = std::allocator<uri::value_type>>
uri make_uri(const Source &source, const Alloc &alloc, std::error_code &e) noexcept;
template <class InputIter, class Alloc = std::allocator<uri::value_type>>
uri make_uri(InputIter first, InputIter last, const Alloc &alloc, std::error_code &e) noexcept;
} // namespace experimental
} // namespace std

These factory functions make it possible to construct a uri object without throwing an exception. If the source string is not a valid URI, the error is reported through the std::error_code argument.

Equality and Comparison Operators

namespace std {
namespace experimental {
enum class uri_comparison_level { ... };
bool operator == (const uri &lhs, const uri &rhs);
bool operator != (const uri &lhs, const uri &rhs);
bool operator <  (const uri &lhs, const uri &rhs);
bool operator <= (const uri &lhs, const uri &rhs);
bool operator >  (const uri &lhs, const uri &rhs);
bool operator >= (const uri &lhs, const uri &rhs);
} // namespace experimental
} // namespace std
Effects: This proposal specifies the common overloads of the equality, inequality and relational operators. The equality and inequality operators test two uri objects according to the notion of equivalence (RFC 3986, section 6.1 and RFC 3986, section 6.2).

Stream Operators

namespace std {
namespace experimental {
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_ostream<CharT, CharTraits> &
operator << (std::basic_ostream<CharT, CharTraits> &os, const uri &u);
} // namespace experimental
} // namespace std

Output stream operator for std::experimental::uri.

namespace std {
namespace experimental {
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_istream<CharT, CharTraits> &
operator >> (std::basic_istream<CharT, CharTraits> &is, uri &u);
} // namespace experimental
} // namespace std

Input stream operator for std::experimental::uri.

Error Reporting

The proposed std::experimental::uri class constructor throws a std::experimental::uri_syntax_error exception if the URI string passed as an argument is an invalid URI. A factory function, std::experimental::make_uri is provided that sets an error code to report an error.

The proposed std::experimental::uri::encode_* and std::experimental::uri::decode throw a uri_encoding_error.

The uri member function of std::experimental::uri_builder can also throw a std::experimental::uri_builder_error exception if it is not possible to build a valid URI.

std::experimental::uri_syntax_error, std::experimental::uri_encoding_error and std::experimental::uri_builder_error are all subclasses of std::system_error.

An enum class, std::experimental::uri_error, forms a part of this proposal, and its values are used to indicate the type of URI error in a std::error_code object.
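The standard way to make an enumeration usable with std::error_code is to provide an error category, a make_error_code overload and a specialization of std::is_error_code_enum. The sketch below shows that pattern for a cut-down enumeration in a hypothetical sketch namespace. The category name, the messages, the check_scheme function and the choice to start the enumerators at 1 (so that a value of zero continues to mean "no error") are all assumptions made for illustration and are not specified by this proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>

namespace sketch {

// Cut-down, hypothetical version of the proposed enumeration.
enum class uri_error { invalid_syntax = 1, invalid_scheme, invalid_host };

class uri_category_impl : public std::error_category {
public:
    const char *name() const noexcept override { return "uri"; }
    std::string message(int value) const override {
        switch (static_cast<uri_error>(value)) {
            case uri_error::invalid_syntax: return "invalid syntax";
            case uri_error::invalid_scheme: return "invalid scheme";
            case uri_error::invalid_host: return "invalid host";
        }
        return "unknown URI error";
    }
};

inline const std::error_category &uri_category() noexcept {
    static const uri_category_impl instance{};
    return instance;
}

// Found by argument-dependent lookup when a uri_error is converted to a
// std::error_code.
inline std::error_code make_error_code(uri_error e) noexcept {
    return std::error_code(static_cast<int>(e), uri_category());
}

}  // namespace sketch

namespace std {
template <>
struct is_error_code_enum<sketch::uri_error> : true_type {};
}  // namespace std

namespace sketch {

// Hypothetical check in the style of make_uri: failure is reported through
// ec instead of an exception.
inline bool check_scheme(std::string_view text, std::error_code &ec) noexcept {
    ec.clear();
    const std::size_t colon = text.find(':');
    const bool starts_with_letter =
        !text.empty() && ((text[0] >= 'a' && text[0] <= 'z') ||
                          (text[0] >= 'A' && text[0] <= 'Z'));
    if (colon == std::string_view::npos || !starts_with_letter) {
        ec = uri_error::invalid_scheme;  // converted via make_error_code
        return false;
    }
    return true;
}

}  // namespace sketch
```

A throwing interface can be layered on top of this by constructing a std::system_error (or a class derived from it) from the same error code.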

std::experimental::uri constructors can throw std::bad_alloc on a failure to allocate storage.

Class uri

Below is the proposed interface for the uri class:

namespace std {
namespace experimental {
class uri {

public:

    // typedefs
    typedef ... string_type;
    typedef string_type::const_iterator iterator;
    typedef string_type::const_iterator const_iterator;
    typedef std::iterator_traits<iterator>::value_type value_type;
    typedef basic_string_view<value_type> string_view;

    // constructors and destructor
    uri();
    template <typename InputIter, class Alloc = std::allocator<value_type>>
    uri(const InputIter &first, const InputIter &last, const Alloc &alloc = Alloc());
    template <class Source, class Alloc = std::allocator<value_type>>
    explicit uri(const Source &source, const Alloc &alloc = Alloc());
    uri(const uri &other);
    uri(uri &&other) noexcept;
    ~uri();

    // assignment
    uri &operator = (const uri &other);
    uri &operator = (uri &&other) noexcept;

    // swap
    void swap(uri &other) noexcept;

    // iterators
    const_iterator begin() const;
    const_iterator end() const;

    // accessors
    std::optional<string_view> scheme() const;
    std::optional<string_view> user_info() const;
    std::optional<string_view> host() const;
    std::optional<string_view> port() const;
    std::optional<string_view> path() const;
    std::optional<string_view> authority() const;
    std::optional<string_view> query() const;
    std::optional<string_view> fragment() const;

    // query
    bool empty() const noexcept;
    bool is_absolute() const noexcept;
    bool is_opaque() const noexcept;

    // transformers
    uri normalize(uri_comparison_level level) const;
    uri normalize(uri_comparison_level level, std::error_code &ec) const noexcept;
    uri make_reference(const uri &other, uri_comparison_level level) const;
    uri make_reference(const uri &other, uri_comparison_level level,
                       std::error_code &ec) const noexcept;
    uri resolve(const uri &other, uri_comparison_level level) const;
    uri resolve(const uri &other, uri_comparison_level level,
                std::error_code &ec) const noexcept;

    // comparison
    int compare(const uri &other, uri_comparison_level level) const noexcept;

    // string accessors
    template <typename CharT,
              class CharTraits = std::char_traits<CharT>,
              class Alloc = std::allocator<CharT>>
    std::basic_string<CharT, CharTraits, Alloc> string(const Alloc &alloc = Alloc()) const;
    std::string string() const;
    std::wstring wstring() const;
    std::u16string u16string() const;
    std::u32string u32string() const;

    // percent encoding and decoding
    template <typename InputIter, typename OutputIter>
    static OutputIter encode_user_info(InputIter first, InputIter last, OutputIter out);
    template <typename InputIter, typename OutputIter>
    static OutputIter encode_host(InputIter first, InputIter last, OutputIter out);
    template <typename InputIter, typename OutputIter>
    static OutputIter encode_port(InputIter first, InputIter last, OutputIter out);
    template <typename InputIter, typename OutputIter>
    static OutputIter encode_path(InputIter first, InputIter last, OutputIter out);
    template <typename InputIter, typename OutputIter>
    static OutputIter encode_query(InputIter first, InputIter last, OutputIter out);
    template <typename InputIter, typename OutputIter>
    static OutputIter encode_fragment(InputIter first, InputIter last, OutputIter out);
    template <typename InputIter, typename OutputIter>
    static OutputIter decode(InputIter first, InputIter last, OutputIter out);

};
} // namespace experimental
} // namespace std

The uri class itself is little more than a light-weight wrapper around a string, a parser and the uri's component parts. Parsing is performed upon construction and, if the string is successfully parsed, the component parts are stored as string_view objects that reference the original string. For example, consider the following URI:

http://www.example.com/path/?key=value#fragment
^   ^  ^              ^     ^^        ^^       ^
a   b  c              d     ef        gh       i

On parsing, the uri object will contain a set of range types corresponding to the ranges for scheme, user info, host, port, path, query and fragment. So the ranges corresponding to the example above will be:

URI part    Range     String
scheme      [a, b)    "http"
user_info             nullopt
host        [c, d)    "www.example.com"
port                  nullopt
path        [d, e)    "/path/"
query       [f, g)    "key=value"
fragment    [h, i)    "fragment"
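The half-open ranges in the table can be reproduced with std::string_view objects that point into a single owning string. The sketch below is only meant to make the offsets a to i concrete; the names range, example_offsets and locate are hypothetical, and locate assumes a URI with exactly the shape of the example (scheme, host, path, query and fragment all present, no user info or port).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns a view of the half-open range [first, last) of text. No
// characters are copied; the view refers to the storage behind text.
inline std::string_view range(std::string_view text, std::size_t first,
                              std::size_t last) {
    return text.substr(first, last - first);
}

// The offsets labelled a to i in the diagram above.
struct example_offsets {
    std::size_t a, b, c, d, e, f, g, h, i;
};

// Locates the offsets for a URI shaped like the example.
inline example_offsets locate(std::string_view text) {
    example_offsets o{};
    o.a = 0;
    o.b = text.find(':');       // end of the scheme
    o.c = o.b + 3;              // skip "://"
    o.d = text.find('/', o.c);  // end of the host, start of the path
    o.e = text.find('?', o.d);  // end of the path
    o.f = o.e + 1;              // start of the query
    o.g = text.find('#', o.f);  // end of the query
    o.h = o.g + 1;              // start of the fragment
    o.i = text.size();
    return o;
}
```

Because the views refer to the owning string, they are only valid for as long as that string is alive and unmodified.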

uri Requirements

Template parameters named InputIter are required to meet the requirements of a C++ standard library random access iterator. The iterator’s value type is required to be char, wchar_t, char16_t, or char32_t.

Template parameters named Source are required to be one of:

A container with a value type of char, wchar_t, char16_t, or char32_t.

An iterator for a null terminated byte-string. The value type is required to be char, wchar_t, char16_t, or char32_t.

A C-array. The value type is required to be char, wchar_t, char16_t, or char32_t.

This is identical wording to that found in the filesystem proposal (N3365).

typedefs

typedef ... string_type;
typedef string_type::const_iterator iterator;
typedef string_type::const_iterator const_iterator;
typedef std::iterator_traits<iterator>::value_type value_type;

The string_type is left unspecified in this proposal and is intended to be implementation defined.

Constructors and Destructors

uri();
Postconditions: empty() == true
template <typename InputIter, class Alloc = std::allocator<value_type>>
uri(const InputIter &first, const InputIter &last, const Alloc &alloc = Alloc());
Effects: The range is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and operating system.
Throws: std::bad_alloc, std::system_error
template <class Source, class Alloc = std::allocator<value_type>>
uri(const Source &source, const Alloc &alloc = Alloc());
Effects: The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and operating system.
uri(const uri &other);
Effects: The source uri string is copied to the new uri object.
Throws: std::bad_alloc
uri(uri &&other) noexcept;
Effects: The source uri string is moved to the new uri object.

Assignment

uri &operator = (const uri &other);
Effects: The uri is copy assigned to the uri object.
Throws: std::bad_alloc
uri &operator = (uri &&other) noexcept;
Effects: The uri is move assigned to the uri object.

Swap

void swap(uri &other) noexcept;
Effects: Swaps the contents of this object with the other.

Iterators

const_iterator begin() const;
Returns: An iterator to the first element in the underlying string container.
const_iterator end() const;
Returns: An iterator to the end of the underlying string container.

Accessors

std::optional<string_view> scheme() const;
Returns: A std::optional<string_view> object which spans the range of the scheme in the underlying URI.
std::optional<string_view> user_info() const;
Returns: A std::optional<string_view> object which spans the range of the user info in the underlying URI.
std::optional<string_view> host() const;
Returns: A std::optional<string_view> object which spans the range of the host in the underlying URI.
std::optional<string_view> port() const;
Returns: A std::optional<string_view> object which spans the range of the port in the underlying URI.
std::optional<string_view> path() const;
Returns: A std::optional<string_view> object which spans the range of the path in the underlying URI.
std::optional<string_view> authority() const;
Returns: A std::optional<string_view> object which spans the range of the authority in the underlying URI.
std::optional<string_view> query() const;
Returns: A std::optional<string_view> object which spans the range of the query in the underlying URI.
std::optional<string_view> fragment() const;
Returns: A std::optional<string_view> object which spans the range of the fragment in the underlying URI.

Query

bool empty() const noexcept;
Returns: true if the underlying string object is empty, false otherwise.
bool is_absolute() const noexcept;
Returns: true if the URI is absolute. Equivalent to static_cast<bool>(scheme()).
bool is_opaque() const noexcept;
Returns: true if the URI is absolute and its scheme is not hierarchical (i.e. the scheme-specific part does not start with a double-slash // and its authority is empty).

Transformers

This proposal specifies three transformer functions: normalize, make_reference and resolve.

uri normalize(uri_comparison_level level) const;
Postconditions: u.normalize(level) == u
Effects: Returns a copy of this uri object, normalized according to the given comparison level.
Throws: std::bad_alloc
uri make_reference(const uri &other, uri_comparison_level level) const;
Postconditions: !u1.make_reference(u2, level).is_absolute()
Effects: Returns a relative URI reference.
Throws: std::bad_alloc
std::experimental::uri base_uri("http://www.example.com/");
std::experimental::uri uri("http://www.example.com/glynos/?key=value#fragment");
std::experimental::uri rel_uri(base_uri.make_reference(uri,
                               std::experimental::uri_comparison_level::string_comparison));
assert(rel_uri.string() == "/glynos/?key=value#fragment");
uri resolve(const uri &other, uri_comparison_level level) const;
Postconditions: u1.resolve(u2, level).is_absolute()
Effects: resolve resolves the second uri object against the first, and returns a new uri.
Throws: std::bad_alloc
int compare(const uri &other, uri_comparison_level level) const noexcept;
Returns: -1 if this is less than other, given the comparison level; 0 if they are considered equal; and 1 if this is greater.
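The return value convention can be sketched with a small helper that maps a lexicographical comparison onto -1, 0 or 1. The name three_way is hypothetical, and the sketch assumes that both strings have already been normalized to the requested comparison level.

```cpp
#include <cassert>
#include <string_view>

// std::string_view::compare may return any negative or positive value, so
// the result is clamped to exactly -1, 0 or 1.
inline int three_way(std::string_view lhs, std::string_view rhs) noexcept {
    const int result = lhs.compare(rhs);
    return (result > 0) - (result < 0);
}
```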

String Accessors

template <typename CharT,
          class CharTraits = std::char_traits<CharT>,
          class Alloc = std::allocator<CharT>>
std::basic_string<CharT, CharTraits, Alloc> string(const Alloc &alloc = Alloc()) const;
Throws: std::bad_alloc
std::string string() const;
Throws: std::bad_alloc
std::wstring wstring() const;
Throws: std::bad_alloc
std::u16string u16string() const;
Throws: std::bad_alloc
std::u32string u32string() const;
Throws: std::bad_alloc

Percent Encoding and Decoding

template <typename InputIter, typename OutputIter>
static OutputIter
encode_user_info(InputIter first, InputIter last, OutputIter out);
Effects: Encodes special characters for the user_info part (RFC 3986, section 2.1).
Returns: An iterator to the last element of a user_info string that has been encoded.
Throws: std::experimental::uri_encoding_error
template <typename InputIter, typename OutputIter>
static OutputIter
encode_host(InputIter first, InputIter last, OutputIter out);
Effects: Encodes special characters for the host part (RFC 3986, section 2.1).
Returns: An iterator to the last element of a host string that has been encoded.
Throws: std::experimental::uri_encoding_error
template <typename InputIter, typename OutputIter>
static OutputIter
encode_port(InputIter first, InputIter last, OutputIter out);
Effects: Encodes special characters for the port part (RFC 3986, section 2.1).
Returns: An iterator to the last element of a port string that has been encoded.
Throws: std::experimental::uri_encoding_error
template <typename InputIter, typename OutputIter>
static OutputIter
encode_path(InputIter first, InputIter last, OutputIter out);
Effects: Encodes special characters for the path part (RFC 3986, section 2.1).
Returns: An iterator to the last element of a path string that has been encoded.
Throws: std::experimental::uri_encoding_error
template <typename InputIter, typename OutputIter>
static OutputIter
encode_query(InputIter first, InputIter last, OutputIter out);
Effects: Encodes special characters for the query part (RFC 3986, section 2.1).
Returns: An iterator to the last element of a query string that has been encoded.
Throws: std::experimental::uri_encoding_error
template <typename InputIter, typename OutputIter>
static OutputIter
encode_fragment(InputIter first, InputIter last, OutputIter out);
Effects: Encodes special characters for the fragment part (RFC 3986, section 2.1).
Returns: An iterator to the last element of a fragment string that has been encoded.
Throws: std::experimental::uri_encoding_error
template <typename InputIter, typename OutputIter>
static OutputIter
decode(InputIter first, InputIter last, OutputIter out);
Effects: Decodes special characters in the source string and returns the unencoded string (RFC 3986, section 2.1).
Returns: An iterator to the last element of a uri string that has been decoded.
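The iterator-based interface above can be illustrated with a minimal sketch. The names encode_path_sketch, decode_sketch and is_path_char_sketch are hypothetical and are not part of the proposal. The sketch assumes narrow char input, encodes every '%' it sees (so input that is already encoded would be encoded twice), and throws std::invalid_argument on a malformed escape, which is an assumption because the proposal lists no exception for decode.

```cpp
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: true for characters that RFC 3986 allows unescaped
// in a path (unreserved, sub-delims, ':', '@' and the '/' separator).
inline bool is_path_char_sketch(unsigned char c) {
    static const std::string allowed = "-._~!$&'()*+,;=:@/";
    const bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                       (c >= '0' && c <= '9');
    return alnum || allowed.find(static_cast<char>(c)) != std::string::npos;
}

// Writes the percent-encoded form of [first, last) to out and returns the
// output iterator positioned after the last character written.
template <typename InputIter, typename OutputIter>
OutputIter encode_path_sketch(InputIter first, InputIter last, OutputIter out) {
    static const char hex[] = "0123456789ABCDEF";
    for (; first != last; ++first) {
        const unsigned char c = static_cast<unsigned char>(*first);
        if (is_path_char_sketch(c)) {
            *out++ = static_cast<char>(c);
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    return out;
}

// Replaces each "%XX" escape in [first, last) with the byte it denotes.
template <typename InputIter, typename OutputIter>
OutputIter decode_sketch(InputIter first, InputIter last, OutputIter out) {
    auto hex_value = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    while (first != last) {
        const char c = *first++;
        if (c != '%') {
            *out++ = c;
            continue;
        }
        if (first == last) throw std::invalid_argument("truncated escape");
        const int hi = hex_value(*first++);
        if (first == last) throw std::invalid_argument("truncated escape");
        const int lo = hex_value(*first++);
        if (hi < 0 || lo < 0) throw std::invalid_argument("invalid hex digit");
        *out++ = static_cast<char>(hi * 16 + lo);
    }
    return out;
}
```

With a std::back_inserter as the output iterator, the caller does not need to know the encoded length in advance, which is one plausible reason for the iterator-pair design.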

Class uri_builder

The proposed uri_builder class is provided in order to construct uri objects more safely and more productively.

namespace std {
namespace experimental {
class uri_builder {

private:

    uri_builder(const uri_builder &) = delete;
    uri_builder &operator=(const uri_builder &) = delete;

public:

    uri_builder();
    explicit uri_builder(const uri &base);
    template <typename Source>
    explicit uri_builder(const Source &base);
    ~uri_builder();

    template <typename Source>
    uri_builder &scheme(const Source &scheme);

    template <typename Source>
    uri_builder &user_info(const Source &user_info);

    template <typename Source>
    uri_builder &host(const Source &host);

    template <typename Source>
    uri_builder &port(const Source &port);

    template <typename Source>
    uri_builder &authority(const Source &authority);

    template <typename Source>
    uri_builder &authority(const Source &user_info, const Source &host, const Source &port);

    template <typename Source>
    uri_builder &path(const Source &path);

    template <typename Source>
    uri_builder &append_path(const Source &path);

    template <typename Source>
    uri_builder &query(const Source &query);

    template <class Key, class Param>
    uri_builder &query(const Key &key, const Param &param);

    template <typename Source>
    uri_builder &fragment(const Source &fragment);

    std::experimental::uri uri() const;

};
} // namespace experimental
} // namespace std

The builder member functions are templates. This allows the implementation to provide specializations depending on the argument type, in order to ensure that the resultant URI remains valid and consistent. This could mean performing normalization or percent encoding on input strings where appropriate, and could allow other types to be used. For example, the port could be provided as an integral type.
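One way an implementation might accept both strings and integers through the same templated setter is to dispatch on the argument type. The following is only a sketch: builder_sketch, to_part and str are hypothetical names, the class covers just the scheme, host and port, and it performs no validation, encoding or normalization beyond appending a trailing slash.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the proposed uri_builder.
class builder_sketch {
    // Integral arguments (for example a port number) are converted to text.
    template <typename T>
    static typename std::enable_if<std::is_integral<T>::value, std::string>::type
    to_part(const T &value) {
        return std::to_string(value);
    }

    // Anything else is assumed to be convertible to std::string.
    template <typename T>
    static typename std::enable_if<!std::is_integral<T>::value, std::string>::type
    to_part(const T &value) {
        return std::string(value);
    }

    std::string scheme_, host_, port_;

public:
    template <typename Source>
    builder_sketch &scheme(const Source &scheme) {
        scheme_ = to_part(scheme);
        return *this;
    }

    template <typename Source>
    builder_sketch &host(const Source &host) {
        host_ = to_part(host);
        return *this;
    }

    template <typename Source>
    builder_sketch &port(const Source &port) {
        port_ = to_part(port);
        return *this;
    }

    // Assembles the parts; an empty path is written as "/".
    std::string str() const {
        std::string result = scheme_ + "://" + host_;
        if (!port_.empty()) result += ":" + port_;
        return result + "/";
    }
};
```

Under these assumptions, chaining scheme("http"), host("example.com") and port(8080) yields the string "http://example.com:8080/". Note that a plain char argument also counts as integral in this sketch, so a real implementation would need a more careful dispatch.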

To build a uri object, invoke the uri_builder::uri() member function. All encoding takes place here, and a std::system_error is thrown in case the uri is invalid.

Example 1:

The first example shows that the uri_builder can build a uri with a port number supplied as an integer. Since the uri_builder also handles normalization, a trailing slash is added where a path was not provided.

std::experimental::uri_builder builder;
builder.scheme("http")
       .host("example.com")
       .port(8080);
assert(builder.uri().string() == "http://example.com:8080/");

Example 2:

The builder should work with any scheme. The second example shows how the uri_builder would be used to create an e-mail address using the mailto scheme.

std::experimental::uri_builder builder;
builder.scheme("mailto")
       .path("glynos@example.com");
assert(builder.uri().string() == "mailto:glynos@example.com");

Example 3:

The third example shows how the builder should normalize all parts of the URI.

std::experimental::uri_builder builder;
builder.scheme("http")
       .host("Example.com")
       .path("%7Eglynos//My Path/");
assert(builder.uri().string() == "http://example.com/~glynos/My%20Path/");

Constructors

namespace std {
uri_builder::uri_builder();
} // namespace std

Constructs a uri_builder object.

namespace std {
uri_builder::uri_builder(const uri &base);
} // namespace std

Constructs a uri_builder object from a base URI.

namespace std {
template <typename Source>
uri_builder::uri_builder(const Source &base);
} // namespace std

Constructs a uri_builder object from a base URI.

Builder functions

namespace std {
template <typename Source>
uri_builder &uri_builder::scheme(const Source &scheme);
} // namespace std
Effects: Sets the URI scheme.
namespace std {
template <typename Source>
uri_builder &uri_builder::user_info(const Source &user_info);
} // namespace std
Effects: Sets the URI user_info.
namespace std {
template <typename Source>
uri_builder &uri_builder::host(const Source &host);
} // namespace std
Effects: Sets the URI host.
namespace std {
template <typename Source>
uri_builder &uri_builder::port(const Source &port);
} // namespace std
Effects: Sets the URI port.
namespace std {
template <typename Source>
uri_builder &uri_builder::authority(const Source &authority);
} // namespace std
Effects: Sets the URI authority.
namespace std {
template <typename Source>
uri_builder &uri_builder::authority(const Source &user_info, const Source &host, const Source &port);
} // namespace std
Effects: Sets the URI user info, host and port.
namespace std {
template <typename Source>
uri_builder &uri_builder::path(const Source &path);
} // namespace std
Effects: Sets the URI path.
namespace std {
template <typename Source>
uri_builder &uri_builder::append_path(const Source &path);
} // namespace std
Effects: Appends an element to the uri object’s path.
namespace std {
template <typename Source>
uri_builder &uri_builder::query(const Source &query);
} // namespace std
Effects: Sets the URI query.
namespace std {
template <class Key, class Param>
uri_builder &uri_builder::query(const Key &key, const Param &param);
} // namespace std
Effects: Appends a key-value pair to the uri object’s query.
namespace std {
template <typename Source>
uri_builder &uri_builder::fragment(const Source &fragment);
} // namespace std
Effects: Sets the URI fragment.
namespace std {
uri uri_builder::uri() const;
} // namespace std
Effects: Builds a URI object from the provided parts.
Throws: std::bad_alloc, std::system_error

Appendix

Existing Practice

There exist several implementations of URIs in C++, including (but not limited to):

The cpp-netlib uri is the library that has motivated this proposal. It implements a generic URI with support for percent encoding, normalization and URI resolution. It also currently partially implements a URI builder.

QUrl supports much of the same functionality. It is fully internationalized, supports URI resolution and normalizes implicitly. The part accessors differ from cpp-netlib, in that they return copies of the underlying parts, not references.

google-url is another existing library which is fast and robust. Its parsing strategy is such that it never fails (there is no validation), but always tries to find the best fit for the URL components. It supports normalization (which it calls canonicalization).

All of the implementations listed above do effectively the same thing - they parse a string and split it into components, either by storing separate copies of each part or by storing references to sub-strings in the original uri string.
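The second storage strategy, keeping a single copy of the string and recording each part as a range into it, can be sketched as follows. The names part_range, parsed_uri_sketch and split_sketch are hypothetical, and the splitting rules are deliberately simplified: the sketch only separates the scheme, the hierarchical part, the query and the fragment, and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: a component stored as a [begin, end) range into the source.
struct part_range {
    std::size_t begin;
    std::size_t end;
};

struct parsed_uri_sketch {
    std::string source;  // the only copy of the URI text
    part_range scheme, hier_part, query, fragment;

    // Materializes a part on demand; the ranges themselves hold no text.
    std::string part(const part_range &r) const {
        return source.substr(r.begin, r.end - r.begin);
    }
};

inline parsed_uri_sketch split_sketch(const std::string &text) {
    parsed_uri_sketch result;
    result.source = text;
    const std::size_t size = text.size();

    // The fragment starts after the first '#', if there is one.
    std::size_t hash = text.find('#');
    if (hash == std::string::npos) hash = size;

    // The query starts after the first '?' that precedes the fragment.
    std::size_t question = text.find('?');
    if (question == std::string::npos || question > hash) question = hash;

    // A ':' before the query or fragment is taken to end the scheme.
    const std::size_t colon = text.find(':');
    const bool has_scheme = colon != std::string::npos && colon < question;

    result.scheme = part_range{0, has_scheme ? colon : 0};
    result.hier_part = part_range{has_scheme ? colon + 1 : 0, question};
    result.query = part_range{question < hash ? question + 1 : hash, hash};
    result.fragment = part_range{hash < size ? hash + 1 : size, size};
    return result;
}
```

Storing ranges rather than separate strings keeps the memory cost close to that of the original text, at the price of having to build a new string whenever a caller wants an owned copy of a part.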

WhatWG URL Standard

In addition to RFC 3986 and RFC 3987, the WhatWG URL Standard is a proposal for a standard that sets out to make URLs more interoperable.

The most important features of this proposed standard seem to be:

  • The obsolescence of RFC 3986 and RFC 3987 and an alignment of the URL with modern implementations
  • Standardization on the term URL, instead of URI or IRI
  • A detailed definition of the URL JavaScript API

It does not explicitly describe operations such as normalization, comparison or resolution.

As this proposal allows the URI parsing to be implementation defined, there is no reason why an implementor could not decide to use this standard. Apart from a few naming differences, the proposed uri interface maps closely to the WhatWG URL JavaScript API. If there is a strong interest in doing so, a future revision of this proposal could rename its class and member functions.

Acknowledgements

C++ Network Library users and mailing list

Kyle Kloepper and Niklas Gustafsson for providing valuable feedback and encouragement, and for presenting different versions of this proposal at committee meetings.

Beman Dawes, whose Filesystem proposal strongly influenced the class design.

Wikipedia, for being there.