Working Draft Technical Specification - URI

Document Number: N3827
Revises: N3792
Date: 2014-01-19
Authors: Glyn Matthews <glyn.matthews@gmail.com>, Dean Michael Berris <dberris@google.com>

Introduction

Note

Notes highlighted in yellow are comments on the working draft and are not intended for the actual TS.

Revisions to N3792

  1. Normalization is now an invariant of the std::experimental::uri class.
  2. Removed uri_normalization_level.
  3. Removed normalize member function.

Revisions to N3720

  1. Added an “Error Reporting” section.
  2. Renamed “Alloc” to “Allocator” to bring it more into line with the standard.
  3. Renamed uri::make_reference to uri::make_relative.
  4. Renamed uri_comparison_level to uri_normalization_level.
  5. Added sections for uri_error and uri_normalization_level.
  6. Added a base_uri_error that can be thrown by the resolve and make_relative member functions in the event that the base URI is not valid.
  7. Improved documentation of URI resolution in the Terms and Definition section and in the description of the resolve member functions.
  8. Using the standard hash specialization instead of hash_value.
  9. Allowed the Source template argument to be basic_string_view<CharT, CharTraits>.
  10. uri_builder now allows copy and move.
  11. All constructors that accept an argument of type Source now also take InputIterator overloads.
  12. Fixed minor typographical errors.

1 Scope [uri.scope]

The scope of this Technical Specification will include a single std::experimental::uri type and specifications of how URIs are intended to be processed and extended, including some additional helper types and functions. It will include a std::experimental::uri_builder type to build a URI from its components. Finally, it will include types and functions for percent encoding, URI references, reference resolution, and URI normalization and comparison.

2 Conformance [uri.conformance]

2.1 Generic syntax [uri.conformance.generic-syntax]

The generic syntax of a URI is defined in IETF RFC 3986, section 3.

All URIs are of the form:

scheme ":" "hierarchical part" [ "?" query ] [ "#" fragment ]

The scheme is used to identify the specification needed to parse the rest of the URI. A generic syntax parser can parse any URI into its main parts. The scheme can then be used to identify whether further scheme-specific parsing can be performed.

The hierarchical part refers to the part of the URI that holds identification information that is hierarchical in nature. This may contain an authority (always prefixed with a double slash (“//”)) and/or a path. The path part is required, though it may be empty. The authority part holds an optional user info part, ending with an at sign (“@”); a host identifier; and an optional port number, preceded by a colon (”:”). The host may be an IP address or domain name. The normative reference for IPv6 addresses is IETF RFC 2732.

The query is an optional part following a question mark (”?”) that contains information that is not hierarchical.

Finally, the fragment is an optional part, prefixed by a hash symbol (“#”) that is used to identify secondary sources.
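As a non-normative illustration, the decomposition described above can be sketched with the regular expression given in IETF RFC 3986, Appendix B. The names uri_parts and split_uri are hypothetical and are not part of this proposal; the sketch only splits a string into components and performs no validation.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper type: the five generic components of a URI reference.
struct uri_parts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;  // always present, possibly empty
};

// Splits a URI reference using the regular expression from
// RFC 3986, Appendix B. Components are not validated.
inline uri_parts split_uri(const std::string& text) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    std::regex_match(text, m, re);  // matches any input without line breaks
    uri_parts p;
    if (m[1].matched) p.scheme = m[2].str();
    if (m[3].matched) p.authority = m[4].str();
    p.path = m[5].str();
    if (m[6].matched) p.query = m[7].str();
    if (m[8].matched) p.fragment = m[9].str();
    return p;
}
```

An absent component is reported as an empty optional, which keeps "no query" distinguishable from "empty query".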

IETF RFC 3987 specifies a new protocol element, the Internationalized Resource Identifier (IRI). The IRI complements a URI, and extends it to allow Unicode characters. The syntax of an IRI is specified in IETF RFC 3987, section 2.

IETF RFC 6874 specifies scoped IDs in IPv6 addresses. The syntax is specified in IETF RFC 6874, section 2.

2.2 URI Normalization and Comparison [uri.conformance.uri-normalization-and-comparison]

The rules for URI normalization are specified in IETF RFC 3986, section 6 and IETF RFC 3987, section 5.

2.3 URI References [uri.conformance.uri-references]

The rule for transforming references is given in IETF RFC 3986, section 5.2.2.

2.4 Removing Dot Segments [uri.conformance.removing-dot-segments]

The rule for removing dot segments is given in IETF RFC 3986, section 5.2.4.
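As a non-normative illustration, the algorithm of IETF RFC 3986, section 5.2.4 can be sketched as follows. The function names are hypothetical; the sketch operates on a path string only and assumes the path has already been separated from the rest of the URI.

```cpp
#include <cassert>
#include <string>

// Returns true if s begins with prefix.
inline bool starts_with(const std::string& s, const char* prefix) {
    return s.rfind(prefix, 0) == 0;
}

// Sketch of remove_dot_segments from RFC 3986, section 5.2.4.
// "in" is the input buffer and "out" is the output buffer of the RFC text.
inline std::string remove_dot_segments(std::string in) {
    std::string out;
    while (!in.empty()) {
        if (starts_with(in, "../")) {
            in.erase(0, 3);                      // rule A
        } else if (starts_with(in, "./")) {
            in.erase(0, 2);                      // rule A
        } else if (starts_with(in, "/./")) {
            in.erase(0, 2);                      // rule B: "/./x" becomes "/x"
        } else if (in == "/.") {
            in = "/";                            // rule B
        } else if (starts_with(in, "/../") || in == "/..") {
            if (in == "/..") in = "/";           // rule C
            else in.erase(0, 3);
            // Drop the last output segment together with its leading "/".
            std::size_t slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") {
            in.clear();                          // rule D
        } else {
            // Rule E: move the first segment (with its leading "/") to out.
            std::size_t next = in.find('/', 1);
            out.append(in, 0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

The two inputs used as examples in the RFC, "/a/b/c/./../../g" and "mid/content=5/../6", reduce to "/a/g" and "mid/6" respectively.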

2.5 URI Recomposition [uri.conformance.uri-recomposition]

The rule for recomposing a URI from its parts is given in IETF RFC 3986, section 5.3.
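As a non-normative illustration, recomposition according to IETF RFC 3986, section 5.3 can be sketched as follows. The uri_parts type and the recompose function are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper type: an empty optional means "component undefined".
struct uri_parts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

// Sketch of component recomposition from RFC 3986, section 5.3.
inline std::string recompose(const uri_parts& p) {
    std::string result;
    if (p.scheme)    { result += *p.scheme; result += ':'; }
    if (p.authority) { result += "//"; result += *p.authority; }
    result += p.path;
    if (p.query)     { result += '?'; result += *p.query; }
    if (p.fragment)  { result += '#'; result += *p.fragment; }
    return result;
}
```

A defined but empty component still contributes its delimiter, so an empty query yields a trailing "?" while an undefined query yields nothing.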

3 Terms and Definitions [uri.definitions]

3.1 URI [uri.definition.uri]

A Uniform Resource Identifier is a sequence of characters from a limited set with a specific syntax used to identify a name or resource. URIs can be classified as URLs or URNs. The URI syntax is defined in IETF RFC 3986.

3.2 URL [uri.definition.url]

A Uniform Resource Locator (URL) is a type of URI, complementary to a URN used to locate a resource over a network.

3.3 URN [uri.definition.urn]

A Uniform Resource Name (URN) is a type of URI, complementary to a URL used to unambiguously identify resources.

3.4 IRI [uri.definition.iri]

An Internationalized Resource Identifier (IRI) is a complement to the URI that allows characters from the Universal Character Set (Unicode/ISO 10646). The IRI syntax is defined in IETF RFC 3987.

3.5 URI Part [uri.definition.uri-part]

A generic URI is decomposed into four principal parts: the scheme, the hierarchical part, an optional query and optional fragment. The hierarchical part can be further decomposed into four parts: the user info, host, port and path.

3.6 Scheme [uri.definition.scheme]

A scheme name is the top level of the URI naming structure. It indicates the specifications, syntax and semantics of the rest of the URI structure. It is always followed by a colon (”:”).

3.7 Query [uri.definition.query]

A query is a part, indicated by a question mark (”?”) and terminated by a hash (“#”) or by the end of the URI, that contains non-hierarchical information. It is commonly structured as a sequence of key-value pairs, in which the key and value are separated by equals (“=”) and the pairs are separated by a semi-colon (”;”) or ampersand (“&”).
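As a non-normative illustration, the common key-value convention can be sketched as follows. The split_query function is a hypothetical helper; this proposal does not specify a query parser, and the sketch performs no percent decoding.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Splits a query string (without the leading '?') into key-value pairs.
// Pairs are separated by '&' or ';'; key and value by the first '='.
inline std::vector<std::pair<std::string, std::string>>
split_query(const std::string& query) {
    std::vector<std::pair<std::string, std::string>> pairs;
    std::size_t begin = 0;
    while (begin <= query.size()) {
        std::size_t end = query.find_first_of("&;", begin);
        if (end == std::string::npos) end = query.size();
        if (end > begin) {  // skip empty items such as in "a=1&&b=2"
            std::string item = query.substr(begin, end - begin);
            std::size_t eq = item.find('=');
            if (eq == std::string::npos)
                pairs.emplace_back(item, "");  // key without a value
            else
                pairs.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        }
        begin = end + 1;
    }
    return pairs;
}
```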

3.8 Fragment [uri.definition.fragment]

A fragment is indicated by a hash (“#”) and allows indirect identification of a secondary resource. For example, a fragment may refer to a section header in an HTML document with an id attribute of the same name.

3.9 Hierarchical Part [uri.definition.hierarchical-part]

The hierarchical part of a URI contains hierarchical information. If it starts with a double forward slash (“//”), it is followed by an authority and a path. The authority can be further broken down into a user-information part, a hostname and a port. The authority is followed by an optional path. If the hierarchical part does not begin with a double forward slash (“//”), then it must contain a path.

3.10 Authority [uri.definition.authority]

The hierarchical part contains an authority. The authority contains an optional user info followed by at (“@”), a host and an optional port, preceded by a colon (”:”).

3.11 User Info [uri.definition.user-info]

The user info is an optional part of the URI authority, terminated by at (“@”) and is followed by a host. It is used in the telnet scheme:

telnet://<user>:<password>@<host>:<port>/

3.12 Host [uri.definition.host]

The hostname contains a domain name or IP address.

3.13 Domain Name [uri.definition.domain-name]

A domain name is a human-readable string used to identify a host. Domain names are registered in the Domain Name System (DNS).

3.14 IP Address [uri.definition.ip-address]

The IP address can be either an IPv4 address (e.g. 127.0.0.1) or an IPv6 address (e.g. ::1). In a URI, an IPv6 address is enclosed in square brackets (“[]”).

3.15 Port [uri.definition.port]

The optional port is always preceded by a colon (”:”). If the port is not present, even if a colon is present, then the port is considered to have the value of the default port of the scheme.

3.16 Path [uri.definition.path]

The path is a part of the hierarchical data and is a sequence of segments, each separated by a forward slash (“/”). It is terminated by a question mark (”?”), followed by a query, a hash (“#”) followed by a fragment or by the end of the URI.

3.17 Dot Segments [uri.definition.dot-segments]

Dot segments are elements in a path containing either a dot (”.”) or a double dot (”..”), separated by a forward slash (“/”). Dot segments can be removed from a path as part of its normalization without changing the URI semantics.

3.18 Absolute URI [uri.definition.absolute-uri]

An absolute URI always specifies the scheme. URIs that don’t provide the scheme are called relative references.

3.19 Opaque URI [uri.definition.opaque-uri]

An opaque URI is an absolute URI that does not provide a double slash (“//”) after the scheme-delimiting colon (”:”). Opaque URIs have no authority and the part immediately following the colon (”:”) is the path. Some examples of opaque URIs are:

mailto:john.doe@example.com
news:comp.lang.c++

URIs that provide a double slash (“//”) following the scheme-delimiting colon (”:”) are known as hierarchical URIs. Some examples are:

http://www.example.com/
ftp://john.doe@ftp.example.com/
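As a non-normative illustration, the distinction between relative references, opaque URIs and hierarchical URIs can be sketched on a plain string. The uri_kind enumeration and the classify function are hypothetical names; the sketch does not validate the characters of the scheme.

```cpp
#include <cassert>
#include <string>

enum class uri_kind { relative_reference, opaque, hierarchical };

// Classifies a URI reference by inspecting only the scheme delimiter
// and the two characters that follow it.
inline uri_kind classify(const std::string& text) {
    std::size_t colon = text.find(':');
    std::size_t delim = text.find_first_of("/?#");
    // A scheme is present only if a non-empty prefix precedes the colon
    // and no '/', '?' or '#' occurs before it.
    if (colon == std::string::npos || colon == 0 || delim < colon)
        return uri_kind::relative_reference;
    return text.compare(colon + 1, 2, "//") == 0 ? uri_kind::hierarchical
                                                  : uri_kind::opaque;
}
```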

3.20 Normalization [uri.definition.normalization]

URI normalization is the process by which a URI is transformed in order to determine if two URIs are equivalent. There are different levels of comparison, which trade off the number of false negatives against complexity. The normalization and comparison procedures are defined in IETF RFC 3986, section 6.

3.21 Comparison Ladder [uri.definition.comparison-ladder]

The comparison ladder describes how URIs can be compared using normalization in different ways, trading off the complexity of the method and the number of false negatives. The comparison ladder is defined in IETF RFC 3986, section 6.2 and IETF RFC 3987, section 5.3.

3.22 Relative Reference [uri.definition.relative-reference]

Relative references are URIs that do not provide a scheme. Relative references are only usable when a base URI is known, against which the relative reference can be resolved. The relative reference is defined in IETF RFC 3986, section 4.2 and IETF RFC 3987, section 6.5.

3.23 Reference Resolution [uri.definition.reference-resolution]

Relative references can be resolved against a base URI, producing an absolute URI. Only the scheme is required to be present in the base URI. Reference resolution is defined in IETF RFC 3986, section 5.

Pre-parsing and normalization of the URI is performed before transforming the reference.

The algorithm for transforming references when resolving URIs is given in IETF RFC 3986, section 5.2.2.
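As a non-normative illustration, the strict form of the algorithm in IETF RFC 3986, section 5.2.2 can be sketched on already-parsed components, together with the merge step of section 5.2.3 and the dot-segment removal of section 5.2.4. The names parts, merge and transform are hypothetical; the sketch assumes the base has a scheme and omits the pre-parsing and normalization steps.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper type: an empty optional means "component undefined".
struct parts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

// RFC 3986, section 5.2.4.
inline std::string remove_dot_segments(std::string in) {
    auto begins = [&in](const char* s) { return in.rfind(s, 0) == 0; };
    std::string out;
    while (!in.empty()) {
        if (begins("../")) { in.erase(0, 3); }
        else if (begins("./") || begins("/./")) { in.erase(0, 2); }
        else if (in == "/.") { in = "/"; }
        else if (begins("/../") || in == "/..") {
            if (in == "/..") in = "/"; else in.erase(0, 3);
            std::size_t slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        }
        else if (in == "." || in == "..") { in.clear(); }
        else {
            std::size_t next = in.find('/', 1);
            out.append(in, 0, next);
            in.erase(0, next);
        }
    }
    return out;
}

// RFC 3986, section 5.2.3: merge a relative path with the base path.
inline std::string merge(const parts& base, const std::string& ref_path) {
    if (base.authority && base.path.empty()) return "/" + ref_path;
    std::size_t slash = base.path.rfind('/');
    if (slash == std::string::npos) return ref_path;
    return base.path.substr(0, slash + 1) + ref_path;
}

// RFC 3986, section 5.2.2 (strict parsing): r is the reference.
inline parts transform(const parts& r, const parts& base) {
    parts t;
    if (r.scheme || r.authority) {
        t.authority = r.authority;
        t.path = remove_dot_segments(r.path);
        t.query = r.query;
    } else {
        t.authority = base.authority;
        if (r.path.empty()) {
            t.path = base.path;
            t.query = r.query ? r.query : base.query;
        } else {
            t.path = remove_dot_segments(
                r.path[0] == '/' ? r.path : merge(base, r.path));
            t.query = r.query;
        }
    }
    t.scheme = r.scheme ? r.scheme : base.scheme;
    t.fragment = r.fragment;
    return t;
}
```

With the base http://a/b/c/d;p?q used in section 5.4.1 of the RFC, the reference "../g" yields the path "/b/g", and the reference "?y" keeps the base path and replaces the query.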

3.24 Percent Encoding [uri.definition.percent-encoding]

Percent encoding is the mechanism used to encode reserved characters in a URI. See IETF RFC 3986, section 2.1.
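As a non-normative illustration, the encoding of a single octet as a percent sign followed by two hexadecimal digits can be sketched as follows. The percent_encode function is a hypothetical helper that encodes every octet outside the unreserved set of IETF RFC 3986, section 2.3; the encode_* member functions of this proposal are per-component and may leave more characters unencoded.

```cpp
#include <cassert>
#include <string>

// Unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
inline bool is_unreserved(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Encodes every octet that is not unreserved as "%" HEXDIG HEXDIG,
// using upper-case hexadecimal digits.
inline std::string percent_encode(const std::string& s) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (is_unreserved(static_cast<char>(c))) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```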

3.25 Case Normalization [uri.definition.case-normalization]

All characters in a URI scheme and host must be lower-case. All hexadecimal digits within a percent-encoded triplet must be upper-case. See IETF RFC 3986, section 6.2.2.1 and IETF RFC 3987, section 5.3.2.1.
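As a non-normative illustration, the two case rules can be sketched as separate helpers applied to an individual scheme or host component. The function names are hypothetical, and only ASCII letters are considered.

```cpp
#include <cassert>
#include <string>

// Lower-cases ASCII letters; intended for a scheme or host component.
inline std::string to_lower_ascii(std::string s) {
    for (char& c : s)
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    return s;
}

inline bool is_hex_digit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

// Upper-cases the two hexadecimal digits of every percent-encoded triplet.
inline std::string upper_case_triplets(std::string s) {
    for (std::size_t i = 0; i + 2 < s.size(); ++i) {
        if (s[i] == '%' && is_hex_digit(s[i + 1]) && is_hex_digit(s[i + 2])) {
            for (std::size_t j = i + 1; j <= i + 2; ++j)
                if (s[j] >= 'a' && s[j] <= 'f')
                    s[j] = static_cast<char>(s[j] - 'a' + 'A');
            i += 2;  // continue after the triplet
        }
    }
    return s;
}
```

For a host, the lower-casing step is applied first and the triplet step second, so that hexadecimal digits end up in upper case.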

3.26 Percent Encoding Normalization [uri.definition.percent-encoding-normalization]

URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character. See IETF RFC 3986, section 6.2.2.2 and IETF RFC 3987, section 5.3.2.3.
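As a non-normative illustration, this step can be sketched as a pass that decodes only those triplets whose octet is an unreserved character and leaves all other triplets untouched. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
inline bool is_unreserved(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Decodes percent-encoded triplets that represent unreserved characters.
inline std::string decode_unreserved(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            int hi = hex_value(s[i + 1]);
            int lo = hex_value(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                char decoded = static_cast<char>(hi * 16 + lo);
                if (is_unreserved(decoded)) {
                    out += decoded;
                    i += 2;  // skip the two hexadecimal digits
                    continue;
                }
            }
        }
        out += s[i];  // copy everything else unchanged
    }
    return out;
}
```

Reserved characters such as "/" stay encoded, because decoding them could change how the URI is split into parts.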

3.27 Path Segment Normalization [uri.definition.path-segment-normalization]

Dot segments [uri.definition.dot-segments] should be removed from the paths of URIs that are not relative references. See IETF RFC 3986, section 6.2.2.3 and IETF RFC 3987, section 5.3.2.4.

3.28 Character Normalization [uri.definition.character-normalization]

In Unicode, different sequences of characters could be defined as equivalent depending on how they are encoded. See IETF RFC 3987, section 5.3.2.2.

3.29 IPv6 Zone IDs [uri.definition.ipv6-zone-ids]

A zone index is used to identify to which scope a non-global address belongs in an IPv6 address. It is specified in IETF RFC 6874.

4 Requirements [uri.requirements]

Template parameters named InputIterator shall meet the C++ Standard’s library input iterator requirements ([input.iterators]) and shall have a value type that is one of the encoded character types.

The uri class must be able to parse according to the rules described in IETF RFC 3986, Section 3.

The uri class must be able to correctly parse IPv6 addresses, described in IETF RFC 2732.

The uri class must be able to parse Internationalized Resource Identifiers (IRIs) according to IETF RFC 3987, section 2.

The uri class must be able to parse zone IDs in IPv6 addresses according to IETF RFC 6874, section 2.

5 Header <experimental/uri> Synopsis [uri.header-synopsis]

#include <string>        // std::basic_string
#include <system_error>  // std::error_code
#include <iosfwd>        // std::basic_istream, std::basic_ostream
#include <iterator>      // std::iterator_traits
#include <memory>        // std::allocator
#include <optional>      // std::optional
#include <functional>    // std::hash

namespace std {
namespace experimental {
// class declarations
class uri;
class uri_builder;
class uri_syntax_error;
class base_uri_error;
class uri_builder_error;
class percent_decoding_error;

enum class uri_error {
 // uri syntax errors
 invalid_syntax = 1,

 // uri reference and resolution errors
 base_uri_is_empty,
 base_uri_is_not_absolute,
 base_uri_is_opaque,
 base_uri_does_not_match,

 // builder errors
 invalid_uri,
 invalid_scheme,
 invalid_user_info,
 invalid_host,
 invalid_port,
 invalid_path,
 invalid_query,
 invalid_fragment,

 // decoding errors
 not_enough_input,
 non_hex_input,
 conversion_failed,
};

// factory functions
template <class Source>
uri make_uri(const Source& source, std::error_code& ec);
template <class InputIterator>
uri make_uri(InputIterator first, InputIterator last, std::error_code& ec);
template <class Source, class Allocator>
uri make_uri(const Source& source, const Allocator& alloc, std::error_code& ec);
template <class InputIterator, class Allocator>
uri make_uri(InputIterator first, InputIterator last, const Allocator& alloc,
             std::error_code& ec);

// equality and comparison operators
bool operator== (const uri& lhs, const uri& rhs) noexcept;
bool operator!= (const uri& lhs, const uri& rhs) noexcept;
bool operator<  (const uri& lhs, const uri& rhs) noexcept;
bool operator>  (const uri& lhs, const uri& rhs) noexcept;
bool operator<= (const uri& lhs, const uri& rhs) noexcept;
bool operator>= (const uri& lhs, const uri& rhs) noexcept;

// stream operators
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_ostream<CharT, CharTraits>&
operator<< (std::basic_ostream<CharT, CharTraits>& os, const uri& u);
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_istream<CharT, CharTraits>&
operator>> (std::basic_istream<CharT, CharTraits>& is, uri& u);

// swap functions
void swap(uri& lhs, uri& rhs) noexcept;
} // namespace experimental

template <>
struct hash<experimental::uri> {
 size_t operator () (const experimental::uri &u) const;
};

template <>
struct is_error_code_enum<experimental::uri_error> : public true_type { };
} // namespace std

5.1 Declarations [uri.header-synopsis.declarations]

The <experimental/uri> header contains declarations for a uri class, a uri_builder class and the exception classes uri_syntax_error, base_uri_error, uri_builder_error and percent_decoding_error in the std::experimental namespace.

5.2 Factory functions [uri.header-synopsis.factory-functions]

// factory functions
template <class Source>
uri make_uri(const Source& source, std::error_code& ec);
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object.
template <class InputIterator>
uri make_uri(InputIterator first, InputIterator last, std::error_code& ec);
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object.
template <class Source, class Allocator>
uri make_uri(const Source& source, const Allocator& alloc, std::error_code& ec);
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object. All memory allocation shall be performed by alloc.
template <class InputIterator, class Allocator>
uri make_uri(InputIterator first, InputIterator last, const Allocator& alloc,
             std::error_code& ec);
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object. All memory allocation shall be performed by alloc.

5.3 Equality and Comparison Operators [uri.header-synopsis.equality-comparison]

bool operator== (const uri& lhs, const uri& rhs) noexcept;
bool operator!= (const uri& lhs, const uri& rhs) noexcept;
Effects: The equality and inequality operators perform a string comparison of the two URIs.
Returns: lhs.compare(rhs) == 0 and !(lhs == rhs), respectively.
bool operator<  (const uri& lhs, const uri& rhs) noexcept;
bool operator>  (const uri& lhs, const uri& rhs) noexcept;
bool operator<= (const uri& lhs, const uri& rhs) noexcept;
bool operator>= (const uri& lhs, const uri& rhs) noexcept;
Effects: The relational operators perform a string comparison of the two URIs.
Returns: lhs.compare(rhs) < 0, rhs < lhs, !(rhs < lhs) and !(lhs < rhs), respectively.

5.4 Stream Operators [uri.header-synopsis.stream-operators]

template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_ostream<CharT, CharTraits>&
operator<< (std::basic_ostream<CharT, CharTraits>& os, const uri& u);
Effects: os << u.to_string<CharT, CharTraits>();
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_istream<CharT, CharTraits>&
operator>> (std::basic_istream<CharT, CharTraits>& is, uri& u);
Effects: basic_string<CharT, CharTraits> tmp; is >> tmp; std::error_code ec; u = make_uri(tmp, ec); if (ec) is.setstate(ios_base::failbit);
Throws: std::bad_alloc

5.5 Swap [uri.header-synopsis.swap]

void swap(uri& lhs, uri& rhs) noexcept;
Effects: lhs.swap(rhs);

5.6 URI Hash [uri.header-synopsis.hash]

template <>
struct hash<experimental::uri> {
 size_t operator () (const experimental::uri &u) const;
};
Returns: A hash value of uri u.

5.7 Error code enumeration [uri.error-code-enumeration]

template <>
struct is_error_code_enum<experimental::uri_error> : public true_type { };
Effects: Allows uri_error values to be used in std::error_code.
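As a non-normative illustration, the usual way to make an enumeration usable with std::error_code is sketched below: an error category, a make_error_code overload found by argument-dependent lookup, and the is_error_code_enum specialization. The sketch namespace, the category class, the reduced enumerator list and the message strings are assumptions made for this example and are not specified by this proposal.

```cpp
#include <cassert>
#include <string>
#include <system_error>

namespace sketch {
// Reduced stand-in for the uri_error enumeration of the synopsis.
enum class uri_error { invalid_syntax = 1, base_uri_is_empty, non_hex_input };

// Hypothetical category that gives the values a name and a message.
class uri_category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "uri"; }
    std::string message(int ev) const override {
        switch (static_cast<uri_error>(ev)) {
            case uri_error::invalid_syntax:    return "invalid URI syntax";
            case uri_error::base_uri_is_empty: return "base URI is empty";
            case uri_error::non_hex_input:     return "non-hexadecimal input";
        }
        return "unknown URI error";
    }
};

inline const std::error_category& uri_category() {
    static const uri_category_impl instance{};
    return instance;
}

// Found by argument-dependent lookup when an error_code is constructed
// from a uri_error value.
inline std::error_code make_error_code(uri_error e) {
    return std::error_code(static_cast<int>(e), uri_category());
}
}  // namespace sketch

namespace std {
template <>
struct is_error_code_enum<sketch::uri_error> : public true_type {};
}  // namespace std
```

With these pieces in place, a uri_error value converts implicitly to std::error_code and can be compared against one.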

6 Error Reporting [uri.error-reporting]

Some URI functions provide two overloads, one that throws an exception to report errors, and a second that sets a std::error_code.

Member functions of uri not having a parameter of type std::error_code& report errors as follows, unless otherwise specified:

  • A uri_syntax_error or base_uri_error is thrown, depending on the context.
  • Failure to allocate storage is reported by throwing an exception as described in the C++ standard, 17.6.4.10 [res.on.exception.handling].

Functions that have a parameter of type std::error_code& report errors as follows:

  • If a parsing error indicates an invalid URI, or a URI relative reference is passed as a base URI, the std::error_code& argument is set as appropriate for the specific error. Otherwise, clear is called on the error_code& argument.
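As a non-normative illustration, the two-overload convention can be sketched with a hypothetical check_scheme operation that tests the scheme grammar of IETF RFC 3986, section 3.1. The function name, the use of std::errc::invalid_argument and the use of std::system_error in place of uri_syntax_error are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Non-throwing overload: ec is set on failure and cleared on success.
// Grammar checked: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).
inline bool check_scheme(const std::string& s, std::error_code& ec) noexcept {
    auto alpha = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    auto digit = [](char c) { return c >= '0' && c <= '9'; };
    bool ok = !s.empty() && alpha(s[0]);
    for (std::size_t i = 1; ok && i < s.size(); ++i)
        ok = alpha(s[i]) || digit(s[i]) ||
             s[i] == '+' || s[i] == '-' || s[i] == '.';
    if (ok) ec.clear();
    else ec = std::make_error_code(std::errc::invalid_argument);
    return ok;
}

// Throwing overload, implemented on top of the non-throwing one.
inline void check_scheme(const std::string& s) {
    std::error_code ec;
    check_scheme(s, ec);
    if (ec) throw std::system_error(ec, "invalid scheme");
}
```

Writing the throwing overload in terms of the error_code overload keeps the validation logic in one place, and a caller that passes an error_code always receives a cleared object on success.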

7 Class uri [class.uri]

namespace std {
namespace experimental {
class uri {

public:

    // typedefs
    typedef *unspecified* string_type;
    typedef *unspecified* iterator;
    typedef *unspecified* const_iterator;
    typedef std::iterator_traits<iterator>::value_type value_type;
    typedef basic_string_view<value_type> string_view;

    // constructors and destructor
    uri();
    template <class Source, class Allocator = std::allocator<value_type>>
    explicit uri(const Source& source, const Allocator& alloc = Allocator());
    template <typename InputIterator, class Allocator = std::allocator<value_type>>
    uri(InputIterator begin, InputIterator end, const Allocator& alloc = Allocator());
    uri(const uri& other);
    uri(uri&& other) noexcept;
    ~uri() noexcept;

    // assignment
    uri& operator= (const uri& other);
    uri& operator= (uri&& other) noexcept;

    // modifiers
    void swap(uri& other) noexcept;

    // iterators
    const_iterator begin() const;
    const_iterator end() const;
    const_iterator cbegin() const;
    const_iterator cend() const;

    // accessors
    std::optional<string_view> scheme() const;
    std::optional<string_view> user_info() const;
    std::optional<string_view> host() const;
    std::optional<string_view> port() const;
    template <typename IntT>
    std::optional<IntT> port(typename std::is_integral<IntT>::type* = 0) const;
    std::optional<string_view> path() const;
    std::optional<string_view> authority() const;
    std::optional<string_view> query() const;
    std::optional<string_view> fragment() const;

    // string accessors
    template <typename CharT,
              class CharTraits = std::char_traits<CharT>,
              class Allocator = std::allocator<CharT>>
    std::basic_string<CharT, CharTraits, Allocator>
    to_string(const Allocator& alloc = Allocator()) const;
    std::string string() const;
    std::wstring wstring() const;
    std::string u8string() const;
    std::u16string u16string() const;
    std::u32string u32string() const;

    // query
    bool empty() const noexcept;
    bool is_absolute() const;
    bool is_opaque() const;

    // transformers
    uri make_relative(const uri& base) const;
    template <class Allocator>
    uri make_relative(const uri& base, const Allocator& alloc) const;
    uri make_relative(const uri& base, std::error_code& ec) const;
    template <class Allocator>
    uri make_relative(const uri& base, const Allocator& alloc,
                      std::error_code& ec) const;

    uri resolve(const uri& base) const;
    template <class Allocator>
    uri resolve(const uri& base, const Allocator& alloc) const;
    uri resolve(const uri& base, std::error_code& ec) const;
    template <class Allocator>
    uri resolve(const uri& base, const Allocator& alloc,
                std::error_code& ec) const;

    // comparison
    int compare(const uri& other) const noexcept;

    // percent encoding and decoding
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_user_info(InputIterator begin, InputIterator end,
                                           OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_host(InputIterator begin, InputIterator end,
                                      OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_port(InputIterator begin, InputIterator end,
                                      OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_path(InputIterator begin, InputIterator end,
                                      OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_query(InputIterator begin, InputIterator end,
                                       OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_fragment(InputIterator begin, InputIterator end,
                                          OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator decode(InputIterator begin, InputIterator end,
                                 OutputIterator out);

};
} // namespace experimental
} // namespace std

7.1 uri Requirements [class.uri.reqs]

string_type is unspecified and is not required to be a contiguous memory block. As a consequence, iterator and const_iterator are also unspecified. Should an implementor decide to use a contiguous string (e.g. std::string), iterator and const_iterator can be string_type::const_iterator. Each URI part is required to be a contiguous memory block.

Function template parameters named Source shall be one of:

  • basic_string<CharT, CharTraits, Allocator>. The type CharT shall be an encoded character type. A function argument const Source& source shall have an effective range [cbegin(source), cend(source)).
  • basic_string_view<CharT, CharTraits>. The type CharT shall be an encoded character type. A function argument const Source& source shall have an effective range [cbegin(source), cend(source)).
  • A type meeting the input iterator requirements that iterates over an NTCTS [defns.ntcts]. The value type shall be an encoded character type. A function argument InputIterator begin shall have an effective range [begin, end) where end is the first iterator value with an element value equal to iterator_traits<InputIterator>::value_type().
  • A character array that after array-to-pointer decay results in a pointer to an NTCTS. The value type shall be an encoded character type. A function argument const Source& source shall have an effective range [source, end) where end is the first iterator value with an element value equal to iterator_traits<decay<Source>::type>::value_type().

Arguments of type Source shall not be null pointers.

Note

This wording is similar to the filesystem path requirements in N3693 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3693.html#path-Requirements).

A std::experimental::uri object is always normalized upon construction.

Case normalization must be performed according to IETF RFC 3986, section 6.2.2.1 and IETF RFC 3987, section 5.3.2.1.

Percent encoding normalization must be performed according to IETF RFC 3986, section 6.2.2.2 and IETF RFC 3987, section 5.3.2.3.

Removing dot segments (”.”, ”..”) from a path must conform to IETF RFC 3986, section 5.2.4.

URI References returned by std::experimental::uri::make_relative must be transformed by using the algorithm in IETF RFC 3986, section 5.2.2.

7.2 typedefs [class.uri.typedefs]

typedef *unspecified* string_type;
typedef *unspecified* iterator;
typedef *unspecified* const_iterator;
typedef std::iterator_traits<iterator>::value_type value_type;
typedef basic_string_view<value_type> string_view;

The string_type, iterator and const_iterator types are left unspecified.

7.3 uri members [class.uri.members]

7.3.1 uri constructors [class.uri.members.constructors]

uri();
Effects: Constructs an object of class uri.
Postconditions: empty()
uri(const uri& other);
Effects: Constructs a uri object with the underlying string and parts copied.
Throws: std::bad_alloc
uri(uri&& other) noexcept;
Effects: Constructs a uri object with the underlying string and parts moved.
template <class Source, class Allocator = std::allocator<value_type>>
uri(const Source& source, const Allocator& alloc = Allocator());
Effects: The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. All memory allocation shall be performed by alloc.
Postconditions: !empty() && is_absolute()
Throws: uri_syntax_error if source is not a valid URI string, std::bad_alloc
template <typename InputIterator, class Allocator = std::allocator<value_type>>
uri(InputIterator begin, InputIterator end, const Allocator& alloc = Allocator());
Effects: The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. All memory allocation shall be performed by alloc.
Postconditions: !empty() && is_absolute()
Throws: uri_syntax_error if the string in the range [begin, end) is not a valid URI string, std::bad_alloc

7.3.2 uri assignment [class.uri.members.assignment]

uri& operator= (const uri& other);
Effects: Assigns a uri object with the underlying string and parts copied.
Throws: std::bad_alloc
uri& operator= (uri&& other) noexcept;
Effects: Assigns a uri object with the underlying string and parts moved.

7.3.3 uri modifiers [uri.members.modifiers]

void swap(uri& other) noexcept;
Effects: Swaps the contents of this object with the other.

7.3.4 uri iterators [uri.members.iterators]

const_iterator begin() const;
Returns: A const_iterator to the first element in the underlying string container.
const_iterator end() const;
Returns: A const_iterator to the end of the underlying string container.
const_iterator cbegin() const;
Returns: A const_iterator to the first element in the underlying string container.
const_iterator cend() const;
Returns: A const_iterator to the end of the underlying string container.

7.3.5 uri accessors [uri.members.accessors]

std::optional<string_view> scheme() const;
Returns: A std::optional<string_view> object which spans the range of the scheme in the underlying URI. If the scheme is not specified, it returns nullopt.
std::optional<string_view> user_info() const;
Returns: A std::optional<string_view> object which spans the range of the user info in the underlying URI. If the user info is not specified, it returns nullopt.
std::optional<string_view> host() const;
Returns: A std::optional<string_view> object which spans the range of the host in the underlying URI. If the host is not specified, it returns nullopt.
std::optional<string_view> port() const;
Returns: A std::optional<string_view> object which spans the range of the port in the underlying URI. If the port is not specified, it returns nullopt.
template <typename IntT>
std::optional<IntT> port(typename std::is_integral<IntT>::type* = 0) const;
Returns: A std::optional<IntT> with the port value, if it is present. If the port is not specified, it returns nullopt.
Requires: is_integral<IntT>::value == true
std::optional<string_view> path() const;
Returns: A std::optional<string_view> object which spans the range of the path in the underlying URI. If the path is not specified, it returns nullopt.
std::optional<string_view> authority() const;
Returns: A std::optional<string_view> object which spans the range of the authority in the underlying URI. If the authority is not specified, it returns nullopt.
std::optional<string_view> query() const;
Returns: A std::optional<string_view> object which spans the range of the query in the underlying URI. If the query is not specified, it returns nullopt.
std::optional<string_view> fragment() const;
Returns: A std::optional<string_view> object which spans the range of the fragment in the underlying URI. If the fragment is not specified, it returns nullopt.
template <typename CharT,
          class CharTraits = std::char_traits<CharT>,
          class Allocator = std::allocator<CharT>>
std::basic_string<CharT, CharTraits, Allocator>
to_string(const Allocator& alloc = Allocator()) const;
Returns: A string object containing a copy of the underlying URI string. All memory allocation shall be performed by alloc.
std::string string() const;
Returns: A string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::wstring wstring() const;
Returns: A wstring object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::string u8string() const;
Returns: A UTF-8 encoded string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::u16string u16string() const;
Returns: A u16string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::u32string u32string() const;
Returns: A u32string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
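As a non-normative illustration, the integral port accessor described above can be sketched as a conversion from the optional port string to an optional integer. The port_as function is a hypothetical free function; the use of std::from_chars and the choice to return nullopt for an empty, non-numeric or out-of-range port are assumptions of this sketch.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <type_traits>

// Converts an optional port string to an optional integer of type IntT.
template <typename IntT>
std::optional<IntT> port_as(std::optional<std::string_view> port) {
    static_assert(std::is_integral<IntT>::value,
                  "IntT must be an integral type");
    if (!port || port->empty()) return std::nullopt;
    IntT value{};
    const char* first = port->data();
    const char* last = first + port->size();
    auto result = std::from_chars(first, last, value);
    // Reject trailing characters and values that do not fit into IntT.
    if (result.ec != std::errc() || result.ptr != last) return std::nullopt;
    return value;
}
```

Requesting a 16-bit unsigned type rejects a port such as "70000", because the value does not fit into that type.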

7.3.6 uri query [uri.members.query]

bool empty() const noexcept;
Returns: true if the underlying string object is empty, false otherwise.
bool is_absolute() const;
Returns: true if the URI is absolute. Equivalent to static_cast<bool>(scheme()).
bool is_opaque() const;
Returns: true if the URI is absolute and its scheme is not hierarchical (i.e. the scheme-specific part does not start with a double-slash “//” and its authority is empty).
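The two predicates above can be illustrated on a plain string. This is a minimal sketch, not the proposed implementation: the namespace sketch and the helper scheme_end are hypothetical, and the opaque test follows only the wording above (a scheme is present and the scheme-specific part does not start with "//").

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

// Returns the index of the ':' that ends a syntactically valid scheme
// (ALPHA followed by ALPHA / DIGIT / "+" / "-" / "."), or npos if none.
inline std::size_t scheme_end(const std::string& s) {
  if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) {
    return std::string::npos;
  }
  for (std::size_t i = 1; i < s.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c == ':') return i;
    if (!(std::isalnum(c) || c == '+' || c == '-' || c == '.')) {
      return std::string::npos;
    }
  }
  return std::string::npos;
}

// Absolute: the reference starts with a scheme.
inline bool is_absolute(const std::string& s) {
  return scheme_end(s) != std::string::npos;
}

// Opaque: absolute, and the scheme-specific part does not start with "//".
inline bool is_opaque(const std::string& s) {
  const std::size_t colon = scheme_end(s);
  if (colon == std::string::npos) return false;
  return s.compare(colon + 1, 2, "//") != 0;
}

}  // namespace sketch
```

Under these assumptions, "mailto:john@example.com" is both absolute and opaque, while "http://example.com/" is absolute but hierarchical.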

7.3.7 uri transformers [uri.members.transformers]

This proposal specifies two transformer functions: make_relative and resolve.

uri make_relative(const uri& base) const;
Effects: Returns a relative URI reference against base. A base_uri_error is thrown if base is a relative reference.
Postconditions: !u.make_relative(base).is_absolute()
Returns: A relative URI reference.
Throws: std::bad_alloc
template <class Allocator>
uri make_relative(const uri& base, const Allocator& alloc) const;
Effects: Returns a relative URI reference against base. All memory allocation shall be performed by alloc. A base_uri_error is thrown if base is a relative reference.
Postconditions: !u.make_relative(base, alloc).is_absolute()
Returns: A relative URI reference.
Throws: std::bad_alloc
uri make_relative(const uri& base, std::error_code& ec) const;
Effects: Returns a relative URI reference against base. ec is set on error and make_relative returns an empty uri object.
Returns: A relative URI reference.
template <class Allocator>
uri make_relative(const uri& base, const Allocator &alloc,
                  std::error_code& ec) const;
Effects: Returns a relative URI reference against base. All memory allocation shall be performed by alloc. ec is set on error and make_relative returns an empty uri object.
Returns: A relative URI reference.
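The error_code overloads can be pictured with a deliberately simplified sketch that works on raw strings. It assumes that "relative against base" means stripping base as a literal prefix, which is only one reading of the base_uri_does_not_match error in clause 13; a real implementation would compare parsed components. The names make_relative, relativize_error and the namespace sketch are hypothetical.

```cpp
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

// Stand-in for the std::error_code out-parameter used in the draft.
enum class relativize_error { none, base_uri_is_empty, base_uri_does_not_match };

// Strips `base` from the front of `uri`. On error, sets `err` and
// returns an empty string, mirroring the "empty uri object" wording.
inline std::string make_relative(const std::string& uri,
                                 const std::string& base,
                                 relativize_error& err) {
  err = relativize_error::none;
  if (base.empty()) {
    err = relativize_error::base_uri_is_empty;
    return std::string();
  }
  if (uri.compare(0, base.size(), base) != 0) {
    err = relativize_error::base_uri_does_not_match;
    return std::string();
  }
  return uri.substr(base.size());
}

}  // namespace sketch
```

The caller inspects err after the call instead of catching an exception, which is the point of the non-throwing overloads.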
uri resolve(const uri& base) const;
Effects: Resolves this URI reference against base: pre-parses base according to IETF RFC 3986, section 5.2.1, then transforms the reference according to IETF RFC 3986, section 5.2.2. The resolved uri object is returned. A base_uri_error is thrown if base is invalid (13.2 URI Reference and Resolution Errors).
Postconditions: u.resolve(base).is_absolute()
Throws: std::bad_alloc
template <class Allocator>
uri resolve(const uri& base, const Allocator& alloc) const;
Effects: Resolves this URI reference against base: pre-parses base according to IETF RFC 3986, section 5.2.1, then transforms the reference according to IETF RFC 3986, section 5.2.2. The resolved uri object is returned. All memory allocation shall be performed by alloc. A base_uri_error is thrown if base is invalid (13.2 URI Reference and Resolution Errors).
Postconditions: u.resolve(base, alloc).is_absolute()
Throws: std::bad_alloc
uri resolve(const uri& base, std::error_code& ec) const;
Effects: Resolves this URI reference against base: pre-parses base according to IETF RFC 3986, section 5.2.1, then transforms the reference according to IETF RFC 3986, section 5.2.2. The resolved uri object is returned. ec is set on error and resolve returns an empty uri object.
Throws: std::bad_alloc
template <class Allocator>
uri resolve(const uri& base, const Allocator& alloc, std::error_code& ec) const;
Effects: Resolves this URI reference against base: pre-parses base according to IETF RFC 3986, section 5.2.1, then transforms the reference according to IETF RFC 3986, section 5.2.2. The resolved uri object is returned. All memory allocation shall be performed by alloc. ec is set on error and resolve returns an empty uri object.
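The transformation in IETF RFC 3986, section 5.2.2 relies on the remove_dot_segments routine of section 5.2.4. The following is a sketch of that one step only, operating on a path string; the namespace sketch and the helper starts_with are hypothetical, and the full resolution algorithm (merging paths, inheriting scheme and authority) is not shown.

```cpp
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

inline bool starts_with(const std::string& s, const char* prefix) {
  return s.compare(0, std::char_traits<char>::length(prefix), prefix) == 0;
}

// Removes "." and ".." segments from a path (RFC 3986, section 5.2.4).
inline std::string remove_dot_segments(std::string in) {
  std::string out;
  while (!in.empty()) {
    if (starts_with(in, "../")) {                         // step A
      in.erase(0, 3);
    } else if (starts_with(in, "./")) {                   // step A
      in.erase(0, 2);
    } else if (starts_with(in, "/./") || in == "/.") {    // step B
      in.erase(0, 2);
      if (in.empty()) in = "/";
    } else if (starts_with(in, "/../") || in == "/..") {  // step C
      in.erase(0, 3);
      if (in.empty()) in = "/";
      // Drop the last output segment together with its leading '/'.
      const std::size_t slash = out.rfind('/');
      out.erase(slash == std::string::npos ? 0 : slash);
    } else if (in == "." || in == "..") {                 // step D
      in.clear();
    } else {                                              // step E
      // Move the first segment (with its leading '/', if any) to the output.
      const std::size_t next = in.find('/', 1);
      out.append(in, 0, next);
      in.erase(0, next);
    }
  }
  return out;
}

}  // namespace sketch
```

With the examples given in the RFC, "/a/b/c/./../../g" becomes "/a/g" and "mid/content=5/../6" becomes "mid/6".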

7.3.8 uri comparison [uri.members.comparison]

int compare(const uri& other) const;
Returns: -1 if the value of this is lexicographically less than the value of other, 0 if they are considered equal, and 1 if this is greater.
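Because normalization is an invariant of the uri class, a comparison can plausibly be reduced to a lexicographic comparison of the stored strings. The sketch below assumes exactly that and shows how the result of std::string::compare, which may be any negative or positive value, is clamped to -1, 0 or 1. The free function compare in namespace sketch is hypothetical.

```cpp
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

// Three-way comparison of two already-normalized URI strings.
inline int compare(const std::string& lhs, const std::string& rhs) {
  const int r = lhs.compare(rhs);  // sign carries the ordering
  return (r > 0) - (r < 0);        // clamp to -1, 0 or 1
}

}  // namespace sketch
```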

7.3.9 uri percent encoding [uri.members.percent]

template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_user_info(InputIterator begin, InputIterator end,
                                       OutputIterator out);
Effects: Encodes special characters for the user_info part (IETF RFC 3986, section 2.1).
Returns: An iterator to the last element of a user_info string that has been encoded.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_host(InputIterator begin, InputIterator end,
                                  OutputIterator out);
Effects: Encodes special characters for the host part (IETF RFC 3986, section 2.1).
Returns: An iterator to the last element of a host string that has been encoded.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_port(InputIterator begin, InputIterator end,
                                  OutputIterator out);
Effects: Encodes special characters for the port part (IETF RFC 3986, section 2.1).
Returns: An iterator to the last element of a port string that has been encoded.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_path(InputIterator begin, InputIterator end,
                                  OutputIterator out);
Effects: Encodes special characters for the path part (IETF RFC 3986, section 2.1).
Returns: An iterator to the last element of a path string that has been encoded.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_query(InputIterator begin, InputIterator end,
                                   OutputIterator out);
Effects: Encodes special characters for the query part (IETF RFC 3986, section 2.1).
Returns: An iterator to the last element of a query string that has been encoded.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_fragment(InputIterator begin, InputIterator end,
                                      OutputIterator out);
Effects: Encodes special characters for the fragment part (IETF RFC 3986, section 2.1).
Returns: An iterator to the last element of a fragment string that has been encoded.
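The encode functions share one shape: read a character range, write to an output iterator, and return the advanced output iterator. The sketch below assumes the most conservative rule, where only the unreserved characters of IETF RFC 3986 pass through unchanged; the per-component functions in this proposal would each allow a larger set. The names encode_conservative and is_unreserved, and the namespace sketch, are hypothetical.

```cpp
#include <iterator>
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

// unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
inline bool is_unreserved(unsigned char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' ||
         c == '~';
}

// Percent-encodes every character that is not unreserved.
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_conservative(InputIterator begin, InputIterator end,
                                   OutputIterator out) {
  static const char hex[] = "0123456789ABCDEF";
  for (; begin != end; ++begin) {
    const unsigned char c = static_cast<unsigned char>(*begin);
    if (is_unreserved(c)) {
      *out++ = static_cast<char>(c);
    } else {
      *out++ = '%';
      *out++ = hex[c >> 4];    // high nibble
      *out++ = hex[c & 0x0F];  // low nibble
    }
  }
  return out;  // one past the last character written
}

}  // namespace sketch
```

A typical call passes std::back_inserter of a std::string as the output iterator, so the caller does not need to size the destination in advance.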
template <typename InputIterator, typename OutputIterator>
static OutputIterator decode(InputIterator begin, InputIterator end,
                             OutputIterator out);
Effects: Decodes percent-encoded characters in the source string and writes the decoded characters to out (IETF RFC 3986, section 2.1).
Returns: An iterator to the last element of a uri string that has been decoded.
Throws: percent_decoding_error when the input is exhausted, when the input is not a hexadecimal character, or when the decoding conversion fails.
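The first two failure modes of decode map directly onto control flow in the decoder. The sketch below is an assumption about how that could look: decoding_failure and decode_error are hypothetical stand-ins for the exception and error enumerators of this proposal, and the conversion_failed case (for example an invalid UTF-8 sequence) is not modelled.

```cpp
#include <iterator>
#include <stdexcept>
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

enum class decode_error { not_enough_input, non_hex_input };

struct decoding_failure : std::runtime_error {
  decode_error code;
  decoding_failure(decode_error c, const char* msg)
      : std::runtime_error(msg), code(c) {}
};

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

template <typename InputIterator, typename OutputIterator>
OutputIterator decode(InputIterator begin, InputIterator end,
                      OutputIterator out) {
  while (begin != end) {
    if (*begin != '%') {  // ordinary character: copy through
      *out++ = *begin++;
      continue;
    }
    ++begin;  // skip '%'
    int value = 0;
    for (int i = 0; i < 2; ++i) {  // expect exactly two hex digits
      if (begin == end) {
        throw decoding_failure(decode_error::not_enough_input,
                               "truncated percent-encoded triplet");
      }
      const int digit = hex_value(*begin++);
      if (digit < 0) {
        throw decoding_failure(decode_error::non_hex_input,
                               "non-hexadecimal digit after '%'");
      }
      value = value * 16 + digit;
    }
    *out++ = static_cast<char>(value);
  }
  return out;
}

}  // namespace sketch
```

Given "%2" the sketch reports not_enough_input, and given "%GG" it reports non_hex_input, matching the examples in clause 13.4.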

8 Class uri_builder [class.uri_builder]

namespace std {
namespace experimental {
class uri_builder {

public:

    // constructors and destructor
    uri_builder();
    explicit uri_builder(const uri& base);
    template <class Source>
    explicit uri_builder(const Source& base);
    template <class InputIterator>
    uri_builder(InputIterator begin, InputIterator end);
    uri_builder(const uri_builder& other);
    uri_builder(uri_builder&& other) noexcept;
    ~uri_builder() noexcept;

    // assignment
    uri_builder& operator=(const uri_builder& other);
    uri_builder& operator=(uri_builder&& other) noexcept;

    // modifiers
    void swap(uri_builder& other) noexcept;

    // setters
    template <class Source>
    uri_builder& scheme(const Source& scheme);
    template <class InputIterator>
    uri_builder& scheme(InputIterator begin, InputIterator end);
    template <class Source>
    uri_builder& user_info(const Source& user_info);
    template <class InputIterator>
    uri_builder& user_info(InputIterator begin, InputIterator end);
    template <class Source>
    uri_builder& host(const Source& host);
    template <class InputIterator>
    uri_builder& host(InputIterator begin, InputIterator end);
    template <class Source>
    uri_builder& port(const Source& port);
    template <class InputIterator>
    uri_builder& port(InputIterator begin, InputIterator end);
    template <class Source>
    uri_builder& authority(const Source& authority);
    template <class InputIterator>
    uri_builder& authority(InputIterator begin, InputIterator end);
    template <class UserInfoSource, class HostSource, class PortSource>
    uri_builder& authority(const UserInfoSource& user_info,
                           const HostSource& host, const PortSource& port);
    template <class Source>
    uri_builder& path(const Source& path);
    template <class InputIterator>
    uri_builder& path(InputIterator begin, InputIterator end);
    template <class Source>
    uri_builder& append_path(const Source& path);
    template <class InputIterator>
    uri_builder& append_path(InputIterator begin, InputIterator end);
    template <class Source>
    uri_builder& query(const Source& query);
    template <class InputIterator>
    uri_builder& query(InputIterator begin, InputIterator end);
    template <class KeySource, class ParamSource>
    uri_builder& append_query(const KeySource& key, const ParamSource& param);
    template <class Source>
    uri_builder& fragment(const Source& fragment);
    template <class InputIterator>
    uri_builder& fragment(InputIterator begin, InputIterator end);

    // builder
    std::experimental::uri uri() const;

};
} // namespace experimental
} // namespace std

8.1 uri_builder requirements [class.uri_builder.requirements]

Function template parameters named Source shall be one of:

  • basic_string<CharT, CharTraits, Allocator>. The type CharT shall be an encoded character type. A function argument const Source& source shall have an effective range [cbegin(source), cend(source)).
  • basic_string_view<CharT, CharTraits>. The type CharT shall be an encoded character type. A function argument const Source& source shall have an effective range [cbegin(source), cend(source)).
  • A type meeting the input iterator requirements that iterates over an NTCTS [defns.ntcts]. The value type shall be an encoded character type. A function argument InputIterator begin shall have an effective range [begin, end) where end is the first iterator value with an element value equal to iterator_traits<InputIterator>::value_type().
  • A type that is convertible to std::experimental::uri::string_type by a means that can be chosen by the implementation.

Arguments of type Source shall not be null pointers.

The URI must be built according to component recomposition rules in IETF RFC 3986, section 5.3.
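The recomposition rules of IETF RFC 3986, section 5.3 are short enough to sketch directly. The types component and parts and the function recompose are hypothetical; a plain flag is used to distinguish an undefined component from an empty one, which is the distinction the RFC pseudocode depends on.

```cpp
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

// A component that may be undefined (distinct from defined-but-empty).
struct component {
  bool defined = false;
  std::string value;
  component() = default;
  component(const char* v) : defined(true), value(v) {}
};

struct parts {
  component scheme, authority, query, fragment;
  std::string path;  // the path is always present, possibly empty
};

// Component recomposition, following RFC 3986, section 5.3.
inline std::string recompose(const parts& p) {
  std::string result;
  if (p.scheme.defined) {
    result += p.scheme.value;
    result += ':';
  }
  if (p.authority.defined) {
    result += "//";
    result += p.authority.value;
  }
  result += p.path;
  if (p.query.defined) {
    result += '?';
    result += p.query.value;
  }
  if (p.fragment.defined) {
    result += '#';
    result += p.fragment.value;
  }
  return result;
}

}  // namespace sketch
```

Note that an undefined authority produces no "//", which is how a reference such as "mailto:john@example.com" recomposes correctly.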

8.2 uri_builder constructors [class.uri_builder.constructors]

uri_builder();
Effects: Constructs a uri_builder object.
uri_builder(const uri& base);
Effects: Constructs a uri_builder object from a base URI.
Throws: std::bad_alloc
template <class Source>
uri_builder(const Source& base);
template <class InputIterator>
uri_builder(InputIterator begin, InputIterator end);
Effects: Constructs a uri_builder object from a base URI.
Throws: std::bad_alloc
uri_builder(const uri_builder& other);
Effects: Constructs a uri_builder object with the underlying parts copied.
Throws: std::bad_alloc
uri_builder(uri_builder&& other) noexcept;
Effects: Constructs a uri_builder object with the underlying parts moved.

8.3 uri_builder assignment [class.uri_builder.members.assignment]

uri_builder& operator= (const uri_builder& other);
Effects: Assigns a uri_builder object with the underlying string and parts copied.
Throws: std::bad_alloc
uri_builder& operator= (uri_builder&& other) noexcept;
Effects: Assigns a uri_builder object with the underlying string and parts moved.

8.4 uri_builder members [class.uri_builder.members]

void swap(uri_builder& other) noexcept;
Effects: Swaps the contents of this object with the other.
template <class Source>
uri_builder& scheme(const Source& scheme);
template <class InputIterator>
uri_builder& scheme(InputIterator begin, InputIterator end);
Effects: Sets the URI scheme.
template <class Source>
uri_builder& user_info(const Source& user_info);
template <class InputIterator>
uri_builder& user_info(InputIterator begin, InputIterator end);
Effects: Sets the URI user_info.
template <class Source>
uri_builder& host(const Source& host);
template <class InputIterator>
uri_builder& host(InputIterator begin, InputIterator end);
Effects: Sets the URI host.
template <class Source>
uri_builder& port(const Source& port);
template <class InputIterator>
uri_builder& port(InputIterator begin, InputIterator end);
Effects: Sets the URI port.
template <class Source>
uri_builder& authority(const Source& authority);
template <class InputIterator>
uri_builder& authority(InputIterator begin, InputIterator end);
Effects: Sets the URI authority.
template <class UserInfoSource, class HostSource, class PortSource>
uri_builder& authority(const UserInfoSource& user_info,
                       const HostSource& host, const PortSource& port);
Effects: Sets the URI user info, host and port.
template <class Source>
uri_builder& path(const Source& path);
template <class InputIterator>
uri_builder& path(InputIterator begin, InputIterator end);
Effects: Sets the URI path.
template <class Source>
uri_builder& append_path(const Source& path);
template <class InputIterator>
uri_builder& append_path(InputIterator begin, InputIterator end);
Effects: Appends an element to the uri object’s path.
template <class Source>
uri_builder& query(const Source& query);
template <class InputIterator>
uri_builder& query(InputIterator begin, InputIterator end);
Effects: Sets the URI query.
template <class KeySource, class ParamSource>
uri_builder& append_query(const KeySource& key, const ParamSource& param);
Effects: Appends a key-value pair to the uri object’s query.
template <class Source>
uri_builder& fragment(const Source& fragment);
template <class InputIterator>
uri_builder& fragment(InputIterator begin, InputIterator end);
Effects: Sets the URI fragment.
std::experimental::uri uri() const;
Effects: Builds a URI object from the provided parts. A URI built using this method should be normalized according to syntax-based normalization. This includes case normalization, percent-encoding normalization, character normalization and path segment normalization.
Throws: std::bad_alloc, or uri_builder_error if any of the parts are invalid and a valid uri cannot be formed.
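Every setter returns uri_builder&, so calls can be chained before the final uri() call. The sketch below shows that fluent style on a much smaller, hypothetical class mini_builder that produces a std::string. It assumes that append_path joins segments with "/" and that append_query joins pairs as key=value separated by "&"; the draft does not spell out either rule, and no normalization or percent-encoding is performed here.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {  // hypothetical namespace, not part of the proposal

class mini_builder {
 public:
  mini_builder& scheme(const std::string& s) { scheme_ = s; return *this; }
  mini_builder& host(const std::string& h) { host_ = h; return *this; }

  // Assumed rule: each appended segment is preceded by '/'.
  mini_builder& append_path(const std::string& segment) {
    path_ += '/';
    path_ += segment;
    return *this;
  }

  // Assumed rule: pairs are written as key=value and joined with '&'.
  mini_builder& append_query(const std::string& key,
                             const std::string& value) {
    if (!query_.empty()) query_ += '&';
    query_ += key;
    query_ += '=';
    query_ += value;
    return *this;
  }

  // Builds the string, rejecting input that cannot form a valid result.
  std::string str() const {
    if (scheme_.empty() || host_.empty()) {
      throw std::invalid_argument("scheme and host are required");
    }
    std::string result = scheme_ + "://" + host_ + path_;
    if (!query_.empty()) {
      result += '?';
      result += query_;
    }
    return result;
  }

 private:
  std::string scheme_, host_, path_, query_;
};

}  // namespace sketch
```

Deferring validation to the final build step, as str() does here, matches the way uri() is specified to throw uri_builder_error rather than the individual setters.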

9 Class uri_syntax_error [class.uri_syntax_error]

namespace std {
namespace experimental {
class uri_syntax_error : public std::system_error {
public:
    uri_syntax_error(const string& what_arg, error_code ec);
    virtual ~uri_syntax_error() noexcept;
    virtual const char *what() const noexcept;
};
} // namespace experimental
} // namespace std

9.1 uri_syntax_error members [class.uri_syntax_error.members]

9.1.1 uri_syntax_error constructors [class.uri_syntax_error.constructors]

uri_syntax_error(const string& what_arg, error_code ec);
Postconditions: what() == what_arg.c_str() && code() == ec

9.1.2 uri_syntax_error accessors [class.uri_syntax_error.accessors]

const char *what() const noexcept;
Returns: A string containing the message in the string passed as what_arg to the class constructor.

10 Class base_uri_error [class.base_uri_error]

namespace std {
namespace experimental {
class base_uri_error : public std::system_error {
public:
    base_uri_error(const string& what_arg, error_code ec);
    virtual ~base_uri_error() noexcept;
    virtual const char *what() const noexcept;
};
} // namespace experimental
} // namespace std

10.1 base_uri_error members [class.base_uri_error.members]

10.1.1 base_uri_error constructors [class.base_uri_error.constructors]

base_uri_error(const string& what_arg, error_code ec);
Postconditions: what() == what_arg.c_str() && code() == ec

10.1.2 base_uri_error accessors [class.base_uri_error.accessors]

const char *what() const noexcept;
Returns: A string containing the message in the string passed as what_arg to the class constructor.

11 Class uri_builder_error [class.uri_builder_error]

namespace std {
namespace experimental {
class uri_builder_error : public std::system_error {
public:
    uri_builder_error(const string& what_arg, error_code ec);
    virtual ~uri_builder_error() noexcept;
    virtual const char *what() const noexcept;
};
} // namespace experimental
} // namespace std

11.1 uri_builder_error members [class.uri_builder_error.members]

11.1.1 uri_builder_error constructors [class.uri_builder_error.constructors]

uri_builder_error(const string& what_arg, error_code ec);
Postconditions: what() == what_arg.c_str() && code() == ec

11.1.2 uri_builder_error accessors [class.uri_builder_error.accessors]

const char *what() const noexcept;
Returns: A string containing the message in the string passed as what_arg to the class constructor.

12 Class percent_decoding_error [class.percent_decoding_error]

namespace std {
namespace experimental {
class percent_decoding_error : public std::system_error {
public:
    percent_decoding_error(const string& what_arg, error_code ec);
    virtual ~percent_decoding_error() noexcept;
    virtual const char *what() const noexcept;
};
} // namespace experimental
} // namespace std

12.1 percent_decoding_error members [class.percent_decoding_error.members]

12.1.1 percent_decoding_error constructors [class.percent_decoding_error.constructors]

percent_decoding_error(const string& what_arg, error_code ec);
Postconditions: what() == what_arg.c_str() && code() == ec

12.1.2 percent_decoding_error accessors [class.percent_decoding_error.accessors]

const char *what() const noexcept;
Returns: A string containing the message in the string passed as what_arg to the class constructor.

13 Enum class uri_error [class.uri_error]

enum class uri_error {
 // uri syntax errors
 invalid_syntax = 1,

 // uri relative reference and resolution errors
 base_uri_is_empty,
 base_uri_is_not_absolute,
 base_uri_is_opaque,
 base_uri_does_not_match,

 // builder errors
 invalid_uri,
 invalid_scheme,
 invalid_user_info,
 invalid_host,
 invalid_port,
 invalid_path,
 invalid_query,
 invalid_fragment,

 // decoding errors
 not_enough_input,
 non_hex_input,
 conversion_failed,
};
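For these enumerators to be usable with the std::error_code overloads, they need an error category and a make_error_code overload. The draft text in this section does not show either, so the following is a sketch of the usual <system_error> idiom under that assumption. The enum is repeated in a hypothetical namespace sketch so the example is self-contained; uri_category and the message strings are likewise hypothetical.

```cpp
#include <string>
#include <system_error>

namespace sketch {  // hypothetical namespace, not part of the proposal

enum class uri_error {
  invalid_syntax = 1,
  base_uri_is_empty, base_uri_is_not_absolute, base_uri_is_opaque,
  base_uri_does_not_match,
  invalid_uri, invalid_scheme, invalid_user_info, invalid_host, invalid_port,
  invalid_path, invalid_query, invalid_fragment,
  not_enough_input, non_hex_input, conversion_failed
};

class uri_category_impl : public std::error_category {
 public:
  const char* name() const noexcept override { return "uri"; }
  std::string message(int ev) const override {
    switch (static_cast<uri_error>(ev)) {
      case uri_error::invalid_syntax: return "unable to parse URI string";
      case uri_error::base_uri_is_empty: return "base URI is empty";
      case uri_error::base_uri_is_not_absolute: return "base URI is not absolute";
      case uri_error::base_uri_is_opaque: return "base URI is opaque";
      default: return "URI error";  // remaining messages omitted in this sketch
    }
  }
};

// Single category instance shared by all uri_error codes.
inline const std::error_category& uri_category() {
  static const uri_category_impl instance{};
  return instance;
}

// Found by argument-dependent lookup when an error_code is built from the enum.
inline std::error_code make_error_code(uri_error e) {
  return std::error_code(static_cast<int>(e), uri_category());
}

}  // namespace sketch

namespace std {
// Opt the enum in to implicit conversion to std::error_code.
template <>
struct is_error_code_enum<sketch::uri_error> : true_type {};
}  // namespace std
```

With this in place, a function taking std::error_code& ec can simply assign an enumerator to ec, and callers can compare ec against an enumerator directly.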

13.1 URI Syntax Errors

13.1.1 invalid_syntax

This error is set when the parser is unable to parse the given URI string.

13.2 URI Reference and Resolution Errors

13.2.1 base_uri_is_empty

This error is set when the base URI passed to make_relative or resolve is empty.

13.2.2 base_uri_is_not_absolute

This error is set when the base URI passed to make_relative or resolve is not absolute (it is itself a relative reference).

13.2.3 base_uri_is_opaque

This error is set when the base URI passed to make_relative or resolve is opaque.

13.2.4 base_uri_does_not_match

This error is set when the base URI passed to make_relative does not match the prefix of the URI.

13.3 URI Builder Errors

13.3.1 invalid_uri

This error is set in the uri_builder when the builder is unable to construct a valid URI.

13.3.2 invalid_scheme

This error is set by the uri_builder if the scheme provided is invalid.

13.3.3 invalid_user_info

This error is set by the uri_builder if the user info provided is invalid.

13.3.4 invalid_host

This error is set by the uri_builder if the host provided is invalid.

13.3.5 invalid_port

This error is set by the uri_builder if the port provided is invalid.

13.3.6 invalid_path

This error is set by the uri_builder if the path provided is invalid.

13.3.7 invalid_query

This error is set by the uri_builder if the query provided is invalid.

13.3.8 invalid_fragment

This error is set by the uri_builder if the fragment provided is invalid.

13.4 Percent Decoding Errors

13.4.1 not_enough_input

This error is set when not enough input was given to the decoder to be able to decode the percent encoded string, e.g. %2.

13.4.2 non_hex_input

This error is set when non-hex input is given to the decoder, e.g. %GG.

13.4.3 conversion_failed

This error is set when the decoder was unable to convert the percent encoded string, e.g. %80.

Issues

Note

Issues

1. Scheme-Specific Normalization

There needs to be an extension point in order to allow scheme- and protocol- specific normalization.

2. empty() vs. is_absolute() vs. is_opaque()

In the minutes to the Chicago meeting, there was a suggestion that the is_ prefix is being applied inconsistently. The current way is consistent with at least the filesystem proposal, but clarification should be made with the LEWG.

3. Factory Functions

The make_uri factory functions are free functions, but the LEWG needs to clarify if they can remain this way or if they should be static members of uri.

4. Source Template Parameters

The Source template parameters seem overly generic; this will be taken up with the LEWG.

Acknowledgements

Note

C++ Network Library users and mailing list

Kyle Kloepper and Niklas Gustafsson for providing valuable feedback and encouragement, and for presenting different versions of this proposal at committee meetings.

Beman Dawes and his Filesystem proposal, which strongly influenced the class design.

Thiago Macieira of Qt for important feedback on the draft proposal.

Wikipedia, for being there.