Doc. no.:	N4449
Date:	2015-04-09
Project:	Programming Language C++, Library Evolution Working Group
Reply-to:	Zhihao Yuan <zhihao.yuan at rackspace dot com>

Message Digest Library for C++

Motivation

Cryptographic hash functions hold irreplaceable roles in a large variety of applications, since security and data integrity are topics that cannot be dismissed to the applications involving data exchanging. Being compared to the ordinary hash functions being used with the unordered containers, the cryptographic functions are significantly different in purposes and design, which result in different properties on interface, implementation, and performance. In order to satisfy such demands in C++, these functions should be standardized as a part of the C++ standard library.

Scope

This paper focuses on exposing the core functionalities of the cryptographic hash functions in an easy-to-use and efficient way; some use cases that can be provided in a higher level abstraction are not covered, including but not limited to Base64 digest, fingerprinting user-defined types, and key derivation.

This paper does not discuss std::hash, because to co-work with the unordered containers is not a main purpose of cryptographic hash functions.

This paper has no direct relationship with N3980, but some functionalities described here can serve as the underlying implementation of a HashAlgorithm. More details can be found in Design Decisions.

Examples

namespace hashlib = std::hashlib;

hashlib::sha1 h;

assert(h.hexdigest() == "da39a3ee5e6b4b0d3255bfef95601890afd80709");

h.update("C-string");
h.update(std::string());

auto h2 = h;

assert(h2 == h);                    // hasher is equationally complete
assert(h2.digest() == h.digest());  // semantics same as above

h.update(buf, sizeof(buf));         // sized input

assert(h.digest() == h.digest());   // can get result multiple times

using rmd160 = hashlib::hasher<rmd160_provider>;  // user-defined

Design Decisions

Interface

The design is unashamedly stolen from Python’s hashlib module[1]. The details are as follows:

Provide both incremental hashing interface and one-shot hashing interface. They accept the same types of input.
No Callable interface. There are two conflicting options here:
1. operator() performs one-shot hashing and returns the digest. This matches the mathematical definition of a hash function, but the input has to be stored contiguously in memory.
2. operator() performs incremental hashing, as an unnamed update. But different from a HashAlgorithm in N3980[2], the users of a cryptographic hash function are often end-users instead of library authors, where a named method is more clear.
Considering the widest audiences, neither use is more important than the other, and both can be easily solved with lambdas:

[](auto&&... ts) { return hashlib::sha1(ts...).digest(); }
[&](auto&&... ts) { h.update(ts...); }

Support returning the result in hexadecimal. Returning std::string is not the most efficient interface to serialize the digest, but it’s very convenient to many users.
The result can be retrived multiple times. This is a highly desirable feature to cryptographic hash functions. Because
1. it makes hasher types more Regular by enabling EqualityComparable with the desired behavior – comparing the digest. Other options are not so satisfactory to me, including defining no operator==, which makes sha1(a) == sha1(b) not compilable, or providing a modifying operator==, which is a non-option since h != h, or comparing the underlying hash contexts, which is technically viable but logically hard to explain and hard to verify.
2. it simplifies some specific use cases. For example,

log(h);                   // user can insert a log line without making
v.push_back(h.digest());  // the rest of the code UB
                          // and the log line does not need to be log(sha1(h))

Or, if one has a hash value of a file block with a known starting point in a file, and we want to know the size of the block, we can retrieve and compare the digest for each byte we consumed.

This goal is achieved by finalizing the hash on a copy of the underlying hash context. The state sizes of the cryptographic hash functions are constrained[3] (actual context sizes: 96 for SHA1, 208 or 216 for SHA512) – even hashing 1 byte is more costly than copying the context. Considering the aforementioned benefits, I claim that the “overhead” is worthy.

Extensibility

The rich interface introduced above is featured by the class template hashlib::hasher<HashProvider>. A user can add a new hasher by instantiating the template with a struct satisfying the HashProvider requirements by describing the implementation of the hash function in terms of a context_type and some core operations involving such a type.

The details are discussed in Technical Specifications.

Choice of algorithms

I propose to require the following message digest algorithms in the standard:

MD5
SHA1
SHA256
SHA512

Standard library implementations need to make all the required algorithms available. If an implementation can achieve this by shipping a header and rely on the base system libraries, we can save an enormous effort to implement and to maintain those algorithms. The design of the message digest library in this proposal made this approach easier, and the choice of algorithms also took the algorithm availability on major platforms into consideration (the last 3 are for reference):

	GNU(*)	FreeBSD	Apple	Windows	Android	Java	.Net	Python
	libc	libmd	CommonCrypto	CryptoAPI	OpenSSL
MD5	✓	✓	✓	✓	✓	✓	✓	✓
SHA1	✓	✓	✓	✓	✓	✓	✓	✓
SHA224			✓		✓			✓
SHA256	✓	✓	✓	✓	✓	✓	✓	✓
SHA384			✓	✓	✓		✓	✓
SHA512	✓	✓	✓	✓	✓		✓	✓
RMD160		✓					✓

* These functions are private to crypt(3).

MD5 is included for supporting existing services; any algorithms weaker than that are excluded.

Based on the research done so far, I believe that the aforementioned 4 algorithms are the most important ones which should be required to support, and possible to support (without duplicated work) on the major platforms.

Technical Specifications

This section gives the layout of a wording.

Message digests

This subclause defines a class template hasher as a common interface to the cryptographic hash and message digest algorithms, and the typedefs sha1, sha256, sha512, and md5 for the unspecified specializations of hasher to implement, respectively, the FIPS secure hash algorithms SHA1, SHA256, and SHA512 [FIPS 180-2] as well as RSA’s MD5 algorithm [RFC 1321].

Through out this subclause, to specialize a template with a template type parameter named HashProvider, the corresponding template argument shall meet the HashProvider requirements.

HashProvider requirements

A class H meets the HashProvider requirements if the expressions shown in the Table below are valid and have the indicated semantics. In that Table and throughout this section:

a) C is a trivially copyable type to hold the state of the message digest algorithm;

b) c is an lvalue of C;

c) p is a value of type const char*;

d) n is a value of size_t;

e) md is a value of type unsigned char*.

Expression	Return type	Assertion/note Pre/post-condition
`H::context_type`	`C`	`C` is `CopyConstructible` and `Destructible`.
`H::digest_size`	`size_t`
`H::block_size`	`size_t`
`H::init(&c)`		post: `c` is ready to accept data input.
`H::update(&c, p, n)`		Requires: `p` points to at least `n` contiguous bytes of input. Effects: Hashes the `n` bytes whose the first byte is designated by `p`. pre: `c` is ready to accept data input.
`H::final(md, &c)`		Requires: `md` points to an object having contiguous space for `digest_size` bytes of output. Effects: Places the message digest in `md`. pre: `c` is ready to accept data input. post: `c` is not ready to accept data input.

Header <hashlib> synopsis

namespace std::hashlib {  // N4026

  template <typename HashProvider>
  struct hasher
  {
    static constexpr auto digest_size = HashProvider::digest_size;
    static constexpr auto block_size  = HashProvider::block_size;

    typedef typename HashProvider::context_type      context_type;
    typedef std::array<unsigned char, digest_size>   digest_type;

    hasher() noexcept;

    explicit hasher(char const* s);
    explicit hasher(char const* s, size_t n);

    template <typename StringLike>
      explicit hasher(StringLike const& bytes) noexcept;

    void update(char const* s);
    void update(char const* s, size_t n);

    template <typename StringLike>
      void update(StringLike const& bytes) noexcept;

    digest_type digest() const noexcept;

    template <typename CharT = char,
              typename Traits = char_traits<CharT>,
              typename Allocator = allocator<CharT>>
      basic_string<CharT, Traits, Allocator>
        hexdigest() const;

  private:
    context_type ctx_;  // exposition only
  };

  template <typename HashProvider>
  bool operator==(hasher<HashProvider> const& a,
                  hasher<HashProvider> const& b) noexcept;

  template <typename HashProvider>
  bool operator!=(hasher<HashProvider> const& a,
                  hasher<HashProvider> const& b) noexcept;

  template <typename CharT, typename Traits,
            typename HashProvider>
    basic_ostream<CharT, Traits>&
      operator<<(basic_ostream<CharT, Traits>& os,
                 hasher<HashProvider> const& h);

  typedef hasher<unspecified>   md5;
  typedef hasher<unspecified>   sha1;
  typedef hasher<unspecified>   sha256;
  typedef hasher<unspecified>   sha512;

}

Class template hasher

hasher() noexcept;

Effects: Constructs an object of hasher by calling HashProvider::init(&ctx_).

explicit hasher(char const* s);

Effects: Equivalent to calling update(s) on a default initialized *this.

explicit hasher(char const* s, size_t n);

Effects: Equivalent to calling update(s, n) on a default initialized *this.

template <typename StringLike>
  explicit hasher(StringLike const& bytes) noexcept;

Effects: Equivalent to calling update(bytes) on a default initialized *this.

void update(char const* s);

Effects: Equivalent to update(s, strlen(s)).

void update(char const* s, size_t n);

Effects: Equivalent to HashProvider::update(&ctx_, s, n).

template <typename StringLike>
  void update(StringLike const& bytes) noexcept;

Effects: Equivalent to update(bytes.data(), bytes.size()).

digest_type digest() const noexcept;

Let md be a default constructed object of digest_type.

Effects: Equivalent to

    auto tmp_ctx = ctx_;
    HashProvider::final(md.data(), &tmp_ctx);

Returns: md.

template <typename CharT = char,
          typename Traits = char_traits<CharT>,
          typename Allocator = allocator<CharT>>
  basic_string<CharT, Traits, Allocator>
    hexdigest() const;

TBD

template <typename HashProvider>
bool operator==(hasher<HashProvider> const& a,
                hasher<HashProvider> const& b) noexcept;

Returns: a.digest() == b.digest().

template <typename HashProvider>
bool operator!=(hasher<HashProvider> const& a,
                hasher<HashProvider> const& b) noexcept;

Returns: !(a == b).

template <typename CharT, typename Traits,
          typename HashProvider>
  basic_ostream<CharT, Traits>&
    operator<<(basic_ostream<CharT, Traits>& os,
               hasher<HashProvider> const& h);

Effects: Equivalent to os << h.template hexdigest<CharT, Traits>().

Implementation

A prototype is available at https://github.com/lichray/cpp-deuceclient/blob/master/include/deuceclient/hashlib.h.

References

[1] “hashlib – Secure hashes and message digests.” The Python Standard Library. https://docs.python.org/2/library/hashlib.html

[2] Hinnant, Howard E., Vinnie Falco, and John Bytheway. “Proposed Wording.” Types Don’t Know #. http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n3980.html#wording

[3] “SHA-1.” Wikipedia: The Free Encyclopedia. http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions

Message Digest Library for C++

Motivation

Scope

Examples

Design Decisions

Interface

Extensibility

Choice of algorithms

Technical Specifications

Message digests

`HashProvider` requirements

Header `<hashlib>` synopsis

Class template `hasher`

Implementation

References

Message Digest Library for C++

Motivation

Scope

Examples

Design Decisions

Interface

Extensibility

Choice of algorithms

Technical Specifications

Message digests

HashProvider requirements

Header <hashlib> synopsis

Class template hasher

Implementation

References

`HashProvider` requirements

Header `<hashlib>` synopsis

Class template `hasher`