Comments on the Initialization of Random Engines

Document number:  N1547=03-0130
Date:  October 29, 2003
Project:   Programming Language C++
Reference:   ISO/IEC IS 14882:1998(E)
Reply to:   Marc Paterno
  Fermi National Accelerator Laboratory
  paterno@fnal.gov

Purpose

This document addresses some perceived problems with the signature of initialization functions in N1452, A Proposal to Add an Extensible Random Number Facility to the Standard Library (Revision 2). It does not propose new language for TR1; if the arguments in this paper are accepted by the Committee, new language would be required.

Initializers in N1452

The Engines of N1452 can all be initialized and seeded by passing two iterators that specify a range of values. The specification of the range is unusual in that the first iterator is passed by (non-const) reference, rather than by value. The reason for doing this is to allow for the correct initialization of composed engines.

The problem

As Pete Becker points out in N1535, item 6, this signature causes a problem, in that natural use leads to a compilation failure. Pete presents the following example:

unsigned long init[] = { 1, 2, 3 };
minstd_rand rng0(init, init + 3);  // illegal, init not a modifiable lvalue
minstd_rand rng1;
rng1.seed(init, init + 3);         // illegal, init not a modifiable lvalue
    

Pete proposes a solution to this problem, using two-phase initialization. This involves a constructor that takes an object of type rng_no_init; this constructor does no initialization. This solution neatly solves the problem in that it allows the natural use of the initialization range as above. But it has the drawback of allowing the construction of engines in a state that is not usable for most purposes. An engine constructed with the rng_no_init is suitable for nothing except initialization.

I believe that there is another natural solution to this problem, which does not require having a constructor which leaves engines in an incompletely initialized state. This solution is mentioned in section 3E of N1452; I repeat it here, to draw attention to its advantages.

An Alternative Solution

As mentioned in N1452, section 3E, an alternative approach would be to pass a zero-argument function object (a "generator", in an unfortunate collision of terms). N1452 presents the following analysis:

An alternative approach is to pass a zero-argument function object ("generator") for seeding. It is trivial to implement a generator from a given iterator range, but it is more complicated to implement an iterator range from a generator. Also, the exception object that is specified to be thrown when the iterator range is exhausted could be configured in a user-provided iterator to generator mapping. With this approach, some engines would have three one-argument constructors: One taking a single integer for seeding, one taking a (reference?) to a (templated) generator, and the copy constructor. It appears that the opportunities for ambiguities or choosing the wrong overload are too confusing to the unsuspecting user.

In this discussion, I see only one argument made against this solution: that the presence of three one-argument constructors for some engines may lead to ambiguity or confusion on the part of the user. I believe there is no ambiguity, and that there is very little risk of confusion on the part of the user.

No Ambiguity

I believe that it is possible to implement the three one-argument constructors to avoid ambiguity. Note that:

  1. One of the constructors is the copy constructor, which is not a template.
  2. The second of the constructors takes an integral type; this is also not a template.
  3. The third of the constructors takes, by non-const reference, a function or object of a type that models the concept Generator.
If the implementer uses something like the "enable-if" idiom to prevent the templated constructor from being chosen when an integer argument is passed, or when the copy constructor is wanted, then there is no ambiguity. The resulting syntax for the user is very natural. Consider the result of several attempts to instantiate a specific engine:

mt19937 e1(1);       // integral c'tor, conversion if needed
mt19937 e2(e1);      // copy c'tor
mt19337 e3("wrong"); // compilation failure -- const char* is not a generator
SomeGenerator_t g;   // SomeGenerator_t is a model of Generator
mt19937 e4(g);       // initialization using the given generator
    

Construction from integral types works, including conversion of the passed integral type when appropriate. The "enable-if" technique prevents the templated constructor from matching such a call. Copy construction works, because a template constructor is never a copy constructor -- the compiler must choose the copy constructor. Construction from an inappropriate type fails (with a diagnostic of a quality dependent upon the compiler). Construction with a type which is a model of generator succeeds, as desired.

Advantages

I believe that the generator-based constructor is a superior alternative to the range-based constructor. Partly, this is because it does not suffer from the difficulties pointed out in N1535, without introducing the "rng_no_init" constructor which produces an engine in a mostly-unusable state. More importantly, I believe the concept of construction from a generator more closely matches the idea of what is being done.

It seems to be a burden on the user to create a range of the appropriate size to avoid exhaustion when initializing an engine. "Obvious" uses of the range-constructor may, in fact, be incorrect in the sense of being certain to fail, because the range will be exhausted before completion of the constructor. A generator seems more naturally inexhaustable than an range specified by a pair of iterators, making it easy to pass the generator through as many levels of composed engines as the user may require.

A significant benefit of the generator-based constructor is that one engine can be used to seed another engine, as long as the return type of the "seeding" engine is appropriately convertible to the type required by the "seeded" engine.

Conclusion

In summary, I believe that the generator-based constructor can be implemented in a manner unambiguous to the compiler. Additionally, it is clear in meaning to the user. Finally, it is easier to use than the range-based constructor in the construction of composed engines.

While this paper addresses directly the issue of constructors, I believe the same arguments apply to the seed member functions. Changing the function seed to take a generator rather than a range also helps to disentangle two functionalities that have been conflated in previous discussions: the use of seed to do what the function name says, seeding an engine, as opposed to its use to reset the state from sequence describing the state of the engine previously captured with the "save" function, however it is spelled.