Smart Objects, an alternative approach to overloading operator dot

Doc. No.:	N4495
Date:	2015-05-22
Project:	Programming Language C++, Evolution Working Group
Reply To:	Mathias Gaunard <mathias@gaunard.com>, Dietmar Kühl <dkuhl@bloomberg.net>

Introduction

This proposal suggests an approach to allow overloading the dot operator that is an alternative to the solution proposed in N4477. It is based on having the compiler synthesize a function object whenever a call to that operator is made rather than return a user-defined reference.

This approach can do everything that N4477 can while avoiding some of its issues. The approach also allows some novel uses which are described below.

Motivation

The advent of smart pointers, types that overload the arrow (->) operator to mimic pointers while providing extra behaviour such as automatic lifetime management, has highlighted the inability to overload the dot (.) operator to achieve smart references or, better yet, smart objects.

In the following proposal, we will use the terminology smart reference to refer to a use case that is possible with both this proposal and N4477, and smart object for a use case that requires the method presented in this proposal.

Examples of smart references include:

Smart pointers that are really references: smart pointer-like objects that have reference semantics rather than pointer semantics.
Proxies: objects that must intercept every call to a member before rerouting it to some other object.
Interface refinement: build new objects by composing existing objects, forwarding to them, and adding new members.
Pimpl and handles: making an indirection to another private object transparent without having to forward every member value or function.

Examples of smart objects include:

Sum types aka variants: objects that can hold values of different types. It is possible to define operator dot so that it forwards to the members of the actual type contained.
Duck typing: similar to sum types, but the error of not having a particular member function for a given type is delayed to when the binding to the actual value occurs.
Expression templates and DSELs: building a type that allows delayed evaluation by building an object representing the call rather than evaluating it is very useful to build Domain-Specific Embedded Languages that are based on the famous expression templates technique.
Aggregates of key-value pairs: objects that are aggregates of runtime-defined properties could be accessed through operator dot instead of an explicit string.
Repeated evaluation: objects that do not forward the member access to a single object but to several objects in succession.

Additionally, the function object approach allows fine control of what is forwarded, under what condition, and in what order. The discussion below provides more details.

Design and Specification

Lookup rules

When looking up a non-static member of a user-defined type, if no member by the requested name exists, and the type has a declaration of a non-static member function named operator. with matching cv-qualifiers, then the compiler shall synthesize a function object with an operator() member function template taking an arbitrary object as parameter and applying the name to it. An instance of that synthesized function object is passed as argument to the operator. member function.

For example, x.some_name; gets translated into x.operator.(synthesized_function_type{}); where x.some_name wouldn't otherwise be found.

In this case, the synthesized function type could be equivalent to the following:

struct synthesized_function_type {
    template<class T>
    auto operator()(T&& t) noexcept(noexcept(t.some_name)) -> decltype(t.some_name) {
        return t.some_name;
    }
};
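
Since operator. cannot be declared in current C++, the translation can only be approximated. The following minimal sketch writes the function object by hand and uses a named member function in place of operator.; the names some_name_accessor, Inner, Wrapper, dot and read_through_wrapper are illustrative assumptions and not part of the proposal. The return type uses decltype((t.some_name)) so that the emulated access yields an lvalue that can be assigned to.

```cpp
#include <cassert>
#include <utility>

// Hand-written stand-in for the object the compiler would synthesize
// for the expression `x.some_name`.
struct some_name_accessor {
    template<class T>
    auto operator()(T&& t) const -> decltype((t.some_name)) {
        return t.some_name;
    }
};

struct Inner { int some_name = 7; };

struct Wrapper {
    Inner inner;

    // Hypothetical stand-in for `operator.`: applies the function object
    // to the wrapped object and returns whatever it returns.
    template<class F>
    auto dot(F f) -> decltype(f(inner)) { return f(inner); }
};

int read_through_wrapper() {
    Wrapper w;
    w.dot(some_name_accessor{}) = 42;    // emulates `w.some_name = 42;`
    return w.dot(some_name_accessor{});  // emulates `w.some_name`
}
```

Here Wrapper has no member called some_name, so under the proposed lookup rule both accesses would be routed through operator. in the same way the sketch routes them through dot.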

Special capture behavior for calls to member functions is described in the next section.

Member functions and scope capture

If the compiler must synthesize a function object for a member function call, each subexpression evaluating each argument shall be evaluated prior to constructing the synthesized function object. Rvalue-ness shall be forwarded correctly, with rvalues held by value and lvalues by reference.

For example, the call x.some_name(a, foo(), foo()) shall get translated to x.operator.(synthesized_function_type{a, foo(), foo()}); where the synthesized function type could be equivalent to the following:

struct synthesized_function_type {
    // `a' or `foo' may not be visible in that context
    // used here lexically just for demonstration purposes
    typedef decltype((a)) T0;
    typedef decltype(foo()) T1;
    typedef decltype(foo()) T2;
    T0 a0;
    T1 a1;
    T2 a2;

    template<class T>
    auto operator()(T&& t)
        noexcept(noexcept(t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1), static_cast<T2&&>(a2))))
        -> decltype(t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1), static_cast<T2&&>(a2))) {
        return t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1), static_cast<T2&&>(a2));
    }
};

The construction of the synthesized_function_type or the initialization of any of the members can be elided as long as calls to the function call operator can be made with the appropriate arguments. In particular, if the compiler can determine that a use of operator.() results in a member function call on a specific object it can elide construction of the function object and replace a call to its function call operator with a direct call to the member function.
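
The capture rule can be tried out today with a hand-written function object. The following is a sketch under stated assumptions: Target, some_name_call, make and run_capture_example are hypothetical names, and the template parameters T0 and T1 play the role of the typedefs shown above (an lvalue reference type for the lvalue argument, an object type for the prvalue argument).

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Target {
    std::string some_name(int& counter, std::string text) {
        ++counter;
        return text + "!";
    }
};

// Hand-written stand-in for the object synthesized for `x.some_name(a, make())`.
// T0 is `int&` (lvalue argument, held by reference);
// T1 is `std::string` (prvalue argument, held by value).
template<class T0, class T1>
struct some_name_call {
    T0 a0;
    T1 a1;

    template<class T>
    auto operator()(T&& t)
        -> decltype(t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1))) {
        // static_cast<T0&&> collapses to an lvalue for a0 and an rvalue for a1.
        return t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1));
    }
};

std::string make() { return "hi"; }

std::string run_capture_example(int& counter) {
    Target x;
    // The arguments are evaluated here, before the member call takes place.
    some_name_call<int&, std::string> f{counter, make()};
    return f(x);
}
```

The lvalue argument is observed by reference (the caller's counter is incremented), while the temporary string is stored inside the function object and moved into the member function when it is called.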

Extension to reflection

The mechanism could optionally be extended to support reflection by adding arbitrary extra information within the synthesized function object type.

Further work will be done in that area in future versions of the proposal depending on feedback.

For example for member variables:

struct synthesized_function_type {
    template<class T>
    auto operator()(T&& t) noexcept(noexcept(t.some_name)) -> decltype(t.some_name) {
        return t.some_name;
    }

    static constexpr const char* name() { return "some_name"; }
    static constexpr bool is_member_function = false;
};

For member functions:

struct synthesized_function_type {
    // `a' or `foo' may not be visible in that context
    // used here lexically just for demonstration purposes
    typedef decltype((a)) T0;
    typedef decltype(foo()) T1;
    typedef decltype(foo()) T2;
    T0 a0;
    T1 a1;
    T2 a2;

    template<class T>
    auto operator()(T&& t)
        noexcept(noexcept(t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1), static_cast<T2&&>(a2))))
        -> decltype(t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1), static_cast<T2&&>(a2))) {
        return t.some_name(static_cast<T0&&>(a0), static_cast<T1&&>(a1), static_cast<T2&&>(a2));
    }

    static constexpr const char* name() { return "some_name"; }
    static constexpr bool is_member_function = true;
    typedef std::tuple<T0, T1, T2> member_arguments;
};

Properties of the Design

Controlling reference leaking

N4477 discusses reference leaking in section 5: the possibility that operator.() can result in functions accidentally returning a reference to an object held by a smart reference. Since N4477 proposes implementing operator.() in a way similar to operator->(), but returning a reference rather than a pointer, control over the returned reference needs to be in the language.

When implementing operator.() in terms of a passed function object the implementer of a smart reference has full control over the type returned from operator.(). In its simplest form reference leaking can be prevented by not returning anything from the resulting operator:

class strictly_non_leaking {
    X x;
public:
    template <typename Fun>
    auto operator.(Fun&& fun) -> void { fun(x); }
};

While that may be viable for some smart reference types, most smart references would probably want to return a suitable result. It would be straightforward to prevent certain known return types, e.g., by conditionally wrapping results into a suitable type:

class wrapping {
    X x;
public:
    wrapping(X& ref);
    template <typename Fun>
    auto operator.(Fun&& fun)
        -> std::conditional_t<std::is_same_v<X&, decltype(fun(std::declval<X&>()))>,
                            wrapping, decltype(fun(std::declval<X&>()))> {
        return fun(x);
    }
};

What exactly a smart reference returns from a use can be controlled by the smart reference. In particular the good (incr(x)) case can be supported while the bad (leak(x)) case can be banned.
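
The conditional wrapping can be exercised in current C++ with a named member function standing in for operator.. This is only a sketch: the member dot, the type X with its incr() and get() members, the function objects call_incr and call_get, and run_wrapping_example are all assumptions made for illustration, and the wrapper here stores a pointer to the referenced object.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct X {
    int value = 0;
    X& incr() { ++value; return *this; }  // returns a reference to the held object
    int get() const { return value; }
};

// Hand-written stand-ins for the objects synthesized for `.incr()` and `.get()`.
struct call_incr {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.incr()) { return t.incr(); }
};
struct call_get {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.get()) { return t.get(); }
};

class wrapping {
    X* x;
public:
    wrapping(X& ref) : x(&ref) {}

    // Hypothetical stand-in for `operator.`: a result of type X& is
    // re-wrapped, every other result type is passed through unchanged.
    template<class Fun>
    auto dot(Fun&& fun)
        -> std::conditional_t<std::is_same<X&, decltype(fun(std::declval<X&>()))>::value,
                              wrapping, decltype(fun(std::declval<X&>()))> {
        return fun(*x);
    }
};

int run_wrapping_example() {
    X target;
    wrapping ref(target);
    auto chained = ref.dot(call_incr{});  // emulates `ref.incr()`
    static_assert(std::is_same<decltype(chained), wrapping>::value,
                  "an X& result does not escape as a raw reference");
    chained.dot(call_incr{});
    return ref.dot(call_get{});           // emulates `ref.get()`
}
```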

Overloading operator.() on multiple objects

N4477 discusses overloading operator.() in section 4.9. While it is obvious there that operator.() can be overloaded for cv-qualified versions and reference qualifications, N4477 also proposes to overload on the reference type returned from operator.(). The idea is that multiple versions of operator.() can be used to return different reference types and the unique version applicable for a member use is chosen. The choice of operator.() based on the reference type is similar to the choice made when finding a member function in a class with multiple base classes: if a unique match is found it is chosen; otherwise (if there are no or multiple matches) the use is an error.

When passing a function object to operator.() this form of overloading isn't possible. However, such special overloading rules are not needed as the same effect can be achieved by determining if a function call can be made:

class composite {
    A a;
    B b;
public:
    template <typename Fun>
    auto operator.(Fun&& fun)
        -> decltype(call_unique(std::forward<Fun>(fun), std::tie(this->a, this->b))) {
        return call_unique(std::forward<Fun>(fun), std::tie(this->a, this->b));
    }
};

The function call_unique() determines if there is exactly one element x in the passed std::tuple<...> for which fun(x) is a valid call and, if so, returns the result of calling fun with this element. Otherwise, it produces an error. This function isn't easy to write but it could be made available by the standard library to aid with common choices. Since it is a library function other choices could be made, however. For example, instead of calling the unique choice a different approach could be to call the first match.
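
One possible shape for call_unique() is sketched below. It is not a proposed library interface: the helper templates can_call and match, the types A and B, and the function object call_beta are assumptions made for this illustration, and the sketch reports a missing or ambiguous match through a static_assert rather than through SFINAE.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// True when f(arg) is a valid expression (expression SFINAE).
template<class F, class Arg, class = void>
struct can_call : std::false_type {};
template<class F, class Arg>
struct can_call<F, Arg, decltype(void(std::declval<F>()(std::declval<Arg>())))>
    : std::true_type {};

// Counts the tuple elements F can be called with and records the first match.
template<class F, class Tuple, std::size_t I = 0,
         std::size_t N = std::tuple_size<Tuple>::value>
struct match {
    using rest = match<F, Tuple, I + 1, N>;
    static constexpr bool here =
        can_call<F, typename std::tuple_element<I, Tuple>::type>::value;
    static constexpr std::size_t count = rest::count + (here ? 1 : 0);
    static constexpr std::size_t index = here ? I : rest::index;
};
template<class F, class Tuple, std::size_t N>
struct match<F, Tuple, N, N> {
    static constexpr std::size_t count = 0;
    static constexpr std::size_t index = N;
};

template<class F, class Tuple>
decltype(auto) call_unique(F&& f, Tuple&& objects) {
    using M = match<F, std::decay_t<Tuple>>;
    static_assert(M::count == 1, "the member must exist in exactly one object");
    return std::forward<F>(f)(std::get<M::index>(std::forward<Tuple>(objects)));
}

struct A { int alpha() const { return 1; } };
struct B { int beta() const { return 2; } };

// Hand-written stand-in for the object synthesized for `.beta()`.
struct call_beta {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.beta()) { return t.beta(); }
};

int run_call_unique_example() {
    A a;
    B b;
    return call_unique(call_beta{}, std::tie(a, b));  // only B has beta()
}
```

A "first match" policy would follow from the same helper by dropping the uniqueness check and requiring only that count is not zero.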

SFINAE on the synthesized function object

A lot of advanced uses of overloading operator. with the scheme described in this proposal rely on SFINAE extended for expressions. SFINAE for expressions is necessary to be able to tell whether a member exists for a particular object, i.e., whether the synthesized function object can be called with a specific parameter type.

This is why the signature of the operator() member function template of the synthesized function objects has a return type defined with decltype rather than just relying on a specification using auto or decltype(auto).
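
The difference can be observed with a small detection trait. The sketch below uses hypothetical names (can_call, call_bark, call_bark_deduced, Dog, Cat); it shows that a function object whose return type is spelled with decltype can be queried safely, whereas one with a deduced return type cannot.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// True when f(arg) is a valid expression (expression SFINAE).
template<class F, class Arg, class = void>
struct can_call : std::false_type {};
template<class F, class Arg>
struct can_call<F, Arg, decltype(void(std::declval<F>()(std::declval<Arg>())))>
    : std::true_type {};

// Stand-in for the object synthesized for `.bark()`: the member access is
// part of the signature, so an invalid call is a substitution failure.
struct call_bark {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.bark()) { return t.bark(); }
};

struct Dog { int bark() { return 1; } };
struct Cat {};

static_assert(can_call<call_bark, Dog&>::value, "Dog has bark()");
static_assert(!can_call<call_bark, Cat&>::value,
              "Cat has no bark(), and asking is not an error");

// With a deduced return type the body is not part of the signature.
// Evaluating can_call<call_bark_deduced, Cat&> would instantiate the body
// to deduce the type and fail with a hard error, so it is not queried here.
struct call_bark_deduced {
    template<class T>
    decltype(auto) operator()(T&& t) const { return t.bark(); }
};
```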

No handling of static members

To build perfect smart references or proxies, it would be necessary to forward not only regular members but static ones as well. For example, a proxy for std::vector should also provide ::iterator.

Like N4477, the proposal currently doesn't cover static members. The mechanism could be extended to support these members. The main issue is being able to deal with both types and values, in particular in contexts where member types are not annotated with typename.

Repeated evaluation and move semantics

The synthesized function call operator forwards arguments when it is called. As a result rvalue arguments get moved by the first call to the function call operator. For typical use cases where the member access is forwarded to another object this behavior is exactly what is desired. Since the function call operator on the argument passed to operator. could easily be called multiple times, subsequent calls to the function call operator would call the member access with already moved-from arguments. This behavior is similar to using multiple move operations on the same object, though.

For some of the use cases described below repeated calls to the function call operator are desirable. Ideally the captured arguments could be forwarded appropriately. However, since the synthesized function object entirely encapsulates the presence and processing of the arguments, there is no way to specify whether arguments should be moved if there is just one function call operator taking the object to which the member access is applied. It would be possible to provide another way to invoke the function which wouldn't move, e.g., by passing a second argument or by calling a named function.
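
The effect of a second call can be demonstrated with a move-only argument. In the sketch below, Sink, call_take, both, dot and run_repeated_example are hypothetical names; std::unique_ptr is used because a moved-from std::unique_ptr is guaranteed to be null, which makes the moved-from state observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Sink {
    int got_value = 0;
    int got_null = 0;
    void take(std::unique_ptr<int> p) {
        if (p) ++got_value; else ++got_null;
    }
};

// Stand-in for the object synthesized for `x.take(std::make_unique<int>(1))`:
// the prvalue argument is held by value and forwarded as an rvalue on each call.
struct call_take {
    std::unique_ptr<int> a0;

    template<class T>
    auto operator()(T&& t) -> decltype(t.take(std::move(a0))) {
        return t.take(std::move(a0));
    }
};

// Hypothetical stand-in for an `operator.` that applies the member access
// to two objects in succession.
struct both {
    Sink first, second;

    template<class F>
    void dot(F f) {
        f(first);   // receives the captured pointer
        f(second);  // receives a moved-from (null) pointer
    }
};

both run_repeated_example() {
    both b;
    b.dot(call_take{std::make_unique<int>(1)});
    return b;
}
```

Only the first object sees the original argument, which is the situation a non-moving way of invoking the function object would address.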

Examples and Use Cases

Polymorphic Value Types

Inheritance and subtyping polymorphism is a use case for some sort of dynamic typing. If given a base class Base, and several derived classes Derived1, Derived2 or Derivedn, it is often useful to be able to have an object that can contain any of the Derivedi classes derived from Base.

Smart pointers are a popular solution to this problem. While smart pointers provide entity semantics, there is also an argument to be made for having value semantics instead, where a copy actually copies the data instead of aliasing it.

This copying can be achieved using a special smart pointer. There are several implementations of this approach on the Internet under the name clone_ptr. Pointer syntax is, however, not very appropriate when providing something with value semantics. Here is an example of what it could look like with this proposal. Let's name the type that can hold any type derived from T while providing value semantics poly<T>. Below is a simplistic implementation relying on an intrusive "clone" virtual member function and lacking move semantics and other fancy features.

template<class T>
struct poly {
    poly(T const& t) : ptr(t.clone()) {}
    ~poly() { delete ptr; }

    poly(const poly& other) : ptr(other.ptr->clone()) {}

    poly& operator=(poly other) {
        std::swap(ptr, other.ptr);
        return *this;
    }

    template<class F>
    decltype(auto) operator.(F f) {
        return f(*ptr);
    }

    template<class F>
    decltype(auto) operator.(F f) const {
        return f(*ptr);
    }

private:
    T* ptr;
};

It then becomes possible to write things like this:

poly<Base> obj = Derived1();
obj.virtual_member_function_of_Base();
obj = Derived2();
int x = obj.member_variable_of_Base;
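
A version of this type that compiles today can be obtained by replacing operator. with a named member and writing the function object by hand. The sketch below makes several assumptions for the sake of illustration: the member is called dot, Base exposes clone() and a virtual id(), and call_id stands in for the object that would be synthesized for .id().

```cpp
#include <cassert>
#include <utility>

struct Base {
    virtual ~Base() = default;
    virtual Base* clone() const = 0;
    virtual int id() const = 0;
};
struct Derived1 : Base {
    Base* clone() const override { return new Derived1(*this); }
    int id() const override { return 1; }
};
struct Derived2 : Base {
    Base* clone() const override { return new Derived2(*this); }
    int id() const override { return 2; }
};

template<class T>
struct poly {
    poly(T const& t) : ptr(t.clone()) {}
    poly(poly const& other) : ptr(other.ptr->clone()) {}
    poly& operator=(poly other) { std::swap(ptr, other.ptr); return *this; }
    ~poly() { delete ptr; }

    // Hypothetical stand-in for `operator.`.
    template<class F>
    decltype(auto) dot(F f) { return f(*ptr); }

private:
    T* ptr;
};

// Hand-written stand-in for the object synthesized for `.id()`.
struct call_id {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.id()) { return t.id(); }
};

int run_poly_example() {
    poly<Base> obj = Derived1();
    int before = obj.dot(call_id{});  // emulates `obj.id()`
    poly<Base> copy = obj;            // deep copy through clone()
    obj = Derived2();                 // the copy is unaffected
    return before * 100 + obj.dot(call_id{}) * 10 + copy.dot(call_id{});
}
```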

Sum Types

Sum types, also called tagged unions or variants, are data structures that can hold an object of either of a list of possibly unrelated types. Proposal N4450 suggests adding such a type to the standard library, named variant, based on the boost::variant class template.

Proposal N4450 for the variant type provides very limited operator overloading (only <, <=, ==, !=, >= and >), but a case could be made for providing overloading for all operators, including operator dot. This raises a couple of interesting questions regarding what the result type of that operator should be, and whether a hard error should be emitted if the operator in question is not available on one or more of the types in the set.

For the use case below, we present a partial interface of a simple variant implementation with freestore-based storage and pseudo-code to ignore some of the implementation complications inherent to variant. Operator dot returns the common type of all possible cases, and the operator being called must be valid for all cases.

template<class... T>
struct variant {
    template<class U>
    variant(U const& u) : ptr(new U(u)), which(find_offset<seq<T...>, U>::value) {}

    ~variant() { type_erased_delete(ptr); }

    variant(variant const& other) : ptr(type_erased_clone(other)), which(other.which) {}

    variant& operator=(variant other) {
        swap(ptr, other.ptr);
        swap(which, other.which);
        return *this;
    }

    template<class F>
    decltype(auto) operator.(F f) {
        switch(which)
        {
            case 0: return f(*static_cast<T0*>(ptr));
            case 1: return f(*static_cast<T1*>(ptr));
            /* for every Ti... */
        }
    }

    template<class F>
    decltype(auto) operator.(F f) const {
        switch(which)
        {
            case 0: return f(*static_cast<T0 const*>(ptr));
            case 1: return f(*static_cast<T1 const*>(ptr));
            /* for every Ti... */
        }
    }

private:
    void* ptr;
    int which;
};

This use case is a prime example of a smart object. This interface can be achieved by synthesizing a function object but it cannot be achieved by forwarding to a reference like in N4477 (operator dot), since there is no reference to a single object that variant could return.

It becomes possible to write things like this:

struct Foo { const char* name() { return "Foo"; } };
struct Bar { const char* name() { return "Bar"; } void bark() { cout << name() << endl; } };

variant<Foo, Bar> v = Foo();
v = Bar();
const char* s = v.name();
//v.bark(); // error: not all types define 'bark'
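
The dispatch logic can be tried with a deliberately reduced sum type. The following sketch is not a variant implementation: either stores both alternatives and a tag to avoid storage management, dot stands in for operator., and call_name stands in for the object synthesized for .name(). All of these names are assumptions, and name() returns std::string here for ease of testing.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Foo { std::string name() const { return "Foo"; } };
struct Bar { std::string name() const { return "Bar"; } };

// Hand-written stand-in for the object synthesized for `.name()`.
struct call_name {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.name()) { return t.name(); }
};

// Reduced two-alternative sum type: both alternatives are stored and
// `which_` selects the active one.
template<class A, class B>
struct either {
    A a_;
    B b_;
    int which_;

    either(A const& a) : a_(a), b_(), which_(0) {}
    either(B const& b) : a_(), b_(b), which_(1) {}

    // Hypothetical stand-in for `operator.`: the call must be valid for
    // every alternative, and the result is their common type.
    template<class F>
    auto dot(F f)
        -> typename std::common_type<decltype(f(a_)), decltype(f(b_))>::type {
        if (which_ == 0) return f(a_);
        return f(b_);
    }
};

std::string run_either_example() {
    either<Foo, Bar> v = Foo();
    std::string first = v.dot(call_name{});  // emulates `v.name()`
    v = Bar();
    return first + v.dot(call_name{});
}
```

Because the return type names f(a_) and f(b_), a function object for a member that only one alternative provides would make dot ill-formed, which corresponds to the commented-out bark() call above.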

Dynamic Duck Typing

Duck typing is a technique which involves binding a name to an object as late as possible: if the name is available at the time it is needed, call it, otherwise raise an error. Most dynamically typed languages are based on this principle as it provides a very easy and flexible programming model.

It cannot be implemented in C++ generally, but it is possible to provide duck typing over a finite set of types, so it is possible to provide it for a variant type like above. It requires being able to test at compile-time if a given type satisfies the call, in order to be able to fall back to code that generates an error in case the expression is not supported.

This is one use case that requires the synthesized function object to contain its full body in its signature, so that it can be used in SFINAE contexts.

From a synthesized function object for the operator dot call, the implementation would wrap it in another function object with a fallback by doing something like this:

template<class T, class R = void>
struct sink { typedef R type; };

template<class Sig, class Enable = void>
struct is_callable : std::false_type {};

template<class F, class... Args>
struct is_callable<F(Args...), typename sink<decltype(std::declval<F>()(std::declval<Args>()...))>::type> : std::true_type {};

template<class F, class R>
struct call_or_throw : F {
    using F::operator();

    template<class T>
    typename std::enable_if<!is_callable<F(T&&)>::value, R>::type operator()(T&& t) const
    {
        throw std::runtime_error("No such operation");
    }
};

The operator. overload now looks like this:

template<class... T>
struct duck_variant {
    /* content from variant... */

    template<class F>
    decltype(auto) operator.(F f) {
        switch(which)
        {
            case 0: return call_or_throw{f}(*static_cast<T0*>(ptr));
            case 1: return call_or_throw{f}(*static_cast<T1*>(ptr));
            /* for every Ti... */
        }
    }
};

It becomes possible to write things like this:

struct Foo { const char* name() { return "Foo"; } };
struct Bar { const char* name() { return "Bar"; } void bark() { cout << name() << endl; } };

duck_variant<Foo, Bar> v = Foo();
v = Bar();
const char* s = v.name();
v.bark(); // correct even if Foo has no 'bark'
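
The fallback can be demonstrated in current C++ as follows. This sketch swaps the inheritance-based wrapper for a pair of free function templates selected with std::enable_if, which is a different formulation of the same idea; can_call, the free functions named call_or_throw, foo_or_bar, dot, call_bark and bark_throws are hypothetical names, and the member functions return std::string for ease of testing.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// True when f(arg) is a valid expression (expression SFINAE).
template<class F, class Arg, class = void>
struct can_call : std::false_type {};
template<class F, class Arg>
struct can_call<F, Arg, decltype(void(std::declval<F>()(std::declval<Arg>())))>
    : std::true_type {};

// Selected when f(t) is valid: performs the call.
template<class R, class F, class T>
typename std::enable_if<can_call<F&, T&>::value, R>::type
call_or_throw(F& f, T& t) { return f(t); }

// Selected otherwise: the error is deferred to run time.
template<class R, class F, class T>
typename std::enable_if<!can_call<F&, T&>::value, R>::type
call_or_throw(F&, T&) { throw std::runtime_error("No such operation"); }

struct Foo { std::string name() { return "Foo"; } };
struct Bar { std::string name() { return "Bar"; }
             std::string bark() { return "Woof"; } };

// Hand-written stand-in for the object synthesized for `.bark()`.
struct call_bark {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.bark()) { return t.bark(); }
};

// Two-alternative stand-in for duck_variant; `dot` replaces `operator.`.
struct foo_or_bar {
    Foo foo;
    Bar bar;
    bool holds_bar;

    template<class F>
    std::string dot(F f) {
        return holds_bar ? call_or_throw<std::string>(f, bar)
                         : call_or_throw<std::string>(f, foo);
    }
};

bool bark_throws(bool holds_bar) {
    foo_or_bar v{Foo{}, Bar{}, holds_bar};
    try { v.dot(call_bark{}); return false; }
    catch (std::runtime_error const&) { return true; }
}
```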

Expression Templates

Expression templates is a mechanism using operator overloading to delay the evaluation of expressions and then evaluate them with a given context, which enables building entire Domain-Specific Languages embedded into C++. For this use case, forwarding to a reference is not possible, while generating a function object provides exactly the delayed evaluation that is needed.

In this case, the operator. would just return the synthesized function object or more likely a wrapper of that function object.
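
A minimal sketch of such a wrapper is shown below. The names placeholder, deferred, eval, dot and call_size are assumptions for illustration; dot stands in for operator. and call_size for the object synthesized for .size().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hand-written stand-in for the object synthesized for `.size()`.
struct call_size {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.size()) { return t.size(); }
};

// Node of the expression tree: "apply F to the value supplied later".
template<class F>
struct deferred {
    F f;

    template<class T>
    auto eval(T&& value) const -> decltype(f(std::forward<T>(value))) {
        return f(std::forward<T>(value));
    }
};

// Placeholder whose hypothetical `dot` evaluates nothing and only records
// the member access.
struct placeholder {
    template<class F>
    deferred<F> dot(F f) const { return deferred<F>{std::move(f)}; }
};

std::size_t run_deferred_example() {
    placeholder arg;
    auto expr = arg.dot(call_size{});        // emulates `arg.size()`, not evaluated
    return expr.eval(std::string("hello"));  // evaluation happens here
}
```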

Dynamic Properties (reflection extension)

Another use case where overloading the dot operator is useful is for objects where the member names are dynamic, like those obtained from binding to a dynamic language, serialization to XML or JSON, or anything deciding fields at runtime. This cannot be addressed by just providing a function object, but it is trivial to extend the function objects to contain additional information to support that use case. With N4477 (operator dot), there is no obvious way to extend the mechanism to support this use case.

Assuming the synthesized function object is augmented to support reflection, i.e., a name() function returning the name of the member being applied, it could therefore look like this:

struct Value {
    Value() : data(Map()) {}
    Value(int i) : data(i) {}
    Value(double d) : data(d) {}
    Value(std::string const& s) : data(s) {}

    template<class F>
    Value& operator.(F f) {
        return get<Map>(data)[F::name()];
    }

    template<class F>
    Value const& operator.(F f) const {
        return get<Map>(data)[F::name()];
    }

private:
    typedef std::unordered_map<std::string, Value> Map;
    variant<Map, int, double, std::string> data;
};

And this would allow the following usage:

Value v;
v.foo = 42;
v.bar = "bar";

Adding "foo" and "bar" members dynamically.
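
A reduced version of this idea can be run today. In the sketch below the properties are plain int values rather than nested Value objects, dot stands in for operator., and foo_accessor and bar_accessor stand in for the reflection-augmented objects that would be synthesized for .foo and .bar; all of these names, as well as PropertyBag and run_property_example, are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-ins for the reflection-augmented objects synthesized for `.foo`
// and `.bar`. Only the name is used, so no call operator is needed here.
struct foo_accessor {
    static constexpr const char* name() { return "foo"; }
};
struct bar_accessor {
    static constexpr const char* name() { return "bar"; }
};

struct PropertyBag {
    // Hypothetical stand-in for `operator.`: the member name becomes the key,
    // and a missing key is created on first access.
    template<class F>
    int& dot(F) { return properties[F::name()]; }

    std::size_t size() const { return properties.size(); }

private:
    std::unordered_map<std::string, int> properties;
};

int run_property_example() {
    PropertyBag v;
    v.dot(foo_accessor{}) = 42;  // emulates `v.foo = 42;`
    v.dot(bar_accessor{}) = 1;   // emulates `v.bar = 1;`
    return v.dot(foo_accessor{}) + static_cast<int>(v.size());
}
```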

Vector of values

Just like + on std::valarray applies to each of its elements, it could also be interesting to call a member on all the values of the vector. Consider valarray< complex<T> >: it would be useful to be able to apply .real() and .imag() to all elements of the vector at once without requiring a dedicated specialization.

This is a use case that requires calling the function object several times on different values, and that isn't supported by forwarding it to a single reference like in N4477.

template<class T>
struct array {
    template<class F>
    auto operator.(F f) -> array< decltype(f(values[0])) > {
        array< decltype(f(values[0])) > result(values.size());
        for(size_t i=0; i<values.size(); ++i) {
            result[i] = f(values[i]);
        }
        return result;
    }

private:
    std::vector<T> values;
};
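
The element-wise application can be emulated with a named member function. The sketch below is written under assumptions: the type is called value_array to avoid confusion with std::array, its storage is public to keep the example short, dot stands in for operator., and call_real stands in for the object synthesized for .real().

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hand-written stand-in for the object synthesized for `.real()`.
struct call_real {
    template<class T>
    auto operator()(T&& t) const -> decltype(t.real()) { return t.real(); }
};

template<class T>
struct value_array {
    std::vector<T> values;

    // Hypothetical stand-in for `operator.`: applies the member access to
    // every element and collects the results in a new value_array.
    template<class F>
    auto dot(F f) const -> value_array<decltype(f(values[0]))> {
        value_array<decltype(f(values[0]))> result;
        result.values.reserve(values.size());
        for (std::size_t i = 0; i < values.size(); ++i)
            result.values.push_back(f(values[i]));
        return result;
    }
};

double run_array_example() {
    value_array<std::complex<double>> a;
    a.values.push_back({1.0, 2.0});
    a.values.push_back({3.0, 4.0});
    value_array<double> re = a.dot(call_real{});  // emulates `a.real()`
    return re.values[0] + re.values[1];
}
```

Since the function object here captures no arguments, calling it once per element raises none of the move-related concerns discussed under repeated evaluation.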