Subject: Re: (SC22WG14.17677) mbrtowc() wording ambiguities and surprising
 implementation behavior
To: Jens Gustedt <jens.gustedt@inria.fr>
Cc: wg14 <sc22wg14@open-std.org>, SG16 <sg16@lists.isocpp.org>
References: <20200328044149.75FAD3589AA@www.open-std.org>
 <20200328092930.04ECF3566A9@www.open-std.org>
From: Tom Honermann <tom@honermann.net>
Message-ID: <95cbfc4d-3ac7-99b2-d90c-14896438fb2a@honermann.net>
Date: Sat, 28 Mar 2020 15:29:38 -0400

On 3/28/20 5:29 AM, Jens Gustedt wrote:
> Tom,
> I am not sure that I (yet) understand all the details of your
> question, but I want to make sure I know the possible extent of
> the problem.
>
> Does `c16rtomb` have the same problem? UTF-16 also has two-word
> encodings, so if you wanted to convert a surrogate pair, the first
> call would return "0", store something in the state variable, and the
> second call would effectively return the number of characters to be
> stored?

c16rtomb() is different in that its interface only allows a single input 
code unit to be provided per call, but it allows multiple code units 
(bytes) to be written per call.  mbrtowc() doesn't support writing more 
than one output per call.
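A minimal sketch of the c16rtomb() side of that contrast, assuming a UTF-8 locale is active; the helper name encode_utf16() is hypothetical and not part of the original discussion. Each call takes exactly one char16_t, but a single call may write several bytes:

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memset */
#include <uchar.h>   /* char16_t, c16rtomb */
#include <wchar.h>   /* mbstate_t */

/* Hypothetical helper: encodes len UTF-16 code units from in into the
   current locale's multibyte encoding, handing c16rtomb() exactly one
   code unit per call.  out must have room for len * MB_LEN_MAX bytes.
   Returns the number of bytes written, or (size_t)-1 on an encoding
   error. */
size_t encode_utf16(const char16_t *in, size_t len, char *out)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);  /* initial conversion state */
    size_t total = 0;
    for (size_t i = 0; i < len; ++i) {
        /* One input code unit goes in ... */
        size_t rc = c16rtomb(out + total, in[i], &state);
        if (rc == (size_t)-1)
            return (size_t)-1;
        /* ... but rc bytes come out, possibly more than one. */
        total += rc;
    }
    return total;
}
```

For U+20AC, the single call writes the three bytes E2 82 AC.  The sketch sticks to BMP code units and does not depend on how an implementation treats a lone high surrogate.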

Perhaps you meant to ask about mbrtoc16()?

>
> For the case of mbrtowc that you describe, the difficulty is that
> `mbrtowc` already writes a "partial" result into the buffer?
>
> In your code you have
>
>>    assert(result == (size_t) -2);
>>    mbs += 1;
> But the standard explicitly says
>
>> (size_t) (−2) if the next n bytes contribute to an incomplete (but
>>               potentially valid) multibyte character, and all n bytes have been
>>               processed (no value is stored)
> It says "no value is stored". So why do you have
>
>>    mbs += 1;
>>
mbs points to the input sequence, so incrementing it is necessary for 
the next call to consume the next input code unit.
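A minimal sketch of that loop pattern, assuming a UTF-8 locale is active; decode_bytewise() is a hypothetical name, and this is an illustration rather than the test case from the original report. Each call is given one byte; while the character is incomplete, mbrtowc() returns (size_t)-2, stores nothing, and the caller advances the input pointer:

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* memset */
#include <wchar.h>   /* mbstate_t, mbrtowc */

/* Hypothetical helper: decodes the first multibyte character of mbs by
   handing mbrtowc() a single byte per call.  Returns the number of bytes
   consumed and stores the character in *out, or returns (size_t)-1 on an
   encoding error or if the input ends mid-character. */
size_t decode_bytewise(const char *mbs, size_t len, wchar_t *out)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);
    for (size_t used = 0; used < len; ++used) {
        size_t result = mbrtowc(out, mbs, 1, &state);
        if (result == (size_t)-2) {
            /* Incomplete: the byte went into state and nothing was stored
               in *out.  Advance mbs so the next call sees the next byte. */
            mbs += 1;
            continue;
        }
        if (result == (size_t)-1)
            return (size_t)-1;  /* encoding error */
        /* result is 1 (or 0 for a null character): character complete. */
        return used + 1;
    }
    return (size_t)-1;  /* ran out of input mid-character */
}
```

With the input "\xE2\x82\xAC" (U+20AC), the first two calls return (size_t)-2 and the third stores 0x20AC.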


> Shouldn't all information be accumulated in the state variable and
> then all be written out at once?

Yes and no.  Yes, all information should be accumulated in the state 
variable, but no, it cannot all be written out at once because mbrtowc() 
only supports outputting a single value (pwc is only required to point 
to a single wchar_t).
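For contrast, mbrtoc16() does let the state hold a pending output: when a character needs a surrogate pair, one call stores the high surrogate and the next call returns (size_t)-3, stores the low surrogate, and consumes no input.  mbrtowc() has no equivalent return value.  A minimal sketch, assuming a UTF-8 locale is active and using a hypothetical helper to_utf16():

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* memset, strlen */
#include <uchar.h>   /* char16_t, mbrtoc16 */
#include <wchar.h>   /* mbstate_t */

/* Hypothetical helper: converts the null-terminated multibyte string in
   to UTF-16 code units in out (room for cap units; the terminator is not
   stored).  Each mbrtoc16() call stores at most one char16_t, so a
   character needing a surrogate pair takes two calls.  Returns the number
   of code units written, or (size_t)-1 on an encoding error or if out is
   too small. */
size_t to_utf16(const char *in, char16_t *out, size_t cap)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);
    size_t len = strlen(in) + 1;  /* include the terminating null byte */
    size_t n = 0;
    for (;;) {
        char16_t c16;
        size_t rc = mbrtoc16(&c16, in, len, &state);
        if (rc == (size_t)-1 || rc == (size_t)-2)
            return (size_t)-1;  /* invalid or truncated input */
        if (rc == 0)
            return n;           /* reached the terminating null character */
        if (n == cap)
            return (size_t)-1;  /* out is full */
        out[n++] = c16;
        if (rc != (size_t)-3) {
            /* rc bytes were consumed to produce this code unit. */
            in += rc;
            len -= rc;
        }
        /* For (size_t)-3 the low surrogate came from state and no input
           was consumed, so in and len stay put. */
    }
}
```

Converting "\xF0\x9F\x98\x80" (U+1F600) yields the code units 0xD83D and 0xDE00 from two separate calls.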

Tom.

>
> Thanks
> Jens
>

