Date: Sat, 28 Mar 2020 10:29:27 +0100
From: Jens Gustedt <jens.gustedt@inria.fr>
To: Tom Honermann <tom@honermann.net>
Cc: wg14 <sc22wg14@open-std.org>, SG16 <sg16@lists.isocpp.org>
Subject: Re: (SC22WG14.17674) mbrtowc() wording ambiguities and surprising
 implementation behavior
Message-ID: <20200328102927.6f03a724@inria.fr>
In-Reply-To: <20200328044149.75FAD3589AA@www.open-std.org>
References: <20200328044149.75FAD3589AA@www.open-std.org>
Organization: inria.fr
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="Sig_/Etpk_MwN.nAk0sc0jIiVLh5";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Sender: owner-sc22wg14@open-std.org
Precedence: bulk

--Sig_/Etpk_MwN.nAk0sc0jIiVLh5
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Tom,
I am not sure that I understand all the details of your question yet,
but I want to make sure that I know the possible extent of the
problem.

Does `c16rtomb` have the same problem? UTF-16 also has two-word
encodings (surrogate pairs), so if you wanted to convert a surrogate
pair, the first call would return 0 and store something in the state
variable, and the second call would then effectively return the number
of bytes stored?
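
A minimal sketch of the two-call sequence asked about above. It
assumes a UTF-8 locale is available and an implementation that defers
all output until the low surrogate arrives (glibc behaves this way as
far as I know; other libraries may differ). The helpers
`use_utf8_locale` and `convert_pair` are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names. */
static bool use_utf8_locale(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return true;
    return false;
}

/* Hypothetical helper: feed one surrogate pair to c16rtomb and record
   what each of the two calls returns. */
static void convert_pair(char16_t high, char16_t low,
                         char out[MB_LEN_MAX],
                         size_t *first, size_t *second)
{
    assert(out != NULL && first != NULL && second != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);        /* initial conversion state */

    *first  = c16rtomb(out, high, &state);  /* high surrogate only */
    *second = c16rtomb(out, low, &state);   /* low surrogate completes it */
}
```

Under those assumptions, converting the pair 0xD83D 0xDE00 (U+1F600)
would give 0 for the first call and 4 for the second, with all four
UTF-8 bytes written by the second call.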

For the case of `mbrtowc` that you describe, is the difficulty that
`mbrtowc` already writes a "partial" result into the buffer?

In your code you have

>   assert(result == (size_t) -2);
>   mbs += 1;

But the standard explicitly says

> (size_t) (−2) if the next n bytes contribute to an incomplete (but
>              potentially valid) multibyte character, and all n bytes have
>              been processed (no value is stored)

It says "no value is stored". So why do you have

>   mbs += 1;

Shouldn't all information be accumulated in the state variable and
then be written out all at once?
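
A minimal sketch of what the quoted (size_t)(-2) rule looks like from
the caller's side, assuming a UTF-8 locale and a `wchar_t` that holds
Unicode code points (true for glibc, not guaranteed elsewhere). The
helpers `select_utf8_locale` and `decode_bytewise` are hypothetical
names. Each call hands `mbrtowc` a single byte; on (size_t)(-2) the
byte has been consumed into the state and `*pwc` is left alone.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names. */
static bool select_utf8_locale(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return true;
    return false;
}

/* Hypothetical helper: decode one multibyte character from src, giving
   mbrtowc one byte per call. Returns the number of bytes consumed, or 0
   on an encoding error or if the input ends mid-character.
   *incomplete counts the calls that returned (size_t)-2. */
static size_t decode_bytewise(const char *src, size_t len,
                              wchar_t *pwc, size_t *incomplete)
{
    assert(src != NULL && pwc != NULL && incomplete != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);        /* initial conversion state */
    *incomplete = 0;

    for (size_t i = 0; i < len; ++i) {
        size_t r = mbrtowc(pwc, src + i, 1, &state);
        if (r == (size_t)-1)
            return 0;                       /* invalid sequence */
        if (r == (size_t)-2) {
            ++*incomplete;                  /* byte absorbed, *pwc untouched */
            continue;                       /* move on to the next input byte */
        }
        return i + 1;                       /* character complete */
    }
    return 0;                               /* input ended mid-character */
}
```

Under those assumptions, decoding the two bytes 0xC3 0xA9 would report
one incomplete call and then store U+00E9 on the second call; with
only the first byte supplied, the destination keeps whatever value it
had before.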

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

--Sig_/Etpk_MwN.nAk0sc0jIiVLh5
Content-Type: application/pgp-signature
Content-Description: Digitale Signatur von OpenPGP

-----BEGIN PGP SIGNATURE-----

iF0EARECAB0WIQSN9stI2OFN1pLljN0P0+hp2tU34gUCXn8Y9wAKCRAP0+hp2tU3
4ugJAJ9OL8nR49kIYddbsGbbt+rRQqhamACfS1hiGsx6gX3SSC4QvVQmhqs82os=
=l02i
-----END PGP SIGNATURE-----

--Sig_/Etpk_MwN.nAk0sc0jIiVLh5--
