[ub] [c++std-ext-14592] Re: Re: Sized integer types and char bits

Ion Gaztañaga igaztanaga at gmail.com
Sat Oct 26 23:50:13 CEST 2013


On 26/10/2013 22:51, John Regehr wrote:
>> ... there is no
>> representation change when converting a signed int value to unsigned int
>> or when converting an unsigned int value to signed int.
>
> Wow-- anyone care to guess what fraction of existing C programs run
> correctly under these conditions?

If you think that's strange, you've got to read this: Unisys servers with 
sign-magnitude 48-bit integers with 8 padding bits and 8-bit two's 
complement character types ("ClearPath MCP 15.0 operating system for 
their MCP Servers", 
http://public.support.unisys.com/search/DocumentationSearch.aspx?ID=750&pla=ps&nav=ps)

They support many programming languages and even offer partial POSIX support. 
See "Application Development" here: 
http://public.support.unisys.com/aseries/docs/clearpath-mcp-15.0/pdf/70118328-104.pdf

I'm sorry to copy-paste so many lines from the C manual, but I think 
it's interesting to read that really unusual C implementations are still 
produced and maintained.

-------------------------------------------------
-------------------------------------------------
      BEGINNING OF C COMPILER MANUAL QUOTES
-------------------------------------------------
-------------------------------------------------

C compiler manual:

http://public.support.unisys.com/aseries/docs/clearpath-mcp-15.0/pdf/86002268-205.pdf

--------------------------------------------------

Type Specifier / Description

char, unsigned char / Represents an unsigned whole number in 8 bits 
(1 byte).

signed char / Represents a signed whole number in 8 bits (1 byte).

float / Represents a real number in 48 bits (1 word).

double / Represents a real number in 48 bits (1 word) or a real number 
in 96 bits (2 words), depending on the value of the DBTOSNGL compiler 
control option. The default is float.

long double / Represents a real number in 96 bits (2 words).

int, signed, signed int, short int, signed short int, long int, signed 
long int / Represents a signed whole number in 48 bits (1 word).

unsigned, unsigned int, unsigned short int, unsigned long int / 
Represents an unsigned whole number in 48 bits (1 word).
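Not from the manual: the table says an int fills a 48-bit word, while the "Integer Types" section says only 40 of those bits are used, so an A Series int has padding bits, which is exactly the case this thread is about. A minimal sketch of a portable probe follows; the helper names are hypothetical, and the figure of 9 padding bits for unsigned int on MCP is an inference from the manual (six 8-bit characters per word, 39 magnitude bits, sign bit always zero for unsigned), not a measured result.

```c
#include <limits.h>

/* Count the value bits of an unsigned type from its maximum value.
   A maximum such as UINT_MAX is always 2^N - 1, where N is the number
   of value bits. */
static int value_bits(unsigned long max)
{
    int bits = 0;
    while (max != 0) {
        max >>= 1;
        bits++;
    }
    return bits;
}

/* Bits in the object representation of unsigned int that are not value
   bits. Usually 0; reading the manual suggests 48 - 39 = 9 on
   ClearPath MCP (an inference, not a tested figure). */
static int unsigned_padding_bits(void)
{
    return (int)(sizeof(unsigned) * CHAR_BIT) - value_bits(UINT_MAX);
}
```

On a typical 32-bit-int host this reports 0 padding bits.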

------------------------------------------------

Char Types

Characters are 8 bits wide. The plain char type is unsigned by default. 
This default can be changed to be signed by the $PORT (SIGNEDCHAR) 
option. This affects all variables of type char, even in arrays and 
structures.

Signed characters are stored in two’s-complement format. The values 0 
through 127 have the same bit pattern for signed and unsigned types. 
Unsigned characters are stored in two’s complement format if the $PORT 
(CHAR2) option is enabled.
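Since the signedness of plain char depends on a compiler option here, byte arithmetic written with plain char can change meaning when $PORT (SIGNEDCHAR) is toggled. A sketch (not from the manual, helper names hypothetical) of the usual defence: read bytes through unsigned char so every byte is a value in 0..UCHAR_MAX either way.

```c
#include <limits.h>
#include <stddef.h>

/* Nonzero if plain char is signed (on MCP: with $PORT (SIGNEDCHAR)). */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Sum the bytes of a buffer as values 0..UCHAR_MAX. Reading through
   unsigned char gives the same sum whether plain char is signed or
   unsigned, so it does not depend on the compiler option. */
static unsigned long byte_sum(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}
```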

...

The default character set used at run time is EBCDIC. This can be 
changed by the $ SET STRINGS=ASCII option. When ASCII is set, all 
characters are stored using the ASCII character set and all I/O is 
translated to or from ASCII if necessary.
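With EBCDIC as the default run-time character set, idioms such as `c - 'a'` or `c >= 'a' && c <= 'z'` stop working, because the EBCDIC lowercase letters are not contiguous (there are gaps after 'i' and after 'r'). A sketch, not from the manual and with a hypothetical helper name, that avoids assuming any particular encoding:

```c
#include <string.h>

/* Position of a lowercase letter in the alphabet, or -1 if c is not one.
   Looking c up in a string literal works in ASCII and EBCDIC alike,
   unlike c - 'a'. */
static int letter_index(char c)
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    const char *hit;

    if (c == '\0')
        return -1;          /* strchr would otherwise match the terminator */
    hit = strchr(alphabet, c);
    return hit ? (int)(hit - alphabet) : -1;
}
```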

Six characters are stored for each word instead of the usual four or 
two. It is not valid to compare multiple characters at once by casting a 
character pointer into an integer pointer and doing integer comparisons. 
This comparison results in a run-time error if the BOUNDS(ALIGNMENT) 
compiler control option is set; otherwise undefined behavior is likely 
to occur.
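The portable replacement for the "compare several characters through an int pointer" trick that the manual forbids is simply `memcmp`. A minimal sketch with a hypothetical wrapper name:

```c
#include <string.h>

/* Nonzero if the first n characters of a and b are equal. */
static int same_prefix(const char *a, const char *b, size_t n)
{
    /* Non-portable, fails or misbehaves on A Series:
       return *(const int *)a == *(const int *)b;             */
    return memcmp(a, b, n) == 0;
}
```

The library is free to compare several characters at a time internally, where the platform allows it.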

------------------------------------------------

Integer Types

Integer type representation differs between A Series C and C on 
most other machines. A Series C uses a signed-magnitude representation 
for integers instead of two’s-complement representation. Furthermore, A 
Series C integers use only 40 of the 48 bits in the word: a separate 
sign bit and the low order 39 bits for the absolute value.

Unsigned types in A Series C use the same representation as signed 
types, except that the sign bit is always zero. Negative values, when 
cast to an unsigned type, are added to (INT_MAX+1), producing a value 
within the signed integer range. This value does not change when cast 
back to a signed type.
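Since unsigned types reuse the same 39 magnitude bits, this implies UINT_MAX == INT_MAX, which the C standard permits; the usual rule "add UINT_MAX + 1" then becomes "add INT_MAX + 1". The result is that `(int)(unsigned)-1` is INT_MAX rather than -1, which is the kind of code John's question is about. A host-side model of that conversion, where `MCP_INT_MAX` and both function names are hypothetical and 2^39 - 1 is inferred from "a separate sign bit and the low order 39 bits":

```c
/* Inferred A Series INT_MAX (and UINT_MAX): 2^39 - 1. */
#define MCP_INT_MAX ((1LL << 39) - 1)

/* Signed -> unsigned on A Series, per the manual: negative values are
   added to (INT_MAX + 1). Valid for -MCP_INT_MAX <= v <= MCP_INT_MAX. */
static long long mcp_to_unsigned(long long v)
{
    return v < 0 ? v + (MCP_INT_MAX + 1) : v;
}

/* Unsigned -> signed: every unsigned value is already in the signed
   range, so the value is unchanged. */
static long long mcp_to_signed(long long u)
{
    return u;
}
```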

The types short, int, and long are all the same size.

Bit operations (bitwise AND, OR, exclusive OR, and NOT) on signed values 
affect only the 40 bits used by integers. Bit operations on unsigned 
values conform to the mathematical definitions given in the ANSI C 
standard. Because the sign bit is not adjacent to the other bits, it is 
not possible to shift into or out of the sign bit.
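To make "bit operations affect only the 40 bits" concrete, here is a hypothetical host-side model of a 40-bit sign-magnitude integer. It is one reading of the manual, not Unisys code: the type and function names are invented, and how the real hardware handles negative zero or magnitude bits shifted past bit 38 is assumed, not documented here.

```c
#include <stdint.h>

#define MAG_MASK ((UINT64_C(1) << 39) - 1)   /* 39 magnitude bits */

typedef struct {
    unsigned sign;   /* 0 or 1, stored apart from the magnitude */
    uint64_t mag;    /* 0 .. 2^39 - 1 */
} sm40;

/* Requires |v| <= 2^39 - 1. */
static sm40 sm_from(long long v)
{
    sm40 r;
    r.sign = v < 0;
    r.mag = (uint64_t)(v < 0 ? -v : v) & MAG_MASK;
    return r;
}

static long long sm_value(sm40 x)
{
    long long m = (long long)x.mag;
    return x.sign ? -m : m;
}

/* Bitwise operators act on the sign bit and on the magnitude bits. */
static sm40 sm_and(sm40 a, sm40 b)
{
    sm40 r = { a.sign & b.sign, a.mag & b.mag };
    return r;
}

static sm40 sm_not(sm40 a)
{
    sm40 r = { !a.sign, ~a.mag & MAG_MASK };
    return r;
}

/* Shifts move magnitude bits only; nothing enters or leaves the sign.
   Bits shifted past bit 38 are assumed lost. Requires n < 64. */
static sm40 sm_shl(sm40 a, unsigned n)
{
    sm40 r = { a.sign, (a.mag << n) & MAG_MASK };
    return r;
}
```

In this model `-1 & 3` is 1 and `~0` is -(2^39 - 1), whereas a two's-complement machine gives 3 and -1.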

Operations on unsigned integer types are more expensive than on signed 
types. The $RESET PORT (UNSIGNED) option makes unsigned equivalent to 
signed types and should be used on programs that do not depend upon the 
wraparound or bit operation properties of unsigned types.
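Code that does depend on unsigned wraparound can at least avoid depending on the width of unsigned long, which is 39 value bits here and 32 or 64 elsewhere. FNV-1a is used below only as a familiar wraparound-dependent example; it is not from the manual, and the function name is hypothetical. Masking with 0xFFFFFFFF after each multiply gives the same 32-bit result on any width of at least 32 bits, because arithmetic modulo 2^N with N >= 32 followed by reduction modulo 2^32 equals arithmetic modulo 2^32.

```c
#include <stddef.h>

/* 32-bit FNV-1a hash that does not assume unsigned long is exactly
   32 bits. It still relies on unsigned wraparound in the multiply, so
   on A Series it should not be built with $RESET PORT (UNSIGNED). */
static unsigned long fnv1a32(const char *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    unsigned long h = 2166136261UL;              /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h = (h * 16777619UL) & 0xFFFFFFFFUL;     /* FNV prime, mod 2^32 */
    }
    return h;
}
```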

By default, bit fields in structures or unions that are of type plain 
int are unsigned. The default can be changed to signed by the $PORT 
(SIGNEDFIELD) option.
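The signedness of a plain int bit field is implementation-defined in C, and here it even depends on $PORT (SIGNEDFIELD). A sketch (hypothetical struct and function names) of the portable habit of spelling it out:

```c
/* Always say signed or unsigned for bit fields. */
struct delta {
    signed int   offset : 5;  /* signed: at least -15 .. 15 on any
                                 representation (-16 .. 15 in two's
                                 complement) */
    unsigned int flags  : 3;  /* unsigned: 0 .. 7 */
};

/* Store v (within -15 .. 15) in the signed field and read it back. */
static int store_offset(int v)
{
    struct delta d;
    d.offset = v;
    d.flags = 0;
    return d.offset;
}
```

With `int offset : 5;` instead, `store_offset(-3)` would read back 29 under the A Series default of unsigned fields.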

------------------------------------------------

Floating Types

By default, the double type has the same size and range as the float 
type. Note that the A Series float type has about 11 digits of 
precision. The default can be changed to the same size and range as the 
long double type by the $RESET DBLTOSNGL option (Double to Single).
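Rather than hard-coding either configuration, a program can ask `<float.h>`. A minimal sketch, with hypothetical function names; the "about 11 digits" figure is the manual's, and the standard itself only guarantees FLT_DIG >= 6 and DBL_DIG >= 10.

```c
#include <float.h>

/* Nonzero if double has the same precision and range as float, as the
   manual says it does by default on A Series. */
static int double_is_single(void)
{
    return DBL_MANT_DIG == FLT_MANT_DIG && DBL_MAX_EXP == FLT_MAX_EXP;
}

/* Decimal digits that survive a round trip through float. */
static int float_digits(void)
{
    return FLT_DIG;
}
```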

------------------------------------------------

Pointer Types

Pointers are internally stored as integer values. A pointer to a char is 
the number of bytes from the start of addressable memory (the C heap), 
not the machine memory. A pointer to an int or a float is the number of 
words, and a pointer to a long double is the number of double words, 
from the start of addressable memory. Implicit and explicit casts 
between pointers of different types adjust the value. Casts that are 
invisible to the compiler must be avoided, such as the invisible cast that results from 
declaring a prototype for an external procedure as taking a char* 
parameter, but defining the procedure in another compilation unit as 
taking an int* parameter. See “Pointer Alignment” in this section for 
the implication of implicit and explicit casts between pointers of 
differing types.
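A common place where a visible cast does the right thing is a `qsort` comparator: the library passes `const void *`, and the conversion to `const int *` happens inside the function where the compiler can see it and, on this implementation, presumably adjust a byte-based address into a word-based one. A sketch with hypothetical names:

```c
#include <stdlib.h>

/* The conversions from const void * are visible to the compiler. */
static int cmp_int(const void *a, const void *b)
{
    const int *x = (const int *)a;
    const int *y = (const int *)b;
    return (*x > *y) - (*x < *y);   /* avoids overflow of *x - *y */
}

static void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_int);
}
```

Declaring `cmp_int` to take `const int *` parameters and passing it to `qsort` with a cast would be exactly the kind of invisible cast the manual warns about.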

Any pointer that is itself pointed at or any pointer stored in an array, 
structure, or union is always stored as if a void* cast were done. These 
pointers may be cast safely, either visibly or invisibly.

Pointer arguments to old-style functions are always passed as if a void* 
cast were done.

The allocation of objects is not necessarily consecutive. Bumping a 
pointer beyond the end of an object does not cause the pointer to point 
to the next object declared. This is especially true for function 
arguments.
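The "especially true for function arguments" remark rules out old tricks that walk from the address of one parameter to the next. `<stdarg.h>` is the portable way to reach variable arguments; a sketch with a hypothetical function name:

```c
#include <stdarg.h>

/* Sum 'count' int arguments without assuming they are adjacent in
   memory (unlike pointer arithmetic starting from &count). */
static long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```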

Problems with implicit and explicit casts between pointers of different 
types can possibly be avoided through use of the $BYTEADDRESS compiler 
control option.

------------------------------------------------

Common Porting Problems

You might encounter the following problems when porting C code to an 
enterprise server:

1.  Pointer alignment — Many programs assume that all pointers can be 
treated in the same manner. You can detect these problems at run-time by 
setting the $BOUNDS(ALIGNMENT) compiler control option.

2.  Signed magnitude representation — Encryption (and other) algorithms 
might assume that integers are stored in a particular format.

3.  Unsigned types — Since the enterprise server stores the sign bit 
separately from the data, different behavior can occur when casting 
between signed and unsigned types. When you are porting an application 
for the first time, it is strongly recommended that you set the 
BOUNDS(ALIGNMENT) compiler control option. When the application is fully 
tested, the option can be reset to remove the performance penalty 
associated with its use.
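For the pointer-alignment problem in the first item, a typical offender is casting `buf + offset` to `int *` to read a binary field. Copying the bytes with `memcpy` makes no alignment assumption. A sketch with hypothetical helper names; it reads back what the same program wrote, so it says nothing about byte order or integer format, which the second item warns about.

```c
#include <string.h>

/* Read an int stored at any byte offset, without casting the address
   to int * (which assumes alignment). */
static int read_int_at(const unsigned char *buf, size_t offset)
{
    int value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

static void write_int_at(unsigned char *buf, size_t offset, int value)
{
    memcpy(buf + offset, &value, sizeof value);
}
```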

------------------------------------------------

Two’s Complement Arithmetic

Two’s complement arithmetic is used on many platforms. On A Series 
systems, arithmetic is performed on data in signed-magnitude form. This 
can cause discrepancies in algorithms that depend on the two’s 
complement representation. For example, C applications that use 
encryption algorithms to match data, such as passwords, between client 
and server must perform the encryption the same way on the client-end 
and the server-end. The differences between the two’s complement and 
signed-magnitude representation may result in different values when 
fragments of data are extracted, encrypted, and reinserted into the data.
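One way to extract a two's-complement byte from an int without depending on its bit layout is to convert to unsigned first. The conversion is defined by value (add UINT_MAX + 1 until the result is in range), and UINT_MAX + 1 is a power of two of at least 2^16 on every conforming implementation (2^39 on A Series, going by the manual), so the low 8 bits match a two's-complement machine. By contrast, `x & 0xFF` on a negative signed int would keep low bits of the magnitude on A Series. A sketch with a hypothetical function name, not taken from the manual:

```c
/* Low byte of x's two's-complement encoding, computed from its value.
   Relies on unsigned semantics, so on A Series it is not valid under
   $RESET PORT (UNSIGNED). */
static unsigned tc_byte(int x)
{
    return (unsigned)x & 0xFFu;
}
```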

To obtain matching results, you can define macros that return arithmetic 
results in two’s complement form. The following example illustrates 
macros for two’s complement addition and subtraction:

#define tc_add(arg1, arg2) (((arg1) + (arg2)) & 0xFF)
#define tc_sub(arg1, arg2) (((arg1) + (0x100 - (arg2))) & 0xFF)
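A quick exhaustive check of the two macros for byte operands (0..255), comparing them with conversion to unsigned char, which wraps modulo 256 on any implementation. The check function is hypothetical. Presumably `tc_sub` adds `0x100 - arg2` instead of subtracting `arg2` so the intermediate value stays non-negative; on a sign-magnitude machine `& 0xFF` on a negative int would not give the two's-complement byte.

```c
/* The manual's macros, repeated so this block is self-contained. */
#define tc_add(arg1, arg2) (((arg1) + (arg2)) & 0xFF)
#define tc_sub(arg1, arg2) (((arg1) + (0x100 - (arg2))) & 0xFF)

/* Returns 1 if both macros match 8-bit wraparound for all byte pairs. */
static int tc_macros_ok(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++) {
            if (tc_add(a, b) != (unsigned char)(a + b))
                return 0;
            if (tc_sub(a, b) != (unsigned char)(a - b))
                return 0;
        }
    return 1;
}
```

For example, `tc_add(200, 100)` is 44 and `tc_sub(5, 10)` is 251.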

-------------------------------------------------
-------------------------------------------------
      END OF C COMPILER MANUAL QUOTES
-------------------------------------------------
-------------------------------------------------

Happy compiler porting ;-)

Ion


