On 9/3/2024 3:40 AM, Michael S wrote:
On Tue, 3 Sep 2024 05:55:14 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> schrieb:
My suggestion is not to implement a language extension, but to
implement a compiler conforming to C as it is now,
Sure, that was also what I was suggesting - define things that
are currently undefined behavior.
with
additional guarantees for what happens in cases that are
undefined behavior.
Guarantees or specifications - no difference there.
Moreover the additional guarantees are
always in effect unless explicitly and specifically requested
otherwise (most likely by means of a #pragma or _Pragma).
Documentation needs to be written for the #pragmas, but no other
documentation is required (it might be nice to describe the
additional guarantees but that is not required by the C
standard).
It's the other way around - you need to describe first what the
actual behavior in absence of any pragmas is, and this needs to be
a firm specification, so the programmer doesn't need to read your
mind (or the source code to the compiler) to find out what you
meant. "But it is clear that..." would not be a specification;
what is clear to you may absolutely not be clear to anybody else.
This is also the only chance you'll have of getting this
implemented in one of the current compilers (and let's face it, if
you want high-quality code, you would need that; both LLVM and GCC
have taken an enormous amount of effort up to now, and duplicating
that is probably not going to happen).
The point is to change the behavior of the compiler but
still conform to the existing ISO C standard.
I understood that - defining things that are currently undefined.
But without a specification, that falls down.
So, let's try something that causes some grief - what should
be the default behavior (in the absence of pragmas) for integer
overflow? More specifically, can the compiler set the condition
to false in
int a;
...
if (a > a + 1) {
}
and how would you specify this in an unambiguous manner?
I'd start much earlier, with a declaration of "Homogeneity and
Exclusion". It would state that "more defined C" does not pretend
to cover all targets covered by the existing C language.
Specifically, the following target characteristics are required:
- byte-addressable machine with 8-bit bytes
- two's-complement integer types
- if float type is supported it has to be IEEE-754 binary32
- if double type is supported it has to be IEEE-754 binary64
- if long double type is supported it has to be IEEE-754 binary128
- storage order for multibyte types should be either LE or BE,
consistently for all built-in types
- flat address space
That part should be specified in a more formal manner.
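As a rough sketch (not from the original posts), most of the characteristics listed above can be checked at compile time with C11 `_Static_assert`. The function names are hypothetical, and the floating-point checks only test the shape of the format (radix, mantissa width, exponent range), not full IEEE-754 conformance. The binary128 requirement is reported at run time instead of asserted, since many current targets would fail it.

```c
#include <float.h>
#include <limits.h>

/* Byte-addressable machine with 8-bit bytes. */
_Static_assert(CHAR_BIT == 8, "8-bit bytes required");

/* Two's complement: -1 has all low bits set, and the range is asymmetric. */
_Static_assert((-1 & 3) == 3, "two's complement required");
_Static_assert(INT_MIN == -INT_MAX - 1, "two's complement required");

/* float and double have the shape of IEEE-754 binary32 and binary64. */
_Static_assert(FLT_RADIX == 2, "binary floating point required");
_Static_assert(sizeof(float) == 4 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
               "float must look like binary32");
_Static_assert(sizeof(double) == 8 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "double must look like binary64");

/* Hypothetical helper: returns 1 once all assertions above have compiled. */
int more_defined_c_target_ok(void)
{
    return 1;
}

/* Hypothetical helper: 1 if long double has the shape of binary128, else 0.
   Not asserted, because e.g. x86 uses an 80-bit extended format instead. */
int long_double_is_binary128(void)
{
    return LDBL_MANT_DIG == 113 && LDBL_MAX_EXP == 16384;
}
```

Storage order and the flat address space are not expressible as constant expressions in ISO C, which fits the remark that this part needs a more formal specification.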
I might add a few things.
ALU:
If integer types overflow, they wrap, with any internal sign or zero extension consistent with the declared type;
If a multiply overflows, the result will contain the low-order bits
of the product, sign or zero extended according to the declared types;
If a variable is shifted left, it will behave as-if it were sign or
zero extended in a way consistent with the type;
If a signed value is shifted right, its high order bits will remain consistent with the original sign bit.
So, in the above example, one could see:
if (a > a + 1) { }
As a hypothetical:
if (a > SignExtend32(a + 1)) { }
Where SignExtend32 returns the input value sign-extended from 32 bits
(a+1 always incrementing the value, but may conceptually either wrap
or go outside the allowed range for 'int', with the sign extension
always returning it to its canonical form, seen as twos complement).
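A minimal sketch of that hypothetical SignExtend32, written so that it is already well defined in ISO C today: the arithmetic is done in a wider type and the result is then folded back into the 32-bit range. The names `sign_extend32`, `wrap_add32`, `wrap_mul32` and `greater_than_successor` are illustrative only, and the sketch assumes `int32_t` and `int64_t` exist on the target.

```c
#include <stdint.h>

/* Keep the low 32 bits of v and reinterpret them as two's complement. */
int32_t sign_extend32(int64_t v)
{
    uint32_t low = (uint32_t)v;              /* reduction modulo 2^32, well defined */
    if (low & 0x80000000u)                   /* sign bit set: value is low - 2^32 */
        return (int32_t)(low - 0x80000000u) + INT32_MIN;
    return (int32_t)low;
}

/* Wrapping add: compute exactly in 64 bits, then return to canonical form. */
int32_t wrap_add32(int32_t a, int32_t b)
{
    return sign_extend32((int64_t)a + (int64_t)b);
}

/* Wrapping multiply: low-order 32 bits of the product, sign extended. */
int32_t wrap_mul32(int32_t a, int32_t b)
{
    return sign_extend32((int64_t)a * (int64_t)b);
}

/* The comparison from the thread under these rules. */
int greater_than_successor(int32_t a)
{
    return a > wrap_add32(a, 1);
}
```

Under these rules the condition is true exactly when `a` is `INT32_MAX`, so a compiler following them could not fold it to false.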
I will not define the behavior of shifts greater than or equal to the
width of the integer type, or of negative shifts, as there isn't a
consistent behavior here across targets.
However, I will note that for shifts in a constant expression, it does
seem to be the case that the shift will behave as-if the width was
unbounded, and negative shifts as a shift in the opposite direction,
with the result then being sign or zero extended in accordance with
the type.
Say, for example, zigzag sign folding:
int32_t i, j, k;
i=somevalue;
j=(i<<1)^(i>>31); //fold sign into LSB
k=((uint32_t)j>>1)^((j<<31)>>31); //unfold; j needs a logical shift here
assert(k==i);
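For comparison, a sketch of the same folding that does not depend on any of the added guarantees: the shifts are done on unsigned values, so it is well defined in current ISO C. The function names are hypothetical. The decode step uses a logical right shift, which is what lets the whole `int32_t` range round-trip.

```c
#include <stdint.h>

/* Fold the sign into the LSB: 0, -1, 1, -2, ... map to 0, 1, 2, 3, ... */
uint32_t zigzag_encode32(int32_t i)
{
    uint32_t u = (uint32_t)i;                      /* modulo 2^32, well defined */
    uint32_t sign = (i < 0) ? 0xFFFFFFFFu : 0u;    /* same bits as an arithmetic i>>31 */
    return (uint32_t)(u << 1) ^ sign;
}

/* Inverse of zigzag_encode32. */
int32_t zigzag_decode32(uint32_t j)
{
    uint32_t mask = (j & 1u) ? 0xFFFFFFFFu : 0u;   /* all ones if the value was negative */
    uint32_t u = (j >> 1) ^ mask;                  /* logical shift, then unfold */
    if (u & 0x80000000u)                           /* convert without out-of-range cast */
        return (int32_t)(u - 0x80000000u) + INT32_MIN;
    return (int32_t)u;
}
```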
Memory:
One may freely cast pointers to different types and dereference them, regardless of types or alignment of said pointers;
Pointers will behave as-if the memory space were a linear array of
bytes, with each value as one or more contiguous bytes in memory;
Structs are normally packed with each member stored sequentially in
memory, with each member padded to its natural alignment, and the
overall struct, if needed, padded to a multiple of the largest member alignment;
The natural alignment for primitive types is equal to the
size of said primitive type;
The address taken of any variable will have an in-memory layout
consistent with the declared type;
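To make the struct layout rule concrete, here is a small sketch with a hypothetical `struct sample`. The expected offsets follow directly from the rule as stated (members in order, each padded to its own size, total padded to the largest member alignment). Typical current ABIs already lay this struct out the same way, but ISO C itself does not guarantee it, so the check is written as a run-time predicate rather than an assertion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example struct. */
struct sample {
    uint8_t  tag;     /* expected offset 0 */
    uint32_t count;   /* expected offset 4, after 3 bytes of padding */
    uint16_t flags;   /* expected offset 8 */
                      /* expected 2 bytes of tail padding, total size 12 */
};

/* Returns 1 if the implementation lays out struct sample as the rule predicts. */
int sample_layout_matches(void)
{
    return offsetof(struct sample, tag) == 0
        && offsetof(struct sample, count) == 4
        && offsetof(struct sample, flags) == 8
        && sizeof(struct sample) == 12;
}
```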
...
Implicitly:
Any memory store may potentially alias with any other memory access,
unless:
One or both pointers have the restrict keyword;
It can be reasonably proven that the pointed-to memory locations do
not alias;
A compiler may assume an access is aligned if it can be verified that
no operation has caused the address to become misaligned (though, as
a reservation, if a pointer is declared restrict, the compiler may
also assume it is properly aligned for its type).
Granted, there are targets where pointers are assumed aligned by
default and declared unaligned, but there is no standard way in C to
declare an unaligned pointer, and there is code that assumes the
ability to freely de-reference pointers regardless of alignment.
Though, a less conservative option would be to assume that any normal pointer variable is aligned by default, but may become unaligned if
it accepts a value created by casting from a type of smaller
alignment (or is assigned a value from a pointer holding such a
value).
char *cs;
int *pi, *pj;
...
pi=(int *)cs; //taints pi with unaligned status.
...
pj=pi; //taints pj with unaligned status via pi
This would still leave it as UB to pass or return a misaligned
pointer across function boundaries (if the pointer is then
de-referenced), or similar for putting them in struct members.
May leave a partial exception for "void *", which may be cast to
another type without causing the result to become unaligned.
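For reference, the way to get the effect of a misaligned access in ISO C as it stands, without any of the guarantees proposed here, is to copy through `memcpy`. A minimal sketch with hypothetical helper names (compilers commonly turn these fixed-size copies into a single load or store on targets that allow unaligned access, though that is an optimization and not a guarantee):

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from an address with no alignment requirement. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* byte-wise copy, no alignment assumed */
    return v;
}

/* Store a 32-bit value to an address with no alignment requirement. */
void store_u32_unaligned(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Code that instead writes `*(int *)cs` directly is the kind of code the tainting rule above is meant to keep working.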
...
Misc:
A missing return value is required to still return as normal;
However, the nature and contents of the value returned will be
undefined (it will be "probably random garbage").
But, would make some reservations:
The relative location and alignment of global variables remains
undefined;
The relative location and alignment of automatic variables
remains undefined;
The nature or the storage of any global or automatic variable whose
address has not been taken, remains undefined;
The nature or identity of any temporary variables created within an expression, remains undefined;
Calling a function with a missing prototype will remain undefined,
except if both the argument and return types are all primitive types,
the argument types are an exact match and either pointer or integer
types, and the return type is a small integer;
...
Similarly, one likely can't (yet) require that targets be little
endian, but one can make a working assumption that the target is
probably little endian.
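A short sketch of how code can avoid relying on that working assumption: either probe the storage order at run time, or read multi-byte values from their bytes explicitly so that the result is the same on LE and BE targets. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a uint32_t holds the low-order bits. */
int target_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* inspect the first byte of the object */
    return first == 1;
}

/* Read a little-endian 32-bit value from bytes, independent of host order. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```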
...