• Re: Computer architects leaving Intel...

    From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 13 11:30:52 2024
    From Newsgroup: comp.arch

    On Thu, 5 Sep 2024 20:08:23 -0500
    BGB <cr88192@gmail.com> wrote:

    On 9/3/2024 3:40 AM, Michael S wrote:
    On Tue, 3 Sep 2024 05:55:14 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Tim Rentsch <tr.17687@z991.linuxsc.com> schrieb:

    My suggestion is not to implement a language extension, but to
    implement a compiler conforming to C as it is now,

    Sure, that was also what I was suggesting - define things that
    are currently undefined behavior.

    with
    additional guarantees for what happens in cases that are
    undefined behavior.

    Guarantees or specifications - no difference there.

    Moreover the additional guarantees are
    always in effect unless explicitly and specifically requested
    otherwise (most likely by means of a #pragma or _Pragma).
    Documentation needs to be written for the #pragmas, but no other
    documentation is required (it might be nice to describe the
    additional guarantees but that is not required by the C
    standard).

    It's the other way around - you need to describe first what the
    actual behavior in absence of any pragmas is, and this needs to be
    a firm specification, so the programmer doesn't need to read your
    mind (or the source code to the compiler) to find out what you
    meant. "But it is clear that..." would not be a specification;
    what is clear to you may absolutely not be clear to anybody else.

    This is also the only chance you'll have of getting this
    implemented in one of the current compilers (and let's face it, if
    you want high-quality code, you would need that; both LLVM and GCC
    have taken an enormous amount of effort up to now, and duplicating
    that is probably not going to happen).

    The point is to change the behavior of the compiler but
    still conform to the existing ISO C standard.

    I understood that - defining things that are currently undefined.
    But without a specification, that falls down.

    So, let's try something that causes some grief - what should
    be the default behavior (in the absence of pragmas) for integer
    overflow? More specifically, can the compiler set the condition
    to false in

    int a;

    ...

    if (a > a + 1) {
    }

    and how would you specify this in an unambiguous manner?

    I'd start much earlier, by declaration of "Homogeneity and
    Exclusion". It would state that "more defined C" does not pretend
    to cover all targets covered by existing C language.
    Specifically, the following target characteristics are required:
    - byte-addressable machine with 8-bit bytes
    - two's-complement integer types
    - if float type is supported it has to be IEEE-754 binary32
    - if double type is supported it has to be IEEE-754 binary64
    - if long double type is supported it has to be IEEE-754 binary128
    - storage order for multibyte types should be either LE or BE,
    consistently for all built-in types
    - flat address space

    That part should be specified in a more formal manner.
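
    Several of the requirements listed above can be checked at compile time. What follows is only a sketch, assuming a C11 or later compiler (static_assert via <assert.h>); the function name is made up for illustration, long double is reported rather than asserted because many current ABIs do not use binary128, and a flat address space has no portable test:

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* Byte-addressable machine with 8-bit bytes. */
static_assert(CHAR_BIT == 8, "8-bit bytes required");

/* Two's complement: -1 has all low bits set, and the range is asymmetric. */
static_assert((-1 & 3) == 3, "two's-complement integers required");
static_assert(INT_MIN == -INT_MAX - 1, "two's-complement range required");

/* float and double have the radix, precision and exponent range of
   IEEE-754 binary32 and binary64. */
static_assert(FLT_RADIX == 2, "binary floating point required");
static_assert(FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
              "float must look like binary32");
static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
              "double must look like binary64");

/* Hypothetical helper: returns 1 if long double has the precision and
   exponent range of binary128, 0 otherwise. */
static int long_double_is_binary128(void)
{
    return LDBL_MANT_DIG == 113 && LDBL_MAX_EXP == 16384;
}
```

    These checks only look at the parameters exposed in <limits.h> and <float.h>; they do not prove full IEEE-754 conformance.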

    I might add a few things.

    ALU:
    If integer types overflow, they wrap, with any internal sign or zero extension consistent with the declared type;
    If a multiply overflows, the result will contain the low-order bits
    of the product, sign or zero extended according to the declared types;
    If a variable is shifted left, it will behave as-if it were sign or
    zero extended in a way consistent with the type;
    If a signed value is shifted right, its high order bits will remain consistent with the original sign bit.
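
    For comparison, the ALU rules above can already be expressed in ISO C today by routing the arithmetic through unsigned types. A minimal sketch, with all function names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret 32 bits as two's complement without relying on an
   implementation-defined out-of-range conversion. */
static int32_t to_int32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Addition that wraps instead of overflowing. */
static int32_t wrap_add32(int32_t a, int32_t b)
{
    return to_int32((uint32_t)((uint32_t)a + (uint32_t)b));
}

/* Multiplication that keeps the low-order 32 bits of the product. */
static int32_t wrap_mul32(int32_t a, int32_t b)
{
    uint64_t product = (uint64_t)(uint32_t)a * (uint32_t)b;
    return to_int32((uint32_t)product);
}

/* Right shift that replicates the sign bit; n must be in 0..31. */
static int32_t asr32(int32_t v, unsigned n)
{
    assert(n < 32);
    if (v >= 0)
        return v >> n;
    /* ~v is non-negative here, so the shift is fully defined. */
    return ~(~v >> n);
}
```

    Under the proposed rules the plain operators would simply behave like these helpers.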


    So, in the above example, one could see:
    if (a > a + 1) { }
    As a hypothetical:
    if (a > SignExtend32(a + 1)) { }
    Where SignExtend32 returns the input value sign-extended from 32 bits
    (a+1 always incrementing the value, but may conceptually either wrap
    or go outside the allowed range for 'int', with the sign extension
    always returning it to its canonical form, seen as twos complement).
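
    To make the hypothetical concrete, here is a sketch in which a + 1 is computed in a wider type (so the C code itself has no undefined behavior) and then brought back to canonical 32-bit form. sign_extend32 stands in for the SignExtend32 above; both function names are assumptions made for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low 32 bits of v and interpret them as two's complement. */
static int32_t sign_extend32(int64_t v)
{
    uint32_t low = (uint32_t)(uint64_t)v;
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)(low - 0x80000000u) + INT32_MIN;
}

/* The comparison from the example, under wrapping semantics. */
static int greater_than_successor(int32_t a)
{
    return a > sign_extend32((int64_t)a + 1);
}

/* Usage: only the largest value compares greater than its successor. */
static void check_wrapping_compare(void)
{
    assert(greater_than_successor(INT32_MAX) == 1);
    assert(greater_than_successor(0) == 0);
    assert(greater_than_successor(-1) == 0);
    assert(greater_than_successor(INT32_MIN) == 0);
}
```

    So with these rules the compiler could not fold the condition to false: it is true for exactly one value of a.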


    I will not define the behavior of shifts greater than or equal to the
    width of the integer type, or of negative shifts, as there isn't a
    consistent behavior here across targets.

    However, will note for shifting in a constant expression, it does
    seem to be the case, that the shift will behave as-if the width was unbounded, and negative shifts as a shift in the opposite direction,
    with the result then being sign or zero extended in accordance with
    the type.
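
    One possible reading of that constant-expression behavior, written as a run-time function for 32-bit signed values. This is only an interpretation of the description above, not taken from any compiler; the function names are invented, and a negative count means a right shift:

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret 32 bits as two's complement, portably. */
static int32_t to_int32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Shift left by n as if the width were unbounded, then truncate the
   result back to int32_t. n < 0 shifts right by -n instead. */
static int32_t shift_left_unbounded(int32_t v, int n)
{
    if (n >= 32)
        return 0;                     /* every original bit is shifted out */
    if (n >= 0)
        return to_int32((uint32_t)((uint32_t)v << n));
    if (n <= -31)
        return v < 0 ? -1 : 0;        /* only copies of the sign bit remain */
    int m = -n;                       /* right shift count, 1..30 */
    return v < 0 ? ~(~v >> m) : v >> m;
}

/* Usage examples for in-range, out-of-range and negative counts. */
static void check_shift_examples(void)
{
    assert(shift_left_unbounded(3, 1) == 6);
    assert(shift_left_unbounded(1, 40) == 0);
    assert(shift_left_unbounded(-16, -2) == -4);
    assert(shift_left_unbounded(-1, -100) == -1);
}
```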

    Say, for example, zigzag sign folding:
    int32_t i, j, k;
    i=somevalue;
    j=(i<<1)^(i>>31); //fold sign into LSB
    k=(j>>1)^((j<<31)>>31);
    assert(k==i);
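
    The same fold can also be written so that it stays within what ISO C already defines, by doing the shifts on uint32_t. A sketch with made-up function names; the decode step uses a logical right shift, so the round trip holds over the whole int32_t range, including INT32_MIN:

```c
#include <assert.h>
#include <stdint.h>

/* Fold the sign into the LSB: 0, -1, 1, -2, ... map to 0, 1, 2, 3, ... */
static uint32_t zigzag_encode(int32_t i)
{
    uint32_t u = (uint32_t)i;
    uint32_t sign_mask = (i < 0) ? 0xFFFFFFFFu : 0u;
    return (uint32_t)(u << 1) ^ sign_mask;
}

/* Undo the fold: the LSB selects whether the remaining bits are inverted. */
static int32_t zigzag_decode(uint32_t z)
{
    uint32_t u = (z >> 1) ^ (0u - (z & 1u));
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Usage: round trip at the extremes and around zero. */
static void check_zigzag_round_trip(void)
{
    const int32_t samples[] = { 0, 1, -1, 2, -2, INT32_MAX, INT32_MIN };
    for (unsigned n = 0; n < sizeof samples / sizeof samples[0]; n++)
        assert(zigzag_decode(zigzag_encode(samples[n])) == samples[n]);
}
```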


    Memory:
    One may freely cast pointers to different types and dereference them, regardless of types or alignment of said pointers;
    Pointers will behave as-if the memory space were a linear array of
    bytes, with each value as one or more contiguous bytes in memory;
    Structs are normally packed with each member stored sequentially in
    memory, with each member padded to its natural alignment, and the
    overall struct, if needed, padded to a multiple of the largest member
    alignment;
    The natural alignment for primitive types is equal to the size of
    said primitive type;
    The address taken of any variable will have an in-memory layout
    consistent with the declared type;
    ...
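
    The struct layout rule above can be checked against what a given compiler actually does. A small sketch; struct sample and the helper names are hypothetical, and the offsets in the comments are what the rule predicts, not something ISO C guarantees:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sample {
    uint8_t  tag;    /* rule predicts offset 0 */
    uint32_t count;  /* rule predicts offset 4, after 3 bytes of padding */
    uint16_t flags;  /* rule predicts offset 8 */
};                   /* rule predicts size 12 (multiple of 4) */

/* Round offset up to the next multiple of align. */
static size_t round_up(size_t offset, size_t align)
{
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

/* Returns 1 if the compiler's layout matches "natural alignment equals
   size, struct padded to the largest member alignment". */
static int sample_layout_matches_rule(void)
{
    size_t count_off = round_up(sizeof(uint8_t), sizeof(uint32_t));
    size_t flags_off = round_up(count_off + sizeof(uint32_t), sizeof(uint16_t));
    size_t total     = round_up(flags_off + sizeof(uint16_t), sizeof(uint32_t));

    return offsetof(struct sample, tag) == 0
        && offsetof(struct sample, count) == count_off
        && offsetof(struct sample, flags) == flags_off
        && sizeof(struct sample) == total;
}
```

    On the common 32-bit and 64-bit ABIs this is expected to return 1; the point of the proposal is to make that a guarantee rather than an observation.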

    Implicitly:
    Any memory store may potentially alias with any other memory access,
    unless:
    One or both pointers has the restrict keyword;
    It can be reasonably proven that the pointed-to memory locations do
    not alias;
    A compiler may assume an access is aligned if it can be verified that
    no operation has caused the address to become misaligned (though, as
    a reservation, if a variable is declared restrict, it may also be
    assumed to be properly aligned for its type).


    Granted, there are targets where pointers are assumed aligned by
    default and declared unaligned, but there is no standard way in C to
    declare an unaligned pointer, and there is code that assumes the
    ability to freely de-reference pointers regardless of alignment.
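
    In ISO C as it stands, the usual way to get this effect without undefined behavior is to go through memcpy, which compilers for targets with unaligned loads commonly reduce to a single load or store. A minimal sketch with made-up helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from an address with no alignment requirement. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a uint32_t to an address with no alignment requirement. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

    Code that instead casts a char pointer to int * and dereferences it is what the proposed rule would legitimize.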

    Though, a less conservative option would be to assume that any normal pointer variable is aligned by default, but may become unaligned if
    it accepts a value created by casting from a type of smaller
    alignment (or is assigned a value from a pointer holding such a
    value).

    char *cs;
    int *pi, *pj;
    ...
    pi=(int *)cs; //taints pi with unaligned status.
    ...
    pj=pi; //taints pj with unaligned status via pi

    This would still leave it as UB to pass or return a misaligned
    pointer across function boundaries (if the pointer is then
    de-referenced), or similar for putting them in struct members.

    May leave a partial exception for "void *", which may be cast to
    another type without causing the result to become unaligned.

    ...

    Misc:
    A missing return value is required to still return as normal;
    However, the nature and contents of the value returned will be
    undefined (it will be "probably random garbage").


    But, would make some reservations:
    The relative location and alignment of global variables remains
    undefined;
    The relative location and alignment of automatic variables remains
    undefined;
    The nature or the storage of any global or automatic variable whose
    address has not been taken, remains undefined;
    The nature or identity of any temporary variables created within an expression, remains undefined;
    Calling a function with a missing prototype will remain undefined,
    except if both the argument and return types are all primitive types,
    the argument types are an exact match and either pointer or integer
    types, and the return type is a small integer;
    ...


    Similarly, one likely can't (yet) require that targets be little
    endian, but one can make a working assumption that the target is
    probably little endian.

    ...


    I agree with the great majority of it.

    Rules for shifts could be formulated better. I think they are
    formulated better in the GCC manual, in the section about
    implementation-defined behaviors.

    For functions without arguments, I'd prefer mandatory prototypes, even
    at the cost of breaking existing code.
    I'd also be more draconian about both a missing return type and a
    missing return statement in a non-void function.

    About endianness, I think that my definition in the post above is the
    most practical, i.e. BE is allowed, but inconsistent byte orders are
    prohibited.
    Plus, of course, a standardized name for a preprocessor built-in, for
    easy compile-time detection of endianness.
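
    For reference, a sketch of what compile-time detection looks like today without a standardized name. It relies on the __BYTE_ORDER__ family of macros predefined by GCC and Clang (other compilers may not provide them), and falls back to "unknown"; C23 adds __STDC_ENDIAN_NATIVE__ and friends in <stdbit.h> for the same purpose. The function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1 = little endian, 0 = big endian, -1 = not known at compile time. */
static int compile_time_is_little_endian(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return 1;
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return 0;
#  else
    return -1;
#  endif
#else
    return -1;
#endif
}

/* Run-time probe: look at the first byte in memory of the value 1. */
static int runtime_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Usage: when the macros are available, both answers must agree. */
static void check_endianness_agrees(void)
{
    int ct = compile_time_is_little_endian();
    assert(ct == -1 || ct == runtime_is_little_endian());
}
```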





