• Time to eat Crow

    From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 02:50:23 2025


    My 66000 2.0

    After 4-odd years of ISA stability, I ran into a case where
    I <pretty much> needed to change the instruction formats.
    And after bragging to Quadribloc about its stability, it
    reached the point where it was time to switch to version 2.0.

    Well, it's time to eat crow.
    --------------------------------------------------------------
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    Integer and floating point compare instructions only compare
    bits of the specified {Size}.

    Conversions between integer and floating point are now also
    governed by {Size}, so one can convert FP64 directly
    into {unSigned}×{Int16}--more fully supporting strongly typed
    languages.
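
    As a concrete illustration (the CVTUH mnemonic below is a
    hypothetical spelling, not confirmed ISA syntax), take the C
    function:

        #include <stdint.h>

        uint16_t trunc16( double d )
        {
            /* FP64 -> uint16; assumes d is in range (out-of-range
               float-to-int conversion is undefined behavior in C) */
            return (uint16_t)d;
        }

    On a conventional 64-bit ISA this is typically a float-to-integer
    convert followed by a separate truncation or zero-extension; with
    {Size}-governed conversion it becomes a single instruction,
    something like:

        CVTUH   R1,R1    // FP64 -> {unSigned}×{HalfWord}, range [0..65535]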
    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}
    Although I am oscillating over whether to support FP8 or FP128.

    With this rearrangement of bits in the instruction formats, I
    was able to get all Constant and routing control bits in the
    same place and format in all {1, 2, and 3}-Operand instructions
    uniformly. This simplifies the Decoder <a trifle>, but more
    importantly, the Operand delivery (and/or reception) mechanism.

    I was also able to compress the 7 extended operation formats
    into a single extended operation format. The instruction
    format now looks like:

    inst<31:26> Major OpCode
    inst<20:16> {Rd, Cnd field}
    inst<25:21> {SRC1, Rbase}
    inst<15:10> {SH width, else, {I,d,Sign,Size}}
    inst< 9: 6> {Minor OpCode, SRC3}
    inst< 4: 0> {offset,SRC2,Rindex,1-OP×}

    So there is one uniformly positioned field of Minor OpCodes,
    and one uniformly interpreted field of Operand Modifiers.
    Operand Modifiers apply register routing and constant insertion
    to XOP Instructions.
    --------------------------------------------------------------
    So, what does this buy the Instruction Set ??

    A) All integer calculations are performed at the size and
    type of the result as required by the high level language::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}.
    This gets rid of all smash instructions across all data
    types {smash == {sext, zext, ((x<<2^n)>>2^n), ...}}--see the
    example following item D.

    B) I actually gained 1 more extended OpCode for future expansion.

    C) The assembler/disassembler was simplified.

    D) While I did not add any new 'instructions', I made those
    already present more uniform, better supporting the requirements
    of higher-level languages (like Ada) and more suitable to the
    stricter typing LLVM applies compared to GCC.
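
    To make the 'smash' pattern of (A) concrete, here is a minimal C
    case (plain ISO C; the RISC-V instructions named are real, shown
    here only as the usual I32LP64 pattern, not as My 66000 code):

        #include <stdint.h>

        int32_t sum( int32_t a, int32_t b )
        {
            return a + b;   /* result must stay in int32 range */
        }

    On RV64 without a word-sized add this needs two instructions:

        add    a0, a0, a1    # 64-bit add
        sext.w a0, a0        # the smash: re-sign-extend from bit 31

    RV64's ADDW does both in one instruction; the sized ADDs described
    here generalize that to all {Sign}×{Size} combinations.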

    In some ways I 'doubled' the instruction count while not adding
    a single instruction {spelling or field-pattern} to the ISA.
    --------------------------------------------------------------
    The elimination of 'smashes' shrinks the instruction count of
    GNUPLOT by 4%--maybe a bit more once we sort out all of the
    compiler patterns it needs to recognize.
    --------------------------------------------------------------
    I wonder if crow tastes good in shepherd's pie?!
  • From Robert Finch@robfi680@gmail.com to comp.arch on Fri Oct 3 03:17:16 2025

    On 2025-10-02 10:50 p.m., MitchAlsup wrote:

    [...]
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}
    Although I am oscillating over whether to support FP8 or FP128.

    For my arch, I decided to support FP128, thinking that FP8 could be
    implemented with lookup tables, given that eight-bit floats tend to
    vary in composition. Of course, I like more precision.
    Could it be a build option? Or a bit in a control register to flip
    between FP8 and FP128?
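
    A minimal sketch of the lookup-table idea (assuming an S.E4.M3,
    bias-7 FP8 format--one of several competing definitions; Inf/NaN
    handling, which also varies by definition, is ignored here):

        #include <math.h>
        #include <stdint.h>

        static float fp8_table[256];

        static void init_fp8_table(void)
        {
            for (int i = 0; i < 256; i++) {
                int sign = (i >> 7) & 1;
                int exp  = (i >> 3) & 0xF;
                int man  = i & 0x7;
                float v  = (exp == 0)
                         ? ldexpf(man / 8.0f, 1 - 7)           /* subnormal */
                         : ldexpf(1.0f + man / 8.0f, exp - 7); /* normal    */
                fp8_table[i] = sign ? -v : v;
            }
        }

        /* fp8_table[x] then converts an FP8 byte with one indexed load;
           a 64K-entry table handles any FP8×FP8 binary op the same way. */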

    With this rearrangement of bits in the instruction formats, I
    was able to get all Constant and routing control bits in the
    same place and format in all {1, 2, and 3}-Operand instructions
    uniformly. This simplifies the Decoder <a trifle>, but more
    importantly, the Operand delivery (and/or reception) mechanism.

    I was also able to compress the 7 extended operation formats
    into a single extended operation format. The instruction
    format now looks like:

    inst<31:26> Major OpCode
    inst<20:16> {Rd, Cnd field}
    inst<25:21> {SRC1, Rbase}
    inst<15:10> {SH width, else, {I,d,Sign,Size}}
    inst< 9: 6> {Minor OpCode, SRC3}
    inst< 4: 0> {offset,SRC2,Rindex,1-OP×}

    Only four bits for SRC3?

    [...]

  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:33:36 2025


    Robert Finch <robfi680@gmail.com> posted:

    On 2025-10-02 10:50 p.m., MitchAlsup wrote:

    [...]
    I was also able to compress the 7 extended operation formats
    into a single extended operation format. The instruction
    format now looks like:

    inst<31:26> Major OpCode
    inst<20:16> {Rd, Cnd field}
    inst<25:21> {SRC1, Rbase}
    inst<15:10> {SH width, else, {I,d,Sign,Size}}
    inst< 9: 6> {Minor OpCode, SRC3}
    inst< 4: 0> {offset,SRC2,Rindex,1-OP×}

    Only four bits for SRC3?
    No, there are 5 bits--inst<9:5>--whoops.

    [...]

  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri Oct 3 12:40:17 2025

    MitchAlsup wrote:
    [...]
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    Why? Compilers do not have any problem with this,
    as it's been handled by overload resolution since forever.

    It's people who have the problems following type changes, and most
    compilers will warn of mixed-type operations for exactly that reason.

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    Integer and floating point compare instructions only compare
    bits of the specified {Size}.

    Conversions between integer and floating point are now also
    governed by {Size}, so one can convert FP64 directly
    into {unSigned}×{Int16}--more fully supporting strongly typed
    languages.

    Strongly typed languages don't natively support mixed type operations.
    They come with a set of predefined operations for specific types that
    produce specific results.

    If YOU want operators/functions that allow mixed types then they force
    you to define your own functions to perform your specific operations,
    and it forces you to deal with the consequences of your type mixing.

    All this does is force YOU, the programmer, to be explicit in your
    definition and not depend on invisible compiler-specific interpretations.

    If you want to support Uns8 * Int8 then it forces you, the programmer,
    to deal with the fact that this produces a signed 16-bit result
    in the range -128*255..+127*255 = -32640..+32385.
    Now if you want to convert that result bit pattern to Uns8 by truncating
    it to the lower 8 bits, or worse, treat the result as Int8 and take
    whatever random value falls in bit [7] as the sign, then that's on you.
    They just force you to be explicit about what you are doing.
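
    In C terms (an illustrative sketch; the widening is just C's usual
    arithmetic conversions at work):

        #include <stdint.h>

        int16_t mul_u8_s8( uint8_t a, int8_t b )
        {
            int p = (int)a * (int)b;   /* range -32640 .. +32385 */
            return (int16_t)p;         /* explicit: product fits in 16 bits */
        }

    Truncating p to 8 bits instead would be the "that's on you" case.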

    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow
    mixing. Weakly typed languages deal with this in overload resolution,
    by having predefined invisible type conversions in those operators,
    and by then using the normal single-type arithmetic instructions.

    Although I am oscillating over whether to support FP8 or FP128.

    The issue with FP8 support seems to be that everyone who wants it also
    wants their own definition so no matter what you do, it will be unused.

    The issue with FP128 seems associated with scaling on LD and ST
    because now scaling is 1,2,4,8,16 which adds 1 bit to the scale field.
    And in the case of a combined int-float register file deciding whether
    to expand all registers to 128 bits, or use 64-bit register pairs.
    Using 128-bit registers raises the question of 128-bit integer support,
    and using register pairs opens a whole new category of pair instructions.

  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Fri Oct 3 10:55:46 2025

    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    [...]
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    I must be missing something. Suppose I have

    C := A + B

    where A and C are 16-bit signed integers and B is an 8-bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high-order 56 bits zero. But the add instruction will
    presumably be half-word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 15:25:25 2025

    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow mixing.

    Not sure who's confused, but my reading of the above is not some sort
    of "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
    instruction, so it does not support adding an int8 to an int32, or other "mixes".


    Stefan
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 19:55:00 2025


    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:

    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    [...]

    I must be missing something. Suppose I have

    C := A + B

    where A and C are 16-bit signed integers and B is an 8-bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high-order 56 bits zero. But the add instruction will
    presumably be half-word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?

    A is loaded as 16 bits, properly sign-extended to 64 bits: range [-32768..32767]
    B is loaded as 8 bits, properly sign-extended to 64 bits: range [-128..127]

    ADDSH Rc,Ra,Rb

    adds 64-bit Ra and 64-bit Rb and then sign-extends the result from
    bit<15>. The result is a properly signed 64-bit value: range
    [-32768..32767].
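
    A worked example (values chosen for illustration):

        Ra = 32760, Rb = 100
        64-bit add               : 32860  = 0x0000_0000_0000_805C
        sign-extend from bit<15> : -32676 = 0xFFFF_FFFF_FFFF_805C

    That is, the HalfWord result wraps modulo 2^16 back into
    [-32768..32767], exactly as a 16-bit two's-complement add would.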


  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Oct 3 20:47:08 2025

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:
    MitchAlsup wrote:
    [...]
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    Why? Compilers do not have any problem with this,
    as it's been handled by overload resolution since forever.

    A non-My66000 example:

    int add (int a, int b)
    {
    return a + b;
    }

    is translated on powerpc64le-unknown-linux-gnu (with -O3) to

    add 3,3,4
    extsw 3,3
    blr

    extsw fills the 32 high-order bits with copies of the sign bit,
    because numbers returned in registers have to be correct, either
    as 32- or 64-bit values.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Oct 3 21:04:16 2025

    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow mixing.

    Not sure who's confused, but my reading of the above is not some sort
    of "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
    instruction, so it does not support adding an int8 to an int32, or other "mixes".

    The outputs are correctly extended to a 64-bit number (signed or
    unsigned) so it is possible to pass results to wider operations
    without conversion.

    One example would be

    unsigned long foo (unsigned int a, unsigned int b)
    {
    return a + b;
    }

    which would otherwise need an adjustment after the add, and which
    would just be something like

    adduw r1,r1,r2
    ret

    using Mitch's new encoding.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 21:36:07 2025


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    MitchAlsup wrote:
    [...]
    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    Why? Compilers do not have any problem with this,
    as it's been handled by overload resolution since forever.

    LLVM compiles C with stricter typing than GCC, resulting in a lot
    of smashes. For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

    LLVM thinks the smash is required because [-2^31..+2^31-1] +
    [-2^31..+2^31-1] does not always fit into [-2^31..+2^31-1] !!!
    and chasing down all the cases is harder than the compiler is
    ready to do. At first I thought that the value propagation in
    LLVM would find that the vast majority of arithmetic does not
    need smashing. This proved frustrating to both me and
    Brian. The more I read RISC-V and ARM assembly code, the more
    I realized that adding sized integer arithmetic is the only
    way to get through to the LLVM infrastructure.

    We (the My 66000 team; mostly me and Brian) have been trying to
    obey the stricter-than-necessary typing of LLVM while achieving
    the code density possible as if K&R rules were in play with
    64-bit-only (int)s.

    RISC-V has ADDW (but no ADDH or ADDB) to alleviate the issue on
    a majority of calculations. ARM has word-sized registers to
    alleviate the issue; since ARM started as 32-bit, ADDW is natural.
    I am exploring how to provide integer arithmetic such that smashing
    never has to happen.

    We have been chasing smashes for 9 months, making little progress...

    It's people who have the problems following type changes, and most
    compilers will warn of mixed-type operations for exactly that reason.

    It is more the Ada problem that values must fit in containers--that
    is, values have a range {min..max} and calculated values outside
    of that range are to be "addressed".

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    Integer and floating point compare instructions only compare
    bits of the specified {Size}.

    Conversions between integer and floating point are now also
    governed by {Size}, so one can convert FP64 directly
    into {unSigned}×{Int16}--more fully supporting strongly typed
    languages.

    Strongly typed languages don't natively support mixed type operations.
    They come with a set of predefined operations for specific types that
    produce specific results.

    Yes, indeed, and this is what I am providing: {Sign}×{Size}
    calculations, where the result is known to be range-limited to
    {Sign}×{Size}. Thus:

    ADDSH R7,R8,R9

    R7 is range limited to {Signed}×{HalfWord} == [-32768..+32767]
    ------------------------------------------------------------------------
    So let's look at some egregious cases::

    cvtds r2,r2 // convert double to signed 64
    srl r3,r2,#0,#32 // convert signed 64 to signed 32
    --------
    sra r1,r23,#0,#32 // smash to signed 32
    sra r2,r20,#0,#32 // smash to signed 32
    maxs r23,r2,r1 // max of signed 32
    --------
    ldd r24,[r24] // LD signed 64
    add r1,r28,#1 // innocently add #1
    sra r28,r1,#0,#32 // smash to Signed 32
    cmp r1,r28,r16 // to match the other operand of CMP
    --------
    call strspn
    srl r2,r1,#0,#32 // smash result Signed 32
    add r1,r25,-r1
    sra r1,r1,#0,#32 // smash Signed 32
    cmp r2,r19,r2
    srl r2,r2,#2,#1
    add r21,r21,r2 // add Bool to Signed 32
    sra r2,r20,#0,#32 // smash Signed 32
    maxs r20,r1,r2 // MAX Signed 32
    --------
    mov r1,r29 // Signed 64
    ple0 r17,FFFFFFF // ignore
    stw r17,[ip,key_rows] // ignore
    add r1,r29,#-1 // innocent subtract
    sra r1,r1,#0,#32 // smash to Signed 32
    divs r1,r1,r17 // DIV Signed 32
    --------
    lduw r2,[ip,keyT+4]
    add r2,r2,#-1 // innocent subtract
    srl r2,r2,#0,#32 // smash to unSigned 32
    cmp r3,r2,#1 // CMP unSigned 32
    // even though CMP is Signless
    --------
    add r1,r19,-r6 // not so innocent subtract
    sra r2,r1,#0,#32 // Signed
    srl r1,r1,#0,#32 // unSigned
    // only one of these can be eliminated
    --------

    If YOU want operators/functions that allow mixed types then they force
    you to define your own functions to perform your specific operations,
    and it forces you to deal with the consequences of your type mixing.

    All this does is force YOU, the programmer, to be explicit in your
    definition and not depend on invisible compiler specific interpretations.

    If you want to support Uns8 * Int8 then it forces you, the programmer,
    to deal with the fact that this produces a signed 16-bit result
    in the range -128*255..+127*255 = -32640..+32385.

    Uns8 occupies 64-bits in a register range-limited to [0..255]
    Int8 occupies 64-bits in a register range-limited to [-128..127]
    So, integer values sitting in registers occupy the whole 64-bits
    but are properly range-limited to base-type.

    Multiply multiplies 2×64-bit registers and produces a 128-bit
    result; since CARRY is not in effect, bits<127..64> are
    discarded and bits<63..0> are then considered.

    unSigned results simply discard bits more significant than base-type.
    Signed results raise OVERFLOW if there is more significance than
    base-type (and, if enabled, take an exception).
    In all cases, the result delivered fits within the range of base-type.

    So, in the case you mention::

    LDUB R8,[---]
    LDSB R9,[---]
    MULSH R7,R8,R9 // result range [-32768..32767]
    -----
    MULUH R7,R8,R9 // result range [0..65535]


    Now if you want to convert that result bit pattern to Uns8 by truncating
    it to the lower 8 bits,

    MULUB R7,R8,R9 // result range [0..255]

    or worse treat the result as Int8 and take
    whatever random value falls in bit [7] as the sign, then that's on you.

    MULSB R7,R8,R9 // result range [-128..127] or OVERFLOW

    Personally, I prefer range checks that raise OVERFLOW.
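
    One way to model that result rule in C (a sketch of my reading of
    the semantics above, not an official reference model; overflow here
    only flags significance lost within the low 64 bits):

        #include <stdint.h>

        int64_t mul_sized(int64_t a, int64_t b, int bits, int is_signed,
                          int *overflow)
        {
            uint64_t raw  = (uint64_t)a * (uint64_t)b;  /* low 64 of the 128 */
            uint64_t mask = (bits == 64) ? ~0ull : ((1ull << bits) - 1);
            uint64_t low  = raw & mask;
            if (!is_signed) {          /* unSigned: just discard high bits */
                *overflow = 0;
                return (int64_t)low;
            }
            uint64_t sign = 1ull << (bits - 1); /* extend from bit<bits-1> */
            int64_t  res  = (int64_t)((low ^ sign) - sign);
            *overflow = (res != (int64_t)raw);  /* significance was lost */
            return res;
        }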

    They just force you to be explicit about what you are doing.

    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.

    RISC-V and ARM LLVM compilers already do this and use it to eliminate
    smashes. RISC-V is limited to WORD; ARM uses registers of WORD size.
    Both eliminate smashes. Since there are already LLVM compilers using
    this (to eliminate smashes), it should not be terribly difficult to add.

    On the other hand:: ILP64 ALSO gets rid of the problem (at a
    different cost).

    Strongly typed languages don't have predefined operators that allow
    mixing. Weakly typed languages deal with this in overload resolution,
    by having predefined invisible type conversions in those operators,
    and by then using the normal single-type arithmetic instructions.

    Although I am oscillating over whether to support FP8 or FP128.

    The issue with FP8 support seems to be that everyone who wants it also
    wants their own definition so no matter what you do, it will be unused.

    Thank you for your input.

    The issue with FP128 seems associated with scaling on LD and ST
    because now scaling is 1,2,4,8,16 which adds 1 bit to the scale field.
    And in the case of a combined int-float register file deciding whether
    to expand all registers to 128 bits, or use 64-bit register pairs.

    My position is that people want 64-bit registers and an ISA that
    allows reasonably easy and efficient access to 128 bits; CARRY
    provides this. But the architecture is not cut out to be a big
    128-bit number cruncher; occasionally sure, but all the time, no.

    Using 128-bit registers raises the question of 128-bit integer support,
    and using register pairs opens a whole new category of pair instructions.

    CARRY supports this.
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 04:56:21 2025

    On 10/3/2025 4:04 PM, Thomas Koenig wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    --------------------------------------------------------------
    Integer instructions are now:: {Signed and unSigned}×{Byte, HalfWord,
    Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow mixing.
    Not sure who's confused, but my reading of the above is not some sort of
    "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
    instruction, so it does not support adding an int8 to an int32, or other
    "mixes".

    The outputs are correctly extended to a 64-bit number (signed or
    unsigned) so it is possible to pass results to wider operations
    without conversion.

    One example would be

    unsigned long foo (unsigned int a, unsigned int b)
    {
    return a + b;
    }

    which would otherwise need an adjustment after the add, and which
    would just be something like

    adduw r1,r1,r2
    ret

    using Mitch's new encoding.



    Yes.

    Sign extend signed types, zero extend unsigned types.
    Up-conversion is free.


    This is something the RISC-V people got wrong IMO, and adding a bunch of
    ".UW" instructions in an attempt to patch over it is just kinda ugly.

    Partly for my own uses, I revived ADDWU and SUBWU (which had been
    dropped in BitManip), because these are less bad than the alternative.

    I get annoyed that new extensions keep trying to add ever more ".UW"
    instructions rather than just having the compiler go over to
    zero-extended unsigned and make this whole mess go away.

    ...



    Ironically, the number of new instructions being added to my own ISA
    has mostly died off recently, largely because there is little
    particularly relevant to add at this point (within the realm of stuff
    that could be added).


  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 04:57:23 2025

    On 10/3/2025 11:40 AM, EricP wrote:
    MitchAlsup wrote:
    [...]

    Although I am oscillating over whether to support FP8 or FP128.

    The issue with FP8 support seems to be that everyone who wants it also
    wants their own definition so no matter what you do, it will be unused.

    The issue with FP128 seems associated with scaling on LD and ST
    because now scaling is 1,2,4,8,16 which adds 1 bit to the scale field.
    And in the case of a combined int-float register file deciding whether
    to expand all registers to 128 bits, or use 64-bit register pairs.
    Using 128-bit registers raises the question of 128-bit integer support,
    and using register pairs opens a whole new category of pair instructions.


    I generally went with register pairs...

    Where, say, for base types:
    8-bits: Rarely big enough
    16-bits: Sometimes big enough
    32-bits: Usually big enough
    64-bits: Almost always big enough

    Vector types:
    2x: Good
    4x: Better
    8x: Rarely Needed

    For a scalar type, the high 64 bits of a 128-bit register would be
    almost always wasted, so it isn't worthwhile to spend resources on
    things that are mostly just going to waste.



    At least with 64-bit registers, they cover:
    Integer values: Usually overkill
    'int' is far more common than 'long long'.
    Floating Point: Usually Optimal
    Binary64 is almost always good.
    Binary32 is frequently insufficient.
    2x Binary32 and 4x Binary16: OK

    Then, 128-bit as pairs:
    Deals with the occasional 128-bit vector and integer;
    Avoids wasting resources all the times we don't need it.

    Well, computation isn't exactly a gas that expands to efficiently
    utilize the register size (going bigger = diminishing returns).


    If the CPU is superscalar, can use 2x64b lanes for the 128-bit path, ...


    As for Binary128:
    Infrequently used;
    Too expensive for direct hardware support;
    So, I ended up adding trap-only support;
    Trap-only allows it to exist without also eating the FPGA.

    As for FP8:
    There are multiple formats in use:
    S.E3.M4: Bias=7 (Quats / Unit Vectors)
    S.E3.M4: Bias=8 (Audio)
    S.E4.M3: Bias=7 (NN's)
    E4.M4: Bias=7 (HDR images)

    Then, for 16-bit:
    S.E5.M10: Generic, Graphics Processing, Sometimes 3D Geometry
    Sometimes not enough dynamic range.
    S.E8.M7: NNs
    Usually not enough precision.

    It is likely that the optimal 16-bit format is actually S.E6.M9,
    but this is non-standard.
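
    For rough comparison (assuming an IEEE-style bias of 2^(E-1)-1
    and a reserved top exponent; exact limits vary by definition):

        S.E5.M10 : max ~ 65504,  ~3.3 decimal digits
        S.E6.M9  : max ~ 4.3e9,  ~3.0 decimal digits
        S.E8.M7  : max ~ 3.4e38, ~2.4 decimal digits

    So S.E6.M9 trades one significand bit for a much wider exponent
    range (max normal ~2^31 vs ~2^15).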


  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Oct 4 12:37:18 2025

    Stephen Fuld wrote:
    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    [...]

    I must be missing something.  Suppose I have

    C := A + B

    where A and C are 16-bit signed integers and B is an 8-bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high-order 56 bits zero. But the add instruction will
    presumably be half-word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?


    I am pretty sure A would be sign-extended to 64 bits on load, and the
    same for B, from 8->64 bits, at which point the addition works as it
    should? When storing a 64-bit result as a 16-bit signed integer, the
    CPU can verify that bits <63:15> are all equal (all 0 or all 1).
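
    Expressed in C, that store-side check is just a round-trip test
    (illustrative only):

        #include <stdint.h>

        int fits_int16(int64_t x)
        {
            return x == (int16_t)x;  /* true iff bits <63:15> all equal */
        }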
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 4 10:17:41 2025

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    LLVM compiles C with stricter typing than GCC, resulting in a lot
    of smashes. For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

    I tested this on AMD64 and did not find sign-extension in the callee,
    neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed to "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret

    It's not about strict or lax typing, it's about what the calling
    convention promises about types that are smaller than a machine word.
    If the calling convention requires/guarantees that ints are
    sign-extended, the compiler must use instructions that produce a
    sign-extended result. If the calling convention guarantees that ints
    are zero-extended (sounds perverse, but RV64 has the guarantee that
    unsigned is passed in sign-extended form, which is equally perverse),
    then the compiler must use instructions that produce a zero-extended
    result (e.g., AMD64's addl). If the calling convention only requires
    and guarantees the low-order 32 bits (I call this garbage-extended),
    then the compiler can use instructions that perform 64-bit adds; this
    is what we are seeing above.

    The other side of the coin is what is needed at the caller: if the
    caller needs to convert a sign-extended int into a long, it does not
    have to do anything. If it needs to convert a zero-extended or
    garbage-extended int into a long, it has to sign-extend the value.

    I have tested this with:

    int subroutine2(int,int);

    long subroutine3(int a,int b)
    {
    return subroutine2(a,b);
    }

    On AMD64 the result is:

    gcc-14:
    0000000000000010 <subroutine3>:
    10: 48 83 ec 08 sub $0x8,%rsp
    14: e8 00 00 00 00 call 19 <subroutine3+0x9>
    19: 48 83 c4 08 add $0x8,%rsp
    1d: 48 98 cltq
    1f: c3 ret

    clang-19:
    0000000000000010 <subroutine3>:
    10: 50 push %rax
    11: e8 00 00 00 00 call 16 <subroutine3+0x6>
    16: 48 98 cltq
    18: 59 pop %rcx
    19: c3 ret

    The compilers introduce the sign-extension CLTQ because the result of
    the call is not sign-extended. For parameter passing, it's the same:

    int subroutine4(long,long);

    long subroutine5(int a,int b)
    {
    return subroutine4(a,b);
    }

    0000000000000020 <subroutine5>:
    20: 48 83 ec 08 sub $0x8,%rsp
    24: 48 63 f6 movslq %esi,%rsi
    27: 48 63 ff movslq %edi,%rdi
    2a: e8 00 00 00 00 call 2f <subroutine5+0xf>
    2f: 48 83 c4 08 add $0x8,%rsp
    33: 48 98 cltq
    35: c3 ret
    0000000000000020 <subroutine5>:
    20: 50 push %rax
    21: 48 63 ff movslq %edi,%rdi
    24: 48 63 f6 movslq %esi,%rsi
    27: e8 00 00 00 00 call 2c <subroutine5+0xc>
    2c: 48 98 cltq
    2e: 59 pop %rcx
    2f: c3 ret

    BTW, in C as it was originally conceived, that was not an issue,
    because int occupied a complete register and all smaller types were
    converted to ints. The I32LP64 mistake has required inserting a lot
    of sign-extensions (and C compiler writers embrace undefined behaviour
    to avoid that in some cases).

    Another mistake we see in this example is the 16-byte alignment
    requirement of SSEx. It results in the RSP adjustments around the
    call. If only AMD had decided to support unaligned SSEx memory
    accesses by default in 64-bit mode.

    LLVM thinks the smash is required because [-2^31..+2^31-1] +
    [-2^31..+2^31-1] does not always fit into [-2^31..+2^31-1] !!!
    and chasing down all the cases is harder than the compiler is
    ready to do.

    In your example, there is nothing to chase down, because subroutine()
    can be called from anywhere.

    At first I thought that the value propagation in
    LLVM would find that the vast majority of arithmetic does not
    need smashing. This proved frustrating to both me and
    Brian. The more I read RISC-V and ARM assembly code, the more
    I realized that adding sized integer arithmetic is the only
    way to get through to the LLVM infrastructure.

    You might try changing the calling convention for int to
    garbage-extended. It can introduce sign or zero extension elsewhere,
    but maybe fewer than otherwise.

    RISC-V has ADDW (but no ADDH or ADDB) to alleviate the issue on
    a majority of calculations.

    That's an RV64 extension. RV32 does not have ADDW.

    ARM has word-sized registers to
    alleviate the issue; since ARM started as 32-bit, ADDW is natural.

    Not at all. ARM A64 is a completely new instruction set that has at
    least as much in common with PowerPC as with ARM A32 or ARM T32. I
    expect that they would not have added the 32-bit ADDW or the
    addressing modes with sign- or zero-extended 32-bit indexes if the
    MIPS and Alpha people had not made the I32LP64 mistake. Instead, they
    would have used the encoding space for more useful things.

    I am exploring how to provide integer arithmetic such that smashing
    never has to happen.

    If you want to avoid every use of a separate sign-extension or
    zero-extension instruction, add three bits to every source-register
    specifier: 2 bits for the input size (1,2,4,8 bytes), 1 for
    signed/unsigned. Once you have that, there is no need to extend the
    result: you can always perform the extension on input to the use of a
    result; the natural calling convention to go along with that is to
    garbage-extend.
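
    A sketch of what "garbage-extended" buys (plain C, illustrative;
    the function names are made up for the example):

        #include <stdint.h>

        /* An int argument arrives in a 64-bit register; bits 63..32 may
           be anything, and the callee only promises the low 32 bits. */
        int64_t add_garbage(int64_t a, int64_t b)
        {
            return a + b;            /* low 32 bits correct; rest garbage */
        }

        /* The extension is paid only where a wider use actually occurs: */
        int64_t widen(int64_t x)
        {
            return (int64_t)(int32_t)x;  /* explicit sign-extension here */
        }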

    I don't think that extension instructions are frequent enough to merit
    going to such lengths. I actually think that the RISC-V people made
    the wrong choice here, contrary to their usual stance. Instead of
    having sign-extension as a separate instruction (like zero-extension),
    they added it to a number of integer instructions, inflating the
    number of instructions for little benefit.

    So let's look at some egregious cases::

    cvtds r2,r2 // convert double to signed 64
    srl r3,r2,#0,#32 // convert signed 64 to signed 32

    unsigned?

    --------
    sra r1,r23,#0,#32 // smash to signed 32
    sra r2,r20,#0,#32 // smash to signed 32
    maxs r23,r2,r1 // max of signed 32

    With garbage-extension, you need a 32-bit maxs or sign-extend the
    operands. But you are sign-extended; why do you need it?

    Such things are not necessary with garbage-extension for add, sub,
    mul, and, or xor, i.e., the most common operations.

    --------
    ldd r24,[r24] // LD signed 64
    add r1,r28,#1 // innocently add #1
    sra r28,r1,#0,#32 // smash to Signed 32
    cmp r1,r28,r16 // to match the other operand of CMP

    Similar to the maxs case.

    --------
    call strspn
    srl r2,r1,#0,#32 // smash result Signed 32
    add r1,r25,-r1
    sra r1,r1,#0,#32 // smash Signed 32
    cmp r2,r19,r2
    srl r2,r2,#2,#1
    add r21,r21,r2 // add Bool to Signed 32
    sra r2,r20,#0,#32 // smash Signed 32
    maxs r20,r1,r2 // MAX Signed 32

    Maybe the right way here is to use size_t for the variable where you
    put the return value (strspn() returns a size_t).

    --------
    mov r1,r29 // Signed 64
    ple0 r17,FFFFFFF // ignore
    stw r17,[ip,key_rows] // ignore
    add r1,r29,#-1 // innocent subtract
    sra r1,r1,#0,#32 // smash to Signed 32
    divs r1,r1,r17 // DIV Signed 32

    Division is one of the operations where garbage-extended input is not
    ok; but fortunately it is rare.

    I doubt any compilers will use this feature.

    RISC-V and ARM LLVM compilers already do this and use it to eliminate
    smashes.

    Shortly after we got our first Alphas in 1995, I saw DEC's C compiler
    produce lots of explicit sign-extensions (using the addl instruction)
    of both int operands and int results. In later years they got the
    compiler to emit many fewer sign-extensions. I don't remember seeing
    that many sign extensions on Alpha from gcc, ever, so apparently they
    already kept track of the extension status of a value at the time.

    On the other hand:: ILP64 ALSO gets rid of the problem (at a different cost).

    Exactly. If the I32LP64 mistake had not been made, we would have been
    spared a lot (not just extension instructions). But for ARM A64 and
    RV64, they have to adapt to the world as it is, not as it should be,
    and unfortunately that means I32LP64. For MY66000, it's your call, of
    course.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 11:52:22 2025

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 4 16:11:37 2025

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

    * The Cray-1 is primarily a Fortran machine, its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.

    * Whatever inconvenience ILP64 would have caused to Fortran
    implementors is small compared to the cost in performance and
    reliability that I32LP64 has cost in the C world and the cost in
    encoding space (and thus code size) and implementation effort and
    transistors (probably not that many, but still) that it is costing
    all customers of 64-bit processors.

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.

    And unofficially C's integers were as long as pointers (with a legacy
    reaching back to BCPL). If I had to choose between breaking an
    unofficial FORTRAN-C interface tradition and a C-internal tradition, I
    would choose the C-internal tradition every time.

    There are two other languages that I have thought about:

    Java was introduced with fixed-size 32-bit int and 64-bit long, and
    with references typically having the size of a machine word. The
    choice of "int" and "long" may be due to I32LP64, and if the C people
    had gone for ILP64, the Java people might have chosen different names.
    But given their goal of write-once-run-everywhere with bit-identical
    results, they probably did not want to provide a machine-word-sized
    integer type. Java became popular when 32-bit machines were still a
    thing for running Java, so there would be lots of Java around that
    uses the 32-bit integer type. Given the large amount of Java code,
    that alone might be enough to make computer architects want to add
    special architectural support for signed 32-bit integers. At least we
    would have been spared architectural support for unsigned 32-bit
    integers.

    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64). Given that Rust
    was designed recently, that does not lead to portability problems yet:
    On servers, desktops (and recently smartphones) machine words are only
    64 bits, so if you write for that, you can just use i64 and u64, and
    your software will be efficient (or you can use smaller integers, and
    unless you store a lot of them, your software will be inefficient on
    various machines thanks to sign or zero extension). If you program on
    an embedded system, the code probably won't be ported to a machine
    with a different word size, so again, choosing the integer types that
    match the word size is a good choice. If there is ever a transition
    to 128-bit machines, I expect that the Rust approach will backfire,
    but who knows if Rust will still be in significant use by then. If it
    is, it may result in costs like I32LP64 is causing now.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sat Oct 4 20:44:37 2025
    From Newsgroup: comp.arch

    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:


    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64).

    Rust has machine-dependent isize and usize types, identical to ptrdiff_t
    and size_t in C.





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sat Oct 4 20:51:43 2025
    From Newsgroup: comp.arch

    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

* The Cray-1 is primarily a Fortran machine, and its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.


I would guess that Cray-1 FORTRAN was not 100% conformant to the FORTRAN 77 standard. And they likely didn't care.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Oct 4 18:01:59 2025
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    FORTRAN INTEGER == INT32_T

    allowing ILP64.

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Oct 4 18:05:18 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

* The Cray-1 is primarily a Fortran machine, and its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.

    * Whatever inconvenience ILP64 would have caused to Fortran
    implementors is small compared to the cost in performance and
    reliability that I32LP64 has cost in the C world and the cost in
    encoding space (and thus code size) and implementation effort and
    transistors (probably not that many, but still) that it is costing
    all customers of 64-bit processors.

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.

    And unofficially C's integers were as long as pointers (with a legacy reaching back to BCPL). If I had to choose between breaking an
    unofficial FORTRAN-C interface tradition and a C-internal tradition, I
    would choose the C-internal tradition every time.

    There is a quote from K&R C that states int is the most efficient
    form for computing integer arithmetic values.

    With the demand for int to remain 32-bits and the countering demand
    of LLVM to obey typing, int no longer obeys its original stated goal.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sat Oct 4 14:42:25 2025
    From Newsgroup: comp.arch

    Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:
    MitchAlsup wrote:
    My 66000 2.0

    After 4-odd years of ISA stability, I ran into a case where
    I <pretty much> needed to change the instruction formats.
    And after bragging to Quadribloc about its stability--it
    reached the point where it was time to switch to version 2.0.

    Well, its time to eat crow.
    --------------------------------------------------------------
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These supports both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory !
    Why? Compilers do not have any problem with this
    as its been handled by overload resolution since forever.

    A non-My66000 example:

int add (int a, int b)
{
    return a + b;
}

    is translated on powerpc64le-unknown-linux-gnu (with -O3 to)

    add 3,3,4
    extsw 3,3
    blr

extsw fills the 32 high-order bits with copies of the sign bit, because
numbers returned in registers have to be correct, either as 32- or
64-bit values.

    Ok I see what's going on - the reference to strong typing got me
    thinking this was about operand type matching.

    Above it is treating integer arguments and return types that are
    smaller than full register width, and presumably short and char also,
    as modulo (wrapping) data types and converting them to canonical
    form by sign or zero extension. That avoids later problems in compare operations where the low order bits match but high order bits differ.

A strongly typed language would have separate data types for signed
and unsigned linear integers, and for signed and unsigned modulo
integers. The sign/zero extension for modulo result types would mask
any overflow and prevent proper result overflow checking.
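(A small C sketch of that masking effect; __builtin_add_overflow is the
GCC/Clang builtin, used here only to stand in for a "linear" type's
checked add:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t a = 250, b = 10, r;

        r = a + b;            /* modulo semantics: canonicalized back to
                                 8 bits, silently wraps to 4 */
        printf("%u\n", r);    /* prints 4, no trace of the overflow */

        if (__builtin_add_overflow(a, b, &r))   /* "linear" check */
            printf("overflow\n");               /* this does fire */
        return 0;
    }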


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 18:55:05 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

* The Cray-1 is primarily a Fortran machine, and its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.

    As you may know, the Cray-1 was a very special machine, which got
away with a lot of idiosyncrasies because it was blindingly fast
    (and caused users a lot of trouble with conversion between DOUBLE
    PRECISION and REAL).

But that was in the late 1970s. By the time the 64-bit workstations
    were being designed, REAL was firmly established as 32-bit and
    DOUBLE PRECISION as 64-bit, from the /360, the PDP-11, the VAX
    and the very 32-bit workstations that the 64-bit workstations were
    supposed to replace.


    * Whatever inconvenience ILP64 would have caused to Fortran
    implementors is small compared to the cost in performance and
    reliability that I32LP64 has cost in the C world and the cost in
    encoding space (and thus code size) and implementation effort and
    transistors (probably not that many, but still) that it is costing
    all customers of 64-bit processors.

    A 64-bit REAL and (consequently) a 128-bit DOUBLE PRECISION
would have made the 64-bit workstations pretty much unusable for
    scientific use, and a lot of these were aimed at the technical
    and scientific market, and that meant FORTRAN.

So, put yourself into the shoes of the people designing R4000-class
workstations: they could allow their scientific and technical customers
to use the same codes "as is", with no conversion, or tell them
    they cannot use 32-bit REAL any more, and that they need to rewrite
    all their software.

    What would they have expected their customers to do? Buy a system
    which forces them to do this, or buy a competitor's system where
    they can just recompile their software?

You're always harping about how compilers should be bug-compatible
with previous releases. Well, that would have been the mother of
all incompatibilities, aka business suicide.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 16:04:54 2025
    From Newsgroup: comp.arch

    On 10/4/2025 12:44 PM, Michael S wrote:
    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:


    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64).

    Rust has machine-dependent isize and usize types, identical to ptrdiff_t
    and size_t in C.


I guess, if starting from a clean slate (in a from-scratch language),
it might make sense to have:
    A range of defined fixed sizes;
    A range of types whose size is a product of various machine constraints.


    So, say:
    u8/u16/u32/u64/u128 //Unsigned, fixed size, default endian
    s8/s16/s32/s64/s128 //Signed, fixed size, default endian
    u8l/u16l/u32l/u64l/u128l //Unsigned, fixed size, little endian
    s8l/s16l/s32l/s64l/s128l //Signed, fixed size, little endian
    u8b/u16b/u32b/u64b/u128b //Unsigned, fixed size, big endian
    s8b/s16b/s32b/s64b/s128b //Signed, fixed size, big endian
    u8l/s8l/u8b/s8b: Technically redundant with u8/s8, but added for
    consistency.


i8/i16/i32/i64/i128 could also make sense.
Could also have sbit(N) and ubit(N), which specify exact-width types
but otherwise behave like the normal integer types. The power-of-2
sizes could be seen as mostly equivalent to the fixed-size types.
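(For comparison, C23's _BitInt(N) is the closest existing C analogue of
the sbit(N)/ubit(N) idea; a sketch, assuming a C23 compiler:)

    /* bit-precise integer type: exactly 12 value bits, like ubit(12) */
    unsigned _BitInt(12) wrap12(unsigned _BitInt(12) x)
    {
        return x + 1;   /* result converts back to 12 bits, modulo 2^12 */
    }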


    Floating point types:
    f16/f32/f64/f128
    f8/f8a/f8u/...: Assortment of 8-bit types.
    Since no one-size-fits-all with FP8.
    (Maybe also with f*l and f*b variants?).

    Machine constraint-sized types:
    sasize/uasize: Size for arrays and similar
    spsize/upsize: Size for pointers and pointer differences
    sfsize/ufsize: Size for file offsets
    int: default 'fast' size (32 or 64 bits)
    long: default 'large but fast' size (64 or 128 bits)
    Would be 64 if machine only has 64 bit ALU operations;
    Would be 128 if machine has a 128-bit ALU available.
    intmul: Whichever size allows the fastest integer MUL or MAC.
    More likely to be 16 or 32 bits.
    ...

    Special types:
    void: No Type, pointers may freely convert to other types
    m8: Like void, but with a defined size, but no operators.
    m8 could be assumed the default type for raw memory buffers.
    m8 pointers may be freely cast to/from other pointer types.
    m16/m32/m64/m128: Has size but no defined operators.
    Casts involving these types will be bit-preserving.
    Size-mismatched casts will not be allowed.

May use slightly different type promotion rules from C, for integer types:
  Td = Ts OP Tt
    If the range of Td is greater than or equal to (Ts OP Tt):
      promote to the wider of the two;
      (Ts OP Tt) promotes by default to the wider of Ts or Tt;
      if a signed/unsigned mismatch of same size, or a smaller signed type,
        promote to the next larger signed type
        (note: NOT the "same sized unsigned" as C would use).
    If the range of Td is less than (Ts OP Tt):
      If the result will be the same either way:
        promote to the most efficient type to carry out the operation,
        or use Td if doing so is efficient;
        narrow the result if needed
        (Td narrower than the intermediate type).
      Else, promote to the type of (Ts OP Tt), and narrow the result.

    In this case, the types may flow-out from the inputs and operators, but
    also flow-in from the destination type. Usually C lacks the flowing-in
    part, but it is relevant for efficient code generation.

    Note that the inward flow may happen recursively, where if Td promotion
    is used for an outward expression, the two sub-expressions may be
    re-evaluated in light of 'Td' as the destination type (vs merely the
    result of the input expressions).

Unlike C, it would still apply the same promotion behavior to 8 and
16 bit types as for wider types (so, there is no implicit "first
auto-promote everything to int" rule). Though, it can generally still
use a wider ALU so long as the result value will retain the expected
sign or zero extension.


This would differ from C's behavior in the case of widening expressions,
in that operating on narrower types and storing the result as a wider
type will promote first (so no overflow happens), rather than, as in C,
where an overflow may happen at the narrower type and the result is
promoted after the fact.
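(A minimal C illustration of that difference; the promote-first result
is what the proposed rules would produce automatically:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t a = 2000000000, b = 2000000000;

        int64_t c = a + b;           /* C: the add happens at 32 bits and
                                        overflows (undefined behavior),
                                        then the result is widened */
        int64_t d = (int64_t)a + b;  /* promote first: exactly 4000000000 */

        printf("%lld %lld\n", (long long)c, (long long)d);
        return 0;
    }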

    This would have fewer "gotchas" on average than the C approach, but C's
    rules need to be maintained for C code, as some code will break if the original integer overflow behavior is not preserved. But, the existing
    rules are not entirely consistent.

    Can make the working assumption that widening is cheap but narrowing has
    a non-zero cost (though, this is the reverse from the normal RV ABI,
    where on RV64G the ABI would normally have people pay the cost at
    "unsigned int"->"long" promotion).

    In the abstract model, all narrower signed or unsigned types are sign or
    zero extended to the maximum widest type in play; we can also assume
    twos complement as the working model; ...



    The big and little endian types would mostly apply to structures and
pointers. They would only affect local variables if the address of the
    local variable is taken (else the machine default is used; or "all
    choices being equal" assume little endian).

    By default, assume native alignment of a type unless a packed modifier
is used (with packed applied either per variable or for the structure as
    a whole). If no packed is used, the alignment of a struct will be the
    widest member in the struct. If used on a struct, the whole struct will
    assume byte alignment. Else, the alignment will be the largest alignment
    seen within the struct (or the largest non-packed member). Could maybe
    have an 'align_as()' modifier (to specify to use the same alignment as
    another type) with the packed case being equal to byte alignment.

    Possible:
    Allow 'if()' in structs, but would be evaluated as a compile-time
    constant (so in this sense, functions more like an ifdef, just evaluated
    later in the process).

    Might also allow VLA-like patterns if the expression is a compile time constant. Could allow a VLA as the final member of a struct, which will
    be understood the same as a zero-element array. Will have the side
    effect that the size of the struct is unknown, and it may not be used in arrays nor as the non-final member of a parent struct (and if present,
    will apply the same property on the parent struct).
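(This is essentially C99's flexible array member; a minimal sketch, with
hypothetical names blob/blob_new:)

    #include <stdlib.h>

    struct blob {
        size_t len;
        unsigned char data[];    /* C99 flexible array member */
    };

    struct blob *blob_new(size_t n)
    {
        struct blob *b = malloc(sizeof *b + n);  /* size not known statically */
        if (b) b->len = n;
        return b;
    }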


    Note that structs may be classified as serializable or non-serializable. Serializable structs will need a fixed and unambiguous size;
    They will explicitly disallow pointers, references, or any other types
    that can't be serialized.

    Serializable structs would be assumed to be able to be safely read from
    or written to a file or socket, ...


    Might make sense, in such a language, to have an object model similar to C#: Structs exist, by-value by default;
    Classes always by-reference, with a single inheritance and interfaces model; Maybe for nicety, assume that interfaces can be mapped to COM-like
    objects (should map the underlying COM layout);
    ...

    Could also assume similar scoping rules to C#, with full scope known at
    the time an EXE or DLL is compiled (any undefined types or variables at
    this stage being a compiler error). The front-end parser and compiler
    would be required to still work even without a full knowledge of the type-system (WRT class-like types), but may enforce stricter constraints
    on normal value types. Though, if doing separate compilation, this only
    allows partial compilation of some features (the object system will need
    to be sorted out at link time).

    Would not have C++ style templates, but could still have generics.


    But:
    No garbage collector;
    Objects may have an explicit automatic lifetime.

    Say:
    Foo! foo();
    Does not mean that it is necessarily stack-allocated or by-value (unlike
    C++), but will mean that 'foo' will be auto-deleted when foo goes out of scope.

    Similar could also be applied to class members, so a T! member is
    auto-deleted when the parent goes out of scope. Could maybe also
    consider "T^" for cases where the member is to use reference counting
(though it could also make sense on the class definition).

    so, some modifiers could be applied one of several places:
    Class definition: Default behavior to be used, may be overridden.
    Variable: Used in this context, may override class.
    "new()": Used at object creation for dynamically created objects.

    With possible syntax:
    T //base type, default behavior, global lifetime for objects.
    T* //pointer, structs, N/A for class objects
    T! //automatic / parent-scope lifetime
    T^ //reference counted
    T(Z) //zone lifetime

    Typically the stronger rule may be used, with it being a compiler error
    if a variable or member doesn't match the lifetime specified elsewhere
    (though with fudging for "T!" as it would apply to the point of creation and/or place-of-residence of the object in question). As such, it is
    likely that "T!" class members would primarily be initialized in
    constructors (but may be treated as 'final' outside of a constructor for
    the class in question).

    zones will be compile-time entities. It could be treated as an error for
    an object in a longer-lived zone to have a reference with a
    shorter-lived zone. Though, unclear how to enforce this at compile time.
    Zone lifetime would depend on program control flow rather than known at compile time. Though, a zone-tree could be defined at compile time, and
    the compiler or runtime could error-out or fault if it detects zone
    creation or destruction which deviates from the specified dependency order.

    zonedef Z; //define a zone Z, parent of Z is global
    zonedef Z(Zp); //define zone Z whose lifetime exists within Zp.
    If Z is live and Zp is destroyed, throw.
    If Z is created and Zp is not live, throw
    If an object in Z is created, and Z is not live, throw.
    ...


    In most cases, 'delete' could be discouraged, as the only time delete is likely to be needed is if lifetime is poorly specified in some other
    way. But, we don't need generalized garbage collection, as pretty much
    no one has really made this work acceptably.

    Reference counting may leak memory, though one possibility could be to
    try to detect and flag cycle-formation when creating object graphs, with
    an explicit "weak object reference" being created in cases where cycle-creation is detected (in this case, the reference count is
    special). If the reference count for non-weak references drops to 0, it destroys the object. Downside: This puts some of the computational cost
    of a mark/sweep collector into the code for incrementing and
    decrementing reference counts.

    Though possible is allowing both reference-counting and zones on the
    same object, in which case the zone may clean up leaks from the reference-counter (assuming periodic zone destruction).


    ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 17:28:09 2025
    From Newsgroup: comp.arch

    On 10/4/2025 4:56 AM, BGB wrote:
    On 10/3/2025 4:04 PM, Thomas Koenig wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    --------------------------------------------------------------
Integer instructions are now::
{Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
          {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strong typed languages don't have predefined operators that allow
    mixing.

Not sure who's confused, but my reading of the above is not some sort of "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
instruction, so it does not support adding an int8 to an int32, or other "mixes".

    The outputs are correctly extended to a 64-bit number (signed or
    unsigned) so it is possible to pass results to wider operations
    without conversion.

    One example would be

    unsigned long foo (unsigned int a, unsigned int b)
    {
       return a + b;
    }

which would need an adjustment after the add, and which would
just be something like

        adduw    r1,r1,r2
        ret

    using Mitch's new encoding.



    Yes.

    Sign extend signed types, zero extend unsigned types.
    Up-conversion is free.


    This is something the RISC-V people got wrong IMO, and adding a bunch of ".UW" instructions in an attempt to patch over it is just kinda ugly.

Partly for my own uses, I revived ADDWU and SUBWU (which had been
dropped in BitManip), because these are less bad than the alternative.

I get annoyed that new extensions keep trying to add ever more ".UW"
instructions rather than just having the compiler go over to
zero-extended unsigned and make this whole mess go away.

    ...



    Ironically, the number of new instructions being added to my own ISA has mostly died off recently, largely because there is little particularly relevant to add at this point (within the realm of stuff that could be added).


    Going and looking back, most major new instructions added were:
    BITMOV and BITMOV.S, ~ 7 months ago
    Some new ops related to FP8A handling and similar, ~ 2 months ago
    Mostly for Bias=7 (where, FP8A=S.E3.M4, or A-Law format)
    I couldn't just change the Bias=8 ops to 7 without breaking stuff;
    But, for non-audio uses 7 is a lot more useful.
    Mostly used for unit vectors,
    where ability to store values >= 1.0 sometimes needed.
    But, most values still < 1.0 ...
    Sorta relates to Trellis re-normalization trickery.
    Stored vector isn't exactly unit-length, but unit post-renorm.

    A few operations in the "possible" category:
    A few NN related packed multiply instructions;
    Instructions for a possible UVF1 packed block format
    (graphics and NN);
    ...

    FPU Compare 3R instructions, ~8 months ago


While XG3 was added 11 months ago, it isn't really new instructions so
much as a new mode and encoding scheme for the same instructions (and it
    was only fairly recently that I got support for predicated instructions implemented in RISC-V).

    And, 12 months ago, a RISC-V target for BGBCC, and jumbo prefixes for
    the RISC-V side, ... Somehow I thought all of this happened several
    years ago, seems it was 1 year.


    Seems initial efforts to start adding RISC-V support were (only) 2 years
    ago.

    A lot more fiddling has been in things mostly related to dealing with
    RISC-V and trying to make it less terrible.


The recent FPU changes are more about tweaking FPU behavior, and
haven't really involved adding new instructions (except on the RISC-V
side, ones which already existed in the RISC-V specs).


    Hmm...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 5 11:58:14 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> writes:
    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:


    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64).

    Rust has machine-dependent isize and usize types

Good. But for some reason all the examples I have seen use
integer types like i32 and u64.

    identical to ptrdiff_t and size_t in C.

I have read that there are C implementations (variants) where ptrdiff_t
    and size_t are smaller than a pointer, in particular large-model C on
    the 8086, and that was the reason for C standard restrictions about
    pointer subtraction and pointer inequality comparison.

    I hope nobody is doing large-model Rust, even though Rust may be more appropriate for that than C.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 5 15:01:06 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?
    ...
    By the time the 64-bit worksations
    were being designed, REAL was firmly established as 32-bit and
    DOUBLE PRECISION as 64-bit, from the /360, the PDP-11, the VAX
    and the very 32-bit workstations that the 64-bit workstations were
    supposed to replace.

    On the PDP-11 C's int is 16 bits. I don't know what FORTRAN's INTEGER
    is on the PDP-11 (but I remember reading about INTEGER*2 and
    INTEGER*4, AFAIK not in a PDP-11 context). In any case, I expect that FORTRAN's REAL was 32-bit on a PDP-11, and that any rule chain that
    requires that C's int is as wide as FORTRAN's REAL is broken at some
    point on the PDP-11.

So your rules do not even work for the first machine where C was
implemented. If shortsighted FORTRAN people look at 32-bit machines
and become accommodated to C's int being as wide as FORTRAN's INTEGER
    and REAL, they could have known from the PDP-11 that that's going to
    break for other machine word sizes.

So, put yourself into the shoes of the people designing R4000-class
workstations: they could allow their scientific and technical customers
    to use the same codes "as is", with no conversion, or tell them
    they cannot use 32-bit REAL any more, and that they need to rewrite
    all their software.

    If they want to use their software as-is, and it is written to work
    with an ILP32 C implementation, the only solution is to continue using
    an ILP32 implementation. That's not only for FORTRAN/C mixing, but
    for most C code of the day, certainly with I32LP64; I expect that the
    porting effort would have been smaller with ILP64, but there still
    would have been some.

    BTW, we have a DecStation 5000/150 with an R4000, and all C compilers
    on this machine support ILP32 and nothing else.

    What would they have expected their customers to do? Buy a system
    which forces them to do this, or buy a competitor's system where
    they can just recompile their software?

    If just recompiling is the requirement, what follows is ILP32.

You're always harping about how compilers should be bug-compatible
    to previous releases.

    Not in the least. I did not ask for bug compatibility.

    I also did not ask for "compiling as is" on a different architecture,
    much less on a system with different address size.

    I have actually written up what I ask for: <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>. Maybe you
    should read it one day, or reread it given that you have forgotten it.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Oct 5 18:19:47 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    <snip>

    Not in the least. I did not ask for bug compatibility.

    I also did not ask for "compiling as is" on a different architecture,
    much less on a system with different address size.

    I have actually written up what I ask for: <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>. Maybe you
    should read it one day, or reread it given that you have forgotten it.

    In the referenced article you write::
    "Access to uninitialized data is another issue where absolute equivalence
    with the basic model would make important optimizations impossible. Consider
    a variable v at the end of its life (e.g., at the end of a function). Unless the compiler can prove that the location of the variable is not read later
    as a result of reading uninitialized data (say, reading the uninitialized variable w living in the same location in a different function), v would
    have to stay in the same location in future compiler versions or other optimization levels; or at least the final value of v would have to be
    stored in this location, and the initial value of w would have to be
    fetched from this location."

    If variable v and variable w are "stack variables" local to their own subroutines, it seems perfectly reasonable to assume that all deallocated
    stack variables become inaccessible. Then, later when new stack space is allocated those new variables have no relationship to any previously deallocated variables.

    That is: when the stack pointer is incremented the space is no longer accessible and::
    a) any modified cache lines are discarded instead of being written
    to memory--the space is no longer accessible so don't waste power
    making DRAM coherent with inaccessible stack space.

    Later, when the stack pointer is decremented::
    b) new cache line area can be "allocated" without reading DRAM and
    being <conceptually> initialized to zero.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sun Oct 5 19:30:42 2025
    From Newsgroup: comp.arch

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    On the PDP-11 C's int is 16 bits. I don't know what FORTRAN's INTEGER
    is on the PDP-11 (but I remember reading about INTEGER*2 and
INTEGER*4, AFAIK not in a PDP-11 context). In any case, I expect that FORTRAN's REAL was 32-bit on a PDP-11, and that any rule chain that
    requires that C's int is as wide as FORTRAN's REAL is broken at some
    point on the PDP-11.

    I wrote INFort, one of the two F77 implementations for the PDP-11.
    INTEGER and REAL were the same size because that's what the standard
    said, and any program that used EQUIVALENCE would break otherwise. If
    you wanted shorter ints, INTEGER*2 provided them.

    Bell Labs independently wrote f77 around the same time, and its manual says they did the same thing, INTEGER was C long int, INTEGER*2 was short int.

    If the speed difference mattered, it wasn't hard to say something like

    IMPLICIT INTEGER*2(I-N)

    to make your ints short.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 5 19:51:26 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?
    ...
    By the time the 64-bit worksations
    were being designed, REAL was firmly established as 32-bit and
    DOUBLE PRECISION as 64-bit, from the /360, the PDP-11, the VAX
    and the very 32-bit workstations that the 64-bit workstations were
    supposed to replace.

    On the PDP-11 C's int is 16 bits. I don't know what FORTRAN's INTEGER
    is on the PDP-11 (but I remember reading about INTEGER*2 and
    INTEGER*4, AFAIK not in a PDP-11 context). In any case, I expect that FORTRAN's REAL was 32-bit on a PDP-11, and that any rule chain that
    requires that C's int is as wide as FORTRAN's REAL is broken at some
    point on the PDP-11.

It is possible to have a two-byte integer and a 32-bit real.
    Storage association then requires four bytes for an integer.
    This wastes space for integers (at least for arrays) but that
    is not such a big deal, because most big arrays in scientific
    code are reals.

The same held for the Cray-1 - default integers (24 bit)
and its weird 64-bit reals.

The main problem is when the size of the default INTEGER _exceeds_
that of the smallest useful REAL: then REAL arrays become twice as big,
plus you need to implement 128-bit REALs.

So, put yourself into the shoes of the people designing R4000-class
workstations: they could allow their scientific and technical customers
    to use the same codes "as is", with no conversion, or tell them
    they cannot use 32-bit REAL any more, and that they need to rewrite
    all their software.

    If they want to use their software as-is, and it is written to work
    with an ILP32 C implementation, the only solution is to continue using
    an ILP32 implementation.

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.


    What would they have expected their customers to do? Buy a system
    which forces them to do this, or buy a competitor's system where
    they can just recompile their software?

    If just recompiling is the requirement, what follows is ILP32.

    There is absolutely no problem with 64-bit pointers when recompiling
    Fortran.


You're always harping about how compilers should be bug-compatible
    to previous releases.

    Not in the least. I did not ask for bug compatibility.

    I'll keep that in mind for the next time.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon Oct 6 05:56:53 2025
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
If variable v and variable w are "stack variables" local to their own subroutines, it seems perfectly reasonable to assume that all deallocated stack variables become inaccessible.

    That is debatable. This assumption is the basis of "optimizing" away
    memset() (or similar) that is intended to keep the lifetime of secret
    keys as short as possible. After this "optimization", the secret key
    continues to be in memory, and can be extracted through
    vulnerabilities, preserved for much longer in the swap area or in
    snapshots, or in the value of newly allocated uninitialized areas.
    All of which prove that the assumption is wrong.
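(The standard illustration of that failure mode, as a hedged C sketch;
use_key is a hypothetical helper, and the final memset is a dead store
the compiler may delete under the as-if rule:)

    #include <string.h>

    void use_key(const unsigned char *key, size_t n);  /* hypothetical */

    void handle_secret(void)
    {
        unsigned char key[32];
        /* ... obtain and use the key ... */
        use_key(key, sizeof key);
        memset(key, 0, sizeof key);  /* dead store: may be optimized away */
    }

(C11's optional memset_s and the BSD/glibc explicit_bzero exist
precisely to defeat this elimination.)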

    Then, later when new stack space is
allocated those new variables have no relationship to any previously deallocated variables.

That is: when the stack pointer is incremented the space is no longer accessible and::
    a) any modified cache lines are discarded instead of being written
    to memory--the space is no longer accessible so don't waste power
    making DRAM coherent with inaccessible stack space.

    Later, when the stack pointer is decremented::
    b) new cache line area can be "allocated" without reading DRAM and
    being <conceptually> initialized to zero.

    I have outlined ways to optimize zeroing of memory in <2014Jul9.193122@mips.complang.tuwien.ac.at> <2022Aug5.141325@mips.complang.tuwien.ac.at>

    With that idea, the way to use it is to zero the memory when it is
    deallocated (so it is not written back to main memory; it may be
    written to the zero area as part of a larger unit). And to also zero
    it when it is allocated so that there is no need to load the data from
    outer cache levels or main memory (or their equivalents in zeroed
    memory).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon Oct 6 06:26:12 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    [...]
It is possible to have a two-byte integer and a 32-bit real.

    But according to John Levine that is not what happens on the PDP-11.
    Instead, it has 4-byte INTEGERs, demonstrating that your "unofficial
    rule" that C int is as wide as FORTRAN INTEGER did not hold.

The same held for the Cray-1 - default integers (24 bit)
    and their weird 64-bit reals

    If FORTRAN INTEGERs are 24 bits on the Cray-1, this architecture is
    another example where your "unofficial rule" does not hold. C ints
    are 64-bit on the Cray 1.

    If they want to use their software as-is, and it is written to work
    with an ILP32 C implementation, the only solution is to continue using
    an ILP32 implementation.

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either. And I am sure that C
    programs were much more relevant for selling these machines than
    FORTRAN programs. C programmers changed the programs to run on
    I32LP64 (this was called "making them 64-bit-clean"). And until that
    was done, ILP32 was used.

    If just recompiling is the requirement, what follows is ILP32.

    There is absolutely no problem with 64-bit pointers when recompiling
    Fortran.

    Fortran is not the only consideration for designing an ABI for C, if
it is one at all. The large number of 32bit->64bit sign-extension and
zero-extension operations, either explicit or integrated into
instructions such as RISC-V's addw, plus the
"optimizations"/miscompilations to get rid of some of the sign
extensions, are a cost that we pay all the time for the I32LP64
    mistake.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Mon Oct 6 14:23:50 2025
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    [...]
    <snip>

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64.

    The vast majority of C/C++ programs ran just fine on I32LP64. There
    were some that didn't, but it was certainly not "most".
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Mon Oct 6 11:51:18 2025
    From Newsgroup: comp.arch

    On 10/6/2025 9:23 AM, Scott Lurndal wrote:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    [...]
    <snip>

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64.

    The vast majority of C/C++ programs ran just fine on I32LP64. There
    were some that didn't, but it was certainly not "most".

    Yes, most programs only needed minor edits.


    Some stuff I had ported:
    Doom: Mostly trivial edits;
    Had to re-implement audio and music handling.
    Heretic and Hexen:
    More edits, mostly removing MS-DOS stuff;
    Had to replace most of the audio and music code.
    ROTT:
    Extensive modification to graphics handling;
    Was very dependent on low-level VGA hardware twiddling.
    (Vs Doom's "Set 320x200 and done" approach).
    Lots of memory management and out-of-bounds issues;
    Some amount of code that is sensitive to integer wrap-on-overflow;
    ...
    (ROTT was a little harder to port)
    Quake:
    Few issues for most of the engine;
    The "progs.dat" VM required getting creative.
    It mixes pointers and 'float' in ways
    "some might consider unnatural"
    Quake 2:
    Basically 64-bit clean out of the box.
    Quake 3:
    The QVM architecture very much assumes 32-bit,
    not really a way to make it 64-bit absent a significant rewrite.
    Did allow for falling back to the Quake2 strategy,
    of using natively compiled DLLs.


    Of the programs, I still have not fully debugged ROTT when built via
    BGBCC, where there is an issue somewhere that is resulting in demo
    desyncs that tend to change from one run to another.

    Last I checked, I had it stable when built with MSVC, and had it
    basically working with a GCC build.


    Can note that ROTT is one of the larger programs I had ported to my
    project (in terms of code size), where both the ROTT and Quake3 ports
    weigh in at a little over 300 kLOC (very much larger than Doom or Quake).

    Quake 3 builds as multiple DLLs, whereas ROTT as a single binary. As
    such, ROTT currently builds the biggest EXE (with around 1MB of ".text").

Though, curiously, there is (on average) less than 4 bytes per line of
C; not entirely sure how that happens.

    ...

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Oct 6 17:38:13 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either.

    Only those who assumed sizeof(int) = sizeof(char *). This was
    not true on the PDP-11, and it was a standards violation, anyway.
Only people who liked to play these kinds of games (I know you do)
    were caught.

    And I am sure that C
    programs were much more relevant for selling these machines than
    FORTRAN programs.

    Based on what data? Your own personal guess?

    C programmers changed the programs to run on
    I32LP64 (this was called "making them 64-bit-clean"). And until that
    was done, ILP32 was used.

    The problem with 64-bit INTEGERs for Fortran is that they make REAL
    unusable for lots of existing code.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Mon Oct 6 20:02:50 2025
    From Newsgroup: comp.arch

    According to Thomas Koenig <tkoenig@netcologne.de>:
    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either.

    Only those who assumed sizeof(int) = sizeof(char *). This was
    not true on the PDP-11, ...

    The PDP-11 was a 16 bit machine with 16 bit ints and 16 bit pointers.
    There were 32 bit long and float, and 64 bit double.

    I didn't port a lot of code from the 11 to other machines, but my recollection is that the widespread assumption in Berkeley Vax code that location zero was addressable and contained binary zeros was much more painful to fix than
    size issues.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Mon Oct 6 20:46:11 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> writes:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either.

    Only those who assumed sizeof(int) = sizeof(char *). This was
    not true on the PDP-11, ...

    The PDP-11 was a 16 bit machine with 16 bit ints and 16 bit pointers.
    There were 32 bit long and float, and 64 bit double.

I didn't port a lot of code from the 11 to other machines, but my
recollection is that the widespread assumption in Berkeley Vax code
that location zero was addressable and contained binary zeros was much
more painful to fix than size issues.

    "location zero was addressible". Might also point out it was RO, but yes
    that caused many problems porting BSD utilities to SVR4.

The other issue with leaving the PDP-11 for 32-bit systems was the change
in the size of the PID, UID, and GID, which required more than a simple
recompile: since there weren't abstract types (e.g. pid_t, gid_t, uid_t)
for those data items yet, code needed to be updated manually.
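(For comparison, the abstract types that arrived later make such a
widening a mere recompile; a minimal sketch:)

    #include <sys/types.h>
    #include <unistd.h>

    pid_t child;     /* whatever width the platform uses for PIDs */

    void spawn(void)
    {
        child = fork();   /* no assumption that a PID fits in 16 bits */
    }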
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Tue Oct 7 01:38:02 2025
    From Newsgroup: comp.arch

    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    LLVM compiles C with stricter typing than GCC resulting in a lot
    of smashes:: For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret

    It's not about strict or lax typing, it's about what the calling
    convention promises about types that are smaller than a machine word.
    If the calling convention requires/guarantees that ints are
sign-extended, the compiler must use instructions that produce a sign-extended result. If the calling convention guarantees that ints
    are zero-extended (sounds perverse, but RV64 has the guarantee that
    unsigned is passed in sign-extended form, which is equally perverse),
    then the compiler must use instructions that produce a zero-extended
    result (e.g., AMD64's addl). If the calling convention only requires
    and guarantees the low-order 32 bits (I call this garbage-extended),
    then the compiler can use instructions that perform 64-bit adds; this
    is what we are seeing above.

    The other side of the medal is what is needed at the caller: If the
caller needs to convert a sign-extended int into a long, it does not
have to do anything. If it needs to convert a zero-extended or
garbage-extended int into a long, it has to sign-extend the value.

    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
    So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes. And to ignore the upper 32 bits on reads, so
code that wants only the 32-bit value should use the %exx name.
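(In C terms, and hedged as an expectation of typical codegen rather
than a guarantee: a u32-to-u64 conversion is therefore free, because
the 32-bit write already zeroed the high half:)

    #include <stdint.h>

    uint64_t widen(uint32_t x)
    {
        return x;   /* on AMD64 this is typically a single 32-bit mov;
                       the hardware already zeroed bits 63:32 */
    }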

    I agree with you that I32LP64 was a mistake, but it exists, and I
    think ARM64 did a good job handling it. It has all integer operations
    working on two sizes: 32-bit and 64-bit, and when writing a 32-bit result,
    it 0-extends the register value.

    You don't want "garbage extend" since you want a predictable answer.
    Your choices for writing 32-bit results in a 64-bit register are thus sign-extend (not a good choice) or zero-extend (what almost
    everyone chose). RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    For C and C++ code, the standard dictates that all integer operations are
    done with "int" precision, unless some operand is larger than int, and then
do it in that precision. So there's no real need for 8-bit and 16-bit operations to be natively supported by the CPU--these operations are actually done
    as int's already. If you have a variable which is a byte, then assigning
    to that variable, and then using that variable again you will need to zero-extend, but honestly, this is not usually a performance path. It's
    likely to be stored to memory instead, so no masking or sign extending
    should be needed.
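(A small C sketch of those promotion rules; the comments describe what
the abstract machine requires, not any particular codegen:)

    #include <stdint.h>

    uint8_t f(uint8_t a, uint8_t b)
    {
        uint8_t t = a + b;   /* the add itself is done at int width ... */
        return t / 3;        /* ... but storing into t truncates to 8 bits,
                                so reusing t needs the zero-extended value */
    }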

    If you pick ILP64 for your ABI, then you will get rid of almost all of
    these zero- and sign-extensions of 32-bit C and C++ code. It will just
    work. If you pick I32LP64, then you should have a full suite of 32-bit operations and 64-bit operations, at least for all add, subtract, and
    compare operations. And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue Oct 7 15:52:17 2025
    From Newsgroup: comp.arch


    kegs@provalid.com (Kent Dickey) posted:

    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    LLVM compiles C with stricter typing than GCC resulting in a lot
    of smashes:: For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret

    It's not about strict or lax typing, it's about what the calling
    convention promises about types that are smaller than a machine word.
    If the calling convention requires/guarantees that ints are
sign-extended, the compiler must use instructions that produce a
sign-extended result. If the calling convention guarantees that ints
are zero-extended (sounds perverse, but RV64 has the guarantee that
unsigned is passed in sign-extended form, which is equally perverse),
    then the compiler must use instructions that produce a zero-extended
    result (e.g., AMD64's addl). If the calling convention only requires
    and guarantees the low-order 32 bits (I call this garbage-extended),
    then the compiler can use instructions that perform 64-bit adds; this
    is what we are seeing above.

    The other side of the medal is what is needed at the caller: If the
caller needs to convert a sign-extended int into a long, it does not
have to do anything. If it needs to convert a zero-extended or
garbage-extended int into a long, it has to sign-extend the value.

    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes, and to ignore the upper 32 bits on reads; so
code using a 32-bit value held in a 64-bit register should use the
%exx name.

    I agree with you that I32LP64 was a mistake, but it exists, and I
    think ARM64 did a good job handling it. It has all integer operations working on two sizes: 32-bit and 64-bit, and when writing a 32-bit result,
    it 0-extends the register value.

    You don't want "garbage extend" since you want a predictable answer.

    Strongly Agree.

    Your choices for writing 32-bit results in a 64-bit register are thus sign-extend (not a good choice) or zero-extend (what almost
    everyone chose). RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    Why not zero extend unSigned and sign extend Signed ?!?
    That way the value in the register is (IS) the value in the smaller
    container !!

    Also, why not extend this to both shorts and chars ?!?
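
As a C model of that write-back rule (a sketch only, not My 66000
encoding; the cast back to 32 bits stands in for the hardware
re-extension):

#include <stdint.h>

/* signed 32-bit add: result sign-extended from bit 31 */
int64_t addw_s(int64_t a, int64_t b)
{
    return (int32_t)(a + b);
}

/* unsigned 32-bit add: result zero-extended from bit 31 */
uint64_t addw_u(uint64_t a, uint64_t b)
{
    return (uint32_t)(a + b);
}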

For C and C++ code, the standard dictates that all integer operations
are done with "int" precision, unless some operand is larger than int,
in which case the operation is done in that precision. So there's no
real need for 8-bit and 16-bit operations to be natively supported by
the CPU--these operations are actually done as ints already. If you
have a variable which is a byte, then assigning to that variable, and
then using that variable again, you will need to zero-extend,

    You could perform the operation at base-size (byte in this case).

Languages like Ada are not defined like C.

    but honestly, this is not usually a performance path. It's likely to be stored to memory instead, so no masking or sign extending
    should be needed.

    If you pick ILP64 for your ABI, then you will get rid of almost all of
    these zero- and sign-extensions of 32-bit C and C++ code.

Then, the only access to 32-bit integers is int32_t and uint32_t.

    It will just
    work. If you pick I32LP64, then you should have a full suite of 32-bit operations and 64-bit operations, at least for all add, subtract, and
    compare operations. And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Tue Oct 7 11:27:39 2025
    From Newsgroup: comp.arch

    kegs@provalid.com (Kent Dickey) writes:
    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    int subroutine( int a, int b )
    {
    return a+b;
    }
    ...
I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret
    ...
    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes, and to ignore the upper 32 bits on reads; so
code using a 32-bit value held in a 64-bit register should use the
%exx name.

    Interesting. At some point I got the impression that LEA produces a
    64-bit result, because it produces an address, but testing reveals
    that LEA has a 32-bit zero-extended variant indeed.

    I agree with you that I32LP64 was a mistake, but it exists, and I
think ARM64 did a good job handling it. It has all integer operations working on two sizes: 32-bit and 64-bit, and when writing a 32-bit result,
    it 0-extends the register value.

    You don't want "garbage extend" since you want a predictable answer.

    Zero-extended for unsigned and sign-extended for int are certainly
    more forgiving when some function is called without a prototype and
    the actual type does not match the implied type (I once read about
    IIRC miranda prototypes, but a web search only gives me Star Trek
    stuff when I ask for that).

    Zero-extending for int is less forgiving. Apparently by 2003 (when
    AMD64 appeared) the use of prototypes was widespread enough that such
    a calling convention was acceptable.

    But once all the functions have correct prototypes, garbage-extension
    is just as workable as other alternatives.

Your choices for writing 32-bit results in a 64-bit register are thus sign-extend (not a good choice) or zero-extend (what almost everyone chose).

    What makes you think that one is a better choice than the other?

    The most obvious choices to me are:

    Sign-extend int and zero-extend unsigned: That has the best chance at
    the expected behaviour when the prototype is missing and would be
    required.

    If you rely on prototypes being present, you can take any choice,
    including garbage-extension. Then you can use the full 64-bit
operation in many cases, and only insert sign or zero extension when a conversion from 32 bits to 64 bits is needed (and that extension can be
    part of an instruction, as in ARM A64 addressing modes).

    As for what "almost everyone chose", here's some data:

int             unsigned        ABI
sign-extended   sign-extended   MIPS o64 and 64
sign-extended   zero-extended   SPARC V9
sign-extended   zero-extended   PowerPC64
zero-extended   zero-extended   AMD64
zero-extended   zero-extended   ARM A64
sign-extended   sign-extended   RV64

    I determined this by looking at the code for

unsigned usubroutine( unsigned a, unsigned b )
{
    return a+b;
}

int isubroutine( int a, int b )
{
    return a+b;
}

The code on various architectures (as compiled with gcc -O) is:

    MIPS64 (gcc -mabi=64 -O and gcc -mabi=o64 -O):
    0000000000000034 <usubroutine>:
    34: 03e00008 jr ra
    38: 00851021 addu v0,a0,a1

    000000000000003c <isubroutine>:
    3c: 03e00008 jr ra
    40: 00851021 addu v0,a0,a1

    SPARC V9:
    0000000000000018 <usubroutine>:
    18: 9d e3 bf 50 save %sp, -176, %sp
    1c: b0 06 00 19 add %i0, %i1, %i0
    20: 81 cf e0 08 return %i7 + 8
    24: 91 32 20 00 srl %o0, 0, %o0

    0000000000000028 <isubroutine>:
    28: 9d e3 bf 50 save %sp, -176, %sp
    2c: b0 06 00 19 add %i0, %i1, %i0
    30: 81 cf e0 08 return %i7 + 8
    34: 91 3a 20 00 sra %o0, 0, %o0

    PowerPC64:
    0000000000000030 <.usubroutine>:
    30: 7c 63 22 14 add r3,r3,r4
    34: 78 63 00 20 clrldi r3,r3,32
    38: 4e 80 00 20 blr
    ...

    0000000000000048 <.isubroutine>:
    48: 7c 63 22 14 add r3,r3,r4
    4c: 7c 63 07 b4 extsw r3,r3
    50: 4e 80 00 20 blr

    RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    RISC-V has a number of sign-extending 32-bit instructions, and a
    calling convention to go with it.

    There seem to be the following options:

    Have no 32-bit instructions, and insert sign-extension or
    zero-extension instructions where necessary (or implicitly in all
    operands, as I outlined earlier). SPARC V9 and PowerPC64 seem to take
    this approach.

    Have 32-bit instructions that sign-extend: MIPS64, Alpha, and RV64.

    Have 32-bit instructions that zero-extend: AMD64 and ARM A64.

    Have 32-bit instructions that sign-extend and 32-bit instructions that zero-extend. No architecture that does that is known to me. It would
    be a good match for the SPARC-V9 and PowerPC64 calling convention.

    There is also one instruction set (ARM A64) that has special 32-bit sign-extension and zero-extension forms for some operands.

    And you can then adapt the calling convention to match the instruction
    set. For "no 32-bit instructions", garbage-extension seems to be the
    cheapest approach to me, but I expect that when SPARC-V9 and PowerPC64
    came on the market, there was enough C code with missing prototypes
    around that they preferred a more forgiving calling convention.

    If you pick ILP64 for your ABI, then you will get rid of almost all of
    these zero- and sign-extensions of 32-bit C and C++ code. It will just
work. If you pick I32LP64, then you should have a full suite of 32-bit operations and 64-bit operations, at least for all add, subtract, and
    compare operations.

    For compare, divide, shift-right and rotate, you either first need to sign/zero-extend the register, or you need 32-bit versions (possibly
    both signed and unsigned).
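
A quick C sketch of the shift-right case (the garbage pattern is
invented): with junk in the high 32 bits, a 64-bit shift drags that
junk into the meaningful word, so you need either a prior extend or a
true 32-bit shift:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* the 32-bit value 1, "garbage-extended" in a 64-bit register */
    uint64_t r = 0xDEADBEEF00000001u;
    uint64_t bad  = r >> 1;             /* bit 31 is filled from the garbage */
    uint64_t good = (uint32_t)r >> 1;   /* zero-extend first, then shift */
    printf("%08x %08x\n", (unsigned)bad, (unsigned)good); /* 80000000 00000000 */
    return 0;
}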

    And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    It is certainly part of the way towards my idea of having sign- and zero-extended 32-bit operands for every operand of every instruction.

    It would be interesting to see how many sign-extensions and
    zero-extensions (whether explicit or implicitly part of the
    instruction) are executed in code that is generated from various C
    sources (with and without -fwrapv). I expect that it's highly
    dependent on the programming style. Sure there are types like pid_t
where you have no choice, but in frequently occurring cases you can
    choose:

    for (i=0; i<n; i++) {
    ... a[i] ...
    }

    Here you can choose whether to define i as int, unsigned, long,
    unsigned long, size_t, etc. If you care for portability to 16-bit
    machines, size_t is a good idea here, otherwise long and unsigned long
    also are efficient. If n is unsigned, you can also choose unsigned,
    but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and
    PowerPC64 and Alpha).

    If n is int, you can also choose int, and there is actually enough
    information here to make the code efficient (even with -fwrapv),
    because in this code int overflow really cannot happen, but in code
    that's not much different from this one (e.g., using != instead of <),
    -fwrapv will result in an inserted sign extension on AMD64, and not
    using -fwrapv may result in unintended behaviour thanks to the
    compiler assuming that int overflow does not happen.
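
A sketch of that != variant (the function name is invented): with
-fwrapv the compiler must allow for i wrapping, so it cannot widen i
into a 64-bit induction variable, and a sign extension lands inside
the loop on AMD64:

long sum_ne(long a[], int n)
{
    long r = 0;
    int i;
    for (i = 0; i != n; i++)  /* with != the index may legally wrap */
        r += a[i];            /* so i must be sign-extended each iteration */
    return r;
}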

    ILP64 would have spared us all these considerations.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue Oct 7 18:01:25 2025
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    kegs@provalid.com (Kent Dickey) writes:
    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    int subroutine( int a, int b )
    {
    return a+b;
    }
    ...
I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret
    ...
    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes, and to ignore the upper 32 bits on reads; so
code using a 32-bit value held in a 64-bit register should use the
%exx name.

    Interesting. At some point I got the impression that LEA produces a
    64-bit result, because it produces an address, but testing reveals
    that LEA has a 32-bit zero-extended variant indeed.

Architecturally, any store to a 32-bit register (%e_x) will
clear the high-order bits of the 64-bit version of the
register.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue Oct 7 18:34:45 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    kegs@provalid.com (Kent Dickey) writes:
    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    int subroutine( int a, int b )
    {
    return a+b;
    }
    ------------------------------------------------------------

    RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    RISC-V has a number of sign-extending 32-bit instructions, and a
    calling convention to go with it.

RISC-V has word-sized integer arithmetic.

    There seem to be the following options:

    Have no 32-bit instructions, and insert sign-extension or
    zero-extension instructions where necessary (or implicitly in all
    operands, as I outlined earlier). SPARC V9 and PowerPC64 seem to take
    this approach.

    This was My 66000 between 2016 and two weeks ago.
    The cost is 4% growth in code footprint and similar perf degradation.

    Have 32-bit instructions that sign-extend: MIPS64, Alpha, and RV64.

    Have 32-bit instructions that zero-extend: AMD64 and ARM A64.

    Have 32-bit instructions that sign-extend and 32-bit instructions that zero-extend. No architecture that does that is known to me. It would
    be a good match for the SPARC-V9 and PowerPC64 calling convention.

This is the starting point for My 66000 2.0:: integer arithmetic has
size and signedness, with the property that all integer results have
the 64-bit register <container> contain a range-limited result
suitable to the base-type of the calculation {no garbage in HoBs}.

    There is also one instruction set (ARM A64) that has special 32-bit sign-extension and zero-extension forms for some operands.

    And you can then adapt the calling convention to match the instruction
    set. For "no 32-bit instructions", garbage-extension seems to be the cheapest approach to me, but I expect that when SPARC-V9 and PowerPC64
    came on the market, there was enough C code with missing prototypes
    around that they preferred a more forgiving calling convention.

If you pick ILP64 for your ABI, then you will get rid of almost all of
these zero- and sign-extensions of 32-bit C and C++ code. It will just
work. If you pick I32LP64, then you should have a full suite of 32-bit
operations and 64-bit operations, at least for all add, subtract, and
compare operations.

    For compare, divide, shift-right and rotate, you either first need to sign/zero-extend the register, or you need 32-bit versions (possibly
    both signed and unsigned).

    My 66000 CMP is signless--it compares two integer registers and delivers
    a bit vector of all possible comparisons {2 equality, 4 signed, 4 unsigned,
    4 range checks, [and in FP land 10-bits are the class of the RS1 operand]}

    My 66000 SL, SR can be used in extract form--and here you need no operand preparation if you only extract meaningful bits.
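
As a generic C rendering of an unsigned extract (the general idea
only, not My 66000 syntax or encoding):

#include <stdint.h>

/* extract 'width' bits starting at 'offset'; the result is
   zero-extended, so no further operand preparation is needed */
uint64_t extract_u(uint64_t x, unsigned offset, unsigned width)
{
    uint64_t mask = (width < 64) ? ((1ull << width) - 1) : ~0ull;
    return (x >> offset) & mask;
}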

    My 66000 2.0 DIV has a size component to the calculation.

    And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    It is certainly part of the way towards my idea of having sign- and zero-extended 32-bit operands for every operand of every instruction.

Unnecessary if the integer calculation delivers properly range-limited
64-bit results.

    It would be interesting to see how many sign-extensions and
    zero-extensions (whether explicit or implicitly part of the
    instruction) are executed in code that is generated from various C
    sources (with and without -fwrapv).

In GNUPLOT it is just over 4% of instruction count for 64-bit-only
integer calculations.

    I expect that it's highly
    dependent on the programming style. Sure there are types like pid_t
where you have no choice, but in frequently occurring cases you can
    choose:

    for (i=0; i<n; i++) {
    ... a[i] ...
    }

    Here you can choose whether to define i as int, unsigned, long,
    unsigned long, size_t, etc. If you care for portability to 16-bit
    machines, size_t is a good idea here, otherwise long and unsigned long
    also are efficient.

Counted for() loops are somewhat special in that it is quite easy to
determine that the loop index never exceeds the range-limit of the
container: with i starting at 0 and the test i<n, i can never grow
past n, which itself fits in the container.

    If n is unsigned, you can also choose unsigned,
    but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and PowerPC64 and Alpha).

    Example please !?!

    If n is int, you can also choose int, and there is actually enough information here to make the code efficient (even with -fwrapv),
    because in this code int overflow really cannot happen,

    Consider the case where n is int64_t or uint64_t !?!

    Consider the C-preprocessor with::
    # define int (short int) // !!
    in scope.

    but in code
    that's not much different from this one (e.g., using != instead of <), -fwrapv will result in an inserted sign extension on AMD64, and not
    using -fwrapv may result in unintended behaviour thanks to the
    compiler assuming that int overflow does not happen.

    ILP64 would have spared us all these considerations.

Agreed. I32LP64 is an abomination, especially if one is bothering to
try to keep the number of instructions down.

    - anton
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Tue Oct 7 12:20:08 2025
    From Newsgroup: comp.arch

    On 10/3/2025 12:55 PM, MitchAlsup wrote:

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:

    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    My 66000 2.0

    After 4-odd years of ISA stability, I ran into a case where
    I <pretty much> needed to change the instruction formats.
    And after bragging to Quadribloc about its stability--it
    reached the point where it was time to switch to version 2.0.

    Well, its time to eat crow.
    --------------------------------------------------------------
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These supports both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory !

    ISA 2.0 changes allows calculation instructions; both Integer
    and Floating Point; and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    I must be missing something. Suppose I have

    C := A + B

    where A and C are 16 bit signed integers and B is an 8 bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high order 56 bits zero. But the add instruction will
    presumably be half word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?

A is loaded as 16 bits, properly sign-extended to 64 bits: range [-32768..32767]
B is loaded as 8 bits, properly sign-extended to 64 bits: range [-128..127]

    ADDSH Rc,Ra,Rb

    Adds 64-bit Ra and 64-bit Rb and then sign extends the result from bit<15>. The result is a properly signed 64-bit value: range [-32768..32767]
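
In C terms, the stated ADDSH semantics could be modelled as (a
sketch, not an official definition):

#include <stdint.h>

int64_t addsh(int64_t ra, int64_t rb)
{
    /* full 64-bit add, then sign-extend the result from bit 15 */
    return (int16_t)(ra + rb);
}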

    First let me apologize, then admit my embarrassment. I didn't write
    what I intended to, and even if I did, it wouldn't have been correct.

I had totally missed the issue of perhaps not extending the result of an arithmetic operation to the full register width. I must admit that this
    never came up in the programming I have done, and I never considered it.
    But subsequent posts in this thread have explained the issue well, and
    so I learned something. Thanks to all!
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Tue Oct 7 19:09:25 2025
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    ...
    My 66000 CMP is signless--it compares two integer registers and delivers
    a bit vector of all possible comparisons {2 equality, 4 signed, 4 unsigned,
    4 range checks, [and in FP land 10-bits are the class of the RS1 operand]}

    With an 88000-style compare and a result register of 64 bits, you can
    spend 14 bits on 64-bit comparison, 14 bits on 32-bit comparison, 14
    bits on 16-bit comparison, and 14 bits on 8-bit comparison, and still
    have 8 bits left. What is a "range check" and why does it take 4
    bits?

    It is certainly part of the way towards my idea of having sign- and
    zero-extended 32-bit operands for every operand of every instruction.

Unnecessary if the integer calculation delivers properly range-limited
64-bit results.

    Sign- or zero extension will still be necessary for things like

    long a=...
    int b=a;
    ... c[b];

    With the extension in the operands, you do not need any extension
    instructions, not even for division, right-shift etc.

    The question, however, is if the extensions occur often enough to
    merit such features. I lean towards the SPARC/PowerPC/My 66000-v1
    approach here.

    It would be interesting to see how many sign-extensions and
    zero-extensions (whether explicit or implicitly part of the
    instruction) are executed in code that is generated from various C
    sources (with and without -fwrapv).

In GNUPLOT it is just over 4% of instruction count for 64-bit-only
    integer calculations.

    Now what if you had a calling convention with garbage-extension? A
    number of extensions in your examples would go away.

Counted for() loops are somewhat special in that it is quite easy to
determine that the loop index never exceeds the range-limit of the
container.

    There have been enough cases where such reasoning led to "optimizing"
    code into an infinite loop and other fallout of adversarial compilers.

    If n is unsigned, you can also choose unsigned,
    but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and
    PowerPC64 and Alpha).

    Example please !?!

    With a slightly different loop:

long foo(long a[], unsigned l, unsigned h)
{
    unsigned i;
    long r=0;
    for (i=l; i!=h; i++)
        r+=a[i];
    return r;
}

    gcc-10 -O3 produces on RV64G:

    0000000000000000 <foo>:
    0: 872a mv a4,a0
    2: 4501 li a0,0
    4: 00c58c63 beq a1,a2,1c <.L4>

    0000000000000008 <.L3>:
    8: 02059793 slli a5,a1,0x20
    c: 83f5 srli a5,a5,0x1d
    e: 97ba add a5,a5,a4
    10: 639c ld a5,0(a5)
    12: 2585 addiw a1,a1,1
    14: 953e add a0,a0,a5
    16: feb619e3 bne a2,a1,8 <.L3>
    1a: 8082 ret

    000000000000001c <.L4>:
    1c: 8082 ret




    If n is int, you can also choose int, and there is actually enough
    information here to make the code efficient (even with -fwrapv),
    because in this code int overflow really cannot happen,

    Consider the case where n is int64_t or uint64_t !?!

    Then the first condition does not hold on I32LP64.

    Consider the C-preprocessor with::
    # define int (short int) // !!
    in scope.

    Then the compiler will see short int, and generate code accordingly.
    What's your point?

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2