• Time to eat Crow

    From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 02:50:23 2025


    My 66000 2.0

    After 4-odd years of ISA stability, I ran into a case where
    I <pretty much> needed to change the instruction formats.
    And after bragging to Quadribloc about its stability, it
    reached the point where it was time to switch to version 2.0.

    Well, it's time to eat crow.
    --------------------------------------------------------------
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    Integer and floating point compare instructions only compare
    bits of the specified {Size}.

    Conversions between integer and floating point are now also
    governed by {Size}, so one can convert FP64 directly
    into {unSigned}×{Int16}--more fully supporting strongly typed
    languages.
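
    As a concrete illustration (the CVTUH mnemonic below is a
    hypothetical spelling, not confirmed ISA syntax), take the C
    function:

        #include <stdint.h>

        uint16_t trunc16( double d )
        {
            /* FP64 -> uint16; assumes d is in range (out-of-range
               float-to-int conversion is undefined behavior in C) */
            return (uint16_t)d;
        }

    On a conventional 64-bit ISA this is typically a float-to-integer
    convert followed by a separate truncation or zero-extension; with
    {Size}-governed conversion it becomes a single instruction,
    something like:

        CVTUH   R1,R1    // FP64 -> {unSigned}×{HalfWord}, range [0..65535]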
    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}
    Although I am oscillating over whether to support FP8 or FP128.

    With this rearrangement of bits in the instruction formats, I
    was able to get all Constant and routing control bits in the
    same place and format in all {1, 2, and 3}-Operand instructions
    uniformly. This simplifies the Decoder <a trifle>, but more
    importantly, the Operand delivery (and/or reception) mechanism.

    I was also able to compress the 7 extended operation formats
    into a single extended operation format. The instruction
    format now looks like:

    inst<31:26> Major OpCode
    inst<20:16> {Rd, Cnd field}
    inst<25:21> {SRC1, Rbase}
    inst<15:10> {SH width, else, {I,d,Sign,Size}}
    inst< 9: 6> {Minor OpCode, SRC3}
    inst< 4: 0> {offset,SRC2,Rindex,1-OP×}

    So there is one uniformly positioned field of Minor OpCodes,
    and one uniformly interpreted field of Operand Modifiers.
    Operand Modifiers apply register routing and constant insertion
    to XOP Instructions.
    --------------------------------------------------------------
    So, what does this buy the Instruction Set ??

    A) All integer calculations are performed at the size and
    type of the result as required by the high level language::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}.
    This gets rid of all smash instructions across all data
    types {smash == {sext, zext, ((x<<2^n)>>2^n), ...}}--see the
    example following item D.

    B) I actually gained 1 more extended OpCode for future expansion.

    C) The assembler/disassembler was simplified.

    D) While I did not add any new 'instructions', I made those
    already present more uniform, better supporting the requirements
    of higher-level languages (like Ada) and more suitable to the
    stricter typing LLVM applies compared to GCC.
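
    To make the 'smash' pattern of (A) concrete, here is a minimal C
    case (plain ISO C; the RISC-V instructions named are real, shown
    here only as the usual I32LP64 pattern, not as My 66000 code):

        #include <stdint.h>

        int32_t sum( int32_t a, int32_t b )
        {
            return a + b;   /* result must stay in int32 range */
        }

    On RV64 without a word-sized add this needs two instructions:

        add    a0, a0, a1    # 64-bit add
        sext.w a0, a0        # the smash: re-sign-extend from bit 31

    RV64's ADDW does both in one instruction; the sized ADDs described
    here generalize that to all {Sign}×{Size} combinations.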

    In some ways I 'doubled' the instruction count while not adding
    a single instruction {spelling or field-pattern} to the ISA.
    --------------------------------------------------------------
    The elimination of 'smashes' shrinks the instruction count of
    GNUPLOT by 4%--maybe a bit more once we sort out all of the
    compiler patterns it needs to recognize.
    --------------------------------------------------------------
    I wonder if crow tastes good in shepherd's pie?!
  • From Robert Finch@robfi680@gmail.com to comp.arch on Fri Oct 3 03:17:16 2025

    On 2025-10-02 10:50 p.m., MitchAlsup wrote:

    [...]
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}
    Although I am oscillating over whether to support FP8 or FP128.

    For my arch, I decided to support FP128, thinking that FP8 could be
    implemented with lookup tables, given that eight-bit floats tend to
    vary in composition. Of course, I like more precision.
    Could it be a build option? Or a bit in a control register to flip
    between FP8 and FP128?
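
    A minimal sketch of the lookup-table idea (assuming an S.E4.M3,
    bias-7 FP8 format--one of several competing definitions; Inf/NaN
    handling, which also varies by definition, is ignored here):

        #include <math.h>
        #include <stdint.h>

        static float fp8_table[256];

        static void init_fp8_table(void)
        {
            for (int i = 0; i < 256; i++) {
                int sign = (i >> 7) & 1;
                int exp  = (i >> 3) & 0xF;
                int man  = i & 0x7;
                float v  = (exp == 0)
                         ? ldexpf(man / 8.0f, 1 - 7)           /* subnormal */
                         : ldexpf(1.0f + man / 8.0f, exp - 7); /* normal    */
                fp8_table[i] = sign ? -v : v;
            }
        }

        /* fp8_table[x] then converts an FP8 byte with one indexed load;
           a 64K-entry table handles any FP8×FP8 binary op the same way. */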

    With this rearrangement of bits in the instruction formats, I
    was able to get all Constant and routing control bits in the
    same place and format in all {1, 2, and 3}-Operand instructions
    uniformly. This simplifies the Decoder <a trifle>, but more
    importantly, the Operand delivery (and/or reception) mechanism.

    I was also able to compress the 7 extended operation formats
    into a single extended operation format. The instruction
    format now looks like:

    inst<31:26> Major OpCode
    inst<20:16> {Rd, Cnd field}
    inst<25:21> {SRC1, Rbase}
    inst<15:10> {SH width, else, {I,d,Sign,Size}}
    inst< 9: 6> {Minor OpCode, SRC3}
    inst< 4: 0> {offset,SRC2,Rindex,1-OP×}

    Only four bits for SRC3?

    [...]

  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:33:36 2025


    Robert Finch <robfi680@gmail.com> posted:

    On 2025-10-02 10:50 p.m., MitchAlsup wrote:

    [...]
    I was also able to compress the 7 extended operation formats
    into a single extended operation format. The instruction
    format now looks like:

    inst<31:26> Major OpCode
    inst<20:16> {Rd, Cnd field}
    inst<25:21> {SRC1, Rbase}
    inst<15:10> {SH width, else, {I,d,Sign,Size}}
    inst< 9: 6> {Minor OpCode, SRC3}
    inst< 4: 0> {offset,SRC2,Rindex,1-OP×}

    Only four bits for SRC3?
    No, there are 5 bits--inst<9:5>--whoops.

    [...]

  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri Oct 3 12:40:17 2025

    MitchAlsup wrote:
    [...]
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    Why? Compilers do not have any problem with this,
    as it's been handled by overload resolution since forever.

    It's people who have the problems following type changes, and most
    compilers will warn of mixed-type operations for exactly that reason.

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    Integer and floating point compare instructions only compare
    bits of the specified {Size}.

    Conversions between integer and floating point are now also
    governed by {Size}, so one can convert FP64 directly
    into {unSigned}×{Int16}--more fully supporting strongly typed
    languages.

    Strongly typed languages don't natively support mixed type operations.
    They come with a set of predefined operations for specific types that
    produce specific results.

    If YOU want operators/functions that allow mixed types then they force
    you to define your own functions to perform your specific operations,
    and it forces you to deal with the consequences of your type mixing.

    All this does is force YOU, the programmer, to be explicit in your
    definition and not depend on invisible compiler-specific interpretations.

    If you want to support Uns8 * Int8 then it forces you, the programmer,
    to deal with the fact that this produces a signed 16-bit result
    in the range -128*255..+127*255 = -32640..+32385.
    Now if you want to convert that result bit pattern to Uns8 by truncating
    it to the lower 8 bits, or worse, treat the result as Int8 and take
    whatever random value falls in bit [7] as the sign, then that's on you.
    They just force you to be explicit about what you are doing.
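
    In C terms (an illustrative sketch; the widening is just C's usual
    arithmetic conversions at work):

        #include <stdint.h>

        int16_t mul_u8_s8( uint8_t a, int8_t b )
        {
            int p = (int)a * (int)b;   /* range -32640 .. +32385 */
            return (int16_t)p;         /* explicit: product fits in 16 bits */
        }

    Truncating p to 8 bits instead would be the "that's on you" case.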

    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow
    mixing. Weakly typed languages deal with this in overload resolution,
    by having predefined invisible type conversions in those operators,
    and by then using the normal single-type arithmetic instructions.

    Although I am oscillating over whether to support FP8 or FP128.

    The issue with FP8 support seems to be that everyone who wants it also
    wants their own definition so no matter what you do, it will be unused.

    The issue with FP128 seems associated with scaling on LD and ST
    because now scaling is 1,2,4,8,16 which adds 1 bit to the scale field.
    And in the case of a combined int-float register file deciding whether
    to expand all registers to 128 bits, or use 64-bit register pairs.
    Using 128-bit registers raises the question of 128-bit integer support,
    and using register pairs opens a whole new category of pair instructions.

  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Fri Oct 3 10:55:46 2025

    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    [...]
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    I must be missing something. Suppose I have

    C := A + B

    where A and C are 16-bit signed integers and B is an 8-bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high-order 56 bits zero. But the add instruction will
    presumably be half-word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 15:25:25 2025

    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow mixing.

    Not sure who's confused, but my reading of the above is not some sort
    of "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
    instruction, so it does not support adding an int8 to an int32, or other "mixes".


    Stefan
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 19:55:00 2025


    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:

    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    [...]

    I must be missing something. Suppose I have

    C := A + B

    where A and C are 16-bit signed integers and B is an 8-bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high-order 56 bits zero. But the add instruction will
    presumably be half-word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?

    A is loaded as 16 bits, properly sign-extended to 64 bits: range [-32768..32767]
    B is loaded as 8 bits, properly sign-extended to 64 bits: range [-128..127]

    ADDSH Rc,Ra,Rb

    adds 64-bit Ra and 64-bit Rb and then sign-extends the result from
    bit<15>. The result is a properly signed 64-bit value: range
    [-32768..32767].
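
    A worked example (values chosen for illustration):

        Ra = 32760, Rb = 100
        64-bit add               : 32860  = 0x0000_0000_0000_805C
        sign-extend from bit<15> : -32676 = 0xFFFF_FFFF_FFFF_805C

    That is, the HalfWord result wraps modulo 2^16 back into
    [-32768..32767], exactly as a 16-bit two's-complement add would.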


  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Oct 3 20:47:08 2025

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:
    MitchAlsup wrote:
    [...]
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These support both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    Why? Compilers do not have any problem with this,
    as it's been handled by overload resolution since forever.

    A non-My66000 example:

    int add (int a, int b)
    {
    return a + b;
    }

    is translated on powerpc64le-unknown-linux-gnu (with -O3) to

    add 3,3,4
    extsw 3,3
    blr

    extsw fills the 32 high-order bits with copies of the sign bit,
    because numbers returned in registers have to be correct, either
    as 32- or 64-bit values.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Oct 3 21:04:16 2025

    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow mixing.

    Not sure who's confused, but my reading of the above is not some sort
    of "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
    instruction, so it does not support adding an int8 to an int32, or other "mixes".

    The outputs are correctly extended to a 64-bit number (signed or
    unsigned) so it is possible to pass results to wider operations
    without conversion.

    One example would be

    unsigned long foo (unsigned int a, unsigned int b)
    {
    return a + b;
    }

    which would otherwise need an adjustment after the add, and which
    would just be something like

    adduw r1,r1,r2
    ret

    using Mitch's new encoding.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 21:36:07 2025


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    MitchAlsup wrote:
    [...]
    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory!

    Why? Compilers do not have any problem with this,
    as it's been handled by overload resolution since forever.

    LLVM compiles C with stricter typing than GCC, resulting in a lot
    of smashes. For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

    LLVM thinks the smash is required because [-2^31..+2^31-1] +
    [-2^31..+2^31-1] does not always fit into [-2^31..+2^31-1] !!!
    and chasing down all the cases is harder than the compiler is
    ready to do. At first I thought that the value propagation in
    LLVM would find that the vast majority of arithmetic does not
    need smashing. This proved frustrating to both me and
    Brian. The more I read RISC-V and ARM assembly code, the more
    I realized that adding sized integer arithmetic is the only
    way to get through to the LLVM infrastructure.

    We (the My 66000 team; mostly me and Brian) have been trying to
    obey the stricter-than-necessary typing of LLVM while achieving
    the code density possible as if K&R rules were in play with
    64-bit-only (int)s.

    RISC-V has ADDW (but no ADDH or ADDB) to alleviate the issue on
    a majority of calculations. ARM has word-sized registers to
    alleviate the issue; since ARM started as 32-bit, ADDW is natural.
    I am exploring how to provide integer arithmetic such that smashing
    never has to happen.

    We have been chasing smashes for 9 months, making little progress...

    It's people who have the problems following type changes, and most
    compilers will warn of mixed-type operations for exactly that reason.

    It is more the Ada problem that values must fit in containers--that
    is, values have a range {min..max} and calculated values outside
    of that range are to be "addressed".

    ISA 2.0 allows calculation instructions--both Integer
    and Floating Point--and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    Integer and floating point compare instructions only compare
    bits of the specified {Size}.

    Conversions between integer and floating point are now also
    governed by {Size}, so one can convert FP64 directly
    into {unSigned}×{Int16}--more fully supporting strongly typed
    languages.

    Strongly typed languages don't natively support mixed type operations.
    They come with a set of predefined operations for specific types that
    produce specific results.

    Yes, indeed, and this is what I am providing: {Sign}×{Size}
    calculations, where the result is known to be range-limited to
    {Sign}×{Size}. Thus:

    ADDSH R7,R8,R9

    R7 is range limited to {Signed}×{HalfWord} == [-32768..+32767]
    ------------------------------------------------------------------------
    So let's look at some egregious cases::

    cvtds r2,r2 // convert double to signed 64
    srl r3,r2,#0,#32 // convert signed 64 to signed 32
    --------
    sra r1,r23,#0,#32 // smash to signed 32
    sra r2,r20,#0,#32 // smash to signed 32
    maxs r23,r2,r1 // max of signed 32
    --------
    ldd r24,[r24] // LD signed 64
    add r1,r28,#1 // innocently add #1
    sra r28,r1,#0,#32 // smash to Signed 32
    cmp r1,r28,r16 // to match the other operand of CMP
    --------
    call strspn
    srl r2,r1,#0,#32 // smash result Signed 32
    add r1,r25,-r1
    sra r1,r1,#0,#32 // smash Signed 32
    cmp r2,r19,r2
    srl r2,r2,#2,#1
    add r21,r21,r2 // add Bool to Signed 32
    sra r2,r20,#0,#32 // smash Signed 32
    maxs r20,r1,r2 // MAX Signed 32
    --------
    mov r1,r29 // Signed 64
    ple0 r17,FFFFFFF // ignore
    stw r17,[ip,key_rows] // ignore
    add r1,r29,#-1 // innocent subtract
    sra r1,r1,#0,#32 // smash to Signed 32
    divs r1,r1,r17 // DIV Signed 32
    --------
    lduw r2,[ip,keyT+4]
    add r2,r2,#-1 // innocent subtract
    srl r2,r2,#0,#32 // smash to unSigned 32
    cmp r3,r2,#1 // CMP unSigned 32
    // even though CMP is Signless
    --------
    add r1,r19,-r6 // not so innocent subtract
    sra r2,r1,#0,#32 // Signed
    srl r1,r1,#0,#32 // unSigned
    // only one of these can be eliminated
    --------

    If YOU want operators/functions that allow mixed types then they force
    you to define your own functions to perform your specific operations,
    and it forces you to deal with the consequences of your type mixing.

    All this does is force YOU, the programmer, to be explicit in your
    definition and not depend on invisible compiler specific interpretations.

    If you want to support Uns8 * Int8 then it forces you, the programmer,
    to deal with the fact that this produces a signed 16-bit result
    in the range -128*255..+127*255 = -32640..+32385.

    Uns8 occupies 64-bits in a register range-limited to [0..255]
    Int8 occupies 64-bits in a register range-limited to [-128..127]
    So, integer values sitting in registers occupy the whole 64-bits
    but are properly range-limited to base-type.

    Multiply multiplies 2×64-bit registers and produces a 128-bit
    result; since CARRY is not in effect, bits<127..64> are
    discarded and bits<63..0> are then considered.

    unSigned results simply discard bits more significant than base-type.
    Signed results raise OVERFLOW if there is more significance than
    base-type (and, if enabled, take an exception).
    In all cases, the result delivered fits within the range of base-type.

    So, in the case you mention::

    LDUB R8,[---]
    LDSB R9,[---]
    MULSH R7,R8,R9 // result range [-32768..32767]
    -----
    MULUH R7,R8,R9 // result range [0..65535]


    Now if you want to convert that result bit pattern to Uns8 by truncating
    it to the lower 8 bits,

    MULUB R7,R8,R9 // result range [0..255]

    or worse treat the result as Int8 and take
    whatever random value falls in bit [7] as the sign, then that's on you.

    MULSB R7,R8,R9 // result range [-128..127] or OVERFLOW

    Personally, I prefer range checks that raise OVERFLOW.
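
    One way to model that result rule in C (a sketch of my reading of
    the semantics above, not an official reference model; overflow here
    only flags significance lost within the low 64 bits):

        #include <stdint.h>

        int64_t mul_sized(int64_t a, int64_t b, int bits, int is_signed,
                          int *overflow)
        {
            uint64_t raw  = (uint64_t)a * (uint64_t)b;  /* low 64 of the 128 */
            uint64_t mask = (bits == 64) ? ~0ull : ((1ull << bits) - 1);
            uint64_t low  = raw & mask;
            if (!is_signed) {          /* unSigned: just discard high bits */
                *overflow = 0;
                return (int64_t)low;
            }
            uint64_t sign = 1ull << (bits - 1); /* extend from bit<bits-1> */
            int64_t  res  = (int64_t)((low ^ sign) - sign);
            *overflow = (res != (int64_t)raw);  /* significance was lost */
            return res;
        }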

    They just force you to be explicit about what you are doing.

    --------------------------------------------------------------
    Integer instructions are now::
    {Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.

    RISC-V and ARM LLVM compilers already do this and use it to eliminate
    smashes. RISC-V is limited to WORD; ARM uses registers of WORD size.
    Both eliminate smashes. Since there are already LLVM compilers using
    this (to eliminate smashes), it should not be terribly difficult to add.

    On the other hand:: ILP64 ALSO gets rid of the problem (at a
    different cost).

    Strongly typed languages don't have predefined operators that allow
    mixing. Weakly typed languages deal with this in overload resolution,
    by having predefined invisible type conversions in those operators,
    and by then using the normal single-type arithmetic instructions.

    Although I am oscillating over whether to support FP8 or FP128.

    The issue with FP8 support seems to be that everyone who wants it also
    wants their own definition so no matter what you do, it will be unused.

    Thank you for your input.

    The issue with FP128 seems associated with scaling on LD and ST
    because now scaling is 1,2,4,8,16 which adds 1 bit to the scale field.
    And in the case of a combined int-float register file deciding whether
    to expand all registers to 128 bits, or use 64-bit register pairs.

    My position is that people want 64-bit registers and an ISA that
    allows reasonably easy and efficient access to 128 bits; CARRY
    provides this. But the architecture is not cut out to be a big
    128-bit number cruncher; occasionally sure, but all the time, no.

    Using 128-bit registers raises the question of 128-bit integer support,
    and using register pairs opens a whole new category of pair instructions.

    CARRY supports this.
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 04:56:21 2025

    On 10/3/2025 4:04 PM, Thomas Koenig wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    --------------------------------------------------------------
    Integer instructions are now:: {Signed and unSigned}×{Byte, HalfWord,
    Word, DoubleWord}
    while FP instructions are now:
    {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strongly typed languages don't have predefined operators that allow mixing.
    Not sure who's confused, but my reading of the above is not some sort of
    "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
    instruction, so it does not support adding an int8 to an int32, or other
    "mixes".

    The outputs are correctly extended to a 64-bit number (signed or
    unsigned) so it is possible to pass results to wider operations
    without conversion.

    One example would be

    unsigned long foo (unsigned int a, unsigned int b)
    {
    return a + b;
    }

    which would otherwise need an adjustment after the add, and which
    would just be something like

    adduw r1,r1,r2
    ret

    using Mitch's new encoding.



    Yes.

    Sign extend signed types, zero extend unsigned types.
    Up-conversion is free.


    This is something the RISC-V people got wrong IMO, and adding a bunch of
    ".UW" instructions in an attempt to patch over it is just kinda ugly.

    Partly for my own uses, I revived ADDWU and SUBWU (which had been
    dropped in BitManip), because these are less bad than the alternative.

    I get annoyed that new extensions keep trying to add ever more ".UW"
    instructions rather than just having the compiler go over to
    zero-extended unsigned and make this whole mess go away.

    ...



    Ironically, the number of new instructions being added to my own ISA
    has mostly died off recently, largely because there is little
    particularly relevant to add at this point (within the realm of stuff
    that could be added).


  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 04:57:23 2025

    On 10/3/2025 11:40 AM, EricP wrote:
    MitchAlsup wrote:
    [...]

    Although I am oscillating over whether to support FP8 or FP128.

    The issue with FP8 support seems to be that everyone who wants it also
    wants their own definition so no matter what you do, it will be unused.

    The issue with FP128 seems associated with scaling on LD and ST
    because now scaling is 1,2,4,8,16 which adds 1 bit to the scale field.
    And in the case of a combined int-float register file deciding whether
    to expand all registers to 128 bits, or use 64-bit register pairs.
    Using 128-bit registers raises the question of 128-bit integer support,
    and using register pairs opens a whole new category of pair instructions.


    I generally went with register pairs...

    Where, say, for base types:
    8-bits: Rarely big enough
    16-bits: Sometimes big enough
    32-bits: Usually big enough
    64-bits: Almost always big enough

    Vector types:
    2x: Good
    4x: Better
    8x: Rarely Needed

    For a scalar type, the high 64 bits of a 128-bit register would be
    almost always wasted, so it isn't worthwhile to spend resources on
    things that are mostly just going to waste.



    At least with 64-bit registers, they cover:
    Integer values: Usually overkill
    'int' is far more common than 'long long'.
    Floating Point: Usually Optimal
    Binary64 is almost always good.
    Binary32 is frequently insufficient.
    2x Binary32 and 4x Binary16: OK

    Then, 128-bit as pairs:
    Deals with the occasional 128-bit vector and integer;
    Avoids wasting resources all the times we don't need it.

    Well, computation isn't exactly a gas that expands to efficiently
    utilize the register size (going bigger = diminishing returns).


    If the CPU is superscalar, can use 2x64b lanes for the 128-bit path, ...


    As for Binary128:
    Infrequently used;
    Too expensive for direct hardware support;
    So, I ended up adding trap-only support;
    Trap-only allows it to exist without also eating the FPGA.

    As for FP8:
    There are multiple formats in use:
    S.E3.M4: Bias=7 (Quats / Unit Vectors)
    S.E3.M4: Bias=8 (Audio)
    S.E4.M3: Bias=7 (NN's)
    E4.M4: Bias=7 (HDR images)

    Then, for 16-bit:
    S.E5.M10: Generic, Graphics Processing, Sometimes 3D Geometry
    Sometimes not enough dynamic range.
    S.E8.M7: NNs
    Usually not enough precision.

    It is likely that the optimal 16-bit format is actually S.E6.M9,
    but this is non-standard.
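
    For rough comparison (assuming an IEEE-style bias of 2^(E-1)-1
    and a reserved top exponent; exact limits vary by definition):

        S.E5.M10 : max ~ 65504,  ~3.3 decimal digits
        S.E6.M9  : max ~ 4.3e9,  ~3.0 decimal digits
        S.E8.M7  : max ~ 3.4e38, ~2.4 decimal digits

    So S.E6.M9 trades one significand bit for a much wider exponent
    range (max normal ~2^31 vs ~2^15).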


  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Oct 4 12:37:18 2025

    Stephen Fuld wrote:
    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    [...]

    I must be missing something.  Suppose I have

    C := A + B

    where A and C are 16-bit signed integers and B is an 8-bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high-order 56 bits zero. But the add instruction will
    presumably be half-word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?


    I am pretty sure A would be sign-extended to 64 bits on load, and the
    same for B, from 8->64 bits, at which point the addition works as it
    should? When storing a 64-bit result as a 16-bit signed integer, the
    CPU can verify that bits <63:15> are all equal (all 0 or all 1).
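
    Expressed in C, that store-side check is just a round-trip test
    (illustrative only):

        #include <stdint.h>

        int fits_int16(int64_t x)
        {
            return x == (int16_t)x;  /* true iff bits <63:15> all equal */
        }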
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 4 10:17:41 2025

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    LLVM compiles C with stricter typing than GCC, resulting in a lot
    of smashes. For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

    I tested this on AMD64 and did not find sign-extension in the callee,
    neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed to "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret

    It's not about strict or lax typing, it's about what the calling
    convention promises about types that are smaller than a machine word.
    If the calling convention requires/guarantees that ints are
    sign-extended, the compiler must use instructions that produce a
    sign-extended result. If the calling convention guarantees that ints
    are zero-extended (sounds perverse, but RV64 has the guarantee that
    unsigned is passed in sign-extended form, which is equally perverse),
    then the compiler must use instructions that produce a zero-extended
    result (e.g., AMD64's addl). If the calling convention only requires
    and guarantees the low-order 32 bits (I call this garbage-extended),
    then the compiler can use instructions that perform 64-bit adds; this
    is what we are seeing above.

    The other side of the coin is what is needed at the caller: if the
    caller needs to convert a sign-extended int into a long, it does not
    have to do anything. If it needs to convert a zero-extended or
    garbage-extended int into a long, it has to sign-extend the value.

    I have tested this with:

    int subroutine2(int,int);

    long subroutine3(int a,int b)
    {
    return subroutine2(a,b);
    }

    On AMD64 the result is:

    gcc-14:
    0000000000000010 <subroutine3>:
    10: 48 83 ec 08 sub $0x8,%rsp
    14: e8 00 00 00 00 call 19 <subroutine3+0x9>
    19: 48 83 c4 08 add $0x8,%rsp
    1d: 48 98 cltq
    1f: c3 ret

    clang-19:
    0000000000000010 <subroutine3>:
    10: 50 push %rax
    11: e8 00 00 00 00 call 16 <subroutine3+0x6>
    16: 48 98 cltq
    18: 59 pop %rcx
    19: c3 ret

    The compilers introduce the sign-extension CLTQ because the result of
    the call is not sign-extended. For parameter passing, it's the same:

    int subroutine4(long,long);

    long subroutine5(int a,int b)
    {
    return subroutine4(a,b);
    }

    0000000000000020 <subroutine5>:
    20: 48 83 ec 08 sub $0x8,%rsp
    24: 48 63 f6 movslq %esi,%rsi
    27: 48 63 ff movslq %edi,%rdi
    2a: e8 00 00 00 00 call 2f <subroutine5+0xf>
    2f: 48 83 c4 08 add $0x8,%rsp
    33: 48 98 cltq
    35: c3 ret
    0000000000000020 <subroutine5>:
    20: 50 push %rax
    21: 48 63 ff movslq %edi,%rdi
    24: 48 63 f6 movslq %esi,%rsi
    27: e8 00 00 00 00 call 2c <subroutine5+0xc>
    2c: 48 98 cltq
    2e: 59 pop %rcx
    2f: c3 ret

    BTW, in C as it was originally conceived, that was not an issue,
    because int occupied a complete register and all smaller types were
    converted to ints. The I32LP64 mistake has required inserting a lot
    of sign-extensions (and C compiler writers embrace undefined behaviour
    to avoid that in some cases).

    Another mistake we see in this example is the 16-byte alignment
    requirement of SSEx. It results in the RSP adjustments around the
    call. If only AMD had decided to support unaligned SSEx memory
    accesses by default in 64-bit mode.

    LLVM thinks the smash is required because [-2^31..+2^31-1] +
    [-2^31..+2^31-1] does not always fit into [-2^31..+2^31-1] !!!
    and chasing down all the cases is harder than the compiler is
    ready to do.

    In your example, there is nothing to chase down, because subroutine()
    can be called from anywhere.

    At first I thought that the value propagation in
    LLVM would find that the vast majority of arithmetic does not
    need smashing. This proved frustrating to both me and
    Brian. The more I read RISC-V and ARM assembly code, the more
    I realized that adding sized integer arithmetic is the only
    way to get through to the LLVM infrastructure.

    You might try changing the calling convention for int to
    garbage-extended. It can introduce sign or zero extension elsewhere,
    but maybe fewer than otherwise.

    RISC-V has ADDW (but no ADDH or ADDB) to alleviate the issue on
    a majority of calculations.

    That's an RV64 extension. RV32 does not have ADDW.

    ARM has word-sized registers to
    alleviate the issue; since ARM started as 32-bit, ADDW is natural.

    Not at all. ARM A64 is a completely new instruction set that has at
    least as much in common with PowerPC as with ARM A32 or ARM T32. I
    expect that they would not have added the 32-bit ADDW or the
    addressing modes with sign- or zero-extended 32-bit indexes if the
    MIPS and Alpha people had not made the I32LP64 mistake. Instead, they
    would have used the encoding space for more useful things.

    I am exploring how to provide integer arithmetic such that smashing
    never has to happen.

    If you want to avoid every use of a separate sign-extension or
    zero-extension instruction, add three bits to every source-register
    specifier: 2 bits for the input size (1,2,4,8 bytes), 1 for
    signed/unsigned. Once you have that, there is no need to extend the
    result: you can always perform the extension on input to the use of a
    result; the natural calling convention to go along with that is to
    garbage-extend.
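
    A sketch of what "garbage-extended" buys (plain C, illustrative;
    the function names are made up for the example):

        #include <stdint.h>

        /* An int argument arrives in a 64-bit register; bits 63..32 may
           be anything, and the callee only promises the low 32 bits. */
        int64_t add_garbage(int64_t a, int64_t b)
        {
            return a + b;            /* low 32 bits correct; rest garbage */
        }

        /* The extension is paid only where a wider use actually occurs: */
        int64_t widen(int64_t x)
        {
            return (int64_t)(int32_t)x;  /* explicit sign-extension here */
        }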

    I don't think that extension instructions are frequent enough to merit
    going to such lengths. I actually think that the RISC-V people made
    the wrong choice here, contrary to their usual stance. Instead of
    having sign-extension as a separate instruction (like zero-extension),
    they added it to a number of integer instructions, inflating the
    number of instructions for little benefit.

    So let's look at some egregious cases::

    cvtds r2,r2 // convert double to signed 64
    srl r3,r2,#0,#32 // convert signed 64 to signed 32

    unsigned?

    --------
    sra r1,r23,#0,#32 // smash to signed 32
    sra r2,r20,#0,#32 // smash to signed 32
    maxs r23,r2,r1 // max of signed 32

    With garbage-extension, you need a 32-bit maxs or sign-extend the
    operands. But you are sign-extended; why do you need it?

    Such things are not necessary with garbage-extension for add, sub,
    mul, and, or xor, i.e., the most common operations.

    --------
    ldd r24,[r24] // LD signed 64
    add r1,r28,#1 // innocently add #1
    sra r28,r1,#0,#32 // smash to Signed 32
    cmp r1,r28,r16 // to match the other operand of CMP

    Similar to the maxs case.

    --------
    call strspn
    srl r2,r1,#0,#32 // smash result Signed 32
    add r1,r25,-r1
    sra r1,r1,#0,#32 // smash Signed 32
    cmp r2,r19,r2
    srl r2,r2,#2,#1
    add r21,r21,r2 // add Bool to Signed 32
    sra r2,r20,#0,#32 // smash Signed 32
    maxs r20,r1,r2 // MAX Signed 32

    Maybe the right way here is to use size_t for the variable where you
    put the return value (strspn() returns a size_t).

    --------
    mov r1,r29 // Signed 64
    ple0 r17,FFFFFFF // ignore
    stw r17,[ip,key_rows] // ignore
    add r1,r29,#-1 // innocent subtract
    sra r1,r1,#0,#32 // smash to Signed 32
    divs r1,r1,r17 // DIV Signed 32

    Division is one of the operations where garbage-extended input is not
    ok; but fortunately it is rare.

    I doubt any compilers will use this feature.

    RISC-V and ARM LLVM compilers already do this and use it to eliminate
    smashes.

    Shortly after we got our first Alphas in 1995, I saw DEC's C compiler
    produce lots of explicit sign-extensions (using the addl instruction)
    of both int operands and int results. In later years they got the
    compiler to emit many fewer sign-extensions. I don't remember seeing
    that many sign extensions on Alpha from gcc, ever, so apparently they
    already kept track of the extension status of a value at the time.

    On the other hand:: ILP64 ALSO gets rid of the problem (at a different cost).

    Exactly. If the I32LP64 mistake had not been made, we would have been
    spared a lot (not just extension instructions). But for ARM A64 and
    RV64, they have to adapt to the world as it is, not as it should be,
    and unfortunately that means I32LP64. For MY66000, it's your call, of
    course.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 11:52:22 2025

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 4 16:11:37 2025

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

    * The Cray-1 is primarily a Fortran machine, its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.

    * Whatever inconvenience ILP64 would have caused to Fortran
    implementors is small compared to the cost in performance and
    reliability that I32LP64 has cost in the C world and the cost in
    encoding space (and thus code size) and implementation effort and
    transistors (probably not that many, but still) that it is costing
    all customers of 64-bit processors.

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.

    And unofficially C's integers were as long as pointers (with a legacy
    reaching back to BCPL). If I had to choose between breaking an
    unofficial FORTRAN-C interface tradition and a C-internal tradition, I
    would choose the C-internal tradition every time.

    There are two other languages that I have thought about:

    Java was introduced with fixed-size 32-bit int and 64-bit long, and
    with references typically having the size of a machine word. The
    choice of "int" and "long" may be due to I32LP64, and if the C people
    had gone for ILP64, the Java people might have chosen different names.
    But given their goal of write-once-run-everywhere with bit-identical
    results, they probably did not want to provide a machine-word-sized
    integer type. Java became popular when 32-bit machines were still a
    thing for running Java, so there would be lots of Java around that
    uses the 32-bit integer type. Given the large amount of Java code,
    that alone might be enough to make computer architects want to add
    special architectural support for signed 32-bit integers. At least we
    would have been spared architectural support for unsigned 32-bit
    integers.

    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64). Given that Rust
    was designed recently, that does not lead to portability problems yet:
    On servers, desktops (and recently smartphones) machine words are only
    64 bits, so if you write for that, you can just use i64 and u64, and
    your software will be efficient (or you can use smaller integers, and
    unless you store a lot of them, your software will be inefficient on
    various machines thanks to sign or zero extension). If you program on
    an embedded system, the code probably won't be ported to a machine
    with a different word size, so again, choosing the integer types that
    match the word size is a good choice. If there is ever a transition
    to 128-bit machines, I expect that the Rust approach will backfire,
    but who knows if Rust will still be in significant use by then. If it
    is, it may result in costs like I32LP64 is causing now.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sat Oct 4 20:44:37 2025
    From Newsgroup: comp.arch

    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:


    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64).

    Rust has machine-dependent isize and usize types, identical to ptrdiff_t
    and size_t in C.





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sat Oct 4 20:51:43 2025
    From Newsgroup: comp.arch

    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

* The Cray-1 is primarily a Fortran machine, and its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.


I would guess that Cray-1 FORTRAN was not 100% conformant to the FORTRAN 77 standard. And they likely didn't care.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Oct 4 18:01:59 2025
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    FORTRAN INTEGER == INT32_T

    allowing ILP64.

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Oct 4 18:05:18 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

* The Cray-1 is primarily a Fortran machine, and its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.

    * Whatever inconvenience ILP64 would have caused to Fortran
    implementors is small compared to the cost in performance and
    reliability that I32LP64 has cost in the C world and the cost in
    encoding space (and thus code size) and implementation effort and
    transistors (probably not that many, but still) that it is costing
    all customers of 64-bit processors.

    If you are not familiar with them, they are:

    - INTEGER takes up one storage unit
    - REAL takes up one storage unit
    - DOUBLE PRECISION takes up two storage units

    where storage units are implementation-defined. Also consider
    that 32-bit REALs and 64-bit REALs are both useful and needed,
    and that (unofficially) C's integers were identical to
    FORTRAN's INTEGER.

    And unofficially C's integers were as long as pointers (with a legacy reaching back to BCPL). If I had to choose between breaking an
    unofficial FORTRAN-C interface tradition and a C-internal tradition, I
    would choose the C-internal tradition every time.

    There is a quote from K&R C that states int is the most efficient
    form for computing integer arithmetic values.

    With the demand for int to remain 32-bits and the countering demand
    of LLVM to obey typing, int no longer obeys its original stated goal.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sat Oct 4 14:42:25 2025
    From Newsgroup: comp.arch

    Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:
    MitchAlsup wrote:
    My 66000 2.0

    After 4-odd years of ISA stability, I ran into a case where
    I <pretty much> needed to change the instruction formats.
    And after bragging to Quadribloc about its stability--it
    reached the point where it was time to switch to version 2.0.

    Well, its time to eat crow.
    --------------------------------------------------------------
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These supports both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory !
    Why? Compilers do not have any problem with this
    as its been handled by overload resolution since forever.

    A non-My66000 example:

int add (int a, int b)
{
    return a + b;
}

    is translated on powerpc64le-unknown-linux-gnu (with -O3 to)

    add 3,3,4
    extsw 3,3
    blr

extsw fills the 32 high-order bits with copies of the sign bit, because
numbers returned in registers have to be correct, either as 32- or
64-bit values.

    Ok I see what's going on - the reference to strong typing got me
    thinking this was about operand type matching.

    Above it is treating integer arguments and return types that are
    smaller than full register width, and presumably short and char also,
    as modulo (wrapping) data types and converting them to canonical
    form by sign or zero extension. That avoids later problems in compare operations where the low order bits match but high order bits differ.

A strongly typed language would have separate data types for signed
and unsigned linear integers, and for signed and unsigned modulo
integers. The sign/zero extension for modulo result types would mask
any overflow and prevent proper result overflow checking.
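(A small C sketch of that masking effect; __builtin_add_overflow is the
GCC/Clang builtin, used here only to stand in for a "linear" type's
checked add:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t a = 250, b = 10, r;

        r = a + b;            /* modulo semantics: canonicalized back to
                                 8 bits, silently wraps to 4 */
        printf("%u\n", r);    /* prints 4, no trace of the overflow */

        if (__builtin_add_overflow(a, b, &r))   /* "linear" check */
            printf("overflow\n");               /* this does fire */
        return 0;
    }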


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 18:55:05 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?

    I am not familiar enough with FORTRAN to give a recommendation on
    that. However, two observations:

* The Cray-1 is primarily a Fortran machine, and its C implementation
    is ILP64, and it is successful. So obviously an ILP64 C can live
    fine with FORTRAN.

    As you may know, the Cray-1 was a very special machine, which got
away with a lot of idiosyncrasies because it was blindingly fast
    (and caused users a lot of trouble with conversion between DOUBLE
    PRECISION and REAL).

But that was in the late 1970s. By the time the 64-bit workstations
    were being designed, REAL was firmly established as 32-bit and
    DOUBLE PRECISION as 64-bit, from the /360, the PDP-11, the VAX
    and the very 32-bit workstations that the 64-bit workstations were
    supposed to replace.


    * Whatever inconvenience ILP64 would have caused to Fortran
    implementors is small compared to the cost in performance and
    reliability that I32LP64 has cost in the C world and the cost in
    encoding space (and thus code size) and implementation effort and
    transistors (probably not that many, but still) that it is costing
    all customers of 64-bit processors.

    A 64-bit REAL and (consequently) a 128-bit DOUBLE PRECISION
would have made the 64-bit workstations pretty much unusable for
    scientific use, and a lot of these were aimed at the technical
    and scientific market, and that meant FORTRAN.

So, put yourself into the shoes of the people designing R4000-class
workstations: they could allow their scientific and technical customers
to use the same codes "as is", with no conversion, or tell them
    they cannot use 32-bit REAL any more, and that they need to rewrite
    all their software.

    What would they have expected their customers to do? Buy a system
    which forces them to do this, or buy a competitor's system where
    they can just recompile their software?

You're always harping about how compilers should be bug-compatible
with previous releases. Well, that would have been the mother of
all incompatibilities, aka business suicide.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 16:04:54 2025
    From Newsgroup: comp.arch

    On 10/4/2025 12:44 PM, Michael S wrote:
    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:


    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64).

    Rust has machine-dependent isize and usize types, identical to ptrdiff_t
    and size_t in C.


I guess, if starting from a clean slate (in a from-scratch language),
it might make sense to have:
    A range of defined fixed sizes;
    A range of types whose size is a product of various machine constraints.


    So, say:
    u8/u16/u32/u64/u128 //Unsigned, fixed size, default endian
    s8/s16/s32/s64/s128 //Signed, fixed size, default endian
    u8l/u16l/u32l/u64l/u128l //Unsigned, fixed size, little endian
    s8l/s16l/s32l/s64l/s128l //Signed, fixed size, little endian
    u8b/u16b/u32b/u64b/u128b //Unsigned, fixed size, big endian
    s8b/s16b/s32b/s64b/s128b //Signed, fixed size, big endian
    u8l/s8l/u8b/s8b: Technically redundant with u8/s8, but added for
    consistency.


i8/i16/i32/i64/i128 could also make sense.
Could also have sbit(N) and ubit(N), which specify exact-width types
but otherwise behave like the normal integer types. The power-of-2
sizes could be seen as mostly equivalent to the fixed-size types.
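(For comparison, C23's _BitInt(N) is the closest existing C analogue of
the sbit(N)/ubit(N) idea; a sketch, assuming a C23 compiler:)

    /* bit-precise integer type: exactly 12 value bits, like ubit(12) */
    unsigned _BitInt(12) wrap12(unsigned _BitInt(12) x)
    {
        return x + 1;   /* result converts back to 12 bits, modulo 2^12 */
    }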


    Floating point types:
    f16/f32/f64/f128
    f8/f8a/f8u/...: Assortment of 8-bit types.
    Since no one-size-fits-all with FP8.
    (Maybe also with f*l and f*b variants?).

    Machine constraint-sized types:
    sasize/uasize: Size for arrays and similar
    spsize/upsize: Size for pointers and pointer differences
    sfsize/ufsize: Size for file offsets
    int: default 'fast' size (32 or 64 bits)
    long: default 'large but fast' size (64 or 128 bits)
    Would be 64 if machine only has 64 bit ALU operations;
    Would be 128 if machine has a 128-bit ALU available.
    intmul: Whichever size allows the fastest integer MUL or MAC.
    More likely to be 16 or 32 bits.
    ...

    Special types:
    void: No Type, pointers may freely convert to other types
    m8: Like void, but with a defined size, but no operators.
    m8 could be assumed the default type for raw memory buffers.
    m8 pointers may be freely cast to/from other pointer types.
    m16/m32/m64/m128: Has size but no defined operators.
    Casts involving these types will be bit-preserving.
    Size-mismatched casts will not be allowed.

May use slightly different type promotion rules from C, for integer types:
  Td = Ts OP Tt
    If the range of Td is greater than or equal to (Ts OP Tt):
      promote to the wider of the two;
      (Ts OP Tt) promotes by default to the wider of Ts or Tt;
      if a signed/unsigned mismatch of same size, or a smaller signed type,
        promote to the next larger signed type
        (note: NOT the "same sized unsigned" as C would use).
    If the range of Td is less than (Ts OP Tt):
      If the result will be the same either way:
        promote to the most efficient type to carry out the operation,
        or use Td if doing so is efficient;
        narrow the result if needed
        (Td narrower than the intermediate type).
      Else, promote to the type of (Ts OP Tt), and narrow the result.

    In this case, the types may flow-out from the inputs and operators, but
    also flow-in from the destination type. Usually C lacks the flowing-in
    part, but it is relevant for efficient code generation.

    Note that the inward flow may happen recursively, where if Td promotion
    is used for an outward expression, the two sub-expressions may be
    re-evaluated in light of 'Td' as the destination type (vs merely the
    result of the input expressions).

Unlike C, it would still apply the same promotion behavior to 8 and
16 bit types as for wider types (so, there is no implicit "first
auto-promote everything to int" rule). Though, it can generally still
use a wider ALU so long as the result value will retain the expected
sign or zero extension.


This would differ from C's behavior in the case of widening expressions,
in that operating on narrower types and storing the result as a wider
type will promote first (so no overflow happens), rather than, as in C,
where an overflow may happen at the narrower type and the result is
promoted after the fact.
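(A minimal C illustration of that difference; the promote-first result
is what the proposed rules would produce automatically:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t a = 2000000000, b = 2000000000;

        int64_t c = a + b;           /* C: the add happens at 32 bits and
                                        overflows (undefined behavior),
                                        then the result is widened */
        int64_t d = (int64_t)a + b;  /* promote first: exactly 4000000000 */

        printf("%lld %lld\n", (long long)c, (long long)d);
        return 0;
    }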

    This would have fewer "gotchas" on average than the C approach, but C's
    rules need to be maintained for C code, as some code will break if the original integer overflow behavior is not preserved. But, the existing
    rules are not entirely consistent.

    Can make the working assumption that widening is cheap but narrowing has
    a non-zero cost (though, this is the reverse from the normal RV ABI,
    where on RV64G the ABI would normally have people pay the cost at
    "unsigned int"->"long" promotion).

    In the abstract model, all narrower signed or unsigned types are sign or
    zero extended to the maximum widest type in play; we can also assume
    twos complement as the working model; ...



    The big and little endian types would mostly apply to structures and
pointers. They would only affect local variables if the address of the
    local variable is taken (else the machine default is used; or "all
    choices being equal" assume little endian).

    By default, assume native alignment of a type unless a packed modifier
is used (with packed applied either per variable or for the structure as
    a whole). If no packed is used, the alignment of a struct will be the
    widest member in the struct. If used on a struct, the whole struct will
    assume byte alignment. Else, the alignment will be the largest alignment
    seen within the struct (or the largest non-packed member). Could maybe
    have an 'align_as()' modifier (to specify to use the same alignment as
    another type) with the packed case being equal to byte alignment.

    Possible:
    Allow 'if()' in structs, but would be evaluated as a compile-time
    constant (so in this sense, functions more like an ifdef, just evaluated
    later in the process).

    Might also allow VLA-like patterns if the expression is a compile time constant. Could allow a VLA as the final member of a struct, which will
    be understood the same as a zero-element array. Will have the side
    effect that the size of the struct is unknown, and it may not be used in arrays nor as the non-final member of a parent struct (and if present,
    will apply the same property on the parent struct).
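(This is essentially C99's flexible array member; a minimal sketch, with
hypothetical names blob/blob_new:)

    #include <stdlib.h>

    struct blob {
        size_t len;
        unsigned char data[];    /* C99 flexible array member */
    };

    struct blob *blob_new(size_t n)
    {
        struct blob *b = malloc(sizeof *b + n);  /* size not known statically */
        if (b) b->len = n;
        return b;
    }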


    Note that structs may be classified as serializable or non-serializable. Serializable structs will need a fixed and unambiguous size;
    They will explicitly disallow pointers, references, or any other types
    that can't be serialized.

    Serializable structs would be assumed to be able to be safely read from
    or written to a file or socket, ...


    Might make sense, in such a language, to have an object model similar to C#: Structs exist, by-value by default;
    Classes always by-reference, with a single inheritance and interfaces model; Maybe for nicety, assume that interfaces can be mapped to COM-like
    objects (should map the underlying COM layout);
    ...

    Could also assume similar scoping rules to C#, with full scope known at
    the time an EXE or DLL is compiled (any undefined types or variables at
    this stage being a compiler error). The front-end parser and compiler
    would be required to still work even without a full knowledge of the type-system (WRT class-like types), but may enforce stricter constraints
    on normal value types. Though, if doing separate compilation, this only
    allows partial compilation of some features (the object system will need
    to be sorted out at link time).

    Would not have C++ style templates, but could still have generics.


    But:
    No garbage collector;
    Objects may have an explicit automatic lifetime.

    Say:
    Foo! foo();
    Does not mean that it is necessarily stack-allocated or by-value (unlike
    C++), but will mean that 'foo' will be auto-deleted when foo goes out of scope.

    Similar could also be applied to class members, so a T! member is
    auto-deleted when the parent goes out of scope. Could maybe also
    consider "T^" for cases where the member is to use reference counting
(though it could also make sense on the class definition).

    so, some modifiers could be applied one of several places:
    Class definition: Default behavior to be used, may be overridden.
    Variable: Used in this context, may override class.
    "new()": Used at object creation for dynamically created objects.

    With possible syntax:
    T //base type, default behavior, global lifetime for objects.
    T* //pointer, structs, N/A for class objects
    T! //automatic / parent-scope lifetime
    T^ //reference counted
    T(Z) //zone lifetime

    Typically the stronger rule may be used, with it being a compiler error
    if a variable or member doesn't match the lifetime specified elsewhere
    (though with fudging for "T!" as it would apply to the point of creation and/or place-of-residence of the object in question). As such, it is
    likely that "T!" class members would primarily be initialized in
    constructors (but may be treated as 'final' outside of a constructor for
    the class in question).

    zones will be compile-time entities. It could be treated as an error for
    an object in a longer-lived zone to have a reference with a
    shorter-lived zone. Though, unclear how to enforce this at compile time.
    Zone lifetime would depend on program control flow rather than known at compile time. Though, a zone-tree could be defined at compile time, and
    the compiler or runtime could error-out or fault if it detects zone
    creation or destruction which deviates from the specified dependency order.

    zonedef Z; //define a zone Z, parent of Z is global
    zonedef Z(Zp); //define zone Z whose lifetime exists within Zp.
    If Z is live and Zp is destroyed, throw.
    If Z is created and Zp is not live, throw
    If an object in Z is created, and Z is not live, throw.
    ...


    In most cases, 'delete' could be discouraged, as the only time delete is likely to be needed is if lifetime is poorly specified in some other
    way. But, we don't need generalized garbage collection, as pretty much
    no one has really made this work acceptably.

    Reference counting may leak memory, though one possibility could be to
    try to detect and flag cycle-formation when creating object graphs, with
    an explicit "weak object reference" being created in cases where cycle-creation is detected (in this case, the reference count is
    special). If the reference count for non-weak references drops to 0, it destroys the object. Downside: This puts some of the computational cost
    of a mark/sweep collector into the code for incrementing and
    decrementing reference counts.

    Though possible is allowing both reference-counting and zones on the
    same object, in which case the zone may clean up leaks from the reference-counter (assuming periodic zone destruction).


    ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 17:28:09 2025
    From Newsgroup: comp.arch

    On 10/4/2025 4:56 AM, BGB wrote:
    On 10/3/2025 4:04 PM, Thomas Koenig wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    --------------------------------------------------------------
Integer instructions are now::
{Signed and unSigned}×{Byte, HalfWord, Word, DoubleWord}
    while FP instructions are now:
          {Byte, HalfWord, Word, DoubleWord}

    I doubt any compilers will use this feature.
    Strong typed languages don't have predefined operators that allow
    mixing.

Not sure who's confused, but my reading of the above is not some sort of "mixing": I believe Mitch is just saying that his addition operation
    (for example) can be specified to operate on either one of int8, uint8,
    int16, uint16, ...
    But that specification applies to all inputs and outputs of the
instruction, so it does not support adding an int8 to an int32, or other "mixes".

    The outputs are correctly extended to a 64-bit number (signed or
    unsigned) so it is possible to pass results to wider operations
    without conversion.

    One example would be

    unsigned long foo (unsigned int a, unsigned int b)
    {
       return a + b;
    }

which would need an adjustment after the add, and which would
just be something like

        adduw    r1,r1,r2
        ret

    using Mitch's new encoding.



    Yes.

    Sign extend signed types, zero extend unsigned types.
    Up-conversion is free.


    This is something the RISC-V people got wrong IMO, and adding a bunch of ".UW" instructions in an attempt to patch over it is just kinda ugly.

Partly for my own uses, I revived ADDWU and SUBWU (which had been
dropped in BitManip), because these are less bad than the alternative.

I get annoyed that new extensions keep trying to add ever more ".UW"
instructions rather than just having the compiler go over to
zero-extended unsigned and make this whole mess go away.

    ...



    Ironically, the number of new instructions being added to my own ISA has mostly died off recently, largely because there is little particularly relevant to add at this point (within the realm of stuff that could be added).


    Going and looking back, most major new instructions added were:
    BITMOV and BITMOV.S, ~ 7 months ago
    Some new ops related to FP8A handling and similar, ~ 2 months ago
    Mostly for Bias=7 (where, FP8A=S.E3.M4, or A-Law format)
    I couldn't just change the Bias=8 ops to 7 without breaking stuff;
    But, for non-audio uses 7 is a lot more useful.
    Mostly used for unit vectors,
    where ability to store values >= 1.0 sometimes needed.
    But, most values still < 1.0 ...
    Sorta relates to Trellis re-normalization trickery.
    Stored vector isn't exactly unit-length, but unit post-renorm.

    A few operations in the "possible" category:
    A few NN related packed multiply instructions;
    Instructions for a possible UVF1 packed block format
    (graphics and NN);
    ...

    FPU Compare 3R instructions, ~8 months ago


While XG3 was added 11 months ago, it isn't really new instructions so
much as a new mode and encoding scheme for the same instructions (and it
    was only fairly recently that I got support for predicated instructions implemented in RISC-V).

    And, 12 months ago, a RISC-V target for BGBCC, and jumbo prefixes for
    the RISC-V side, ... Somehow I thought all of this happened several
    years ago, seems it was 1 year.


    Seems initial efforts to start adding RISC-V support were (only) 2 years
    ago.

    A lot more fiddling has been in things mostly related to dealing with
    RISC-V and trying to make it less terrible.


The recent FPU changes are more about tweaking FPU behavior, and
haven't really involved adding new instructions (except on the RISC-V
side, ones which already existed in the RISC-V specs).


    Hmm...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 5 11:58:14 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> writes:
    On Sat, 04 Oct 2025 16:11:37 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:


    AFAIK Rust does not have a machine-word-sized integer type; instead,
    each type has its size in its name (e.g., i32, u64).

    Rust has machine-dependent isize and usize types

Good. But for some reason all the examples I have seen use
integer types like i32 and u64.

    identical to ptrdiff_t and size_t in C.

I have read that there are C implementations (variants) where ptrdiff_t
    and size_t are smaller than a pointer, in particular large-model C on
    the 8086, and that was the reason for C standard restrictions about
    pointer subtraction and pointer inequality comparison.

    I hope nobody is doing large-model Rust, even though Rust may be more appropriate for that than C.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 5 15:01:06 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?
    ...
    By the time the 64-bit worksations
    were being designed, REAL was firmly established as 32-bit and
    DOUBLE PRECISION as 64-bit, from the /360, the PDP-11, the VAX
    and the very 32-bit workstations that the 64-bit workstations were
    supposed to replace.

    On the PDP-11 C's int is 16 bits. I don't know what FORTRAN's INTEGER
    is on the PDP-11 (but I remember reading about INTEGER*2 and
    INTEGER*4, AFAIK not in a PDP-11 context). In any case, I expect that FORTRAN's REAL was 32-bit on a PDP-11, and that any rule chain that
    requires that C's int is as wide as FORTRAN's REAL is broken at some
    point on the PDP-11.

So your rules do not even work for the first machine where C was
implemented. If shortsighted FORTRAN people look at 32-bit machines
and become accommodated to C's int being as wide as FORTRAN's INTEGER
    and REAL, they could have known from the PDP-11 that that's going to
    break for other machine word sizes.

So, put yourself into the shoes of the people designing R4000-class
workstations: they could allow their scientific and technical customers
    to use the same codes "as is", with no conversion, or tell them
    they cannot use 32-bit REAL any more, and that they need to rewrite
    all their software.

    If they want to use their software as-is, and it is written to work
    with an ILP32 C implementation, the only solution is to continue using
    an ILP32 implementation. That's not only for FORTRAN/C mixing, but
    for most C code of the day, certainly with I32LP64; I expect that the
    porting effort would have been smaller with ILP64, but there still
    would have been some.

    BTW, we have a DecStation 5000/150 with an R4000, and all C compilers
    on this machine support ILP32 and nothing else.

    What would they have expected their customers to do? Buy a system
    which forces them to do this, or buy a competitor's system where
    they can just recompile their software?

    If just recompiling is the requirement, what follows is ILP32.

You're always harping about how compilers should be bug-compatible
    to previous releases.

    Not in the least. I did not ask for bug compatibility.

    I also did not ask for "compiling as is" on a different architecture,
    much less on a system with different address size.

    I have actually written up what I ask for: <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>. Maybe you
    should read it one day, or reread it given that you have forgotten it.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Oct 5 18:19:47 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    <snip>

    Not in the least. I did not ask for bug compatibility.

    I also did not ask for "compiling as is" on a different architecture,
    much less on a system with different address size.

    I have actually written up what I ask for: <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>. Maybe you
    should read it one day, or reread it given that you have forgotten it.

    In the referenced article you write::
    "Access to uninitialized data is another issue where absolute equivalence
    with the basic model would make important optimizations impossible. Consider
    a variable v at the end of its life (e.g., at the end of a function). Unless the compiler can prove that the location of the variable is not read later
    as a result of reading uninitialized data (say, reading the uninitialized variable w living in the same location in a different function), v would
    have to stay in the same location in future compiler versions or other optimization levels; or at least the final value of v would have to be
    stored in this location, and the initial value of w would have to be
    fetched from this location."

    If variable v and variable w are "stack variables" local to their own subroutines, it seems perfectly reasonable to assume that all deallocated
    stack variables become inaccessible. Then, later when new stack space is allocated those new variables have no relationship to any previously deallocated variables.

    That is: when the stack pointer is incremented the space is no longer accessible and::
    a) any modified cache lines are discarded instead of being written
    to memory--the space is no longer accessible so don't waste power
    making DRAM coherent with inaccessible stack space.

    Later, when the stack pointer is decremented::
    b) new cache line area can be "allocated" without reading DRAM and
    being <conceptually> initialized to zero.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sun Oct 5 19:30:42 2025
    From Newsgroup: comp.arch

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    On the PDP-11 C's int is 16 bits. I don't know what FORTRAN's INTEGER
    is on the PDP-11 (but I remember reading about INTEGER*2 and
INTEGER*4, AFAIK not in a PDP-11 context). In any case, I expect that FORTRAN's REAL was 32-bit on a PDP-11, and that any rule chain that
    requires that C's int is as wide as FORTRAN's REAL is broken at some
    point on the PDP-11.

    I wrote INFort, one of the two F77 implementations for the PDP-11.
    INTEGER and REAL were the same size because that's what the standard
    said, and any program that used EQUIVALENCE would break otherwise. If
    you wanted shorter ints, INTEGER*2 provided them.

    Bell Labs independently wrote f77 around the same time, and its manual says they did the same thing, INTEGER was C long int, INTEGER*2 was short int.

    If the speed difference mattered, it wasn't hard to say something like

    IMPLICIT INTEGER*2(I-N)

    to make your ints short.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 5 19:51:26 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    The I32LP64 mistake

    If you consider I32LP64 a mistake, how should FORTRAN's (note the
    upper case, this is pre-Fortran-90) storage association rules have
    been handled, in your opinion?
    ...
    By the time the 64-bit worksations
    were being designed, REAL was firmly established as 32-bit and
    DOUBLE PRECISION as 64-bit, from the /360, the PDP-11, the VAX
    and the very 32-bit workstations that the 64-bit workstations were
    supposed to replace.

    On the PDP-11 C's int is 16 bits. I don't know what FORTRAN's INTEGER
    is on the PDP-11 (but I remember reading about INTEGER*2 and
    INTEGER*4, AFAIK not in a PDP-11 context). In any case, I expect that FORTRAN's REAL was 32-bit on a PDP-11, and that any rule chain that
    requires that C's int is as wide as FORTRAN's REAL is broken at some
    point on the PDP-11.

It is possible to have a two-byte integer and a 32-bit real.
    Storage association then requires four bytes for an integer.
    This wastes space for integers (at least for arrays) but that
    is not such a big deal, because most big arrays in scientific
    code are reals.

The same held for the Cray-1 - default integers (24 bit)
and its weird 64-bit reals.

The main problem is when the size of the default INTEGER _exceeds_
that of the smallest useful REAL: then REAL arrays become twice as big,
plus you need to implement 128-bit REALs.

So, put yourself into the shoes of the people designing R4000-class
workstations: they could allow their scientific and technical customers
    to use the same codes "as is", with no conversion, or tell them
    they cannot use 32-bit REAL any more, and that they need to rewrite
    all their software.

    If they want to use their software as-is, and it is written to work
    with an ILP32 C implementation, the only solution is to continue using
    an ILP32 implementation.

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.


    What would they have expected their customers to do? Buy a system
    which forces them to do this, or buy a competitor's system where
    they can just recompile their software?

    If just recompiling is the requirement, what follows is ILP32.

    There is absolutely no problem with 64-bit pointers when recompiling
    Fortran.


You're always harping about how compilers should be bug-compatible
    to previous releases.

    Not in the least. I did not ask for bug compatibility.

    I'll keep that in mind for the next time.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon Oct 6 05:56:53 2025
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
If variable v and variable w are "stack variables" local to their own subroutines, it seems perfectly reasonable to assume that all deallocated stack variables become inaccessible.

    That is debatable. This assumption is the basis of "optimizing" away
    memset() (or similar) that is intended to keep the lifetime of secret
    keys as short as possible. After this "optimization", the secret key
    continues to be in memory, and can be extracted through
    vulnerabilities, preserved for much longer in the swap area or in
    snapshots, or in the value of newly allocated uninitialized areas.
    All of which prove that the assumption is wrong.
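(The standard illustration of that failure mode, as a hedged C sketch;
use_key is a hypothetical helper, and the final memset is a dead store
the compiler may delete under the as-if rule:)

    #include <string.h>

    void use_key(const unsigned char *key, size_t n);  /* hypothetical */

    void handle_secret(void)
    {
        unsigned char key[32];
        /* ... obtain and use the key ... */
        use_key(key, sizeof key);
        memset(key, 0, sizeof key);  /* dead store: may be optimized away */
    }

(C11's optional memset_s and the BSD/glibc explicit_bzero exist
precisely to defeat this elimination.)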

    Then, later when new stack space is
allocated those new variables have no relationship to any previously deallocated variables.

That is: when the stack pointer is incremented the space is no longer accessible and::
    a) any modified cache lines are discarded instead of being written
    to memory--the space is no longer accessible so don't waste power
    making DRAM coherent with inaccessible stack space.

    Later, when the stack pointer is decremented::
    b) new cache line area can be "allocated" without reading DRAM and
    being <conceptually> initialized to zero.

    I have outlined ways to optimize zeroing of memory in <2014Jul9.193122@mips.complang.tuwien.ac.at> <2022Aug5.141325@mips.complang.tuwien.ac.at>

    With that idea, the way to use it is to zero the memory when it is
    deallocated (so it is not written back to main memory; it may be
    written to the zero area as part of a larger unit). And to also zero
    it when it is allocated so that there is no need to load the data from
    outer cache levels or main memory (or their equivalents in zeroed
    memory).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon Oct 6 06:26:12 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    [...]
It is possible to have a two-byte integer and a 32-bit real.

    But according to John Levine that is not what happens on the PDP-11.
    Instead, it has 4-byte INTEGERs, demonstrating that your "unofficial
    rule" that C int is as wide as FORTRAN INTEGER did not hold.

The same held for the Cray-1 - default integers (24 bit)
    and their weird 64-bit reals

    If FORTRAN INTEGERs are 24 bits on the Cray-1, this architecture is
    another example where your "unofficial rule" does not hold. C ints
    are 64-bit on the Cray 1.

    If they want to use their software as-is, and it is written to work
    with an ILP32 C implementation, the only solution is to continue using
    an ILP32 implementation.

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either. And I am sure that C
    programs were much more relevant for selling these machines than
    FORTRAN programs. C programmers changed the programs to run on
    I32LP64 (this was called "making them 64-bit-clean"). And until that
    was done, ILP32 was used.

    If just recompiling is the requirement, what follows is ILP32.

    There is absolutely no problem with 64-bit pointers when recompiling
    Fortran.

    Fortran is not the only consideration for designing an ABI for C, if
it is one at all. The large number of 32bit->64bit sign-extension and
zero-extension operations, either explicit or integrated into
instructions such as RISC-V's addw, plus the
"optimizations"/miscompilations to get rid of some of the sign
extensions, are a cost that we pay all the time for the I32LP64
    mistake.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Mon Oct 6 14:23:50 2025
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    [...]
    <snip>

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64.

    The vast majority of C/C++ programs ran just fine on I32LP64. There
    were some that didn't, but it was certainly not "most".
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Mon Oct 6 11:51:18 2025
    From Newsgroup: comp.arch

    On 10/6/2025 9:23 AM, Scott Lurndal wrote:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    [...]
    <snip>

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64.

    The vast majority of C/C++ programs ran just fine on I32LP64. There
    were some that didn't, but it was certainly not "most".

    Yes, most programs only needed minor edits.


    Some stuff I had ported:
    Doom: Mostly trivial edits;
    Had to re-implement audio and music handling.
    Heretic and Hexen:
    More edits, mostly removing MS-DOS stuff;
    Had to replace most of the audio and music code.
    ROTT:
    Extensive modification to graphics handling;
    Was very dependent on low-level VGA hardware twiddling.
    (Vs Doom's "Set 320x200 and done" approach).
    Lots of memory management and out-of-bounds issues;
    Some amount of code that is sensitive to integer wrap-on-overflow;
    ...
    (ROTT was a little harder to port)
    Quake:
    Few issues for most of the engine;
    The "progs.dat" VM required getting creative.
    It mixes pointers and 'float' in ways
    "some might consider unnatural"
    Quake 2:
    Basically 64-bit clean out of the box.
    Quake 3:
    The QVM architecture very much assumes 32-bit,
    not really a way to make it 64-bit absent a significant rewrite.
    Did allow for falling back to the Quake2 strategy,
    of using natively compiled DLLs.


    Of the programs, I still have not fully debugged ROTT when built via
    BGBCC, where there is an issue somewhere that is resulting in demo
    desyncs that tend to change from one run to another.

    Last I checked, I had it stable when built with MSVC, and had it
    basically working with a GCC build.


    Can note that ROTT is one of the larger programs I had ported to my
    project (in terms of code size), where both the ROTT and Quake3 ports
    weigh in at a little over 300 kLOC (very much larger than Doom or Quake).

    Quake 3 builds as multiple DLLs, whereas ROTT as a single binary. As
    such, ROTT currently builds the biggest EXE (with around 1MB of ".text").

Though, curiously, there is (on average) less than 4 bytes per line of
C; not entirely sure how that happens.

    ...

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Oct 6 17:38:13 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:

    So, kill the 64-bit machines in the scientific marketplace. I'm glad
    you agree.

    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either.

    Only those who assumed sizeof(int) = sizeof(char *). This was
    not true on the PDP-11, and it was a standards violation, anyway.
Only people who liked to play these kinds of games (I know you do)
    were caught.

    And I am sure that C
    programs were much more relevant for selling these machines than
    FORTRAN programs.

    Based on what data? Your own personal guess?

    C programmers changed the programs to run on
    I32LP64 (this was called "making them 64-bit-clean"). And until that
    was done, ILP32 was used.

    The problem with 64-bit INTEGERs for Fortran is that they make REAL
    unusable for lots of existing code.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Mon Oct 6 20:02:50 2025
    From Newsgroup: comp.arch

    According to Thomas Koenig <tkoenig@netcologne.de>:
    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either.

    Only those who assumed sizeof(int) = sizeof(char *). This was
    not true on the PDP-11, ...

    The PDP-11 was a 16 bit machine with 16 bit ints and 16 bit pointers.
    There were 32 bit long and float, and 64 bit double.

    I didn't port a lot of code from the 11 to other machines, but my recollection is that the widespread assumption in Berkeley Vax code that location zero was addressable and contained binary zeros was much more painful to fix than
    size issues.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Mon Oct 6 20:46:11 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> writes:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    Not in the least. Most C programs did not run as-is on I32LP64, and
    that did not kill these machines, either.

    Only those who assumed sizeof(int) = sizeof(char *). This was
    not true on the PDP-11, ...

    The PDP-11 was a 16 bit machine with 16 bit ints and 16 bit pointers.
    There were 32 bit long and float, and 64 bit double.

I didn't port a lot of code from the 11 to other machines, but my
recollection is that the widespread assumption in Berkeley Vax code
that location zero was addressable and contained binary zeros was much
more painful to fix than size issues.

    "location zero was addressible". Might also point out it was RO, but yes
    that caused many problems porting BSD utilities to SVR4.

The other issue with leaving the PDP-11 for 32-bit systems was the change
in the size of the PID, UID, and GID, which required more than a simple
recompile: since there weren't abstract types (e.g. pid_t, gid_t, uid_t)
for those data items yet, code needed to be updated manually.
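(For comparison, the abstract types that arrived later make such a
widening a mere recompile; a minimal sketch:)

    #include <sys/types.h>
    #include <unistd.h>

    pid_t child;     /* whatever width the platform uses for PIDs */

    void spawn(void)
    {
        child = fork();   /* no assumption that a PID fits in 16 bits */
    }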
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Tue Oct 7 01:38:02 2025
    From Newsgroup: comp.arch

    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    LLVM compiles C with stricter typing than GCC resulting in a lot
    of smashes:: For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret

    It's not about strict or lax typing, it's about what the calling
    convention promises about types that are smaller than a machine word.
    If the calling convention requires/guarantees that ints are
sign-extended, the compiler must use instructions that produce a sign-extended result. If the calling convention guarantees that ints
    are zero-extended (sounds perverse, but RV64 has the guarantee that
    unsigned is passed in sign-extended form, which is equally perverse),
    then the compiler must use instructions that produce a zero-extended
    result (e.g., AMD64's addl). If the calling convention only requires
    and guarantees the low-order 32 bits (I call this garbage-extended),
    then the compiler can use instructions that perform 64-bit adds; this
    is what we are seeing above.

    The other side of the medal is what is needed at the caller: If the
caller needs to convert a sign-extended int into a long, it does not
have to do anything. If it needs to convert a zero-extended or
garbage-extended int into a long, it has to sign-extend the value.

    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
    So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes. And to ignore the upper 32 bits on reads, so
code that wants only the 32-bit value should use the %exx name.
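(In C terms, and hedged as an expectation of typical codegen rather
than a guarantee: a u32-to-u64 conversion is therefore free, because
the 32-bit write already zeroed the high half:)

    #include <stdint.h>

    uint64_t widen(uint32_t x)
    {
        return x;   /* on AMD64 this is typically a single 32-bit mov;
                       the hardware already zeroed bits 63:32 */
    }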

    I agree with you that I32LP64 was a mistake, but it exists, and I
    think ARM64 did a good job handling it. It has all integer operations
    working on two sizes: 32-bit and 64-bit, and when writing a 32-bit result,
    it 0-extends the register value.

    You don't want "garbage extend" since you want a predictable answer.
    Your choices for writing 32-bit results in a 64-bit register are thus sign-extend (not a good choice) or zero-extend (what almost
    everyone chose). RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    For C and C++ code, the standard dictates that all integer operations are
    done with "int" precision, unless some operand is larger than int, and then
do it in that precision. So there's no real need for 8-bit and 16-bit operations to be natively supported by the CPU--these operations are actually done
    as int's already. If you have a variable which is a byte, then assigning
    to that variable, and then using that variable again you will need to zero-extend, but honestly, this is not usually a performance path. It's
    likely to be stored to memory instead, so no masking or sign extending
    should be needed.
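(A small C sketch of those promotion rules; the comments describe what
the abstract machine requires, not any particular codegen:)

    #include <stdint.h>

    uint8_t f(uint8_t a, uint8_t b)
    {
        uint8_t t = a + b;   /* the add itself is done at int width ... */
        return t / 3;        /* ... but storing into t truncates to 8 bits,
                                so reusing t needs the zero-extended value */
    }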

    If you pick ILP64 for your ABI, then you will get rid of almost all of
    these zero- and sign-extensions of 32-bit C and C++ code. It will just
    work. If you pick I32LP64, then you should have a full suite of 32-bit operations and 64-bit operations, at least for all add, subtract, and
    compare operations. And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue Oct 7 15:52:17 2025
    From Newsgroup: comp.arch


    kegs@provalid.com (Kent Dickey) posted:

    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    LLVM compiles C with stricter typing than GCC resulting in a lot
    of smashes:: For example::

    int subroutine( int a, int b )
    {
    return a+b;
    }

    Compiles into:

    subroutine:
    ADD R1,R1,R2
    SRA R1,R1,<32,0> // limit result to (int)
    RET

I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret

    It's not about strict or lax typing, it's about what the calling
    convention promises about types that are smaller than a machine word.
    If the calling convention requires/guarantees that ints are
sign-extended, the compiler must use instructions that produce a
sign-extended result. If the calling convention guarantees that ints
are zero-extended (sounds perverse, but RV64 has the guarantee that
unsigned is passed in sign-extended form, which is equally perverse),
    then the compiler must use instructions that produce a zero-extended
    result (e.g., AMD64's addl). If the calling convention only requires
    and guarantees the low-order 32 bits (I call this garbage-extended),
    then the compiler can use instructions that perform 64-bit adds; this
    is what we are seeing above.

    The other side of the medal is what is needed at the caller: If the
caller needs to convert a sign-extended int into a long, it does not
have to do anything. If it needs to convert a zero-extended or
garbage-extended int into a long, it has to sign-extend the value.

    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes, and to ignore the upper 32 bits on reads; so
code using a 32-bit value held in a 64-bit register should use the
%exx name.

    I agree with you that I32LP64 was a mistake, but it exists, and I
    think ARM64 did a good job handling it. It has all integer operations working on two sizes: 32-bit and 64-bit, and when writing a 32-bit result,
    it 0-extends the register value.

    You don't want "garbage extend" since you want a predictable answer.

    Strongly Agree.

    Your choices for writing 32-bit results in a 64-bit register are thus sign-extend (not a good choice) or zero-extend (what almost
    everyone chose). RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    Why not zero extend unSigned and sign extend Signed ?!?
    That way the value in the register is (IS) the value in the smaller
    container !!

    Also, why not extend this to both shorts and chars ?!?
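
As a C model of that write-back rule (a sketch only, not My 66000
encoding; the cast back to 32 bits stands in for the hardware
re-extension):

#include <stdint.h>

/* signed 32-bit add: result sign-extended from bit 31 */
int64_t addw_s(int64_t a, int64_t b)
{
    return (int32_t)(a + b);
}

/* unsigned 32-bit add: result zero-extended from bit 31 */
uint64_t addw_u(uint64_t a, uint64_t b)
{
    return (uint32_t)(a + b);
}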

For C and C++ code, the standard dictates that all integer operations
are done with "int" precision, unless some operand is larger than int,
in which case the operation is done in that precision. So there's no
real need for 8-bit and 16-bit operations to be natively supported by
the CPU--these operations are actually done as ints already. If you
have a variable which is a byte, then assigning to that variable, and
then using that variable again, you will need to zero-extend,

    You could perform the operation at base-size (byte in this case).

Languages like Ada are not defined like C.

    but honestly, this is not usually a performance path. It's likely to be stored to memory instead, so no masking or sign extending
    should be needed.

    If you pick ILP64 for your ABI, then you will get rid of almost all of
    these zero- and sign-extensions of 32-bit C and C++ code.

Then, the only access to 32-bit integers is int32_t and uint32_t.

    It will just
    work. If you pick I32LP64, then you should have a full suite of 32-bit operations and 64-bit operations, at least for all add, subtract, and
    compare operations. And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Tue Oct 7 11:27:39 2025
    From Newsgroup: comp.arch

    kegs@provalid.com (Kent Dickey) writes:
    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    int subroutine( int a, int b )
    {
    return a+b;
    }
    ...
I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code
    for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret
    ...
    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes, and to ignore the upper 32 bits on reads; so
code using a 32-bit value held in a 64-bit register should use the
%exx name.

    Interesting. At some point I got the impression that LEA produces a
    64-bit result, because it produces an address, but testing reveals
    that LEA has a 32-bit zero-extended variant indeed.

    I agree with you that I32LP64 was a mistake, but it exists, and I
think ARM64 did a good job handling it. It has all integer operations working on two sizes: 32-bit and 64-bit, and when writing a 32-bit result,
    it 0-extends the register value.

    You don't want "garbage extend" since you want a predictable answer.

    Zero-extended for unsigned and sign-extended for int are certainly
    more forgiving when some function is called without a prototype and
    the actual type does not match the implied type (I once read about
    IIRC miranda prototypes, but a web search only gives me Star Trek
    stuff when I ask for that).

    Zero-extending for int is less forgiving. Apparently by 2003 (when
    AMD64 appeared) the use of prototypes was widespread enough that such
    a calling convention was acceptable.

    But once all the functions have correct prototypes, garbage-extension
    is just as workable as other alternatives.

Your choices for writing 32-bit results in a 64-bit register are thus sign-extend (not a good choice) or zero-extend (what almost everyone chose).

    What makes you think that one is a better choice than the other?

    The most obvious choices to me are:

    Sign-extend int and zero-extend unsigned: That has the best chance at
    the expected behaviour when the prototype is missing and would be
    required.

    If you rely on prototypes being present, you can take any choice,
    including garbage-extension. Then you can use the full 64-bit
operation in many cases, and only insert sign or zero extension when a conversion from 32 bits to 64 bits is needed (and that extension can be
    part of an instruction, as in ARM A64 addressing modes).

    As for what "almost everyone chose", here's some data:

int             unsigned        ABI
sign-extended   sign-extended   MIPS o64 and 64
sign-extended   zero-extended   SPARC V9
sign-extended   zero-extended   PowerPC64
zero-extended   zero-extended   AMD64
zero-extended   zero-extended   ARM A64
sign-extended   sign-extended   RV64

    I determined this by looking at the code for

unsigned usubroutine( unsigned a, unsigned b )
{
    return a+b;
}

int isubroutine( int a, int b )
{
    return a+b;
}

The code on various architectures (as compiled with gcc -O) is:

    MIPS64 (gcc -mabi=64 -O and gcc -mabi=o64 -O):
    0000000000000034 <usubroutine>:
    34: 03e00008 jr ra
    38: 00851021 addu v0,a0,a1

    000000000000003c <isubroutine>:
    3c: 03e00008 jr ra
    40: 00851021 addu v0,a0,a1

    SPARC V9:
    0000000000000018 <usubroutine>:
    18: 9d e3 bf 50 save %sp, -176, %sp
    1c: b0 06 00 19 add %i0, %i1, %i0
    20: 81 cf e0 08 return %i7 + 8
    24: 91 32 20 00 srl %o0, 0, %o0

    0000000000000028 <isubroutine>:
    28: 9d e3 bf 50 save %sp, -176, %sp
    2c: b0 06 00 19 add %i0, %i1, %i0
    30: 81 cf e0 08 return %i7 + 8
    34: 91 3a 20 00 sra %o0, 0, %o0

    PowerPC64:
    0000000000000030 <.usubroutine>:
    30: 7c 63 22 14 add r3,r3,r4
    34: 78 63 00 20 clrldi r3,r3,32
    38: 4e 80 00 20 blr
    ...

    0000000000000048 <.isubroutine>:
    48: 7c 63 22 14 add r3,r3,r4
    4c: 7c 63 07 b4 extsw r3,r3
    50: 4e 80 00 20 blr

    RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    RISC-V has a number of sign-extending 32-bit instructions, and a
    calling convention to go with it.

    There seem to be the following options:

    Have no 32-bit instructions, and insert sign-extension or
    zero-extension instructions where necessary (or implicitly in all
    operands, as I outlined earlier). SPARC V9 and PowerPC64 seem to take
    this approach.

    Have 32-bit instructions that sign-extend: MIPS64, Alpha, and RV64.

    Have 32-bit instructions that zero-extend: AMD64 and ARM A64.

    Have 32-bit instructions that sign-extend and 32-bit instructions that zero-extend. No architecture that does that is known to me. It would
    be a good match for the SPARC-V9 and PowerPC64 calling convention.

    There is also one instruction set (ARM A64) that has special 32-bit sign-extension and zero-extension forms for some operands.

    And you can then adapt the calling convention to match the instruction
    set. For "no 32-bit instructions", garbage-extension seems to be the
    cheapest approach to me, but I expect that when SPARC-V9 and PowerPC64
    came on the market, there was enough C code with missing prototypes
    around that they preferred a more forgiving calling convention.

    If you pick ILP64 for your ABI, then you will get rid of almost all of
    these zero- and sign-extensions of 32-bit C and C++ code. It will just
work. If you pick I32LP64, then you should have a full suite of 32-bit operations and 64-bit operations, at least for all add, subtract, and
    compare operations.

    For compare, divide, shift-right and rotate, you either first need to sign/zero-extend the register, or you need 32-bit versions (possibly
    both signed and unsigned).
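
A quick C sketch of the shift-right case (the garbage pattern is
invented): with junk in the high 32 bits, a 64-bit shift drags that
junk into the meaningful word, so you need either a prior extend or a
true 32-bit shift:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* the 32-bit value 1, "garbage-extended" in a 64-bit register */
    uint64_t r = 0xDEADBEEF00000001u;
    uint64_t bad  = r >> 1;             /* bit 31 is filled from the garbage */
    uint64_t good = (uint32_t)r >> 1;   /* zero-extend first, then shift */
    printf("%08x %08x\n", (unsigned)bad, (unsigned)good); /* 80000000 00000000 */
    return 0;
}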

    And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    It is certainly part of the way towards my idea of having sign- and zero-extended 32-bit operands for every operand of every instruction.

    It would be interesting to see how many sign-extensions and
    zero-extensions (whether explicit or implicitly part of the
    instruction) are executed in code that is generated from various C
    sources (with and without -fwrapv). I expect that it's highly
    dependent on the programming style. Sure there are types like pid_t
where you have no choice, but in frequently occurring cases you can
    choose:

    for (i=0; i<n; i++) {
    ... a[i] ...
    }

    Here you can choose whether to define i as int, unsigned, long,
    unsigned long, size_t, etc. If you care for portability to 16-bit
    machines, size_t is a good idea here, otherwise long and unsigned long
    also are efficient. If n is unsigned, you can also choose unsigned,
    but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and
    PowerPC64 and Alpha).

    If n is int, you can also choose int, and there is actually enough
    information here to make the code efficient (even with -fwrapv),
    because in this code int overflow really cannot happen, but in code
    that's not much different from this one (e.g., using != instead of <),
    -fwrapv will result in an inserted sign extension on AMD64, and not
    using -fwrapv may result in unintended behaviour thanks to the
    compiler assuming that int overflow does not happen.
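
A sketch of that != variant (the function name is invented): with
-fwrapv the compiler must allow for i wrapping, so it cannot widen i
into a 64-bit induction variable, and a sign extension lands inside
the loop on AMD64:

long sum_ne(long a[], int n)
{
    long r = 0;
    int i;
    for (i = 0; i != n; i++)  /* with != the index may legally wrap */
        r += a[i];            /* so i must be sign-extended each iteration */
    return r;
}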

    ILP64 would have spared us all these considerations.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue Oct 7 18:01:25 2025
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    kegs@provalid.com (Kent Dickey) writes:
    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    int subroutine( int a, int b )
    {
    return a+b;
    }
    ...
I tested this on AMD64, and did not find sign-extension in the caller, neither with gcc-14 nor with clang-19; both produce the following code for your example (with "subroutine" renamed into "subroutine1").

    0000000000000000 <subroutine1>:
    0: 8d 04 37 lea (%rdi,%rsi,1),%eax
    3: c3 ret
    ...
    AMD64 in hardware does 0 extension of 32-bit operations. From your
    example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
    the 64-bit register %rax will have 0's written into bits [63:32].
So the AMD64 convention for 32-bit values in 64-bit registers is to
zero-extend on writes, and to ignore the upper 32 bits on reads; so
code using a 32-bit value held in a 64-bit register should use the
%exx name.

    Interesting. At some point I got the impression that LEA produces a
    64-bit result, because it produces an address, but testing reveals
    that LEA has a 32-bit zero-extended variant indeed.

Architecturally, any store to a 32-bit register (%e_x) will
clear the high-order bits of the 64-bit version of the
register.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue Oct 7 18:34:45 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    kegs@provalid.com (Kent Dickey) writes:
    In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    int subroutine( int a, int b )
    {
    return a+b;
    }
    ------------------------------------------------------------

    RISC-V is in another land, where they effectively have
    no 32-bit operations, but rather a convention that all 32-bit inputs
    must be sign-extended in a 64-bit register.

    RISC-V has a number of sign-extending 32-bit instructions, and a
    calling convention to go with it.

RISC-V has word-sized integer arithmetic.

    There seem to be the following options:

    Have no 32-bit instructions, and insert sign-extension or
    zero-extension instructions where necessary (or implicitly in all
    operands, as I outlined earlier). SPARC V9 and PowerPC64 seem to take
    this approach.

    This was My 66000 between 2016 and two weeks ago.
    The cost is 4% growth in code footprint and similar perf degradation.

    Have 32-bit instructions that sign-extend: MIPS64, Alpha, and RV64.

    Have 32-bit instructions that zero-extend: AMD64 and ARM A64.

    Have 32-bit instructions that sign-extend and 32-bit instructions that zero-extend. No architecture that does that is known to me. It would
    be a good match for the SPARC-V9 and PowerPC64 calling convention.

This is the starting point for My 66000 2.0:: integer arithmetic has
size and signedness, with the property that all integer results have
the 64-bit register <container> contain a range-limited result
suitable to the base-type of the calculation {no garbage in HoBs}.

    There is also one instruction set (ARM A64) that has special 32-bit sign-extension and zero-extension forms for some operands.

    And you can then adapt the calling convention to match the instruction
    set. For "no 32-bit instructions", garbage-extension seems to be the cheapest approach to me, but I expect that when SPARC-V9 and PowerPC64
    came on the market, there was enough C code with missing prototypes
    around that they preferred a more forgiving calling convention.

If you pick ILP64 for your ABI, then you will get rid of almost all of
these zero- and sign-extensions of 32-bit C and C++ code. It will just
work. If you pick I32LP64, then you should have a full suite of 32-bit
operations and 64-bit operations, at least for all add, subtract, and
compare operations.

    For compare, divide, shift-right and rotate, you either first need to sign/zero-extend the register, or you need 32-bit versions (possibly
    both signed and unsigned).

    My 66000 CMP is signless--it compares two integer registers and delivers
    a bit vector of all possible comparisons {2 equality, 4 signed, 4 unsigned,
    4 range checks, [and in FP land 10-bits are the class of the RS1 operand]}

    My 66000 SL, SR can be used in extract form--and here you need no operand preparation if you only extract meaningful bits.
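
As a generic C rendering of an unsigned extract (the general idea
only, not My 66000 syntax or encoding):

#include <stdint.h>

/* extract 'width' bits starting at 'offset'; the result is
   zero-extended, so no further operand preparation is needed */
uint64_t extract_u(uint64_t x, unsigned offset, unsigned width)
{
    uint64_t mask = (width < 64) ? ((1ull << width) - 1) : ~0ull;
    return (x >> offset) & mask;
}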

    My 66000 2.0 DIV has a size component to the calculation.

    And if you do I32LP64, your indexed addressing
    modes should have 3 types of indexed registers: 64-bit, 32-bit signed,
    and 32-bit unsigned. That worked well for ARM64.

    It is certainly part of the way towards my idea of having sign- and zero-extended 32-bit operands for every operand of every instruction.

Unnecessary if the integer calculation delivers properly range-limited
64-bit results.

    It would be interesting to see how many sign-extensions and
    zero-extensions (whether explicit or implicitly part of the
    instruction) are executed in code that is generated from various C
    sources (with and without -fwrapv).

In GNUPLOT it is just over 4% of instruction count for 64-bit-only
integer calculations.

    I expect that it's highly
    dependent on the programming style. Sure there are types like pid_t
where you have no choice, but in frequently occurring cases you can
    choose:

    for (i=0; i<n; i++) {
    ... a[i] ...
    }

    Here you can choose whether to define i as int, unsigned, long,
    unsigned long, size_t, etc. If you care for portability to 16-bit
    machines, size_t is a good idea here, otherwise long and unsigned long
    also are efficient.

Counted for() loops are somewhat special in that it is quite easy to
determine that the loop index never exceeds the range-limit of the
container: with i starting at 0 and the test i<n, i can never grow
past n, which itself fits in the container.

    If n is unsigned, you can also choose unsigned,
    but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and PowerPC64 and Alpha).

    Example please !?!

    If n is int, you can also choose int, and there is actually enough information here to make the code efficient (even with -fwrapv),
    because in this code int overflow really cannot happen,

    Consider the case where n is int64_t or uint64_t !?!

    Consider the C-preprocessor with::
    # define int (short int) // !!
    in scope.

    but in code
    that's not much different from this one (e.g., using != instead of <), -fwrapv will result in an inserted sign extension on AMD64, and not
    using -fwrapv may result in unintended behaviour thanks to the
    compiler assuming that int overflow does not happen.

    ILP64 would have spared us all these considerations.

Agreed. I32LP64 is an abomination, especially if one is bothering to
try to keep the number of instructions down.

    - anton
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Tue Oct 7 12:20:08 2025
    From Newsgroup: comp.arch

    On 10/3/2025 12:55 PM, MitchAlsup wrote:

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:

    On 10/2/2025 7:50 PM, MitchAlsup wrote:

    My 66000 2.0

    After 4-odd years of ISA stability, I ran into a case where
    I <pretty much> needed to change the instruction formats.
    And after bragging to Quadribloc about its stability--it
    reached the point where it was time to switch to version 2.0.

    Well, its time to eat crow.
    --------------------------------------------------------------
    Memory reference instructions already produce 64-bit values
    from Byte, HalfWord, Word and DoubleWord memory references
    in both Signed and unSigned flavors. These supports both
    integer and floating point due to the single register file.

    Essentially, I need that property in both integer and floating
    point calculations to eliminate instructions that merely apply
    value range constraints--just like memory !

    ISA 2.0 changes allows calculation instructions; both Integer
    and Floating Point; and a few other miscellaneous instructions
    (not so easily classified) the same uniformity.

    In all cases, an integer calculation produces a 64-bit value
    range limited to that of the {Sign}×{Size}--no garbage bits
    in the high parts of the registers--the register accurately
    represents the calculation as specified {Sign}×{Size}.

    I must be missing something. Suppose I have

    C := A + B

    where A and C are 16 bit signed integers and B is an 8 bit signed
    integer. As I understand what you are doing, loading B into a register
    will leave the high order 56 bits zero. But the add instruction will
    presumably be half word, so if B is negative, it will get an incorrect
    answer (because B is not sign extended to 16 bits).

    What am I missing?

A is loaded as 16 bits, properly sign-extended to 64 bits: range [-32768..32767]
B is loaded as 8 bits, properly sign-extended to 64 bits: range [-128..127]

    ADDSH Rc,Ra,Rb

    Adds 64-bit Ra and 64-bit Rb and then sign extends the result from bit<15>. The result is a properly signed 64-bit value: range [-32768..32767]
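
In C terms, the stated ADDSH semantics could be modelled as (a
sketch, not an official definition):

#include <stdint.h>

int64_t addsh(int64_t ra, int64_t rb)
{
    /* full 64-bit add, then sign-extend the result from bit 15 */
    return (int16_t)(ra + rb);
}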

    First let me apologize, then admit my embarrassment. I didn't write
    what I intended to, and even if I did, it wouldn't have been correct.

I had totally missed the issue of perhaps not extending the result of an arithmetic operation to the full register width. I must admit that this
    never came up in the programming I have done, and I never considered it.
    But subsequent posts in this thread have explained the issue well, and
    so I learned something. Thanks to all!
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Tue Oct 7 19:09:25 2025
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    ...
    My 66000 CMP is signless--it compares two integer registers and delivers
    a bit vector of all possible comparisons {2 equality, 4 signed, 4 unsigned,
    4 range checks, [and in FP land 10-bits are the class of the RS1 operand]}

    With an 88000-style compare and a result register of 64 bits, you can
    spend 14 bits on 64-bit comparison, 14 bits on 32-bit comparison, 14
    bits on 16-bit comparison, and 14 bits on 8-bit comparison, and still
    have 8 bits left. What is a "range check" and why does it take 4
    bits?

    It is certainly part of the way towards my idea of having sign- and
    zero-extended 32-bit operands for every operand of every instruction.

Unnecessary if the integer calculation delivers properly range-limited
64-bit results.

    Sign- or zero extension will still be necessary for things like

    long a=...
    int b=a;
    ... c[b];

    With the extension in the operands, you do not need any extension
    instructions, not even for division, right-shift etc.

    The question, however, is if the extensions occur often enough to
    merit such features. I lean towards the SPARC/PowerPC/My 66000-v1
    approach here.

    It would be interesting to see how many sign-extensions and
    zero-extensions (whether explicit or implicitly part of the
    instruction) are executed in code that is generated from various C
    sources (with and without -fwrapv).

In GNUPLOT it is just over 4% of instruction count for 64-bit-only
    integer calculations.

    Now what if you had a calling convention with garbage-extension? A
    number of extensions in your examples would go away.

Counted for() loops are somewhat special in that it is quite easy to
determine that the loop index never exceeds the range-limit of the
container.

    There have been enough cases where such reasoning led to "optimizing"
    code into an infinite loop and other fallout of adversarial compilers.

    If n is unsigned, you can also choose unsigned,
    but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and
    PowerPC64 and Alpha).

    Example please !?!

    With a slightly different loop:

long foo(long a[], unsigned l, unsigned h)
{
    unsigned i;
    long r=0;
    for (i=l; i!=h; i++)
        r+=a[i];
    return r;
}

    gcc-10 -O3 produces on RV64G:

    0000000000000000 <foo>:
    0: 872a mv a4,a0
    2: 4501 li a0,0
    4: 00c58c63 beq a1,a2,1c <.L4>

    0000000000000008 <.L3>:
    8: 02059793 slli a5,a1,0x20
    c: 83f5 srli a5,a5,0x1d
    e: 97ba add a5,a5,a4
    10: 639c ld a5,0(a5)
    12: 2585 addiw a1,a1,1
    14: 953e add a0,a0,a5
    16: feb619e3 bne a2,a1,8 <.L3>
    1a: 8082 ret

    000000000000001c <.L4>:
    1c: 8082 ret




    If n is int, you can also choose int, and there is actually enough
    information here to make the code efficient (even with -fwrapv),
    because in this code int overflow really cannot happen,

    Consider the case where n is int64_t or uint64_t !?!

    Then the first condition does not hold on I32LP64.

    Consider the C-preprocessor with::
    # define int (short int) // !!
    in scope.

    Then the compiler will see short int, and generate code accordingly.
    What's your point?

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2