I've made my first change to Concertina IV. I'm not happy with the way things were before the change or the way they are now, so I may change it again.
The 16-bit short instructions only have 12 free bits available. That's not much to work with when there are 32 registers in each register bank.
Initially, I settled on four bits of opcode, along with the basic register specification scheme used for the 15-bit paired short instructions in Concertina II.
But choosing single and double precision floating-point as the only two types supported didn't rest easily with me. Single precision isn't really precise enough to be useful, or so I've heard.
The alternative of supporting 48-bit intermediate precision and double precision, while it appeals to me personally... is clearly untenable.
Medium is a nonstandard data type, and so it would not be widely used.
So instead I decided to only support double precision, and use the extra bits to allow additional ways to specify registers.
The result, of course, is messy.
So I'm considering going back to the earlier format, but instead of supporting two floating-point data types, to support one integer type and one floating type. But which integer type? 32-bit integer, or 64-bit long?
I could get more bits by going to _paired_ instructions. But I have some free space between 32-bit instructions, so I could just add those while keeping 16-bit short instructions.
And this also led me to thinking about something else.
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So integer operations must sign-extend if they're on values shorter than 64 bits.
Propagating a bit takes time.
So should I design the ALU so that the sign extension takes place after
the rest of the instruction, and allow another 32-bit (or shorter) integer instruction to use results when they're ready, before sign extension? Is that just normal efficiency, or wasteful complexity?
In any case, I think I've come up with something that is a reasonable compromise I can live with after all.
--- Synchronet 3.22a-Linux NewsLink 1.2
John Savard
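As an illustration of the sign-extension question raised above, here is a minimal Python sketch of a 32-bit add on right-aligned operands held in 64-bit registers. The function names are made up for this example and nothing here is taken from the Concertina IV definition. It only shows that the low 32 bits of the result are complete before the sign bit has been copied into the upper half, which is all that a following 32-bit (or shorter) instruction would need.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits, width=64):
    """Sign-extend the low `bits` bits of `value` to a `width`-bit pattern."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # XOR then subtract replicates the sign bit into all higher positions.
    extended = (value ^ sign_bit) - sign_bit
    return extended & ((1 << width) - 1)


def add32(a, b):
    """32-bit add on right-aligned operands; returns the 64-bit register image."""
    low = (a + b) & 0xFFFFFFFF      # the 32-bit result proper, available first
    return sign_extend(low, 32)     # upper 32 bits filled in as a second step


if __name__ == "__main__":
    print(hex(add32(0x7FFFFFFF, 1)))
```

A dependent 32-bit operation only reads `low`; whether forwarding it early is worth the extra hardware is the open question in the post.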
quadi <quadibloc@ca.invalid> posted:
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.
Go LE all the way. LE won; get over BE thinking.
As far as integers go: all calculations produce proper integer values in
the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...
Propagating a bit takes time.
A solved HW gate-level problem.
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.
Go LE all the way. LE won; get over BE thinking.
a) I didn't think this really had anything to do with little-endian
versus big-endian.
b) Yes, little-endian is more popular, but that's just because the
PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work
as well *if* you also want to put packed decimal values in registers.
As far as integers go: all calculations produce proper integer values
in the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...
If you have 64-bit registers, then if you want to avoid a gap between
the sign in a 32-bit number and the sign of a 64-bit number by placing
the 32-bit number on the most significant side, a 32-bit 1 is equal to
a 64-bit 4,294,967,296.
Everything you have heard is both true and false::
There are many applications where DP is de rigueur {galactic
simulations} smaller precision simply will not do. Many of these would
like to go FP128 but performance is not there yet.
There is a growing demand for FP16 and FP8 data types for memory-size
and BW reasons.
There is a growing background need for FP128, too.
You will find you have no <marketable> choice; you need to support::
Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}
b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it.
Little-endian doesn't work as well
*if* you also want to put packed decimal values in registers.
* 8080: Yes, because AMD64 inherited its byte order from it. But if
we go to the origin here, it's not the 8080 and not the 8008, but
the Datapoint 2200, which is remarkable, because it was designed as
a terminal for mainframes, and S/360 is big-endian.
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|The fact that most laptops and cloud computers today store numbers
|in little-endian format is carried forward from the original
|Datapoint 2200. Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries. Microprocessors descended from the
|Datapoint 2200 (the 8008, Z80, and the x86 chips used in most
|laptops and cloud computers today) kept the little-endian format
|used by that original Datapoint 2200.
b) Yes, little-endian is more popular, but that's just because the PDP-11,
8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Thomas Koenig <tkoenig@netcologne.de> writes:
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Looks plausible at first, but when I think about it some more, both
claims are wrong.
Yes, you start with the least significant bit, but given that the architecture is not bit-addressed, this is irrelevant.
quadi <quadibloc@ca.invalid> writes:
b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it.
Thinking about it:
With the BCD support of instruction sets typically requiring you to piece
together the complete operation from suboperations of less than full
length (e.g., bytes on the 6502 and the 80(2)86), little-endian is
actually easier. When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want
to completely unroll the loop that handles these bytes.
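The bytewise argument above can be made concrete with a small Python sketch. It assumes packed BCD (two digits per byte) stored least significant byte first, with both operands the same length; the function name is invented for this example and it does not model any particular CPU's decimal-adjust instruction.

```python
def bcd_add_le(a, b):
    """Add two equal-length packed-BCD numbers stored least significant byte first.

    Returns (sum_bytes, carry_out).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out = bytearray()
    carry = 0
    # Walk ascending addresses: the carry is always ready for the next byte.
    for x, y in zip(a, b):
        lo = (x & 0x0F) + (y & 0x0F) + carry
        carry, lo = (1, lo - 10) if lo > 9 else (0, lo)
        hi = (x >> 4) + (y >> 4) + carry
        carry, hi = (1, hi - 10) if hi > 9 else (0, hi)
        out.append((hi << 4) | lo)
    return bytes(out), carry


if __name__ == "__main__":
    # 1999 + 0001, each stored as two bytes, low byte first
    total, carry = bcd_add_le(bytes([0x99, 0x19]), bytes([0x01, 0x00]))
    print(total.hex(), carry)
```

With big-endian storage the same loop would have to start at the highest address and count down, which is the extra step the paragraph refers to.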
* The last descendent of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.
When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go backwards from there. This is especially relevant if you do not want to completely unroll the loop that handles these bytes.
On 5/20/26 04:09, quadi wrote:
b) Yes, little-endian is more popular, but that's just because the PDP-11,
8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.
For packed decimals that are processed in memory, little endian is
superior to big endian, because you don't have to look for the LSB when
performing an addition; you can proceed bytewise on ascending addresses.
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
Thomas Koenig <tkoenig@netcologne.de> writes:
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Looks plausible at first, but when I think about it some more, both
claims are wrong.
Unfortunately, you are mistaken.
Yes, you start with the least significant bit, but given that the
architecture is not bit-addressed, this is irrelevant.
JMP with a two-byte address was little-endian on the Datapoint 2200,
On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:
* The last descendent of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.
The reason I blame the PDP-11 for everything is that it was a hugely
influential machine. It was widely used in academic settings, and it was
also the machine for which UNIX was first widely distributed.
When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want to
completely unroll the loop that handles these bytes.
This is the reason little-endian was popular for small processors. It is
no longer relevant if a processor has a 64-bit data bus. And, of course,
it applies equally to binary and BCD.
The reason I claim that BCD support strongly favors big-endian byte order
is this:
Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.
So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.
And if one wants to use the same ALU for binary and BCD arithmetic, then
those have to have the same endianness.
Thomas Koenig <tkoenig@netcologne.de> writes:
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
Thomas Koenig <tkoenig@netcologne.de> writes:
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:[...]
<https://en.wikipedia.org/wiki/Datapoint_2200#Technical_description>
says:
|[...] Because the original Datapoint 2200 had a serial
|processor, it needed to start with the lowest bit of the lowest byte
|in order to handle carries.
For the Datapoint 2200, there was a solid technical reason:
It used shift register memory which supplied one bit at a time,
so the adder *had* to be little-endian.
Looks plausible at first, but when I think about it some more, both
claims are wrong.
Unfortunately, you are mistaken.
A claim without any supporting argument.
quadi <quadibloc@ca.invalid> writes:
Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.
So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.
Reality check: Modern architectures tend to have byte-swap and shuffle instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
For packed decimals that are processed in memory, little endian is
superior to big endian, because you don't have to look for the LSB when
performing an addition; you can proceed bytewise on ascending addresses.
The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.
"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"
The algorithm used a 9's counter to track the leading
digits.
The 2200 did not have byte-addressable memory; memory contents only
could be used when they bubbled up through the shift registers.
Otherwise, the CPU had to wait. (It was a silicon version of the
mercury delay lines of the UNIVAC I).
So, how do you add or subtract values in memory? From low to high
value, saving carries. You then have a choice of either loading
them in sequence, in a single go, or to load the high value,
wait for half a microsecond and then load the low value.
Would you build such a machine in big-endian or little-endian?
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
I align different integer types on the right, even while aligning
different floating-point types on the left like everyone else. So
integer operations must sign-extend if they're on values shorter than
64 bits.
Go LE all the way. LE won; get over BE thinking.
a) I didn't think this really had anything to do with little-endian versus big-endian.
b) Yes, little-endian is more popular, but that's just because the PDP-11, 8080, and 6502 happened to choose it. Little-endian doesn't work as well *if* you also want to put packed decimal values in registers.
As far as integers go: all calculations produce proper integer values in the 64-bit destination register.
S8 has range [-128..127]
u8 has range [0..255]
...
If you have 64-bit registers, then if you want to avoid a gap between the sign in a 32-bit number and the sign of a 64-bit number by placing the 32-bit number on the most significant side, a 32-bit 1 is equal to a 64-bit 4,294,967,296.
Propagating a bit takes time.
A solved HW gate-level problem.
That's good news, then I don't have a problem. I figured the solution
would be to use slightly slower gates with larger current output.
John Savard
According to Scott Lurndal <slp53@pacbell.net>:
The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.
"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"
The algorithm used a 9's counter to track the leading
digits.
How did it handle carries? Let's say you're adding
099999999999999999999999999999999999999999999999999
000000000000000000000000000000000000000000000000001
If it starts at the high digit, it won't know until it gets to the end
that it has to propagate carries all the way back to the beginning.
quadi <quadibloc@ca.invalid> writes:
On Wed, 20 May 2026 05:38:07 +0000, Anton Ertl wrote:
* The last descendent of the PDP-11 was canceled long before the most
prominent big-endian architecture (SPARC) was canceled, and long
before Power switched its Linux support to little-endian, so the
PDP-11 had little, if any, influence on the outcome.
The reason I blame the PDP-11 for everything is that it was a hugely
influential machine. It was widely used in academic settings, and it was
also the machine for which UNIX was first widely distributed.
But its byte order was not influential into this century. Unix and
its applications are portable, including between byte orders (or at
least they were, when there were still enough machines of either byte
order around that one could test that). And somehow the PDP-11 and
its offspring did not capture the workstation market and the server
market that evolved from that, and which constituted the Unix
markets.
Instead, the big-endian 68000 and its offspring dominated that market
for a while, and was replaced with RISCs later, which had the same
byte order as the earlier machines from the same company (i.e.,
little-endian for DEC and big-endian for the others). And when the
market for workstations and servers on RISCs shrank down to almost
nothing, not only did these big-endian machines vanish, but the
offspring of the PDP-11 as well (and actually before some of the
big-endian RISCs). What remains of this world is AIX on Power, and I
have no idea how many installations there still are.
Linux on Power was switched to little-endian with the introduction of
OpenPower, not because of the PDP-11 descendants, but because of the
Datapoint 2200 descendants. And the Datapoint 2200 (announced in June
1970) was probably not influenced by the PDP-11 (announced in January
1970).
When you add two BCD numbers that are longer than a
byte, you don't have to first go to the end of the number and then go
backwards from there. This is especially relevant if you do not want to
completely unroll the loop that handles these bytes.
This is the reason little-endian was popular for small processors. It is
no longer relevant if a processor has a 64-bit data bus. And, of course,
it applies equally to binary and BCD.
If the numbers fit in one granule, yes, that benefit does not matter.
But 64 bits are not enough for all binary numbers and probably not for
all BCD numbers, either: the decimal FP people were not satisfied with
the 15-digit mantissa that are easily possible with their
representations in 64 bits; they did not even define a decimal64
format last I checked. So will 16-digit BCD numbers be satisfactory?
The reason I claim that BCD support strongly favors big-endian byte order
is this:
Character strings are, of course, in "big endian" order; that is,
normally, a character string is written in memory with successive
characters at increasing addresses - and, at least in languages that are
written from left to right, numerals appear in texts with the most
significant digit first.
So if one has a hardware instruction to convert from BCD to the string
representation of numbers, such as UNPK or EDIT, then those two
representations should have the same endian-ness.
Reality check: Modern architectures tend to have byte-swap and shuffle instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
John Levine <johnl@taugh.com> writes:
According to Scott Lurndal <slp53@pacbell.net>:
The B3500 had a clever algorithm for adding BCD numbers. The
addend and augend could each be from 1 to 100 digits in length.
The algorithm would start adding from the lowest (most significant
digit in the longest operand) address of each operand adding
each digit in turn.
"The processor uses an adder that accumulates two fields
from the most significant to the least significant digit
positions. Reverse addition, as incorporated in the
B2500 and B3500 systems has the advantage of detecting
an overflow condition prior to altering the receiving field"
The algorithm used a 9's counter to track the leading
digits.
How did it handle carries? Let's say you're adding
099999999999999999999999999999999999999999999999999
000000000000000000000000000000000000000000000000001
A value that overflows the size of the receiving field
cannot be represented, so the overflow toggle is set and
the instruction terminates _without modifying the
receiving field_.
The size of the receiving field is the larger of the
two source fields. So
ADD 0508 000000 100000 200000
would add the 5 digit value at address 0 to the
8 digit value at address 100000 and store the
result at address 200000.
If it starts at the high digit, it won't know until it gets to the end
that it has to propagate carries all the way back to the beginning.
Actually, that's the clever part. They count 9s.
Example 1: 10 digit receiving field, 10 digit addend, 1 digit augend:
Memory contents before:
000000: 9999999999
000010: 1
ADD 1001 000000 000010 000020
The result of the instruction is that the overflow toggle
will be set and the destination field will remain unmodified.
The algorithm implicitly fills leading zeros into
the shorter operand.
The first digit of the addend operand is read. '9' in
this case. The first digit of the augend is added (in this
case, implicitly zero) and the result is 9. A special
register (the 9's counter) is incremented and the algorithm
proceeds to the next digit. Wash, rinse and repeat until
reaching the last digit, where the sum of 9 + 1 will overflow
a single digit, so the instruction terminates with overflow.
If in the case you showed above, there was a zero in the
first digit of both operands, there is no possibility of
overflow and the algorithm will simply process each
digit of the addend+augend sequentially from higher
magnitude to lower magnitude. It delays writing each
digit of the sum (other than the last) until it knows
the following digit doesn't overflow. If it does
overflow, it increments the delayed value before
writing. To the extent that there are multiple sequential
9s in the sum, when the next digit would overflow, the
processor uses the 9's counter and the saved digit to
store the correct digits to the receiving field.
There's a flow chart in 1025475_B2500_B3500_RefMan_Oct69.pdf
which is available on bitsavers.
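The description above can be turned into a short Python sketch. This is only one reading of the prose in this post, not a transcription of the flow chart in the reference manual; the function name and the use of digit strings in place of memory fields are assumptions made for the example. A digit is held back until it is known whether a carry will reach it, and a counter tracks the run of 9s that follows the held digit.

```python
def add_msd_first(a, b):
    """Add two decimal digit strings, most significant digit first.

    Returns the sum as a digit string as long as the longer operand,
    or None on overflow (in which case nothing has been written).
    """
    n = max(len(a), len(b))
    a, b = a.zfill(n), b.zfill(n)   # implicit leading zeros on the shorter operand
    out = []                        # stands in for the receiving field
    pending = None                  # digit held back until its carry-in is known
    nines = 0                       # the 9's counter: run of 9s after `pending`
    for da, db in zip(a, b):
        s = int(da) + int(db)
        if s == 9:
            nines += 1              # could still roll over; keep counting
        elif s < 9:
            # No carry can reach the held digits, so write them unchanged.
            if pending is not None:
                out.append(pending)
            out.extend([9] * nines)
            pending, nines = s, 0
        else:
            # The carry ripples through the run of 9s into the held digit.
            if pending is None:
                return None         # carry out of the top digit; field untouched
            out.append(pending + 1)
            out.extend([0] * nines)
            pending, nines = s - 10, 0
    if pending is not None:
        out.append(pending)
    out.extend([9] * nines)
    return "".join(str(d) for d in out)


if __name__ == "__main__":
    print(add_msd_first("9999999999", "1"))   # Example 1: overflow
    print(add_msd_first("0999", "0001"))
```

In this sketch the held digit is never larger than 8, so the increment on a carry cannot itself overflow, and an overflow can only be detected while nothing has yet been appended to `out`.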
On S/370 and later machines with virtual memory it was more complicated
since it had to check and be sure that all of the pages where the
operands resided were available.
So in commenting on a different part of my design entirely, you've
pointed out an important flaw I will have to correct.
On Wed, 20 May 2026 18:07:14 +0000, John Levine wrote:
On S/370 and later machines with virtual memory it was more complicated
since it had to check and be sure that all of the pages where the
operands resided were available.
Yes, since while the System/360 gave you an error if you tried to use
unaligned operands in memory, this restriction was abolished with the
System/370. Only an unaligned operand can possibly cross a page boundary,
since pages have a power-of-two size greater than the size of any data
type.
To make this work S/370 and its successors first do a trial execution of
the instruction without storing anything to see if it causes a page
fault. If not, it then redoes the instruction for real, storing the
result. I suspect that if they had known how soon S/370 would add
paging to the 360 architecture, they might have designed these
instructions differently.
quadi <quadibloc@ca.invalid> posted:
So instead I decided to only support double precision, and use the
extra bits to allow additional ways to specify registers.
My 66000 started out that way and the compiler showed that this choice
sucks.
When I first read that, I thought that you meant they would have designed
it differently when they designed the 370, but, of course, the
instructions already existed. After I realized my mistake, of course, I
also knew that back in 1964 or before, there was really no way that they
could possibly have known that.
I will have to review this point, however, to be sure.
Anton Ertl wrote:
But 64 bits are not enough for all binary numbers and probably not for
all BCD numbers, either: the decimal FP people were not satisfied with
the 15-digit mantissa that are easily possible with their
representations in 64 bits; they did not even define a decimal64
format last I checked. So will 16-digit BCD numbers be satisfactory?
IEEE 754 does define decimal64, decimal128 and even decimal32, but the
first two have pretty much all the actual usage, probably (?) with
decimal128 as the majority, at least for all accumulators.
Reality check: Modern architectures tend to have byte-swap and shuffle
instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
BCD-to-ASCII, with the input in an AVX 32-byte register, so up to 64
digits, would start with an exchange of the high and low 16-byte halves,
then a permute of each half to reverse the order. The final single-cycle
operation is the only overhead of the little- vs. big-endian inputs.
Next we duplicate the input by unpacking the high and low 16 bytes,
turning each byte value into one of 16 16-bit shorts with the leading
byte 0, then (in parallel) you copy and mask the low nybble while
shifting all shorts up by 4 bits, then use the same all-15 mask to save
the high nybbles.
OR these two back together, and do the same for the other half of the
original input. About 15-20 cycles in total with well under 10% being
the byte order swap.
On Wed, 20 May 2026 15:42:03 +0000, Anton Ertl wrote:
quadi <quadibloc@ca.invalid> writes:
Reality check: Modern architectures tend to have byte-swap and shuffle
instructions. They tend not to have BCD-to-ASCII instructions, but
these can be implemented easily enough with the help of shuffle and
bitwise instructions. And given that you need to use shuffle anyway,
the byte-swapping does not cost extra.
An additional instruction is an additional instruction!
Terje Mathisen <terje.mathisen@tmsw.no> writes:
BCD-to-ASCII, with the input in an AVX 32-byte register, so up to 64
digits, would start with an exchange of the high and low 16-byte halves,
then a permute of each half to reverse the order. The final single-cycle
operation is the only overhead of the little- vs. big-endian inputs.
Next we duplicate the input by unpacking the high and low 16 bytes,
turning each byte value into one of 16 16-bit shorts with the leading
byte 0, then (in parallel) you copy and mask the low nybble while
shifting all shorts up by 4 bits, then use the same all-15 mask to save
the high nybbles.
OR these two back together, and do the same for the other half of the
original input. About 15-20 cycles in total with well under 10% being
the byte order swap.
My thinking was along the lines of using VPERMB to do the
byte-swapping, the duplicating, and the unpacking in one step. E.g.,
if you have a 64-bit BCD number 1234567890123456 as the following
sequence of bytes
56 34 12 90 78 56 34 12
Then you have the index vector
7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0
and VPERMB xmm1, xmm2, xmm3
(where the BCD number is in xmm3 and the index vector is in xmm2) will
put the following in xmm1:
12 12 34 34 56 56 78 78 90 90 12 12 34 34 56 56
So no extra instruction for the byte swapping.
The problem is that I now would like a masked parallel byte shift to
shift the even-indexed bytes right by 4 bits, but I don't find
parallel byte shifts. I guess the answer is to let the VPERMB arrange
the result as follows
1234 1234 5678 5678 9012 9012 3456 3456
^^^^ ^^^^ ^^^^ ^^^^
then use a masked VPSRLW for shifting the marked 16-bit pieces to the
right by 4 bits, resulting in
0123 1234 0567 5678 0901 9012 0345 3456
Now use VPSHUFB or VPERMB to rearrange the bytes in the intended order:
01 12 23 34 45 56 67 78 89 90 01 12 23 34 45 56
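The data movement in this scheme can be checked with a scalar Python model. This is not AVX code: the index list plays the role of the VPERMB index vector from the example above, and the nibble is picked directly per byte rather than with a masked VPSRLW followed by a second shuffle, so it only confirms that one permute can do the byte reversal and the duplication together. The function name is made up for this sketch.

```python
def bcd_le_to_ascii(packed):
    """Convert packed BCD stored least significant byte first to a digit string."""
    n = len(packed)
    # One index vector both reverses the byte order and duplicates each byte:
    # for n = 8 this is 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0.
    index = [i // 2 for i in range(2 * n - 1, -1, -1)]
    spread = [packed[i] for i in index]
    digits = []
    for pos, byte in enumerate(spread):
        # Even slots keep the high digit of the byte, odd slots the low digit.
        nibble = byte >> 4 if pos % 2 == 0 else byte & 0x0F
        digits.append(chr(ord("0") + nibble))
    return "".join(digits)


if __name__ == "__main__":
    # The byte sequence 56 34 12 90 78 56 34 12 from the example
    print(bcd_le_to_ascii(bytes.fromhex("5634129078563412")))
```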
Finally, I have achieved my dream, insane and useless though it may be!
Well, I have taken the opportunity to squeeze one more little thing into
the instruction set that Concertina III had, but this time I could not squeeze quite as many of them in... 16-bit prefixes for instructions,
which allow the instruction set to be extended.
it will be necessary to have a special
compare instruction for unsigned integers.
Scott Lurndal wrote:
overflow and the algorithm will simply process each
digit of the addend+augend sequentially from higher
magnitude to lower magnitude. It delays writing each
digit of the sum (other than the last) until it knows
the following digit doesn't overflow. If it does
overflow, it increments the delayed value before
writing. To the extent that there are multiple sequential
9s in the sum, when the next digit would overflow, the
processor uses the 9's counter and the saved digit to
store the correct digits to the receiving field.
There's a flow chart in 1025475_B2500_B3500_RefMan_Oct69.pdf
which is available on bitsavers.
So it did process them top-down, but delayed writing anything to the
output field until it was known that it would not overflow, and the same
happened for every subsequent partial sum of 9.
Yeah, that works but it probably caused some output hiccups when a long
chain of potential carries finally resolved. :-)
On Thu, 21 May 2026 00:37:39 +0000, John Levine wrote:
result. I suspect that if they had known how soon S/370 would add
paging to the 360 architecture, they might have designed these
instructions differently.
When I first read that, I thought that you meant they would have designed
it differently when they designed the 370, but, of course, the
instructions already existed. After I realized my mistake, of course, I
also knew that back in 1964 or before, there was really no way that they
could possibly have known that.
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
So instead I decided to only support double precision, and use the
extra bits to allow additional ways to specify registers.
My 66000 started out that way and the compiler showed that this choice sucks.
The good news is that this only concerns the 16-bit short instructions. A compiler can choose to ignore them if it can't handle them.
Currently, the 16-bit instructions provide the following:
All the basic operate instructions for two integer types; they can only operate on the first eight integer registers.
The basic floating operate instructions for one floating-point type; the register specification is the one used with Concertina II's paired 15-bit operate instructions; choose one of four banks of eight registers, and
both operands must be in that bank.
The idea is that it can be used for efficient pipelined code where four sequences of instructions which are independent are interleaved.
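To make the bit budget concrete, here is a hypothetical Python sketch of how a bank-restricted register pair could fit in the 12 free bits mentioned earlier in the thread: 4 bits of opcode, 2 bits of bank, and 3 bits for each operand within the bank. The field order, the function names, and the assumption that bank k holds registers 8k through 8k+7 are all invented for illustration and are not taken from the Concertina definitions.

```python
def encode_short_fp(opcode, dst, src):
    """Pack opcode and two same-bank registers into a 12-bit field (assumed layout)."""
    if not 0 <= opcode < 16:
        raise ValueError("opcode must fit in 4 bits")
    if not (0 <= dst < 32 and 0 <= src < 32):
        raise ValueError("register numbers must be 0..31")
    if dst // 8 != src // 8:
        raise ValueError("both operands must be in the same bank of eight")
    bank = dst // 8
    return (opcode << 8) | (bank << 6) | ((dst % 8) << 3) | (src % 8)


def decode_short_fp(field):
    """Inverse of encode_short_fp; returns (opcode, dst, src)."""
    opcode = (field >> 8) & 0xF
    bank = (field >> 6) & 0x3
    dst = bank * 8 + ((field >> 3) & 0x7)
    src = bank * 8 + (field & 0x7)
    return opcode, dst, src


if __name__ == "__main__":
    word = encode_short_fp(3, 10, 13)   # both registers in bank 1
    print(word, decode_short_fp(word))
```

Four interleaved instruction streams would then each stay inside one bank, which matches the pipelining use case described above.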
Everything else is straightforward; the 24-bit short instructions and all the 32-bit and longer instructions that operate on registers allow the use of all 32 registers in a bank.
Of course, though, the other restrictions are still present - seven
choices for an index register, seven choices for a base register (for each of three displacement sizes, 20, 16, and 12 bits).
I think I have indeed achieved the goal which, when I started out, I
thought might prove to be an "impossible dream" - combining what a CISC instruction set offers with what a RISC instruction set offers, and yet doing so without making the instructions longer than they usually are in those instruction types.
Except for register-to-register operate instructions being 24 bits instead of 16 bits, this has been achieved - but for a very limited subset of the possible register-to-register operate instructions, chosen by me as the
ones I think are the most useful and popular - and I realize the choice is subjective and hence potentially controversial - the 16-bit instruction length is retained!
I think it's an ISA that, in this respect, has achieved more than anyone could have expected!
Now, of course, whether or not this is an achievement that anyone cares about, that anyone wants, that anyone is interested in... well, I don't know.
John Savard
On Thu, 21 May 2026 00:06:54 +0000, quadi wrote:
I will have to review this point, however, to be sure.
Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result for signed numbers even if one is comparing, say, a positive number and a negative number which are both over half of the maximum possible magnitude for their format... it will be necessary to have a special compare instruction for unsigned integers.
Since there is opcode space for that readily available, though, there is
no difficulty in adding that.
John Savard
quadi <quadibloc@ca.invalid> posted:
it will be necessary to have a special
compare instruction for unsigned integers.
Or a wider condition register !
quadi <quadibloc@ca.invalid> posted:
Currently, the 16-bit instructions provide the following:
All the basic operate instructions for two integer types; they can only
operate on the first eight integer registers.
I suspect you (and compiler) will end up not liking the restriction.
The basic floating operate instructions for one floating-point type;
the register specification is the one used with Concertina II's paired
15-bit operate instructions; choose one of four banks of eight
registers, and both operands must be in that bank.
I suspect you (and compiler) will end up not liking the restriction.
Amazingly enough, however, it turned out that in each case there was no difficulty in finding the additional opcode space that was needed.
I even managed to find enough opcode space to increase the size of the displacement field from 8 bits to 9 bits in all the branch instructions,
so that having 24-bit short instructions doesn't shorten their range.
Although I have not yet completed that review, it has become apparent
that, since I want the compare instruction to produce a correct result for signed numbers even if one is comparing, say, a positive number and a negative number which are both over half of the maximum possible magnitude for their format... it will be necessary to have a special compare instruction for unsigned integers.
The compare instruction in my ISA _does not_ return the same condition
codes as the subtract instruction. So if I compare bytes, the compare instruction will correctly indicate that -100 is less than 100. The fact that if you subtracted -100 from 100 as byte values, you wouldn't get 200, since that doesn't fit into a signed byte, but the negative value -56, is neither here nor there.
Because of this special handling of the MSB, I do need a different compare instruction - not just the modified branch instructions for unsigned
values - to yield correct behavior.
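The difference between a byte subtract and a byte compare can be sketched in C as follows. This is only an illustration under the assumption of 8-bit two's complement bytes; the helper names are made up for the example, and the conversion back to int8_t relies on the wrapping behaviour that gcc and clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(int8_t) == 1, "example assumes 8-bit bytes");

/* Byte subtraction as an 8-bit ALU performs it: the result wraps modulo 256. */
static int8_t sub8_wrapped(int8_t a, int8_t b)
{
    uint8_t r = (uint8_t)((uint8_t)a - (uint8_t)b);
    return (int8_t)r;  /* wraps on gcc/clang (implementation-defined before C23) */
}

/* Signed overflow (V) of a - b: the true difference does not fit in a byte. */
static bool sub8_overflow(int8_t a, int8_t b)
{
    int wide = (int)a - (int)b;
    return wide < INT8_MIN || wide > INT8_MAX;
}

/* "Less than" read naively from the sign of the wrapped difference. */
static bool less_by_sign_only(int8_t a, int8_t b)
{
    return sub8_wrapped(a, b) < 0;
}

/* "Less than" that is not fooled by overflow: N xor V. */
static bool less_signed(int8_t a, int8_t b)
{
    bool n = sub8_wrapped(a, b) < 0;
    return n != sub8_overflow(a, b);
}
```

For -100 and 100 the wrapped difference is +56, so the sign alone gives the wrong answer, while N xor V gives the right one. A compare that sets its result this way, or a branch that tests N xor V, both achieve the same thing.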
You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to
fix that instead, so I likely will rework this part of the ISA into
something more conventional.
On Fri, 22 May 2026 07:35:36 +0000, Anton Ertl wrote:
You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).
While the System/360 had only two condition code bits, I do plan to have full VZNC bits. However, unlike the System/360,
set of sixteen conditional branch instructions. I just have twelve: eight instructions for testing between negative, zero, and positive nonzero in
any combination, and instructions for separately testing for carry and overflow.
I want a compare instruction which, for integers, isn't fooled by
overflows - and overflows happen at a different point in the two's complement number circle for signed and unsigned; for unsigned, basically carry takes the role of overflow. And I don't want to have to do two instructions for the conditional branch afterwards to handle that.
=). One question in such a design is if there are cases where you want to have the unsigned and signed conditions for the same operands,
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
(Worse yet, it used separate condition codes for floating-point numbers, which makes sense, given that they were originally in a coprocessor, but that means an extra set of instructions is needed.)
So, while it used a four-bit condition code field, I needed a five-bit one.
I did notice it didn't just always fail the signed tests if overflow was present; instead, in that case it switched plus and minus. Given that, and treating carry the same way for unsigned tests, you likely are right that
an unsigned compare is not needed. Oh, wait; my assumed behavior that everything should just fail if there's an overflow... is reasonable for floating-point numbers.
John Savard
quadi <quadibloc@ca.invalid> posted:
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into
something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
(Worse yet, it used separate condition codes for floating-point numbers,
which makes sense, given that they were originally in a coprocessor, but
that means an extra set of instructions is needed.)
So, while it used a four-bit condition code field, I needed a five-bit one.
x86 uses COZAP but this includes P=parity, which it is unlikely you do.
Thus, 4 bits are sufficient to define 16 states, of which you only need 10: signless{EQ, NEQ}, signed{>=, >, <, <=}, unsigned{>=, >, <, <=}.
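As a rough sketch of that point, the following C code derives N, Z, C and V from one 32-bit compare and evaluates all ten relations from those four flags. The names flags_t, compare32 and test_cond are invented for the example, and C is taken to mean "borrow" as on the 68000 and x86; an architecture that defines carry the other way round would invert the unsigned tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags_t;

/* Flags of the 32-bit comparison a - b; C is a borrow (68000/x86 convention). */
static flags_t compare32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = a < b;
    /* Overflow: operands differ in sign and the result's sign differs from a. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

typedef enum {
    COND_EQ, COND_NE,
    COND_SGE, COND_SGT, COND_SLT, COND_SLE,   /* signed */
    COND_UGE, COND_UGT, COND_ULT, COND_ULE    /* unsigned */
} cond_t;

static_assert(COND_ULE < 16, "ten conditions fit in a four-bit field");

static bool test_cond(cond_t cond, flags_t f)
{
    switch (cond) {
    case COND_EQ:  return f.z;
    case COND_NE:  return !f.z;
    case COND_SGE: return f.n == f.v;
    case COND_SGT: return !f.z && f.n == f.v;
    case COND_SLT: return f.n != f.v;
    case COND_SLE: return f.z || f.n != f.v;
    case COND_UGE: return !f.c;
    case COND_UGT: return !f.c && !f.z;
    case COND_ULT: return f.c;
    case COND_ULE: return f.c || f.z;
    }
    return false;
}
```

The same flags answer both the signed and the unsigned question, so the bit pattern 0xFFFFFFFF compared with 1 is "less" as a signed value and "higher" as an unsigned one without a second compare.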
quadi <quadibloc@ca.invalid> writes:
On Fri, 22 May 2026 07:35:36 +0000, Anton Ertl wrote:
You only need that if your flags are insufficiently expressive (i.e.,
less powerful than NCZV).
While the System/360 had only two condition code bits, I do plan to have full VZNC bits. However, unlike the System/360,
The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.
quadi <quadibloc@ca.invalid> writes:
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into
something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked
what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:
HI >
LS <=
CC >=
CS <
For the signed ones there is
GT >
LE <=
GE >=
LT <
my assumed behavior that
everything should just fail if there's an overflow... is reasonable for
floating-point numbers.
The usual setup is that FP operations silently overflow to +INF and
underflow to -INF. They do set sticky flags (called "exceptions" in
the IEEE FP standard) on various conditions, including on overflows,
but also on rounding errors ("inexact").
- anton
On Sat, 23 May 2026 09:28:45 +0000, Anton Ertl wrote:
I see four tests for unsigned conditions on the 68000
<https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:
HI >
LS <=
CC >=
CS <
For the signed ones there is
GT >
LE <=
GE >=
LT <
What I was going by was Table 3-19 on page 3-19 of the M68000 Family Programmer's Reference Manual on the Internet Archive from Bitsavers; it gives the available condition code tests on the architecture as:
0000  True
0001  False
0010  High              not C and not Z
0011  Low or Same       C or Z
0100  Carry Clear
0101  Carry Set
0110  Not Equal         not Z
0111  Equal             Z
1000  Overflow Clear    not V
1001  Overflow Set      V
1010  Plus              not N
1011  Minus             N
1100  Greater or Equal  (N and V) or (not N and not V)
1101  Less Than         (N and not V) or (not N and V)
1110  Greater Than      (N and V and not Z) or (not N and not V and not Z)
1111  Less or Equal     Z or (N and not V) or (not N and V)
I took Low or Same as unsigned, and Plus, Minus, Greater or Equal, Less Than, Greater Than, and Less or Equal as signed.
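The table above can be written out as a small decoder in C, which makes it easier to see which encodings act as unsigned tests after a compare. This is only a sketch: the function name m68k_cond and its argument order are assumptions for the example, while the encodings and formulas follow the table as quoted, with Carry Clear and Carry Set taken to be "not C" and "C".

```c
#include <assert.h>
#include <stdbool.h>

/* Evaluate a four-bit 68000-style condition field against N, Z, C, V. */
static bool m68k_cond(unsigned cc, bool n, bool z, bool c, bool v)
{
    assert(cc < 16);                   /* the field is four bits wide */
    switch (cc) {
    case 0x0: return true;             /* True */
    case 0x1: return false;            /* False */
    case 0x2: return !c && !z;         /* High:          unsigned >  */
    case 0x3: return c || z;           /* Low or Same:   unsigned <= */
    case 0x4: return !c;               /* Carry Clear:   unsigned >= */
    case 0x5: return c;                /* Carry Set:     unsigned <  */
    case 0x6: return !z;               /* Not Equal */
    case 0x7: return z;                /* Equal */
    case 0x8: return !v;               /* Overflow Clear */
    case 0x9: return v;                /* Overflow Set */
    case 0xA: return !n;               /* Plus */
    case 0xB: return n;                /* Minus */
    case 0xC: return n == v;           /* Greater or Equal: signed >= */
    case 0xD: return n != v;           /* Less Than:        signed <  */
    case 0xE: return !z && n == v;     /* Greater Than:     signed >  */
    default:  return z || n != v;      /* Less or Equal:    signed <= */
    }
}
```

Read this way, each odd encoding is the complement of the even encoding before it, and High, Low or Same, Carry Clear and Carry Set together give the four unsigned orderings.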
The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.
On 2026-05-23 5:28 a.m., Anton Ertl wrote:
quadi <quadibloc@ca.invalid> writes:
On Fri, 22 May 2026 15:48:18 +0000, quadi wrote:
However, if I have enough opcode space to add a U bit to all the
conditional branch instructions, then I also have enough opcode space to fix that instead, so I likely will rework this part of the ISA into
something more conventional.
I have made the first set of changes, using five-bit condition code fields to nicely and fully handle both the signed and unsigned cases; I checked what the Motorola 68000 did, and found that it only provided a complete
set of tests for signed values, but only two tests for unsigned ones.
I see four tests for unsigned conditions on the 68000 <https://en.wikibooks.org/wiki/68000_Assembly/Conditional_Tests>:
HI >
LS <=
CC >=
CS <
CS may also be called LO
CC may also be called HS
For the signed ones there is
GT >
LE <=
GE >=
LT <
my assumed behavior that
everything should just fail if there's an overflow... is reasonable for
floating-point numbers.
The usual setup is that FP operations silently overflow to +INF and underflow to -INF. They do set sticky flags (called "exceptions" in
Methinks overflow could be to +/- INF and underflow to zero or a denormal.
the IEEE FP standard) on various conditions, including on overflows,
but also on rounding errors ("inexact").
- anton
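The default (non-trapping) behaviour under discussion can be observed directly in C. The sketch below assumes IEEE 754 binary64 doubles with gradual underflow enabled, which is the usual default; the helper names are made up for the example.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

static_assert(DBL_MANT_DIG == 53, "sketch assumes IEEE 754 binary64 doubles");

/* Multiply at run time; volatile keeps the compiler from folding the result. */
static double scaled(double x, double factor)
{
    volatile double r = x;
    r *= factor;
    return r;
}

/* Anything beyond the largest finite double must be an infinity. */
static bool is_infinite(double x)
{
    return x > DBL_MAX || x < -DBL_MAX;
}

/* Nonzero but smaller in magnitude than the smallest normal double. */
static bool is_subnormal(double x)
{
    return x != 0.0 && x < DBL_MIN && x > -DBL_MIN;
}
```

Under those assumptions, scaled(DBL_MAX, 2.0) gives +INF and scaled(-DBL_MAX, 2.0) gives -INF, while scaled(DBL_MIN, 0.5) gives a denormal and scaled(DBL_MIN, DBL_MIN) gives zero. None of them traps unless the corresponding exception has been explicitly enabled.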
If one has CVNZ it is enough for both signed and unsigned integer conditional testing using only four bits.
The CVNZ could be repurposed for float comparisons. V = INF. C=inexact
for instance.
According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
The S/360 is a mess as far as dealing with conditions is concerned.
Or is there a great underlying principle involved, and I fail to see
it? I doubt it, for the following reasons: 1) I have not come across
any description that explained the underlying principle, and in fact I
have come across few descriptions at all. 2) In the 62 years that
S/360 has been available, it has not found any successors in its
particular approach to conditions.
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
I agree that nobody else did that, and in retrospect it was an overoptimization.
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.
According to MitchAlsup <user5857@newsgrouper.org.invalid>:
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.
They'd also have been better off making the addresses 32 bits and not
putting junk in the high byte, which caused endless pain later, but
they were really really worried about making low end models with 8K
bytes usable.
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat addressing.
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.
B+X+D addressing only got 12-bits
B+D addressing was for RS and SS instructions
I think they thought they were saving on complexity and HW logic, but
According to MitchAlsup <user5857@newsgrouper.org.invalid>:
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.
B+X+D addressing only got 12-bits
B+D addressing was for RS and SS instructions
four bits of B, 12 bits of D, 16 bit addresses
you're right that RX used another four bits.
I think they thought they were saving on complexity and HW logic, but
We don't have to guess. "Architecture of the IBM System/360" by Amdahl, Blaauw, and Brooks in the IBM Systems Journal in April 1964 described a lot of the reasoning, and they wrote a whole book about it.
They had to make a lot of other design decisions like 6 vs 8 bit
bytes, ones- vs twos-complement, length fields vs word marks for
variable length data, stack vs registers, floating point format (they
blew that one).
They said that the combination of a full length base register and a
short displacement "gives consequent gains in instruction density. The base-register approach was adopted, and then augmented, for some instructions, with a second level of indexing."
In retrospect, B+X+D was probably a mistake since I believe that
double indexing is rarely used, and easy to do with an extra register
add.
On the other hand, it's not obvious what a better use of the X
field would have been. I suppose they could have made instructions
three operand, e.g.
A Rx,Ry,B(D)
would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat addressing.
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running out
of PSW space.
On Sat, 23 May 2026 20:03:34 +0000, MitchAlsup wrote:
S/360 would have been better off as defining PSW as a PSQW (128-bits)
which would have alleviated several problems associated with running
out of PSW space.
Remember the System/370, and its Extended Control Mode? All they lost
was the ability to switch the computer into an ASCII mode nobody ever
used.
In retrospect, B+X+D was probably a mistake since I believe that double indexing is rarely used, and easy to do with an extra register add. On
the other hand, it's not obvious what a better use of the X field would
have been. I suppose they could have made instructions three operand,
e.g.
A Rx,Ry,B(D)
would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.
Of course, though, people must have been able to get C compilers working
on z/Architecture, despite inefficiencies, or it wouldn't be possible to install Linux on those machines.
I suspect the encoded condition bits in S/360 are a reflection of
the expensive memory era in which it was created. If they had
decoded condition codes, they'd have had to find more bits in
the PSW to store them, and it was already quite full.
On Sun, 24 May 2026 01:43:29 +0000, John Levine wrote:
In retrospect, B+X+D was probably a mistake since I believe that double
indexing is rarely used, and easy to do with an extra register add. On
the other hand, it's not obvious what a better use of the X field would
have been. I suppose they could have made instructions three operand,
e.g.
A Rx,Ry,B(D)
would add the memory operand to Ry and put it in Rx but it was a long
time until compilers could make good use of that.
Since there were three-address machines back in the days before general registers, I am surprised to hear that they didn't know how to write compilers that made use of such a field.
But the "better use of the X field" is obvious - make the displacement
field 16 bits instead of 12 bits. Except, of course, that this would have killed the SS format of instructions.
But I don't agree that B+X+D is a bad thing. An extra register add is an extra instruction. And it's not rarely used; it's used every time an array is accessed, and arrays are often accessed in inner loops!
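For readers who have not used it, the B+X+D calculation of an RX-format instruction can be sketched in C like this. It is a simplified model under stated assumptions: sixteen 32-bit general registers, a 12-bit displacement, 24-bit addresses, and register number 0 in the B or X field meaning "no register". The type and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t gpr[16]; } s360_regs;

/* Effective address of an RX operand: base + index + 12-bit displacement. */
static uint32_t rx_effective_address(const s360_regs *r,
                                     unsigned x, unsigned b, unsigned d)
{
    assert(x < 16 && b < 16 && d < 4096);
    uint32_t index = x ? r->gpr[x] : 0;   /* field value 0 contributes zero */
    uint32_t base  = b ? r->gpr[b] : 0;
    return (base + index + d) & 0x00FFFFFFu;  /* keep 24 address bits */
}
```

An array access then keeps the array's address in the base register, the scaled subscript in the index register, and the offset of the array within its addressable block in the displacement, all within a single instruction. Without the X field the same access costs an extra register add.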
On Sat, 23 May 2026 20:09:54 +0000, John Levine wrote:
Remember that the major reason for B+D addressing was that it let them
have 16 bit address fields in instructions while keeping 24 bit flat
addressing.
12 bits, of course. And they felt that 12 bits were enough because memory was such an issue back then.
In hindsight, of course having a two-bit condition code was a "mistake".
But C hadn't been invented yet, so nobody knew there would be any real use for unsigned integers.
And the PSW really was full - when IBM went to System/370, they had to repurpose a bit in the PSW that was already assigned to an existing
feature, ASCII mode. Since nobody ever used it, however, using it instead for the System/370's "Extended Control Mode", wherein the PSW *did* get doubled in length, was possible.
Sure they did. S/360 had separate unsigned versions of add and subtract instructions. The results were the same but the condition codes were different and the unsigned versions couldn't overflow.
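A sketch of what "same result, different condition code" means, written in C: the two-bit codes below follow the assignments for Add and Add Logical as I recall them from the Principles of Operation, so treat the exact numbering as an assumption to be checked against the manual. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

static_assert(INT32_MIN == -INT32_MAX - 1, "two's complement words assumed");

/* Signed Add: 0 = zero, 1 = negative, 2 = positive, 3 = overflow. */
static unsigned cc_add(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + b;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return 3;
    if (wide == 0)
        return 0;
    return wide < 0 ? 1 : 2;
}

/* Add Logical: high bit of the code = carry out, low bit = sum is nonzero. */
static unsigned cc_add_logical(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;              /* wraps modulo 2^32 by definition */
    unsigned carry = sum < a;
    return (carry << 1) | (sum != 0);
}
```

The sum bits are identical in both cases; only the interpretation packed into the two condition code bits differs, which is why a separate opcode was needed for each.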
According to quadi <quadibloc@ca.invalid>:
But the "better use of the X field" is obvious - make the displacement field 16 bits instead of 12 bits. Except, of course, that this would
have killed the SS format of instructions.
Or worse had some instructions with 12 bit displacement and some with 16 which would have been a programming nightmare.
In retrospect, B+X+D was probably a mistake since I believe that
double indexing is rarely used, and easy to do with an extra register
add.
That is the view of MIPS and RISC_V
That is not the view of x86 or ARM or My 66000 or Mc 88K
Most[1] architectures before the S/360 used ones-complement or
sign/magnitude representation for integers, and trap on overflow [2],
Concerning the question about why IBM chose big-endian for the S/360
On Sun, 24 May 2026 09:32:07 +0000, Anton Ertl wrote:
Most[1] architectures before the S/360 used ones-complement or
sign/magnitude representation for integers, and trap on overflow [2],
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
John Savard
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior. Otherwise, programs like random number generators wouldn't work.
John Savard
quadi <quadibloc@ca.invalid> posted:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
They work just fine using unSigned integers.
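As an illustration of that reply, here is a linear congruential generator in C that depends on wraparound but never overflows a signed type. The multiplier and increment are the well-known "quick and dirty" constants from Numerical Recipes; the function name lcg_next is just a label for the example, and the code assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned) == sizeof(uint32_t),
              "sketch assumes 32-bit unsigned int");

/* One step of x' = 1664525 * x + 1013904223 (mod 2^32).
   Unsigned arithmetic wraps by definition, so nothing here can trap. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}
```

Starting from a state of 0, the first two values are 0x3C6EF35F and 0x47502932. A trap on signed overflow would leave this code alone, because the modular reduction is expressed through the unsigned type rather than through a signed overflow.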
You will find you have no <marketable> choice; you need to support::
Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
David Brown <david.brown@hesbynett.no> writes:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an integer overflow is usually a terrible idea.
Most programming environments I have had contact with don't trap on floating-point overflow.
So, detecting something went wrong and you should inform the programmer is a bad idea ???
The question is if an integer overflow means that something went
wrong.
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:
MIPS: add traps on signed overflow, you need to write addu if you
don't want that.
Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
- anton
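To show what "pretty involved" means when neither operand is known, here is a C rendering of the kind of check a compiler has to emit on a machine without a trapping add: do the wrapping add, then compare the sign of one operand against whether the sum moved in the expected direction. The function name add_overflows is hypothetical, and the conversion back to int64_t relies on the wrapping behaviour of gcc and clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrapping 64-bit add that also reports signed overflow.
   Overflow happened exactly when "b is negative" disagrees with
   "the wrapped sum is less than a". */
static bool add_overflows(int64_t a, int64_t b, int64_t *sum)
{
    assert(sum != NULL);
    /* Add as unsigned so the wraparound itself is well defined in C. */
    int64_t wrapped = (int64_t)((uint64_t)a + (uint64_t)b);
    bool b_negative = b < 0;
    bool sum_went_down = wrapped < a;
    *sum = wrapped;
    return b_negative != sum_went_down;
}
```

That is one add, two compares and a branch for every checked addition, against a single instruction on MIPS or Alpha. On gcc and clang the same test is available as __builtin_add_overflow, which lets the compiler pick the cheapest sequence for the target.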
On Wed, 20 May 2026 01:35:01 +0000, MitchAlsup wrote:
You will find you have no <marketable> choice; you need to support::
Integer{S8, S16, S32, S64, U8, U16, U32, U64}
Float {FP8, FP16, FP32, FP64 and some way to get FP128}
After realizing that I did need a second instruction for unsigned
_division_ I then learned, to my shock, that division was not one, but
two, instructions, at least in my architecture, for integers.
And there didn't seem to be enough opcode space left for Divide Extensibly Unsigned.
I was able to re-adjust the 32-bit operate instructions so that the two places where only 96 opcodes were provided for the basic operate instructions could now provide 128 opcodes.
The 16-bit and 24-bit short instructions could not be so modified. But
there were a few unused opcodes; so Divide Extensibly Unsigned could still fit in, just out of place.
But that meant that this one operation would be missing from the minimum-length immediate instructions, and would still be treated as out of the basic instruction set, getting immediate instructions that were 16 bits longer, for them.
The Pigeonhole Principle has finally bit me!
John Savard
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler would check for overflow on "a + b", and report it at runtime. (Unfortunately, gcc does not do that unless the partial expression is assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.
If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.
And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)
It is not easy to see how a tool can avoid false positives and false negatives and also conveniently optimise and re-arrange code.
Compilers have not always been good at taking advantage of all the
features provided by hardware
nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.
David Brown <david.brown@hesbynett.no> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different
instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
An awkward thing about using trap on overflow is determining how
precisely it is defined. Supposing you have the expression "a + b - a".
Perhaps "a + b" overflows. I would hope that when using debug-related
compiler flags such as "-fsanitize=signed-integer-overflow", a compiler would check for overflow on "a + b", and report it at runtime. (Unfortunately, gcc does not do that unless the partial expression is assigned to a variable.) But in "normal" usage, I'd expect the
expression to be simplified, resulting in just "b" and no overflow.
OTOH, cases like a+b+c where the result is in range, while an
intermediate result is out of range are one of the reasons why I
prefer -fwrapv over -ftrapv. As for your preference of nasal demons,
given enough information, the compiler might "optimize" "a+b-a" into,
e.g., 0.
Anyway, the definition of -ftrapv is not very precise; for gcc-12.2:
|'-ftrapv'
| This option generates traps for signed overflow on addition,
| subtraction, multiplication operations.
As for what gcc-12.2 does for your example on AMD64:
long foo(long a, long b)
{
return a+b-a;
}
is compiled with gcc -O3 -ftrapv to:
0: 48 89 f0 mov %rsi,%rax
3: c3 ret
If "trap on overflow" has precise semantics in the code, then this
disables a range of useful optimisations and re-arrangements. If it is
just "use trapping arithmetic instructions", then it will miss many
possible cases of actual overflow in the code, which we might want to
catch.
Which would you prefer by default?
The gcc developers apparently took the latter approach, even when you
ask for -ftrapv explicitly. So what, IYO, speaks against doing that
by default on machines like MIPS and Alpha.
And "trap on overflow" might either trigger when there is no
overflow in the original code, or hinder optimisations. (Consider the
expression "x / 2 + y / 2" - the compiler could implement that as a
combined "(x + y) / 2", but that might introduce overflow.)
x/2+y/2 produces a different result from (x+y)/2 when both x and y are
odd integers.
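A small sketch of the difference, using two hypothetical helper functions. C integer division truncates toward zero, so halving each operand separately discards a half from every odd operand, while the combined form divides only once.

```c
/* Halves each operand first; every odd operand loses its fractional half
   because C integer division truncates toward zero. */
static long half_sum_separately(long x, long y)
{
    return x / 2 + y / 2;
}

/* Adds first, then halves. Not equivalent in general, and x + y can
   overflow where the version above cannot. */
static long half_sum_combined(long x, long y)
{
    return (x + y) / 2;
}
```

For x = 3 and y = 5 the first function returns 3 and the second returns 4; for x = 4 and y = 6 both return 5.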
gcc-12.2 compiles
long bar(long x, long y)
{
return x/2+y/2;
}
on AMD64 to:
gcc -O3 -ftrapv            gcc -O3
mov  %rdi,%rax             mov  %rdi,%rax
sub  $0x8,%rsp             mov  %rsi,%rdx
shr  $0x3f,%rax            shr  $0x3f,%rax
add  %rax,%rdi             shr  $0x3f,%rdx
mov  %rsi,%rax             add  %rdi,%rax
shr  $0x3f,%rax            add  %rsi,%rdx
sar  %rdi                  sar  %rax
add  %rax,%rsi             sar  %rdx
sar  %rsi                  add  %rdx,%rax
call __addvdi3@PLT         ret
add  $0x8,%rsp
ret
so the -ftrapv introduces an additional mov and a call; I would have
expected that the + would be compiled to an ADD instruction followed
by a JO instruction.
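As a sketch of the inline check described here, assuming GCC or Clang: `__builtin_add_overflow` is a compiler extension, the names `add_or_abort` and `bar_checked` are hypothetical, and `abort()` is only a placeholder for whatever trap mechanism is wanted. On x86-64 these compilers typically turn the builtin into an add followed by a conditional jump on the overflow flag, but that is worth verifying for the compiler version at hand.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an inline trapping add: perform the addition
   and branch to a handler on signed overflow, with no library call. */
static long add_or_abort(long a, long b)
{
    long r;
    if (__builtin_add_overflow(a, b, &r))   /* GCC/Clang extension */
        abort();                            /* placeholder for the trap */
    return r;
}

/* The example function from the post, with the + made explicit. */
static long bar_checked(long x, long y)
{
    return add_or_abort(x / 2, y / 2);
}
```

Note that for this particular expression the check can never fire: each half has magnitude at most 2^(N-2) for an N-bit long, so the sum of two halves always fits.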
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv            gcc -O3
lui   gp,0x0               srl   v0,a0,0x1f
addiu gp,gp,0              srl   v1,a1,0x1f
addu  gp,gp,t9             addu  v0,v0,a0
srl   v1,a0,0x1f           addu  a1,v1,a1
lw    t9,__addvsi3(gp)     sra   v0,v0,0x1
srl   v0,a1,0x1f           sra   a1,a1,0x1
addiu sp,sp,-32            jr    ra
addu  a0,v1,a0             addu  v0,v0,a1
addu  a1,v0,a1
sra   a0,a0,0x1
sw    ra,28(sp)
sw    gp,16(sp)
jalr  t9
sra   a1,a1,0x1
lw    ra,28(sp)
jr    ra
addiu sp,sp,32
The call costs a lot of overhead.
It is not easy to see how a tool can avoid false positives and false
negatives and also conveniently optimise and re-arrange code.
It can't. But it does not try to avoid false negatives even when
explicitly asked for trapping on overflow.
If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.
The fact that they don't even try to make -ftrapv produce efficient
code indicates that there is no "relevant" interest in efficient
-ftrapv. It would be interesting to know who came up with the idea of
adding -ftrapv, and why they are still keeping it.
Compilers have not always been good at taking advantage of all the
features provided by hardware
GCC is pretty good at implementing -fwrapv. For the two examples
above, "gcc -O3 -fwrapv" produces the same code on AMD64 and MIPS as
"gcc -O3".
nor have languages been good at exposing
the possibilities in the language so that programmers can take advantage
of them.
Yes. But I leave that for another day.
- anton
--- Synchronet 3.22a-Linux NewsLink 1.2
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
David Brown <david.brown@hesbynett.no> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers
have avoided making -ftrapv the default, even on platforms like MIPS
and Alpha where the implementation of -ftrapv just means to use
different instructions (e.g., add instead of addu on MIPS, and addv
instead of add on Alpha).
Both architectures got this one wrong--IMO--and so does RISC-V.
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
John Savard
That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
If that is C, signed integer overflow is UB while unsigned integers
have wrapping behaviour - thus if your code depends on wrapping, and it
is written in C, it needs to use unsigned types or compiler-specific extensions, flags, etc. (Or C23 ckd_add and other checked arithmetic functions.)
If it is written in Zig, you need to use the specific modulo arithmetic functions even for unsigned arithmetic. If it is written in Java,
signed integer arithmetic is fine.
It all depends on the language and/or any options the language and tools might support - and code should be written to work correctly according
to the language rules.
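To illustrate the point about C, here is a minimal sketch of a linear congruential step written with an unsigned type, so that the wraparound is defined behaviour rather than UB. The function name `lcg_next` is hypothetical, and the multiplier and increment are the widely published 32-bit pair popularised by Numerical Recipes; treat them as an example, not a recommendation for serious use.

```c
#include <stdint.h>

/* One step of a 32-bit linear congruential generator. The state is a
   uint32_t, so on the usual platforms where int is 32 bits the multiply
   and add are unsigned operations that wrap modulo 2^32 by definition.
   The final cast keeps the result in range if unsigned int is wider. */
static uint32_t lcg_next(uint32_t state)
{
    return (uint32_t)(state * 1664525u + 1013904223u);
}
```

The same code would be undefined for large states if `state` were declared as a signed `int32_t`, which is exactly the kind of program that breaks under trapping or under overflow-based optimisation.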
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
My 66000 has an instruction bit that denotes the signedness of integer calculations {Signed, unSigned}. This bit is available as another OpCode
bit for non-integer calculation instructions.
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
...long bar(long x, long y)
{
return x/2+y/2;
}
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv            gcc -O3
lui   gp,0x0               srl   v0,a0,0x1f
addiu gp,gp,0              srl   v1,a1,0x1f
addu  gp,gp,t9             addu  v0,v0,a0
srl   v1,a0,0x1f           addu  a1,v1,a1
lw    t9,__addvsi3(gp)     sra   v0,v0,0x1
srl   v0,a1,0x1f           sra   a1,a1,0x1
addiu sp,sp,-32            jr    ra
addu  a0,v1,a0             addu  v0,v0,a1
addu  a1,v0,a1
sra   a0,a0,0x1
sw    ra,28(sp)
sw    gp,16(sp)
jalr  t9
sra   a1,a1,0x1
lw    ra,28(sp)
jr    ra
addiu sp,sp,32
The call costs a lot of overhead.
Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.
If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.
It is much harder than that. For example: does a signed shift left
overflow when significant bits are shifted out ??
David Brown <david.brown@hesbynett.no> writes:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
Most programming environments I have had contact with don't trap on floating-point overflow.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
The question is if an integer overflow means that something went
wrong. Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers have
avoided making -ftrapv the default, even on platforms like MIPS and
Alpha where the implementation of -ftrapv just means to use different instructions (e.g., add instead of addu on MIPS, and addv instead of
add on Alpha).
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
This supposedly helpful feature has been neglected by C compiler
developers, and you see in the progression from MIPS (1986) to Alpha
(1992) and then RISC-V (2011) that the hardware architects have
accepted that:
MIPS: add traps on signed overflow, you need to write addu if you
don't want that.
Alpha: add ignores signed overflow, you need to write addv if you want
the trapping.
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
- anton
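For readers wondering what "pretty involved" looks like on a machine with neither flags nor a trapping add, here is a sketch of the usual flag-free test. The function name `add_overflows_i64` is hypothetical, and how many instructions it costs depends on the compiler and the ISA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag-free signed-overflow test for a 64-bit add. Overflow occurred
   exactly when both operands have the same sign and the wrapped sum has
   the opposite sign. All arithmetic is done on uint64_t, so nothing here
   is undefined behaviour. */
static bool add_overflows_i64(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t sum = ua + ub;                      /* wraps modulo 2^64 */
    return (((ua ^ sum) & (ub ^ sum)) >> 63) != 0;
}
```

When one operand is a constant known to the compiler, the test collapses to a single comparison, which is why the cost is mainly felt when both operands are unknown.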
On Mon, 25 May 2026 10:23:00 +0200, David Brown wrote:
The hardware, of course, cannot always enable trapping on overflow if it
is going to efficiently support a range of programming languages.
Yes. And I am used to FORTRAN, which did not trap on integer overflows.
John Savard
On Mon, 25 May 2026 19:20:01 +0000, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
David Brown <david.brown@hesbynett.no> writes:
On 25/05/2026 16:28, Anton Ertl wrote:
Despite their eagerness to "optimize" based on the assumption
that signed integer overflow does not happen, the GCC developers
have avoided making -ftrapv the default, even on platforms like MIPS
and Alpha where the implementation of -ftrapv just means to use
different instructions (e.g., add instead of addu on MIPS, and addv
instead of add on Alpha).
Both architectures got this one wrong--IMO--and so does RISC-V.
You may not have been replying to what Anton Ertl wrote above, since there was a lot in between that I snipped. But it does mention two architectures that took an approach to trapping on integer overflow... that I also tend
to disagree with.
What I'm used to is the System/360. While it made the mistake of having
two condition code bits instead of NZVC, the idea of having "trap on overflow" controlled by a bit in the PSW is... what I assumed to be normal and correct.
I could be wrong, as I haven't examined that approach critically and given full consideration to the alternatives.
John Savard
David Brown <david.brown@hesbynett.no> schrieb:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
John Savard
That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
In principle, yes.
In practice, people often used whatever "worked" on their systems.
Implementors have a certain right because they control what their
compiler does or does not do.
But users did so, as well, with
Numerical Recipes a(n in)famous example.
And yes, this bites people. You can see this at https://gcc.gnu.org/gcc-13/porting_to.html :
# GCC 13 includes new optimizations which may change behavior
# on integer overflow. Traditional code, like linear congruential
# pseudo-random number generators in old programs and relying on
# a specific, non-standard behavior may now generate unexpected
# results. The option -fsanitize=undefined can be used to detect
# such code at runtime.
# It is recommended to use the intrinsic subroutine RANDOM_NUMBER for
# random number generators or, if the old behavior is desired, to use
# the -fwrapv option. Note that this option can impact performance.
If that is C, signed integer overflow is UB while unsigned integers
have wrapping behaviour - thus if your code depends on wrapping, and it
is written in C, it needs to use unsigned types or compiler-specific extensions, flags, etc. (Or C23 ckd_add and other checked arithmetic functions.)
If it is written in Zig, you need to use the specific modulo arithmetic functions even for unsigned arithmetic. If it is written in Java,
signed integer arithmetic is fine.
It all depends on the language and/or any options the language and tools might support - and code should be written to work correctly according
to the language rules.
Fortran has no standard way of implementing this unless you
restrict yourself to sizes which do not overflow a signed integer.
Implementing LCGRNGs was one reason why I pushed for unsigned
arithmetic (modulo 2**n) in Fortran. The attempt failed (not
taken up by WG5 after being endorsed by J3), but I implemented it
for gfortran anyway.
The hardware, of course, cannot always enable trapping on overflow if it is going to efficiently support a range of programming languages. But
as an optional feature it can be helpful for catching a few bugs in
code, so it can be a good idea (both for signed and unsigned overflow).
Sanitizers are also fairly good now, but of course cost performance.
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
...long bar(long x, long y)
{
return x/2+y/2;
}
Trying the same on a MIPS64 machine with gcc-8.3 (which apparently
produces ILP32 code) produces a call to __addvsi3 instead of the
expected add instruction:
gcc -O3 -ftrapv            gcc -O3
lui   gp,0x0               srl   v0,a0,0x1f
addiu gp,gp,0              srl   v1,a1,0x1f
addu  gp,gp,t9             addu  v0,v0,a0
srl   v1,a0,0x1f           addu  a1,v1,a1
lw    t9,__addvsi3(gp)     sra   v0,v0,0x1
srl   v0,a1,0x1f           sra   a1,a1,0x1
addiu sp,sp,-32            jr    ra
addu  a0,v1,a0             addu  v0,v0,a1
addu  a1,v0,a1
sra   a0,a0,0x1
sw    ra,28(sp)
sw    gp,16(sp)
jalr  t9
sra   a1,a1,0x1
lw    ra,28(sp)
jr    ra
addiu sp,sp,32
The call costs a lot of overhead.
Architectures without overflow traps are notorious for excess instruction
count when overflow detection is desired or mandated.
MIPS' add traps on overflow. gcc could have emitted almost the same
code for gcc -O3 -ftrapv as for gcc -O3, except that the last
instruction would be an add, not an addu. But apparently nobody gives
a damn about the efficiency of -ftrapv, possibly rightly so.
If some overflow trapping when it can be done without additional
instructions would be preferable over no overflow, gcc would compile
signed adds that survive after optimization into add on MIPS rather
than addu, by default. Given that it does not, the GCC developers
probably found out that it is not preferable. I guess they would get
too many customer complaints, including for "relevant" code, i.e.,
code where the usual "it's UB, so your code is broken" excuse does not
work.
It is much harder than that. For example: does a signed shift left
overflow when significant bits are shifted out ??
-ftrapv specifies trapping on overflow only for additions,
subtractions, and multiplications.
On Mon, 25 May 2026 16:45:07 +0000, MitchAlsup wrote:
My 66000 has an instruction bit that denotes the signedness of integer calculations {Signed, unSigned}. This bit is available as another OpCode bit for non-integer calculation instructions.
That's nice. It's not an option I can consider, as having lots of
orthogonal modifiers on instructions would tend to increase their length.
A major goal of the Concertina II, III, and IV architectures is for instructions not to be longer than similar instructions on the Motorola 68020 or the IBM System/360 if at all possible.
Basically, the selling point is... "Your programs only get 10% bigger, if that, and yet you have 32 registers, so they run faster!".
Or they _would_, if the design didn't have so many extra transistors for supporting both IBM-format and Intel-format Decimal Floating Point, old-style IBM floats, simple floating (You too can work with numbers that go around the world 2 1/2 times!), packed decimal, mixed-radix arithmetic...
But, hey, supporting these things in hardware is faster than doing them in software!
And are people even going to _read_ the part of the manual that
explains... as is noted in the description of the original Concertina architecture...
This chip has 8-way simultaneous multi-threading, but only for programs which do not make use of extensions to the register set.
Only two programs per core may use the extended register banks with 128 elements.
Only one program per core may use the vector registers for long vector instructions. The 256-bit short vector registers, on the other hand, like the integer and floating-point registers, are available to all
simultaneous threads.
John Savard
On 5/25/2026 9:28 AM, Anton Ertl wrote:
Integer overflow happens far too often for trapping to be a good solution.
On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
So you can debug production/released code to find subtle errors.
On Sun, 24 May 2026 15:24:22 +0000, John Levine wrote:
Sure they did. S/360 had separate unsigned versions of add and subtract
instructions. The results were the same but the condition codes were
different and the unsigned versions couldn't overflow.
Ah, I didn't remember that!
On 5/25/2026 3:34 PM, quadi wrote:
On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even test for fixed-point overflow, is a much worse idea.
Possibly true.
The lack of things like ADD-with-Carry or ADD-with-Overflow are
annoyance points on RISC-V.
Though, it is less obvious what a useful behavior is at the language level:
"signal()" ? ...
Something like try/catch (mostly N/A to C)?
Something similar to FENV_ACCESS?
...
Well, and that if trapping were applied globally:
Overhead due to trap detection/handling code causing excessive bloat;
Overflow traps from any code that naively assumes wrap-on-overflow
semantics;
...
In some codebases, it is already enough of a pain to hunt and fix all
the out-of-bounds and uninitialized variables mess.
Signed integer overflows would likely "turn it up to 11";
Then, how does one fix it? Ask that people start adding a bunch of casts
to make it work?...
One might say:
Add "if()" cases to deal with the overflows, but, ... this only makes
sense for cases where the overflows are not the expected behavior.
Then again, could maybe classify code, say:
1, signed, value doesn't (or shouldn't) go out-of-range;
2, unsigned, value doesn't (or shouldn't) go out-of-range;
3, signed, value is expected to be modulo;
4, unsigned, value is expected to be modulo.
"nasal demons" types assume 1 and 4 as dominant.
Or, 1 as exclusive vs 3.
For compilers, we often need to assume 3 and 4.
Because, failure to uphold 3 results in misbehaving programs.
And, if 3 were uncommon, RISC-V's "ADDW"/etc would be pure stupidity.
BGB <cr88192@gmail.com> posted:
On 5/25/2026 3:34 PM, quadi wrote:
The important property is that overflow is detected precisely.
On Mon, 25 May 2026 16:49:59 +0000, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
RISC-V: add ignores signed overflow, there is no add that traps on
signed overflow (and detecting signed overflow is pretty
involved if both operands are unknown to the compiler).
The worst of all possible semantic encodings
Although I thought that making trapping on fixed-point overflow the
default is a bad idea, I agree that making it impossible to do so, or even
test for fixed-point overflow, is a much worse idea.
Possibly true.
The lack of things like ADD-with-Carry or ADD-with-Overflow are
annoyance points on RISC-V.
Though, it is less obvious what a useful behavior is at the language level:
"signal()" ? ...
Something like try/catch (mostly N/A to C)?
Something similar to FENV_ACCESS?
...
Whether {trap, signal, throw} is performed is an environmental choice
not an ISA choice.
Well, and that if trapping were applied globally:
Overhead due to trap detection/handling code causing excessive bloat;
Overflows traps from any code that naively assumes wrap-on-overflow
semantics;
...
In some codebases, it is already enough of a pain to hunt and fix all
the out-of-bounds and uninitialized variables mess.
Signed integer overflows would likely "turn it up to 11";
Then, how does one fix it? Ask that people start adding a bunch of casts
to make it work?...
One might say:
Add "if()" cases to deal with the overflows, but, ... this only makes
sense for cases where the overflows are not the expected behavior.
If(overflow(??)) requires some flag to carry overflow from point of
detection to if(()).
And what happens if there is more than 1 overflow ??
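One possible answer to both questions is a sticky flag, loosely in the spirit of the floating-point exception flags. The sketch below is a software model only, assuming GCC or Clang for `__builtin_add_overflow`; the struct `ovf_ctx` and the function `add_sticky` are hypothetical names, not anything proposed in this thread or found in a real ISA.

```c
#include <stdbool.h>

/* Hypothetical sticky overflow state: once set it stays set until the
   program clears it, and a counter records how many operations
   overflowed in between. */
struct ovf_ctx {
    bool overflowed;
    unsigned count;
};

/* Adds a and b, records any signed overflow in ctx, and returns the
   wrapped result so that computation can continue. */
static long add_sticky(struct ovf_ctx *ctx, long a, long b)
{
    long r;
    if (__builtin_add_overflow(a, b, &r)) {   /* GCC/Clang extension */
        ctx->overflowed = true;
        ctx->count++;
    }
    return r;   /* on overflow the builtin stores the wrapped value */
}
```

A later `if (ctx.overflowed)` then sees that at least one overflow happened, but not which operation caused it, which is the loss of precision that a trap at the point of detection avoids.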
Then again, could maybe classify code, say:
1, signed, value doesn't (or shouldn't) go out-of-range;
2, unsigned, value doesn't (or shouldn't) go out-of-range;
3, signed, value is expected to be modulo;
4, unsigned, value is expected to be modulo.
5, a language hint about in-range, wrap, trap, signal, throw
"nasal demons" types assume 1 and 4 as dominant.
Or, 1 as exclusive vs 3.
For compilers, we often need to assume 3 and 4.
Because, failure to uphold 3 results in misbehaving programs.
And, if 3 were uncommon, RISC-V's "ADDW"/etc would be pure stupidity.
You would prefer::
AND R7,Rleft,#~(~0<<31)
AND R8,Rright,#~(~0<<31)
ADD Rd,R7,R8
AND Rd,Rd,#~(~0<<31)
That is ADDW range limits operands and performs a shorter ADD.
Matching C's int a,b; semantic. In general the integer instructions
ending with W apply C's int properties to the arithmetic. If compilers
were (WERE) really good at range determination those instructions would
be unnecessary--but they are not.
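A C model of the ADDW behaviour may make this concrete. It is a sketch based on the RISC-V definition as I understand it (add the low 32 bits of both sources, ignore overflow, sign-extend the 32-bit result to 64 bits); the function name `addw_model` is hypothetical.

```c
#include <stdint.h>

/* Model of RV64 ADDW: only the low 32 bits of each operand take part,
   the carry out of bit 31 is discarded, and bit 31 of the result is
   sign-extended into the upper 32 bits. */
static int64_t addw_model(int64_t rs1, int64_t rs2)
{
    uint32_t low = (uint32_t)rs1 + (uint32_t)rs2;   /* wraps modulo 2^32 */

    /* Sign-extend by hand so the code does not depend on how an
       out-of-range unsigned-to-signed conversion behaves. */
    if (low & 0x80000000u)
        return (int64_t)low - INT64_C(0x100000000);
    return (int64_t)low;
}
```

The upper halves of the inputs never influence the result, so a register holding a C int stays correctly sign-extended after every such operation without extra masking instructions.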
I (My 66000) had to put in sized integer calculation reasons, and by
doing so, gained 2%-4% in code density and a bit more in latency.
BGB <cr88192@gmail.com> posted:
On 5/25/2026 9:28 AM, Anton Ertl wrote:
Integer overflow happens far too often for trapping to be a good solution.
Even on 64-bit variables/machines ??
On 26/05/2026 01:00, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
So you can debug production/released code to find subtle errors.
I think that when an unexpected error is detected (whether it is with
hardware acceleration, like trap on overflow, or via explicit generated
code), the way to handle it depends strongly on the situation. If a
debugger is present, then it is most helpful to lead to a debugger break
so that the developer can figure out what went wrong. When not
debugging, there is no sensible default handling that works for jet
engine controllers and video game frame generators.
But I do support the aim of having the same generated code when
debugging and when shipping - I am not a fan of "release" builds and
"debug" builds. (Of course you might temporarily do builds with
different flags while chasing down a particular bug.)
David Brown wrote:
On 26/05/2026 01:00, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
What you do want is compiled code that can trap on overflow and avoid
trapping on overflow without code substitution or being re-compiled.
This way production code can avoid trapping but if the debugger is
turned on, you can trap.
Why do you consider that desirable?
So you can debug production/released code to find subtle errors.
I think that when an unexpected error is detected (whether it is with
hardware acceleration, like trap on overflow, or via explicit generated
code), the way to handle it depends strongly on the situation. If a
debugger is present, then it is most helpful to lead to a debugger break
so that the developer can figure out what went wrong. When not
debugging, there is no sensible default handling that works for jet
engine controllers and video game frame generators.
But I do support the aim of having the same generated code when
debugging and when shipping - I am not a fan of "release" builds and
"debug" builds. (Of course you might temporarily do builds with
different flags while chasing down a particular bug.)
I tend to like "Release with sometimes hard-to-grok debug info",
typically resulting in a separate file with a best effort debug map of
the executable.
Then I can at least get some help when running the debugger and trying
to binary search my way into the spot where the bug resides.
Terje
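For readers following the quoted idea of one binary that can either trap on overflow or carry on, here is a minimal software sketch in C. It is only an imitation of what the thread discusses as a hardware feature. The names `trap_on_overflow` and `checked_add` are hypothetical, `abort()` is a stand-in for a real trap or debugger break, and `__builtin_add_overflow` is assumed to be available (it is a GCC/Clang builtin, not standard C).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical run-time switch: a debugger or a startup option would set it.
   The compiled code is the same whether it is on or off. */
static volatile bool trap_on_overflow = false;

/* Hypothetical helper: 64-bit signed add that wraps silently in production
   and stops the program when the switch is on. */
static int64_t checked_add(int64_t a, int64_t b)
{
    int64_t sum;
    /* GCC/Clang builtin: stores the wrapped result in sum and
       returns true if the mathematical result did not fit. */
    if (__builtin_add_overflow(a, b, &sum) && trap_on_overflow)
        abort(); /* stand-in for a hardware trap or debugger break */
    return sum;
}
```

With the switch off, `checked_add(INT64_MAX, 1)` returns the wrapped value; with it on, the same call stops the program at the point of overflow. The software version pays for a test and branch on every operation, which is exactly the cost a hardware mode bit would avoid.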
On Mon, 25 May 2026 23:05:06 GMT, MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
BGB <cr88192@gmail.com> posted:
On 5/25/2026 9:28 AM, Anton Ertl wrote:
--------------
Integer overflow happens far too often for trapping to be a good solution.
Even on 64-bit variables/machines ??
Yes if there are options for 8/16/32 bit ops in 64 bit registers.
Encrypt the debug information (and put it in
a {1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.
MitchAlsup [2026-05-26 20:54:30] wrote:
Encrypt the debug information (and put it in a
{1234-5678-9101-1121-...} folder) so that only the owner (not
licensee) of the code can debug it.
I resent that. All code should be Free Software.
Thomas Koenig <tkoenig@netcologne.de> posted:
David Brown <david.brown@hesbynett.no> schrieb:
On 24/05/2026 23:39, quadi wrote:
On Sun, 24 May 2026 17:32:10 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
It makes sense to trap on a floating-point overflow, but trapping on an
integer overflow is usually a terrible idea.
So, detecting something went wrong and you should inform the programmer
is a bad idea ???
No, so being able to turn the trap for integer overflow on should
definitely be allowed. But that shouldn't be the default behavior.
Otherwise, programs like random number generators wouldn't work.
John Savard
That does not make sense. Code such as random number generators should
be written so that they are correct in the language they are written in.
In principle, yes.
Principle is better in theory than in practice.
In practice, people often used whatever "worked" on their systems.
Face it, the poor slug writing the code may not have the faintest
grasp of the system qualities we are discussing, and does not care
to learn as long as he can slug through the writing and his program
does not blow up catastrophically while it is under his purview.
That defines a lot of what is wrong with SW programming today.
Implementors have a certain right because they control what their
compiler does or does not do.
You would be surprised at how little influence implementors have
on compilers and other software.