In article <efXIO.169388$1m96.45507@fx15.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Kent Dickey wrote:
OK, my post was about how having a hardware trap-on-overflow instruction
(or a mode for existing ALU instructions) is useless for anything OTHER
than as a debug aid where you crash the program on overflow (you can
have a general exception handler to shut down gracefully, but "patching things
up and continuing" doesn't work). I gave details of reasons folks might
want to try to use trap-on-overflow instructions, and show how the
other cases don't make sense.
For me error detection of all kinds is useful. It just happens
to not be conveniently supported in C so no one tries it in C.
GCC's -ftrapv option is not useful for a variety of reasons.
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit, which is not what programmers need,
as it triggers for many false positives so people turn it off.
In no way was I ever arguing that checking for overflow was a bad idea,
or a language issue, or anything else. Just that CPUs should not bother
having trap-on-overflow instructions.
I understand, and I disagree with this conclusion.
I think all forms of software error detection are useful and
HW should make them simple and eliminate cost when possible.
I think I am not explaining the issue well.
I'm not arguing what you want to do with overflow. I'm trying to show that for all uses of detecting overflow other than crashing with no recovery, hardware trapping on overflow is a poor approach.
If you enable hardware traps on integer overflow, then to do anything other than crash the program would require engineering a very complex set of
data structures, roughly approximately the complexity of adding debug information to the executable, in order to make this work. As far as I know, no one in the history of computers has yet undertaken this task.
This is because each instruction which overflows would need special
handling, and the "debug" information would be needed. It would be a huge amount of compiler/linker/runtime complexity.
This is different than most "signal" handlers people have written, where simple inspection of the instruction which failed and the address involved allows it to be "handled". But to do anything other than crash, each instruction which overflows needs special handling unique to that instruction and dependent on what the compiler was in the middle of doing when the overflow happened. This is why trapping just isn't a good idea.
I'm just explaining why trap-on-overflow has gone away, because it's
almost completely useless: hardware trap on overflow is only good for the case that you want to crash on integer overflow. Branch-on-overflow is the correct approach--the compiler can branch to either a trapping instruction (if you just want to crash), or for all other cases of detecting overflow, the compiler branches to "fixup" code.
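As an illustration of the branch-on-overflow approach described above, here is a minimal C sketch. It assumes a GCC- or Clang-compatible compiler that provides `__builtin_add_overflow`; the function names `add_or_trap` and `add_saturating` are hypothetical, and saturation is just one possible "fixup", not something proposed in the post.

```c
#include <limits.h>
#include <stdlib.h>

/* Crash-on-overflow flavour: the overflow branch leads to a trap.
   abort() stands in for a trapping instruction here. */
int add_or_trap(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        abort();
    }
    return sum;
}

/* Fixup flavour: the overflow branch leads to recovery code.
   Signed addition can only overflow when both operands have the
   same sign, so the sign of b tells us which limit was crossed. */
int add_saturating(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        return (b > 0) ? INT_MAX : INT_MIN;
    }
    return sum;
}
```

In both functions the non-overflowing path is one add plus one normally not-taken branch; only the target of that branch differs.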
And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.
Kent
Kent Dickey wrote:
And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.
Kent
Because C doesn't require it. That does not make the capability useless.
On 2024-10-07 22:12, MitchAlsup1 wrote:
On Mon, 7 Oct 2024 18:55:26 +0000, Kent Dickey wrote:
In article <efXIO.169388$1m96.45507@fx15.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Kent Dickey wrote:
In article <O2DHO.184073$kxD8.113118@fx11.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Kent Dickey wrote:
In no way was I ever arguing that checking for overflow was a bad
idea,
or a language issue, or anything else. Just that CPUs should not
bother
having trap-on-overflow instructions.
I understand, and I disagree with this conclusion.
I think all forms of software error detection are useful and
HW should make them simple and eliminate cost when possible.
I think I am not explaining the issue well.
I'm not arguing what you want to do with overflow. I'm trying to show
that for all uses of detecting overflow other than crashing with no
recovery, hardware trapping on overflow is a poor approach.
If you enable hardware traps on integer overflow, then to do anything
other than crash the program would require engineering a very complex
set of data structures, roughly approximately the complexity of adding
debug information to the executable, in order to make this work. As
far as I know, no one in the history of computers has yet undertaken
this task.
And yet, this is exactly the kind of data C++ needs in order to
use its Try-Throw-Catch exception model. The stack walker needs
to know where on the stack is the list of stuff to free on block
exit, where are the preserved registers and how many, ...
Ada too.
There are at least two ways to do that (at least for Ada, probably also
for C++):
- Dynamically maintain a stack-like data structure (a chain, linked
list) that describes the current nesting of "code blocks" and their exception handlers. Whenever the program enters a block with an
exception handler, there is entry code that pushes the description of
that exception handler on this chain, including the address of its code;
and vice versa pop on exiting such a block.
- Statically construct a mapping table that is stored in the executable
and maps code ranges to exception handlers.
Ada implementations started with the dynamic method, which is simpler
but adds some execution cost to all blocks with exception handlers, even
if an exception never happens. Current implementations tend to the
static method, also called "zero-cost exceptions" because there is no
extra execution cost for blocks with exception handlers /unless/ an exception does occur.
EricP <ThatWouldBeTelling@thevillage.com> writes:
Kent Dickey wrote:
And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.
Kent
Because C doesn't require it. That does not make the capability useless.
Other languages do require overflow detection (e.g. COBOL ON OVERFLOW clause),
and it's best done with conditional branches, not traps.
Except you keep missing the point:
no one has a handler for integer overflow because it should never
happen. Just like no one has a handler for memory read parity errors.
When you wrote C code using signed integers, *YOU* guaranteed to the
compiler that your code would never overflow. Overflow checking just
detects when you have made an error, just like array bounds checking,
or divide by zero checking.
This is not something being done *to you* against your will,
this is something that you *ask for* because it helps detect your
errors.
Doing it in hardware just makes it efficient.
Kent Dickey wrote:
In article <efXIO.169388$1m96.45507@fx15.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Kent Dickey wrote:
OK, my post was about how having a hardware trap-on-overflow instruction
(or a mode for existing ALU instructions) is useless for anything OTHER
than as a debug aid where you crash the program on overflow (you can
have a general exception handler to shut down gracefully, but "patching things
up and continuing" doesn't work). I gave details of reasons folks might
want to try to use trap-on-overflow instructions, and show how the
other cases don't make sense.
For me error detection of all kinds is useful. It just happens
to not be conveniently supported in C so no one tries it in C.
GCC's -ftrapv option is not useful for a variety of reasons.
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit, which is not what programmers need,
as it triggers for many false positives so people turn it off.
In no way was I ever arguing that checking for overflow was a bad idea,
or a language issue, or anything else. Just that CPUs should not bother
having trap-on-overflow instructions.
I understand, and I disagree with this conclusion.
I think all forms of software error detection are useful and
HW should make them simple and eliminate cost when possible.
I think I am not explaining the issue well.
I'm not arguing what you want to do with overflow. I'm trying to show that
for all uses of detecting overflow other than crashing with no recovery,
hardware trapping on overflow is a poor approach.
If you enable hardware traps on integer overflow, then to do anything other
than crash the program would require engineering a very complex set of
data structures, roughly approximately the complexity of adding debug
information to the executable, in order to make this work. As far as I know,
no one in the history of computers has yet undertaken this task.
VAX/VMS 1.0 in 1979 had stack-based Structured Exception Handling (SEH).
And of course carried it over onto Alpha/VMS.
WinNT had SEH in its first version in 1992 for MIPS and 386,
supported both by the C compiler and OS. Win95 had support too.
In WinNT MS added __try __except keywords to the C language to support
it for both themselves inside the OS and for users.
Some languages like C++ and Ada have native support for SEH.
There can be differences in what behaviors languages expect to be supported,
like can one continue from an exception, or pass arguments to a handler.
This is because each instruction which overflows would need special
handling, and the "debug" information would be needed. It would be a huge
amount of compiler/linker/runtime complexity.
General structured exception handling is not as complex or expensive
as you think. It's in the multiple 1000's of instructions range
(so don't use it gratuitously).
WinNT implemented it differently on 32-bit x86 and 64-bit x64,
with the x64 method being more efficient because the compiler
does most of the work. On x64 the compiler just needs to supply
bounding low and high RIP's for *just the exception handler code*.
The cost of delivering a structured exception is the OS basically
delivers an exception to a thread dispatcher similar to a signal,
but for structured exceptions that dispatcher code acts differently.
The thread's frame pointer is the head of a single linked list of
stack frames. It starts at the bottom of stack pointed to by the
frame pointer and scans backward, taking the RIP for each context
and looking in a small table of handler bounds to see if it is in range.
If there is a handler, it is called. If it handles it, great.
Otherwise it continues to scan backwards through the stack frames.
If it gets to the top of stack and there is no handler, it invokes the thread's last chance handler, and if that doesn't intercept the exception,
it terminates the thread.
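The table-driven lookup described above can be sketched in C as follows. This is a simplified model, not the actual Windows dispatcher: the types `handler_range` and `frame`, the function `find_handler`, and the integer handler ids are all hypothetical, and the sketch stops at the first matching handler rather than asking each handler whether it accepts the exception.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: code addresses in [low, high) are covered by `handler`.
   A real table would hold a handler code address, not an id. */
struct handler_range {
    uintptr_t low;
    uintptr_t high;
    int handler;
};

/* Simplified stack frame: the instruction address active in that frame
   plus a link to the caller's frame (NULL at the outermost frame). */
struct frame {
    uintptr_t pc;
    const struct frame *caller;
};

#define LAST_CHANCE_HANDLER (-1)

/* Walk from the innermost frame outward and return the first handler
   whose range covers that frame's instruction address. */
int find_handler(const struct frame *innermost,
                 const struct handler_range *table, size_t n)
{
    for (const struct frame *f = innermost; f != NULL; f = f->caller) {
        for (size_t i = 0; i < n; i++) {
            if (f->pc >= table[i].low && f->pc < table[i].high) {
                return table[i].handler;
            }
        }
    }
    return LAST_CHANCE_HANDLER;
}
```

Nothing in this lookup runs unless an exception is actually delivered, which is why the table-based scheme adds no cost to the normal path.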
This is different than most "signal" handlers people have written, where
simple inspection of the instruction which failed and the address involved
allows it to be "handled". But to do anything other than crash, each
instruction which overflows needs special handling unique to that instruction
and dependent on what the compiler was in the middle of doing when the
overflow happened. This is why trapping just isn't a good idea.
Except you keep missing the point:
no one has a handler for integer overflow because it should never happen. Just like no one has a handler for memory read parity errors.
When you wrote C code using signed integers, *YOU* guaranteed to the
compiler that your code would never overflow. Overflow checking just
detects when you have made an error, just like array bounds checking,
or divide by zero checking.
This is not something being done *to you* against your will,
this is something that you *ask for* because it helps detect your errors. Doing it in hardware just makes it efficient.
A better exception usage example might be a routine that enables exceptions for floating point underflow where the FPU traps to a handler that zeros
the value and logs where it happened so someone can look at it later,
then continues with its calculation.
I'm just explaining why trap-on-overflow has gone away, because it's
almost completely useless: hardware trap on overflow is only good for the
case that you want to crash on integer overflow. Branch-on-overflow is the
correct approach--the compiler can branch to either a trapping instruction
(if you just want to crash), or for all other cases of detecting overflow,
the compiler branches to "fixup" code.
But crash on overflow *IS* the correct behavior in 99.999% of cases.
Branch on overflow is ALSO needed in certain rare cases and I showed how
it is easily detected.
And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.
Kent
Because C doesn't require it. That does not make the capability useless.
Removing error detectors does not make the errors go away,
just your knowledge of them.
On 2024-10-09 2:16 p.m., EricP wrote:
Kent Dickey wrote:
But crash on overflow *IS* the correct behavior in 99.999% of cases.
Branch on overflow is ALSO needed in certain rare cases and I showed how
it is easily detected.
And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.
Kent
Because C doesn't require it. That does not make the capability useless.
Removing error detectors does not make the errors go away,
just your knowledge of them.
Slightly confused on trap versus branch. Trapping on overflow is not a
good solution, but a branch on overflow is? A trap is just a slow
branch. The reason for trapping was to improve code density and
non-exceptional performance.
If it is the overhead of performing a trap operation that is the issue,
then a special register could be dedicated to holding the overflow
handler address, and instructions defined to automatically jump through
the overflow handler address register (a branch target address
register).
Overflow detecting instructions are just a fusion of the instruction and
the following branch on overflow operation.
addjo r1,r2,r3 <- does a jump (instead of a trap) to branch register #7
for instance, on overflow.
Having an overflow branch register might be better for code density / performance.
On Wed, 9 Oct 2024 20:12:40 +0000, Robert Finch wrote:
On 2024-10-09 2:16 p.m., EricP wrote:
Kent Dickey wrote:
But crash on overflow *IS* the correct behavior in 99.999% of cases.
Branch on overflow is ALSO needed in certain rare cases and I showed how
it is easily detected.
And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.
Kent
Because C doesn't require it. That does not make the capability useless.
Removing error detectors does not make the errors go away,
just your knowledge of them.
Slightly confused on trap versus branch. Trapping on overflow is not a
good solution, but a branch on overflow is? A trap is just a slow
branch. The reason for trapping was to improve code density and
non-exceptional performance.
If it is the overhead of performing a trap operation that is the issue,
x86 has seriously distorted people's view on how much overhead is
associated with a trap*. MIPS had trap handlers measuring in the
17 cycle range both getting to the handler, handling the exception,
and getting back to the instruction that trapped. Since GBOoO windows
have mispredicted branches in this kind of latency, too; then a
properly designed architecture should be able to do similarly to MIPS.
Whereas x86 may take 1,000 cycles to get to the handler. This is due
to all the Descriptor table stuff, call-gates, protection rings, and segmentation.
(*) trap == exception == fault == any unpredicted control flow
caused by the instruction stream itself (SVC-et-al not included
because it is requested by the instruction stream).
then a special register could be dedicated to holding the overflow
handler address, and instructions defined to automatically jump through
the overflow handler address register (a branch target address
register).
Overflow detecting instructions are just a fusion of the instruction and
the following branch on overflow operation.
addjo r1,r2,r3 <- does a jump (instead of a trap) to branch register #7
for instance, on overflow.
Having an overflow branch register might be better for code density /
performance.
What if you want to handle multiply overflow differently than
addition overflow ??
Scott Lurndal wrote:
EricP <ThatWouldBeTelling@thevillage.com> writes:
Kent Dickey wrote:
And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.
Kent
Because C doesn't require it. That does not make the capability useless.
and it's best done with conditional branches, not traps.
Then you use the overflow branching form for those situations
where you have a specific local overflow handler. Nothing stops that.
But that is not a justification for getting rid of overflow trapping
instructions altogether, as Kent was making. And actually it looks to me,
not knowing Cobol, like it should use overflow trapping instructions
UNLESS there is an ON OVERFLOW clause. i.e. that the default should be to
treat overflow as an error unless you explicitly state how to handle it.
The single most canonical test for IBM PC compatibility was Microsoft's
Flight Simulator, taking off from the now demolished Meigs Field in
Chicago.
That game used the OS and BIOS for the loading of the game, and then
went on to direct hardware access for pretty much the rest of the
playing time.
In all cases the vendor of GPU changed ...
On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
ARM was rather late to the RISC game, this might have been literally
true.
ARM was rather early to the RISC game. Shipped for profit since late
1986.
I can remember Flight Simulator being used as the benchmark for
compatibility as far back as 1985. A report on a computer show mentioned
that clone makers were demoing it running on their products.
This is why I feel the term “IBM compatible” was misleading, it should
have been “Microsoft compatible” from at least that point on.
On Mon, 7 Oct 2024 22:26:58 +0300, Michael S wrote:
On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
ARM was rather late to the RISC game, this might have been literally
true.
ARM was rather early to the RISC game. Shipped for profit since late
1986.
Shipped in an actual PC, the Acorn Archimedes range.
That was the first time I ever saw a 3D shaded rendition of a flag waving,
on a computer, generated in real time. No other machine could do it,
unless you got up to the really expensive Unix workstation class (e.g.
SGI, custom Evans & Sutherland hardware etc).
Maybe all add/sub/etc opcodes that are immediately followed by an INTO
could be fused into a single ADDO/SUBO/etc version that takes zero extra
cycles as long as the trap part isn't hit?
But then, risc processors mostly, started using exceptions for housekeeping
- SPARC for register window sliding, Alpha for byte, word and misaligned
memory access
The solution for Alpha was to add back the byte and word instructions,
and add misaligned access support to all memory ops.
Kent Dickey wrote:[...]
GCC's -ftrapv option is not useful for a variety of reasons....
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit, which is not what programmers need,
as it triggers for many false positives so people turn it off.
So why should any hardware include an instruction to trap-on-overflow?
Because ALL the negative speed and code size consequences do not occur.
That's correct about intrinsics, but incorrect about ADCX/ADOX.
The latter can be moderately helpful in special situations, esp.
128b * 128b => 256b multiplication, but it is never necessary
and for addition/subtraction is not needed at all.
Michael S <already5chosen@yahoo.com> writes:
That's correct about intrinsics, but incorrect about ADCX/ADOX.
The latter can be moderately helpful in special situations, esp.
128b * 128b => 256b multiplication, but it is never necessary
and for addition/subtraction is not needed at all.
They are useful if there are two strings of additions. This happens naturally in wide multiplication (also beyond 256b results). But it
also happens when you add three multi-precision numbers (say, X, Y,
Z): You need C for the carry of XYi=X[i]+Y[i]+C, and O for the carry
of XYZ[i]=XYi+Z[i]+O. If you have ADCX/ADOX, you can do both
additions in one loop, so XYi can be in a register and does not need
to be stored. If you don't have these instructions, only ADC, you
need one loop to compute X+Y and store the result in memory, and one
loop to compute XY+Z, i.e., the lack of ADCX/ADOX results in
substantial additional cost.
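The single-loop, two-carry-chain structure described above can be sketched in portable C. This is an illustration only: the function name `add3` is hypothetical, the carries are kept in ordinary variables `c` and `o` (playing the roles of the C and O flags), and no claim is made that a compiler turns this into ADCX/ADOX.

```c
#include <stddef.h>
#include <stdint.h>

/* out = x + y + z over n 64-bit limbs, least significant limb first.
   Returns the total carry out of the top limb (0, 1 or 2). */
unsigned add3(uint64_t *out, const uint64_t *x, const uint64_t *y,
              const uint64_t *z, size_t n)
{
    unsigned c = 0;  /* carry chain of X + Y        (role of the C flag) */
    unsigned o = 0;  /* carry chain of (X + Y) + Z  (role of the O flag) */
    for (size_t i = 0; i < n; i++) {
        /* XYi = X[i] + Y[i] + C; at most one of the two adds can wrap. */
        uint64_t t = x[i] + c;
        unsigned c1 = t < c;
        uint64_t xy = t + y[i];
        c = c1 | (xy < t);

        /* XYZ[i] = XYi + Z[i] + O; XYi stays in a local, never stored. */
        uint64_t u = xy + o;
        unsigned o1 = u < o;
        uint64_t xyz = u + z[i];
        o = o1 | (xyz < u);

        out[i] = xyz;
    }
    return c + o;
}
```

With only one carry available, the same computation would need one loop that stores X+Y to memory and a second loop that reloads it to add Z.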
If you add 4 multi-precision numbers, AMD64 with ADX runs out of carry
bits, so you have to spend the overhead of an additional loop (but not
of two additional loops as without ADCX/ADOX).
With carry bits in the general purpose registers <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf> and 30 GPRs
(one is zero, one is sp), you can add 14 multi-precision numbers per
loop: 14 GPRs for source addresses, 1 GPR for the target address, 1
for the loop counter, 13 registers for loop-carried carry flags.
Of course, the question is if this kind of computation is needed
frequently enough to justify this kind of extension. For
multi-precision multiplication and squaring, Intel considered the
frequency relevant enough to introduce ADCX/ADOX/MULX.
- anton
On Mon, 7 Oct 2024 13:05:53 +0300, Michael S wrote:
In all cases the vendor of GPU changed ...
That, too, added to the problem, in that the software folks had to
rewrite all the performance-intensive bits yet again for the new
machine.
OpenCL never took off because the GPGPU market simply isn’t
competitive enough. NVidia is dominant, AMD plays second fiddle, and
that’s it.
To their defense, AMD's use of the term ROP didn't last for long.
K8 manuals use the better term micro-ops. I don't have K7 manual to
look, but it seems to me that it uses the same terminology as K8.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
I can remember Flight Simulator being used as the benchmark for
compatibility as far back as 1985. A report on a computer show mentioned
that clone makers were demoing it running on their products.
This is why I feel the term “IBM compatible” was misleading, it should
have been “Microsoft compatible” from at least that point on.
It was IBM PC compatible, and that was not misleading, because that's
what it was about.
... lots of hardware was Microsoft DOS compatible ...
On Fri, 11 Oct 2024 01:41:58 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
OpenCL never took off because the GPGPU market simply isn’t competitive
enough. NVidia is dominant, AMD plays second fiddle, and that’s it.
I am not sure about dog-tail relationships.
EricP <ThatWouldBeTelling@thevillage.com> writes:
Kent Dickey wrote:[...]
GCC's -ftrapv option is not useful for a variety of reasons....
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit, which is not what programmers need,
as it triggers for many false positives so people turn it off.
So why should any hardware include an instruction to trap-on-overflow?
Because ALL the negative speed and code size consequences do not occur.
Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
18.1.0, I get a 15-instruction sequence which does not include add
(the trap-on-overflow version).
MIPS gcc 14.2.0 generates a sequence that includes
jal __addvsi3
i.e., just as for x86-64. Similar for MIPS64 with these compilers.
Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
way of checking overflow at all.
- anton
EricP <ThatWouldBeTelling@thevillage.com> writes:
But then, risc processors mostly, started using exceptions for housekeeping
- SPARC for register window sliding, Alpha for byte, word and misaligned
memory access
On Alpha the assembler expands byte, word and unaligned access
mnemonics into sequences of machine instructions; if you compile for
BWX extensions, byte and word mnemonics get compiled into BWX
instructions. If the machine does not have the BWX extensions and it encounters a BWX instruction, the result is an illegal instruction
signal at least on Linux. This terminates your typical program, so
it's not at all frequent.
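To illustrate what expanding a byte access into a sequence of ordinary instructions amounts to, here is a C sketch that reads and writes single bytes using only aligned 64-bit loads and stores. It is not the actual Alpha instruction sequence; the function names `load_byte` and `store_byte` are hypothetical, and little-endian byte numbering within each 64-bit word is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Load the byte at byte offset `off` using one aligned 64-bit load. */
uint8_t load_byte(const uint64_t *mem, size_t off)
{
    uint64_t q = mem[off / 8];               /* aligned quadword load */
    return (uint8_t)(q >> (8 * (off % 8)));  /* shift the byte down, truncate */
}

/* Store a byte at byte offset `off` as load, mask, insert, store. */
void store_byte(uint64_t *mem, size_t off, uint8_t value)
{
    unsigned shift = 8 * (unsigned)(off % 8);
    uint64_t q = mem[off / 8];               /* aligned quadword load */
    q &= ~((uint64_t)0xff << shift);         /* clear the target byte */
    q |= (uint64_t)value << shift;           /* insert the new byte */
    mem[off / 8] = q;                        /* aligned quadword store */
}
```

No step in either function can fault on alignment, so a sequence of this shape needs no trap-and-emulate support.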
Concerning unaligned accesses, if you use a load or store that
requires alignment, Digital OSF/1 (and the later versions with various
names) by default produced a signal rather than fixing it up, so again programs are typically terminated, and the exception is not at all
frequent. There is a system call and a tool (uac) that allows telling
the OS to fix up unaligned accesses, but it played no role in my
experience while I was still using Digital OSF/1 (including its
successors).
On Linux the default behaviour was to fix up the unaligned accesses
and to log that in the system log. There were a few such messages in
the log per day, so that obviously was not a frequent occurrence,
either. I wrote a program that allowed me to change the behaviour <https://www.complang.tuwien.ac.at/anton/uace.c>, mainly because I
wanted to get a signal when an unaligned access happens.
As for the unaligned-access mnemonics, these were obviously barely
used: I found that gas generates wrong code for ustq several years
after Alpha was introduced, so obviously no software running under
Linux has used this mnemonic.
The solution for Alpha was to add back the byte and word instructions,
and add misaligned access support to all memory ops.
Alpha added BWX instructions, but not because it had used trapping to
emulate them earlier; old or portable binaries continued to use
instruction sequences. Alpha traps when you do, e.g., an unaligned
ldq in all Alpha implementations I have had contact with (up to a
800MHz 21264B).
- anton
I see where I'm going wrong: I'm trying to talk about the machines
designed to run MS-DOS and later Windows, not just the CPUs. The vast
range of hardware that all had substantial degrees of compatibility as regards booting, busses and so on. Those things let their manufacturers compete for the DOS and Windows market, whereas x86-based machines that weren't PC-compatible only succeeded in quite specialised niches.
Those hardware suppliers did not close off access to the more advanced features of i386 onwards, because they had no reason to, and that let
Linux take advantage of all that hardware when it came along. That's the point I was failing to make.