Forum: War Ensemble BBS

Re: is Vax addressing sane today

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Oct 9 14:16:51 2024

From Newsgroup: comp.arch

Kent Dickey wrote:

In article <efXIO.169388$1m96.45507@fx15.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:

Kent Dickey wrote:

OK, my post was about how having a hardware trap-on-overflow instruction
(or a mode for existing ALU instructions) is useless for anything OTHER
than as a debug aid where you crash the program on overflow (you can
have a general exception handler to shut down gracefully, but "patching things
up and continuing" doesn't work). I gave details of reasons folks might
want to try to use trap-on-overflow instructions, and show how the
other cases don't make sense.

For me error detection of all kinds is useful. It just happens
to not be conveniently supported in C so no one tries it in C.

GCC's -ftrapv option is not useful for a variety of reasons.
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit which is not what programmers need
as it triggers for many false positives so people turn it off.

In no way was I ever arguing that checking for overflow was a bad idea,
or a language issue, or anything else. Just that CPUs should not bother
having trap-on-overflow instructions.

I understand, and I disagree with this conclusion.
I think all forms of software error detection are useful and
HW should make them simple and eliminate cost when possible.

I think I am not explaining the issue well.

I'm not arguing what you want to do with overflow. I'm trying to show that for all uses of detecting overflow other than crashing with no recovery, hardware trapping on overflow is a poor approach.

If you enable hardware traps on integer overflow, then to do anything other than crash the program would require engineering a very complex set of
data structures, roughly approximately the complexity of adding debug information to the executable, in order to make this work. As far as I know, no one in the history of computers has yet undertaken this task.

VAX/VMS 1.0 in 1979 had stack-based Structured Exception Handling (SEH).
And of course carried it over onto Alpha/VMS.
WinNT had SEH in its first version in 1992 for MIPS and 386
supported both by the C compiler and OS. Win95 had support too.
In WinNT MS added __try __except keywords to the C language to support
it for both themselves inside the OS and for users.

Some languages like C++ and Ada have native support for SEH.
There can be differences in what behaviors languages expect to be supported, like can one continue from an exception, or pass arguments to a handler.

This is because each instruction which overflows would need special
handling, and the "debug" information would be needed. It would be a huge amount of compiler/linker/runtime complexity.

General structured exception handling is not as complex or expensive
as you think. It's in the multiple 1000's of instructions range
(so don't use it gratuitously).

WinNT implemented it differently on 32-bit x86 and 64-bit x64,
with the x64 method being more efficient because the compiler
does most of the work. On x64 the compiler just needs to supply
bounding low and high RIP's for *just the exception handler code*.

The cost of delivering a structured exception is the OS basically
delivers an exception to a thread dispatcher similar to a signal,
but for structured exceptions that dispatcher code acts differently.
The thread's frame pointer is the head of a single linked list of
stack frames. It starts at the bottom of stack pointed to by the
frame pointer and scans backward, taking the RIP for each context
and looking in a small table of handler bounds to see if it is in range.
If there is a handler, it is called. If it handles it, great.
Otherwise it continues to scan backwards through the stack frames.
If it gets to the top of stack and there is no handler, it invokes the
thread's last chance handler, and if that doesn't intercept the exception,
it terminates the thread.
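
The table-driven scan described above can be sketched in C roughly as follows. This is only a model under stated assumptions: the types frame and handler_range, the handler signature, and the PC values in demo_dispatch are all made up for illustration and are not the actual Windows structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types, not the real OS data structures. */
typedef int (*handler_fn)(int code);    /* nonzero = "I handled it" */

struct handler_range {                  /* one row of the compiler-built table */
    uintptr_t  low_pc, high_pc;         /* guarded code range [low_pc, high_pc) */
    handler_fn handler;
};

struct frame {                          /* one node of the frame-pointer chain */
    struct frame *caller;               /* next older frame, NULL at the outermost */
    uintptr_t     pc;                   /* PC at which this frame is suspended */
};

/* Linear search of the handler-bounds table for a range covering pc. */
static const struct handler_range *
find_range(const struct handler_range *tab, size_t n, uintptr_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tab[i].low_pc && pc < tab[i].high_pc)
            return &tab[i];
    return NULL;
}

/* Walk from the faulting frame toward the outermost frame. Returns 1 if a
 * handler accepted the exception, 0 if the caller should fall back to the
 * last-chance handler. */
int dispatch_exception(const struct frame *fp,
                       const struct handler_range *tab, size_t n, int code)
{
    for (; fp != NULL; fp = fp->caller) {
        const struct handler_range *r = find_range(tab, n, fp->pc);
        if (r != NULL && r->handler(code))
            return 1;
    }
    return 0;
}

/* Illustrative handlers and a three-frame stack. */
static int decline_all(int code)   { (void)code; return 0; }
static int accept_code_1(int code) { return code == 1; }

int demo_dispatch(int code)
{
    static const struct handler_range tab[] = {
        { 0x1000, 0x1100, decline_all },
        { 0x2000, 0x2100, accept_code_1 },
    };
    struct frame outer  = { NULL,    0x2010 };  /* covered, accepts code 1 */
    struct frame middle = { &outer,  0x3000 };  /* no handler covers this PC */
    struct frame inner  = { &middle, 0x1010 };  /* covered, but declines */
    return dispatch_exception(&inner, tab, 2, code);
}
```

Nothing in this model runs on the normal path: the table is constant data and is only consulted once an exception is being delivered.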

This is different than most "signal" handlers people have written, where simple inspection of the instruction which failed and the address involved allows it to be "handled". But to do anything other than crash, each instruction which overflows needs special handling unique to that instruction and dependent on what the compiler was in the middle of doing when the overflow happened. This is why trapping just isn't a good idea.

Except you keep missing the point:
no one has a handler for integer overflow because it should never happen.
Just like no one has a handler for memory read parity errors.

When you wrote C code using signed integers, *YOU* guaranteed to the
compiler that your code would never overflow. Overflow checking just
detects when you have made an error, just like array bounds checking,
or divide by zero checking.

This is not something being done *to you* against your will,
this is something that you *ask for* because it helps detect your errors.
Doing it in hardware just makes it efficient.

A better exception usage example might be a routine that enables exceptions
for floating point underflow where the FPU traps to a handler that zeros
the value and logs where it happened so someone can look at it later,
then continues with its calculation.
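
As a rough C sketch of that "zero it, log it, carry on" policy: the version below does NOT use a hardware trap. It is a software stand-in that checks the result after the fact, and the function name, the counter, and the definition of underflow used here (nonzero operands, result below the smallest normal double) are assumptions made for the example.

```c
#include <float.h>
#include <stdio.h>

/* Count of underflows seen so far (illustrative bookkeeping only). */
unsigned long underflow_count;

/* Multiply x by y. If both operands are nonzero but the product is smaller
 * in magnitude than the smallest normal double (subnormal or zero), log the
 * call site, count it, and continue with 0.0 instead. */
double mul_flush_underflow(double x, double y, const char *site)
{
    double r = x * y;
    double mag = (r < 0.0) ? -r : r;

    if (x != 0.0 && y != 0.0 && mag < DBL_MIN) {
        fprintf(stderr, "underflow at %s\n", site);
        underflow_count++;
        return 0.0;
    }
    return r;
}
```

With a real trapping FPU the check would vanish from the normal path and the logging and zeroing would live in the handler.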

I'm just explaining why trap-on-overflow has gone away, because it's
almost completely useless: hardware trap on overflow is only good for the case that you want to crash on integer overflow. Branch-on-overflow is the correct approach--the compiler can branch to either a trapping instruction (if you just want to crash), or for all other cases of detecting overflow, the compiler branches to "fixup" code.

But crash on overflow *IS* the correct behavior in 99.999% of cases.
Branch on overflow is ALSO needed in certain rare cases and I showed how
it is easily detected.
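
For concreteness, here is a small C sketch of the two policies being argued about, written with the GCC/Clang builtin __builtin_add_overflow. The function names are invented for the example, and abort() merely stands in for a trapping instruction.

```c
#include <limits.h>
#include <stdlib.h>

/* Crash-on-overflow policy: the overflow branch leads straight to abort(),
 * which plays the role of the trap. */
int add_or_crash(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))   /* GCC/Clang builtin */
        abort();
    return sum;
}

/* Local-fixup policy: the overflow branch leads to handler code chosen at
 * this call site, here saturation. On signed add overflow both operands
 * have the same sign, so the sign of b gives the direction. */
int add_saturating(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))
        return (b > 0) ? INT_MAX : INT_MIN;
    return sum;
}
```

The first form is what a trapping add would give for free; the second is the case where a branch to local code is actually wanted.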

And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.

Kent

Because C doesn't require it. That does not make the capability useless.

Removing error detectors does not make the errors go away,
just your knowledge of them.

--- Synchronet 3.20a-Linux NewsLink 1.114

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Wed Oct 9 18:42:42 2024

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:

Kent Dickey wrote:

And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.

Kent

Because C doesn't require it. That does not make the capability useless.

Other languages do require overflow detection (e.g. COBOL ON OVERFLOW clause), and it's best done with conditional branches, not traps.

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Oct 9 14:43:49 2024

From Newsgroup: comp.arch

Niklas Holsti wrote:

On 2024-10-07 22:12, MitchAlsup1 wrote:

On Mon, 7 Oct 2024 18:55:26 +0000, Kent Dickey wrote:

In article <efXIO.169388$1m96.45507@fx15.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:

Kent Dickey wrote:

In article <O2DHO.184073$kxD8.113118@fx11.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:

Kent Dickey wrote:

In no way was I ever arguing that checking for overflow was a bad
idea,
or a language issue, or anything else. Just that CPUs should not
bother
having trap-on-overflow instructions.

I understand, and I disagree with this conclusion.
I think all forms of software error detection are useful and
HW should make them simple and eliminate cost when possible.

I think I am not explaining the issue well.

I'm not arguing what you want to do with overflow. I'm trying to show
that for all uses of detecting overflow other than crashing with no
recovery, hardware trapping on overflow is a poor approach.

If you enable hardware traps on integer overflow, then to do anything
other than crash the program would require engineering a very complex
set of data structures, roughly approximately the complexity of adding
debug information to the executable, in order to make this work. As
far as I know, no one in the history of computers has yet undertaken
this task.

And yet, this is exactly the kind of data C++ needs in order to
use its Try-Throw-Catch exception model. The stack walker needs
to know where on the stack is the list of stuff to free on block
exit, where are the preserved registers and how many, ...

Ada too.

There are at least two ways to do that (at least for Ada, probably also
for C++):

- Dynamically maintain a stack-like data structure (a chain, linked
list) that describes the current nesting of "code blocks" and their exception handlers. Whenever the program enters a block with an
exception handler, there is entry code that pushes the description of
that exception handler on this chain, including the address of its code;
and vice versa pop on exiting such a block.

Usually it uses the frame pointer to create a single linked list of
call frames to walk backwards when scanning for an exception handler.

There is also control block information that needs to be dynamically
set up for each handler, so there is some runtime overhead.

- Statically construct a mapping table that is stored in the executable
and maps code ranges to exception handlers.

The static method moves as much as possible of the control block
information out of the dynamic context, lowering the set up cost
for a handler.

Ada implementations started with the dynamic method, which is simpler
but adds some execution cost to all blocks with exception handlers, even
if an exception never happens. Current implementations tend to the
static method, also called "zero-cost exceptions" because there is no
extra execution cost for blocks with exception handlers /unless/ an exception does occur.

Windows used the dynamic method in 32-bit x86 OS and switched
to static method on 64-bit x64 as it has lower runtime overhead.
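
A minimal C sketch of the dynamic method, using setjmp/longjmp and a hand-maintained chain of handler records. All names here (handler_rec, handler_chain, raise_exception, guarded_div) are hypothetical, and a real runtime would keep the chain per thread rather than in a global.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* One record per active block that has a handler. */
struct handler_rec {
    struct handler_rec *prev;     /* enclosing handler, or NULL */
    jmp_buf             resume;   /* where to continue if an exception is raised */
};

static struct handler_rec *handler_chain;   /* innermost active handler */

/* Raise: pop the innermost record and jump to it. code must be nonzero. */
static void raise_exception(int code)
{
    struct handler_rec *h = handler_chain;
    if (h == NULL)
        abort();                   /* no handler: "last chance" behavior */
    handler_chain = h->prev;
    longjmp(h->resume, code);
}

static int checked_div(int a, int b)
{
    if (b == 0)
        raise_exception(1);
    return a / b;
}

/* A block with a handler: the push on entry and the pop on exit are paid
 * on every call, even when no exception ever occurs. */
int guarded_div(int a, int b, int fallback)
{
    struct handler_rec rec;
    rec.prev = handler_chain;
    handler_chain = &rec;                  /* entry cost */

    if (setjmp(rec.resume) != 0)
        return fallback;                   /* handler body; record already popped */

    int q = checked_div(a, b);
    handler_chain = rec.prev;              /* exit cost */
    return q;
}
```

The static method removes the push and pop from guarded_div and replaces them with a table lookup that only happens when something is raised.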

Structured Exception Handling (C/C++) https://learn.microsoft.com/en-us/cpp/cpp/structured-exception-handling-c-cpp?view=msvc-170

x64 exception handling https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170

Exception handling in MSVC https://learn.microsoft.com/en-us/cpp/cpp/exception-handling-in-visual-cpp?view=msvc-170

Modern C++ best practices for exceptions and error handling https://learn.microsoft.com/en-us/cpp/cpp/errors-and-exception-handling-modern-cpp?view=msvc-170


From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Oct 9 15:08:05 2024

From Newsgroup: comp.arch

Scott Lurndal wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

Kent Dickey wrote:

And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.

Kent

Because C doesn't require it. That does not make the capability useless.

Other languages do require overflow detection (e.g. COBOL ON OVERFLOW clause),
and it's best done with conditional branches, not traps.

Then you use the overflow branching form for those situations
where you have a specific local overflow handler. Nothing stops that.

But that is not a justification for getting rid of overflow trapping instructions altogether, as Kent was arguing. And actually it looks to me,
not knowing Cobol, like it should use overflow trapping instructions
UNLESS there is an ON OVERFLOW clause. i.e. that the default should be to
treat overflow as an error unless you explicitly state how to handle it.


From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Wed Oct 9 19:43:39 2024

From Newsgroup: comp.arch

On Wed, 9 Oct 2024 18:16:51 +0000, EricP wrote:

Except you keep missing the point:
no one has a handler for integer overflow because it should never
happen. Just like no one has a handler for memory read parity errors.

Au contraire:
I understand how to recover from even "late write ECC violations*"--
but mostly that is because I am primarily a HW guy. (*) When a cache
line displaced from L1 or L2 arrives at L3/DRAM with a bad ECC.

When you wrote C code using signed integers, *YOU* guaranteed to the
compiler that your code would never overflow. Overflow checking just
detects when you have made an error, just like array bounds checking,
or divide by zero checking.

I disagree with this statement. I wrote in C under the knowledge
that integer data types can overflow--they have to be able to--
it is the nature of fixed size containers. I am happy for the
compiler to IGNORE the possibility of overflow, but not the HW.

This is not something being done *to you* against your will,
this is something that you *ask for* because it helps detect your
errors.
Doing it in hardware just makes it efficient.

Yes, allow the compiler to IGNORE the problem, but have HW detect the
problem.

From Robert Finch@robfi680@gmail.com to comp.arch on Wed Oct 9 16:12:40 2024

From Newsgroup: comp.arch

On 2024-10-09 2:16 p.m., EricP wrote:

Kent Dickey wrote:

In article <efXIO.169388$1m96.45507@fx15.iad>,
EricP <ThatWouldBeTelling@thevillage.com> wrote:

Kent Dickey wrote:

OK, my post was about how having a hardware trap-on-overflow
instruction
(or a mode for existing ALU instructions) is useless for anything OTHER
than as a debug aid where you crash the program on overflow (you can
have a general exception handler to shut down gracefully, but
"patching things
up and continuing" doesn't work). I gave details of reasons folks
might
want to try to use trap-on-overflow instructions, and show how the
other cases don't make sense.

For me error detection of all kinds is useful. It just happens
to not be conveniently supported in C so no one tries it in C.

GCC's -ftrapv option is not useful for a variety of reasons.
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit which is not what programmers
need
as it triggers for many false positives so people turn it off.

In no way was I ever arguing that checking for overflow was a bad idea,
or a language issue, or anything else. Just that CPUs should not
bother
having trap-on-overflow instructions.

I understand, and I disagree with this conclusion.
I think all forms of software error detection are useful and
HW should make them simple and eliminate cost when possible.

I think I am not explaining the issue well.

I'm not arguing what you want to do with overflow. I'm trying to show
that
for all uses of detecting overflow other than crashing with no recovery,
hardware trapping on overflow is a poor approach.

If you enable hardware traps on integer overflow, then to do anything
other
than crash the program would require engineering a very complex set of
data structures, roughly approximately the complexity of adding debug
information to the executable, in order to make this work. As far as
I know,
no one in the history of computers has yet undertaken this task.

VAX/VMS 1.0 in 1979 had stack-based Structured Exception Handling (SEH).
And of course carried it over onto Alpha/VMS.
WinNT had SEH in its first version in 1992 for MIPS and 386
supported both by the C compiler and OS. Win95 had support too.
In WinNT MS added __try __except keywords to the C language to support
it for both themselves inside the OS and for users.

Some languages like C++ and Ada have native support for SEH.
There can be differences in what behaviors languages expect to be
supported,
like can one continue from an exception, or pass arguments to a handler.

This is because each instruction which overflows would need special
handling, and the "debug" information would be needed. It would be a
huge
amount of compiler/linker/runtime complexity.

General structured exception handling is not as complex or expensive
as you think. It's in the multiple 1000's of instructions range
(so don't use it gratuitously).

WinNT implemented it differently on 32-bit x86 and 64-bit x64,
with the x64 method being more efficient because the compiler
does most of the work. On x64 the compiler just needs to supply
bounding low and high RIP's for *just the exception handler code*.

The cost of delivering a structured exception is the OS basically
delivers an exception to a thread dispatcher similar to a signal,
but for structured exceptions that dispatcher code acts differently.
The thread's frame pointer is the head of a single linked list of
stack frames. It starts at the bottom of stack pointed to by the
frame pointer and scans backward, taking the RIP for each context
and looking in a small table of handler bounds to see if it is in range.
If there is a handler, it is called. If it handles it, great.
Otherwise it continues to scan backwards through the stack frames.
If it gets to the top of stack and there is no handler, it invokes the thread's last chance handler, and if that doesn't intercept the exception,
it terminates the thread.

This is different than most "signal" handlers people have written, where
simple inspection of the instruction which failed and the address
involved
allows it to be "handled". But to do anything other than crash, each
instruction which overflows needs special handling unique to that
instruction
and dependent on what the compiler was in the middle of doing when the
overflow happened. This is why trapping just isn't a good idea.

Except you keep missing the point:
no one has a handler for integer overflow because it should never happen. Just like no one has a handler for memory read parity errors.

When you wrote C code using signed integers, *YOU* guaranteed to the
compiler that your code would never overflow. Overflow checking just
detects when you have made an error, just like array bounds checking,
or divide by zero checking.

This is not something being done *to you* against your will,
this is something that you *ask for* because it helps detect your errors. Doing it in hardware just makes it efficient.

A better exception usage example might be a routine that enables exceptions for floating point underflow where the FPU traps to a handler that zeros
the value and logs where it happened so someone can look at it later,
then continues with its calculation.

I'm just explaining why trap-on-overflow has gone away, because it's
almost completely useless: hardware trap on overflow is only good for
the
case that you want to crash on integer overflow. Branch-on-overflow
is the
correct approach--the compiler can branch to either a trapping
instruction
(if you just want to crash), or for all other cases of detecting
overflow,
the compiler branches to "fixup" code.

But crash on overflow *IS* the correct behavior in 99.999% of cases.
Branch on overflow is ALSO needed in certain rare cases and I showed how
it is easily detected.

And crash-on-overflow just isn't a popular use model, as I use the
example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.

Kent

Because C doesn't require it. That does not make the capability useless.

Removing error detectors does not make the errors go away,
just your knowledge of them.

Slightly confused on trap versus branch. Trapping on overflow is not a
good solution, but a branch on overflow is? A trap is just a slow
branch. The reason for trapping was to improve code density and non-exceptional performance.
If it is the overhead of performing a trap operation that is the issue,
then a special register could be dedicated to holding the overflow
handler address, and instructions defined to automatically jump through
the overflow handler address register (a branch target address register). Overflow detecting instructions are just a fusion of the instruction and
the following branch on overflow operation.

addjo r1,r2,r3 <- does a jump (instead of a trap) to branch register #7
for instance, on overflow.

Having an overflow branch register might be better for code density / performance.


From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Wed Oct 9 21:36:21 2024

From Newsgroup: comp.arch

On Wed, 9 Oct 2024 20:12:40 +0000, Robert Finch wrote:

On 2024-10-09 2:16 p.m., EricP wrote:

Kent Dickey wrote:

But crash on overflow *IS* the correct behavior in 99.999% of cases.
Branch on overflow is ALSO needed in certain rare cases and I showed how
it is easily detected.

And crash-on-overflow just isn't a popular use model, as I use the
example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.

Kent

Because C doesn't require it. That does not make the capability useless.

Removing error detectors does not make the errors go away,
just your knowledge of them.

Slightly confused on trap versus branch. Trapping on overflow is not a
good solution, but a branch on overflow is? A trap is just a slow
branch. The reason for trapping was to improve code density and non-exceptional performance.
If it is the overhead of performing a trap operation that is the issue,

x86 has seriously distorted people's view on how much overhead is
associated with a trap*. MIPS had trap handlers measuring in the
17 cycle range both getting to the handler, handling the exception,
and getting back to the instruction that trapped. Since GBOoO windows
have mispredicted branches in this kind of latency, too; then a
properly designed architecture should be able to do similarly to MIPS.

Whereas x86 may take 1,000 cycles to get to the handler. This is due
to all the Descriptor table stuff, call-gates, protection rings, and segmentation.

(*) trap == exception == fault == any unpredicted control flow
caused by the instruction stream itself (SVC-et-al not included
because it is requested by the instruction stream).

then a special register could be dedicated to holding the overflow
handler address, and instructions defined to automatically jump through
the overflow handler address register (a branch target address
register).
Overflow detecting instructions are just a fusion of the instruction and
the following branch on overflow operation.

addjo r1,r2,r3 <- does a jump (instead of a trap) to branch register #7
for instance, on overflow.

Having an overflow branch register might be better for code density / performance.

What if you want to handle multiply overflow differently than
addition overflow ??

From Michael S@already5chosen@yahoo.com to comp.arch on Thu Oct 10 15:36:32 2024

From Newsgroup: comp.arch

On Wed, 9 Oct 2024 21:36:21 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:

On Wed, 9 Oct 2024 20:12:40 +0000, Robert Finch wrote:

x86 has seriously distorted people's view on how much overhead is
associated with a trap*.

Do you have an opinion about FRED? https://cdrdv2-public.intel.com/819481/346446-flexible-return-and-event-delivery.pdf


From Robert Finch@robfi680@gmail.com to comp.arch on Thu Oct 10 08:57:17 2024

From Newsgroup: comp.arch

On 2024-10-09 5:36 p.m., MitchAlsup1 wrote:

On Wed, 9 Oct 2024 20:12:40 +0000, Robert Finch wrote:

On 2024-10-09 2:16 p.m., EricP wrote:

Kent Dickey wrote:

But crash on overflow *IS* the correct behavior in 99.999% of cases.
Branch on overflow is ALSO needed in certain rare cases and I showed how
it is easily detected.

And crash-on-overflow just isn't a popular use model, as I use the
example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.

Kent

Because C doesn't require it. That does not make the capability useless.
Removing error detectors does not make the errors go away,
just your knowledge of them.

Slightly confused on trap versus branch. Trapping on overflow is not a
good solution, but a branch on overflow is? A trap is just a slow
branch. The reason for trapping was to improve code density and
non-exceptional performance.
If it is the overhead of performing a trap operation that is the issue,

x86 has seriously distorted people's view on how much overhead is
associated with a trap*. MIPS had trap handlers measuring in the
17 cycle range both getting to the handler, handling the exception,
and getting back to the instruction that trapped. Since GBOoO windows
have mispredicted branches in this kind of latency, too; then a
properly designed architecture should be able to do similarly to MIPS.

Whereas x86 may take 1,000 cycles to get to the handler. This is due
to all the Descriptor table stuff, call-gates, protection rings, and segmentation.

(*) trap == exception == fault == any unpredicted control flow
caused by the instruction stream itself (SVC-et-al not included
because it is requested by the instruction stream).

then a special register could be dedicated to holding the overflow
handler address, and instructions defined to automatically jump through
the overflow handler address register (a branch target address
register).
Overflow detecting instructions are just a fusion of the instruction and
the following branch on overflow operation.

addjo r1,r2,r3 <- does a jump (instead of a trap) to branch
register #7
for instance, on overflow.

Having an overflow branch register might be better for code density /
performance.

What if you want to handle multiply overflow differently than
addition overflow ??

The branch register could be reloaded before the operation with a
different handler address. It would be one instruction to load a PC
relative address perhaps. Slightly better than having one instruction
after every op.
For Q+ there would be room in the instruction to specify a branch
register, so multiple handler targets could be supported.
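
To make the idea concrete, here is a small C model of it. Everything in it is an assumption made for illustration: the array branch_reg stands in for a set of branch target registers, addjo and muljo model fused operate-and-jump-on-overflow instructions that name a register, and the handlers simply saturate. It says nothing about how Q+ actually encodes this.

```c
#include <limits.h>

/* A handler receives the operands of the overflowing operation and
 * supplies the result to use instead. */
typedef int (*ovf_handler)(int a, int b);

/* Hypothetical bank of branch target registers. */
ovf_handler branch_reg[8];

/* Model of "addjo": add, and on overflow jump through branch register br. */
int addjo(int a, int b, int br)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))   /* GCC/Clang builtin */
        return branch_reg[br](a, b);
    return sum;
}

/* Model of "muljo": multiply, and on overflow jump through register br. */
int muljo(int a, int b, int br)
{
    int prod;
    if (__builtin_mul_overflow(a, b, &prod))  /* GCC/Clang builtin */
        return branch_reg[br](a, b);
    return prod;
}

/* Two different policies, so add and multiply overflow can be told apart. */
static int on_add_overflow(int a, int b)
{
    (void)a;
    return (b > 0) ? INT_MAX : INT_MIN;       /* saturate in the overflow direction */
}

static int on_mul_overflow(int a, int b)
{
    return ((a < 0) != (b < 0)) ? INT_MIN : INT_MAX;
}

/* "Load the branch registers": one instruction each in the proposed scheme. */
void load_handlers(void)
{
    branch_reg[6] = on_mul_overflow;
    branch_reg[7] = on_add_overflow;
}
```

After load_handlers(), addjo(x, y, 7) and muljo(x, y, 6) reach different handlers without any per-operation branch instruction in the normal path.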


From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Oct 10 15:32:21 2024

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:

Scott Lurndal wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

Kent Dickey wrote:

And crash-on-overflow just isn't a popular use model, as I use the example
of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
and no compiler seems to use it. Especially since branch-on-overflow
is almost as good in every way.

Kent

Because C doesn't require it. That does not make the capability useless.

Other languages do require overflow detection (e.g. COBOL ON OVERFLOW clause),
and it's best done with conditional branches, not traps.

Then you use the overflow branching form for those situations
where you have a specific local overflow handler. Nothing stops that.

But that is not a justification for getting rid of overflow trapping
instructions altogether, as Kent was arguing. And actually it looks to me,
not knowing Cobol, like it should use overflow trapping instructions
UNLESS there is an ON OVERFLOW clause. i.e. that the default should be to
treat overflow as an error unless you explicitly state how to handle it.

See https://www.mainframestechhelp.com/tutorials/cobol/size-error-phrase.htm

The default is to truncate. All other cases can be handled with
a conditional branch.

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Fri Oct 11 01:40:15 2024

From Newsgroup: comp.arch

On Mon, 7 Oct 2024 10:17:26 +0200, Terje Mathisen wrote:

The single most canonical test for IBM PC compatibility was Microsoft's Flight Simulator, taking off from the now demolished Meigs Field in
Chicago.

That game used the OS and BIOS for the loading of the game, and then
went on to direct hardware access for pretty much the rest of the
playing time.

I can remember Flight Simulator being used as the benchmark for
compatibility as far back as 1985. A report on a computer show mentioned
that clone makers were demoing it running on their products.

This is why I feel the term “IBM compatible” was misleading, it should have been “Microsoft compatible” from at least that point on.

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Fri Oct 11 01:41:58 2024

From Newsgroup: comp.arch

On Mon, 7 Oct 2024 13:05:53 +0300, Michael S wrote:

In all cases the vendor of GPU changed ...

That, too, added to the problem, in that the software folks had to rewrite
all the performance-intensive bits yet again for the new machine.

OpenCL never took off because the GPGPU market simply isn’t competitive enough. NVidia is dominant, AMD plays second fiddle, and that’s it.

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Fri Oct 11 01:46:51 2024

From Newsgroup: comp.arch

On Mon, 7 Oct 2024 22:26:58 +0300, Michael S wrote:

On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

ARM was rather late to the RISC game, this might have been literally
true.

ARM was rather early to the RISC game. Shipped for profit since late
1986.

Shipped in an actual PC, the Acorn Archimedes range.

That was the first time I ever saw a 3D shaded rendition of a flag waving,
on a computer, generated in real time. No other machine could do it,
unless you got up to the really expensive Unix workstation class (e.g.
SGI, custom Evans & Sutherland hardware etc).

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 11 06:42:15 2024

From Newsgroup: comp.arch

Lawrence D'Oliveiro <ldo@nz.invalid> writes:

I can remember Flight Simulator being used as the benchmark for
compatibility as far back as 1985. A report on a computer show mentioned
that clone makers were demoing it running on their products.

This is why I feel the term “IBM compatible” was misleading, it should
have been “Microsoft compatible” from at least that point on.

It was IBM PC compatible, and that was not misleading, because that's
what it was about. "Microsoft compatible" would have been misleading
(if you want it to mean the same as "IBM PC compatible"), because lots
of hardware was Microsoft DOS compatible that was not an IBM PC clone
and therefore not 100% IBM PC compatible. And MS-DOS was certainly
the higher-profile Microsoft product than the Flight Simulator.

And many buyers did not care about the Flight Simulator, but more
about Lotus 1-2-3, which also required an IBM PC compatible machine.

Of course you saw the Flight Simulator a lot at shows: Moving pictures
attract the eye in a way that a static spreadsheet screen does not.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Oct 11 14:20:20 2024

From Newsgroup: comp.arch

On 11/10/2024 03:46, Lawrence D'Oliveiro wrote:

On Mon, 7 Oct 2024 22:26:58 +0300, Michael S wrote:

On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

ARM was rather late to the RISC game, this might have been literally
true.

ARM was rather early to the RISC game. Shipped for profit since late
1986.

Shipped in an actual PC, the Acorn Archimedes range.

That was the first time I ever saw a 3D shaded rendition of a flag waving,
on a computer, generated in real time. No other machine could do it,
unless you got up to the really expensive Unix workstation class (e.g.
SGI, custom Evans & Sutherland hardware etc).

The Acorn Archimedes was /way/ ahead of anything in the PC / x86 world,
both in hardware and software. It could emulate an 80286 PC almost as
fast as real PCs that you could buy at the time for a higher price than
the Archimedes.

The demo that impressed me most was drawing full-screen Mandelbrot sets
in a second or two, compared to several minutes for a typical PC at the
time. It meant you could do real-time zooming and flying around in the set.

My first encounter with ARM assembly was enhancing that demo program for higher screen resolution and deeper zooming.


From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 08:23:39 2024

From Newsgroup: comp.arch

Terje Mathisen <terje.mathisen@tmsw.no> writes:

Maybe all add/sub/etc opcodes that are immediately followed by an INTO
could be fused into a single ADDO/SUBO/etc version that takes zero extra
cycles as long as the trap part isn't hit?

On Intel P-cores add/inc/sub etc. have been fused with a following
JO/JNO into one uop for quite a while (I guess since Sandy Bridge
(2011)).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 08:45:57 2024

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:

But then, risc processors mostly, started using exceptions for housekeeping
- SPARC for register window sliding, Alpha for byte, word and misaligned
memory access

On Alpha the assembler expands byte, word and unaligned access
mnemonics into sequences of machine instructions; if you compile for
BWX extensions, byte and word mnemonics get compiled into BWX
instructions. If the machine does not have the BWX extensions and it encounters a BWX instruction, the result is an illegal instruction
signal at least on Linux. This terminates your typical program, so
it's not at all frequent.
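
As a rough picture of what such an expanded byte-load sequence computes, here is a sketch in C rather than Alpha assembly. The function name load_byte and the modelling of memory as an array of quadwords are assumptions made for the example; on a real pre-BWX Alpha the two steps would be an LDQ_U followed by an EXTBL.

```c
#include <stdint.h>

/* Sketch of a byte load using only aligned 64-bit loads (little-endian).
   mem is a hypothetical array of quadwords, addr a byte offset into it. */
uint8_t load_byte(const uint64_t *mem, uint64_t addr)
{
    /* Like LDQ_U: the low 3 address bits are ignored. */
    uint64_t q = mem[addr >> 3];
    /* Like EXTBL: shift the selected byte down and keep only it. */
    unsigned shift = (unsigned)(addr & 7) * 8;
    return (uint8_t)(q >> shift);
}
```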

Concerning unaligned accesses, if you use a load or store that
requires alignment, Digital OSF/1 (and the later versions with various
names) by default produced a signal rather than fixing it up, so again
programs are typically terminated, and the exception is not at all
frequent. There is a system call and a tool (uac) that allow telling
the OS to fix up unaligned accesses, but it played no role in my
experience while I was still using Digital OSF/1 (including its
successors).

On Linux the default behaviour was to fix up the unaligned accesses
and to log that in the system log. There were a few such messages in
the log per day, so that obviously was not a frequent occurrence,
either. I wrote a program that allowed me to change the behaviour <https://www.complang.tuwien.ac.at/anton/uace.c>, mainly because I
wanted to get a signal when an unaligned access happens.

As for the unaligned-access mnemonics, these were obviously barely
used: I found that gas generates wrong code for ustq several years
after Alpha was introduced, so obviously no software running under
Linux has used this mnemonic.

The solution for Alpha was to add back the byte and word instructions,
and add misaligned access support to all memory ops.

Alpha added BWX instructions, but not because it had used trapping to
emulate them earlier; old or portable binaries continued to use
instruction sequences. Alpha traps when you do, e.g., an unaligned
ldq in all Alpha implementations I have had contact with (up to an
800MHz 21264B).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 09:18:23 2024

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:

Kent Dickey wrote:

[...]

GCC's -ftrapv option is not useful for a variety of reasons.
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit which is not what programmers need
as it triggers for many false positives so people turn it off.

...

So why should any hardware include an instruction to trap-on-overflow?

Because ALL the negative speed and code size consequences do not occur.

Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
18.1.0, I get a 15-instruction sequence which does not include add
(the trap-on-overflow version).

MIPS gcc 14.2.0 generates a sequence that includes

jal __addvsi3

i.e., just as for x86-64. Similar for MIPS64 with these compilers.

Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
way of checking overflow at all.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 10:23:18 2024

From Newsgroup: comp.arch

Michael S <already5chosen@yahoo.com> writes:

That's correct about intrinsics, but incorrect about ADCX/ADOX.
The latter can be moderately helpful in special situations, esp.
128b * 128b => 256b multiplication, but it is never necessary
and for addition/subtraction is not needed at all.

They are useful if there are two strings of additions. This happens
naturally in wide multiplication (also beyond 256b results). But it
also happens when you add three multi-precision numbers (say, X, Y,
Z): You need C for the carry of XYi=X[i]+Y[i]+C, and O for the carry
of XYZ[i]=XYi+Z[i]+O. If you have ADCX/ADOX, you can do both
additions in one loop, so XYi can be in a register and does not need
to be stored. If you don't have these instructions, only ADC, you
need one loop to compute X+Y and store the result in memory, and one
loop to compute XY+Z, i.e., the lack of ADCX/ADOX results in
substantial additional cost.
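
A minimal sketch of the single-loop variant in portable C, with the two carry chains held in ordinary variables c and o (playing the roles of the C and O flags above). The function name add3_mp and the limb layout (64-bit limbs, least significant first) are assumptions for illustration; this is not ADX code, and a compiler will not necessarily turn it into ADCX/ADOX.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds three n-limb numbers in one loop with two independent carry
   chains.  Returns the final carry out (0, 1 or 2). */
unsigned add3_mp(uint64_t *r, const uint64_t *x, const uint64_t *y,
                 const uint64_t *z, size_t n)
{
    unsigned c = 0, o = 0;
    for (size_t i = 0; i < n; i++) {
        /* First chain: XYi = X[i] + Y[i] + C */
        uint64_t xy = x[i] + c;
        c = xy < c;              /* carry from adding the incoming carry */
        xy += y[i];
        c += xy < y[i];          /* at most one of the two steps can carry */
        /* Second chain: XYZ[i] = XYi + Z[i] + O; XYi never goes to memory */
        uint64_t xyz = xy + o;
        o = xyz < o;
        xyz += z[i];
        o += xyz < z[i];
        r[i] = xyz;
    }
    return c + o;
}
```

With only one carry chain available, the same result needs a first loop that writes X+Y to memory and a second loop that adds Z, which is the extra cost described above.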

If you add 4 multi-precision numbers, AMD64 with ADX runs out of carry
bits, so you have to spend the overhead of an additional loop (but not
of two additional loops as without ADCX/ADOX).

With carry bits in the general purpose registers <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf> and 30 GPRs
(one is zero, one is sp), you can add 14 multi-precision numbers per
loop: 14 GPRs for source addresses, 1 GPR for the target address, 1
for the loop counter, 13 registers for loop-carried carry flags.

Of course, the question is if this kind of computation is needed
frequently enough to justify this kind of extension. For
multi-precision multiplication and squaring, Intel considered the
frequency relevant enough to introduce ADCX/ADOX/MULX.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 13 13:00:14 2024

From Newsgroup: comp.arch

On Sat, 12 Oct 2024 10:23:18 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

Michael S <already5chosen@yahoo.com> writes:

That's correct about intrinsics, but incorrect about ADCX/ADOX.
The latter can be moderately helpful in special situations, esp.
128b * 128b => 256b multiplication, but it is never necessary
and for addition/subtraction is not needed at all.

They are useful if there are two strings of additions. This happens naturally in wide multiplication (also beyond 256b results). But it
also happens when you add three multi-precision numbers (say, X, Y,
Z): You need C for the carry of XYi=X[i]+Y[i]+C, and O for the carry
of XYZ[i]=XYi+Z[i]+O. If you have ADCX/ADOX, you can do both
additions in one loop, so XYi can be in a register and does not need
to be stored. If you don't have these instructions, only ADC, you
need one loop to compute X+Y and store the result in memory, and one
loop to compute XY+Z, i.e., the lack of ADCX/ADOX results in
substantial additional cost.

If you add 4 multi-precision numbers, AMD64 with ADX runs out of carry
bits, so you have to spend the overhead of an additional loop (but not
of two additional loops as without ADCX/ADOX).

With carry bits in the general purpose registers <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf> and 30 GPRs
(one is zero, one is sp), you can add 14 multi-precision numbers per
loop: 14 GPRs for source addresses, 1 GPR for the target address, 1
for the loop counter, 13 registers for loop-carried carry flags.

Of course, the question is if this kind of computation is needed
frequently enough to justify this kind of extension. For
multi-precision multiplication and squaring, Intel considered the
frequency relevant enough to introduce ADCX/ADOX/MULX.

- anton

That's not bad. I think you can see for yourself that the spill and
context-switch parts could benefit from more work.
But I suspect that the main opposition you'll face in the RISC-V
organization will center not on that, but on fear of an increase in cycle
time, whether or not that fear is backed by hard numbers.


From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 13 13:10:58 2024

From Newsgroup: comp.arch

On Fri, 11 Oct 2024 01:41:58 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

On Mon, 7 Oct 2024 13:05:53 +0300, Michael S wrote:

In all cases the vendor of GPU changed ...

That, too, added to the problem, in that the software folks had to
rewrite all the performance-intensive bits yet again for the new
machine.

OpenCL never took off because the GPGPU market simply isn’t
competitive enough. NVidia is dominant, AMD plays second fiddle, and
that’s it.

I am not sure about dog-tail relationships.
To me it sounds plausible that NV dominates due to a better software story.
At least that's what I see in certain sectors of the embedded market -
people prefer the old NV Jetson Xavier over newer AMD and Intel SoCs that
are much better not only on the CPU side, but also provide many more
FLOPs on the GPU side. And the reason is that they are much more certain
that they will be able to write programs for NV GPUs than they are for
AMD or Intel GPUs.

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 13 15:16:08 2024

From Newsgroup: comp.arch

Michael S <already5chosen@yahoo.com> writes:

In their defense, AMD's use of the term ROP didn't last for long.
K8 manuals use the better term micro-ops. I don't have a K7 manual to
check, but it seems to me that it uses the same terminology as K8.

I have come across ROP (and its expansion RISC op) relatively
recently, but maybe it was in third-party material. Their evil deeds
of the past come back to haunt them:-).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Mon Oct 14 23:38:43 2024

From Newsgroup: comp.arch

On Fri, 11 Oct 2024 06:42:15 GMT, Anton Ertl wrote:

Lawrence D'Oliveiro <ldo@nz.invalid> writes:

I can remember Flight Simulator being used as the benchmark for
compatibility as far back as 1985. A report on a computer show mentioned
that clone makers were demoing it running on their products.

This is why I feel the term “IBM compatible” was misleading, it should
have been “Microsoft compatible” from at least that point on.

It was IBM PC compatible, and that was not misleading, because that's
what it was about.

But then IBM came along shortly afterwards with their PS/2 range, which no longer defined the standard for compatibility.

So at that point it was either “Microsoft compatible” or nothing.

... lots of hardware was Microsoft DOS compatible ...

Yes it was, but none of them could run Flight Simulator.

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Mon Oct 14 23:39:59 2024

From Newsgroup: comp.arch

On Sun, 13 Oct 2024 13:10:58 +0300, Michael S wrote:

On Fri, 11 Oct 2024 01:41:58 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

OpenCL never took off because the GPGPU market simply isn’t competitive
enough. NVidia is dominant, AMD plays second fiddle, and that’s it.

I am not sure about dog-tail relationships.

In a market dominated by one player, the dominant player tends not to like open standards. Open standards allow competitors to get a foot in the
door, and the dominant player doesn’t like that.

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Mon Oct 14 21:44:06 2024

From Newsgroup: comp.arch

Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

Kent Dickey wrote:

[...]

GCC's -ftrapv option is not useful for a variety of reasons.
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit which is not what programmers need
as it triggers for many false positives so people turn it off.

....

So why should any hardware include an instruction to trap-on-overflow?

Because ALL the negative speed and code size consequences do not occur.

Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
18.1.0, I get a 15-instruction sequence which does not include add
(the trap-on-overflow version).

MIPS gcc 14.2.0 generates a sequence that includes

jal __addvsi3

i.e., just as for x86-64. Similar for MIPS64 with these compilers.

Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
way of checking overflow at all.

- anton

Yes. So even when the ADD instruction is available they won't use it.
At least clang for MIPS64 uses one of the overflow detect idioms inlined.
Gcc calls that rather expensive subroutine.

I changed your example to use long instead of int
to avoid any partial register issues.
Also I added a third argument just to see what it would do.
It generates slightly different code for the second check.

long add3 (long a, long b, long c) {
    return a + b + c;
}

I also tried Ada mips64 gnat 14.2.0 -O2 (below).
It also didn't use the ADD which traps but uses a different idiom inlined.

Both examples should have taken 3 instructions (plus a delay-slot nop):
add3:
dadd $2, $4, $5 ; r2 = r4 + r5
dadd $2, $2, $6 ; r2 = r2 + r6
jr $ra
nop

but what clang generated was:

; The comments on the right are mine
add3:
daddiu $sp, $sp, -16 ; set up call frame
sd $ra, 8($sp)
sd $fp, 0($sp)
move $fp, $sp
daddu $3, $4, $5 ; r3 = r4 + r5
slt $1, $3, $4 ; r1 = r3 < r4
slti $2, $5, 0 ; r2 = r5 < 0
bne $2, $1, .LBB0_3 ; if (r2 != r1) goto Overflow
nop
daddu $2, $3, $6 ; r2 = r3 + r6
slt $1, $2, $3 ; r1 = r2 < r3
slti $3, $6, 0 ; r3 = r6 < 0
xor $1, $3, $1 ; r1 = r3 ^ r1
bnez $1, .LBB0_3 ; if (r1 != 0) goto Overflow
nop
move $sp, $fp ; pop frame
ld $fp, 0($sp)
ld $ra, 8($sp)
jr $ra
daddiu $sp, $sp, 16
.LBB0_3:
break
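
For readers who don't speak MIPS, the daddu/slt/slti check in the clang listing above can be written out in C roughly as follows. This is my reading of the generated code, not something taken from the clang sources. The name add_overflows_slt is made up, and the cast back to int64_t assumes the usual two's-complement wraparound that gcc and clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow check in the style of the clang output: compare the wrapped
   sum against one operand and the sign of the other operand. */
bool add_overflows_slt(int64_t a, int64_t b, int64_t *sum)
{
    /* Wrapping add, done in unsigned arithmetic to avoid UB (daddu). */
    int64_t s = (int64_t)((uint64_t)a + (uint64_t)b);
    *sum = s;
    /* Without overflow, s < a holds exactly when b is negative
       (slt and slti); a mismatch means the add overflowed. */
    return (s < a) != (b < 0);
}
```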

====================================

-- Ada mips64 gnat 14.2.0 -O2
function add3 (a, b, c : Long_Integer) return Long_Integer is
begin
return a + b + c;
end add3;

.LC0:
.ascii "example.adb"
.space 1
_ada_add3:
daddu $3,$4,$5 # tmp205, a, b
xor $4,$4,$5 # tmp206, a, b
nor $4,$0,$4 # tmp208, tmp206
xor $5,$3,$5 # tmp207, tmp205, b
and $5,$5,$4 # tmp209, tmp207, tmp208
bltz $5,.L7 #, tmp209,
daddu $2,$3,$6 # tmp212, tmp205, c

xor $3,$3,$6 # tmp213, tmp205, c
nor $3,$0,$3 # tmp215, tmp213
xor $6,$2,$6 # tmp214, tmp212, c
and $6,$6,$3 # tmp216, tmp214, tmp215
bltz $6,.L7
nop
jr $31
nop
.L7:
daddiu $sp,$sp,-16 #,,
sd $28,0($sp) #,
lui $28,%hi(%neg(%gp_rel(_ada_add3))) #,
daddu $28,$28,$25 #,,
daddiu $28,$28,%lo(%neg(%gp_rel(_ada_add3))) #,,
ld $4,%got_page(.LC0)($28) # tmp210,,
ld $25,%call16(__gnat_rcheck_CE_Overflow_Check)($28) # tmp211,,
sd $31,8($sp) #,
li $5,3 # 0x3 #,
1: jalr $25 # tmp211
daddiu $4,$4,%got_ofst(.LC0) #, tmp210,
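
The xor/nor/xor/and/bltz idiom in the gnat listing above corresponds roughly to the following C. Again this is a sketch of my reading of the assembly, with a made-up function name (add_overflows_xor) and the same two's-complement assumption for the cast back to int64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow check in the style of the gnat output: overflow happened
   iff a and b have the same sign and the sum's sign differs from b's. */
bool add_overflows_xor(int64_t a, int64_t b, int64_t *sum)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t us = ua + ub;                   /* daddu: wrapping add */
    *sum = (int64_t)us;
    /* ~(a ^ b): sign bit set if a and b agree in sign (xor, nor)
       (s ^ b):  sign bit set if the sum disagrees with b (xor)
       The and/bltz pair then tests the sign bit of the conjunction. */
    return ((~(ua ^ ub) & (us ^ ub)) >> 63) != 0;
}
```

Both idioms cost about four ALU instructions plus a branch per addition, compared with the single trapping add that the compilers decline to use.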

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Tue Oct 15 12:59:03 2024

From Newsgroup: comp.arch

Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

But then, risc processors mostly, started using exceptions for housekeeping
- SPARC for register window sliding, Alpha for byte, word and misaligned
memory access

On Alpha the assembler expands byte, word and unaligned access
mnemonics into sequences of machine instructions; if you compile for
BWX extensions, byte and word mnemonics get compiled into BWX
instructions. If the machine does not have the BWX extensions and it encounters a BWX instruction, the result is an illegal instruction
signal at least on Linux. This terminates your typical program, so
it's not at all frequent.

Ah yes, that was it. After they added BWX to 21164 in 1996,
for older 21064 models VMS had an optional illegal instruction exception handler that caught BWX instructions, emulated them and continued,
or terminated.

Concerning unaligned accesses, if you use a load or store that
requires alignment, Digital OSF/1 (and the later versions with various
names) by default produced a signal rather than fixing it up, so again programs are typically terminated, and the exception is not at all
frequent. There is a system call and a tool (uac) that allows telling
the OS to fix up unaligned accesses, but it played no role in my
experience while I was still using Digital OSF/1 (including its
successors).

On Linux the default behaviour was to fix up the unaligned accesses
and to log that in the system log. There were a few such messages in
the log per day, so that obviously was not a frequent occurrence,
either. I wrote a program that allowed me to change the behaviour <https://www.complang.tuwien.ac.at/anton/uace.c>, mainly because I
wanted to get a signal when an unaligned access happens.

IIRC on VMS the unaligned exception was caught and could optionally
log a diagnostic, execute a fixup handler and continue, or terminate.

As for the unaligned-access mnemonics, these were obviously barely
used: I found that gas generates wrong code for ustq several years
after Alpha was introduced, so obviously no software running under
Linux has used this mnemonic.

The solution for Alpha was to add back the byte and word instructions,
and add misaligned access support to all memory ops.

Alpha added BWX instructions, but not because it had used trapping to
emulate them earlier; old or portable binaries continued to use
instruction sequences. Alpha traps when you do, e.g., an unaligned
ldq in all Alpha implementations I have had contact with (up to an
800MHz 21264B).

- anton

You are right... they didn't add misaligned access to all LD and ST.
Except for LDQ_U and STQ_U they still fault on non-natural alignment.
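
For illustration, this is roughly how an unaligned quadword load is put together from two LDQ_U-style aligned loads, written in C instead of Alpha assembly. The name load_quad_unaligned and the quadword-array memory model are assumptions for the sketch; on Alpha the shift and merge would be done with EXTQL, EXTQH and an OR.

```c
#include <stdint.h>

/* Sketch of an unaligned 64-bit load built from two aligned loads
   (little-endian).  mem is a hypothetical array of quadwords, addr a
   byte offset into it. */
uint64_t load_quad_unaligned(const uint64_t *mem, uint64_t addr)
{
    uint64_t lo = mem[addr >> 3];         /* quadword holding the first byte */
    uint64_t hi = mem[(addr + 7) >> 3];   /* quadword holding the last byte */
    unsigned shift = (unsigned)(addr & 7) * 8;
    if (shift == 0)
        return lo;                        /* aligned: avoid a shift by 64 */
    /* Low part comes from the first quadword, high part from the second. */
    return (lo >> shift) | (hi << (64 - shift));
}
```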


From Bernd Linsel@bl1-thispartdoesnotbelonghere@gmx.com to comp.arch on Tue Oct 15 21:24:11 2024

From Newsgroup: comp.arch

On 12.10.24 11:18, Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

Kent Dickey wrote:

[...]

GCC's -ftrapv option is not useful for a variety of reasons.
1) it's slow, about 50% performance hit
2) it's always on for a compilation unit which is not what programmers need
as it triggers for many false positives so people turn it off.

...

So why should any hardware include an instruction to trap-on-overflow?

Because ALL the negative speed and code size consequences do not occur.

Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
18.1.0, I get a 15-instruction sequence which does not include add
(the trap-on-overflow version).

MIPS gcc 14.2.0 generates a sequence that includes

jal __addvsi3

i.e., just as for x86-64. Similar for MIPS64 with these compilers.

Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
way of checking overflow at all.

- anton

Very irritating: https://godbolt.org/z/KsMc3KfKc

Why does neither gcc nor clang use MIPS's trap-on-overflow addition
operators, while they do use teq <divisor>, 0 for a division-by-zero check?
--
Bernd Linsel

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Sat Oct 26 18:37:14 2024

From Newsgroup: comp.arch

John Dallman <jgd@cix.co.uk> wrote:

I see where I'm going wrong: I'm trying to talk about the machines
designed to run MS-DOS and later Windows, not just the CPUs. The vast
range of hardware that all had substantial degrees of compatibility as regards booting, busses and so on. Those things let their manufacturers compete for the DOS and Windows market, whereas x86-based machines that weren't PC-compatible only succeeded in quite specialised niches.

Those hardware suppliers did not close off access to the more advanced features of i386 onwards, because they had no reason to, and that let
Linux take advantage of all that hardware when it came along. That's the point I was failing to make.

I think this is still misleading. The 386 was not only a _much_ more
ambitious design than just a "processor for running DOS"; hardware
manufacturers also cared about running more things than just
DOS. And "running DOS" is misleading too: for many "DOS applications"
DOS provided just a program loader and file system access. Such
applications could switch to protected mode, use multitasking
and 32-bit addressing. There were "DOS extenders". Before
Windows gained market dominance there were competing GUIs.
There were PC servers, which at some time meant Novell.

So things critical to Linux were also important on the general PC
market. Clearly Linux benefited from the availability of commodity
PCs. But the things that made a PC a good PC were correlated with
being a good Linux machine. As a little anecdote let me add that
small sellers frequently used Linux as a tester for the PCs they
were selling, as it stressed machines more than "typical"
DOS applications.
--
Waldek Hebisch