• Re: is Vax addressing sane today

    From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Oct 9 14:16:51 2024

    Kent Dickey wrote:
    In article <efXIO.169388$1m96.45507@fx15.iad>,
    EricP <ThatWouldBeTelling@thevillage.com> wrote:
    Kent Dickey wrote:
    OK, my post was about how having a hardware trap-on-overflow instruction
    (or a mode for existing ALU instructions) is useless for anything OTHER
    than as a debug aid where you crash the program on overflow (you can
    have a general exception handler to shut down gracefully, but "patching things
    up and continuing" doesn't work). I gave details of reasons folks might
    want to try to use trap-on-overflow instructions, and showed how the
    other cases don't make sense.
    For me error detection of all kinds is useful. It just happens
    to not be conveniently supported in C so no one tries it in C.

    GCC's -trapv option is not useful for a variety of reasons.
    1) it's slow, about a 50% performance hit
    2) it's always on for a compilation unit, which is not what programmers need,
    as it triggers many false positives, so people turn it off.

    In no way was I ever arguing that checking for overflow was a bad idea,
    or a language issue, or anything else. Just that CPUs should not bother
    having trap-on-overflow instructions.
    I understand, and I disagree with this conclusion.
    I think all forms of software error detection are useful and
    HW should make them simple and eliminate cost when possible.

    I think I am not explaining the issue well.

    I'm not arguing what you want to do with overflow. I'm trying to show that for all uses of detecting overflow other than crashing with no recovery, hardware trapping on overflow is a poor approach.

    If you enable hardware traps on integer overflow, then to do anything other than crash the program would require engineering a very complex set of
    data structures, roughly the complexity of adding debug information to the executable, in order to make this work. As far as I know, no one in the history of computers has yet undertaken this task.

    VAX/VMS 1.0 in 1979 had stack-based Structured Exception Handling (SEH).
    And of course carried it over onto Alpha/VMS.
    WinNT had SEH in its first version in 1992 for MIPS and 386,
    supported both by the C compiler and the OS. Win95 had support too.
    In WinNT MS added the __try and __except keywords to the C language to
    support it both for themselves inside the OS and for users.

    Some languages like C++ and Ada have native support for SEH.
    There can be differences in what behaviors languages expect to be supported, such as whether one can continue from an exception or pass arguments to a handler.

    This is because each instruction which overflows would need special
    handling, and the "debug" information would be needed. It would be a huge amount of compiler/linker/runtime complexity.

    General structured exception handling is not as complex or expensive
    as you think. It's in the multiple 1000's of instructions range
    (so don't use it gratuitously).

    WinNT implemented it differently on 32-bit x86 and 64-bit x64,
    with the x64 method being more efficient because the compiler
    does most of the work. On x64 the compiler just needs to supply
    bounding low and high RIPs for *just the exception handler code*.

    The cost of delivering a structured exception is that the OS basically
    delivers an exception to a thread dispatcher, similar to a signal,
    but for structured exceptions that dispatcher code acts differently.
    The thread's frame pointer is the head of a singly linked list of
    stack frames. The dispatcher starts at the bottom of the stack pointed to by the
    frame pointer and scans backward, taking the RIP for each context
    and looking in a small table of handler bounds to see if it is in range.
    If there is a handler, it is called. If it handles it, great.
    Otherwise it continues to scan backwards through the stack frames.
    If it gets to the top of the stack and there is no handler, it invokes the
    thread's last chance handler, and if that doesn't intercept the exception,
    it terminates the thread.
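
    As a rough sketch of that scan in C (the frame and table layouts here
    are invented for illustration; real NT SEH metadata, e.g. x64
    .pdata/UNWIND_INFO, differs in detail):

    #include <stdint.h>
    #include <stddef.h>

    /* Invented layouts for illustration only. */
    typedef struct Frame {
        struct Frame *prev;   /* saved caller frame pointer */
        uintptr_t     rip;    /* return address into that frame's code */
    } Frame;

    typedef struct {
        uintptr_t low, high;          /* RIP bounds supplied by the compiler */
        int (*handler)(int excode);   /* nonzero return = handled */
    } HandlerBounds;

    /* Walk the frame-pointer chain from the faulting frame toward the
       top of stack, checking each RIP against the handler-bounds table;
       fall back to the last-chance handler if nothing claims it. */
    int dispatch(Frame *fp, int excode,
                 const HandlerBounds *tab, size_t n,
                 int (*last_chance)(int))
    {
        for (; fp != NULL; fp = fp->prev)
            for (size_t i = 0; i < n; i++)
                if (fp->rip >= tab[i].low && fp->rip < tab[i].high)
                    if (tab[i].handler(excode))
                        return 1;           /* handled: stop scanning */
        return last_chance(excode);         /* no handler anywhere */
    }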

    This is different from most "signal" handlers people have written, where simple inspection of the instruction which failed and the address involved allows it to be "handled". But to do anything other than crash, each instruction which overflows needs special handling unique to that instruction and dependent on what the compiler was in the middle of doing when the overflow happened. This is why trapping just isn't a good idea.

    Except you keep missing the point:
    no one has a handler for integer overflow because it should never happen.
    Just like no one has a handler for memory read parity errors.

    When you wrote C code using signed integers, *YOU* guaranteed to the
    compiler that your code would never overflow. Overflow checking just
    detects when you have made an error, just like array bounds checking,
    or divide by zero checking.

    This is not something being done *to you* against your will,
    this is something that you *ask for* because it helps detect your errors.
    Doing it in hardware just makes it efficient.

    A better exception usage example might be a routine that enables exceptions
    for floating point underflow where the FPU traps to a handler that zeros
    the value and logs where it happened so someone can look at it later,
    then continues with its calculation.
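
    A sketch of that pattern on Linux/glibc (feenableexcept is a glibc
    extension; zeroing the result and resuming would require patching the
    FP state in the ucontext, which is platform-specific, so this version
    only logs and exits):

    #define _GNU_SOURCE
    #include <fenv.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Log where the underflow happened.  A real fixup handler would zero
       the destination register via the ucontext FP state and return to
       resume the calculation; that part is omitted here. */
    static void on_fpe(int sig, siginfo_t *si, void *uctx)
    {
        (void)sig; (void)uctx;
        fprintf(stderr, "FP underflow at %p\n", si->si_addr); /* demo only */
        exit(1);
    }

    int main(void)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = on_fpe;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGFPE, &sa, NULL);

        feenableexcept(FE_UNDERFLOW);      /* unmask underflow traps */

        volatile double tiny = 1e-308;
        volatile double r = tiny * 1e-10;  /* underflows to subnormal */
        printf("%g\n", (double)r);         /* not reached if trap fires */
        return 0;
    }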

    I'm just explaining why trap-on-overflow has gone away, because it's
    almost completely useless: hardware trap on overflow is only good for the case that you want to crash on integer overflow. Branch-on-overflow is the correct approach--the compiler can branch to either a trapping instruction (if you just want to crash), or for all other cases of detecting overflow, the compiler branches to "fixup" code.

    But crash on overflow *IS* the correct behavior in 99.999% of cases.
    Branch on overflow is ALSO needed in certain rare cases and I showed how
    it is easily detected.

    And crash-on-overflow just isn't a popular use model, as I use the example
    of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
    and no compiler seems to use it. Especially since branch-on-overflow
    is almost as good in every way.

    Kent

    Because C doesn't require it. That does not make the capability useless.

    Removing error detectors does not make the errors go away,
    just your knowledge of them.
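
    For reference, the branch-on-overflow form both sides keep coming back
    to is what GCC and Clang emit for their checked-arithmetic builtin; a
    minimal crash-on-overflow policy in C might look like this:

    #include <stdio.h>
    #include <stdlib.h>

    /* Compilers lower this to an add followed by a branch-on-overflow
       (add+jo on x86); the out-of-line path implements the "crash"
       policy, but could just as well patch up and continue. */
    long checked_add(long a, long b)
    {
        long r;
        if (__builtin_add_overflow(a, b, &r)) {  /* GCC/Clang builtin */
            fprintf(stderr, "integer overflow: %ld + %ld\n", a, b);
            abort();
        }
        return r;                                /* common fast path */
    }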


  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Wed Oct 9 18:42:42 2024

    EricP <ThatWouldBeTelling@thevillage.com> writes:

    Because C doesn't require it. That does not make the capability useless.

    Other languages do require overflow detection (e.g. COBOL ON OVERFLOW clause), and it's best done with conditional branches, not traps.
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Oct 9 14:43:49 2024

    Niklas Holsti wrote:
    On 2024-10-07 22:12, MitchAlsup1 wrote:
    On Mon, 7 Oct 2024 18:55:26 +0000, Kent Dickey wrote:

    In article <efXIO.169388$1m96.45507@fx15.iad>,
    EricP <ThatWouldBeTelling@thevillage.com> wrote:
    Kent Dickey wrote:
    In article <O2DHO.184073$kxD8.113118@fx11.iad>,
    EricP <ThatWouldBeTelling@thevillage.com> wrote:
    Kent Dickey wrote:

    In no way was I ever arguing that checking for overflow was a bad idea,
    or a language issue, or anything else. Just that CPUs should not bother
    having trap-on-overflow instructions.

    I understand, and I disagree with this conclusion.
    I think all forms of software error detection are useful and
    HW should make them simple and eliminate cost when possible.

    I think I am not explaining the issue well.

    I'm not arguing what you want to do with overflow. I'm trying to show
    that for all uses of detecting overflow other than crashing with no
    recovery, hardware trapping on overflow is a poor approach.

    If you enable hardware traps on integer overflow, then to do anything
    other than crash the program would require engineering a very complex
    set of data structures, roughly the complexity of adding debug
    information to the executable, in order to make this work. As far as
    I know, no one in the history of computers has yet undertaken
    this task.

    And yet, this is exactly the kind of data C++ needs in order to
    use its Try-Throw-Catch exception model. The stack walker needs
    to know where on the stack is the list of stuff to free on block
    exit, where are the preserved registers and how many, ...


    Ada too.

    There are at least two ways to do that (at least for Ada, probably also
    for C++):

    - Dynamically maintain a stack-like data structure (a chain, linked
    list) that describes the current nesting of "code blocks" and their exception handlers. Whenever the program enters a block with an
    exception handler, there is entry code that pushes the description of
    that exception handler on this chain, including the address of its code;
    and correspondingly pops it on exiting such a block.

    Usually it uses the frame pointer to create a singly linked list of
    call frames to walk backwards when scanning for an exception handler.

    There is also control block information that needs to be dynamically
    set up for each handler, so there is some runtime overhead.

    - Statically construct a mapping table that is stored in the executable
    and maps code ranges to exception handlers.

    The static method moves as much as possible of the control block
    information out of the dynamic context, lowering the set up cost
    for a handler.

    Ada implementations started with the dynamic method, which is simpler
    but adds some execution cost to all blocks with exception handlers, even
    if an exception never happens. Current implementations tend to the
    static method, also called "zero-cost exceptions" because there is no
    extra execution cost for blocks with exception handlers /unless/ an exception does occur.


    Windows used the dynamic method in the 32-bit x86 OS and switched
    to the static method on 64-bit x64, as it has lower runtime overhead.

    Structured Exception Handling (C/C++) https://learn.microsoft.com/en-us/cpp/cpp/structured-exception-handling-c-cpp?view=msvc-170

    x64 exception handling https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170

    Exception handling in MSVC https://learn.microsoft.com/en-us/cpp/cpp/exception-handling-in-visual-cpp?view=msvc-170

    Modern C++ best practices for exceptions and error handling https://learn.microsoft.com/en-us/cpp/cpp/errors-and-exception-handling-modern-cpp?view=msvc-170
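
    A minimal sketch of the dynamic scheme in portable C, with
    setjmp/longjmp standing in for the real frame records (actual
    implementations keep richer per-handler control blocks):

    #include <setjmp.h>
    #include <stddef.h>

    /* Each block with a handler pushes a record on a per-thread chain on
       entry and pops it on exit; this entry/exit work is the runtime cost
       the dynamic method pays even if no exception ever occurs. */
    typedef struct HandlerRec {
        struct HandlerRec *prev;
        jmp_buf            env;
    } HandlerRec;

    static _Thread_local HandlerRec *chain = NULL;

    #define TRY(r)     ((r)->prev = chain, chain = (r), setjmp((r)->env) == 0)
    #define END_TRY(r) (chain = (r)->prev)        /* pop on normal exit */

    void raise_exc(int code)
    {
        HandlerRec *h = chain;
        if (h) {
            chain = h->prev;          /* unwind to the enclosing handler */
            longjmp(h->env, code);
        }
        /* empty chain: last-chance handling / terminate the thread */
    }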



  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Oct 9 15:08:05 2024

    Scott Lurndal wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    Because C doesn't require it. That does not make the capability useless.

    Other languages do require overflow detection (e.g. COBOL ON OVERFLOW clause),
    and it's best done with conditional branches, not traps.

    Then you use the overflow branching form for those situations
    where you have a specific local overflow handler. Nothing stops that.

    But that is not a justification for getting rid of overflow trapping instructions altogether, as Kent was proposing. And actually it looks to me,
    not knowing Cobol, like it should use overflow trapping instructions
    UNLESS there is an ON OVERFLOW clause, i.e. the default should be to
    treat overflow as an error unless you explicitly state how to handle it.



  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Wed Oct 9 19:43:39 2024

    On Wed, 9 Oct 2024 18:16:51 +0000, EricP wrote:

    Except you keep missing the point:
    no one has a handler for integer overflow because it should never
    happen. Just like no one has a handler for memory read parity errors.

    Au contraire:
    I understand how to recover from even "late write ECC violations*"--
    but mostly that is because I am primarily a HW guy. (*) When a cache
    line displaced from L1 or L2 arrives at L3/DRAM with a bad ECC.

    When you wrote C code using signed integers, *YOU* guaranteed to the
    compiler that your code would never overflow. Overflow checking just
    detects when you have made an error, just like array bounds checking,
    or divide by zero checking.

    I disagree with this statement. I wrote in C under the knowledge
    that integer data types can overflow--they have to be able to--
    it is the nature of fixed size containers. I am happy for the
    compiler to IGNORE the possibility of overflow, but not the HW.

    This is not something being done *to you* against your will,
    this is something that you *ask for* because it helps detect your
    errors.
    Doing it in hardware just makes it efficient.

    Yes, allow the compiler to IGNORE the problem, but have HW detect the
    problem.
  • From Robert Finch@robfi680@gmail.com to comp.arch on Wed Oct 9 16:12:40 2024

    On 2024-10-09 2:16 p.m., EricP wrote:
    Kent Dickey wrote:

    But crash on overflow *IS* the correct behavior in 99.999% of cases.
    Branch on overflow is ALSO needed in certain rare cases and I showed how
    it is easily detected.

    And crash-on-overflow just isn't a popular use model, as I use the
    example
    of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
    and no compiler seems to use it.  Especially since branch-on-overflow
    is almost as good in every way.

    Kent

    Because C doesn't require it. That does not make the capability useless.

    Removing error detectors does not make the errors go away,
    just your knowledge of them.


    Slightly confused on trap versus branch. Trapping on overflow is not a
    good solution, but a branch on overflow is? A trap is just a slow
    branch. The reason for trapping was to improve code density and non-exceptional performance.
    If it is the overhead of performing a trap operation that is the issue,
    then a special register could be dedicated to holding the overflow
    handler address, and instructions defined to automatically jump through
    the overflow handler address register (a branch target address register). Overflow detecting instructions are just a fusion of the instruction and
    the following branch on overflow operation.

    addjo r1,r2,r3 <- does a jump (instead of a trap) to branch register #7
    for instance, on overflow.

    Having an overflow branch register might be better for code density / performance.

  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Wed Oct 9 21:36:21 2024

    On Wed, 9 Oct 2024 20:12:40 +0000, Robert Finch wrote:

    On 2024-10-09 2:16 p.m., EricP wrote:
    Kent Dickey wrote:

    But crash on overflow *IS* the correct behavior in 99.999% of cases.
    Branch on overflow is ALSO needed in certain rare cases and I showed how
    it is easily detected.

    And crash-on-overflow just isn't a popular use model, as I use the
    example
    of x86 in 32-bit mode having a 1-byte INTO instruction which crashes,
    and no compiler seems to use it.  Especially since branch-on-overflow
    is almost as good in every way.

    Kent

    Because C doesn't require it. That does not make the capability useless.

    Removing error detectors does not make the errors go away,
    just your knowledge of them.


    Slightly confused on trap versus branch. Trapping on overflow is not a
    good solution, but a branch on overflow is? A trap is just a slow
    branch. The reason for trapping was to improve code density and non-exceptional performance.
    If it is the overhead of performing a trap operation that is the issue,

    x86 has seriously distorted people's view of how much overhead is
    associated with a trap*. MIPS had trap handlers measuring in the
    17-cycle range for getting to the handler, handling the exception,
    and getting back to the instruction that trapped. Since GBOoO windows
    have mispredicted branches in this kind of latency too, a
    properly designed architecture should be able to do similarly to MIPS.

    Whereas x86 may take 1,000 cycles to get to the handler. This is due
    to all the Descriptor table stuff, call-gates, protection rings, and segmentation.

    (*) trap == exception == fault == any unpredicted control flow
    caused by the instruction stream itself (SVC et al. not included
    because it is requested by the instruction stream).

    then a special register could be dedicated to holding the overflow
    handler address, and instructions defined to automatically jump through
    the overflow handler address register (a branch target address
    register).
    Overflow detecting instructions are just a fusion of the instruction and
    the following branch on overflow operation.

    addjo r1,r2,r3 <- does a jump (instead of a trap) to branch register #7
    for instance, on overflow.

    Having an overflow branch register might be better for code density / performance.

    What if you want to handle multiply overflow differently than
    addition overflow ??
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Oct 10 15:36:32 2024

    On Wed, 9 Oct 2024 21:36:21 +0000
    mitchalsup@aol.com (MitchAlsup1) wrote:

    On Wed, 9 Oct 2024 20:12:40 +0000, Robert Finch wrote:


    x86 has seriously distorted people's view of how much overhead is
    associated with a trap*.

    Do you have an opinion about FRED? https://cdrdv2-public.intel.com/819481/346446-flexible-return-and-event-delivery.pdf

  • From Robert Finch@robfi680@gmail.com to comp.arch on Thu Oct 10 08:57:17 2024

    On 2024-10-09 5:36 p.m., MitchAlsup1 wrote:
    On Wed, 9 Oct 2024 20:12:40 +0000, Robert Finch wrote:

    Slightly confused on trap versus branch. Trapping on overflow is not a
    good solution, but a branch on overflow is? A trap is just a slow
    branch. The reason for trapping was to improve code density and
    non-exceptional performance.
    If it is the overhead of performing a trap operation that is the issue,

    x86 has seriously distorted people's view of how much overhead is
    associated with a trap*. MIPS had trap handlers measuring in the
    17-cycle range for getting to the handler, handling the exception,
    and getting back to the instruction that trapped. Since GBOoO windows
    have mispredicted branches in this kind of latency too, a
    properly designed architecture should be able to do similarly to MIPS.

    Whereas x86 may take 1,000 cycles to get to the handler. This is due
    to all the Descriptor table stuff, call-gates, protection rings, and segmentation.

    (*) trap == exception == fault == any unpredicted control flow
    caused by the instruction stream itself (SVC et al. not included
    because it is requested by the instruction stream).

    then a special register could be dedicated to holding the overflow
    handler address, and instructions defined to automatically jump through
    the overflow handler address register (a branch target address
    register).
    Overflow detecting instructions are just a fusion of the instruction and
    the following branch on overflow operation.

    addjo r1,r2,r3    <- does a jump (instead of a trap) to branch
    register #7
    for instance, on overflow.

    Having an overflow branch register might be better for code density /
    performance.

    What if you want to handle multiply overflow differently than
    addition overflow ??

    The branch register could be reloaded before the operation with a
    different handler address. It would be one instruction to load a PC
    relative address perhaps. Slightly better than having one instruction
    after every op.
    For Q+ there would be room in the instruction to specify a branch
    register, so multiple handler targets could be supported.



  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Oct 10 15:32:21 2024

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    Scott Lurndal wrote:

    Other languages do require overflow detection (e.g. COBOL ON OVERFLOW clause),
    and it's best done with conditional branches, not traps.

    Then you use the overflow branching form for those situations
    where you have a specific local overflow handler. Nothing stops that.

    But that is not a justification for getting rid of overflow trapping
    instructions altogether, as Kent was proposing. And actually it looks to me,
    not knowing Cobol, like it should use overflow trapping instructions
    UNLESS there is an ON OVERFLOW clause, i.e. the default should be to
    treat overflow as an error unless you explicitly state how to handle it.

    See https://www.mainframestechhelp.com/tutorials/cobol/size-error-phrase.htm

    The default is to truncate. All other cases can be handled with
    a conditional branch.
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Fri Oct 11 01:40:15 2024

    On Mon, 7 Oct 2024 10:17:26 +0200, Terje Mathisen wrote:

    The single most canonical test for IBM PC compatibility was Microsoft's Flight Simulator, taking off from the now demolished Meigs Field in
    Chicago.

    That game used the OS and BIOS for the loading of the game, and then
    went on to direct hardware access for pretty much the rest of the
    playing time.

    I can remember Flight Simulator being used as the benchmark for
    compatibility as far back as 1985. A report on a computer show mentioned
    that clone makers were demoing it running on their products.

    This is why I feel the term “IBM compatible” was misleading; it should have been “Microsoft compatible” from at least that point on.
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Fri Oct 11 01:41:58 2024

    On Mon, 7 Oct 2024 13:05:53 +0300, Michael S wrote:

    In all cases the vendor of GPU changed ...

    That, too, added to the problem, in that the software folks had to rewrite
    all the performance-intensive bits yet again for the new machine.

    OpenCL never took off because the GPGPU market simply isn’t competitive enough. NVidia is dominant, AMD plays second fiddle, and that’s it.
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Fri Oct 11 01:46:51 2024

    On Mon, 7 Oct 2024 22:26:58 +0300, Michael S wrote:

    On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    ARM was rather late to the RISC game, this might have been literally
    true.

    ARM was rather early to the RISC game. Shipped for profit since late
    1986.

    Shipped in an actual PC, the Acorn Archimedes range.

    That was the first time I ever saw a 3D shaded rendition of a flag waving,
    on a computer, generated in real time. No other machine could do it,
    unless you got up to the really expensive Unix workstation class (e.g.
    SGI, custom Evans & Sutherland hardware etc).
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 11 06:42:15 2024

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    I can remember Flight Simulator being used as the benchmark for compatibility as far back as 1985. A report on a computer show mentioned that clone makers were demoing it running on their products.

    This is why I feel the term “IBM compatible” was misleading; it should have been “Microsoft compatible” from at least that point on.

    It was IBM PC compatible, and that was not misleading, because that's
    what it was about. "Microsoft compatible" would have been misleading
    (if you want it to mean the same as "IBM PC compatible"), because lots
    of hardware was Microsoft DOS compatible that was not an IBM PC clone
    and therefore not 100% IBM PC compatible. And MS-DOS was certainly
    the higher-profile Microsoft product than the Flight Simulator.

    And many buyers did not care about the Flight Simulator, but more
    about Lotus 1-2-3, which also required an IBM PC compatible machine.

    Of course you saw the Flight Simulator a lot at shows: Moving pictures
    attract the eye in a way that a static spreadsheet screen does not.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Oct 11 14:20:20 2024

    On 11/10/2024 03:46, Lawrence D'Oliveiro wrote:
    On Mon, 7 Oct 2024 22:26:58 +0300, Michael S wrote:

    On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    ARM was rather late to the RISC game, this might have been literally
    true.

    ARM was rather early to the RISC game. Shipped for profit since late
    1986.

    Shipped in an actual PC, the Acorn Archimedes range.

    That was the first time I ever saw a 3D shaded rendition of a flag waving,
    on a computer, generated in real time. No other machine could do it,
    unless you got up to the really expensive Unix workstation class (e.g.
    SGI, custom Evans & Sutherland hardware etc).

    The Acorn Archimedes was /way/ ahead of anything in the PC / x86 world,
    both in hardware and software. It could emulate an 80286 PC almost as
    fast as real PCs that you could buy at the time for a higher price than
    the Archimedes.

    The demo that impressed me most was drawing full-screen Mandelbrot sets
    in a second or two, compared to several minutes for a typical PC at the
    time. It meant you could do real-time zooming and flying around in the set.

    My first encounter with ARM assembly was enhancing that demo program for higher screen resolution and deeper zooming.

  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 08:23:39 2024

    Terje Mathisen <terje.mathisen@tmsw.no> writes:
    Maybe all add/sub/etc opcodes that are immediately followed by an INTO
    could be fused into a single ADDO/SUBO/etc version that takes zero extra
    cycles as long as the trap part isn't hit?

    On Intel P-cores add/inc/sub etc. has been fused with a following
    JO/JNO into one uop for quite a while (I guess since Sandy Bridge
    (2011)).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 08:45:57 2024

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    But then RISC processors, mostly, started using exceptions for housekeeping
    - SPARC for register window sliding, Alpha for byte, word and misaligned memory access

    On Alpha the assembler expands byte, word and unaligned access
    mnemonics into sequences of machine instructions; if you compile for
    BWX extensions, byte and word mnemonics get compiled into BWX
    instructions. If the machine does not have the BWX extensions and it encounters a BWX instruction, the result is an illegal instruction
    signal at least on Linux. This terminates your typical program, so
    it's not at all frequent.

    Concerning unaligned accesses, if you use a load or store that
    requires alignment, Digital OSF/1 (and the later versions with various
    names) by default produced a signal rather than fixing it up, so again
    programs are typically terminated, and the exception is not at all
    frequent. There is a system call and a tool (uac) that allows telling
    the OS to fix up unaligned accesses, but it played no role in my
    experience while I was still using Digital OSF/1 (including its
    successors).

    On Linux the default behaviour was to fix up the unaligned accesses
    and to log that in the system log. There were a few such messages in
    the log per day, so that obviously was not a frequent occurrence,
    either. I wrote a program that allowed me to change the behaviour <https://www.complang.tuwien.ac.at/anton/uace.c>, mainly because I
    wanted to get a signal when an unaligned access happens.

    As for the unaligned-access mnemonics, these were obviously barely
    used: I found that gas generates wrong code for ustq several years
    after Alpha was introduced, so obviously no software running under
    Linux has used this mnemonic.

    The solution for Alpha was to add back the byte and word instructions,
    and add misaligned access support to all memory ops.

    Alpha added BWX instructions, but not because it had used trapping to
    emulate them earlier; old or portable binaries continued to use
    instruction sequences. Alpha traps when you do, e.g., an unaligned
    ldq in all Alpha implementations I have had contact with (up to an
    800MHz 21264B).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 09:18:23 2024

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    Kent Dickey wrote:
    [...]
    GCC's -trapv option is not useful for a variety of reasons.
    1) it's slow, about a 50% performance hit
    2) it's always on for a compilation unit, which is not what programmers need,
    as it triggers many false positives, so people turn it off.
    ...
    So why should any hardware include an instruction to trap-on-overflow?

    Because ALL the negative speed and code size consequences do not occur.

    Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
    18.1.0, I get a 15-instruction sequence which does not include add
    (the trap-on-overflow version).

    MIPS gcc 14.2.0 generates a sequence that includes

    jal __addvsi3

    i.e., just as for x86-64. Similar for MIPS64 with these compilers.

    Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
    shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
    way of checking overflow at all.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 12 10:23:18 2024

    Michael S <already5chosen@yahoo.com> writes:
    That's correct about intrinsics, but incorrect about ADCX/ADOX.
    The latter can be moderately helpful in special situations, esp.
    128b * 128b => 256b multiplication, but it is never necessary
    and for addition/subtraction is not needed at all.

    They are useful if there are two strings of additions. This happens
    naturally in wide multiplication (also beyond 256b results). But it
    also happens when you add three multi-precision numbers (say, X, Y,
    Z): You need C for the carry of XYi=X[i]+Y[i]+C, and O for the carry
    of XYZ[i]=XYi+Z[i]+O. If you have ADCX/ADOX, you can do both
    additions in one loop, so XYi can be in a register and does not need
    to be stored. If you don't have these instructions, only ADC, you
    need one loop to compute X+Y and store the result in memory, and one
    loop to compute XY+Z, i.e., the lack of ADCX/ADOX results in
    substantial additional cost.
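
    A sketch of that three-operand loop with the ADX intrinsics (the
    function name and the final-carry handling are mine; needs an
    ADX-capable x86-64, <immintrin.h>, and e.g. -madx):

    #include <immintrin.h>
    #include <stddef.h>

    /* w[i] = x[i] + y[i] + z[i] over n 64-bit limbs; the two carry
       chains (CF via ADCX, OF via ADOX) run through one loop, so the
       intermediate XYi stays in a register and is never stored. */
    void add3_limbs(unsigned long long *w,
                    const unsigned long long *x,
                    const unsigned long long *y,
                    const unsigned long long *z, size_t n)
    {
        unsigned char cxy = 0, cz = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned long long xy;
            cxy = _addcarryx_u64(cxy, x[i], y[i], &xy);  /* chain 1: X+Y  */
            cz  = _addcarryx_u64(cz,  xy,  z[i], &w[i]); /* chain 2: XY+Z */
        }
        /* folding the final carries cxy and cz into w[n] is omitted */
    }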

    If you add 4 multi-precision numbers, AMD64 with ADX runs out of carry
    bits, so you have to spend the overhead of an additional loop (but not
    of two additional loops as without ADCX/ADOX).

    With carry bits in the general purpose registers <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf> and 30 GPRs
    (one is zero, one is sp), you can add 14 multi-precision numbers per
    loop: 14 GPRs for source addresses, 1 GPR for the target address, 1
    for the loop counter, 13 registers for loop-carried carry flags.

    Of course, the question is if this kind of computation is needed
    frequently enough to justify this kind of extension. For
    multi-precision multiplication and squaring, Intel considered the
    frequency relevant enough to introduce ADCX/ADOX/MULX.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 13 13:00:14 2024

    On Sat, 12 Oct 2024 10:23:18 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    With carry bits in the general purpose registers <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf> and 30 GPRs
    (one is zero, one is sp), you can add 14 multi-precision numbers per
    loop: 14 GPRs for source addresses, 1 GPR for the target address, 1
    for the loop counter, 13 registers for loop-carried carry flags.

    Of course, the question is if this kind of computation is needed
    frequently enough to justify this kind of extension. For
    multi-precision multiplication and squaring, Intel considered the
    frequency relevant enough to introduce ADCX/ADOX/MULX.

    - anton

    That's not bad. I think you see yourself that the spill and context switch
    parts could benefit from more work.
    But I suspect that the main opposition you'll face in the RISC-V
    organization will center not on that, but on fear of an increase in cycle
    time, whether proven with hard numbers or not.


  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 13 13:10:58 2024

    On Fri, 11 Oct 2024 01:41:58 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Mon, 7 Oct 2024 13:05:53 +0300, Michael S wrote:

    In all cases the vendor of GPU changed ...

    That, too, added to the problem, in that the software folks had to
    rewrite all the performance-intensive bits yet again for the new
    machine.

    OpenCL never took off because the GPGPU market simply isn’t
    competitive enough. NVidia is dominant, AMD plays second fiddle, and
    that’s it.
    I am not sure about the dog-and-tail relationship here.
    To me it sounds plausible that NV dominates due to a better software story.
    At least that's what I see in certain sectors of the embedded market -
    people prefer the old NV Jetson Xavier over newer AMD and Intel SoCs that
    are much better not only on the CPU side, but also provide many more
    FLOPs on the GPU side. And the reason is that they are much more certain
    that they will be able to write programs for NV GPUs than they are for
    AMD or Intel GPUs.
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 13 15:16:08 2024

    Michael S <already5chosen@yahoo.com> writes:
    In their defense, AMD's use of the term ROP didn't last long.
    K8 manuals use the better term micro-ops. I don't have a K7 manual to
    look at, but it seems to me that it uses the same terminology as K8.

    I have come across ROP (and its expansion RISC op) relatively
    recently, but maybe it was in third-party material. Their evil deeds
    of the past come back to haunt them:-).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Mon Oct 14 23:38:43 2024

    On Fri, 11 Oct 2024 06:42:15 GMT, Anton Ertl wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    I can remember Flight Simulator being used as the benchmark for
    compatibility as far back as 1985. A report on a computer show mentioned
    that clone makers were demoing it running on their products.

    This is why I feel the term “IBM compatible” was misleading; it should
    have been “Microsoft compatible” from at least that point on.

    It was IBM PC compatible, and that was not misleading, because that's
    what it was about.

    But then IBM came along shortly afterwards with their PS/2 range, which no longer defined the standard for compatibility.

    So at that point it was either “Microsoft compatible” or nothing.

    ... lots of hardware was Microsoft DOS compatible ...

    Yes it was, but none of them could run Flight Simulator.
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Mon Oct 14 23:39:59 2024

    On Sun, 13 Oct 2024 13:10:58 +0300, Michael S wrote:

    On Fri, 11 Oct 2024 01:41:58 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    OpenCL never took off because the GPGPU market simply isn’t competitive
    enough. NVidia is dominant, AMD plays second fiddle, and that’s it.

    I am not sure about dog-tail relationships.

    In a market dominated by one player, the dominant player tends not to like open standards. Open standards allow competitors to get a foot in the
    door, and the dominant player doesn’t like that.
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Mon Oct 14 21:44:06 2024

    Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    Kent Dickey wrote:
    [...]
    GCC's -trapv option is not useful for a variety of reasons.
    1) it's slow, about a 50% performance hit
    2) it's always on for a compilation unit, which is not what programmers need,
    as it triggers many false positives, so people turn it off.
    ...
    So why should any hardware include an instruction to trap-on-overflow?
    Because ALL the negative speed and code size consequences do not occur.

    Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
    18.1.0, I get a 15-instruction sequence which does not include add
    (the trap-on-overflow version).

    MIPS gcc 14.2.0 generates a sequence that includes

    jal __addvsi3

    i.e., just as for x86-64. Similar for MIPS64 with these compilers.

    Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
    shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
    way of checking overflow at all.

    - anton

    Yes. So even when the ADD instruction is available they won't use it.
    At least clang for MIPS64 inlines one of the overflow-detect idioms.
    Gcc calls that rather expensive subroutine.

    I changed your example to use long instead of int
    to avoid any partial register issues.
    Also I added a third argument just to see what it would do.
    It generates slightly different code for the second check.

    long add3 (long a, long b, long c) {
    return a + b + c;
    }

    I also tried Ada mips64 gnat 14.2.0 -O2 (below).
    It also didn't use the ADD which traps but uses a different idiom inlined.

    Both examples should have taken 3 instructions
    add3:
    dadd $2, $4, $5 ; r2 = r4 + r5
    dadd $2, $2, $6 ; r2 = r2 + r6
    jr $ra
    nop


    but what clang generated was:

    ; The comments on the left are mine
    add3:
    daddiu $sp, $sp, -16 ; set up call frame
    sd $ra, 8($sp)
    sd $fp, 0($sp)
    move $fp, $sp
    daddu $3, $4, $5 ; r3 = r4 + r5
    slt $1, $3, $4 ; r1 = r3 < r4
    slti $2, $5, 0 ; r2 = r5 < 0
    bne $2, $1, .LBB0_3 ; if (r2 != r1) goto Overflow
    nop
    daddu $2, $3, $6 ; r2 = r3 + r6
    slt $1, $2, $3 ; r1 = r2 < r3
    slti $3, $6, 0 ; r3 = r6 < 0
    xor $1, $3, $1 ; r1 = (r3 != r1)
    bnez $1, .LBB0_3 ; if (r1) goto Overflow
    nop
    move $sp, $fp ; pop frame
    ld $fp, 0($sp)
    ld $ra, 8($sp)
    jr $ra
    daddiu $sp, $sp, 16
    .LBB0_3:
    break
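
    For reference, one standard C rendering of the sign checks both
    compilers inline (my formulation): for r = a + b, signed overflow
    occurred iff a and b have the same sign and r's sign differs.

    /* Branch-free signed-overflow test: (a^r) & (b^r) is negative
       exactly when a and b agree in sign and r disagrees. */
    long add_check(long a, long b, int *ovf)
    {
        long r = (long)((unsigned long)a + (unsigned long)b); /* wraps */
        *ovf = ((a ^ r) & (b ^ r)) < 0;
        return r;
    }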

    ====================================

    -- Ada mips64 gnat 14.2.0 -O2
    function add3 (a, b, c : Long_Integer) return Long_Integer is
    begin
    return a + b + c;
    end add3;

    .LC0:
    .ascii "example.adb"
    .space 1
    _ada_add3:
    daddu $3,$4,$5 # tmp205, a, b
    xor $4,$4,$5 # tmp206, a, b
    nor $4,$0,$4 # tmp208, tmp206
    xor $5,$3,$5 # tmp207, tmp205, b
    and $5,$5,$4 # tmp209, tmp207, tmp208
    bltz $5,.L7 #, tmp209,
    daddu $2,$3,$6 # tmp212, tmp205, c

    xor $3,$3,$6 # tmp213, tmp205, c
    nor $3,$0,$3 # tmp215, tmp213
    xor $6,$2,$6 # tmp214, tmp212, c
    and $6,$6,$3 # tmp216, tmp214, tmp215
    bltz $6,.L7
    nop
    jr $31
    nop
    .L7:
    daddiu $sp,$sp,-16 #,,
    sd $28,0($sp) #,
    lui $28,%hi(%neg(%gp_rel(_ada_add3))) #,
    daddu $28,$28,$25 #,,
    daddiu $28,$28,%lo(%neg(%gp_rel(_ada_add3))) #,,
    ld $4,%got_page(.LC0)($28) # tmp210,,
    ld $25,%call16(__gnat_rcheck_CE_Overflow_Check)($28) # tmp211,,
    sd $31,8($sp) #,
    li $5,3 # 0x3 #,
    1: jalr $25 # tmp211
    daddiu $4,$4,%got_ofst(.LC0) #, tmp210,
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Tue Oct 15 12:59:03 2024

    Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    But then RISC processors, mostly, started using exceptions for housekeeping
    - SPARC for register window sliding, Alpha for byte, word and misaligned
    memory access

    On Alpha the assembler expands byte, word and unaligned access
    mnemonics into sequences of machine instructions; if you compile for
    BWX extensions, byte and word mnemonics get compiled into BWX
    instructions. If the machine does not have the BWX extensions and it encounters a BWX instruction, the result is an illegal instruction
    signal at least on Linux. This terminates your typical program, so
    it's not at all frequent.

    Ah yes, that was it. After they added BWX to 21164 in 1996,
    for older 21064 models VMS had an optional illegal instruction exception handler that caught BWX instructions, emulated them and continued,
    or terminated.

    Concerning unaligned accesses, if you use a load or store that
    requires alignment, Digital OSF/1 (and the later versions with various
    names) by default produced a signal rather than fixing it up, so again programs are typically terminated, and the exception is not at all
    frequent. There is a system call and a tool (uac) that allows telling
    the OS to fix up unaligned accesses, but it played no role in my
    experience while I was still using Digital OSF/1 (including its
    successors).

    On Linux the default behaviour was to fix up the unaligned accesses
    and to log that in the system log. There were a few such messages in
    the log per day, so that obviously was not a frequent occurrence,
    either. I wrote a program that allowed me to change the behaviour <https://www.complang.tuwien.ac.at/anton/uace.c>, mainly because I
    wanted to get a signal when an unaligned access happens.

    IIRC on VMS the unaligned exception was caught and could optionally
    log a diagnostic, execute a fixup handler and continue, or terminate.

    As for the unaligned-access mnemonics, these were obviously barely
    used: I found that gas generates wrong code for ustq several years
    after Alpha was introduced, so obviously no software running under
    Linux has used this mnemonic.

    The solution for Alpha was to add back the byte and word instructions,
    and add misaligned access support to all memory ops.

    Alpha added BWX instructions, but not because it had used trapping to
    emulate them earlier; Old or portable binaries continued to use
    instruction sequences. Alpha traps when you do, e.g., an unaligned
    ldq in all Alpha implementations I have had contact with (up to an
    800MHz 21264B).

    - anton

    You are right... they didn't add misaligned access to all LD and ST.
    Except for LDQ_U and STQ_U they still fault on non-natural alignment.


  • From Bernd Linsel@bl1-thispartdoesnotbelonghere@gmx.com to comp.arch on Tue Oct 15 21:24:11 2024

    On 12.10.24 11:18, Anton Ertl wrote:
    Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
    18.1.0, I get a 15-instruction sequence which does not include add
    (the trap-on-overflow version).

    MIPS gcc 14.2.0 generates a sequence that includes

    jal __addvsi3

    i.e., just as for x86-64. Similar for MIPS64 with these compilers.

    Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
    shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
    way of checking overflow at all.

    - anton

    Very irritating: https://godbolt.org/z/KsMc3KfKc

    Why do neither gcc nor clang use MIPS's trap-on-overflow add
    instructions, while they indeed use teq <divisor>, 0 for a
    division-by-zero check?
    --
    Bernd Linsel
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Sat Oct 26 18:37:14 2024

    John Dallman <jgd@cix.co.uk> wrote:

    I see where I'm going wrong: I'm trying to talk about the machines
    designed to run MS-DOS and later Windows, not just the CPUs. The vast
    range of hardware that all had substantial degrees of compatibility as regards booting, busses and so on. Those things let their manufacturers compete for the DOS and Windows market, whereas x86-based machines that weren't PC-compatible only succeeded in quite specialised niches.

    Those hardware suppliers did not close off access to the more advanced features of i386 onwards, because they had no reason to, and that let
    Linux take advantage of all that hardware when it came along. That's the point I was failing to make.

    I think this is still misleading. Not only was the 386 a _much_ more
    ambitious design than just a "processor for running DOS", hardware
    manufacturers also cared about running more things than just
    DOS. And "running DOS" is misleading too: for many "DOS applications"
    DOS provided just a program loader and file system access. Such
    applications could switch to protected mode, use multitasking
    and 32-bit addressing. There were "DOS extenders". Before
    Windows gained market dominance there were competing GUIs.
    There were PC servers, which for some time meant Novell.

    So things critical to Linux were also important in the general PC
    market. Clearly Linux benefited from the availability of commodity
    PCs. But the things that made a PC a good PC were correlated with
    being a good Linux machine. As a little anecdote, let me add that
    small sellers frequently used Linux as a tester for the PCs they
    were selling, as it stressed machines more than "typical"
    DOS applications.
    --
    Waldek Hebisch