• Linus Torvalds on bad architectural features

    From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 3 08:58:32 2025
    From Newsgroup: comp.arch

    Apparently someone wants to create a big-endian RISC-V, and someone
    proposed adding support to that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus
    Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |
    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.
    |
    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform
    |
    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
    |
    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!
    |
    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.
    |
    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.
    |
    | - make exceptions asynchronous.
    |
    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!
    |
    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.
    |
    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 05:40:22 2025

    On 10/3/2025 3:58 AM, Anton Ertl wrote:
    Apparently someone wants to create a big-endian RISC-V, and someone
    proposed adding support to that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):


    Yeah...

    Sadly I kinda feel called out here.
    Wouldn't necessarily get the Torvalds' seal of approval...


    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |

    Sorta applies to my core...
    Though the L1D$ also remembers the Phys-Addr and uses this for Write-Back.


    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.
    |

    Avoided in BJX2 Core.

    Would apply to my smaller BSR1 and B32V cores (aligned only = cheaper).
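On an aligned-only core, software that reads packed or misaligned data must never issue the misaligned load itself. A common portable C idiom is to go through `memcpy`. GCC and Clang typically lower it to a single load where the target tolerates misalignment, and to byte loads where it does not. This is a sketch, and the helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly misaligned address. memcpy lets the
   compiler choose an access pattern that is legal on the target. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* True if p meets the natural alignment of a 32-bit access. */
static int is_aligned_u32(const void *p)
{
    return ((uintptr_t)p & (sizeof(uint32_t) - 1)) == 0;
}
```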


    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform
    |

    The former is true of SuperH.
    Both are true of BJX1.
    The latter is true of BJX2 XG1/XG2.

    Not true of XG3, which went over to superscalar.

    WEX Bundling may have been a mistake in retrospect...



    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
    |
    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |

    Avoided.


    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!
    |

    Avoided.

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.
    |

    Errm, true of BJX2.

    Though, TLB misses are nowhere near that frequent (if they were, performance would be unusable dog crap).


    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |

    Also true of my core.


    It also now pretends to have Binary128, pretty much entirely by software traps.

    But, trapping has less code footprint, so if sinl/cosl/... are used,
    they won't burn as much space in ".text" with the function calls (and if
    I can trap out of RISC-V mode, then it can use 128-bit math and a few
    other features that don't exist in RV64, so it isn't necessarily slower
    than using a function call).



    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.
    |

    But, makes hardware cheaper...


    | - make exceptions asynchronous.
    |
    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |

    Avoided:
    TLB Miss handling really needs precise exceptions in order to work
    correctly.


    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!
    |
    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.
    |
    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    Ignoring HOBs (high-order bits) in pointers except in certain edge cases?...

    I have mixed feelings about having put FPU status in HOBs of SP
    (possible foot gun).

    Weak coherence, with special rituals needed to actually get caches
    flushed?...

    Bit-slicing certain address calculations so the relevant structures have mandatory alignment?...

    Interrupt entry is basically just a glorified branch-with-mode change,
    so the ISR handler has to go through a convoluted sequence to get to
    where it can start saving off the registers?...

    ...


  • From Michael S@already5chosen@yahoo.com to comp.arch on Fri Oct 3 13:46:45 2025

    On Fri, 03 Oct 2025 08:58:32 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Apparently someone wants to create a big-endian RISC-V, and someone
    proposed adding support to that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |

    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    I see nothing wrong (and plenty right) with SAS as long as the address
    space is big enough, i.e. not 47-48 bits and preferably not even
    56 bits. Considering the near-death of Moore's Law, 58 or 60 bits
    should be enough for SAS for the next 50 years. Maybe even for 100.

    SAS does not allow a few tricks that people play today with aliases, but
    none of these tricks is really important for performance and all are detrimental to sanity.
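Each extra address bit doubles the space, so it helps to see the widths in binary units. 48 bits is 256 TiB, 56 bits is 64 PiB, 58 bits is 256 PiB, and 60 bits is 1 EiB. The helper below is a hypothetical sketch of that arithmetic:

```c
#include <stdint.h>

/* Size of a `bits`-bit address space in units of 2^unit_shift bytes
   (unit_shift = 40 for TiB, 50 for PiB, 60 for EiB).
   Assumes unit_shift <= bits and bits - unit_shift < 64. */
static uint64_t vas_size_in_units(unsigned bits, unsigned unit_shift)
{
    return (uint64_t)1 << (bits - unit_shift);
}
```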

  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 11:26:11 2025

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    Stefan
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:41:34 2025


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Apparently someone wants to create a big-endian RISC-V, and someone
    proposed adding support to that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.

    Avoided.

    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.

    Avoided.

    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform

    Avoided

    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

    Avoided

    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!

    Avoided

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.

    Avoided--and mine are even coherent so you don't even have to shoot
    them down.

    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.

    Avoided.

    | - make exceptions asynchronous.

    Avoided

    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!

    Avoided

    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.

    Avoided

    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    A clean sweep.
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:42:35 2025


    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Fri Oct 3 16:18:47 2025

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    For its entire existence, PA-RISC HP-UX supported virtually indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it can only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack, and some basic data pages, to avoid some
    initial faults for the exec() case at least.
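The following toy model contrasts Copy-on-Access with COW. It assumes 4 pages and no real MMU, and every name in it is hypothetical. The child starts with no private data pages. Its first touch of any page, whether read or write, makes a private copy.

The post does not say how the parent side is handled. This sketch assumes that a parent write to a page the child has not yet copied first gives the child its snapshot. That keeps ordinary fork semantics: the child never sees the parent's post-fork writes.

```c
#include <stdlib.h>
#include <string.h>

#define NPAGES    4
#define PAGE_SIZE 16                 /* tiny pages keep the model small */

typedef struct {
    unsigned char *page[NPAGES];     /* NULL = not mapped yet */
} aspace;

typedef struct {
    aspace  *parent;                 /* source of the copies */
    aspace   self;                   /* child's private pages */
    unsigned copies;                 /* number of copy "faults" so far */
} child_proc;

/* Fork under Copy-on-Access: nothing is copied up front. */
static void coa_fork(child_proc *c, aspace *parent)
{
    memset(c, 0, sizeof *c);
    c->parent = parent;
}

/* Any child access to an unmapped page "faults" and copies it first. */
static unsigned char *coa_touch(child_proc *c, unsigned pg)
{
    if (c->self.page[pg] == NULL) {
        unsigned char *p = malloc(PAGE_SIZE);
        if (p == NULL)
            abort();
        memcpy(p, c->parent->page[pg], PAGE_SIZE);
        c->self.page[pg] = p;
        c->copies++;
    }
    return c->self.page[pg];
}

/* Assumed parent-side policy: snapshot the page for the child before the
   parent's write lands, so the child keeps the fork-time contents. */
static void parent_write(child_proc *c, unsigned pg, unsigned off,
                         unsigned char v)
{
    (void)coa_touch(c, pg);
    c->parent->page[pg][off] = v;
}
```

In the common fork-then-exec case the child touches only a few pages before exec, so `copies` stays small. That is the observation behind Kent's point that COW's advantage is overblown.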

    Kent
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 15:44:26 2025

    Kent Dickey [2025-10-03 16:18:47] wrote:
    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    The problem is not how/when you do the "copy", but the fact that once
    the data at address A has been changed, address A in the child process
    and address A in the parent don't contain the same value. This is fundamentally at odds with SASOS and with virtually indexed and tagged
    caches. The usual workaround is to augment the virtual addresses with
    some kind of "address-space ID" (ASID).

    That in turn makes it harder to share read-write memory between
    processes (Mill's approach tried to accommodate that by augmenting only
    *some* addresses with an ASID, but not all), and requires flushing the
    cache when an ASID is re-used for another process (which can happen
    rather often because the size of the ASID is usually limited to a small
    number of bits).
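Here is a minimal sketch of the ASID workaround described above. Its parameters are assumptions: a direct-mapped cache, virtually indexed and tagged, with an 8-bit ASID, and all names hypothetical. It models tags only, with no data and no write-back. The ASID becomes part of the tag, so the same VA in parent and child hits different lines. Recycling an ASID therefore forces its stale lines to be invalidated.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BITS 6                  /* 64-byte lines (assumed) */
#define SETS      256                /* direct-mapped, 256 sets (assumed) */

typedef struct {
    bool     valid;
    uint8_t  asid;                   /* 8-bit ASID: only 256 live spaces */
    uint64_t vline;                  /* virtual line number used as tag */
} vline_t;

static vline_t cache[SETS];

static unsigned set_of(uint64_t va) { return (unsigned)((va >> LINE_BITS) % SETS); }
static uint64_t line_of(uint64_t va) { return va >> LINE_BITS; }

/* Hit only if both the virtual tag and the ASID match. */
static bool lookup(uint8_t asid, uint64_t va)
{
    const vline_t *l = &cache[set_of(va)];
    return l->valid && l->asid == asid && l->vline == line_of(va);
}

static void fill(uint8_t asid, uint64_t va)
{
    vline_t *l = &cache[set_of(va)];
    l->valid = true;
    l->asid  = asid;
    l->vline = line_of(va);
}

/* Before a recycled ASID is handed to a new process, every line still
   tagged with it must go, or the new process would hit the old data. */
static void flush_asid(uint8_t asid)
{
    for (unsigned i = 0; i < SETS; i++)
        if (cache[i].valid && cache[i].asid == asid)
            cache[i].valid = false;
}
```

The full sweep in `flush_asid` is the cost mentioned above. With few ASID bits, recycling, and therefore flushing, happens often.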


    Stefan
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 16:19:12 2025

    On 10/3/2025 10:41 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Apparently someone wants to create a big-endian RISC-V, and someone
    proposed adding support to that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus
    Torvalds (extracted from
    <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.

    Avoided.

    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.

    Avoided.

    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform

    Avoided

    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

    Avoided

    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!

    Avoided

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.

    Avoided--and mine are even coherent so you don't even have to shoot
    them down.

    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.

    Avoided.

    | - make exceptions asynchronous.

    Avoided

    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!

    Avoided

    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.

    Avoided

    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    A clean sweep.


    The alternative position might be:
    All jank is acceptable so long as it doesn't significantly impede
    performance or negatively impact userland.

    Or, maybe, actively embracing the "full jank route".

    Possibly Torvalds wouldn't exactly approve though...


    Well, except for aligned-only and big-endian; there are better reasons not to go
    that way. Better IMO to just leave everything LE and then use byte-swap instructions for the rare case where one needs to access a big-endian variable.

    Well, and then be annoyed that C lacks any standard way to specify the endianness of variables or pointers, and that one needs compiler
    builtins which map to htonl/ntohl/htons/ntohs/... (with the usual
    annoyance that one also needs a generic function fallback in the
    background for the case where someone wants to take the function pointer
    of one of these functions; sorta like with memcpy and similar).
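C has no standard way to declare a big-endian variable. The usual portable workaround is to assemble the value from bytes, which works regardless of host byte order. GCC and Clang typically recognize this pattern and may emit a single load plus a byte-swap. They also expose `__builtin_bswap32` directly. In this sketch the helper names are hypothetical:

```c
#include <stdint.h>

/* Read a big-endian 32-bit value, independent of host byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store v as big-endian (network order) bytes. */
static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Plain-C byte swap: the kind of out-of-line fallback that still has to
   exist when code takes the address of a byte-swap function. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```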



    If I were to try to go in a "jank reducing" direction, probably:
    Use XG3 as a design base;
    Comparably cleaner and more orthogonal than XG1 and XG2.
    Eliminate Modal stuff;
    Maybe drop the RISC-V conjoined-twin thing;
    Hardware page walker and fully IEEE FPU?...
    Probably also add cache coherence.
    Mandate zero or sign extended registers as the default (like x86-64);
    Put FPU status/control into its own register or similar (*1).
    ...

    Though, it is unclear whether a "good" core by these definitions could be
    done without a significant negative impact on the FPGA resource budget.



    *1: Sticking it into the HOBs of either GP or SP is ugly, and has an unreasonable level of footgun potential. So, this is pretty high on my
    "I probably need to change this before it ends up getting stuck this way permanently" list (in which case, SP[63:48] would go back to being
    hard-wired to 0).

    This is probably one of those "going to change once I come up with a
    better option" situations.

    Don't really want to define a new CR for this, but need a place to put
    it that:
    May be exposed to userland without creating problems;
    May be saved/restored on context switches.

    Actually, relocating it to the HOBs of TBR could almost work here:
    Already preserved on context switch;
    Not directly visible to RISC-V or XG3 via normal registers;
    TP is a shadow of TBR in TestKern, but TP is its own register here.

    In this case, might change TP from "Read Only in userland" to "Fault on attempt to modify low 48-bits in Userland".
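Storing status in a register's HOBs amounts to bit-field packing. This sketch assumes a 16-bit field in bits [63:48] (as with SP[63:48]) and a 48-bit address in bits [47:0], with hypothetical helper names. The last helper mirrors the proposed "fault on attempt to modify the low 48 bits in userland" rule:

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)   /* bits [47:0] */

/* Extract the 16-bit status field from bits [63:48]. */
static uint16_t hob_get(uint64_t reg)
{
    return (uint16_t)(reg >> ADDR_BITS);
}

/* Replace bits [63:48], leaving the 48-bit address part unchanged. */
static uint64_t hob_set(uint64_t reg, uint16_t status)
{
    return (reg & ADDR_MASK) | ((uint64_t)status << ADDR_BITS);
}

/* Userland write check: allowed only if the low 48 bits are unchanged. */
static bool user_write_ok(uint64_t old_reg, uint64_t new_reg)
{
    return ((old_reg ^ new_reg) & ADDR_MASK) == 0;
}
```

The footgun is visible here. Any code that treats the whole register as a plain pointer, as a compiler unaware of the convention would, sees a non-canonical value whenever the status field is nonzero.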


    Exposure to RISC-V land being the bigger problem, as compilers like GCC
    are not going to be aware of "various registers may have weird crap
    squirreled into the HOBs" type issues.

    Granted, Link-Registers have weird stuff in the HOBs, but generally GCC doesn't poke at the link register. But, then again, there is still the
    "glibc violently explodes if I try to use it" issue, and I can't prove
    this is not due to the wacky link registers or similar (would have to
    more carefully examine it to make sure it isn't doing something weird
    here). If it turns out that glibc messes with the link register, may
    need to figure out a way to make RV mode work with bare-pointer link registers.


    ...


  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 17:42:19 2025

    On 10/3/2025 10:26 AM, Stefan Monnier wrote:
    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    You can... just sort of not support full "fork()"; or support it in a
    way similar to how it works on uClinux and Cygwin. Namely, you can use
    it, but trying to use it for anything more than a fork immediately
    followed by an "exec*" call or similar is probably going to break something.

    Well, or anything that depends on "fork()" isn't going to work; and the preferable way to spawn new process instances is something along the
    lines of a "CreateProcessEx()" style mechanism.
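On POSIX systems the closest standard counterpart to a CreateProcessEx()-style call is `posix_spawn`/`posix_spawnp`. These create the child and start the new program in one call, so the implementation is free to avoid duplicating the parent's address space. In this minimal sketch, `run_and_wait` is a hypothetical name:

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] (searched in PATH) with the given arguments, wait for it,
   and return its exit status, or -1 if it could not be run normally. */
static int run_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```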




    As can be noted, I had designed my ABIs with the assumption of a single address space.

    Generally, it ended up as 48-bit since, even within the limits of an FPGA
    with only 128MB of actual RAM or so, a 32-bit VAS can get a bit cramped (32 bits is only really enough for a single program in an address space, if that).


    My "break glass" feature for 48-bits being insufficient for a single
    address space was expanding the VAS to 96 bits, though even this was a
    bit wonk:
    Low 32-bits: Real address bits;
    Next 24 bits: Just sorta mash all the HOBs together and hope it doesn't
    break.

    Where, say, extending the L1 cache tags by 8 bits is a lot cheaper than extending them by 48 bits, and offers a sufficiently low probability of aliasing.


    So, in the 96-bit mode:
    0000_00000000-0000_00000000..0000_00000000-7FFF_FFFFFFFF:
    Preserved exactly if no higher addresses used.
    Anything else: YMMV.

    There is a non-zero risk of random 4GB regions aliasing based on the
    whims of the XOR, as actually storing full 96-bit addresses is steep.
    The page-tables and TLB could support full-width 96-bit addresses, so
    the main problem area would be trying to use two addresses at the same
    time where they would map to the same location in the L1 cache.

    However, if one assumes a scenario where each program is confined to a
    slice of the bigger 96-bit space, then the XOR's all even out and the
    address space is consistent (the risk mostly appearing when using
    addresses not within the same 48-bit "quadrant").


    Theoretically, the OS's ASLR could keep track of this and not assign
    address ranges that would alias with previously used address ranges (via
    a lookup table).
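The "mash the HOBs together" step can be pictured as an XOR fold. This sketch assumes one particular fold, since the exact function is not given. The 64 bits above bit 31 of a 96-bit address are XORed down to 24 bits, giving a 56-bit cache tag. Addresses whose bits above 47 are zero keep bits [47:32] exactly. Within one 48-bit quadrant the fold is a bijection on bits [47:32], but two addresses from different quadrants can collide. All names are hypothetical.

```c
#include <stdint.h>

/* A 96-bit virtual address: bits [63:0] in lo, bits [95:64] in hi. */
typedef struct { uint64_t lo; uint32_t hi; } va96;

/* Assumed fold: bits [95:32] (64 bits) XORed down to 24 bits,
   in chunks [55:32], [79:56] and [95:80]. */
static uint32_t fold_hobs(va96 a)
{
    uint64_t h = ((uint64_t)a.hi << 32) | (a.lo >> 32);   /* bits [95:32] */
    uint32_t f = 0;
    for (int i = 0; i < 64; i += 24)
        f ^= (uint32_t)(h >> i) & 0xFFFFFFu;
    return f;
}

/* 56-bit tag: 24 folded bits above the 32 real low bits. */
static uint64_t tag56(va96 a)
{
    return ((uint64_t)fold_hobs(a) << 32) | (uint32_t)a.lo;
}
```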

    Kinda similar crap to the "PE loader may not load a PE to an address
    that crosses a 4GB boundary" because it adds cost to have
    direct-branches and PC increment need to deal with more than 4GB.
    Well, sorta:
    PC increment still has a 4GB window;
    Branches have either a 16MB window (via branch predictor);
    Or, +/- 8GB, via normal address calc.
    Branch predictor detecting carry-out and not handling the branch.
    Was 4GB originally, but the above trick allowed being cheaper here.
    However, crossing a 16MB barrier has a performance penalty.
    Statistically low probability of ".text" crossing such a barrier.
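One reading of the branch-predictor shortcut above works as follows; the exact hardware logic is not given, so treat it as an assumption. The predictor adds the displacement only within the low 24 bits of PC and keeps PC's upper bits. It declines the branch when the add carries or borrows out of bit 23, leaving it to the slower full address calculation. The function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define WIN_BITS 24                                /* 16 MB window */
#define WIN_MASK ((UINT64_C(1) << WIN_BITS) - 1)

/* Returns true and sets *target if the branch stays inside PC's 16 MB
   aligned region; false models "carry-out", i.e. take the slow path. */
static bool predictor_target(uint64_t pc, int64_t disp, uint64_t *target)
{
    int64_t low = (int64_t)(pc & WIN_MASK) + disp;
    if (low < 0 || low > (int64_t)WIN_MASK)
        return false;                              /* crosses a 16 MB barrier */
    *target = (pc & ~WIN_MASK) | (uint64_t)low;
    return true;
}
```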


    Arguably, all still kinda crap though...


    For now, 48-bits is plenty for my uses.

    I considered possible options for 64-bit VAS support (within the 96-bit
    mode), but annoyingly, if done in an affordable way, it would likely not
    allow program code outside the low 48 bits, or arrays crossing a 48-bit boundary (or would still be slightly jank).



    Though, IMHO, still better than what MIPS did, IIRC:
    PC1[63:28] = PC0[63:28]
    PC1[27: 2] = JAL_Addr[25:0]
    PC1[ 1: 0] = 0

    Or, say, you have a 256MB barrier that may not be crossed, and the
    loader would need to rebase within said 256 MB.
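Here is the MIPS jump-target rule written out for a 64-bit PC as a sketch, with a hypothetical function name. One detail the "IIRC" version glosses over: per the MIPS architecture manuals, the high bits come from the address of the delay-slot instruction (PC+4), not from the jump itself. This matters only when the jump sits in the last word of a 256 MB region.

```c
#include <stdint.h>

/* J/JAL target: keep bits [63:28] of the delay-slot address (PC + 4),
   replace bits [27:2] with the 26-bit instr_index, bits [1:0] = 0. */
static uint64_t mips_j_target(uint64_t pc, uint32_t instr)
{
    uint64_t region = (pc + 4) & ~((UINT64_C(1) << 28) - 1);
    uint64_t index  = (uint64_t)(instr & 0x03FFFFFFu) << 2;
    return region | index;
}
```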

    Information is inconsistent for conditional branches, where some
    information implies it is simply adding the displacement (scaled by 4),
    and other info implies:
    Copy high bits unchanged;
    Add low-order bits;
    Address may wrap if it crosses some ill-defined address barrier.

    They seemingly missed an opportunity to go cheaper for Bcc here, say:
    PC1[63:20] = PC0[63:20]
    PC1[19:14] = PC0[19:14] + SExt(Bcc_Addr[15:12])
    PC1[13: 2] = Bcc_Addr[11:0]
    PC1[ 1: 0] = 0
    Then, say, one only needs to do a 6-bit addition for the conditional
    branch instruction.
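The hypothetical cheaper Bcc rule above can be written out directly. It is not real MIPS behaviour, which adds the full sign-extended, shifted 16-bit offset. Only a 6-bit add on PC[19:14] is needed, and any carry or borrow out of bit 19 is simply dropped, so the target never leaves PC's 1 MB region. The function name is hypothetical:

```c
#include <stdint.h>

/* Proposed Bcc target:
   PC1[63:20] = PC0[63:20]
   PC1[19:14] = PC0[19:14] + SExt(off[15:12])   (6-bit add, wraps)
   PC1[13: 2] = off[11:0]
   PC1[ 1: 0] = 0                                                    */
static uint64_t cheap_bcc_target(uint64_t pc, uint16_t off)
{
    uint64_t hi  = pc & ~((UINT64_C(1) << 20) - 1);    /* PC[63:20] */
    unsigned mid = (unsigned)(pc >> 14) & 0x3Fu;       /* PC[19:14] */
    int d = (off >> 12) & 0xF;                         /* off[15:12] */
    if (d & 0x8)
        d -= 16;                                       /* sign-extend 4 bits */
    unsigned nmid = (mid + (unsigned)d) & 0x3Fu;       /* 6-bit add, wraps */
    uint64_t low = (uint64_t)(off & 0x0FFFu) << 2;     /* PC1[13:2] */
    return hi | ((uint64_t)nmid << 14) | low;
}
```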

    Trying to rebase a program at load time being "there be dragons here" territory.


    ...




    Stefan

  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Sat Oct 4 04:36:28 2025
    From Newsgroup: comp.arch

    In article <jwvo6qoui1m.fsf-monnier+comp.arch@gnu.org>,
    Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    Stefan

    Copy-on-Access gives you 100% compatibility with all fork() semantics.

    You can define SAS in a way that almost defeats virtual addresses, but
    let's assume we have 48-bit virtual address space and 16-bit ASID, for
    an effective 64-bit SAS. We'll have every process using a different ASID.
    And we'll assume the ASID affects dcache indexing so we have to handle that.

    First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. We'll assume they cannot see each other's
    data in the dcache due to the virtual indexes being different. So
    ASID=1, VA=0x1000 maps to a different dcache index than ASID=2,
    VA=0x1000 even if they map to the same physical address. The ASID=2
    process starts (for the sake of a simple explanation) with no pages
    mapped, except it maps all the read-only instruction pages from ASID=1
    as ASID=2. (Note it doesn't matter if these are at different
    instruction and/or data cache indexes since it's always read-only). All
    data pages from the ASID=1 process are made invalid (in the page table,
    and removed from the TLB). Now ASID=1 and ASID=2 are running
    simultaneously. If the ASID=1 process touches any data page, the OS
    copies the contents of that original physical page to a new page, and
    makes that new page available to the ASID=2 process. This copy is the
    real trick: in the dumbest possible implementation, the OS flushes the
    data to DRAM, then copies it to the new physical address, and flushes
    that to DRAM. But systems with caches with virtual aliasing generally
    provide ways to handle the aliasing in a more efficient way to do this
    copying in the caches, at least in the L2 cache. Once the copy of the
    one page is done, the OS then makes the corresponding ASID=1 page
    writeable, and continues. Similarly, if the ASID=2 process touches a
    page, it gets a copy of the ASID=1 page (which ASID=1 has not touched
    yet), and then the OS gives the ASID=1 process write access to that
    page. Basically, both processes are "paging in" the ASID=1 pages.

    ASID=1 keeps all of its physical pages. ASID=2 gets a copy of all the
    physical pages from ASID=1 that it touches.

    Note that COW has to go and make all pages of the initial process read-only, which might be more work than just making all pages invalid.
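    The Copy-on-Access fork described above can be modeled in a few lines.
    This is a minimal Python sketch, assuming one dict of physical frames per
    process plus a set of currently valid pages; Process, coa_fork, coa_fault
    and access are hypothetical names, the shared read-only instruction pages
    are left out, and copying a bytearray stands in for the cache-aware copy a
    real OS would do.

```python
class Process:
    def __init__(self, asid):
        self.asid = asid
        self.frames = {}    # virtual page number -> physical frame (bytearray)
        self.valid = set()  # pages that can be touched without faulting


def coa_fork(parent, child_asid):
    """Child starts with no data pages; parent keeps its frames but loses access."""
    child = Process(child_asid)
    parent.valid.clear()
    return child


def coa_fault(parent, child, vpn):
    """First touch by either process: child gets a private copy of the
    original page, then the parent regains access to the original."""
    child.frames[vpn] = bytearray(parent.frames[vpn])
    child.valid.add(vpn)
    parent.valid.add(vpn)


def access(proc, parent, child, vpn):
    """Touch a data page, faulting into the OS if it is not valid yet."""
    if vpn not in proc.valid:
        coa_fault(parent, child, vpn)
    return proc.frames[vpn]


parent = Process(1)
parent.frames = {0: bytearray(b"hello"), 1: bytearray(b"world")}
parent.valid = {0, 1}
child = coa_fork(parent, 2)

access(parent, parent, child, 0)[0] = ord("J")  # parent writes page 0
print(bytes(parent.frames[0]), bytes(child.frames[0]))  # b'Jello' b'hello'
print(1 in parent.valid, 1 in child.frames)             # False False
```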

    Kent
  • From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 18:36:45 2025
    From Newsgroup: comp.arch

    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. ...

    I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As
    Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 19:00:17 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS,

    Don't forget all the home computers. It might be debatable if they
    should be called "system", though.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat Oct 4 12:31:49 2025
    From Newsgroup: comp.arch

    On 10/4/2025 11:36 AM, John Levine wrote:
    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT.

    Isn't the AS/400, or whatever it is called now, a SAS?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 01:05:17 2025
    From Newsgroup: comp.arch

    On Sat, 4 Oct 2025 18:36:45 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. ...

    I don't think anyone would call a system that gives each process a
    completely new address space a single address space system.

    Agreed.

    Making
    the ASID part of the translated address is one of many ways of
    implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,


    How would you call OS/400 (nowadays, IBM i) ?

    each of which provided a single full sized
    address space in which they essentially ran their real memory
    predecessors MFT and MVT. As Lynn has often told us, operating
    system bloat forced them quickly to go to MVS, an address space per
    process.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.


    IIRC, Windows CE supported SAS mode of operation just fine without such limitations.





  • From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 22:44:52 2025
    From Newsgroup: comp.arch

    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,

    How would you call OS/400 (nowadays, IBM i) ?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without such limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 17:57:16 2025
    From Newsgroup: comp.arch

    On 10/4/2025 5:44 PM, John Levine wrote:
    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,

    How would you call OS/400 (nowadays, IBM i) ?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.


    FWIW, I suspect that the number of programs that use "fork()" without immediately calling "exec*()" is probably fairly small.

    AFAIK, programs that depend on full "fork()" semantics won't generally
    work on Cygwin either, as IIRC it is just sort of faked by copying the
    local stack frame and spawning out a new thread that terminates on the "exec*()" call.

    Apart from non-PIE ELF or similar, not much else breaks in an SAS. Though, ABI tweaks are needed to make things efficient (eg, not needing
    to load in a new copy of the binaries for every new process).


    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without such
    limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.


    Not sure if 16-bit protected mode segmentation counts as SAS though.
    MS-DOS, maybe, as one could do address math on the segments.


    FWIW, some of my own engineering efforts here took inspiration from
    Windows CE.

    Like, the way I am using the "Global Pointer" directory entry in the
    PE/COFF headers wasn't entirely a novel innovation on my end, ...


  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 02:18:26 2025
    From Newsgroup: comp.arch

    On Sat, 4 Oct 2025 22:44:52 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    It appears that Michael S <already5chosen@yahoo.com> said:

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when
    the system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without
    such limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.


    It's not the same.
    CE supported preemptive multitasking (arguably, better than the likes of NT
    or majority of popular Unixes, at least as long as we are talking about non-SMP) and memory protection, both protection of kernel from user
    processes and of user processes from each other.

    I never took a look at CE support for Virtual Memory. Probably it was
    quite weak, if there was support at all. The only CE-based product I
    ever did had absolutely no need for Virtual Memory.

    However I am pretty sure that they utilized paging hardware for
    management of physical memory, removing fear of fragmentation.

  • From Lynn Wheeler@lynn@garlic.com to comp.arch on Sat Oct 4 14:17:32 2025
    From Newsgroup: comp.arch


    John Levine <johnl@taugh.com> writes:
    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    they had two kinds of bloat. original decision to add virtual memory was because of MVT storage management problems, having to specify each
    region (concurrent execution) four times larger than actually used, as a
    result a typical 1mbyte 370/165 only ran four concurrent regions,
    insufficient to keep system busy and justified. Going to 16mbyte virtual address space (VS2/SVS) allowed concurrent regions to be increased by
    factor of four (sort of like running MVT in a 16mbyte CP67 virtual
    machine ... aka CP67 precursor to VM370), with little or no paging
    ... although capped at 15 because of 4-bit storage protect keys.

    Problem was that as systems got larger/faster needed to move past 15
    concurrent regions ... which resulted in giving each concurrently
    executing region/program, their own 16mbyte virtual address space
    (VS2/MVS). However, OS/360 & descendents were heavily pointer passing
    APIs (creating a different problem) and so they mapped an 8mbyte image of
    the MVS kernel into every 16mbyte virtual address space (leaving
    8mbytes). Then because each subsystem was moved into their separate
    16mbyte virtual address space, the 1mbyte "Common Segment Area" (CSA)
    was mapped into every virtual address space for passing arguments/data
    back and forth between applications and subsystems (leaving 7mbytes).

    Then because the space requirements for passing arguments/data back and
    forth was somewhat proportional to number of subsystems and concurrently running regions/applications, the CSA started to explode becoming the
    Common System Area (CSA) running 5-6mbytes (leaving 2-3mbytes for regions/applications) and threatening to become 8mbytes (leaving zero
    for regions/applications). At the same time the number of concurrently
    running applications space requirements was exceeding 16mbytes real
    address ... and 2nd half 70s, 3033s were retrofitted for 64mbytes real addressing by taking two unused bits in page table entry and prefixing
    them to the 12bit (4k) real page number for 14bits or 64mbyte
    (instructions were still limited to 16mbyte addressing, but virtual pages could be
    loaded and run "above the 16mbyte line").
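    The address arithmetic in the two paragraphs above, as a small Python
    check. The 8mbyte kernel image, the 1mbyte CSA, and the 12-bit versus
    14-bit page numbers come from the text; the exact position of the two
    spare bits in a real 3033 page table entry is not given here, so
    real_address() is a hypothetical helper that only models "prefix two bits
    to the 4k page number".

```python
MB = 1 << 20
PAGE = 4096


def real_address(extra2, frame12, offset):
    """Two spare PTE bits prefixed to a 12-bit page number -> 14-bit page number."""
    frame = ((extra2 & 0b11) << 12) | (frame12 & 0xFFF)
    return frame * PAGE + (offset & (PAGE - 1))


# 12-bit page numbers reach 16mbyte of real storage; 14-bit ones reach 64mbyte.
print(2**12 * PAGE // MB, 2**14 * PAGE // MB)  # 16 64
print(real_address(1, 0, 0) // MB)             # 16: first page "above the line"

# What is left of a 16mbyte MVS address space for regions/applications.
print(16 - 8 - 1)                              # 7 with a 1mbyte CSA
print([16 - 8 - csa for csa in (5, 6)])        # [3, 2] with a 5-6mbyte CSA
```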

    Then part of 370/xa "access registers" was retrofitted to 3033 for dual
    address space mode. Calls to subsystems, could move the caller's address
    space pointer into the secondary address space register and the
    subsystem address space pointer was moved into primary. Subsystems then
    could access the caller's (secondary) virtual address space w/o needing
    data to be passed back&forth in CSA. For 370/xa, program call/return
    instructions could perform the address space primary/secondary switches
    all in hardware.

    I had also started pontificating that a lot of OS/360 had heavily
    leveraged the I/O system to compensate for limited real storage (and
    descendents had inherited it). In the early 80s, I wrote a tome that
    relative system disk I/O throughput had declined by an order of
    magnitude: disk throughput got 3-5 times faster while systems got 40-50
    times faster (a major motivation for constantly needing an increasing
    number of concurrently executing programs). A disk division executive
    took exception and directed the division performance organization to refute
    my claims. After a couple weeks, they came back and basically said that
    I had slightly understated the problem. They then respun the analysis
    for SHARE (user group) presentation on how to configure/manage disks for improved system throughput (16Aug1984, SHARE 63, B874).

    3033 above the "16mbyte" line hack: There were problems with parts of
    system that required virtual pages below the "16mbyte line". Introduced
    with 370 were I/O channel program IDALs that held full-word
    addresses. Somebody came up with the idea to use IDALs to write a virtual
    page (above 16mbyte) to disk and then read it back into an address
    <16mbyte. I gave them a hack using a virtual address space table that
    filled in page table entries with the >16mbyte page number and <16mbyte
    page number and used the MVCL instruction to copy the virtual page from above the 16mbyte line to below the line.
    --
    virtualization experience starting Jan1968, online at home since Mar1970
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Oct 5 13:02:29 2025
    From Newsgroup: comp.arch

    John Levine wrote:
    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,
    How would you call OS/400 (nowadays, IBM i) ?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.

    For operating systems like VMS and WNT that cannot fork (duplicate a
    parent virtual space into a child) Posix allows spawn() instead.

    Spawn is equivalent to fork()/exec() and CreateProcess() in that
    it creates a new address space, loads an exe, and starts a thread.
    Like fork() and WNT CreateProcess(), spawn() allows open file descriptor handles to be passed to the child.

    https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/spawn.h.html

    https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
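    A minimal sketch of posix_spawn() in use, via Python's os.posix_spawn
    wrapper (POSIX systems only; os.waitstatus_to_exitcode needs Python 3.9+).
    spawn_and_capture is a hypothetical helper name. The dup2 file action
    shows the descriptor passing mentioned above: the pipe's write end becomes
    the child's stdout, with no fork() anywhere.

```python
import os
import sys


def spawn_and_capture(argv):
    """Start argv[0] in a new process with posix_spawn() and return
    (exit code, stdout bytes). One call creates the address space,
    loads the executable, and starts it.
    """
    r, w = os.pipe()
    # File action: dup2 the pipe's write end onto the child's fd 1, which is
    # how an open descriptor is handed to the spawned child.
    pid = os.posix_spawn(argv[0], argv, os.environ,
                         file_actions=[(os.POSIX_SPAWN_DUP2, w, 1)])
    os.close(w)  # parent keeps only the read end, so EOF arrives at child exit
    chunks = []
    while True:
        data = os.read(r, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(r)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status), b"".join(chunks)


code, out = spawn_and_capture([sys.executable, "-c", "print('hello from child')"])
print(code, out)  # 0 b'hello from child\n'
```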



  • From George Neuner@gneuner2@comcast.net to comp.arch on Mon Oct 6 06:54:10 2025
    From Newsgroup: comp.arch

    On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
    Dickey) wrote:

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
    SAS - which is that copied /pointers/ remain referencing objects in
    the original process. Under the multi-space model of Unix/Linux,
    after a fork the copied pointers should be referencing the copied
    objects in the new process.

    Lacking a way to identify and fixup pointer values, under SAS by
    simply copying data (COW or COA) you end up unintentionally /sharing/
    data.
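    The pointer objection can be shown with a toy single address space in
    which pointers are stored as full global addresses. A minimal Python
    sketch; the region bases, the list layout, and copy_region() are made up
    for illustration.

```python
PARENT_BASE = 0x1_0000_0000  # hypothetical parent region in the single address space
CHILD_BASE = 0x2_0000_0000   # hypothetical region handed to the forked child

memory = {}  # one global address space: address -> stored word

# Parent builds a tiny list: node A at +0x10 holds an absolute pointer to node B.
memory[PARENT_BASE + 0x10] = PARENT_BASE + 0x20  # A.next
memory[PARENT_BASE + 0x20] = 0                   # B.next (end of list)
memory[PARENT_BASE + 0x28] = 111                 # B.value


def copy_region(src, dst, size):
    """Copy words verbatim, as COW or COA would: no pointer fixup."""
    for addr, word in list(memory.items()):
        if src <= addr < src + size:
            memory[dst + (addr - src)] = word


copy_region(PARENT_BASE, CHILD_BASE, 0x1000)

b = memory[CHILD_BASE + 0x10]  # child follows its copy of A.next ...
memory[b + 0x8] = 222          # ... and updates what it thinks is its own B
print(hex(b), memory[PARENT_BASE + 0x28], memory[CHILD_BASE + 0x28])
# 0x100000020 222 111: the write landed in the parent's B
```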


    For its entire existence, PA-RISC HP-UX supported virtually indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it can only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack, and some basic data pages, to avoid some
    initial faults for the exec() case at least.

    fork-exec is not a problem. fork alone is.

    How did HP-UX on PA-RISC handle fork?


    Kent

  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 15:49:10 2025
    From Newsgroup: comp.arch

    In article <10brpft$23go$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. ...

    Sorry, bad terminology. I just meant all addresses under ASID=2 are
    invalid.

    In my example, all processes can peek inside any other process's address
    space, by just forming the 64-bit virtual address. The ASID thing is
    just a convention, so I wouldn't have to type 16 digit hex numbers over and over.

    [snip]

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    HP-UX on PA-RISC from 1986-2004 or so was effectively a SAS computer. In 32-bit CPUs, the virtual address space was 48 bits, and normal user code could form any 48-bit address, and this was used for shared libraries and shared
    code (processes running the same executable shared the same virtual address space for the executable). In 64-bit mode, it works mostly as I described. There were 32-bit Space registers which were OR'ed into the upper bits of
    the 64-bit virtual address, to give the global 64-bit system address.
    It was an OS convention to limit the Space values to the upper 16 bits or so, and it could change it to whatever it wanted.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.




    Kent
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 16:44:52 2025
    From Newsgroup: comp.arch

    In article <ne67ekdeej48s8jp7jh1ahda32qmiphm0p@4ax.com>,
    George Neuner <gneuner2@comcast.net> wrote:
    On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
    Dickey) wrote:

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
    SAS - which is that copied /pointers/ remain referencing objects in
    the original process. Under the multi-space model of Unix/Linux,
    after a fork the copied pointers should be referencing the copied
    objects in the new process.

    Lacking a way to identify and fixup pointer values, under SAS by
    simply copying data (COW or COA) you end up unintentionally /sharing/
    data.


    For its entire existence, PA-RISC HP-UX supported virtually indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it can only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack, and some basic data pages, to avoid some
    initial faults for the exec() case at least.

    fork-exec is not a problem. fork alone is.

    How did HP-UX on PA-RISC handle fork?


    Kent

    This is what I was saying: if you define SAS to only mean that each
    process is living at a unique address, and it knows its full address,
    then I don't wish to discuss that SAS. That's like running without
    virtual memory.

    If you define SAS to mean that all processes can see other running processes'
    addresses, and can directly read/write each other's addresses (with protection obviously), then that's the SAS HP PA-RISC ran in.

    HP PA-RISC 64-bit creates a 64-bit global virtual address. Each process
    by convention lives in a smaller part of that, let's say a 48-bit space.
    Each process has 8 32-bit Space Registers (not general registers, and
    some are not writeable by the user, but 5 are writeable) which are OR'ed
    into bits [63:32] of the VA address bits formed by loads and stores to
    form the GVA. Of GVA bits [63:32], it's an OS convention how many bits
    are effectively the ASID and how many are VA bits for the process.
    The GVA is mostly transparent to the user process--they can read the Space Registers and figure it out if they want to, but this was not usual.

    [The architecture defines Space registers as up to 64-bit, so there's a 96-bit GVA, but the hardware only implemented 32-bit Space registers with a 64-bit GVA].

    Note that at any time, user code can set Space Register 1 to 0, form
    the address 0x12345678_12345670 in a register, and try to read and write
    that address. This will generally fail due to a Protection ID scheme, but
    some Space Register values were reserved for shared libraries to share the
    code at the same GVA in all processes.

    So fork() is easy--no pointers in memory or registers are affected, the
    OS assigns a new ASID, puts that in the upper bits of the Space
    Registers for the new process, and it's off. But all HP PA-RISC CPUs have virtually indexed caches, where the ASID is mixed in with lower address
    bits to "hash" the cache lookup. So it needed to do COA since the new
    ASID is different, so the same VA wouldn't see the cached data of the
    old process.
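    A minimal Python sketch of the addressing described above, with fork()
    reduced to handing the child a new space ID. It assumes the OS
    convention of putting the ASID in the top 16 bits of the space ID (GVA
    bits [63:48]) with a 48-bit per-process VA; cache_index() is a made-up
    hash for illustration, not PA-RISC's actual index function.

```python
MASK64 = (1 << 64) - 1


def gva(space, va):
    """Global VA: a 32-bit Space register value OR'ed into VA bits [63:32]."""
    return ((space & 0xFFFF_FFFF) << 32) | (va & MASK64)


def cache_index(space, va, sets=256, line=64):
    """Hypothetical virtually indexed cache hash that mixes in the ASID."""
    return ((va // line) ^ (space >> 16)) % sets


parent_space = 1 << 16  # ASID 1 in the top 16 bits of the space ID
child_space = 2 << 16   # ASID 2, assigned at fork()
ptr = 0x1234_5678_0000  # the same pointer value, untouched by fork()

print(hex(gva(parent_space, ptr)))  # 0x1123456780000
print(hex(gva(child_space, ptr)))   # 0x2123456780000
# Same VA, different cache index: the child cannot hit the parent's lines,
# which is why the copy on first access is needed.
print(cache_index(parent_space, ptr), cache_index(child_space, ptr))  # 1 2
```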

    Note that the OS sees all processes at once. If it wants to read from
    one process and write to another, it can just do Load, then Store.

    Kent