Forum: War Ensemble BBS

Linus Torvalds on bad architectural features

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 3 08:58:32 2025

From Newsgroup: comp.arch

Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support to that to Linux. This has evoked the
following design guideline for designing bad architectures from Linus
Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

|If somebody really wants to create bad hardware in this day and age,
|please do make it big-endian, and also add the following very
|traditional features for sh*t-for-brains hardware:
|
| - virtually tagged caches
|
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
|
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.
|
| - only do aligned memory accesses
|
| Bonus point for not even faulting, and just loading and storing
|garbage instead.
|
| - expose your pipeline details in the ISA
|
| Delayed branch slots or explicit instruction grouping is a great
|way to show that you eat crayons for breakfast before you start
|designing your hardware platform
|
| - extended memory windows
|
| It was good enough for 8-bit machines in order to address more
|memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
|taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
|
| It has decades of history, and an architecture cannot be called
|truly awful if it doesn't support some kind of HIGHMEM crap.
|
| - register windows. It's like extended memory, but for your registers!
|
| Please make sure to also have hardware support for filling and
|spilling them, but make it limited enough that system software has to
|deal with faults at critical times. Nesting exceptions is joyful!
|
| Bonus points if they are rotating and overflowing them silently
|just corrupts data. Keep those users on their toes!
|
| - in fact, require software fallbacks for pretty much anything unusual.
|
| TLB fills? They might only happen every ten or twenty instructions,
|so make them fault to some software implementation to really show your
|mad hardware skillz.
|
| denormals or any other FP precision issues? No, no, don't waste
|hardware on getting it right, software people *LOVE* to clean up after
|you.
|
| Remember: your mom picked up your dirty laundry from your floor,
|and software people are like the super-moms of the world.
|
| - make exceptions asynchronous.
|
| That's another great way to make sure people stay on their toes.
|Make sure machine check exceptions can happen in any context, so that
|you are guaranteed to have a dead machine any time anything goes
|wrong.
|
| But you should also take the non-maskability of NMI to heart, and
|make sure that software cannot possibly write code that is truly
|atomic. Because the NM is NMI is what makes it great!
|
| Floating point! Make sure that the special case you don't deal with
|in hardware are also delayed so that the software people have extra
|joy in trying to figure out just WTF happened. See the previous entry:
|they live for that stuff.
|
|I'm sure I've forgotten many other points. And I'm sure that hardware
|people will figure it out!
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.21a-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 05:40:22 2025

From Newsgroup: comp.arch

On 10/3/2025 3:58 AM, Anton Ertl wrote:

Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support to that to Linux. This has evoked the
following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

Yeah...

Sadly I kinda feel called out here.
Wouldn't necessarily get the Torvalds' seal of approval...

|If somebody really wants to create bad hardware in this day and age,
|please do make it big-endian, and also add the following very
|traditional features for sh*t-for-brains hardware:
|
| - virtually tagged caches
|
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
|
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.
|

Sorta applies to my core...
Though the L1D$ also remembers the Phys-Addr and uses this for Write-Back.

| - only do aligned memory accesses
|
| Bonus point for not even faulting, and just loading and storing
|garbage instead.
|

Avoided in BJX2 Core.

Would apply to my smaller BSR1 and B32V cores (aligned only = cheaper).

| - expose your pipeline details in the ISA
|
| Delayed branch slots or explicit instruction grouping is a great
|way to show that you eat crayons for breakfast before you start
|designing your hardware platform
|

Former true of SuperH.
Both true of BJX1.
Latter true of BJX2 XG1/XG2.

Not true of XG3, which went over to superscalar.

WEX Bundling may have been a mistake in retrospect...

| - extended memory windows
|
| It was good enough for 8-bit machines in order to address more
|memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
|taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
|
| It has decades of history, and an architecture cannot be called
|truly awful if it doesn't support some kind of HIGHMEM crap.
|

Avoided.

| - register windows. It's like extended memory, but for your registers!
|
| Please make sure to also have hardware support for filling and
|spilling them, but make it limited enough that system software has to
|deal with faults at critical times. Nesting exceptions is joyful!
|
| Bonus points if they are rotating and overflowing them silently
|just corrupts data. Keep those users on their toes!
|

Avoided.

| - in fact, require software fallbacks for pretty much anything unusual.
|
| TLB fills? They might only happen every ten or twenty instructions,
|so make them fault to some software implementation to really show your
|mad hardware skillz.
|

Errm, true of BJX2.

Though, TLB misses are nowhere near that frequent (if they were, performance would be unusable dog crap).

| denormals or any other FP precision issues? No, no, don't waste
|hardware on getting it right, software people *LOVE* to clean up after
|you.
|

Also true of my core.

It also now pretends to have Binary128, pretty much entirely by software traps.

But, trapping has less code footprint, so if sinl/cosl/... are used,
they won't burn as much space in ".text" with the function calls (and if
I can trap out of RISC-V mode, then it can use 128-bit math and a few
other features that don't exist in RV64, so it isn't necessarily slower
than using a function call).

| Remember: your mom picked up your dirty laundry from your floor,
|and software people are like the super-moms of the world.
|

But, makes hardware cheaper...

| - make exceptions asynchronous.
|
| That's another great way to make sure people stay on their toes.
|Make sure machine check exceptions can happen in any context, so that
|you are guaranteed to have a dead machine any time anything goes
|wrong.
|

Avoided:
TLB Miss handling really needs precise exceptions in order to work
correctly.

| But you should also take the non-maskability of NMI to heart, and
|make sure that software cannot possibly write code that is truly
|atomic. Because the NM is NMI is what makes it great!
|
| Floating point! Make sure that the special case you don't deal with
|in hardware are also delayed so that the software people have extra
|joy in trying to figure out just WTF happened. See the previous entry:
|they live for that stuff.
|
|I'm sure I've forgotten many other points. And I'm sure that hardware
|people will figure it out!

Ignoring HOBs (high-order bits) in pointers except in certain edge cases?...

I have mixed feelings about having put FPU status in HOBs of SP
(possible foot gun).

Weak coherence, with special rituals needed to actually get caches
flushed?...

Bit-slicing certain address calculations so the relevant structures have mandatory alignment?...

Interrupt entry is basically just a glorified branch-with-mode change,
so the ISR handler has to go through a convoluted sequence to get to
where it can start saving off the registers?...

...


From Michael S@already5chosen@yahoo.com to comp.arch on Fri Oct 3 13:46:45 2025

From Newsgroup: comp.arch

On Fri, 03 Oct 2025 08:58:32 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support to that to Linux. This has evoked the
following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

|If somebody really wants to create bad hardware in this day and age,
|please do make it big-endian, and also add the following very
|traditional features for sh*t-for-brains hardware:
|
| - virtually tagged caches
|
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
|
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.
|

That is only true if one insists on OS with Multiple Address Spaces.
Virtually tagged caches are fine for Single Address Space (SAS) OS.

I see nothing wrong (and plenty right) about SAS as long as address
space is big enough.
I.e. not 47-48 bits, and preferably not even 56 bits. Considering the
near-death of Moore's Law, 58 or 60 bits should be enough for SAS for
the next 50 years. Maybe even for 100.

SAS does not allow a few tricks that people play today with aliases, but
none of these tricks is really important for performance and all are detrimental to sanity.


From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 11:26:11 2025

From Newsgroup: comp.arch

| - virtually tagged caches
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
which seemed workable, but it's far from clear to me whether it would
work well enough if you want to port, say, Debian to such
an architecture.

Stefan

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:41:34 2025

From Newsgroup: comp.arch

anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support to that to Linux. This has evoked the
following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

|If somebody really wants to create bad hardware in this day and age,
|please do make it big-endian, and also add the following very
|traditional features for sh*t-for-brains hardware:
|
| - virtually tagged caches
|
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
|
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

Avoided.

| - only do aligned memory accesses
|
| Bonus point for not even faulting, and just loading and storing
|garbage instead.

Avoided.

| - expose your pipeline details in the ISA
|
| Delayed branch slots or explicit instruction grouping is a great
|way to show that you eat crayons for breakfast before you start
|designing your hardware platform

Avoided

| - extended memory windows
|
| It was good enough for 8-bit machines in order to address more
|memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
|taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

Avoided

| It has decades of history, and an architecture cannot be called
|truly awful if it doesn't support some kind of HIGHMEM crap.
|
| - register windows. It's like extended memory, but for your registers!
|
| Please make sure to also have hardware support for filling and
|spilling them, but make it limited enough that system software has to
|deal with faults at critical times. Nesting exceptions is joyful!
|
| Bonus points if they are rotating and overflowing them silently
|just corrupts data. Keep those users on their toes!

Avoided

| - in fact, require software fallbacks for pretty much anything unusual.
|
| TLB fills? They might only happen every ten or twenty instructions,
|so make them fault to some software implementation to really show your
|mad hardware skillz.

Avoided--and mine are even coherent so you don't even have to shoot
them down.

| denormals or any other FP precision issues? No, no, don't waste
|hardware on getting it right, software people *LOVE* to clean up after
|you.
|
| Remember: your mom picked up your dirty laundry from your floor,
|and software people are like the super-moms of the world.

Avoided.

| - make exceptions asynchronous.

Avoided

| That's another great way to make sure people stay on their toes.
|Make sure machine check exceptions can happen in any context, so that
|you are guaranteed to have a dead machine any time anything goes
|wrong.
|
| But you should also take the non-maskability of NMI to heart, and
|make sure that software cannot possibly write code that is truly
|atomic. Because the NM is NMI is what makes it great!

Avoided

| Floating point! Make sure that the special case you don't deal with
|in hardware are also delayed so that the software people have extra
|joy in trying to figure out just WTF happened. See the previous entry:
|they live for that stuff.

Avoided

|I'm sure I've forgotten many other points. And I'm sure that hardware
|people will figure it out!

A clean sweep.

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:42:35 2025

From Newsgroup: comp.arch

Stefan Monnier <monnier@iro.umontreal.ca> posted:

| - virtually tagged caches
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
which seemed workable, but it's far from clear to me whether it would
work well enough if you want to port, say, Debian to such
an architecture.

SASOS seems like a bridge too far.

Stefan


From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Fri Oct 3 16:18:47 2025

From Newsgroup: comp.arch

In article <1759506155-5857@newsgrouper.org>,
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

Stefan Monnier <monnier@iro.umontreal.ca> posted:

| - virtually tagged caches
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

That is only true if one insists on OS with Multiple Address Spaces.
Virtually tagged caches are fine for Single Address Space (SAS) OS.

AFAIK, the main problem with SASOS is "backward compatibility", most
importantly with `fork`. The Mill people proposed a possible solution,
which seemed workable, but it's far from clear to me whether it would
work well enough if you want to port, say, Debian to such
an architecture.

SASOS seems like a bridge too far.

Stefan

Fork is not a problem with virtual tagged caches or SAS. Normal fork
starts the child with a copy of the parent's address mapping, and uses
"Copy on Write" (COW) to create unique pages as soon as either process
does a write.

For its entire existence, PA-RISC HP-UX supported virtual indexed
caches in a SAS, and implemented fork using Copy On Access. As soon as
the child process touched any page for read or write, it got a copy, so
it can only access its own pages (not counting read-only instruction
pages). This works fine, and it's not a performance issue. The love
folks have for COW is overblown. Real code either immediately exec()'s
(maybe doing some close()'s and other housekeeping first) or starts
writing lots of pages doing what it wants to do as a new process. Note
since the OS knows it needs to copy pages, it can pre-copy a bunch of
pages, such as the stack, and some basic data pages, to avoid some
initial faults for the exec() case at least.

Kent

From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 15:44:26 2025

From Newsgroup: comp.arch

Kent Dickey [2025-10-03 16:18:47] wrote:

Fork is not a problem with virtual tagged caches or SAS. Normal fork
starts the child with a copy of the parent's address mapping, and uses
"Copy on Write" (COW) to create unique pages as soon as either process
does a write.

The problem is not how/when you do the "copy", but the fact that once
the data at address A has been changed, address A in the child process
and address A in the parent don't contain the same value. This is fundamentally at odds with SASOS and with virtually-indexed&tagged
caches. The usual workaround is to augment the virtual addresses with
some kind of "address-space ID" (ASID).

That in turn makes it harder to share read-write memory between
processes (Mill's approach tried to accommodate that by augmenting only
*some* addresses with an ASID, but not all), and requires flushing the
cache when an ASID is re-used for another process (which can happen
rather often because the size of the ASID is usually limited to a small
number of bits).
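To make the point about address A concrete, here is a toy model of a virtually tagged cache. It is only an illustrative sketch: `ToyVCache`, the dict-based storage and the `use_asid` switch are hypothetical names, not any real cache design. Without an ASID in the tag, the child hits the line the parent filled and reads the parent's data; with the ASID in the tag, each process gets its own line.

```python
class ToyVCache:
    """Toy virtually tagged cache: maps a tag to a cached value."""

    def __init__(self, use_asid):
        self.use_asid = use_asid
        self.lines = {}

    def _tag(self, asid, va):
        # With ASIDs the tag is (asid, va); without, it is the bare virtual address.
        return (asid, va) if self.use_asid else va

    def read(self, asid, va, memory):
        tag = self._tag(asid, va)
        if tag not in self.lines:
            # Miss: fill the line from the backing store of the running process.
            self.lines[tag] = memory[(asid, va)]
        return self.lines[tag]


# Backing store after fork: address A holds different data in each process.
A = 0x1000
memory = {(1, A): "parent data", (2, A): "child data"}

plain = ToyVCache(use_asid=False)
tagged = ToyVCache(use_asid=True)

# The parent (ASID=1) runs first and pulls address A into both caches.
plain_parent = plain.read(1, A, memory)
tagged_parent = tagged.read(1, A, memory)

# The child (ASID=2) then reads the same virtual address.
plain_child = plain.read(2, A, memory)    # hits the parent's line: wrong data
tagged_child = tagged.read(2, A, memory)  # separate tag: correct data
```

The model ignores ASID reuse, which is where the flushing cost mentioned above comes from.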

Stefan

From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 16:19:12 2025

From Newsgroup: comp.arch

On 10/3/2025 10:41 AM, MitchAlsup wrote:

anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support to that to Linux. This has evoked the
following design guideline for designing bad architectures from Linus
Torvalds (extracted from
<https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

|If somebody really wants to create bad hardware in this day and age,
|please do make it big-endian, and also add the following very
|traditional features for sh*t-for-brains hardware:
|
| - virtually tagged caches
|
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
|
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

Avoided.

| - only do aligned memory accesses
|
| Bonus point for not even faulting, and just loading and storing
|garbage instead.

Avoided.

| - expose your pipeline details in the ISA
|
| Delayed branch slots or explicit instruction grouping is a great
|way to show that you eat crayons for breakfast before you start
|designing your hardware platform

Avoided

| - extended memory windows
|
| It was good enough for 8-bit machines in order to address more
|memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
|taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

Avoided

| It has decades of history, and an architecture cannot be called
|truly awful if it doesn't support some kind of HIGHMEM crap.
|
| - register windows. It's like extended memory, but for your registers!
|
| Please make sure to also have hardware support for filling and
|spilling them, but make it limited enough that system software has to
|deal with faults at critical times. Nesting exceptions is joyful!
|
| Bonus points if they are rotating and overflowing them silently
|just corrupts data. Keep those users on their toes!

Avoided

| - in fact, require software fallbacks for pretty much anything unusual.
|
| TLB fills? They might only happen every ten or twenty instructions,
|so make them fault to some software implementation to really show your
|mad hardware skillz.

Avoided--and mine are even coherent so you don't even have to shoot
them down.

| denormals or any other FP precision issues? No, no, don't waste
|hardware on getting it right, software people *LOVE* to clean up after
|you.
|
| Remember: your mom picked up your dirty laundry from your floor,
|and software people are like the super-moms of the world.

Avoided.

| - make exceptions asynchronous.

Avoided

| That's another great way to make sure people stay on their toes.
|Make sure machine check exceptions can happen in any context, so that
|you are guaranteed to have a dead machine any time anything goes
|wrong.
|
| But you should also take the non-maskability of NMI to heart, and
|make sure that software cannot possibly write code that is truly
|atomic. Because the NM is NMI is what makes it great!

Avoided

| Floating point! Make sure that the special case you don't deal with
|in hardware are also delayed so that the software people have extra
|joy in trying to figure out just WTF happened. See the previous entry:
|they live for that stuff.

Avoided

|I'm sure I've forgotten many other points. And I'm sure that hardware
|people will figure it out!

A clean sweep.

The alternative position might be:
All jank is acceptable so long as it doesn't significantly impede
performance or negatively impact userland.

Or, maybe, actively embracing the "full jank route".

Possibly Torvalds wouldn't exactly approve though...

Well, except for aligned-only and big-endian, better reasons not to go
that way. Better IMO to just leave everything LE and then use byte-swap instructions for the rare case one needs to access a big-endian variable.

Well, and then be annoyed that C lacks any standard way to specify the endianness of variables or pointers; and the need to have compiler
builtins which map to htonl/ntohl/htons/ntohs/... (with the usual
annoyance that one also needs a generic function fallback in the
background for the case where someone wants to take the function pointer
of one of these functions; sorta like with memcpy and similar).
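As a rough illustration of the "leave everything LE and byte-swap when needed" approach, the sketch below models it in Python rather than C. `bswap32` and `load_be32` are hypothetical helper names; the first stands in for a byte-swap instruction, the second for a little-endian load followed by that swap.

```python
def bswap32(x):
    """Reverse the byte order of a 32-bit unsigned value."""
    x &= 0xFFFFFFFF
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))


def load_be32(buf, offset):
    """Read a big-endian 32-bit field the way a little-endian core would:
    do a native (little-endian) load, then byte-swap the result."""
    native = int.from_bytes(buf[offset:offset + 4], "little")
    return bswap32(native)


# A 4-byte field stored in network (big-endian) order.
packet = bytes([0x12, 0x34, 0x56, 0x78])
value = load_be32(packet, 0)
```

Here `value` comes out as 0x12345678, the number the big-endian bytes encode, even though the load itself was little-endian.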

If I were to try to go in a "jank reducing" direction, probably:
Use XG3 as a design base;
Comparably cleaner and more orthogonal than XG1 and XG2.
Eliminate Modal stuff;
Maybe drop the RISC-V conjoined-twin thing;
Hardware page walker and fully IEEE FPU?...
Probably also add cache coherence.
Mandate zero or sign extended registers as the default (like x86-64);
Put FPU status/control into its own register or similar (*1).
...

Though, unclear is if a "good" core by these definitions could be done
without a significant negative impact on FPGA resource budget.

*1: Sticking it into the HOBs of either GP or SP is ugly, and has an unreasonable level of footgun potential. So, this is pretty high on my
"I probably need to change this before it ends up getting stuck this way permanently" thing (in which case, would go back to SP[63:48] being
hard-wired to 0).
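A small sketch of why status bits in SP[63:48] are a foot gun. The 48-bit address split follows the post; the helper names and the example status value are made up for illustration, and the real bit layout of the status field is not given here.

```python
ADDR_BITS = 48
ADDR_MASK = (1 << ADDR_BITS) - 1


def pack_sp(addr, fpu_status):
    """Place a 16-bit status field in SP[63:48] above a 48-bit address."""
    return ((fpu_status & 0xFFFF) << ADDR_BITS) | (addr & ADDR_MASK)


def sp_address(sp):
    """Address as the memory system sees it: high-order bits ignored."""
    return sp & ADDR_MASK


def sp_fpu_status(sp):
    """Status field recovered from the high-order bits."""
    return (sp >> ADDR_BITS) & 0xFFFF


stack_top = 0x7FFF_FFF0_1000
sp = pack_sp(stack_top, 0x0003)   # 0x0003 is an arbitrary example status

# Code that treats SP as a plain 64-bit integer no longer sees the address.
naive_match = (sp == stack_top)
masked_match = (sp_address(sp) == stack_top)
```

`naive_match` is False while `masked_match` is True, which is the kind of surprise a compiler unaware of the convention could run into.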

This is probably one of those "going to change once I come up with a
better option" situations.

Don't really want to define a new CR for this, but need a place to put
it that:
May be exposed to userland without creating problems;
May be saved/restored on context switches.

Actually, relocating it to the HOBs of TBR could almost work here:
Already preserved on context switch;
Not directly visible to RISC-V or XG3 via normal registers;
TP is a shadow of TBR in TestKern, but TP is its own register here.

In this case, might change TP from "Read Only in userland" to "Fault on attempt to modify low 48-bits in Userland".

Exposure to RISC-V land being the bigger problem, as compilers like GCC
are not going to be aware of "various registers may have weird crap
squirreled into the HOBs" type issues.

Granted, Link-Registers have weird stuff in the HOBs, but generally GCC doesn't poke at the link register. But, then again, there is still the
"glibc violently explodes if I try to use it" issue, and I can't prove
this is not due to the wacky link registers or similar (would have to
more carefully examine it to make sure it isn't doing something weird
here). If it turns out that glibc messes with the link register, may
need to figure out a way to make RV mode work with bare-pointer link registers.

...


From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 17:42:19 2025

From Newsgroup: comp.arch

On 10/3/2025 10:26 AM, Stefan Monnier wrote:

| - virtually tagged caches
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

That is only true if one insists on OS with Multiple Address Spaces.
Virtually tagged caches are fine for Single Address Space (SAS) OS.

AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
which seemed workable, but it's far from clear to me whether it would
work well enough if you want to port, say, Debian to such
an architecture.

You can... just sort of not support full "fork()"; or support it in a
way similar to how it works on ucLinux and Cygwin. Namely, you can use
it, but trying to use it for anything more than a fork immediately
followed by an "exec*" call or similar is probably going to break something.

Well, or anything that depends on "fork()" isn't going to work; and the preferable way to spawn new process instances is something along the
lines of a "CreateProcessEx()" style mechanism.

As can be noted, I had designed my ABIs with the assumption of a single address space.

Generally, it ended up as 48 bit as, even within the limits of an FPGA
with only 128MB of actual RAM or so, a 32-bit VAS can get a bit cramped (where, 32-bits is only really enough for a single-program in an address space, if that).

My "break glass" feature for 48-bits being insufficient for a single
address space was expanding the VAS to 96 bits, though even this was a
bit wonk:
Low 32-bits: Real address bits;
Next 24 bits: Just sorta mash all the HOBs together and hope it doesn't
break.

Where, say, extending the L1 cache tags by 8 bits is a lot cheaper than extending them by 48 bits, and offers a sufficiently low probability of aliasing.

So, in the 96-bit mode:
0000_00000000-0000_00000000..0000_00000000-7FFF_FFFFFFFF:
Preserved exactly if no higher addresses used.
Anything else: YMMV.

There is a non-zero risk of random 4GB regions aliasing based on the
whims of the XOR, as actually storing full 96-bit addresses is steep.
The page-tables and TLB could support full-width 96-bit addresses, so
the main problem area would be trying to use two addresses at the same
time where they would map to the same location in the L1 cache.

However, if one assumes a scenario where each program is confined to a
slice of the bigger 96-bit space, then the XOR's all even out and the
address space is consistent (the risk mostly appearing when using
addresses not within the same 48-bit "quadrant").
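The exact hashing is not spelled out above, so the following is only a guess at the general idea: fold the bits above the low 48 into 8 bits by XOR and append that to the cache tag. The function names, the choice to fold bits 95:48, and the byte-wise XOR are all assumptions for illustration.

```python
LOW_BITS = 48
LOW_MASK = (1 << LOW_BITS) - 1


def fold_high_bits(addr96):
    """XOR the six bytes of bits 95:48 into a single 8-bit value."""
    high = addr96 >> LOW_BITS
    folded = 0
    for _ in range(6):
        folded ^= high & 0xFF
        high >>= 8
    return folded


def cache_tag(addr96):
    """Tag = 8 folded bits on top of the exact low 48 bits."""
    return (fold_high_bits(addr96) << LOW_BITS) | (addr96 & LOW_MASK)


# Two addresses in different upper regions whose folds happen to collide.
a = (0x01 << 48) | 0x1234_5678
b = (0x01 << 56) | 0x1234_5678

# Two addresses inside the same upper region: the fold is identical, so the
# low 48 bits alone tell them apart.
c = (0x01 << 48) | 0x9999_0000

aliases = cache_tag(a) == cache_tag(b)
distinct = cache_tag(a) != cache_tag(c)
```

Under these assumptions `a` and `b` alias despite being different addresses, while addresses confined to one slice never alias with each other, which matches the behaviour described above.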

Theoretically, the OS's ASLR could keep track of this and not assign
address ranges that would alias with previously used address ranges (via
a lookup table).

Kinda similar crap to the "PE loader may not load a PE to an address
that crosses a 4GB boundary" because it adds cost to have
direct-branches and PC increment need to deal with more than 4GB.
Well, sorta:
PC increment still has a 4GB window;
Branches are either 16MB window (via branch predictor);
Or, +/- 8GB, via normal address calc.
Branch predictor detecting carry-out and not handling the branch.
Was 4GB originally, but the above trick allowed being cheaper here.
However, crossing a 16MB barrier has a performance penalty.
Statistically low probability of ".text" crossing such a barrier.
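The 16MB window can be sketched as a narrow adder with carry-out detection. This is a simplified model under stated assumptions: a 24-bit low-order add in the predictor, a fall-back to the normal address calculation when it overflows, and hypothetical function names.

```python
WINDOW_BITS = 24                      # 2**24 bytes = 16 MB
WINDOW_MASK = (1 << WINDOW_BITS) - 1
VA_MASK = (1 << 48) - 1


def predictor_target(pc, disp):
    """Target from the narrow adder, or None if it carries or borrows out."""
    low_sum = (pc & WINDOW_MASK) + disp
    if low_sum < 0 or low_sum > WINDOW_MASK:
        return None                   # predictor declines to handle the branch
    return (pc & ~WINDOW_MASK) | low_sum


def full_target(pc, disp):
    """Normal address calculation, used when the predictor declines."""
    return (pc + disp) & VA_MASK


inside = predictor_target(0x0100_0100, 0x40)    # stays within the window
crossing = predictor_target(0x01FF_FFF0, 0x40)  # crosses a 16 MB boundary
fallback = full_target(0x01FF_FFF0, 0x40)
```

The crossing case returns None from the predictor and is resolved by the wider calculation, which is where the performance penalty would come from.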

Arguably, all still kinda crap though...

For now, 48-bits is plenty for my uses.

I considered possible options for 64-bit VAS support (within the 96-bit
mode), but annoyingly, if done in an affordable way, would likely not
allow program code outside the low 48 bits, or arrays crossing a 48-bit boundary (or, still slightly jank).

Though, IMHO, still better than what MIPS did, IIRC:
PC1[63:28] = PC0[63:28]
PC1[27: 2] = JAL_Addr[25:0]
PC1[ 1: 0] = 0

Or, say, you have a 256MB barrier that may not be crossed, and the
loader would need to rebase within said 256 MB.
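The JAL formula above can be written out directly. This sketch follows the formula exactly as given in the post (upper bits taken from PC0); `mips_jal_target` and `jal_reachable` are hypothetical helper names.

```python
REGION_BITS = 28                      # 2**28 bytes = 256 MB
REGION_MASK = (1 << REGION_BITS) - 1
PC_MASK = (1 << 64) - 1


def mips_jal_target(pc0, jal_addr):
    """PC1[63:28] = PC0[63:28]; PC1[27:2] = JAL_Addr[25:0]; PC1[1:0] = 0."""
    upper = pc0 & PC_MASK & ~REGION_MASK
    return upper | ((jal_addr & 0x3FFFFFF) << 2)


def jal_reachable(pc0, target):
    """A target is encodable only if it is word-aligned and shares PC[63:28]."""
    return target % 4 == 0 and (pc0 >> REGION_BITS) == (target >> REGION_BITS)


target = mips_jal_target(0x1000_0004, 0x100)
same_region = jal_reachable(0x1000_0004, 0x1FFF_FFFC)
next_region = jal_reachable(0x1000_0004, 0x2000_0400)
```

`next_region` is False: a call from one 256 MB region into the next cannot be encoded, which is the barrier the loader has to respect.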

Information is inconsistent for conditional branches, where some
information implies it is simply adding the displacement (scaled by 4),
and other info implies:
Copy high bits unchanged;
Add low-order bits;
Address may wrap if it crosses some ill-defined address barrier.

They seemingly missed an opportunity to go cheaper for Bcc here, say:
PC1[63:20] = PC0[63:20]
PC1[19:14] = PC0[19:14] + SExt(Bcc_Addr[15:12])
PC1[13: 2] = Bcc_Addr[11:0]
PC1[ 1: 0] = 0
Then, say, one only needs to do a 6-bit addition for the conditional
branch instruction.
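The proposed Bcc encoding is hypothetical to begin with, so the sketch below just transcribes the four lines of the formula; `sext4` and `bcc_target` are illustrative names. It also shows the wrap that a 6-bit adder would produce at the edge of the 1 MB region.

```python
def sext4(x):
    """Sign-extend a 4-bit field to a Python int."""
    x &= 0xF
    return x - 16 if x & 0x8 else x


def bcc_target(pc0, bcc_addr):
    high = pc0 & ~((1 << 20) - 1)                        # PC1[63:20] = PC0[63:20]
    mid = ((pc0 >> 14) & 0x3F) + sext4(bcc_addr >> 12)   # 6-bit addition
    mid &= 0x3F                                          # carry-out is dropped
    low = bcc_addr & 0xFFF                               # PC1[13:2] = Bcc_Addr[11:0]
    return high | (mid << 14) | (low << 2)               # PC1[1:0] = 0


forward = bcc_target(0x12_4000, 0x1005)    # +1 in bits 19:14, low field 0x005
backward = bcc_target(0x12_4000, 0xF000)   # -1 in bits 19:14, low field 0
wrapped = bcc_target(0x1F_C000, 0x1000)    # bits 19:14 are 0x3F; +1 wraps to 0
```

In the wrapped case the target lands at the start of the same 1 MB region instead of in the next one, so a loader or linker would have to keep such branches away from the boundary.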

Trying to rebase a program at load time being "there be dragons here" territory.

...

Stefan


From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Sat Oct 4 04:36:28 2025

From Newsgroup: comp.arch

In article <jwvo6qoui1m.fsf-monnier+comp.arch@gnu.org>,
Stefan Monnier <monnier@iro.umontreal.ca> wrote:

| - virtually tagged caches
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

That is only true if one insists on OS with Multiple Address Spaces.
Virtually tagged caches are fine for Single Address Space (SAS) OS.

AFAIK, the main problem with SASOS is "backward compatibility", most
importantly with `fork`. The Mill people proposed a possible solution,
which seemed workable, but it's far from clear to me whether it would
work well enough if you want to port, say, Debian to such
an architecture.

Stefan

Copy-on-Access gives you 100% compatibility with all fork() semantics.

You can define SAS in a way that almost defeats virtual addresses, but
let's assume we have 48-bit virtual address space and 16-bit ASID, for
an effective 64-bit SAS. We'll have every process using a different ASID.
And we'll assume the ASID affects dcache indexing so we have to handle that.
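A minimal sketch of that setup, assuming the ASID is simply concatenated above the 48-bit virtual address. The cache geometry (64-byte lines, 256 sets) and the XOR of ASID bits into the set index are made-up parameters, chosen only to show the ASID influencing dcache indexing.

```python
VA_BITS = 48
ASID_BITS = 16


def effective_address(asid, va):
    """64-bit single-address-space address: ASID in bits 63:48, VA below."""
    assert 0 <= asid < (1 << ASID_BITS)
    assert 0 <= va < (1 << VA_BITS)
    return (asid << VA_BITS) | va


LINE_BITS = 6    # hypothetical 64-byte cache lines
INDEX_BITS = 8   # hypothetical 256-set dcache


def dcache_index(asid, va):
    """Assumed hash: low ASID bits XORed into the set index."""
    return ((va >> LINE_BITS) ^ asid) & ((1 << INDEX_BITS) - 1)


ea1 = effective_address(1, 0x1000)
ea2 = effective_address(2, 0x1000)
idx1 = dcache_index(1, 0x1000)
idx2 = dcache_index(2, 0x1000)
```

With these parameters the same VA under ASID=1 and ASID=2 yields different effective addresses and different set indexes.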

First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. We'll assume they cannot see each other's
data in the dcache due to the virtual indexes being different. So
ASID=1, VA=0x1000 maps to a different dcache index than ASID=2,
VA=0x1000 even if they map to the same physical address. The ASID=2
process starts (for the sake of a simple explanation) with no pages
mapped, except it maps all the read-only instruction pages from ASID=1
as ASID=2. (Note it doesn't matter if these are at different
instruction and/or data cache indexes since it's always read-only). All
data pages from the ASID=1 process are made invalid (in the page table,
and removed from the TLB). Now ASID=1 and ASID=2 are running
simultaneously. If the ASID=1 process touches any data page, the OS
copies the contents of that original physical page to a new page, and
makes that new page available to the ASID=2 process. This copy is the
real trick: in the dumbest possible implementation, the OS flushes the
data to DRAM, then copies it to the new physical address, and flushes
that to DRAM. But systems with virtually aliased caches generally
provide a more efficient way to handle the aliasing and do this
copying in the caches, at least in the L2 cache. Once the copy of the
one page is done, the OS then makes the corresponding ASID=1 page
writeable, and continues. Similarly, if the ASID=2 process touches a
page, it gets a copy of the ASID=1 page (which ASID=1 has not touched
yet), and then the OS gives the ASID=1 process write access to that
page. Basically, both processes are "paging in" the ASID=1 pages.

ASID=1 keeps all of its physical pages. ASID=2 gets a copy of all the
physical pages from ASID=1 that it touches.

Note that COW has to go and make all pages of the initial process read-only, which might be more work than to just make all pages invalid.
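
As a rough illustration of the Copy-on-Access sequence described above, here is a toy Python model. ToyKernel and its methods are hypothetical names invented for this sketch. It tracks pages in dictionaries rather than real page tables or caches, and it assumes only one outstanding fork per parent at a time.

```python
PAGE_SIZE = 4096

class ToyKernel:
    """Toy model of fork() via Copy-on-Access (hypothetical, not HP-UX code)."""

    def __init__(self):
        self.frames = []    # simulated physical frames (bytearrays)
        self.valid = {}     # (asid, vpn) -> frame index, for valid mappings
        self.pending = {}   # (asid, vpn) -> (parent, child, parent_frame)

    def map_page(self, asid, vpn, data=b""):
        frame = bytearray(PAGE_SIZE)
        frame[:len(data)] = data
        self.frames.append(frame)
        self.valid[(asid, vpn)] = len(self.frames) - 1

    def fork(self, parent, child):
        # Invalidate every data page of the parent; the child starts with
        # nothing mapped. Both will fault on first touch.
        for (asid, vpn), frame in list(self.valid.items()):
            if asid == parent:
                del self.valid[(parent, vpn)]
                entry = (parent, child, frame)
                self.pending[(parent, vpn)] = entry
                self.pending[(child, vpn)] = entry

    def _fault(self, asid, vpn):
        # First touch by either process: copy the original page for the
        # child, then give the parent its original frame back.
        parent, child, frame = self.pending.pop((asid, vpn))
        other = child if asid == parent else parent
        del self.pending[(other, vpn)]
        self.frames.append(bytearray(self.frames[frame]))
        self.valid[(child, vpn)] = len(self.frames) - 1
        self.valid[(parent, vpn)] = frame

    def access(self, asid, vpn):
        if (asid, vpn) not in self.valid:
            self._fault(asid, vpn)  # raises KeyError if truly unmapped
        return self.frames[self.valid[(asid, vpn)]]


kernel = ToyKernel()
kernel.map_page(1, 0x1, b"hello")       # ASID=1 owns one data page
kernel.fork(1, 2)                       # child is ASID=2
child_page = kernel.access(2, 0x1)      # child touches the page: it gets a copy
child_page[0:5] = b"HELLO"              # child's write stays private
parent_page = kernel.access(1, 0x1)     # parent still sees its own data
```

After the child's first touch, both processes have valid mappings to different frames, which is the "both processes are paging in the ASID=1 pages" behaviour.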

Kent
--- Synchronet 3.21a-Linux NewsLink 1.2

From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 18:36:45 2025

From Newsgroup: comp.arch

It appears that Kent Dickey <kegs@provalid.com> said:

AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. ...

First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. ...

I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As
Lynn has often told us, operating system bloat forced them quickly to go
to MVS, an address space per process.

I suppose there could still be single address space realtime or
embedded systems where all the programs to be run are known when the
system is built.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.21a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 19:00:17 2025

From Newsgroup: comp.arch

John Levine <johnl@taugh.com> schrieb:

The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS,

Don't forget all the home computers. It might be debatable if they
should be called "system", though.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat Oct 4 12:31:49 2025

From Newsgroup: comp.arch

On 10/4/2025 11:36 AM, John Levine wrote:

It appears that Kent Dickey <kegs@provalid.com> said:

AFAIK, the main problem with SASOS is "backward compatibility", most
importantly with `fork`. ...

First process is ASID=1. It forks, and the child is ASID=2. It is a
completely new address space. ...

I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT.

Isn't the AS/400, or whatever it is called now, a SAS?
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 01:05:17 2025

From Newsgroup: comp.arch

On Sat, 4 Oct 2025 18:36:45 -0000 (UTC)
John Levine <johnl@taugh.com> wrote:

It appears that Kent Dickey <kegs@provalid.com> said:

AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. ...

First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. ...

I don't think anyone would call a system that gives each process a
completely new address space a single address space system.

Agreed.

Making
the ASID part of the translated address is one of many ways of
implementing a conventional address space per process system.

The last widely used single address space systems I can think of were
OS/VS1 and OS/VS2 SVS,

How would you call OS/400 (nowadays, IBM i) ?

each of which provided a single full sized
address space in which they essentially ran their real memory
predecessors MFT and MVT. As Lynn has often told us, operating
system bloat forced them quickly to go to MVS, an address space per
process.

I suppose there could still be single address space realtime or
embedded systems where all the programs to be run are known when the
system is built.

IIRC, Windows CE supported SAS mode of operation just fine without such limitations.

--- Synchronet 3.21a-Linux NewsLink 1.2

From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 22:44:52 2025

From Newsgroup: comp.arch

It appears that Michael S <already5chosen@yahoo.com> said:

The last widely used single address space systems I can think of were
OS/VS1 and OS/VS2 SVS,

How would you call OS/400 (nowadays, IBM i) ?

I haven't looked at it for a while but I think you're right.
They have POSIX compatible APIs, wonder how that works.

I suppose there could still be single address space realtime or
embedded systems where all the programs to be run are known when the
system is built.

IIRC, Windows CE supported SAS mode of operation just fine without such limitations.

For that matter, so did MS-DOS and Windows up through 3.0.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.21a-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 17:57:16 2025

From Newsgroup: comp.arch

On 10/4/2025 5:44 PM, John Levine wrote:

It appears that Michael S <already5chosen@yahoo.com> said:

The last widely used single address space systems I can think of were
OS/VS1 and OS/VS2 SVS,

How would you call OS/400 (nowadays, IBM i) ?

I haven't looked at it for a while but I think you're right.
They have POSIX compatible APIs, wonder how that works.

FWIW, I suspect that the number of programs that use "fork()" without immediately calling "exec*()" is probably fairly small.

AFAIK, programs that depend on full "fork()" semantics won't generally
work on Cygwin either, as IIRC it is just sort of faked by copying the
local stack frame and spawning out a new thread that terminates on the "exec*()" call.

Apart from non-PIE ELF or similar, not much else doesn't work in an SAS. Though, ABI tweaks are needed to make things efficient (eg, not needing
to load in a new copy of the binaries for every new process).

I suppose there could still be single address space realtime or
embedded systems where all the programs to be run are known when the
system is built.

IIRC, Windows CE supported SAS mode of operation just fine without such
limitations.

For that matter, so did MS-DOS and Windows up through 3.0.

Not sure if 16-bit protected mode segmentation counts as SAS though.
MS-DOS, maybe, as one could do address math on the segments.

FWIW, some of my own engineering efforts here took inspiration from
Windows CE.

Like, the way I am using the "Global Pointer" directory entry in the
PE/COFF headers wasn't entirely a novel innovation on my end, ...

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 02:18:26 2025

From Newsgroup: comp.arch

On Sat, 4 Oct 2025 22:44:52 -0000 (UTC)
John Levine <johnl@taugh.com> wrote:

It appears that Michael S <already5chosen@yahoo.com> said:

I suppose there could still be single address space realtime or
embedded systems where all the programs to be run are known when
the system is built.

IIRC, Windows CE supported SAS mode of operation just fine without
such limitations.

For that matter, so did MS-DOS and Windows up through 3.0.

It's not the same.
CE supported preemptive multitasking (arguably, better than the likes of NT
or the majority of popular Unixes, at least as long as we are talking about
non-SMP) and memory protection, both protection of kernel from user
processes and of user processes from each other.

I never took a look at CE support for Virtual Memory. Probably it was
quite weak, if there was support at all. The only CE-based product I
ever did had absolutely no need for Virtual Memory.

However I am pretty sure that they utilized paging hardware for
management of physical memory, removing fear of fragmentation.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lynn Wheeler@lynn@garlic.com to comp.arch on Sat Oct 4 14:17:32 2025

From Newsgroup: comp.arch

John Levine <johnl@taugh.com> writes:

The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As Lynn has often told us, operating system bloat forced them quickly to go
to MVS, an address space per process.

they had two kinds of bloat. original decision to add virtual memory was because of MVT storage management problems, having to specify each
region (concurrent execution) four times larger than actually used, as a
result a typical 1mbyte 370/165 only ran four concurrent regions,
insufficient to keep system busy and justified. Going to 16mbyte virtual address space (VS2/SVS) allowed concurrent regions to be increased by
factor of four (sort of like running MVT in a 16mbyte CP67 virtual
machine ... aka CP67 precursor to VM370), with little or no paging
... although capped at 15 because of 4bit storage protect keys.

Problem was that as systems got larger/faster needed to move past 15
concurrent regions ... which resulted in giving each concurrently
executing region/program, their own 16mbyte virtual address space
(VS2/MVS). However, OS/360 & descendants were heavily pointer passing
APIs (creating a different problem) and so they mapped a 8mbyte image of
the MVS kernel into every 16mbyte virtual address space (leaving
8mbytes). Then because each subsystem was moved into their separate
16mbyte virtual address space, the 1mbyte "Common Segment Area" (CSA)
was mapped into every virtual address space for passing arguments/data
back and forth between applications and subsystems (leaving 7mbytes).

Then because the space requirements for passing arguments/data back and
forth was somewhat proportional to number of subsystems and concurrently running regions/applications, the CSA started to explode becoming the
Common System Area (CSA) running 5-6mbytes (leaving 2-3mbytes for regions/applications) and threatening to become 8mbytes (leaving zero
for regions/applications). At the same time the number of concurrently
running applications space requirements was exceeding 16mbytes real
address ... and 2nd half 70s, 3033s were retrofitted for 64mbytes real addressing by taking two unused bits in page table entry and prefixing
them to the 12bit (4k) real page number for 14bits or 64mbyte
(instructions were still 16mbyte, but virtual pages could be
loaded and run "above the 16mbyte line").
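
The arithmetic in the last two paragraphs can be checked with a short sketch. The function names below are made up for illustration; the figures (16mbyte virtual space, 8mbyte kernel image, CSA sizes, 12-bit page number plus two extra bits, 4k pages) are the ones quoted above.

```python
MB = 1 << 20

VIRTUAL_SPACE_MB = 16   # 24-bit virtual address space
KERNEL_IMAGE_MB = 8     # MVS kernel image mapped into every address space

def region_space_mb(csa_mb):
    """Megabytes left for a region/application once kernel and CSA are mapped."""
    return VIRTUAL_SPACE_MB - KERNEL_IMAGE_MB - csa_mb

def real_memory_bytes(page_number_bits, page_size=4096):
    """Real storage addressable with the given number of real page number bits."""
    return (1 << page_number_bits) * page_size

original_left = region_space_mb(1)              # 1mbyte CSA
bloated_left = [region_space_mb(csa) for csa in (5, 6, 8)]
before_3033_hack = real_memory_bytes(12) // MB  # 12-bit real page number
after_3033_hack = real_memory_bytes(12 + 2) // MB  # two unused PTE bits prefixed
```

With a 1mbyte CSA this leaves 7mbytes per address space; at 5, 6 and 8mbytes of CSA it leaves 3, 2 and 0. Fourteen page number bits with 4k pages give 64mbytes of real storage, against 16mbytes with twelve.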

Then part of 370/xa "access registers" was retrofitted to 3033 for dual
address space mode. Calls to subsystems, could move the caller's address
space pointer into the secondary address space register and the
subsystem address space pointer was moved into primary. Subsystems then
could access the caller's (secondary) virtual address space w/o needing
data be passed back&forth in CSA. For 370/xa, program call/return
instructions could perform the address space primary/secondary switches
all in hardware.

I had also started pontificating that lot of OS/360 had heavily
leveraged I/O system to compensate for limited real storage (and
descendants had inherited it). In early 80s, I wrote a tome that
relative system disk I/O throughput had declined by an order of
magnitude (disk throughput got 3-5 times faster while systems got 40-50
times faster), a major motivation for constantly needing an increasing
number of concurrently executing programs. Disk division executive took
exception and directed the division performance organization to refute
my claims. After a couple weeks, they came back and basically said that
I had slightly understated the problem. They then respun the analysis
for SHARE (user group) presentation on how to configure/manage disks for improved system throughput (16Aug1984, SHARE 63, B874).

3033 above the "16mbyte" line hack: There were problems with parts of
system that required virtual pages below the "16mbyte line". Introduced
with 370 was I/O channel program IDALs that were full-word
addresses. Somebody came up with idea to use IDALs to write a virtual
page (above 16mbyte) to disk and then read it back into address
<16mbyte. I gave them a hack using virtual address space table that
filled in page table entries with the >16mbyte page number and <16mbyte
page number and use MVCL instruction to copy the virtual page from above 16mbyte line to below the line.
--
virtualization experience starting Jan1968, online at home since Mar1970
--- Synchronet 3.21a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Oct 5 13:02:29 2025

From Newsgroup: comp.arch

John Levine wrote:

It appears that Michael S <already5chosen@yahoo.com> said:

The last widely used single address space systems I can think of were
OS/VS1 and OS/VS2 SVS,

How would you call OS/400 (nowadays, IBM i) ?

I haven't looked at it for a while but I think you're right.
They have POSIX compatible APIs, wonder how that works.

For operating systems like VMS and WNT that cannot fork (duplicate a
parent virtual space into a child), POSIX provides posix_spawn() instead.

posix_spawn() is equivalent to fork()/exec() and CreateProcess() in that
it creates a new address space, loads an exe, and starts a thread.
Like fork() and WNT CreateProcess(), posix_spawn() allows open file
descriptor handles to be passed to the child.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/spawn.h.html

https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
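
For a concrete feel of the spawn model, here is a minimal sketch using Python's os.posix_spawn wrapper (available on Unix-like systems). The helper name spawn_capture is hypothetical. It passes a pipe to the child as its stdout through a file action, so no copy of the parent's address space is ever needed.

```python
import os
import sys

def spawn_capture(argv):
    """Run argv[0] via posix_spawn, capturing the child's stdout through a pipe."""
    read_fd, write_fd = os.pipe()
    file_actions = [
        (os.POSIX_SPAWN_DUP2, write_fd, 1),   # child's stdout -> pipe write end
        (os.POSIX_SPAWN_CLOSE, read_fd),      # child does not need the read end
    ]
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=file_actions)
    os.close(write_fd)                        # parent keeps only the read end
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()                # EOF arrives when the child exits
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return output, exit_code

child_output, child_exit = spawn_capture(
    [sys.executable, "-c", "print('hello from child')"]
)
```

The child is created with a fresh image and only the descriptors named in the file actions, which is why this model fits systems that cannot duplicate a parent address space.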

--- Synchronet 3.21a-Linux NewsLink 1.2

From George Neuner@gneuner2@comcast.net to comp.arch on Mon Oct 6 06:54:10 2025

From Newsgroup: comp.arch

On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
Dickey) wrote:

In article <1759506155-5857@newsgrouper.org>,
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

Stefan Monnier <monnier@iro.umontreal.ca> posted:

| - virtually tagged caches
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

That is only true if one insists on OS with Multiple Address Spaces.
Virtually tagged caches are fine for Single Address Space (SAS) OS.

AFAIK, the main problem with SASOS is "backward compatibility", most
importantly with `fork`. The Mill people proposed a possible solution,
which seemed workable, but it's far from clear to me whether it would
work well enough if you want to port, say, Debian to such
an architecture.

SASOS seems like a bridge too far.

Stefan

Fork is not a problem with virtual tagged caches or SAS. Normal fork
starts the child with a copy of the parent's address mapping, and uses
"Copy on Write" (COW) to create unique pages as soon as either process
does a write.

Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
SAS - which is that copied /pointers/ remain referencing objects in
the original process. Under the multi-space model of Unix/Linux,
after a fork the copied pointers should be referencing the copied
objects in the new process.

Lacking a way to identify and fixup pointer values, under SAS by
simply copying data (COW or COA) you end up unintentionally /sharing/
data.
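
A toy sketch of the failure mode I mean, in Python. It assumes the pointers stored in memory are full global addresses and that the child's copy lands at a different global address. The flat dict and the base addresses are invented for the illustration.

```python
def copy_region(memory, src_base, dst_base, size):
    """Copy a region cell by cell, without any pointer fixup."""
    for offset in range(size):
        if src_base + offset in memory:
            memory[dst_base + offset] = memory[src_base + offset]

memory = {}                      # one flat, global address space
PARENT_BASE = 0x1000
CHILD_BASE = 0x2000

memory[PARENT_BASE + 0] = PARENT_BASE + 8   # a pointer cell ...
memory[PARENT_BASE + 8] = 42                # ... to this object

copy_region(memory, PARENT_BASE, CHILD_BASE, 16)   # "fork" by copying

child_pointer = memory[CHILD_BASE + 0]      # still the parent's address
memory[child_pointer] = 99                  # child writes through it

parent_object = memory[PARENT_BASE + 8]
child_object = memory[CHILD_BASE + 8]
```

The child's write lands in the parent's object and the child's own copy is never touched, which is the unintended sharing.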

For its entire existence, PA-RISC HP-UX supported virtual indexed
caches in a SAS, and implemented fork using Copy On Access. As soon as
the child process touched any page for read or write, it got a copy, so
it can only access its own pages (not counting read-only instruction
pages). This works fine, and it's not a performance issue. The love
folks have for COW is overblown. Real code either immediately exec()'s
(maybe doing some close()'s and other housekeeping first) or starts
writing lots of pages doing what it wants to do as a new process. Note
since the OS knows it needs to copy pages, it can pre-copy a bunch of
pages, such as the stack, and some basic data pages, to avoid some
initial faults for the exec() case at least.

fork-exec is not a problem. fork alone is.

How did HP-UX on PA-RISC handle fork?

Kent

--- Synchronet 3.21a-Linux NewsLink 1.2

From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 15:49:10 2025

From Newsgroup: comp.arch

In article <10brpft$23go$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
It appears that Kent Dickey <kegs@provalid.com> said:

AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. ...

First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. ...

Sorry, bad terminology. I just mean all addresses under ASID=2 are
invalid.

In my example, all processes can peek inside any other process's address
space, by just forming the 64-bit virtual address. The ASID thing is
just a convention, so I wouldn't have to type 16 digit hex numbers over and over.

[snip]

The last widely used single address space systems I can think of were OS/VS1
and OS/VS2 SVS, each of which provided a single full sized address space in
which they essentially ran their real memory predecessors MFT and MVT. As
Lynn has often told us, operating system bloat forced them quickly to go
to MVS, an address space per process.

HP-UX on PA-RISC from 1986-2004 or so was effectively a SAS computer. In 32-bit CPUs, the virtual address space was 48 bits, and normal user code could form any 48-bit address, and this was used for shared libraries and shared
code (processes running the same executable shared the same virtual address space for the executable). In 64-bit mode, it works mostly as I described. There were 32-bit Space registers which were OR'ed into the upper bits of
the 64-bit virtual address, to give the global 64-bit system address.
It was an OS convention to limit the Space values to the upper 16 bits or so, and it could change it to whatever it wanted.

I suppose there could still be single address space realtime or
embedded systems where all the programs to be run are known when the
system is built.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Kent
--- Synchronet 3.21a-Linux NewsLink 1.2

From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 16:44:52 2025

From Newsgroup: comp.arch

In article <ne67ekdeej48s8jp7jh1ahda32qmiphm0p@4ax.com>,
George Neuner <gneuner2@comcast.net> wrote:

On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
Dickey) wrote:

In article <1759506155-5857@newsgrouper.org>,
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

Stefan Monnier <monnier@iro.umontreal.ca> posted:

| - virtually tagged caches
| You can't really claim to be worst-of-the-worst without virtually
|tagged caches.
| Tears of joy as you debug cache alias issues and of flushing caches
|on context switches.

That is only true if one insists on OS with Multiple Address Spaces.
Virtually tagged caches are fine for Single Address Space (SAS) OS.

AFAIK, the main problem with SASOS is "backward compatibility", most
importantly with `fork`. The Mill people proposed a possible solution,
which seemed workable, but it's far from clear to me whether it would
work well enough if you want to port, say, Debian to such
an architecture.

SASOS seems like a bridge too far.

Stefan

Fork is not a problem with virtual tagged caches or SAS. Normal fork
starts the child with a copy of the parent's address mapping, and uses
"Copy on Write" (COW) to create unique pages as soon as either process
does a write.

Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
SAS - which is that copied /pointers/ remain referencing objects in
the original process. Under the multi-space model of Unix/Linux,
after a fork the copied pointers should be referencing the copied
objects in the new process.

Lacking a way to identify and fixup pointer values, under SAS by
simply copying data (COW or COA) you end up unintentionally /sharing/
data.

For its entire existence, PA-RISC HP-UX supported virtual indexed
caches in a SAS, and implemented fork using Copy On Access. As soon as
the child process touched any page for read or write, it got a copy, so
it can only access its own pages (not counting read-only instruction
pages). This works fine, and it's not a performance issue. The love
folks have for COW is overblown. Real code either immediately exec()'s
(maybe doing some close()'s and other housekeeping first) or starts
writing lots of pages doing what it wants to do as a new process. Note
since the OS knows it needs to copy pages, it can pre-copy a bunch of
pages, such as the stack, and some basic data pages, to avoid some
initial faults for the exec() case at least.

fork-exec is not a problem. fork alone is.

How did HP-UX on PA-RISC handle fork?

Kent

This is what I was saying: if you define SAS to only mean that each
process is living at a unique address, and it knows its full address,
then I don't wish to discuss that SAS. That's like running without
virtual memory.

If you define SAS to mean that all processes can see other running processes'
addresses, and can directly read/write each other's addresses (with
protection obviously), then that's the SAS HP PA-RISC ran in.

HP PA-RISC 64-bit creates a 64-bit global virtual address. Each process
by convention lives in a smaller part of that, let's say a 48-bit space.
Each process has 8 32-bit Space Registers (not general registers, and
some are not writeable by the user, but 5 are writeable) which are OR'ed
in to bits [63:32] of the VA address bits formed by loads and stores to
form the GVA. Of GVA bits [63:32], it's an OS convention how many bits
are effectively the ASID and how many are VA bits for the process.
The GVA is mostly transparent to the user process--they can read the Space Registers and figure it out if they want to, but this was not usual.

[The architecture defines Space registers as up to 64-bit, so there's a 96-bit GVA, but the hardware only implemented 32-bit Space registers with a 64-bit GVA].

Note that at any time, user code can set Space Register 1 to 0, form
the address 0x12345678_12345670 in a register, and try to read and write
that address. This will generally fail due to a Protection ID scheme, but
some Space Register values were reserved for shared libraries to share the
code at the same GVA in all processes.
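
To make the address formation concrete, here is a small Python sketch of the scheme as I described it: a 32-bit Space Register value OR'ed into bits [63:32] of the VA. The function names and the choice of putting a 16-bit ASID in the top 16 bits of the Space Register are illustrative assumptions. The real split is an OS convention, and Space Register selection and Protection ID checks are left out.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def global_virtual_address(space_register, va):
    """OR a 32-bit Space Register value into bits [63:32] of a 64-bit VA."""
    return (((space_register & MASK32) << 32) | va) & MASK64

def space_for_asid(asid):
    """Assumed convention: a 16-bit ASID in the upper 16 bits of the register."""
    return (asid & 0xFFFF) << 16

# Same pointer value in parent (ASID=1) and child (ASID=2) after fork():
pointer = 0x1000
parent_gva = global_virtual_address(space_for_asid(1), pointer)
child_gva = global_virtual_address(space_for_asid(2), pointer)

# User code with a Space Register set to 0 forming an arbitrary global address:
arbitrary_gva = global_virtual_address(0, 0x12345678_12345670)
```

The pointer stored in memory is identical in both processes; only the Space Register contents differ, so the two end up at different global addresses without any fixup.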

So fork() is easy--no pointers in memory or registers are affected, the
OS assigns a new ASID, puts that in the upper bits of the Space
Registers for the new process, and it's off. But all HP PA-RISC CPUs have virtually indexed caches, where the ASID is mixed in with lower address
bits to "hash" the cache lookup. So it needed to do COA since the new
ASID is different, so the same VA wouldn't see the cached data of the
old process.

Note that the OS sees all processes at once. If it wants to read from
one process and write to another, it can just do Load, then Store.

Kent
--- Synchronet 3.21a-Linux NewsLink 1.2
