• x86S Specification

    From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Thu Oct 17 17:34:14 2024
    From Newsgroup: comp.arch

    There is a reference in this Reg article

    https://www.theregister.com/2024/10/15/intel_amd_x86_future/

    to the x86S spec, a proposal from Intel to pare down x86/x64
    by removing or modifying legacy features.

    [PDF] Envisioning a Simplified Intel Architecture https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html

    Some examples are:

    3 Architectural Changes
    3.1 Removal of 32-Bit Ring 0
    3.2 Removal of Ring 1 and Ring 2
    3.3 Removal of 16-Bit and 32-Bit Protected Mode
    3.4 Removal of 16-Bit Addressing and Address Size Overrides
    3.5 CPUID
    3.6 Restricted Subset of Segmentation
    3.7 New Checks When Loading Segment Registers
    3.7.1 Code and Data Segment Types
    3.7.2 System Segment Types (S=0)
    3.8 Removal of #SS and #NP Exceptions
    3.9 Fixed Mode Bits
    3.9.1 Fixed CR0 Bits
    3.9.2 Fixed CR4 Bits
    3.9.3 Fixed EFER Bits
    3.9.4 Removed RFLAGS
    3.9.5 Removed Status Register Instruction
    3.9.6 Removal of Ring 3 I/O Port Instructions
    3.9.7 Removal of String I/O


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Mon Oct 21 17:02:27 2024
    From Newsgroup: comp.arch

    On 10/17/2024 4:34 PM, EricP wrote:
    There is a reference in this Reg article

    https://www.theregister.com/2024/10/15/intel_amd_x86_future/

    to the x86S spec, a proposal from Intel to pare down x86/x64
    by removing or modifying legacy features.

    [PDF] Envisioning a Simplified Intel Architecture https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html

    Some examples are:

    3 Architectural Changes
    3.1 Removal of 32-Bit Ring 0
    3.2 Removal of Ring 1 and Ring 2
    3.3 Removal of 16-Bit and 32-Bit Protected Mode
    3.4 Removal of 16-Bit Addressing and Address Size Overrides
    3.5 CPUID
    3.6 Restricted Subset of Segmentation
    3.7 New Checks When Loading Segment Registers
    3.7.1 Code and Data Segment Types
    3.7.2 System Segment Types (S=0)
    3.8 Removal of #SS and #NP Exceptions
    3.9 Fixed Mode Bits
    3.9.1 Fixed CR0 Bits
    3.9.2 Fixed CR4 Bits
    3.9.3 Fixed EFER Bits
    3.9.4 Removed RFLAGS
    3.9.5 Removed Status Register Instruction
    3.9.6 Removal of Ring 3 I/O Port Instructions
    3.9.7 Removal of String I/O



    Pros:
    Technically makes sense for PCs as they are.
    Cons:
    Loses some of the major aspects of what makes x86 unique;
    Doesn't really solve issues for x86-64's longer term survival.


    Absent changing to a more sensible encoding scheme and limiting or
    removing condition-codes, x86-64 still has this major boat anchor. But,
    these can't be changed without breaking backwards compatibility (at
    least, assuming hardware that continues running x86-64 as the native
    hardware ISA).

    Though, ironically, most "legacy x86" stuff could probably be served acceptably with emulators.


    If it can't maintain a performance advantage (say, if ARM and RISC-V
    catch up or exceed the performance possible on higher end x86 chips), it
    is effectively done.


    Granted, ARM also has the dead weight that is ALU condition codes; and
    RISC-V some of its own traditional limitations.

    ARM64 would likely beat RV64G in a clock-for-clock sense, but
    potentially RV64 could be clocked a little faster due to not having to
    deal with CC's, ...


    As I see it, a case could almost be made for going more like the Apple "Rosetta" route, switching to some other ISA (be it ARM or RISC-V or
    whatever else), and running any existing/legacy software primarily via emulation. Main thing one would need in this case is a decent emulator
    (JIT or AOT based) and enough helpers to work around some things that
    are a pain to do efficiently in pure software emulation (like twiddling
    the bits in EFLAGS/RFLAGS based on the result of ALU instructions).
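
    Say, what that bit-twiddling looks like in a plain software emulator
    (a minimal C sketch; purely illustrative, the name and scope here are
    made up, this is just the standard flags-recomputation idiom):

      #include <stdint.h>

      /* Recompute the main x86 arithmetic flags for a 32-bit ADD.
         A real emulator also needs AF/PF and per-operation variants. */
      static uint32_t flags_after_add32(uint32_t a, uint32_t b)
      {
          uint32_t r = a + b, fl = 0;
          if (r == 0)   fl |= 1u << 6;   /* ZF */
          if (r >> 31)  fl |= 1u << 7;   /* SF */
          if (r < a)    fl |= 1u << 0;   /* CF: unsigned carry-out */
          if ((~(a ^ b) & (a ^ r)) >> 31)
              fl |= 1u << 11;            /* OF: signed overflow */
          return fl;
      }

    Doing something like this after every ALU instruction is a big part of
    the interpretation overhead, hence the appeal of hardware helpers.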

    This matters more for end-user use-cases, since:
    End users care about backwards compatibility;
    Both low-end embedded, and things like webservers, have little need to
    care about compatibility (so in theory could just jump directly to ARM
    or RISC-V or whatever).



    Not a whole lot of other obvious use cases where an x86-64 only CPU is
    an obvious win and would retain a clear advantage over jumping to
    another ISA.

    Going forward, it seems more likely to face competition by "cheap"
    processors being "good enough" rather than direct competition at the
    high-end (where x86-64 has traditionally dominated). High-end designs
    can't really compete as well on the "cheap" end (but a cheaper design
    may still be competitive if one can have more cores, even if per-thread
    performance is worse). Seemingly, there isn't much further one can go
    "up" in terms of single-threaded performance (more a question of whether
    the competition can play "catch up").


    They could possibly hold on by also jettisoning x86-64 as the native
    ISA, and coming up with something that can allow things to be more
    competitive at lower cost. But, replacing it doesn't really "save" it
    either.



    Say:
    Switch over to a less terrible encoding scheme;
    Limit (if not eliminate) the use of condition codes in the
    native ISA (say, CC's mostly existing in the form of helper machinery to
    make emulation faster);
    Could maybe offload x86 compatibility to firmware (say, the EFI BIOS
    provides a hardware-optimized JIT compiler).

    If the new ISA were tuned towards efficiently emulating x86-64, while
    also being cheaper, it could still hold an advantage.


    Say, if one could make the CPU itself have 35% more perf/W by jumping to
    a different encoding scheme, this could easily offset if they needed to
    pay a 20% cost by JIT compiling everything when running legacy software...
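
    Working the numbers: 1.35 * (1 / 1.20) ~= 1.125, so with those (purely
    illustrative) figures, even fully JIT-compiled legacy code would still
    come out roughly 12% ahead in perf/W, while native code would see the
    full 35%.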

    Granted, this is predicated on the assumption that one could get such a
    jump by jumping to a different encoding scheme.

    In many other cases, even if emulation is slower than it might have been
    to run the code natively, it may not matter that much.


    Say, for example, one can run WinXP in QEMU on an Android phone and then proceed to play Diablo2 or similar. In these cases, the limiting factor
    may be more that the UI experience sucks, rather than the potentially significant performance overhead of running WinXP in QEMU on a smartphone...


    The major selling point of x86 has been its backwards compatibility, but
    this advantage may be weakening with the rise of the ability to emulate
    stuff at near native performance. If Windows could jump ship and provide
    an experience that "doesn't suck" (fast/reliable/transparent emulation
    of existing software), the main advantages of the x86-64 legacy may go
    away (and is already mostly moot in Linux since the distros typically recompile everything from source, with little real/significant ties to
    the x86 legacy).

    This situation may itself change if MS continues trying to shoot
    themselves in the foot (eg, making Win11 bad enough that people are
    more tempted to jump over to Linux when Win10 becomes no longer usable).
    Theoretically, it would be more in MS's interest to make Windows not
    suck (rather than trying to force crap on people and making the Windows
    experience kinda suck...).


    ...



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 00:03:43 2024
    From Newsgroup: comp.arch

    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Pros:
    Technically makes sense for PCs as they are.
    Cons:
    Loses some of the major aspects of what makes x86 unique;
    Doesn't really solve issues for x86-64's longer term survival.

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Absent changing to a more sensible encoding scheme and limiting or
    removing condition-codes, x86-64 still has this major boat anchor. But,
    these can't be changed without breaking backwards compatibility (at
    least, assuming hardware that continues running x86-64 as the native
    hardware ISA).

    Condition codes were never "that hard" of a problem, neither in
    pipelining nor in operand routing.

    Though, ironically, most "legacy x86" stuff could probably be served acceptably with emulators.

    Ever try to emulate A24? Address bit 24--when we looked at it, it took
    more gates to remove it and put a bit in CPUID so applications could "do
    the right thing" than to simply leave the functionality there.

    If it can't maintain a performance advantage (say, if ARM and RISC-V
    catch up or exceed the performance possible on higher end x86 chips), it
    is effectively done.

    x86 performance advantage has ALWAYS been in the cubic amounts of cash
    flow running through the FAB to pay the engineering team budgets.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 00:21:51 2024
    From Newsgroup: comp.arch

    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Say, if one could make the CPU itself have 35% more perf/W by jumping to
    a different encoding scheme, this could easily offset if they needed to
    pay a 20% cost by JIT compiling everything when running legacy
    software...

    This only works when the native ISA has a direct path to emulating
    the address modes of x86-64 which includes [Rbase+Rindex<<scale+DISP]

    It is also a hopelessly frail path to self destruction:: Transmeta.

    Granted, this is predicated on the assumption that one could get such a
    jump by jumping to a different encoding scheme.

    It is not the encoding scheme that is kaput, it is the semantics
    such a scheme provides the programmer via ISA.
    --------------------------------
    The major selling point of x86 has been its backwards compatibility, but
    this advantage may be weakening with the rise of the ability to emulate
    stuff at near native performance. If Windows could jump ship and provide
    an experience that "doesn't suck" (fast/reliable/transparent emulation
    of existing software), the main advantages of the x86-64 legacy may go
    away (and is already mostly moot in Linux since the distros typically recompile everything from source, with little real/significant ties to
    the x86 legacy).

    W11 has done enough to my day-to-day operations I am willing to
    jump ship to Linux in order to avoid daily updates and the myriad
    of technical issues that never seem to get solved in a way that
    makes them "go away" forever. So, for me it is not that it will
    be an x86 (or ARM, or ...), it is that it is not MS oriented.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From John Levine@johnl@taugh.com to comp.arch on Tue Oct 22 01:16:11 2024
    From Newsgroup: comp.arch

    According to MitchAlsup1 <mitchalsup@aol.com>:
    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Intel's never going to catch up in the phone market but they're still significant in the server and cloud market.

    Think about the way that current Intel chips have a native 64 bit architecture but can still have a 32 bit user mode that can run existing 32 bit application binaries. So how about if the next generation is native x86S, but can also run existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Mon Oct 21 23:57:59 2024
    From Newsgroup: comp.arch

    On 10/21/2024 7:03 PM, MitchAlsup1 wrote:
    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Pros:
       Technically makes sense for PCs as they are.
    Cons:
       Loses some of the major aspects of what makes x86 unique;
       Doesn't really solve issues for x86-64's longer term survival.

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Worked better for them when PCs kept getting faster.
    Then there was more reason to want to buy new PCs and new CPUs;
    When it is "just because" or planned obsolescence, this isn't so good.


    Not so much when a ~7-year-old CPU model is nearly as fast as its newer equivalents.

    Issue then isn't so much one of speed, so much as some newer software
    being like "not gonna run on that". Then, Win11 is also like, "Nope, not
    gonna run on that"...

    Theoretically, the CPU could work, but the MOBO lacks a TPM, and also
    the long standing "virtualization doesn't work for whatever reason"
    issue (can enable in BIOS, still doesn't work, ...).


    Then VirtualBox and HyperV and similar, "Nope".
    QEMU and DOSBox are still happy enough though...


    Apparently there are ways to force install Win11 without a TPM, but then Windows Update and similar apparently refuse to work.

    Win10 still good enough for now, what next?... Dunno.


    Still better than the whole Apple / iPhone thing, with the apparent
    practice of remotely throttling performance and then (ultimately)
    sending a kill-switch signal once the devices get old enough.


    Well, then with Android, it lasts until "Google Play" or similar stops
    working (well, or on an older device, "Android Market").

    Like, say, the usefulness of an Android 2.1 device being more limited by
    the non-functional "Android Market" than by the performance of the
    hardware. Meanwhile, a Windows Vista era laptop is at least still
    technically usable (well, can still technically use my XP era laptop as
    well, but at this point have to custom-build software for it via VS2008
    or Platform SDK v6.1; and in terms of performance it generally loses to
    a RasPi).


    Then again (from long ago), I have memories of messing with a 90s era
    laptop that basically failed at trying to run Quake 2 and Half-Life.
    IIRC, it was running Win 98, but was hard pressed to run much newer than
    Doom or similar on it.

    Decided to leave out some stuff, but digging around on the internet,
    looks like the closest match I can find to what I remember seems to be
    the ThinkPad 365E or 365X (had 3.5" floppy drive and parallel port, did
    not have CD-ROM or USB; had a display that did color but was kinda awful
    at it, ...).

    I think parents got rid of it, but I guess by that point it was kinda
    useless (and to get files onto it, one either needed to use floppies or
    copy them via HyperTerm and a Null-Modem cable).

    It was at least capable of launching Quake, but its performance was
    pretty much unusable. A lot of newer software at the time would just immediately crash (that time being roughly in the XP era).



    The XP era laptop is getting kinda unusable at this point, but I am half-wondering if an SDcard to laptop-PATA adapter could be an
    improvement (vs an otherwise annoyingly slow 20GB HDD; like if I could
    get a 64GB or 128GB SDcard to work, this would be a lot more space).

    But, probably not going to go much bigger than 128GB, as I seem to
    remember WinXP having a problem with drives over 128GB (presumably the
    28-bit LBA limit: 2^28 sectors * 512 bytes = 128GiB; anything larger
    needs 48-bit LBA support).

    If I do so, might almost make sense to try jumping from WinXP to a
    32-bit Linux distro (would just need to find something that can run on a laptop from 2003).

    ...


    But, I guess, granted, they would sell more CPUs if people bought new
    stuff more often (well, and bought the newest generation parts, rather
    than older/cheaper parts). But, then again, not like I have infinite
    money, so...




    Absent changing to a more sensible encoding scheme and limiting or
    removing condition-codes, x86-64 still has this major boat anchor. But,
    these can't be changed without breaking backwards compatibility (at
    least, assuming hardware that continues running x86-64 as the native
    hardware ISA).

    Condition codes were never "that hard" of a problem wither in
    pipelining nor in operand routing.


    It seems, they create a path where each ALU instruction may potentially
    depend on the prior ALU instruction, and where instructions like Jcc
    need these bits immediately following an ALU instruction, ...

    Could be better if, say:
    CC's didn't exist;
    CC's are *only* updated by instructions like CMP and similar.

    If no CC's, ALU instructions have no implicit dependency and could be evaluated in any order without a visible effect on state.


    For a past emulator, did note though that a lot of the CC logic could be skipped by noting cases where a following instruction would fully mask
    the CC updates from a prior instruction. This is possibly asking a bit
    much from hardware though...
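
    As a rough C sketch of the software version of this ("lazy flags", a
    trick common to x86 emulators; illustrative only, not code from the
    emulator mentioned above):

      #include <stdint.h>

      enum alu_op { OP_ADD32, OP_SUB32 /* , ... */ };

      /* ALU ops just record their operands and result; EFLAGS is only
         materialized if something reads it before the next full writer. */
      struct lazyflags {
          uint32_t a, b, r;   /* inputs and result of last flag-writing op */
          enum alu_op op;
      };

      static void emu_add32(struct lazyflags *lf, uint32_t a, uint32_t b,
                            uint32_t *dst)
      {
          *dst = a + b;
          lf->a = a; lf->b = b; lf->r = *dst; lf->op = OP_ADD32;
          /* No flag computation here: this op fully masks the prior op's
             flag updates, so that work is simply never done. */
      }

    Only the flag consumers (Jcc, ADC, PUSHF, etc.) then pay to derive the
    flags from (a, b, r, op).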


    While my use of a T bit could be argued to be "similar" to CC's, it is different:
    T bit may only be updated in certain contexts;
    I was able to get by with a 2 cycle latency between updating the T bit
    and any instructions which use the T bit;
    A similar sort of 2-cycle latency constraint for x86-64 rFLAGS would
    likely have an adverse effect on performance.



    Though, ironically, most "legacy x86" stuff could probably be served
    acceptably with emulators.

    Ever try to emulate A24? Address bit 24--when we looked at it, it took
    more gates to remove it and put a bit in CPUID so applications could "do
    the right thing" than to simply leave the functionality there.

    My past x86 emulator attempts were limited mostly to 32-bit user-mode
    stuff, so no A20 or A24 wonk or similar (was at the time mostly trying
    to get simple 32-bit Windows programs working).

    If I were to try to emulate a full machine, would likely switch out
    the memory load/store handling logic (as function pointers) based on the
    value of relevant architectural registers (such as whether paging is
    enabled or disabled).
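
    Say, roughly (a minimal C sketch of the idea; translate() and the
    sizes here are stand-ins, not from any actual emulator):

      #include <stdint.h>
      #include <string.h>

      static uint8_t guest_ram[1 << 20];            /* toy guest memory */
      extern uint32_t translate(uint32_t vaddr);    /* hypothetical PT walk */

      typedef uint32_t (*ld32_fn)(uint32_t addr);

      static uint32_t ld32_phys(uint32_t addr)      /* paging disabled */
      {
          uint32_t v;
          memcpy(&v, &guest_ram[addr & ((1u << 20) - 1)], 4);
          return v;
      }

      static uint32_t ld32_paged(uint32_t addr)     /* paging enabled */
      {
          return ld32_phys(translate(addr));
      }

      static ld32_fn guest_ld32 = ld32_phys;

      static void set_cr0(uint32_t val)   /* swap handlers on CR0.PG writes */
      {
          guest_ld32 = (val & (1u << 31)) ? ld32_paged : ld32_phys;
      }

    Stores and the other access widths would get the same treatment, so the
    hot load/store path never has to test the MMU state.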


    Most recent efforts to write an x86 emulator have fizzled relatively
    quickly though; mostly for concerns that I wouldn't get enough
    performance to make it worthwhile (and emulating x86 on my x86 PC
    wouldn't be terribly useful, and on RasPi there is also QEMU and DOSBox,
    even if the performance sucks).



    If it can't maintain a performance advantage (say, if ARM and RISC-V
    catch up or exceed the performance possible on higher end x86 chips), it
    is effectively done.

    x86 performance advantage has ALWAYS been in the cubic amounts of cash
    flow running through the FAB to pay the engineering team budgets.

    Recent years have mostly been model numbers advancing faster than any
    single threaded performance improvements...

    And ARM is catching up.

    Seemingly, the RISC-V chips are a bit further behind, but seem to be
    advancing up the ladder rather quickly.



    Or, "How about bigger AVX?", goes back and forth, AND apparently
    supporting AVX512 via the cheaper mechanism of doing the operations as multiple parts.

    Where, seemingly SIMD going too much wider than 128 bits actually makes
    stuff worse...


    Pretty much my entire adult life, there hasn't been much obvious gain
    from SIMD going wider than 128 bits, so I am inclined to posit that 128
    bits is probably near optimal.

    And, the advantage of SIMD lies more with subdividing the registers into
    N elements (without increasing pipeline or register width), rather than
    trying to gain more elements by pushing registers to bigger sizes.

    Personally, I also have a lot more use cases for 4-wide vectors of
    16-bit elements than I do for 256 bit vectors.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 00:53:09 2024
    From Newsgroup: comp.arch

    On 10/21/2024 7:21 PM, MitchAlsup1 wrote:
    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Say, if one could make the CPU itself have 35% more perf/W by jumping to
    a different encoding scheme, this could easily offset if they needed to
    pay a 20% cost by JIT compiling everything when running legacy
    software...

    This only works when the native ISA has a direct path to emulating
    the address modes of x86-64 which includes [Rbase+Rindex<<scale+DISP]


    This is a bigger problem for unmodified RISC-V as I see it.

    Though, in theory, RISC-V with Zba can do:
      SH2ADD Xd, Xi, Xb    # Xd = Xb + (Xi << 2)
      LW     Xd, Disp(Xd)  # load from Xb + Xi*4 + Disp

    Which arguably isn't too bad.

    I think they underestimate the costs of needing a 2-op sequence vs, say:
      LW Xd, (Xb, Xi)      # hypothetical register-indexed load, one op


    But, yeah, if trying to design an x86 stand-in, might make sense to
    prioritize trying to be close to 1:1 for core x86 ops, or use fewer ops
    in cases where multiple x86 ops can be merged (say, 3R vs 2R, ...).

    This does probably still mean a variable length encoding, but, say:
    32/64/96, or 16/32/64/96, or similar.

    Not 1-15 bytes with a fairly ad-hoc set of encoding rules.
    Also the official RISC-V strategy for larger encodings sucks...


    IMO, jumbo prefixes or similar are preferable, as the extension scheme
    is more straightforward (and possible encodings can drop out of a
    predefined set of rules; rather than needing people to go and define
    each possible extended encoding individually, with no real consistent
    layout guidelines for how the extended instruction spaces are to be
    structured, ...).
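
    As a C sketch of the general mechanism (illustrative of the concept,
    not any ISA's actual bit layout; the field widths are made up):

      #include <stdint.h>

      /* A jumbo prefix just contributes high bits that are concatenated
         onto the base instruction's immediate field. */
      static int64_t decode_imm(uint32_t base_imm10, int has_jumbo,
                                uint32_t jumbo_bits23)
      {
          if (!has_jumbo)   /* sign-extend the 10-bit base immediate */
              return (int32_t)(base_imm10 << 22) >> 22;
          /* 23 prefix bits over 10 base bits: a 33-bit immediate */
          uint64_t v = ((uint64_t)jumbo_bits23 << 10) | base_imm10;
          return (int64_t)(v << 31) >> 31;
      }

    The same rule applying uniformly to every instruction with an immediate
    is what makes the extension scheme predictable.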



    It is also a hopelessly frail path to self destruction:: Transmeta.


    I am imagining not exactly taking the Transmeta path, but possibly more
    like a Rosetta path, with a Transmeta-like fallback if trying to boot an
    old OS (if the OS lacks the native ISA, so is booted in full emulation).

    But, yeah, likely would be better to have the OS be able to run the
    native ISA.


    Granted, this is predicated on the assumption that one could get such a
    jump by jumping to a different encoding scheme.

    It is not the encoding scheme that is kaput, it is the semantics
    such a scheme provides the programmer via ISA.
    --------------------------------

    Possibly.

    My ideas thus far end up looking sort of like:
    Core ISA similar to BJX2 or RISC-V semantics, but with ability to
    express large immediate values and similar (a weak area for normal
    RISC-V). Should be able to efficiently express x86 address modes, ...

    Should likely also have helpers for things like rFLAGS twiddling, ...


    The major selling point of x86 has been its backwards compatibility, but
    this advantage may be weakening with the rise of the ability to emulate
    stuff at near native performance. If Windows could jump ship and provide
    an experience that "doesn't suck" (fast/reliable/transparent emulation
    of existing software), the main advantages of the x86-64 legacy may go
    away (and is already mostly moot in Linux since the distros typically
    recompile everything from source, with little real/significant ties to
    the x86 legacy).

    W11 has done enough to my day-to-day operations I am willing to
    jump ship to Linux in order to avoid daily updates and the myriad
    of technical issues that never seem to get solved in a way that
    makes them "go away" forever. So, for me it is not that it will
    be an x86 (or ARM, or ...), it is that it is not MS oriented.


    I am still using Win10 on my PC, but my parents have a Win11 PC, and
    what little I have encountered of it (and heard about it) doesn't really
    make me want to use it.

    Well, and Win11 doesn't see my PC as being compatible.

    But, this leaves stuff in a limbo state for now.

    ...

    Most of the software I run is open source, so theoretically a jump is possible.


    Usually, I would skip over Windows versions that kinda sucked, say, I
    stuck with XP-X64 until Win7, then went from Win7 to Win10.

    Looks like MS is trying to push people into using Win11 though (while at
    the same time trying to push SecureBoot and TPMs, ...).

    ...



    Ironically, I would almost consider getting one of the manycore ARM
    based systems and running Linux on it, except they are still expensive.
    And, on the other side, a RasPi or other similar SBC is not a viable
    replacement.

    Would want something where I could have a proper GPU and plug in 6 or 8
    SATA devices, etc.

    So, my wishlist would include among other things:
    ATX family form-factor;
    PC-like or PC-superior performance;
    Can have, say, 128GB of RAM and 8+ SATA devices;
    ...

    But, at least at the moment, x86-64 still owns this space...


    But, may change in the future...


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 01:16:34 2024
    From Newsgroup: comp.arch

    On 10/21/2024 8:16 PM, John Levine wrote:
    According to MitchAlsup1 <mitchalsup@aol.com>:
    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Intel's never going to catch up in the phone market but they're still significant in the server and cloud market.

    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.


    As I understand it, as-is x86S would mostly affect OS level code,
    leaving the userland basically intact (still able to run existing software).

    If your goal is "Run an OS like Win 11, Business as usual", this makes
    sense. Modern Windows doesn't really use the stuff that x86S removes.


    Longer term, this may be insufficient. If Moore's Law grinds entirely to
    a halt, running x86-64 in hardware may not be a win.


    Then again, I guess both Itanium and Transmeta can be taken as examples
    of ways to shoot oneself in the foot as well.

    In this case, the Apple situation makes more sense. They have jumped
    MacOS from x86 to ARM, without losing all of their existing software
    base, by running a userland emulator that "doesn't suck".


    Granted, can't necessarily trust MS here, as much of the time MS has
    done stuff like using emulation strategies that are awkward and suck.
    Like, say, running Windows inside an emulator, in Windows, and just sort
    of crudely gluing the desktops together between programs in the native
    and VM Windows instance (without giving programs in the VM transparent
    access to the host OS's filesystem, ...).

    So, it is more a thing of "What if the emulation layer, doesn't
    suck?...". One where there are no obvious seams, the VM program
    instances looking and behaving just like native ones, able to see all
    the same files, ...



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Tue Oct 22 08:41:40 2024
    From Newsgroup: comp.arch

    In article <vf7g04$1c5qd$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:

    In this case, the Apple situation makes more sense. They have
    jumped MacOS from x86 to ARM, without losing all of their existing
    software base, by running a userland emulator that "doesn't suck".

    Granted, can't necessarily trust MS here, as much of the time MS
    has done stuff like using emulation strategies that are awkward and
    suck. Like, say, running Windows inside an emulator, in Windows,
    and just sort of crudely gluing the desktops together between
    programs in the native and VM Windows instance (without giving
    programs in the VM transparent access to the host OS's filesystem,
    ...).

    The x86-32 and x86-64 emulation in ARM Windows 11 is pretty good. Native
    and emulated programs run on the same desktop with no visible seams.

    They don't have the "emulated x86 is faster than native x86" effect that
    Apple users report, but Apple stopped updating their x86 machines in
    2018, creating an additional performance gap.

    John
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Tue Oct 22 15:26:20 2024
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit
    architecture but can still have a 32 bit user mode that can run existing
    32 bit application binaries. So how about if the next generation is
    native x86S, but can also run existing 64 bit binaries, even if not as
    fast as native x86S. They get the usual cloud operating systems ported
    to x86S while leaving a path for people to migrate their existing
    applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities. Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU? Will they
    sell them for less? I doubt it.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue Oct 22 17:38:01 2024
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    Most of the proposed changes are uninteresting to user mode developers.

    They're definitely interesting to system software (UEFI, Hypervisor,
    Kernel folks), if only to clean up the boot and startup paths.

    Those changes also will reduce the RTL verification load, and
    perhaps simplify other areas of the implementation leading to further efficiencies down the road. The A20 gate should be relegated
    to the trash heap of history.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Tue Oct 22 18:39:40 2024
    From Newsgroup: comp.arch

    In article <2024Oct22.172620@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    x86S eliminates a part of the compatibility path from systems
    of yesteryear, but not that many people use these parts nowadays
    anyway. It's unclear to me what benefits these changes are
    supposed to buy

    I don't know how much circuitry and firmware in motherboard chipsets is required to support the old compatibility paths, but the manufacturers
    would doubtless like to save costs there. This might also make the
    machines more "secure" in that special sense used with DRM.

    Microsoft would probably like machines where media playing was harder to intercept, because that would earn them more trust from the media conglomerates.

    John
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 13:43:40 2024
    From Newsgroup: comp.arch

    On 10/22/2024 10:26 AM, Anton Ertl wrote:
    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities. Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU? Will they
    sell them for less? I doubt it.


    Yeah, basically my thoughts as well.
    Business as usual...

    Main effect it achieves is breaking legacy boot; it doesn't seem like it
    would either save all that much or "solve" x86's longstanding issues.



    And, proposing an "x86-64 but re-imagined as a RISC-style ISA" mode
    could have made sense...


    Say:
    Instructions have a simpler and more consistent encoding;
      Probably still VLE, but less free-form.
    Maybe expand to 32 or 64 registers;
    3 register instructions;
    Maybe split ALU ops into CC-update and No-CC-update variants;
      Sorta like ARM.
      Most other ops become No-CC-Update only.
    This mode drops things like x87 and MMX and similar;
      Only SSE/AVX.
    Would make sense to keep the existing addressing modes (*1);
    Maybe keep LoadOP and OPStore, but limited in scope.
      Ironically, something like RISC-V AMO requires such a mechanism.
      If one implements something like AMO, may as well have LoadOP.
    ...

    Maybe, as for AVX512:
    They can either make it usable "in general", or deprecate it.


    *1: Probably, say (if I were designing the encoding):
    [Rb+Disp10s] //32-bit encoding
    [Rb+Ri*FixSc] //32-bit encoding
    [Rb+Ri*Sc] //64-bit encoding
    [Rb+Disp33s] //64-bit encoding
    [Rb+Ri*Sc+Disp11s] //64-bit encoding
    [Rb+Ri*Sc+Disp33s] //96-bit encoding

    Some tweaks are possible, the above is mostly "if encoded in a similar
    way to XG2 or my RV+Jumbo-Prefix thing".

    Though, in my case, the issue with the latter address modes had been
    less: "How to encode" so much as "Do they buy enough to justify the
    added cost of a 3-way adder in the AGU..."

    But, for an "x86 successor", might be difficult to justify limiting
    things in a way that would require increasing the instruction count.

    LoadOP would likely be allowed for basic ALU ops, but N/A for SSE/AVX (assembler could fake these though by breaking them into multiple ops).


    Might make sense to offload a lot of the SSE/AVX stuff to 64-bit
    encodings, and essentially merge x86 into AVX (makes little sense to
    have separate encodings that do the same thing on the same registers).

    One other likely goal would be to make it mostly backwards compatible
    with existing x86-64 ASM code, which would likely simplify getting a
    compiler for it.

    In many cases, a JIT could potentially be pretty close to a 1:1
    decode/re-encode process (though, a bit more if trying to emulate
    legacy modes).

    For translation or assembly, No-CC-Update forms could be inferred from
    default forms by looking forwards in the instruction stream (if a
    following instruction entirely masks any flags updates, can use the non-updating form instead).
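
    Say, as a small C sketch of that forward scan (illustrative; the
    struct and field names are invented):

      struct insn {
          unsigned reads_flags : 1;       /* e.g. Jcc, ADC, CMOVcc */
          unsigned writes_all_flags : 1;  /* e.g. ADD, SUB, CMP */
          unsigned ends_block : 1;        /* branch or branch target */
      };

      /* Can instruction i use the No-CC-Update form? Only if its flag
         results are overwritten before anything can read them. */
      static int flags_are_dead(const struct insn *code, int i, int n)
      {
          for (int j = i + 1; j < n; j++) {
              if (code[j].reads_flags)      return 0;  /* keep CC form */
              if (code[j].writes_all_flags) return 1;  /* fully masked */
              if (code[j].ends_block)       return 0;  /* be conservative */
          }
          return 0;  /* ran off the block: be conservative */
      }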


    Emulating x87 for legacy code could be harder though. In the common
    case, the x87 stack could be resolved statically, but there is a subset
    of cases where a non-static mapping could result. A JIT would likely map
    x87 onto XMM registers, in any case.
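
    The static-resolution part might look roughly like this in a JIT (a
    sketch; the register assignment and names are made up):

      /* Track the x87 TOP at translate time; ST(i) then maps to a fixed
         XMM register, here XMM8..XMM15. FLD decrements TOP and FSTP's pop
         increments it (mod 8), matching the x87 stack convention. */
      static int st_top;  /* compile-time top-of-stack, 0..7 */

      static int xmm_for_st(int i) { return 8 + ((st_top + i) & 7); }

      static void note_push(void) { st_top = (st_top - 1) & 7; }
      static void note_pop(void)  { st_top = (st_top + 1) & 7; }

    When TOP stops being statically known (say, FINCSTP in a loop, or
    control-flow joins with different stack depths), the JIT has to bail to
    a generic path; that is the non-static subset mentioned above.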


    Probable register spaces:
    R0 / RAX
    R1 / RCX
    R2 / RDX
    R3 / RBX
    R4 / RSP
    R5 / RBP
    R6 / RSI
    R7 / RDI
    R8..R15: Same as x86-64
    R16..R31: Extended, otherwise similar.
    XMM0..XMM15: Same
    XMM16..XMM31: Expanded

    Would drop x87, MMX and x87/MMX registers.


    Unlike RISC-V, I would assume keeping the base immediate values smaller
    (9 or 10 bits) and using VLE for larger immediate values.

    Doing 12-bit immediate values as the default essentially eats a lot of encoding space for relatively little gain.
    Would have 17-bit constant-load and ADD as special cases.


    My usual rationale for prioritizing 17 and 33 bit constants in some
    cases, over 16/32, or various other sizes, is that these sizes have
    "unusually good" hit rates IME. A 17-bit signed immediate spans
    -65536..65535, so it covers both the "signed short" and "unsigned short"
    ranges; covering both gives a significantly better hit rate than
    covering just "signed short", while adding a few additional bits gains
    relatively little (likewise for 33).

    Pattern isn't so strong for 9s though, where both 9u and 10s are
    stronger than 9s (but 11 and 12 bits seem to see a rapid drop-off; at
    which point it is likely better to jump to a bigger encoding).

    ...



    - anton

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 21:13:41 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 18:43:40 +0000, BGB wrote:

    On 10/22/2024 10:26 AM, Anton Ertl wrote:

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities. Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU? Will they
    sell them for less? I doubt it.


    Yeah, basically my thoughts as well.
    Business as usual...

    Main effect it achieves is breaking legacy boot; it doesn't seem like it
    would either save all that much or "solve" x86's longstanding issues.

    Intel needs a better way to exit reset--and that means the MMU/TLBs
    are already up and working at the time reset is exited. This cannot
    be made backwards compatible.
    -------------------------------

    *1: Probably, say (if I were designing the encoding):
    [Rb+Disp10s] //32-bit encoding
    [Rb+Ri*FixSc] //32-bit encoding
    [Rb+Ri*Sc] //64-bit encoding
    [Rb+Disp33s] //64-bit encoding
    [Rb+Ri*Sc+Disp11s] //64-bit encoding
    [Rb+Ri*Sc+Disp33s] //96-bit encoding

    [Rb+DISP16] // 32-bit 16 > 10
    [Rb+Ri<<sc] // 32-bit
    [Rb+Ri<<sc+DISP32] // 64-bit 32 > 11
    [Rb+Ri<<sc+DISP64] // 96-bit 64 > 33
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 16:18:50 2024
    From Newsgroup: comp.arch

    On 10/22/2024 12:38 PM, Scott Lurndal wrote:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    Most of the proposed changes are uninteresting to user mode developers.


    Yes, on current platforms, they are unlikely to notice.


    Would make the chip essentially "useless" for a retro system, but most
    of these guys are either using vintage parts, or emulation.

    Like, little point in trying to run Win98 on a newest-generation
    platform (and, apparently, getting Win98 working natively on anything
    much newer than the mid 2000s is a pain, as even if the CPU is backwards
    compatible, most of the other hardware is not).


    Then, there is the pros/cons option of "Well, run QEMU or similar..."
    Decided to leave out going into my thoughts on the QEMU experience
    (technically works OK, but in some areas is a little lacking).


    Though, there is a possible merit to trying to have a userland-only
    emulator. Unlike with a full system/OS level emulator, all of the wonk
    and issues with emulating hardware interfaces and drivers, and with
    filesystem integration, can largely go away (one can essentially trap
    out of the emulation at the system-call level).
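
    The top-level dispatch for that sort of emulator can be tiny; say, as
    a C sketch (the names here are placeholders, not from any project):

      #include <stdint.h>

      struct cpu;                                   /* guest CPU state */
      extern uint32_t fetch_decode(struct cpu *);   /* hypothetical */
      extern void execute(struct cpu *, uint32_t);
      extern void host_syscall_shim(struct cpu *);  /* KERNEL32/NTDLL mockups */

      enum { OP_SYSCALL_GATE = 0xFFFF };            /* placeholder tag */

      void run(struct cpu *cpu)
      {
          for (;;) {
              uint32_t op = fetch_decode(cpu);
              if (op == OP_SYSCALL_GATE)
                  host_syscall_shim(cpu);   /* trap out at the API boundary */
              else
                  execute(cpu, op);
          }
      }

    Everything below the system-call line (devices, filesystem, drivers,
    ...) is then the host's problem rather than the emulator's.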

    Though, it is possibly more effort than it may seem to try to write
    usable mockups for KERNEL32.DLL and USER32.DLL and similar (and one
    can't directly copy-paste these parts from Wine; didn't get very far). I
    think I got annoyed and gave up trying to debug it at the time.

    Though, ironically, some code from this past project was copy-pasted
    into what later became TestKern.


    IIRC, the goal at the time was to make something like Wine that ran on a RasPi. Since then, Wine itself has gained this capability (by internally offloading the emulation parts to QEMU). Not looked too much into how
    this setup works.

    But, in any case, at least in theory, the need to stick with x86 as the
    native ISA for sake of backwards compatibility seems to be weakening.



    Had on/off considered trying to revive the idea in a different form, but
    had mostly stalled out (if the host is 50MHz, running x86 via an
    interpreter is going to be too slow to be worthwhile).

    It seemed likely more practical to try to get RV64G Linux-ELF binaries
    working than to try to get Win32 binaries working. Though, this had also
    stalled, as now there is the issue of trying to figure out why
    "ld-linux.so" and similar keep exploding (thus far, all the "actually
    usable" RV64 builds have been using my own C library).

    ...



    They're definitely interesting to system software (UEFI, Hypervisor,
    Kernel folks), if only to clean up the boot and startup paths.

    Those changes also will reduce the RTL verification load, and
    perhaps simplify other areas of the implementation leading to further efficiencies down the road. The A20 gate should be relegated
    to the trash heap of history.

    Possibly true, but presumably RTL verification of legacy features is not
    a continuously recurring cost...



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch on Tue Oct 22 17:59:46 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 00:03:43 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Only because the average cell phone gets broken or flooded within a
    year. If people were not so careless, I doubt most would be replaced
    so often.

    My current phone is over 4 years old and it continues to serve all of
    my needs. Sans damage, the only reason I would choose to replace it
    would be when critical apps no longer support the OS version.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 22:17:28 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 21:59:46 +0000, George Neuner wrote:

    On Tue, 22 Oct 2024 00:03:43 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Only because the average cell phone gets broken or flooded within a
    year. If people were not so careless, I doubt most would be replaced
    so often.

    My current phone is over 4 years old and it continues to serve all of
    my needs. Sans damage, the only reason I would choose to replace it
    would be when critical apps no longer support the OS version.

    My first cell phone (Galaxy 3) I got in 2012 and used it until 2022
    when the service provider offered a zero cost upgrade because they
    were losing access to the 4G-LTE antennae. I did put in 2 new
    batteries, and nothing was scratched or dented after 11 years of use.

    I still liked it better than the Galaxy 12 I have now. ...

    Oh and BTW:: I do not carry my cell phone unless I am traveling
    or expecting a call. It lives in my office--probably why it is
    not being damaged by being sat upon or dropped into water, and
    other causes of cell phone death.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Tue Oct 22 23:46:50 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 01:16:34 -0500, BGB wrote:

    In this case, the Apple situation makes more sense. They have jumped
    MacOS from x86 to ARM, without loosing all of their existing software
    base, by running a userland emulator that "doesn't suck".

    Seems like the Apple platform has less need for third-party addons that intrude into the kernel, simply because it has a smaller choice of apps anyway.

    For example, anticheat mechanisms for online games. Fortnite is one I
    have seen mentioned, that cannot work with the x86 emulation offered by
    Windows-on-ARM. Presumably this is not a problem on the Mac because
    Fortnite is simply unavailable on the Mac.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Tue Oct 22 23:48:24 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 18:38 +0100 (BST), John Dallman wrote:

    Microsoft would probably like machines where media playing was harder to intercept, because that would earn them more trust from the media conglomerates.

    One of the innovations in Windows Vista was the addition of the
    “Protected Media Path”, which was supposed to solve exactly this
    problem. Didn’t it?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Tue Oct 22 23:52:23 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 16:18:50 -0500, BGB wrote:

    Like, little point in trying to run Win98 on a newest-generation
    platform (and, apparently, getting Win98 working natively on anything
    much newer than the mid 2000s is pain ...

    Funny, I did exactly that for a friend a couple of years ago. The Windows
    98 image ran under PCem <https://pcem-emulator.co.uk/>, on a Linux Mint installation on an MSI Cubi 5.

    I set up a “captive” user under Mint that, the moment you logged in, started the emulator running Windows. Shut down Windows, and it logged you
    out again.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch on Thu Oct 24 15:59:48 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 22:17:28 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    On Tue, 22 Oct 2024 21:59:46 +0000, George Neuner wrote:

    On Tue, 22 Oct 2024 00:03:43 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Only because the average cell phone gets broken or flooded within a
    year. If people were not so careless, I doubt most would be replaced
    so often.

    My current phone is over 4 years old and it continues to serve all of
    my needs. Sans damage, the only reason I would choose to replace it
    would be when critical apps no longer support the OS version.

    My first cell phone (Galaxy 3) I got in 2012 and used it until 2022
    when the service provider offered a zero cost upgrade because they
    were losing access to the 4G-LTE antennae. I did put in 2 new
    batteries, and nothing was scratched or dented after 11 years of use.

    I still liked it better than the Galaxy 12 I have now. ...

    I used an LG flip phone from 2008..2020. Prior to that I had a Nokia
    "stick" from 1995. Before that I had a Motorola flip phone from early
    80's that was on my parents' plan.

    The only reasons I have ever upgraded were because carriers changed
    service requirements: 2G->3G, 3G->4G. I have never had to replace a
    phone because it was damaged.

    Current phone is still 4G LTE. Its OS is slated to sunset soon, but I
    expect to keep using it until developers drop support and the apps I
    need will no longer update.


    Oh and BTW:: I do not carry my cell phone unless I am traveling
    or expecting a call. It lives in my office--probably why it is
    not being damaged by being sat upon or dropped into water, and
    other causes of cell phone death.

    I /do/ carry my phone - always in my left front pocket. I won't answer
    if I'm busy (never while in the bathroom or while driving) ... if the
    caller won't leave a message, it's obvious that the call was not
    important.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Thu Oct 24 23:31:35 2024
    From Newsgroup: comp.arch

    On 10/22/2024 4:13 PM, MitchAlsup1 wrote:
    On Tue, 22 Oct 2024 18:43:40 +0000, BGB wrote:

    On 10/22/2024 10:26 AM, Anton Ertl wrote:

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support.  x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway.  It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path).  It seems to me that most of the complexity of
    current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities.  Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU?  Will they
    sell them for less?  I doubt it.


    Yeah, basically my thoughts as well.
       Business as usual...

    Main effect it achieves is breaking legacy boot; it doesn't seem like it
    would either save all that much or "solve" x86's longstanding issues.

    Intel needs a better way to exit reset--and that means the MMU/TLBs
    are already up and working at the time reset is exited. This cannot
    be made backwards compatible.
    -------------------------------

    I am not sure how this would have much effect on cost either way.
    A physical address mode could just be some edge-case logic in the MMU
    (say, whenever there is a TLB miss with the MMU disabled, it merely
    loads an identity-mapped address into the TLB).
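
    In C-model terms, that edge case might amount to something like this
    (a sketch of the idea, not actual RTL; names are placeholders):

      #include <stdint.h>

      extern int  mmu_enabled;
      extern void tlb_insert(uint64_t va, uint64_t pa, int perm);
      extern void raise_tlb_miss_fault(uint64_t va);
      enum { PERM_RWX = 7 };    /* placeholder permission encoding */

      static void tlb_miss(uint64_t vaddr)
      {
          if (!mmu_enabled) {
              /* physical-address mode: install an identity mapping */
              tlb_insert(vaddr, vaddr, PERM_RWX);
              return;
          }
          raise_tlb_miss_fault(vaddr);  /* normal software-filled TLB path */
      }

    The rest of the memory pipeline then never needs a separate "MMU off"
    path.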



    *1: Probably, say (if I were designing the encoding):
       [Rb+Disp10s]        //32-bit encoding
       [Rb+Ri*FixSc]       //32-bit encoding
       [Rb+Ri*Sc]          //64-bit encoding
       [Rb+Disp33s]        //64-bit encoding
       [Rb+Ri*Sc+Disp11s]  //64-bit encoding
       [Rb+Ri*Sc+Disp33s]  //96-bit encoding

        [Rb+DISP16]         // 32-bit   16 > 10
        [Rb+Ri<<sc]         // 32-bit
        [Rb+Ri<<sc+DISP32]  // 64-bit   32 > 11
        [Rb+Ri<<sc+DISP64]  // 96-bit   64 > 33


    One doesn't want to burn too much encoding space...

    If the goal is to redesign x86 as a RISC-like ISA, one is likely going
    to need a lot of space for opcode bits.

    This is partly why I was thinking 32 registers rather than 64, along
    with the smaller immediate fields.

    Say, one possible encoding scheme would be to use a similar base format
    to RISC-V:
    ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYYY1 //32-bit op
    ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYYY0 //64/96-bit op

    Then, say:
    1/2 the 32-bit encoding space is 3R ops;
    1/4 the 32-bit encoding space is 3RI ops;
    Remaining 1/4 for Imm16 and JMP/JCC and similar.

    Say, could burn a 24/25-bit chunk of encoding space on JMP/CALL/JCC
    iiiiiii-iiiii-iiiii-iii-Zcccc-YY-YYYY1
    Where:
    cccc is like x86 Jcc condition code,
    but maybe reuse P and NP for JMP and CALL.

    Though, might make sense to do CALL/RET using a link-register rather
    than the stack, even if x86 traditionally used the stack.



    For 64-bit:
    LD/ST/OPLD/OPST: [Rb+Disp10] expands to [Rb+Disp33s]
    LD/ST/OPLD/OPST: [Rb+Ri*Sc] expands to [Rb+Ri*Sc+Disp11s] or Disp17s.
    Remaining bits go to opcode.

    Say:
    ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc]
    And:
    iiiiiii-iiiii-iiiii-xxx-xxxxx-xx-xxxx0 -
    ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc+Disp17s]
    And:
    iiiiiii-iiiii-iiiii-iii-iiiii-ii-iiii0 -
    kkkkkkk-kkkkk-kkkkk-xxx-xxxxx-ii-xxxx0 -
    ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc+Disp33s]


    Could maybe use some of the extra bits encoding things like:
    ADD.Q [Rb+Ri*Sc+Disp33s], Imm17s.
    Or:
    ADD.Q [Rb+Ri*Sc+Disp17s], Imm33s.
    Say, by having a Rn/Imm bit, and a bit to specify which immediate is
    used as the constant and the other as the displacement.


    But, with Disp10 base-forms, might expand to Disp33:
    iiiiiii-iiiii-iiiii-xxx-iiiii-xx-xxxx0 -
    iiiiiZZ-iiiii-mmmmm-dZi-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc+Disp17s]

    Where the 'd' flag could select between, say:
    "ADD Rn, [Rm+Disp]" or "ADD [Rm+Disp], Rn"

    32-bit encodings only allowing a register, whereas 64-bit encodings
    could allow an immediate.


    But, not really sure...





    In other news, went and wrote up a spec and threw together Verilog code
    for a reworked BSR4K/XG3 ISA design:
    https://pastebin.com/yfrh50bk

    There are still some holes (the spec is missing pretty much all the 2R
    ops for now), but alas. A few parts I have decided would not necessarily
    be carried over, as some newer instructions and the addition of a Zero
    Register made a number of the former 2R and 2RI instructions no longer
    necessary (though, some could still be useful for efficiency; or have
    other useful roles like format conversion).


    To make implementation cheaper/easier for me, it is essentially XG2RV
    with the bits shuffled around, a few inverted, and some special case
    changes (changes branch mechanics and some edge cases involving decoding immediate values).

    Initially I tried putting the repacking logic at the front end of the ID stage, but (unsurprisingly), synthesis and timing wasn't too happy about this...

    Ended up instead putting the repack logic at the end of the IF stage.


    There was another possible idea that I could call BSR4J:
    Would have done a simpler repacking scheme:
    First 16 bits are repacked:
    NMOP-YwYY-nnnn-mmmm => NMOY-mmmm-nnnn-YYPw
    High 16 bits copied unmodified.

    So, overall instruction format, seen as 32-bits, could have been:
    ZZZZ-qnmo-oooo-XXXX-NMOY-mmmm-nnnn-YYPw


    But, it was admittedly more tempting, if I am going to be repacking
    anyways, to make an attempt to "un-dog-chew" the instruction format (in
    an attempt to make it look nicer).

    It is not fully settled yet; could jump over to the BSR4J strategy
    instead if the more aggressive repacking scheme is in fact a bad idea.
    One arguable merit it does have is that all of the original 4-bit fields
    remain 4-bit aligned (and converting between XG2 and BSR4J would be
    significantly less bit-twiddling vs BSR4K; while still achieving the
    goal of being able to fit it into the same encoding space as RISC-V).



    I have yet to decide on some specifics for the mapping of 2R
    instructions:
    Simpler/cheaper: Use the same repacking as 3R ops for 2R ops;
    Possible: Modify packing rules such that the 3rd part of the opcode
    field is also 4-bit aligned.

    Say:
    As-is : XXXX-kkWWWW-mmmmmm-ZZZZ-nnnnnn-QY-YYPw
    Possible: XXXX-WWWWkk-mmmmmm-ZZZZ-nnnnnn-QY-YYPw
    Would be slightly more logic complexity, but could make it easier to
    visually decode 2R instructions in a hexdump (but, likely not be worth
    the additional cost).


    Expressing it as bits though makes it more obvious that I actually have
    less total encoding space than RISC-V, as the 6-bit register fields take
    their cut.

    Say:
    ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYY11 (RV)
    ZZZZ-oooooo-mmmmmm-ZZZZ-nnnnnn-QY-YYPw (XG3)

    Each RV Y block has 10 bits of opcode.
    Whereas, each XG3 Y block has 9 bits.

    XG3 currently has 3 Y blocks reserved for 3R:
    0/3/5 (~ 10.585 bits)
    RV had 2 blocks for the core ISA (11 bits).

    Though, the B extension squanders a few big chunks of it by defining
    some 4R instructions (such as Funnel Shift).


    Contrast, BJX2 doesn't define any 32-bit 4R instructions.

    Also, B extension further weakens the case for not having a register
    indexed addressing mode: Any core implementing the B extension's FSR instruction is going to need a 3R capable register port on the GPRs ...



    Ironically, it seems that just the 'V' and 'P' extensions both end up
    eating more opcode space than the total 3R opcode space in BJX2...



    XG3 effectively ends up spending 1/4 of the total encoding space on
    Jumbo prefixes.

    Where
    Baseline: FE/FF, 25 bits total
    Imm64 only possible via an Imm16 base op (24+24+16=64).
    XG2: xxx1-111x, 28 bits total
    XG3: x1-1zyy, ~ 29.585

    Though, in XG2 the 28-bit jumbo prefixes do allow 27+27+10=64, for 3RI
    Imm64 ops (with the remaining bit for immediate-extension vs more
    general instruction extension).


    Reason it expands in XG3 is that the Jumbo prefixes effectively also eat
    the former PrWEX spaces (XG3 loses WEX and PrWEX, would need to use superscalar instead).

    The would-be PrWEX spaces could maybe be used for something else, but
    unclear what at the moment (likewise the FA/FB blocks are unused in XG2
    and effectively N/A in XG2RV; as the role they served in Baseline has
    effectively "fallen out of the scope of the ISA"; and being N/E in XG3).

    I guess potentially a case could be made to potentially reclaim these
    blocks in XG2 as a range of "Non-predicated Scalar-Only" instructions.

    Could almost relocate branches here (and work towards reclaiming the
    space used by branches in the F0 block, to eventually be reassigned to
    3R space, roughly worth 64 3R ops).

    Say:
    CCC1-101Z: FA/FB Imm25
    ccc (inverse):
    000: MOV Imm25, DLR //As-is, Original Role
    001: BSR Disp25s
    010: BT Disp25s
    011: BF Disp25s
    Doesn't solve the issue for Baseline, as these would be N/E in Baseline.


    Having the encoding scheme fragmenting into a tree of encoding
    sub-variants is getting kind of annoying though, may eventually need to
    prune the tree.

    ...


    --- Synchronet 3.20a-Linux NewsLink 1.114