• x86S Specification

    From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Thu Oct 17 17:34:14 2024
    From Newsgroup: comp.arch

    There is a reference in this Reg article

    https://www.theregister.com/2024/10/15/intel_amd_x86_future/

    to the x86S spec, a proposal from Intel to pare down x86/x64
    by removing or modifying legacy features.

    [PDF] Envisioning a Simplified Intel Architecture https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html

    Some examples are:

    3 Architectural Changes
    3.1 Removal of 32-Bit Ring 0
    3.2 Removal of Ring 1 and Ring 2
    3.3 Removal of 16-Bit and 32-Bit Protected Mode
    3.4 Removal of 16-Bit Addressing and Address Size Overrides
    3.5 CPUID
    3.6 Restricted Subset of Segmentation
    3.7 New Checks When Loading Segment Registers
    3.7.1 Code and Data Segment Types
    3.7.2 System Segment Types (S=0)
    3.8 Removal of #SS and #NP Exceptions
    3.9 Fixed Mode Bits
    3.9.1 Fixed CR0 Bits
    3.9.2 Fixed CR4 Bits
    3.9.3 Fixed EFER Bits
    3.9.4 Removed RFLAGS
    3.9.5 Removed Status Register Instruction
    3.9.6 Removal of Ring 3 I/O Port Instructions
    3.9.7 Removal of String I/O


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Mon Oct 21 17:02:27 2024
    From Newsgroup: comp.arch

    On 10/17/2024 4:34 PM, EricP wrote:
    There is a reference in this Reg article

    https://www.theregister.com/2024/10/15/intel_amd_x86_future/

    to the x86S spec, a proposal from Intel to pare down x86/x64
    by removing or modifying legacy features.

    [PDF] Envisioning a Simplified Intel Architecture https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html

    Some examples are:

    3 Architectural Changes
    3.1 Removal of 32-Bit Ring 0
    3.2 Removal of Ring 1 and Ring 2
    3.3 Removal of 16-Bit and 32-Bit Protected Mode
    3.4 Removal of 16-Bit Addressing and Address Size Overrides
    3.5 CPUID
    3.6 Restricted Subset of Segmentation
    3.7 New Checks When Loading Segment Registers
    3.7.1 Code and Data Segment Types
    3.7.2 System Segment Types (S=0)
    3.8 Removal of #SS and #NP Exceptions
    3.9 Fixed Mode Bits
    3.9.1 Fixed CR0 Bits
    3.9.2 Fixed CR4 Bits
    3.9.3 Fixed EFER Bits
    3.9.4 Removed RFLAGS
    3.9.5 Removed Status Register Instruction
    3.9.6 Removal of Ring 3 I/O Port Instructions
    3.9.7 Removal of String I/O



    Pros:
    Technically makes sense for PCs as they are.
    Cons:
    Loses some of the major aspects of what makes x86 unique;
    Doesn't really solve issues for x86-64's longer term survival.


    Absent changing to a more sensible encoding scheme and limiting or
    removing condition-codes, x86-64 still has this major boat anchor. But,
    these can't be changed without breaking backwards compatibility (at
    least, assuming hardware that continues running x86-64 as the native
    hardware ISA).

    Though, ironically, most "legacy x86" stuff could probably be served acceptably with emulators.


    If it can't maintain a performance advantage (say, if ARM and RISC-V
    catch up or exceed the performance possible on higher end x86 chips), it
    is effectively done.


    Granted, ARM also has the dead weight that is ALU condition codes; and
    RISC-V some of its own traditional limitations.

    ARM64 would likely beat RV64G in a clock-for-clock sense, but
    potentially RV64 could be clocked a little faster due to not having to
    deal with CC's, ...


    As I see it, a case could almost be made for going more like the Apple "Rosetta" route, switching to some other ISA (be it ARM or RISC-V or
    whatever else), and running any existing/legacy software primarily via emulation. Main thing one would need in this case is a decent emulator
    (JIT or AOT based) and enough helpers to work around some things that
    are a pain to do efficiently in pure software emulation (like twiddling
    the bits in EFLAGS/RFLAGS based on the result of ALU instructions).
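
    Say, what that bit-twiddling looks like in a plain software emulator
    (a minimal C sketch; purely illustrative, the name and scope here are
    made up, this is just the standard flags-recomputation idiom):

      #include <stdint.h>

      /* Recompute the main x86 arithmetic flags for a 32-bit ADD.
         A real emulator also needs AF/PF and per-operation variants. */
      static uint32_t flags_after_add32(uint32_t a, uint32_t b)
      {
          uint32_t r = a + b, fl = 0;
          if (r == 0)   fl |= 1u << 6;   /* ZF */
          if (r >> 31)  fl |= 1u << 7;   /* SF */
          if (r < a)    fl |= 1u << 0;   /* CF: unsigned carry-out */
          if ((~(a ^ b) & (a ^ r)) >> 31)
              fl |= 1u << 11;            /* OF: signed overflow */
          return fl;
      }

    Doing something like this after every ALU instruction is a big part of
    the interpretation overhead, hence the appeal of hardware helpers.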

    This matters more for end-user use-cases, since:
    End users care about backwards compatibility;
    Both low-end embedded, and things like webservers, have little need to
    care about compatibility (so in theory could just jump directly to ARM
    or RISC-V or whatever).



    Not a whole lot of other obvious use cases where an x86-64 only CPU is
    an obvious win and would retain a clear advantage over jumping to
    another ISA.

    Going forward, it seems more likely to face competition by "cheap"
    processors being "good enough" rather than direct competition at the
    high-end (where x86-64 has traditionally dominated). High-end designs
    can't really compete as well on the "cheap" end (but a cheaper design
    may still be competitive if one can have more cores, even if per-thread
    performance is worse). Seemingly, there isn't much further one can go
    "up" in terms of single-threaded performance (more a question of whether
    the competition can play "catch up").


    They could possibly hold on by also jettisoning x86-64 as the native
    ISA, and coming up with something that can allow things to be more
    competitive at lower cost. But, replacing it doesn't really "save" it
    either.



    Say:
    Switch over to a less terrible encoding scheme;
    Limit (if not eliminate) the use of condition codes in the
    native ISA (say, CC's mostly existing in the form of helper machinery to
    make emulation faster);
    Could maybe offload x86 compatibility to firmware (say, the EFI BIOS
    provides a hardware-optimized JIT compiler).

    If the new ISA were tuned towards efficiently emulating x86-64, while
    also being cheaper, it could still hold an advantage.


    Say, if one could make the CPU itself have 35% more perf/W by jumping to
    a different encoding scheme, this could easily offset if they needed to
    pay a 20% cost by JIT compiling everything when running legacy software...
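
    Working the numbers: 1.35 * (1 / 1.20) ~= 1.125, so with those (purely
    illustrative) figures, even fully JIT-compiled legacy code would still
    come out roughly 12% ahead in perf/W, while native code would see the
    full 35%.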

    Granted, this is predicated on the assumption that one could get such a
    jump by jumping to a different encoding scheme.

    In many other cases, even if emulation is slower than it might have been
    to run the code natively, it may not matter that much.


    Say, for example, one can run WinXP in QEMU on an Android phone and then proceed to play Diablo2 or similar. In these cases, the limiting factor
    may be more that the UI experience sucks, rather than the potentially significant performance overhead of running WinXP in QEMU on a smartphone...


    The major selling point of x86 has been its backwards compatibility, but
    this advantage may be weakening with the rise of the ability to emulate
    stuff at near native performance. If Windows could jump ship and provide
    an experience that "doesn't suck" (fast/reliable/transparent emulation
    of existing software), the main advantages of the x86-64 legacy may go
    away (and is already mostly moot in Linux since the distros typically recompile everything from source, with little real/significant ties to
    the x86 legacy).

    This situation may itself change if MS continues trying to shoot
    themselves in the foot (eg, making Win11 bad enough that people are
    more tempted to jump over to Linux when Win10 becomes no longer usable).
    Theoretically, it would be more in MS's interest to make Windows not
    suck (rather than trying to force crap on people and making the Windows
    experience kinda suck...).


    ...



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 00:03:43 2024
    From Newsgroup: comp.arch

    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Pros:
    Technically makes sense for PCs as they are.
    Cons:
    Loses some of the major aspects of what makes x86 unique;
    Doesn't really solve issues for x86-64's longer term survival.

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Absent changing to a more sensible encoding scheme and limiting or
    removing condition-codes, x86-64 still has this major boat anchor. But,
    these can't be changed without breaking backwards compatibility (at
    least, assuming hardware that continues running x86-64 as the native
    hardware ISA).

    Condition codes were never "that hard" of a problem, neither in
    pipelining nor in operand routing.

    Though, ironically, most "legacy x86" stuff could probably be served acceptably with emulators.

    Ever try to emulate A24? Address bit 24--when we looked at it, it took
    more gates to remove it and put a bit in CPUID so applications could "do
    the right thing" than to simply leave the functionality there.

    If it can't maintain a performance advantage (say, if ARM and RISC-V
    catch up or exceed the performance possible on higher end x86 chips), it
    is effectively done.

    x86 performance advantage has ALWAYS been in the cubic amounts of cash
    flow running through the FAB to pay the engineering team budgets.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 00:21:51 2024
    From Newsgroup: comp.arch

    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Say, if one could make the CPU itself have 35% more perf/W by jumping to
    a different encoding scheme, this could easily offset if they needed to
    pay a 20% cost by JIT compiling everything when running legacy
    software...

    This only works when the native ISA has a direct path to emulating
    the address modes of x86-64 which includes [Rbase+Rindex<<scale+DISP]

    It is also a hopelessly frail path to self destruction:: Transmeta.

    Granted, this is predicated on the assumption that one could get such a
    jump by jumping to a different encoding scheme.

    It is not the encoding scheme that is kaput, it is the semantics
    such a scheme provides the programmer via ISA.
    --------------------------------
    The major selling point of x86 has been its backwards compatibility, but
    this advantage may be weakening with the rise of the ability to emulate
    stuff at near native performance. If Windows could jump ship and provide
    an experience that "doesn't suck" (fast/reliable/transparent emulation
    of existing software), the main advantages of the x86-64 legacy may go
    away (and is already mostly moot in Linux since the distros typically recompile everything from source, with little real/significant ties to
    the x86 legacy).

    W11 has done enough to my day-to-day operations I am willing to
    jump ship to Linux in order to avoid daily updates and the myriad
    of technical issues that never seem to get solved in a way that
    makes them "go away" forever. So, for me it is not that it will
    be an x86 (or ARM, or ...), it is that it is not MS oriented.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From John Levine@johnl@taugh.com to comp.arch on Tue Oct 22 01:16:11 2024
    From Newsgroup: comp.arch

    According to MitchAlsup1 <mitchalsup@aol.com>:
    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Intel's never going to catch up in the phone market but they're still significant in the server and cloud market.

    Think about the way that current Intel chips have a native 64 bit architecture but can still have a 32 bit user mode that can run existing 32 bit application binaries. So how about if the next generation is native x86S, but can also run existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Mon Oct 21 23:57:59 2024
    From Newsgroup: comp.arch

    On 10/21/2024 7:03 PM, MitchAlsup1 wrote:
    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Pros:
       Technically makes sense for PCs as they are.
    Cons:
       Loses some of the major aspects of what makes x86 unique;
       Doesn't really solve issues for x86-64's longer term survival.

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Worked better for them when PCs kept getting faster.
    Then there was more reason to want to buy new PCs and new CPUs;
    When it is "just because" or planned obsolescence, this isn't so good.


    Not so much when a ~7-year-old CPU model is nearly as fast as its newer equivalents.

    Issue then isn't so much one of speed, so much as some newer software
    being like "not gonna run on that". Then, Win11 is also like, "Nope, not
    gonna run on that"...

    Theoretically, the CPU could work, but the MOBO lacks a TPM, and also
    the long standing "virtualization doesn't work for whatever reason"
    issue (can enable in BIOS, still doesn't work, ...).


    Then VirtualBox and HyperV and similar, "Nope".
    QEMU and DOSBox are still happy enough though...


    Apparently there are ways to force install Win11 without a TPM, but then Windows Update and similar apparently refuse to work.

    Win10 still good enough for now, what next?... Dunno.


    Still better than the whole Apple / iPhone thing, with the apparent
    practice of remotely throttling performance and then (ultimately)
    sending a kill-switch signal once the devices get old enough.


    Well, then with Android, it lasts until "Google Play" or similar stops
    working (well, or on an older device, "Android Market").

    Like, say, the usefulness of an Android 2.1 device being more limited by
    the non-functional "Android Market" than by the performance of the
    hardware. Meanwhile, a Windows Vista era laptop is at least still
    technically usable (well, can still technically use my XP era laptop as
    well, but at this point have to custom-build software for it via VS2008
    or Platform SDK v6.1; and in terms of performance it generally loses to
    a RasPi).


    Then again (from long ago), I have memories of messing with a 90s era
    laptop that basically failed at trying to run Quake 2 and Half-Life.
    IIRC, it was running Win 98, but was hard pressed to run much newer than
    Doom or similar on it.

    Decided to leave out some stuff, but digging around on the internet,
    looks like the closest match I can find to what I remember seems to be
    the ThinkPad 365E or 365X (had 3.5" floppy drive and parallel port, did
    not have CD-ROM or USB; had a display that did color but was kinda awful
    at it, ...).

    I think parents got rid of it, but I guess by that point it was kinda
    useless (and to get files onto it, one either needed to use floppies or
    copy them via HyperTerm and a Null-Modem cable).

    It was at least capable of launching Quake, but its performance was
    pretty much unusable. A lot of newer software at the time would just immediately crash (that time being roughly in the XP era).



    The XP era laptop is getting kinda unusable at this point, but I am half-wondering if an SDcard to laptop-PATA adapter could be an
    improvement (vs an otherwise annoyingly slow 20GB HDD; like if I could
    get a 64GB or 128GB SDcard to work, this would be a lot more space).

    But, probably not going to go much bigger than 128GB, as I seem to
    remember WinXP having a problem with drives over 128GB (presumably the
    28-bit LBA limit: 2^28 sectors * 512 bytes = 128GiB; anything larger
    needs 48-bit LBA support).

    If I do so, might almost make sense to try jumping from WinXP to a
    32-bit Linux distro (would just need to find something that can run on a laptop from 2003).

    ...


    But, I guess, granted, they would sell more CPUs if people bought new
    stuff more often (well, and bought the newest generation parts, rather
    than older/cheaper parts). But, then again, not like I have infinite
    money, so...




    Absent changing to a more sensible encoding scheme and limiting or
    removing condition-codes, x86-64 still has this major boat anchor. But,
    these can't be changed without breaking backwards compatibility (at
    least, assuming hardware that continues running x86-64 as the native
    hardware ISA).

    Condition codes were never "that hard" of a problem wither in
    pipelining nor in operand routing.


    It seems, they create a path where each ALU instruction may potentially
    depend on the prior ALU instruction, and where instructions like Jcc
    need these bits immediately following an ALU instruction, ...

    Could be better if, say:
    CC's didn't exist;
    CC's are *only* updated by instructions like CMP and similar.

    If no CC's, ALU instructions have no implicit dependency and could be evaluated in any order without a visible effect on state.


    For a past emulator, did note though that a lot of the CC logic could be skipped by noting cases where a following instruction would fully mask
    the CC updates from a prior instruction. This is possibly asking a bit
    much from hardware though...
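
    As a rough C sketch of the software version of this ("lazy flags", a
    trick common to x86 emulators; illustrative only, not code from the
    emulator mentioned above):

      #include <stdint.h>

      enum alu_op { OP_ADD32, OP_SUB32 /* , ... */ };

      /* ALU ops just record their operands and result; EFLAGS is only
         materialized if something reads it before the next full writer. */
      struct lazyflags {
          uint32_t a, b, r;   /* inputs and result of last flag-writing op */
          enum alu_op op;
      };

      static void emu_add32(struct lazyflags *lf, uint32_t a, uint32_t b,
                            uint32_t *dst)
      {
          *dst = a + b;
          lf->a = a; lf->b = b; lf->r = *dst; lf->op = OP_ADD32;
          /* No flag computation here: this op fully masks the prior op's
             flag updates, so that work is simply never done. */
      }

    Only the flag consumers (Jcc, ADC, PUSHF, etc.) then pay to derive the
    flags from (a, b, r, op).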


    While my use of a T bit could be argued to be "similar" to CC's, it is different:
    T bit may only be updated in certain contexts;
    I was able to get by with a 2 cycle latency between updating the T bit
    and any instructions which use the T bit;
    A similar sort of 2-cycle latency constraint for x86-64 rFLAGS would
    likely have an adverse effect on performance.



    Though, ironically, most "legacy x86" stuff could probably be served
    acceptably with emulators.

    Ever try to emulate A24? Address bit 24--when we looked at it, it took
    more gates to remove it and put a bit in CPUID so applications could "do
    the right thing" than to simply leave the functionality there.

    My past x86 emulator attempts were limited mostly to 32-bit user-mode
    stuff, so no A20 or A24 wonk or similar (was at the time mostly trying
    to get simple 32-bit Windows programs working).

    If I were to try to emulate a full machine, would likely switch out
    the memory load/store handling logic (as function pointers) based on the
    value of relevant architectural registers (such as whether paging is
    enabled or disabled).
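
    Say, roughly (a minimal C sketch of the idea; translate() and the
    sizes here are stand-ins, not from any actual emulator):

      #include <stdint.h>
      #include <string.h>

      static uint8_t guest_ram[1 << 20];            /* toy guest memory */
      extern uint32_t translate(uint32_t vaddr);    /* hypothetical PT walk */

      typedef uint32_t (*ld32_fn)(uint32_t addr);

      static uint32_t ld32_phys(uint32_t addr)      /* paging disabled */
      {
          uint32_t v;
          memcpy(&v, &guest_ram[addr & ((1u << 20) - 1)], 4);
          return v;
      }

      static uint32_t ld32_paged(uint32_t addr)     /* paging enabled */
      {
          return ld32_phys(translate(addr));
      }

      static ld32_fn guest_ld32 = ld32_phys;

      static void set_cr0(uint32_t val)   /* swap handlers on CR0.PG writes */
      {
          guest_ld32 = (val & (1u << 31)) ? ld32_paged : ld32_phys;
      }

    Stores and the other access widths would get the same treatment, so the
    hot load/store path never has to test the MMU state.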


    Most recent efforts to write an x86 emulator have fizzled relatively
    quickly though; mostly for concerns that I wouldn't get enough
    performance to make it worthwhile (and emulating x86 on my x86 PC
    wouldn't be terribly useful, and on RasPi there is also QEMU and DOSBox,
    even if the performance sucks).



    If it can't maintain a performance advantage (say, if ARM and RISC-V
    catch up or exceed the performance possible on higher end x86 chips), it
    is effectively done.

    x86 performance advantage has ALWAYS been in the cubic amounts of cash
    flow running through the FAB to pay the engineering team budgets.

    Recent years have mostly been model numbers advancing faster than any
    single threaded performance improvements...

    And ARM is catching up.

    Seemingly, the RISC-V chips are a bit further behind, but seem to be
    advancing up the ladder rather quickly.



    Or, "How about bigger AVX?", goes back and forth, AND apparently
    supporting AVX512 via the cheaper mechanism of doing the operations as multiple parts.

    Where, seemingly SIMD going too much wider than 128 bits actually makes
    stuff worse...


    Pretty much my entire adult life, there hasn't been much obvious gain
    from SIMD going wider than 128 bits, so I am inclined to posit that 128
    bits is probably near optimal.

    And, the advantage of SIMD lies more with subdividing the registers into
    N elements (without increasing pipeline or register width), rather than
    trying to gain more elements by pushing registers to bigger sizes.

    Personally, I also have a lot more use cases for 4-wide vectors of
    16-bit elements than I do for 256 bit vectors.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 00:53:09 2024
    From Newsgroup: comp.arch

    On 10/21/2024 7:21 PM, MitchAlsup1 wrote:
    On Mon, 21 Oct 2024 22:02:27 +0000, BGB wrote:

    On 10/17/2024 4:34 PM, EricP wrote:

    Say, if one could make the CPU itself have 35% more perf/W by jumping to
    a different encoding scheme, this could easily offset if they needed to
    pay a 20% cost by JIT compiling everything when running legacy
    software...

    This only works when the native ISA has a direct path to emulating
    the address modes of x86-64 which includes [Rbase+Rindex<<scale+DISP]


    This is a bigger problem for unmodified RISC-V as I see it.

    Though, in theory, RISC-V with Zba can do:
      SH2ADD Xd, Xi, Xb    # Xd = Xb + (Xi << 2)
      LW     Xd, Disp(Xd)  # load from Xb + Xi*4 + Disp

    Which arguably isn't too bad.

    I think they underestimate the costs of needing a 2-op sequence vs, say:
      LW Xd, (Xb, Xi)      # hypothetical register-indexed load, one op


    But, yeah, if trying to design an x86 stand-in, might make sense to
    prioritize trying to be close to 1:1 for core x86 ops, or use fewer ops
    in cases where multiple x86 ops can be merged (say, 3R vs 2R, ...).

    This does probably still mean a variable length encoding, but, say:
    32/64/96, or 16/32/64/96, or similar.

    Not 1-15 bytes with a fairly ad-hoc set of encoding rules.
    Also the official RISC-V strategy for larger encodings sucks...


    IMO, jumbo prefixes or similar are preferable, as the extension scheme
    is more straightforward (and possible encodings can drop out of a
    predefined set of rules; rather than needing people to go and define
    each possible extended encoding individually, with no real consistent
    layout guidelines for how the extended instruction spaces are to be
    structured, ...).
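
    As a C sketch of the general mechanism (illustrative of the concept,
    not any ISA's actual bit layout; the field widths are made up):

      #include <stdint.h>

      /* A jumbo prefix just contributes high bits that are concatenated
         onto the base instruction's immediate field. */
      static int64_t decode_imm(uint32_t base_imm10, int has_jumbo,
                                uint32_t jumbo_bits23)
      {
          if (!has_jumbo)   /* sign-extend the 10-bit base immediate */
              return (int32_t)(base_imm10 << 22) >> 22;
          /* 23 prefix bits over 10 base bits: a 33-bit immediate */
          uint64_t v = ((uint64_t)jumbo_bits23 << 10) | base_imm10;
          return (int64_t)(v << 31) >> 31;
      }

    The same rule applying uniformly to every instruction with an immediate
    is what makes the extension scheme predictable.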



    It is also a hopelessly frail path to self destruction:: Transmeta.


    I am imagining not exactly taking the Transmeta path, but possibly more
    like a Rosetta path, with a Transmeta-like fallback if trying to boot an
    old OS (if the OS lacks the native ISA, so is booted in full emulation).

    But, yeah, likely would be better to have the OS be able to run the
    native ISA.


    Granted, this is predicated on the assumption that one could get such a
    jump by jumping to a different encoding scheme.

    It is not the encoding scheme that is kaput, it is the semantics
    such a scheme provides the programmer via ISA.
    --------------------------------

    Possibly.

    My ideas thus far end up looking sort of like:
    Core ISA similar to BJX2 or RISC-V semantics, but with ability to
    express large immediate values and similar (a weak area for normal
    RISC-V). Should be able to efficiently express x86 address modes, ...

    Should likely also have helpers for things like rFLAGS twiddling, ...


    The major selling point of x86 has been its backwards compatibility, but
    this advantage may be weakening with the rise of the ability to emulate
    stuff at near native performance. If Windows could jump ship and provide
    an experience that "doesn't suck" (fast/reliable/transparent emulation
    of existing software), the main advantages of the x86-64 legacy may go
    away (and is already mostly moot in Linux since the distros typically
    recompile everything from source, with little real/significant ties to
    the x86 legacy).

    W11 has done enough to my day-to-day operations I am willing to
    jump ship to Linux in order to avoid daily updates and the myriad
    of technical issues that never seem to get solved in a way that
    makes them "go away" forever. So, for me it is not that it will
    be an x86 (or ARM, or ...), it is that it is not MS oriented.


    I am still using Win10 on my PC, but my parents have a Win11 PC, and
    what little I have encountered of it (and heard about it) doesn't really
    make me want to use it.

    Well, and Win11 doesn't see my PC as being compatible.

    But, this leaves stuff in a limbo state for now.

    ...

    Most of the software I run is open source, so theoretically a jump is possible.


    Usually, I would skip over Windows versions that kinda sucked, say, I
    stuck with XP-X64 until Win7, then went from Win7 to Win10.

    Looks like MS is trying to push people into using Win11 though (while at
    the same time trying to push SecureBoot and TPMs, ...).

    ...



    Ironically, I would almost consider getting one of the manycore ARM
    based systems and running Linux on it, except they are still expensive.
    And, on the other side, a RasPi or other similar SBC is not a viable
    replacement.

    Would want something where I could have a proper GPU and plug in 6 or 8
    SATA devices, etc.

    So, my wishlist would include among other things:
    ATX family form-factor;
    PC-like or PC-superior performance;
    Can have, say, 128GB of RAM and 8+ SATA devices;
    ...

    But, at least at the moment, x86-64 still owns this space...


    But, may change in the future...


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 01:16:34 2024
    From Newsgroup: comp.arch

    On 10/21/2024 8:16 PM, John Levine wrote:
    According to MitchAlsup1 <mitchalsup@aol.com>:
    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Intel's never going to catch up in the phone market but they're still significant in the server and cloud market.

    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.


    As I understand it, as-is x86S would mostly affect OS level code,
    leaving the userland basically intact (still able to run existing software).

    If your goal is "Run an OS like Win 11, Business as usual", this makes
    sense. Modern Windows doesn't really use the stuff that x86S removes.


    Longer term, this may be insufficient. If Moore's Law grinds entirely to
    a halt, running x86-64 in hardware may not be a win.


    Then again, I guess both Itanium and Transmeta can be taken as examples
    of ways to shoot oneself in the foot as well.

    In this case, the Apple situation makes more sense. They have jumped
    MacOS from x86 to ARM, without losing all of their existing software
    base, by running a userland emulator that "doesn't suck".


    Granted, can't necessarily trust MS here, as much of the time MS has
    done stuff like using emulation strategies that are awkward and suck.
    Like, say, running Windows inside an emulator, in Windows, and just sort
    of crudely gluing the desktops together between programs in the native
    and VM Windows instance (without giving programs in the VM transparent
    access to the host OS's filesystem, ...).

    So, it is more a thing of "What if the emulation layer, doesn't
    suck?...". One where there are no obvious seams, the VM program
    instances looking and behaving just like native ones, able to see all
    the same files, ...



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Tue Oct 22 08:41:40 2024
    From Newsgroup: comp.arch

    In article <vf7g04$1c5qd$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:

    In this case, the Apple situation makes more sense. They have
    jumped MacOS from x86 to ARM, without losing all of their existing
    software base, by running a userland emulator that "doesn't suck".

    Granted, can't necessarily trust MS here, as much of the time MS
    has done stuff like using emulation strategies that are awkward and
    suck. Like, say, running Windows inside an emulator, in Windows,
    and just sort of crudely gluing the desktops together between
    programs in the native and VM Windows instance (without giving
    programs in the VM transparent access to the host OS's filesystem,
    ...).

    The x86-32 and x86-64 emulation in ARM Windows 11 is pretty good. Native
    and emulated programs run on the same desktop with no visible seams.

    They don't have the "emulated x86 is faster than native x86" effect that
    Apple users report, but Apple stopped updating their x86 machines in
    2018, creating an additional performance gap.

    John
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Tue Oct 22 15:26:20 2024
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit
    architecture but can still have a 32 bit user mode that can run existing
    32 bit application binaries. So how about if the next generation is
    native x86S, but can also run existing 64 bit binaries, even if not as
    fast as native x86S. They get the usual cloud operating systems ported
    to x86S while leaving a path for people to migrate their existing
    applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities. Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU? Will they
    sell them for less? I doubt it.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue Oct 22 17:38:01 2024
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    Most of the proposed changes are uninteresting to user mode developers.

    They're definitely interesting to system software (UEFI, Hypervisor,
    Kernel folks), if only to clean up the boot and startup paths.

    Those changes also will reduce the RTL verification load, and
    perhaps simplify other areas of the implementation leading to further efficiencies down the road. The A20 gate should be relegated
    to the trash heap of history.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Tue Oct 22 18:39:40 2024
    From Newsgroup: comp.arch

    In article <2024Oct22.172620@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    x86S eliminates a part of the compatibility path from systems
    of yesteryear, but not that many people use these parts nowadays
    anyway. It's unclear to me what benefits these changes are
    supposed to buy

    I don't know how much circuitry and firmware in motherboard chipsets is required to support the old compatibility paths, but the manufacturers
    would doubtless like to save costs there. This might also make the
    machines more "secure" in that special sense used with DRM.

    Microsoft would probably like machines where media playing was harder to intercept, because that would earn them more trust from the media conglomerates.

    John
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 13:43:40 2024
    From Newsgroup: comp.arch

    On 10/22/2024 10:26 AM, Anton Ertl wrote:
    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities. Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU? Will they
    sell them for less? I doubt it.


    Yeah, basically my thoughts as well.
    Business as usual...

    Main effect it achieves is breaking legacy boot; it doesn't seem like it
    would either save all that much or "solve" x86's longstanding issues.



    And, proposing an "x86-64 but re-imagined as a RISC-style ISA" mode
    could have made sense...


    Say:
    Instructions have a simpler and more consistent encoding;
      Probably still VLE, but less free-form.
    Maybe expand to 32 or 64 registers;
    3 register instructions;
    Maybe split ALU ops into CC-update and No-CC-update variants;
      Sorta like ARM.
      Most other ops become No-CC-Update only.
    This mode drops things like x87 and MMX and similar;
      Only SSE/AVX.
    Would make sense to keep the existing addressing modes (*1);
    Maybe keep LoadOP and OPStore, but limited in scope.
      Ironically, something like RISC-V AMO requires such a mechanism.
      If one implements something like AMO, may as well have LoadOP.
    ...

    Maybe, as for AVX512:
    They can either make it usable "in general", or deprecate it.


    *1: Probably, say (if I were designing the encoding):
    [Rb+Disp10s] //32-bit encoding
    [Rb+Ri*FixSc] //32-bit encoding
    [Rb+Ri*Sc] //64-bit encoding
    [Rb+Disp33s] //64-bit encoding
    [Rb+Ri*Sc+Disp11s] //64-bit encoding
    [Rb+Ri*Sc+Disp33s] //96-bit encoding

    Some tweaks are possible, the above is mostly "if encoded in a similar
    way to XG2 or my RV+Jumbo-Prefix thing".

    Though, in my case, the issue with the latter address modes had been
    less: "How to encode" so much as "Do they buy enough to justify the
    added cost of a 3-way adder in the AGU..."

    But, for an "x86 successor", might be difficult to justify limiting
    things in a way that would require increasing the instruction count.

    LoadOP would likely be allowed for basic ALU ops, but N/A for SSE/AVX (assembler could fake these though by breaking them into multiple ops).


    Might make sense to offload a lot of the SSE/AVX stuff to 64-bit
    encodings, and essentially merge x86 into AVX (makes little sense to
    have separate encodings that do the same thing on the same registers).

    One other likely goal would be to make it mostly backwards compatible
    with existing x86-64 ASM code, which would likely simplify getting a
    compiler for it.

    In many cases, a JIT could potentially be pretty close to a 1:1
    decode/re-encode process (though, a bit more if trying to emulate
    legacy modes).

    For translation or assembly, No-CC-Update forms could be inferred from
    default forms by looking forwards in the instruction stream (if a
    following instruction entirely masks any flags updates, can use the non-updating form instead).
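
    Say, as a small C sketch of that forward scan (illustrative; the
    struct and field names are invented):

      struct insn {
          unsigned reads_flags : 1;       /* e.g. Jcc, ADC, CMOVcc */
          unsigned writes_all_flags : 1;  /* e.g. ADD, SUB, CMP */
          unsigned ends_block : 1;        /* branch or branch target */
      };

      /* Can instruction i use the No-CC-Update form? Only if its flag
         results are overwritten before anything can read them. */
      static int flags_are_dead(const struct insn *code, int i, int n)
      {
          for (int j = i + 1; j < n; j++) {
              if (code[j].reads_flags)      return 0;  /* keep CC form */
              if (code[j].writes_all_flags) return 1;  /* fully masked */
              if (code[j].ends_block)       return 0;  /* be conservative */
          }
          return 0;  /* ran off the block: be conservative */
      }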


    Emulating x87 for legacy code could be harder though. In the common
    case, the x87 stack could be resolved statically, but there is a subset
    of cases where a non-static mapping could result. A JIT would likely map
    x87 onto XMM registers, in any case.
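
    The static-resolution part might look roughly like this in a JIT (a
    sketch; the register assignment and names are made up):

      /* Track the x87 TOP at translate time; ST(i) then maps to a fixed
         XMM register, here XMM8..XMM15. FLD decrements TOP and FSTP's pop
         increments it (mod 8), matching the x87 stack convention. */
      static int st_top;  /* compile-time top-of-stack, 0..7 */

      static int xmm_for_st(int i) { return 8 + ((st_top + i) & 7); }

      static void note_push(void) { st_top = (st_top - 1) & 7; }
      static void note_pop(void)  { st_top = (st_top + 1) & 7; }

    When TOP stops being statically known (say, FINCSTP in a loop, or
    control-flow joins with different stack depths), the JIT has to bail to
    a generic path; that is the non-static subset mentioned above.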


    Probable register spaces:
    R0 / RAX
    R1 / RCX
    R2 / RDX
    R3 / RBX
    R4 / RSP
    R5 / RBP
    R6 / RSI
    R7 / RDI
    R8..R15: Same as x86-64
    R16..R31: Extended, otherwise similar.
    XMM0..XMM15: Same
    XMM16..XMM31: Expanded

    Would drop x87, MMX and x87/MMX registers.


    Unlike RISC-V, I would assume keeping the base immediate values smaller
    (9 or 10 bits) and using VLE for larger immediate values.

    Doing 12-bit immediate values as the default essentially eats a lot of encoding space for relatively little gain.
    Would have 17-bit constant-load and ADD as special cases.


    My usual rationale for prioritizing 17 and 33 bit constants in some
    cases, over 16/32, or various other sizes, is that these sizes have
    "unusually good" hit rates IME. A 17-bit signed immediate spans
    -65536..65535, so it covers both the "signed short" and "unsigned short"
    ranges; covering both gives a significantly better hit rate than
    covering just "signed short", while adding a few additional bits gains
    relatively little (likewise for 33).

    Pattern isn't so strong for 9s though, where both 9u and 10s are
    stronger than 9s (but 11 and 12 bits seem to see a rapid drop-off; at
    which point it is likely better to jump to a bigger encoding).

    ...



    - anton

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 21:13:41 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 18:43:40 +0000, BGB wrote:

    On 10/22/2024 10:26 AM, Anton Ertl wrote:

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities. Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU? Will they
    sell them for less? I doubt it.


    Yeah, basically my thoughts as well.
    Business as usual...

    Main effect it achieves is breaking legacy boot; it doesn't seem like it
    would either save all that much or "solve" x86's longstanding issues.

    Intel needs a better way to exit reset--and that means the MMU/TLBs
    are already up and working at the time reset is exited. This cannot
    be made backwards compatible.
    -------------------------------

    *1: Probably, say (if I were designing the encoding):
    [Rb+Disp10s] //32-bit encoding
    [Rb+Ri*FixSc] //32-bit encoding
    [Rb+Ri*Sc] //64-bit encoding
    [Rb+Disp33s] //64-bit encoding
    [Rb+Ri*Sc+Disp11s] //64-bit encoding
    [Rb+Ri*Sc+Disp33s] //96-bit encoding

    [Rb+DISP16] // 32-bit 16 > 10
    [Rb+Ri<<sc] // 32-bit
    [Rb+Ri<<sc+DISP32] // 64-bit 32 > 11
    [Rb+Ri<<sc+DISP64] // 96-bit 64 > 33
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Tue Oct 22 16:18:50 2024
    From Newsgroup: comp.arch

    On 10/22/2024 12:38 PM, Scott Lurndal wrote:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    John Levine <johnl@taugh.com> writes:
    Think about the way that current Intel chips have a native 64 bit architecture
    but can still have a 32 bit user mode that can run existing 32 bit application
    binaries. So how about if the next generation is native x86S, but can also run
    existing 64 bit binaries, even if not as fast as native x86S. They get the usual
    cloud operating systems ported to x86S while leaving a path for people to migrate
    their existing applications gradually.

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support. x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway. It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path). It seems to me that most of the complexity of
    current CPUs would still be there.

    Most of the proposed changes are uninteresting to user mode developers.


    Yes, on current platforms, they are unlikely to notice.


    Would make the chip essentially "useless" for a retro system, but most
    of these guys are either using vintage parts, or emulation.

    Like, little point in trying to run Win98 on a newest-generation
    platform (and, apparently, getting Win98 working natively on anything
    much newer than the mid 2000s is a pain, as even if the CPU is backwards
    compatible, most of the other hardware is not).


    Then, there is the pros/cons option of "Well, run QEMU or similar..."
    Decided to leave out going into my thoughts on the QEMU experience
    (technically works OK, but in some areas is a little lacking).


    Though, there is a possible merit to trying to have a userland-only
    emulator. Unlike with a full system/OS level emulator, all of the wonk
    and issues with emulating hardware interfaces and drivers, and with
    filesystem integration, can largely go away (one can essentially trap
    out of the emulation at the system-call level).
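
    The top-level dispatch for that sort of emulator can be tiny; say, as
    a C sketch (the names here are placeholders, not from any project):

      #include <stdint.h>

      struct cpu;                                   /* guest CPU state */
      extern uint32_t fetch_decode(struct cpu *);   /* hypothetical */
      extern void execute(struct cpu *, uint32_t);
      extern void host_syscall_shim(struct cpu *);  /* KERNEL32/NTDLL mockups */

      enum { OP_SYSCALL_GATE = 0xFFFF };            /* placeholder tag */

      void run(struct cpu *cpu)
      {
          for (;;) {
              uint32_t op = fetch_decode(cpu);
              if (op == OP_SYSCALL_GATE)
                  host_syscall_shim(cpu);   /* trap out at the API boundary */
              else
                  execute(cpu, op);
          }
      }

    Everything below the system-call line (devices, filesystem, drivers,
    ...) is then the host's problem rather than the emulator's.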

    Though, it is possibly more effort than it may seem to try to write
    usable mockups for KERNEL32.DLL and USER32.DLL and similar (and one
    can't directly copy-paste these parts from Wine; didn't get very far). I
    think I got annoyed and gave up trying to debug it at the time.

    Though, ironically, some code from this past project was copy-pasted
    into what later became TestKern.


    IIRC, the goal at the time was to make something like Wine that ran on a RasPi. Since then, Wine itself has gained this capability (by internally offloading the emulation parts to QEMU). Not looked too much into how
    this setup works.

    But, in any case, at least in theory, the need to stick with x86 as the
    native ISA for sake of backwards compatibility seems to be weakening.



    Had on/off considered trying to revive the idea in a different form, but
    had mostly stalled out (if the host is 50MHz, running x86 via an
    interpreter is going to be too slow to be worthwhile).

    It seemed likely more practical to try to get RV64G Linux-ELF binaries
    working than to try to get Win32 binaries working. Though, this had also
    stalled, as now there is the issue of trying to figure out why
    "ld-linux.so" and similar keep exploding (thus far, all the "actually
    usable" RV64 builds have been using my own C library).

    ...



    They're definitely interesting to system software (UEFI, Hypervisor,
    Kernel folks), if only to clean up the boot and startup paths.

    Those changes also will reduce the RTL verification load, and
    perhaps simplify other areas of the implementation leading to further efficiencies down the road. The A20 gate should be relegated
    to the trash heap of history.

    Possibly true, but presumably RTL verification of legacy features is not
    a continuously recurring cost...



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch on Tue Oct 22 17:59:46 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 00:03:43 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Only because the average cell phone gets broken or flooded within a
    year. If people were not so careless, I doubt most would be replaced
    so often.

    My current phone is over 4 years old and it continues to serve all of
    my needs. Sans damage, the only reason I would choose to replace it
    would be when critical apps no longer support the OS version.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 22:17:28 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 21:59:46 +0000, George Neuner wrote:

    On Tue, 22 Oct 2024 00:03:43 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Only because the average cell phone gets broken or flooded within a
    year. If people were not so careless, I doubt most would be replaced
    so often.

    My current phone is over 4 years old and it continues to serve all of
    my needs. Sans damage, the only reason I would choose to replace it
    would be when critical apps no longer support the OS version.

    My first cell phone (Galaxy 3) I got in 2012 and used it until 2022
    when the service provider offered a zero cost upgrade because they
    were losing access to the 4G-LTE antennae. I did put in 2 new
    batteries, and nothing was scratched or dented after 11 years of use.

    I still liked it better than the Galaxy 12 I have now. ...

    Oh and BTW:: I do not carry my cell phone unless I am traveling
    or expecting a call. It lives in my office--probably why it is
    not being damaged by being sat upon or dropped into water, and
    other causes of cell phone death.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Tue Oct 22 23:46:50 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 01:16:34 -0500, BGB wrote:

    In this case, the Apple situation makes more sense. They have jumped
    MacOS from x86 to ARM, without loosing all of their existing software
    base, by running a userland emulator that "doesn't suck".

    Seems like the Apple platform has less need for third-party addons that intrude into the kernel, simply because it has a smaller choice of apps anyway.

    For example, anticheat mechanisms for online games. Fortnite is one I
    have seen mentioned, that cannot work with the x86 emulation offered by
    Windows-on-ARM. Presumably this is not a problem on the Mac because
    Fortnite is simply unavailable on the Mac.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Tue Oct 22 23:48:24 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 18:38 +0100 (BST), John Dallman wrote:

    Microsoft would probably like machines where media playing was harder to intercept, because that would earn them more trust from the media conglomerates.

    One of the innovations in Windows Vista was the addition of the
    “Protected Media Path”, which was supposed to solve exactly this
    problem. Didn’t it?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Tue Oct 22 23:52:23 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 16:18:50 -0500, BGB wrote:

    Like, little point in trying to run Win98 on a newest-generation
    platform (and, apparently, getting Win98 working natively on anything
    much newer than the mid 2000s is pain ...

    Funny, I did exactly that for a friend a couple of years ago. The Windows
    98 image ran under PCem <https://pcem-emulator.co.uk/>, on a Linux Mint installation on an MSI Cubi 5.

    I set up a “captive” user under Mint that, the moment you logged in, started the emulator running Windows. Shut down Windows, and it logged you
    out again.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch on Thu Oct 24 15:59:48 2024
    From Newsgroup: comp.arch

    On Tue, 22 Oct 2024 22:17:28 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    On Tue, 22 Oct 2024 21:59:46 +0000, George Neuner wrote:

    On Tue, 22 Oct 2024 00:03:43 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    x86's long term survival depends on things out of AMD's and Intel's
    hands. It depends on high volume access to devices people will buy
    new every year or every other year. A PC is not such a thing, while
    a cell phone seems to be.

    Only because the average cell phone gets broken or flooded within a
    year. If people were not so careless, I doubt most would be replaced
    so often.

    My current phone is over 4 years old and it continues to serve all of
    my needs. Sans damage, the only reason I would choose to replace it
    would be when critical apps no longer support the OS version.

    My first cell phone (Galaxy 3) I got in 2012 and used it until 2022
    when the service provider offered a zero cost upgrade because they
    were losing access to the 4G-LTE antennae. I did put in 2 new
    batteries, and nothing was scratched or dented after 11 years of use.

    I still liked it better than the Galaxy 12 I have now. ...

    I used an LG flip phone from 2008..2020. Prior to that I had a Nokia
    "stick" from 1995. Before that I had a Motorola flip phone from early
    80's that was on my parents' plan.

    The only reasons I have ever upgraded were because carriers changed
    service requirements: 2G->3G, 3G->4G. I have never had to replace a
    phone because it was damaged.

    Current phone is still 4G LTE. Its OS is slated to sunset soon, but I
    expect to keep using it until developers drop support and the apps I
    need will no longer update.


    Oh and BTW:: I do not carry my cell phone unless I am traveling
    or expecting a call. It lives in my office--probably why it is
    not being damaged by being sat upon or dropped into water, and
    other causes of cell phone death.

    I /do/ carry my phone - always in my left front pocket. I won't answer
    if I'm busy (never while in the bathroom or while driving) ... if the
    caller won't leave a message, it's obvious that the call was not
    important.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From BGB@cr88192@gmail.com to comp.arch on Thu Oct 24 23:31:35 2024
    From Newsgroup: comp.arch

    On 10/22/2024 4:13 PM, MitchAlsup1 wrote:
    On Tue, 22 Oct 2024 18:43:40 +0000, BGB wrote:

    On 10/22/2024 10:26 AM, Anton Ertl wrote:

    Several things in this paragraph make no sense.

    In particular, x86S is a proposal for a reduced version of the stuff
    that current Intel and AMD CPUs support: There is full 64-bit support,
    and 32-bit user-level support.  x86S eliminates a part of the
    compatibility path from systems of yesteryear, but not that many
    people use these parts nowadays anyway.  It's unclear to me what
    benefits these changes are supposed to buy (unlike the elimination of
    A32/T32 from some ARM chips, which obviously eliminates the whole
    A32/T32 decoding path).  It seems to me that most of the complexity of
    current CPUs would still be there.

    And I certainly prefer a CPU that has more capabilities to one that
    has less capabilities.  Sometimes I want to run old binaries.

    So what would be my incentive as a user to buy an x86S CPU?  Will they
    sell them for less?  I doubt it.


    Yeah, basically my thoughts as well.
       Business as usual...

    Main effect it achieves is breaking legacy boot; it doesn't seem like it
    would either save all that much or "solve" x86's longstanding issues.

    Intel needs a better way to exit reset--and that means the MMU/TLBs
    are already up and working at the time reset is exited. This cannot
    be made backwards compatible.
    -------------------------------

    I am not sure how this would have much effect on cost either way.
    A physical address mode could just be some edge-case logic in the MMU
    (say, whenever there is a TLB miss with the MMU disabled, it merely
    loads an identity-mapped address into the TLB).
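
    In C-model terms, that edge case might amount to something like this
    (a sketch of the idea, not actual RTL; names are placeholders):

      #include <stdint.h>

      extern int  mmu_enabled;
      extern void tlb_insert(uint64_t va, uint64_t pa, int perm);
      extern void raise_tlb_miss_fault(uint64_t va);
      enum { PERM_RWX = 7 };    /* placeholder permission encoding */

      static void tlb_miss(uint64_t vaddr)
      {
          if (!mmu_enabled) {
              /* physical-address mode: install an identity mapping */
              tlb_insert(vaddr, vaddr, PERM_RWX);
              return;
          }
          raise_tlb_miss_fault(vaddr);  /* normal software-filled TLB path */
      }

    The rest of the memory pipeline then never needs a separate "MMU off"
    path.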



    *1: Probably, say (if I were designing the encoding):
       [Rb+Disp10s]        //32-bit encoding
       [Rb+Ri*FixSc]       //32-bit encoding
       [Rb+Ri*Sc]          //64-bit encoding
       [Rb+Disp33s]        //64-bit encoding
       [Rb+Ri*Sc+Disp11s]  //64-bit encoding
       [Rb+Ri*Sc+Disp33s]  //96-bit encoding

        [Rb+DISP16]         // 32-bit   16 > 10
        [Rb+Ri<<sc]         // 32-bit
        [Rb+Ri<<sc+DISP32]  // 64-bit   32 > 11
        [Rb+Ri<<sc+DISP64]  // 96-bit   64 > 33


    One doesn't want to burn too much encoding space...

    If the goal is to redesign x86 as a RISC-like ISA, one is likely going
    to need a lot of space for opcode bits.

    This is partly why I was thinking 32 registers rather than 64, along
    with the smaller immediate fields.

    Say, one possible encoding scheme would be to use a similar base format
    to RISC-V:
    ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYYY1 //32-bit op
    ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYYY0 //64/96-bit op

    Then, say:
    1/2 the 32-bit encoding space is 3R ops;
    1/4 the 32-bit encoding space is 3RI ops;
    Remaining 1/4 for Imm16 and JMP/JCC and similar.

    Say, could burn a 24/25-bit chunk of encoding space on JMP/CALL/JCC
    iiiiiii-iiiii-iiiii-iii-Zcccc-YY-YYYY1
    Where:
    cccc is like x86 Jcc condition code,
    but maybe reuse P and NP for JMP and CALL.

    Though, might make sense to do CALL/RET using a link-register rather
    than the stack, even if x86 traditionally used the stack.



    For 64-bit:
    LD/ST/OPLD/OPST: [Rb+Disp10] expands to [Rb+Disp33s]
    LD/ST/OPLD/OPST: [Rb+Ri*Sc] expands to [Rb+Ri*Sc+Disp11s] or Disp17s.
    Remaining bits go to opcode.

    Say:
    ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc]
    And:
    iiiiiii-iiiii-iiiii-xxx-xxxxx-xx-xxxx0 -
    ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc+Disp17s]
    And:
    iiiiiii-iiiii-iiiii-iii-iiiii-ii-iiii0 -
    kkkkkkk-kkkkk-kkkkk-xxx-xxxxx-ii-xxxx0 -
    ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc+Disp33s]


    Could maybe use some of the extra bits encoding things like:
    ADD.Q [Rb+Ri*Sc+Disp33s], Imm17s.
    Or:
    ADD.Q [Rb+Ri*Sc+Disp17s], Imm33s.
    Say, by having a Rn/Imm bit, and a bit to specify which immediate is
    used as the constant and the other as the displacement.


    But, with Disp10 base-forms, might expand to Disp33:
    iiiiiii-iiiii-iiiii-xxx-iiiii-xx-xxxx0 -
    iiiiiZZ-iiiii-mmmmm-dZi-nnnnn-YY-YYYY1 //MEM [Rm+Rt*Sc+Disp17s]

    Where the 'd' flag could select between, say:
    "ADD Rn, [Rm+Disp]" or "ADD [Rm+Disp], Rn"

    32-bit encodings only allowing a register, whereas 64-bit encodings
    could allow an immediate.


    But, not really sure...





    In other news, went and wrote up a spec and threw together Verilog code
    for a reworked BSR4K/XG3 ISA design:
    https://pastebin.com/yfrh50bk

    There are still some holes (the spec is missing pretty much all the 2R
    ops for now), but alas. A few parts I have decided would not necessarily
    be carried over, as some newer instructions and the addition of a Zero
    Register made a number of the former 2R and 2RI instructions no longer
    necessary (though, some could still be useful for efficiency; or have
    other useful roles like format conversion).


    To make implementation cheaper/easier for me, it is essentially XG2RV
    with the bits shuffled around, a few inverted, and some special case
    changes (changes branch mechanics and some edge cases involving decoding immediate values).

    Initially I tried putting the repacking logic at the front end of the ID stage, but (unsurprisingly), synthesis and timing wasn't too happy about this...

    Ended up instead putting the repack logic at the end of the IF stage.


    There was another possible idea that I could call BSR4J:
    Would have done a simpler repacking scheme:
    First 16 bits are repacked:
    NMOP-YwYY-nnnn-mmmm => NMOY-mmmm-nnnn-YYPw
    High 16 bits copied unmodified.

    So, overall instruction format, seen as 32-bits, could have been:
    ZZZZ-qnmo-oooo-XXXX-NMOY-mmmm-nnnn-YYPw


    But, it was admittedly more tempting, if I am going to be repacking
    anyways, to make an attempt to "un-dog-chew" the instruction format (in
    an attempt to make it look nicer).

    It is not fully settled yet; could jump over to the BSR4J strategy
    instead if the more aggressive repacking scheme is in fact a bad idea.
    One arguable merit it does have is that all of the original 4-bit fields
    remain 4-bit aligned (and converting between XG2 and BSR4J would be
    significantly less bit-twiddling vs BSR4K; while still achieving the
    goal of being able to fit it into the same encoding space as RISC-V).



    I have yet to decide on some specifics for the mapping of 2R
    instructions:
    Simpler/cheaper: Use the same repacking as 3R ops for 2R ops;
    Possible: Modify packing rules such that the 3rd part of the opcode
    field is also 4-bit aligned.

    Say:
    As-is : XXXX-kkWWWW-mmmmmm-ZZZZ-nnnnnn-QY-YYPw
    Possible: XXXX-WWWWkk-mmmmmm-ZZZZ-nnnnnn-QY-YYPw
    Would be slightly more logic complexity, but could make it easier to
    visually decode 2R instructions in a hexdump (but, likely not be worth
    the additional cost).


    Expressing it as bits though makes it more obvious that I actually have
    less total encoding space than RISC-V, as the 6-bit register fields take
    their cut.

    Say:
    ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYY11 (RV)
    ZZZZ-oooooo-mmmmmm-ZZZZ-nnnnnn-QY-YYPw (XG3)

    Each RV Y block has 10 bits of opcode.
    Whereas, each XG3 Y block has 9 bits.

    XG3 currently has 3 Y blocks reserved for 3R:
    0/3/5 (~ 10.585 bits)
    RV had 2 blocks for the core ISA (11 bits).

    Though, the B extension squanders a few big chunks of it by defining
    some 4R instructions (such as Funnel Shift).


    Contrast, BJX2 doesn't define any 32-bit 4R instructions.

    Also, B extension further weakens the case for not having a register
    indexed addressing mode: Any core implementing the B extension's FSR instruction is going to need a 3R capable register port on the GPRs ...



    Ironically, it seems that just the 'V' and 'P' extensions both end up
    eating more opcode space than the total 3R opcode space in BJX2...



    XG3 effectively ends up spending 1/4 of the total encoding space on
    Jumbo prefixes.

    Where
    Baseline: FE/FF, 25 bits total
    Imm64 only possible via an Imm16 base op (24+24+16=64).
    XG2: xxx1-111x, 28 bits total
    XG3: x1-1zyy, ~ 29.585

    Though, in XG2 the 28-bit jumbo prefixes do allow 27+27+10=64, for 3RI
    Imm64 ops (with the remaining bit for immediate-extension vs more
    general instruction extension).


    Reason it expands in XG3 is that the Jumbo prefixes effectively also eat
    the former PrWEX spaces (XG3 loses WEX and PrWEX, would need to use superscalar instead).

    The would-be PrWEX spaces could maybe be used for something else, but
    unclear what at the moment (likewise the FA/FB blocks are unused in XG2
    and effectively N/A in XG2RV; as the role they served in Baseline has
    effectively "fallen out of the scope of the ISA"; and being N/E in XG3).

    I guess potentially a case could be made to potentially reclaim these
    blocks in XG2 as a range of "Non-predicated Scalar-Only" instructions.

    Could almost relocate branches here (and work towards reclaiming the
    space used by branches in the F0 block, to eventually be reassigned to
    3R space, roughly worth 64 3R ops).

    Say:
    CCC1-101Z: FA/FB Imm25
    ccc (inverse):
    000: MOV Imm25, DLR //As-is, Original Role
    001: BSR Disp25s
    010: BT Disp25s
    011: BF Disp25s
    Doesn't solve the issue for Baseline, as these would be N/E in Baseline.


    Having the encoding scheme fragmenting into a tree of encoding
    sub-variants is getting kind of annoying though, may eventually need to
    prune the tree.

    ...


    --- Synchronet 3.20a-Linux NewsLink 1.114