• Re: Privilege Levels Below User

    From Paul A. Clayton@paaronclayton@gmail.com to comp.arch on Sun Oct 20 20:42:32 2024
    From Newsgroup: comp.arch

    THREAD NECROMANCY

    On 6/11/24 5:18 PM, MitchAlsup1 wrote:
    [snip]
    I doubt that RowHammer still works when refreshes are interspersed
    between accesses--RowHammer generally works because the events are
    not protected by refreshes--the DRC (DRAM controller) sees the
    right ROW open and simply streams at the open bank.

    If one refreshes the two adjacent rows to avoid data disruption,
    those refreshes are themselves activations adjacent to two further
    rows, so it seems one would have to be a little cautious about
    excessively frequent refreshes.
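
    As a back-of-the-envelope check on that caution (all the constants
    below are assumptions for illustration, not spec values: a disturb
    threshold of ~50k activations per 64 ms refresh window, a given
    aggressor activation rate, and a victim refresh issued every k-th
    aggressor activation):

        /* Do mitigation refreshes of the two victim rows themselves start
         * to hammer the rows one further out?  Illustrative numbers only. */
        #include <stdio.h>

        int main(void)
        {
            const double tREFW_s    = 0.064;  /* 64 ms refresh window           */
            const double disturb_th = 50e3;   /* assumed activations to disturb */
            const double acts_per_s = 2e6;    /* assumed aggressor ACT rate     */
            const int    k          = 8;      /* victim refresh every k ACTs    */

            double aggressor_acts   = acts_per_s * tREFW_s; /* per window */
            double victim_refreshes = aggressor_acts / k;   /* per window */

            printf("aggressor activations per window: %.0f (threshold %.0f)\n",
                   aggressor_acts, disturb_th);
            printf("victim-row refreshes per window:  %.0f (threshold %.0f)\n",
                   victim_refreshes, disturb_th);
            /* If the second number approaches the threshold, the rows two
             * away from the aggressor become the next victims.            */
            return 0;
        }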

    Also note, there are no instructions in My 66000 that force a cache
    to DRAM whereas there are instructions that can force a cache line
    into L3.

    How does a system suspend to DRAM if it cannot force a writeback
    of all dirty lines to memory? I am *guessing* this would not use a
    special instruction but rather configuration of power management
    that would cause hardware/firmware to clean the cache.
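
    For what it's worth, that is how it works elsewhere: on x86 the
    suspend path has privileged firmware/kernel code write back and
    invalidate every cache before the caches lose power. A minimal,
    x86-specific illustration (must run at ring 0):

        /* Privileged, x86-only: write back all dirty lines and invalidate
         * the caches, as a suspend-to-RAM path might do before power-down. */
        static inline void flush_all_caches(void)
        {
            __asm__ volatile("wbinvd" ::: "memory");
        }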

    Writing back specific data to persistent memory might also
    motivate cache block cleaning operations. Perhaps one could
    implement this by copying from a cacheable mapping to a
    non-cacheable (I/O?) mapping? (I simply remember that Intel added
    instructions to write cache lines to persistent memory.)
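
    For reference, the Intel instructions are CLWB/CLFLUSHOPT, ordered
    with a fence; a minimal sketch (the pmem_buf pointer and its
    persistent-memory mapping are assumed to exist, and the CPU must
    support CLWB -- build with e.g. gcc -mclwb):

        /* Write data, then push the dirty lines out toward persistent
         * memory: CLWB writes a line back without evicting it, and SFENCE
         * orders the write-backs before anything that follows.            */
        #include <immintrin.h>
        #include <stdint.h>
        #include <string.h>

        #define CACHELINE 64

        static void clwb_range(void *addr, size_t len)
        {
            uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
            uintptr_t end = (uintptr_t)addr + len;
            for (; p < end; p += CACHELINE)
                _mm_clwb((void *)p);
            _mm_sfence();
        }

        void persist_record(void *pmem_buf, const void *src, size_t len)
        {
            memcpy(pmem_buf, src, len);    /* ordinary cached stores        */
            clwb_range(pmem_buf, len);     /* then force them toward media  */
        }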

    L3 is the buffer to DRAM. Nothing gets to DRAM without
    going through L3, and nothing comes out of DRAM that is not also
    buffered by L3. So, if 96 cores simultaneously read a line residing in
    DRAM, DRAM is read once and 95 cores are serviced through L3. So,
    you can't RowHammer based on reading DRAM, either.

    If 128 cores read distinct cache lines from the same page quickly
    enough to hammer the adjacent pages but not quickly enough to get
    DRAM page open hits, this would seem to require relatively
    frequent refreshes of adjacent DRAM rows.

    Since the L3/memory controller could see that the DRAM row was
    unusually active, it could increase prefetching while the DRAM
    row was open and/or queue the accesses longer so that the
    hammering frequency was reduced and page open hits would be more
    common.
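
    As a sketch of what "seeing that the DRAM row was unusually
    active" could look like (purely hypothetical controller-side
    logic and constants, not anything claimed about My 66000): keep a
    small table of per-row activation counts per refresh window and
    react when a count crosses a threshold.

        /* Hypothetical controller heuristic: count ACTs per row within the
         * current refresh window; past a threshold, refresh the neighbours
         * and start coalescing/queueing requests to that row.  The table
         * size, threshold and hook functions are all made up.             */
        #include <stdio.h>
        #include <stdint.h>
        #include <stdbool.h>

        #define TRACKED_ROWS  64
        #define HOT_THRESHOLD 4096

        struct hot_row { uint32_t row; uint32_t acts; bool valid; };
        static struct hot_row table[TRACKED_ROWS];

        /* Stubs standing in for real controller actions. */
        static void refresh_row(uint32_t row)  { printf("refresh row %u\n", row); }
        static void coalesce_row(uint32_t row) { printf("coalesce row %u\n", row); }

        static void on_activate(uint32_t row)
        {
            int free_slot = -1;
            for (int i = 0; i < TRACKED_ROWS; i++) {
                if (table[i].valid && table[i].row == row) {
                    if (++table[i].acts == HOT_THRESHOLD) {
                        refresh_row(row - 1);   /* clean up the victim rows  */
                        refresh_row(row + 1);
                        coalesce_row(row);      /* widen the queueing window */
                        table[i].acts = 0;
                    }
                    return;
                }
                if (!table[i].valid && free_slot < 0)
                    free_slot = i;
            }
            if (free_slot >= 0)
                table[free_slot] = (struct hot_row){ row, 1, true };
        }

        int main(void)
        {
            for (uint32_t i = 0; i < 3 * HOT_THRESHOLD; i++)
                on_activate(0x1234);            /* hammer a single row */
            return 0;
        }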

    The simple statement that L3 would avoid RowHammer by providing
    the same cache line to all requesters seemed a bit too simple.

    Your design may very well handle all the problematic cases,
    perhaps even with minimal performance penalties for inadvertent
    hammering and logging/notification for questionable activity just
    like for error correction (and has been proposed for detected race
    conditions). I just know that these are hard problems.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 21:08:24 2024
    From Newsgroup: comp.arch

    On Mon, 21 Oct 2024 0:42:32 +0000, Paul A. Clayton wrote:

    THREAD NECROMANCY

    On 6/11/24 5:18 PM, MitchAlsup1 wrote:
    [snip]
    I doubt that RowHammer still works when refreshes are interspersed
    between accesses--RowHammer generally works because the events are
    not protected by refreshes--the DRC (DRAM controller) sees the
    right ROW open and simply streams at the open bank.

    If one refreshes the two adjacent rows to avoid data disruption,
    those refreshes are themselves activations adjacent to two further
    rows, so it seems one would have to be a little cautious about
    excessively frequent refreshes.

    Also note, there are no instructions in My 66000 that force a cache
    to DRAM whereas there are instructions that can force a cache line
    into L3.

    How does a system suspend to DRAM if it cannot force a writeback
    of all dirty lines to memory?

    In GENERAL, you do not want to give this capability to applications
    nor use it willy-nilly.

    I am *guessing* this would not use a
    special instruction but rather configuration of power management
    that would cause hardware/firmware to clean the cache.

    There is a sideband command from any master (anywhere) that causes
    L3 to get dumped to DRAM over the next refresh interval. It is not
    an instruction, and the TLB has to cooperate. A device may initiate
    "suspend to DRAM" as well as a CPU (or any other bus master).

    Writing back specific data to persistent memory might also
    motivate cache block cleaning operations. Perhaps one could
    implement this by copying from a cacheable mapping to a
    non-cacheable (I/O?) mapping? (I simply remember that Intel added
    instructions to write cache lines to persistent memory.)

    L3 is the buffer to DRAM. Nothing gets to DRAM without
    going through L3, and nothing comes out of DRAM that is not also
    buffered by L3. So, if 96 cores simultaneously read a line residing in
    DRAM, DRAM is read once and 95 cores are serviced through L3. So,
    you can't RowHammer based on reading DRAM, either.

    If 128 cores read distinct cache lines from the same page quickly
    enough to hammer the adjacent pages but not quickly enough to get
    DRAM page open hits, this would seem to require relatively
    frequent refreshes of adjacent DRAM rows.

    DDR5 has roughly a 64 GB/s transfer rate.
    128 cache lines (64 B each) is 8192 bytes.
    Streamed back-to-back that is only about 1/8 of a microsecond
    (~128 ns) of bus time; but by the premise these reads miss the
    open page, so each needs its own activation, and same-bank
    activations are limited by tRC (roughly 50 ns), so 128 of them
    still span several microseconds.
    A DDR5 average refresh interval (tREFI) is 3.9µs.

    https://www.micron.com/content/dam/micron/global/public/products/white-paper/ddr5-new-features-white-paper.pdf#:~:text=REFRESH%20commands%20are%20issued%20at%20an%20average%20periodic,of%20295ns%20for%20a%2016Gb%20DDR5%20SDRAM%20device.

    So one has refreshes in the described situation.
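
    Redoing those numbers quickly (64 GB/s, 3.9µs tREFI and a ~50 ns
    tRC are the figures used above; the disturb threshold is an
    assumed round number):

        /* 128 distinct 64-byte lines from one DRAM page, each read missing
         * the open page, so each costs a row activation of its own.        */
        #include <stdio.h>

        int main(void)
        {
            const double bw_Bps     = 64e9;      /* ~64 GB/s                  */
            const double bytes      = 128 * 64;  /* 8192 B                    */
            const double tRC_s      = 50e-9;     /* same-bank ACT-to-ACT      */
            const double tREFI_s    = 3.9e-6;    /* DDR5 avg refresh interval */
            const double disturb_th = 50e3;      /* assumed ACTs to disturb   */

            double xfer_s = bytes / bw_Bps;      /* pure transfer time        */
            double act_s  = 128 * tRC_s;         /* 128 separate activations  */

            printf("transfer time:        %.0f ns\n", xfer_s * 1e9);  /* ~128 ns */
            printf("activation time:      %.2f us\n", act_s * 1e6);   /* ~6.4 us */
            printf("refreshes during it:  ~%.1f\n", act_s / tREFI_s);
            printf("rounds of 128 ACTs to reach %.0f: %.0f\n",
                   disturb_th, disturb_th / 128);
            return 0;
        }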

    Since the L3/memory controller could see that the DRAM row was
    unusually active, it could increase prefetching while the DRAM
    row was open and/or queue the accesses longer so that the
    hammering frequency was reduced and page open hits would be more
    common.

    A DRAM row stays active, and commands just CAS-out more data. That
    is, there is no row hammering--the word line remains asserted while
    the sense amplifiers hold the captured data--while CASs are used to
    strobe out more data {subject to refresh}.
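
    The RAS/CAS point as a toy model (the row and column numbers are
    arbitrary): only a row activation disturbs the neighbours; CAS
    bursts to the already-open row never add to the hammer count.

        /* Toy open-page model: hammer pressure accrues on row *changes*
         * (ACT, i.e. RAS), not on CAS reads that hit the open row.        */
        #include <stdio.h>
        #include <stdint.h>

        #define NO_ROW 0xFFFFFFFFu

        struct bank {
            uint32_t open_row;
            uint32_t activations;   /* what a disturbance model cares about */
            uint32_t cas_strobes;   /* harmless column accesses             */
        };

        static void read_line(struct bank *b, uint32_t row, uint32_t col)
        {
            (void)col;
            if (b->open_row == row) {
                b->cas_strobes++;       /* word line already up: CAS only      */
            } else {
                b->open_row = row;      /* PRE + ACT: this is the hammer event */
                b->activations++;
                b->cas_strobes++;
            }
        }

        int main(void)
        {
            struct bank b = { NO_ROW, 0, 0 };
            for (uint32_t col = 0; col < 128; col++)
                read_line(&b, 0x1234, col);   /* stream a whole open row */
            read_line(&b, 0x1235, 0);         /* then touch another row  */
            printf("activations=%u cas_strobes=%u\n",
                   b.activations, b.cas_strobes);
            /* prints: activations=2 cas_strobes=129 */
            return 0;
        }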

    The simple statement that L3 would avoid RowHammer by providing
    the same cache line to all requesters seemed a bit too simple.

    You need to investigate the difference between RAS and CAS for
    DRAMs.

    Your design may very well handle all the problematic cases,
    perhaps even with minimal performance penalties for inadvertent
    hammering and logging/notification for questionable activity just
    like for error correction (and has been proposed for detected race
    conditions). I just know that these are hard problems.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Wed Oct 23 22:38:40 2024
    From Newsgroup: comp.arch

    On Sun, 9 Jun 2024 2:23:35 +0000, Lawrence D'Oliveiro wrote:

    On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

    VAX was before common era Hypervisors, do you think VAX could have
    supported secure mode and hypervisor with their 4 levels ??

    “Virtualization” was bandied about in the 1980s more as an idle,
    theoretical concept than as a practical one.

    The question was: was the instruction set defined so that code
    designed to run in a privileged mode could be run unprivileged, with
    any attempt to do privileged things trapped and emulated by the real
    privileged code? And so that there was nothing it could do to
    discover it wasn’t running in privileged mode?

    My 66000 ISA has this property, and it is used when hypervisors host hypervisors.

    On the other hand, there is only 1 privileged instruction which
    provides access to 4 separate control register spaces based on
    current Core-Stack level.

    (Obviously performance was not the issue here, but correctness was.)

    For example, the VAX had a MOVPSL instruction that allowed read-only
    access to the entire processor status register. Through this,
    nonprivileged user-mode code could discover it was running in user mode, which would blow the illusion.

    While illustrative, we have entered the realm where processor state
    is closer to a cache line in size than a register in size. And the
    processor (core) stack of software layers is closer to 4 cache lines
    in size.
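
    In Popek-and-Goldberg terms MOVPSL is a sensitive but unprivileged
    instruction: it never traps, so a hypervisor never gets a chance
    to substitute a virtual PSL. A small sketch of the guest-side
    check (MODE_MASK, MODE_KERNEL and read_psl() are placeholders for
    the real PSL layout and instruction):

        /* Why a readable mode field defeats pure trap-and-emulate: the
         * guest "kernel" really runs in user mode, and if it can read the
         * real processor status without trapping it sees the truth.       */
        #include <stdio.h>
        #include <stdint.h>

        #define MODE_MASK   0x3u     /* placeholder for the PSL mode bits  */
        #define MODE_KERNEL 0x0u

        /* Stand-in for MOVPSL: unprivileged and non-trapping, so the
         * hypervisor cannot intercept it and return the virtual PSL.      */
        static uint32_t read_psl(void)
        {
            return 0x3u;             /* hardware says: current mode = user */
        }

        int main(void)
        {
            if ((read_psl() & MODE_MASK) != MODE_KERNEL)
                printf("I believe I am a kernel, but the PSL says user: "
                       "I must be running under a hypervisor.\n");
            /* Had MOVPSL been privileged (or trapping), the hypervisor
             * could have emulated it and preserved the illusion.          */
            return 0;
        }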

    The Motorola 680x0 family was, I think, properly virtualizable in
    this sense. Or maybe the 68020 and 68030 were, but the 68040 was not.
    I think the Motorola engineers working on the ’040 asked if any
    customers were interested in preserving the self-virtualization
    feature, and nobody seemed to care.

    During 020 development and testing, there was a mode whereby each
    instruction executed raised every possible exception--this only found
    99% of the virtualization problems.
    --- Synchronet 3.20a-Linux NewsLink 1.114