• Re: Privilege Levels Below User

    From Paul A. Clayton@paaronclayton@gmail.com to comp.arch on Sun Oct 20 20:42:32 2024
    From Newsgroup: comp.arch

    THREAD NECROMANCY

    On 6/11/24 5:18 PM, MitchAlsup1 wrote:
    [snip]
    I doubt that RowHammer still works when refreshes are interspersed
    between accesses--RowHammer generally works because the events are
    not protected by refreshes--the DRC (DRAM controller) sees the
    right ROW open and simply streams at the open bank.

    If one refreshes the two adjacent rows to avoid data disruption,
    those refreshes are themselves activations adjacent to two further
    rows, so it seems one would have to be a little cautious about
    excessively frequent refreshes.
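
    As a back-of-the-envelope check on that caution (all the constants
    below are assumptions for illustration, not spec values: a disturb
    threshold of ~50k activations per 64 ms refresh window, a given
    aggressor activation rate, and a victim refresh issued every k-th
    aggressor activation):

        /* Do mitigation refreshes of the two victim rows themselves start
         * to hammer the rows one further out?  Illustrative numbers only. */
        #include <stdio.h>

        int main(void)
        {
            const double tREFW_s    = 0.064;  /* 64 ms refresh window           */
            const double disturb_th = 50e3;   /* assumed activations to disturb */
            const double acts_per_s = 2e6;    /* assumed aggressor ACT rate     */
            const int    k          = 8;      /* victim refresh every k ACTs    */

            double aggressor_acts   = acts_per_s * tREFW_s; /* per window */
            double victim_refreshes = aggressor_acts / k;   /* per window */

            printf("aggressor activations per window: %.0f (threshold %.0f)\n",
                   aggressor_acts, disturb_th);
            printf("victim-row refreshes per window:  %.0f (threshold %.0f)\n",
                   victim_refreshes, disturb_th);
            /* If the second number approaches the threshold, the rows two
             * away from the aggressor become the next victims.            */
            return 0;
        }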

    Also note, there are no instructions in My 66000 that force a cache
    to DRAM whereas there are instructions that can force a cache line
    into L3.

    How does a system suspend to DRAM if it cannot force a writeback
    of all dirty lines to memory? I am *guessing* this would not use a
    special instruction but rather configuration of power management
    that would cause hardware/firmware to clean the cache.
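
    For what it's worth, that is how it works elsewhere: on x86 the
    suspend path has privileged firmware/kernel code write back and
    invalidate every cache before the caches lose power. A minimal,
    x86-specific illustration (must run at ring 0):

        /* Privileged, x86-only: write back all dirty lines and invalidate
         * the caches, as a suspend-to-RAM path might do before power-down. */
        static inline void flush_all_caches(void)
        {
            __asm__ volatile("wbinvd" ::: "memory");
        }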

    Writing back specific data to persistent memory might also
    motivate cache block cleaning operations. Perhaps one could
    implement this by copying from a cacheable mapping to a
    non-cacheable (I/O?) mapping? (I simply remember that Intel added
    instructions to write cache lines to persistent memory.)
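
    For reference, the Intel instructions are CLWB/CLFLUSHOPT, ordered
    with a fence; a minimal sketch (the pmem_buf pointer and its
    persistent-memory mapping are assumed to exist, and the CPU must
    support CLWB -- build with e.g. gcc -mclwb):

        /* Write data, then push the dirty lines out toward persistent
         * memory: CLWB writes a line back without evicting it, and SFENCE
         * orders the write-backs before anything that follows.            */
        #include <immintrin.h>
        #include <stdint.h>
        #include <string.h>

        #define CACHELINE 64

        static void clwb_range(void *addr, size_t len)
        {
            uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
            uintptr_t end = (uintptr_t)addr + len;
            for (; p < end; p += CACHELINE)
                _mm_clwb((void *)p);
            _mm_sfence();
        }

        void persist_record(void *pmem_buf, const void *src, size_t len)
        {
            memcpy(pmem_buf, src, len);    /* ordinary cached stores        */
            clwb_range(pmem_buf, len);     /* then force them toward media  */
        }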

    L3 is the buffer to DRAM. Nothing gets to DRAM without
    going through L3, and nothing comes out of DRAM that is not also
    buffered by L3. So, if 96 cores simultaneously read a line residing in
    DRAM, DRAM is read once and 95 cores are serviced through L3. So,
    you can't RowHammer based on reading DRAM, either.

    If 128 cores read distinct cache lines from the same page quickly
    enough to hammer the adjacent pages but not quickly enough to get
    DRAM page open hits, this would seem to require relatively
    frequent refreshes of adjacent DRAM rows.

    Since the L3/memory controller could see that the DRAM row was
    unusually active, it could increase prefetching while the DRAM
    row was open and/or queue the accesses longer so that the
    hammering frequency was reduced and page open hits would be more
    common.
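
    As a sketch of what "seeing that the DRAM row was unusually
    active" could look like (purely hypothetical controller-side
    logic and constants, not anything claimed about My 66000): keep a
    small table of per-row activation counts per refresh window and
    react when a count crosses a threshold.

        /* Hypothetical controller heuristic: count ACTs per row within the
         * current refresh window; past a threshold, refresh the neighbours
         * and start coalescing/queueing requests to that row.  The table
         * size, threshold and hook functions are all made up.             */
        #include <stdio.h>
        #include <stdint.h>
        #include <stdbool.h>

        #define TRACKED_ROWS  64
        #define HOT_THRESHOLD 4096

        struct hot_row { uint32_t row; uint32_t acts; bool valid; };
        static struct hot_row table[TRACKED_ROWS];

        /* Stubs standing in for real controller actions. */
        static void refresh_row(uint32_t row)  { printf("refresh row %u\n", row); }
        static void coalesce_row(uint32_t row) { printf("coalesce row %u\n", row); }

        static void on_activate(uint32_t row)
        {
            int free_slot = -1;
            for (int i = 0; i < TRACKED_ROWS; i++) {
                if (table[i].valid && table[i].row == row) {
                    if (++table[i].acts == HOT_THRESHOLD) {
                        refresh_row(row - 1);   /* clean up the victim rows  */
                        refresh_row(row + 1);
                        coalesce_row(row);      /* widen the queueing window */
                        table[i].acts = 0;
                    }
                    return;
                }
                if (!table[i].valid && free_slot < 0)
                    free_slot = i;
            }
            if (free_slot >= 0)
                table[free_slot] = (struct hot_row){ row, 1, true };
        }

        int main(void)
        {
            for (uint32_t i = 0; i < 3 * HOT_THRESHOLD; i++)
                on_activate(0x1234);            /* hammer a single row */
            return 0;
        }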

    The simple statement that L3 would avoid RowHammer by providing
    the same cache line to all requesters seemed a bit too simple.

    Your design may very well handle all the problematic cases,
    perhaps even with minimal performance penalties for inadvertent
    hammering and logging/notification for questionable activity just
    like for error correction (and has been proposed for detected race
    conditions). I just know that these are hard problems.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Tue Oct 22 21:08:24 2024
    From Newsgroup: comp.arch

    On Mon, 21 Oct 2024 0:42:32 +0000, Paul A. Clayton wrote:

    THREAD NECROMANCY

    On 6/11/24 5:18 PM, MitchAlsup1 wrote:
    [snip]
    I doubt that RowHammer still works when refreshes are interspersed
    between accesses--RowHammer generally works because the events are
    not protected by refreshes--the DRC (DRAM controller) sees the
    right ROW open and simply streams at the open bank.

    If one refreshes the two adjacent rows to avoid data disruption,
    those refreshes are themselves activations adjacent to two further
    rows, so it seems one would have to be a little cautious about
    excessively frequent refreshes.

    Also note, there are no instructions in My 66000 that force a cache
    to DRAM whereas there are instructions that can force a cache line
    into L3.

    How does a system suspend to DRAM if it cannot force a writeback
    of all dirty lines to memory?

    In GENERAL, you do not want to give this capability to applications
    nor use it willy-nilly.

    I am *guessing* this would not use a
    special instruction but rather configuration of power management
    that would cause hardware/firmware to clean the cache.

    There is a sideband command from any master (anywhere) that causes
    L3 to get dumped to DRAM over the next refresh interval. It is not
    an instruction, and the TLB has to cooperate. A device may initiate
    "suspend to DRAM" as well as a CPU (or any other bus master).

    Writing back specific data to persistent memory might also
    motivate cache block cleaning operations. Perhaps one could
    implement this by copying from a cacheable mapping to a
    non-cacheable (I/O?) mapping? (I simply remember that Intel added
    instructions to write cache lines to persistent memory.)

    L3 is the buffer to DRAM. Nothing gets to DRAM without
    going through L3, and nothing comes out of DRAM that is not also
    buffered by L3. So, if 96 cores simultaneously read a line residing in
    DRAM, DRAM is read once and 95 cores are serviced through L3. So,
    you can't RowHammer based on reading DRAM, either.

    If 128 cores read distinct cache lines from the same page quickly
    enough to hammer the adjacent pages but not quickly enough to get
    DRAM page open hits, this would seem to require relatively
    frequent refreshes of adjacent DRAM rows.

    DDR5 has roughly a 64 GB/s transfer rate.
    128 cache lines (64 B each) is 8192 bytes.
    Streamed back-to-back that is only about 1/8 of a microsecond
    (~128 ns) of bus time; but by the premise these reads miss the
    open page, so each needs its own activation, and same-bank
    activations are limited by tRC (roughly 50 ns), so 128 of them
    still span several microseconds.
    A DDR5 average refresh interval (tREFI) is 3.9µs.

    https://www.micron.com/content/dam/micron/global/public/products/white-paper/ddr5-new-features-white-paper.pdf#:~:text=REFRESH%20commands%20are%20issued%20at%20an%20average%20periodic,of%20295ns%20for%20a%2016Gb%20DDR5%20SDRAM%20device.

    So one has refreshes in the described situation.
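
    Redoing those numbers quickly (64 GB/s, 3.9µs tREFI and a ~50 ns
    tRC are the figures used above; the disturb threshold is an
    assumed round number):

        /* 128 distinct 64-byte lines from one DRAM page, each read missing
         * the open page, so each costs a row activation of its own.        */
        #include <stdio.h>

        int main(void)
        {
            const double bw_Bps     = 64e9;      /* ~64 GB/s                  */
            const double bytes      = 128 * 64;  /* 8192 B                    */
            const double tRC_s      = 50e-9;     /* same-bank ACT-to-ACT      */
            const double tREFI_s    = 3.9e-6;    /* DDR5 avg refresh interval */
            const double disturb_th = 50e3;      /* assumed ACTs to disturb   */

            double xfer_s = bytes / bw_Bps;      /* pure transfer time        */
            double act_s  = 128 * tRC_s;         /* 128 separate activations  */

            printf("transfer time:        %.0f ns\n", xfer_s * 1e9);  /* ~128 ns */
            printf("activation time:      %.2f us\n", act_s * 1e6);   /* ~6.4 us */
            printf("refreshes during it:  ~%.1f\n", act_s / tREFI_s);
            printf("rounds of 128 ACTs to reach %.0f: %.0f\n",
                   disturb_th, disturb_th / 128);
            return 0;
        }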

    Since the L3/memory controller could see that the DRAM row was
    unusually active, it could increase prefetching while the DRAM
    row was open and/or queue the accesses longer so that the
    hammering frequency was reduced and page open hits would be more
    common.

    A DRAM row stays active, and commands just CAS-out more data. That
    is, there is no row hammering--the word line remains asserted while
    the sense amplifiers hold the captured data--while CASs are used to
    strobe out more data {subject to refresh}.
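
    The RAS/CAS point as a toy model (the row and column numbers are
    arbitrary): only a row activation disturbs the neighbours; CAS
    bursts to the already-open row never add to the hammer count.

        /* Toy open-page model: hammer pressure accrues on row *changes*
         * (ACT, i.e. RAS), not on CAS reads that hit the open row.        */
        #include <stdio.h>
        #include <stdint.h>

        #define NO_ROW 0xFFFFFFFFu

        struct bank {
            uint32_t open_row;
            uint32_t activations;   /* what a disturbance model cares about */
            uint32_t cas_strobes;   /* harmless column accesses             */
        };

        static void read_line(struct bank *b, uint32_t row, uint32_t col)
        {
            (void)col;
            if (b->open_row == row) {
                b->cas_strobes++;       /* word line already up: CAS only      */
            } else {
                b->open_row = row;      /* PRE + ACT: this is the hammer event */
                b->activations++;
                b->cas_strobes++;
            }
        }

        int main(void)
        {
            struct bank b = { NO_ROW, 0, 0 };
            for (uint32_t col = 0; col < 128; col++)
                read_line(&b, 0x1234, col);   /* stream a whole open row */
            read_line(&b, 0x1235, 0);         /* then touch another row  */
            printf("activations=%u cas_strobes=%u\n",
                   b.activations, b.cas_strobes);
            /* prints: activations=2 cas_strobes=129 */
            return 0;
        }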

    The simple statement that L3 would avoid RowHammer by providing
    the same cache line to all requesters seemed a bit too simple.

    You need to investigate the difference between RAS and CAS for
    DRAMs.

    Your design may very well handle all the problematic cases,
    perhaps even with minimal performance penalties for inadvertent
    hammering and logging/notification for questionable activity just
    like for error correction (and has been proposed for detected race
    conditions). I just know that these are hard problems.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Wed Oct 23 22:38:40 2024
    From Newsgroup: comp.arch

    On Sun, 9 Jun 2024 2:23:35 +0000, Lawrence D'Oliveiro wrote:

    On Sat, 8 Jun 2024 17:37:46 +0000, MitchAlsup1 wrote:

    VAX was before common era Hypervisors, do you think VAX could have
    supported secure mode and hypervisor with their 4 levels ??

    “Virtualization” was bandied about in the 1980s more as an idle,
    theoretical concept than as a practical one.

    The question was: was the instruction set defined so that code
    designed to run in a privileged mode could be run unprivileged, with
    any attempt to do privileged things trapped and emulated by the real
    privileged code? And so that there was nothing it could do to
    discover it wasn’t running in privileged mode?

    My 66000 ISA has this property, and it is used when hypervisors host hypervisors.

    On the other hand, there is only 1 privileged instruction which
    provides access to 4 separate control register spaces based on
    current Core-Stack level.

    (Obviously performance was not the issue here, but correctness was.)

    For example, the VAX had a MOVPSL instruction that allowed read-only
    access to the entire processor status register. Through this,
    nonprivileged user-mode code could discover it was running in user mode, which would blow the illusion.

    While illustrative, we have entered the realm where processor state
    is closer to a cache line in size than a register in size. And the
    processor (core) stack of software layers is closer to 4 cache lines
    in size.
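
    In Popek-and-Goldberg terms MOVPSL is a sensitive but unprivileged
    instruction: it never traps, so a hypervisor never gets a chance
    to substitute a virtual PSL. A small sketch of the guest-side
    check (MODE_MASK, MODE_KERNEL and read_psl() are placeholders for
    the real PSL layout and instruction):

        /* Why a readable mode field defeats pure trap-and-emulate: the
         * guest "kernel" really runs in user mode, and if it can read the
         * real processor status without trapping it sees the truth.       */
        #include <stdio.h>
        #include <stdint.h>

        #define MODE_MASK   0x3u     /* placeholder for the PSL mode bits  */
        #define MODE_KERNEL 0x0u

        /* Stand-in for MOVPSL: unprivileged and non-trapping, so the
         * hypervisor cannot intercept it and return the virtual PSL.      */
        static uint32_t read_psl(void)
        {
            return 0x3u;             /* hardware says: current mode = user */
        }

        int main(void)
        {
            if ((read_psl() & MODE_MASK) != MODE_KERNEL)
                printf("I believe I am a kernel, but the PSL says user: "
                       "I must be running under a hypervisor.\n");
            /* Had MOVPSL been privileged (or trapping), the hypervisor
             * could have emulated it and preserved the illusion.          */
            return 0;
        }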

    The Motorola 680x0 family was, I think, properly virtualizable in
    this sense. Or maybe the 68020 and 68030 were, but the 68040 was not.
    I think the Motorola engineers working on the ’040 asked if any
    customers were interested in preserving the self-virtualization
    feature, and nobody seemed to care.

    During 020 development and testing, there was a mode whereby each
    instruction executed raised every possible exception--this only found
    99% of the virtualization problems.
    --- Synchronet 3.20a-Linux NewsLink 1.114