• MMU using base and bound

    From Robert Finch@robfi680@gmail.com to comp.arch on Thu Apr 10 03:02:41 2025
    From Newsgroup: comp.arch

    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before
    translating with the page table. Or to reduce the number of page tables
    using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.
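
    A minimal behavioral sketch of the idea above: base/bound is applied
    first, and the result is looked up in one shared page table. The 32 x 16
    register file and the flat set zero come from the post; the 32-bit
    address, the use of the top four address bits to pick a register, the
    4 KiB page size, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass

NUM_SETS = 32       # from the post: 32 sets ...
REGS_PER_SET = 16   # ... of 16 registers each
SEG_SHIFT = 28      # assumption: top 4 bits of a 32-bit address pick the register
PAGE_SHIFT = 12     # assumption: 4 KiB pages
OFFSET_MASK = (1 << SEG_SHIFT) - 1


class BoundsFault(Exception):
    """Raised when an offset exceeds the bound of its region."""


@dataclass
class BaseBound:
    base: int    # added to the offset to form the linear address
    bound: int   # highest valid offset within the region


def flat_set():
    # Identity mapping: register i covers its own 256 MiB slice unchanged.
    return [BaseBound(i << SEG_SHIFT, OFFSET_MASK) for i in range(REGS_PER_SET)]


# Every set starts out flat; set 0 is meant to stay that way.
regfile = [flat_set() for _ in range(NUM_SETS)]

# The single, system-wide page table: linear page number -> physical frame.
page_table = {}


def to_linear(set_index, addr):
    """Apply base/bound to a program address, giving a linear address."""
    reg = regfile[set_index][(addr >> SEG_SHIFT) & (REGS_PER_SET - 1)]
    offset = addr & OFFSET_MASK
    if offset > reg.bound:
        raise BoundsFault(hex(addr))
    return reg.base + offset


def translate(set_index, addr):
    """Base/bound first, then the shared page table."""
    linear = to_linear(set_index, addr)
    frame = page_table[linear >> PAGE_SHIFT]   # KeyError stands in for a page fault
    return (frame << PAGE_SHIFT) | (linear & ((1 << PAGE_SHIFT) - 1))


# Two processes use the same program address but different register sets.
regfile[1][0] = BaseBound(0x4000_0000, 0x1FFF)
regfile[2][0] = BaseBound(0x5000_0000, 0x0FFF)
page_table[0x40000] = 0x123
page_table[0x50000] = 0x456
```

    With this setup, program address 0xABC lands in a different physical
    frame depending on which register set the process control block selects,
    while set zero passes addresses through unchanged.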

    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Thu Apr 10 15:48:26 2025
    From Newsgroup: comp.arch

    On Thu, 10 Apr 2025 7:02:41 +0000, Robert Finch wrote:

    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before translating with the page table. Or to reduce the number of page tables
    using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.

    Base and Bounds is not compatible with the feature/functionality we see
    in modern applications; things such as::

    a) mmap()
    b) dynamically linked libraries
    c) Address Space Layout Randomization
    d) JITTed binaries

    At least until there are enough base and bounds registers, and when
    there are enough of these, then the B&B MMU smells just like a SW
    programmable TLB--and at this point--either go all the way or don't
    start down that path.

    Also note:: at SATA data transfer rates, activating a 20 GB application
    takes multiple seconds on the disk drive itself, something that only
    suffers a dozen milliseconds with typical paging.
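
    A back-of-envelope check of the comparison above. The 20 GB figure is
    from the post; the 600 MB/s SATA III ceiling, the 4 KiB page size, and
    the number of pages touched at startup are assumptions, and only raw
    transfer time is counted (no seek or command latency).

```python
SATA3_BYTES_PER_S = 600e6   # assumption: SATA III payload ceiling, about 600 MB/s
APP_BYTES = 20e9            # the 20 GB application from the post
PAGE_BYTES = 4096           # assumption: 4 KiB pages
STARTUP_PAGES = 256         # hypothetical: pages touched before the app can run


def transfer_seconds(nbytes, rate=SATA3_BYTES_PER_S):
    """Time to move nbytes at a sustained rate, ignoring access latency."""
    return nbytes / rate


# Loading the whole image before it can run (pure base and bounds).
whole_image = transfer_seconds(APP_BYTES)

# Loading only the pages actually touched at startup (demand paging).
demand_paged = transfer_seconds(STARTUP_PAGES * PAGE_BYTES)

print(f"whole image : {whole_image:.1f} s")
print(f"demand paged: {demand_paged * 1e3:.2f} ms")
```

    Under these assumptions the whole-image load is tens of seconds, while
    the demand-paged start moves about 1 MiB and stays in the millisecond
    range even before per-access latency is added.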
  • From Al Kossow@aek@bitsavers.org to comp.arch on Thu Apr 10 09:30:18 2025
    From Newsgroup: comp.arch

    On 4/10/25 12:02 AM, Robert Finch wrote:
    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical page table in the system if base and bound addressing is applied before
    translating with the page table. Or to reduce the number of page tables using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index for selecting the set in the process control block. Defaulting set
    zero for flat addressing.


    Congratulations, you've reinvented the SUN / CADR segment/page MMU made from two sets of SRAMs.

  • From Robert Finch@robfi680@gmail.com to comp.arch on Thu Apr 10 14:23:10 2025
    From Newsgroup: comp.arch

    On 2025-04-10 12:30 p.m., Al Kossow wrote:
    On 4/10/25 12:02 AM, Robert Finch wrote:
    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied
    before translating with the page table. Or to reduce the number of
    page tables using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an
    index for selecting the set in the process control block. Defaulting
    set zero for flat addressing.


    Congratulations, you've reinvented the SUN / CADR segment/page MMU made
    from two sets of SRAMs.

    It is kind of similar in concept. I tried looking it up, and found the
    m68k MMU. I did not give enough details of my MMU. The base/bound
    registers feed the TLB on a TLB miss. The TLB then feeds the paging.
    The registers may not be that useful, but they are also low cost. They
    add only about 1% to the size of the MMU.

    The MMU can handle multiple outstanding misses, which it queues for the
    page table walker.
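
    One way to read the miss path described above, as a rough sketch: a TLB
    hit bypasses base/bound entirely, a miss is bounds-checked, relocated,
    and queued, and a separate walker drains the queue and fills the TLB.
    The class and method names, the single base/bound pair, the 4 KiB page
    size, and the page-aligned base and bound are all assumptions.

```python
from collections import deque

PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class Mmu:
    def __init__(self, base, bound, page_table):
        self.base = base              # assumed page-aligned
        self.bound = bound            # highest valid program address
        self.page_table = page_table  # linear page number -> physical frame
        self.tlb = {}                 # program page number -> physical frame
        self.miss_queue = deque()     # outstanding misses awaiting the walker

    def lookup(self, addr):
        """Return a physical address on a hit, or None after queuing a miss."""
        vpn = addr >> PAGE_SHIFT
        if vpn in self.tlb:
            return (self.tlb[vpn] << PAGE_SHIFT) | (addr & PAGE_MASK)
        if addr > self.bound:
            raise MemoryError("bounds violation")
        # Base/bound feeds the miss: the walker sees the relocated page number.
        self.miss_queue.append((vpn, (self.base + addr) >> PAGE_SHIFT))
        return None

    def walk_one(self):
        """Service the oldest queued miss and fill the TLB."""
        vpn, linear_vpn = self.miss_queue.popleft()
        self.tlb[vpn] = self.page_table[linear_vpn]


mmu = Mmu(base=0x4000_0000, bound=0x1FFF,
          page_table={0x40000: 0x123, 0x40001: 0x456})
first = mmu.lookup(0x0ABC)     # miss, queued
second = mmu.lookup(0x1010)    # second miss queued while the first is pending
pending = len(mmu.miss_queue)
mmu.walk_one()
mmu.walk_one()
```

    After both walks complete, retrying the same two addresses hits in the
    TLB without touching the base/bound registers again.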

  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Apr 10 18:35:06 2025
    From Newsgroup: comp.arch

    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 10 Apr 2025 7:02:41 +0000, Robert Finch wrote:

    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before
    translating with the page table. Or to reduce the number of page tables
    using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.

    Base and Bounds is not compatible with the feature/functionality we see
    in modern applications; things such as::

    a) mmap()
    b) dynamically linked libraries
    c) Address Space Layout Randomization
    d) JITTed binaries

    At least until there are enough base and bounds registers, and when
    there are enough of these, then the B&B MMU smells just like a SW
    programmable TLB--and at this point--either go all the way or don't
    start down that path.

    Also note:: at SATA data transfer rates, activating a 20 GB application
    takes multiple seconds on the disk drive itself, something that only
    suffers a dozen milliseconds with typical paging.

    With a PCIe NVMe adapter, the transfer times are orders of magnitude
    shorter than SATA; NAND speeds rather than speeds limited by the
    rotation rate of spinning rust (although a large on-drive RAM cache
    does help SATA datarates in the streaming cases).
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Thu Apr 10 12:26:12 2025
    From Newsgroup: comp.arch

    On 4/10/2025 12:02 AM, Robert Finch wrote:
    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before translating with the page table. Or to reduce the number of page tables using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.


    Separating the protection aspects (base and bound) from the real memory
    management aspects (paging) has advantages and disadvantages. Al
    mentioned one implementation (with which I am not familiar), but the
    Mill also does that (though currently, at least, only in
    simulation/emulation) and, out of historical compatibility
    requirements, so does the Unisys 2200 series (currently emulated, but
    there were dedicated hardware implementations).

    There is some documentation of the Mill online, and there is complete
    documentation of the Unisys implementation online. Note that if you
    start to read the Unisys documentation, they call the memory associated
    with a particular base and bound a "bank".
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Thu Apr 10 12:31:40 2025
    From Newsgroup: comp.arch

    On 4/10/2025 8:48 AM, MitchAlsup1 wrote:
    On Thu, 10 Apr 2025 7:02:41 +0000, Robert Finch wrote:

    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before
    translating with the page table. Or to reduce the number of page tables
    using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.

    Base and Bounds is not compatible with the feature/functionality we see
    in modern applications; things such as::

    a) mmap()
    b) dynamically linked libraries
    c) Address Space Layout Randomization
    d) JITTed binaries

    At least until there are enough base and bounds registers, and when
    there are enough of these, then the B&B MMU smells just like a SW
    programmable TLB--and at this point--either go all the way or don't
    start down that path.

    Also note:: at SATA data transfer rates, activating a 20 GB application
    takes multiple seconds on the disk drive itself, something that only
    suffers a dozen milliseconds with typical paging.

    You seem to be talking about B&B as the only mechanism. In that case,
    your criticisms are (mostly) valid. But if, as I think Robert is
    talking about, you have a B&B implementation "on top of" a paging
    implementation, most of those problems go away (albeit introducing
    others).
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Thu Apr 10 20:19:07 2025
    From Newsgroup: comp.arch

    On Thu, 10 Apr 2025 19:31:40 +0000, Stephen Fuld wrote:

    On 4/10/2025 8:48 AM, MitchAlsup1 wrote:
    On Thu, 10 Apr 2025 7:02:41 +0000, Robert Finch wrote:

    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before
    translating with the page table. Or to reduce the number of page tables
    using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.

    Base and Bounds is not compatible with the feature/functionality we see
    in modern applications; things such as::

    a) mmap()
    b) dynamically linked libraries
    c) Address Space Layout Randomization
    d) JITTed binaries

    At least until there are enough base and bounds registers, and when
    there are enough of these, then the B&B MMU smells just like a SW
    programmable TLB--and at this point--either go all the way or don't
    start down that path.

    Also note:: at SATA data transfer rates, activating a 20 GB application
    takes multiple seconds on the disk drive itself, something that only
    suffers a dozen milliseconds with typical paging.

    You seem to be talking about B&B as the only mechanism.

    No,

    In that case,
    your criticisms are (mostly) valid. But if, as I think Robert is
    talking about, you have a B&B implementation "on top of" a paging
    implementation, most of those problems go away (albeit introducing
    others).

    I saw that and took that into consideration. Way back in 1980 one could
    get away with a single base+bounds per process. My argument above is
    that that time has passed, mostly due to new things we want applications
    to do these days.

    Secondarily, once you get a base+bounds for every kind of memory region
    you are closing in on the number of PTEs in a minimal TLB. Which is why
    I suggested to just use paging under paging (GuestOS vis HyperVisor).

    Consider that every open file is mapped into memory areas that allow
    for the files to grow (significantly). I just don't see how a single
    application with 20 open files, each of which can grow, can use a small
    number of base+bounds registers in any realistic manner.
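
    To put rough numbers on that argument, here is a small counting sketch.
    The 16 registers per set and the 20 open files come from the thread; the
    four fixed regions, the two regions per shared library, and the library
    count are hypothetical figures chosen only for illustration.

```python
REGS_PER_SET = 16   # from the original post: 16 base/bound registers per set


def regions_needed(shared_libs, open_files, jit_regions=0):
    """Count distinct memory regions, one base+bounds pair each."""
    fixed = 4                        # hypothetical: code, data, heap, stack
    per_lib = 2                      # hypothetical: text and data per library
    return fixed + per_lib * shared_libs + open_files + jit_regions


needed = regions_needed(shared_libs=5, open_files=20)
print(needed, "regions for", REGS_PER_SET, "registers")
```

    Under these assumptions the application needs 34 regions, more than
    twice what one register set provides, before ASLR or JIT regions are
    counted.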



  • From Robert Finch@robfi680@gmail.com to comp.arch on Thu Apr 10 16:46:02 2025
    From Newsgroup: comp.arch

    On 2025-04-10 3:31 p.m., Stephen Fuld wrote:
    On 4/10/2025 8:48 AM, MitchAlsup1 wrote:
    On Thu, 10 Apr 2025 7:02:41 +0000, Robert Finch wrote:

    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before
    translating with the page table. Or to reduce the number of page tables
    using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.

    Base and Bounds is not compatible with the feature/functionality we see
    in modern applications; things such as::

    a) mmap()
    b) dynamically linked libraries
    c) Address Space Layout Randomization
    d) JITTed binaries

    At least until there are enough base and bounds registers, and when
    there are enough of these, then the B&B MMU smells just like a SW
    programmable TLB--and at this point--either go all the way or don't
    start down that path.

    Also note:: at SATA data transfer rates, activating a 20 GB application
    takes multiple seconds on the disk drive itself, something that only
    suffers a dozen milliseconds with typical paging.

    You seem to be talking about B&B as the only mechanism.  In that case,
    your criticisms are (mostly) valid.  But if, as I think Robert is
    talking about, you have a B&B implementation "on top of" a paging
    implementation, most of those problems go away (albeit introducing
    others).



    Yep, I was talking about extending a TLB-based MMU. I am not sure what
    software I will be running on it. I wonder if, with a small memory
    system, having just a single page table might be advantageous. I may
    want to remove the paging / TLB for a small system, as it is a good
    chunk of LUTs and BRAMs, but I want to keep the same MMU interface.

  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Apr 10 23:36:41 2025
    From Newsgroup: comp.arch

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 4/10/2025 12:02 AM, Robert Finch wrote:
    Working on the MMU component tonight.

    Just realized that it is possible to have only a single hierarchical
    page table in the system if base and bound addressing is applied before
    translating with the page table. Or to reduce the number of page tables
    using the base/bound addressing.

    Building base/bound registers into the MMU, pondering having multiple
    sets of registers to reduce the amount of register swapping. A single
    BRAM should be enough for 32 sets of 16 registers. Could store an index
    for selecting the set in the process control block. Defaulting set zero
    for flat addressing.


    Separating the protection aspects (base and bound) from the real memory
    management aspects (paging) has advantages and disadvantages. Al
    mentioned one implementation (with which I am not familiar), but the
    Mill also does that (though currently at least, only in
    simulation/emulation) and, (out of historical compatibility
    requirements) the Unisys 2200 series (currently emulated but there were
    dedicated hardware implementations)

    The Unisys V and A series also did that. Ultimately, it suffers
    from all the flaws of segmentation.


    There is some documentation of the Mill online, and there is complete
    documentation of the Unisys implementation online. Note if you start to
    read the Unisys documentation, they call the memory associated with a
    particular base and bound, a "bank"

    V-series had Environments which contained up to 100 variable-sized
    memory areas, any eight available simultaneously in the application
    address space (it was a backward-compatible enhancement to the original
    single-segment B3500). Several instructions had direct access to any of
    the 100 memory areas (bulk xfer instructions for the most part).