• Re: Tonight's tradeoff

    From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.arch on Sun Apr 28 15:27:41 2024
    From Newsgroup: comp.arch

    George Neuner <gneuner2@comcast.net> writes:

    On Mon, 11 Mar 2024 09:29:57 -0700, Tim Rentsch
    <tr.17687@z991.linuxsc.com> wrote:

    George Neuner <gneuner2@comcast.net> writes:

    On Fri, 8 Mar 2024 20:26:08 -0500, Robert Finch <robfi680@gmail.com>
    wrote:

    I plan on having garbage collection as part of the OS. There
    is a shared hardware-card table involved.

    What kind?
    [Actually "kind" is the wrong word because any non-toy, real
    world GC will need to employ a combination of techniques. So
    the question really should be "in what major class is your GC"?]

    Problem is - whatever you choose - it will be wrong and have bad
    performance for some important class of GC'd applications.

    I'm curious to know what you consider to be the different kinds,
    or classes, of GC, and the same question for applications.

    Certainly, for any given GC implementation, one can construct an
    application that does poorly with respect to that GC, but that
    doesn't make the constructed application a "class". For the
    statement to have meaningful content there needs to be some kind
    of identification of what are the classes of GCs, and what are
    the classes of applications.

    Feeling mathematical are we?

    If you're trying to say I make an effort to be accurate and
    precise in my writing, I plead guilty as charged.

    Every application contains delineated portions of its overall
    allocation profile which correspond closely to portions of the
    profiles of other applications.

    If a given profile performs poorly under a given GC, it is
    reasonable to infer that other applications having corresponding
    profiles also will perform poorly while those profiles are in
    force.

    An empty, circular observation. Very disappointing.

    That said ...



    GC systems - including their associated allocator(s) - are categorized (better word?) by their behavior. Unfortunately, behavior is
    described by a complex set of implementation choices.

    Understand that real world GC typically implement more than one
    algorithm, and often the algorithms are hybridized - derived from and relatable to published algorithms, but having unique mix of function
    that won't be found "as is" in any search of literature. [In truth,
    GC literature tends to leave a lot as exercise for the reader.]

    GC behavior often is adaptive, reacting to run time conditions: e.g.,
    based on memory fragmentation it could shift between non-moving
    mark/sweep and moving mark/compact. It may also employ differing
    algorithms simultaneously in different spaces, such as being
    conservative in stacks while being precise in dynamic heaps, or being stop-world in thread private heaps while being concurrent or parallel
    in shared heaps. Etc.


    Concurrent GC (aka incremental) runs as a co-routine with the mutator.
    These systems are distinguished by how they are triggered to run, and
    what bounds may be placed on their execution time. There are
    concurrent systems having completely deterministic operation [their
    actual execution time, of course, may depend on factors beyond the
    GC's control, such as multitasking, caching or paging.]

    Parallel GC may be both prioritized and scheduled. These systems may
    offer some guarantees about the percentage of (application) time given
    to collector vs mutator(s).


    Major choices:

    - precise or conservative?
    - moving or non-moving?
    - tracing (marking)?
    - copying / compacting?
    - stop-world, concurrent, or parallel?

    - single or multiple spaces?
    - semispaces?
    - generational?

    Minor choices:

    - software-only or hardware (MMU) assisted?
    - snapshot at beginning?

    - bump or block allocation?
    - allocation color?

    - free blocks coalesced? {if not compacting}

    - multiple mutators?
    - mutation color?
    - writable shared objects?
    - FROM-space mutation?
    - finalization?


    Note that all of these represent free dimensions in design. As
    mentioned above, any particular system may implement multiple
    collection algorithms each embodying a different set of choices.

    I'm familiar with many or perhaps most of the variations
    and techniques used in garbage collection. That isn't
    what I was asking about.

    You may wonder why I didn't mention "sweeping" ... essentially it is
    because sequential scan is more an implementation detail than a
    technique. Although "mark/sweep" is a well established technique, it
    is the marking (tracing) that really defines it. Then too, modern
    collectors often are neither mark/sweep nor copying as presented in textbooks: e.g., there are systems that mark and copy, systems that
    sweep and copy (without marking), and "copying" systems in which
    copies are logical and nothing actually changes address.

    Aside: all GC can be considered to use [logical] semispaces because
    all have the notion of segregated FROM-space and TO-space during
    collection. True semispaces are a set of (address) contiguous spaces
    - not necessarily equally sized - which are rotated as targets for new allocation. True semispaces do imply physical copying [but see the
    Treadmill for an example of "non-moving copy" using logical
    semispaces].



    So what do I consider to be the "kind" of a GC?

    The choices above pretty much define the extent of the design space
    [but note I did intentionally leave out reference counting]. However,
    the first 8 choices are structural, whereas the rest specify important characteristics but don't affect structure.

    A particular "kind" might be, e.g.,
    "precise, generational, multispace, non-moving, concurrent
    tracer".
    and so on.

    In effect what you are saying is that if we list all the possible
    attributes that a GC implementation might have, we can charactrize
    what kind of GC it is by giving its value for each attribute. Not
    really a helpful statement.

    I'm guessing this probably didn't really answer your question,

    Your comments didn't address either of my questions, nor as best
    I can tell even make an effort to do so.

    but it was fun to write. ;-)

    I see. Next time I'll know better.
    --- Synchronet 3.20a-Linux NewsLink 1.114