• Re: Interrupts in OoO

    From anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 13 15:20:37 2024
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    >Anton Ertl wrote:
    >>EricP <ThatWouldBeTelling@thevillage.com> writes:
    >>>That's difficult with a circular buffer for the instruction queue/ROB
    >>>as you can't edit the order.

    >>What's wrong with performing an asynchronous interrupt at the ROB
    >>level rather than inserting it at the decoder? Just stop committing at
    >>some point, record this as the interrupt return address and start
    >>decoding the interrupt code.

    >That's worse than a pipeline drain because you toss things you already
    >invested in, by fetch, decode, rename, schedule, and possibly execute.

    The question is what you want to optimize.

    Design simplicity? I think my approach wins here, too.
    Interrupt response latency? Use what I propose.
    Maximum throughput? Then follow your approach.

    The throughput issue is only relevant if you have lots of interrupts.

    >The way I saw it, the core continues to execute its current stream while
    >it prefetches the handler prologue into I$L1, then loads its fetch buffer.
    >At that point fetch injects a special INT_START uOp into the instruction
    >stream and switches to the handler. The INT_START uOp travels down the
    >pipeline following right behind the tail of the original stream.
    >If none of the flow-disrupting events occur in the original stream then
    >the handler just tucks in behind it. When INT_START hits retire, the core
    >sends the commit signal to the interrupt controller to confirm the hand-off.
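
    (If I read that right, the flow is roughly the following; this is only a
    toy restatement, and every name and helper in it is invented here for
    illustration, not taken from your design or any real core:)

    #include <stdbool.h>
    #include <stdint.h>

    enum uop_kind { UOP_NORMAL, UOP_INT_START };

    extern bool handler_prefetched(void);   /* prologue in I$ and fetch buffer? */
    extern void emit_uop(enum uop_kind k, uint64_t pc);
    extern void switch_fetch_to(uint64_t handler_pc);
    extern void irq_ctrl_confirm_handoff(void);

    /* fetch side: keep fetching the old stream until the handler is ready,
       then append the marker uOp and redirect fetch to the handler */
    void fetch_on_pending_irq(uint64_t old_pc, uint64_t handler_pc)
    {
        if (!handler_prefetched()) {
            emit_uop(UOP_NORMAL, old_pc);    /* original stream continues */
            return;
        }
        emit_uop(UOP_INT_START, old_pc);     /* marks the hand-off point */
        switch_fetch_to(handler_pc);         /* handler tucks in behind */
    }

    /* retire side: the hand-off is only confirmed to the interrupt
       controller once INT_START reaches the head of the ROB */
    void on_retire(enum uop_kind k)
    {
        if (k == UOP_INT_START)
            irq_ctrl_confirm_handoff();
    }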

    >The interrupt handler should start executing at the same time as it would
    >otherwise.

    Architecturally, an instruction is only executed when it
    commits/retires. Only then do I/O devices or other CPUs see any
    stores or I/O operations performed in the interrupt handler. With
    your approach, if there are long-latency instructions in the pipeline
    (say, dependence chains containing multiple cache misses) when the
    interrupt strikes, the instructions in your interrupt handler will
    have to wait until the preceding instructions retire, which can take
    thousands of cycles in the worst case.
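
    (To put made-up but plausible numbers on it: a chain of ten dependent
    loads that each miss to DRAM at ~300 cycles keeps the head of the ROB
    occupied for ~3000 cycles; an INT_START queued behind that chain cannot
    retire, and none of the handler's stores or I/O accesses can become
    visible, until the chain has drained.)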

    By contrast, if you treat an interrupt like a branch misprediction and
    cancel all the speculative work, the instructions of the interrupt
    handler go through the engine as fast as possible, and you get the
    minimum response latency possible in the engine.
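
    Concretely, I mean something like the following at the commit stage;
    this is a toy simulator-style sketch, with structure and names invented
    here for illustration rather than taken from any real core:

    #include <stdbool.h>
    #include <stdint.h>

    #define ROB_SIZE 128

    typedef struct { uint64_t pc; bool done; } RobEntry;
    typedef struct {
        RobEntry e[ROB_SIZE];
        unsigned head, tail;               /* head = oldest, next to commit */
    } Rob;

    extern bool     irq_pending(void);
    extern uint64_t irq_vector(void);
    extern void     save_return_pc(uint64_t pc);  /* architectural EPC/ELR */
    extern void     redirect_fetch(uint64_t pc);  /* steer the front end */

    void commit_stage(Rob *rob)
    {
        if (irq_pending() && rob->head != rob->tail) {
            /* stop committing: the oldest uncommitted instruction becomes
               the return address, everything younger is thrown away,
               exactly as on a branch mispredict */
            save_return_pc(rob->e[rob->head].pc);
            rob->tail = rob->head;                 /* empty the window */
            redirect_fetch(irq_vector());          /* start decoding the handler */
            return;
        }
        /* normal path: retire the oldest instruction once it has completed */
        if (rob->head != rob->tail && rob->e[rob->head].done)
            rob->head = (rob->head + 1) % ROB_SIZE;
    }

    The handler then has the whole machine to itself, which is where the
    response-latency advantage comes from.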

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.20a-Linux NewsLink 1.114