For something like a bytecode interpreter, the prediction accuracy
of the jump predictor is going to be epically bad.
80% [ertl&gregg03jilp]. With one a non-shared indirect branch per VM instruction implementation (as in typical threaded-codeimplementations), we measured 50%-61% mispredictions for BTB with
mitchalsup@aol.com (MitchAlsup1) writes:
For something like a bytecode interpreter, the prediction accuracy
of the jump predictor is going to be epically bad.
And with "jump predictor", you probably mean the indirect-branch
predictor.
On hardware of up to around 2005 where the indirect branches were at
best predicted by BTBs with 2-bit counters, a "bytecode interpreter"
with one shared indirect branch (as generated by the usual for...{switch...{...}} idiom in C) results in misprediction rates of
80% [ertl&gregg03jilp].
With one a non-shared indirect branch per VM instruction implementation (as in typical threaded-code
implementations), we measured 50%-61% mispredictions for BTB with
2-bit counters in simulation [ertl&gregg03jilp] and similar results on
actual CPUs.
We simulated 2-level history-based indirect branch predictors [ertl&gregg03jilp], and they typically have less than 10%
mispredictions.
On Wed, 13 Nov 2024 8:20:27 +0000, Anton Ertl wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
For something like a bytecode interpreter, the prediction accuracy
of the jump predictor is going to be epically bad.
And with "jump predictor", you probably mean the indirect-branch
predictor.
On hardware of up to around 2005 where the indirect branches were at
best predicted by BTBs with 2-bit counters, a "bytecode interpreter"
with one shared indirect branch (as generated by the usual
for...{switch...{...}} idiom in C) results in misprediction rates of
80% [ertl&gregg03jilp].
That certainly qualifies as bad.
With one a non-shared indirect branch per VM
instruction implementation (as in typical threaded-code
implementations), we measured 50%-61% mispredictions for BTB with
2-bit counters in simulation [ertl&gregg03jilp] and similar results on
actual CPUs.
50% misprediction rate qualifies as bad.
We simulated 2-level history-based indirect branch predictors
[ertl&gregg03jilp], and they typically have less than 10%
mispredictions.
10% misprediction rate qualifies as "not so bad".
On 11/13/2024 1:36 PM, MitchAlsup1 wrote:
On Wed, 13 Nov 2024 8:20:27 +0000, Anton Ertl wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
For something like a bytecode interpreter, the prediction accuracy
of the jump predictor is going to be epically bad.
And with "jump predictor", you probably mean the indirect-branch
predictor.
On hardware of up to around 2005 where the indirect branches were at
best predicted by BTBs with 2-bit counters, a "bytecode interpreter"
with one shared indirect branch (as generated by the usual
for...{switch...{...}} idiom in C) results in misprediction rates of
80% [ertl&gregg03jilp].
That certainly qualifies as bad.
Can note that in my branch predictors, I ended up with a special case
for "predict the opposite of last time" to try to deal with branches
which alternated between being taken and not taken (which could get
greater than a 50% mispredict with the more traditional approaches, like
the 2-bit state model; the transitory states would allow it to adapt to these cases).
A more elaborate state machine could potentially predict simple RLE patterns, but designing the state-transition graph would be harder.
So, say:
000: Weakly Not Taken
001: Strong Not Taken
010: Weakly Transitory Not Taken
011: Strong Transitory Not Taken
100: Weakly Taken
101: Strong Taken
110: Weakly Transitory Taken
111: Strong Transitory Taken
Where the transitory states always flip-flop.
Had done better than a 2-bit state machine (with no transitory states). Adding additional "strength" levels did not help with accuracy in past attempts (actually seemed to make it slightly worse).
Could maybe augment this with a 2-bit periodic field:
00: Period 0, 0000, 1111, 1010, 0101
01: Period 1, 001001, 010010, 100100, -, 110110, 101101, 011011
1z: Period 2, 00010001, ..., 11101110, ...
Though, with the longer periods always assuming a "weak" state, here the pattern also needs to be able to express the phase of the pattern.
Best state transition diagram is less obvious, correct prediction likely jumps to the next stage of the predicted pattern, whereas a mispredict
jumps to a different pattern.
One other possibility being to use 5 or 6 bits of state and throw it at
a genetic algorithm and see what the genetic algorithm comes up with (probably using some variants of the existing state machine as an
initial seed guess).
Say, ranking each possible state machine based on how well it can
predict a series of simple repetitive bit patterns of various periods.
Tempting to consider using a "majority of 8" mutator, as in this case
one may want to limit mutation rate (and "majority of 3" might not be
stable enough, majority of 5 lacks a cheap way to evaluate, but
"majority of 8" allows representing each bit as a byte and using a
lookup table). Well, or maybe "majority of 7".
May try messing with it if anyone is interested.
...
With one a non-shared indirect branch per VM
instruction implementation (as in typical threaded-code
implementations), we measured 50%-61% mispredictions for BTB with
2-bit counters in simulation [ertl&gregg03jilp] and similar results on
actual CPUs.
50% misprediction rate qualifies as bad.
We simulated 2-level history-based indirect branch predictors
[ertl&gregg03jilp], and they typically have less than 10%
mispredictions.
10% misprediction rate qualifies as "not so bad".
Sysop: | DaiTengu |
---|---|
Location: | Appleton, WI |
Users: | 991 |
Nodes: | 10 (0 / 10) |
Uptime: | 119:51:15 |
Calls: | 12,958 |
Files: | 186,574 |
Messages: | 3,265,641 |