05 - Pipelining - Branch Prediction


Pipelining - Branch Prediction
CS/COE 1541 (term 2174)
Jarrett Billingsley
Class Announcements
● Quizzes/exams reweighted again (last time FOR REAL)
o Quizzes 5% each (15% total)
o Exams 15% each (45% total)
● Short lecture, then quiz
o We'll talk about branch prediction today
o Which will be on the quiz :)
● First exam next week
o Wednesday, February 1st
o There will be a study guide – probably Wednesday?
● No homework for this week since exam is next week!

1/23/2017 CS/COE 1541 term 2174 2


But first... Improving Branch Penalties



The problem
● Branches are based on comparisons. Which are done by...
● The ALU in the EX phase. Which means it takes how many cycles to
determine if a branch is taken?
● Three. So how many possibly-wrongly-fetched instructions are in
the pipeline behind the branch, which may need to be flushed?
● Two. But what if our pipeline were longer? What if there were 4
decode phases and 3 execute phases?
o ugh.
● Therefore, we want to determine the branch direction (whether or
not it's taken) as early as possible.
o What are some ways we could improve this situation?
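The counting argument on this slide can be sketched in a few lines of Python. The function name and the stage numbering are assumptions made for illustration; the deeper-pipeline case additionally assumes the branch resolves in the last execute stage.

```python
def flushed_on_mispredict(resolve_stage: int) -> int:
    """Instructions fetched behind a branch before its direction is known.

    resolve_stage is the 1-based pipeline stage in which the branch
    resolves. One younger instruction has entered the pipeline for every
    stage the branch has already passed through.
    """
    if resolve_stage < 1:
        raise ValueError("stages are numbered from 1")
    return resolve_stage - 1


# Classic 5-stage pipeline: IF=1, ID=2, EX=3. Branch resolves in EX.
five_stage = flushed_on_mispredict(3)        # 2 instructions to flush

# Hypothetical deeper pipeline: 1 fetch + 4 decode + 3 execute stages,
# assuming the branch resolves in the last execute stage.
deep_pipeline = flushed_on_mispredict(1 + 4 + 3)   # 7 instructions to flush
```

The penalty grows linearly with the depth at which the branch resolves, which is why resolving it earlier pays off.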



The solution: MORE SINKS!! (hardware)
● If we carefully design our instruction set (like the MIPS designers
did), we can determine the branch direction early in decoding!
● In what phase do we read the registers to be compared?
o The ID phase.
● So what hardware can we put in the ID phase to let us determine
the branch direction?
o A comparator!
● But... what's the downside to adding more hardware?
o Starting to notice a pattern, huh.
● So if we mispredict a branch in ID, how many instructions need to
be flushed?
o Just one – the one in IF!



Not all sunshine and rainbows
● Uh oh. Remember data hazards?
Time             0    1    2    3    4    5    6    7
sub t0,t1,t2     IF   ID   EX   MEM  WB
beq t0,$0,end         IF   ID   ID   EX   MEM  WB
                           ^ WAIT! (beq stalls one cycle in ID until sub's result exists)

● Now we've added a forwarding path from EX to ID.


● And as you can imagine, we need one from MEM to ID too.
● Doing things in more stages means more forwarding paths.
● Something to keep in mind as you add more pipeline stages!
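As a rough sketch of the stall rule implied by the diagram above: a branch that compares in ID needs its operands one stage earlier than an ALU instruction would. The function below counts the stall cycles, assuming the 5-stage pipeline with forwarding into ID. The names are made up for illustration, and the "load" case is an assumption that goes beyond what the slide shows.

```python
# Stage (1-based: IF=1, ID=2, EX=3, MEM=4, WB=5) at the END of which the
# producer's result exists and can be forwarded.
READY_AFTER_STAGE = {
    "alu": 3,   # e.g. sub: result is ready at the end of EX
    "load": 4,  # assumption: a load's data is ready at the end of MEM
}


def branch_stalls(producer: str, distance: int) -> int:
    """Stall cycles for a branch that compares its operands in ID.

    distance is how many instructions after the producer the branch sits
    (1 = immediately after). The branch can only do its comparison in a
    cycle after the one in which the producer finished computing.
    """
    if distance < 1:
        raise ValueError("the branch must come after the producer")
    ready = READY_AFTER_STAGE[producer]
    return max(0, ready - distance - 1)


# The slide's example: sub immediately followed by beq.
sub_then_beq = branch_stalls("alu", 1)   # 1 stall cycle, as in the diagram
```

With the same rule, a load immediately before the branch would cost two stall cycles, which is where the MEM-to-ID forwarding path comes in.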



Static Branch Prediction



The compiler can help
● In the following loop, what can you say about the blt instruction?

    for(s0 = 0 .. 100)
        print(s0);
    printf("done");

        li   s0, 0
    top:
        move a0, s0
        jal  print
        addi s0, s0, 1
        bltl s0, 100, top
        la   a0, done_msg
        jal  printf

In fact, older versions of MIPS had a special kind of branch for this: branch likely (high probability).



The old ways
● When MIPS was first designed, the idea was that the compiler
could do the instruction scheduling/branch prediction in advance.
● The regular branch instructions assumed the branch was NOT
taken, so the CPU would keep fetching instructions after them.
● The branch likely instructions assumed the branch WAS taken, so
the CPU would start fetching instructions at the branch target.
● And this can be pretty effective for some control structures!
o Unfortunately, not effective enough... branch likely instructions
are no longer part of the MIPS ISA.
o Many other "compiler-centric" features of MIPS have lost
relevance over the years as well, such as inserting NOP
instructions instead of forwarding/stalling dynamically.
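To see why a "likely" hint suits a loop branch, here is a small sketch that replays the counting loop from the previous slide and scores the two static guesses. The helper names are hypothetical; the loop bound of 100 is the one from the slide.

```python
def loop_branch_outcomes(limit: int = 100) -> list:
    """Outcomes of the loop's closing branch (True = taken).

    Mirrors: li s0, 0 / top: ... / addi s0, s0, 1 / blt s0, limit, top
    """
    outcomes = []
    s0 = 0
    while True:
        s0 += 1                 # addi s0, s0, 1
        taken = s0 < limit      # blt s0, limit, top
        outcomes.append(taken)
        if not taken:
            return outcomes


def static_accuracy(outcomes: list, predict_taken: bool) -> float:
    """Fraction of outcomes a fixed (static) guess gets right."""
    correct = sum(1 for taken in outcomes if taken == predict_taken)
    return correct / len(outcomes)


outcomes = loop_branch_outcomes(100)
likely = static_accuracy(outcomes, predict_taken=True)     # 0.99
regular = static_accuracy(outcomes, predict_taken=False)   # 0.01
```

The branch is taken 99 times and falls through once, so "assume taken" is wrong only on the final iteration.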



The CPU knows best
● Ultimately, for most programs, the compiler cannot statically
predict their behavior to an acceptable degree.
o Solving the halting problem yadda yadda...
● The CPU can dynamically analyze program behavior at runtime,
and adapt gracefully.
o Program behavior can change with user input after all!
● Implementing this analysis in hardware means the CPU architecture
can change drastically without changing compilers and old code.
o It can also allow unoptimized programs to run quickly.
● We'll be learning about several adaptive execution schemes,
starting with...



Dynamic Branch Prediction



The problem
● Some branches are taken 99% of the time, some 1% of the time,
some always, some never, some 50% of the time, some randomly...
● What we need is hardware that can keep track of:
o Where branch instructions are in the program
o The probability that each branch is taken
o The branch target address of each branch
Branch PC     Probability   Branch Target
0x007FA004    32%           0x007FA03C
0x007FA058    94%           0x007FA040
0x007FC380    88%           0x007FC398
0x007FC60C    12%           0x007FC704

Well, let's try to turn this into hardware...



Compromises
● How many branch instructions might there be in your program?
o How about in all programs running + the operating system?
● So how many entries should our prediction table have?
o One of those "try it and see" things – processor simulation is very
useful in these cases. Law of diminishing returns.
● If we have n entries in our table, how can we quickly look up
addresses in the table? (We're using this on every instruction!)
o Lots of comparators... lots of hardware.
o Hashing... but we could get false positives.
● If we predict incorrectly, what happens?
o Program runs a little slower, but nothing catastrophic.
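The "hashing... but we could get false positives" point can be illustrated with a tiny sketch. The hash function and the table size here are assumptions chosen for simplicity (real designs vary), and the second address is a made-up one that happens to collide.

```python
NUM_ENTRIES = 8  # hypothetical table size


def btb_index(pc: int) -> int:
    """Hypothetical hash: drop the two always-zero low bits of a
    word-aligned PC, then keep the low bits that fit the table."""
    return (pc >> 2) % NUM_ENTRIES


branch_pc = 0x007FA004   # a branch from the example table
other_pc = 0x007FA024    # made-up address exactly 8 instructions later

# Both addresses land in the same slot...
same_slot = btb_index(branch_pc) == btb_index(other_pc)   # True

# ...so a table that stores only a prediction per slot would apply the
# branch's prediction to the other instruction as well.
predicted_taken = [False] * NUM_ENTRIES
predicted_taken[btb_index(branch_pc)] = True
false_positive = predicted_taken[btb_index(other_pc)]      # True
```

Storing the full branch PC in each slot and comparing it on lookup is one way to rule this out, at the cost of one comparator instead of n.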



The Branch Target Buffer (BTB)

#   Branch PC    Pred.   Branch Target
0   0x007FA004   NT      0x007FA03C
1   0x007FC60C   NT      0x007FC704
2   0x007FA058   T       0x007FA040
3   ...          NT      ...
4   0x007FC380   T       0x007FC398
5   ...          T       ...
6   ...          NT      ...
7   ...          NT      ...

Lookup path: PC (0x007FA004) -> Hash -> entry 0; then check entry.PC ==? PC and Pred. = T?

    entry = Hash(PC)
    if(entry.PC == PC && entry.pred == T)
        NextPC = entry.target
    else
        NextPC = PC+4

The entry.PC ==? PC comparison is to avoid false positives on non-branch (or wrong branch) instructions!

When to read? When to write?
● Ideally, we'd like to start fetching instructions from the "correct"
place during the branch instruction's ID phase. When should we
check the BTB then?
o During IF! Wait, but how does it even know it's a branch...?
o Remember that the BTB checks that the instruction PC matches
the BTB entry, so it MUST be a branch instruction.*
● When do we write to the BTB? Well when do we have all the
information needed to fill in a BTB entry?
o After ID, when the branch target and direction have both been
computed. (nice optimization!)
o This also handles adding new entries – only written on branches.

*unless we have an incoherent instruction cache and dynamic code modification ;)
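Putting the read and write timing together, here is a minimal software sketch of a BTB with a single prediction bit per entry. The class and method names, the hash, and the table size are assumptions for illustration; this models the behavior described on the slides, not any particular CPU.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    pc: int               # full branch PC, used as a tag
    predict_taken: bool   # the single prediction bit
    target: int           # branch target address


class BranchTargetBuffer:
    def __init__(self, num_entries: int = 8):
        self.entries: List[Optional[BTBEntry]] = [None] * num_entries

    def _index(self, pc: int) -> int:
        # Hypothetical hash: word-aligned PC modulo the table size.
        return (pc >> 2) % len(self.entries)

    def next_pc(self, pc: int) -> int:
        """Read side, used during IF for every instruction."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry.pc == pc and entry.predict_taken:
            return entry.target
        return pc + 4

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Write side, called only for branches once ID has computed the
        direction and target. Also creates new entries."""
        self.entries[self._index(pc)] = BTBEntry(pc, taken, target)


btb = BranchTargetBuffer()
first_guess = btb.next_pc(0x007FA058)       # unknown branch: PC + 4
btb.update(0x007FA058, taken=True, target=0x007FA040)
second_guess = btb.next_pc(0x007FA058)      # now predicts the target
```

Because only branches ever call `update`, a PC match on the read side implies the instruction is a branch, which is the point made in the bullets above.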



Nobody's perfect
● What happens if, at the end of ID, we find our prediction is wrong?
o Flush and start fetching from correct PC.
o But now the BTB is updated with new info as well. (It's updated
even if we predicted correctly, too.)
● Let's make our predictions more accurate. The scheme we showed
here has only a single bit to predict taken/not taken.
o It's... not much information to go on.
o But adding more bits means more hardware.
o Let's strike a balance.
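To make "not much information to go on" concrete, this sketch counts the mispredictions of a single-bit, last-outcome predictor on a loop branch that runs several times. The function name, the initial prediction, and the 10-iteration loop pattern are assumptions for illustration.

```python
def one_bit_mispredictions(outcomes, initial_prediction=False):
    """Count mispredictions of a predictor that remembers only the most
    recent outcome of the branch (True = taken)."""
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken   # the single bit is simply overwritten
    return misses


# A loop branch: taken 9 times, then not taken once on exit.
one_run = [True] * 9 + [False]

# Execute that loop three times in a row.
misses = one_bit_mispredictions(one_run * 3)   # 6
```

Every run of the loop costs two mispredictions: one on exit, and another on re-entry because the exit flipped the bit.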



2-bit branch predictor
● We can use 2 bits with a state machine to make better predictions.

[Figure: state machine with four states: Strongly Not Taken, Weakly Not Taken, Weakly Taken, Strongly Taken. Green arrows = taken, red arrows = not taken.]

● The hysteresis (have to make two mistakes before switching decisions) allows for intermittent changes in branch behavior.
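One common way to realize a 2-bit predictor is a saturating counter, sketched below. This is an assumption: the exact arrows of the slide's state machine may differ from this variant, and the class name, starting state, and test pattern are made up for illustration.

```python
STATE_NAMES = [
    "Strongly Not Taken",   # 0
    "Weakly Not Taken",     # 1
    "Weakly Taken",         # 2
    "Strongly Taken",       # 3
]


class TwoBitCounter:
    """2-bit saturating counter: states 0..3, predict taken when >= 2."""

    def __init__(self, state: int = 1):
        if not 0 <= state <= 3:
            raise ValueError("state must be in 0..3")
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at the ends.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, counter: TwoBitCounter) -> int:
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# A loop branch (taken 9 times, then not taken), executed three times.
pattern = ([True] * 9 + [False]) * 3
misses = count_mispredictions(pattern, TwoBitCounter(state=1))   # 4
```

After warming up, the single not-taken outcome at loop exit only drops the counter to Weakly Taken, so the next run of the loop is predicted correctly from its first iteration.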



3 bits? 4? 10?
● Is it worth adding more bits to the prediction probability?
● Empirically... not really. 2 bits + large number of BTB entries gets
you ~93% accuracy!
● More bits don't help because branch behavior can be complex.
● What does help with prediction accuracy is more complex branch
prediction methods.
o Two-level adaptive predictors...
o Tournament predictors...
o Hybrid predictors...
o Loop detection...
o Return stack buffers...
o Oh and then there are indirect jumps (jr) which can be a whole
different kind of pain to deal with.



2-level adaptive predictors
● A common technique today: each entry in the BTB has multiple 2-bit counters, selected among by using a branch history.

#   Branch PC    History   Branch Target
0   0x007FA004   010       0x007FA03C

Counters for this entry (top to bottom): 01, 10, 00, 00, 01, 11, 11, 00
Every entry in the BTB has its own set of 8 counters!

● Every time the branch is taken/not taken, a 1/0 is shifted into the history on the right side.
● This way, the history keeps track of the last three times we encountered this branch.
● This kind of predictor can reach 97% accuracy!
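A minimal sketch of one such BTB entry: a 3-bit history selects one of 8 two-bit counters. The class name, the saturating-counter update rule, the initial counter values, and the alternating test pattern are all assumptions for illustration.

```python
class TwoLevelEntry:
    """One BTB entry's predictor: 3-bit history picks 1 of 8 counters."""

    HISTORY_BITS = 3

    def __init__(self):
        self.history = 0                                # last 3 outcomes
        self.counters = [1] * (1 << self.HISTORY_BITS)  # all weakly not taken

    def predict(self) -> bool:
        return self.counters[self.history] >= 2

    def update(self, taken: bool) -> None:
        # Train the counter that was selected by the current history.
        value = self.counters[self.history]
        if taken:
            self.counters[self.history] = min(3, value + 1)
        else:
            self.counters[self.history] = max(0, value - 1)
        # Shift the outcome (1 = taken, 0 = not taken) in on the right.
        mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(outcomes, entry: TwoLevelEntry) -> int:
    misses = 0
    for taken in outcomes:
        if entry.predict() != taken:
            misses += 1
        entry.update(taken)
    return misses


# A branch that strictly alternates: taken, not taken, taken, ...
alternating = [True, False] * 20
misses = count_mispredictions(alternating, TwoLevelEntry())   # 2
```

A lone 2-bit counter cannot track an alternating branch well, but here each recurring history (010 and 101) gets its own counter, so after two early misses the pattern is predicted correctly.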



Return stack buffers
● A function return is a special kind of indirect branch. jr $ra on MIPS and ret on x86 both get the address from somewhere else.
● Since functions return to where they were called virtually every time, it makes sense to cache the return address on function calls.

    4AB33C  jal someFunc
    4AB340  beq v0, $0, blah
            ...
    someFunc:
            ...
            jr $ra

● When we encounter the jal, push the return address.
● When we encounter the jr $ra, pop the return address. Easy!
● Stack overflows aren't an issue. This is just a prediction, after all.

Return stack buffer contents: 40CC00, 46280C, 4AB108, 4AB340 (pushed by the jal), then unused slots (000000)
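The push/pop behavior can be sketched as a small fixed-capacity stack. The class name, the capacity of 8, and the policy of dropping the oldest entry on overflow are assumptions for illustration. As on the slide, the return address is taken to be the instruction right after the jal.

```python
from typing import List, Optional


class ReturnStackBuffer:
    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.stack: List[int] = []

    def on_call(self, jal_pc: int) -> None:
        """Seeing a jal: push the predicted return address."""
        if len(self.stack) == self.capacity:
            # Overflow is harmless: drop the oldest prediction.
            self.stack.pop(0)
        self.stack.append(jal_pc + 4)

    def on_return(self) -> Optional[int]:
        """Seeing a jr $ra: pop the predicted return address.
        Returns None when there is nothing to predict from."""
        return self.stack.pop() if self.stack else None


rsb = ReturnStackBuffer()
rsb.on_call(0x4AB33C)             # jal someFunc
predicted = rsb.on_return()       # jr $ra -> predicts 0x4AB340
```

If the buffer overflows or underflows, the worst case is a wrong or missing prediction, which costs a flush but never correctness.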

