Investigating Instruction Pipelining
Introduction
Objectives
At the end of this lab you should be able to:
▪ Demonstrate the difference between pipelined and sequential
processing of the CPU instructions
▪ Explain pipeline data dependencies and data hazards
▪ Describe a pipeline technique to eliminate data hazards
▪ Demonstrate the benefits of the compiler’s “loop unrolling” optimization
for instruction pipelining
▪ Describe how a compiler re-arranges instructions to minimize data
dependencies
▪ Show the use of a jump-predict table for pipeline optimisation
Basic Theory
Modern CPUs incorporate instruction pipelines, which can process different
stages of several multi-stage instructions in parallel, thus improving the
overall performance of the CPU. However, most programs include
instructions that do not readily lend themselves to smooth pipelining, thus
causing pipeline hazards and effectively reducing CPU performance. As a
result, CPU pipelines are designed with some tricks up their sleeves for
dealing with these hazards.
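As a rough illustration of why pipelining helps, the sketch below compares idealised clock counts for sequential and pipelined execution. It assumes a hazard-free pipeline in which every stage takes exactly one clock. The function names are hypothetical, and the simulator may calculate its CPI and SF statistics differently.

```python
def sequential_clocks(instructions: int, stages: int) -> int:
    """Clocks needed when each instruction finishes all stages before the next starts."""
    return instructions * stages


def pipelined_clocks(instructions: int, stages: int) -> int:
    """Clocks needed in an ideal pipeline with no hazards or stalls (assumption)."""
    if instructions == 0:
        return 0
    # The first instruction needs `stages` clocks to fill the pipeline;
    # each later instruction then completes one clock after the previous one.
    return stages + (instructions - 1)


def cpi(clocks: int, instructions: int) -> float:
    """Average clocks per instruction."""
    return clocks / instructions


def speedup(instructions: int, stages: int) -> float:
    """Ratio of sequential to pipelined clocks under the ideal model above."""
    return sequential_clocks(instructions, stages) / pipelined_clocks(instructions, stages)


if __name__ == "__main__":
    n, k = 100, 5  # 100 instructions, five pipeline stages
    print("sequential clocks:", sequential_clocks(n, k))
    print("pipelined clocks: ", pipelined_clocks(n, k))
    print("pipelined CPI:    ", cpi(pipelined_clocks(n, k), n))
    print("speed-up:         ", speedup(n, k))
```

For example, with five stages and 100 instructions this model gives 500 sequential clocks against 104 pipelined clocks. Real programs fall short of this ideal because of the hazards investigated in the exercises.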
Exercise 1 – Difference between the sequential and the pipelined execution
of CPU instructions
Enter the following source code, compile it and load it into the simulator’s memory:
program Ex1
for n = 1 to 20
p = p + 1
next
end
Open the CPU pipeline window by clicking on the SHOW PIPELINE… button in
the CPU simulator’s window. You should now see the Instruction Pipeline
window. This window simulates the behaviour of a CPU pipeline. Here we can
observe the different stages of the pipeline as program instructions are
processed. This pipeline has five stages. The stages are colour-coded as
shown in the key for the “Pipeline Stages”.
List the names of the stages here:
The instructions that are being pipelined are listed on the left side (in white
text boxes). The newest instruction in the pipeline is at the bottom and the
oldest at the top. You’ll see this when you run the instructions. The horizontal
yellowish boxes display the stages of an instruction as it progresses through
the pipeline. At the bottom left corner pipeline statistics are displayed as the
instructions are executed.
Check the box titled Stay on top and make sure the No instruction pipeline
check box is selected. In the CPU simulator window bring the speed slider
down to a reading of around 30. Run the program and observe the pipeline.
Wait for the program to complete. Now make a note of the following values:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Next, uncheck the No instruction pipeline checkbox, reset and run the above
program again and wait for it to complete.
Note down your observations on how the pipeline’s visual behaviour differed:
Now once again make a note of the following values:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Briefly explain why you think there is a difference in the two sets of values:
this check the box titled Enable operand forwarding and run the above code
again.
Has the bubble seen in Exercise 2 disappeared (or burst!)?
Make a note of the size of the code generated for Ex4_1 here:
Now, load this code into the CPU simulator’s memory.
Next, make sure the optimization option Loop Unrolling is selected in
addition to the option Redundant Code optimization. Change the program
name to Ex4_2 and compile it again. Load this code in memory too. So, now
you should have two versions of the code: Ex4_1 without “loop unrolling”
optimization and Ex4_2 with “loop unrolling” optimization.
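To give a feel for what “loop unrolling” does, the sketch below writes a simple counting loop in two ways: as an ordinary loop and as a loop unrolled by a factor of four. It is an illustration only. The unroll factor and function names are assumptions, and the simulator’s compiler may unroll the loop differently (for instance, completely). Each function also counts how many times the loop-control test and jump would be executed, since these are the instructions that disturb the pipeline.

```python
def rolled(n: int) -> tuple[int, int]:
    """Ordinary loop: one loop-control jump per increment."""
    p = 0
    jumps = 0
    for _ in range(n):
        p = p + 1
        jumps += 1  # one loop-control jump per iteration
    return p, jumps


def unrolled_by_4(n: int) -> tuple[int, int]:
    """Same result, but the loop body is repeated four times per iteration."""
    p = 0
    jumps = 0
    full, rest = divmod(n, 4)
    for _ in range(full):
        p = p + 1
        p = p + 1
        p = p + 1
        p = p + 1
        jumps += 1  # one loop-control jump per four increments
    for _ in range(rest):  # clean-up loop when n is not a multiple of 4
        p = p + 1
        jumps += 1
    return p, jumps


if __name__ == "__main__":
    print(rolled(20))         # result and loop-control jump count
    print(unrolled_by_4(20))  # same result, fewer loop-control jumps
```

The unrolled version is longer as source code but executes fewer loop-control instructions, which is the trade-off to look for when comparing code size and instruction counts.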
Make a note of the size of the code generated for Ex4_2 here:
Make sure the pipeline window stays on top. Also make sure the Enable
operand forwarding and Enable jump prediction boxes are both unchecked.
First, select program Ex4_1 from the PROGRAM LIST frame in the CPU
simulator window then click the RESET button. Make sure the speed of
simulation is set at maximum. Now click the RUN button to run program
Ex4_1. Observe the pipeline and when the program is finished make a note of
the following values:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
No. of instructions executed
Do the same with program Ex4_2 and make a note of the following values:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
No. of instructions executed
Exercise 5 – Compiler re-arranging instruction sequence to help minimize
data dependencies
The optimization in Exercise 4 is one example of how a modern compiler can
provide support for the CPU pipeline. Another example is when the compiler
re-arranges the code without changing the logic of the code. This is done to
minimize pipeline hazards such as the “data hazard” we studied in Exercise 3.
Here we demonstrate this technique.
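The idea can be sketched in a few lines of Python. The model below is a deliberate simplification and not the simulator’s algorithm: each statement is reduced to the variable it writes and the variables it reads, and a hazard is counted only when a statement reads a value written by the statement immediately before it. The variable names and the one-statement hazard window are assumptions made for illustration.

```python
from typing import List, Tuple

# A statement is modelled as (destination, sources), e.g. y = x -> ("y", ("x",))
Statement = Tuple[str, Tuple[str, ...]]


def adjacent_raw_hazards(program: List[Statement]) -> int:
    """Count read-after-write dependencies between neighbouring statements."""
    count = 0
    for prev, curr in zip(program, program[1:]):
        prev_dest, _ = prev
        _, curr_sources = curr
        if prev_dest in curr_sources:
            count += 1
    return count


# x = 1; y = x; z = 2   (y = x reads x immediately after it is written)
original: List[Statement] = [("x", ()), ("y", ("x",)), ("z", ())]

# x = 1; z = 2; y = x   (the independent statement is moved in between)
reordered: List[Statement] = [("x", ()), ("z", ()), ("y", ("x",))]

if __name__ == "__main__":
    print("hazards in original order: ", adjacent_raw_hazards(original))
    print("hazards in reordered code: ", adjacent_raw_hazards(reordered))
```

Both orderings leave x, y and z with the same final values, because z = 2 does not depend on either of the other statements. Only the distance between the write to x and the read of x changes.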
Make sure the Show dependencies check box is checked and ONLY the
Redundant Code optimization is selected. Enter the following source code,
compile it and load it into memory:
program Ex5_1
a = 1
b = a
c = 2
end
Copy the generated CPU instruction sequence below (do not include the
instruction addresses):
How do the two sequences differ? Does the change affect the logic of the
program? Briefly explain the rationale for the change:
Let’s see if we can measure any improvement introduced by this “out of
sequence execution” method.
First, reset and run program Ex5_1 and make a note of the values below:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Next, reset and run program Ex5_2 and make a note of the values below:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Do you see any improvement in program Ex5_2 over program Ex5_1 (express
this as a percentage)?
Run the program and make a note of the following pipeline stats:
Now, in the pipeline window select the Enable jump prediction check box.
Reset the program and run it again. Make a note of the following pipeline
stats:
Click on the SHOW JUMP TABLE… button. The Jump Predict Table window should
now be displayed. This table keeps an entry for each conditional jump
instruction. Each entry has the following fields. Can you suggest what each
field stands for? Enter your suggestions in the table below:
JInstAddr
JTarget
PStat
Count
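For background, the sketch below shows one common textbook approach to jump prediction, a two-bit saturating counter kept per jump address. It is not a description of the simulator’s Jump Predict Table, and its names do not correspond to the fields listed above. The class name, the starting state and the helper function are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Generic two-bit saturating-counter predictor (textbook scheme)."""

    def __init__(self) -> None:
        # jump address -> counter state 0..3; states 2 and 3 predict "taken"
        self.counters: dict[int, int] = {}

    def predict(self, address: int) -> bool:
        # Unseen jumps start in state 1, "weakly not taken" (assumption)
        return self.counters.get(address, 1) >= 2

    def update(self, address: int, taken: bool) -> None:
        state = self.counters.get(address, 1)
        # Move one step towards the observed outcome, saturating at 0 and 3
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        self.counters[address] = state


def count_mispredictions(outcomes: list[bool], address: int = 0) -> int:
    """Run one jump's outcome history through the predictor and count misses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


if __name__ == "__main__":
    # A loop-closing jump that is taken 19 times and then falls through
    loop_jump = [True] * 19 + [False]
    print("mispredictions:", count_mispredictions(loop_jump))
```

Under this scheme a loop-closing jump that is taken 19 times and then falls through is mispredicted twice, once on the first iteration and once on loop exit. Without prediction, every taken jump would disturb the pipeline.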