ELECH473 Th03
6. Pipeline execution
1. Basic computer architecture concepts
[Figure: CPU, CTRL, Memory]
(exclusive), depending on the control logic (seen from CPU) – single-ported memory – only one address/data couple is accessed at a time (1 clock cycle)
control signals to steer their operation
[Figure: CPU datapath with PC, Address, Instructions, IR, Decoder, Ctrl, Register File (RF), Register A, Register B, ALU, Acc, and Data Memory]
RISC16 used in labs ... or x86
Not that much different!
[Figure: ADD datapath with Program Counter (+1), INSTRUCTION MEMORY, register file ports SRC1 and SRC2 with write enable (WE), ADD, and CONTROL]
This figure illustrates the flow-of-control for the ADD instruction. All three ports of the register file are used, and the write-enable bit (WE) is set for the register file. The ALU control signal is a simple ADD function.
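The ADD flow described in the caption can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the RISC16 implementation: the 8-entry register file, the 16-bit word width, the `RegisterFile` class and the `execute_add` helper are all hypothetical names introduced for illustration.

```python
WORD_MASK = 0xFFFF  # assumption: 16-bit data words


class RegisterFile:
    """Hypothetical register file with two read ports and one write port."""

    def __init__(self, size=8):  # assumption: 8 registers
        self.regs = [0] * size

    def read(self, src1, src2):
        # The two read ports deliver both source operands at once.
        return self.regs[src1], self.regs[src2]

    def write(self, dest, value, we):
        # The write port only updates the register when WE is set.
        if we:
            self.regs[dest] = value & WORD_MASK


def execute_add(rf, pc, dest, src1, src2):
    """Perform one ADD and return the next program counter (PC + 1)."""
    a, b = rf.read(src1, src2)        # read ports: SRC1 and SRC2
    result = (a + b) & WORD_MASK      # ALU control signal selects ADD
    rf.write(dest, result, we=True)   # write port, WE set
    return pc + 1


rf = RegisterFile()
rf.regs[1], rf.regs[2] = 5, 7
next_pc = execute_add(rf, pc=0, dest=3, src1=1, src2=2)
print(rf.regs[3], next_pc)
```

All three register-file ports are exercised in one call: two reads for the operands and one write for the result, which is the point the caption makes.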
3. Instruction execution cycle
[Figure: the datapath (PC, Address, Instructions, IR, Decoder, Ctrl, Register File, Register A, Register B, ALU, Acc, Data Memory) repeated over several panels, with numbered markers 1, 2, 3 for the steps of the instruction execution cycle]
$$t_{CPU} = N_{Inst} \times CPI \times \frac{1}{F_{Clk}}$$
where $N_{Inst}$ is the number of instructions in the program, CPI is the number of cycles required to execute one instruction (on average) and $F_{Clk}$ is the CPU clock frequency
• Having the above in mind:
. RISC → increase $N_{Inst}$, but reduce CPI and allow a higher $F_{Clk}$, since simple instructions need less complex logic
. CISC → do the opposite: they decrease $N_{Inst}$, but optimized functions are more complex in silicon, which raises CPI and lowers $F_{Clk}$, and CPI could be variable for different instructions!
• Which one reduces $t_{CPU}$ will be algorithm, program, compiler & architecture dependent, so no quick judgments here
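The execution-time formula can be evaluated for two design points to see how the three factors trade off. This is only a sketch: the `cpu_time` helper and every number below are hypothetical, chosen for illustration and not measured on any real CPU.

```python
def cpu_time(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: N_Inst x CPI x (1 / F_Clk)."""
    return n_inst * cpi / f_clk_hz


# Hypothetical design points (assumptions, not measurements).
risc_like = cpu_time(n_inst=1_500_000, cpi=1.2, f_clk_hz=2.0e9)
cisc_like = cpu_time(n_inst=1_000_000, cpi=2.5, f_clk_hz=1.5e9)

print(f"RISC-like: {risc_like * 1e3:.3f} ms")
print(f"CISC-like: {cisc_like * 1e3:.3f} ms")
```

With a different instruction mix or a different CPI the ranking can flip, which is why the comparison depends on the algorithm, program, compiler and architecture.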
[Figure: one instruction split into the stages F, D, Ex, W between t0 and t4; two instructions i0 and i1, each with stages F, D, Ex, W, on a time axis t0 to t7]
• But this doesn’t change things: the sum of the delays remains the same ...
Reducing the critical path delay 2/3
• To really gain, for each wire connecting 2 sub-circuits we insert a Flip-Flop (FF) – a synchronous memory element driven by the rising Clk edge
• FFs will store the result of the operation of SmallCircuit1:
[Figure: SmallCircuit1 and SmallCircuit2 with D flip-flops (D, Q, Clk) on the wires between them; further panels show additional columns of D flip-flops around the small circuits]
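The gain from inserting flip-flops can be checked numerically. The sketch below assumes hypothetical delays for the two small circuits and a hypothetical flip-flop overhead; the function names are illustrative only. Without flip-flops the clock period must cover the sum of the delays, whereas with them it only has to cover the slowest sub-circuit plus the flip-flop overhead.

```python
def min_period_unregistered(delays_ns):
    # Combinational chain: the signal crosses every sub-circuit in one cycle.
    return sum(delays_ns)


def min_period_registered(delays_ns, ff_overhead_ns):
    # Flip-flops between sub-circuits: each cycle covers one sub-circuit only.
    return max(delays_ns) + ff_overhead_ns


delays = [4.0, 6.0]  # assumption: SmallCircuit1 and SmallCircuit2 delays in ns
ff_overhead = 0.5    # assumption: flip-flop timing overhead in ns

t_before = min_period_unregistered(delays)
t_after = min_period_registered(delays, ff_overhead)

# A period in ns converts to a frequency in MHz as 1000 / period.
print(f"max clock without FFs: {1e3 / t_before:.1f} MHz")
print(f"max clock with FFs:    {1e3 / t_after:.1f} MHz")
```

The total delay through both circuits is unchanged; only the longest path between two clocked elements gets shorter, and that path is what limits the clock frequency.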
[Figure: pipelined execution of instructions i0 to i3, each passing through the stages F, D, Ex, W, with successive instructions overlapped along the time axis (t0 to t7)]
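The staggered layout of a pipelined timing diagram can be regenerated with a short script. This sketch assumes an ideal pipeline: every stage takes one cycle, a new instruction enters each cycle, and there are no stalls. The `pipeline_rows` helper is a hypothetical name used only here.

```python
STAGES = ["F", "D", "Ex", "W"]


def pipeline_rows(n_instructions, stages=STAGES):
    """Return one text row per instruction, shifted by one cycle each."""
    rows = []
    for i in range(n_instructions):
        # Instruction i starts i cycles late, so pad with i empty cells.
        cells = ["  "] * i + [s.ljust(2) for s in stages]
        rows.append(f"i{i} " + " ".join(cells))
    return rows


# Print the latest instruction first, as in the slide layout.
for row in reversed(pipeline_rows(4)):
    print(row.rstrip())
```

Each row is shifted one column to the right of the previous instruction, so in any column up to four instructions are active in different stages.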
$$C_{sibm} = N \times n$$
. for an n-stage pipeline CPU:
$$C_{pipe} = n + N - 1$$
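The two cycle counts can be compared directly. The sketch assumes that N is the number of instructions, n the number of pipeline stages, and that each stage takes one cycle. It also assumes the first formula counts the cycles when instructions run strictly one after another, since the text does not label it. The function names are hypothetical.

```python
def cycles_non_pipelined(n_instructions, n_stages):
    # Assumption: each instruction uses all n stages before the next starts.
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    # n cycles to fill the pipeline, then one instruction completes per cycle.
    return n_stages + n_instructions - 1


for n_instructions in (1, 4, 1000):
    seq = cycles_non_pipelined(n_instructions, 4)
    pipe = cycles_pipelined(n_instructions, 4)
    print(n_instructions, seq, pipe, round(seq / pipe, 2))
```

For a single instruction the pipeline gives no gain, while for long instruction streams the ratio approaches the number of stages.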
(1) Place the tools and the men in the sequence of the operation so that each
component part shall travel the least possible distance while in the process of finishing.
(2) Use work slides or some other form of carrier so that when a workman completes
his operation, he drops the part always in the same place—which place must always
be the most convenient place to his hand—and if possible have gravity carry the part to
the next workman for his own operation.
(3) Use sliding assembling lines by which the parts to be assembled are delivered at
convenient distances. 1913
[Figure labels: DECODE, EXECUTE, WRITE]