Processor Organization & Instruction Cycle


CHAPTER 5

Processor Organization & Instruction Cycle

Instruction Sets Review

Q. Consider the following assembly code

read:
    (a) MOV R1, 200       (R1 ← 200)
    (b) MOV R2, [R1]
    (c) MOV R3, [R1+1]
    (d) JMP calculate
calculate:
    (e) ADD R3, R2        (R3 ← R3 + R2)
    (f) MOV [300], R3

Memory
    Address   Content
    199       235
    200       420
    201       330
    202       0
    300       15

CPU registers
    R1   0
    R2   0
    R3   0

Instruction Sets Review

1. What types of instructions are used in the program?
2. What addressing modes are used in the program?
3. What will be the values of R1, R2 and R3 after the execution of the program?
4. Assume a processor has 12 registers (16 bits each) and an instruction set with 30 instructions. Show possible instruction formats for the following instructions (how many bits are required for each instruction field?)
    a. MOV R2, [R1]
    b. ADD R3, R2, R1
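
One way to check answers to questions 3 and 4 is to trace the program by hand or with a few lines of Python. The sketch below is not part of the original slides: the function names `run_program` and `bits_needed` are hypothetical, the addressing-mode labels in the comments are one reasonable reading of the code, and the field widths ignore any extra bits a real format might reserve for addressing modes.

```python
def run_program():
    # Initial state taken from the slide.
    memory = {199: 235, 200: 420, 201: 330, 202: 0, 300: 15}
    regs = {"R1": 0, "R2": 0, "R3": 0}

    regs["R1"] = 200                        # (a) MOV R1, 200     immediate operand
    regs["R2"] = memory[regs["R1"]]         # (b) MOV R2, [R1]    register indirect
    regs["R3"] = memory[regs["R1"] + 1]     # (c) MOV R3, [R1+1]  register + displacement
    # (d) JMP calculate: control simply continues at the label below
    regs["R3"] = regs["R3"] + regs["R2"]    # (e) ADD R3, R2      register operands
    memory[300] = regs["R3"]                # (f) MOV [300], R3   direct address
    return regs, memory


def bits_needed(n_choices):
    # Smallest number of bits that can encode n_choices distinct values.
    return (n_choices - 1).bit_length()


regs, memory = run_program()
opcode_bits = bits_needed(30)     # 30 instructions
register_bits = bits_needed(12)   # 12 registers

print(regs, memory[300])
print("MOV R2,[R1]   :", opcode_bits + 2 * register_bits, "bits (opcode + 2 register fields)")
print("ADD R3,R2,R1  :", opcode_bits + 3 * register_bits, "bits (opcode + 3 register fields)")
```

Under these assumptions the trace ends with R1 = 200, R2 = 420 and R3 = 750, and memory location 300 is overwritten with 750.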

Processor Organization

What is a processor (CPU) required to do?
Fetch and execute instructions

    Hardware used       Step                              Source / destination
    PC, IR              Fetch instruction                 From memory
    Decoding circuit    Interpret (decode) instruction
    MAR, MBR            [Fetch data]                      From memory, I/O
    ALU                 [Process data]
    MAR, MBR            [Write data]                      To memory, I/O

Processor Organization (cont'd)

CPU contains:
    Registers
        Internal processor memory
    ALU
        Performs arithmetic and logic operations (processes data)
        Operates only on data in registers
        The ALU, together with its inputs and outputs, is termed the data path
    Control Unit
        Decodes instructions and generates control signals to control the processor
    Internal Bus
        Interconnects the parts of the CPU

Register Organization

Types of registers
    User-visible registers
        Can be directly accessed (read or written) by programmers (instructions)
        Used to minimize memory references
    Control registers
        Used by the control unit to control the operation of the processor
    Status (flag) registers
        Indicate the current state (status) of the processor

There is no clean separation of registers into these categories (it depends on the processor)

User-visible registers

General purpose registers
    Can be used for a variety of functions (hold data, used for addressing)
Data registers
    Hold only data
    e.g. Accumulator (working) register, used to store intermediate ALU results
Address registers
    Only used for addressing
    e.g. Segment registers (SS, DS, CS and ES in x86)
         Index registers (SI, DI in x86)
         Stack pointer

Control registers

Program Counter (PC): Contains the address of the next instruction to be fetched
Instruction Register (IR): Temporarily holds the most recently fetched instruction
Memory Address Register (MAR): Specifies the address in memory of the word to be written from or read into the MBR
Memory Buffer Register (MBR): Contains a word to be stored in memory, or receives a word read from memory

Status registers

e.g. Flag register (x86), CPSR (ARM)

Flags: Indicate the occurrence of an event in the CPU
    Carry flag (CF), Zero flag (ZF), Sign flag (SF), Interrupt flag (IF), Overflow flag (OF)

Used by branch (jump) instructions and interrupts
    (The CPU checks the appropriate flags when a conditional branch instruction is encountered or when interrupts are enabled)

Instruction Cycle
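e.g. MOV R1, [200]

Memory
    Address   Content
    100       MOV R1, [200]
    200       10

Fetch cycle
    1. The CPU places the PC (100) on the address bus; memory selects address 100 (MOV R1, [200])
    2. Memory places MOV R1, [200] on the data bus; the CPU loads it into the IR

Execute cycle
    3. The decoder interprets the instruction and loads the operand address (200) into the MAR
    4. Fetch operand: the CPU places the MAR (200) on the address bus; memory selects address 200 (content 10)
    5. Fetch operand: memory places 10 on the data bus; the CPU loads it into the MBR
    6. The CPU copies the MBR (10) into R1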
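
The register transfers for this example can be mimicked in a few lines of Python. This is only an illustrative sketch, not the author's model: the dictionary layout, the tuple encoding of the instruction and the function name `instruction_cycle` are assumptions, and the PC is assumed to advance by one word after the fetch.

```python
def instruction_cycle(cpu, memory):
    # Fetch cycle: the PC supplies the address, the fetched word lands in the IR.
    cpu["IR"] = memory[cpu["PC"]]
    cpu["PC"] += 1                      # assumed: one memory word per instruction

    # Execute cycle: decode, fetch the operand through MAR/MBR, then write the register.
    opcode, dest, address = cpu["IR"]
    if opcode != "MOV":
        raise ValueError("this sketch only models MOV reg, [address]")
    cpu["MAR"] = address                # decoder loads the operand address into MAR
    cpu["MBR"] = memory[cpu["MAR"]]     # memory returns the operand into MBR
    cpu[dest] = cpu["MBR"]              # MBR is copied into the destination register
    return cpu


memory = {100: ("MOV", "R1", 200), 200: 10}
cpu = {"PC": 100, "IR": None, "MAR": None, "MBR": None, "R1": 0}
instruction_cycle(cpu, memory)
print(cpu)
```

After one cycle, MAR holds 200, MBR holds 10 and R1 holds 10, which matches the six steps listed above.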

Instruction Cycle with Interrupt
Instruction cycle
    1. Fetch Instruction
    2. Interpret (decode) Instruction
    3. [Fetch Data]
    4. [Process Data]
    5. [Write Data]
    6. Interrupt?
        No:  return to step 1
        Yes: Process Interrupt, then return to step 1

Process Interrupt
    1. Store PC in memory (stack)
    2. Load address of ISR into PC
    3. Execute interrupt routine (ISR)
    4. Restore PC from memory (stack)

Instruction Pipelining
In this lecture:

Pipelining
Pipelining hazards
    Resource hazards
    Data hazards
    Control hazards

Review


$\text{Execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock period}$
CPI: Average number of clock cycles per instruction

e.g. Suppose a program has 10 instructions with the following relationship between instructions and the clock cycles required to execute each instruction

    No. of instructions   Clock cycles
    4                     1
    3                     2
    3                     3

The CPI for this program is given by:
$\text{CPI} = \frac{4 \times 1 + 3 \times 2 + 3 \times 3}{10} = 1.9$
(10 instructions with 19 clock cycles)
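
The same weighted average can be computed programmatically. A minimal sketch, assuming the instruction mix is given as (number of instructions, clock cycles each) pairs; the name `average_cpi` is hypothetical.

```python
def average_cpi(mix):
    # mix: list of (number_of_instructions, clock_cycles_per_instruction) pairs
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles / total_instructions


mix = [(4, 1), (3, 2), (3, 3)]    # the 10-instruction program from the example
cpi = average_cpi(mix)
print(cpi)
```

For the mix in the table this gives 19 cycles over 10 instructions.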
Review

To reduce execution time:
    Reduce the clock period (increase the clock frequency)
        (Improves response time)
    Reduce CPI (execute more instructions with the same number of clock cycles)
        (Improves throughput)
        One approach to reducing CPI is to overlap the execution of instructions (pipelining)

Pipelining

The instruction cycle has several stages (fetch, decode, execute)
Let instructions execute one after the other
(assume one clock cycle per stage, i.e. 3 clock cycles per instruction)

    Clk             1       2       3       4       5       6       7       8       9
    Instruction 1   Fetch   Decode  Execute
    Instruction 2                           Fetch   Decode  Execute
    Instruction 3                                                   Fetch   Decode  Execute

9 clock cycles for 3 instructions, 3n clock cycles for n instructions

Pipelining (cont'd)

Let the instruction stages overlap
While instruction 1 is being decoded, instruction 2 is fetched, and so on

    Clk             1       2       3       4       5
    Instruction 1   Fetch   Decode  Execute
    Instruction 2           Fetch   Decode  Execute
    Instruction 3                   Fetch   Decode  Execute

5 clock cycles for 3 instructions (CPI is reduced)

Pipelining (cont'd)

Additional hardware is required for a pipelined processor (pipeline registers between the stages)

    PC -> Fetch (FI) -> [FI/DI registers] -> Decode (DI) -> [DI/EI registers] -> Execute (EI)

More stages

In practice the three stages may take different times (clock cycles): execution may take more time than decoding. This would reduce the effectiveness of the pipeline

    Fetch   Decode  Execute
    10 ns   10 ns   30 ns

The instruction currently being decoded has to wait until the previous instruction has been executed

Throughput is limited by the slowest stage

More stages (cont'd)

If we have more stages:
    The stages will be of more nearly equal duration
    Program execution time is reduced further
e.g. 5-stage pipeline

    Stage                  Abbreviation   Duration
    Fetch Instruction      FI             10 ns
    Decode Instruction     DI             10 ns
    Fetch Operands         FO             10 ns
    Execute Instruction    EI             10 ns
    Write Operand          WO             10 ns

Operands can be fetched from memory or from registers
Operands can be written to memory or to registers
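
To see why balanced stages matter, the two stage layouts can be compared numerically. The sketch below rests on assumptions not stated in the slides: the pipeline clock period is taken to be the duration of the slowest stage, an unpipelined instruction is taken to cost the sum of all stage durations, and the function names are hypothetical.

```python
def cycle_time(stage_times_ns):
    # The pipeline clock can be no faster than its slowest stage.
    return max(stage_times_ns)


def steady_state_speedup(stage_times_ns):
    # Unpipelined cost per instruction divided by the pipelined
    # cost per instruction once the pipeline is full.
    return sum(stage_times_ns) / cycle_time(stage_times_ns)


three_stage = [10, 10, 30]           # Fetch, Decode, Execute
five_stage = [10, 10, 10, 10, 10]    # FI, DI, FO, EI, WO

for name, stages in (("3-stage", three_stage), ("5-stage", five_stage)):
    print(name, cycle_time(stages), "ns per cycle,",
          round(steady_state_speedup(stages), 2), "x best-case speed up")
```

With the unbalanced 3-stage layout one instruction completes every 30 ns at best, while the balanced 5-stage layout completes one every 10 ns.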
5-stage Pipeline

Assume:
    All instructions require all five stages
    Equal duration for each stage

    Time   1    2    3    4    5    6    7
    I1     FI   DI   FO   EI   WO
    I2          FI   DI   FO   EI   WO
    I3               FI   DI   FO   EI   WO

Assuming one clock cycle per stage, 3 instructions would require 7 clock cycles
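
A timing diagram of this kind can be generated for any number of instructions. The following is a small sketch under the same two assumptions (every instruction uses every stage, one clock cycle per stage); `pipeline_rows` is a hypothetical helper, not something from the slides.

```python
STAGES = ["FI", "DI", "FO", "EI", "WO"]


def pipeline_rows(n_instructions, stages=STAGES):
    # Instruction i (0-based) enters the pipeline in cycle i, so stage s
    # of instruction i occupies column i + s of the diagram.
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(stages):
            row[i + s] = stage
        rows.append(row)
    return rows


rows = pipeline_rows(3)
print("Time " + " ".join(f"{t:<4}" for t in range(1, len(rows[0]) + 1)))
for i, row in enumerate(rows, start=1):
    print(f"I{i:<3} " + " ".join(f"{cell:<4}" for cell in row))
```

For three instructions each row has seven columns, one per clock cycle.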
Pipeline Performance

Assume an instruction goes through $k$ stages and each stage has a duration of $\tau$
Without pipelining, the execution time for $n$ instructions ($T$) will be:
    $T = n k \tau$
With pipelining:
    $T_{k,n} = [k + (n - 1)] \tau$

e.g. For $\tau = 1$, $k = 5$, $n = 10$
    $T = 5 \times 10 = 50$
    $T_{k,n} = 5 + 10 - 1 = 14$
    Speed up factor $= \frac{50}{14} = 3.57$
With pipelining the program is executed 3.57 times faster than without pipelining

Pipeline Performance (cont'd)

Speed up factor: $S_k = \frac{T}{T_{k,n}} = \frac{n k}{k + (n - 1)}$
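
These expressions are easy to evaluate for other values of k and n. A minimal sketch follows; the function names are hypothetical, and `tau` stands for the duration of one stage.

```python
def unpipelined_time(n, k, tau=1):
    # Each of the n instructions passes through all k stages, one after the other.
    return n * k * tau


def pipelined_time(n, k, tau=1):
    # k cycles to complete the first instruction, then one more cycle
    # for each of the remaining n - 1 instructions.
    return (k + (n - 1)) * tau


def speedup(n, k):
    # tau cancels, so the ratio depends only on n and k.
    return unpipelined_time(n, k) / pipelined_time(n, k)


print(unpipelined_time(10, 5), pipelined_time(10, 5), round(speedup(10, 5), 2))
print(round(speedup(1_000_000, 5), 4))
```

For k = 5 and n = 10 this reproduces 50, 14 and 3.57. For very long instruction sequences the speed up factor approaches k, the number of stages, but never exceeds it.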

Pipeline Hazards

Some things could go wrong in real pipelined execution
A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall (be idle) because conditions do not permit continued execution

Pipeline hazards:
    Resource (structural) hazards
    Data hazards
    Control hazards

Resource Hazards

Occur when two or more instructions that are already in the pipeline need the same resource
e.g. Memory access
Consider a 5-stage pipeline (each stage takes one cycle)

    Time   1    2    3    4    5    6    7
    I1     FI   DI   FO   EI   WO
    I2          FI   DI   FO   EI   WO
    I3               FI   DI   FO   EI   WO

[Figure: CPU connected to memory by a shared address bus and data bus, used for both instructions and data]

If the operand of the first instruction is to be fetched from memory at stage 3 (FO), a resource hazard occurs when the processor tries to fetch the third instruction in the same cycle (both operations need to use the same bus)

Resource Hazards (cont'd)

Therefore the fetch instruction stage of the pipeline must stall (be idle) for one cycle (one more clock cycle is required to execute the 3 instructions)

    Time   1    2    3    4    5    6    7    8
    I1     FI   DI   FO   EI   WO
    I2          FI   DI   FO   EI   WO
    I3               Idle FI   DI   FO   EI   WO

(Assume all other operands are in registers)

Another solution for resource hazards is to increase the available resources (e.g. have separate data and instruction memories with separate buses)

Data Hazards

Occur when one instruction depends on a data value produced by a preceding instruction
e.g. Initial register values: R1 = 0, R2 = 1, R3 = 2
    ADD R1,R2   (R1=1)
    ADD R3,R1   (R3=3)

    Time        1       2       3       4       5       6       7
    ADD R1,R2   FI      DI      FO      EI      WO
                                (R1=0)          (R1=1)
    ADD R3,R1           FI      DI      FO      EI      WO
                                        (R1=0)
                                FI      DI      FO      EI      WO

The wrong value of R1 is read: ADD R3,R1 fetches its operands (R1=0) in cycle 4, before ADD R1,R2 writes R1=1 in cycle 5

Data Hazards (cont'd)

Such a hazard is termed a read after write (RAW) hazard, since the current instruction must wait to read the data until after a previous instruction has written the correct data
    The hazard occurs if the read takes place before the write operation is complete
Other types of data hazards:
    Write after read (WAR)
    Write after write (WAW)
Approaches for handling data hazards:
    Avoid hazard
    Detect and stall
    Detect and forward

Data Hazards (cont'd)

Write after read (WAR) hazard
    The hazard occurs if the write takes place before a read operation is complete
    The next instruction modifies (writes) an operand before the current instruction uses (reads) it (the current instruction reads the wrong value)
    e.g. ADD R4,R1,R3   (R4=R1+R3)
         ADD R3,R1,R2   (R3=R1+R2)   If this happens first, a WAR hazard occurs

Write after write (WAW) hazard
    The hazard occurs if two writes to the same operand take place in the wrong order
    The next instruction modifies (writes) an operand before the current instruction modifies (writes) it (the operand is left holding the older value)

These hazards occur with multiple pipelines (superscalar processors)

Data Hazards (cont'd)

Avoid hazard
    Make sure there are no hazards in the code
    Put no-operation instructions between dependent instructions (programmer or compiler)
        ADD R1,R2
        NOP         (no operation)
        ADD R3,R1
Detect and stall (wait until the write operation is over)

    Time        1       2       3       4       5       6       7
    ADD R1,R2   FI      DI      FO      EI      WO
                                (R1=0)          (R1=1)
    ADD R3,R1           FI      DI      idle    idle    FO      EI
                                                        (R1=1)
                                                FI      DI      FO
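
The detect-and-stall rule can be expressed as a small scheduling calculation. This sketch is an illustration only, under assumptions that go beyond the slides: operands are read in FO and written in WO, a dependent instruction may perform FO no earlier than the cycle after the producer's WO, instructions stay in program order, and the name `schedule_fo` and the (destination, sources) encoding are hypothetical.

```python
def schedule_fo(instructions):
    # instructions: list of (destination_register, source_registers) in program order.
    # Returns the clock cycle in which each instruction performs its FO stage.
    fo_cycles = []
    last_write = {}                      # register -> cycle of the WO that writes it
    for i, (dest, sources) in enumerate(instructions):
        fo = i + 3                       # no-stall case: FI at i+1, DI at i+2, FO at i+3
        if fo_cycles:
            fo = max(fo, fo_cycles[-1] + 1)        # stay behind the previous instruction
        for src in sources:
            if src in last_write:
                fo = max(fo, last_write[src] + 1)  # wait until the write is over
        fo_cycles.append(fo)
        last_write[dest] = fo + 2        # EI at fo+1, WO at fo+2
    return fo_cycles


program = [
    ("R1", ("R1", "R2")),    # ADD R1,R2
    ("R3", ("R3", "R1")),    # ADD R3,R1 (reads R1 written by the previous instruction)
]
fo_cycles = schedule_fo(program)
delays = [fo - (i + 3) for i, fo in enumerate(fo_cycles)]
print(fo_cycles, delays)
```

For the two ADD instructions this places the second FO in cycle 6, i.e. two idle cycles, in line with the diagram above.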
Control Hazards

Arise from the need to make a decision based on the results of one instruction while others are executing
Occur with branch instructions
e.g.
    100:  JMP 200
          ADD R1,R2
    200:  SUB R1,R2

    Time        1       2       3       4       5
    JMP 200     FI      DI      FO      EI      WO
    ADD R1,R2           FI      DI      FO      EI

The wrong instruction (ADD R1,R2) is fetched before JMP 200 sets PC=200

Control Hazards (cont'd)

Approaches for handling control hazards
    Detect and stall
    Delayed branch
    Branch prediction

Zelalem Birhanu, AAiT
