COA Midsem
Lecture1: Introduction to
computers
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Introduction
• Computers have become part and parcel of our daily lives.
• They are everywhere
• Laptops, tablets, mobile phones, intelligent appliances.
• It is required to understand how a computer works.
• What is there inside a computer?
• How does it work?
• We distinguish between two terms: Computer Architecture and
Computer Organization.
• Computer Organization:
• Design of the components and functional units using which
computer systems are built.
• Analogy: civil engineer’s task during building construction
(cement, bricks, iron rods, and other building materials).
• Computer Architecture:
• How to integrate the functional units to build a computer system
to achieve a desired level of performance.
• Analogy: architect’s task during the planning of a building (overall
layout, floorplan, etc.).
Functional Units
• Processor: Arithmetic & Logic Unit and Control Unit
• Memory: Main Memory and Secondary Memory
• Input and Output units
Inside the Processor
• Also called Central Processing Unit (CPU).
• Consists of a Control Unit and an Arithmetic Logic Unit (ALU).
• All calculations happen inside the ALU.
• The Control Unit generates a sequence of control signals to carry out all
operations.
• The processor fetches an instruction from memory for execution.
• An instruction specifies the exact operation to be carried out.
• It also specifies the data that are to be operated on.
• A program refers to a set of instructions that are required to carry out
some specific task (e.g. sorting a set of numbers).
Role of ALU
• It contains several registers, some general-purpose and some
special purpose, for temporary storage of data.
• It contains circuitry to carry out logic operations, like AND, OR, NOT,
shift, compare, etc.
• It contains circuitry to carry out arithmetic operations like addition,
subtraction, multiplication, division, etc.
• During instruction execution, the data (operands) are brought in
and stored in some registers, the desired operation carried out,
and the result stored back in some register or memory.
Role of Control Unit
• Acts as the nerve center that senses the states of various functional
units and sends control signals to control their states.
ADD R1, R2, R3    (R1 ← R2 + R3)
Opcode: ADD;  Operands: R1, R2, R3
• Enable the outputs of registers R2 and R3.
• Select the addition operation.
• Store the output into register R1.
• When an instruction is fetched from memory, the operation (called
opcode) is decoded by the control unit, and the control signals
issued.
Inside the Memory Unit
• Two main types of memory subsystems.
• Primary or Main memory, which stores the active instructions
and data for the program being executed on the processor.
• Secondary memory, which is used as a backup and stores all
active and inactive programs and data, typically as files.
• The processor only has direct access to the primary memory.
• All instructions and data are stored in memory.
• Instructions and the required data are brought into main memory
for execution. This is known as the stored program concept,
also called the von Neumann architecture.
Input Unit
• Used to feed data to the computer system from the external
environment.
• Data are transferred to the processor/memory after appropriate
encoding.
• Common input devices:
• Keyboard
• Mouse
• Joystick
• Camera
Output Unit
• Used to send the result of some computation to the outside
world.
• Common output devices:
• LCD/LED screen
• Printer
• Speaker / Buzzer
• Projection system
Source: https://www.insidemylaptop.com/complete-disassembly-guide-for-dell-inspiron-1545-laptop/
Basic operation of a
Computer
Special Purpose Registers for Interfacing with Main Memory
• Two special-purpose registers are used:
• Memory Address Register (MAR): Holds the address of the memory location to be accessed.
• Memory Data Register (MDR): Holds the data being written into memory, or receives the data being read out from memory.
• Memory is considered as a linear array of storage locations (bytes or words), each with a unique address.
[Diagram: memory as an array of locations with addresses 0, 1, 2, 3, 4, …, 1023]
[Diagram: Processor connected to Main Memory through MAR (address lines) and MDR (data lines)]
Control Signals
• To read data from memory
a) Load the memory address into MAR.
b) Issue the control signal READ.
c) The data read from the memory is stored into MDR.
• To write data into memory
a) Load the memory address into MAR.
b) Load the data to be written into MDR.
c) Issue the control signal WRITE.
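The read and write sequences above can be sketched as a small simulation. This is a minimal Python sketch, not real hardware: the class name `Memory` and the 1024-word size are illustrative assumptions matching the 0–1023 address range on the earlier slide.

```python
# Minimal sketch of the MAR/MDR read/write protocol described above.
class Memory:
    def __init__(self, size=1024):
        self.cells = [0] * size
        self.mar = 0   # Memory Address Register
        self.mdr = 0   # Memory Data Register

    def read(self, address):
        self.mar = address               # a) load address into MAR
        self.mdr = self.cells[self.mar]  # b) READ signal; c) data lands in MDR
        return self.mdr

    def write(self, address, data):
        self.mar = address               # a) load address into MAR
        self.mdr = data                  # b) load data into MDR
        self.cells[self.mar] = self.mdr  # c) WRITE signal stores MDR

mem = Memory()
mem.write(5, 42)
print(mem.read(5))   # → 42
```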
Special Purpose Register For Keeping Track
of Program / Instructions
• Program Counter (PC): Holds the memory address of the next
instruction to be executed.
• Automatically incremented to point to the next instruction when an
instruction is being executed.
• Instruction Register (IR): Temporarily holds an instruction that
has been fetched from memory.
• Needs to be decoded to find out the instruction type.
• Also contains information about the location of the data.
Architecture of the Example Processor
[Diagram: Processor containing MAR, MDR, Control Unit, PC, IR, ALU and general-purpose registers R0, R1, …, Rn-1, connected to Memory]
Two Bus Architecture
[Diagram: processor, memory and I/O devices connected over two buses]
THANK YOU
Computer Organization and Architecture
D = Σ (i = –m to n–1) bi × 2^i
Examples
1. 101011 = 1×2^5 + 0×2^4 + 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 43
(101011)2 = (43)10
Unsigned Fixed Point Numbers
• An n-bit binary number can have 2n distinct combinations (0 to 2n-1).
• For example, for n=3, the 8 distinct combinations are:
000, 001, 010, 011, 100, 101, 110, 111 (0 to 23-1 = 7 in decimal).
• An n-bit binary integer: bn-1 bn-2 … b2 b1 b0
• Equivalent unsigned decimal value:
D = bn-1×2^(n-1) + bn-2×2^(n-2) + … + b1×2^1 + b0×2^0
• Each digit position has a weight that is some power of 2.

Number of bits (n) | Range of Numbers
8   | 0 to 2^8–1 (255)
16  | 0 to 2^16–1 (65535)
32  | 0 to 2^32–1
64  | 0 to 2^64–1
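The weighted-sum formula above can be checked directly in Python (a small sketch; the function name `unsigned_value` is illustrative):

```python
# D = b(n-1)*2^(n-1) + ... + b1*2^1 + b0*2^0, applied to a bit string.
def unsigned_value(bits):
    # bits is a string like "101011", most significant bit first
    n = len(bits)
    return sum(int(b) << (n - 1 - i) for i, b in enumerate(bits))

print(unsigned_value("101011"))      # → 43
print((1 << 8) - 1, (1 << 16) - 1)   # → 255 65535  (table ranges for n=8, 16)
```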
Signed Fixed Point Numbers
• Many of the numerical data items that are used in a program are
signed (positive or negative).
• Question: How to represent the sign?
• Three possible approaches:
a) Sign-magnitude representation
b) One’s complement representation
c) Two’s complement representation
(a) Sign-magnitude Representation
• For an n-bit number representation:
• The most significant bit (MSB) indicates sign (0: positive, 1: negative).
• The remaining (n-1) bits represent the magnitude of the number.
• Range of numbers: – (2^(n-1) – 1) to + (2^(n-1) – 1)
bn-1 bn-2 . . . . . . . . b1 b0
Sign Magnitude
bn-1 bn-2 . . . . . . . . b1 b0
D = –bn-1×2^(n-1) + bn-2×2^(n-2) + … + b2×2^2 + b1×2^1 + b0×2^0
b) Shifting left by k positions with zero padding multiplies the number by 2^k.
d) The sign bit can be copied as many times as required in the beginning to
extend the size of the number (called sign extension).
X = 00101111 (8-bit number, value = +47) X = 10100011 (8-bit number, value = -93)
Sign extend to 32 bits: Sign extend to 32 bits:
00000000 00000000 00000000 00101111 11111111 11111111 11111111 10100011
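Sign extension as described above, copying the sign bit into the new high-order positions, can be sketched in Python (the helper names `sign_extend` and `to_signed` are illustrative assumptions):

```python
# Extend an n-bit two's complement pattern to a wider width.
def sign_extend(value, from_bits, to_bits):
    sign = (value >> (from_bits - 1)) & 1
    if sign:  # copy the sign bit into all new high-order positions
        value |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return value

def to_signed(value, bits):
    # interpret a raw bit pattern as a two's complement integer
    return value - (1 << bits) if value >> (bits - 1) else value

x = sign_extend(0b00101111, 8, 32)   # +47 stays +47
y = sign_extend(0b10100011, 8, 32)   # -93 stays -93
print(hex(x), to_signed(x, 32))      # → 0x2f 47
print(hex(y), to_signed(y, 32))      # → 0xffffffa3 -93
```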
Arithmetic Operations on Fixed Point Numbers
Subtraction Using Addition :: 1’s Complement
How to compute A – B ?
• Compute the 1’s complement of B (say, B1).
• Compute R = A + B1 (i.e. A + (–B))
• If the carry obtained after the addition is 1:
• Add the carry back to R (called end-around carry).
• That is, R = R + 1.
• The result is a positive number.
• Else
• The result is negative, and is in 1’s complement form in R.
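The A – B procedure above, including the end-around carry, can be sketched for n-bit operands (a Python illustration; the function name is an assumption):

```python
# A - B via 1's complement addition with end-around carry.
def ones_comp_subtract(a, b, n=4):
    mask = (1 << n) - 1
    b1 = ~b & mask              # 1's complement of B
    r = a + b1
    if r > mask:                # a carry came out of the MSB
        r = (r & mask) + 1      # end-around carry: R = R + 1
        return r                # positive result
    else:
        return -(~r & mask)     # negative; R holds the 1's complement form

print(ones_comp_subtract(6, 2))  # → 4
print(ones_comp_subtract(2, 6))  # → -4
```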
Example 1 :: 6 – 2
1’s complement of 2 = 1101

Half Adder truth table (inputs A, B; outputs S, C):
A B | S C
0 0 | 0 0
0 1 | 1 0
1 0 | 1 0
1 1 | 0 1
S = A’.B + A.B’ = A ⊕ B
C = A.B
[Diagram: half adder block with inputs A, B and outputs S (sum), C (carry)]
Addition of Multi-bit Binary Numbers
  0 0 1 0 1 1 0  Carry          1 1 1 1 1 1 0  Carry
  0 1 0 1 0 1 1  Number A       0 1 1 1 1 1 1  Number A
+ 0 0 0 1 0 0 1  Number B     + 0 0 0 0 0 0 1  Number B
  0 1 1 0 1 0 0  Sum S          1 0 0 0 0 0 0  Sum S

Full Adder truth table (A B Cin | S Cout):
0 0 0 | 0 0
0 0 1 | 1 0
0 1 0 | 1 0
0 1 1 | 0 1
1 0 0 | 1 0
1 0 1 | 0 1
1 1 0 | 0 1
1 1 1 | 1 1
[Diagram: full adder (FA) with inputs A, B, Cin and outputs S, Cout]
Delay of a Full Adder
• Assume that the delay of all basic gates
(AND, OR, NAND, NOR, NOT) is δ.
• Delay for Carry = 2δ
• Delay for Sum = 3δ (AND-OR delay plus one inverter delay)
Circuitry for Addition
  1110  Carry
  0111  Number A
+ 0001  Number B
  1000  Sum S
• In a ripple carry adder (RCA), the carry output from stage-i propagates as the carry input to stage-(i+1).
• In the worst case, the carry ripples through all the stages.
• Delay is proportional to n.
Design of Fast Adders
Carry Look-ahead Adder
• The propagation delay of an n-bit ripple carry adder has been seen
to be proportional to n.
• Due to the rippling effect of carry sequentially from one stage to the
next.
• One possible way to speedup the addition:
• Generate the carry signals for the various stages in parallel.
• Time complexity reduces from O(n) to O(1).
• Hardware complexity increases rapidly with n.
4-bit CLA Adder
[Diagram: 4-bit CLA; Gi and Pi signals available after 3δ, all carries after 5δ]
Carry Generate and Carry Propagate
Ci+1 = Xi.Yi + Yi.Ci + Xi.Ci
     = Xi.Yi + Ci.(Xi + Yi)
     = Gi + Pi.Ci
where Gi = carry generate function, and Pi = carry propagate function.
• The generate function means that a carry is generated at a particular stage based on inputs Xi and Yi alone, irrespective of Ci. This happens when Xi = 1 and Yi = 1, i.e. Gi = Xi.Yi.
• Gi = 1 represents the condition when a carry is generated in stage-i independent of the other stages.
• In the propagate function, Ci propagates to Ci+1, i.e. Ci+1 = 1 given Ci = 1. This happens when Xi = 0, Yi = 1 or Xi = 1, Yi = 0; so Pi = Xi ⊕ Yi.
• Pi = 1 represents the condition when an input carry Ci will be propagated to the output carry Ci+1.
[Diagram: full adder stage with inputs Xi, Yi, Ci and outputs Si, Ci+1, annotated with Gi and Pi]
Unrolling the Recurrence
Ci+1 = Gi + PiCi
= Gi + Pi (Gi-1 + Pi-1Ci-1)
= Gi + PiGi-1 + PiPi-1Ci-1
= Gi + PiGi-1 + PiPi-1 (Gi-2 + Pi-2Ci-2)
= Gi + PiGi-1 + PiPi-1 Gi-2 + .... + PiPi-1....P1G0 + PiPi-1....P1C0
Thus, all the carries can be obtained by 5 gate delays, i.e. 5δ, after the
input signals X, Y and C0 are applied as:
• 3δ delay is incurred in generating Gi and Pi (as Pi uses XOR gate
incurring 3δ delay)
• 2δ is incurred in AND-OR circuit for Ci+1 (as Ci+1 = Gi + PiCi )
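The unrolled recurrence can be evaluated in software to confirm that every carry depends only on the Gi, Pi terms and C0, not on the previous carry. A minimal Python sketch (bit-list convention and function name are assumptions):

```python
# Ci+1 = Gi + Pi.Gi-1 + PiPi-1.Gi-2 + ... + Pi...P1.G0 + Pi...P0.C0
def cla_carries(x_bits, y_bits, c0=0):
    # x_bits, y_bits: lists of bits, index 0 = least significant
    g = [x & y for x, y in zip(x_bits, y_bits)]   # Gi = Xi.Yi
    p = [x ^ y for x, y in zip(x_bits, y_bits)]   # Pi = Xi xor Yi
    carries = [c0]
    for i in range(len(x_bits)):
        c, prod = g[i], p[i]
        for j in range(i - 1, -1, -1):            # unroll the recurrence
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    return carries

# 0111 + 0001 from the ripple-carry example (LSB first)
print(cla_carries([1, 1, 1, 0], [1, 0, 0, 0]))  # → [0, 1, 1, 1, 0]
```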
Design of 4-bit CLA Adder
C4 = G3 + G2P3 + G1P2P3 + G0P1P2P3 + C0P0P1P2P3
C3 = G2 + G1P2 + G0P1P2 + C0P0P1P2
C2 = G1 + G0P1 + C0P0P1
C1 = G0 + C0P0
S0 = X0 ⊕ Y0 ⊕ C0 = P0 ⊕ C0
S1 = P1 ⊕ C1
S2 = P2 ⊕ C2
S3 = P3 ⊕ C3
The 4-bit CLA Circuit
[Diagram: 4-bit carry look-ahead adder circuit]
16-bit Adder Using 4-bit CLA Modules
Problem: Carry propagation between modules still slows down the adder
Solution:
• Use a second level of carry look-ahead mechanism to generate the
input carries to the CLA blocks in parallel.
• The second level of CLA generates C4, C8, C12 and C16 in parallel
with two gate delays (2δ).
• For larger values of n, more CLA levels can be added.
• Delay calculation of a 16-bit adder:
a) For original single-level CLA: 14δ
b) For modified two-level CLA: 10δ
Delay of an n-bit Adder
TCLA = (6 + 2⌈log4 n⌉) δ        TRCA = (2n + 1) δ

n   | TCLA | TRCA
4   | 8δ   | 9δ
16  | 10δ  | 33δ
32  | 12δ  | 65δ
64  | 12δ  | 129δ
128 | 14δ  | 257δ
256 | 14δ  | 513δ
Status Flags
• Many contemporary processors have a flag register that contains
the status of the last arithmetic / logic operation.
• Zero (Z): tells whether the result is zero. Can be used for both arithmetic
and logic operations.
• Sign (S): tells whether the result is positive (=0) or negative (=1). Can
be used for both arithmetic and logic operations.
• Carry (C): tells whether there has been a carry out of the most
significant stage. Used only for arithmetic operations.
• Overflow (V): tells whether the result is too large to fit in the target
register. Used only for arithmetic operations (addition and
subtraction).
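The four flags can be computed for an 8-bit addition as a sketch. The V computation uses the usual two's complement rule (overflow when both operands have the same sign and the result's sign differs); the function name is illustrative:

```python
# Status flags Z, S, C, V after an n-bit addition.
def add_with_flags(a, b, bits=8):
    mask = (1 << bits) - 1
    msb = 1 << (bits - 1)
    r = (a + b) & mask
    Z = int(r == 0)                              # result is zero
    S = int(bool(r & msb))                       # sign bit of result
    C = int(a + b > mask)                        # carry out of MSB
    V = int(bool((~(a ^ b)) & (a ^ r) & msb))    # signed overflow
    return r, dict(Z=Z, S=S, C=C, V=V)

print(add_with_flags(100, 100))
# → (200, {'Z': 0, 'S': 1, 'C': 0, 'V': 1})   100+100 overflows signed 8-bit
```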
• Unsigned Multiplication
• Booth’s Multiplication Algorithm
Unsigned Multiplication
Example: 3 X 3

Booth’s Multiplication example (partial products summed; result = –78):
  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1        2’s comp. of 13
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  1 1 1 0 1 1 0 0 1 0      (–78)
Computer Organization and Architecture
MIPS32 Processor
DR. VINEETA JAIN
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, LNMIIT JAIPUR
MIPS32 Architecture: A Case Study
a) 32 × 32-bit General Purpose Registers (GPRs), R0 to R31.
• R0 is hard-wired to a value of zero.
• R31 is used to store the return address when a function call is made. Used by the jump-and-link and branch-and-link instructions.
b) A special-purpose 32-bit Program Counter (PC).
• Affected only indirectly by certain instructions (like branch, call, etc.).
c) A pair of 32-bit special-purpose registers HI and LO, which are used to hold the results of multiply, divide, and multiply-accumulate instructions (HI: high-order 32 bits; LO: low-order 32 bits).
[Diagram: general purpose registers R0 (zero) … R31 (return address); special purpose registers HI, LO and PC]
MIPS32 Assembly Code Layout

# Add two numbers in memory and store the result in the next location.
        .text                 # code section
        .globl main
main:   la   $t0, value
        lw   $t1, 0($t0)
        lw   $t2, 4($t0)
        add  $t3, $t1, $t2
        sw   $t3, 8($t0)

        .data
value:  .word 50, 30, 0
Assembler Directives
• .TEXT
• Specifies the user text segment, which contains the instructions.
• .DATA
• Specifies the data segment, where all the data items are defined.
• .GLOBL SYM
• Specifies that the symbol “sym” is global, and can be referred from
other files.
• .WORD W1, W2, …, WN
• Stores the specified 32-bit numbers in successive memory words.
• .HALF H1, H2, …, HN
• Stores the specified 16-bit numbers in successive memory half-words.
• .BYTE B1, B2, …, BN
• Stores the specified 8-bit numbers in successive memory bytes.
• .ASCII STR
• Stores the specified string in memory (in ASCII code), but does not null-terminate it.
• Strings are enclosed in double quotes and follow C-like conventions (“\n”, etc.).
• .ASCIIZ STR
• Stores the specified string in memory (in ASCII code), and null-terminates it.
• .SPACE N
• Reserves space for n successive bytes in memory.
THANK YOU
FLOATING-POINT NUMBERS
Floating-Point Number Representation
(IEEE-754)
• For representing numbers with fractional parts, we can assume that
the binary point lies somewhere within the number (say, n bits in the
integer part, m bits in the fraction part). This is called fixed-point
representation.
• Lacks flexibility.
• Cannot be used to represent very small or very large numbers (for
example: 2.53 x 10-26, 1.7562 x 10+35, etc.).
• Solution :: use floating-point number representation.
• A number F is represented as a triplet <s, M, E> such that
F = (–1)^s × M × 2^E
• s is the sign bit indicating whether the number is negative (=1) or positive (=0).
• M is called the mantissa, and is normally a fraction in the range [1.0, 2.0).
• E is called the exponent, which weights the number by power of 2.
Encoding:
• Single-precision numbers: total 32 bits, E 8 bits, M 23 bits
• Double-precision numbers: total 64 bits, E 11 bits, M 52 bits
S E M
Points to Note
• The number of significant digits depends on the number of bits in M.
• 7 significant digits for 24-bit mantissa (23 bits + 1 implied bit).
• The range of the number depends on the number of bits in E.
• Approximately 10^–38 to 10^+38 for an 8-bit exponent.
Encoding Example 2
• Consider the number F = –3.75
(–3.75)10 = –(11.11)2 = –1.111 × 2^1
• Considering a single precision number:
• Mantissa will be stored as: M = 11100000000000000000000 (23 bits)
• Here, EXP = 1, BIAS = 127, so E = 1 + 127 = 128 = (10000000)2
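The single-precision encoding of –3.75 worked out above (sign 1, E = 10000000, M = 1110…0) can be cross-checked with Python's struct module:

```python
import struct

# Reinterpret the 4 bytes of the float -3.75 as an unsigned 32-bit integer.
bits = struct.unpack('>I', struct.pack('>f', -3.75))[0]
print(hex(bits))             # → 0xc0700000

sign = bits >> 31            # 1 bit
exp  = (bits >> 23) & 0xFF   # 8 bits
frac = bits & 0x7FFFFF       # 23 bits
print(sign, exp, bin(frac))  # → 1 128 0b11100000000000000000000
```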
Subtraction Example
• Suppose we want to subtract F2 = 224 from F1 = 270.75
• F1 = (270.75)10 = (100001110.11)2 = 1.0000111011 x 28
• F2 = (224)10 = (11100000)2 = 1.11 x 27
• Shift the mantissa of F2 right by 8 – 7 = 1 position, and subtract:
  1.0000111011 × 2^8
– 0.1110000000 × 2^8     (perform 2’s complement subtraction by addition)
  0.0010111011 × 2^8
• For normalization, shift mantissa left 3 positions, and decrement E by 3.
• Result: 1.01110110 x 25
Floating-Point Multiplication
• Two numbers: M1 x 2E1 and M2 x 2E2
• Basic steps for multiplication:
• Add the exponents E1 and E2 and subtract the BIAS.
• Multiply M1 and M2 and determine the sign of the result.
• Normalize the resulting value, if necessary.
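The three steps above can be sketched with Python floats. Note `math.frexp` returns a mantissa in [0.5, 1.0) rather than the IEEE [1.0, 2.0) convention, so the exponent bookkeeping differs by one, but the add-exponents / multiply-mantissas / normalize structure is the same:

```python
import math

def fp_multiply(a, b):
    m1, e1 = math.frexp(a)      # decompose into mantissa and exponent
    m2, e2 = math.frexp(b)
    m = m1 * m2                 # multiply mantissas (sign carried along)
    e = e1 + e2                 # add exponents
    while m != 0 and abs(m) < 0.5:
        m *= 2                  # normalize the result, if necessary
        e -= 1
    return math.ldexp(m, e)     # recombine: m * 2^e

print(fp_multiply(3.0, 2.5))    # → 7.5
print(fp_multiply(-3.0, 2.5))   # → -7.5
```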
Rounding
• Suppose we are adding two numbers (say, in single-precision).
• We add the mantissa values after shifting one of them left
for exponent alignment.
• We take the first 23 bits of the sum, and discard the residue
R (beyond 32 bits).
• If the process of rounding generates a result that is not in
normalized form, then we need to re-normalize the result.
THANK YOU
Computer Organization and Architecture
Lecture5: Processors
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Software
• Software, or a program, consists of a set of instructions required to solve a
specific problem.
• A program to sort a set of numbers.
• A program to find the roots of a quadratic equation.
• Broadly we can classify programs/software into two types:
a) System Software
• A collection of programs that helps the users to create, analyze and run their
programs.
b) Application Software
• Helps the user to solve a particular user-level problem.
• They need system software for execution.
(a) System Software
• System software is a collection of programs that are executed as needed to
perform functions such as
• Receiving and interpreting user commands
• Running standard application programs such as word processors, etc, or
games
• Managing the storage and retrieval of files in secondary storage devices
• Controlling I/O units to receive input information and produce output results
• Translating programs from source form prepared by the user into object form
consisting of machine instructions
• Linking and running user-written application programs with existing standard
library routines, such as numerical computation packages
• System software is thus responsible for the coordination of all activities in
a computing system
(b) Application Software
• Application software helps users solve particular problems.
• In most cases, application software resides on the computer’s
hard disk or removable storage media (DVD, USB drive, etc.).
• Typical examples:
• Financial accounting package
• Mathematical packages like MATLAB or MATHEMATICA
• An app to book a cab
• An app to monitor the health of a person
Operating System
• Operating system (OS) is a large program, or actually a collection of
routines, that is used to control the sharing of and interaction among
various computer units as they perform application programs.
• The OS routines perform the tasks required to assign computer
resources to individual application programs.
• These tasks include assigning memory and magnetic disk space to
program and data files, moving data between memory and disk units,
and handling I/O operations
• Let us consider a scenario where a system with one processor, one disk, and one printer is
available.
• Assume that part of the program’s task involves reading a data file from the disk into the
memory, performing some computation on the data, and printing the results
User Program and OS Routine Sharing
Multiprogramming or Multitasking
• Similarly, during t0–t1, the OS can arrange to print the previous program’s results while the
current program is being loaded from the disk.
• Thus, the OS manages the concurrent execution of several application programs to make the
best possible use of computer resources.
• This pattern of concurrent execution is called multiprogramming or multitasking.
Performance
• The most important measure of the performance of a computer is how quickly it can
execute programs. For best performance, it is necessary to design the compilers, the
machine instruction set, and the hardware in a coordinated way.
• The total time required to execute a program, called the elapsed time, is a measure of the
performance of the entire computer system. It is affected by the speed of the
processor, the disk and the printer.
• The time needed to execute an instruction is called the processor time. It depends on the
hardware involved in the execution of individual machine instructions. This hardware
comprises the processor and the memory which are usually connected by the bus.
[Diagram: Processor with cache memory, connected to Main Memory over the bus]
Processor Clock
• Processor circuits are controlled by a timing signal called a clock.
• The clock defines regular time intervals, called clock cycles.
• To execute a machine instruction, the processor divides the action to
be performed into a sequence of basic steps, such that each step can
be completed in one clock cycle.
• Let the length of one clock cycle be P; its inverse is the clock rate,
R = 1/P, which is measured in cycles per second, also known as hertz
(Hz).
• The term “million” is denoted by the prefix Mega (M) and “billion” by
the prefix Giga (G). Example: 500 million cycles per second is 500 MHz,
and 1250 million cycles per second is 1.25 GHz.
Basic Performance Equation
T = (N × S) / R
• where,
• T is the processor time required to execute a program,
• N is the number of instruction executions, and
• S is the average number of basic steps needed to execute one
machine instruction
• To achieve higher performance, the value of T should be
reduced, which means reducing the values of N and S and increasing
the value of R.
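The basic performance equation can be applied to illustrative (assumed) numbers, e.g. one million instruction executions, four basic steps each, and a 500 MHz clock:

```python
# T = (N x S) / R from the basic performance equation.
def processor_time(N, S, R):
    return (N * S) / R

T = processor_time(N=1_000_000, S=4, R=500_000_000)
print(T)   # → 0.008 (seconds)
```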
Performance Improvement
• Pipelining and superscalar operation
• Pipelining: by overlapping the execution of successive instructions
• Superscalar: different instructions are concurrently executed with
multiple instruction pipelines. This means that multiple functional
units are needed.
• Clock rate improvement
• Improving the integrated-circuit technology makes logic circuits faster,
which reduces the time needed to complete a basic step.
Pipelining
• A substantial improvement in performance can be achieved by overlapping the
execution of successive instructions, using a technique called pipelining.
• Instruction execution is typically divided into 5 stages:
• Instruction Fetch (IF)
• Instruction Decode (ID)
• ALU operation (EX)
• Memory Access (MEM)
• Write Back result to register file (WB)
• These five stages can be executed in an overlapped fashion in a pipelined
architecture.
• Results in significant speedup by overlapping instruction execution.
Basic 5-stage Pipelining Diagram
[Diagram: successive instructions overlapped across the IF, ID, EX, MEM and WB stages]

Flynn’s classification of architectures:
• Single instruction stream, single data (SISD): Uniprocessors
• Single instruction stream, multiple data (SIMD): Array or Vector Processors
• Multiple instruction streams, single data (MISD): Rarely Used
• Multiple instruction streams, multiple data (MIMD): Multiprocessors or Multicomputers
Single-instruction, single-data (SISD)
• An SISD computing system is a
uniprocessor machine which is capable of
executing a single instruction, operating on
a single data stream.
• In SISD, machine instructions are
processed in a sequential manner and
computers adopting this model are
popularly called sequential computers.
• Most conventional computers have SISD
architecture. All the instructions and data
to be processed have to be stored in
primary memory.
Single-instruction, multiple-data (SIMD)
• An SIMD system is a multiprocessor
machine capable of executing the same
instruction on all the CPUs but operating
on different data streams.
• Single Control Unit (CU) and multiple
processing elements (PEs)
• CU fetches an instruction from the
memory and after decoding, broadcast
control signals to all PEs, i.e. at any given
time, all PEs are synchronously executing
the same instruction but on different sets Data parallelism can be achieved
in two ways:
of data. Hence named SIMD.
a) Concurrency in space – array
• SIMD allows data parallelism, i.e., processing
executing one operation on multiple data b) Concurrency in time – vector
streams. processing
a) Array Processor
• It is a processor capable of processing array elements.
• It is a synchronous parallel computer with multiple ALUs, called PEs, that can operate in parallel.
• It is composed of N identical PEs under the control of one CU and a number of memory modules.
• Array processors take the concept of pipelining one step further: instead of pipelining just the instructions, they also pipeline the data itself. This allows significant saving in decoding time.
• It improves performance by avoiding stalls.

Consider the task of adding two groups (A and B) of 10 numbers.
A. In a normal processor:
• Execute the loop 10 times:
• Read instr and decode
• Fetch no.s from A & B
• Add them
• Put the result back
• End loop
B. In an array processor:
• Read instr and decode
• Fetch no.s from A
• Fetch no.s from B
• Add them
• Put the result back
• Only two address translations are needed
• Fetching and decoding is done only one time instead of ten times
• The code is also smaller, leading to efficient memory use
b) Vector Processor
• A vector is an ordered set of same type of scalar items, where a scalar item can be a
floating point number, an integer, or a logical value.
[Diagram: a vector of 64 elements, indexed 1, 2, …, 64]
• A vector V of length n is represented as a row vector V = [V1, V2, V3 … Vn]. The element Vi of
vector V is written as V(I), and the index I refers to a memory address or register where
the number is stored.
• Vector processing is arithmetic or logical computation applied on entire vectors, whereas in
scalar processing only one datum or one pair of data is processed at a time.
• Provides high-level instructions that operate on entire arrays of numbers (called vectors).
Therefore, a vector processor is faster than a scalar processor for such workloads.
• Example: A, B and C are three vectors containing 64 numbers each. The three vectors are
mapped to vector registers V1, V2, V3 (say). The following vector instruction computes :
Ci = Ai + Bi  :  ADDV V3, V1, V2
• A single vector instruction is equivalent to an entire loop. No loop overheads are required.
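The contrast between the scalar loop and the single vector instruction can be sketched in Python (plain lists stand in for the 64-element vector registers):

```python
# ADDV V3, V1, V2 computes Ci = Ai + Bi over whole 64-element vectors;
# a scalar processor needs an explicit loop with per-iteration overhead.
A = list(range(64))
B = list(range(64, 128))

# scalar processing: one add per loop iteration
C_scalar = []
for i in range(64):
    C_scalar.append(A[i] + B[i])

# vector-style: one high-level operation over the entire vector
C_vector = [a + b for a, b in zip(A, B)]

print(C_scalar == C_vector)   # → True
```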
Multiple-instruction, single-data (MISD)
• An MISD computing system is a
multiprocessor machine capable of
executing different instructions on
different PEs but all of them
operating on the same dataset.
Example Z = sin(x)+cos(x)+tan(x)
• The system performs different
operations on the same data set.
• Machines built using the MISD model
are not useful for most applications;
a few machines have been built,
but none of them are available
commercially.
Multiple-instruction, multiple-data (MIMD)
• An MIMD system is a multiprocessor machine which is capable of executing
multiple instructions on multiple data sets.
• Each PE in the MIMD model has separate instruction and data streams;
therefore machines built using this model are capable of handling any kind of application.
• Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.
MIMD machines are broadly
categorized based on the way
PEs are coupled to the main
memory:
• Shared-memory MIMD model (tightly coupled multiprocessors)
• Distributed-memory MIMD model (loosely coupled multiprocessors)
Single Processor v/s Multiprocessor Systems
[Diagram: single-processor CPU chip — registers and ALU behind one bus interface on the system bus]
[Diagram: multicore chip — Core 1, Core 2, Core 3, each with its own registers and ALU, sharing one bus interface]
Shared-memory (Tightly coupled) multiprocessors
• All the PEs are connected to a single global memory and they all have access
to it.
• The communication between PEs in this model takes place through the
shared memory, modification of the data stored in the global memory by one
PE is visible to all other PEs.
• Some Features:
• Difficult to extend to a large number of processors.
• Memory bandwidth requirements increase with the number of
processors.
• Memory access time for all processors is uniform, called Uniform Memory
Access – UMA.
Shared-memory (Tightly coupled) multiprocessors
[Diagram: cores with private L2 caches sharing an L3 cache, connected through an interconnection network]
Advanced Microprocessors
Name        | Date | Internal Registers | Clock Speed | Data Width          | Address Lines | Max. Memory Space
Pentium III | 1999 | 32 Bit             | 450 MHz     | 32 bits, 64-bit bus | 32 Bit        | 64 GB
Logical Shift Left (LShiftL R0, #2):   C ← R0 ← 0
before: 0 0 1 1 1 0 . . . . 0 1 1
after:  1 1 1 0 . . . . 0 1 1 0 0

Logical Shift Right (LShiftR R0, #2):  0 → R0 → C
before: 0 1 1 1 0 . . . . 0 1 1 0
after:  0 0 0 1 1 1 0 . . . . 0 1
Shift Instructions
• Arithmetic Shift Instructions
• Follow the 2’s complement number representation.
• Two arithmetic shift instructions are there:
• AShiftL for shifting left: the number gets multiplied by 2
• AShiftR for shifting right: the number gets divided by 2
• Syntax: AShiftL <destination>, <count>
• Vacated positions are filled with zeros in left shift, and filled with the sign bit
in case of right shift.
Arithmetic Shift Left (AShiftL R0, #2):   C ← R0 ← 0
before: 0 0 0 0 1 1 . . . . 0 1 0
after:  0 0 1 1 . . . . 0 1 0 0 0

Arithmetic Shift Right (AShiftR R0, #2):  sign bit → R0 → C
before: 1 0 0 1 1 . . . . 0 1 0 0
after:  1 1 1 0 0 1 1 . . . . 0 1
Rotate Operations
• In the shift operations, the bits shifted out of the operand are lost, except for
the last bit shifted out which is retained in the carry flag C.
• To preserve all the bits, a set of rotate instructions can be used.
• They move the bits that are shifted out of the one end of the operand back
13 into the other end.
• They are of two types:
• Rotate without carry: the bits of the operand are simply rotated, and the last
rotated bit is retained in the carry flag C.
• Rotate with carry: the rotation includes the carry flag C.
[Diagrams: RotateL R0, #2 (rotate left without carry), RotateLC R0, #2 (rotate left with carry), RotateR R0, #2 (rotate right without carry), RotateRC R0, #2 (rotate right with carry). Bits shifted out of one end re-enter at the other; without carry, the last rotated bit is also copied into C, while with carry, the C flag itself takes part in the rotation.]
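Rotate-without-carry can be sketched for an n-bit register (the function name and 8-bit default are illustrative assumptions):

```python
# Rotate left without carry: bits leaving the MSB end re-enter at the LSB;
# the last rotated bit is also retained in C.
def rotate_left(value, count, bits=8):
    mask = (1 << bits) - 1
    carry = 0
    for _ in range(count):
        msb = (value >> (bits - 1)) & 1
        value = ((value << 1) & mask) | msb   # bit wraps around
        carry = msb                           # last rotated bit → C
    return value, carry

v, c = rotate_left(0b10000001, 1)
print(bin(v), c)   # → 0b11 1
```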
c) Program Control Instructions
• A program control instruction changes the address value in the PC and hence the normal
flow of execution.
• Change in PC causes a break in the execution of instructions.
• It is an important feature of the computers since it provides the control over the
flow of the program and provides the capability to branch to different program
segments.
Name Mnemonic
Branch BR
Jump JMP
Skip next instruction SKP
Call procedure CALL
Return from procedure RET
Compare (by subtraction) CMP
Test (by ANDing) TEST
c) Program Control Instructions
• Branch (BR) and Jump (JMP) instructions are sometimes used interchangeably, but they are
different: BR can be conditional, while JMP is unconditional.
• Jump is used to refer to the unconditional version of branch.
• The Skip (SKP) instruction is used to skip the next instruction. It does not need an address field.
• Compare Instruction compares two operands. It basically subtracts one operand from the
other for comparing whether the operands are equal or not. It is used along with the
conditional branch instruction for decision making.
• Syntax: CMP destination, source
• Example: CMP DX,00 //Compare the DX value with zero by subtracting
BE L7 //If yes(BE: Branch equal), then jump to label L7
…..
L7: ...
• Similarly, the TEST instruction performs the AND of two operands.
Conditional Branch Instructions
• A conditional branch instruction is a branch instruction that may or may not cause a
transfer of control, depending on the value of stored bits in the PSR
(processor status register).
• Each conditional branch instruction tests a different combination of status bits for a
condition.

Status bits:
Code          Value
N (negative)  Set to 1 if the result is negative; otherwise 0
Z (zero)      Set to 1 if the result is 0; otherwise 0
V (overflow)  Set to 1 if arithmetic overflow occurs; otherwise 0
C (carry)     Set to 1 if a carry-out results from the operation; otherwise 0

Branch Condition       Mnemonic   Tested Condition
Branch if zero         BZ         Z = 1
Branch if not zero     BNZ        Z = 0
Branch if carry        BC         C = 1
Branch if no carry     BNC        C = 0
Branch if minus        BN         N = 1
Branch if plus         BNN        N = 0
Branch if overflow     BV         V = 1
Branch if no overflow  BNV        V = 0
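The table above can be sketched as a small lookup from mnemonic to the status-bit test it performs. This is an illustrative model, not a real processor's decode logic.

```python
# Hedged sketch: evaluating conditional branches from PSR status bits,
# using the mnemonics from the table (BZ, BNZ, BC, BNC, BN, BNN, BV, BNV).

CONDITIONS = {
    "BZ":  lambda f: f["Z"] == 1,
    "BNZ": lambda f: f["Z"] == 0,
    "BC":  lambda f: f["C"] == 1,
    "BNC": lambda f: f["C"] == 0,
    "BN":  lambda f: f["N"] == 1,
    "BNN": lambda f: f["N"] == 0,
    "BV":  lambda f: f["V"] == 1,
    "BNV": lambda f: f["V"] == 0,
}

def branch_taken(mnemonic, flags):
    """Return True if the branch transfers control for these flag values."""
    return CONDITIONS[mnemonic](flags)

flags = {"N": 0, "Z": 1, "V": 0, "C": 0}   # e.g. after comparing equal operands
print(branch_taken("BZ", flags))    # True
print(branch_taken("BC", flags))    # False
```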
MIPS Programming Examples
Some Examples of MIPS32 Arithmetic

C code: A = B + C
B loaded in $s2, C loaded in $s3; A in $s1
MIPS32 code:
add $s1, $s2, $s3

C code: A = B + C - D; E = F + A;
B loaded in $s1, C in $s2, D in $s3, F in $s5; A in $s0, E in $s4; $t0 is a temporary
MIPS32 code:
add $t0, $s1, $s2
sub $s0, $t0, $s3
add $s4, $s5, $s0
Examples on Control Constructs
C code:
if (x == y) z = x - y;

Instruction Format
OPCODE | OPERANDS
• The number of operands varies from instruction to instruction.
• Also, for specifying an operand, various addressing modes are possible.
Types of Instruction Format
a) Three-Address Instruction
b) Two-Address Instruction
c) One-Address Instruction
d) Zero-Address Instruction
a) Three-Address Instruction
• Two source operands and one destination operand need to be specified.
• Example: ADD R1, R2, R3   R1 ← R2 + R3

b) Two-Address Instruction
• Assumes that the destination address is the same as that of the first
operand.
• Computers with two-address instruction formats can use each address
field to specify either a processor register, a memory operand, or
immediate data (only in the source field).
• Example: ADD R1, R2   R1 ← R1 + R2
EVALUATE X=(A+B)*(C+D)
MOV R1, A   R1 ← Mem[A]
ADD R1, B   R1 ← R1 + B
MOV R2, C   R2 ← Mem[C]
ADD R2, D   R2 ← R2 + D
MUL R1, R2  R1 ← R1 * R2
MOV X, R1   Mem[X] ← R1
c) One-Address Instruction
• Only one source operand needs to be specified.
Format: Opcode | Source

d) Zero-Address Instruction
Format: Opcode
• A stack is used. An arithmetic operation pops two operands from the stack and
pushes the result.
• Also called stack organization.
EVALUATE X=(A+B)*(C+D)
PUSH A   TOS ← A
PUSH B   TOS ← B
ADD      TOS ← A + B
PUSH C   TOS ← C
PUSH D   TOS ← D
ADD      TOS ← C + D
MUL      TOS ← (A+B)*(C+D)
POP X    Mem[X] ← TOS
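The zero-address evaluation above can be replayed with a tiny stack-machine interpreter. This is an illustrative sketch; the instruction names mirror the slide, not any real ISA.

```python
# Hedged sketch of the zero-address evaluation of X = (A+B)*(C+D):
# arithmetic instructions pop two operands and push the result.

def run(program, memory):
    stack = []
    for op, *arg in program:
        if op == "PUSH":
            stack.append(memory[arg[0]])     # TOS <- Mem[operand]
        elif op == "POP":
            memory[arg[0]] = stack.pop()     # Mem[operand] <- TOS
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return memory

mem = {"A": 2, "B": 3, "C": 4, "D": 5}
prog = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
        ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
        ("MUL",), ("POP", "X")]
print(run(prog, mem)["X"])   # (2+3)*(4+5) = 45
```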
Indirect Addressing
• The instruction contains a field that holds a memory address,
which in turn holds the memory address of the operand.
• Two memory accesses are required to get the operand value.
• Slower, but can access a large address space.
• Not limited by the number of bits in the operand address field, unlike direct
addressing.
• For a word length of N, an address space of 2^N is now available.
• Example:
  ADD R1, (20A6H)   // R1 = R1 + Mem[Mem[20A6H]]
  EA = (20A6H)
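The difference between direct and indirect addressing can be shown with a toy memory, modelled here as a Python dict. Addresses and values are illustrative assumptions.

```python
# Hedged sketch: effective-address calculation for direct vs indirect
# addressing, using a dict as a toy memory.

memory = {0x20A6: 0x3000, 0x3000: 42}

def ea_direct(addr):
    return addr              # the instruction field IS the operand address

def ea_indirect(addr):
    return memory[addr]      # the first access yields a pointer to the operand

print(hex(memory[ea_direct(0x20A6)]))   # 0x3000: the pointer, one access
print(memory[ea_indirect(0x20A6)])      # 42: the operand, two accesses
```

The extra memory access is exactly the cost noted above; the payoff is that the pointer stored in memory can span the full 2^N address space.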
[Figure: indirect addressing. The operand-address field of the instruction selects a memory location that holds a pointer; the pointer selects the operand.]
Register Addressing
• The operand is held in a register, and the instruction specifies the
register number.
• Very few bits are needed, as the number of registers is limited.
• Faster execution, since no memory access is required for getting the
operand.
• Modern load-store architectures support a large number of registers.
• Examples:
  ADD R1, R2, R3   // R1 = R2 + R3
  MOV R2, R5       // R2 = R5
  EA = R
[Figure: register addressing. The register number in the instruction selects the operand from the register bank.]
Register Indirect Addressing
• The instruction specifies a register, and the register holds the
memory address where the operand is stored.
• Can access a large address space.
• One fewer memory access as compared to indirect addressing.
• Example:
  ADD R1, (R5)   // R1 = R1 + Mem[R5]
  EA = (R5)
[Figures: register indirect addressing, where the register holds the operand address; and relative addressing, where an offset in the instruction is added to the PC to form the operand address.]
Indexed Addressing
• Either a special-purpose register or a general-purpose register is
used as the index register in this addressing mode.
• The instruction specifies an offset or displacement, which is added to
the index register to get the effective address of the operand.
• Examples:
  LOAD R1, 1050(R3)   // R1 = Mem[1050 + R3]
  ADD R1, [R2]        // R1 = R1 + Mem[R2*d]
  where d = size of the word (example: d = 4 B for a 32-bit word in byte-addressable
  memory)
• Can be used to sequentially access the elements of an array.
• The offset gives the starting address of the array, and the index register
value specifies the array element to be used.
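The array-traversal use of indexed addressing can be sketched as follows. The base address, word size, and array contents are illustrative assumptions.

```python
# Hedged sketch: indexed addressing to walk an array. The offset encoded in
# the instruction is the array's base address; the index register selects the
# element. With byte-addressable memory and 4-byte words, the index steps by 4.

WORD = 4
base = 1050                           # offset field of the instruction
memory = {base + i * WORD: v for i, v in enumerate([10, 20, 30, 40])}

total = 0
index_reg = 0                         # index register, initially 0
for _ in range(4):
    total += memory[base + index_reg] # EA = offset + index register
    index_reg += WORD                 # advance to the next array element
print(total)   # 10 + 20 + 30 + 40 = 100
```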
[Figure: indexed addressing. EA = index register + offset.]

Worked examples on addressing modes. Initially R1 = 500, R2 = 200, R3 = 300, R4 = 400,
and memory contents are:
Address  Contents
200      25
300      50
400      50
500      100
596      40
600      200
700      100
1000     150
1200     300
1600     400
b) ADD R2, R4, #15 (immediate)
R2 ← R4 + 15 = 400 + 15 = 415
c) ADD R1, R2, (R4) (register indirect)
R1 ← R2 + Mem[R4] = 415 + Mem[400] = 415 + 50 = 465
d) ADD R2, R3, 100(R4) (indexed)
R2 ← R3 + Mem[R4 + 100] = 300 + Mem[500] = 300 + 100 = 400
e) ADD R1, R2, (R3+R4) (base with index)
R1 ← R2 + Mem[R3 + R4] = 400 + Mem[700] = 400 + 100 = 500
f) ADD R2, R4, 600 (direct)
R2 ← R4 + Mem[600] = 400 + 200 = 600
g) ADD R1, R2, (600) (memory indirect)
R1 ← R2 + Mem[Mem[600]] = 600 + Mem[200] = 600 + 25 = 625
h) ADD R1, R3, (R2)+ (auto-increment; word size = 32 bits, byte-addressable)
R1 ← R3 + Mem[R2] = 300 + Mem[600] = 300 + 200 = 500
Since every word occupies 4 B, R2 is then incremented by 4:
(R2)+ : R2 ← R2 + 4 = 604
i) ADD R1, R4, -(R2) (auto-decrement; word size = 32 bits, byte-addressable)
R2 is first decremented by 4: -(R2) : R2 ← R2 - 4 = 600
R1 ← R4 + Mem[R2] = 400 + Mem[600] = 400 + 200 = 600
Memory
j) ADD R1, R2, 100(R3)[R3] (size of word = 32 bits, byte organized) 200 25
R3 = 300 300 50
R4 = 400 400 50
500 100
R1 R2 + Mem[R3 + 100 + R3*4]
596 40
= 600 + Mem[300 + 100 + 300*4] 600 200
= 600 + Mem[1600] 700 100
a b c d e f g h i j
R1 500 500 465 465 500 500 625 500 600 1000
R2 200 415 415 400 400 600 600 604 600 600
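The whole sequence of worked examples can be replayed in a few lines, using the same initial register and memory contents. This is a sketch for checking the arithmetic, not a model of a real machine.

```python
# Hedged sketch replaying the worked examples (b) through (j) above:
# same initial registers and memory, one line per addressing mode.

mem = {200: 25, 300: 50, 400: 50, 500: 100, 596: 40,
       600: 200, 700: 100, 1000: 150, 1200: 300, 1600: 400}
R = {1: 500, 2: 200, 3: 300, 4: 400}
WORD = 4   # 32-bit words, byte-addressable memory

R[2] = R[4] + 15                       # b) immediate:         R2 = 415
R[1] = R[2] + mem[R[4]]                # c) register indirect: R1 = 465
R[2] = R[3] + mem[R[4] + 100]          # d) indexed:           R2 = 400
R[1] = R[2] + mem[R[3] + R[4]]         # e) base with index:   R1 = 500
R[2] = R[4] + mem[600]                 # f) direct:            R2 = 600
R[1] = R[2] + mem[mem[600]]            # g) memory indirect:   R1 = 625
R[1] = R[3] + mem[R[2]]; R[2] += WORD  # h) auto-increment:    R1 = 500, R2 = 604
R[2] -= WORD; R[1] = R[4] + mem[R[2]]  # i) auto-decrement:    R1 = 600, R2 = 600
R[1] = R[2] + mem[R[3] + 100 + R[3] * WORD]  # j) scaled index: R1 = 1000
print(R[1], R[2])   # 1000 600
```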
Some Examples
What are the values of R1 and R2 at each stage? Initially R1 = 100, R2 = 200.
Memory: Mem[100] = 200, Mem[200] = 300, Mem[300] = 150

1. MOV R1, 100(R2)
2. ADD R1, R2
3. MOV R2, R1
4. HALT

1. R1 ← Mem[R2 + 100] = Mem[300] = 150   (R1 = 150, R2 = 200)
2. R1 ← R1 + R2 = 150 + 200 = 350        (R1 = 350, R2 = 200)
3. R2 ← R1 = 350                         (R1 = 350, R2 = 350)
THANK YOU
Computer Organization and Architecture
Unit - II
Instruction Set Architecture
Lecture 4: Expanding Opcodes
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Instruction Length
• Instructions on current architectures can be formatted in two ways:
  • Fixed length: wastes space but is fast and results in better performance.
  • Variable length: more complex to decode but saves storage space.
• Example: MIPS uses a fixed instruction length of 32 bits.
• Let us assume that instruction length = 16 bits (fixed), opcode = 4 bits, and
each operand = 4 bits.
• For a 3-address instruction: total instructions = 2^4 = 16
Format: OPCODE (4 bits) | Operand 1 | Operand 2 | Operand 3
Expanding Opcodes
• We have seen how instruction length is affected by the number of
operands supported by the ISA.
• In any instruction set, not all instructions require the same
number of operands.
• Operations that require no operands, such as HALT, necessarily
waste some space when fixed-length instructions are used.
• One way to recover some of this space is to use expanding
opcodes.
• The idea of expanding opcodes is to make some opcodes short,
but have a means to provide longer ones when needed.
• When the opcode is short, many bits are left to hold operands,
so we could have two or three operands per instruction.
• If an instruction has no operands (such as HALT), all the bits can be
used for the opcode.
• In between, there are longer opcodes with fewer operands as well
as shorter opcodes with more operands.
How does expanding opcode work
• Let us assume that instruction length = 16 bits (fixed) and each operand = 4 bits.
• For 3-address instructions (15 instructions, since opcode 1111 is reserved as an escape):
  0000 xxxx yyyy zzzz
  0001 xxxx yyyy zzzz
  ....
  1110 xxxx yyyy zzzz
• For 2-address instructions (15 instructions, opcodes 1111 0000 through 1111 1110):
  1111 0000 yyyy zzzz
  ....
  1111 1110 yyyy zzzz
• For 1-address instructions (15 instructions, since 1111 1111 1111 is reserved):
  1111 1111 0000 zzzz
  1111 1111 0001 zzzz
  ....
  1111 1111 1110 zzzz
• For 0-address instructions (16 instructions):
  1111 1111 1111 0000
  1111 1111 1111 0001
  ....
  1111 1111 1111 1111
if (leftmost four bits != 1111) {
    Execute appropriate three-address instruction }
else if (leftmost eight bits != 1111 1111) {
    Execute appropriate two-address instruction }
else if (leftmost twelve bits != 1111 1111 1111) {
    Execute appropriate one-address instruction }
else {
    Execute appropriate zero-address instruction }
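The decode rule above can be written directly as a small function over a 16-bit instruction word: count how many leading groups of 1111 appear to classify the format.

```python
# Hedged sketch of the expanding-opcode decode rule for 16-bit instructions
# with 4-bit operand fields, matching the if/else chain above.

def instruction_format(word):
    """Return the number of address fields in a 16-bit instruction word."""
    if (word >> 12) != 0b1111:
        return 3            # 4-bit opcode, three 4-bit operands
    if (word >> 8) != 0b11111111:
        return 2            # 8-bit opcode, two operands
    if (word >> 4) != 0b111111111111:
        return 1            # 12-bit opcode, one operand
    return 0                # 16-bit opcode, no operands

print(instruction_format(0b0001_0010_0011_0100))   # 3
print(instruction_format(0b1111_0001_0010_0011))   # 2
print(instruction_format(0b1111_1111_0001_0010))   # 1
print(instruction_format(0b1111_1111_1111_0001))   # 0
```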
Example 1
Consider a machine with 16-bit instructions and 16 registers, and we
wish to encode the following instructions:
a) 15 instructions with 3 addresses
b) 14 instructions with 2 addresses
c) 31 instructions with 1 address
d) 16 instructions with 0 addresses
Can we encode this instruction set in 16 bits?
Answer: Yes, if we use expanding opcodes.
a) For 3-address instructions (15 instructions):
   0000 R1 R2 R3
   0001 R1 R2 R3
   ....
   1110 R1 R2 R3
b) For 2-address instructions (14 instructions):
   1111 0000 R1 R2
   1111 0001 R1 R2
   ....
   1111 1101 R1 R2
c) For 1-address instructions (16 + 15 = 31 instructions):
   1111 1110 0000 R1
   1111 1110 0001 R1
   ....
   1111 1110 1111 R1
   1111 1111 0000 R1
   1111 1111 0001 R1
   ....
   1111 1111 1110 R1
d) For 0-address instructions (16 instructions):
   1111 1111 1111 0000
   ....
   1111 1111 1111 1111
Going back to Example 1:
• The 15 3-address instructions account for:
  15 × 2^4 × 2^4 × 2^4 = 15 × 2^12 = 61440 bit patterns
• The 14 2-address instructions account for:
  14 × 2^4 × 2^4 = 14 × 2^8 = 3584 bit patterns
• The 31 1-address instructions account for:
  31 × 2^4 = 496 bit patterns
• The 16 0-address instructions account for 16 bit patterns
• In total we need 61440 + 3584 + 496 + 16 = 65536 different bit patterns.
• Having a total of 16 bits, we can create 2^16 = 65536 bit patterns.
• We have an exact match with no wasted patterns, so our instruction set is possible.
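The counting argument above is easy to mechanise: each group of n instructions with k address fields consumes n × 2^(4k) bit patterns, and feasibility means the total fits in 2^16.

```python
# Hedged check of Example 1: the four instruction groups exactly fill
# the 2**16 available bit patterns of a 16-bit instruction word.

groups = [
    (15, 3),   # 15 three-address instructions
    (14, 2),   # 14 two-address instructions
    (31, 1),   # 31 one-address instructions
    (16, 0),   # 16 zero-address instructions
]
OPERAND_BITS = 4

patterns = sum(n * 2 ** (OPERAND_BITS * addrs) for n, addrs in groups)
print(patterns, patterns == 2 ** 16)   # 65536 True
```

The same one-liner settles Example 2 below: 4 × 2^9 + 255 × 2^3 + 16 = 4104 > 2^12, so that set does not fit.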
Example 2
Is it possible to design an expanding opcode to allow the following to
be encoded in a 12-bit instruction? Assume a register operand
requires 3 bits.
a) 4 instructions with 3 registers
b) 255 instructions with 1 register
c) 16 instructions with 0 registers
• The first 4 instructions account for:
  4 × 2^3 × 2^3 × 2^3 = 4 × 2^9 = 2048 bit patterns
• The next 255 instructions account for:
  255 × 2^3 = 2040 bit patterns
• The last 16 instructions account for 16 bit patterns
• In total we need 2048 + 2040 + 16 = 4104 bit patterns.
• With a 12-bit instruction we can only have 2^12 = 4096 bit patterns.
• The required number of bit patterns (4104) is more than what we have (4096), so this
instruction set is not possible with only 12 bits.
Example 3
Given 8-bit instructions, is it possible to use expanding opcodes to
allow the following to be encoded? If so, show the encoding.
a) 3 instructions with two 3-bit operands
b) 2 instructions with one 4-bit operand
c) 4 instructions with one 3-bit operand
First, we must determine if the encoding is possible:
a) 3 × 2^3 × 2^3 = 3 × 2^6 = 192
b) 2 × 2^4 = 32
c) 4 × 2^3 = 32
• If we sum the required number of bit patterns, we get 192 + 32 + 32 = 256.
• 8 bits in the instruction means a total of 2^8 = 256 bit patterns, so we have an
exact match (the encoding is possible, but every bit pattern will
be used in creating it).
The encoding we can use is as follows:
00 xxx xxx
01 xxx xxx     3 instructions with two 3-bit operands
10 xxx xxx
11 – escape opcode

1100 xxxx
1101 xxxx      2 instructions with one 4-bit operand
1110 – escape opcode
1111 – escape opcode

11100 xxx
11101 xxx
11110 xxx      4 instructions with one 3-bit operand
11111 xxx
Example 4
Consider a processor supporting 12-bit instructions and a 1 KB
memory space. If there are two 1-address instructions, then how
many 0-address instructions can be formulated?

Total instruction length = 12 bits
Memory = 1 KB = 2^10 B, so the memory address field needs 10 bits
Format: Opcode (2 bits) | Memory address (10 bits)
Steps:
1. Identify the higher-order instruction format.
2. Identify the total number of instructions possible.
3. Identify the number of free opcodes.
4. Calculate the number of derived opcodes by multiplying the number of free opcodes
with the decoded value of the address field.
1. The 1-address instruction is the higher format.
2. Total 1-address instructions = 2^2 = 4
   00 xxxxxxxxxx
   01 xxxxxxxxxx     2 instructions with 1 memory operand
3. 10 – free opcode
   11 – free opcode
4. Each free opcode expands into 2^10 0-address instructions:
   10 0000000000 .... 10 1111111111   (2^10 instructions)
   11 0000000000 .... 11 1111111111   (2^10 instructions)
Total 0-address instructions = 2^10 + 2^10 = 2^11 = 2048
Total possible instructions = 1-address + 0-address = 2 + 2^11 = 2050
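The four steps above reduce to a short computation, sketched here with the numbers of Example 4 (2-bit opcode, 10-bit address field).

```python
# Hedged check of Example 4: each unused 1-address opcode can be expanded
# into 2**ADDR_BITS zero-address opcodes.

OPCODE_BITS, ADDR_BITS = 2, 10
one_address = 2                                  # 1-address instructions used
free = 2 ** OPCODE_BITS - one_address            # 2 free opcodes (10 and 11)
zero_address = free * 2 ** ADDR_BITS             # derived 0-address opcodes
print(zero_address, one_address + zero_address)  # 2048 2050
```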
Example 5
A CPU is designed to have 58 3-address instructions. The CPU is able
to address a maximum of 16 memory locations. If the length of all the
instructions is the same, then by using the expanding opcode technique,
calculate the number of 2-address instructions possible.
THANK YOU
Computer Organization and Architecture
Unit - II
Instruction Set Architecture
Lecture 5: Flow of Control
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Instruction Execution and Straight Line Sequencing
[Figure: a program in memory for C ← A + B. The Program Counter (PC) holds the address i of the current instruction; the program's 32-bit instructions occupy successive word locations, followed by the data locations A, B, and C used by the program.]
• Executing a given instruction is a 2-phase process.
• The first phase is called instruction fetch:
  • The instruction is fetched from the memory location whose address is in the PC.
  • This instruction is placed in the instruction register (IR) in the processor.
• The second phase is instruction execute:
  • The instruction in the IR is examined to determine which operation is to be performed.
  • The specified operation is then performed by the processor.
  • This often involves fetching operands from memory or from processor
    registers, performing an arithmetic or logic operation, and storing the result in
    the destination location.
• At some point in this 2-phase process, the contents of the PC are incremented.
Branching
[Figure: memory layout of a program that adds a list of numbers Num1, Num2, ..., Numn into Sum, illustrating a branch back to the start of the summation loop.]
Stacks
• A computer program often needs to perform a particular subtask using the familiar
subroutine structure. In order to organize the control and information linkage
between the main program and the subroutine, a data structure called a stack is used.
• A stack is a list of data elements with the access restriction that elements can be
added or removed at one end of the list only. This end is called the top of the stack, and
the other end is called the bottom.
• This structure is also called a pushdown stack. Example: a pile of books.
• A stack follows the LIFO approach, i.e., last-in, first-out. The terms push and pop are used
to describe placing a new item on the stack and removing the top item from the stack,
respectively.
• Data stored in the memory of the computer can be organized as a stack, with
successive elements occupying successive memory locations.
• A processor register, the stack pointer (SP), is used to keep track of the top element of the stack.
A stack in the memory
[Figure: a stack in memory, growing from high addresses toward address 0. The bottom element (43) sits at BOTTOM; the stack pointer register SP points to the current top element (-28), below which lie the elements 17 and 739.]
Stack Operations
• If we assume a byte-addressable memory with a 32-bit word length, the Push
operation can be implemented as:
  Subtract SP, #4       // SP ← SP - 4
  Move (SP), NEWITEM    // Mem[SP] ← NEWITEM
• Correspondingly, the Pop operation can be implemented as:
  Move ITEM, (SP)       // ITEM ← Mem[SP]
  Add SP, #4            // SP ← SP + 4
[Figure: stack contents before and after a PUSH of 19 and a subsequent POP; SP moves down by one word on push and back up on pop, above the elements -28, 17, 739, ..., 43 (BOTTOM).]
Safe PUSH and POP operations
Suppose a stack runs from location 2000 (BOTTOM) down no further than location
1500. The stack pointer is initially loaded with the address value 2004, i.e., the first
element will be stored at address 2000.

SAFEPOP:  Compare SP, #2000      // Compare to see if SP contains an address value
          Branch>0 EMPTYERROR    // greater than 2000. If it does, the stack is empty.
          Move ITEM, (SP)+       // If not, pop the element.

SAFEPUSH: Compare SP, #1500      // Compare to see if SP contains an address value
          Branch<=0 FULLERROR    // less than or equal to 1500. If it does, the stack is full.
          Move -(SP), NEWITEM    // If not, push the element.
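The bounds checks of SAFEPUSH and SAFEPOP can be sketched as a small class. The constants mirror the slide (BOTTOM = 2000, limit 1500, 4-byte words); the dict-as-memory is an illustrative assumption.

```python
# Hedged sketch of SAFEPUSH/SAFEPOP: a stack growing toward lower
# addresses, bounded by BOTTOM = 2000 and a lower limit of 1500.

BOTTOM, LIMIT, WORD = 2000, 1500, 4

class BoundedStack:
    def __init__(self):
        self.sp = BOTTOM + WORD      # empty: the first push lands at BOTTOM
        self.mem = {}

    def push(self, item):
        if self.sp <= LIMIT:         # SAFEPUSH check: SP <= 1500 means full
            raise OverflowError("stack full")
        self.sp -= WORD              # Move -(SP), NEWITEM
        self.mem[self.sp] = item

    def pop(self):
        if self.sp > BOTTOM:         # SAFEPOP check: SP > 2000 means empty
            raise IndexError("stack empty")
        item = self.mem[self.sp]     # Move ITEM, (SP)+
        self.sp += WORD
        return item

s = BoundedStack()
s.push(17)
s.push(739)
print(s.pop(), s.pop())   # 739 17 (LIFO order)
```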
Subroutines:
• A subroutine is a program fragment that lives in user space and performs a well-
defined task. It is invoked by another user program and returns control to the
calling program when finished.
• When a program branches to a subroutine, we say it is calling the subroutine.
• The instruction that performs the branch operation is called a Call instruction.
• After a subroutine is executed, the calling program must resume execution,
continuing immediately after the instruction that called the subroutine.
• The subroutine is said to return to the program that called it by executing a Return
instruction.
• The way in which a computer makes it possible to call and return from subroutines
is referred to as its subroutine linkage method.
• The simplest subroutine linkage method is to save the return address in a link
register.
Call and Return instructions: parameter passing on the stack (LISTADD example)

Calling program:
  Move -(SP), #NUM1     Push parameters onto the stack (address of the list)
  Move -(SP), N         Push n, the number of entries (TOS at level 2)
  Call LISTADD          Call the subroutine
  Move SUM, 4(SP)       Save the result
  Add SP, #8            Restore TOS (back to level 1)

Subroutine:
LISTADD:
  MoveMultiple -(SP), R0-R2   Save registers R0-R2 (TOS at level 3)
  Move R1, 16(SP)             Initialize counter to n
  Move R2, 20(SP)             Initialize pointer to the list (NUM1)
  Clear R0                    Initialize sum to 0
LOOP:
  Add R0, (R2)+               Add entry from list
  Decrement R1
  Branch>0 LOOP
  Move 20(SP), R0             Put result onto the stack
  MoveMultiple R0-R2, (SP)+   Restore registers
  Return                      Return to calling program

[The original slides step through the processor stack at each point: the parameters
NUM1 and n pushed at level 2 (locations 120 and 116), the saved registers R0-R2 at
level 3 (locations 108-100), and the result overwriting NUM1 at 20(SP) before the
return, from which the caller retrieves it as 4(SP).]
Pushing elements
To push elements onto the stack:
• Move the stack pointer $sp down to make room for the new data.
• Store the elements into the stack.
For example, to push registers $t1 and $t2 onto the stack:
  sub $sp, $sp, 8
  sw $t1, 4($sp)
  sw $t2, 0($sp)
[Figure: before the push, $sp points at Word 2; after, $t1 and $t2 sit below Word 2 and $sp points at $t2.]
Accessing and popping elements
• You can access any element in the stack (not just the top one) if you
know where it is relative to $sp.
• For example, to retrieve the value of $t1:
  lw $s0, 4($sp)
• You can pop, or "erase," elements simply by adjusting the stack pointer
upwards. To pop the value of $t2:
  addi $sp, $sp, 4
• Note that the popped data is still present in memory, but data past the
stack pointer is considered invalid.
Examples on subroutines
C code (body of fun, fragment):
  t = 3*a + 5;
  return t;
}
y = fun(x);
MIPS code (fragment: the callee saves $s1, then computes 3*a + 5):
  addi $sp, $sp, -4
  sw   $s1, 0($sp)
  li   $s0, 3
  mul  $s1, $s0, $a0
  addi $s1, $s1, 5
Interrupts
a) Software Interrupts
b) Hardware Interrupts
THANK YOU
Computer Organization and Architecture
Unit - II
Instruction Set Architecture
Lecture 13: Assembly Language
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Machine, Assembly and High Level Language
• Machine Language
• Native to a processor: executed directly by hardware.
• Instructions consist of binary code: 1’s and 0’s.
• Assembly Language
  • Low-level symbolic version of machine language.
  • One-to-one correspondence with machine language.
  • Pseudo-instructions are used that are much more readable and easy to use.
• High-Level Language
• Programming languages like C, C++, Java.
• More readable and closer to human languages.
Assemblers and Compilers
• Assembler
  • Translates an assembly language program to machine language.
• Compiler
  • Translates a high-level language program to assembly/machine language.
  • The translation is done by the compiler directly, or
  • the compiler first translates to assembly language and then
    the assembler converts it to machine code.
[Figure: High-level Language → (Compiler) → Assembly Language → (Assembler) → Machine Code; alternatively, the compiler produces machine code directly.]
Example of assembling:
  .text
  .global main
  main: la  $t0, value
        lw  $t1, 0($t0)
        lw  $t2, 4($t0)
        add $t3, $t1, $t2
        sw  $t3, 8($t0)
  .data
  value: .word 50, 30, 0
The assembler translates this program into the corresponding rows of binary machine code.
Features of Assembly Language
• One-to-one mapping
  • A pure assembly language is a language in which each statement produces
    exactly one machine instruction.
  • There is a one-to-one correspondence between machine instructions and
    statements in the assembly program.
• Full access to the machine's instructions
  • The assembly programmer has access to all instructions available on the target
    machine. The high-level language programmer does not.
  • Everything that can be done in machine language can be done in assembly
    language, but many instructions, registers, and similar features are not
    available for the high-level language programmer to use.
Pseudoinstructions
• MIPS32 assemblers support several pseudo-instructions that are meant for user
convenience.
• Internally, the assembler converts them to valid MIPS32 instructions.
• Example: the pseudo-instruction branch if less than, blt $s1, $s2, Label.

Pseudo-instruction → Expansion (purpose):
  blt $1, $2, Label → slt $at, $1, $2; bne $at, $zero, Label   (branch if less than)
  bgt $1, $2, Label → sgt $at, $1, $2; bne $at, $zero, Label   (branch if greater than)
  ble $1, $2, Label → sle $at, $1, $2; bne $at, $zero, Label   (branch if less or equal)
  bge $1, $2, Label → sge $at, $1, $2; bne $at, $zero, Label   (branch if greater or equal)
  li $1, 0x23ABCD   → lui $1, 0x0023; ori $1, $1, 0xABCD       (load immediate value into a register)
  move $1, $2       → add $1, $2, $zero                        (move content of one register to another)
  la $a0, 0x2B09D5  → lui $a0, 0x002B; ori $a0, $a0, 0x09D5    (load address into a register)
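The li and la expansions above split a 32-bit immediate into two 16-bit halves: lui places the upper half in the top of the register, and ori fills in the lower half. A quick sketch of that split:

```python
# Hedged sketch: how li expands into lui + ori, splitting a 32-bit
# immediate into upper and lower 16-bit halves.

def expand_li(imm):
    upper = (imm >> 16) & 0xFFFF   # value the lui would load (shifted up)
    lower = imm & 0xFFFF           # value the ori would OR in
    return upper, lower

def rebuild(upper, lower):
    return (upper << 16) | lower   # what the register holds afterwards

u, l = expand_li(0x23ABCD)
print(hex(u), hex(l))              # 0x23 0xabcd
print(hex(rebuild(u, l)))          # 0x23abcd
```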
Macros
• A macro definition is a way to give a name to a piece of text. After a macro
has been defined, the programmer can write the macro name instead of the
piece of program.
• Basic parts in macro definition
• A macro header giving the name of the macro being defined
• The text comprising the body of the macro
• A pseudo-instruction marking the end of the definition
• Macro call and expansion
• When the assembler encounters a macro definition, it saves it in a macro
definition table for subsequent use.
• From that point on, whenever the name of the macro appears as an opcode,
the assembler replaces it by the macro body.
• The use of a macro name as an opcode is known as a macro call and its
replacement by the macro body is called macro expansion.
Macro Example 1
• To terminate the program, the instructions used are:
  li $v0,10
  syscall
• It is tedious to write this again and again, so we can define a macro;
let's call it done.
.macro done
li $v0,10
syscall
.end_macro
And then invoke it wherever necessary with the statement:
done
Macro Example 2
• Printing an integer (argument may be either an immediate value or register
name)
.macro print_int %x
li $v0, 1
add $a0, $zero, %x
syscall
.end_macro
print_int $s0
The .include directive
• .include directive has one operand, a quoted filename. The contents of the specified file
are substituted for the directive. This occurs during assembly preprocessing.
• It is like #include in C or C++.
• Suppose "macros.asm" contains the following:
.macro done
li $v0,10
syscall
.end_macro
• You could then include it in a different source file something like this:
.include "macros.asm"
.text
lw $a0, value
done
Stages of Compilation
The Four Stages of Compilation
[Figure: C source files (calc.c, math.c) are compiled to assembly files (calc.s, math.s, io.s), assembled to object files (calc.o, math.o, io.o), linked with libraries (libc.o) into an executable program (calc.exe) that exists on disk, and finally brought by the LOADER into main memory as an executing process.]
• Assemblers need to
  • translate assembly instructions and pseudo-instructions into
    machine instructions (object files), and
  • convert decimal numbers, etc., specified by the programmer into
    binary.
• Typically, assemblers make two passes over the assembly file:
  • First pass: read each line and record labels in a symbol table.
  • Second pass: use the information in the symbol table to produce the actual
    machine code for each line.
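The two passes can be sketched with a toy assembler. The line format here (a label ends in ":", everything else is an instruction one word long) is an illustrative assumption, not real MIPS syntax.

```python
# Hedged sketch of the two passes: pass 1 records label addresses in a
# symbol table; pass 2 resolves label operands to those addresses.

def assemble(lines, word_size=4):
    symtab, addr = {}, 0
    for line in lines:                 # pass 1: collect labels
        if line.endswith(":"):
            symtab[line[:-1]] = addr   # label takes the next instruction's address
        else:
            addr += word_size
    code, addr = [], 0
    for line in lines:                 # pass 2: resolve operands
        if line.endswith(":"):
            continue
        op, *args = line.split()
        args = [str(symtab.get(a, a)) for a in args]   # labels -> addresses
        code.append((addr, op, args))
        addr += word_size
    return symtab, code

symtab, code = assemble(["start:", "lw r1 value", "beq r1 r0 start", "value:"])
print(symtab)   # {'start': 0, 'value': 8}
```

The first pass is what makes forward references like "value" work: its address is not known until the whole file has been scanned once.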
Object File Format
An object file contains the following information:
• A header that says where in the file the sections below are located
• A text segment, which contains the machine code (with some
missing addresses)
• A data segment: static data (local/global variables, strings, constants)
• Relocation records: identify lines of code that need to be "fixed"
• Symbol table: list of this file's referenceable labels
• Debugging information
  • line number to code address map, etc.
Symbol Table
• The Symbol table records the list of “items” in the file that can be
used by the code in this file and in other files
• E.g., subprograms
• E.g., “global” variables in the data segment
• Each entry in the table contains the name of the label and its
offset within this object file
Relocation Records
• The Relocation records contain the list of “items” that this file
needs (from other object files or libraries)
• E.g., functions not defined in this file’s text segment
• E.g., “global” variables not defined in this file’s data segment
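How a linker uses each file's symbol table to patch the other files' relocation records can be sketched as follows (a word-addressed toy; the file contents and symbol names are made up):

```python
# Each "object file" is (code, exported symbols, relocation records).
# A relocation record names a code index that needs a symbol's final address.

def link(objs):
    # Step 1: lay the files out one after another and build a
    # global symbol table of every exported label's final address.
    base, global_symtab, layout = 0, {}, []
    for code, symbols, relocs in objs:
        for name, offset in symbols.items():
            global_symtab[name] = base + offset
        layout.append((base, code, relocs))
        base += len(code)
    # Step 2: patch every relocation with the symbol's final address.
    image = []
    for base, code, relocs in layout:
        code = list(code)
        for index, symbol in relocs:
            code[index] = ("jal", global_symtab[symbol])
        image.extend(code)
    return image

main_o = ([("jal", None), ("syscall", None)], {"main": 0}, [(0, "print")])
io_o   = ([("li", 1), ("syscall", None)], {"print": 0}, [])
print(link([main_o, io_o]))
# the first instruction is patched to ("jal", 2): print lives at word 2
```

The unresolved call in main_o is exactly an entry in its relocation records; the definition of print in io_o is exactly an entry in its symbol table. Linking is the act of matching the two.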
Linker
[Figure: the LINKER combines the object files (calc.o, math.o, io.o) with library object files (libc.o, libm.o) into the executable program calc.exe, which exists on disk. The LOADER then brings calc.exe into main memory, where it runs as an executing process.]
Recap
[Figure: the five classic components of a computer: input, output, memory, and a processor consisting of a datapath and control.]
Technology Trends
Capacity Speed (latency)
Logic: 2x in 3 years 2x in 3 years
DRAM: 4x in 3 years 2x in 10 years
Disk: 4x in 3 years 2x in 10 years
DRAM
Year Size Cycle Time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1995   64 Mb    120 ns
1998   256 Mb   100 ns
2001   1 Gb     80 ns
Capacity grew roughly 1000:1 over this period, while cycle time improved only about 2:1.
Who Cares About Memory?
Processor-DRAM Memory Gap (latency)
[Figure: log-scale plot of performance versus year, 1980-2000. Processor performance (“Moore’s Law”) grows about 60% per year (2x every 1.5 years), while DRAM performance grows only about 9% per year (2x every 10 years). The resulting processor-memory performance gap grows about 50% per year.]
Today’s Situation: Microprocessors
[Figure: the processor (control + datapath) backed by multiple levels of memory. Data are transferred between an upper level and the lower level beneath it.]
Memory Hierarchy: Terminology
■ Hit: If the data requested by a processor appears in some
block in the upper level.
❑ Hit Time: Time to access the upper level which consists of
RAM access time + Time to determine hit/miss
❑ Hit Rate: The fraction of memory accesses found in the upper
level
■ Miss: If the data is not found in the upper level.
❑ Miss Rate = 1 - (Hit Rate)
❑ Miss Penalty: Time to replace a block in the upper level +
time to deliver the block to the processor
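These quantities combine into the standard average memory access time (AMAT) formula; the numbers below are purely illustrative:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

# Hypothetical values: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
print(amat(1.0, 0.05, 100.0))   # about 6.0 ns per access on average
```

Note how a small miss rate still dominates the average when the miss penalty is two orders of magnitude larger than the hit time.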
[Figure: the memory hierarchy, from the upper level to the lower level: processor registers (D flip-flops), on-chip L1 cache (SRAM), second-level cache, main memory, secondary storage (magnetic disks), and tertiary storage (tape).]
Differences in Memory Levels
[Figure: memory blocks with addresses 000-111 mapped into cache locations.]
Direct-mapped Cache
[Figure: a direct-mapped cache of 1024 entries. The 32-bit address is split into a 20-bit tag, a 10-bit index, and a 2-bit byte offset.]
• The cache index is used to select the block.
• The tag field of the address is compared with the tag field stored in the selected cache entry.
• The valid bit indicates whether a cache block holds valid information.
• On a valid tag match, Hit is asserted and the 32-bit data word is returned.
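The direct-mapped lookup can be sketched in Python (the field widths follow the figure: 20-bit tag, 10-bit index, 2-bit byte offset; the address and data values are made up):

```python
def lookup(cache, address):
    offset = address & 0x3            # bits 1..0: byte within the word
    index = (address >> 2) & 0x3FF    # bits 11..2: selects one of 1024 blocks
    tag = address >> 12               # bits 31..12: compared with the stored tag
    valid, stored_tag, data = cache[index]
    hit = valid and stored_tag == tag
    return hit, data if hit else None

cache = [(False, 0, 0)] * 1024        # (valid, tag, data) per block
addr = 0x00001004                     # tag 1, index 1, offset 0
cache[1] = (True, 1, 0xDEADBEEF)      # pretend this block was filled earlier
print(lookup(cache, addr))            # (True, 3735928559): a hit
print(lookup(cache, 0x00002004))      # same index, tag 2: a miss
```

Because the index picks exactly one candidate block, two addresses that share index bits but differ in tag bits (as in the second call) always evict each other in a direct-mapped cache.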
Two-way Set-associative Cache
[Figure: a two-way set-associative cache. The address (bits 31 down to 0) is split into a tag, an index, and a byte offset. The index selects a set; the tags of both ways are compared in parallel, and a 2-to-1 multiplexor selects the data from the way that hits.]
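The two-way lookup can be sketched similarly (the field widths here are illustrative, not from the figure; the conditional plays the role of the 2-to-1 multiplexor):

```python
def lookup_2way(sets, address):
    index = (address >> 2) & 0x1FF    # 9 index bits: 512 sets of 2 ways
    tag = address >> 11
    way0, way1 = sets[index]          # each way holds (valid, tag, data)
    if way0[0] and way0[1] == tag:    # both tags are compared "in parallel"
        return True, way0[2]
    if way1[0] and way1[1] == tag:
        return True, way1[2]
    return False, None

sets = [[(False, 0, 0), (False, 0, 0)] for _ in range(512)]
sets[3][1] = (True, 7, 42)            # block cached in way 1 of set 3
addr = (7 << 11) | (3 << 2)           # tag 7, index 3, offset 0
print(lookup_2way(sets, addr))        # (True, 42)
```

Unlike the direct-mapped case, two conflicting addresses with the same index can now coexist, one per way, at the cost of the extra comparator and multiplexor.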
Example: Alpha 21064 Data Cache
For caches with low miss rates, random is almost as good as LRU.
Q4: What Happens on a Write?
■ Write through: The information is written to both the block
in the cache and to the block in the lower-level memory.
■ Write back: The information is written only to the block in
the cache. The modified cache block is written to main
memory only when it is replaced.
■ Is the block clean or dirty? (add a dirty bit to each block)
■ Pros and Cons of each:
■ Write through
■ Read misses cannot result in writes to memory.
■ Easier to implement
■ Always combine with write buffers to avoid memory latency
■ Write back
■ Less memory traffic
■ Perform writes at the speed of the cache
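The memory-traffic difference between the two policies can be illustrated with a toy one-block cache (the policy names follow the slide; everything else is made up for the example):

```python
class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.block = None            # (tag, data, dirty) or None
        self.memory_writes = 0       # writes reaching lower-level memory

    def write(self, tag, data):
        if (self.block and self.block[0] != tag
                and self.write_back and self.block[2]):
            self.memory_writes += 1  # evicting a dirty block: write it back
        self.block = (tag, data, True)
        if not self.write_back:
            self.memory_writes += 1  # write-through: every write goes to memory

wt, wb = Cache(write_back=False), Cache(write_back=True)
for tag in [0, 0, 0, 1]:             # three writes to one block, then another
    wt.write(tag, "x")
    wb.write(tag, "x")
print(wt.memory_writes, wb.memory_writes)   # 4 vs 1
```

The repeated writes to the same block are exactly where write back saves traffic: they are absorbed by the cache and reach memory only once, on eviction.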
Q4: What Happens on a Write?