COA Midsem
Lecture1: Introduction to
computers
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Introduction
• Computers have become part and parcel of our daily lives.
• They are everywhere
• Laptops, tablets, mobile phones, intelligent appliances.
• It is required to understand how a computer works.
• What is there inside a computer?
• How does it work?
• We distinguish between two terms: Computer Architecture and
Computer Organization.
• Computer Organization:
• Design of the components and functional units using which
computer systems are built.
• Analogy: civil engineer’s task during building construction
(cement, bricks, iron rods, and other building materials).
• Computer Architecture:
• How to integrate the functional units to build a computer system
to achieve a desired level of performance.
• Analogy: architect’s task during the planning of a building (overall
layout, floorplan, etc.).
Functional Units
• Processor: Arithmetic & Logic Unit and Control Unit
• Memory: Main Memory and Secondary Memory
• Input and Output units
Inside the Processor
• Also called Central Processing Unit (CPU).
• Consists of a Control Unit and an Arithmetic Logic Unit (ALU).
• All calculations happen inside the ALU.
• The Control Unit generates a sequence of control signals to carry out all
operations.
• The processor fetches an instruction from memory for execution.
• An instruction specifies the exact operation to be carried out.
• It also specifies the data that are to be operated on.
• A program refers to a set of instructions that are required to carry out
some specific task (e.g. sorting a set of numbers).
Role of ALU
• It contains several registers, some general-purpose and some
special purpose, for temporary storage of data.
• It contains circuitry to carry out logic operations, like AND, OR, NOT,
shift, compare, etc.
• It contains circuitry to carry out arithmetic operations like addition,
subtraction, multiplication, division, etc.
• During instruction execution, the data (operands) are brought in
and stored in some registers, the desired operation carried out,
and the result stored back in some register or memory.
Role of Control Unit
• Acts as the nerve center that senses the states of various functional
units and sends control signals to control their states.
ADD R1, R2, R3    (R1 ← R2 + R3)
Opcode: ADD;  Operands: R1, R2, R3
• Enable the outputs of registers R2 and R3.
• Select the addition operation.
• Store the output into register R1.
• When an instruction is fetched from memory, the operation (called
opcode) is decoded by the control unit, and the control signals
issued.
Inside the Memory Unit
• Two main types of memory subsystems.
• Primary or Main memory, which stores the active instructions
and data for the program being executed on the processor.
• Secondary memory, which is used as a backup and stores all
active and inactive programs and data, typically as files.
• The processor only has direct access to the primary memory.
• All instructions and data are stored in memory.
• Instructions and the required data are brought into main memory
for execution. This is known as the stored program concept,
also called the von Neumann architecture.
Input Unit
• Used to feed data to the computer system from the external
environment.
• Data are transferred to the processor/memory after appropriate
encoding.
• Common input devices:
• Keyboard
• Mouse
• Joystick
• Camera
Output Unit
• Used to send the result of some computation to the outside
world.
• Common output devices:
• LCD/LED screen
• Printer
• Speaker / Buzzer
• Projection system
Source: https://www.insidemylaptop.com/complete-disassembly-guide-for-dell-inspiron-1545-laptop/
Basic operation of a
Computer
Special Purpose Registers for Interfacing with Main Memory
• Two special-purpose registers are used:
• Memory Address Register (MAR): Holds the address of the memory location to be accessed.
• Memory Data Register (MDR): Holds the data being written into memory, or receives the data being read out from memory.
• Memory is considered as a linear array of storage locations (bytes or words), each with a unique address.
[Diagram: memory as an array of locations with addresses 0, 1, 2, 3, 4, …, 1023]
[Diagram: Processor connected to Main Memory through MAR (address lines) and MDR (data lines)]
Control Signals
• To read data from memory
a) Load the memory address into MAR.
b) Issue the control signal READ.
c) The data read from the memory is stored into MDR.
• To write data into memory
a) Load the memory address into MAR.
b) Load the data to be written into MDR.
c) Issue the control signal WRITE.
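The read and write sequences above can be sketched as a small simulation. This is a minimal Python sketch, not real hardware: the class name `Memory` and the 1024-word size are illustrative assumptions matching the 0–1023 address range on the earlier slide.

```python
# Minimal sketch of the MAR/MDR read/write protocol described above.
class Memory:
    def __init__(self, size=1024):
        self.cells = [0] * size
        self.mar = 0   # Memory Address Register
        self.mdr = 0   # Memory Data Register

    def read(self, address):
        self.mar = address               # a) load address into MAR
        self.mdr = self.cells[self.mar]  # b) READ signal; c) data lands in MDR
        return self.mdr

    def write(self, address, data):
        self.mar = address               # a) load address into MAR
        self.mdr = data                  # b) load data into MDR
        self.cells[self.mar] = self.mdr  # c) WRITE signal stores MDR

mem = Memory()
mem.write(5, 42)
print(mem.read(5))   # → 42
```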
Special Purpose Register For Keeping Track
of Program / Instructions
• Program Counter (PC): Holds the memory address of the next
instruction to be executed.
• Automatically incremented to point to the next instruction when an
instruction is being executed.
• Instruction Register (IR): Temporarily holds an instruction that
has been fetched from memory.
• Needs to be decoded to find out the instruction type.
• Also contains information about the location of the data.
Architecture of the Example Processor
[Diagram: Processor containing MAR, MDR, Control Unit, PC, IR, ALU and general-purpose registers R0, R1, …, Rn-1, connected to Memory]
Two Bus Architecture
[Diagram: processor, memory and I/O devices connected over two buses]
THANK YOU
Computer Organization and Architecture
D = Σ (i = –m to n–1) bi × 2^i
Examples
1. 101011 = 1×2^5 + 0×2^4 + 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 43
(101011)2 = (43)10
Unsigned Fixed Point Numbers
• An n-bit binary number can have 2n distinct combinations (0 to 2n-1).
• For example, for n=3, the 8 distinct combinations are:
000, 001, 010, 011, 100, 101, 110, 111 (0 to 23-1 = 7 in decimal).
• An n-bit binary integer: bn-1 bn-2 … b2 b1 b0
• Equivalent unsigned decimal value:
D = bn-1×2^(n-1) + bn-2×2^(n-2) + … + b1×2^1 + b0×2^0
• Each digit position has a weight that is some power of 2.

Number of bits (n) | Range of Numbers
8   | 0 to 2^8–1 (255)
16  | 0 to 2^16–1 (65535)
32  | 0 to 2^32–1
64  | 0 to 2^64–1
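The weighted-sum formula above can be checked directly in Python (a small sketch; the function name `unsigned_value` is illustrative):

```python
# D = b(n-1)*2^(n-1) + ... + b1*2^1 + b0*2^0, applied to a bit string.
def unsigned_value(bits):
    # bits is a string like "101011", most significant bit first
    n = len(bits)
    return sum(int(b) << (n - 1 - i) for i, b in enumerate(bits))

print(unsigned_value("101011"))      # → 43
print((1 << 8) - 1, (1 << 16) - 1)   # → 255 65535  (table ranges for n=8, 16)
```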
Signed Fixed Point Numbers
• Many of the numerical data items that are used in a program are
signed (positive or negative).
• Question: How to represent the sign?
• Three possible approaches:
a) Sign-magnitude representation
b) One’s complement representation
c) Two’s complement representation
(a) Sign-magnitude Representation
• For an n-bit number representation:
• The most significant bit (MSB) indicates sign (0: positive, 1: negative).
• The remaining (n-1) bits represent the magnitude of the number.
• Range of numbers: – (2^(n-1) – 1) to + (2^(n-1) – 1)
bn-1 bn-2 . . . . . . . . b1 b0
Sign Magnitude
bn-1 bn-2 . . . . . . . . b1 b0
D = –bn-1×2^(n-1) + bn-2×2^(n-2) + … + b2×2^2 + b1×2^1 + b0×2^0
b) Shifting left by k positions with zero padding multiplies the number by 2^k.
d) The sign bit can be copied as many times as required in the beginning to
extend the size of the number (called sign extension).
X = 00101111 (8-bit number, value = +47) X = 10100011 (8-bit number, value = -93)
Sign extend to 32 bits: Sign extend to 32 bits:
00000000 00000000 00000000 00101111 11111111 11111111 11111111 10100011
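Sign extension as described above, copying the sign bit into the new high-order positions, can be sketched in Python (the helper names `sign_extend` and `to_signed` are illustrative assumptions):

```python
# Extend an n-bit two's complement pattern to a wider width.
def sign_extend(value, from_bits, to_bits):
    sign = (value >> (from_bits - 1)) & 1
    if sign:  # copy the sign bit into all new high-order positions
        value |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return value

def to_signed(value, bits):
    # interpret a raw bit pattern as a two's complement integer
    return value - (1 << bits) if value >> (bits - 1) else value

x = sign_extend(0b00101111, 8, 32)   # +47 stays +47
y = sign_extend(0b10100011, 8, 32)   # -93 stays -93
print(hex(x), to_signed(x, 32))      # → 0x2f 47
print(hex(y), to_signed(y, 32))      # → 0xffffffa3 -93
```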
Arithmetic Operations on Fixed Point Numbers
Subtraction Using Addition :: 1’s Complement
How to compute A – B ?
• Compute the 1’s complement of B (say, B1).
• Compute R = A + B1 (i.e. A + (–B))
• If the carry obtained after the addition is 1:
• Add the carry back to R (called end-around carry).
• That is, R = R + 1.
• The result is a positive number.
• Else
• The result is negative, and is in 1’s complement form in R.
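The A – B procedure above, including the end-around carry, can be sketched for n-bit operands (a Python illustration; the function name is an assumption):

```python
# A - B via 1's complement addition with end-around carry.
def ones_comp_subtract(a, b, n=4):
    mask = (1 << n) - 1
    b1 = ~b & mask              # 1's complement of B
    r = a + b1
    if r > mask:                # a carry came out of the MSB
        r = (r & mask) + 1      # end-around carry: R = R + 1
        return r                # positive result
    else:
        return -(~r & mask)     # negative; R holds the 1's complement form

print(ones_comp_subtract(6, 2))  # → 4
print(ones_comp_subtract(2, 6))  # → -4
```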
Example 1 :: 6 – 2
1’s complement of 2 = 1101

Half Adder truth table (inputs A, B; outputs S, C):
A B | S C
0 0 | 0 0
0 1 | 1 0
1 0 | 1 0
1 1 | 0 1
S = A’.B + A.B’ = A ⊕ B
C = A.B
[Diagram: half adder block with inputs A, B and outputs S (sum), C (carry)]
Addition of Multi-bit Binary Numbers
  0 0 1 0 1 1 0  Carry          1 1 1 1 1 1 0  Carry
  0 1 0 1 0 1 1  Number A       0 1 1 1 1 1 1  Number A
+ 0 0 0 1 0 0 1  Number B     + 0 0 0 0 0 0 1  Number B
  0 1 1 0 1 0 0  Sum S          1 0 0 0 0 0 0  Sum S

Full Adder truth table (A B Cin | S Cout):
0 0 0 | 0 0
0 0 1 | 1 0
0 1 0 | 1 0
0 1 1 | 0 1
1 0 0 | 1 0
1 0 1 | 0 1
1 1 0 | 0 1
1 1 1 | 1 1
[Diagram: full adder (FA) with inputs A, B, Cin and outputs S, Cout]
Delay of a Full Adder
• Assume that the delay of all basic gates
(AND, OR, NAND, NOR, NOT) is δ.
• Delay for Carry = 2δ
• Delay for Sum = 3δ (AND-OR delay plus one inverter delay)
Circuitry for Addition
  1110  Carry
  0111  Number A
+ 0001  Number B
  1000  Sum S
• In a ripple carry adder (RCA), the carry output from stage-i propagates as the carry input to stage-(i+1).
• In the worst case, the carry ripples through all the stages.
• Delay is proportional to n.
Design of Fast Adders
Carry Look-ahead Adder
• The propagation delay of an n-bit ripple carry adder has been seen
to be proportional to n.
• Due to the rippling effect of carry sequentially from one stage to the
next.
• One possible way to speedup the addition:
• Generate the carry signals for the various stages in parallel.
• Time complexity reduces from O(n) to O(1).
• Hardware complexity increases rapidly with n.
4-bit CLA Adder
[Diagram: 4-bit CLA; Gi and Pi signals available after 3δ, all carries after 5δ]
Carry Generate and Carry Propagate
Ci+1 = Xi.Yi + Yi.Ci + Xi.Ci
     = Xi.Yi + Ci.(Xi + Yi)
     = Gi + Pi.Ci
where Gi = carry generate function, and Pi = carry propagate function.
• The generate function means that a carry is generated at a particular stage based on inputs Xi and Yi alone, irrespective of Ci. This happens when Xi = 1 and Yi = 1, i.e. Gi = Xi.Yi.
• Gi = 1 represents the condition when a carry is generated in stage-i independent of the other stages.
• In the propagate function, Ci propagates to Ci+1, i.e. Ci+1 = 1 given Ci = 1. This happens when Xi = 0, Yi = 1 or Xi = 1, Yi = 0; so Pi = Xi ⊕ Yi.
• Pi = 1 represents the condition when an input carry Ci will be propagated to the output carry Ci+1.
[Diagram: full adder stage with inputs Xi, Yi, Ci and outputs Si, Ci+1, annotated with Gi and Pi]
Unrolling the Recurrence
Ci+1 = Gi + PiCi
= Gi + Pi (Gi-1 + Pi-1Ci-1)
= Gi + PiGi-1 + PiPi-1Ci-1
= Gi + PiGi-1 + PiPi-1 (Gi-2 + Pi-2Ci-2)
= Gi + PiGi-1 + PiPi-1 Gi-2 + .... + PiPi-1....P1G0 + PiPi-1....P1C0
Thus, all the carries can be obtained by 5 gate delays, i.e. 5δ, after the
input signals X, Y and C0 are applied as:
• 3δ delay is incurred in generating Gi and Pi (as Pi uses XOR gate
incurring 3δ delay)
• 2δ is incurred in AND-OR circuit for Ci+1 (as Ci+1 = Gi + PiCi )
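The unrolled recurrence can be evaluated in software to confirm that every carry depends only on the Gi, Pi terms and C0, not on the previous carry. A minimal Python sketch (bit-list convention and function name are assumptions):

```python
# Ci+1 = Gi + Pi.Gi-1 + PiPi-1.Gi-2 + ... + Pi...P1.G0 + Pi...P0.C0
def cla_carries(x_bits, y_bits, c0=0):
    # x_bits, y_bits: lists of bits, index 0 = least significant
    g = [x & y for x, y in zip(x_bits, y_bits)]   # Gi = Xi.Yi
    p = [x ^ y for x, y in zip(x_bits, y_bits)]   # Pi = Xi xor Yi
    carries = [c0]
    for i in range(len(x_bits)):
        c, prod = g[i], p[i]
        for j in range(i - 1, -1, -1):            # unroll the recurrence
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    return carries

# 0111 + 0001 from the ripple-carry example (LSB first)
print(cla_carries([1, 1, 1, 0], [1, 0, 0, 0]))  # → [0, 1, 1, 1, 0]
```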
Design of 4-bit CLA Adder
C4 = G3 + G2P3 + G1P2P3 + G0P1P2P3 + C0P0P1P2P3
C3 = G2 + G1P2 + G0P1P2 + C0P0P1P2
C2 = G1 + G0P1 + C0P0P1
C1 = G0 + C0P0
S0 = X0 ⊕ Y0 ⊕ C0 = P0 ⊕ C0
S1 = P1 ⊕ C1
S2 = P2 ⊕ C2
S3 = P3 ⊕ C3
The 4-bit CLA Circuit
[Diagram: 4-bit carry look-ahead adder circuit]
16-bit Adder Using 4-bit CLA Modules
Problem: Carry propagation between modules still slows down the adder
Solution:
• Use a second level of carry look-ahead mechanism to generate the
input carries to the CLA blocks in parallel.
• The second level of CLA generates C4, C8, C12 and C16 in parallel
with two gate delays (2δ).
• For larger values of n, more CLA levels can be added.
• Delay calculation of a 16-bit adder:
a) For original single-level CLA: 14δ
b) For modified two-level CLA: 10δ
Delay of an n-bit Adder
TCLA = (6 + 2⌈log4 n⌉) δ        TRCA = (2n + 1) δ

n   | TCLA | TRCA
4   | 8δ   | 9δ
16  | 10δ  | 33δ
32  | 12δ  | 65δ
64  | 12δ  | 129δ
128 | 14δ  | 257δ
256 | 14δ  | 513δ
Status Flags
• Many contemporary processors have a flag register that contains
the status of the last arithmetic / logic operation.
• Zero (Z): tells whether the result is zero. Can be used for both arithmetic
and logic operations.
• Sign (S): tells whether the result is positive (=0) or negative (=1). Can
be used for both arithmetic and logic operations.
• Carry (C): tells whether there has been a carry out of the most
significant stage. Used only for arithmetic operations.
• Overflow (V): tells whether the result is too large to fit in the target
register. Used only for arithmetic operations (addition and
subtraction).
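The four flags can be computed for an 8-bit addition as a sketch. The V computation uses the usual two's complement rule (overflow when both operands have the same sign and the result's sign differs); the function name is illustrative:

```python
# Status flags Z, S, C, V after an n-bit addition.
def add_with_flags(a, b, bits=8):
    mask = (1 << bits) - 1
    msb = 1 << (bits - 1)
    r = (a + b) & mask
    Z = int(r == 0)                              # result is zero
    S = int(bool(r & msb))                       # sign bit of result
    C = int(a + b > mask)                        # carry out of MSB
    V = int(bool((~(a ^ b)) & (a ^ r) & msb))    # signed overflow
    return r, dict(Z=Z, S=S, C=C, V=V)

print(add_with_flags(100, 100))
# → (200, {'Z': 0, 'S': 1, 'C': 0, 'V': 1})   100+100 overflows signed 8-bit
```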
• Unsigned Multiplication
• Booth’s Multiplication Algorithm
Unsigned Multiplication
Example: 3 X 3

Booth’s Multiplication example (partial products summed; result = –78):
  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1        2’s comp. of 13
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  1 1 1 0 1 1 0 0 1 0      (–78)
Computer Organization and Architecture
MIPS32 Processor
DR. VINEETA JAIN
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, LNMIIT JAIPUR
MIPS32 Architecture: A Case Study
a) 32 × 32-bit General Purpose Registers (GPRs), R0 to R31.
• R0 is hard-wired to a value of zero.
• R31 is used to store the return address when a function call is made. Used by the jump-and-link and branch-and-link instructions.
b) A special-purpose 32-bit Program Counter (PC).
• Affected only indirectly by certain instructions (like branch, call, etc.).
c) A pair of 32-bit special-purpose registers HI and LO, which are used to hold the results of multiply, divide, and multiply-accumulate instructions (HI: high-order 32 bits; LO: low-order 32 bits).
[Diagram: general purpose registers R0 (zero) … R31 (return address); special purpose registers HI, LO and PC]
MIPS32 Assembly Code Layout

# Add two numbers in memory and store the result in the next location.
        .text                 # code section
        .globl main
main:   la   $t0, value
        lw   $t1, 0($t0)
        lw   $t2, 4($t0)
        add  $t3, $t1, $t2
        sw   $t3, 8($t0)

        .data
value:  .word 50, 30, 0
Assembler Directives
• .TEXT
• Specifies the user text segment, which contains the instructions.
• .DATA
• Specifies the data segment, where all the data items are defined.
• .GLOBL SYM
• Specifies that the symbol “sym” is global, and can be referred from
other files.
• .WORD W1, W2, …, WN
• Stores the specified 32-bit numbers in successive memory words.
• .HALF H1, H2, …, HN
• Stores the specified 16-bit numbers in successive memory half-words.
• .BYTE B1, B2, …, BN
• Stores the specified 8-bit numbers in successive memory bytes.
• .ASCII STR
• Stores the specified string in memory (in ASCII code), but does not null-terminate it.
• Strings are enclosed in double quotes and follow C-like conventions (“\n”, etc.).
• .ASCIIZ STR
• Stores the specified string in memory (in ASCII code), and null-terminates it.
• .SPACE N
• Reserves space for n successive bytes in memory.
THANK YOU
FLOATING-POINT NUMBERS
Floating-Point Number Representation
(IEEE-754)
• For representing numbers with fractional parts, we can assume that
the binary point lies somewhere within the number (say, n bits in the
integer part, m bits in the fraction part). This is called fixed-point
representation.
• Lacks flexibility.
• Cannot be used to represent very small or very large numbers (for
example: 2.53 x 10-26, 1.7562 x 10+35, etc.).
• Solution :: use floating-point number representation.
• A number F is represented as a triplet <s, M, E> such that
F = (–1)^s × M × 2^E
• s is the sign bit indicating whether the number is negative (=1) or positive (=0).
• M is called the mantissa, and is normally a fraction in the range [1.0, 2.0).
• E is called the exponent, which weights the number by power of 2.
Encoding:
• Single-precision numbers: total 32 bits, E 8 bits, M 23 bits
• Double-precision numbers: total 64 bits, E 11 bits, M 52 bits
S E M
Points to Note
• The number of significant digits depends on the number of bits in M.
• 7 significant digits for 24-bit mantissa (23 bits + 1 implied bit).
• The range of the number depends on the number of bits in E.
• Approximately 10^–38 to 10^+38 for an 8-bit exponent.
Encoding Example 2
• Consider the number F = –3.75
(–3.75)10 = –(11.11)2 = –1.111 × 2^1
• Considering a single precision number:
• Mantissa will be stored as: M = 11100000000000000000000 (23 bits)
• Here, EXP = 1, BIAS = 127, so E = 1 + 127 = 128 = (10000000)2
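The single-precision encoding of –3.75 worked out above (sign 1, E = 10000000, M = 1110…0) can be cross-checked with Python's struct module:

```python
import struct

# Reinterpret the 4 bytes of the float -3.75 as an unsigned 32-bit integer.
bits = struct.unpack('>I', struct.pack('>f', -3.75))[0]
print(hex(bits))             # → 0xc0700000

sign = bits >> 31            # 1 bit
exp  = (bits >> 23) & 0xFF   # 8 bits
frac = bits & 0x7FFFFF       # 23 bits
print(sign, exp, bin(frac))  # → 1 128 0b11100000000000000000000
```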
Subtraction Example
• Suppose we want to subtract F2 = 224 from F1 = 270.75
• F1 = (270.75)10 = (100001110.11)2 = 1.0000111011 x 28
• F2 = (224)10 = (11100000)2 = 1.11 x 27
• Shift the mantissa of F2 right by 8 – 7 = 1 position, and subtract:
  1.0000111011 × 2^8
– 0.1110000000 × 2^8     (perform 2’s complement subtraction by addition)
  0.0010111011 × 2^8
• For normalization, shift mantissa left 3 positions, and decrement E by 3.
• Result: 1.01110110 x 25
Floating-Point Multiplication
• Two numbers: M1 x 2E1 and M2 x 2E2
• Basic steps for multiplication:
• Add the exponents E1 and E2 and subtract the BIAS.
• Multiply M1 and M2 and determine the sign of the result.
• Normalize the resulting value, if necessary.
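The three steps above can be sketched with Python floats. Note `math.frexp` returns a mantissa in [0.5, 1.0) rather than the IEEE [1.0, 2.0) convention, so the exponent bookkeeping differs by one, but the add-exponents / multiply-mantissas / normalize structure is the same:

```python
import math

def fp_multiply(a, b):
    m1, e1 = math.frexp(a)      # decompose into mantissa and exponent
    m2, e2 = math.frexp(b)
    m = m1 * m2                 # multiply mantissas (sign carried along)
    e = e1 + e2                 # add exponents
    while m != 0 and abs(m) < 0.5:
        m *= 2                  # normalize the result, if necessary
        e -= 1
    return math.ldexp(m, e)     # recombine: m * 2^e

print(fp_multiply(3.0, 2.5))    # → 7.5
print(fp_multiply(-3.0, 2.5))   # → -7.5
```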
Rounding
• Suppose we are adding two numbers (say, in single-precision).
• We add the mantissa values after shifting one of them left
for exponent alignment.
• We take the first 23 bits of the sum, and discard the residue
R (beyond 32 bits).
• If the process of rounding generates a result that is not in
normalized form, then we need to re-normalize the result.
THANK YOU
Computer Organization and Architecture
Lecture5: Processors
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Software
• Software, or a program, consists of a set of instructions required to solve a
specific problem.
• A program to sort a set of numbers.
• A program to find the roots of a quadratic equation.
• Broadly we can classify programs/software into two types:
a) System Software
• A collection of programs that helps the users to create, analyze and run their
programs.
b) Application Software
• Helps the user to solve a particular user-level problem.
• They need system software for execution.
(a) System Software
• System software is a collection of programs that are executed as needed to
perform functions such as
• Receiving and interpreting user commands
• Running standard application programs such as word processors, etc, or
games
• Managing the storage and retrieval of files in secondary storage devices
• Controlling I/O units to receive input information and produce output results
• Translating programs from source form prepared by the user into object form
consisting of machine instructions
• Linking and running user-written application programs with existing standard
library routines, such as numerical computation packages
• System software is thus responsible for the coordination of all activities in
a computing system
(b) Application Software
• Application software helps users solve particular problems.
• In most cases, application software resides on the computer’s
hard disk or removable storage media (DVD, USB drive, etc.).
• Typical examples:
• Financial accounting package
• Mathematical packages like MATLAB or MATHEMATICA
• An app to book a cab
• An app to monitor the health of a person
Operating System
• Operating system (OS) is a large program, or actually a collection of
routines, that is used to control the sharing of and interaction among
various computer units as they perform application programs.
• The OS routines perform the tasks required to assign computer
resources to individual application programs.
• These tasks include assigning memory and magnetic disk space to
program and data files, moving data between memory and disk units,
and handling I/O operations
• Let us consider a scenario where a system with one processor, one disk, and one printer is
available.
• Assume that part of the program’s task involves reading a data file from the disk into the
memory, performing some computation on the data, and printing the results
User Program and OS Routine Sharing
Multiprogramming or Multitasking
• Similarly, during t0–t1, the OS can arrange to print the previous program’s results while the
current program is being loaded from the disk.
• Thus, the OS manages the concurrent execution of several application programs to make the
best possible use of computer resources.
• This pattern of concurrent execution is called multiprogramming or multitasking.
Performance
• The most important measure of the performance of a computer is how quickly it can
execute programs. For best performance, it is necessary to design the compilers, the
machine instruction set, and the hardware in a coordinated way.
• The total time required to execute a program, called the elapsed time, is a measure of the
performance of the entire computer system. It is affected by the speed of the
processor, the disk and the printer.
• The time needed to execute an instruction is called the processor time. It depends on the
hardware involved in the execution of individual machine instructions. This hardware
comprises the processor and the memory which are usually connected by the bus.
[Diagram: Processor with cache memory, connected to Main Memory over the bus]
Processor Clock
• Processor circuits are controlled by a timing signal called a clock.
• The clock defines regular time intervals, called clock cycles.
• To execute a machine instruction, the processor divides the action to
be performed into a sequence of basic steps, such that each step can
be completed in one clock cycle.
• Let the length of one clock cycle be P; its inverse is the clock rate,
R = 1/P, which is measured in cycles per second, also known as hertz
(Hz).
• The term “million” is denoted by the prefix Mega (M) and “billion” by
the prefix Giga (G). Example: 500 million cycles per second is 500 MHz,
and 1250 million cycles per second is 1.25 GHz.
Basic Performance Equation
T = (N × S) / R
• where,
• T is the processor time required to execute a program,
• N is the number of instruction executions, and
• S is the average number of basic steps needed to execute one
machine instruction
• To achieve higher performance, the value of T should be
reduced, which means reducing the values of N and S and increasing
the value of R.
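The basic performance equation can be applied to illustrative (assumed) numbers, e.g. one million instruction executions, four basic steps each, and a 500 MHz clock:

```python
# T = (N x S) / R from the basic performance equation.
def processor_time(N, S, R):
    return (N * S) / R

T = processor_time(N=1_000_000, S=4, R=500_000_000)
print(T)   # → 0.008 (seconds)
```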
Performance Improvement
• Pipelining and superscalar operation
• Pipelining: by overlapping the execution of successive instructions
• Superscalar: different instructions are concurrently executed with
multiple instruction pipelines. This means that multiple functional
units are needed.
• Clock rate improvement
• Improving the integrated-circuit technology makes logic circuits faster,
which reduces the time needed to complete a basic step.
Pipelining
• A substantial improvement in performance can be achieved by overlapping the
execution of successive instructions, using a technique called pipelining.
• Instruction execution is typically divided into 5 stages:
• Instruction Fetch (IF)
• Instruction Decode (ID)
• ALU operation (EX)
• Memory Access (MEM)
• Write Back result to register file (WB)
• These five stages can be executed in an overlapped fashion in a pipelined
architecture.
• Results in significant speedup by overlapping instruction execution.
Basic 5-stage Pipelining Diagram
[Diagram: successive instructions overlapped across the IF, ID, EX, MEM and WB stages]

Flynn’s classification of architectures:
• Single instruction stream, single data (SISD): Uniprocessors
• Single instruction stream, multiple data (SIMD): Array or Vector Processors
• Multiple instruction streams, single data (MISD): Rarely Used
• Multiple instruction streams, multiple data (MIMD): Multiprocessors or Multicomputers
Single-instruction, single-data (SISD)
• An SISD computing system is a
uniprocessor machine which is capable of
executing a single instruction, operating on
a single data stream.
• In SISD, machine instructions are
processed in a sequential manner and
computers adopting this model are
popularly called sequential computers.
• Most conventional computers have SISD
architecture. All the instructions and data
to be processed have to be stored in
primary memory.
Single-instruction, multiple-data (SIMD)
• An SIMD system is a multiprocessor
machine capable of executing the same
instruction on all the CPUs but operating
on different data streams.
• Single Control Unit (CU) and multiple
processing elements (PEs)
• CU fetches an instruction from the
memory and after decoding, broadcast
control signals to all PEs, i.e. at any given
time, all PEs are synchronously executing
the same instruction but on different sets Data parallelism can be achieved
in two ways:
of data. Hence named SIMD.
a) Concurrency in space – array
• SIMD allows data parallelism, i.e., processing
executing one operation on multiple data b) Concurrency in time – vector
streams. processing
a) Array Processor
• It is a processor capable of processing array elements.
• It is a synchronous parallel computer with multiple ALUs, called PEs, that can operate in parallel.
• It is composed of N identical PEs under the control of one CU and a number of memory modules.
• Array processors take the concept of pipelining one step further: instead of pipelining just the instructions, they also pipeline the data itself. This allows significant saving in decoding time.
• It improves performance by avoiding stalls.

Consider the task of adding two groups (A and B) of 10 numbers.
A. In a normal processor:
• Execute the loop 10 times:
• Read instr and decode
• Fetch no.s from A & B
• Add them
• Put the result back
• End loop
B. In an array processor:
• Read instr and decode
• Fetch no.s from A
• Fetch no.s from B
• Add them
• Put the result back
• Only two address translations are needed
• Fetching and decoding is done only one time instead of ten times
• The code is also smaller, leading to efficient memory use
b) Vector Processor
• A vector is an ordered set of same type of scalar items, where a scalar item can be a
floating point number, an integer, or a logical value.
[Diagram: a vector of 64 elements, indexed 1, 2, …, 64]
• A vector V of length n is represented as a row vector V = [V1, V2, V3 … Vn]. The element Vi of
vector V is written as V(I), and the index I refers to a memory address or register where
the number is stored.
• Vector processing is arithmetic or logical computation applied on entire vectors, whereas in
scalar processing only one datum or one pair of data is processed at a time.
• Provides high-level instructions that operate on entire arrays of numbers (called vectors).
Therefore, a vector processor is faster than a scalar processor for such workloads.
• Example: A, B and C are three vectors containing 64 numbers each. The three vectors are
mapped to vector registers V1, V2, V3 (say). The following vector instruction computes :
Ci = Ai + Bi  :  ADDV V3, V1, V2
• A single vector instruction is equivalent to an entire loop. No loop overheads are required.
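The contrast between the scalar loop and the single vector instruction can be sketched in Python (plain lists stand in for the 64-element vector registers):

```python
# ADDV V3, V1, V2 computes Ci = Ai + Bi over whole 64-element vectors;
# a scalar processor needs an explicit loop with per-iteration overhead.
A = list(range(64))
B = list(range(64, 128))

# scalar processing: one add per loop iteration
C_scalar = []
for i in range(64):
    C_scalar.append(A[i] + B[i])

# vector-style: one high-level operation over the entire vector
C_vector = [a + b for a, b in zip(A, B)]

print(C_scalar == C_vector)   # → True
```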
Multiple-instruction, single-data (MISD)
• An MISD computing system is a
multiprocessor machine capable of
executing different instructions on
different PEs but all of them
operating on the same dataset.
Example Z = sin(x)+cos(x)+tan(x)
• The system performs different
operations on the same data set.
• Machines built using the MISD model
are not useful for most applications;
a few machines have been built,
but none of them are available
commercially.
Multiple-instruction, multiple-data (MIMD)
• An MIMD system is a multiprocessor machine which is capable of executing
multiple instructions on multiple data sets.
• Each PE in the MIMD model has separate instruction and data streams;
therefore machines built using this model are capable of handling any kind of application.
• Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.
MIMD machines are broadly
categorized based on the way
PEs are coupled to the main
memory:
• Shared-memory MIMD model (tightly coupled multiprocessors)
• Distributed-memory MIMD model (loosely coupled multiprocessors)
Single Processor v/s Multiprocessor Systems
[Diagram: single-processor CPU chip — registers and ALU behind one bus interface on the system bus]
[Diagram: multicore chip — Core 1, Core 2, Core 3, each with its own registers and ALU, sharing one bus interface]
Shared-memory (Tightly coupled) multiprocessors
• All the PEs are connected to a single global memory and they all have access
to it.
• The communication between PEs in this model takes place through the
shared memory, modification of the data stored in the global memory by one
PE is visible to all other PEs.
• Some Features:
• Difficult to extend to a large number of processors.
• Memory bandwidth requirements increase with the number of
processors.
• Memory access time for all processors is uniform, called Uniform Memory
Access – UMA.
Shared-memory (Tightly coupled) multiprocessors
[Diagram: cores with private L2 caches sharing an L3 cache, connected through an interconnection network]
Advanced Microprocessors
Name        | Date | Internal Registers | Clock Speed | Data Width          | Address Lines | Max. Memory Space
Pentium III | 1999 | 32 Bit             | 450 MHz     | 32 bits, 64-bit bus | 32 Bit        | 64 GB
Logical Shift Left (LShiftL R0, #2):   C ← R0 ← 0
before: 0 0 1 1 1 0 . . . . 0 1 1
after:  1 1 1 0 . . . . 0 1 1 0 0

Logical Shift Right (LShiftR R0, #2):  0 → R0 → C
before: 0 1 1 1 0 . . . . 0 1 1 0
after:  0 0 0 1 1 1 0 . . . . 0 1
Shift Instructions
• Arithmetic Shift Instructions
• Follow the 2’s complement number representation.
• Two arithmetic shift instructions are there:
• AShiftL for shifting left: the number gets multiplied by 2
• AShiftR for shifting right: the number gets divided by 2
• Syntax: AShiftL <destination>, <count>
• Vacated positions are filled with zeros in left shift, and filled with the sign bit
in case of right shift.
Arithmetic Shift Left (AShiftL R0, #2):   C ← R0 ← 0
before: 0 0 0 0 1 1 . . . . 0 1 0
after:  0 0 1 1 . . . . 0 1 0 0 0

Arithmetic Shift Right (AShiftR R0, #2):  sign bit → R0 → C
before: 1 0 0 1 1 . . . . 0 1 0 0
after:  1 1 1 0 0 1 1 . . . . 0 1
Rotate Operations
• In the shift operations, the bits shifted out of the operand are lost, except for
the last bit shifted out which is retained in the carry flag C.
• To preserve all the bits, a set of rotate instructions can be used.
• They move the bits that are shifted out of the one end of the operand back
13 into the other end.
• They are of two types:
• Rotate without carry: the bits of the operand are simply rotated, and the last
rotated bit is retained in the carry flag C.
• Rotate with carry: the rotation includes the carry flag C.
[Diagrams: RotateL R0, #2 (rotate left without carry), RotateLC R0, #2 (rotate left with carry), RotateR R0, #2 (rotate right without carry), RotateRC R0, #2 (rotate right with carry). Bits shifted out of one end re-enter at the other; without carry, the last rotated bit is also copied into C, while with carry, the C flag itself takes part in the rotation.]
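Rotate-without-carry can be sketched for an n-bit register (the function name and 8-bit default are illustrative assumptions):

```python
# Rotate left without carry: bits leaving the MSB end re-enter at the LSB;
# the last rotated bit is also retained in C.
def rotate_left(value, count, bits=8):
    mask = (1 << bits) - 1
    carry = 0
    for _ in range(count):
        msb = (value >> (bits - 1)) & 1
        value = ((value << 1) & mask) | msb   # bit wraps around
        carry = msb                           # last rotated bit → C
    return value, carry

v, c = rotate_left(0b10000001, 1)
print(bin(v), c)   # → 0b11 1
```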
c) Program Control Instructions
• A program control instruction changes the address value in the PC and hence the normal
flow of execution.
• Change in PC causes a break in the execution of instructions.
• It is an important feature of the computers since it provides the control over the
flow of the program and provides the capability to branch to different program
segments.
Name Mnemonic
Branch BR
Jump JMP
Skip next instruction SKP
Call procedure CALL
Return from procedure RET
Compare (by subtraction) CMP
Test (by ANDing) TEST
c) Program Control Instructions
• Branch (BR) and Jump (JMP) instructions are sometimes used interchangeably, but they are
different: BR can be conditional, while JMP is unconditional.
• Jump is used to refer to the unconditional version of branch.
• The Skip (SKP) instruction is used to skip the next instruction. It does not need an address field.
• Compare Instruction compares two operands. It basically subtracts one operand from the
other for comparing whether the operands are equal or not. It is used along with the
conditional branch instruction for decision making.
• Syntax: CMP destination, source
• Example: CMP DX,00 //Compare the DX value with zero by subtracting
BE L7 //If yes(BE: Branch equal), then jump to label L7
…..
L7: ...
• Similarly, the TEST instruction performs the AND of two operands.
Conditional Branch Instructions
• A conditional branch instruction is a branch instruction that may or may not cause a
transfer of control, depending on the value of stored bits in the PSR
(processor status register).
• Each conditional branch instruction tests a different combination of status bits for a
condition.

Status bits:
Code          Value
N (negative)  Set to 1 if the result is negative; otherwise 0
Z (zero)      Set to 1 if the result is 0; otherwise 0
V (overflow)  Set to 1 if arithmetic overflow occurs; otherwise 0
C (carry)     Set to 1 if a carry-out results from the operation; otherwise 0

Branch Condition       Mnemonic   Tested Condition
Branch if zero         BZ         Z = 1
Branch if not zero     BNZ        Z = 0
Branch if carry        BC         C = 1
Branch if no carry     BNC        C = 0
Branch if minus        BN         N = 1
Branch if plus         BNN        N = 0
Branch if overflow     BV         V = 1
Branch if no overflow  BNV        V = 0
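The table above can be sketched as a small lookup from mnemonic to the status-bit test it performs. This is an illustrative model, not a real processor's decode logic.

```python
# Hedged sketch: evaluating conditional branches from PSR status bits,
# using the mnemonics from the table (BZ, BNZ, BC, BNC, BN, BNN, BV, BNV).

CONDITIONS = {
    "BZ":  lambda f: f["Z"] == 1,
    "BNZ": lambda f: f["Z"] == 0,
    "BC":  lambda f: f["C"] == 1,
    "BNC": lambda f: f["C"] == 0,
    "BN":  lambda f: f["N"] == 1,
    "BNN": lambda f: f["N"] == 0,
    "BV":  lambda f: f["V"] == 1,
    "BNV": lambda f: f["V"] == 0,
}

def branch_taken(mnemonic, flags):
    """Return True if the branch transfers control for these flag values."""
    return CONDITIONS[mnemonic](flags)

flags = {"N": 0, "Z": 1, "V": 0, "C": 0}   # e.g. after comparing equal operands
print(branch_taken("BZ", flags))    # True
print(branch_taken("BC", flags))    # False
```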
MIPS Programming Examples
Some Examples of MIPS32 Arithmetic

C code: A = B + C
B loaded in $s2, C loaded in $s3; A in $s1
MIPS32 code:
add $s1, $s2, $s3

C code: A = B + C - D; E = F + A;
B loaded in $s1, C in $s2, D in $s3, F in $s5; A in $s0, E in $s4; $t0 is a temporary
MIPS32 code:
add $t0, $s1, $s2
sub $s0, $t0, $s3
add $s4, $s5, $s0
Examples on Control Constructs
C code:
if (x == y) z = x - y;

Instruction Format
OPCODE | OPERANDS
• The number of operands varies from instruction to instruction.
• Also, for specifying an operand, various addressing modes are possible.
Types of Instruction Format
a) Three-Address Instruction
b) Two-Address Instruction
c) One-Address Instruction
d) Zero-Address Instruction
a) Three-Address Instruction
• Two source operands and one destination operand need to be specified.
• Example: ADD R1, R2, R3   R1 ← R2 + R3

b) Two-Address Instruction
• Assumes that the destination address is the same as that of the first
operand.
• Computers with two-address instruction formats can use each address
field to specify either a processor register, a memory operand, or
immediate data (only in the source field).
• Example: ADD R1, R2   R1 ← R1 + R2
EVALUATE X=(A+B)*(C+D)
MOV R1, A   R1 ← Mem[A]
ADD R1, B   R1 ← R1 + B
MOV R2, C   R2 ← Mem[C]
ADD R2, D   R2 ← R2 + D
MUL R1, R2  R1 ← R1 * R2
MOV X, R1   Mem[X] ← R1
c) One-Address Instruction
• Only one source operand needs to be specified.
Format: Opcode | Source

d) Zero-Address Instruction
Format: Opcode
• A stack is used. An arithmetic operation pops two operands from the stack and
pushes the result.
• Also called stack organization.
EVALUATE X=(A+B)*(C+D)
PUSH A   TOS ← A
PUSH B   TOS ← B
ADD      TOS ← A + B
PUSH C   TOS ← C
PUSH D   TOS ← D
ADD      TOS ← C + D
MUL      TOS ← (A+B)*(C+D)
POP X    Mem[X] ← TOS
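The zero-address evaluation above can be replayed with a tiny stack-machine interpreter. This is an illustrative sketch; the instruction names mirror the slide, not any real ISA.

```python
# Hedged sketch of the zero-address evaluation of X = (A+B)*(C+D):
# arithmetic instructions pop two operands and push the result.

def run(program, memory):
    stack = []
    for op, *arg in program:
        if op == "PUSH":
            stack.append(memory[arg[0]])     # TOS <- Mem[operand]
        elif op == "POP":
            memory[arg[0]] = stack.pop()     # Mem[operand] <- TOS
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return memory

mem = {"A": 2, "B": 3, "C": 4, "D": 5}
prog = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
        ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
        ("MUL",), ("POP", "X")]
print(run(prog, mem)["X"])   # (2+3)*(4+5) = 45
```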
Indirect Addressing
• The instruction contains a field that holds a memory address,
which in turn holds the memory address of the operand.
• Two memory accesses are required to get the operand value.
• Slower, but can access a large address space.
• Not limited by the number of bits in the operand address field, unlike direct
addressing.
• For a word length of N, an address space of 2^N is now available.
• Example:
  ADD R1, (20A6H)   // R1 = R1 + Mem[Mem[20A6H]]
  EA = (20A6H)
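The difference between direct and indirect addressing can be shown with a toy memory, modelled here as a Python dict. Addresses and values are illustrative assumptions.

```python
# Hedged sketch: effective-address calculation for direct vs indirect
# addressing, using a dict as a toy memory.

memory = {0x20A6: 0x3000, 0x3000: 42}

def ea_direct(addr):
    return addr              # the instruction field IS the operand address

def ea_indirect(addr):
    return memory[addr]      # the first access yields a pointer to the operand

print(hex(memory[ea_direct(0x20A6)]))   # 0x3000: the pointer, one access
print(memory[ea_indirect(0x20A6)])      # 42: the operand, two accesses
```

The extra memory access is exactly the cost noted above; the payoff is that the pointer stored in memory can span the full 2^N address space.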
[Figure: indirect addressing. The operand-address field of the instruction selects a memory location that holds a pointer; the pointer selects the operand.]
Register Addressing
• The operand is held in a register, and the instruction specifies the
register number.
• Very few bits are needed, as the number of registers is limited.
• Faster execution, since no memory access is required for getting the
operand.
• Modern load-store architectures support a large number of registers.
• Examples:
  ADD R1, R2, R3   // R1 = R2 + R3
  MOV R2, R5       // R2 = R5
  EA = R
[Figure: register addressing. The register number in the instruction selects the operand from the register bank.]
Register Indirect Addressing
• The instruction specifies a register, and the register holds the
memory address where the operand is stored.
• Can access a large address space.
• One fewer memory access as compared to indirect addressing.
• Example:
  ADD R1, (R5)   // R1 = R1 + Mem[R5]
  EA = (R5)
[Figures: register indirect addressing, where the register holds the operand address; and relative addressing, where an offset in the instruction is added to the PC to form the operand address.]
Indexed Addressing
• Either a special-purpose register or a general-purpose register is
used as the index register in this addressing mode.
• The instruction specifies an offset or displacement, which is added to
the index register to get the effective address of the operand.
• Examples:
  LOAD R1, 1050(R3)   // R1 = Mem[1050 + R3]
  ADD R1, [R2]        // R1 = R1 + Mem[R2*d]
  where d = size of the word (example: d = 4 B for a 32-bit word in byte-addressable
  memory)
• Can be used to sequentially access the elements of an array.
• The offset gives the starting address of the array, and the index register
value specifies the array element to be used.
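The array-traversal use of indexed addressing can be sketched as follows. The base address, word size, and array contents are illustrative assumptions.

```python
# Hedged sketch: indexed addressing to walk an array. The offset encoded in
# the instruction is the array's base address; the index register selects the
# element. With byte-addressable memory and 4-byte words, the index steps by 4.

WORD = 4
base = 1050                           # offset field of the instruction
memory = {base + i * WORD: v for i, v in enumerate([10, 20, 30, 40])}

total = 0
index_reg = 0                         # index register, initially 0
for _ in range(4):
    total += memory[base + index_reg] # EA = offset + index register
    index_reg += WORD                 # advance to the next array element
print(total)   # 10 + 20 + 30 + 40 = 100
```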
[Figure: indexed addressing. EA = index register + offset.]

Worked examples on addressing modes. Initially R1 = 500, R2 = 200, R3 = 300, R4 = 400,
and memory contents are:
Address  Contents
200      25
300      50
400      50
500      100
596      40
600      200
700      100
1000     150
1200     300
1600     400
b) ADD R2, R4, #15 (immediate)
R2 ← R4 + 15 = 400 + 15 = 415
c) ADD R1, R2, (R4) (register indirect)
R1 ← R2 + Mem[R4] = 415 + Mem[400] = 415 + 50 = 465
d) ADD R2, R3, 100(R4) (indexed)
R2 ← R3 + Mem[R4 + 100] = 300 + Mem[500] = 300 + 100 = 400
e) ADD R1, R2, (R3+R4) (base with index)
R1 ← R2 + Mem[R3 + R4] = 400 + Mem[700] = 400 + 100 = 500
f) ADD R2, R4, 600 (direct)
R2 ← R4 + Mem[600] = 400 + 200 = 600
g) ADD R1, R2, (600) (memory indirect)
R1 ← R2 + Mem[Mem[600]] = 600 + Mem[200] = 600 + 25 = 625
h) ADD R1, R3, (R2)+ (auto-increment; word size = 32 bits, byte-addressable)
R1 ← R3 + Mem[R2] = 300 + Mem[600] = 300 + 200 = 500
Since every word occupies 4 B, R2 is then incremented by 4:
(R2)+ : R2 ← R2 + 4 = 604
i) ADD R1, R4, -(R2) (auto-decrement; word size = 32 bits, byte-addressable)
R2 is first decremented by 4: -(R2) : R2 ← R2 - 4 = 600
R1 ← R4 + Mem[R2] = 400 + Mem[600] = 400 + 200 = 600
Memory
j) ADD R1, R2, 100(R3)[R3] (size of word = 32 bits, byte organized) 200 25
R3 = 300 300 50
R4 = 400 400 50
500 100
R1 R2 + Mem[R3 + 100 + R3*4]
596 40
= 600 + Mem[300 + 100 + 300*4] 600 200
= 600 + Mem[1600] 700 100
a b c d e f g h i j
R1 500 500 465 465 500 500 625 500 600 1000
R2 200 415 415 400 400 600 600 604 600 600
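The whole sequence of worked examples can be replayed in a few lines, using the same initial register and memory contents. This is a sketch for checking the arithmetic, not a model of a real machine.

```python
# Hedged sketch replaying the worked examples (b) through (j) above:
# same initial registers and memory, one line per addressing mode.

mem = {200: 25, 300: 50, 400: 50, 500: 100, 596: 40,
       600: 200, 700: 100, 1000: 150, 1200: 300, 1600: 400}
R = {1: 500, 2: 200, 3: 300, 4: 400}
WORD = 4   # 32-bit words, byte-addressable memory

R[2] = R[4] + 15                       # b) immediate:         R2 = 415
R[1] = R[2] + mem[R[4]]                # c) register indirect: R1 = 465
R[2] = R[3] + mem[R[4] + 100]          # d) indexed:           R2 = 400
R[1] = R[2] + mem[R[3] + R[4]]         # e) base with index:   R1 = 500
R[2] = R[4] + mem[600]                 # f) direct:            R2 = 600
R[1] = R[2] + mem[mem[600]]            # g) memory indirect:   R1 = 625
R[1] = R[3] + mem[R[2]]; R[2] += WORD  # h) auto-increment:    R1 = 500, R2 = 604
R[2] -= WORD; R[1] = R[4] + mem[R[2]]  # i) auto-decrement:    R1 = 600, R2 = 600
R[1] = R[2] + mem[R[3] + 100 + R[3] * WORD]  # j) scaled index: R1 = 1000
print(R[1], R[2])   # 1000 600
```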
Some Examples
What are the values of R1 and R2 at each stage? Initially R1 = 100, R2 = 200.
Memory: Mem[100] = 200, Mem[200] = 300, Mem[300] = 150

1. MOV R1, 100(R2)
2. ADD R1, R2
3. MOV R2, R1
4. HALT

1. R1 ← Mem[R2 + 100] = Mem[300] = 150   (R1 = 150, R2 = 200)
2. R1 ← R1 + R2 = 150 + 200 = 350        (R1 = 350, R2 = 200)
3. R2 ← R1 = 350                         (R1 = 350, R2 = 350)
THANK YOU
Computer Organization and Architecture
Unit - II
Instruction Set Architecture
Lecture 4: Expanding Opcodes
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Instruction Length
• Instructions on current architectures can be formatted in two ways:
  • Fixed length: wastes space but is fast and results in better performance.
  • Variable length: more complex to decode but saves storage space.
• Example: MIPS uses a fixed instruction length of 32 bits.
• Let us assume that instruction length = 16 bits (fixed), opcode = 4 bits, and
each operand = 4 bits.
• For a 3-address instruction: total instructions = 2^4 = 16
Format: OPCODE (4 bits) | Operand 1 | Operand 2 | Operand 3
Expanding Opcodes
• We have seen how instruction length is affected by the number of
operands supported by the ISA.
• In any instruction set, not all instructions require the same
number of operands.
• Operations that require no operands, such as HALT, necessarily
waste some space when fixed-length instructions are used.
• One way to recover some of this space is to use expanding
opcodes.
• The idea of expanding opcodes is to make some opcodes short,
but have a means to provide longer ones when needed.
• When the opcode is short, many bits are left to hold operands,
so we could have two or three operands per instruction.
• If an instruction has no operands (such as HALT), all the bits can be
used for the opcode.
• In between, there are longer opcodes with fewer operands as well
as shorter opcodes with more operands.
How does expanding opcode work
• Let us assume that instruction length = 16 bits (fixed) and each operand = 4 bits.
• For 3-address instructions (15 instructions, since opcode 1111 is reserved as an escape):
  0000 xxxx yyyy zzzz
  0001 xxxx yyyy zzzz
  ....
  1110 xxxx yyyy zzzz
• For 2-address instructions (15 instructions, opcodes 1111 0000 through 1111 1110):
  1111 0000 yyyy zzzz
  ....
  1111 1110 yyyy zzzz
• For 1-address instructions (15 instructions, since 1111 1111 1111 is reserved):
  1111 1111 0000 zzzz
  1111 1111 0001 zzzz
  ....
  1111 1111 1110 zzzz
• For 0-address instructions (16 instructions):
  1111 1111 1111 0000
  1111 1111 1111 0001
  ....
  1111 1111 1111 1111
if (leftmost four bits != 1111) {
    Execute appropriate three-address instruction }
else if (leftmost eight bits != 1111 1111) {
    Execute appropriate two-address instruction }
else if (leftmost twelve bits != 1111 1111 1111) {
    Execute appropriate one-address instruction }
else {
    Execute appropriate zero-address instruction }
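The decode rule above can be written directly as a small function over a 16-bit instruction word: count how many leading groups of 1111 appear to classify the format.

```python
# Hedged sketch of the expanding-opcode decode rule for 16-bit instructions
# with 4-bit operand fields, matching the if/else chain above.

def instruction_format(word):
    """Return the number of address fields in a 16-bit instruction word."""
    if (word >> 12) != 0b1111:
        return 3            # 4-bit opcode, three 4-bit operands
    if (word >> 8) != 0b11111111:
        return 2            # 8-bit opcode, two operands
    if (word >> 4) != 0b111111111111:
        return 1            # 12-bit opcode, one operand
    return 0                # 16-bit opcode, no operands

print(instruction_format(0b0001_0010_0011_0100))   # 3
print(instruction_format(0b1111_0001_0010_0011))   # 2
print(instruction_format(0b1111_1111_0001_0010))   # 1
print(instruction_format(0b1111_1111_1111_0001))   # 0
```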
Example 1
Consider a machine with 16-bit instructions and 16 registers, and we
wish to encode the following instructions:
a) 15 instructions with 3 addresses
b) 14 instructions with 2 addresses
c) 31 instructions with 1 address
d) 16 instructions with 0 addresses
Can we encode this instruction set in 16 bits?
Answer: Yes, if we use expanding opcodes.
a) For 3-address instructions (15 instructions):
   0000 R1 R2 R3
   0001 R1 R2 R3
   ....
   1110 R1 R2 R3
b) For 2-address instructions (14 instructions):
   1111 0000 R1 R2
   1111 0001 R1 R2
   ....
   1111 1101 R1 R2
c) For 1-address instructions (16 + 15 = 31 instructions):
   1111 1110 0000 R1
   1111 1110 0001 R1
   ....
   1111 1110 1111 R1
   1111 1111 0000 R1
   1111 1111 0001 R1
   ....
   1111 1111 1110 R1
d) For 0-address instructions (16 instructions):
   1111 1111 1111 0000
   ....
   1111 1111 1111 1111
Going back to Example 1:
• The 15 3-address instructions account for:
  15 × 2^4 × 2^4 × 2^4 = 15 × 2^12 = 61440 bit patterns
• The 14 2-address instructions account for:
  14 × 2^4 × 2^4 = 14 × 2^8 = 3584 bit patterns
• The 31 1-address instructions account for:
  31 × 2^4 = 496 bit patterns
• The 16 0-address instructions account for 16 bit patterns
• In total we need 61440 + 3584 + 496 + 16 = 65536 different bit patterns.
• Having a total of 16 bits, we can create 2^16 = 65536 bit patterns.
• We have an exact match with no wasted patterns, so our instruction set is possible.
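The counting argument above is easy to mechanise: each group of n instructions with k address fields consumes n × 2^(4k) bit patterns, and feasibility means the total fits in 2^16.

```python
# Hedged check of Example 1: the four instruction groups exactly fill
# the 2**16 available bit patterns of a 16-bit instruction word.

groups = [
    (15, 3),   # 15 three-address instructions
    (14, 2),   # 14 two-address instructions
    (31, 1),   # 31 one-address instructions
    (16, 0),   # 16 zero-address instructions
]
OPERAND_BITS = 4

patterns = sum(n * 2 ** (OPERAND_BITS * addrs) for n, addrs in groups)
print(patterns, patterns == 2 ** 16)   # 65536 True
```

The same one-liner settles Example 2 below: 4 × 2^9 + 255 × 2^3 + 16 = 4104 > 2^12, so that set does not fit.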
Example 2
Is it possible to design an expanding opcode to allow the following to
be encoded in a 12-bit instruction? Assume a register operand
requires 3 bits.
a) 4 instructions with 3 registers
b) 255 instructions with 1 register
c) 16 instructions with 0 registers
• The first 4 instructions account for:
  4 × 2^3 × 2^3 × 2^3 = 4 × 2^9 = 2048 bit patterns
• The next 255 instructions account for:
  255 × 2^3 = 2040 bit patterns
• The last 16 instructions account for 16 bit patterns
• In total we need 2048 + 2040 + 16 = 4104 bit patterns.
• With a 12-bit instruction we can only have 2^12 = 4096 bit patterns.
• The required number of bit patterns (4104) is more than what we have (4096), so this
instruction set is not possible with only 12 bits.
Example 3
Given 8-bit instructions, is it possible to use expanding opcodes to
allow the following to be encoded? If so, show the encoding.
a) 3 instructions with two 3-bit operands
b) 2 instructions with one 4-bit operand
c) 4 instructions with one 3-bit operand
First, we must determine if the encoding is possible:
a) 3 × 2^3 × 2^3 = 3 × 2^6 = 192
b) 2 × 2^4 = 32
c) 4 × 2^3 = 32
• If we sum the required number of bit patterns, we get 192 + 32 + 32 = 256.
• 8 bits in the instruction means a total of 2^8 = 256 bit patterns, so we have an
exact match (the encoding is possible, but every bit pattern will
be used in creating it).
The encoding we can use is as follows:
00 xxx xxx
01 xxx xxx     3 instructions with two 3-bit operands
10 xxx xxx
11 – escape opcode

1100 xxxx
1101 xxxx      2 instructions with one 4-bit operand
1110 – escape opcode
1111 – escape opcode

11100 xxx
11101 xxx
11110 xxx      4 instructions with one 3-bit operand
11111 xxx
Example 4
Consider a processor supporting 12-bit instructions and a 1 KB
memory space. If there are two 1-address instructions, then how
many 0-address instructions can be formulated?

Total instruction length = 12 bits
Memory = 1 KB = 2^10 B, so the memory address field needs 10 bits
Format: Opcode (2 bits) | Memory address (10 bits)
Steps:
1. Identify the higher-order instruction format.
2. Identify the total number of instructions possible.
3. Identify the number of free opcodes.
4. Calculate the number of derived opcodes by multiplying the number of free opcodes
with the decoded value of the address field.
1. The 1-address instruction is the higher format.
2. Total 1-address instructions = 2^2 = 4
   00 xxxxxxxxxx
   01 xxxxxxxxxx     2 instructions with 1 memory operand
3. 10 – free opcode
   11 – free opcode
4. Each free opcode expands into 2^10 0-address instructions:
   10 0000000000 .... 10 1111111111   (2^10 instructions)
   11 0000000000 .... 11 1111111111   (2^10 instructions)
Total 0-address instructions = 2^10 + 2^10 = 2^11 = 2048
Total possible instructions = 1-address + 0-address = 2 + 2^11 = 2050
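The four steps above reduce to a short computation, sketched here with the numbers of Example 4 (2-bit opcode, 10-bit address field).

```python
# Hedged check of Example 4: each unused 1-address opcode can be expanded
# into 2**ADDR_BITS zero-address opcodes.

OPCODE_BITS, ADDR_BITS = 2, 10
one_address = 2                                  # 1-address instructions used
free = 2 ** OPCODE_BITS - one_address            # 2 free opcodes (10 and 11)
zero_address = free * 2 ** ADDR_BITS             # derived 0-address opcodes
print(zero_address, one_address + zero_address)  # 2048 2050
```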
Example 5
A CPU is designed to have 58 3-address instructions. The CPU is able
to address a maximum of 16 memory locations. If the length of all the
instructions is the same, then by using the expanding opcode technique,
calculate the number of 2-address instructions possible.
THANK YOU
Computer Organization and Architecture
Unit - II
Instruction Set Architecture
Lecture 5: Flow of Control
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Instruction Execution and Straight Line Sequencing
[Figure: a program in memory for C ← A + B. The Program Counter (PC) holds the address i of the current instruction; the program's 32-bit instructions occupy successive word locations, followed by the data locations A, B, and C used by the program.]
• Executing a given instruction is a 2-phase process.
• The first phase is called instruction fetch:
  • The instruction is fetched from the memory location whose address is in the PC.
  • This instruction is placed in the instruction register (IR) in the processor.
• The second phase is instruction execute:
  • The instruction in the IR is examined to determine which operation is to be performed.
  • The specified operation is then performed by the processor.
  • This often involves fetching operands from memory or from processor
    registers, performing an arithmetic or logic operation, and storing the result in
    the destination location.
• At some point in this 2-phase process, the contents of the PC are incremented.
Branching
[Figure: memory layout of a program that adds a list of numbers Num1, Num2, ..., Numn into Sum, illustrating a branch back to the start of the summation loop.]
Stacks
• A computer program often needs to perform a particular subtask using the familiar
subroutine structure. In order to organize the control and information linkage
between the main program and the subroutine, a data structure called a stack is used.
• A stack is a list of data elements with the access restriction that elements can be
added or removed at one end of the list only. This end is called the top of the stack, and
the other end is called the bottom.
• This structure is also called a pushdown stack. Example: a pile of books.
• A stack follows the LIFO approach, i.e., last-in, first-out. The terms push and pop are used
to describe placing a new item on the stack and removing the top item from the stack,
respectively.
• Data stored in the memory of the computer can be organized as a stack, with
successive elements occupying successive memory locations.
• A processor register, the stack pointer (SP), is used to keep track of the top element of the stack.
A stack in the memory
[Figure: a stack in memory, growing from high addresses toward address 0. The bottom element (43) sits at BOTTOM; the stack pointer register SP points to the current top element (-28), below which lie the elements 17 and 739.]
Stack Operations
• If we assume a byte-addressable memory with a 32-bit word length, the Push
operation can be implemented as:
  Subtract SP, #4       // SP ← SP - 4
  Move (SP), NEWITEM    // Mem[SP] ← NEWITEM
• Correspondingly, the Pop operation can be implemented as:
  Move ITEM, (SP)       // ITEM ← Mem[SP]
  Add SP, #4            // SP ← SP + 4
[Figure: stack contents before and after a PUSH of 19 and a subsequent POP; SP moves down by one word on push and back up on pop, above the elements -28, 17, 739, ..., 43 (BOTTOM).]
Safe PUSH and POP operations
Suppose a stack runs from location 2000 (BOTTOM) down no further than location
1500. The stack pointer is initially loaded with the address value 2004, i.e., the first
element will be stored at address 2000.

SAFEPOP:  Compare SP, #2000      // Compare to see if SP contains an address value
          Branch>0 EMPTYERROR    // greater than 2000. If it does, the stack is empty.
          Move ITEM, (SP)+       // If not, pop the element.

SAFEPUSH: Compare SP, #1500      // Compare to see if SP contains an address value
          Branch<=0 FULLERROR    // less than or equal to 1500. If it does, the stack is full.
          Move -(SP), NEWITEM    // If not, push the element.
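The bounds checks of SAFEPUSH and SAFEPOP can be sketched as a small class. The constants mirror the slide (BOTTOM = 2000, limit 1500, 4-byte words); the dict-as-memory is an illustrative assumption.

```python
# Hedged sketch of SAFEPUSH/SAFEPOP: a stack growing toward lower
# addresses, bounded by BOTTOM = 2000 and a lower limit of 1500.

BOTTOM, LIMIT, WORD = 2000, 1500, 4

class BoundedStack:
    def __init__(self):
        self.sp = BOTTOM + WORD      # empty: the first push lands at BOTTOM
        self.mem = {}

    def push(self, item):
        if self.sp <= LIMIT:         # SAFEPUSH check: SP <= 1500 means full
            raise OverflowError("stack full")
        self.sp -= WORD              # Move -(SP), NEWITEM
        self.mem[self.sp] = item

    def pop(self):
        if self.sp > BOTTOM:         # SAFEPOP check: SP > 2000 means empty
            raise IndexError("stack empty")
        item = self.mem[self.sp]     # Move ITEM, (SP)+
        self.sp += WORD
        return item

s = BoundedStack()
s.push(17)
s.push(739)
print(s.pop(), s.pop())   # 739 17 (LIFO order)
```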
Subroutines:
• A subroutine is a program fragment that lives in user space and performs a well-
defined task. It is invoked by another user program and returns control to the
calling program when finished.
• When a program branches to a subroutine, we say it is calling the subroutine.
• The instruction that performs the branch operation is called a Call instruction.
• After a subroutine is executed, the calling program must resume execution,
continuing immediately after the instruction that called the subroutine.
• The subroutine is said to return to the program that called it by executing a Return
instruction.
• The way in which a computer makes it possible to call and return from subroutines
is referred to as its subroutine linkage method.
• The simplest subroutine linkage method is to save the return address in a link
register.
Call and Return instructions: parameter passing on the stack (LISTADD example)

Calling program:
  Move -(SP), #NUM1     Push parameters onto the stack (address of the list)
  Move -(SP), N         Push n, the number of entries (TOS at level 2)
  Call LISTADD          Call the subroutine
  Move SUM, 4(SP)       Save the result
  Add SP, #8            Restore TOS (back to level 1)

Subroutine:
LISTADD:
  MoveMultiple -(SP), R0-R2   Save registers R0-R2 (TOS at level 3)
  Move R1, 16(SP)             Initialize counter to n
  Move R2, 20(SP)             Initialize pointer to the list (NUM1)
  Clear R0                    Initialize sum to 0
LOOP:
  Add R0, (R2)+               Add entry from list
  Decrement R1
  Branch>0 LOOP
  Move 20(SP), R0             Put result onto the stack
  MoveMultiple R0-R2, (SP)+   Restore registers
  Return                      Return to calling program

[The original slides step through the processor stack at each point: the parameters
NUM1 and n pushed at level 2 (locations 120 and 116), the saved registers R0-R2 at
level 3 (locations 108-100), and the result overwriting NUM1 at 20(SP) before the
return, from which the caller retrieves it as 4(SP).]
Pushing elements
To push elements onto the stack:
• Move the stack pointer $sp down to make room for the new data.
• Store the elements into the stack.
For example, to push registers $t1 and $t2 onto the stack:
  sub $sp, $sp, 8
  sw $t1, 4($sp)
  sw $t2, 0($sp)
[Figure: before the push, $sp points at Word 2; after, $t1 and $t2 sit below Word 2 and $sp points at $t2.]
Accessing and popping elements
• You can access any element in the stack (not just the top one) if you
know where it is relative to $sp.
• For example, to retrieve the value of $t1:
  lw $s0, 4($sp)
• You can pop, or "erase," elements simply by adjusting the stack pointer
upwards. To pop the value of $t2:
  addi $sp, $sp, 4
• Note that the popped data is still present in memory, but data past the
stack pointer is considered invalid.
Examples on subroutines
C code (body of fun, fragment):
  t = 3*a + 5;
  return t;
}
y = fun(x);
MIPS code (fragment: the callee saves $s1, then computes 3*a + 5):
  addi $sp, $sp, -4
  sw   $s1, 0($sp)
  li   $s0, 3
  mul  $s1, $s0, $a0
  addi $s1, $s1, 5
Interrupts
a) Software Interrupts
b) Hardware Interrupts
THANK YOU
Computer Organization and Architecture
Unit - II
Instruction Set Architecture
Lecture 13: Assembly Language
Dr. Vineeta Jain
Department of Computer Science and Engineering, LNMIIT Jaipur
Machine, Assembly and High Level Language
• Machine Language
• Native to a processor: executed directly by hardware.
• Instructions consist of binary code: 1’s and 0’s.
• Assembly Language
  • Low-level symbolic version of machine language.
  • One-to-one correspondence with machine language.
  • Pseudo-instructions are used that are much more readable and easy to use.
• High-Level Language
• Programming languages like C, C++, Java.
• More readable and closer to human languages.
Assemblers and Compilers
• Assembler
  • Translates an assembly language program to machine language.
• Compiler
  • Translates a high-level language program to assembly/machine language.
  • The translation is done by the compiler directly, or
  • the compiler first translates to assembly language and then
    the assembler converts it to machine code.
[Figure: High-level Language → (Compiler) → Assembly Language → (Assembler) → Machine Code; alternatively, the compiler produces machine code directly.]
Example of assembling:
  .text
  .global main
  main: la  $t0, value
        lw  $t1, 0($t0)
        lw  $t2, 4($t0)
        add $t3, $t1, $t2
        sw  $t3, 8($t0)
  .data
  value: .word 50, 30, 0
The assembler translates this program into the corresponding rows of binary machine code.
Features of Assembly Language
• One-to-one mapping
  • A pure assembly language is a language in which each statement produces
    exactly one machine instruction.
  • There is a one-to-one correspondence between machine instructions and
    statements in the assembly program.
• Full access to the machine's instructions
  • The assembly programmer has access to all instructions available on the target
    machine. The high-level language programmer does not.
  • Everything that can be done in machine language can be done in assembly
    language, but many instructions, registers, and similar features are not
    available for the high-level language programmer to use.
Pseudoinstructions
• MIPS32 assemblers support several pseudo-instructions that are meant for user
convenience.
• Internally, the assembler converts them to valid MIPS32 instructions.
• Example: the pseudo-instruction branch if less than, blt $s1, $s2, Label.

Pseudo-instruction → Expansion (purpose):
  blt $1, $2, Label → slt $at, $1, $2; bne $at, $zero, Label   (branch if less than)
  bgt $1, $2, Label → sgt $at, $1, $2; bne $at, $zero, Label   (branch if greater than)
  ble $1, $2, Label → sle $at, $1, $2; bne $at, $zero, Label   (branch if less or equal)
  bge $1, $2, Label → sge $at, $1, $2; bne $at, $zero, Label   (branch if greater or equal)
  li $1, 0x23ABCD   → lui $1, 0x0023; ori $1, $1, 0xABCD       (load immediate value into a register)
  move $1, $2       → add $1, $2, $zero                        (move content of one register to another)
  la $a0, 0x2B09D5  → lui $a0, 0x002B; ori $a0, $a0, 0x09D5    (load address into a register)
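The li and la expansions above split a 32-bit immediate into two 16-bit halves: lui places the upper half in the top of the register, and ori fills in the lower half. A quick sketch of that split:

```python
# Hedged sketch: how li expands into lui + ori, splitting a 32-bit
# immediate into upper and lower 16-bit halves.

def expand_li(imm):
    upper = (imm >> 16) & 0xFFFF   # value the lui would load (shifted up)
    lower = imm & 0xFFFF           # value the ori would OR in
    return upper, lower

def rebuild(upper, lower):
    return (upper << 16) | lower   # what the register holds afterwards

u, l = expand_li(0x23ABCD)
print(hex(u), hex(l))              # 0x23 0xabcd
print(hex(rebuild(u, l)))          # 0x23abcd
```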
Macros
• A macro definition is a way to give a name to a piece of text. After a macro
has been defined, the programmer can write the macro name instead of the
piece of program.
• Basic parts in macro definition
• A macro header giving the name of the macro being defined
• The text comprising the body of the macro
• A pseudo-instruction marking the end of the definition
• Macro call and expansion
• When the assembler encounters a macro definition, it saves it in a macro
definition table for subsequent use.
• From that point on, whenever the name of the macro appears as an opcode,
the assembler replaces it by the macro body.
• The use of a macro name as an opcode is known as a macro call and its
replacement by the macro body is called macro expansion.
Macro Example 1
• To terminate the program, the instructions used are:
  li $v0,10
  syscall
• It is tedious to write this again and again, so we can define a macro;
let's call it done.
.macro done
li $v0,10
syscall
.end_macro
And then invoke it wherever necessary with the statement:
done
Macro Example 2
• Printing an integer (argument may be either an immediate value or register
name)
.macro print_int %x
li $v0, 1
add $a0, $zero, %x
syscall
.end_macro
print_int $s0
The .include directive
• .include directive has one operand, a quoted filename. The contents of the specified file
are substituted for the directive. This occurs during assembly preprocessing.
• It is like #include in C or C++.
• Suppose "macros.asm" contains the following:
.macro done
li $v0,10
syscall
.end_macro
• You could then include it in a different source file something like this:
.include "macros.asm"
.text
lw $a0, value
done
Stages of Compilation
The Four Stages of Compilation
[Figure: C source files (calc.c, math.c) are compiled to assembly files (calc.s, math.s, io.s), assembled to object files (calc.o, math.o, io.o), linked with libraries (libc.o) into an executable program (calc.exe) that exists on disk, and finally brought by the LOADER into main memory as an executing process.]
• Assemblers need to
  • translate assembly instructions and pseudo-instructions into
    machine instructions (object files), and
  • convert decimal numbers, etc., specified by the programmer into
    binary.
• Typically, assemblers make two passes over the assembly file:
  • First pass: read each line and record labels in a symbol table.
  • Second pass: use the information in the symbol table to produce the actual
    machine code for each line.
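The two passes can be sketched with a toy assembler. The line format here (a label ends in ":", everything else is an instruction one word long) is an illustrative assumption, not real MIPS syntax.

```python
# Hedged sketch of the two passes: pass 1 records label addresses in a
# symbol table; pass 2 resolves label operands to those addresses.

def assemble(lines, word_size=4):
    symtab, addr = {}, 0
    for line in lines:                 # pass 1: collect labels
        if line.endswith(":"):
            symtab[line[:-1]] = addr   # label takes the next instruction's address
        else:
            addr += word_size
    code, addr = [], 0
    for line in lines:                 # pass 2: resolve operands
        if line.endswith(":"):
            continue
        op, *args = line.split()
        args = [str(symtab.get(a, a)) for a in args]   # labels -> addresses
        code.append((addr, op, args))
        addr += word_size
    return symtab, code

symtab, code = assemble(["start:", "lw r1 value", "beq r1 r0 start", "value:"])
print(symtab)   # {'start': 0, 'value': 8}
```

The first pass is what makes forward references like "value" work: its address is not known until the whole file has been scanned once.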
Object File Format
An object file contains the following information:
• A header that says where in the file the sections below are located
• A text segment, which contains the machine code (with some
missing addresses)
• A data segment: static data (local/global variables, strings, constants)
• Relocation records: identify lines of code that need to be "fixed"
• Symbol table: list of this file's referenceable labels
• Debugging information
  • line number to code address map, etc.
Symbol Table
• The Symbol table records the list of “items” in the file that can be
used by the code in this file and in other files
• E.g., subprograms
• E.g., “global” variables in the data segment
• Each entry in the table contains the name of the label and its
offset within this object file
Relocation Records
• The Relocation records contain the list of “items” that this file
needs (from other object files or libraries)
• E.g., functions not defined in this file’s text segment
• E.g., “global” variables not defined in this file’s data segment
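How a linker uses each file's symbol table to patch the other files' relocation records can be sketched as follows (a word-addressed toy; the file contents and symbol names are made up):

```python
# Each "object file" is (code, exported symbols, relocation records).
# A relocation record names a code index that needs a symbol's final address.

def link(objs):
    # Step 1: lay the files out one after another and build a
    # global symbol table of every exported label's final address.
    base, global_symtab, layout = 0, {}, []
    for code, symbols, relocs in objs:
        for name, offset in symbols.items():
            global_symtab[name] = base + offset
        layout.append((base, code, relocs))
        base += len(code)
    # Step 2: patch every relocation with the symbol's final address.
    image = []
    for base, code, relocs in layout:
        code = list(code)
        for index, symbol in relocs:
            code[index] = ("jal", global_symtab[symbol])
        image.extend(code)
    return image

main_o = ([("jal", None), ("syscall", None)], {"main": 0}, [(0, "print")])
io_o   = ([("li", 1), ("syscall", None)], {"print": 0}, [])
print(link([main_o, io_o]))
# the first instruction is patched to ("jal", 2): print lives at word 2
```

The unresolved call in main_o is exactly an entry in its relocation records; the definition of print in io_o is exactly an entry in its symbol table. Linking is the act of matching the two.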
Linker
[Figure: the LINKER combines the object files (calc.o, math.o, io.o) with library object files (libc.o, libm.o) into the executable program calc.exe, which exists on disk. The LOADER then brings calc.exe into main memory, where it runs as an executing process.]
Recap
[Figure: the five classic components of a computer: input, output, memory, and a processor consisting of a datapath and control.]
Technology Trends
Capacity Speed (latency)
Logic: 2x in 3 years 2x in 3 years
DRAM: 4x in 3 years 2x in 10 years
Disk: 4x in 3 years 2x in 10 years
DRAM
Year Size Cycle Time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1995   64 Mb    120 ns
1998   256 Mb   100 ns
2001   1 Gb     80 ns
Capacity grew roughly 1000:1 over this period, while cycle time improved only about 2:1.
Who Cares About Memory?
Processor-DRAM Memory Gap (latency)
[Figure: log-scale plot of performance versus year, 1980-2000. Processor performance (“Moore’s Law”) grows about 60% per year (2x every 1.5 years), while DRAM performance grows only about 9% per year (2x every 10 years). The resulting processor-memory performance gap grows about 50% per year.]
Today’s Situation: Microprocessors
[Figure: the processor (control + datapath) backed by multiple levels of memory. Data are transferred between an upper level and the lower level beneath it.]
Memory Hierarchy: Terminology
■ Hit: If the data requested by a processor appears in some
block in the upper level.
❑ Hit Time: Time to access the upper level which consists of
RAM access time + Time to determine hit/miss
❑ Hit Rate: The fraction of memory accesses found in the upper
level
■ Miss: If the data is not found in the upper level.
❑ Miss Rate = 1 - (Hit Rate)
❑ Miss Penalty: Time to replace a block in the upper level +
time to deliver the block to the processor
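These quantities combine into the standard average memory access time (AMAT) formula; the numbers below are purely illustrative:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

# Hypothetical values: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
print(amat(1.0, 0.05, 100.0))   # about 6.0 ns per access on average
```

Note how a small miss rate still dominates the average when the miss penalty is two orders of magnitude larger than the hit time.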
[Figure: the memory hierarchy, from the upper level to the lower level: processor registers (D flip-flops), on-chip L1 cache (SRAM), second-level cache, main memory, secondary storage (magnetic disks), and tertiary storage (tape).]
Differences in Memory Levels
[Figure: memory blocks with addresses 000-111 mapped into cache locations.]
Direct-mapped Cache
[Figure: a direct-mapped cache of 1024 entries. The 32-bit address is split into a 20-bit tag, a 10-bit index, and a 2-bit byte offset.]
• The cache index is used to select the block.
• The tag field of the address is compared with the tag field stored in the selected cache entry.
• The valid bit indicates whether a cache block holds valid information.
• On a valid tag match, Hit is asserted and the 32-bit data word is returned.
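The direct-mapped lookup can be sketched in Python (the field widths follow the figure: 20-bit tag, 10-bit index, 2-bit byte offset; the address and data values are made up):

```python
def lookup(cache, address):
    offset = address & 0x3            # bits 1..0: byte within the word
    index = (address >> 2) & 0x3FF    # bits 11..2: selects one of 1024 blocks
    tag = address >> 12               # bits 31..12: compared with the stored tag
    valid, stored_tag, data = cache[index]
    hit = valid and stored_tag == tag
    return hit, data if hit else None

cache = [(False, 0, 0)] * 1024        # (valid, tag, data) per block
addr = 0x00001004                     # tag 1, index 1, offset 0
cache[1] = (True, 1, 0xDEADBEEF)      # pretend this block was filled earlier
print(lookup(cache, addr))            # (True, 3735928559): a hit
print(lookup(cache, 0x00002004))      # same index, tag 2: a miss
```

Because the index picks exactly one candidate block, two addresses that share index bits but differ in tag bits (as in the second call) always evict each other in a direct-mapped cache.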
Two-way Set-associative Cache
[Figure: a two-way set-associative cache. The address (bits 31 down to 0) is split into a tag, an index, and a byte offset. The index selects a set; the tags of both ways are compared in parallel, and a 2-to-1 multiplexor selects the data from the way that hits.]
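The two-way lookup can be sketched similarly (the field widths here are illustrative, not from the figure; the conditional plays the role of the 2-to-1 multiplexor):

```python
def lookup_2way(sets, address):
    index = (address >> 2) & 0x1FF    # 9 index bits: 512 sets of 2 ways
    tag = address >> 11
    way0, way1 = sets[index]          # each way holds (valid, tag, data)
    if way0[0] and way0[1] == tag:    # both tags are compared "in parallel"
        return True, way0[2]
    if way1[0] and way1[1] == tag:
        return True, way1[2]
    return False, None

sets = [[(False, 0, 0), (False, 0, 0)] for _ in range(512)]
sets[3][1] = (True, 7, 42)            # block cached in way 1 of set 3
addr = (7 << 11) | (3 << 2)           # tag 7, index 3, offset 0
print(lookup_2way(sets, addr))        # (True, 42)
```

Unlike the direct-mapped case, two conflicting addresses with the same index can now coexist, one per way, at the cost of the extra comparator and multiplexor.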
Example: Alpha 21064 Data Cache
For caches with low miss rates, random is almost as good as LRU.
Q4: What Happens on a Write?
■ Write through: The information is written to both the block
in the cache and to the block in the lower-level memory.
■ Write back: The information is written only to the block in
the cache. The modified cache block is written to main
memory only when it is replaced.
■ Is the block clean or dirty? (add a dirty bit to each block)
■ Pros and Cons of each:
■ Write through
■ Read misses cannot result in writes to memory.
■ Easier to implement
■ Always combine with write buffers to avoid memory latency
■ Write back
■ Less memory traffic
■ Perform writes at the speed of the cache
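The memory-traffic difference between the two policies can be illustrated with a toy one-block cache (the policy names follow the slide; everything else is made up for the example):

```python
class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.block = None            # (tag, data, dirty) or None
        self.memory_writes = 0       # writes reaching lower-level memory

    def write(self, tag, data):
        if (self.block and self.block[0] != tag
                and self.write_back and self.block[2]):
            self.memory_writes += 1  # evicting a dirty block: write it back
        self.block = (tag, data, True)
        if not self.write_back:
            self.memory_writes += 1  # write-through: every write goes to memory

wt, wb = Cache(write_back=False), Cache(write_back=True)
for tag in [0, 0, 0, 1]:             # three writes to one block, then another
    wt.write(tag, "x")
    wb.write(tag, "x")
print(wt.memory_writes, wb.memory_writes)   # 4 vs 1
```

The repeated writes to the same block are exactly where write back saves traffic: they are absorbed by the cache and reach memory only once, on eviction.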
Q4: What Happens on a Write?