Introduction To Computer Architecture and Performance Measurement


CE 2471

Computer Architecture and Assembly Language

Chapter 1
Introduction to Computer Architecture and Performance Measurement

Course Objectives
• This course is designed to
➢ provide you with the basic concepts and techniques needed to
understand and analyze hardware and software interaction
in computer systems.
➢ show how computers work, how to analyze computer
performance, and what issues affect the design and
function of modern computers.

• Computing is a rapidly changing field, with processor speed
doubling roughly every 1.5 years and entire computer systems
becoming obsolete in two to four years.

Course Outline

• Technology and Performance Measurement


• Instruction Set Architecture (ISA)
• Arithmetic and ALU Design
• CPU Design and Execution
• Pipelining for Increased Performance
• Memory: Cache, Main, Virtual

Introduction

• What is the difference between hardware and software?
➢ Hardware: the physical elements of a computer
➢ Software: programs or applications (all the instructions that tell the hardware
how to perform a task)
➢ Both implement algorithms
• What is a computer?
➢ Input, output, memory, processor
• How old is computing?
➢ 1803, Jacquard loom
➢ 1830, Charles Babbage, Analytical Engine
➢ 1943, Enigma, Alan Turing, Bletchley Park, Colossus
➢ 1951, UNIVAC, first commercial computer

History of Computers

• Mechanical calculator
• Vacuum tube
• Transistor
• Integrated circuit
• Very Large Scale Integration (VLSI) /
Microprocessor
• Ultra Large Scale Integration (ULSI) /
Microprocessor

Mechanical calculator
• A mechanical calculator is a mechanical device that
automatically performs the basic operations of
arithmetic. Mechanical calculators have been rendered
obsolete by the advent of the electronic calculator.

Vacuum Tubes

• The vacuum tube is a glass tube that
has had its gas removed, creating a vacuum.
• Vacuum tubes were used in early
computers

Transistors

• The transistor is the fundamental component
of modern electronic devices

Integrated Circuits

• An integrated circuit is a semiconductor chip
on which thousands or millions of tiny
resistors, capacitors, and transistors are
fabricated.

VLSI and ULSI

• In the 1980s, integrated circuits gave way to very large scale integration
(VLSI) technology, with chips that eventually contained 20 000 to 1 000 000
transistors.

▪ ULSI: introduced in 1984, with chips containing 1 000 000 transistors or more
(the Intel 486 and Pentium microprocessors use ULSI technology).

• The 1990s saw the emergence of distributed, networked computing on
personal computers

Moore’s Law

• The number of transistors per chip has approximately doubled
every 1.5 years since the early 1970s, when integrated circuits
became available. This effect is called Moore's Law.

• Moore predicted that transistor counts would double every 2 years

• Not really a law, just an observation

Components of a Computer System

• Processor
– Datapath unit
– Control unit
• Memory & Storage
– Main Memory
– Disk Storage
• Input / Output devices
– User-interface devices
– Network adapters
• For communicating with other computers
• Bus: Interconnects processor to memory and I/O

[Figure: block diagram of a computer. The processor (control and datapath), memory, and I/O devices (input, output, network) are connected by a bus.]


Types of Architectures
• There are two principal ways to connect components in a
computer, called the:
➢ Von Neumann architecture
➢ Harvard architecture

Von-Neumann architecture

• The same memory is used to
store both data and the
program (instructions).

• The CPU cannot access program
memory and data memory
simultaneously.

Harvard architecture

• The data and program
(instructions) are stored in
separate memories.

• The CPU can access program
memory and data memory
simultaneously.

Below Your Program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to
machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers
Levels of Program Code
• High-level language
– Level of abstraction closer to
problem domain
– Provides for productivity and
portability
• Assembly language
– Textual representation of
instructions
• Hardware representation
– Binary digits (bits)
– Encoded instructions and data
A Hierarchy of Languages

[Figure: language hierarchy, from top to bottom]
• Application Programs
• High-Level Languages (machine independent)
• Low-Level Languages (machine specific)
– Assembly Language
– Machine Language
• Hardware
A Six-Level Computer

• Level 5: Application Programs / High-Level Language
• Level 4: Assembly Language
• Level 3: Operating System
• Level 2: Instruction Set Architecture (interface between SW & HW)
• Level 1: Microarchitecture
• Level 0: Physical Design

• The upper levels are software; the lower levels are hardware
• The level of abstraction increases from Level 0 to Level 5
• Each level hides the details of the level below it

Programmer's View (cont'd)
• Application Programs (Level 5)
– Written in high-level programming languages
– Such as Java, C++, Pascal, Visual Basic . . .
– Programs compile into assembly language level (Level 4)
• Assembly Language (Level 4)
– Instruction mnemonics (symbols) are used
– Have one-to-one correspondence to machine language
– Calls functions written at the operating system level (Level 3)
– Programs are translated into machine language (Level 2)
• Operating System (Level 3)
– Provides services to level 4 and 5 programs
– Translated to run at the machine instruction level (Level 2)
Programmer's View (cont'd)
• Instruction Set Architecture (Level 2)
– Interface between software and hardware
– Specifies how a processor functions
– Machine instructions, registers, and memory are exposed
– Machine language is executed by Level 1 (microarchitecture)
• Microarchitecture (Level 1)
– Controls the execution of machine instructions (Level 2)
– Implemented by digital logic
• Physical Design (Level 0)
– Implements the microarchitecture at the transistor-level
– Physical layout of circuits on a chip



What is an Instruction Set?

• To command a computer, you must speak its language.
• The words of a computer's language are called instructions, and its
vocabulary is called an instruction set.
• The ISA:
➢ Defines registers
➢ Defines data transfer modes (instructions) between registers,
memory and I/O
➢ Should provide enough instructions to translate any
program efficiently for machine processing
• Next, define the instruction format: the binary representation used
by the hardware (variable-length vs. fixed-length instructions)

Types of ISA

• Complex Instruction Set Computer (CISC)
➢ Many instructions (several hundred)
➢ An instruction may take many cycles to execute
➢ Example: Intel Pentium
• Reduced Instruction Set Computer (RISC)
➢ Small set of instructions
➢ Simple instructions
➢ Most instructions execute in one clock cycle
➢ Effective use of pipelining
➢ Example: ARM
Processor Performance

• Processor performance is commonly measured with the
SPEC benchmarks.

• The System Performance Evaluation Cooperative (SPEC)
benchmark suite was created in 1989 to provide a consistent
set of realistic benchmarks for evaluating CPU performance.

• Physical limits on computer capacity and performance:
these trends cannot continue forever. There are
physical limits to computer chip size and speed.

Relative Performance
• Processor performance: based on the execution time of a program
on a processor X:

$$\text{Performance}_X = \frac{1}{\text{Execution Time}_X}$$

• Processor relative performance:

$$\text{Relative Performance} = n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X}$$

n > 1 => X is n times faster than Y

▪ Example: time taken to run a program
– 10s on A, 15s on B
– Execution Time_B / Execution Time_A
= 15s / 10s = 1.5
– So A is 1.5 times faster than B
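
The relative performance calculation can be checked with a few lines of Python. This is a minimal sketch: the function names `performance` and `relative_performance` are illustrative and not part of the course material.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the inverse of execution time."""
    return 1.0 / execution_time_s


def relative_performance(time_x_s: float, time_y_s: float) -> float:
    """Return n = Performance_X / Performance_Y = Time_Y / Time_X."""
    return performance(time_x_s) / performance(time_y_s)


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(10.0, 15.0)
print(f"A is {n:.1f} times faster than B")
```

A value of n greater than 1 means the first machine is the faster one; swapping the arguments gives the reciprocal.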
CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock

[Figure: clock timing diagram. Each clock period consists of data transfer and computation, followed by a state update.]

• Clock cycle time (clock period): the duration of one clock cycle
◼ e.g., 250ps = 0.25ns = 250×10^–12s
• Clock frequency (clock rate): the number of cycles per second (the
inverse of the clock cycle time)
◼ e.g., 4GHz = 4000MHz = 4×10^9Hz
CPU Time
• Elapsed time
– Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Time spent processing a given job
• Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
– Different programs are affected differently by CPU and system performance

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• Performance improved by
➢ Reducing number of clock cycles
➢ Increasing clock rate (Frequency)
➢ Hardware designer must often trade off clock rate against cycle count
CPU Time Example
• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
– Aim for 6s CPU time
– Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?

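$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$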
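
The same calculation can be reproduced in Python as a quick check. This is a sketch only; the helper names `clock_cycles` and `required_clock_rate` are assumptions made for illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """CPU Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock Rate = CPU Clock Cycles / CPU Time."""
    return cycles / target_cpu_time_s


# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(10.0, 2e9)

# Computer B needs 1.2 times as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, 6.0)

print(f"Clock cycles on A: {cycles_a:.3g}")
print(f"Required clock rate for B: {rate_b / 1e9:.1f} GHz")
```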
Instruction Count (IC)
Cycle Per Instruction (CPI)

• CPU time depends on the number of
instructions in a program (IC: instruction count).
• The number of CPU clock cycles is equal to the number of
instructions executed multiplied by the average number of
clock cycles each instruction takes to execute, often
abbreviated as CPI (cycles per instruction).
• Since different instructions may take different amounts of
time depending on what they do, CPI is an average over all
the instructions executed in the program.
Instruction Count and CPI

$$\text{CPU Clock Cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction} = IC \times CPI$$

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = IC \times CPI \times \text{Clock Cycle Time} = \frac{IC \times CPI}{\text{Clock Rate}}$$
CPI Example1

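• Computer A: Clock Cycle Time = 250ps, CPI = 2.0
• Computer B: Clock Cycle Time = 500ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?

$$\text{CPU Time}_A = IC \times CPI_A \times \text{Clock Cycle Time}_A = IC \times 2.0 \times 250\,\text{ps} = IC \times 500\,\text{ps}$$

$$\text{CPU Time}_B = IC \times CPI_B \times \text{Clock Cycle Time}_B = IC \times 1.2 \times 500\,\text{ps} = IC \times 600\,\text{ps}$$

A is faster, by a factor of

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{IC \times 600\,\text{ps}}{IC \times 500\,\text{ps}} = 1.2$$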
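
Because both computers run the same ISA, the instruction count cancels and only CPI × clock cycle time matters. The sketch below compares the two machines on that basis; the function name `time_per_instruction` is an illustrative assumption.

```python
def time_per_instruction(cpi: float, clock_cycle_time_s: float) -> float:
    """Average time per instruction = CPI x Clock Cycle Time."""
    return cpi * clock_cycle_time_s


# Values from the example: A has 250 ps cycles and CPI 2.0,
# B has 500 ps cycles and CPI 1.2.
time_a = time_per_instruction(2.0, 250e-12)
time_b = time_per_instruction(1.2, 500e-12)

faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)

print(f"{faster} is faster by a factor of {ratio:.2f}")
```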
CPI Example2
Consider a processor, P1, that has a 3 GHz clock rate and a CPI of 1.5.
1. If the processor executes a program in 10 seconds, find the number of clock cycles
and the number of instructions.
2. We want to reduce the execution time by 30%, but this leads to a 20% increase
in the CPI. What clock rate is needed to achieve this time reduction?

Solution
1. CPU clock cycles = CPU time * clock rate
= 10 * 3*10^9 = 30*10^9 clock cycles

IC = CPU clock cycles / CPI = 30*10^9 / 1.5 = 20*10^9 instructions

2. CPI new = CPI old + 0.2 * CPI old
= 1.2 * CPI old
CPU time new = CPU time old - 0.3 * CPU time old
= 0.7 * CPU time old
Clock rate new = IC * CPI new / CPU time new
= 20*10^9 * 1.2 * 1.5 / (0.7 * 10) = 5.14 GHz
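
The two parts of this solution can be verified numerically. The variable names below are chosen for illustration; the input values are the ones given in the exercise.

```python
# Given values for processor P1.
clock_rate_old_hz = 3e9
cpi_old = 1.5
cpu_time_old_s = 10.0

# Part 1: clock cycles and instruction count.
cycles = cpu_time_old_s * clock_rate_old_hz
instruction_count = cycles / cpi_old

# Part 2: 30% less execution time, 20% higher CPI.
cpi_new = 1.2 * cpi_old
cpu_time_new_s = 0.7 * cpu_time_old_s
clock_rate_new_hz = instruction_count * cpi_new / cpu_time_new_s

print(f"Clock cycles: {cycles:.3g}")
print(f"Instructions: {instruction_count:.3g}")
print(f"Required clock rate: {clock_rate_new_hz / 1e9:.2f} GHz")
```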
CPU Performance Equation
• Different types of instructions have different CPI
Let CPI_i = clocks per instruction for class i of instructions
Let IC_i = instruction count for class i of instructions

$$\text{CPU cycles} = \sum_{i=1}^{n} (CPI_i \times IC_i) \qquad CPI = \frac{\sum_{i=1}^{n} (CPI_i \times IC_i)}{\sum_{i=1}^{n} IC_i}$$

• Designers often obtain CPI by a detailed simulation


• Hardware counters are also used for operational CPUs
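
The weighted-average CPI can be sketched as follows. The instruction mix used here is entirely hypothetical (it does not come from the slides) and serves only to show how the two sums are formed.

```python
def total_cycles(mix: list[tuple[float, int]]) -> float:
    """CPU cycles = sum over classes of CPI_i x IC_i."""
    return sum(cpi * ic for cpi, ic in mix)


def average_cpi(mix: list[tuple[float, int]]) -> float:
    """CPI = (sum of CPI_i x IC_i) / (sum of IC_i)."""
    return total_cycles(mix) / sum(ic for _, ic in mix)


# Hypothetical mix: (CPI_i, IC_i) for three instruction classes.
mix = [(1, 500), (2, 300), (4, 200)]

print(f"CPU cycles: {total_cycles(mix)}")
print(f"Average CPI: {average_cpi(mix):.2f}")
```

Classes with a high instruction count dominate the average, so a rarely used slow instruction class raises the CPI less than its cycle count alone might suggest.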
Determining the CPI
• Example: A compiler designer is trying to decide between two
code sequences for a particular machine. Based on the hardware
implementation, there are three different classes of instructions:

Given a program with 10^6 instructions divided into classes as follows:
10% class A, 20% class B, 50% class C, and 20% class D.
1. Find the clock cycles required for P1 and P2
2. Which implementation is faster?
Determining the CPI
Solution
Amdahl’s Law
• Amdahl's Law is a measure of speedup
– How a program performs after improving a portion of a computer
– Relative to how it performed previously

$$\text{Speedup}_{overall} = \frac{\text{Execution Time}_{old}}{\text{Execution Time}_{new}}$$
Amdahl’s Law Examples
• Example 1:
Assume that a program runs in 100 seconds on a machine, with
multiply operations responsible for 80 seconds of this time. By how
much must the speed of multiplication improve for the program to
run 2 times faster?
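
One way to approach this example is to solve the relation new time = affected time / n + unaffected time for n. The sketch below does that; the function name `required_improvement` is an assumption for illustration, and the slides themselves do not give a worked solution here.

```python
def required_improvement(total_s: float, affected_s: float,
                         target_speedup: float) -> float:
    """Solve total/target = affected/n + unaffected for the factor n."""
    target_s = total_s / target_speedup
    unaffected_s = total_s - affected_s
    budget_s = target_s - unaffected_s  # time left for the improved part
    if budget_s <= 0:
        raise ValueError("target speedup is unreachable by improving this part alone")
    return affected_s / budget_s


# 100 s total, 80 s of multiplication, goal: run 2 times faster (50 s).
n = required_improvement(100.0, 80.0, 2.0)
print(f"Multiplication must become {n:.2f} times faster")
```

The unaffected 20 seconds leave 30 seconds for multiplication, so the required factor is 80 / 30. If the target were 5 times faster (20 seconds total), no improvement to multiplication alone could reach it.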

Amdahl’s Law Examples
• Example 2:
▪ The following table shows the execution time of five routines of
a program running on different numbers of processors.

A (ms)   B (ms)   C (ms)   D (ms)   E (ms)
4        14       2        12       2

A) Find the total execution time, and how much it is reduced if
the time of routines A, C and E is improved by a factor of 1.3?

B) By how much is the total time reduced if routine B is
improved by a factor of 1.9?

C) By how much is the total time reduced if routine D is
improved by a factor of 1.9?
Amdahl’s Law Examples
• Solution:
A) Find the total execution time, and how much it is reduced if the time of routines
A, C and E is improved by a factor of 1.3?
• Total Execution time = A+B+C+D+E = 4+14+2+12+2 = 34ms
• We assume that:
f: fraction of the execution time affected by the improvement
n: factor of improvement = 1.3

$$\text{Execution Time}_{new} = \text{Execution Time}_{old} \times \left( (1 - f) + \frac{f}{n} \right)$$

• Execution Time new = (A+C+E)/1.3 + B+D = (4+2+2)/1.3 + 14+12 = 6.15+26 = 32.15ms
• The total time is reduced by 34 − 32.15 = 1.85ms

$$\text{Speedup}_{overall} = \frac{\text{Execution Time}_{old}}{\text{Execution Time}_{new}} = \frac{1}{(1 - f) + f / n}$$

• Speedup overall = 34 / 32.15 = 1.06

Amdahl’s Law Examples

• Solution:

B) By how much is the total time reduced if routine B is improved by a factor of 1.9?
• Execution Time new = B/1.9 + A+C+D+E = 14/1.9 + 4+2+12+2 = 7.37+20 = 27.37ms
• The total time is reduced by 34 − 27.37 = 6.63ms
• Speedup overall = 34 / 27.37 = 1.24

C) By how much is the total time reduced if routine D is improved by a factor of 1.9?
• Execution Time new = D/1.9 + A+B+C+E = 12/1.9 + 4+14+2+2 = 6.32+22 = 28.32ms
• The total time is reduced by 34 − 28.32 = 5.68ms
• Speedup overall = 34 / 28.32 = 1.20
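
The three cases of this example can be recomputed with a short script. This is a sketch under the assumption that each improvement divides the time of the named routines by the given factor and leaves the others unchanged; the helper names are illustrative.

```python
# Routine execution times in milliseconds, from the table in the example.
routines_ms = {"A": 4, "B": 14, "C": 2, "D": 12, "E": 2}


def improved_time(times: dict[str, float], improved: set[str],
                  factor: float) -> float:
    """Total time when the routines in `improved` run `factor` times faster."""
    return sum(t / factor if name in improved else t
               for name, t in times.items())


def speedup(times: dict[str, float], improved: set[str],
            factor: float) -> float:
    """Overall speedup = old total time / new total time."""
    return sum(times.values()) / improved_time(times, improved, factor)


for label, improved, factor in [("A", {"A", "C", "E"}, 1.3),
                                ("B", {"B"}, 1.9),
                                ("C", {"D"}, 1.9)]:
    new_ms = improved_time(routines_ms, improved, factor)
    print(f"Case {label}: new time {new_ms:.2f} ms, "
          f"speedup {speedup(routines_ms, improved, factor):.2f}")
```

Improving the long routine B by 1.9 gives a larger overall speedup than improving the three short routines A, C and E by 1.3, which is the point of Amdahl's Law: the gain is limited by the fraction of time the improved part accounts for.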

