Homework 1


1. Consider computing the overall CPI for a machine A for which the following
performance measures were recorded when executing a set of benchmark
programs. Assume that the clock rate of the CPU is 200 MHz.

Instruction category   Percentage of occurrence   No. of cycles per instruction
ALU                    38                         1
Load & store           15                         3
Branch                 42                         4
Others                 5                          5

Assuming the execution of 100 instructions,
- What is machine A’s CPI?
- What is machine A’s MIPS?

Given data:
+, Clock rate: 200 MHz
Find: CPI and MIPS
Overall CPI = ∑ (fraction of instructions in category i * CPI of category i)
Overall CPI = 0.38*1 + 0.15*3 + 0.42*4 + 0.05*5 = 2.76
MIPS = Clock Rate (MHz) / CPI = 200 / 2.76 = 72.46
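
The weighted-average CPI and the MIPS rating can be checked with a short script. This is a minimal sketch: the function names `overall_cpi` and `mips` are illustrative, and the instruction mix is machine A's table from problem 1.

```python
# Sketch: weighted-average CPI and MIPS for an instruction mix.
# Function names are illustrative; the data is machine A's table.

def overall_cpi(mix):
    """mix: list of (fraction_of_instructions, cycles_per_instruction)."""
    return sum(fraction * cpi for fraction, cpi in mix)

def mips(clock_rate_mhz, cpi):
    """MIPS = clock rate in MHz divided by the average CPI."""
    return clock_rate_mhz / cpi

machine_a = [(0.38, 1), (0.15, 3), (0.42, 4), (0.05, 5)]
cpi_a = overall_cpi(machine_a)
print(round(cpi_a, 2))             # 2.76
print(round(mips(200, cpi_a), 2))  # 72.46
```

The same two functions apply to any other instruction mix, as long as the fractions sum to 1.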

2. Suppose that the same set of benchmark programs considered above were executed
on another machine, call it machine B, for which the following measures were
recorded.

Instruction category   Percentage of occurrence   No. of cycles per instruction
ALU                    35                         1
Load & store           30                         2
Branch                 15                         3
Others                 20                         5

What is the MIPS rating for the machine considered in the previous example
(machine A) and machine B assuming a clock rate of 400 MHz?

Given data:
+, Machine A: CPI = 2.76 (from problem 1)
+, Machine B: instruction mix as given in this problem
+, Clock rate: 400 MHz, applied to both machines in the calculation below
Find: CPI(B), MIPS(A), and MIPS(B)
Overall CPI = ∑ (fraction of instructions in category i * CPI of category i)
Overall CPI of B = 0.35*1 + 0.3*2 + 0.15*3 + 0.2*5 = 2.4
MIPS = Clock Rate (MHz) / CPI
For machine A: 400 / 2.76 = 144.93
For machine B: 400 / 2.4 = 166.67

3. Consider a machine with three instruction classes and CPI measurements as follows:

Instruction class   CPI of the instruction class
A                   2
B                   5
C                   7

Suppose that we measured the code for a given program in two different compilers and
obtained the following data:

                Instruction counts (in millions)
Code sequence   A     B     C
1               15    5     3
2               25    2     2

Assume that the machine’s clock rate is 500 MHz.
a. Which code sequence will execute faster according to MIPS?
b. How much according to the execution time of each code sequence?

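Given data:
+, Machine clock rate: 500 MHz
Find:
Total number of cycles for each code sequence:
Code Sequence 1:
1. Class A cycles: 15 * 2 = 30 (million cycles)
2. Class B cycles: 5 * 5 = 25 (million cycles)
3. Class C cycles: 3 * 7 = 21 (million cycles)
Total cycles for code sequence 1: 30 + 25 + 21 = 76 (million cycles)
Code Sequence 2:
1. Class A cycles: 25 * 2 = 50 (million cycles)
2. Class B cycles: 2 * 5 = 10 (million cycles)
3. Class C cycles: 2 * 7 = 14 (million cycles)
Total cycles for code sequence 2: 50 + 10 + 14 = 74 (million cycles)
Execution Time = Total Cycles / Clock Rate
Execution time for code sequence 1: 76*10^6 / (500*10^6) = 0.152 (s)
Execution time for code sequence 2: 74*10^6 / (500*10^6) = 0.148 (s)
MIPS = Instruction Count / (Execution Time * 10^6)
MIPS for code sequence 1: 23*10^6 / (0.152 * 10^6) = 151.3
MIPS for code sequence 2: 29*10^6 / (0.148 * 10^6) = 195.9
a. Code sequence 2 has the higher MIPS rating.
b. Code sequence 2 is also faster in execution time: 0.148 s versus 0.152 s, about 1.03 times faster.
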
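
The per-sequence cycle count, execution time, and MIPS can be sketched as follows. The helper name `summarize` is hypothetical; the CPI values, instruction counts, and 500 MHz clock rate are the ones given in problem 3.

```python
# Sketch: total cycles, execution time, and MIPS for each code sequence.
# CPI per class and the clock rate come from the problem statement.

CPI = {"A": 2, "B": 5, "C": 7}
CLOCK_RATE_HZ = 500e6

def summarize(counts_millions):
    """counts_millions: instruction count per class, in millions."""
    instructions = sum(counts_millions.values()) * 1e6
    cycles = sum(counts_millions[c] * CPI[c] for c in CPI) * 1e6
    seconds = cycles / CLOCK_RATE_HZ          # time = cycles / clock rate
    mips = instructions / (seconds * 1e6)     # millions of instructions per second
    return cycles, seconds, mips

seq1 = summarize({"A": 15, "B": 5, "C": 3})
seq2 = summarize({"A": 25, "B": 2, "C": 2})
for name, (cycles, seconds, mips) in (("1", seq1), ("2", seq2)):
    print(f"sequence {name}: {cycles:.3g} cycles, {seconds:.3f} s, {mips:.1f} MIPS")
```

Under these inputs, sequence 2 comes out ahead on both MIPS and execution time.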
4. A compiler designer is trying to decide between two code sequences for a particular
machine. The hardware designers have supplied the following facts:

Instruction class   CPI of the instruction class
A                   1
B                   3
C                   4

For a particular high-level language, the compiler writer is considering two sequences that
require the following instruction counts:

                Instruction counts (in millions)
Code sequence   A     B     C
1               2     1     2
2               4     3     1

a. What is the CPI for each code sequence?
b. Which code sequence is faster? By how much?

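ANS:
Clock cycles = SUM of (INSTRUCTION COUNT of EACH CLASS * CPI of THAT CLASS)
CPI = Clock cycles / Total instruction count

a. For code sequence 1: cycles = 2*1 + 1*3 + 2*4 = 13 (million cycles)
Instruction count = 2 + 1 + 2 = 5 (million) => CPI = 13 / 5 = 2.6
For code sequence 2: cycles = 4*1 + 3*3 + 1*4 = 17 (million cycles)
Instruction count = 4 + 3 + 1 = 8 (million) => CPI = 17 / 8 = 2.125
b. Code sequence 1 is faster: it needs 13 million cycles versus 17 million, so on the same clock it is 17/13 = 1.31 times faster (4 million fewer cycles), even though its CPI is higher.
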
5. Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1,
12, and 5, respectively. Also assume that on a single processor a program requires
the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and
256 million branch instructions. Assume that each processor has a 2 GHz clock
frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of
arithmetic and load/store instructions per processor is divided by 0.7 x p (where p is
the number of processors) but the number of branch instructions per processor
remains the same.
a. Find the total execution time for this program on 1, 2, 4, and 8 processors, and
show the relative speedup of the 2, 4, and 8 processor result relative to the single
processor result.
b. If the CPI of the arithmetic instructions was doubled, what would the impact be on
the execution time of the program on 1, 2, 4, or 8 processors?
c. To what extent should the CPI of load/store instructions be reduced in order for a
single processor to match the performance of four processors using the original CPI
values?

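Given data:
- Arithmetic instruction: CPI = 1
- Load/Store instruction: CPI = 12
- Branch instruction: CPI = 5
- Requirements:
+ 2.56E9 Arithmetic instructions
+ 1.28E9 Load/Store instructions
+ 256 million Branch instructions
- Frequency: 2 GHz
- Parallelization factor: 0.7

ANS:
- ARITHMETIC CYCLES = NUMBER OF ARITHMETIC INSTRUCTIONS * CPI of ARITHMETIC
+ A.C = 2.56*10^9 * 1 = 2.56*10^9
- LOAD/STORE CYCLES = NUMBER OF L/S INSTRUCTIONS * CPI of L/S
+ L/S.C = 1.28*10^9 * 12 = 1.536*10^10
- BRANCH CYCLES = NUMBER OF BRANCH INSTRUCTIONS * CPI of BRANCH
+ B.C = 256*10^6 * 5 = 1.28*10^9
=> Total cycles = A.C + L/S.C + B.C = 1.92*10^10

a.
- EXECUTION TIME per PROCESSOR = TOTAL CYCLES / FREQUENCY
- With 1 processor: 1.92*10^10 / (2*10^9) = 9.6 (s)
- With p > 1 processors, A.C and L/S.C are divided by 0.7*p while B.C stays the same:
  Cycles(p) = (A.C + L/S.C) / (0.7*p) + B.C, with A.C + L/S.C = 1.792*10^10
- With 2 processors: (1.792*10^10 / 1.4 + 1.28*10^9) / (2*10^9) = 7.04 (s)
- With 4 processors: (1.792*10^10 / 2.8 + 1.28*10^9) / (2*10^9) = 3.84 (s)
- With 8 processors: (1.792*10^10 / 5.6 + 1.28*10^9) / (2*10^9) = 2.24 (s)
- Comparison: relative to the single processor, the speedup is 9.6/7.04 = 1.36,
  9.6/3.84 = 2.5, and 9.6/2.24 = 4.29 for 2, 4, and 8 processors

b.
- A.C = 2.56*10^9 * 2 = 5.12*10^9
-> Total cycles on 1 processor = 5.12*10^9 + 1.536*10^10 + 1.28*10^9 = 2.176*10^10
- With 1 processor: 2.176*10^10 / (2*10^9) = 10.88 (s)
- With 2 processors: (2.048*10^10 / 1.4 + 1.28*10^9) / (2*10^9) = 7.954 (s)
- With 4 processors: (2.048*10^10 / 2.8 + 1.28*10^9) / (2*10^9) = 4.297 (s)
- With 8 processors: (2.048*10^10 / 5.6 + 1.28*10^9) / (2*10^9) = 2.469 (s)
-> The execution time increases by about 13.3%, 13.0%, 11.9%, and 10.2% on 1, 2, 4, and 8 processors

c.
- Let the L/S CPI before adjusting be CPI1 = 12 and after adjusting be CPI2
- To find CPI2, set: execution time on a single processor with CPI2 =
  execution time on 4 processors with CPI1

*With CPI1
- From a., the execution time on 4 processors = 3.84 (s) → (1)
*With CPI2
- Execution time = [A.C + B.C + L/S.C(new)] / (2*10^9) → (2)

- Setting (1) = (2): A.C + B.C + L/S.C(new) = 7.68*10^9
=> L/S.C(new) = 7.68*10^9 - 2.56*10^9 - 1.28*10^9 = 3.84*10^9
=> CPI2 = 3.84*10^9 / (1.28*10^9) = 3 (25% of the original CPI of 12)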

6. Assume that a switching component such as a transistor can switch in zero time. We
propose to construct a disk-shaped computer chip with such a component. The only
limitation is the time it takes to send electronic signals from one edge of the chip to
the other. Make the simplifying assumption that electronic signals can travel at
300,000 kilometers per second.
a. What is the limitation on the diameter of a round chip so that any computation
result can be used anywhere on the chip at a clock rate of 1 GHz?
b. What are the diameter restrictions if the whole chip should operate at 1 THz =
10^12 Hz?
c. Is such a chip feasible?

a. From the given data, each clock cycle lasts 1/1GHz = 10^-9 (s)
- MAXIMUM DISTANCE = Speed * Time, where Time is the delay allowed for the
electronic signal to travel from one edge to the other
- Speed = 300,000 km/s = 3*10^8 (m/s)
=> MAXIMUM DISTANCE = 3*10^8 * 10^-9 = 0.3 (m) = 30 (cm)
- On a round chip the longest edge-to-edge path is the diameter -> Diameter <= 0.3 m
-> Any round chip with a diameter of at most 30 cm meets the requirement

b. Time = 1/10^12 = 10^-12 (s)
=> Maximum distance = 3*10^8 * 10^-12 = 3*10^-4 (m)
=> Diameter <= 0.3 (mm)

c. At 1 GHz the limit of 30 cm is far larger than a practical chip, so the constraint is
easy to meet. At 1 THz, no: the whole chip would have to fit within a diameter of
0.3 mm, which is far smaller than typical processor dies, and in practice transistors
do not switch in zero time.
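
The distance bound can be sketched numerically. This assumes, as the problem states, a signal speed of 300,000 km/s and that a signal must cross from one edge of the chip to the other within one clock period; the function name `max_diameter_m` is illustrative.

```python
# Sketch: largest chip diameter a signal can cross in one clock period.
# Assumes the signal speed given in the problem (300,000 km/s).

SIGNAL_SPEED_M_PER_S = 300_000 * 1_000  # km/s converted to m/s

def max_diameter_m(clock_rate_hz):
    period_s = 1.0 / clock_rate_hz          # one clock cycle
    return SIGNAL_SPEED_M_PER_S * period_s  # distance = speed * time

print(max_diameter_m(1e9))   # about 0.3 m at 1 GHz
print(max_diameter_m(1e12))  # about 0.0003 m (0.3 mm) at 1 THz
```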

7. Consider having a program that runs in 50 s on computer A, which has a 500 MHz
clock. We would like to run the same program on another machine, B, in 20 s. If
machine B requires 2.5 times as many clock cycles as machine A for the same
program, what clock rate must machine B have in MHz?

CPU time of A: 50 seconds
Clock rate of A: 500 MHz = 500 * 10^6 Hz
=> Clock cycles of A = 50 * 500 * 10^6 = 2.5*10^10

CPU time of B: 20 seconds
Clock cycles of B = 2.5 * Clock cycles of A = 6.25*10^10
=> Clock rate of B = 6.25 * 10^10 / 20 = 3.125 * 10^9 Hz = 3125 MHz
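
The steps above can be sketched as a small function. The name `required_clock_rate_hz` and its parameter names are hypothetical; the inputs are the figures from the problem.

```python
# Sketch: clock rate machine B needs, given machine A's run and the cycle ratio.

def required_clock_rate_hz(time_a_s, rate_a_hz, cycle_ratio, time_b_s):
    cycles_a = time_a_s * rate_a_hz     # cycles = CPU time * clock rate
    cycles_b = cycle_ratio * cycles_a   # B needs 2.5x as many cycles
    return cycles_b / time_b_s          # clock rate = cycles / CPU time

rate_b = required_clock_rate_hz(50, 500e6, 2.5, 20)
print(rate_b / 1e6)  # 3125.0 (MHz)
```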

8. Suppose that we have two implementations of the same instruction set architecture.
Machine A has a clock cycle time of 50 ns and a CPI of 4.0 for some program, and
machine B has a clock cycle of 65 ns and a CPI of 2.5 for the same program. Which
machine is faster and by how much?

We know that CPU time = Instruction count * CPI * Clock cycle time
Both machines implement the same instruction set and run the same program, so the
instruction count is the same on both; call it N.
Plugging into the formula:
- CPU time of A = N * 4.0 * 50 * 10^-9 = N * 2 * 10^-7 (s)
- CPU time of B = N * 2.5 * 65 * 10^-9 = N * 1.625 * 10^-7 (s)
To see which machine is faster, take the ratio of the two CPU times:
=> A / B = 2 / 1.625 = 1.23
=> Machine B is 1.23 times faster than machine A

9. Consider three different processors P1, P2, and P3 executing the same instruction
set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI
of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions per
second?
- Way 1 (my solution):

Let N1, N2, N3 be the number of instructions executed on P1, P2, P3 (equal when
they run the same program)

CPU time of P1 = N1 * 1.5 * 1/(3 * 10^9) = N1 * 5 * 10^-10
CPU time of P2 = N2 * 1.0 * 1/(2.5 * 10^9) = N2 * 4 * 10^-10
CPU time of P3 = N3 * 2.2 * 1/(4 * 10^9) = N3 * 5.5 * 10^-10

The smaller the CPU time per instruction, the faster the machine, so P2 has the
highest performance

- Way 2: Calculate IPS (Instructions per second) = Clock rate / CPI

IPS1 = 3 * 10^9 / 1.5 = 2.0 * 10^9
IPS2 = 2.5 * 10^9 / 1.0 = 2.5 * 10^9
IPS3 = 4 * 10^9 / 2.2 = 1.82 * 10^9

IPS2 is the largest
=> P2 has the highest performance

b. If the processors each execute a program in 10 seconds, find the number of cycles
and the number of instructions.

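Clock cycles = Clock rate * CPU time

C1 = 3 * 10^9 * 10 = 3 * 10^10
C2 = 2.5 * 10^9 * 10 = 2.5 * 10^10
C3 = 4 * 10^9 * 10 = 4 * 10^10

Number of instructions = Clock cycles / CPI

N1 = 3 * 10^10 / 1.5 = 2 * 10^10
N2 = 2.5 * 10^10 / 1.0 = 2.5 * 10^10
N3 = 4 * 10^10 / 2.2 = 1.82 * 10^10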
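
As a cross-check, the instructions-per-second figures for part a and the cycle and instruction counts for part b can be computed together. This is a sketch: the dictionary layout and the name `instructions_per_second` are illustrative, and the clock rates and CPIs are those given for P1, P2, and P3.

```python
# Sketch: IPS, clock cycles, and instruction counts for a 10-second run.
# Each entry is (clock rate in Hz, CPI), taken from the problem statement.

processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
RUN_TIME_S = 10

def instructions_per_second(clock_rate_hz, cpi):
    return clock_rate_hz / cpi

results = {}
for name, (rate, cpi) in processors.items():
    ips = instructions_per_second(rate, cpi)
    cycles = rate * RUN_TIME_S        # cycles = clock rate * CPU time
    instructions = cycles / cpi       # instructions = cycles / CPI
    results[name] = (ips, cycles, instructions)
    print(f"{name}: IPS={ips:.3g}, cycles={cycles:.3g}, instructions={instructions:.3g}")

fastest = max(results, key=lambda name: results[name][0])
print(fastest)  # P2
```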
c. We are trying to reduce the execution time by 30% but this leads to an increase of
20% in the CPI. What clock rate should we have to get this time reduction?
10. Consider two different implementations of the same instruction set architecture. The
instructions can be divided into four classes according to their CPI (class A, B, C, and
D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate
of 3 GHz and CPIs of 2, 2, 2, and 2. Given a program with a dynamic instruction count
of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50%
class C, and 20% class D, which implementation is faster?
a. What is the global CPI for each implementation?

To calculate the global CPI, we consider the average CPI weighted by the percentage
of instructions in each class.

+ Implementations P1
● Instruction classes and CPIs:
○ A: 1 CPI
○ B: 2 CPI
○ C: 3 CPI
○ D: 3 CPI
● Instruction mix:
○ A: 10%
○ B: 20%
○ C: 50%
○ D: 20%

→ Global CPI for P1:

P1 CPI = (1 * 0.1) + (2 * 0.2) + (3 * 0.5) + (3 * 0.2)

P1 CPI = 2.6

+ Implementations P2

● Instruction classes and CPIs:


○ A: 2 CPI
○ B: 2 CPI
○ C: 2 CPI
○ D: 2 CPI
● Instruction mix:
○ A: 10%
○ B: 20%
○ C: 50%
○ D: 20%

→ Global CPI for P2:

P2 CPI = (2 * 0.1) + (2 * 0.2) + (2 * 0.5) + (2 * 0.2)

P2 CPI = 2

b. Find the clock cycles required in both cases.


With the global CPI for each processor, we can calculate the clock cycles required to
execute the 1 million instruction program, and from them the execution time.
● Clock rate:
○ P1: 2.5 GHz = 2.5 * 10^9 Hz
○ P2: 3 GHz = 3 * 10^9 Hz
● Instruction count: 1.0E6 instructions

→ Clock cycles for P1:
P1 cycles = 1.0E6 instructions * 2.6 CPI = 2.6 million cycles
P1 execution time = 2.6E6 cycles / (2.5 * 10^9 Hz) ≈ 1.04 ms
→ Clock cycles for P2:
P2 cycles = 1.0E6 instructions * 2 CPI = 2.0 million cycles
P2 execution time = 2.0E6 cycles / (3 * 10^9 Hz) ≈ 0.667 ms
CONCLUSION:
● P2 is faster than P1. It has a lower global CPI (2 compared to 2.6) and a higher
clock rate (3 GHz compared to 2.5 GHz).
● P2 requires fewer clock cycles (2.0 million compared to 2.6 million) and finishes
the program in ≈ 0.667 ms compared to ≈ 1.04 ms for P1.
11. Compilers can have a profound impact on the performance of an application.
Assume that for a program, compiler A results in a dynamic instruction count of
1.0E9 and has an execution time of 1.1 s, while compiler B results in a dynamic
instruction count of 1.2E9 and an execution time of 1.5 s.
a. Find the average CPI for each program given that the processor has a clock cycle
time of 1 ns.

We have a formula:

CPI (Cycles Per Instruction) = Execution Time / (Clock Cycle Time * Instruction Count)

→ For Compiler A:

CPI A = 1.1 s / (1 ns * 1.0E9 instructions)
= (1.1 * 10^9 ns) / (1 ns * 1.0E9 instructions)
= 1.1 cycles/instruction

→ For Compiler B:

CPI B = 1.5 s / (1 ns * 1.2E9 instructions)
= (1.5 * 10^9 ns) / (1 ns * 1.2E9 instructions)
= 1.25 cycles/instruction

b. Assume the compiled programs run on two different processors. If the execution
times on the two processors are the same, how much faster is the clock of the
processor running compiler A’s code versus the clock of the processor running
compiler B’s code?
If the execution times on the two processors are the same, then

Instruction Count A * CPI A / Clock rate A = Instruction Count B * CPI B / Clock rate B

so the clock rates are in the same ratio as the clock cycles, where

Clock cycles = Instruction Count * CPI

→ For Compiler A: clock cycles = 1.0E9 * 1.1 = 1.1E9 cycles
→ For Compiler B: clock cycles = 1.2E9 * 1.25 = 1.5E9 cycles

Clock rate B / Clock rate A = clock cycles of B / clock cycles of A
= 1.5E9 / 1.1E9
= 1.36

→ Therefore, the processor running Compiler B's code needs a clock about 1.36 times
faster than the processor running Compiler A's code. Equivalently, the clock of the
processor running Compiler A's code is 1/1.36 ≈ 0.73 times as fast (about 27% slower).
c. A new compiler is developed that uses only 6.0E8 instructions and has an average
CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or
B on the original processor?

New Compiler Speedup

● New Compiler Instruction Count = 6.0E8 instructions
● New Compiler CPI = 1.1 cycles/instruction

We can calculate the execution time for the new compiler:

● New Compiler Execution Time = New Compiler Instruction Count * New Compiler CPI * Clock Cycle Time
● Assuming the same clock cycle time (1 ns) from part a:

New Compiler Execution Time = 6.0E8 instructions * 1.1 cycles/instruction * 1 ns/cycle
= 6.6 x 10^8 ns
= 0.66 s

Speedup:

● Speedup is the ratio of the original execution time to the new execution time.
● Speedup_A = Original Execution Time_A / New Compiler Execution Time

Speedup_A = 1.1 s / 0.66 s = 1.67

● Speedup_B = Original Execution Time_B / New Compiler Execution Time

Speedup_B = 1.5 s / 0.66 s = 2.27

→ The new compiler is about 1.67 times faster than Compiler A and about 2.27 times
faster than Compiler B.
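
The CPI and speedup relations for the compilers can be sketched as below. The function names `cpi` and `exec_time_s` are illustrative; the 1 ns cycle time, instruction counts, and execution times are those given in the problem.

```python
# Sketch: CPI from execution time, and speedup of the new compiler.
# Assumes the 1 ns clock cycle time stated in part a.

CYCLE_TIME_S = 1e-9

def cpi(exec_time_s_value, instruction_count):
    # CPI = execution time / (cycle time * instruction count)
    return exec_time_s_value / (CYCLE_TIME_S * instruction_count)

def exec_time_s(instruction_count, cpi_value):
    # execution time = instruction count * CPI * cycle time
    return instruction_count * cpi_value * CYCLE_TIME_S

print(round(cpi(1.1, 1.0e9), 2))   # 1.1
print(round(cpi(1.5, 1.2e9), 2))   # 1.25

new_time = exec_time_s(6.0e8, 1.1)  # about 0.66 s
speedup_vs_a = 1.1 / new_time
speedup_vs_b = 1.5 / new_time
print(round(speedup_vs_a, 2), round(speedup_vs_b, 2))  # 1.67 2.27
```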

12. Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1,
12, and 5, respectively. Also assume that on a single processor a program requires
the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and
256 million branch instructions. Assume that each processor has a 2 GHz clock
frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of
arithmetic and load/store instructions per processor is divided by 0.7 x p (where p is
the number of processors) but the number of branch instructions per processor
remains the same.
a. Find the total execution time for this program on 1, 2, 4, and 8 processors, and
show the relative speedup of the 2, 4, and 8 processor result relative to the
single processor result.
We have the formulas:

Clock Cycles = ∑ (CPI_i * Instruction Count_i), summed over the instruction classes i = 1..n

Execution Time per processor = Clock Cycles / Clock frequency

Relative Speedup (S) = Execution Time (Single Processor) / Execution Time (p Processors)

● Execution time for 1 processor:
Clock Cycles = 2.56E9 * 1 + 1.28E9 * 12 + 256M * 5
= 2.56E9 + 15.36E9 + 1280M
= 19.2E9
Execution Time per processor = 19.2E9 / 2GHz = 19.2E9 / (2 * 10^9)Hz = 9.6 sec
→ Relative Speedup = 9.6/9.6 = 1
● Execution time for 2 processors:
Clock Cycles = (2.56E9/(0.7*2)) * 1 + (1.28E9/(0.7*2)) * 12 + 256M * 5
= 1.83E9 + 10.97E9 + 1280M
= 14.08E9
Execution Time per processor = 14.08E9 / 2GHz = 14.08E9/(2*10^9)Hz = 7.04 sec
→ Relative Speedup = 9.6/7.04 = 1.36
● Execution time for 4 processors:
Clock Cycles = (2.56E9/(0.7*4)) * 1 + (1.28E9/(0.7*4)) * 12 + 256M * 5
= 0.91E9 + 5.49E9 + 1280M
= 7.68E9
Execution Time per processor = 7.68E9 / 2GHz = 7.68E9/(2*10^9)Hz = 3.84 sec
→ Relative Speedup = 9.6/3.84 = 2.5
● Execution time for 8 processors:
Clock Cycles = (2.56E9/(0.7*8)) * 1 + (1.28E9/(0.7*8)) * 12 + 256M * 5
= 0.46E9 + 2.74E9 + 1280M
= 4.48E9
Execution Time per processor = 4.48E9 / 2GHz = 4.48E9/(2*10^9)Hz = 2.24 sec
→ Relative Speedup = 9.6/2.24 = 4.29
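
The per-processor model used above can be sketched in code. This follows the convention of the worked solution: a single processor runs the full instruction counts, and for p > 1 the arithmetic and load/store counts are divided by 0.7 * p while the branch count is unchanged. The function name `exec_time_s` and its keyword parameters are illustrative.

```python
# Sketch: execution time and relative speedup on p processors.
# Assumption (from the worked solution): no 0.7 factor is applied when p == 1.

CLOCK_HZ = 2e9
COUNTS = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 256e6}

def exec_time_s(p, cpi_arith=1, cpi_ls=12, cpi_branch=5):
    scale = 1.0 if p == 1 else 0.7 * p
    parallel_cycles = (COUNTS["arith"] * cpi_arith
                       + COUNTS["load_store"] * cpi_ls) / scale
    branch_cycles = COUNTS["branch"] * cpi_branch  # not reduced by parallelism
    return (parallel_cycles + branch_cycles) / CLOCK_HZ

base = exec_time_s(1)
for p in (1, 2, 4, 8):
    t = exec_time_s(p)
    print(p, round(t, 2), round(base / t, 2))
```

Passing `cpi_arith=2` to the same function gives the times for the doubled arithmetic CPI asked about in part b.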
b. If the CPI of the arithmetic instructions was doubled, what would the impact be
on the execution time of the program on 1, 2, 4, or 8 processors?
● Execution time for 1 processor when the arithmetic CPI is doubled (CPI = 2):
Clock Cycles = 2.56E9 * 2 + 1.28E9 * 12 + 256M * 5
= 5.12E9 + 15.36E9 + 1280M
= 21.76E9
Execution Time per processor = 21.76E9/2GHz = 21.76E9/(2*10^9)Hz = 10.88 sec
● Execution time for 2 processors when the arithmetic CPI is doubled (CPI = 2):
Clock Cycles = (2.56E9/(0.7*2)) * 2 + (1.28E9/(0.7*2)) * 12 + 256M * 5
= 3.66E9 + 10.97E9 + 1280M
= 15.91E9
Execution Time per processor = 15.91E9/2GHz = 15.91E9/(2*10^9)Hz = 7.954 sec
● Execution time for 4 processors when the arithmetic CPI is doubled (CPI = 2):
Clock Cycles = (2.56E9/(0.7*4)) * 2 + (1.28E9/(0.7*4)) * 12 + 256M * 5
= 1.83E9 + 5.49E9 + 1280M
= 8.59E9
Execution Time per processor = 8.59E9/2GHz = 8.59E9/(2*10^9)Hz = 4.297 sec
● Execution time for 8 processors when the arithmetic CPI is doubled (CPI = 2):
Clock Cycles = (2.56E9/(0.7*8)) * 2 + (1.28E9/(0.7*8)) * 12 + 256M * 5
= 0.914E9 + 2.743E9 + 1280M
= 4.937E9
Execution Time per processor = 4.937E9/2GHz = 4.937E9/(2*10^9)Hz = 2.469 sec
c. To what extent should the CPI of load/store instructions be reduced in order for
a single processor to match the performance of four processors using the original
CPI values?
For 4 processors: when the program is distributed over multiple cores, the number of
arithmetic and load/store instructions per processor is divided by 0.7 * p, where p is
the number of processors, while the number of branch instructions per processor
remains unchanged. Here p = 4.
Therefore,
Clock Cycles = (2.56E9/(0.7*4)) * 1 + (1.28E9/(0.7*4)) * 12 + 256M * 5
= 0.91E9 + 5.49E9 + 1280M
= 7.68E9
Execution Time per processor = 7.68E9 / 2GHz = 7.68E9/(2*10^9)Hz = 3.84 sec

Reducing the load/store CPI of a single processor to match the performance of 4 processors:

Let a be the new load/store CPI.
Clock Cycles = 2.56E9 * 1 + 1.28E9 * a + 256M * 5 = 3.84E9 + 1.28E9*a
Execution Time per processor = Clock Cycles / Clock frequency
Therefore, Execution Time per processor = (3.84E9 + 1.28E9*a) / 2GHz
= (3.84E9 + 1.28E9*a)/(2*10^9)Hz
= 3.84 sec
→ 3.84E9 + 1.28E9*a = 7.68E9
→ a = 3
The reduced CPI relative to the original is:
Original CPI for load/store instructions = 12
Reduced CPI / Original CPI = a / 12 = 3 / 12 = 0.25 = 25%
Thus, the CPI of load/store instructions must be reduced from 12 to 3, that is, to 25%
of its original value (a 75% reduction).
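
The equation for a can also be solved directly in code. This is a sketch under the same inputs as the derivation; the function name `matching_load_store_cpi` and its parameter names are hypothetical.

```python
# Sketch: load/store CPI a single processor needs to hit a target time.
# Defaults are the instruction counts, CPIs, and clock rate from the problem.

def matching_load_store_cpi(target_time_s, clock_hz=2e9,
                            arith=2.56e9, load_store=1.28e9, branch=256e6,
                            cpi_arith=1, cpi_branch=5):
    target_cycles = target_time_s * clock_hz
    other_cycles = arith * cpi_arith + branch * cpi_branch
    # Remaining cycle budget is shared across all load/store instructions.
    return (target_cycles - other_cycles) / load_store

cpi_ls = matching_load_store_cpi(3.84)  # 3.84 s is the 4-processor time
print(round(cpi_ls, 6))                 # 3.0
print(round(cpi_ls / 12, 6))            # 0.25, i.e. 25% of the original CPI
```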
13. Suppose that we are developing a new version of the AMD Barcelona processor with
a 4 GHz clock rate. We have added some additional instructions to the instruction set
in such a way that the number of instructions has been reduced by 15%. The
execution time is reduced to 700 s and the new SPEC ratio is 13.7.
a. Find the new CPI?
b. This CPI value is larger than obtained in (a) as the clock rate was increased from 3
GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the
clock rate. If they are dissimilar, why?
c. By how much has the CPU time been reduced?
d. For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of
1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional
10% without affecting the CPI and with a clock rate of 4 GHz, determine the
number of instructions.
e. Determine the clock rate required to give a further 10% reduction in CPU time
while maintaining the number of instructions and with the CPI unchanged.
f. Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20%
while the number of instructions is unchanged.

CPI (Cycles Per Instruction) is a measure of how many clock cycles a processor takes to
execute one instruction.

CPI = (Execution Time * Clock Rate) / Instruction Count

a)

Given:

● Clock Rate = 4 GHz = 4 * 10^9 cycles per second
● Instruction Count Reduction = 15%
● Execution Time = 700 seconds
● SPEC Ratio = 13.7 (This doesn't directly factor into the CPI calculation)

Assuming the original instruction count for the SPEC benchmark was 2.389 * 10^12:

1. Calculate New Instruction Count: (1 - 0.15) * 2.389 * 10^12 = 2.03065 * 10^12
2. Calculate CPI: (700 s * 4 * 10^9 cycles/s) / (2.03065 * 10^12 instructions) ≈ 1.379

b)
The cycles per instruction (CPI) depends on the architecture of the processor and on
the program.

13) The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona has
an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.
a) Find the CPI if the clock cycle time is 0.333 ns.
b) Find the SPECratio.
c) Find the increase in CPU time if the number of instructions of the benchmark is
increased by 10% without affecting the CPI.
d) Find the increase in CPU time if the number of instructions of the benchmark is
increased by 10% and the CPI is increased by 5%.
e) Find the change in the SPECratio for this change.
f) Suppose that we are developing a new version of the AMD Barcelona processor
with a 4 GHz clock rate. We have added some additional instructions to the
instruction set in such a way that the number of instructions has been reduced by
15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find
the new CPI.
g) This CPI value is larger than obtained in part (a) as the clock rate was increased from
3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the
clock rate. If they are dissimilar, why?
h) By how much has the CPU time been reduced?
i) For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of
1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10%
without affecting the CPI and with a clock rate of 4 GHz, determine the number of
instructions.
j) Determine the clock rate required to give a further 10% reduction in CPU time while
maintaining the number of instructions and with the CPI unchanged.
k) Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20%
while the number of instructions is unchanged

a) Find the CPI if the clock cycle time is 0.333 ns.

0.333 ns (clock cycle time): Given in the problem statement.

750 s (execution time): Given in the problem statement.


2.389E12 (instruction count): Given in the problem statement.

● CPI (Cycles Per Instruction) = Total Clock Cycles / Instruction Count


● Clock Rate = 1 / 0.333 ns = 3 GHz (3 x 10^9 cycles per second)
● Total Clock Cycles = Execution Time * Clock Rate = 750 s * 3 x 10^9 cycles/s =
2.25 x 10^12 cycles
● CPI = 2.25 x 10^12 cycles / 2.389 x 10^12 instructions = 0.942

b) Find the SPECratio.

9650 s (reference time): Given in the problem statement.

750 s (execution time): Given in the problem statement.

● SPECratio = Reference Time / Execution Time = 9650 s / 750 s = 12.87

c) Find the increase in CPU time if the number of instructions is increased by 10%
without affecting the CPI.

● 10% (increase in instructions): Given in the problem statement.
● 0.942 (CPI): Calculated in part (a).
● 2.389E12 (original instruction count): Given in the problem statement.
● 3 GHz (clock rate): Calculated in part (a).

● New Instruction Count = 2.389 x 10^12 instructions * 1.10 = 2.628 x 10^12
instructions
● With the CPI and clock rate unchanged, CPU time is proportional to the instruction
count: New CPU Time = 750 s * 1.10 = 825 s
● Increase in CPU Time = 825 s - 750 s = 75 s (10%)

d) Find the increase in CPU time if the number of instructions is increased by 10% and
the CPI is increased by 5%.

● 10% (increase in instructions): Given in the problem statement.
● 5% (increase in CPI): Given in the problem statement.
● 0.942 (original CPI): Calculated in part (a).
● 2.389E12 (original instruction count): Given in the problem statement.
● 3 GHz (clock rate): Calculated in part (a).

● New CPI = 0.942 * 1.05 = 0.989
● With the clock rate unchanged, CPU time is proportional to Instruction Count * CPI:
New CPU Time = 750 s * 1.10 * 1.05 = 866.25 s
● Increase in CPU Time = 866.25 s - 750 s = 116.25 s (15.5%)

e) Find the change in the SPECratio for this change.

● 9650 s (reference time): Given in the problem statement.
● New CPU time = 750 s * 1.10 * 1.05 = 866.25 s, as in part (d).

● New SPECratio = Reference Time / New CPU Time = 9650 s / 866.25 s = 11.14
● Change in SPECratio = 11.14 - 12.87 = -1.73 (a decrease of about 13.4%)

f) Find the new CPI for the new version of the AMD Barcelona processor.

● New Clock Rate = 4 GHz


● New Instruction Count = 2.389 x 10^12 instructions * 0.85 = 2.031 x 10^12
instructions
● New Total Clock Cycles = New Execution Time * New Clock Rate = 700 s * 4 x
10^9 cycles/s = 2.8 x 10^12 cycles
● New CPI = New Total Clock Cycles / New Instruction Count = 2.8 x 10^12 cycles /
2.031 x 10^12 instructions = 1.38

g) Determine whether the increase in the CPI is similar to that of the clock rate. If they
are dissimilar, why?

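Original CPI = 0.942 : Calculated in part (a).
New CPI = 1.38 : Calculated in part (f).
Original clock rate = 3 GHz : Calculated in part (a).
New clock rate = 4 GHz : Given in the problem statement.

● CPI ratio = 1.38 / 0.942 ≈ 1.46 (an increase of about 46%)
● Clock rate ratio = 4 GHz / 3 GHz ≈ 1.33 (an increase of about 33%)

The increase in CPI is therefore not similar to the increase in clock rate. Since
CPU time = Instruction Count * CPI / Clock Rate, the instruction count fell by 15% and
the clock rate rose by 33%, yet the CPU time fell only from 750 s to 700 s (about 7%),
so the CPI must have risen by more than the clock rate. Possible reasons include:

● Architectural Changes: New instructions may be more complex.
● Pipeline Inefficiencies: New instructions may not fit well in the pipeline.
● Memory Access: Changes could lead to more frequent, slower memory accesses.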

h) By how much has the CPU time been reduced?

Original CPU time = 750 s : Given in the problem statement.

New CPU time = 700 s : Given in the problem statement.


● Reduction in CPU Time = 750 s - 700 s = 50 s

i) Determine the number of instructions for libquantum.

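Execution time = 960 ns : Given in the problem statement.
CPI = 1.61 : Given in the problem statement.
Clock rate = 3 GHz : Given in the problem statement.

● Total Clock Cycles = Execution Time * Clock Rate = 960 x 10^-9 s * 3 x 10^9
cycles/s = 2880 cycles
● Number of Instructions = Total Clock Cycles / CPI = 2880 cycles / 1.61 ≈ 1789
instructions

This is the instruction count at the original operating point. With the execution time
reduced by 10% (to 864 ns), the same CPI, and a 4 GHz clock:

● Total Clock Cycles = 864 x 10^-9 s * 4 x 10^9 cycles/s = 3456 cycles
● Number of Instructions = 3456 cycles / 1.61 ≈ 2147 instructions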

j) Determine the clock rate required to give a further 10% reduction in CPU time while
maintaining the number of instructions and with the CPI unchanged.

Reduction in CPU time = 10% : Given in the problem statement.
Original clock rate = 3 GHz : Given in the problem statement.
CPI = 1.61 : Given in the problem statement.
Original execution time = 960 ns : Given in the problem statement.

● Total Clock Cycles (Number of instructions * CPI) = 960 x 10^-9 s * 3 x 10^9
cycles/s = 2880 cycles, unchanged because the instruction count and CPI are unchanged
● New Execution Time: 960 ns * 0.90 = 864 ns
● Required Clock Rate: Total Clock Cycles / New Execution Time = 2880 cycles /
864 x 10^-9 s = 3.33 GHz

k) Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20%
while the number of instructions is unchanged.

Reduction in CPI = 15% : Given in the problem statement.
Reduction in CPU time = 20% : Given in the problem statement.
Original clock rate = 3 GHz : Given in the problem statement.
Original CPI = 1.61 : Given in the problem statement.
Original execution time = 960 ns : Given in the problem statement.

● New CPI: 1.61 * 0.85 = 1.37
● New Execution Time: 960 ns * 0.80 = 768 ns
● New Total Clock Cycles: 2880 cycles * 0.85 = 2448 cycles (the instruction count is
unchanged, so the cycles scale with the CPI)
● Required Clock Rate: New Total Clock Cycles / New Execution Time =
2448 cycles / 768 x 10^-9 s = 3.19 GHz
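
Because CPU time = Instruction Count * CPI / Clock Rate, the required clock rate in parts (j) and (k) can be obtained by scaling the 3 GHz baseline. The sketch below assumes that relation; the function name `scaled_clock_rate_ghz` and its parameters are illustrative.

```python
# Sketch: clock rate needed after scaling CPI, CPU time, and instruction count.
# Derived from: rate = instruction count * CPI / CPU time.

def scaled_clock_rate_ghz(base_rate_ghz, cpi_factor, time_factor, instr_factor=1.0):
    return base_rate_ghz * instr_factor * cpi_factor / time_factor

rate_j = scaled_clock_rate_ghz(3.0, cpi_factor=1.0, time_factor=0.90)
rate_k = scaled_clock_rate_ghz(3.0, cpi_factor=0.85, time_factor=0.80)
print(f"{rate_j:.4f}")  # 3.3333
print(f"{rate_k:.4f}")  # 3.1875
```

Working from ratios avoids the rounding that creeps in when the instruction count is first rounded to a whole number.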
