PDF 2.5


COVER-CA



mpoon
The following shows a typical computing process. This process reads input data, processes
the data, and then writes output data. The process has access to data storage, which can be
used to keep data for future use. The stored data can later be read back and influence the process.

Two examples of computing processes:


 A census of Hong Kong collected a lot of data about its citizens. The process converts
the raw census data into more informative statistics such as mean family size and
income.
 An online bookstore receives a purchase order from a customer. A process handles the
purchase order by storing the current order and using previous orders to recommend
other books to the customer.

1. Components of a Programmable Computer

We will identify the components of a programmable computer.


We solve this problem by referring to the computing process model and then listing out
the major items relevant to a programmable computer:
 programs: contain instructions
 instructions: commands executable by the computer
 data: to be processed by the computer

These items are involved in several processes. The major processes are listed below:
 instruction execution: an essential function of the programmable computer
 data storage: a function for storing the data before and after the instruction execution
 program storage: a function for storing the program in the programmable computer
for instruction execution
 inputting data and program: a function for data and program to go into the computer
from the outside world
 outputting data: a function for data to leave the computer to the outside world
The last two processes are essential because a computer cannot exist in isolation. A computer
useful for any purpose must be able to interact with the outside world.
These processes are refined and their roles are abstracted into the following
major components for a programmable computer.
 Arithmetic and Logic Unit (ALU): for instruction execution
 Memory system: for data and program storage
 Input: for data input into the programmable computer
 Output: for data output from the programmable computer
2. Introduction to Arithmetic and Logic Unit

The first component of a programmable computer is the Arithmetic and Logic Unit (ALU).
The ALU is a functional unit responsible for the execution of instructions.
 The execution of instructions is an essential function of a programmable computer.
 The input to the ALU includes the data and the instructions that command how to
deal with the data.
 The result of the instruction execution will appear at the output of the ALU.
The following figure shows a schematic diagram of the ALU with its input and output. One input
channel is for sending in instructions and the others are for sending in data. The ALU can typically
execute many types of instructions, for example, addition, subtraction, and negation.

In the figure, the ALU has two data input channels and one data output channel. This is a
typical arrangement because most operations (instructions) have at most two operands.
 Addition: A + B. A and B are passed into the two input channels and the result of A+B
will appear at the output channel.
 Subtraction: A – B. The same case as addition.
 Negation: -A. This is a single operand operation. A is passed into one input channel.
The operation of the ALU is controlled by the instruction. For example, an Addition
instruction makes the ALU perform an addition operation on the input data. The ALU
also outputs data that informs other components of its status. For example, if an error occurs in
the calculation, then an error status may be emitted.
Different instructions are represented by different electronic signals, which in turn
represent numbers. For example, the ALU designer may specify that 01 represents Addition and 02
represents Subtraction. The coding of instructions is usually published in a technical
manual.
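A minimal Python sketch of the ALU described above, treating it as a function of an instruction code and up to two data inputs that returns a result and a status. The codes 01 (Addition) and 02 (Subtraction) follow the example in the text; the code 03 for Negation, the status strings, and the 8-bit signed data range are hypothetical choices made only for illustration.

```python
# Instruction codes: 01 and 02 as in the text; 03 for Negation is hypothetical.
ADD, SUB, NEG = 0x01, 0x02, 0x03

# Assumed data range of the ALU (8-bit signed values); purely illustrative.
MIN_VALUE, MAX_VALUE = -128, 127


def alu(instruction, a, b=0):
    """Execute one instruction on the data inputs a and b.

    Returns (result, status). The status output lets other components
    know whether the calculation succeeded.
    """
    if instruction == ADD:
        result = a + b
    elif instruction == SUB:
        result = a - b
    elif instruction == NEG:
        result = -a  # single-operand operation: input b is ignored
    else:
        return None, "ERROR: unknown instruction"

    if not MIN_VALUE <= result <= MAX_VALUE:
        return None, "ERROR: result out of range"
    return result, "OK"


print(alu(ADD, 3, 4))      # (7, 'OK')
print(alu(NEG, 5))         # (-5, 'OK')
print(alu(ADD, 100, 50))   # (None, 'ERROR: result out of range')
```

The two data inputs and the single result mirror the two input channels and one output channel in the figure; the second return value plays the role of the status output.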
The ALU does not carry out any operation unless it is told to do so. The clock line connected
to the ALU sends a regular signal to the ALU, in a way similar to the "beep beep beep" of an
alarm. Upon receiving a beep, the ALU executes one instruction. Then it executes
another instruction when the second beep arrives.
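The clock-driven behaviour can be sketched as a loop in which each tick (each "beep") causes exactly one instruction to be executed. The tuple format of an instruction and the two supported operations are hypothetical; the point is only that nothing is executed between ticks.

```python
def clock(ticks):
    """Yield numbered ticks, like the regular "beep" on the clock line."""
    for n in range(1, ticks + 1):
        yield n


def run(instructions, ticks):
    """Execute at most one instruction per clock tick; return (tick, result) pairs."""
    operations = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b}
    pending = iter(instructions)
    results = []
    for tick in clock(ticks):
        instruction = next(pending, None)
        if instruction is None:      # no more instructions: the ALU stays idle
            break
        op, a, b = instruction
        results.append((tick, operations[op](a, b)))
    return results


program = [("ADD", 2, 3), ("SUB", 9, 4)]
print(run(program, ticks=5))   # [(1, 5), (2, 5)]
print(run(program, ticks=1))   # [(1, 5)]: SUB waits for the next beep
```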
3. Input and Output

The final components are Input and Output. These two components connect the
programmable computer and the outside world. The following shows a schematic diagram of
the two components.
1. Digital Representation

The ALU will use digital representation to encode data. Data is an abstract entity but
eventually it must be represented somehow with a physical attribute. In electronic systems,
the common physical attribute to use is the voltage. Voltage is a continuous scalar.
There are actually two fundamental ways to represent data with voltages: analogue and
digital. Our ALU will use digital representation for greater reliability and error tolerance.
The following figure explains the analogue representation.
 Analogue representation is continuous. Any small changes in the signal can change
the original value into an incorrect value.
 Analogue representation can represent continuous data values, but it is not tolerant
to noise and other forms of signal degradation.
Digital representation represents data in discrete levels.

The following figure shows that a 2-level digital representation is a lot more error
tolerant than a 10-level digital representation.
 The levels are well defined and any sufficiently small fluctuation in the signal
will keep the signal at the same level. Digital representation is therefore less
prone to errors.
 One can decide the number of levels used in a digital representation. The error
tolerance decreases as more levels are defined.
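The trade-off between the number of levels and error tolerance can be checked numerically. The sketch below assumes a 0 V to 5 V signal range with levels spaced evenly and a receiver that reads a voltage as the nearest level; these are illustrative assumptions, not properties of any particular circuit.

```python
V_MAX = 5.0  # assumed signal range: 0 V to 5 V


def level_voltage(level, n_levels):
    """Nominal voltage of a level when n_levels are spread evenly over 0..V_MAX."""
    return level * V_MAX / (n_levels - 1)


def decode_level(voltage, n_levels):
    """Read a (possibly noisy) voltage back as the nearest defined level."""
    step = V_MAX / (n_levels - 1)
    level = round(voltage / step)
    return min(max(level, 0), n_levels - 1)  # clamp to the valid levels


def noise_margin(n_levels):
    """Half the spacing between levels: smaller fluctuations still decode correctly."""
    return V_MAX / (n_levels - 1) / 2


noise = 0.4  # a hypothetical 0.4 V disturbance
print(decode_level(level_voltage(1, 2) - noise, 2))     # 1: 2-level signal survives
print(decode_level(level_voltage(7, 10) + noise, 10))   # 8: 10-level signal is misread
print(noise_margin(2), round(noise_margin(10), 3))      # 2.5 0.278
```

With 2 levels the signal can drift by up to 2.5 V before it is misread, while with 10 levels the margin shrinks to about 0.28 V.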
2. Binary and Other Numeral Systems

The ALU will use the binary numeral system. The binary numeral system uses two symbols, '0'
and '1', to represent data. Therefore its implementation requires a 2-level digital
representation, which is the most error tolerant and the least technically challenging. Normally, a low
voltage represents '0' and a high voltage represents '1', but it could be the other way round. An
even more reliable method is to encode '0' as a change of voltage from low to high, and '1' as
a change from high to low.
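The transition-based encoding just described is widely known as Manchester encoding (the text's choice of low-to-high for '0' is one of the two conventions in use). A minimal sketch, assuming each bit is sent as two half-period voltage samples; the LOW/HIGH constants and the list-of-samples signal format are illustrative assumptions.

```python
LOW, HIGH = 0, 1  # two voltage levels, sampled twice per bit


def encode_transitions(bits):
    """Encode '0' as a low-to-high change and '1' as a high-to-low change."""
    signal = []
    for bit in bits:
        if bit == "0":
            signal += [LOW, HIGH]
        elif bit == "1":
            signal += [HIGH, LOW]
        else:
            raise ValueError(f"not a binary digit: {bit!r}")
    return signal


def decode_transitions(signal):
    """Recover the bits; a pair of samples without a transition means corruption."""
    if len(signal) % 2:
        raise ValueError("signal must contain two samples per bit")
    bits = []
    for first, second in zip(signal[0::2], signal[1::2]):
        if (first, second) == (LOW, HIGH):
            bits.append("0")
        elif (first, second) == (HIGH, LOW):
            bits.append("1")
        else:
            raise ValueError("no transition found: signal corrupted")
    return "".join(bits)


print(encode_transitions("10"))                      # [1, 0, 0, 1]
print(decode_transitions(encode_transitions("0110")))  # 0110
```

Because every bit period contains a change, a line stuck at one voltage is detected as an error instead of being silently read as a run of identical bits.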
A numeral system provides a systematic and consistent set of rules for representing numbers. The
commonly known numeral systems include decimal, binary, and hexadecimal. Some key
characteristics of numeral systems include the following:
 Each numeral system defines a set of numbers, such as the integers or the positive numbers.
 Each numeral system gives each number in its number set a
unique representation.
 Each numeral system consists of a set of unique symbols, each representing a
certain value in the set of numbers. In the decimal numeral system, the ten symbols
are 0, 1, 2, ..., 9.
 Each numeral system provides rules for combining symbols to represent a
larger range of numbers, and therefore it can support a large number set.
For example, the decimal numeral system uses positional notation to combine the
symbols to represent numbers such as 32 and 1589. Larger numbers are constructed by
placing symbols side by side (juxtaposition).
The base of a numeral system is the number of unique symbols used in the system.

Base Numeral System
2 Binary
3 Ternary
8 Octal
10 Decimal
12 Duodecimal
16 Hexadecimal
20 Vigesimal
60 Sexagesimal

The decimal number system is the norm in today's societies, as it was in ancient China and the
Hindu-Arabic world. However, the ancient world used all sorts of number
systems.
 Vigesimal or base-20, used by the Mayans.
 Duodecimal or base-12, used by Nigerians.
 Sexagesimal or base-60, used by the Babylonians.
The decimal number system has 10 symbols, from 0 to 9. To represent values larger than 9,
we use positional notation. In positional notation, each digit position is
related to the next by a multiplier, which is the base or the radix of the number system. In
the decimal number system, the multiplier is 10. This means that a digit is worth 10 times as much
as the same digit one position to its right.
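Positional notation can be written as a short function: each digit is multiplied by the base raised to its position, counting from 0 at the rightmost digit, and the products are summed. The symbol string covering bases up to 16 is an assumption of this sketch.

```python
SYMBOLS = "0123456789ABCDEF"  # enough symbols for bases up to 16


def positional_value(numeral, base=10):
    """Value of a numeral: sum of digit * base**position (rightmost position = 0)."""
    total = 0
    for position, symbol in enumerate(reversed(numeral.upper())):
        digit = SYMBOLS.index(symbol)
        if digit >= base:
            raise ValueError(f"symbol {symbol!r} is not valid in base {base}")
        total += digit * base ** position
    return total


print(positional_value("3456"))          # 3456 = 3*1000 + 4*100 + 5*10 + 6*1
print(positional_value("11111111", 2))   # 255
print(positional_value("FF", 16))        # 255
```

Python's built-in `int(numeral, base)` performs the same conversion and can be used to cross-check the sketch.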
Example: Positional Notation in Decimal System
Question: Why does the number 3456 represent the value 3456 in the decimal
numeral system?
Answer: Each digit is multiplied by the power of 10 given by its position, counting from 0 at the right:
3456 = 3 × 10^3 + 4 × 10^2 + 5 × 10^1 + 6 × 10^0 = 3000 + 400 + 50 + 6

Positive Binary Representation

Bit size   Range           Maximum (Binary)                          Maximum (Decimal)   Number of Unique Values
1-bit      0 to 2^1 - 1    1                                         1                   2
8-bit      0 to 2^8 - 1    1111 1111                                 255                 256
16-bit     0 to 2^16 - 1   1111 1111 1111 1111                       65535               65536
32-bit     0 to 2^32 - 1   1111 1111 1111 1111 1111 1111 1111 1111   4294967295          4294967296
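The ranges of positive binary numbers follow directly from the bit size n: the values run from 0 to 2^n - 1, giving 2^n unique values. A short sketch that reproduces the rows of the positive binary table:

```python
def unsigned_range(bits):
    """Range of an n-bit positive binary number: 0 to 2**n - 1, i.e. 2**n values."""
    count = 2 ** bits
    return 0, count - 1, count


for bits in (1, 8, 16, 32):
    low, high, count = unsigned_range(bits)
    print(f"{bits}-bit: {low} to {high} (binary {high:b}), {count} unique values")
```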

pair indicates overflow.


Binary Coded Decimal (BCD)
Binary coded decimal is a special representation that codes each decimal digit independently
into binary. For example, the decimal number 68 is coded in BCD as
follows. Each decimal digit requires 4 binary digits (bits) to encode.
6 -> 0110
8 -> 1000
So 68 (decimal) is equivalent to 0110 1000 (BCD).
A drawback of this approach is the range of numbers that it can represent. An 8-bit BCD number
can cover only 0 to 99 (decimal), but an 8-bit positive binary number can cover 0 to
255 (decimal). BCD therefore makes poor use of the available bit patterns.
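A minimal sketch of BCD coding and decoding, assuming the BCD code is written as space-separated 4-bit groups, as in the 68 example above:

```python
def to_bcd(n):
    """Code each decimal digit of a non-negative integer as its own 4-bit group."""
    if n < 0:
        raise ValueError("this BCD sketch covers non-negative numbers only")
    return " ".join(format(int(d), "04b") for d in str(n))


def from_bcd(code):
    """Decode space-separated 4-bit groups; the codes 1010 to 1111 are invalid in BCD."""
    digits = []
    for group in code.split():
        value = int(group, 2)
        if value > 9:
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(68))             # 0110 1000
print(from_bcd("1001 1001"))  # 99, the largest 8-bit BCD number
```

The six unused codes per group (1010 to 1111) are exactly why 8 bits of BCD reach only 100 values instead of 256.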

Bit size Range Range (Binary) Number of Unique Values


4-bit 0 to 9 0000 to 1001 10
8-bit 0 to 99 0000 0000 to 1001 1001 100
16-bit 0 to 9999 0000 0000 0000 0000 to 1001 1001 1001 1001 10000
32-bit 0 to 99999999 Too long to show 100000000

Addition and subtraction pose little problem for circuit implementation. Each group of 4
bits (one decimal digit) is handled in one operation, so adding two 8-bit BCD numbers
requires two digit additions, each corrected when the digit sum exceeds 9. BCD addition and subtraction are therefore less efficient
than those in the positive binary representation, and the circuitry is a bit more complex. An
advantage of BCD is the ease of conversion to printing and LCD display formats.
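Digit-by-digit BCD addition can be sketched as follows. Packed BCD values are written here as hexadecimal literals (0x68 is the bit pattern 0110 1000); the "add 6 when a digit sum exceeds 9" correction is the standard way to skip the six unused 4-bit codes, and the function name is hypothetical.

```python
def bcd_add(a, b, digits=2):
    """Add two packed-BCD numbers one 4-bit digit at a time.

    Returns (result, carry_out). A digit sum above 9 is corrected by adding 6,
    which skips the six unused 4-bit codes 1010 to 1111.
    """
    result, carry = 0, 0
    for i in range(digits):
        shift = 4 * i
        s = ((a >> shift) & 0xF) + ((b >> shift) & 0xF) + carry
        if s > 9:
            s += 6
            carry = 1
        else:
            carry = 0
        result |= (s & 0xF) << shift
    return result, carry


total, carry = bcd_add(0x68, 0x25)   # 68 + 25 in BCD
print(f"{total:08b}", carry)         # 10010011 0  (BCD for 93)
```

The loop runs once per decimal digit, which is the "two addition operations" for two 8-bit BCD numbers.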

Sign-Magnitude Binary Representation


Both the positive binary representation and BCD cover only positive range of values. There
are several variants that can cover negative values as well.
Sign-magnitude representation uses the most significant bit as the sign bit.
A value of 1 indicates a negative number and 0 indicates a positive one. In an 8-bit sign-magnitude
binary number, one bit is used for the sign and the remaining 7 bits for the
magnitude.

Bit size Minimum (Binary) Maximum(Binary) Range (Decimal) Number of Unique Values
8-bit 1111 1111 0111 1111 -127 to +127 255
16-bit 1111 1111 1111 1111 0111 1111 1111 1111 -32767 to +32767 65535

The resource utilization is good, except that there are now two bit patterns representing the
value zero: 0000 0000 and 1000 0000. So an 8-bit sign-magnitude binary
representation can represent only 255 distinct values (from -127 to +127).
Addition and subtraction of sign-magnitude binary numbers are more challenging. The circuit must
first compare the signs and, when they differ, subtract the smaller magnitude from the larger one;
the operations cannot be simplified into smaller operations on individual digits.
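A minimal 8-bit sign-magnitude sketch that shows both zero patterns and why addition must look at the signs before touching the magnitudes. The function names are hypothetical, and returning +0 when a result is zero is a design choice of this sketch.

```python
BITS = 8
SIGN = 1 << (BITS - 1)   # 1000 0000: the sign bit
MAG_MASK = SIGN - 1      # 0111 1111: the 7 magnitude bits


def sm_encode(value):
    if abs(value) > MAG_MASK:
        raise OverflowError(f"{value} needs more than {BITS - 1} magnitude bits")
    return (SIGN if value < 0 else 0) | abs(value)


def sm_decode(pattern):
    magnitude = pattern & MAG_MASK
    return -magnitude if pattern & SIGN else magnitude


def sm_add(x, y):
    """Add two sign-magnitude patterns: compare signs before adding magnitudes."""
    sx, mx = x & SIGN, x & MAG_MASK
    sy, my = y & SIGN, y & MAG_MASK
    if sx == sy:                          # same sign: add magnitudes, keep the sign
        if mx + my > MAG_MASK:
            raise OverflowError("result out of range")
        return sx | (mx + my)
    if mx >= my:                          # different signs: larger magnitude wins
        sign, diff = sx, mx - my
    else:
        sign, diff = sy, my - mx
    return (sign | diff) if diff else 0   # return +0 (0000 0000), never -0


print(format(sm_encode(-56), "08b"))                   # 10111000
print(sm_decode(0b00000000), sm_decode(0b10000000))    # 0 0  (two patterns for zero)
print(sm_decode(sm_add(sm_encode(5), sm_encode(-3))))  # 2
```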
One's Complement Binary Representation
The method of complements is sometimes used in subtraction. This method turns a
subtraction operation into a complement operation and an addition operation. For example, the
expression 654 - 234 is converted into a complement operation on 234 and an addition
operation:
654 - 234   (-234 is converted into +766 by subtracting 234 from 1000)
654 + 766 = 1420 -> 420   (the carry 1 is discarded)
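The decimal example above is the ten's complement method. A small sketch, assuming fixed-width three-digit numbers and a result that is not negative:

```python
def subtract_by_complement(a, b, digits=3):
    """Compute a - b (for 0 <= b <= a < 10**digits) using a complement and an addition."""
    modulus = 10 ** digits        # 1000 for three-digit numbers
    complement = modulus - b      # 1000 - 234 = 766
    total = a + complement        # 654 + 766 = 1420
    return total % modulus        # discard the carry: 420


print(subtract_by_complement(654, 234))  # 420
```

When a is smaller than b, no carry appears and the result is the complement of the answer rather than the answer itself, which is why this sketch restricts its inputs.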
In one's complement binary representation, we represent negative numbers by finding
one's complement of the corresponding positive number. The one's complement operation
is carried out by inverting every bit (turning 0 into 1 and 1 into 0).
Given a positive 8-bit binary number 0011 1000 (decimal 56), to find its corresponding
negative number (decimal -56), we apply one's complement operation to the 8-bit
binary number.

Each time the one's complement operation is applied, the sign of the number is reversed. This
is equivalent to a negation operation.

Note that the pattern 1100 0111 is in one's complement format. Because it represents a negative
number, it cannot be converted directly to decimal as if it were a positive binary number. One's complement binary representation is
the system that uses the one's complement operation to work out a positive binary number's
corresponding negative number.
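A minimal 8-bit one's complement sketch: negation inverts every bit, and a pattern whose leftmost bit is 1 is decoded by inverting it again to recover the magnitude. Function names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1   # 1111 1111


def ones_complement(pattern):
    """Invert every bit of an 8-bit pattern (this negates the number)."""
    return ~pattern & MASK


def oc_encode(value):
    if abs(value) > MASK >> 1:            # the 8-bit range is -127 to +127
        raise OverflowError(f"{value} is out of range")
    return ones_complement(-value) if value < 0 else value


def oc_decode(pattern):
    """A set leftmost bit means negative: invert again to find the magnitude."""
    if pattern >> (BITS - 1):
        return -ones_complement(pattern)
    return pattern


print(format(ones_complement(0b00111000), "08b"))  # 11000111
print(oc_decode(0b11000111))                       # -56
```

As with sign-magnitude, 1111 1111 decodes to a second zero (-0), which is why the range has 255 rather than 256 values.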
The following table shows the range of numbers that can be represented by 8-bit and 16-bit
one's complement binary representation.

Bit size Minimum (Binary) Maximum(Binary) Range (Decimal) Number of Unique Values
8-bit 1000 0000 0111 1111 -127 to +127 255
16-bit 1000 0000 0000 0000 0111 1111 1111 1111 -32767 to +32767 65535

e width is the smallest.

Numeral System Number of Symbols (Radix) Width Required to Represent 0 to 99


Binary 2 7
Decimal 10 2
Centesimal 100 1
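The widths in the table can be computed as the smallest number of digits w for which base^w exceeds the largest value to be represented. A short sketch:

```python
def width_required(max_value, base):
    """Smallest number of digits w such that base**w > max_value."""
    width = 1
    while base ** width <= max_value:
        width += 1
    return width


for name, base in [("Binary", 2), ("Decimal", 10), ("Centesimal", 100)]:
    print(name, base, width_required(99, base))
```

For binary, 2^6 = 64 is too small to reach 99 but 2^7 = 128 is enough, hence a width of 7.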
1. System Bus for Connecting the Registers

The registers in the programmable computers are all connected together with the system bus.
 The system bus is the most important highway for data movement in the computer.
 The system bus allows a pair of the registers to move data between them.
 At any one moment, only one such data movement route can operate.
 This is a limitation of the system bus design.
 Although a system bus can connect many registers, only two of them can exchange
data at any one time.

Data movement can be sped up by letting transfers happen in parallel. The system bus can be
replaced by a fully connected network, in which each pair of registers has a dedicated highway.
There are a few drawbacks to this approach:
 The fully connected network is clearly a lot more costly to build
 A register can only handle one data movement at one time even if it is connected to
all other registers.
 Some connections have no use in the operations of the computer.
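The first two drawbacks can be quantified with a simple count, assuming one dedicated link per pair of registers and that each register takes part in at most one transfer at a time (as stated in the list). The function names are hypothetical.

```python
def fully_connected_links(n_registers):
    """One dedicated highway per pair of registers: n(n - 1) / 2 links."""
    return n_registers * (n_registers - 1) // 2


def max_parallel_transfers(n_registers, fully_connected):
    """Each register joins at most one transfer, so at best n // 2 pairs move data.

    A single system bus allows only one transfer at any moment.
    """
    return n_registers // 2 if fully_connected else 1


for n in (4, 8, 16):
    print(n, fully_connected_links(n), max_parallel_transfers(n, True))
```

Going from 8 to 16 registers raises the link count from 28 to 120, roughly four times as many, while the best-case number of simultaneous transfers only doubles from 4 to 8.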

The performance of the system bus is an important factor in the working of the
programmable computer.
 The system bus operates according to the clock and signals from the Controller.
 The Controller determines which pair of components is to establish a connection and
move data.
2. Input and Output Controllers

The design of our basic programmable computer is completed by adding the input and
output controllers. These controllers are connected to the system bus.

The components connected through the input and output controllers are called peripherals.
 Input devices such as keyboard and mouse, and output devices such as monitor
are common.
 Some IO devices can operate both as input and output devices.
 A hard disk is a memory device that can do both input and output.

The programmable computer has two levels of memory. The memory system that connects
directly to the system bus through the MAR and MDR is called the main memory or
primary memory. The main memory stores data and program for program execution. The
memory system that connects through IO controllers is called secondary memory. Secondary
memory is usually designed for long-term storage.
3. Central Processing Unit (CPU)

For ease of design, implementation, and production, some components of the
programmable computer are integrated closely to form a single component called the
Central Processing Unit (CPU).
The following figure shows that the CPU includes the ALU, the main registers, the
system bus and the controller.

Leaving the main memory out of the CPU has advantages:


 Main memory is physically large, and including it in the CPU would make the design
very difficult.
 Leaving the memory as a separate component allows the memory size to
expand independently.
 The CPU can operate more independently from the Memory System. It allows these two
components to operate at different speeds.
It also has an important disadvantage:
 The data movement route from the registers to the Memory System becomes longer
and data movement will take longer time to complete.
 nt: units of CPU die are cut from wafer
 Chip packaging and testing
Operation of Main Memory Systems

Operation of the main memory system can occur when the MAR, the MDR, and the R/W line
are loaded with data. Loading the data takes time, so the operation is
synchronized with a memory clock to ensure that the loading of data and the operation
take place at the correct times.
An electronic clock on a computer is a signal that alternates between high and low repeatedly.
Memory operation occurs according to this signal and is usually triggered by an edge (rising
or falling edge) of the clock.

The rising or falling edge is useful because it represents a time instant at which all
the data (or signals) involved are ready.
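Finding the edges of a clock signal is a matter of comparing consecutive samples. A small sketch, assuming the clock is given as a hypothetical list of 0/1 samples taken at regular intervals:

```python
def clock_edges(samples):
    """Return the sample indices where the clock rises (0 -> 1) and falls (1 -> 0)."""
    rising, falling = [], []
    for i in range(1, len(samples)):
        if samples[i - 1] == 0 and samples[i] == 1:
            rising.append(i)
        elif samples[i - 1] == 1 and samples[i] == 0:
            falling.append(i)
    return rising, falling


clock = [0, 0, 1, 1, 0, 0, 1, 1, 0]   # hypothetical sampled clock signal
print(clock_edges(clock))             # ([2, 6], [4, 8])
```

A memory triggered on rising edges would act at samples 2 and 6 only, regardless of how long the clock stays high.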
The clock rate (or frequency) has a bearing on the speed of the memory. A slow clock rate
means that memory operations occur less frequently, which slows down
data movement.
However, we cannot simply increase the clock rate without making other considerations.
Memory operations take time to complete. So the clock rate must allow the completion of
one operation before triggering the next one.
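The limit on the clock rate is simple arithmetic: the clock period (1000 / MHz nanoseconds) must be at least as long as one memory operation. The sketch assumes one operation per clock period, and the 10 ns operation time is a hypothetical figure.

```python
def max_clock_mhz(operation_ns):
    """Fastest clock (in MHz) whose period still fits one memory operation."""
    return 1000 / operation_ns


def clock_is_safe(clock_mhz, operation_ns):
    """True if one clock period is long enough for the operation to complete."""
    period_ns = 1000 / clock_mhz
    return period_ns >= operation_ns


print(max_clock_mhz(10))        # 100.0
print(clock_is_safe(100, 10))   # True
print(clock_is_safe(200, 10))   # False: 5 ns period, the operation is still running
```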
Semiconductor Memory: Static RAM and Dynamic RAM

Current main memory systems are based on semiconductor memory. A standard circuit called a
flip-flop can store 1 bit of data. A memory system can be designed with millions of these
circuits integrated together.
 Random access memory (RAM) refers to such a memory system, in which the stored
data can be accessed in any order.
 RAM based on flip-flops is called static RAM (SRAM). SRAM is fast and holds its data
as long as it is powered, but it is volatile: the data is lost when power is removed.
 Each flip-flop is made up of 6 to 8 transistors, which take up considerable space when
a large memory is to be packaged.
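The text does not say which kind of flip-flop is used; the sketch below assumes an edge-triggered D flip-flop, modelled behaviourally (not at the transistor level), and builds a tiny random access memory from such cells. The class names are hypothetical.

```python
class DFlipFlop:
    """Behavioural model of an edge-triggered D flip-flop storing 1 bit."""

    def __init__(self):
        self.q = 0                 # the stored bit
        self._last_clock = 0

    def tick(self, clock, d):
        """Sample input d only on a rising clock edge; otherwise keep q."""
        if self._last_clock == 0 and clock == 1:
            self.q = d
        self._last_clock = clock
        return self.q


class SRAM:
    """Random access: any address can be read or written in any order."""

    def __init__(self, size):
        self.cells = [DFlipFlop() for _ in range(size)]

    def write(self, address, bit):
        cell = self.cells[address]
        cell.tick(0, bit)
        cell.tick(1, bit)          # a rising edge stores the bit

    def read(self, address):
        return self.cells[address].q


ram = SRAM(8)
ram.write(5, 1)
ram.write(2, 1)
print([ram.read(a) for a in range(8)])   # [0, 0, 1, 0, 0, 1, 0, 0]
```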
1. Addressing Modes

In summary, E-LMC supports the following kinds of addressing modes:


 Direct addressing mode. The operand specifies an address of data.
o LMC instructions are in direct addressing mode.
 Immediate addressing mode. The operand is the desired data.
 Indirect addressing mode. The operand specifies an address that contains the address of
the desired data.
 Register indirect addressing mode. The operand specifies a register that contains the
address of the desired data.
 Register index relative addressing mode. There are two operands: a base
address and a register containing an offset value. The desired data is
found at the address that is the sum of the base address and the offset.
o This is known as relative addressing mode because the address of the
desired data is related to a base address.

Specifying Addressing Modes in Mnemonic Form

The address mode of an operand in instructions is expressed using the following syntax.

Addressing Mode Syntax Examples


Immediate #Data LDA #20
Direct Address LDA 20
Indirect (Address) LDA (20)
Register Index Relative RN + Address LDA R4 + 20
Register RN LDA R4
Register Indirect (RN) LDA (R4)
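The syntax table can be turned into a small classifier for operands. This sketch assumes registers are named R followed by digits and that addresses and data are decimal integers; the function name and the pattern list are hypothetical, built with Python's standard `re` module.

```python
import re

# Operand syntax from the table above; RN is a register name such as R4.
PATTERNS = [
    ("Immediate", re.compile(r"#\d+")),
    ("Register Index Relative", re.compile(r"R\d+\s*\+\s*\d+")),
    ("Register Indirect", re.compile(r"\(R\d+\)")),
    ("Indirect", re.compile(r"\(\d+\)")),
    ("Register", re.compile(r"R\d+")),
    ("Direct", re.compile(r"\d+")),
]


def addressing_mode(instruction):
    """Classify the operand of a mnemonic such as 'LDA (R4)'."""
    _mnemonic, _, operand = instruction.strip().partition(" ")
    operand = operand.strip()
    for mode, pattern in PATTERNS:
        if pattern.fullmatch(operand):
            return mode
    raise ValueError(f"unrecognised operand syntax in {instruction!r}")


for text in ["LDA #20", "LDA 20", "LDA (20)", "LDA R4 + 20", "LDA R4", "LDA (R4)"]:
    print(f"{text:12} -> {addressing_mode(text)}")
```

Using `fullmatch` means each pattern must cover the whole operand, so "(R4)" is never mistaken for plain register addressing.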

Case Studies of Instruction Set Architectures

The addressing modes covered in this chapter are the most common ones. In the real
world, some processors are designed with many addressing modes, though most of them
are combinations or variants of the common addressing modes.

Processor Remarks
Intel 8086 17 addressing modes
Pentium 17 addressing modes (backward compatibility)
Itanium 1 addressing mode (register indirect addressing)
MIPS Register addressing mode mainly
Java bytecode Register indirect with offset in a stack architecture
Example: Operations of LDA of Different Addressing Modes
The following gives the content of a range of main memory addresses and some registers.
Address Content Address Content Register Content
20 20 24 1 R4 0
21 9 25 21 R5 23
22 25 26 0 R6 20
23 26 27 25 R7 3

Work out the value loaded into ACC after the execution of the following instructions
based on LDA 22.
 Direct addressing LDA 22
 Immediate addressing LDA #22
 Indirect addressing LDA (22)
 Register addressing LDA R5
 Register indirect addressing LDA (R5)
 Register Index Relative Addressing LDA R5+2
Answer:

Direct addressing LDA 22


ACC will contain 25. The operand 22 is an address where the data is found.

Immediate addressing LDA #22


ACC will contain 22. The operand is the data.
Indirect addressing LDA (22)
ACC will contain 21. The operand 22 specifies an address that contains the address of the
desired data. Address 22 contains 25, which is the address containing the data 21.

Register addressing LDA R5


ACC will contain 23. The operand R5 is the register R5. R5 contains 23.

Register indirect addressing LDA (R5)


ACC will contain 26. The operand R5 is the register R5 that contains the address of the
desired data. R5 contains 23, which is the address containing the data 26.
Register Index Relative Addressing LDA R5+2
ACC will contain 21. The operand 2 is the base address and R5 contains the offset. The
resolved address is 2 + 23 (the content of R5) = 25. This address contains the desired data,
21.
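The six answers of this example can be checked with a short model of LDA. The memory and register contents are taken from the table in the example; the mode names and the tuple form of the index relative operand are assumptions of this sketch.

```python
# Memory and registers from the example above.
MEMORY = {20: 20, 21: 9, 22: 25, 23: 26, 24: 1, 25: 21, 26: 0, 27: 25}
REGISTERS = {"R4": 0, "R5": 23, "R6": 20, "R7": 3}


def lda(mode, operand):
    """Return the value that LDA loads into ACC under the given addressing mode."""
    if mode == "immediate":            # LDA #22: the operand is the data
        return operand
    if mode == "direct":               # LDA 22: the operand is the data's address
        return MEMORY[operand]
    if mode == "indirect":             # LDA (22): the operand holds the data's address
        return MEMORY[MEMORY[operand]]
    if mode == "register":             # LDA R5: the register holds the data
        return REGISTERS[operand]
    if mode == "register_indirect":    # LDA (R5): the register holds the data's address
        return MEMORY[REGISTERS[operand]]
    if mode == "index_relative":       # LDA R5+2: base address plus offset in register
        register, base = operand
        return MEMORY[base + REGISTERS[register]]
    raise ValueError(f"unknown addressing mode: {mode}")


for mode, operand in [("direct", 22), ("immediate", 22), ("indirect", 22),
                      ("register", "R5"), ("register_indirect", "R5"),
                      ("index_relative", ("R5", 2))]:
    print(mode, lda(mode, operand))
```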

Example: Operations of STO of Different Addressing Modes


The following gives the content of a range of main memory addresses and some registers.
Address Content Address Content Register Content
30 31 34 30 R4 5
31 0 35 2 R5 2
32 33 36 37 R6 31
33 35 37 34 R7 1

Work out the values in the address range given above after the execution of the
following instructions based on STO 32. Assume that ACC contains 18.
 Direct addressing STO 32
 Indirect addressing STO (32)
 Register indirect addressing STO (R6)
 Register Index Relative Addressing STO R5+32
Answer:

Direct addressing STO 32


The operand 32 specifies the destination where the data is stored. Address 32 will be
changed to 18 (which is the content of ACC).

Indirect addressing STO (32)


The operand 32 specifies the address that contains the destination of the data. Address 32
contains 33, which is the destination for the data in ACC. Address 33 is changed to 18.

Register indirect addressing STO (R6)


The register R6 contains the address of the destination. R6 contains
31, so the content of ACC is stored at address 31. Address 31 is changed to 18.

Register Index Relative Addressing STO R5+32


The destination is the sum of the base address 32 and the content of R5: 32 + 2 = 34. Address 34 is changed
to 18.
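The STO answers can be checked the same way. The memory, registers, and ACC value come from the STO example; immediate and register addressing are left out because they do not name a memory destination. Mode names and the function names are assumptions of this sketch.

```python
# Memory and registers from the STO example above; ACC contains 18.
STO_MEMORY = {30: 31, 31: 0, 32: 33, 33: 35, 34: 30, 35: 2, 36: 37, 37: 34}
STO_REGISTERS = {"R4": 5, "R5": 2, "R6": 31, "R7": 1}
ACC = 18


def sto_target(mode, operand, memory, registers):
    """Resolve the destination address of STO for each supported mode."""
    if mode == "direct":               # STO 32
        return operand
    if mode == "indirect":             # STO (32): address 32 holds the destination
        return memory[operand]
    if mode == "register_indirect":    # STO (R6): R6 holds the destination
        return registers[operand]
    if mode == "index_relative":       # STO R5+32: base address plus offset in R5
        register, base = operand
        return base + registers[register]
    raise ValueError(f"STO cannot use addressing mode {mode!r}")


def sto(mode, operand):
    """Return (destination, new memory) after storing ACC; the original is unchanged."""
    memory = dict(STO_MEMORY)
    target = sto_target(mode, operand, memory, STO_REGISTERS)
    memory[target] = ACC
    return target, memory


for mode, operand in [("direct", 32), ("indirect", 32),
                      ("register_indirect", "R6"), ("index_relative", ("R5", 32))]:
    target, memory = sto(mode, operand)
    print(f"{mode}: address {target} is changed to {memory[target]}")
```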
2. Instruction Execution in E-LMC

As in LMC, the execution of instructions in E-LMC follows the fetch
and execution cycle.
 Some instructions are two words long. If a memory system supports a data transfer size of 2
words, should the fetch phase always read in 2 words at a time?
 The operations of the execution phase can vary greatly. Most instructions do not require
memory operations in the execution phase, so their execution phase is short. A few
instructions use indirect or index relative addressing modes, and the execution phase
of these instructions is longer.
The following shows the operations of fetch and execution cycle of various LDA
instructions in E-LMC. It is assumed that the memory fetch is 1 word each time.

LDA #DAT (Immediate Addressing)

PC > MAR
M[MAR] > MDR
MDR > IR (loaded the first word)
PC + 1 > PC
PC > MAR
M[MAR] > MDR
MDR > ACC (2nd word is data)
PC + 1 > PC

LDA Addr (Direct Addressing)

PC > MAR
M[MAR] > MDR
MDR > IR (loaded the first word)
PC + 1 > PC
PC > MAR
M[MAR] > MDR
MDR > MAR (2nd word is address)
M[MAR] > MDR
MDR > ACC
PC + 1 > PC

LDA (Addr) (Indirect Addressing)


PC > MAR
M[MAR] > MDR
MDR > IR (loaded the first word)
PC + 1 > PC
PC > MAR
M[MAR] > MDR
MDR > MAR (2nd word is address ADDR, load its content)
M[MAR] > MDR
MDR > MAR (the content of ADDR is the address of desired data)
M[MAR] > MDR
MDR > ACC (the data is stored)
PC + 1 > PC

LDA (RN) (Register Indirect Addressing)


PC > MAR
M[MAR] > MDR
MDR > IR (loaded the first word)
RN > MAR (copy the value of RN to MAR; the data at that address is loaded next)
M[MAR] > MDR
MDR > ACC
PC + 1 > PC
LDA RN+Addr (Register Index Relative Addressing)
PC > MAR
M[MAR] > MDR
MDR > IR (loaded the first word)
PC + 1 > PC
PC > MAR
M[MAR] > MDR
MDR > IR (2nd op is the base address)
IR + RN > MAR (add the offset in RN to the base address; done by the control unit)
M[MAR] > MDR
MDR > ACC
PC + 1 > PC
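The immediate, direct, and indirect LDA sequences listed above differ only in how many extra "MDR > MAR, M[MAR] > MDR" pairs follow the second fetch. The sketch below builds each sequence from shared RTL steps and runs it on a register dictionary; the opcode value 5 in the sample program and the mode names are hypothetical, and the register indirect and index relative variants are not modelled.

```python
# Each RTL step is (text, action); an action updates the register dict r
# using main memory m.
FETCH_WORD = [
    ("PC > MAR", lambda r, m: r.update(MAR=r["PC"])),
    ("M[MAR] > MDR", lambda r, m: r.update(MDR=m[r["MAR"]])),
]
LOAD_IR = ("MDR > IR", lambda r, m: r.update(IR=r["MDR"]))
NEXT_PC = ("PC + 1 > PC", lambda r, m: r.update(PC=r["PC"] + 1))
FOLLOW_ADDRESS = [                      # treat MDR as an address and read it
    ("MDR > MAR", lambda r, m: r.update(MAR=r["MDR"])),
    ("M[MAR] > MDR", lambda r, m: r.update(MDR=m[r["MAR"]])),
]
TO_ACC = ("MDR > ACC", lambda r, m: r.update(ACC=r["MDR"]))


def rtl_steps(mode):
    """Build the RTL sequence of a two-word LDA for the given addressing mode."""
    hops = {"immediate": 0, "direct": 1, "indirect": 2}[mode]
    return (FETCH_WORD + [LOAD_IR, NEXT_PC]   # first word into IR
            + FETCH_WORD                       # second word into MDR
            + FOLLOW_ADDRESS * hops            # extra memory reads per mode
            + [TO_ACC, NEXT_PC])


def run_lda(memory, mode, pc=0):
    """Execute one LDA starting at address pc; return the registers and the RTL trace."""
    r = {"PC": pc, "MAR": 0, "MDR": 0, "IR": 0, "ACC": 0}
    steps = rtl_steps(mode)
    for _, action in steps:
        action(r, memory)
    return r, [text for text, _ in steps]


# Hypothetical program: address 0 holds an LDA opcode (value assumed), address 1 holds 22.
memory = {0: 5, 1: 22, 20: 20, 21: 9, 22: 25, 23: 26, 24: 1, 25: 21, 26: 0, 27: 25}
for mode in ("immediate", "direct", "indirect"):
    registers, trace = run_lda(memory, mode)
    print(f"{mode:9}: ACC = {registers['ACC']}, {len(trace)} RTL steps")
```

The step counts (8, 10, and 12) match the listings, which makes the cost of each extra memory hop visible.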

Fixed Instruction Length Design

In the fixed instruction length design approach, every instruction is of the same length. LMC
is fixed length, while E-LMC is variable length.
Fixed length allows more efficient instruction fetch.
 The fetch phase can read in 2 or 4 words at the same time.
 Some memory systems support fetching multiple addresses in one operation.
 MDR and IR are wider so that they can hold all the words of one instruction.
 The number of RTL steps is reduced.
The following shows an example of LDA under fixed instruction length design.

LDA ADDR (Direct Addressing), Fixed Instruction Length (2 bytes)

PC > MAR
M[MAR] > MDR
MDR > IR (loaded two words)
IR[ADDR] > MAR
M[MAR] > MDR
MDR > ACC
PC + 2 > PC

LDA ADDR (Direct Addressing), Variable Instruction Length

PC > MAR
M[MAR] > MDR
MDR > IR (loaded the first word)
PC + 1 > PC
PC > MAR
M[MAR] > MDR
MDR > MAR (2nd op is address)
M[MAR] > MDR
MDR > ACC
PC + 1 > PC
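The fixed-length version can be sketched in the same step-by-step style. The sketch assumes a memory that returns two consecutive words in one access and an IR wide enough to hold both; the opcode value 5 in the usage line is hypothetical.

```python
def lda_fixed(memory, pc=0):
    """LDA ADDR under a fixed 2-word instruction length; returns (ACC, PC, trace)."""
    trace = []
    mar = pc
    trace.append("PC > MAR")
    mdr = (memory[mar], memory[mar + 1])   # assumed: one access returns both words
    trace.append("M[MAR] > MDR")
    ir = mdr                               # IR is wide enough for the whole instruction
    trace.append("MDR > IR")
    mar = ir[1]                            # IR[ADDR]: the address field (2nd word)
    trace.append("IR[ADDR] > MAR")
    mdr = memory[mar]
    trace.append("M[MAR] > MDR")
    acc = mdr
    trace.append("MDR > ACC")
    pc += 2                                # skip over both words at once
    trace.append("PC + 2 > PC")
    return acc, pc, trace


acc, pc, trace = lda_fixed({0: 5, 1: 22, 22: 25})   # opcode value 5 is hypothetical
print(acc, pc, len(trace))   # 25 2 7
```

The same load takes 7 RTL steps here against 10 in the variable-length listing, because the second word arrives with the first.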
Answer:
(i) Computer A takes 3 seconds to execute 300 million instructions. Its rate in millions of
instructions per second (MIPS) is 300 million / 3 seconds = 100 MIPS. Computer B takes
5 seconds to execute 100 million instructions. Its rate is 100 million / 5 seconds = 20 MIPS.
(ii) There are a few reasons: (1) Computers A and B support different instruction sets, and
so the same program is compiled into two different sets of machine code. (2) The compilers are not
of the same quality, so one of them might have generated poor and inefficient code. (3)
The clock rates of the 2 computers are different. One of them may be slower.
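The MIPS calculation in part (i) is a single division. The second function relates MIPS to the clock rate through the average number of clock cycles per instruction (CPI), a standard relation that illustrates reason (3); the 200 MHz and CPI of 2 in the usage line are hypothetical figures.

```python
def mips(instructions, seconds):
    """Millions of instructions per second."""
    return instructions / seconds / 1_000_000


def mips_from_clock(clock_mhz, cycles_per_instruction):
    """MIPS from the clock rate and the average clock cycles per instruction (CPI)."""
    return clock_mhz / cycles_per_instruction


print(mips(300_000_000, 3))   # 100.0 (Computer A)
print(mips(100_000_000, 5))   # 20.0 (Computer B)
print(mips_from_clock(200, 2))  # 100.0
```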

Complex and Simple Instructions

The MIPS measurement has some merits for comparing performance of processors, but
it does not take into account the amount of work actually done by an instruction.
 Complex instructions generally take more clock cycles to complete than
simple instructions.
 For example, ADD (addition) and MUL (multiplication) are two instructions of
different levels of effort.
 Without MUL in an instruction set, a program would need a number of ADD and
other instructions to perform multiplication.
 One cannot compare the performance of two ways of doing multiplication
without looking into the detailed performance parameters.
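A small sketch of the point about MUL: a processor without MUL executes many ADD instructions for one multiplication, so it can report a higher instruction count (and possibly a higher MIPS figure) while doing the same work. Counting only the ADDs, and ignoring the loop-control instructions a real program would also need, is a simplification of this sketch.

```python
def multiply_with_add(a, b):
    """Multiply non-negative integers with repeated ADD; count the ADDs executed."""
    product, adds = 0, 0
    for _ in range(b):
        product += a
        adds += 1
    return product, adds


product, adds = multiply_with_add(7, 5)
print(product, adds)   # 35 5: five ADDs (plus loop control) versus a single MUL
```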
Performance for Enterprise Computing
Enterprise computing refers to the application of computing technologies for large-scale
business applications. Computing solutions for banks, financial institutions, logistics and
government are often based on enterprise computing technologies. These users are more
concerned with the number of tasks completed, such as business
transactions processed, orders processed, and requests handled. The performance measurement is
therefore the number of such tasks completed per second.
The following are some figures obtained from a test of an IBM Power 750 with 32
POWER7 cores in a bank (Reference: http://www.ameinfo.com/record-breaking-unmatched-results-ics-banks-305031):
 30000 concurrent users and 14700 financial transactions per second.
 51431 transactions per second in ATM and Internet Banking activities.
 401606 interest accounts processed per second.

Benchmarking
Benchmarking is the technique that compares the performance of two different computers
by measuring the time that each one takes to complete a set of particular programs.
Benchmark programs are a specially designed set of programs for measurement purposes.
 For a particular benchmark, the same workload is given to a set of computers to test
their performance.
 Benchmarking provides a common standard for comparing performance.
 Benchmarking is especially important for comparing computers of different architectures.
o Computers of same architecture may be compared at the design
level: instructions per cycle, clock rate, etc.
o Computers of different architectures have different instruction sets and are
difficult to compare conceptually.
There are a number of industry standard benchmarks. These benchmark standards have been
scientifically tested so that the test results are consistent and reproducible. Here are some
examples:
 Standard Performance Evaluation Corporation (SPEC)
 Business Applications Performance Corporation (BAPCo)
Benchmarks are usually specific to a particular workload. Here workload means the type of
computer applications. Typical workloads are Business applications and Graphical
applications.
 The type of instructions executed by a Graphical application is different from that by
a Business application.
o Graphical application typically performs more floating point arithmetic
(for 2D and 3D coordinate calculation)
o Business application typically performs more integer data movement
and some integer arithmetic.
 A CPU that is efficient on data movement and integer arithmetic will perform better
with business applications. The same CPU will not perform as well with Graphical
applications.
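The idea of giving the same workload to each machine and timing it can be sketched with Python's `time.perf_counter`. The two workloads below are hypothetical stand-ins for a business-style and a graphics-style workload; they are not SPEC or BAPCo programs, and timings of Python code reflect the interpreter as much as the CPU.

```python
import time


def integer_workload(n=100_000):
    """Business-style work: integer data movement plus some integer arithmetic."""
    data = list(range(n))
    moved = data[::-1]                      # data movement
    return sum(x % 7 for x in moved)        # simple integer arithmetic


def float_workload(n=100_000):
    """Graphics-style work: floating point distance calculations on 2D points."""
    total = 0.0
    for i in range(n):
        x, y = i * 0.5, i * 0.25
        total += (x * x + y * y) ** 0.5
    return total


def benchmark(workload, repeats=5):
    """Run the same workload several times and report the best time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


for workload in (integer_workload, float_workload):
    print(f"{workload.__name__}: {benchmark(workload):.4f} s")
```

Running the same script on two machines gives a like-for-like comparison for each workload type, and the ratio between the two workloads may differ from machine to machine, as the text describes.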
1. Pipeline Architectures and Instruction Pipelines
Pipeline architectures achieve very high performance by executing multiple instructions
in parallel. The time taken for individual instructions does not reduce. However, the
overall throughput is improved because there are more instructions executed per second.
Parallel execution of instructions is difficult to realize. The following shows three
instructions running in a sequence.

If the three instructions were to be executed at the same time, then theoretically the
following would happen.

For the above to happen, the following is required:


 The computer has the mechanism to fetch three instructions from main memory to CPU
at the same time.
 The computer has the mechanism to execute three instructions in the CPU at the
same time.
 The computer has the mechanism to handle multiple memory operations that may
occur due to the instruction execution.
The following diagram explains this situation more clearly by separating instruction
execution into four stages.
 Fetch the instruction from memory into the IR
 Decode the instruction in the IR
 Execute the instruction (may include memory operation)
 Write the result to a register, the accumulator, or the main memory
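Under the ideal assumptions that every stage takes one clock cycle and no hazards occur, the four-stage model gives a simple timing rule: n instructions need 4n cycles one after another, but only 4 + (n - 1) cycles in a pipeline. A sketch that also lists which instruction occupies which stage in each cycle:

```python
STAGES = ["Fetch", "Decode", "Execute", "Write"]


def sequential_cycles(n):
    """Without a pipeline, each instruction passes through every stage alone."""
    return n * len(STAGES)


def pipelined_cycles(n):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    return len(STAGES) + n - 1 if n else 0


def schedule(n):
    """Map each clock cycle to the (instruction, stage) pairs active in it."""
    table = {}
    for i in range(n):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s + 1, []).append((f"I{i + 1}", stage))
    return table


for cycle, work in schedule(3).items():
    print(cycle, work)
print(sequential_cycles(3), pipelined_cycles(3))   # 12 6
```

Each instruction still takes 4 cycles from fetch to write, so individual instructions are not faster; only the throughput improves.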
Basically, the above model of instruction execution is similar to the Register
Transfer Language (RTL) perspective. The following shows the RTL for the LMC ADD instruction.

RTL Steps of ADD Instruction


PC > MAR
M[MAR] > MDR
MDR > IR
IR[ADDR] > MAR
M[MAR] > MDR
MDR + ACC > ACC
PC + 1 > PC

ADD.
Structural hazards

Structural hazards mean that the hardware cannot support the running of two instructions at
the same time, even if they are in different stages.
 For example, two instructions in different stages both require access to memory, but there is only one memory port
for accessing data.
 Like the data hazard, a solution to structural hazards is to stall the pipeline by
inserting one or more bubbles in the pipeline.
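A structural hazard and its bubbles can be simulated cycle by cycle. This sketch assumes a four-stage pipeline (Fetch, Decode, Execute, Write) with a single memory port used only by Fetch and by instructions that access memory in Execute; when both need the port, Fetch waits and a bubble enters the pipeline. All names are hypothetical.

```python
def run_single_port(uses_memory):
    """Simulate a 4-stage pipeline (F, D, E, W) that has one memory port.

    uses_memory[i] is True if instruction i accesses memory in its Execute stage.
    Fetch also needs the port, so it stalls (a bubble) whenever Execute holds it.
    Returns (total clock cycles, bubbles inserted).
    """
    n = len(uses_memory)
    fetch = decode = execute = write = None   # instruction index in each stage
    next_fetch = cycle = completed = bubbles = 0
    while completed < n:
        cycle += 1
        write, execute, decode = execute, decode, fetch   # everything moves on
        port_busy = execute is not None and uses_memory[execute]
        if next_fetch < n and not port_busy:
            fetch = next_fetch
            next_fetch += 1
        else:
            fetch = None
            if next_fetch < n:
                bubbles += 1
        if write is not None:
            completed += 1
    return cycle, bubbles


print(run_single_port([False, False, False]))  # (6, 0)
print(run_single_port([True, False, False]))   # (7, 1)
print(run_single_port([True, True, False]))    # (8, 2)
```

Each bubble adds one clock cycle to the total, which is the cost of resolving the hazard by stalling.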

Instruction reordering can be used to solve some of the hazards.

Disadvantage: This approach is not flexible to change. Consider what needs to be done when
you want to upgrade the CPU by adding several new instructions and modifying a few of the
existing instructions.
Microprogramming implementation approach

The microprogramming approach is based on the observation that no matter
how complex an instruction is, it can be broken down into a series of fundamental operations
within the CPU.
 Data movement: moving data from one register to another.
 Arithmetic and logic functions: performing simple arithmetic or logic functions on data
in registers.
 Conditional branches: making simple decisions based on the values stored in flags
and registers.
Rather than building separate hardware logic for each and every instruction, a number of
simple hardware logic units are built for internal CPU operations, and these internal
CPU operations are then used to form the instructions of the CPU.
The fundamental CPU operations are called microinstructions. These microinstructions
are then programmed to form the actual instructions of the CPU. The tiny programs that form
the CPU instruction set are called microcode. The CPU has a built-in read-only memory to
store the microcode.
The following shows an example of executing the ADD instruction. The control unit
executes the microinstructions according to the sequencing logic in the microcode
library.
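The idea of a microcode library can be sketched in Python. In this minimal sketch, assuming the LMC registers and the 1xx (ADD) and 2xx (SUB) encodings, each microinstruction is a tiny operation on the registers, and each CPU instruction is just a list of microinstruction names stored in a table. The names `MICRO_OPS`, `FETCH`, `MICROCODE` and `run_instruction` are hypothetical; a real microcode store is read-only memory inside the CPU, not a Python dictionary.

```python
from typing import Callable

Regs = dict[str, int]
MicroOp = Callable[[Regs, list[int]], None]

# Fundamental operations: data movement and simple arithmetic on registers.
MICRO_OPS: dict[str, MicroOp] = {
    "PC->MAR":       lambda r, m: r.update(MAR=r["PC"]),
    "M[MAR]->MDR":   lambda r, m: r.update(MDR=m[r["MAR"]]),
    "MDR->IR":       lambda r, m: r.update(IR=r["MDR"]),
    "IR[ADDR]->MAR": lambda r, m: r.update(MAR=r["IR"] % 100),
    "MDR+ACC->ACC":  lambda r, m: r.update(ACC=r["ACC"] + r["MDR"]),
    "ACC-MDR->ACC":  lambda r, m: r.update(ACC=r["ACC"] - r["MDR"]),
    "PC+1->PC":      lambda r, m: r.update(PC=r["PC"] + 1),
}

# Microprogram shared by every instruction: fetch into the IR.
FETCH = ["PC->MAR", "M[MAR]->MDR", "MDR->IR"]

# Microcode store: opcode -> microprogram for its execute phase.
MICROCODE: dict[int, list[str]] = {
    1: ["IR[ADDR]->MAR", "M[MAR]->MDR", "MDR+ACC->ACC", "PC+1->PC"],  # ADD
    2: ["IR[ADDR]->MAR", "M[MAR]->MDR", "ACC-MDR->ACC", "PC+1->PC"],  # SUB
}


def run_instruction(regs: Regs, memory: list[int]) -> None:
    """Control unit: run the fetch microprogram, then the one for the opcode."""
    for name in FETCH:
        MICRO_OPS[name](regs, memory)
    opcode = regs["IR"] // 100
    for name in MICROCODE[opcode]:  # KeyError for opcodes not in this toy store
        MICRO_OPS[name](regs, memory)


if __name__ == "__main__":
    regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "ACC": 5}
    memory = [0] * 100
    memory[0], memory[1] = 150, 251   # ADD 50, SUB 51
    memory[50], memory[51] = 7, 2
    run_instruction(regs, memory)
    run_instruction(regs, memory)
    print(regs["ACC"], regs["PC"])    # 10 2
```

Note that SUB reuses three of ADD's four microinstructions; only the arithmetic step differs.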
2. Enterprise and Mainframe Computing

(This section is adapted from the IBM Academic Initiative course on mainframes and
is used with permission.)
Enterprise computing is the style of computing that satisfies the information processing
needs of large enterprises. These needs include:
 Very large amounts of data (e.g. transaction data in a stock market exchange)
 High availability (i.e. the system almost never breaks down)
 Integrity and security (i.e. the data are guaranteed to be safe and correct)
 Scalability (i.e. the system capacity can increase gracefully)
A mainframe is what businesses use to host their commercial databases, transaction
servers, and applications that require a greater degree of security and availability than is
commonly found on smaller-scale machines.

Strengths of Mainframes

The following table summarizes the major strengths of mainframes:

| Strength | Description |
|---|---|
| Reliability | Hardware: provides self-checking and the ability to recover from errors. Software: extensively checked and tested. |
| Availability | Usually measured as Mean Time Between Failures (MTBF), which may be months or years for a modern mainframe. The system can keep operating while dealing with errors or scheduled upgrades. |
| Serviceability | Provides information about the source of a failure and allows problems to be fixed rapidly. |
| Security | Provides a framework to manage authentication and prevent unauthorized access. |
| Scalability | Provides the flexibility to change capacity with minimal impact on operations and cost. |
| Continuing compatibility | Enterprises typically invest a lot of money in application development on mainframes, and it is important that such applications continue to function even after decades. |
Mainframes in the Modern World

Mainframe computers are usually hidden from public view. However, they are the driving
force behind many essential day-to-day activities.
 Many of the Fortune 1000 companies use a mainframe system.
 Over 60% of all data available on the Internet is stored on mainframe systems.
 There are at least 10,000 mainframe systems still running in the world.
 Most banks in Hong Kong are supported by mainframes.
The yearly revenue generated from mainframe computing is still between 4 and 6 billion
US dollars, and 2,000 to 3,000 mainframe systems are shipped every year.
IBM is the largest mainframe vendor and probably the only large vendor still in the
market. IBM has continuously enhanced mainframe computing with the most current
technologies. The modern mainframe computer is no longer a room-sized computer system;
it now includes distributed computing, cloud computing and virtualization in its
armoury.
The current IBM mainframe systems are called the System z series. The following lists the
core features of the IBM zEnterprise System, introduced in 2010:
 The z196 processor chip is a quad-core 5.2 GHz CISC processor.
 The z196 system can support a maximum of 24 processors.
 Each core may be assigned a specific role, such as a standard Central Processor or
an Application Assist Processor for running Java/XML.
 Maximum memory is 3 TB.

Mainframe Architectures

Mainframe architecture is continuously evolving due to the emergence of new computing
technologies. There are, however, some architectural features that characterise a mainframe
system.

| Architectural feature | Relevant core values |
|---|---|
| More processors and faster processors | Large amount of data processing |
| More memory | Large amount of data processing |
| Upgrading hardware and software dynamically | Scalability and availability |
| Enhanced IO capacity | Large amount of data processing |
| Flexible resource provisioning (i.e. dividing resources into multiple, logically independent systems) | Scalability |
| Distributed computing capability | Scalability and availability |
| Encryption of data (e.g. AES encryption) | Security |


The specifics of the architectural features are usually worked out rigorously from a
Service Level Agreement (SLA).
 An SLA is an agreement between a service provider and a recipient about the level
of performance required.
o For example, a bank may want 99% of ATM transactions to be completed
within one second.
 The number of processors, the IO bandwidth, the amount of memory, etc. are worked out from the
required performance level.
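The ATM example can be expressed as a check on measured latencies, followed by a back-of-the-envelope processor count. This Python sketch is illustrative only: the function names, the sample latencies, the peak load of 5,000 transactions per second, the 2 ms of CPU time per transaction and the 70% utilisation ceiling are all hypothetical assumptions, and real capacity planning also accounts for IO, memory and queueing effects.

```python
import math


def sla_met(latencies_s: list[float], limit_s: float = 1.0, target: float = 0.99) -> bool:
    """True if at least `target` of the transactions finished within `limit_s` seconds."""
    if not latencies_s:
        return False
    within = sum(1 for t in latencies_s if t <= limit_s)
    return within / len(latencies_s) >= target


def processors_needed(peak_tps: float, cpu_s_per_txn: float, max_util: float = 0.7) -> int:
    """Rough processor count: CPU-seconds demanded per second, divided by the
    usable capacity of one processor (assumed capped at `max_util`)."""
    demand = peak_tps * cpu_s_per_txn  # CPU-seconds of work arriving each second
    return math.ceil(demand / max_util)


if __name__ == "__main__":
    # hypothetical measurements: 995 fast transactions and 5 slow ones
    latencies = [0.4] * 995 + [1.8] * 5
    print(sla_met(latencies))               # True (99.5% within 1 s)
    # hypothetical peak load: 5,000 transactions/s, 2 ms of CPU time each
    print(processors_needed(5000, 0.002))   # 15
```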
The following figure shows the conceptual structure of a traditional mainframe.
 The Central Processor contains processors, main memory, and other control circuitry.
 Large-capacity data processing is enabled through a large number of channels, each
of which connects IO devices (such as hard disks) to main memory.
 Processing capacity is scaled up by connecting IO devices to more Central
Processors. The Control Units manage the paths of data movement between IO devices,
channels, and other Control Units.
 More Central Processors can be connected when more processing capacity
is required.
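The connections between processors, channels, Control Units and IO devices form a graph, and a data path is a route through it. The following Python sketch uses a small, entirely hypothetical configuration (the component names `CP1`, `CH1`, `CU1`, `DISK2` and so on are invented) and a depth-first search to list every path from a Central Processor to a device, optionally skipping failed components. It only illustrates the idea of multiple paths; it is not how a real channel subsystem selects paths.

```python
# Hypothetical configuration: which component connects to which.
LINKS: dict[str, list[str]] = {
    "CP1": ["CH1", "CH2"],        # central processor -> channels
    "CH1": ["CU1"],               # channel -> control unit
    "CH2": ["CU1", "CU2"],
    "CU1": ["DISK1", "DISK2"],    # control unit -> IO devices
    "CU2": ["DISK2", "TAPE1"],
}


def all_paths(src: str, dst: str, failed: frozenset[str] = frozenset()) -> list[list[str]]:
    """Enumerate every data path from src to dst by depth-first search,
    avoiding components listed in `failed`."""
    paths: list[list[str]] = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in LINKS.get(node, []):
            if nxt not in path and nxt not in failed:
                stack.append((nxt, path + [nxt]))
    return paths


if __name__ == "__main__":
    for p in sorted(all_paths("CP1", "DISK2")):
        print(" -> ".join(p))
    # with channel CH2 out of service, DISK2 is still reachable through CH1
    print(all_paths("CP1", "DISK2", frozenset({"CH2"})))
```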
Virtualization and Partitioning

Virtualization is an important feature of mainframes because it allows the massive computing
resources to be provisioned appropriately to individual applications.
The idea is to give each application the illusion of running on its own hardware.
Applications can share the computing resources by partitioning the physical server into
a number of virtual servers.
IBM mainframes’ resources are managed by a hypervisor. The hypervisor is called the Control
Program (CP); with it, users can create virtual servers with specific operating systems
and hardware provisioning.
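Partitioning can be pictured as carving a physical server's processors and memory into named virtual servers. This toy Python sketch is a simplification: it assumes resources are dedicated to each virtual server and never overcommitted, which real hypervisors are able to do. The classes `PhysicalServer` and `VirtualServer`, the method `create`, and the example sizes are hypothetical and do not reflect the CP interface.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    name: str
    os_name: str
    cpus: int
    memory_gb: int


@dataclass
class PhysicalServer:
    """Toy partitioning model: resources are dedicated, never overcommitted."""
    cpus: int
    memory_gb: int
    guests: list[VirtualServer] = field(default_factory=list)

    def free(self) -> tuple[int, int]:
        """Return (free CPUs, free memory in GB)."""
        used_cpu = sum(g.cpus for g in self.guests)
        used_mem = sum(g.memory_gb for g in self.guests)
        return self.cpus - used_cpu, self.memory_gb - used_mem

    def create(self, name: str, os_name: str, cpus: int, memory_gb: int) -> VirtualServer:
        """Create a virtual server if enough unallocated resources remain."""
        free_cpu, free_mem = self.free()
        if cpus > free_cpu or memory_gb > free_mem:
            raise ValueError(f"not enough resources for {name}")
        vs = VirtualServer(name, os_name, cpus, memory_gb)
        self.guests.append(vs)
        return vs


if __name__ == "__main__":
    host = PhysicalServer(cpus=24, memory_gb=3072)
    host.create("payroll", "z/OS", cpus=8, memory_gb=1024)
    host.create("web", "Linux", cpus=4, memory_gb=256)
    print(host.free())  # (12, 1792)
```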
