The following shows a typical computing process. The process reads input data, processes
the data, and then writes output data. The process has access to a data storage which can be
used to store data for future use. The stored data can later feed back into and influence the process.
1. Components of a Programmable Computer
A programmable computer must perform several processes. The major processes are listed below:
instruction execution: an essential function of the programmable computer
data storage: a function for storing the data before and after the instruction execution
program storage: a function for storing the program in the programmable computer
for instruction execution
inputting data and program: a function for data and program to go into the computer
from the outside world
outputting data: a function for data to leave the computer to the outside world
The last two processes are essential because a computer cannot exist in isolation. A computer
useful for any purpose must be able to interact with the outside world.
These processes are refined and their roles are abstracted into the following
major components for a programmable computer.
Arithmetic and Logic Unit (ALU): for instruction execution
Memory system: for data and program storage
Input: for data input into the programmable computer
Output: for data output from the programmable computer
2. Introduction to Arithmetic and Logic Unit
The first component of a programmable computer is the Arithmetic and Logic Unit (ALU).
The ALU is a functional unit responsible for the execution of instructions.
The execution of instructions is an essential function of a programmable computer.
The input to the ALU includes the data and the instructions that command how to
deal with the data.
The result of the instruction execution will appear at the output of the ALU.
The following figure shows a schematic diagram of the ALU with its input and output. One input
channel is for sending in instructions and the others are for sending in data. The ALU can typically
execute many types of instructions, for example, add, subtract, and negation.
In the figure, the ALU has two data input channels and one data output channel. This is a
typical arrangement because most operations (instructions) have at most two operands.
Addition: A + B. A and B are passed into the two input channels and the result of A+B
will appear at the output channel.
Subtraction: A – B. The same case as addition.
Negation: -A. This is a single operand operation. A is passed into one input channel.
The operation of the ALU is controlled by the instruction. For example, an Addition
instruction will make the ALU perform an addition operation on the input data. The ALU
will also output data to inform other components of its status. For example, if an error occurs in
the calculation, then an error status may be emitted.
Different instructions are represented by different electronic signals which in turn
represent numbers. The ALU designer may specify that 01 represents Addition and 02
represents Subtraction. The coding of instructions is usually published in a technical
manual.
The ALU does not carry out any operation unless it is told to do so. The clock line connected
to the ALU sends a regular signal to the ALU, in a way similar to an alarm's "beep beep beep"
sound. Upon receiving a beep, the ALU executes one instruction. Then it executes
another instruction when the next beep arrives.
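The instruction coding and dispatch described above can be sketched in code. This is a minimal illustration, not a real ALU design: opcodes 01 (Addition) and 02 (Subtraction) follow the hypothetical coding mentioned in the text, and 03 (Negation) is an assumed extension.

```python
# Minimal ALU sketch: an opcode selects the operation applied to the data
# inputs, mirroring the instruction channel and data channels described
# above. Opcode numbers 1 and 2 follow the text's hypothetical coding;
# opcode 3 (negation) is an assumption for illustration.

def alu(opcode, a, b=None):
    """Execute one instruction; return (result, status)."""
    try:
        if opcode == 1:          # Addition: two operands
            return a + b, "ok"
        if opcode == 2:          # Subtraction: two operands
            return a - b, "ok"
        if opcode == 3:          # Negation: single operand
            return -a, "ok"
        return None, "error"     # unknown opcode -> error status output
    except TypeError:            # e.g. a missing operand
        return None, "error"
```

Each call corresponds to one "beep" of the clock: the ALU executes exactly one instruction per invocation.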
3. Input and Output
The final components are Input and Output. These two components connect the
programmable computer and the outside world. The following shows a schematic diagram of
the two components.
1. Digital Representation
The ALU will use digital representation to encode data. Data is an abstract entity, but
eventually it must be represented by some physical attribute. In electronic systems,
the common physical attribute to use is voltage. Voltage is a continuous scalar.
There are actually two fundamental ways to represent data with voltages: analogue and
digital. Our ALU will use digital representation for greater reliability and error tolerance.
The following figure explains the analogue representation.
Analogue representation is continuous. Any small changes in the signal can change
the original value into an incorrect value.
Analogue representation can represent continuous data values, but it is not tolerant
to noise and other forms of signal degradation.
Digital representation represents data in discrete levels.
The following figure shows that a 2-level digital representation is a lot more error
tolerant than a 10-level digital representation.
The levels are well defined and any sufficiently small fluctuation in the signal
will keep the signal at the same level. Digital representation is therefore less
prone to errors.
One can decide the number of levels used in a digital representation. The error
tolerance decreases as more levels are defined.
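The trade-off between the number of levels and error tolerance can be quantified with a small sketch. It assumes levels evenly spaced over a fixed voltage range, which is an idealization for illustration.

```python
# Noise margin of an evenly spaced L-level digital representation over a
# 0..vmax voltage range: a signal stays at its level as long as the noise
# is below half the spacing between adjacent levels. Even spacing is an
# assumption for illustration.

def noise_margin(levels, vmax=1.0):
    spacing = vmax / (levels - 1)   # voltage gap between adjacent levels
    return spacing / 2              # largest tolerable noise amplitude

# 2 levels tolerate 0.5 V of noise per volt of range; 10 levels tolerate
# only about 0.056 V, so fewer levels means more error tolerance.
```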
2. Binary and Other Numeral Systems
The ALU will use the binary numeral system. The binary numeral system uses two symbols,
'0' and '1', to represent data. Therefore its implementation requires only a 2-level digital
representation, which is the most error tolerant and the least technically challenging. Normally, a low
voltage represents '0' and a high voltage represents '1', but it could be the other way round. An
even more reliable method is to encode '0' as a change of voltage from low to high, and '1' as
a change from high to low.
A numeral system provides a systematic and consistent set of rules to represent numbers. The
commonly known numeral systems include decimal, binary, and hexadecimal. Some key
characteristics of numeral systems include the following:
Each numeral system defines a set of numbers such as integers or positive numbers.
Each numeral system provides each number in the number set a
unique representation.
Each numeral system consists of a set of unique symbols, each representing a
certain value in the set of numbers. In the decimal numeral system, the ten symbols
are 0, 1, 2, ..., 9.
Each numeral system provides rules for combining symbols to represent a
larger range of numbers and therefore it can support a large number set.
For example, the decimal numeral system uses positional notation to combine the
symbols to represent numbers such as 32 and 1589. Larger numbers are constructed by
putting symbols together in juxtaposition.
The base of a numeral system is the number of unique symbols used in the system.
The decimal number system is the norm in today's societies, as it was in ancient China and the
Hindu-Arabic world. However, the ancient world had all sorts of number
systems.
Vigesimal, or base-20, used by the Mayans.
Duodecimal, or base-12, used by some Nigerian peoples.
Sexagesimal, or base-60, used by the Babylonians.
The decimal number system has 10 symbols, from 0 to 9. To represent values larger than 9,
we use positional notation. Positional notation is based on a system in which each digit is
related to the next by a multiplier, which is the base or radix of the number system. In
the decimal number system the multiplier is 10. This means that a digit one place to the left
is worth 10 times the value represented by the digit to its right.
Example: Positional Notation in Decimal System
Question: Why does the numeral 3456 represent the value 3456 in the decimal
numeral system?
Answer: Each digit is weighted by a power of 10 according to its position:
3456 = 3 x 10^3 + 4 x 10^2 + 5 x 10^1 + 6 x 10^0 = 3000 + 400 + 50 + 6
Positive Binary
The following table shows the range of values representable in positive binary at various bit sizes.
Bit size  Range           Maximum (Binary)                   Maximum (Decimal)  Number of Unique Values
1-bit     0 to 2^1 - 1    1                                  1                  2
8-bit     0 to 2^8 - 1    11111111                           255                256
16-bit    0 to 2^16 - 1   1111111111111111                   65535              65536
32-bit    0 to 2^32 - 1   11111111111111111111111111111111   4294967295         4294967296
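The positional evaluation in the worked answer, and the unsigned ranges in the table, can both be reproduced with a short sketch (function names are illustrative):

```python
# Positional notation: the value of a numeral is the running sum of each
# digit weighted by base**position. Evaluating "3456" in base 10
# reproduces the worked answer; the same rule evaluates binary numerals.

def positional_value(digits, base):
    value = 0
    for d in digits:                 # most significant digit first
        value = value * base + int(d, base)
    return value

# Range of an n-bit positive (unsigned) binary representation:
# 0 to 2**n - 1, giving 2**n unique values.
def unsigned_range(bits):
    return 0, 2**bits - 1
```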
Addition and subtraction operations pose little problem in circuitry implementation. Each
group of 4 bits (one decimal digit) is considered together in one operation, so addition of two 8-bit BCD numbers
requires two addition operations. BCD addition and subtraction is therefore less efficient
than in the positive binary representation, and the circuitry is a bit more complex. An
advantage, however, is the ease of conversion to printing and LCD display formats.
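A sketch of 8-bit BCD addition, processing one 4-bit digit at a time. The add-6 decimal correction is the standard BCD adjustment; the function itself is illustrative, not a description of any particular circuit.

```python
# BCD addition sketch: each group of 4 bits holds one decimal digit
# (two digits per byte). Digit sums above 9 need a +6 correction, which
# is the extra circuitry complexity mentioned above.

def bcd_add(a, b):
    """Add two BCD-encoded bytes, returning a BCD-encoded byte."""
    result, carry = 0, 0
    for shift in (0, 4):                 # one 4-bit digit per operation
        da = (a >> shift) & 0xF
        db = (b >> shift) & 0xF
        s = da + db + carry
        if s > 9:                        # decimal correction
            s += 6
        carry = (s >> 4) & 1
        result |= (s & 0xF) << shift
    return result

# 0x27 + 0x35 encodes 27 + 35 = 62, i.e. 0x62 in BCD.
```

Note that the two-digit addition above really does take two add steps, whereas a plain binary adder would handle all 8 bits in one.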
Bit size  Minimum (Binary)     Maximum (Binary)     Range (Decimal)   Number of Unique Values
8-bit     1111 1111            0111 1111            -127 to +127      255
16-bit    1111 1111 1111 1111  0111 1111 1111 1111  -32767 to +32767  65535
The resource utilization is good, except that there are now two numbers representing the
value zero. They are 0000 0000 and 1000 0000. So an 8-bit sign-magnitude binary
representation can represent 255 values only (from -127 to +127).
Addition and subtraction of sign-magnitude binary numbers is more challenging. The
operations cannot be simplified into smaller operations on individual digits.
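The sign-magnitude rule can be made concrete with a small sketch (function names are illustrative), which also makes the two zero patterns visible:

```python
# Sign-magnitude sketch: the top bit is the sign, the remaining bits are
# the magnitude. This is why 0000 0000 (+0) and 1000 0000 (-0) are two
# distinct patterns for the value zero.

def sm_encode(value, bits=8):
    limit = 2**(bits - 1) - 1            # e.g. 127 for 8 bits
    if not -limit <= value <= limit:
        raise ValueError("out of range")
    sign = 1 << (bits - 1) if value < 0 else 0
    return sign | abs(value)

def sm_decode(pattern, bits=8):
    magnitude = pattern & ((1 << (bits - 1)) - 1)
    return -magnitude if pattern >> (bits - 1) else magnitude
```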
One's Complement Binary Representation
The method of complement is sometimes used in subtraction. This method turns a
subtraction operation into a complement operation and an addition operation. For example, the
expression 654 - 234 is converted into a complement operation on 234 followed by an addition
operation:
654 - 234 (234 is converted into 766 by subtracting it from 1000)
654 + 766 = 1420 = 420 (the carry 1 is discarded)
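The decimal complement trick can be written as a short sketch for three-digit numbers; discarding the carry is the modulo step:

```python
# The complement method in code: subtract by adding the ten's complement
# and discarding the carry out of the top digit.

def tens_complement_sub(a, b, digits=3):
    modulus = 10**digits                 # 1000 for 3-digit numbers
    complement = modulus - b             # 234 -> 766
    return (a + complement) % modulus    # the % discards the carry 1
```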
In one's complement binary representation, we represent negative numbers by finding
one's complement of the corresponding positive number. The one's complement operation
is carried out by inverting every bit (turning 0 into 1 and 1 into 0).
Given a positive 8-bit binary number 0011 1000 (decimal 56), to find its corresponding
negative number (decimal -56), we apply one's complement operation to the 8-bit
binary number.
Each time the one's complement operation is applied, the sign of the number is reversed. This
is equivalent to a negation operation.
Note that the number 1100 0111 is in 1's complement format, and it cannot be directly
converted to decimal if it is a negative number. One's complement binary representation is
the system that uses one's complement operation to work out a positive binary number's
corresponding negative number.
The following table shows the range of numbers that can be represented by 8-bit and 16-bit
one's complement binary representation.
Bit size  Minimum (Binary)     Maximum (Binary)     Range (Decimal)   Number of Unique Values
8-bit     1000 0000            0111 1111            -127 to +127      255
16-bit    1000 0000 0000 0000  0111 1111 1111 1111  -32767 to +32767  65535
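The one's complement operation of the preceding section is a plain bit inversion, which a short sketch makes explicit (function names are illustrative):

```python
# One's complement negation: invert every bit. XOR with an all-ones mask
# performs the inversion at a fixed bit width.

def ones_complement(pattern, bits=8):
    return pattern ^ ((1 << bits) - 1)

# Decoding: a pattern with the top bit set is negative; its magnitude is
# the value of its one's complement (the inversion undoes itself).
def oc_decode(pattern, bits=8):
    if pattern >> (bits - 1):
        return -ones_complement(pattern, bits)
    return pattern
```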
The registers in the programmable computers are all connected together with the system bus.
The system bus is the most important highway for data movement in the computer.
The system bus allows a pair of registers to move data between them.
At any one moment, only one such data movement route can operate.
This is a limitation of the system bus design.
Although a system bus can connect many registers, only two of them can exchange
data at any one time.
Data movement can be sped up by having movements happen in parallel. The system bus can be
replaced by a fully connected network. Each pair of registers then has a dedicated highway.
There are a few drawbacks to this approach:
The fully connected network is clearly a lot more costly to build.
A register can only handle one data movement at one time even if it is connected to
all other registers.
Some connections have no use in the operations of the computer.
The performance of the system bus is an important factor in the working of the
programmable computer.
The system bus operates according to the clock and signal from the Controller.
The Controller determines which pair of components is to establish a connection and
move data.
2. Input and Output Controllers
The design of our basic programmable computer is completed by adding the input and
output controllers. These controllers are connected to the system bus.
The components that are connected to the input and output controllers are considered peripherals.
Input devices such as keyboard and mouse, and output devices such as monitor
are common.
Some IO devices can operate both as input and output devices.
A hard disk is a memory device that can do both input and output.
The programmable computer has two levels of memory. The memory system that connects
directly to the system bus through the MAR and MDR is called the main memory or
primary memory. The main memory stores data and program for program execution. The
memory system that connects through IO controllers is called secondary memory. Secondary
memory is usually designed for long-term storage.
3. Central Processing Unit (CPU)
Operation of the main memory system can occur when the MAR, the MDR, and the
R/W line are loaded with data. The loading of data takes time. The operation is therefore
synchronized with a memory clock so that the loading of data and the operation can take
place at the correct times.
An electronic clock in a computer is a signal that alternates between high and low repeatedly.
Memory operation occurs according to this signal, and is usually triggered by an edge (rising
or falling edge) of the clock.
The rising edge or falling edge is useful because it represents a time instant at which all the
data (or signals) involved are ready.
The clock rate (or frequency) has a bearing on the speed of the memory. A slow clock rate
means that memory operations occur less frequently, thereby slowing
data movement.
However, we cannot simply increase the clock rate without making other considerations.
Memory operations take time to complete. So the clock rate must allow the completion of
one operation before triggering the next one.
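The constraint above can be stated as a tiny formula: the clock period must be at least the time one memory operation takes, which caps the usable clock rate.

```python
# The clock period must cover one complete memory operation, so the
# operation time bounds the clock rate from above.

def max_clock_rate(operation_time_s):
    """Highest clock rate (Hz) allowing each operation to complete."""
    return 1.0 / operation_time_s

# A memory operation taking 10 ns caps the memory clock near 100 MHz.
```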
Semiconductor Memory: Static RAM and Dynamic RAM
The current main memory system is based on semiconductor memory. A standard circuit called a
flip-flop can store 1 bit of data. A memory system can be designed with millions of these
circuits integrated together.
Random access memory (RAM) refers to such a memory system, in which the stored
data can be accessed in any order.
RAM based on flip-flops is called static RAM (SRAM). SRAM is fast; it retains its data
as long as it is powered, but it is volatile: the data is lost when there is no power.
Each flip-flop is made up of 6 to 8 transistors, which can take up considerable space when a
larger memory size is to be packaged.
1. Addressing Modes
The address mode of an operand in instructions is expressed using the following syntax.
The addressing modes covered in this chapter are the most common ones. In the real
world there are processors designed with many address modes, though most of them
are combinations or variants of the common addressing modes.
Processor Remarks
Intel 8086 17 addressing modes
Pentium 17 addressing modes (backward compatibility)
Itanium 1 addressing mode (register indirect addressing)
MIPS Register addressing mode mainly
Java bytecode Register indirect with offset in a stack architecture
Example: Operations of LDA of Different Addressing Modes
The following gives the content of a range of main memory addresses and some registers.
Address  Content    Address  Content    Register  Content
20       20         24       1          R4        0
21       9          25       21         R5        23
22       25         26       0          R6        20
23       26         27       25         R7        3
Work out the value loaded into ACC after the execution of the following instructions
based on LDA 22.
Direct addressing LDA 22
Immediate addressing LDA #22
Indirect addressing LDA (22)
Register addressing LDA R5
Register indirect addressing LDA (R5)
Register index relative addressing LDA R5+2
Answer:
Work out the values in the address range given above after the execution of the
following instructions based on STO 32. Assume that ACC contains 18.
Direct addressing STO 32
Indirect addressing STO (32)
Register indirect addressing STO (R6)
Register index relative addressing STO R5+32
Answer:
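The LDA part of the example above can be simulated with a sketch. The memory and register contents are those given in the table; the mode labels are this sketch's own names, not official syntax.

```python
# Simulating the LDA addressing-mode example. Each mode computes the
# effective operand differently before loading ACC.

mem = {20: 20, 21: 9, 22: 25, 23: 26, 24: 1, 25: 21, 26: 0, 27: 25}
reg = {"R4": 0, "R5": 23, "R6": 20, "R7": 3}

def lda(mode, operand, offset=0):
    if mode == "direct":            # LDA 22   -> mem[22]
        return mem[operand]
    if mode == "immediate":         # LDA #22  -> the literal 22
        return operand
    if mode == "indirect":          # LDA (22) -> mem[mem[22]]
        return mem[mem[operand]]
    if mode == "register":          # LDA R5   -> contents of R5
        return reg[operand]
    if mode == "reg_indirect":      # LDA (R5) -> mem[contents of R5]
        return mem[reg[operand]]
    if mode == "reg_index":         # LDA R5+2 -> mem[contents of R5 + 2]
        return mem[reg[operand] + offset]
    raise ValueError(mode)
```

Running each mode against the table gives ACC = 25 (direct), 22 (immediate), 21 (indirect), 23 (register), 26 (register indirect), and 21 (register index relative).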
E-LMC is no different from LMC in that the execution of instructions follows the fetch
and execute cycle.
Some instructions are two word long. If a memory system supports data transfer size of 2
words, should the fetch phase always read in 2 words at a time?
The operations of the execution phase can vary greatly. Most instructions do not require a
memory operation in the execution phase, so their execution phase is short. A few
instructions use indirect or index relative addressing modes; the execution phase
of these instructions is longer.
The following shows the operations of fetch and execution cycle of various LDA
instructions in E-LMC. It is assumed that the memory fetch is 1 word each time.
Fixed Instruction Length Design
In the fixed instruction length design approach, every instruction is of the same length. LMC
is fixed length, while E-LMC is variable length.
Fixed length allows more efficient instruction fetch.
The fetch phase can read in 2 or 4 words at the same time.
Some memory systems support fetching multiple addresses in one operation.
MDR and IR sizes are larger to store more words in one instruction.
The number of RTL steps is reduced.
The following shows an example of LDA under fixed instruction length design.
The MIPS measurement has some merits for comparing performance of processors, but
it does not take into account the amount of work actually done by an instruction.
Complex instructions generally take more clock cycles to complete than
simple instructions.
For example, ADD (addition) and MUL (multiplication) are two instructions of
different levels of effort.
Without MUL in an instruction set, a program would need a number of ADD and
other instructions to perform multiplication.
One cannot compare the performance of two ways of doing multiplications
without looking into the detail performance parameters.
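The ADD-versus-MUL point can be made concrete with a sketch: a machine without MUL must issue many ADD instructions for one multiplication, so raw instruction counts (and hence MIPS) overstate its performance on the same work. The function is illustrative, not a real instruction set.

```python
# Synthesizing multiplication from repeated addition, as a CPU without a
# MUL instruction would have to. One MUL does the work of b ADD steps,
# so instruction count alone is a misleading performance measure.

def multiply_by_adds(a, b):
    """Multiply a by non-negative b using only additions (b ADD steps)."""
    total = 0
    for _ in range(b):
        total += a
    return total
```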
Performance for Enterprise Computing
Enterprise computing refers to the application of computing technologies for large-scale
business applications. Computing solutions for banks, financial institutions, logistics and
government are often based on enterprise computing technologies. These users are more
concerned with the number of tasks completed, where the tasks are business
transactions, processed orders, and requests handled. The performance measurement is
therefore the number of such tasks completed in a second.
The following are some figures obtained from a test applying an IBM Power 750 with 32
POWER7 cores in a bank (Reference: http://www.ameinfo.com/record-breaking-unmatched-results-ics-banks-305031):
30000 concurrent users and 14700 financial transactions per second.
51431 transactions per second in ATM and Internet Banking activities.
401606 interest accounts processed per second.
Benchmarking
Benchmarking is a technique that compares the performance of two different computers
by measuring the time that each one takes to complete a particular set of programs.
Benchmark programs are a specially designed set of programs for measurement purposes.
For a particular benchmark, the same workload is given to a set of computers to test
their performance.
Benchmarking provides a common standard for comparing performance.
Benchmarking is especially important for comparing computers of different architectures.
o Computers of the same architecture may be compared at the design
level: instructions per cycle, clock rate, etc.
o Computers of different architectures have different instruction sets and are
difficult to compare conceptually.
There are a number of industry standard benchmarks. These benchmark standards have been
scientifically tested so that the test results are consistent and reproducible. Here are some
examples:
Standard Performance Evaluation Corporation (SPEC)
Business Applications Performance Corporation (BAPCo)
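A benchmark run, at its core, is just timing the same workload on each machine. The harness below is a minimal sketch in that spirit, not a reproduction of any SPEC or BAPCo procedure; taking the best of several runs is one common way to reduce timing noise.

```python
# Minimal benchmarking harness: run the same workload several times and
# report the best elapsed time, so each computer is measured on an
# identical job.
import time

def benchmark(workload, repeats=3):
    """Return the best elapsed time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

# Example workload: sum the first million integers.
elapsed = benchmark(lambda: sum(range(1_000_000)))
```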
Benchmarks are usually specific to a particular workload. Here workload means the type of
computer applications. Typical workloads are Business applications and Graphical
applications.
The type of instructions executed by a Graphical application differs from that executed by
a Business application.
o Graphical applications typically perform more floating point arithmetic
(for 2D and 3D coordinate calculations).
o Business applications typically perform more integer data movement
and some integer arithmetic.
A CPU that is efficient on data movement and integer arithmetic will perform better
with business applications. The same CPU will not perform as well with Graphical
applications.
1. Pipeline Architectures and Instruction Pipelines
Pipeline architectures achieve very high performance by executing multiple instructions
in parallel. The time taken for individual instructions does not reduce. However, the
overall throughput is improved because there are more instructions executed per second.
Parallel execution of instructions is difficult to realize. The following shows three
instructions running in a sequence.
If the three instructions were to be executed at the same time, then theoretically the
following would happen.
Structural hazards
Structural hazards mean that the hardware cannot support the running of two instructions at
the same time even if they are in different stages.
For example, both steps require access to memory, but there is only one memory port
for accessing data.
Like the data hazard, a solution to structural hazards is to stall the pipeline by
inserting one or more bubbles in the pipeline.
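The throughput gain of pipelining, and the cost of stall bubbles, can be captured with a rough cycle-count model. This is the standard idealized formula (one instruction entering per cycle), not a model of any specific processor.

```python
# Idealized pipeline cycle count: with k stages, n instructions finish in
# k + (n - 1) cycles when nothing stalls; each bubble inserted for a
# hazard adds one cycle.

def pipeline_cycles(n_instructions, stages, bubbles=0):
    return stages + (n_instructions - 1) + bubbles

# 3 instructions on a 5-stage pipeline take 7 cycles; a sequential
# (unpipelined) machine would take 3 * 5 = 15 cycles for the same work.
```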
Disadvantage: This approach is not flexible to change. Consider what needs to be done when
you want to upgrade the CPU by adding several new instructions and modifying a few of the
existing instructions.
Microprogramming implementation approach
(This section is adapted from the IBM Academic Initiative course on mainframe, and it
is used with permission)
Enterprise computing is the style of computing that satisfies the information processing
needs of large enterprises.
Very large amounts of data (e.g. transaction data in a stock market exchange)
High availability (i.e. almost never breaks down)
Integrity and security (i.e. the data are guaranteed to be safe and correct)
Scalability (i.e. the system capacity can increase gracefully)
A mainframe is what businesses use to host their commercial databases, transaction
servers, and applications that require a greater degree of security and availability than is
commonly found on smaller-scale machines.
Strengths of Mainframes
Strengths                 Description
Reliability               Hardware provides self-checking and error-recovery abilities; software is extensively checked and tested.
Availability              Usually measured in Mean Time Between Failures (MTBF), which may be months or years in a modern mainframe. Able to continue operating while dealing with errors or scheduled upgrades.
Serviceability            Provides information about the source of a failure and allows a rapid problem fix.
Security                  Provides a framework to manage authentication and prevent unauthorized access.
Scalability               Provides the flexibility to change capacity with minimal impact on operation and cost.
Continuing Compatibility  Enterprises typically invest a lot of money in application development on mainframes, and it is important that such applications continue to function even after decades.
Mainframes in the Modern World
Mainframe computers are usually hidden from the public eye. However, they are the driving
force behind many essential day-to-day activities.
Many of the Fortune 1000 companies use a mainframe system.
Over 60% of all data available on the Internet is stored on mainframe systems.
There are at least 10,000 mainframe systems still running in the world.
Most banks in Hong Kong are supported by mainframes.
The yearly revenue generated from mainframe computing is still between 4 and 6 billion
US dollars. There are 2,000 to 3,000 mainframe systems shipped every year.
IBM is the largest mainframe vendor and probably the only large vendor still in the
market. IBM has continuously enhanced mainframe computing with the most current
technologies. The modern mainframe computer is no longer a room-sized computer system.
It now includes distributed computing, cloud computing, and virtualization in its
armoury.
The current IBM mainframe systems are called the System/Z series. The following lists the
core features of IBM zEnterprise System introduced in 2010:
The processor z196 chip is a quad-core 5.2GHz CISC processor.
The z196 system can support a maximum of 24 processors.
Each core may be assigned a specific role such as a typical Central Processor or
an Application Assist Processor for running Java/XML.
Maximum memory is 3TB.
Mainframe Architectures