BE - Computer Engineering - Semester 4 - 2019 - December - Computer Organization and Architecture CBCGS
(DEC 2019)
• Having a single large, fast memory is very costly; hence, different memories at different levels form the memory hierarchy.
c) Explain principle of locality of reference in detail. (5 M)
Ans:
• Locality of reference is the term used to describe the behaviour of programs, which tend to run in relatively small loops and access consecutive memory locations.
• The locality of reference principle comprises two components:
o Temporal locality.
o Spatial locality.
• Temporal locality: since programs have loops, the same instructions are required frequently, i.e. programs tend to use the most recently used information again and again.
• Conversely, if a piece of information in the cache is not used for a long time, it is less likely to be used again.
• This is known as the principle of temporal locality.
• Spatial Locality: Programs and the data accessed by the processor mostly reside
in consecutive memory locations.
• This means that the processor is likely to need code or data that are close to locations
already accessed.
• This is known as the principle of spatial Locality.
• The performance gains realized by a cache memory subsystem arise because, owing to these principles of locality, most memory accesses require zero wait states.
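To make the two principles concrete, here is a minimal C sketch (illustrative only, not from the question paper): the loop variables i and sum are reused on every iteration (temporal locality), while the array is traversed through consecutive memory locations (spatial locality).

    #include <stdio.h>

    int main(void) {
        int a[1024];
        for (int i = 0; i < 1024; i++)
            a[i] = i;                 /* consecutive writes: spatial locality */

        int sum = 0;
        for (int i = 0; i < 1024; i++)
            sum += a[i];              /* 'i' and 'sum' are reused every iteration
                                         (temporal locality); a[i] walks through
                                         consecutive locations (spatial locality) */
        printf("%d\n", sum);
        return 0;
    }

A cache exploits exactly this pattern: once one element's block has been fetched, its neighbours are already in the cache.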
e) Explain Superscalar Architecture. (5 M)
Ans:
• Superscalar processors are processors that have multiple execution units.
• Hence these processors can execute independent instructions simultaneously, and with the help of this parallelism the speed of the processor increases.
• It has been observed that the number of independent consecutive instructions is typically 2 to 5. Hence the instruction-issue degree in a superscalar processor is restricted to 2 to 5.
Pipelining in Superscalar Processor:
• A scalar RISC or CISC processor executes one instruction per cycle. Its performance can be improved with a superscalar architecture:
o Multiple instruction pipelines are used.
o Multiple instructions are issued for execution per cycle.
o Multiple results are generated per cycle.
Q.2)
a) A program having 10 instructions (without Branch and Call
instructions) is executed on non-pipeline and pipeline processors. All
instructions are of the same length, the pipeline has 4 stages, and the
time required by each stage is 1 nsec.
I. Calculate the time required to execute the program on non-pipeline
and pipeline processors.
II. Calculate the speedup. (10 M)
Ans:
I. Given: n = 10 instructions, k = 4 stages, t = 1 nsec
Execution time (pipelined) = (k + n − 1) × t
= (4 + 10 − 1) × 1
= 13 nsec.
Execution time (non-pipelined) = n × k × t
= 10 × 4 × 1
= 40 nsec.
II. Speedup = non-pipelined time / pipelined time = 40/13 ≈ 3.08 times.
Static Branch Prediction:
• Static prediction always predicts the same direction for the same branch during the whole program execution.
• It comprises hardware-fixed prediction and compiler-directed prediction.
• Simple hardware-fixed direction mechanisms can be:
o Predict always not taken
o Predict always taken
o Backward branches predicted taken, forward branches predicted not taken
• Sometimes a bit in the branch opcode allows the compiler to decide the
prediction direction.
Branch Target Buffer:
• The branch target buffer (BTB) stores branch and jump addresses, their target
addresses, and optionally prediction information.
• The BTB is accessed during IF stage.
Branch Address    Target Address    Prediction Bits
……                ……                ……
• In general, dynamic branch prediction gives better results than static branch prediction, but at the cost of increased hardware complexity.
One-bit Dynamic Branch Predictor:
Two-Bit Prediction:
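The figures for the one-bit and two-bit predictors are not reproduced here. As an illustration of the two-bit scheme, the following C sketch (a simplified model with an invented predict/update interface, not a real processor's implementation) uses a saturating counter, so a branch must mispredict twice in a row before the predicted direction flips:

    /* Two-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken. */
    typedef struct { int state; } two_bit_predictor;

    int predict(const two_bit_predictor *p) {
        return p->state >= 2;                    /* 1 means "predict taken" */
    }

    void update(two_bit_predictor *p, int taken) {
        if (taken  && p->state < 3) p->state++;  /* move toward "taken"     */
        if (!taken && p->state > 0) p->state--;  /* move toward "not taken" */
    }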
Q.3)
a) Explain page address translation with respect to virtual memory and
further explain TLB in detail. (10 M)
Ans:
Virtual Memory:-
• Virtual memory was introduced in order to increase the effective size of memory.
• A hardware unit called Memory Management Unit (MMU) translates Virtual
addresses into physical addresses.
• If the CPU wants data from main memory and it is not present in main memory, then the MMU causes the operating system to bring the data into memory from the disk.
• As the disk capacity is beyond the main-memory address space, the desired data address has to be translated from a virtual to a physical address; the MMU does this address translation.
(Figure: Virtual Memory Organization.)
Paging:
• Virtual Memory space is divided into equal size pages.
• Main memory space is divided into equal-size page frames; each frame can hold any page from virtual memory.
• When the CPU wants to access a page, it first looks into main memory. If the page is found in main memory, it is called a hit and the page is transferred from main memory to the CPU.
• If the CPU needs a page that is not present in main memory, it is called a page fault. The page then has to be loaded from virtual memory into main memory.
• There are different page replacement schemes such as FIFO, LRU, LFU, Random Etc.
• During page replacement, if the old page has been modified in main memory, it needs to be first copied back into virtual memory and then replaced. The CPU keeps track of such updated pages by maintaining a dirty bit for each page: when a page is updated in main memory its dirty bit is set, and such a dirty page is first copied into virtual memory and then replaced.
• When pages are loaded into main memory only when required by the CPU, this is called demand paging. Thus pages are loaded only after page faults.
• Hence, to access a page the CPU has to perform two memory operations:
• First access the page table to get information about where the page is stored in main memory, then access the main memory for the page. To solve this problem, the CPU copies the page-table information of the most recently used pages into the on-chip TLB (Translation Lookaside Buffer). Therefore, subsequent accesses to these pages are faster, as the information is provided by the TLB and the CPU need not access the page table.
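A minimal C sketch of this translation path (the page size, table layout, and linear TLB search are assumptions for illustration, not a real MMU): the virtual address is split into a virtual page number (VPN) and an offset, the TLB is checked first, and only on a TLB miss is the page table consulted.

    #include <stdint.h>

    #define PAGE_BITS   12                 /* assumed 4 KB pages */
    #define TLB_ENTRIES 8

    typedef struct { uint32_t vpn, frame; int valid; } tlb_entry;
    static tlb_entry tlb[TLB_ENTRIES];
    static uint32_t page_table[1024];      /* VPN -> frame (pages assumed resident) */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn    = vaddr >> PAGE_BITS;
        uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);

        for (int i = 0; i < TLB_ENTRIES; i++)       /* TLB hit: no page-table access */
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return (tlb[i].frame << PAGE_BITS) | offset;

        uint32_t frame = page_table[vpn];           /* TLB miss: read the page table */
        tlb[0] = (tlb_entry){ vpn, frame, 1 };      /* naive replacement of entry 0  */
        return (frame << PAGE_BITS) | offset;
    }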
b) What is a microprogram? Write a microprogram for the following operations:
I. ADD R1, M: Register R1 and memory location M are added and the
result is stored in register R1.
II. MUL R1, R2: Register R1 and register R2 are multiplied and the result
is stored in register R1. (10 M)
Ans:
• Microprogramming is the process of writing microcode for a microprocessor. Microcode is low-level code that defines how a microprocessor should function when it executes machine-language instructions.
• Typically, one machine-language instruction translates into several microcode instructions. On some computers, the microcode is stored in ROM and cannot be modified.
• Microprogram to ADD R1, M:

T-state   Operation             Microinstructions
T1        PC → MAR              PCout, MARin, Read, Clear Y, Set Cin, ADD, Zin
T2        M → MBR,              Zout, PCin; wait for memory fetch cycle
          PC ← PC + 1
T3        MBR → IR              MBRout, IRin
T4        R1 → X                R1out, Xin, CLRC
T5        M → ALU               Mout, ADD, Zin
T6        Z → R1                Zout, R1in
T7        Check for interrupt   (assuming an enabled interrupt is pending)
                                CLRX, SETC, SPout, SUB, Zin
…
T10       MDR → [SP]            Wait for memory access
T11       ISR addr → PC         PCin, ISRaddr-out
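To make the idea concrete, here is a hedged C sketch of how such a microprogram can be stored: each microinstruction is a control word whose bits drive the register-transfer signals, and the control memory is an array of such words stepped through one T-state at a time. The signal names and bit positions are invented for illustration.

    /* Hypothetical one-bit control signals packed into a control word. */
    enum {
        PC_OUT = 1 << 0, MAR_IN = 1 << 1, READ    = 1 << 2,  Z_IN  = 1 << 3,
        Z_OUT  = 1 << 4, PC_IN  = 1 << 5, MBR_OUT = 1 << 6,  IR_IN = 1 << 7,
        R1_OUT = 1 << 8, X_IN   = 1 << 9, ADD     = 1 << 10, R1_IN = 1 << 11,
        M_OUT  = 1 << 12
    };

    /* Control memory for the fetch + execute phase of ADD R1, M
       (one control word per T-state, mirroring the table above). */
    static const unsigned control_memory[] = {
        PC_OUT | MAR_IN | READ | ADD | Z_IN,  /* T1: PC -> MAR, start PC + 1 */
        Z_OUT | PC_IN,                        /* T2: M -> MBR, PC <- PC + 1  */
        MBR_OUT | IR_IN,                      /* T3: MBR -> IR               */
        R1_OUT | X_IN,                        /* T4: R1 -> X                 */
        M_OUT | ADD | Z_IN,                   /* T5: operand + X -> Z        */
        Z_OUT | R1_IN                         /* T6: Z -> R1                 */
    };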
Q.4)
a) Explain bus contention and different methods to resolve it. (10 M)
Ans:
• In a bus system, processors, memory modules and peripheral devices are attached to the bus. The bus can handle only one transaction at a time between a master and a slave. In case of multiple requests, the bus arbitration logic must be able to allocate or deallocate the bus, servicing one request at a time.
• Thus, such a bus is a time-shared or contention bus among multiple functional
modules. As only one transfer can take place at any time on the bus, the overall
performance of the system is limited by the bandwidth of the bus.
• When the number of processors contending to acquire the bus exceeds a limit, a single-bus architecture may become a major bottleneck. This may cause serious delays in servicing transactions.
• Aggregate data transfer demand should never exceed the capacity of the bus.
This problem can be countered to some extent by increasing the data rate of the
bus and by using a wider bus.
• One method of avoiding contention is a multiple-bus hierarchy.
Multiple-Bus Architecture:
• If a greater number of devices are connected to the bus, performance will suffer
due to the following reasons:
o In general, the more devices attached to the bus, the greater will be
propagation delay.
o The bus may become a bottleneck as the aggregate data transfer demand
approaches the capacity of the bus.
o This problem can be countered to some extent by increasing the data rate
the bus can carry and by using wider buses.
o Most computer systems use multiple buses. These buses are arranged in a hierarchy.
• In a pipelined processor,
Throughput > 1/latency
(a non-pipelined processor completes one instruction per latency period, so its throughput equals 1/latency; pipelining raises throughput above this bound).
Pipeline Hazards:
• Hazards occur when instructions read or write registers that are used by other instructions, or when instructions compete for hardware resources. The conflicts are divided into three categories:
o Structural Hazards (resource conflicts)
o Data Hazards (Data dependency conflicts)
o Branch difficulties (Control Hazards)
• Structural hazards: these hazards are caused, for example, by two instructions accessing memory at the same time. Such conflicts can be partly resolved by using separate instruction and data memories.
• It occurs when the processor’s hardware is not capable of executing all the
instructions in the pipeline simultaneously.
• Data hazards: this hazard arises when an instruction depends on the result of a previous instruction, but that result is not yet available.
• Branch hazards: branch instructions, particularly conditional branch instructions, create control dependencies between the branch instruction and the instructions that follow it, disrupting the fetch stage of the pipeline.
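For example, even a trivial pair of dependent statements creates a read-after-write data hazard when compiled to two back-to-back instructions (a C sketch of the dependence, not processor code):

    int dependent(int x) {
        int a = x + 1;   /* first instruction writes a                       */
        int b = a * 2;   /* second instruction reads a before the result has */
        return b;        /* been written back: a RAW hazard in the pipeline  */
    }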
Q.5)
a) Explain multicore processor architecture in detail. (10 M)
Ans:
• A multi-core processor is a computer processor integrated circuit with two or more
separate processing units, called cores, each of which reads and executes program
instructions, as if the computer had several processors.
• The instructions are ordinary CPU instructions (such as add, move data, and branch)
but the single processor can run instructions on separate cores at the same time,
increasing overall speed for programs that support multithreading or other parallel
computing techniques.
• Manufacturers typically integrate the cores onto a single integrated
circuit die (known as a chip multiprocessor or CMP) or onto multiple dies in a
single chip package. The microprocessors currently used in almost all personal
computers are multi-core.
• Just as with single-processor systems, cores in multi-core systems may implement architectures such as VLIW, superscalar, vector, or multithreading.
• Multi-core processors are widely used across many application domains,
including general-purpose, embedded, network, digital signal processing (DSP),
and graphics (GPU).
• The improvement in performance gained by the use of a multi-core processor depends very much on the software algorithms used and their implementation. In particular, possible gains are limited by the fraction of the software that can run in parallel simultaneously on multiple cores; this effect is described by Amdahl's law (a worked example follows this list).
• In the best case, so-called embarrassingly parallel problems may realize speedup
factors near the number of cores, or even more if the problem is split up enough to
fit within each core's cache(s), avoiding use of much slower main-system memory.
Most applications, however, are not accelerated so much unless programmers invest
a prohibitive amount of effort in re-factoring the whole problem.
• The parallelization of software is a significant ongoing topic of research.
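As a worked illustration of Amdahl's law mentioned above (the numbers are an assumed example, not from the question paper): if a fraction p of a program can be parallelized across n cores, the speedup is

    S(n) = \frac{1}{(1 - p) + p/n}

    \text{e.g. } p = 0.9,\ n = 8: \quad S = \frac{1}{0.1 + 0.9/8} = \frac{1}{0.2125} \approx 4.7

so even with 8 cores, the 10% serial fraction limits the speedup to well under 8.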
b) Explain Booth’s multiplication algorithm and perform (17)₁₀ × (−5)₁₀.
(10 M)
Ans:
Booth’s principle states that “the value of a series of 1’s in binary can be given as the weight of the bit preceding the series minus the weight of the last bit in the series.”
Given: since +17 needs 6 bits in two's complement, 6-bit registers are used:
M = +17 = (010001)₂
−M = (101111)₂
Q = −5 = (111011)₂

AC       Q        Q₋₁    M        count
000000   111011   0      010001   6
+101111                           (Q₀Q₋₁ = 10: AC ← AC − M)
101111   111011   0
110111   111101   1               5    (arithmetic shift right)
111011   111110   1               4    (Q₀Q₋₁ = 11: shift only)
+010001                           (Q₀Q₋₁ = 01: AC ← AC + M)
001100   111110   1
000110   011111   0               3    (arithmetic shift right)
+101111                           (Q₀Q₋₁ = 10: AC ← AC − M)
110101   011111   0
111010   101111   1               2    (arithmetic shift right)
111101   010111   1               1    (Q₀Q₋₁ = 11: shift only)
111110   101011   1               0    (Q₀Q₋₁ = 11: shift only)

Product = AC‖Q = (111110101011)₂ = −(000001010101)₂ = −(85)₁₀
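A compact C sketch of the same algorithm (the word width W and the register names mirror the table above; int is assumed to use two's complement), which reproduces 17 × (−5) = −85:

    #include <stdio.h>

    #define W 6                           /* register width in bits */

    int booth_multiply(int m, int q) {
        int mask = (1 << W) - 1;
        int ac  = 0;                      /* accumulator AC        */
        int Q   = q & mask;               /* multiplier register Q */
        int q_1 = 0;                      /* the Q(-1) bit         */
        int M   = m & mask;

        for (int count = 0; count < W; count++) {
            int q0 = Q & 1;
            if (q0 == 1 && q_1 == 0)          /* 10: AC <- AC - M */
                ac = (ac - M) & mask;
            else if (q0 == 0 && q_1 == 1)     /* 01: AC <- AC + M */
                ac = (ac + M) & mask;
            /* arithmetic shift right of AC : Q : Q(-1) */
            q_1 = Q & 1;
            Q  = ((Q >> 1) | ((ac & 1) << (W - 1))) & mask;
            ac = ((ac >> 1) | (ac & (1 << (W - 1)))) & mask;  /* keep sign bit */
        }

        int prod = (ac << W) | Q;             /* 2W-bit product AC‖Q */
        if (prod & (1 << (2 * W - 1)))        /* sign-extend         */
            prod -= 1 << (2 * W);
        return prod;
    }

    int main(void) {
        printf("%d\n", booth_multiply(17, -5));   /* prints -85 */
        return 0;
    }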
Programmed I/O
• In the programmed I/O method of interfacing, the CPU has direct control over I/O.
• The processor checks the status of the devices, issues read or write commands, and then transfers data. During the data transfer, the CPU waits for the I/O module to complete the operation, and hence this scheme wastes CPU time.
• The sequence of operations to be carried out in programmed I/O operation are:
o CPU requests for I/O operation.
o I/O module performs the said operation.
o I/O module updates the status bits.
o CPU checks these status bits periodically. The I/O module can neither inform the CPU directly nor interrupt it.
o CPU may wait for the operation to complete or may continue the
operation later.
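A hedged C sketch of this polling pattern (the register addresses and the READY bit are hypothetical memory-mapped I/O locations, not a real device):

    #include <stdint.h>

    #define STATUS_REG (*(volatile uint8_t *)0x4000)  /* hypothetical status port */
    #define DATA_REG   (*(volatile uint8_t *)0x4001)  /* hypothetical data port   */
    #define READY      0x01                           /* assumed "data ready" bit */

    /* Programmed I/O read: the CPU busy-waits on the status bit. */
    uint8_t pio_read_byte(void) {
        while ((STATUS_REG & READY) == 0)
            ;                   /* CPU does nothing useful while waiting  */
        return DATA_REG;        /* one byte transferred under CPU control */
    }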
Interrupt driven I/O
• Interrupt driven I/O overcomes the disadvantage of programmed I/O i.e. the CPU
waiting for I/O device.
• This disadvantage is overcome by the CPU not repeatedly checking whether the device is ready; instead, the I/O module interrupts the CPU when it is ready.
• The sequence of operations for interrupt Driven I/O is as below:
o CPU issues the read command to I/O device.
o I/O module gets data from peripheral while CPU does other work.
o Once the I/O module completes the data transfer from I/O device, it
interrupts CPU.
o On getting the interrupt, CPU requests data from the I/O module.
o I/O module transfers the data to CPU.
• The interrupt-driven I/O mechanism for transferring a block of data works as follows:
• After issuing the read command the CPU performs its work, but checks for the
interrupt after every instruction cycle.
• When CPU gets an interrupt, it performs the following operation in sequence:
o Save the context, i.e. the contents of the registers, on the stack.
o Process the interrupt by executing the corresponding ISR.
o Restore the register context from the stack.
Transferring a word of data
• The CPU issues a ‘READ’ command to the I/O device and then switches to some other program. The CPU may be working on several different programs.
• Once the I/O device is ready with the data in its data register, the I/O device signals an interrupt to the CPU.
• When the interrupt from the I/O device occurs, the CPU suspends execution of the current program, reads from the port, and then resumes execution of the suspended program.
Direct Memory Access (DMA)
• DMA stands for Direct Memory Access. The I/O can directly access the memory
using this method.
• Interrupt-driven and programmed I/O require active involvement of the CPU. Hence the transfer rate is limited and the CPU is kept busy with the transfer operation. DMA is the solution to this problem.
• The DMA controller takes over control of the bus from the CPU for I/O transfers.
• The address register is used to hold the address of the memory location from
which the data is to be transferred. There may be multiple address registers to
hold multiple addresses.
• The address may be incremented or decremented after every transfer based on
mode of operation.
• The data count register is used to keep a track of the number of bytes to be
transferred. The counter register is decremented after every transfer.
• The data register is used in a special case i.e. when the transfer of a block is to be
done from one memory location to another memory location.
• The DMA controller is initially programmed by the CPU with the count of bytes to be transferred, the address of the memory block for the data transfer, etc.
• While the CPU is programming the DMAC, the read and write lines work as inputs to the DMAC.
• Once the DMAC takes control of the system bus, i.e. transfers data between the memory and the I/O device, these read and write signals work as output signals.
• They are used to tell the memory whether the DMAC wants to read from or write to the memory, according to whether the data transfer is from memory to I/O or from I/O to memory.
DMA Transfer Modes:
• Single transfer mode: in this mode, the device is programmed to make only one byte transfer after getting control of the system bus.
• After transferring one byte the control of the bus will be returned back to the
CPU.
• The word count will be decremented and the address decremented or
incremented following each transfer.
• Block transfer mode: in this mode, the device is activated by DREQ or a software request and continues making transfers until a terminal count or an external end-of-process (EOP) is encountered.
• The advantage is that the I/O device gets data transferred at a much faster rate.
• Demand transfer mode: in this mode, the device continues making transfers until a terminal count or external EOP is encountered, or until DREQ goes inactive.
• Thus, transfer may continue until the I/O device has exhausted its data
handling capacity.
• Hidden transfer mode: in this mode, the DMA controller takes charge of the system bus and transfers data when the processor does not need the system bus.
• The processor does not even realize that this transfer is taking place.
• Hence these transfers are hidden from the processor.
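A hedged C sketch of how the CPU might program such a DMAC before handing over the bus (every register address, name, and mode encoding here is hypothetical, for illustration only):

    #include <stdint.h>

    /* Hypothetical memory-mapped DMAC registers. */
    #define DMA_ADDR  (*(volatile uint32_t *)0x5000)  /* memory start address   */
    #define DMA_COUNT (*(volatile uint32_t *)0x5004)  /* bytes left to transfer */
    #define DMA_MODE  (*(volatile uint32_t *)0x5008)  /* transfer mode select   */
    #define DMA_CTRL  (*(volatile uint32_t *)0x500C)  /* start/stop control     */

    #define MODE_BLOCK 0x2                            /* assumed: block transfer */
    #define CTRL_START 0x1

    void dma_block_transfer(uint32_t mem_addr, uint32_t nbytes) {
        DMA_ADDR  = mem_addr;    /* address register: stepped after each transfer */
        DMA_COUNT = nbytes;      /* count register: decremented per transfer      */
        DMA_MODE  = MODE_BLOCK;  /* hold the bus until the count reaches zero     */
        DMA_CTRL  = CTRL_START;  /* DMAC now moves the data without the CPU       */
    }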
Set Associative Cache Mapping
• In set-associative cache mapping, the cache is divided into a number of sets. Each set contains a number of lines.
• A given block maps to any line in set (i mod j), where i is the main-memory block number and j is the total number of sets in the cache.
• This form of mapping is an enhanced form of direct mapping where the drawbacks
of direct mapping are removed.
• Set-associative mapping addresses the problem of possible thrashing in the direct mapping method.
• It does this by saying that instead of having exactly one line that a block can map to in the cache, we group a few lines together, creating a set. A block in memory can then map to any one of the lines of a specific set.
• Set-associative mapping allows each index address in the cache to hold two or more words from main memory. Set-associative cache mapping combines the best of the direct and associative cache mapping techniques.
• For example, if there are 2 lines per set, it is called 2-way set-associative mapping, i.e. a given block can be in one of 2 lines of exactly one set.
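A small C sketch of the mapping arithmetic (the cache geometry is assumed for illustration): the main-memory block number i is reduced modulo the number of sets j to pick the set, and the remaining high bits form the tag.

    #include <stdint.h>

    #define BLOCK_SIZE 16    /* bytes per block (assumed)       */
    #define NUM_SETS   128   /* j: number of sets (assumed)     */
    /* a 2-way set-associative cache would have 2 lines per set */

    void split_address(uint32_t addr,
                       uint32_t *tag, uint32_t *set, uint32_t *offset) {
        uint32_t block = addr / BLOCK_SIZE;  /* i: main-memory block number */
        *offset = addr % BLOCK_SIZE;         /* byte within the block       */
        *set    = block % NUM_SETS;          /* i mod j selects the set     */
        *tag    = block / NUM_SETS;          /* tag identifies the block    */
    }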
c) Flynn’s Classification. (10 M)
Ans:
• The method introduced by Flynn for the classification of parallel processors is the most common. This classification is based on the number of instruction streams and data streams in the system. There may be single or multiple streams of each of these. Accordingly, Flynn classified parallel processing into four categories:
o Single instruction Single Data (SISD)
o Single instruction Multiple Data (SIMD)
o Multiple Instruction Single Data (MISD)
o Multiple Instruction Multiple Data (MIMD)
• SISD: in this case there is a single processor that executes one instruction at a time on a single data stream stored in memory.
• In fact, this type of processing can be called uniprocessing; hence uniprocessors fall into this category.
• The processing element accesses the data from the memory and performs the operation on this data as per the signals given by the control unit.
• SIMD: In this case the same instruction is given to multiple processing elements,
but different data.
• This kind of system is mainly used when many data items have to be operated on with the same operation.
• MISD: In case of MISD, there are multiple instruction streams and hence multiple
control units to decode these instructions.
• Each control unit takes a different instruction from a different module of the same memory.
• The data stream is single. In this case the data is taken by the first processing
element.
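As a rough illustration of the SIMD idea (a sketch only, not actual vector code): a single operation is applied element-wise to many data items, which a SIMD machine would perform on its multiple processing elements simultaneously.

    /* Same instruction (add), multiple data: a SIMD machine performs
       all four additions at once on separate processing elements.   */
    void vec_add(const int a[4], const int b[4], int c[4]) {
        for (int i = 0; i < 4; i++)
            c[i] = a[i] + b[i];
    }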
d) Control Unit of processor. (10 M)
Ans:
Microprogrammed Control Unit:
• Each instruction points to a corresponding location in the control memory that
loads the control signals in the control register.
• The control register is then read by a sequencing logic that issues the control
signals in a proper sequence.
• The implementation of the microprogrammed control unit is as follows:
• The instruction register (IR), status flags and condition codes are read by the sequencer, which generates the address of the control memory location for the corresponding instruction in the IR.
• This address is stored in the control address register that selects one of the
locations in the control memory having the corresponding control signals.
• These control signals are given to the microinstruction register, decoded, and then given to the individual components of the processor and the external devices.