Parallel Processing



Presented by:
Dishant Khosla
Asst. Professor
Parallel Processing
Parallel processing is a method of simultaneously breaking up and running
program tasks on multiple microprocessors, thereby reducing processing
time.
Parallel processing may be accomplished via a computer with two or more
processors or via a computer network.
Parallel processing is also called parallel computing.
Parallel processing is particularly useful when running programs that
perform complex computations, and it provides a viable option to the quest
for cheaper computing alternatives.
Supercomputers commonly have hundreds of thousands of
microprocessors for this purpose.
Parallel processing should not be confused with concurrency, which refers
to multiple tasks whose executions overlap in time but do not necessarily
run at the same instant.
Difference between Serial and Parallel Processing
The main difference between serial and parallel processing in
computer architecture is that serial processing performs a single
task at a time while parallel processing performs multiple tasks at
a time.
Computer architecture defines the functionality, organization, and
implementation of a computer system.
It explains how the computer system is designed and the
technologies it is compatible with.
The processor is one of the most essential components in the
computer system.
It executes instructions and completes the tasks assigned to it.
There are two main types of processing: serial processing and parallel
processing.
• Number of processors
A major difference between serial and parallel processing is that there is a single processor in
serial processing, but there are multiple processors in parallel processing.
• Performance
Since multiple processors work on a task simultaneously, the performance of parallel
processing is higher than that of serial processing.
• Work Load
In serial processing, the workload of the processor is higher. However, in parallel processing,
the workload per processor is lower. Thus, this is an important difference between serial and
parallel processing.
• Data transferring
Moreover, in serial processing, data is transferred bit by bit. However, in parallel
processing, data is transferred in byte form (8 bits at a time).
• Required time
Time taken is also a difference between serial and parallel processing. That is; serial
processing requires more time than parallel processing to complete a task.
• Cost
Furthermore, parallel processing is more costly than serial processing as it uses multiple
processors.
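The time difference above can be sketched with Python's standard multiprocessing module; the prime-counting task, the chunk boundaries, and the worker count are illustrative assumptions, not taken from the slides.

```python
# A sketch of serial vs. parallel processing of the same workload,
# using Python's standard multiprocessing module. The task and its
# parameters are illustrative assumptions.
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) -- a CPU-bound task."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    chunks = [(0, 5000), (5000, 10000), (10000, 15000), (15000, 20000)]

    # Serial processing: one processor handles every chunk in turn.
    serial = sum(count_primes(c) for c in chunks)

    # Parallel processing: each chunk goes to a separate processor.
    with Pool(processes=4) as pool:
        parallel = sum(pool.map(count_primes, chunks))

    assert serial == parallel  # same answer, less wall-clock time
    print(serial)
```

The answer is identical either way; only the wall-clock time differs, and only when real CPU cores are available.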
Flynn’s Hardware Taxonomy
Processor Organizations:
• Single instruction, single data (SISD) stream: Uniprocessor
• Single instruction, multiple data (SIMD) stream: Vector processor, Array processor
• Multiple instruction, single data (MISD) stream
• Multiple instruction, multiple data (MIMD) stream: Shared memory, Distributed memory
• Flynn has classified the computer systems based on parallelism in the
instructions and in the data streams. These are:
1. Single instruction stream, single data stream (SISD).
2. Single instruction stream, multiple data stream (SIMD).
3. Multiple instruction streams, single data stream (MISD).
4. Multiple instruction stream, multiple data stream (MIMD).
1. Single-instruction, single-data (SISD) systems –
• An SISD computing system is a uniprocessor machine which is capable of executing a single instruction,
operating on a single data stream.
• In SISD, machine instructions are processed in a sequential manner and computers adopting this model are
popularly called sequential computers.
• Most conventional computers have SISD architecture.
• All the instructions and data to be processed have to be stored in primary memory.
• The speed of the processing element in the SISD model is limited by the rate at which the
computer can transfer information internally.
• Dominant representative SISD systems are the IBM PC and workstations.
2. Single-instruction, multiple-data (SIMD) systems –
• An SIMD system is a multiprocessor machine capable of executing the same instruction on all the CPUs but
operating on different data streams.
• Machines based on an SIMD model are well suited to scientific computing since they involve lots of vector
and matrix operations.
• The data elements of vectors are organized into multiple sets (N sets for an N-PE system) so that the
instruction can be passed to all the processing elements (PEs) and each PE can process one data set.
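The SIMD idea can be sketched in plain Python (standing in for real vector hardware); `simd_apply`, the chunking scheme, and the doubling instruction are all illustrative assumptions.

```python
# Sketch of the SIMD idea: one instruction, many data streams.
# Each "PE" applies the same operation to its own slice of the vector.
# (Plain Python stands in for vector hardware; purely illustrative.)

def simd_apply(instruction, vector, n_pes):
    """Split vector into n_pes sets and apply one instruction to each."""
    size = -(-len(vector) // n_pes)  # ceiling division
    chunks = [vector[i:i + size] for i in range(0, len(vector), size)]
    # Every PE executes the *same* instruction on *different* data.
    results = [[instruction(x) for x in chunk] for chunk in chunks]
    return [x for chunk in results for x in chunk]  # gather the results

doubled = simd_apply(lambda x: 2 * x, [1, 2, 3, 4, 5, 6, 7, 8], n_pes=4)
print(doubled)  # [2, 4, 6, 8, 10, 12, 14, 16]
```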
3. Multiple-instruction, single-data (MISD) systems –
• An MISD computing system is a multiprocessor machine capable of executing different instructions on
different PEs but all of them operating on the same dataset .
• Example Z = sin(x)+cos(x)+tan(x)
• The system performs different operations on the same data set.
• Machines built using the MISD model are not useful for most applications; a few have been built, but
none are available commercially.
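The slide's example Z = sin(x) + cos(x) + tan(x) can be sketched as follows, with each standard-library function playing the role of one PE's instruction on the same data item; the helper name `misd_z` is an illustrative assumption.

```python
# Sketch of the MISD idea using the slide's example
# Z = sin(x) + cos(x) + tan(x): different instructions (sin, cos, tan)
# all operate on the same data item x, and the results are combined.
import math

def misd_z(x):
    instructions = [math.sin, math.cos, math.tan]   # one per PE
    partials = [instr(x) for instr in instructions] # same data, different ops
    return sum(partials)

print(misd_z(0.0))  # sin(0) + cos(0) + tan(0) = 1.0
```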
4. Multiple-instruction, multiple-data (MIMD) systems –
• An MIMD system is a multiprocessor machine which is capable of executing multiple instructions on multiple
data sets.
• Each PE in the MIMD model has separate instruction and data streams; therefore, machines built using this
model are capable of handling any kind of application.
• Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.
MIMD Types

• Shared-memory MIMD
• Distributed-memory MIMD
Shared-memory MIMD
• In the shared memory MIMD model (tightly coupled multiprocessor systems),
all the PEs are connected to a single global memory and they all have access
to it.
• The communication between PEs in this model takes place through the
shared memory, modification of the data stored in the global memory by one
PE is visible to all other PEs.
• Dominant representative shared memory MIMD systems are Silicon Graphics
machines and Sun/IBM’s SMP(Symmetric Multi-Processing).
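As a rough sketch (not a real SMP machine), Python threads can play the role of PEs and a shared dictionary the role of the single global memory; all names here are illustrative assumptions.

```python
# Sketch of shared-memory MIMD: several "PEs" (threads) read and write
# one global memory; a write by one PE is visible to all the others.
# Threads and a dict stand in for processors and global memory.
import threading

global_memory = {"total": 0}
lock = threading.Lock()  # serialize writes to the shared location

def pe_task(pe_id, values):
    # Each PE runs its own instruction stream on its own data (MIMD) ...
    partial = sum(values)
    # ... but the results land in the single shared memory.
    with lock:
        global_memory["total"] += partial

threads = [
    threading.Thread(target=pe_task, args=(i, range(i * 10, i * 10 + 10)))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(global_memory["total"])  # sum of 0..39 = 780
```

The lock illustrates why shared-memory programming is convenient but needs care: every PE touches the same location.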
Distributed-memory MIMD
• In distributed-memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local
memory.
• The communication between PEs in this model takes place through the interconnection network
(the inter process communication channel, or IPC).
• The network connecting the PEs can be configured as a tree, a mesh, or another topology in accordance
with the requirements.
• The shared-memory MIMD architecture is easier to program but is less tolerant of failures and
harder to extend than the distributed-memory MIMD model.
• Failures in a shared-memory MIMD system affect the entire system, whereas this is not the case in the
distributed model, in which each PE can be easily isolated.
• Moreover, shared memory MIMD architectures are less likely to scale because the addition of more
PEs leads to memory contention.
• This is a situation that does not happen in the case of distributed memory, in which each PE has its
own memory.
• In practice, and in view of these user requirements, the distributed-memory MIMD architecture is
generally considered superior to the shared-memory model.
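A rough sketch of this message-passing style, with threads standing in for nodes and a `queue.Queue` standing in for the interconnection network (the IPC channel); this is an analogy, not a real cluster.

```python
# Sketch of distributed-memory MIMD: each "PE" keeps its own local
# memory and communicates only by passing messages over a channel.
# Threads plus a Queue stand in for nodes and the interconnection
# network (IPC); purely illustrative.
import threading
import queue

network = queue.Queue()  # the interconnection network / IPC channel

def node(node_id, values):
    local_memory = list(values)      # private to this PE
    partial = sum(local_memory)      # compute locally
    network.put((node_id, partial))  # send the result as a message

nodes = [threading.Thread(target=node, args=(i, range(i, i + 5)))
         for i in range(3)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()

# A "master" node gathers the messages; no memory is ever shared.
results = dict(network.get() for _ in range(3))
print(sum(results.values()))  # 10 + 15 + 20 = 45
```

Because each node's memory is private, a crashed node can be isolated without corrupting the others' state, which is the fault-tolerance advantage noted above.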
APPLICATIONS OF PARALLEL PROCESSING
Parallel computing is an evolution of serial computing that attempts to emulate
what has always been the state of affairs in the natural world. In the natural
world, it is quite common to find many complex, interrelated events happening
at the same time. Examples of concurrent processing in natural and man-made
environments include:
• Building a shopping mall
• Automobile assembly line
• Predicting results of chemical and nuclear reactions
• Daily operations within a business
• Weather forecasting
• DNA structures of various species
Cache Memory in Computer Organization
• Cache Memory is a special very high-speed memory.
• It is used to speed up the system and synchronize the memory with the high-speed CPU.
• Cache memory is costlier than main memory or disk memory but more economical than
CPU registers.
• Cache memory is an extremely fast memory type that acts as a buffer between RAM
and the CPU.
• It holds frequently requested data and instructions so that they are immediately
available to the CPU when needed.
• Cache memory is used to reduce the average time to access data from the Main
memory.
• The cache is a smaller and faster memory which stores copies of the data from
frequently used main memory locations.
• There are various different independent caches in a CPU, which store instructions
and data.
Levels of memory
• Level 1 or Register –
Registers hold the data and instructions that the CPU is working on at that moment.
Commonly used registers include the accumulator, the program counter, and address
registers.
• Level 2 or Cache memory –
It is a very fast memory with a shorter access time than main memory, where data is
temporarily stored for faster access.
• Level 3 or Main Memory –
It is the memory on which the computer currently works. It is smaller than secondary
memory, and once power is off, data no longer stays in this memory.
• Level 4 or Secondary Memory –
It is external memory which is not as fast as main memory, but data stays in it
permanently.
Cache Performance:
When the processor needs to read or write data at a particular location in main
memory, it first checks for a corresponding entry in the cache.
• If the processor finds that the memory location is in the cache, a cache hit has
occurred and data is read from cache.
• If the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data from
main memory, then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity
called Hit ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
We can improve cache performance by using a larger cache block size and higher
associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the
cache.
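The hit-ratio formula can be exercised with a tiny cache simulation; the LRU replacement policy, the access trace, and the capacity are illustrative assumptions, not taken from the slides.

```python
# Tiny cache simulation to exercise Hit ratio = hits / (hits + misses).
# The LRU policy, trace, and capacity are illustrative assumptions.
from collections import OrderedDict

def run_trace(trace, capacity):
    cache = OrderedDict()  # LRU order: most recently used at the end
    hits = misses = 0
    for addr in trace:
        if addr in cache:
            hits += 1                      # cache hit: read from cache
            cache.move_to_end(addr)
        else:
            misses += 1                    # cache miss: fetch from memory
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / (hits + misses)

# Repeated accesses to a few hot locations give a high hit ratio.
trace = [0, 1, 2, 0, 1, 2, 0, 1, 2, 3]
print(run_trace(trace, capacity=4))  # 6 hits / 10 accesses = 0.6
```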
Application of Cache Memory:
1. Usually, the cache memory can store a reasonable number of blocks at any given
time, but this number is small compared to the total number of blocks in the main
memory.
2. The correspondence between the main memory blocks and those in the cache is
specified by a mapping function.
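One common choice of mapping function is direct mapping, in which main-memory block j always maps to cache line j mod (number of lines). The slides do not fix a particular function, so this is a sketch of one possibility.

```python
# Sketch of a direct-mapped cache mapping function: main-memory
# block j always maps to cache line j mod num_lines. Direct mapping
# is one common choice; the slides leave the function unspecified.

def cache_line(block_number, num_lines):
    return block_number % num_lines

num_lines = 8
for block in (0, 8, 16, 5, 13):
    print(block, "->", cache_line(block, num_lines))
# Blocks 0, 8 and 16 all contend for line 0; blocks 5 and 13 for line 5.
```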
Types of Cache :
• Primary Cache –
A primary cache is always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers.
• Secondary Cache –
Secondary cache is placed between the primary cache and the rest of the memory. It
is referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed on the
processor chip.
Cache Coherence
• In computer architecture, cache coherence is the uniformity of shared
resource data that ends up stored in multiple local caches.
• When clients in a system maintain caches of a common memory resource,
problems may arise with incoherent data, which is particularly the case with
CPUs in a multiprocessing system.
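The incoherence problem can be sketched with two dictionaries standing in for the CPUs' local caches; the write-through-plus-invalidation behavior shown is one illustrative choice among coherence strategies.

```python
# Sketch of the cache-coherence problem: two CPUs cache the same
# memory word; one CPU's write leaves the other holding a stale copy
# until a coherence action (here, invalidation) takes place.
# Dicts stand in for memory and caches; purely illustrative.

memory = {"x": 1}
cache_cpu0 = {"x": memory["x"]}  # CPU 0 caches x
cache_cpu1 = {"x": memory["x"]}  # CPU 1 caches x

# CPU 0 writes through to main memory ...
cache_cpu0["x"] = 42
memory["x"] = 42

# ... but CPU 1 still sees its stale local copy: the caches are incoherent.
assert cache_cpu1["x"] == 1 and memory["x"] == 42

# A coherence protocol fixes this, e.g. by invalidating CPU 1's copy so
# that its next read misses and refetches the up-to-date value.
del cache_cpu1["x"]
cache_cpu1["x"] = memory["x"]
print(cache_cpu1["x"])  # 42: the caches agree again
```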
Uniform Memory Access (UMA)
• In UMA, a single memory controller is used. Uniform Memory Access is
slower than Non-uniform Memory Access.
• In Uniform Memory Access, bandwidth is more limited than in
Non-uniform Memory Access.
• There are 3 types of buses used in Uniform Memory Access: single,
multiple, and crossbar. It is applicable for general-purpose applications and
time-sharing applications.
Non-uniform Memory Access (NUMA)
In NUMA, multiple memory controllers are used.
Non-uniform Memory Access is faster than Uniform Memory Access. There
are 2 types of buses used: tree and hierarchical.
Non-uniform Memory Access is applicable for real-time applications and
time-critical applications.
UMA vs. NUMA:
1. UMA stands for Uniform Memory Access; NUMA stands for Non-uniform Memory Access.
2. In UMA, a single memory controller is used; in NUMA, multiple memory controllers are used.
3. UMA is slower than NUMA.
4. UMA has limited bandwidth; NUMA has more bandwidth than UMA.
5. UMA is applicable for general-purpose and time-sharing applications; NUMA is applicable for real-time and time-critical applications.
6. In UMA, memory access time is balanced (equal); in NUMA, memory access time is not equal.
7. There are 3 types of buses used in UMA (single, multiple, and crossbar); NUMA uses 2 types of buses (tree and hierarchical).
