
Parallel and Distributed Computing

CS 3006 (BCS-7A | BDS-7A)


Lecture 5
Danyal Farhat
FAST School of Computing
NUCES Lahore
Principles of Parallel Algorithm Design, Task Dependency Graphs, Granularity and Concurrency, Task Interaction Graphs
Steps in Parallel Algorithm Design
• Identification: Identifying portions of the work that can be
performed concurrently.
• Work-units are also known as tasks
• E.g., initializing two large arrays gives two independent tasks that can be performed in parallel (see the sketch at the end of this slide)

• Mapping: The process of mapping concurrent pieces of the work, or tasks, onto multiple processes running in parallel.
• Goal: balance load; maximize data locality
• Approach: static vs. dynamic task assignment
• A process is a logical agent that carries out computation on a physical processing element (processor).
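A minimal POSIX-threads sketch of the array-initialization example above (the size N and the function name init_array are illustrative assumptions, not from the slides): the two arrays are independent tasks, so each is filled by its own thread.

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double a[N], b[N];

/* One task: initialize a single array; it touches no shared data. */
static void *init_array(void *arg) {
    double *arr = (double *)arg;
    for (long i = 0; i < N; i++)
        arr[i] = 0.0;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, init_array, a);   /* task 1 */
    pthread_create(&t2, NULL, init_array, b);   /* task 2 */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("both arrays initialized\n");
    return 0;
}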
Steps in Parallel Algorithm Design (Cont.)
• Data Partitioning: Distributing the input, output, and intermediate data associated with the program (see the index sketch after this slide)
One way is to copy the whole data set to every processing node, which raises memory challenges for huge problem sizes
The other way is to give each processing node a fragment of the data, at the cost of communication overheads

• Defining Access Protocol: Managing accesses to data shared by multiple processors
The access protocol manages communication and synchronization
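A minimal sketch of the fragment-per-node scheme mentioned above (the function name block_range and the half-open block convention are assumptions for illustration): node k of p owns the block [lo, hi) of an n-element data set, so no node has to hold the whole input.

#include <stdio.h>

/* Compute the half-open index range [lo, hi) owned by node k of p. */
static void block_range(long n, int p, int k, long *lo, long *hi) {
    *lo = (n * (long)k) / p;
    *hi = (n * (long)(k + 1)) / p;
}

int main(void) {
    long lo, hi;
    for (int k = 0; k < 4; k++) {   /* p = 4 nodes, n = 10 items */
        block_range(10, 4, k, &lo, &hi);
        printf("node %d owns [%ld, %ld)\n", k, lo, hi);
    }
    return 0;
}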
Parallel Computing Example
Chess Player
• A parallel program to play chess might look at all the possible first moves it could make
• Each different first move could be explored by a different processor, to see how the game would continue from that point
• The results then have to be combined to figure out which is the best first move
• The famous IBM Deep Blue machine that beat Kasparov relied on:
brute-force computing power
a massively parallel design with 30 nodes, each node containing a 120 MHz P2SC microprocessor
Load Balance
• It is inefficient if many processors are idle while one processor has lots of work to do; this slows down the whole application
• Goal: the best utilization of the parallel processors
• This requires load balancing (parallel processors are typically symmetric); a sketch of dynamic task assignment follows this slide
• For example:
Web servers
Matrix multiplication
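A minimal sketch of dynamic task assignment (an illustration, not the slides' prescription): idle threads repeatedly claim the next task index from a shared atomic counter, so a fast thread naturally picks up more work than a slow one.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTASKS 100
static atomic_int next_task;   /* index of the next unclaimed task */

static void process(int t) { (void)t; /* placeholder for real work */ }

/* Each worker loops: claim a task, do it, repeat until none remain. */
static void *worker(void *arg) {
    (void)arg;
    int t;
    while ((t = atomic_fetch_add(&next_task, 1)) < NTASKS)
        process(t);   /* every task index is claimed exactly once */
    return NULL;
}

int main(void) {
    pthread_t w[4];
    for (int i = 0; i < 4; i++) pthread_create(&w[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(w[i], NULL);
    printf("all %d tasks done\n", NTASKS);
    return 0;
}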
Task Decomposition
Decomposition
• “The process of dividing a computation into smaller parts, some or all of
which may potentially be executed in parallel.”
Tasks
• Programmer-defined units of computation into which the main computation
is subdivided by means of decomposition
• Tasks can be of arbitrary size, but once defined, they are regarded as
indivisible units of computation.
• The tasks into which a problem is decomposed may not all be of the same size
• Simultaneous execution of multiple tasks is the key to reducing the time
required to solve the entire problem.
Multiplication of a dense matrix with a vector

• The problem can be decomposed into n tasks
• The computation of each element of the output vector y is independent of the other elements
• There are no control dependencies, so no task-dependency graph is needed
Matrix-Vector Multiplication (n x 1)
• So the multiplication program looks like:
for (row = 0; row < n; row++)
    y[row] = dot_product(get_row(A, row), get_col(b));
• It can be transformed to:
for (row = 0; row < n; row++)
    y[row] = create_thread(dot_product(get_row(A, row), get_col(b)));

• In this case, one may think of the thread as an instance of a function that returns before the function has finished executing
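create_thread() above is pseudocode; a concrete POSIX-threads sketch of the same fine-grained decomposition (one task per element of y; the 4x4 size and the identity-matrix test data are assumptions) might look like this:

#include <pthread.h>
#include <stdio.h>

#define N 4
static double A[N][N], b[N], y[N];

/* One task: the dot product of row `row` of A with the vector b. */
static void *row_task(void *arg) {
    long row = (long)arg;
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        sum += A[row][j] * b[j];
    y[row] = sum;
    return NULL;
}

int main(void) {
    pthread_t t[N];
    for (int i = 0; i < N; i++) { A[i][i] = 1.0; b[i] = i + 1.0; }  /* A = I */
    for (long r = 0; r < N; r++)
        pthread_create(&t[r], NULL, row_task, (void *)r);  /* n independent tasks */
    for (int r = 0; r < N; r++)
        pthread_join(t[r], NULL);
    for (int r = 0; r < N; r++)
        printf("y[%d] = %g\n", r, y[r]);   /* expect 1 2 3 4 */
    return 0;
}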
Matrix Multiplication n x n
for (row = 0; row < n; row++)
    for (column = 0; column < n; column++)
        c[row][column] = dot_product(get_row(a, row), get_col(b, column));

Multithreaded:
for (row = 0; row < n; row++)
    for (column = 0; column < n; column++)
        c[row][column] = create_thread(dot_product(get_row(a, row), get_col(b, column)));
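Spawning one thread per output element (n*n threads) is usually far too fine-grained. A common coarsening (an assumption, not the slide's prescription) is one thread per row of c, each computing a whole row of dot products; the 4x4 test data is illustrative.

#include <pthread.h>
#include <stdio.h>

#define N 4
static double a[N][N], b[N][N], c[N][N];

/* One task: compute every entry of row `row` of c = a * b. */
static void *row_of_c(void *arg) {
    long row = (long)arg;
    for (int col = 0; col < N; col++) {
        double sum = 0.0;
        for (int k = 0; k < N; k++)   /* dot_product(get_row(a,row), get_col(b,col)) */
            sum += a[row][k] * b[k][col];
        c[row][col] = sum;
    }
    return NULL;
}

int main(void) {
    pthread_t t[N];
    for (int i = 0; i < N; i++) { a[i][i] = 1.0; b[i][i] = 2.0; }  /* a = I, b = 2I */
    for (long r = 0; r < N; r++)
        pthread_create(&t[r], NULL, row_of_c, (void *)r);  /* n tasks, not n*n */
    for (int r = 0; r < N; r++)
        pthread_join(t[r], NULL);
    printf("c[0][0] = %g (expect 2)\n", c[0][0]);
    return 0;
}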
Task Dependency Graph
• The tasks in the previous examples are independent and
can be performed in any sequence.
• In most of the problems, there exist some sort of
dependencies between the tasks.
• An abstraction used to express such dependencies among
tasks and their relative order of execution is known as a
task-dependency graph.
Task Dependency Graph (Cont.)
• “It is a directed acyclic graph in which nodes are tasks and the directed edges indicate the dependencies between them”
• The task corresponding to a node can be executed when all
tasks connected to this node by incoming edges have
completed.
Some tasks may use data produced by other tasks and thus may need to wait for these tasks to finish execution
Database Query

Execution of the query:
MODEL = “CIVIC” AND YEAR = 2001 AND (COLOR = “GREEN” OR COLOR = “WHITE”)
Task Dependency Graph (Cont.)
[Figure: alternative task-dependency graphs for the database query]
Granularity
• The number and sizes of the tasks into which a problem is decomposed determine the granularity of the decomposition
Granularity here means the grain size of the tasks
A decomposition into a large number of small tasks is called fine-grained
A decomposition into a small number of large tasks is called coarse-grained
• For matrix-vector multiplication, the decomposition would
usually be considered fine-grained, although coarse-grained
could also be an option
Granularity (Cont.)
• The figure below shows a coarse-grained decomposition: each of the 3 tasks computes n/3 of the entries of the output vector of length n (a code sketch follows)
[Figure: matrix-vector multiplication decomposed into 3 coarse-grained tasks]
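A minimal sketch of that coarse-grained decomposition (an illustration; A, b, y, and N are as in the earlier matrix-vector sketch, and integer division splits y into three nearly equal blocks). Only the worker changes: main() now spawns 3 threads instead of N.

/* One coarse-grained task: compute the block y[k*N/3 .. (k+1)*N/3). */
static void *block_task(void *arg) {
    long k = (long)arg;   /* task id: 0, 1, or 2 */
    for (long row = k * N / 3; row < (k + 1) * N / 3; row++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            sum += A[row][j] * b[j];
        y[row] = sum;
    }
    return NULL;
}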
Maximum Degree of Concurrency
• “The maximum number of tasks that can be executed simultaneously in
a parallel program at any given time is known as its maximum degree of
concurrency.”

• It is usually less than the total number of tasks, due to dependencies.

• E.g., the maximum degree of concurrency in the task-graphs of Figure 3.3 is 4.

• Rule of thumb: For task-dependency graphs that are trees, the maximum
degree of concurrency is always equal to the number of leaves in the
tree
Maximum Degree of Concurrency (Cont.)
• Exercise: determine the maximum degree of concurrency of the given task-dependency graph [figure not included]
Average Degree of Concurrency
• A relatively better measure of the performance of a parallel program

• The average number of tasks that can run concurrently over the entire duration of execution of the program

• The ratio of the total amount of work to the critical-path length

• So, what is the critical path in a graph?
Critical Path
• Critical Path: The longest directed path between any pair of
start and finish nodes is known as the critical path.
• Critical Path Length: The sum of the weights of nodes along
this path
• the weight of a node is the size or the amount of work associated with
the corresponding task.
• A shorter critical path favors a higher average-degree of
concurrency.
• Both the maximum and the average degree of concurrency increase as tasks become smaller (finer)
Maximum and Average Degree of Concurrency

• Maximum degree of concurrency: 4 and 4
• Critical-path lengths: 27 and 34
• Total amount of work: 63 and 64
• Average degree of concurrency: 63/27 ≈ 2.33 and 64/34 ≈ 1.88
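A minimal sketch of how such numbers are computed, on a made-up 4-task graph rather than the graphs of Figure 3.3: total work is the sum of node weights, the critical-path length is the heaviest path from a start node to a finish node, and their ratio is the average degree of concurrency.

#include <stdio.h>

#define T 4
static int weight[T] = {10, 6, 6, 8};   /* work per task (made-up values)   */
static int pred[T][T];                  /* pred[i][j] = 1 means edge j -> i */

/* Longest-path finish time of task i (fine for tiny graphs; memoize for big ones). */
static int finish_time(int i) {
    int best = 0;
    for (int j = 0; j < T; j++)
        if (pred[i][j]) {
            int f = finish_time(j);
            if (f > best) best = f;
        }
    return best + weight[i];
}

int main(void) {
    pred[1][0] = pred[2][0] = 1;        /* task 0 feeds tasks 1 and 2 */
    pred[3][1] = pred[3][2] = 1;        /* tasks 1 and 2 feed task 3  */
    int work = 0, cpl = 0;
    for (int i = 0; i < T; i++) {
        work += weight[i];
        int f = finish_time(i);
        if (f > cpl) cpl = f;
    }
    printf("work = %d, critical path = %d, average concurrency = %.2f\n",
           work, cpl, (double)work / cpl);   /* prints 30, 24, 1.25 */
    return 0;
}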
Task Interaction Graph
• Depicts pattern of interaction between the tasks
• Dependency graphs only show how the output of one task becomes an input to a task at the next level
• How the tasks interact with each other to access distributed data is depicted only by task-interaction graphs
• The nodes in a task-interaction graph represent tasks
• The edges connect tasks that interact with each other
• Example: Dense matrix-vector multiplication
Task Interaction Graph (Cont.)
• The edges in a task-interaction graph are usually undirected, but directed edges can be used to indicate the direction of the flow of data, if it is unidirectional

• The edge-set of a task-interaction graph is usually a superset of the edge-set of the task-dependency graph

• In the database query processing example, the task-interaction graph is the same as the task-dependency graph
Task Interaction Graph (Cont.)
[Figure: task-interaction graph for dense matrix-vector multiplication]
Processes and Mapping
• The logical computing agent that performs tasks is called a process
• The mechanism by which tasks are assigned to processes for execution is called mapping
• Multiple tasks can be mapped onto a single process
• Independent tasks should be mapped onto different processes
• Tasks with high mutual interaction should be mapped onto a single process
• A parallel program must have several processes active and
simultaneously working on different tasks to gain a significant
speedup over the sequential program.
Processes and Mapping (Cont.)
[Figure: example mapping of tasks onto processes]
Processes vs Processors
• Processes are logical computing agents that perform tasks

• Processors are the hardware units that physically perform computations

• Depending on the problem, multiple processes can be mapped onto a single processor

• But, in most cases, there is a one-to-one correspondence between processors and processes
Additional Resources
Book: Introduction to Parallel Computing by Ananth Grama
and Anshul Gupta
• Chapter 3. Principles of Parallel Algorithm Design
Section 3.1: Preliminaries
Thank You!
