
Parallel and Distributed Computing

CS 3006 (BCS-7A | BDS-7A)


Lecture 5
Danyal Farhat
FAST School of Computing
NUCES Lahore
Principles of Parallel Algorithm Design, Task Dependency Graphs, Granularity and Concurrency, Task Interaction Graphs
Steps in Parallel Algorithm Design
• Identification: Identifying portions of the work that can be
performed concurrently.
• Work-units are also known as tasks
• E.g., initializing two large arrays gives two independent tasks that can be performed in parallel (see the sketch at the end of this slide)

• Mapping: The process of mapping concurrent pieces of the work, or tasks, onto multiple processes running in parallel.
• Goal: balance load; maximize data locality
• Approach: static vs. dynamic task assignment
• A process is a logical agent that carries out computation on a physical processing element (processor).
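A minimal POSIX-threads sketch of the array-initialization example above (the size N and the function name init_array are illustrative assumptions, not from the slides): the two arrays are independent tasks, so each is filled by its own thread.

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double a[N], b[N];

/* One task: initialize a single array; it touches no shared data. */
static void *init_array(void *arg) {
    double *arr = (double *)arg;
    for (long i = 0; i < N; i++)
        arr[i] = 0.0;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, init_array, a);   /* task 1 */
    pthread_create(&t2, NULL, init_array, b);   /* task 2 */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("both arrays initialized\n");
    return 0;
}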
Steps in Parallel Algorithm Design (Cont.)
• Data Partitioning: Distributing the input, output, and intermediate data associated with the program (see the index sketch after this slide)
One way is to copy the whole data set to every processing node, which raises memory challenges for huge problem sizes
The other way is to give each processing node a fragment of the data, at the cost of communication overheads

• Defining Access Protocol: Managing accesses to data shared by multiple processors
The access protocol manages communication and synchronization
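A minimal sketch of the fragment-per-node scheme mentioned above (the function name block_range and the half-open block convention are assumptions for illustration): node k of p owns the block [lo, hi) of an n-element data set, so no node has to hold the whole input.

#include <stdio.h>

/* Compute the half-open index range [lo, hi) owned by node k of p. */
static void block_range(long n, int p, int k, long *lo, long *hi) {
    *lo = (n * (long)k) / p;
    *hi = (n * (long)(k + 1)) / p;
}

int main(void) {
    long lo, hi;
    for (int k = 0; k < 4; k++) {   /* p = 4 nodes, n = 10 items */
        block_range(10, 4, k, &lo, &hi);
        printf("node %d owns [%ld, %ld)\n", k, lo, hi);
    }
    return 0;
}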
Parallel Computing Example
Chess Player
• A parallel program to play chess might look at all the possible first moves it could make
• Each different first move could be explored by a different processor, to see how the game would continue from that point
• The results then have to be combined to figure out which is the best first move
• The famous IBM Deep Blue machine that beat Kasparov relied on:
brute-force computing power
a massively parallel design with 30 nodes, each node containing a 120 MHz P2SC microprocessor
Load Balance
• It is inefficient if many processors are idle while one processor has lots of work to do; this slows down the whole application
• Goal: the best utilization of the parallel processors
• This requires load balancing (parallel processors are typically symmetric); a sketch of dynamic task assignment follows this slide
• For example:
Web servers
Matrix multiplication
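A minimal sketch of dynamic task assignment (an illustration, not the slides' prescription): idle threads repeatedly claim the next task index from a shared atomic counter, so a fast thread naturally picks up more work than a slow one.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTASKS 100
static atomic_int next_task;   /* index of the next unclaimed task */

static void process(int t) { (void)t; /* placeholder for real work */ }

/* Each worker loops: claim a task, do it, repeat until none remain. */
static void *worker(void *arg) {
    (void)arg;
    int t;
    while ((t = atomic_fetch_add(&next_task, 1)) < NTASKS)
        process(t);   /* every task index is claimed exactly once */
    return NULL;
}

int main(void) {
    pthread_t w[4];
    for (int i = 0; i < 4; i++) pthread_create(&w[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(w[i], NULL);
    printf("all %d tasks done\n", NTASKS);
    return 0;
}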
Task Decomposition
Decomposition
• “The process of dividing a computation into smaller parts, some or all of
which may potentially be executed in parallel.”
Tasks
• Programmer-defined units of computation into which the main computation
is subdivided by means of decomposition
• Tasks can be of arbitrary size, but once defined, they are regarded as
indivisible units of computation.
• The tasks into which a problem is decomposed may not all be of the same size
• Simultaneous execution of multiple tasks is the key to reducing the time
required to solve the entire problem.
Multiplication of a dense matrix with a vector

• The problem can be decomposed into n tasks
• The computation of each element of the output vector y is independent of the other elements
• There are no control dependencies, so no task-dependency graph is needed
Matrix-Vector Multiplication (n x 1)
• So the multiplication program looks like:
for (row = 0; row < n; row++)
    y[row] = dot_product(get_row(A, row), get_col(b));
• It can be transformed to:
for (row = 0; row < n; row++)
    y[row] = create_thread(dot_product(get_row(A, row), get_col(b)));

• In this case, one may think of the thread as an instance of a function that returns before the function has finished executing
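create_thread() above is pseudocode; a concrete POSIX-threads sketch of the same fine-grained decomposition (one task per element of y; the 4x4 size and the identity-matrix test data are assumptions) might look like this:

#include <pthread.h>
#include <stdio.h>

#define N 4
static double A[N][N], b[N], y[N];

/* One task: the dot product of row `row` of A with the vector b. */
static void *row_task(void *arg) {
    long row = (long)arg;
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        sum += A[row][j] * b[j];
    y[row] = sum;
    return NULL;
}

int main(void) {
    pthread_t t[N];
    for (int i = 0; i < N; i++) { A[i][i] = 1.0; b[i] = i + 1.0; }  /* A = I */
    for (long r = 0; r < N; r++)
        pthread_create(&t[r], NULL, row_task, (void *)r);  /* n independent tasks */
    for (int r = 0; r < N; r++)
        pthread_join(t[r], NULL);
    for (int r = 0; r < N; r++)
        printf("y[%d] = %g\n", r, y[r]);   /* expect 1 2 3 4 */
    return 0;
}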
Matrix Multiplication n x n
for (row = 0; row < n; row++)
    for (column = 0; column < n; column++)
        c[row][column] = dot_product(get_row(a, row), get_col(b, column));

Multithreaded:
for (row = 0; row < n; row++)
    for (column = 0; column < n; column++)
        c[row][column] = create_thread(dot_product(get_row(a, row), get_col(b, column)));
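Spawning one thread per output element (n*n threads) is usually far too fine-grained. A common coarsening (an assumption, not the slide's prescription) is one thread per row of c, each computing a whole row of dot products; the 4x4 test data is illustrative.

#include <pthread.h>
#include <stdio.h>

#define N 4
static double a[N][N], b[N][N], c[N][N];

/* One task: compute every entry of row `row` of c = a * b. */
static void *row_of_c(void *arg) {
    long row = (long)arg;
    for (int col = 0; col < N; col++) {
        double sum = 0.0;
        for (int k = 0; k < N; k++)   /* dot_product(get_row(a,row), get_col(b,col)) */
            sum += a[row][k] * b[k][col];
        c[row][col] = sum;
    }
    return NULL;
}

int main(void) {
    pthread_t t[N];
    for (int i = 0; i < N; i++) { a[i][i] = 1.0; b[i][i] = 2.0; }  /* a = I, b = 2I */
    for (long r = 0; r < N; r++)
        pthread_create(&t[r], NULL, row_of_c, (void *)r);  /* n tasks, not n*n */
    for (int r = 0; r < N; r++)
        pthread_join(t[r], NULL);
    printf("c[0][0] = %g (expect 2)\n", c[0][0]);
    return 0;
}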
Task Dependency Graph
• The tasks in the previous examples are independent and
can be performed in any sequence.
• In most of the problems, there exist some sort of
dependencies between the tasks.
• An abstraction used to express such dependencies among
tasks and their relative order of execution is known as a
task-dependency graph.
Task Dependency Graph (Cont.)
• “It is a directed acyclic graph in which nodes are tasks and the directed edges indicate the dependencies between them”
• The task corresponding to a node can be executed when all
tasks connected to this node by incoming edges have
completed.
Some tasks may use data produced by other tasks and thus may need to wait for these tasks to finish execution
Database Query

Execution of the query:
MODEL = “CIVIC” AND YEAR = 2001 AND (COLOR = “GREEN” OR COLOR = “WHITE”)
Task Dependency Graph (Cont.)
[Figure: alternative task-dependency graphs for the database query]
Granularity
• The number and sizes of the tasks into which a problem is decomposed determine the granularity of the decomposition
Granularity here means the grain size of the tasks
A decomposition into a large number of small tasks is called fine-grained
A decomposition into a small number of large tasks is called coarse-grained
• For matrix-vector multiplication, the decomposition would
usually be considered fine-grained, although coarse-grained
could also be an option
Granularity (Cont.)
• The figure below shows a coarse-grained decomposition: each of the 3 tasks computes n/3 of the entries of the output vector of length n (a code sketch follows)
[Figure: matrix-vector multiplication decomposed into 3 coarse-grained tasks]
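A minimal sketch of that coarse-grained decomposition (an illustration; A, b, y, and N are as in the earlier matrix-vector sketch, and integer division splits y into three nearly equal blocks). Only the worker changes: main() now spawns 3 threads instead of N.

/* One coarse-grained task: compute the block y[k*N/3 .. (k+1)*N/3). */
static void *block_task(void *arg) {
    long k = (long)arg;   /* task id: 0, 1, or 2 */
    for (long row = k * N / 3; row < (k + 1) * N / 3; row++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            sum += A[row][j] * b[j];
        y[row] = sum;
    }
    return NULL;
}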
Maximum Degree of Concurrency
• “The maximum number of tasks that can be executed simultaneously in
a parallel program at any given time is known as its maximum degree of
concurrency.”

• It is usually less than the total number of tasks, due to dependencies.

• E.g., the maximum degree of concurrency in the task-graphs of Figure 3.3 is 4.

• Rule of thumb: For task-dependency graphs that are trees, the maximum
degree of concurrency is always equal to the number of leaves in the
tree
Maximum Degree of Concurrency (Cont.)
• Exercise: determine the maximum degree of concurrency of the given task-dependency graph [figure not included]
Average Degree of Concurrency
• A relatively better measure of the performance of a parallel program

• The average number of tasks that can run concurrently over the entire duration of execution of the program

• The ratio of the total amount of work to the critical-path length

• So, what is the critical path in a graph?
Critical Path
• Critical Path: The longest directed path between any pair of
start and finish nodes is known as the critical path.
• Critical Path Length: The sum of the weights of nodes along
this path
• the weight of a node is the size or the amount of work associated with
the corresponding task.
• A shorter critical path favors a higher average-degree of
concurrency.
• Both the maximum and the average degree of concurrency increase as tasks become smaller (finer)
Maximum and Average Degree of Concurrency

• Maximum degree of concurrency: 4 and 4
• Critical-path lengths: 27 and 34
• Total amount of work: 63 and 64
• Average degree of concurrency: 63/27 ≈ 2.33 and 64/34 ≈ 1.88
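A minimal sketch of how such numbers are computed, on a made-up 4-task graph rather than the graphs of Figure 3.3: total work is the sum of node weights, the critical-path length is the heaviest path from a start node to a finish node, and their ratio is the average degree of concurrency.

#include <stdio.h>

#define T 4
static int weight[T] = {10, 6, 6, 8};   /* work per task (made-up values)   */
static int pred[T][T];                  /* pred[i][j] = 1 means edge j -> i */

/* Longest-path finish time of task i (fine for tiny graphs; memoize for big ones). */
static int finish_time(int i) {
    int best = 0;
    for (int j = 0; j < T; j++)
        if (pred[i][j]) {
            int f = finish_time(j);
            if (f > best) best = f;
        }
    return best + weight[i];
}

int main(void) {
    pred[1][0] = pred[2][0] = 1;        /* task 0 feeds tasks 1 and 2 */
    pred[3][1] = pred[3][2] = 1;        /* tasks 1 and 2 feed task 3  */
    int work = 0, cpl = 0;
    for (int i = 0; i < T; i++) {
        work += weight[i];
        int f = finish_time(i);
        if (f > cpl) cpl = f;
    }
    printf("work = %d, critical path = %d, average concurrency = %.2f\n",
           work, cpl, (double)work / cpl);   /* prints 30, 24, 1.25 */
    return 0;
}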
Task Interaction Graph
• Depicts pattern of interaction between the tasks
• Dependency graphs only show how the output of one task becomes an input to a task at the next level
• How the tasks interact with each other to access distributed data is depicted only by task-interaction graphs
• The nodes in a task-interaction graph represent tasks
• The edges connect tasks that interact with each other
• Example: Dense matrix-vector multiplication
Task Interaction Graph (Cont.)
• The edges in a task-interaction graph are usually undirected, but directed edges can be used to indicate the direction of the flow of data, if it is unidirectional

• The edge-set of a task-interaction graph is usually a superset of the edge-set of the task-dependency graph

• In the database query processing example, the task-interaction graph is the same as the task-dependency graph
Task Interaction Graph (Cont.)
[Figure: task-interaction graph for dense matrix-vector multiplication]
Processes and Mapping
• The logical computing agent that performs tasks is called a process
• The mechanism by which tasks are assigned to processes for execution is called mapping
• Multiple tasks can be mapped onto a single process
• Independent tasks should be mapped onto different processes
• Tasks with high mutual interaction should be mapped onto a single process
• A parallel program must have several processes active and
simultaneously working on different tasks to gain a significant
speedup over the sequential program.
Processes and Mapping (Cont.)
[Figure: example mapping of tasks onto processes]
Processes vs Processors
• Processes are logical computing agents that perform tasks

• Processors are the hardware units that physically perform computations

• Depending on the problem, multiple processes can be mapped onto a single processor

• But, in most cases, there is a one-to-one correspondence between processors and processes
Additional Resources
Book: Introduction to Parallel Computing by Ananth Grama
and Anshul Gupta
• Chapter 3. Principles of Parallel Algorithm Design
Section 3.1: Preliminaries
Thank You!
