M1 Data Structures
Ullman
Syllabus
Module I (11 hours)
Review of Data Types - Scalar Types - Primitive types - Enumerated types - Subranges - Arrays - Sparse matrices - representation - Records - Complexity of Algorithms - Time & Space Complexity of Algorithms - Recursion: Recursive algorithms - Analysis of Recursive algorithms

Module II (18 hours)
Linear Data Structures - Stacks - Queues - Lists - Deques - Linked List - singly, doubly linked and circular lists - Application of linked lists - Polynomial Manipulation - Stack & Queue implementation using Array & Linked List - Typical problems - Conversion of infix to postfix - Evaluation of postfix expression - Priority queues

Module III (18 hours)
Non Linear Structures - Graphs - Trees - Graph and Tree implementation using array and Linked List - Binary trees - Binary tree traversals - pre-order, in-order and post-order - Threaded binary trees - Binary Search trees - AVL trees - B trees and B+ trees - Graph traversals - DFS, BFS - Shortest path - Dijkstra's algorithm - Minimum spanning tree - Kruskal's algorithm, Prim's algorithm

Module IV (18 hours)
Searching - Sequential Search - Searching Arrays and Linked Lists - Binary Searching - Searching arrays and Binary Search Trees - Hashing - Open & Closed Hashing - Hash functions - Resolution of Collision - Sorting - n^2 Sorts - Bubble Sort - Insertion Sort - Selection Sort - n log n Sorts - Quick Sort - Heap Sort - Merge Sort - External Sort - Merge Files
CS09 303 Data Structures - Module 1
References
Text Books
Aho A.V, Hopcroft J.E. & Ullman J.D, Data Structures and Algorithms, Addison-Wesley

Reference Books
1. Sahni S, Data Structures, Algorithms and Applications in C++, McGraw-Hill
2. Wirth N, Algorithms + Data Structures = Programs, Prentice Hall
3. Cormen T.H, Leiserson C.E & Rivest R.L, Introduction to Algorithms in C++, Thomson Books
4. Horowitz E & Sahni S, Fundamentals of Computer Algorithms
5. Deshpande P.S & Kakde O.G, C and Data Structures, Dreamtech India Pvt. Ltd.
Module 1
Introduction
Review of Data Types
  Scalar types
    Primitive types
    Enumerated types
    Subrange types
  Arrays - representation, sparse matrices
  Records - representation
  Sets - representation
Complexity of Algorithms
  Time complexity
  Space complexity
Recursion
  Recursive algorithms
  Analysis of recursive algorithms
Problems to Programs
1. Problem formulation and specification
2. Design of the solution
3. Implementation
4. Testing and documentation
5. Evaluation of the solution
Problems to Programs
1. Identify the problem
2. Build a formal model (mathematical model, then an informal algorithm)
3. Check for existing programs for the given problem
4. Write the algorithm in pseudo-language and refine it stepwise
5. Implementation
Algorithm
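Finite sequence of instructions
Each instruction has a clear meaning
Each instruction can be performed with a finite amount of effort
Each instruction can be performed in a finite length of time (the algorithm never enters an infinite loop on any input)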
Algorithm Specification
Pascal language
Pseudo-language: constructs of a programming language + informal English statements

ADT: mathematical model (together with the various operations defined on the model); used in the design phase.
Data structure: logical model (used to represent the mathematical model underlying an ADT); used in the implementation phase.
Both are models with a collection of operations.
ADT
The model defines an abstract view of the problem. This implies that the model focuses only on the problem and tries to define its properties. These properties include the data that are affected and the operations that are identified by the problem.
As an example, consider the administration of employees in an institution. What employee information is needed by the administration? What tasks should be allowed?
Employees are real persons who can be characterized with many properties; a few are: name, size, date of birth, social number, room number, hobbies.
Only some of them are problem specific. Consequently, you create a model of an employee for the problem. This model includes only the properties that are needed to fulfill the requirements of the administration, for instance name, date of birth and social number. These properties are called the data of the (employee) model. Now you have described real persons with the help of an abstract employee.
There must be some operations defined with which the administration is able to handle the abstract employees. For example, there must be an operation which allows you to create a new employee once a new person enters the institution. You also have to identify the operations that may be performed on an abstract employee, and you decide to allow access to the employees' data only through these associated operations.
Data Abstraction
Abstraction is the structuring of a nebulous (not properly defined) problem into well-defined entities by defining their data and operations. These entities combine data and operations.
An entity with the properties just described is called an abstract data type (ADT). An ADT consists of an abstract data structure and operations. Only the operations are viewable from the outside.
Once a new employee is "created", the data structure is filled with actual values; you then have an instance of an abstract employee. As many instances of an abstract employee as needed can be created to describe every real employed person.
An abstract data type (ADT) is characterized by the following properties:
1. It exports a type.
2. It exports a set of operations. This set is called the interface.
3. Operations of the interface are the one and only access mechanism to the type's data structure.
4. Axioms and preconditions define the application domain of the type.
Example: defining an ADT for complex numbers. A complex number consists of two parts: a real part and an imaginary part. Both parts are represented by real numbers. Complex numbers define several operations: addition, subtraction, multiplication, division, etc.
To represent a complex number, it is necessary to define the data structure to be used by its ADT. There are two possibilities:
1. Both parts are stored in a two-valued array, where the first value is the real part and the second value the imaginary part of the complex number. If x denotes the real part and y the imaginary part, you could access them by array subscripting: x=c[0] and y=c[1].
2. Both parts are stored in a two-valued record. If the element name of the real part is r and that of the imaginary part is i, x and y can be obtained with: x=c.r and y=c.i.
The ADT definition also says that for each access to the data structure, there must be an operation defined. The addition of two complex numbers requires you to perform an addition for each part. Consequently, you must access the value of each part, and the way you access it differs between the two representations.
By providing an operation "add" you can hide these details from its actual use. In an application context you simply "add two complex numbers" regardless of how this functionality is actually achieved. Once you have created an ADT for complex numbers, say Complex, it can be used in the same way as well-known data types such as integers.
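The two representations of the complex-number ADT can be sketched as follows. This is a minimal illustration in Python rather than the Pascal used in these slides; the class names ComplexArray and ComplexRecord and the method names real, imag and add are hypothetical, chosen only for this example.

```python
class ComplexArray:
    """Complex number stored as a two-element list: [real, imaginary]."""

    def __init__(self, real, imag):
        self._c = [real, imag]

    def real(self):
        return self._c[0]

    def imag(self):
        return self._c[1]

    def add(self, other):
        # Uses only the interface of `other`, not its internal layout.
        return ComplexArray(self.real() + other.real(),
                            self.imag() + other.imag())


class ComplexRecord:
    """Complex number stored as a record with fields r and i."""

    def __init__(self, real, imag):
        self._r = real
        self._i = imag

    def real(self):
        return self._r

    def imag(self):
        return self._i

    def add(self, other):
        return ComplexRecord(self.real() + other.real(),
                             self.imag() + other.imag())


# The caller adds two complex numbers the same way for both representations.
a = ComplexArray(1.0, 2.0).add(ComplexArray(3.0, 4.0))
b = ComplexRecord(1.0, 2.0).add(ComplexRecord(3.0, 4.0))
print(a.real(), a.imag())  # 4.0 6.0
print(b.real(), b.imag())  # 4.0 6.0
```

Because the calling code touches only real, imag and add, switching from one representation to the other does not change it.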
Abstract Data Type (ADT)
Mathematical model with a collection of operations defined on that model
Generalization of primitive data types
E.g. sets of integers together with the operations of union, intersection and set difference
E.g. LIST (of integers), with the operations:
  Make list empty
  Get first member of list, return null if empty
  Get next member of list, return null if empty
  Insert integer into list
Implementation of an ADT is a translation, into statements of a programming language, of the declaration that defines a variable to be of that ADT, plus a procedure in that language for each operation of the ADT. An implementation chooses a data structure to represent the ADT, i.e., a data structure is used to represent the mathematical model underlying an ADT.
BOOLEAN
Values: FALSE, TRUE
Operations: NOT, AND, OR, assignment
Predefined functions: PRED, SUCC, ORD
Predefined procedures: WRITE, WRITELN

CHAR
Values: space < '!' < ... < '0' < '1' < ... < '9' < ':' < ';' < '<' < '=' < '>' < '?' < '@' < 'A' < 'B' < ... < 'Z' < ... < 'a' < 'b' < ... < 'z'
Operations: assignment, comparison with relational operators
Predefined functions: PRED, SUCC, ORD
Predefined procedures: READ, READLN, WRITE, WRITELN

INTEGER
Values: -32768, ..., -1, 0, 1, 2, 3, ..., 32767 (= MAXINT)
Operations: +, -, *, DIV, MOD, assignment, comparison with relational operators
Predefined functions: PRED, SUCC, ORD, ABS, SQR, SQRT, ODD, CHR
Predefined procedures: READ, READLN, WRITE, WRITELN

REAL
Values: ratios of integers
Operations: +, -, *, /, assignment, comparison
Predefined functions: ABS, SQR, SQRT, ROUND, TRUNC
Predefined procedures: READ, READLN, WRITE, WRITELN
CHR
The chr or character position function returns the character associated with a given ASCII value, e.g. chr( 65 ) returns the character A.

ORD
The ord or ordinal function returns the ASCII value of a given character. In essence, it is the inverse of the chr function. Ordinal data types are those which have a predefined, known set of values, where each value in the set is one greater than the previous one. Characters and integers are thus ordinal data types. ord( 'C' ) returns the value 67.

SUCC
The successor function determines the next value or symbol in the set, thus succ( 'd' ) returns e.

PRED
The predecessor function determines the previous value or symbol in the set, thus pred( 'd' ) returns c.

ord(false) returns 0
ord(true) returns 1
ord(21) returns 21
ord('A') returns 65
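The behaviour of these ordinal functions on characters can be tried out with a short sketch. It is written in Python rather than the slides' Pascal: Python provides chr and ord as built-ins, while succ and pred below are hypothetical helpers defined only for this illustration.

```python
def succ(ch):
    """Next character in the ASCII ordering (mirrors Pascal's SUCC on char)."""
    return chr(ord(ch) + 1)


def pred(ch):
    """Previous character in the ASCII ordering (mirrors Pascal's PRED on char)."""
    return chr(ord(ch) - 1)


print(chr(65))     # A
print(ord('C'))    # 67
print(succ('d'))   # e
print(pred('d'))   # c
# Python's ord accepts only characters; int() gives the ordinal of a Boolean.
print(int(False), int(True))  # 0 1
```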
var
  School_days: Work_days;
  Free_days: Week_end;
  Worst_days: Monday..Thursday; { user defined subrange type }
type
  beverage = ( coffee, tea, cola, soda, milk, water );
  color = ( green, red, yellow, blue, black, white );
var
  drink : beverage;
  chair : color;
begin
  drink := coffee;
  chair := green;
  if chair = yellow then
    drink := tea;
end.
SUBRANGE TYPES
Identifiers can be defined so that they have a restricted range of values of a given ordinal type. The syntax is:
  Identifier: First value..Last value;
These constructs are called subrange types. Examples:
var
  num: -10..19;           { num has integer values -10 to 19 inclusive }
  alphabet: 'A'..'Z';     { alphabet is char with values 'A' to 'Z' }
Subranges
type name = <constant1> .. <constant2>;
e.g.
type range = low..high;
type digit = 0..99;
Cell
Basic building block of data structures
Capable of holding a value
Data Structure
Created by giving names to aggregates of cells.
Arrays
Homogeneous, random-access structure
General form: name : array[indextype] of celltype
Example: var A : array[0..99] of char;
An array is a sequence of cells of a given type, often referred to as the celltype.
The indextype can be an enumerated data type or a subrange.
Records
Collection of cells called fields
Fields may be of dissimilar types
Records can be grouped into arrays

record
  <name1> : <type1>;
  <name2> : <type2>;
  :
  <namek> : <typek>
end

Using a var and array combination:
var
  reclist: array[1..4] of record
    data: real;
    next: integer;
  end;

type
  cardsuit = (clubs, diamonds, hearts, spades);
  card = record
    suit: cardsuit;
    value: 1 .. 13;
  end;
var
  hand: array [ 1 .. 13 ] of card;
  trump: cardsuit;
Array Vs Record
Array: homogeneous; component selected at run time
Record: heterogeneous; component selected at compile time
SETS
Collection of elements of an ordinal base type, with no repetition
Set brackets list elements individually or as subranges, e.g. ['a'..'z', 'A'..'Z'], [0..9], [mon..sun]
Operators defined on all set types: +, -, *, in

type T = set of T0
E.g.
type set1 = set of 1..3;
var A: set1;
or
var A: set of 1..3;
Set Operations
=> Membership operator (in)
=> Bitwise operators (union, difference, intersection)
=> Relational operators (<=, =, >=)
FILES
Sequence of values of some particular type
No index type; elements are accessed in order
The number of elements in a file can be time-varying and unlimited

type T = file of T0
E.g.
type file1 = file of char;
var A: file1;
or
var A: file of char;
POINTERS
Cell whose value indicates another cell.
var ptr : ^celltype;

type
  link = ^cell;
  cell = record
    info: integer;
    next: link
  end;
CURSOR
Integer-valued cell, used as a pointer into an array (used in languages such as Fortran)
[Figure: header is a cursor into the array reclist, whose cells have the fields data and next]

Index   data   next
1       1.2    3
2       3.4    0
3       5.6    2
4       7.8    1
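Following the cursors in reclist can be sketched as below, in Python rather than the slides' Pascal. The data and next values are the ones from the figure; the starting value header = 4 is an assumption made for this illustration (the figure's header value is not recoverable here), and a next value of 0 is taken to mark the end of the chain.

```python
# Parallel arrays standing in for reclist[1..4]; index 0 is unused so that
# cursor values match the 1-based Pascal indices.
data = [None, 1.2, 3.4, 5.6, 7.8]
nxt = [None, 3, 0, 2, 1]


def traverse(header):
    """Follow next cursors from header until the 0 terminator is reached."""
    values = []
    cursor = header
    while cursor != 0:
        values.append(data[cursor])
        cursor = nxt[cursor]
    return values


# header = 4 is an assumed starting cursor, not taken from the slides.
print(traverse(4))  # [7.8, 1.2, 5.6, 3.4]
```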
Complexity of algorithm
Time
Space
Understandability
Big-Oh(Worst case)
The time complexity of an algorithm quantifies the amount of time taken by the algorithm to run as a function of the size of the input to the problem. It is commonly expressed using Big-Oh notation and is commonly estimated by counting the number of elementary operations performed by the algorithm. Big-Oh notation provides only an upper bound on the growth rate of the function.
Asymptotic efficiency- how running time of an algorithm increases with the size of input.
Definition of Big-Oh
"$T(n)$ is $O(n^2)$" means that there exist positive constants $c$ and $n_0$ such that for all $n \ge n_0$ we have $T(n) \le c\,n^2$.
In general, $O(g(n)) = \{\, f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \,\}$.
Find the time complexity of $T(n) = (n+1)^2$.
Find the time complexity of $T(n) = 3n^3 + 2n^2$.
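The role of the constants c and n0 in the definition can be explored numerically. The sketch below is in Python rather than the slides' Pascal; the function T(n) = 3n + 2, the helper name is_bounded and the chosen constants are all hypothetical, picked only to illustrate the definition. A check over a finite range supports a choice of c and n0 but does not prove the bound.

```python
def is_bounded(T, g, c, n0, upto=1000):
    """Check T(n) <= c * g(n) for every n in n0..upto (a finite spot check)."""
    return all(T(n) <= c * g(n) for n in range(n0, upto + 1))


def T(n):
    return 3 * n + 2   # hypothetical running-time function


def g(n):
    return n           # candidate growth rate: linear


print(is_bounded(T, g, c=5, n0=1))  # True: 3n + 2 <= 5n whenever n >= 1
print(is_bounded(T, g, c=3, n0=1))  # False: 3n + 2 > 3n for every n
```

Here c = 5 and n0 = 1 work as witnesses, so this T(n) is O(n); no n0 rescues c = 3.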
Find the running time of a sequence of steps taking $O(n^2)$, $O(n^3)$ and $O(n \log n)$.
Find $O(\max(f(n), g(n)))$ if
$f(n) = n^4$ if $n$ is even, $n^2$ if $n$ is odd
$g(n) = n^2$ if $n$ is even, $n^3$ if $n$ is odd
Big O notation always assumes the upper limit, where the algorithm performs the maximum number of operations or iterations.
O(1) describes an algorithm that always executes in the same time (or space) regardless of the size of the input data set. O(1) operations run in constant time.
O(n) describes an algorithm whose running time grows linearly, in direct proportion to the size of the input data set. O(n) operations run in linear time.
O(n^2) describes an algorithm whose running time is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations result in O(n^3), O(n^4), etc.
O(log n): an algorithm that cuts the remaining problem in half at each step, doing constant work per step, is O(log n). O(log n) operations run in logarithmic time.
O(n log n): performs an O(log n) operation for each item in the input. O(n log n) operations run in loglinear time.
O(2^n): the time taken doubles with each additional element in the input data set. O(2^n) operations run in exponential time.
O(n!): involves doing something for all possible permutations of the n elements. O(n!) operations run in factorial time.
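As an illustration of the O(log n) class described above, the sketch below counts how many times binary search halves its search range. Binary search is used here only as a familiar example of halving; the code is in Python rather than the slides' Pascal, and the function name and step counter are hypothetical additions for this illustration.

```python
def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 if target is absent.

    steps counts loop iterations, each of which halves the search range.
    """
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


items = list(range(1024))
index, steps = binary_search(items, 700)
print(index, steps)   # the index is 700; steps never exceeds 11 for 1024 items
```

Doubling the input to 2048 items adds only one more step in the worst case, which is the signature of logarithmic growth.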
Elementary operations:
Arithmetic operations
Assignment operation
Testing a condition
Read operation
Write operation
Each of these operations, taken independently, has complexity O(1).
1. Sequence of statements executed only once: constant time O(1)
Statement1;
Statement2;
:
Statement k;
Total time taken = time(Statement1) + time(Statement2) + ... + time(Statement k)
The time for each statement is a constant and the number of statements k is fixed. Therefore, the total time is a constant and T(n) is O(1): the running time does not depend on the input size n.
Example: Algorithm to check if a number is even or odd.
if n mod 2 = 0 then                { O(1) }
  writeln('number is even')        { O(1) }
else
  writeln('number is odd');        { O(1) }
T(n) is O(1).
2. For loop: linear time O(n)
for i := 1 to n do
begin
  sequence1;
end;
To find the time complexity of an algorithm with loops, count the number of times the loop executes.
Worst case: the loop executes n times.
Here, each of the statements in the sequence is O(1).
Total no. of iterations = n
Therefore, total time taken = n*O(1) = O(n). Hence, T(n) is O(n).
Example: Linear Search or Sequential Search
Algorithm to search for a number in an array.
Best case: the number is found at the first position.
Worst case: the number is found at the last position or is not present in the array (the entire array is searched).
for i := 1 to n do                 { body executes n times }
begin
  if num = arr[i] then             { O(1) }
    writeln('number found');       { O(1) }
end;
T(n) is n*O(1) = O(n).
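The best and worst cases of linear search can be observed by counting comparisons. The sketch below is in Python rather than the slides' Pascal; the function name and the comparison counter are hypothetical, and, unlike the loop in the slide, this version stops as soon as the number is found so that the best case is visible.

```python
def linear_search(arr, num):
    """Return (index, comparisons); index is -1 if num is not in arr."""
    comparisons = 0
    for i, value in enumerate(arr):
        comparisons += 1
        if value == num:
            return i, comparisons   # stop at the first match
    return -1, comparisons


arr = [7, 3, 9, 4, 1]
print(linear_search(arr, 7))   # (0, 1): best case, found at the first position
print(linear_search(arr, 1))   # (4, 5): worst case, found at the last position
print(linear_search(arr, 8))   # (-1, 5): worst case, not present
```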
3. If-then-else statements
if condition then
  sequence1
else
  sequence2;
Either sequence1 or sequence2 will execute. The worst-case time is that of the slower of the two possibilities.
T(n) = O(max[time(sequence1), time(sequence2)])
Suppose time(sequence1) = O(1) and time(sequence2) = O(n); then T(n) = O(n).
Example: Algorithm to print the first n numbers if choice = 1, else print "Wrong choice!".
Best case: choice is not equal to 1.
Worst case: choice = 1. All n numbers are printed.
if choice = 1 then
  for i := 1 to n do               { body executes n times }
  begin
    writeln(i);                    { O(1) }
  end
else
  writeln('Wrong choice!');        { O(1) }
T(n) is max(O(n), O(1)) = O(n).
4. Nested for loop
a) Quadratic time O(n^2)
for i := 1 to m do
begin
  for j := 1 to n do
  begin
    sequence1;
  end
end;
To find the time complexity of an algorithm with loops, count the total number of iterations.
Worst case: the inner loop executes n times and the outer loop executes m times.
Here, each of the statements in the sequence is O(1).
For each iteration of the outer loop, no. of iterations of the inner loop = n. Therefore, T(inner loop) = n*O(1) = O(n).
No. of iterations of the outer loop = m. Therefore, total no. of iterations of the O(1) sequence = m*n. Equivalently, there are m repetitions of an O(n) sequence.
Hence, the total time is m*O(n) = O(mn), which is O(n^2) when m = n.
Example: Displaying a matrix
for i := 1 to n do                 { n times }
begin
  for j := 1 to n do               { n times }
  begin
    writeln(arr[i,j]);             { O(1) }
  end
end;
T(n) is n*(n*O(1)) = O(n^2).
4. Nested for loop
b) Cubic time O(n^3): matrix multiplication
for i := 1 to n do
begin
  for j := 1 to n do
  begin
    c[i,j] := 0;
    for k := 1 to n do
      c[i,j] := c[i,j] + a[i,k]*b[k,j];
  end
end;
To find the time complexity of an algorithm with loops, count the total number of iterations.
Here, c[i,j] := c[i,j] + a[i,k]*b[k,j]; is O(1).
The innermost loop executes n times, i.e. there are n repetitions of an O(1) sequence. Therefore, T(innermost loop) = n*O(1) = O(n).
No. of iterations of the loop with index j = n. Therefore, total no. of iterations of the O(1) sequence = n*n = n^2. Equivalently, there are n repetitions of an O(n) sequence.
The outer loop executes n times. Therefore, total no. of iterations of the O(1) sequence = n^2*n = n^3. Equivalently, there are n repetitions of an O(n^2) sequence.
Hence, T(n) is O(n^3). The algorithm takes cubic time.
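The n^3 count can be confirmed by instrumenting the triple loop. The sketch below is in Python rather than the slides' Pascal and uses 0-based indices; the function name matmul_count and the counter are hypothetical additions for this illustration.

```python
def matmul_count(a, b):
    """Multiply two n x n matrices; also return how often the inner statement ran."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    inner_steps = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]   # the O(1) statement
                inner_steps += 1
    return c, inner_steps


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product, steps = matmul_count(a, b)
print(product)  # [[19, 22], [43, 50]]
print(steps)    # 8, i.e. 2^3
```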
5. Nested loop where the inner loop index depends on the outer loop index
for i := 1 to n-1 do
begin
  for j := i+1 to n do
    sequence1;
end;

Value of i    No. of iterations of inner loop
1             n-1
2             n-2
:             :
n-2           2
n-1           1

The time taken by the loop body for each iteration is O(1), so for a given i the time spent in the inner loop is O((n-i)*1) = O(n-i). The outer loop body is executed (n-1) times. So the total running time of the program is bounded above by some constant times
$$\sum_{i=1}^{n-1} (n-i) = (n-1) + (n-2) + (n-3) + \dots + 1 = \frac{n(n-1)}{2},$$
which is O(n^2).

Nested loop followed by non-nested loop
for i := 1 to n do
  for j := 1 to n do
    sequence1;
for k := 1 to n do
  sequence2;
The complexity of the first loop is O(n^2) and that of the second is O(n). By the sum rule, the running time of the whole sequence is O(max(n^2, n)) = O(n^2).
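The sum of the inner-loop iteration counts, (n-1) + (n-2) + ... + 1 = n(n-1)/2, can be checked by direct counting. The sketch below is in Python rather than the slides' Pascal and assumes the outer index runs from 1 to n-1 and the inner index from i+1 to n; the function name is hypothetical.

```python
def count_inner_iterations(n):
    """Count executions of the loop body for i = 1..n-1, j = i+1..n."""
    count = 0
    for i in range(1, n):              # i = 1 .. n-1
        for j in range(i + 1, n + 1):  # j = i+1 .. n
            count += 1
    return count


for n in (2, 5, 10):
    print(n, count_inner_iterations(n), n * (n - 1) // 2)
# Each line shows the counted value next to n(n-1)/2; the two agree.
```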
Most familiar Big-Oh notations (orders of growth in increasing order)
(1) Constant time       O(1)
(2) Logarithmic time    O(log n)
(3) Linear time         O(n)
(4) n log n             O(n log n)
(5) Quadratic time      O(n^2)
(6) Cubic time          O(n^3)
(7) Polynomial time     O(n^k)
(8) Exponential time    O(k^n)
function fact (n : integer) : integer;
{ fact(n) computes n! }
begin
  { (1) } if n <= 1 then
  { (2) }   fact := 1
          else
  { (3) }   fact := n * fact(n-1);
end; { fact }
Sample Data
Calls going down:
fact(3) = 3 * fact(2)
fact(2) = 2 * fact(1)
fact(1) = 1
Values returned going back up:
fact(1) = 1
fact(2) = 2
fact(3) = 6
Time Complexity
Line (1): O(1)
Line (2): O(1)
Line (3): O(1) + T(n-1)
Computing total running time of factorial procedure
$$T(n) = \begin{cases} c + T(n-1) & \text{if } n > 1 \\ d & \text{if } n \le 1 \end{cases} \tag{1}$$
Hence,
$$T(n-1) = c + T((n-1)-1) = c + T(n-2) \tag{2}$$
Substitute (2) in (1):
$$T(n) = c + [c + T(n-2)] = 2c + T(n-2) \quad \text{if } n > 2 \tag{3}$$
Similarly,
$$T(n-2) = c + T(n-3) \tag{4}$$
Substitute (4) in (3):
$$T(n) = 3c + T(n-3) \quad \text{if } n > 3 \tag{5}$$
In general, for $n > k$, $T(n) = kc + T(n-k)$. Finally, when $k = n-1$,
$$T(n) = c(n-1) + T(1) = c(n-1) + d = cn - c + d$$
Therefore, T(n) is O(n).
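The linear running time derived above can be observed by counting recursive calls: one call per value of the argument from n down to 1. The sketch below is in Python rather than the slides' Pascal; the one-element list used as a call counter is a hypothetical device added for this illustration.

```python
def fact(n, counter):
    """Compute n! recursively; counter[0] is incremented once per call."""
    counter[0] += 1
    if n <= 1:
        return 1
    return n * fact(n - 1, counter)


calls = [0]
print(fact(5, calls))  # 120
print(calls[0])        # 5: one call each for n = 5, 4, 3, 2, 1
```

Each call does a constant amount of work besides the recursive call, so n calls give the O(n) total.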
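Solution (Towers of Hanoi)
1. Move the top (n-1) disks from A to B.
2. Move the largest disk from A to C.
3. Move the (n-1) disks from B to C.
Number of moves for N disks: 2^N - 1 = O(2^N).
Each time we increment N, we double the amount of work. This grows incredibly fast!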
Deriving Recurrence Relation
function THanoi(n, A, B, C):
  THanoi(n-1, A, C, B);            // A to B using C
  Move largest disc from A to C;
  THanoi(n-1, B, A, C);            // B to C using A
T(n) = T(n-1) + O(1) + T(n-1) = 2T(n-1) + O(1)
$$\begin{aligned}
T(n) &= 2T(n-1) + O(1) \\
     &= 2[2T(n-2) + O(1)] + O(1) \\
     &= 2[2[2T(n-3) + O(1)] + O(1)] + O(1) \\
     &= 2^3\,T(n-3) + 2^2\,O(1) + 2\,O(1) + O(1) \\
     &\;\;\vdots \\
     &= 2^n\,T(0) + (2^n - 1)\,O(1)
\end{aligned}$$
Therefore, T(n) is O(2^n). The algorithm takes exponential time.
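The exponential growth can be observed by recording the moves made by the recursive procedure. The sketch below is in Python rather than the slides' pseudo-code; the parameter order (n, source, spare, target) follows THanoi(n, A, B, C), while the function name hanoi and the moves list are hypothetical additions for this illustration.

```python
def hanoi(n, source, spare, target, moves):
    """Move n discs from source to target using spare; record each move."""
    if n == 0:
        return
    hanoi(n - 1, source, target, spare, moves)   # source -> spare using target
    moves.append((source, target))               # move the largest disc
    hanoi(n - 1, spare, source, target, moves)   # spare -> target using source


for n in range(1, 6):
    moves = []
    hanoi(n, 'A', 'B', 'C', moves)
    print(n, len(moves), 2 ** n - 1)
# Each line shows the recorded move count next to 2^n - 1; the two agree.
```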
Additional Points
Space complexity
Recursive algorithm (Fibonacci)
Steps in recursion
Recursion vs. iteration
Disadvantages of recursion