Data Structure - ch2 PDF
Processing
Bhargavi H. Goswami
Assistant Professor
Sunshine Group of Institutions
INTRODUCTION:
Which operation is frequently used by a
Language Processor?
Ans: Search.
This makes the design of data structures a
crucial issue in language processing activities.
In this chapter we shall discuss the data
structure requirements of LP and suggest
efficient data structures to meet these
requirements.
Linear DS
A linear data structure consists of a linear arrangement of
elements in memory.
Advantage: Facilitates efficient search.
Disadvantage: Requires a contiguous area of memory.
Do you consider this a problem? Yes or No?
What is the problem?
The size of a data structure is difficult to predict.
So the designer is forced to overestimate the memory
requirements of a linear DS to ensure that it does not outgrow
the allocated memory.
Disadvantage: Wastage of memory.
Non Linear DS
Overcomes the disadvantage of Linear DS.
HOW?
Elements of Non Linear DS are accessed using
pointers.
Hence the elements need not occupy
contiguous areas of memory.
Disadvantage: Non Linear DS leads to lower
search efficiency.
[Figure: Linear and Non Linear data structures]
Dynamic allocation examples:
Pascal:
var p : ^integer;
begin
new (p);
C:
float *ptr;
ptr = (float*)calloc(5,sizeof(float));
Fields of a symbol table entry:
1. Symbol
2. Class
3. Type
4. Length
5. Dimension Information
6. Parameter List Address
7. No. of Parameters
8. Type of returned value
9. Length of returned value
10. Statement number
Fields needed for a label:
1. Name
2. Class
3. Statement Number
When class = label, all fields except name, class and statement
number are redundant.
With variable-length entries, the search method may need to know the length of an entry.
So the record would contain the following fields:
1. A length field
2. Fields in the fixed part, including the tag field
3. $\{ f_j \mid f_j \in SF_{V_j} \text{ if } tag = V_j \}$
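A record of this shape can be sketched in C as a tagged union. This is only an illustration: the names `sym_entry`, `variable_part`, `label_part` and the two tag values are assumptions, not part of the slides, and only two classes are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag values; the slides only name the class "label". */
enum sym_class { CLASS_VARIABLE, CLASS_LABEL };

struct variable_part { int type; int length; };   /* fields used by a variable */
struct label_part    { int statement_number; };   /* fields used by a label */

struct sym_entry {
    size_t length;            /* 1. length field */
    char name[16];            /* 2. fixed part ... */
    enum sym_class tag;       /*    ... including the tag field */
    union {                   /* 3. fields selected by the tag */
        struct variable_part variable;
        struct label_part label;
    } variant;
};

/* Bytes needed by an entry with the given tag: fixed part plus its variant. */
size_t entry_length(enum sym_class tag) {
    size_t fixed = offsetof(struct sym_entry, variant);
    assert(tag == CLASS_VARIABLE || tag == CLASS_LABEL);
    if (tag == CLASS_LABEL)
        return fixed + sizeof(struct label_part);
    return fixed + sizeof(struct variable_part);
}
```

Storing the result of `entry_length` in the `length` field is what lets a search step from one variable-length entry to the next.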
[Figure: variable-length entry formats, showing the length field, the entry, and a pointer to the Variable Part]
Allocation DS
Table Organization:
Entries of table occupy adjoining areas of memory. Adjoining
areas here means previous entry and next entry.
Positional Determinacy: Tables using fixed length entry
organization possess this property. This property states that
the address of an entry in a table can be determined from its
entry number.
Eg: The address of the e-th entry is
$a + (e - 1) \cdot L$
a : address of first entry.
L : length of an entry.
e : entry number.
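The address formula can be checked with a small C function. This is a minimal sketch; the function name `entry_address` is an assumption, and addresses are modelled as plain unsigned numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Address of entry number e (counted from 1) in a table of fixed-length
   entries: a is the address of the first entry, L the length of an entry. */
size_t entry_address(size_t a, size_t L, size_t e) {
    assert(e >= 1);               /* entry numbers start at 1 */
    return a + (e - 1) * L;
}
```

For example, with a = 1000 and L = 20, entry 1 is at address 1000 and entry 4 is at 1000 + 3 x 20 = 1060.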
Use of Positional Determinacy:
A symbol can be represented by e, its entry number in the search structure,
in the intermediate code generated by the LP.
[Figure: table with entries #1, #2, ... #f (Occupied Entries) followed by Free Entries up to #n. Entry fields shown: Active/Deleted, Symbol, Other Info, Pointer]
Binary Trees
Each node in the tree is a symbol entry with two pointer fields,
i.e. a Left Pointer and a Right Pointer.
Algorithm 2.4 (Binary Tree Search)
1. current_node_pointer := address of root;
2. if s = (current_node_pointer)*.symbol then exit with success;
3. if s < (current_node_pointer)*.symbol then
      current_node_pointer := (current_node_pointer)*.left_pointer;
   else current_node_pointer := (current_node_pointer)*.right_pointer;
4. if current_node_pointer = nil then
      exit with failure;
   else goto step 2.
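The algorithm can be written in C roughly as follows. This is a sketch, not the slides' own code: symbols are assumed to be C strings compared with `strcmp`, the struct and function names are illustrative, and the nil test is done before each comparison so that an empty tree is also handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    const char *symbol;
    struct node *left_pointer;
    struct node *right_pointer;
};

/* Returns the node holding symbol s, or NULL if s is not in the tree. */
struct node *tree_search(struct node *root, const char *s) {
    struct node *current = root;              /* step 1 */
    assert(s != NULL);
    while (current != NULL) {                 /* step 4: nil means failure */
        int cmp = strcmp(s, current->symbol);
        if (cmp == 0)
            return current;                   /* step 2: success */
        current = (cmp < 0) ? current->left_pointer    /* step 3 */
                            : current->right_pointer;
    }
    return NULL;
}
```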
When do we obtain the best search performance?
Ans: When the tree is balanced.
When is the search performance worst?
Ans: When the tree degenerates to a linked list and performance becomes similar to
sequential search.
Example: p,c,t,f,h,k,e.
After Rebalancing:
c, e, f, h, k, p, t
[Figure: binary search trees for this example, before and after rebalancing]
Eg: personal_info :
    record
        name : array[1..10] of char;
        gender : char;
        id : integer;
    end;
[Figure: the entry for personal_info has Name and Field List fields; the field list links the entries name, gender and id through Next Field pointers]
Stacks
A stack is a linear data structure which satisfies the following
properties:
1. Allocation and de-allocation are performed in a LIFO manner.
2. Only the last element is accessible at any time.
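Both properties show up in a minimal array-based stack in C. This is a sketch under assumptions: the capacity, the struct layout and the function names are illustrative, and `tos` is an array index rather than an address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 8   /* arbitrary size chosen for the sketch */

struct stack {
    int entries[STACK_CAPACITY];
    int tos;               /* index of the last element, -1 when empty */
};

void stack_init(struct stack *s) {
    assert(s != NULL);
    s->tos = -1;
}

/* Allocation: the new element always goes on top. */
bool stack_push(struct stack *s, int value) {
    if (s->tos + 1 >= STACK_CAPACITY)
        return false;      /* stack is full */
    s->entries[++s->tos] = value;
    return true;
}

/* De-allocation: only the last element can be removed. */
bool stack_pop(struct stack *s, int *value) {
    if (s->tos < 0)
        return false;      /* stack is empty */
    *value = s->entries[s->tos--];
    return true;
}
```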
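[Figure: stack with stack base (SB) and top of stack (TOS), shown as the entries 10, 20, 30, 40, 50, 60 are allocated and de-allocated]
[Figure: extended stack model, showing SB, TOS and the record base (RB) as records are allocated]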
Allocation
1. TOS := TOS + 1;
2. TOS* := RB;
3. RB := TOS;
4. TOS := TOS + n;
The 1st statement increments TOS by one stack entry.
TOS now points to the reserved pointer of the new record.
The 2nd statement deposits the address of the previous record base into
the reserved pointer.
The 3rd statement sets RB to point at the first stack entry in the new
record.
The 4th statement allocates n stack entries to the new
entity. See fig 2 in previous slide.
The newly created entity now occupies the addresses <RB> + l to
<RB> + l x n,
where <RB> stands for the contents of RB and l is the size of a stack entry.
De-Allocation
1. TOS := RB - 1;
2. RB := RB*;
1st statement pops a record off the stack by
resetting TOS to the value it had before the
record was allocated.
2nd statement points RB to base of the previous
record.
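The allocation and de-allocation steps of the extended stack model can be traced with a small C sketch. The assumptions here are illustrative: the stack is an array of `int` entries, TOS and RB are array indices instead of addresses (so l = 1), and -1 marks "no record".

```c
#include <assert.h>

#define STACK_ENTRIES 32   /* arbitrary size chosen for the sketch */
#define NO_RECORD (-1)

struct ext_stack {
    int entry[STACK_ENTRIES];
    int tos;   /* index of the last occupied entry, -1 when empty */
    int rb;    /* index of the reserved pointer of the newest record */
};

void ext_init(struct ext_stack *s) {
    s->tos = -1;
    s->rb = NO_RECORD;
}

/* Allocate a record holding n stack entries. */
void ext_allocate(struct ext_stack *s, int n) {
    assert(n >= 0 && s->tos + 1 + n < STACK_ENTRIES);
    s->tos = s->tos + 1;        /* 1. TOS := TOS + 1 */
    s->entry[s->tos] = s->rb;   /* 2. TOS* := RB     */
    s->rb = s->tos;             /* 3. RB := TOS      */
    s->tos = s->tos + n;        /* 4. TOS := TOS + n */
}

/* Pop the newest record off the stack. */
void ext_deallocate(struct ext_stack *s) {
    assert(s->rb != NO_RECORD);
    s->tos = s->rb - 1;         /* 1. TOS := RB - 1 */
    s->rb = s->entry[s->rb];    /* 2. RB := RB*     */
}
```

After `ext_allocate(&s, 3)` on an empty stack, the reserved pointer sits at index 0 and the entity occupies indices 1 to 3, matching <RB> + l to <RB> + l x n with l = 1.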
That was all about allocation and de-allocation in
the extended stack model.
Now let us see an implementation of this model
in a Pascal program that contains nested
procedures, where many symbol tables must coexist during compilation.
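[Figure: stack holding the symbol table of sample (x, y, i) starting at SB, followed by the symbol table of calc (a, b, sum) starting at RB, with TOS at sum]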
Heaps
A heap is a non-linear data structure.
It permits allocation and de-allocation of entities in
random order.
A heap does not provide any specific means to
access an allocated entity.
Hence, an allocation request returns a pointer to the allocated
area in the heap.
Similarly, a de-allocation request must present a pointer
to the area to be de-allocated.
So, it is assumed that each user of an allocated entity
maintains a pointer to the memory area allocated to
the entity.
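In C, the standard library heap behaves this way: `calloc` returns a pointer to the allocated area and `free` must be given that same pointer. The sketch below wraps the two calls; the wrapper names `allocate_floats` and `release_floats` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocation request: returns a pointer to the allocated area,
   or NULL if the heap cannot satisfy the request. */
float *allocate_floats(size_t count) {
    assert(count > 0);
    return (float *)calloc(count, sizeof(float));
}

/* De-allocation request: the user presents the pointer it kept. */
void release_floats(float *area) {
    free(area);
}
```

Areas can be released in any order. For instance, an area allocated first may be released first while a later allocation stays in use, which a stack would not permit.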
Let us take an example to clarify this.
[Figure: example of allocation and de-allocation in a heap]
Memory Management
We have seen how holes develop in
memory due to allocation and de-allocation in
the heap.
This creates a need for memory
management that identifies free memory
areas and reuses them when making fresh
allocations.
The performance criteria for memory
management are:
Speed of allocation / de-allocation
Efficiency of memory utilization
Reference Counts
In this technique the system associates a reference
count with each memory area to indicate the
number of its active users.
The count is incremented when a new user gains
access to the area.
The count is decremented when a user
finishes using it.
The area is known to be free when its reference
count drops to zero.
Advantage: Simple to implement.
Disadvantage: Incurs overhead at every allocation
and de-allocation.
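A reference count can be sketched in C as a counter stored with each memory area. The struct and function names below are assumptions for illustration; a real memory manager would keep the count in the area's header.

```c
#include <assert.h>
#include <stdbool.h>

struct area {
    int ref_count;   /* number of active users */
    bool is_free;    /* true once the count has dropped to zero */
};

void area_init(struct area *m) {
    m->ref_count = 0;
    m->is_free = true;
}

/* A new user gains access to the area. */
void area_acquire(struct area *m) {
    m->ref_count++;
    m->is_free = false;
}

/* A user finishes using the area. */
void area_release(struct area *m) {
    assert(m->ref_count > 0);
    m->ref_count--;
    if (m->ref_count == 0)
        m->is_free = true;   /* no users left, so the area can be reused */
}
```

The overhead named above is visible here: every acquire and release has to update the count.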
Garbage Collection
Garbage collection makes two passes over
memory to identify unused areas.
1st Pass: It traverses all pointers pointing to
allocated areas and marks the memory areas
which are in use.
2nd Pass: It finds all unmarked areas and declares
them to be free.
Advantage: Does not incur incremental
overhead.
Disadvantage: The overhead is incurred only when the system
runs out of free memory to allocate to a fresh
request, which delays that request.
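The two passes can be sketched in C on a small model of memory. Everything here is an assumption made for illustration: each area holds at most two pointers to other areas, the set of starting pointers (`roots`) is passed in by the caller, and marking is done recursively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 2   /* pointers held by one area, arbitrary for the sketch */

struct area {
    bool allocated;
    bool marked;
    struct area *refs[MAX_REFS];   /* pointers to other areas, or NULL */
};

/* 1st pass helper: mark an area and everything reachable from it. */
static void mark(struct area *a) {
    if (a == NULL || a->marked)
        return;
    a->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++)
        mark(a->refs[i]);
}

/* Runs both passes and returns the number of areas declared free. */
size_t collect(struct area *areas, size_t n,
               struct area **roots, size_t n_roots) {
    size_t freed = 0;
    assert(areas != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        areas[i].marked = false;
    for (size_t i = 0; i < n_roots; i++)      /* 1st pass: mark */
        mark(roots[i]);
    for (size_t i = 0; i < n; i++) {          /* 2nd pass: sweep */
        if (areas[i].allocated && !areas[i].marked) {
            areas[i].allocated = false;
            for (size_t k = 0; k < MAX_REFS; k++)
                areas[i].refs[k] = NULL;
            freed++;
        }
    }
    return freed;
}
```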
Memory Compaction:
To manage the reuse of free memory, memory compaction is performed to
combine the free areas in the free list into a single free area.
In the figure, green boxes indicate allocated areas and maroon boxes indicate de-allocated
areas. The second figure shows the de-allocated areas linked into free lists, and the last
shows memory compacted into a single free area.
The first word of each free area contains a count of the words in the area, and the second word is a next
pointer, which may be NULL.
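[Figure: (1) allocated areas a, b, c, d, e interleaved with de-allocated areas x, y, z; (2) the de-allocated areas linked into free lists; (3) memory after compaction, with a single free area]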
Reuse of Memory:
After memory compaction, fresh allocations
can be made from the free block of memory.
The free area descriptor and the count of words in
the free area are updated.
When a free list is used, two techniques
can be used to perform a fresh allocation:
1. First Fit Technique
2. Best Fit Technique
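Under their usual definitions, first fit takes the first free area that is large enough for the request, and best fit takes the smallest free area that is large enough. The C sketch below only selects an area from the free list; the struct and function names are assumptions, and splitting the chosen area and updating the list are left out.

```c
#include <assert.h>
#include <stddef.h>

/* A free area: a count of words, then a next pointer (may be NULL). */
struct free_area {
    size_t words;
    struct free_area *next;
};

/* First fit: the first area in the list that can hold the request. */
struct free_area *first_fit(struct free_area *list, size_t request) {
    assert(request > 0);
    for (struct free_area *p = list; p != NULL; p = p->next)
        if (p->words >= request)
            return p;
    return NULL;
}

/* Best fit: the smallest area in the list that can hold the request. */
struct free_area *best_fit(struct free_area *list, size_t request) {
    struct free_area *best = NULL;
    assert(request > 0);
    for (struct free_area *p = list; p != NULL; p = p->next)
        if (p->words >= request && (best == NULL || p->words < best->words))
            best = p;
    return best;
}
```

With free areas of 200, 50 and 100 words in that order, a request for 40 words is served from the 200-word area by first fit and from the 50-word area by best fit.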
Best Fit