Distributed Systems - Concepts and Design


Distributed Garbage Collection

An Overview

Presented by
Dotan Adler

Copyright, 1996 © Dale Carnegie & Associates, Inc.


Presentation Outline

• Present the need for Distributed GC
• Present the Distributed Model
• Present the problem of Distributed GC
• Present outlines of some solutions to the problem (one Direct and two Indirect)
• Conclusion

24 Nisan 5779 Distributed Garbage Collection 2


The need for a Distributed GC

• Where do we need Distributed GC?
– Distributed applications (DCOM, CORBA, etc.)
– Internet applications (Java extensions)
– Distributed file servers
– HTML pages & links


The Distributed Computing Model
A Distributed System is a set of autonomous systems connected by a network.

[Figure: an IBM Compatible, a Cray Supercomputer, a Mac Classic, an IBM PS/2 and a laptop computer connected through a communication network]


The Classical GC Problem



Distributed Garbage Collection

• Each computer in a distributed system has:
– A local store (memory or disk)
– A stack
– Local running programs (or RPCs)
– A GC algorithm (a part of a global GC algorithm)
• Each computer has access only to its local store
• Access to a remote store is achieved by message passing


Distributed Garbage Collection

• Because there is no global address space, references to remote cells are necessarily indirect

[Figure: sites A and B, holding cells a and b, connected through an entry record, the communication network and an exit record]


Problems With Distributed GC

• Unreliability (Relaxed)
– Duplication of messages
– Out-of-order delivery
– Communication failure
• Latency - the elapsed time between the issue of a task and its execution is nondeterministic
• Synchronization - it takes a lot of resources to synchronize all computers in a network


Direct Approach

• The direct approach identifies the garbage itself and marks it, so that it can be removed
• Reference counting is a direct GC approach, since it marks an object as garbage when its count equals 0


Reference Counting

• Every object has a reference count
• An Inc(@a) message increases a’s count
• A Dec(@a) message decreases the count
• “Inc” and “Dec” messages are sent to the CPU which hosts the object
• Deletion occurs when the reference count equals 0

[Figure: sites A, B and C, with the messages inc(@b) and cp(@b)]
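
The counting rules above can be sketched as a small single-process model. This is only an illustration: the class name `Site`, the method names, and the choice to start a new object with a count of 1 are assumptions, not part of the original slides.

```python
class Site:
    """Hypothetical host site that keeps a reference count per hosted object."""

    def __init__(self, name):
        self.name = name
        self.counts = {}  # object id -> reference count

    def create(self, obj_id):
        # Assumption: a new object starts with one reference (its creator's).
        self.counts[obj_id] = 1

    def receive(self, message, obj_id):
        # Handle an "inc" or "dec" message addressed to a hosted object.
        if obj_id not in self.counts:
            return  # the object was already deleted; ignore the message
        if message == "inc":
            self.counts[obj_id] += 1
        elif message == "dec":
            self.counts[obj_id] -= 1
            if self.counts[obj_id] == 0:
                del self.counts[obj_id]  # deletion when the count reaches 0
```

In this sketch every message is handled by the site that hosts the object, which mirrors the rule that “Inc” and “Dec” are sent to the hosting CPU.
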
Reference Counting (Cont’d)

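• Problem: If A or C is responsible for incrementing b’s count, then latency is an important factor: a del(@b) message can reach B before the inc(@b) message, so b may be deleted while a reference to it still exists

[Figure: two scenarios in which @b is copied to C with cp(@b) while inc(@b) and del(@b) messages are sent to B. Left: A is responsible for the inc. Right: C is responsible for the inc.]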
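
The effect of latency can be illustrated by replaying the same two messages in different arrival orders. The function `deliver` and its return shape are hypothetical, and the sketch assumes that a message arriving after deletion is simply lost.

```python
def deliver(messages, count=1):
    """Apply "inc"/"del" messages to one object's count in arrival order.

    Returns (final_count, deleted). Messages arriving after deletion are lost.
    """
    deleted = False
    for msg in messages:
        if deleted:
            break  # the object is gone; a late inc cannot revive it
        count += 1 if msg == "inc" else -1
        if count == 0:
            deleted = True
    return count, deleted


# Scenario: C is responsible for the inc. A copies @b to C, then drops its own
# reference, so B receives one inc (from C) and one del (from A).
in_order = deliver(["inc", "del"])   # the inc arrives first
reordered = deliver(["del", "inc"])  # the del overtakes the inc
```

With in-order arrival b survives with one reference (held by C); when the del overtakes the inc, b is deleted even though C still holds @b.
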



Reference Counting (Cont’d)

• Solution: When A duplicates @b, it first sends an ack-request to B and a copy message to C
• B will not accept any request after it gets an ack_req until it sends an ack to C
• C will not use @b until it gets ack(@b)

[Figure: A sends ack_req(@b,C) to B and cp(@b) to C; B sends ack(@b) to C]
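
A minimal sketch of the handshake follows, with direct method calls standing in for messages. The class and method names are hypothetical. Two points are assumptions rather than statements from the slides: B is assumed to count the new reference while handling the ack-request, and B's refusal of other requests is not modelled because each call here completes before the next one starts.

```python
class Owner:
    """Plays the role of B, the site hosting b."""

    def __init__(self):
        self.count = 1  # A's original reference to b

    def on_ack_req(self, receiver):
        # Assumption: B counts the new reference before acknowledging it.
        self.count += 1
        receiver.on_ack()


class Receiver:
    """Plays the role of C, the site receiving the copy of @b."""

    def __init__(self):
        self.has_copy = False
        self.acked = False

    def on_copy(self):
        self.has_copy = True

    def on_ack(self):
        self.acked = True

    def can_use(self):
        # C may use @b only once both cp(@b) and ack(@b) have arrived.
        return self.has_copy and self.acked
```

Because C waits for the ack, it cannot release @b (and trigger a decrement) before B has recorded the new reference.
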



Reference Counting (Cont’d)

• Problems
– Does not reclaim inter-site cyclic references
– Relies on blocking of operations


Indirect Approach

• The indirect approach identifies all the live objects first, and then reclaims all space not used by live objects
• Examples of indirect GC algorithms:
– Mark-scan
– Generation scavenging
– Dijkstra’s 3-color algorithm
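
As an illustration of the indirect approach, here is a minimal single-site mark-scan sketch. The heap representation (a dictionary from object id to the ids it references) and the function name are assumptions made for the example.

```python
def mark_scan(heap, roots):
    """Return (live, garbage) for a heap given as {object id: [referenced ids]}."""
    live = set()
    stack = list(roots)
    while stack:  # mark phase: find everything reachable from the roots
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(heap[obj])
    garbage = set(heap) - live  # scan phase: everything unmarked is garbage
    return live, garbage


heap = {"r": ["x"], "x": [], "p": ["q"], "q": ["p"]}
live, garbage = mark_scan(heap, roots=["r"])
```

Here `p` and `q` reference each other but are unreachable from the root, so they end up in `garbage`; within a single site, this approach reclaims cycles that reference counting would miss.
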



Centralized Indirect Solution

• Let every site run a GC locally on its private memory.

RCT = remote cell table
ERT = exit reference table


Centralized Indirect Solution (Cont)

• We still have problems with inter-site cycles.
• For this we add a centralized service:
– Whenever a local site finishes its GC, it sends the centralized service a list of the RCT-to-ERT paths that are inaccessible from the roots.
– The centralized service can then find dead inter-site cycles, and report them to the sites.
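
One way the centralized service might combine the reports is sketched below. The data model is an assumption: each reported path is a pair (RCT cell, remote cell named in the ERT), and the sites are assumed to report in addition which remote cells their roots can reach (`root_exits`), something the slides do not state. The function name is hypothetical.

```python
def find_dead_cells(paths, root_exits):
    """Return the reported RCT cells that no root reaches through any chain of sites.

    paths: (rct_cell, ert_target) pairs reported as inaccessible from local roots.
    root_exits: remote cells reachable from some site's roots (assumed input).
    """
    successors = {}
    for cell, target in paths:
        successors.setdefault(cell, set()).add(target)
    live = set()
    stack = list(root_exits)
    while stack:  # propagate liveness along the reported inter-site paths
        cell = stack.pop()
        if cell not in live:
            live.add(cell)
            stack.extend(successors.get(cell, ()))
    return set(successors) - live


# Cells a, b, c form an inter-site cycle that no root reaches;
# d is referenced from a root on another site and keeps e alive.
paths = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
dead = find_dead_cells(paths, root_exits={"d"})
```

In this example the service would report a, b and c as dead to their sites, while d stays alive.
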



Cell Migration
• We can solve the inter-site cycles problem by migrating cells, so that inter-site cycles become local cycles.
• We need to define an order among the sites, so that we won’t get a cyclic migration. This way a cell migrates only to an inferior site.

[Figure: three successive snapshots of cells a, b and c on sites A, B and C as the cells migrate]
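
The role of the site order can be sketched as follows. Everything in this example is an assumption made for illustration: the numeric `rank` (lower means inferior), the restriction to one referrer per cell, and the premise that only cells unreachable from their local roots are considered for migration.

```python
def migrate(site_of, referrer_of, rank):
    """Move each cell to its referrer's site whenever that site is inferior.

    site_of: cell -> current site (updated in place and returned).
    referrer_of: cell -> the single cell that references it (assumption).
    rank: site -> position in the site order; lower means inferior.
    """
    moved = True
    while moved:
        moved = False
        for cell, referrer in referrer_of.items():
            target = site_of[referrer]
            if rank[target] < rank[site_of[cell]]:
                site_of[cell] = target  # a cell only ever moves to a lower rank
                moved = True
    return site_of


# The cycle a -> b -> c -> a is spread over sites A, B and C.
placement = migrate(
    site_of={"a": "A", "b": "B", "c": "C"},
    referrer_of={"a": "c", "b": "a", "c": "b"},
    rank={"A": 0, "B": 1, "C": 2},
)
```

Each move strictly lowers a cell's rank, so the loop terminates. In the example all three cells end up on site A, where the cycle is local and a local GC can reclaim it.
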



Cell Migration

• Problems:
– Copy operations could be very slow, especially when working with big networks and big data structures
– An order between sites must be set prior to the running of the algorithm


Empirical Results

• Little work has been done in the field of DGC because it is hard to compare the execution of two distributed systems (delay times, synchronization, etc.).
• Tests, however, show that Direct methods give better pause times than Indirect methods.


Future Directions of research

• Use some hybrid methods (Direct & Indirect)
• Try to create a framework for testing Distributed GC algorithms


Conclusions

• Distributed GC adds a new dimension to old-style GC: asynchrony and latency.
• Although standard GCs suffer from the same problems (memory leaks, pause time), creating a robust DGC is much more complicated.


The End
