Introduction to Distributed Systems and Principles for Distributed Systems
1.1 Introduction
1.1.1 Goals of Distributed Systems
1.1.2 Types of Distributed Systems
1.1.3 Characterization of DS
1.2 Architecture
1.3 Naming
1.4 Process
1.5 Communication
1.6 Synchronization
1.7 Fault Tolerance
Faizur Rashid (PhD)
1.1. Introduction and Definition
• before the mid-80s, computers were
very expensive (hundreds of thousands or even millions of dollars)
very slow (a few thousand instructions per second)
not connected among themselves
after the mid-80s: two major developments
cheap and powerful microprocessor-based computers appeared
computer networks
LANs at speeds ranging from 10 to 1000 Mbps (now even
10Gbps)
WANs at speeds ranging from 64 Kbps to gigabits/sec
• Consequence
it became feasible to use a large network of computers
working on the same application; this is in contrast to the
old centralized systems, where there was a single
computer with its peripherals
• Definition of a Distributed System
Another Definition
• A distributed system is a system designed to support the
development of applications and services which can
exploit a physical architecture consisting of multiple,
autonomous processing elements that do not share
primary memory but cooperate by sending
asynchronous messages over a communication network
(Blair & Stefani)
• Why Distributed?
• Resource and Data Sharing
printers, databases, multimedia servers, ...
• Availability, Reliability
the loss of some instances can be hidden
• Scalability, Extensibility
the system grows with demand (e.g., extra servers)
• Performance
huge power (CPU, memory, ...) available
• Inherent distribution, communication
organizational distribution, e-mail, video
• Characteristics of Distributed Systems
• differences between the computers and the ways they
communicate are hidden from users
• users and applications can interact with a distributed
system in a consistent and uniform way regardless of
location
• distributed systems should be easy to expand and scale
• a distributed system is normally continuously available,
even if there are partial failures
1.1.1. Goals of a Distributed System
• to support heterogeneous computers and networks and to provide a
single-system view, a distributed system is often organized by
means of a layer of software called middleware that extends over
multiple machines
• Scalability in Distributed Systems
• a distributed system should be scalable; there are three dimensions
size: adding more users and resources to the system
geographically: users and resources may be far apart
administratively: should be easy to manage even if it spans many
administrative organizations
• Scalability limitations (concept: example)
Centralized services: a single server for all users, mostly for
security reasons
Centralized data: a single on-line telephone book
Centralized algorithms: doing routing based on complete
information
• Scaling Techniques: how to solve scaling problems
the problem is mainly performance, and arises as a result of
limitations in the capacity of servers and networks (and, for
geographical scalability, from high latency and mostly unreliable
links)
three possible solutions: hiding communication latencies,
distribution, and replication
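Of the three techniques, hiding communication latency is the easiest to sketch: instead of waiting for each remote reply before sending the next request, the client issues all requests and lets them be in flight at the same time. In this sketch the remote servers are simulated with `asyncio.sleep`; `fetch_remote` and the delay values are hypothetical stand-ins for real network calls.

```python
import asyncio
import time

async def fetch_remote(server: str, delay: float) -> str:
    """Hypothetical remote call: the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"reply from {server}"

async def sequential(servers):
    # Wait for each reply before sending the next request: latencies add up.
    return [await fetch_remote(s, d) for s, d in servers]

async def overlapped(servers):
    # Issue all requests at once; total wait is roughly the slowest latency.
    return await asyncio.gather(*(fetch_remote(s, d) for s, d in servers))

if __name__ == "__main__":
    servers = [("A", 0.2), ("B", 0.2), ("C", 0.2)]
    for strategy in (sequential, overlapped):
        start = time.perf_counter()
        replies = asyncio.run(strategy(servers))
        print(strategy.__name__, replies, f"{time.perf_counter() - start:.2f}s")
```

On a typical machine the overlapped run should take roughly 0.2 s and the sequential one roughly 0.6 s; the replies themselves are identical.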
1.1.2. Types of Distributed System
Three types: distributed computing systems, distributed information
systems, and distributed pervasive/embedded systems
1. Distributed Computing Systems
-Used for high-performance computing tasks
-two types: cluster computing and grid computing
-Cluster Computing
a collection of similar workstations or PCs (homogeneous),
closely connected by means of a high-speed LAN
each node runs the same operating system
used for parallel programming, in which a single compute-intensive
program is run in parallel on multiple machines
1.1.2. Types of Distributed System
• e.g., two parties negotiating a deal may bargain over
price
delivery date
quality, etc.
• until the deal is concluded they can continue negotiating
or one of them can terminate
• but once they have reached an agreement they are
bound by law to carry out their part of the deal
• transactions between processes are similar to this
scenario
• e.g., assume the following banking operation
withdraw an amount x from account 1
deposit the amount x to account 2
• what happens if there is a problem after the first activity is carried
out?
• group the two operations into one transaction; either both are
carried out or neither
• we need a way to roll back when a transaction is not completed
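A minimal sketch of the banking example, assuming in-memory accounts held in a dictionary: the transfer takes a snapshot of the balances before changing anything and restores it if any step fails, so either both operations take effect or neither does. `Bank`, `transfer`, and the `fail_after_withdraw` flag are illustrative names, not part of any real transaction API.

```python
class TransactionAborted(Exception):
    pass

class Bank:
    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, src, dst, amount, fail_after_withdraw=False):
        """Withdraw from src and deposit to dst as one all-or-nothing unit."""
        snapshot = dict(self.balances)          # state to roll back to
        try:
            if self.balances[src] < amount:
                raise TransactionAborted("insufficient funds")
            self.balances[src] -= amount        # operation 1: withdraw
            if fail_after_withdraw:             # simulate a crash between the steps
                raise TransactionAborted("failure after withdraw")
            self.balances[dst] += amount        # operation 2: deposit
        except Exception:
            self.balances = snapshot            # roll back: undo operation 1
            raise
```

For example, `Bank({"acct1": 100, "acct2": 50}).transfer("acct1", "acct2", 30)` leaves 70 and 80; if the same call is made with `fail_after_withdraw=True`, it raises `TransactionAborted` and both balances stay unchanged. Real systems keep the undo information in a durable log rather than in memory.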
1.2. Architecture
1.2.1 Architectural Styles
• refers to the logical organization of distributed systems
into software components
• a component is a modular unit with well-defined,
required and provided interfaces that is replaceable
within its environment; can be replaced provided that we
respect its interfaces
• a connector is a mechanism that mediates
communication, coordination, or cooperation among
components, e.g., facilities for RPC, message passing, or
streaming multimedia data
• there are various architectural styles
Layered architectures
Object-based architectures
Data-centered architectures
Event-based architectures
1.2. Architecture
• Layered architectures
• components are organized in a layered fashion where a component at layer
Li is allowed to call components at the underlying layer Li-1, but not the
other way around;
• requests go down the hierarchy and results flow upward
• e.g., network layers
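A layered architecture can be sketched as a stack of objects in which each layer holds a reference only to the layer directly below it, so a request travels down and the result flows back up. The three layers used here (user interface, processing, data) and their contents are an illustrative assumption, not a prescribed split.

```python
class DataLayer:
    """Lowest layer: owns the data and knows nothing about the layers above."""
    def __init__(self):
        self._rows = {"alice": 3, "bob": 5}

    def get(self, key):
        return self._rows.get(key, 0)

class ProcessingLayer:
    """Middle layer: may call only the data layer beneath it."""
    def __init__(self, lower: DataLayer):
        self.lower = lower

    def total(self, keys):
        return sum(self.lower.get(k) for k in keys)

class InterfaceLayer:
    """Top layer: turns a user request into calls on the processing layer."""
    def __init__(self, lower: ProcessingLayer):
        self.lower = lower

    def handle(self, request: str) -> str:
        keys = request.split(",")
        return f"total={self.lower.total(keys)}"   # result flows back up

app = InterfaceLayer(ProcessingLayer(DataLayer()))
```

`app.handle("alice,bob")` passes down through all three layers and returns `"total=8"`. No lower layer holds a reference to a higher one, which is what enforces the one-way calling rule.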
(Figure: the event-based architectural style)
1.2.2 Centralized Architectures
• a server may sometimes act as a client leading to a
physically three-tiered architecture; an example is the
organization of Web sites
1.3 Naming
1.3.1. Introduction
• names play an important role to:
share resources
uniquely identify entities
refer to locations
etc.
• an important issue is that a name can be resolved to the entity it
refers to
• to resolve names, it is necessary to implement a naming system
• in a distributed system, the implementation of a naming system is
itself often distributed.
• Efficiency and scalability of the naming system are the main issues
1.3.1 Names, Identifiers, and Addresses
• a name in a distributed system is a string of bits or
characters that is used to refer to an entity
• an entity is anything; e.g., resources such as hosts,
printers, disks, files, objects, processes, users, Web
pages, newsgroups, mailboxes, network connections, ...
• entities can be operated on
e.g., a resource such as a printer offers an interface
containing operations for printing a document,
requesting the status of a job, etc.
a network connection may provide operations for
sending and receiving data, setting quality of service
parameters, etc.
• to operate on an entity, it is necessary to access it through its
access point, which is itself a (special) entity
Access point
• the name of an access point is called an address (such as
IP address and port number as used by the transport
layer)
• the address of the access point of an entity is also
referred to as the address of the entity
• an entity can have more than one access point (similar to
accessing an individual through different telephone
numbers)
• an entity may change its access point in the course of
time (e.g., a mobile computer getting a new IP address as
it moves)
Examples
• name of an FTP server (entity)
URL of the FTP server
• address of the FTP server
IP address:port number
• the address of the FTP server may change
• there are three classes of naming systems: flat naming, structured
naming, and attribute-based naming
1.3.2 Flat Naming
• a name is a sequence of characters without structure;
like human names? Maybe, if it is not an Ethiopian name
• difficult to use in a large system since it must be
centrally controlled to avoid duplication
• moreover, it does not contain any information on how to
locate the access point of its associated entity
• how are flat names resolved (i.e., how is an entity located
when a flat name is given)?
name resolution: mapping a name to an address or an
address to a name is called name-address resolution
possible solutions: simple solutions, home-based
approaches, and hierarchical approaches
1.Simple Solutions
two solutions (for LANs only): broadcasting and multicasting,
and forwarding pointers
a. Broadcasting and Multicasting
broadcast a message containing the identifier of an entity; only
machines that can offer an access point for the entity send a
reply
e.g., ARP (Address Resolution Protocol) in the Internet to find the
data link address (MAC address) of a machine
a computer that knows the IP address of another computer and wants
to access it broadcasts this address; the machine that owns the
address replies with its MAC address
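The ARP exchange can be simulated on a toy LAN: the requester broadcasts the IP address it wants resolved, every host sees the query, and only the host that owns that address answers with its MAC address. The host list and addresses are made up for illustration; real ARP runs at the data link layer, not on Python objects.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Host:
    ip: str
    mac: str

    def on_broadcast(self, wanted_ip: str) -> Optional[str]:
        # Only the machine that owns the requested address replies.
        return self.mac if self.ip == wanted_ip else None

def broadcast_resolve(lan, wanted_ip):
    """Send the query to every host on the LAN and return the single reply."""
    replies = [r for host in lan if (r := host.on_broadcast(wanted_ip)) is not None]
    return replies[0] if replies else None

lan = [
    Host("192.168.1.10", "aa:bb:cc:00:00:10"),
    Host("192.168.1.20", "aa:bb:cc:00:00:20"),
    Host("192.168.1.30", "aa:bb:cc:00:00:30"),
]
```

Every host has to process every query, even though only one answers; this is why broadcasting works on a LAN but does not scale to large networks.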
b. Forwarding Pointers
how to look for mobile entities
when an entity moves from A to B, it leaves
behind a reference to its new location
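Forwarding pointers can be sketched as a chain of locations: each place the entity has left keeps a pointer to where it went next, and a lookup follows the chain until it reaches the location that currently holds the entity. The location names are hypothetical. The hop count illustrates why long chains cost performance, and a missing pointer makes the entity unreachable.

```python
class Location:
    def __init__(self, name):
        self.name = name
        self.entity = None        # set while the entity lives here
        self.forward = None       # pointer left behind after the entity moves

def move(entity, src, dst):
    """Move the entity and leave a forwarding pointer at the old location."""
    src.entity, dst.entity = None, entity
    src.forward = dst

def locate(start):
    """Follow forwarding pointers; return (location, hops) or fail on a broken chain."""
    loc, hops = start, 0
    while loc.entity is None:
        if loc.forward is None:
            raise LookupError(f"broken chain at {loc.name}")
        loc, hops = loc.forward, hops + 1
    return loc, hops

a, b, c = Location("A"), Location("B"), Location("C")
a.entity = "mobile-object"
move("mobile-object", a, b)
move("mobile-object", b, c)
```

A client that still refers to location A reaches the object at C after two hops.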
2.Home-Based Approaches
• broadcasting and multicasting have scalability problems;
forwarding pointers suffer from poor performance (long chains) and broken links
• a home location keeps track of the current location of an entity;
often it is the place where an entity was created
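A minimal sketch of the home-based approach, assuming a single in-memory home agent: the entity reports every move to its home, and a client that knows only the entity's flat name asks the home for the current address. `HomeAgent` and the addresses are illustrative.

```python
class HomeAgent:
    """Keeps track of the current address of the entities it is home for."""
    def __init__(self):
        self.current = {}

    def register(self, name, address):
        # Called by the entity whenever it moves to a new location.
        self.current[name] = address

    def lookup(self, name):
        # Every client lookup goes through the home, even if the entity is
        # now close to the client (the source of triangle routing).
        return self.current[name]

home = HomeAgent()
home.register("laptop-42", "10.0.0.5")        # created at home
home.register("laptop-42", "172.16.3.9")      # moved to a foreign network
```

A lookup for `"laptop-42"` now returns `"172.16.3.9"`. If the `HomeAgent` itself disappears, the entity can no longer be found, which is the weakness listed next.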
problems:
• creates communication latency (Triangle routing: correspondent-
home network-mobile)
• the home location must always exist;
• the host is unreachable if the home no longer exists
(e.g., the home has permanently moved);
• the solution is to register the home at a traditional name service and
let a client first look up the location of the home
3.Hierarchical Approaches
• a generalization of the two-tiered approach into multiple layers
• a network is divided into a collection of domains, similar to DNS
• a single top-level domain spans the entire network
• each domain can be subdivided into multiple, smaller domains
• the lowest-level domain is called a leaf domain; typically a LAN
• each domain D has an associated directory node dir(D) that keeps
track of the entities in that domain leading to a tree of directory
nodes
• the root (directory) node knows about all entities
• each entity is represented by a location record in the directory node
dir(D) to keep track of its whereabouts
• a location record for an entity in a leaf domain contains the entity’s
current address; all other high-level domains will have only pointers
to this address; this means the root node will store only pointers to
all entities
• an entity may have multiple addresses, for instance, if it is
replicated; a higher-level domain containing the two subdomains
where the entity has addresses will have two pointers
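The insert and lookup operations implied by this scheme can be sketched with a small tree of directory nodes. The sketch assumes one address per entity and follows the usual procedure for such a hierarchy: a lookup starts at the client's leaf domain, climbs until some directory node has a location record for the entity, then follows the child pointers down to the leaf that holds the address. The domain names are hypothetical.

```python
class DirNode:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.records = {}   # entity -> address (leaf) or child DirNode (inner node)

    def insert(self, entity, address):
        """Store the address at this leaf and a pointer in every ancestor."""
        self.records[entity] = address
        child, node = self, self.parent
        while node is not None:
            node.records[entity] = child      # pointer to the subdomain
            child, node = node, node.parent

def lookup(leaf, entity):
    """Climb from the client's leaf domain, then descend along pointers."""
    node, up = leaf, 0
    while entity not in node.records:
        node, up = node.parent, up + 1
        if node is None:
            raise LookupError(entity)
    record = node.records[entity]
    while isinstance(record, DirNode):        # follow pointers downward
        node, record = record, record.records[entity]
    return record, node.name, up

root = DirNode("root")
europe, africa = DirNode("europe", root), DirNode("africa", root)
lan_paris, lan_addis = DirNode("paris-lan", europe), DirNode("addis-lan", africa)
lan_paris.insert("printer-7", "10.1.2.3")
```

A lookup from `lan_addis` climbs two levels to the root before descending to `paris-lan`, whereas a lookup from `lan_paris` is answered locally; lookups thus stay local when the entity is nearby.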
1.3.3 Structured Naming
• Symbolic link: representing an entity by a leaf node and
instead of storing the address or state of the entity, the
node stores an absolute path name
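A naming graph with a symbolic link can be sketched with nested dictionaries as directory nodes, plain strings as ordinary leaf nodes, and a `Symlink` leaf that stores an absolute path name. Resolution follows the path label by label; when it meets a symbolic link, it first resolves the stored absolute path from the root and continues from there. The graph contents are made up for illustration, and the sketch does not detect link cycles.

```python
from dataclasses import dataclass

@dataclass
class Symlink:
    target: str          # absolute path name stored in the leaf node

def resolve(root: dict, path: str):
    """Resolve an absolute path such as '/home/steen/keys' in the naming graph."""
    node = root
    for label in [p for p in path.split("/") if p]:
        node = node[label]                    # follow one labelled edge
        if isinstance(node, Symlink):
            node = resolve(root, node.target) # restart from the root
    return node

graph = {
    "home": {"steen": {"mbox": "steen's mailbox", "keys": "n5"}},
    "keys": Symlink("/home/steen/keys"),
}
```

Resolving `/keys` reaches the same node as `/home/steen/keys`, so the entity has two names while being stored only once.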
1.4. Process (Threads)
• Threads and their Implementation
• how are processes and threads related?
• Process tables or PCBs are used to keep track of
processes
• there are usually many processes executing
concurrently
• processes should not interfere with each other; sharing
resources by processes is transparent
• this concurrency transparency has a high price;
allocating resources for a new process and context
switching take time
• a thread also executes independently of other
threads, but no high degree of concurrency transparency is maintained
between threads, which results in better performance
• threads can be used in both distributed and non-
distributed systems
• Threads in Non-distributed Systems
a process has an address space (containing program text and
data) and a single thread of control, as well as other resources
such as open files, child processes, accounting information, etc.
(Figure: three processes, each with one thread, vs. one process with three threads)
• Threads take turns in running
• Threads allow multiple executions to take place in the same process environment,
called multithreading
• Thread Usage: Why do we need threads?
e.g., a word processor has different parts for
interacting with the user
formatting the page as soon as changes are made
timed savings (for auto recovery)
spelling and grammar checking, etc.
1. Simplifying the programming model: many activities are going on at once, more
or less independently
2. Threads are easier to create and destroy than processes since they do not own
resources of their own; they share those of their process
3. Performance improves by overlapping activities when there is a lot of I/O, i.e., to avoid
blocking while waiting for input or doing calculations, say in a spreadsheet
4. Real parallelism is possible in a multiprocessor system
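A minimal sketch of reason 3 (overlapping activities), using Python's `threading` module: an "auto-save" task and a "spell-check" task from the word-processor example each block for a while (simulated with `time.sleep`), but they run in separate threads of the same process, so they overlap in time and share the `log` list. Task names and delays are illustrative. In standard CPython builds, threads overlap blocking I/O well but CPU-bound threads do not run truly in parallel, so reason 4 is not demonstrated here.

```python
import threading
import time

log = []                      # shared by all threads of this process
log_lock = threading.Lock()

def task(name, delay):
    time.sleep(delay)         # stands in for blocking I/O (disk, network)
    with log_lock:
        log.append(name)

def run_word_processor_tasks():
    threads = [
        threading.Thread(target=task, args=("auto-save", 0.2)),
        threading.Thread(target=task, args=("spell-check", 0.1)),
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

The elapsed time returned should be close to the longest delay (about 0.2 s) rather than the sum (0.3 s), because the two blocking waits overlap.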
• Thread Implementation
• threads are usually provided in the form of a thread package
• the package contains operations to create and destroy threads; there are two ways to implement it:
a. construct a thread library that is executed entirely in user mode (the
OS is not aware of threads)
cheap to create and destroy threads; just allocate and free memory
context switching can be done using few instructions; store and reload
only CPU register values
disadvantage: invocation of a blocking system call will block the entire
process to which the thread belongs and all other threads in that
process
b. implement them in the OS’s kernel
let the kernel be aware of threads and schedule them
• thread operations such as creation and deletion are expensive since each requires a
system call
• Threads in Distributed Systems
• Multithreaded Clients
consider a Web browser; fetching different parts of a page can be
implemented as a separate thread, each opening its own TCP
connection to the server
each can display the results as it gets its part of the page
parallelism can also be achieved for replicated servers since
each thread request can be forwarded to separate replicas
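A sketch of a multithreaded client, assuming a hypothetical `fetch_part` function that stands in for opening a TCP connection and downloading one element of a page: `concurrent.futures.ThreadPoolExecutor` runs each fetch in its own thread, and `as_completed` hands back each part as soon as it arrives. Requests are spread round-robin over a list of made-up replica names, as in the last bullet.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_part(part, server):
    """Hypothetical download of one page element from one server."""
    time.sleep(0.05)                 # simulated network transfer time
    return part, server

def load_page(parts, replicas):
    shown = []
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        # Spread the requests over the replicas in round-robin order.
        futures = [pool.submit(fetch_part, p, replicas[i % len(replicas)])
                   for i, p in enumerate(parts)]
        for fut in as_completed(futures):    # display each part as it arrives
            shown.append(fut.result())
    return shown
```

The order of `shown` depends on which download finishes first, which mirrors a browser rendering parts of a page as they come in.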
• Multithreaded Servers
servers can be constructed in three ways
a. single-threaded process
it gets a request, examines it, and carries it out to completion before getting
the next request
(Figure: a multithreaded server organized in a dispatcher/worker model)
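A minimal dispatcher/worker sketch using `queue.Queue` and `threading`: the calling thread plays the dispatcher, placing each incoming request on a shared queue, and a pool of worker threads takes requests off the queue and serves them, blocking while the queue is empty. Requests here are plain strings and "serving" one means upper-casing it; both are stand-ins for real request handling.

```python
import queue
import threading

def worker(jobs: queue.Queue, results: list, lock: threading.Lock):
    while True:
        request = jobs.get()              # blocks until the dispatcher hands over work
        if request is None:               # sentinel: shut this worker down
            jobs.task_done()
            return
        with lock:
            results.append(request.upper())   # stand-in for serving the request
        jobs.task_done()

def serve(requests, n_workers=3):
    jobs, results, lock = queue.Queue(), [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for r in requests:                    # the dispatcher role
        jobs.put(r)
    for _ in workers:
        jobs.put(None)                    # one sentinel per worker
    for w in workers:
        w.join()
    return results
```

Because any idle worker may pick up the next request, results can come back in a different order than the requests were dispatched.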
1.4.1. Anatomy of a Client
• Two issues: user interfaces and client-side software for distribution
transparency
a. User Interfaces
to create a convenient environment for the interaction of a human
user and a remote server;
e.g. mobile phones with simple displays and a set of keys
GUIs are most commonly used
• replication transparency: a client-side proxy can send requests to each replica, and client-side software
can transparently collect all responses and pass a single return
value to the client application
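The client-side replication idea can be sketched as a proxy that forwards the same call to every replica and collapses the answers into one value. Here the replicas are plain Python functions, and responses are combined by majority vote. Both the voting rule and all names are assumptions, since the text only says that the responses are collected and a single value is returned.

```python
from collections import Counter

class ReplicaProxy:
    """Looks like a single server to the client application."""
    def __init__(self, replicas):
        self.replicas = replicas

    def call(self, *args):
        responses = []
        for replica in self.replicas:
            try:
                responses.append(replica(*args))
            except Exception:
                pass                      # a failed replica is simply ignored
        if not responses:
            raise RuntimeError("no replica answered")
        value, _ = Counter(responses).most_common(1)[0]
        return value                      # single return value for the client

def good(x):
    return x * 2

def faulty(x):
    return x * 2 + 1                      # returns a wrong answer

def crashed(x):
    raise ConnectionError("replica down")

proxy = ReplicaProxy([good, faulty, good, crashed])
```

`proxy.call(21)` returns 42: the crashed replica is skipped and the faulty replica is outvoted, while the client application sees only one result.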
1.4.2. Server
General Design Issues
• How to organize servers?
• Where do clients contact a server?
• Whether and how a server can be interrupted
• Whether or not the server is stateless
a. How to organize servers?
• Iterative server
the server itself handles the request and returns the result
• Concurrent server
it passes a request to a separate process or thread and waits for
the next incoming request; e.g., a multithreaded server; or by
forking a new process as is done in Unix
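Python's standard `socketserver` module offers both organizations, which makes the difference easy to sketch: `TCPServer` is iterative (it finishes one request before accepting the next), while `ThreadingTCPServer` is concurrent (each request is handled in a new thread). The upper-casing handler is an illustrative stand-in for real work. Binding to port 0 asks the local OS to assign a free endpoint dynamically, which also relates to the next question below.

```python
import socket
import socketserver
import threading

class UpperHandler(socketserver.StreamRequestHandler):
    """Serves one request: read a line and send it back in upper case."""
    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())

def start_server(concurrent: bool):
    # TCPServer handles requests one at a time (iterative);
    # ThreadingTCPServer starts a thread per request (concurrent).
    server_cls = socketserver.ThreadingTCPServer if concurrent else socketserver.TCPServer
    # Port 0: let the local OS assign a free endpoint.
    server = server_cls(("127.0.0.1", 0), UpperHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def request(port: int, text: str) -> str:
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text.encode() + b"\n")
        with sock.makefile("rb") as reply:
            return reply.readline().decode().strip()

if __name__ == "__main__":
    server = start_server(concurrent=True)
    print(request(server.server_address[1], "hello"))   # prints HELLO
    server.shutdown()
    server.server_close()
```

With a single quick request both servers behave the same; the iterative one only shows its limitation when a slow request makes the others wait.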
b. Where do clients contact a server?
• using end points or ports at the machine where the server is running
where each server listens to a specific endpoint
• how do clients know the endpoint of a service?
globally assign endpoints for well-known services; e.g. FTP is on
TCP port 21, HTTP is on TCP port 80
for services that do not require pre-assigned endpoints, an endpoint can be
dynamically assigned by the local OS
IANA (Internet Assigned Numbers Authority) ranges
IANA divided the port numbers into three ranges: well-known ports (0–1023),
registered ports (1024–49151), and dynamic/private ports (49152–65535)
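A small helper makes the three IANA ranges concrete; `iana_range` is a hypothetical name used only for this sketch, and the boundaries are the ones listed above.

```python
def iana_range(port: int) -> str:
    """Classify a TCP/UDP port number into one of the three IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0-65535")
    if port <= 1023:
        return "well-known"        # e.g., FTP on 21, HTTP on 80
    if port <= 49151:
        return "registered"
    return "dynamic/private"       # typically handed out by the OS on demand
```

For example, `iana_range(80)` is `"well-known"`, `iana_range(8080)` is `"registered"`, and a port the OS picks for a client connection usually falls in `"dynamic/private"`.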
c. Whether and how a server can be interrupted
• for instance, a user may want to interrupt a file transfer; maybe it
was the wrong file
• one approach: let the user exit the client application; this will break the connection
to the server, and the server will tear down the connection, assuming
that the client has crashed
d. Whether or not the server is stateless
• a stateless server does not keep information on the state of its
clients; for instance a Web server
• soft state: a server promises to maintain state for a limited time;
e.g., to keep a client informed about updates; after the time expires,
the client has to poll
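Soft state can be sketched as a subscription table whose entries expire after a fixed lease: while the lease is valid the server keeps the client informed, and afterwards it silently forgets the client, which then has to poll (or subscribe again). The lease length and the injectable `clock` parameter are illustrative assumptions; the clock just makes the expiry easy to demonstrate.

```python
import time

class SoftStateServer:
    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease = lease_seconds
        self.clock = clock
        self.subscribers = {}          # client -> lease expiry time

    def subscribe(self, client):
        self.subscribers[client] = self.clock() + self.lease

    def clients_to_notify(self):
        """Drop expired entries; only clients with a valid lease get updates."""
        now = self.clock()
        self.subscribers = {c: t for c, t in self.subscribers.items() if t > now}
        return sorted(self.subscribers)
```

The server never needs to be told that a client went away: a crashed client's entry simply expires, which is what makes soft state attractive.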
1.4.3 Code Migration
• so far, communication has been concerned with passing data
• we may also pass programs, even while they are running and in heterogeneous
systems
• code migration also involves moving data: when a program
migrates while running, its status, pending signals, and other
parts of its execution environment, such as the stack and the program counter,
also have to be moved
• Reasons for Migrating Code
• to improve performance; move processes from heavily-loaded to
lightly-loaded machines (load balancing)
• to reduce communication: move a client application that performs
many database operations to a server if the database resides on the
server; then send only results to the client
• to exploit parallelism (for nonparallel programs): e.g., copies of a
mobile program (called a mobile agent, or a crawler in
search engines) moving from site to site searching the Web
1.5 Communication
1.5 Network Protocols and Standards
b. The TCP/IP Reference Model
• TCP/IP -Transmission Control Protocol/Internet Protocol
• used by ARPANET and its successor the Internet
• design goals
the ability to connect multiple networks (internetworking) in a
seamless way
the network should be able to survive loss of subnet hardware,
i.e., the connection must remain intact as long as the source and
destination machines are properly functioning
flexible architecture to accommodate requirements of different
applications, ranging from transferring files to real-time speech
transmission
• has 4 (or 5 depending on how you see it) layers: Application,
Transport, Internet (Internetwork), Host-to-network (some split it into
Physical and Data Link)
• OSI and TCP/IP Layers Correspondence
OSI Application, Presentation, Session ↔ TCP/IP Application
OSI Transport ↔ TCP/IP Transport
OSI Network ↔ TCP/IP Internet
OSI Data Link, Physical ↔ TCP/IP Host-to-network
Middleware Protocols
• middleware is an application that logically lives in the application layer but contains many
general-purpose protocols that provide services
• example of middleware services
authentication and authorization services
distributed transactions (commit protocols; locking mechanisms)
middleware communication protocols (calling a procedure or
invoking an object remotely, synchronizing streams for real-time
data, multicast services)
• hence an adapted reference model for networked communications is
required
1.5.2 Remote Procedure Call
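As a minimal illustration of RPC, Python's standard `xmlrpc` modules hide the message passing behind an ordinary-looking call: the server registers a procedure, and the client calls it through a `ServerProxy`, which plays the role of the client stub. XML-RPC over HTTP is just one concrete RPC mechanism chosen for this sketch, and the procedure `add` is illustrative.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    """The remote procedure: runs in the server, invoked by the client."""
    return a + b

def start_rpc_server():
    # Port 0 lets the OS choose a free port; logRequests=False keeps output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    server = start_rpc_server()
    host, port = server.server_address
    # The proxy marshals the arguments into a request message, sends it,
    # blocks until the reply arrives, and unmarshals the result.
    proxy = ServerProxy(f"http://{host}:{port}/")
    print(proxy.add(2, 3))   # prints 5
    server.shutdown()
    server.server_close()
```

The call `proxy.add(2, 3)` looks local, but the client blocks until the server replies, which is exactly the synchronous behaviour discussed under message-oriented communication.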
1.5.3 Types of Communication
Message-Oriented Communication (JAVA)
• RPCs and RMIs are not adequate for all distributed system
applications
• the provision of access transparency may be good, but their
semantics are not adequate for all applications
• example problems
• they assume that the receiving side is running at the time of
communication
• a client is blocked until its request has been processed
• messaging is a solution
• communication can be
persistent or transient
asynchronous or synchronous
• persistent: a message that has been submitted for transmission is
stored by the communication system as long as it takes to deliver it
to the receiver
e.g., e-mail delivery, snail mail delivery
• transient: a message that has been submitted for transmission is
stored by the communication system only as long as the sending
and receiving applications are executing
• asynchronous: a sender continues immediately after it has
submitted its message for transmission
• synchronous: the sender is blocked until its message is
stored in a local buffer at the receiving host or delivered to the
receiver
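The two dimensions can be sketched with a toy in-process channel standing in for the communication system. A persistent channel keeps a message until the receiver eventually fetches it; a transient channel drops it if the receiving application is not running. `send_async` returns right after submission, while `send_sync` blocks until the receiver has actually taken the message (the "delivered to the receiver" variant of synchronous communication). All names are hypothetical.

```python
import queue
import threading

class Channel:
    """Toy communication system sitting between a sender and a receiver."""
    def __init__(self, persistent: bool):
        self.persistent = persistent
        self.receiver_running = False     # is the receiving application up?
        self._store = queue.Queue()

    def _accepts(self) -> bool:
        return self.persistent or self.receiver_running

    def send_async(self, msg) -> bool:
        """Asynchronous: return right after submission, without waiting."""
        if not self._accepts():
            return False                  # transient and no receiver: message lost
        self._store.put((msg, None))
        return True

    def send_sync(self, msg, timeout=1.0) -> bool:
        """Synchronous: block until the receiver has taken the message."""
        if not self._accepts():
            return False
        delivered = threading.Event()
        self._store.put((msg, delivered))
        return delivered.wait(timeout)    # True once delivery is acknowledged

    def receive(self):
        msg, delivered = self._store.get()
        if delivered is not None:
            delivered.set()               # unblock a synchronous sender
        return msg

    def pending(self) -> int:
        return self._store.qsize()
```

E-mail corresponds to `Channel(persistent=True)` with `send_async`: the message waits in the system even though the receiver is not running.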
Stream Oriented Communication
• until now, we focused on exchanging independent and complete
units of information
• time has no effect on correctness; a system can be slow or fast
• however, there are communications where time has a critical role
Multimedia
• media
storage, transmission, interchange, presentation, representation
and perception of different data types
text, graphics, images, voice, audio, video, animation, ...
movie: video + audio + …
• multimedia: handling of a variety of representation media
• end user pull
information overload and starvation
• technology push
emerging technology to integrate media
The Challenge
• new applications
multimedia will be pervasive in a few years (as graphics already is)
• continuous delivery
e.g., 30 frames/s (NTSC), 25 frames/s (PAL) for video
guaranteed Quality of Service (QoS), e.g., through admission control
• storage and transmission
e.g., a 2-hour uncompressed HDTV (1920×1080) movie needs about 1.12 TB
(1920 × 1080 pixels × 3 bytes × 25 frames/s × 3600 s × 2 h)
videos are extremely large, even after compression (more precisely,
encoding)
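The 1.12 TB figure can be checked directly. The sketch assumes decimal units (1 TB = 10^12 bytes) and also derives the bit rate needed to stream such a movie uncompressed; that rate is a derived number, not one stated in the slides.

```python
WIDTH, HEIGHT = 1920, 1080    # HDTV resolution in pixels
BYTES_PER_PIXEL = 3           # 24-bit colour, uncompressed
FPS = 25                      # frames per second (PAL rate)
DURATION_S = 2 * 60 * 60      # a 2-hour movie, in seconds

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL      # 6,220,800 bytes per frame
rate_bytes_per_s = frame_bytes * FPS                # 155,520,000 bytes/s
movie_bytes = rate_bytes_per_s * DURATION_S         # 1,119,744,000,000 bytes
print(f"{movie_bytes / 1e12:.2f} TB")               # prints 1.12 TB
print(f"{rate_bytes_per_s * 8 / 1e9:.2f} Gbit/s")   # prints 1.24 Gbit/s
```

A sustained rate of about 1.24 Gbit/s for a single stream is why continuous media is almost always encoded before storage or transmission.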
• search
can we look at 100…videos to find the proper one?
1.6 Synchronization
1.6. Clock Synchronization
• in centralized systems, time can be unambiguously
decided by a system call
• e.g., process A at time t1 gets the time, say tA, and
process B at time t2, where t1 < t2, gets the time, say tB;
then tA is always less than (possibly equal to, but never
greater than) tB
• achieving agreement on time in distributed systems is
difficult
• e.g., consider the make program on a UNIX machine
• a large program is usually split up into several source
files
• make compiles only source files for which the time of
their last update was later than that of the existing object file
when each machine has its own clock, an event that occurred after
another event may nevertheless be assigned an earlier time
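The make problem can be sketched by giving two machines clocks with different offsets: the editor's machine runs behind the compiler's, so a source file modified after compilation still gets an earlier timestamp, and make's rule wrongly concludes that no recompilation is needed. The offsets, times, and names are illustrative.

```python
class Machine:
    def __init__(self, offset):
        self.offset = offset          # how far this machine's clock is off

    def now(self, true_time):
        return true_time + self.offset

def needs_recompile(source_mtime, object_mtime):
    """make's rule: rebuild only if the source is newer than the object file."""
    return source_mtime > object_mtime

compiler_host = Machine(offset=0)
editor_host = Machine(offset=-3)      # clock runs 3 time units behind

output_o = compiler_host.now(true_time=2143)   # object file produced at real time 2143
output_c = editor_host.now(true_time=2144)     # source edited later, at real time 2144
```

Although the edit happened after compilation, `output_c` is 2141 and `output_o` is 2143, so `needs_recompile(output_c, output_o)` is `False` and the change is silently ignored.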
1.7. Fault Tolerance
1.7. Fault Tolerance (Challenges)
•Heterogeneity The Internet enables users to access services and run
applications over a heterogeneous collection of computers and networks.
Heterogeneity (that is, variety and difference) applies to all of the following:
•networks;
•computer hardware;
•operating systems;
•Programming languages;
•implementations by different developers.
•Although the Internet consists of many different sorts of network their
differences are masked by the fact that all of the computers attached to them
use the Internet protocols to communicate with one another.
•For example, a computer attached to an Ethernet has an implementation of
the Internet protocols over the Ethernet, whereas a computer on a different
sort of network will need an implementation of the Internet protocols for that
network.
•Data types such as integers may be represented in different ways on
different sorts of hardware – for example, there are two alternatives for the
byte ordering of integers. These differences in representation must be dealt
with if messages are to be exchanged between programs running on
different hardware.
•Different programming languages use different representations for
characters and data structures such as arrays and records. These
differences must be addressed if programs written in different languages are
to be able to communicate with one another.
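Python's standard `struct` module makes the byte-ordering problem visible: the same 32-bit integer is laid out in opposite orders under the big-endian (`>`) and little-endian (`<`) formats, and a receiver that decodes with the wrong convention reads a different number. Agreeing on one order for all messages, for example big-endian "network byte order" (`!`), is one common way to deal with this.

```python
import struct

value = 1
big = struct.pack(">I", value)       # big-endian: most significant byte first
little = struct.pack("<I", value)    # little-endian: least significant byte first
print(big.hex(), little.hex())       # prints 00000001 01000000

# A receiver that assumes the wrong order misreads the message:
misread = struct.unpack("<I", big)[0]
print(misread)                       # prints 16777216 (2**24), not 1

# Using network byte order ("!") on both sides avoids the ambiguity.
roundtrip = struct.unpack("!I", struct.pack("!I", 258))[0]
print(roundtrip)                     # prints 258
```

This is one of the differences in representation that middleware marshalling layers hide from application programmers.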
Openness: The openness of a computer system is the characteristic that
determines whether the system can be extended and reimplemented in
various ways. The openness of distributed systems is determined primarily
by the degree to which new resource-sharing services can be added and be
made available for use by a variety of client programs.
•Openness cannot be achieved unless the specification and documentation
of the key software interfaces of the components of a system are made
available to software developers.
Security: Many of the information resources that are made available and
maintained in distributed systems have a high intrinsic value to their users.
•Their security is therefore of considerable importance. Security for
information resources has three components: confidentiality (protection
against disclosure to unauthorized individuals), integrity (protection against
alteration or corruption), and availability (protection against interference with
the means to access the resources).
•In a distributed system, clients send requests to access data managed by
servers, which involves sending information in messages over a network.
For example:
1. A doctor might request access to hospital patient data or send additions to
that data.
2. In electronic commerce and banking, users send their credit card numbers
across the Internet.
•In both examples, the challenge is to send sensitive information in a
message over a network in a secure manner.
Scalability: Distributed systems operate effectively and efficiently at many
different scales, ranging from a small intranet to the Internet.
•A system is described as scalable if it will remain effective when there is a
significant increase in the number of resources and the number of users.
•The design of scalable distributed systems presents the following
challenges:
Controlling the cost of physical resources:
Controlling the performance loss:
Preventing software resources running out:
Avoiding performance bottlenecks:
Failure handling: Failures in a distributed system are partial – that is,
some components fail while others continue to function. Therefore the
handling of failures is particularly difficult.
Detecting failures: Some failures can be detected. For example, checksums can be
used to detect corrupted data in a message or a file.
Masking failures: Some failures that have been detected can be hidden or made
less severe. Two examples of hiding failures:
1. Messages can be retransmitted when they fail to arrive.
2. File data can be written to a pair of disks so that if one is corrupted, the other
may still be correct.
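Detecting and masking can be sketched together with a hypothetical channel that corrupts its first few transmissions. The sender attaches a CRC-32 checksum computed with `zlib.crc32`, the receiver recomputes it to detect corruption, and the failure is masked by retransmitting until an intact copy arrives. The corruption pattern and the retry limit are illustrative.

```python
import zlib

def make_frame(payload: bytes):
    return payload, zlib.crc32(payload)        # data plus its checksum

def corrupting_channel(corrupt_first_n):
    """Hypothetical channel: flips the first byte of the first N frames it carries."""
    count = 0
    def transmit(frame):
        nonlocal count
        payload, crc = frame
        count += 1
        if count <= corrupt_first_n:
            payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
        return payload, crc
    return transmit

def send_reliably(payload: bytes, transmit, max_tries=5):
    """Return (data, attempts); retransmit whenever the checksum does not match."""
    for attempt in range(1, max_tries + 1):
        received, crc = transmit(make_frame(payload))
        if zlib.crc32(received) == crc:          # detection
            return received, attempt
        # checksum mismatch: mask the failure by retransmitting
    raise IOError("gave up after repeated corruption")
```

With `corrupting_channel(2)`, the call `send_reliably(b"hello", ...)` detects two corrupted copies and succeeds on the third attempt; the application only ever sees the correct data.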
•Tolerating failures: Most of the services in the Internet do exhibit failures –
it would not be practical for them to attempt to detect and hide all of the
failures that might occur in such a large network with so many components.
•Recovery from failures: Recovery involves the design of software so that the state
of permanent data can be recovered or ‘rolled back’ after a server has crashed.
Concurrency: The process that manages a shared resource could take
one client request at a time. But that approach limits throughput. Therefore
services and applications generally allow multiple client requests to be
processed concurrently.
•For example, if two concurrent bids at an auction are ‘Smith: $122’ and
‘Jones: $111’, and the corresponding operations are interleaved without any
control, then they might get stored as ‘Smith: $111’ and ‘Jones: $122’.
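The auction example is a lost-update problem: recording a bid is a read-modify-write on shared state. A minimal sketch, assuming the shared state is a dictionary of bids plus the current top bid, protects each update with a `threading.Lock` so concurrent bids cannot interleave partway through an update. Bidder names and amounts follow the slide; the `Auction` class is illustrative.

```python
import threading

class Auction:
    def __init__(self):
        self.bids = {}                    # bidder -> highest bid so far
        self.highest = (None, 0)          # (bidder, amount) of the current top bid
        self._lock = threading.Lock()

    def place_bid(self, bidder, amount):
        # The whole check-then-update runs under the lock, so two bids
        # cannot interleave and store each other's values.
        with self._lock:
            self.bids[bidder] = max(amount, self.bids.get(bidder, 0))
            if amount > self.highest[1]:
                self.highest = (bidder, amount)

auction = Auction()
threads = [threading.Thread(target=auction.place_bid, args=bid)
           for bid in [("Smith", 122), ("Jones", 111)]]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread runs first, the result is always Smith at 122 and Jones at 111. The lock serializes only the short update, so other work can still proceed concurrently.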
•Transparency
Assignment #1
Q1. Use the World Wide Web as an example to illustrate the
concept of resource sharing, client and server. What are
the advantages and disadvantages of HTML, URLs
and HTTP as core technologies for information browsing? Are
any of these technologies suitable as a basis for client-server
computing in general?
Q2. Describe the trends, focus, and challenges in the context of the
World Wide Web.
Q3. List the three main software components that may fail
when a client process invokes a method in a server object,
giving an example of a failure in each case. Suggest how
the components can be made to tolerate one another’s
failures.
Quiz #1
• Q1. Tick (✓) all types of wireless technologies.
i) WiMax ii) WiFi iii) Bluetooth
iv) Hotspot v) Television & Radio broadcasting
vi) Satellite communication
vii) Infrared Communication viii) GPS ix) RFID
• Q2. A separate algorithm is needed to handle the locking of data in which type of transaction?
• Separate transaction
• Distributed transaction
• Flat Transaction
• Top level Transaction
• Q3. Tick all the properties that ACID stands for in distributed transactions.
i) Atomic ii) Serializable iii) Consistent
iv) Durable v) Isolated vi) Correlate
vii) Dynamic viii) Automatic
• Q4. Which Properties are