An Approach to Decentralized
Computer Systems
Jim Gray
ABSTRACT
This paper begins with the rationale for and against decentralization.
Then, a technical approach to decentralized systems is sketched. This
approach contrasts with the popular concept of a distributed
integrated database which transparently provides remote IO against a
single system image. Rather, it proposes that function be distributed
as "servers" which abstract data as high-level operations on objects
and communicate with "requestors" via a standard message protocol.
The requestor-server approach has the advantages of modularity and
performance.
This paper has been submitted for publication to the IEEE Transactions
on Software Engineering.
These systems have the following common features:
* diverse organizations and organizational procedures,
* diverse computer architectures, both hardware and software,
* diverse terminal types,
* diverse system sizes, from tiny to large, and
* diverse site environments.
The thesis of this article is that these systems could have been built
as an "integrated database", but that such a system would be a
management nightmare. In reality, the parts of these systems are not
tightly integrated -- each part is generally quite different from the
others. It is "integrated" by having the parts agree to a common but
arm's-length message protocol.
1.1 Why decentralize systems?
There is no "best" form of organization. Each form has its advantages
and disadvantages. The structure of computing is just one aspect of
the structure of an organization. Centralized organizations will
continue with centralized computing and decentralized organizations
will adopt appropriate degrees of decentralized computing [March].
real, most administrators also want operational control of their
computers and their databases. Increasingly, computer systems are
being designed to reflect the organization structure.
* Security: Some organizations feel more secure when they have
physical control over the site that stores their data. (I am
skeptical of this argument, but it is frequently made.)
1.2 Integrated system or integrated database
2. Technical aspects of decentralized systems
2.1 Computational model: objects-processes-messages
Object creation is accomplished by opening a session to a process
managing the object type. This session is then used to communicate the
that file. Sending messages via the session instructs the server
process to position within the file, and insert, update or extract
Because every object has a name, a process can address any object in
the network subject to authorization restrictions. This means that
the process is unaware of the physical location of the object. It can
write on any terminal, read or write any file, and address any other
Rather than the program having to OPEN, WRITE, READ and CLOSE the
(1) If the process managing the object is not OPEN, an open message is
sent to it.
(2) The following message is sent to the object manager:
<<operation>,<object>,<p1>,<p2>, ... ,<pn>>
(3) The object manager performs the operation and replies with
message:
The message formats (2) and (3) above describe the interface to the
This model is the basis for the Tandem system [Tandem], Argus System
[Liskov] and the R* System [Lindsay]. It also forms the basis for
IBM's SNA Logical Unit Type 6, which defines the types DL/1 Database,
Queue, Process, etc., and will grow to include many more types
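
The open-then-send exchange of steps (1) through (3) above can be
sketched in Python as follows. This is only an illustration under stated
assumptions: the names FileServer, Session and send, the operation
names, and the shape of the reply are all hypothetical, and none of them
is taken from the Tandem, Argus, R* or SNA interfaces.

```python
class FileServer:
    """Hypothetical object manager (server) for the 'file' object type."""

    def __init__(self):
        self.files = {}  # object name -> list of records

    def handle(self, operation, obj, *params):
        # The server receives <operation>,<object>,<p1>,...,<pn>.
        if operation == "OPEN":
            self.files.setdefault(obj, [])
            return ("OK",)
        if obj not in self.files:
            return ("ERROR", "object not open")
        if operation == "INSERT":
            self.files[obj].append(params[0])
            return ("OK",)
        if operation == "READ":
            return ("OK", list(self.files[obj]))
        return ("ERROR", f"unknown operation {operation}")


class Session:
    """Requestor-side handle that hides OPEN and message construction."""

    def __init__(self, server, obj):
        self.server = server
        self.obj = obj
        self.opened = False

    def send(self, operation, *params):
        # Step (1): send an open message if the object is not yet open.
        if not self.opened:
            self.server.handle("OPEN", self.obj)
            self.opened = True
        # Step (2): send <operation>,<object>,<p1>,...,<pn>.
        # Step (3): the manager performs the operation and replies.
        return self.server.handle(operation, self.obj, *params)
```

A requestor would then write `Session(server, "A.B.C.D").send("INSERT", record)`
without issuing the OPEN itself or knowing where the server runs.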
2.2 Dictionary: naming, authorization and control
When an object name is presented to OPEN, it is looked up in the name
space. The prefix of the name designates the partition of the name
space; for example, Lookup "A.B.C.D" might look for "B.C.D" in the name
space at node "A". The alias object type allows renaming of objects
or directories. Aliases are generally followed until a non-alias
object or error is encountered. Lookup returns a descriptor for an
whole file.
When a process tries to OPEN the object "Z", the name "Z" is looked up
in the dictionary to find its descriptor and access control list. The
name server first checks the access list to see if the requestor has
the authority to open the object. If not, the lookup signals a
security violation.
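
A minimal Python sketch of this lookup follows: the name prefix selects
a node's name space, aliases are followed until a non-alias descriptor
is found, and the access list is checked before the descriptor is
returned. The table contents, the descriptor fields, the
SecurityViolation exception and the max_hops guard against alias cycles
are all assumptions made for illustration, not details given in the
text.

```python
class SecurityViolation(Exception):
    """Raised when the requestor is not on the object's access list."""


# Hypothetical name spaces, one per node: local name -> descriptor.
NAME_SPACES = {
    "A": {
        "B.C.D": {"type": "file", "access_list": {"alice"}},
        "OLD": {"type": "alias", "target": "A.B.C.D"},
    },
}


def lookup(name, requestor, max_hops=8):
    """Resolve a name to a non-alias descriptor, then check authority."""
    for _ in range(max_hops):
        # The prefix selects the partition (node) of the name space.
        node, _sep, local_name = name.partition(".")
        descriptor = NAME_SPACES[node][local_name]  # KeyError: no such name
        if descriptor["type"] == "alias":
            name = descriptor["target"]  # follow the alias and retry
            continue
        if requestor not in descriptor["access_list"]:
            raise SecurityViolation(f"{requestor} may not open {name}")
        return descriptor
    raise LookupError("alias chain exceeds max_hops")
```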
The issue of authenticating the requestor arises when the OPEN travels
across the network. At best, the server knows that requestor R at
node N made the request -- i.e. node N will vouch for R. Two
approaches deal with this: the access list
can be structured as "ID at NODE" elements for the "who" fields, or a
bidirectional password scheme can be required for remote requestors.
< FILE-A , INDEX-I >
are added.
2.3 Data Management
* Autonomy: allowing local control of each partition of the data.
substantial message delay cost. Other algorithms are known which
trade off data currency for better performance or
availability. Three kinds of replication are worth considering:
Each of these approaches to replication has its place. Notice that the
dictionary must use a current data algorithm for managing the
replication of object descriptions because its users may not be able
to tolerate inconsistencies.
2.3.2 Data manipulation
The DML compiler produces a plan to perform the desired operation. The
compiler is aware of the data distribution criterion and, based on that,
picks a global plan which minimizes a cost function. It then
subcontracts the access to the local data to the remote nodes. These
nodes develop their own plans for doing the data manipulation at their
nodes. If the data organization at a node changes, it recompiles its
plan without affecting the other nodes.
2.3.3 Data independence
record type with two others which can be joined together to form the
original. There are limits to what can be done here, but the basic
Unless the view is very simple, a one-to-one map from the base
the base files. For this reason, views are primarily used to subset a
record type to provide access control to data. They allow one to hide
data from programs unless they have a "need to know".
2.3.4 Database design
The first step is to instrument the system so that one can get
meaningful measurements of record statistics and activity. Given
these statistics and the response time requirements of the system, the
database design problem can be mechanized as follows. Since the DML
compiler can predict the cost of a particular plan for a particular
database design, one can evaluate the cost of a particular database
design on a particular application workload -- it is the weighted sum
of the costs of the individual queries. By evaluating all possible
designs, one can find acceptable or least-cost designs. This search can
be mechanized.
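
As a sketch of that search, the Python below scores each candidate
design by the weighted sum of its per-query plan costs and keeps the
cheapest. The function names, the representation of a design as a
dictionary of choices, and the toy cost model are assumptions for
illustration; in a real system the DML compiler's cost estimates would
stand in for toy_plan_cost.

```python
from itertools import product


def workload_cost(design, workload, plan_cost):
    """Weighted sum of the predicted costs of the individual queries."""
    return sum(weight * plan_cost(query, design) for query, weight in workload)


def best_design(choices, workload, plan_cost):
    """Enumerate every combination of design choices; return the cheapest."""
    keys = list(choices)
    designs = (dict(zip(keys, combo)) for combo in product(*choices.values()))
    return min(designs, key=lambda d: workload_cost(d, workload, plan_cost))


def toy_plan_cost(query, design):
    # Assumed cost model: an index makes lookups cheap and updates dearer.
    if query == "lookup":
        return 1 if design["index"] else 100
    return 3 if design["index"] else 2


workload = [("lookup", 0.9), ("update", 0.1)]  # (query, relative frequency)
choices = {"index": [False, True]}
print(best_design(choices, workload, toy_plan_cost))
```

Exhaustive enumeration is only practical when the number of design
choices is small, since the number of combinations grows
multiplicatively with each added choice.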
2.4 Networking and terminal support
2.4.2 Terminal support -- terminal independence
The real revolution in data management has been in the area of data
display, not data storage.
Information storage has not changed much in the last twenty years. We
had random access memories, discs and tapes then and we have them now.
The cost per byte has declined dramatically, but the logical interface
to storage has not changed much.
When discs get bigger or faster, the files grow and the programs
continue to work. When terminals change from card-readers to bitmap
displays the programs need to be overhauled. This overhaul is
expensive. Thus one frequently sees a desk with three or four
terminals on it, one for each application.
The solution to this problem is to provide a terminal-type independent
interface: a virtual terminal. Programs read and write records from
a virtual terminal and the terminal handler formats these records into
screen images for that terminal type. Four pieces of information are
needed to do this:
* A detailed description of the terminal characteristics (e.g.
terminal type and options)
* An abstract description of the desired screen layout (e.g. field 3
of the record should be centered at the top of the screen).
* An abstract description of the desired record layout.
* A particular data record.
Given a data record, the display manager can produce a screen image.
If the record is empty, an empty template is displayed. When the user
fills in the template, the display manager uses the inputs to create a
record that the program can read.
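
The following Python sketch shows how the four pieces of information
might combine. The function names render and parse_input, the
dictionary encodings of the terminal description and the two layouts,
and the use of underscores for empty template fields are illustrative
assumptions, not a description of any particular terminal handler.

```python
def render(terminal, screen_layout, record_layout, record=None):
    """Produce a screen image (a list of text rows) for one terminal type.

    terminal:      {"width": columns, "height": rows}
    screen_layout: field name -> (row, column) on the screen
    record_layout: field name -> field width in characters
    record:        field name -> value, or None for an empty template
    """
    rows = [" " * terminal["width"] for _ in range(terminal["height"])]
    for name, width in record_layout.items():
        row, col = screen_layout[name]
        value = "" if record is None else str(record[name])
        # Empty or short fields are padded with underscores as a template.
        text = f"{name}: {value[:width].ljust(width, '_')}"
        line = rows[row]
        rows[row] = (line[:col] + text + line[col + len(text):])[: terminal["width"]]
    return rows


def parse_input(record_layout, inputs):
    """Build a record the program can read from the user's field inputs."""
    return {name: inputs.get(name, "")[:width] for name, width in record_layout.items()}
```

The program sees only records; supporting a new terminal type would
mean supplying a new terminal description and screen layout, with the
record layout and the program unchanged.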
2.4.3 Network management
Each node is a cost and profit center and so can manage itself. But
the network is a corporate resource which carries traffic between
organizations. The management of this resource, both capacity
planning and day-to-day operations, is best done by a central authority
with a global view.
2.5 Transaction management
transaction. Such assumptions are a liability when operating in a
decentralized environment.
ABORT-TRANSACTION (TRANS-ID)
COMMIT-TRANSACTION (TRANS-ID)
public and durable. The commit operation must query all participants
in the transaction to assure that they are prepared to commit. If this
query fails, the transaction is aborted. Prior to commit, the system
may unilaterally abort a transaction in order to handle overloads,
deadlocks, system failures and network failures.
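
The commit rule described above (query every participant, and abort if
the query fails) can be sketched in Python as below. This is a generic
prepare-then-commit outline in the style of a two-phase commit, not the
Tandem implementation; the Participant class, its method names and the
use of ConnectionError to model a network failure are assumptions.

```python
class Participant:
    """Hypothetical node taking part in a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self, trans_id):
        # Answer the commit query: is this node prepared to commit?
        return self.can_commit

    def commit(self, trans_id):
        self.state = "committed"

    def abort(self, trans_id):
        self.state = "aborted"


def abort_transaction(trans_id, participants):
    """ABORT-TRANSACTION: undo the transaction at every participant."""
    for p in participants:
        p.abort(trans_id)
    return "aborted"


def commit_transaction(trans_id, participants):
    """COMMIT-TRANSACTION: commit only if every participant is prepared."""
    try:
        prepared = all(p.prepare(trans_id) for p in participants)
    except ConnectionError:
        prepared = False  # an unreachable participant counts as a failed query
    if not prepared:
        return abort_transaction(trans_id, participants)
    for p in participants:
        p.commit(trans_id)
    return "committed"
```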
This model is implemented by the Tandem system [Borr] and, except for
A slightly more general model in which transactions may be nested
within one another is desirable. As it becomes better understood,
this idea will probably be considered essential [Gray], [Liskov].
2.5.2 Direct and queued transaction processing
Step (2) of processing a queued transaction cannot converse with the
terminal; it must get all its input from step (1) and deliver all its
output in step (3). This is the major limitation of queued
transaction processing.
updates mentioned for replicated data are an excellent example of
this.
3. An approach to designing and managing decentralized systems
available.
decentralized system.
resource controlled by a corporate organization. This corporate
organization manages the corporate network and the protocols for its
use.
of the corporation and in turn needs services from other parts of the
corporation. The syntax and semantics of these requests and replies
constitute the global architecture. Each individual organization
* Autonomy: Allowing change of internal procedures without
* is the message to be processed immediately, or should it be queued
for later processing?
The first field specifies the service (organization) that handles the
message, the second field specifies the operation, the third field
specifies the version of the interface (later versions may have
additional or changed parameters), and the remaining fields are
parameters to the operation. Careful specification of the meaning of
each of these fields and of the operation is part of the message
definition and would be administered by the network administrator.
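
A small Python sketch of such a message and of a server that dispatches
on its first three fields follows. The service and operation names, the
parameters, and the handler registry are hypothetical; the point is
only that the version field lets an older and a newer form of the same
operation coexist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    service: str       # service (organization) that handles the message
    operation: str     # operation requested of that service
    version: int       # version of the interface
    params: tuple = () # remaining fields: parameters to the operation


HANDLERS = {}  # (service, operation, version) -> function


def handler(service, operation, version):
    """Register a function as the handler for one interface version."""
    def register(fn):
        HANDLERS[(service, operation, version)] = fn
        return fn
    return register


@handler("ORDER-ENTRY", "NEW-ORDER", 1)
def new_order_v1(item, quantity):
    return {"item": item, "quantity": quantity, "priority": "normal"}


@handler("ORDER-ENTRY", "NEW-ORDER", 2)
def new_order_v2(item, quantity, priority):
    # Version 2 adds a parameter; version 1 requestors keep working.
    return {"item": item, "quantity": quantity, "priority": priority}


def dispatch(msg):
    key = (msg.service, msg.operation, msg.version)
    if key not in HANDLERS:
        return ("ERROR", f"unsupported {key}")
    return ("OK", HANDLERS[key](*msg.params))
```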
A benefit of this approach is that it obviates the need for directly
integrating the heterogeneous databases of an organization. Each
department can use its favorite brand of computer and favorite brand
of data management system as long as they all support the common
transaction and virtual terminal protocols.
4. Summary
I have tried to make the point that many of the reasons for
decentralization run counter to the concept of an integrated
distributed database.
5. Acknowledgments
6. References
[Abadi] Abadi, A.L., Skeen, D., Christian, F., "An efficient,
fault-tolerant protocol for replicated data management", ACM PODS,
March 1985.
[Adiba] Adiba, M., Lindsay, B.G., "Database Snapshots", Proc. 6th
Int. Conf. Very Large Databases, IEEE N.Y., Oct 1980, pp. 86-91.
[CICS] "CICS/VS System Application Design Guide", Ver. 1.4, Chap. 13,
IBM Form No. SC33-0068-1, June 1978, Armonk NY., pp. 379-412.
"CICS/VS Introduction to Program Logic", Ver. 1.4, Chap. 1.6, IBM
Form No. SC33-0067-1, June 1978, Armonk NY., pp. 83-110.
[Gray] Gray, J.N., et al., "The recovery manager of a data management
system", ACM Comp. Surveys, 13.2, June 1981, pp. 223-242.
[March] March, J.C., Simon, H.A., Organizations, Wiley, N.Y., 1958.
Distributed by
TANDEM
Corporate Information Center
10400 N. Tantau Ave., LOC 248-07
Cupertino, CA 95014-0708