The Hardware Architecture and Linear Expansion of Tandem Nonstop Systems
Robert Horst
Tim Chou
ABSTRACT
TANDEM HARDWARE ARCHITECTURE EVOLUTION: 1976-1981
language [5].
The original NonStop I system consisted of up to sixteen processors which communicated with each other over dual high speed buses. On-line transaction processing, unlike most scientific processing, is easily partitioned into multiple relatively independent processes.
Figure 1: A diagram of Tandem's NonStop System architecture, showing processor modules (each with a CPU, main memory, Dynabus controller, and I/O controller) connected by the dual Dynabus, with disc, tape, and terminal controllers on the I/O side.
In 1981, the NonStop II was introduced to remove addressing
limitations of the 16-bit NonStop I. Many new software features have
been added to the basic offering. The system was expanded into a
implementation.
TANDEM HARDWARE ARCHITECTURE EVOLUTION: 1981-1985
The NonStop systems were well suited for OLTP, but the performance required for some large applications was beyond what a single system could provide. It was apparent that there was a need for systems able to support higher transaction volumes.
An alternative approach to adding processors in the same system is to use a high speed connection between systems. This effectively adds another level of interconnection hierarchy between the 26 Mbytes/sec inter-CPU links and the 56 Kbyte/sec network links. Figure 2 illustrates the Tandem solution, which uses fiber optics to link up to fourteen systems.
Figure 2: FOX fiber optic links, each up to 1 km long, connecting up to fourteen systems (System 0 through System 13), each with up to sixteen processors (CPU 0 through CPU 15).
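To make this three-level hierarchy concrete, the sketch below picks an interconnect level for a message based on where the destination processor resides. It is an illustrative model only: the ProcessorId type, the choose_link function, and the routing rule are assumptions for exposition, with the bandwidth figures taken from the text.

    # Illustrative sketch of the three-level interconnection hierarchy:
    # Dynabus within a system, FOX within a cluster of systems, and the
    # network links otherwise.  Types and the routing rule are assumptions.
    from dataclasses import dataclass

    DYNABUS_BYTES_PER_SEC = 26_000_000   # inter-CPU links within one system
    FOX_BYTES_PER_SEC     = 4_000_000    # FOX fiber link, per node
    NETWORK_BYTES_PER_SEC = 56_000       # long-haul network links

    @dataclass(frozen=True)
    class ProcessorId:
        cluster: int   # which FOX ring (up to fourteen systems each)
        system: int    # system number within the ring, 0-13
        cpu: int       # CPU number within the system, 0-15

    def choose_link(src: ProcessorId, dst: ProcessorId) -> tuple[str, int]:
        """Return the interconnect level and its peak bandwidth in bytes/sec."""
        if (src.cluster, src.system) == (dst.cluster, dst.system):
            return "Dynabus", DYNABUS_BYTES_PER_SEC
        if src.cluster == dst.cluster:
            return "FOX fiber link", FOX_BYTES_PER_SEC
        return "network link", NETWORK_BYTES_PER_SEC

    a = ProcessorId(cluster=0, system=0, cpu=1)
    b = ProcessorId(cluster=0, system=5, cpu=12)
    print(choose_link(a, b))   # ('FOX fiber link', 4000000)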
Each node can accept or send up to 4 Mbytes/sec. With this additional structure, four fibers are connected between a system and each of its neighboring systems in the ring. The four paths provided between any pair of systems assure that communication continues even if individual fibers fail. When each node sends to all other nodes with equal probability, the network has an aggregate bandwidth far greater than that of any single link.
Fiber optic links were chosen to solve both technical and practical problems in configuring large clusters. Since fiber optics are not susceptible to electromagnetic interference, they provide a reliable connection even in noisy environments. They also provide high bandwidth communications over fairly large distances (1 km). This eases the congestion in the computer room and allows many computers in the same or nearby buildings to be linked. Fiber optic cables are also flexible and of small diameter, thus easing installation. Figure 3 provides additional details on FOX.
TXP PROCESSOR DESIGN RATIONALE
At the low end are single chip microprocessors such as the Motorola 68000 family and the Intel 8086 family. In the middle are minicomputer-class designs such as the Hewlett-Packard 3000 and the IBM 4300 series. At the high end are designs that stress maximum performance. Examples of this type of design are mainframes such as the IBM 3090, Amdahl 580, and high end machines from Sperry, Burroughs, CDC and Cray.
A microprocessor-based system offers only a modest reduction in total costs over a minicomputer-style design due to the many fixed costs which make up a system. The costs of main memory, packaging, power and cabling are not reduced in proportion to CPU cost reductions.
Spare capacity must also be provided so that the application can continue to run while a hardware failure is being repaired. If the application requires only one processor to handle the peak load, a second processor is needed in case of failure, for a 100% overhead. In contrast, if four less powerful processors are used to handle the same load, only one additional processor is needed, for a 25% overhead.
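A minimal sketch of this spare-capacity arithmetic, assuming exactly one spare processor is added on top of the number needed for the peak load (the function name is illustrative, not from the paper):

    # Fraction of total capacity carried purely as failure reserve, assuming
    # one spare processor beyond those needed for the peak load.
    def spare_overhead(processors_for_peak_load: int, spares: int = 1) -> float:
        return spares / processors_for_peak_load

    print(f"{spare_overhead(1):.0%}")   # 100% -- one processor plus its backup
    print(f"{spare_overhead(4):.0%}")   # 25%  -- four processors plus one spare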
THE NONSTOP TXP PROCESSOR
Design tradeoffs in the NonStop TXP were made at every level, all the way to the component level. One of the first decisions to be made was the selection of static RAM's to be used in the cache memory and control store. The most advanced RAM's at that time were organized as 4Kx4 and 16Kx1 bits with access times of 45 ns. These were four times the density of, and 10 ns faster than, the RAM's used in the NonStop II. A design based on the older parts would have been at about the same performance level and higher cost, and would have offered little improvement over the NonStop II. Several other areas of the design offered opportunities for improved performance and cost-performance. One such area was the cache memory design. Although extensive academic research in cache memories was
available during the NonStop TXP design [11], most of the studies did
not anticipate the impact of large RAM's on cache organizations.
Using 16K static RAM's, a 64K byte cache requires only 32 components
(not including parity or the tag comparison RAM's or logic). This
makes it much more economical to design a large "dumb" cache (direct
mapped) than a smaller "smart" cache (set associative). After
performing some real time emulation of different cache organizations,
the final cache design for the NonStop TXP was chosen. It is a 64K
byte direct mapped virtual cache with hashing of some of the address
bits. Hit ratios for the cache have been measured between 96% and 99%
while performing transaction processing workloads.
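A brief sketch makes the economics of the large direct-mapped organization concrete: it reproduces the 32-chip count for the data store and shows the single-index lookup that a direct-mapped cache performs. The line size, field widths, and the particular hash are illustrative assumptions, not the actual TXP parameters.

    # Chip-count arithmetic for a 64 Kbyte cache built from 16Kx1 static RAM's,
    # plus a toy direct-mapped index calculation.  Line size and the hash are
    # assumptions for exposition, not NonStop TXP specifics.
    CACHE_BYTES = 64 * 1024              # 64 Kbyte data store
    RAM_BITS    = 16 * 1024              # one 16Kx1 static RAM chip
    print(CACHE_BYTES * 8 // RAM_BITS)   # 32 chips, excluding parity and tags

    LINE_BYTES = 16                      # assumed cache line size
    NUM_LINES  = CACHE_BYTES // LINE_BYTES

    def cache_index(virtual_address: int) -> int:
        """Direct mapped: each address maps to exactly one candidate line.
        A few high-order bits are folded in by XOR as a stand-in for the
        'hashing of some of the address bits' mentioned above."""
        line = virtual_address // LINE_BYTES
        return (line ^ (virtual_address >> 20)) % NUM_LINES

    print(cache_index(0x00401234))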
Many of the tradeoffs made in the NonStop TXP design were based on
detailed measurements of the NonStop II performance. A complex
performance analyzer, named XPLOR, was designed and built solely for
that purpose. XPLOR was used to perform the cache emulation
experiments. In addition, it provided data on instruction
frequencies, percent time spent in each instruction, and the memory
reference activity of each instruction. This allowed hardware to be
saved in the support of less frequent instructions and applied to
accelerating the more frequent instructions. XPLOR also provided data
which enabled the microcode to be optimized for the more frequent
paths through complex instructions.
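The kind of data XPLOR produced can be illustrated with a short calculation: given each instruction's dynamic frequency and average cycle count, rank instructions by their share of total execution time to see where microcode and hardware effort pays off. The opcode names and numbers below are invented placeholders, not XPLOR measurements.

    # Toy frequency/time analysis in the spirit described above.  All numbers
    # are invented placeholders, not data from XPLOR or the NonStop II.
    profile = {
        # opcode: (dynamic frequency, average cycles per execution)
        "LOAD":   (0.28, 3),
        "STORE":  (0.12, 4),
        "BRANCH": (0.20, 2),
        "ADD":    (0.25, 2),
        "MOVB":   (0.05, 40),   # complex instruction: rare but long-running
        "OTHER":  (0.10, 6),
    }

    total = sum(freq * cycles for freq, cycles in profile.values())
    ranked = sorted(profile, key=lambda op: profile[op][0] * profile[op][1],
                    reverse=True)
    for op in ranked:
        freq, cycles = profile[op]
        print(f"{op:6s} {freq * cycles / total:5.1%} of execution time")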
o DYNABUS: 26 Mbytes/sec
o 2 MIP's per processor
o 83.3 nsec microcycle time
o Three stage microinstruction pipelining
o Three stage macroinstruction pipelining
o Dual data paths and dual arithmetic-logic units
o Two level control store - 8K x 40 bits and 4K x 84 bits
o Extensive parity and self-checking
o 64 Kbyte cache memory - 96% to 99% hit ratio
o 64-bit access of main memory
o 2-8 Mbytes physical memory (64K DRAMs)
o 1 Gbyte virtual memory addressing
PERFORMANCE BENCHMARKS
BANKING BENCHMARK
The benchmark database included Card and Account files and a sequential log file. The two most frequent transactions were the DEBIT and LOOKUP transactions.
DEBIT Transaction Profile:
X.25 Message in;
Read random Card;
Read random Account;
X.25 Message out;
X.25 Message in;
Read random with lock same Account;
Rewrite Account;
Sequential write to log of Account update;
X.25 Message out.
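Rendered as straight-line code, the DEBIT profile looks roughly like the sketch below. The classes, field names, and in-memory stand-ins are illustrative assumptions; the real benchmark used Tandem's file system and X.25 terminal handling rather than this toy API.

    # Illustrative rendering of the DEBIT transaction profile listed above.
    from dataclasses import dataclass

    @dataclass
    class Account:
        account_id: int
        balance: float

    class Terminal:                     # stand-in for the X.25 terminal traffic
        def __init__(self, inbound): self.inbound = list(inbound)
        def receive(self): return self.inbound.pop(0)
        def send(self, msg): print("to terminal:", msg)

    def debit_transaction(terminal, cards, accounts, log):
        request = terminal.receive()                       # X.25 message in
        card = cards[request["card_id"]]                   # read random Card
        account = accounts[card["account_id"]]             # read random Account
        terminal.send({"balance": account.balance})        # X.25 message out

        confirm = terminal.receive()                       # X.25 message in
        account = accounts[card["account_id"]]             # read random with lock (locking elided)
        account.balance -= confirm["amount"]
        accounts[card["account_id"]] = account             # rewrite Account
        log.append((account.account_id, account.balance))  # sequential write to log
        terminal.send({"new balance": account.balance})    # X.25 message out

    cards = {7: {"account_id": 42}}
    accounts = {42: Account(42, 100.0)}
    log = []
    debit_transaction(Terminal([{"card_id": 7}, {"amount": 25.0}]),
                      cards, accounts, log)
    print(log)    # [(42, 75.0)]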
The results are shown in Figure 5. Note that the transaction rate grows linearly in the case where the processors are interconnected via the Dynabus.
Figure 5: Banking benchmark results: transactions per second versus number of CPUs (4 to 16), for a single system (TX/sec, one system) and for FOX-connected configurations of four-processor systems (TX/sec, 1x4 through 4x4).
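One way to quantify the linearity claimed above is to fit throughput against CPU count and check that a straight line through the points has a near-zero intercept. The (cpus, tps) pairs below are invented placeholders standing in for the plotted data, not Tandem's published measurements.

    # Least-squares fit of throughput vs. CPU count; a near-zero intercept and
    # constant slope indicate linear scaling.  Data points are placeholders.
    points = [(4, 17.0), (8, 34.5), (12, 51.0), (16, 68.0)]

    n = len(points)
    sx = sum(c for c, _ in points)
    sy = sum(t for _, t in points)
    sxy = sum(c * t for c, t in points)
    sxx = sum(c * c for c, _ in points)

    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    print(f"about {slope:.2f} TX/sec per CPU, intercept {intercept:.2f}")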
RETAILING BENCHMARK
Each of three sites handles the transaction volume for its area. Each site will be connected through a large SNA network and can communicate directly with the other two when needed. By doubling the number of CPUs and discs, the benchmark got twice the throughput for the same response time: less than 1.7 seconds for 90% of the transactions.
Retailing benchmark results: transactions per second (TPS) versus number of CPUs in the system (8 to 32).
Extrapolated linearly to larger configurations, the results suggest somewhere over 800 transactions per second. In reality this may or may not be achievable.
CONCLUSIONS
For a number of years there has been academic interest and hypotheses
[9] that a number of small processors could be tied together in some
way and provide the computing power of a larger machine. While this
may not be true in general, this paper illustrates that it is possible
in on-line transaction processing.
ACKNOWLEDGEMENTS
The authors would like to thank Jim Gray, Harald Sammer, Eric Chow, Joan Zimmerman, and Hoke Johnson for providing many helpful remarks.
REFERENCES
[1] Arceneaux, G. et al., "A Closer Look at Pathway," Tandem
Computers, SEDS-003, June 1982.
Distributed by
TANDEM COMPUTERS
Corporate Information Center
19333 Vallco Parkway MS3-07
Cupertino, CA 95014-2599