Tutorial: Networks-on-Chip
The 1st ACM/IEEE International Symposium on Networks-on-Chip Princeton, New Jersey, May 6, 2007
Network Layer Communication Performance in Network-on-Chips, A. Jantsch (Royal Institute of Technology, Sweden), 10:00 - 11:45 am
Power, Energy and Reliability Issues in NoC, R. Marculescu (Carnegie Mellon University, USA), 1:15 - 3:00 pm
Tooling, OS Services and Middleware, L. Benini (University of Bologna, Italy), 3:15 - 5:00 pm
Overview
- Introduction
- Communication Performance
- Organizational Structure
- Interconnection Topologies
- Trade-offs in Network Topology
- Routing
- Quality of Service
A. Jantsch, KTH
Introduction
Interconnection Network
- Topology: how switches and nodes are connected
- Routing algorithm: determines the route from source to destination
- Switching strategy: how a message traverses the route
[Figure: nodes attached to the interconnection network through network interfaces (communication assist).]
Basic Definitions
- Message is the basic communication entity.
- Flit is the basic flow control unit. A message consists of one or many flits.
- Phit is the basic unit of the physical layer.
- Direct network is a network where each switch connects to a node.
- Indirect network is a network with switches not connected to any node.
- Hop is the basic communication action from node to switch or from switch to switch.
- Diameter is the length of the maximum shortest path between any two nodes, measured in hops.
- Routing distance between two nodes is the number of hops on a route.
- Average distance is the average of the routing distance over all pairs of nodes.
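The definitions of diameter and average distance can be made concrete with a small sketch. It assumes a k x k mesh as a direct network, counts only switch-to-switch hops, and averages over all ordered pairs of distinct nodes; the function names are illustrative and not taken from the tutorial.

```python
from collections import deque
from itertools import product

def mesh_neighbors(x, y, k):
    """Yield the switches one hop away from (x, y) in a k x k mesh."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < k and 0 <= ny < k:
            yield nx, ny

def hop_counts(src, k):
    """Breadth-first search: shortest hop count from src to every switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nxt in mesh_neighbors(cur[0], cur[1], k):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

def diameter_and_average_distance(k):
    """Return (diameter, average distance) over ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    hops = [d for s in nodes for t, d in hop_counts(s, k).items() if t != s]
    return max(hops), sum(hops) / len(hops)

if __name__ == "__main__":
    print(diameter_and_average_distance(4))
```

For a 4 x 4 mesh the diameter is the corner-to-corner path; other topologies only need a different `mesh_neighbors`.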
Communication Performance
Latency
Time(n) = Admission + RoutingDelay + ContentionDelay
- Admission is the time it takes to emit the message into the network.
- RoutingDelay is the delay for the route.
- ContentionDelay is the delay of a message due to contention.
[Figure: example network with nodes A, B, C, D and switches 1, 2, 3.]
Routing Delay
Store and Forward: $T_{sf}(n, h) = h\left(\frac{n}{b} + \Delta\right)$
Circuit Switching: $T_{cs}(n, h) = \frac{n}{b} + h\Delta$
Cut Through: $T_{ct}(n, h) = \frac{n}{b} + h\Delta$
Store and Forward with fragmented packets: $T_{sf}(n, h, n_p) = \frac{n - n_p}{b} + h\left(\frac{n_p}{b} + \Delta\right)$

n ... message size in bits
n_p ... size of message fragments in bits
h ... number of hops
b ... raw bandwidth of the channel
$\Delta$ ... switching delay per hop
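The four routing-delay formulas can be compared numerically. The following is a minimal sketch; the function names and the example values (a 1024-bit message, 4 hops, 32 bits per cycle, 2 cycles switching delay) are assumptions chosen for illustration.

```python
def t_store_and_forward(n, h, b, delta):
    """Every hop receives the whole message before forwarding it."""
    return h * (n / b + delta)

def t_circuit_switching(n, h, b, delta):
    """The path is set up hop by hop, then the message streams once."""
    return n / b + h * delta

def t_cut_through(n, h, b, delta):
    """The header is forwarded at once; the body follows in a pipeline."""
    return n / b + h * delta

def t_sf_fragmented(n, h, b, delta, n_p):
    """Store and forward with the message split into fragments of n_p bits."""
    return (n - n_p) / b + h * (n_p / b + delta)

if __name__ == "__main__":
    n, h, b, delta = 1024, 4, 32, 2  # hypothetical example values
    print("store and forward:", t_store_and_forward(n, h, b, delta))
    print("cut through:      ", t_cut_through(n, h, b, delta))
    print("fragmented SF:    ", t_sf_fragmented(n, h, b, delta, n_p=128))
```

With these numbers fragmenting the message brings store and forward close to cut through, because only one fragment time is paid per hop.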
[Figure: SF vs CT switching (store and forward vs. cut through). Panels over packet size in bytes (k=2, m=8 and d=2, m=8) and over number of nodes (k=2 and d=2).]
Local bandwidth = $b \, \frac{n}{n + n_E + w\Delta}$
Total bandwidth = $Cb$ [bits/second] = $Cw$ [bits/cycle] = $C$ [phits/cycle]
Bisection bandwidth ... minimum bandwidth to cut the net into two equal parts.

b ... raw bandwidth of a link
n ... message size
n_E ... size of message envelope
w ... link bandwidth per cycle
$\Delta$ ... switching time for each switch in cycles
$w\Delta$ ... bandwidth lost during switching
C ... total number of channels

For a $k \times k$ mesh with bidirectional channels:
Total bandwidth = $(4k^2 - 4k)\,b$
Bisection bandwidth = $2kb$
[Figure: example network with nodes A, B, C, D and switches 1, 2, 3.]
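The channel counts for the k x k mesh can be checked by enumeration. This sketch assumes an even k, counts each direction of a bidirectional link as one channel, and cuts the mesh vertically in the middle; the function names and the example values for the local bandwidth are illustrative only.

```python
def mesh_channel_counts(k):
    """Return (total channels, channels crossing the middle cut) of a k x k mesh."""
    total = bisection = 0
    for x in range(k):
        for y in range(k):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < k and 0 <= ny < k:
                    total += 1
                    # A channel crosses the cut if its ends lie in different halves.
                    if (x < k // 2) != (nx < k // 2):
                        bisection += 1
    return total, bisection

def local_bandwidth(b, n, n_e, w, delta):
    """Payload bandwidth left after envelope and switching overhead."""
    return b * n / (n + n_e + w * delta)

if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, mesh_channel_counts(k), (4 * k * k - 4 * k, 2 * k))
    print(local_bandwidth(b=32, n=96, n_e=16, w=8, delta=2))
```

Multiplying the two counts by the link bandwidth b gives the total and bisection bandwidth.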
Total load on the network: $\frac{N h l}{M}$ [phits/cycle]
Load per channel: $\frac{N h l}{M C} \le 1$ [phits/cycle]

M ... each host issues a packet every M cycles
C ... number of channels
N ... number of nodes
h ... average routing distance
l = n/w ... number of cycles a message occupies a channel
n ... average message size
w ... bitwidth per channel
Network Saturation
[Figure: network saturation, shown in two plots with axes labeled offered bandwidth, delivered bandwidth, and latency.]
Typical saturation points are between 40% and 70%. The saturation point depends on
- Traffic pattern
- Stochastic variations in traffic
- Routing algorithm
Organizational Structure
Link
- Short link: At any time there is only one data word on the link.
- Long link: Several data words can travel on the link simultaneously.
- Narrow link: Data and control information are multiplexed on the same wires.
- Wide link: Data and control information are transmitted in parallel and simultaneously.
- Synchronous clocking: Both source and destination operate on the same clock.
- Asynchronous clocking: The clock is encoded in the transmitted data to allow the receiver to sample at the right time instance.
Switch
[Figure: switch components: input ports, receiver, input buffer, crossbar, output buffer, transmitter, output ports.]
Network Interface
- Admission protocol
- Reception obligations
- Buffering
- Assembling and disassembling of messages
- Routing
- Higher level services and protocols
Interconnection Topologies
- Fully connected networks
- Linear arrays and rings
- Multidimensional meshes and tori
- Trees
- Butterflies
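Fully connected networks
[Figure: two ways of fully connecting the nodes; the second is a crossbar.]
First network shown:
  switch degree = N
  diameter = 1
  distance = 1
  network cost = $O(N)$
  total bandwidth = b
  bisection bandwidth = b
Crossbar:
  switch degree = N
  diameter = 1
  distance = 1
  network cost = $O(N^2)$
  total bandwidth = Nb
  bisection bandwidth = Nb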
[Figure: linear array, torus, and folded torus.]
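k-ary d-cubes are d-dimensional tori with unidirectional links and k nodes in each dimension:
  number of nodes N = $k^d$
  switch degree = d
  diameter = $d(k-1)$
  distance $\approx d\,\frac{1}{2}(k-1)$
  network cost = $O(N)$
  total bandwidth = $2Nb$
[Figure: 3-d cube and 2-d torus.]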
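The node count, diameter, and distance of a k-ary d-cube can be cross-checked by brute force for small networks. The sketch below assumes unidirectional rings in every dimension and averages the distance over all ordered node pairs, including a node paired with itself; the function names are illustrative.

```python
from itertools import product

def ring_distance(src, dst, k):
    """Hops from src to dst on a unidirectional ring of k nodes."""
    return (dst - src) % k

def cube_metrics(k, d):
    """Return (number of nodes, diameter, average distance) of a k-ary d-cube."""
    nodes = list(product(range(k), repeat=d))
    dists = [
        sum(ring_distance(s, t, k) for s, t in zip(a, b))
        for a in nodes
        for b in nodes
    ]
    return len(nodes), max(dists), sum(dists) / len(dists)

if __name__ == "__main__":
    for k, d in ((4, 2), (3, 3), (2, 4)):
        n, dia, avg = cube_metrics(k, d)
        print(k, d, n, dia, avg, d * (k - 1) / 2)
```

The enumeration grows as $N^2$, so it is only meant for checking the closed forms on small cubes.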
[Figure: 2-ary 2-cube, 2-ary 3-cube, 2-ary 4-cube, and 2-ary 5-cube.]
Binary Trees
Characterized by: number of nodes N, number of switches, switch degree, diameter, distance, network cost, total bandwidth, bisection bandwidth.
k-ary Trees
Characterized by: number of nodes N, number of switches, switch degree, diameter, distance, network cost, total bandwidth, bisection bandwidth.
7 128 4
8 256 4
9 512 8
10 1024 8
k-ary n-cubes:
  number of nodes N = $k^d$
  switch degree = $d + 2$
  diameter = $d(k-1)$
  distance $\approx d\,\frac{1}{2}(k-1)$
  network cost = $O(N)$
  total bandwidth = $2Nb$
k-ary trees:
  number of nodes N = $k^d$
  number of switches $\approx k^d$
  switch degree = $k + 1$
  diameter = $2d$
  distance $\approx d + 2$
  network cost = $O(N)$
  total bandwidth = $2 \cdot 2(N-1)b$
  bisection bandwidth = $kb$
Butterflies
[Figure: butterfly building block and a 16-node butterfly with levels 0 to 4.]
Butterfly Characteristics
Characterized by: number of nodes N, number of switches, switch degree, diameter, distance, network cost, total bandwidth, bisection bandwidth.
[Table: k-ary n-cubes vs. binary tree, compared by cost, distance, links per node, bisection, and frequency limit of random traffic.]
- Cost of the network is $O(N \log N)$.
- 2-d layout is more difficult than for binary trees.
- Number of long wires grows faster than for trees.
- For each source-destination pair there is only one route.
- Each route blocks many other routes.
Benes Networks
- Many routes
- Costly to compute non-blocking routes
- High probability for a non-blocking route by randomly selecting an intermediate node [Leighton, 1992]
Fat Trees
[Figure: fat tree built from fat nodes.]
k-ary n-cubes:
  number of nodes N = $k^d$
  switch degree = d
  diameter = $d(k-1)$
  distance $\approx d\,\frac{1}{2}(k-1)$
  network cost = $O(N)$
  total bandwidth = $2Nb$
k-ary n-dimensional fat trees:
  number of nodes N = $k^d$
  number of switches = $k^{d-1} d$
  switch degree = $2k$
  diameter = $2d$
  distance $\approx d$
  network cost = $O(Nd)$
  total bandwidth = $2 k^d d\, b$
  bisection bandwidth = $2 k^{d-1} b$
[Figure: binary 1-cube, binary 2-cube, and binary 3-cube.]
Trade-offs in Topologies
[Figure: Network scalability wrt latency (m=128; h=dk/5): average latency over number of nodes for k=2, d=5, d=4, d=3, d=2.]
Free-wire cost model: Wires are free and can be added without penalty.
[Figure: Latency wrt dimension under free-wire cost model: average latency over dimension for N=64, N=128, N=256, N=1K, N=16K. Panels: (m=32; b=32) and (m=128; b=32).]
Fixed-wire cost model: The number of wires per node is constant.
128 wires per node: $w(d) = \lfloor 64/d \rfloor$

d     2   3   4   5   6   7   8   9   10
w(d)  32  21  16  12  10  9   8   7   6
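The table of channel widths follows from integer division of the per-node wire budget. This is a small sketch; the function name and the `budget` parameter are illustrative assumptions.

```python
def wires_per_channel(d, budget=64):
    """Channel width in a d-dimensional network under the fixed-wire model."""
    return budget // d

if __name__ == "__main__":
    for d in range(2, 11):
        print(d, wires_per_channel(d))
```

Higher dimensions therefore pay for their shorter routes with narrower channels, which lengthens the time a message occupies each channel.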
[Figure: Latency wrt dimension under fixed-wire cost model: average latency over dimension for N=64, N=128, N=256, N=1K, N=16K. Panels: (m=32; b=64/d) and (m=128; b=64/d).]
[Figures: further plots of average latency over dimension.]
Assumptions [Agarwal, 1991]:
- k-ary n-cubes
- random traffic
- dimension-order cut-through routing
- unbounded internal buffers (to ignore flow control and deadlock issues)
[Figure: average latency plot.]
Routing
- Deterministic routing: The route is determined solely by source and destination locations.
- Arithmetic routing: The destination address of the incoming packet is compared with the address of the switch and the packet is routed accordingly (relative or absolute addresses).
- Source based routing: The source determines the route and builds a header with one directive for each switch. The switches strip off the top directive.
- Table-driven routing: Switches have routing tables, which can be configured.
- Adaptive routing: The route can be adapted by the switches to balance the load.
- Minimal routing allows only shortest paths, while non-minimal routing allows even longer paths.
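Arithmetic routing and source based routing can be sketched on a 2-d mesh. The example assumes absolute (x, y) addresses and a dimension-order rule that resolves x before y; the port names and function names are hypothetical and not taken from the tutorial.

```python
# Port name -> coordinate change of one hop (illustrative naming).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def xy_next_port(switch, dest):
    """Arithmetic routing: compare the destination with the switch address."""
    (sx, sy), (dx, dy) = switch, dest
    if dx != sx:
        return "E" if dx > sx else "W"
    if dy != sy:
        return "N" if dy > sy else "S"
    return "LOCAL"

def build_source_route(src, dest):
    """Source based routing: the source builds one directive per switch."""
    route, cur = [], src
    while True:
        port = xy_next_port(cur, dest)
        route.append(port)
        if port == "LOCAL":
            return route
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])

def strip_directive(header):
    """A switch takes the top directive and forwards the remaining header."""
    return header[0], header[1:]

if __name__ == "__main__":
    header = build_source_route((0, 0), (2, 1))
    while header:
        port, header = strip_directive(header)
        print(port, header)
```

Both variants are deterministic and minimal here; an adaptive router would choose among several productive ports based on load.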
Quality of Service
Best Effort (BE)
- Optimization of the average case
- Loose or non-existent worst case bounds
- Cost effective use of resources
Guaranteed Service (GS)
- Maximum delay
- Minimum bandwidth
- Maximum jitter
- Requires additional resources
Regulated Flows
A flow F is $(\sigma, \rho)$-regulated if
$F(b) - F(a) \le \sigma + \rho(b - a)$
for all time intervals $[a, b]$, $0 \le a \le b$, where $F(t)$ is the cumulative amount of traffic between 0 and $t \ge 0$.
$\sigma \ge 0$ is the burstiness constraint;
$\rho \ge 0$ is the maximum average rate.
[Figure: cumulative traffic F(t) over time t, with time points t1 to t5.]
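The regulation condition can be tested directly on a sampled trace. The sketch below assumes that the cumulative traffic F is sampled at integer times and checks every interval; the function name and the example trace are made up for illustration.

```python
def is_regulated(cumulative, sigma, rho):
    """Check F(b) - F(a) <= sigma + rho * (b - a) for all sampled a <= b.

    cumulative[t] is F(t) at integer times t = 0, 1, 2, ...
    """
    n = len(cumulative)
    return all(
        cumulative[b] - cumulative[a] <= sigma + rho * (b - a)
        for a in range(n)
        for b in range(a, n)
    )

if __name__ == "__main__":
    trace = [0, 4, 5, 6, 10, 11]  # hypothetical cumulative traffic
    print(is_regulated(trace, sigma=3, rho=2))
    print(is_regulated(trace, sigma=3, rho=1))
```

The trace has two bursts of four units; a rate of 1 is too low to cover both bursts over the interval [0, 4], while a rate of 2 suffices.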
[Figure: flow F1 passes an element with delay D and leaves as flow F2.]
$F_1 \sim (\sigma, \rho) \;\Rightarrow\; F_2 \sim (\sigma + \rho D, \rho)$
[Figure: flows F1 and F2 are merged into flow F3 on a channel with bandwidth b; D and B denote the delay and the backlog of the merging element.]
[Figure: timeline starting at t = 0 with the phases t1 and t2, which together form t_accu, followed by t_drain.]

Phase 1 ($t_1$): F1 and F2 transmit at full speed.
Assume: at t = 0 the queue is empty, and F1 uses up its burst before F2 does.
Injection rate: 2b; drain rate: b.
$b\,t_1 = \sigma_1 + \rho_1 t_1 \;\Rightarrow\; t_1 = \frac{\sigma_1}{b - \rho_1}$

Phase 2 ($t_2$): F1 transmits at rate $\rho_1$, F2 transmits at full speed.
Injection rate: $b + \rho_1$; drain rate: b.
F2 leaves full speed once its burst is used up, so the queue grows during $t_{accu} = t_1 + t_2 = \frac{\sigma_2}{b - \rho_2}$.

Phase 3 ($t_{drain}$): F1 transmits at rate $\rho_1$, F2 transmits at rate $\rho_2$.
Injection rate: $\rho_1 + \rho_2$; drain rate: b.
$t_{drain} = \frac{B_{max}}{b - \rho_1 - \rho_2}$

Results:
$B_{max} = \sigma_1 + \frac{\rho_1 \sigma_2}{b - \rho_2}$
$D_{max} = t_{accu} + t_{drain} = \frac{\sigma_1 + \sigma_2}{b - \rho_1 - \rho_2}$
$F_3 \sim (\sigma_1 + \sigma_2, \rho_1 + \rho_2)$
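The three phases can be evaluated with exact arithmetic to confirm that the accumulation and drain times add up to the closed form for the maximum delay. This is a sketch under the assumptions that all parameters are integers, that $b > \rho_1 + \rho_2$, and that F1 uses up its burst first; the function name and example values are illustrative.

```python
from fractions import Fraction

def mux_bounds(sigma1, rho1, sigma2, rho2, b):
    """Backlog and delay bounds for two regulated flows sharing a channel."""
    if b <= rho1 + rho2:
        raise ValueError("the channel must be faster than the summed rates")
    t1 = Fraction(sigma1, b - rho1)        # end of phase 1
    t_accu = Fraction(sigma2, b - rho2)    # end of phase 2
    if t1 > t_accu:
        raise ValueError("sketch assumes F1 uses up its burst first")
    b_max = sigma1 + rho1 * t_accu         # backlog when draining starts
    t_drain = b_max / (b - rho1 - rho2)
    return {"t1": t1, "t_accu": t_accu, "b_max": b_max,
            "t_drain": t_drain, "d_max": t_accu + t_drain}

if __name__ == "__main__":
    result = mux_bounds(sigma1=4, rho1=1, sigma2=8, rho2=2, b=5)
    for name, value in result.items():
        print(name, value)
    print("closed form:", Fraction(4 + 8, 5 - 1 - 2))
```

Using `Fraction` keeps the comparison with $(\sigma_1 + \sigma_2)/(b - \rho_1 - \rho_2)$ free of rounding effects.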
[Figure: a processor, two custom HW blocks, and a memory connected by an interconnect.]
[Figure: blocks T, M, S, and V connected through channels C1 to C4, carrying the flows F1 to F9.]
$F_1 \sim (0, \rho_t)$
$C_1 : (\rho_t, D_1)$
$C_2 : (\rho_t, D_2)$
$C_3 : (\rho_t, D_3)$
$C_4 : (\rho_t, D_4)$
$F_2 \sim (\rho_t D_1, \rho_t)$
[Figure: flows F2 and F7 are multiplexed (delay Dmux, output Fmuxout) into the memory M, with the flows FM1 to FM4.]
$M : (2\rho_t, D_M)$
$F_{M3} \sim (\rho_t (D_1 + D_{mux} + D_M), \rho_t)$
$F_{M4} \sim\ ?$
Regulators:
$R_{M1} \sim (S_{buffer}, \rho_t)$; $R_{M2} \sim (S_{buffer}, \rho_t)$; $R_S \sim (S_{buffer}, \rho_t)$
$S_{buffer}$ is the size of the input buffer in S.

For a flow with burstiness $\sigma_F$ entering a $(\sigma, \rho)$-regulator:
$D_{(\sigma,\rho)\text{-regulator}} = \max\left(0, \frac{\sigma_F - \sigma}{\rho}\right)$
$B_{(\sigma,\rho)\text{-regulator}} = \max(0, \sigma_F - \sigma)$

$F_6 \sim (S_{buffer}, \rho_t)$
$C_3 : (\rho_t, D_3)$
$F_7 \sim (S_{buffer} + \rho_t D_3, \rho_t)$
[Figure: the multiplexer in front of the memory $M : (2\rho_t, D_M)$ with inputs F2 and F7 and the flows FM1 to FM4, annotated with Dmux and FM1.]
[Figure: the system with regulators: T, M, S, and V, channels C1 to C4, flows F1 to F9, FM3, FM4, and regulators RM1, RM2, RS.]
$B_{RM1} = \max(0, \rho_t (D_1 + D_{mux} + D_M) - S_{buffer})$
$B_{RM2} = \max(0, 128\,\mathrm{B} + \rho_t (D_3 + D_{mux} + D_M) - S_{buffer})$
The flow through C2:
$F_4 \sim (S_{buffer} + \rho_t D_2, \rho_t)$
A characterization of S and its output:
$S : \left(\rho_t, \frac{S_{buffer}}{\rho_t}\right)$
$F_5 \sim (2 S_{buffer} + \rho_t D_2, \rho_t)$
The flows between memory and V:
$F_8 \sim (S_{buffer}, \rho_t)$
$C_4 : (\rho_t, D_4)$
$F_9 \sim (S_{buffer} + \rho_t D_4, \rho_t)$
End to end delay:
$D_{total} = D_1 + D_{mux} + D_M + D_{RM1} + D_2 + D_S + D_{RS} + D_3 + D_{mux} + D_M + D_{RM2} + D_4$
The flow at V:
$F_{TV} \sim (0 + \rho_t D_{total}, \rho_t)$
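The end-to-end characterization applies the delay-element rule $(\sigma, \rho) \rightarrow (\sigma + \rho D, \rho)$ once per element on the path. A minimal sketch follows; it treats every element as a pure delay element, and the delay values are made up for illustration.

```python
def through_delay_element(flow, delay):
    """A (sigma, rho) flow leaves an element with maximum delay D as (sigma + rho*D, rho)."""
    sigma, rho = flow
    return (sigma + rho * delay, rho)

def end_to_end(flow, delays):
    """Propagate a flow through a chain of elements; return (output flow, total delay)."""
    for delay in delays:
        flow = through_delay_element(flow, delay)
    return flow, sum(delays)

if __name__ == "__main__":
    rho_t = 2                      # hypothetical rate of the source
    delays = [3, 1, 4]             # hypothetical delays of three elements
    print(end_to_end((0, rho_t), delays))
```

Starting from burstiness 0, the output burstiness equals the rate times the total delay, which is the shape of the flow at V.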
Arrival Curve
[Figure: arrival curve $\alpha(t)$ in bits over time, bounding the traffic in an interval of length b - a.]
Given a monotonically increasing function $\alpha$, defined for $t \ge 0$, $\alpha$ is an arrival curve for flow F if for all $0 \le a \le b$:
$F(b) - F(a) \le \alpha(b - a)$
If $\alpha$ is an arrival curve for F we have $F \le F \otimes \alpha$ and $F \le F \otimes \bar{\alpha}$, with $\bar{\alpha}$ being the best bound that we can find based on information of $\alpha$.
Service Curve
Given a system S with an input flow F and an output flow $F^*$. S offers the flow a service curve $\beta$ if and only if $\beta$ is a monotonically increasing function and $F^* \ge F \otimes \beta$, which means that
$F^*(t) \ge \inf_{s \le t} \left( F(s) + \beta(t - s) \right)$

Backlog Bound
Given a flow F constrained by arrival curve $\alpha$ and a system offering a service curve $\beta$, the backlog $F(t) - F^*(t)$ for all t satisfies
$F(t) - F^*(t) \le \sup_{s \ge 0} \left( \alpha(s) - \beta(s) \right)$

Delay Bound
Given a flow F constrained by arrival curve $\alpha$ and a system offering a service curve $\beta$, the delay d(t) at time t is
$d(t) = \inf\{ \tau \ge 0 : F(t) \le F^*(t + \tau) \}$.
It satisfies
$d(t) \le h(\alpha, \beta) = \sup_{t \ge 0} \left( \inf\{ \tau \ge 0 : \alpha(t) \le \beta(t + \tau) \} \right)$

Output Flow
Given a flow F constrained by arrival curve $\alpha$ and a system offering a service curve $\beta$, the output flow $F^*$ is constrained by the arrival curve $\alpha^* = \alpha \oslash \beta$.
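The backlog and delay bounds can be evaluated numerically for one common pair of curves. The sketch assumes an affine arrival curve $\alpha(t) = \sigma + \rho t$ and a rate-latency service curve $\beta(t) = R \max(0, t - T)$ with $\rho \le R$, and approximates the suprema on a time grid; the curve choices, function names, and values are assumptions for illustration.

```python
def alpha(t, sigma, rho):
    """Affine arrival curve: burst sigma plus rate rho."""
    return sigma + rho * t

def beta(t, R, T):
    """Rate-latency service curve: rate R after latency T."""
    return R * max(0.0, t - T)

def grid(horizon, step):
    return [i * step for i in range(int(horizon / step) + 1)]

def backlog_bound(sigma, rho, R, T, horizon=20.0, step=0.5):
    """sup over s >= 0 of alpha(s) - beta(s), taken on a grid."""
    return max(alpha(s, sigma, rho) - beta(s, R, T) for s in grid(horizon, step))

def delay_bound(sigma, rho, R, T, horizon=20.0, step=0.5):
    """Horizontal deviation h(alpha, beta), taken on a grid.

    For a rate-latency curve, beta(t + tau) >= alpha(t) first holds at
    tau = T + alpha(t) / R - t (or 0 if that is negative).
    """
    return max(max(0.0, T + alpha(t, sigma, rho) / R - t) for t in grid(horizon, step))

if __name__ == "__main__":
    print(backlog_bound(sigma=4, rho=1, R=2, T=3))  # compare with sigma + rho * T
    print(delay_bound(sigma=4, rho=1, R=2, T=3))    # compare with T + sigma / R
```

For these curves the grid results coincide with the closed forms $\sigma + \rho T$ and $T + \sigma / R$, because the suprema are attained at the grid points t = T and t = 0.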
Concatenation
[Figure: flow F passes system S1 with service curve $\beta_1$ and then system S2 with service curve $\beta_2$, leaving as $F^*$; together they form a system S.]
[Figure: two rate-latency service curves with latencies T1 and T2 and their concatenation with latency T1 + T2.]
$\alpha(t) = \sigma + \rho t$
$\beta_1 = \beta_{R_1,T_1} = R_1 \max(0, t - T_1)$
$\beta_2 = \beta_{R_2,T_2} = R_2 \max(0, t - T_2)$
$\beta_{R_1,T_1} \otimes \beta_{R_2,T_2} = \beta_{\min(R_1,R_2),\,T_1+T_2} = \min(R_1, R_2) \max(0, t - (T_1 + T_2))$
$D_1 + D_2 = \frac{\sigma}{R_1} + \frac{\sigma}{R_2} + \frac{\rho T_1}{R_2} + T_1 + T_2$
$D_S = \frac{\sigma}{\min(R_1, R_2)} + T_1 + T_2$
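The difference between adding per-system delay bounds and bounding the concatenated system can be seen with a few numbers. The sketch assumes integer parameters and an affine arrival curve with burst sigma and rate rho; the function names and values are illustrative.

```python
from fractions import Fraction

def per_system_delay(sigma, rho, R1, T1, R2, T2):
    """D1 + D2: bound each system separately and add the results."""
    d1 = Fraction(sigma, R1) + T1
    sigma_after_s1 = sigma + rho * T1      # burst of the flow leaving S1
    d2 = Fraction(sigma_after_s1, R2) + T2
    return d1 + d2

def concatenated_delay(sigma, R1, T1, R2, T2):
    """D_S: bound for the concatenated service curve."""
    return Fraction(sigma, min(R1, R2)) + T1 + T2

if __name__ == "__main__":
    args = dict(sigma=6, rho=1, R1=2, T1=1, R2=3, T2=2)
    print("D1 + D2 =", per_system_delay(**args))
    print("D_S     =", concatenated_delay(6, 2, 1, 3, 2))
```

The concatenated bound is tighter because the burst term $\sigma$ enters only once, divided by the smaller of the two rates.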
Summary
- Communication Performance: bandwidth, unloaded latency, loaded latency
- Organizational Structure: NI, switch, link
- Topologies: wire space and delay domination favors low dimension topologies
- Routing: deterministic vs source based vs adaptive routing; deadlock
- Quality of Service and flow regulation
To Probe Further
Classic papers:
[Agarwal, 1991] Agarwal, A. (1991). Limits on interconnection network performance. IEEE Transactions on Parallel and Distributed Systems, 4(6):613-624.
[Dally, 1990] Dally, W. J. (1990). Performance analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers, 39(6):775-785.
Text books:
[Duato et al., 1998] Duato, J., Yalamanchili, S., and Ni, L. (1998). Interconnection Networks - An Engineering Approach. Computer Society Press, Los Alamitos, California.
[Culler et al., 1999] Culler, D. E., Singh, J. P., and Gupta, A. (1999). Parallel Computer Architecture - A Hardware/Software Approach. Morgan Kaufmann Publishers.
[Dally and Towles, 2004] Dally, W. J. and Towles, B. (2004). Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers.
[DeMicheli and Benini, 2006] DeMicheli, G. and Benini, L. (2006). Networks on Chip. Morgan Kaufmann.
[Leighton, 1992] Leighton, F. T. (1992). Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, San Francisco.
[LeBoudec, 2001] LeBoudec, J.-Y. (2001). Network Calculus. Springer Verlag, LNCS 2050.