Internet Indirection Infrastructure
Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, Sonesh Surana
University of California, Berkeley
{istoica, dadkins, shelleyz, sonesh}@cs.berkeley.edu
Figure 1: (a) i3's API. Example illustrating communication between two nodes. (b) The receiver R inserts trigger (id, R). (c) The sender sends packet (id, data).
ditional aspects of the design such as scalability and efficient routing. Section 5 describes some simulation results on performance along with a discussion of an initial implementation. Related work is discussed in Section 6, followed by a discussion of future work in Section 7. We conclude with a summary in Section 8.

2. i3 OVERVIEW

In this section we present an overview of i3. We start with the basic service model and communication abstraction, then briefly describe i3's design.

2.1 Service Model

The purpose of i3 is to provide indirection; that is, it decouples the act of sending from the act of receiving. The i3 service model is simple: sources send packets to a logical identifier, and receivers express interest in packets sent to an identifier. Delivery is best-effort like in today's Internet, with no guarantees about packet delivery.

This service model is similar to that of IP multicast. The crucial difference is that the i3 equivalent of an IP multicast join is more flexible. IP multicast offers a receiver a binary decision of whether or not to receive packets sent to that group (this can be indicated on a per-source basis). It is up to the multicast infrastructure to build efficient delivery trees. The i3 equivalent of a join is inserting a trigger. This operation is more flexible than an IP multicast join as it allows receivers to control the routing of the packet. This provides two advantages. First, it allows them to create, at the application level, services such as mobility, anycast, and service composition out of this basic service model. Thus, this one simple service model can be used to support a wide variety of application-level communication abstractions, alleviating the need for many parallel and redundant overlay infrastructures. Second, the infrastructure can give responsibility for efficient tree construction to the end-hosts. This allows the infrastructure to remain simple, robust, and scalable.

2.2 Rendezvous-Based Communication

The service model is instantiated as a rendezvous-based communication abstraction. In their simplest form, packets are pairs (id, data) where id is an m-bit identifier and data consists of a payload (typically a normal IP packet payload). Receivers use triggers to indicate their interest in packets. In the simplest form, triggers are pairs (id, addr), where id represents the trigger identifier, and addr represents a node's address, which consists of an IP address and a port number. A trigger (id, addr) indicates that all packets with an identifier id should be forwarded (at the IP level) by the i3 infrastructure to the node identified by addr. More specifically, the rendezvous-based communication abstraction exports three basic primitives as shown in Figure 1(a).

Figure 1(b) illustrates the communication between two nodes, where receiver R wants to receive packets sent to id. The receiver inserts the trigger (id, R) into the network. When a packet is sent to identifier id, the trigger causes it to be forwarded via IP to R. Thus, much as in IP multicast, the identifier id represents a logical rendezvous between the sender's packets and the receiver's trigger. This level of indirection decouples the sender from the receiver. The senders need neither be aware of the number of receivers nor of their location. Similarly, receivers need not be aware of the number or location of senders.

The above description is the simplest form of the abstraction. We now describe a generalization that allows inexact matching between identifiers. (A second generalization that replaces identifiers with a stack of identifiers is described in Section 2.5.) We assume identifiers are m bits long and that there is some exact-match threshold k with k < m. We then say that an identifier id_t in a trigger matches an identifier id in a packet if and only if

(a) id and id_t have a prefix match of at least k bits, and

(b) there is no trigger with an identifier that has a longer prefix match with id.

In other words, a trigger identifier id_t matches a packet identifier id if and only if id_t is a longest prefix match (among all trigger identifiers) and this prefix match is at least as long as the exact-match threshold k. The value k is chosen to be large enough that the probability that two randomly chosen identifiers match is negligible.¹ This allows end-hosts to choose identifiers independently with a negligible chance of collision.

2.3 Overview of the Design

We now briefly describe the infrastructure that supports this rendezvous communication abstraction; a more in-depth description follows in Section 4. i3 is an overlay network which consists of a set of servers that store triggers and forward packets (using IP) between i3 nodes and to end-hosts. Identifiers and triggers have meaning only in this i3 overlay.

One of the main challenges in implementing i3 is to efficiently match the identifiers in the packets to those in the triggers. This is done by mapping each identifier to a unique i3 node (server); at any given time there is a single i3 node responsible for a given id. When a trigger (id, addr) is inserted, it is stored on the i3 node responsible for id. When a packet is sent to id, it is routed by i3 to the node responsible for id; there it is matched against any triggers for that id and forwarded (using IP) to all hosts interested in packets sent to that identifier. To facilitate inexact matching, we require that all ids that agree in the first k bits be stored on the same i3 server. The longest prefix match required for inexact matching can then be executed at a single node (where it can be done efficiently).

Note that packets are not stored in i3; they are only forwarded. i3 provides a best-effort service like today's Internet. i3 implements neither reliability nor ordered delivery on top of IP. End-hosts use periodic refreshing to maintain their triggers in i3.

¹In our implementation we choose m = 256 and k = 128.
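To make the matching rule concrete, here is a minimal sketch of it in Python. This is our own illustration, not the paper's code: identifiers are modeled as m-bit integers, and prefix_match_len and best_matching_triggers are names we introduce for this sketch.

    # Sketch of i3's inexact matching rule (Section 2.2), assuming
    # identifiers are m-bit integers held in a flat list of triggers.
    M = 256   # identifier length in bits (m)
    K = 128   # exact-match threshold (k)

    def prefix_match_len(a, b, m=M):
        """Length of the common most-significant-bit prefix of a and b."""
        x = a ^ b
        return m if x == 0 else m - x.bit_length()

    def best_matching_triggers(triggers, packet_id, m=M, k=K):
        """Return the trigger(s) whose identifier is the longest prefix
        match for packet_id, provided the match is at least k bits."""
        best, best_len = [], k - 1
        for trig_id, addr in triggers:
            l = prefix_match_len(trig_id, packet_id, m)
            if l > best_len:
                best, best_len = [(trig_id, addr)], l
            elif l == best_len and best:
                best.append((trig_id, addr))  # equal ids share the packet
        return best

    # Two receivers with the same identifier both match (multicast):
    ts = [(0xAB, "R1"), (0xAB, "R2")]
    print(best_matching_triggers(ts, 0xAB, m=8, k=4))

Note how multicast falls out of the rule for free: several triggers tied at the longest prefix all receive the packet.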
Figure 2: Communication abstractions provided by i3. (a) Mobility: The change of the receiver's address from R to R' is transparent to the sender. (b) Multicast: Every packet (id, data) is forwarded to each receiver R_i that inserts the trigger (id, R_i). (c) Anycast: The packet matches the trigger of receiver R2. An identifier of size m is written as a prefix of the k most significant bits followed by a suffix of the m - k least significant bits.
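The three primitives of Figure 1(a) suggest a very small client library. The sketch below is an illustration under our own assumptions; the opcodes, message format, and server address are hypothetical, not i3's actual wire protocol:

    import socket, struct

    # Hypothetical i3 client stub: each primitive is a one-packet UDP
    # exchange with any known i3 server.
    I3_SERVER = ("10.0.0.1", 5000)   # placeholder address of a known i3 node
    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _id_bytes(ident):            # 256-bit identifier on the wire
        return ident.to_bytes(32, "big")

    def insert_trigger(ident, addr):
        """insert_trigger(id, addr): express interest in packets sent to id."""
        ip, port = addr
        msg = b"T" + _id_bytes(ident) + socket.inet_aton(ip) + struct.pack("!H", port)
        _sock.sendto(msg, I3_SERVER)

    def remove_trigger(ident, addr):
        ip, port = addr
        msg = b"R" + _id_bytes(ident) + socket.inet_aton(ip) + struct.pack("!H", port)
        _sock.sendto(msg, I3_SERVER)

    def send_packet(ident, data):
        """send_packet(id, data): the sender never names the receiver."""
        _sock.sendto(b"D" + _id_bytes(ident) + data, I3_SERVER)

Mobility in Figure 2(a) is then a single call: after moving, the host simply re-runs insert_trigger(id, (new_ip, port)).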
Hosts contact an i3 node when sending packets or inserting triggers. This node then forwards these packets or triggers to the i3 node responsible for the associated identifiers. Thus, hosts need only know one i3 node in order to use the i3 infrastructure.

2.4 Communication Primitives Provided by i3

We now describe how i3 can be used by applications to achieve the more general communication abstractions of mobility, multicast, and anycast.

2.4.1 Mobility

The form of mobility addressed here is when a host (e.g., a laptop) is assigned a new address when it moves from one location to another. A mobile host that changes its address from R to R' as a result of moving from one subnet to another can preserve end-to-end connectivity by simply updating each of its existing triggers from (id, R) to (id, R'), as shown in Figure 2(a). The sending host need not be aware of the mobile host's current location or address. Furthermore, since each packet is routed based on its identifier to the server that stores its trigger, no additional operation needs to be invoked when the sender moves. Thus, i3 can maintain end-to-end connectivity even when both end-points move simultaneously.

With any scheme that supports mobility, efficiency is a major concern [25]. With i3, applications can use two techniques to achieve efficiency. First, the address of the server storing the trigger is cached at the sender, and thus subsequent packets are forwarded directly to that server via IP. This way, most packets are forwarded through only one i3 server in the overlay network. Second, to alleviate the triangle routing problem due to the trigger being stored at a server far away, end-hosts can use off-line heuristics to choose triggers that are stored at servers close to themselves (see Section 4.5 for details).

2.4.2 Multicast

Creating a multicast group is equivalent to having all members of the group register triggers with the same identifier id. As a result, any packet that matches id is forwarded to all members of the group, as shown in Figure 2(b). We discuss how to make this approach scalable in Section 3.4.

Note that unlike IP multicast, with i3 there is no difference between unicast and multicast packets, in either sending or receiving. Such an interface gives maximum flexibility to the application. An application can switch on-the-fly from unicast to multicast by simply having more hosts maintain triggers with the same identifier. For example, in a telephony application this would allow multiple parties to seamlessly join a two-party conversation. In contrast, with IP, an application has to at least change the IP destination address in order to switch from unicast to multicast.

2.4.3 Anycast

Anycast ensures that a packet is delivered to exactly one receiver in a group, if any. Anycast enables server selection, a basic building block for many of today's applications. To achieve this with i3, all hosts in an anycast group maintain triggers which are identical in the k most significant bits. These k bits play the role of the anycast group identifier. To send a packet to an anycast group, a sender uses an identifier whose k-bit prefix matches the anycast group identifier. The packet is then delivered to the member of the group whose trigger identifier best matches the packet identifier according to the longest prefix matching rule (see Figure 2(c)). Section 3.3 gives two examples of how end-hosts can use the last m - k bits of the identifier to encode their preferences.

2.5 Stack of Identifiers

In this section, we describe a second generalization of i3, which replaces identifiers with identifier stacks. An identifier stack is a list of identifiers that takes the form (id_1, id_2, . . . , id_n), where each element is either an identifier or an address. Packets p and triggers t are thus of the form:
p = (id_stack, data)
t = (id, id_stack)

The generalized form of triggers allows a trigger to send a packet to another identifier rather than to an address. This extension allows for much greater flexibility. To illustrate this point, in Sections 3.1, 3.2, and 4.3 we discuss how identifier stacks can be used to provide service composition, implement heterogeneous multicast, and increase i3's robustness, respectively.

A packet p is always forwarded based on the first identifier id in its identifier stack until it reaches the server that is responsible for storing the matching trigger(s) for p. Consider a packet p with an identifier stack (id_1, id_2, id_3). If there is no trigger in i3 whose identifier matches id_1, then id_1 is popped from the stack. The process is repeated until an identifier in p's identifier stack matches a trigger t. If no such trigger is found, packet p is dropped. If, on the other hand, there is a trigger t whose identifier matches id_1, then id_1 is replaced by t's identifier stack. In particular, if t's identifier stack is (x, y), then p's identifier stack becomes (x, y, id_2, id_3). If id_1 is an IP address, p is sent via IP to that address, and the rest of p's identifier stack, i.e., (id_2, id_3), is forwarded to the application.

The semantics of id_2 and id_3 are in general application-specific. However, in this paper we consider only examples in which the application is expected to use these identifiers to forward the packet after it has processed it. Thus, an application that receives a packet with identifier stack (id_2, id_3) is expected to send another packet with the same identifier stack (id_2, id_3). As shown in the next section, this allows i3 to provide support for service composition.

Figure 3 shows the pseudo-code of the receiving and forwarding operations executed by an i3 node. Upon receiving a packet p, a server first checks whether it is responsible for storing the trigger matching packet p. If not, the server forwards the packet at the i3 level. If yes, the code returns the set of triggers that match p. For each matching trigger t, the identifier stack of the trigger is prepended to p's identifier stack. The packet p is then forwarded based on the first identifier of its stack.

    receive(p):
        // is local server responsible for id's best match?
        if (responsible(p.id) == FALSE)
            forward(p); // matching trigger stored elsewhere
            return;
        t_set = match(p.id); // get all triggers matching id
        if (t_set == empty)
            if (p.id_stack == empty)
                drop(p); // nowhere else to forward
            else
                pop(p.id_stack);
                forward(p);
        for each (t in t_set)
            p1 = copy(p); // create new packet to send
            prepend(t.id_stack, p1.id_stack); // add t's stack at head of p1's stack
            forward(p1); // send/forward packet

    forward(p):
        id = head(p.id_stack); // get head of p's stack
        if (type(id) == IP_ADDR_TYPE)
            IP_send(id, p); // id is an IP address; send p to id via IP
        else
            i3_forward(p); // forward via overlay network

Figure 3: Pseudo-code of the receiving and forward operations executed by an i3 server.

Figure 4: (a) Service composition: The sender (S) specifies that packets should be transcoded at server T before being delivered to the destination (R). (b) Heterogeneous multicast: Receiver R1 specifies that it wants to receive H.263 data, while R2 specifies that it wants to receive MPEG data. The sender sends MPEG data.

3. USING i3

In this section we present a few examples of how i3 can be used. We discuss service composition, heterogeneous multicast, server selection, and large scale multicast. In the remainder of the paper, we say that a packet p matches a trigger t if the first identifier of p's identifier stack matches t's identifier.

3.1 Service Composition

Some applications may require third parties to process the data before it reaches the destination [10]. An example is a wireless application protocol (WAP) gateway translating HTML web pages to WML for wireless devices [35]. WML is a lightweight version of HTML designed to run on wireless devices with small screens and limited capabilities. In this case, the sender can forward the web page to a third-party server that implements the HTML-WML transcoding, which in turn processes the data and sends it to the destination via WAP.

In general, data might need to be transformed by a series of third-party servers before it reaches the destination. In today's Internet, the application needs to know the set of servers that perform transcoding and then explicitly forward data packets via these servers.

With i3, this functionality can be easily implemented by using a stack of identifiers. Figure 4(a) shows how data packets containing HTML information can be redirected to the transcoder, and thus arrive at the receiver containing WML information. The sender associates with each data packet the stack (id_HTML-WML, id), where id represents the flow identifier. As a result, the data packet is routed first to the server T which performs the transcoding. Next, server T inserts the packet (id, data) into i3, which delivers it to the receiver.
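The stack-processing rules above can be condensed into a few lines of executable Python. This sketch is ours, not the paper's implementation (identifiers are strings for readability and the trigger store is a plain dictionary), but it follows the pop/prepend/IP-send logic of Figure 3:

    # Sketch of Figure 3's forwarding logic for identifier stacks.
    TRIGGERS = {}   # id -> list of identifier stacks (one per trigger)

    def insert_trigger(ident, id_stack):
        TRIGGERS.setdefault(ident, []).append(list(id_stack))

    def is_address(x):
        return isinstance(x, tuple)          # (ip, port) means "leave i3"

    def forward(id_stack, data):
        while id_stack:
            head = id_stack[0]
            if is_address(head):
                print("IP-send to", head, "payload:", data, "rest:", id_stack[1:])
                return
            if head in TRIGGERS:
                for t_stack in TRIGGERS[head]:
                    # prepend the trigger's stack in place of the matched id
                    forward(t_stack + id_stack[1:], data)
                return
            id_stack = id_stack[1:]          # no match: pop and keep trying
        print("drop:", data)                 # nowhere else to forward

    # Service composition (Figure 4(a)): the sender pushes the transcoder
    # identifier in front of the flow id; names below are illustrative.
    insert_trigger("id_HTML-WML", [("10.0.0.2", 5000)])   # transcoder T
    insert_trigger("id_flow",     [("10.0.0.3", 5000)])   # receiver R
    forward(["id_HTML-WML", "id_flow"], "html-page")

Running this delivers the packet to T first, with ("id_flow") left on the stack for T to forward once transcoding is done, exactly as described above.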
3.2 Heterogeneous Multicast
Figure 4(b) shows a more complex scenario in which an MPEG video stream is played back by one H.263 receiver and one MPEG receiver.

To provide this functionality, we use the ability of the receiver, instead of the sender (see Section 2.5), to control the transformations performed on the data packets. In particular, the H.263 receiver R1 maintains the trigger (id, (id_MPEG-H.263, R1)) in i3, while the sender sends packets (id, data). When such a packet matches R1's trigger, the packet's identifier id is replaced by the trigger's stack (id_MPEG-H.263, R1). Next, the packet is forwarded to the MPEG-H.263 transcoder, and then directly to receiver R1. In contrast, an MPEG receiver R2 only needs to maintain a trigger (id, R2) in i3. This way, receivers with different display capabilities can subscribe to the same multicast group.

Another useful application is to have the receiver insist that all data go through a firewall first before reaching it.
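Using the trigger store and forward() from the sketch in Section 3.1, the scenario of Figure 4(b) reduces to two triggers with different stacks (all identifier names and addresses below are illustrative):

    # Figure 4(b) with the forwarding sketch above: R1 interposes a
    # transcoder by storing a stack in its trigger; R2 subscribes directly.
    insert_trigger("id_MPEG-H.263", [("10.0.0.4", 5000)])        # transcoder T
    insert_trigger("id", ["id_MPEG-H.263", ("10.0.0.5", 5000)])  # H.263 receiver R1
    insert_trigger("id", [("10.0.0.6", 5000)])                   # MPEG receiver R2
    forward(["id"], "mpeg-frame")
    # -> one copy goes straight to R2; the other reaches T with R1's
    #    address left on the stack, so T re-sends the transcoded data to R1.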
[Figure 5 shows a multicast tree built from chains of triggers: the group identifier id_g fans out through triggers to intermediate identifiers id_1 and id_2, which in turn point to receivers R1 through R6.]

Figure 5: Example of a scalable multicast tree with bounded degree by using chains of triggers.
3.3 Server Selection
i3 provides good support for basic server selection through the use of the last m - k bits of the identifiers to encode application preferences. To illustrate this point, consider two examples.
In the first example, assume that there are several web servers and the goal is to balance the client requests among these servers. This goal can be achieved by setting the m - k least significant bits of both trigger and packet identifiers to random values. If servers have different capacities, then each server can insert a number of triggers proportional to its capacity. Finally, one can devise an adaptive algorithm in which each server varies the number of triggers as a function of its current load.

In the second example, consider the goal of selecting a server that is close to the client in terms of latency. To achieve this goal, each server can use the last m - k bits of its trigger identifiers to encode its location, and the client can use the last m - k bits in the packets' identifier to encode its own location. In the simplest case, the location of an end-host (i.e., server or client) can be the zip code of the place where the end-host is located; the longest prefix matching procedure used by i3 would then result in the packet being forwarded to a server that is relatively close to the client.

3.4 Large Scale Multicast

The multicast abstraction presented in Section 2.4.2 assumes that all members of a multicast group insert triggers with identical identifiers. Since triggers with identical identifiers are stored at the same i3 server, that server is responsible for forwarding each multicast packet to every member of the multicast group. This solution obviously does not scale to large multicast groups.

One approach to address this problem is to build a hierarchy of triggers, where each member R_i of a multicast group id_g replaces its trigger (id_g, R_i) by a chain of triggers that reaches R_i through intermediate identifiers, as shown in Figure 5. The receivers of the multicast group construct and maintain this hierarchy of triggers.

4. ADDITIONAL DESIGN AND PERFORMANCE ISSUES

In this section we discuss some additional design and performance issues. The i3 design was intended to be (among other properties) robust, self-organizing, efficient, secure, scalable, incrementally deployable, and compatible with legacy applications. In this section we discuss these issues and some details of the design that are relevant to them.

Before addressing these issues, we first review our basic design. i3 is organized as an overlay network in which every node (server) stores a subset of triggers. In the basic design, at any moment of time, a trigger is stored at only one server. Each end-host knows about one or more i3 servers. When a host wants to send a packet (id, data), it forwards the packet to one of the i3 servers it knows. If the contacted server doesn't store the trigger matching (id, data), the packet is forwarded via IP to another i3 server. This process continues until the packet reaches the server that stores the matching trigger. The packet is then sent to the destination via IP.

4.1 Properties of the Overlay

The performance of i3 depends greatly on the nature of the underlying overlay network. i3 is built on top of Chord [26], in which each server is assigned an m-bit identifier and maintains a routing table of size m. The i-th entry in the routing table of server n contains the first server that follows n + 2^(i-1), i.e., successor(n + 2^(i-1)). This server is called the i-th finger of n. Note that the first finger is the same as the successor server.

Upon receiving a packet with identifier id, server n checks whether id lies between itself and its successor. If yes, the server forwards the packet to its successor, which should store the packet's trigger. If not, n sends the packet to the closest server (finger) in its routing table that precedes id. In this way, we are guaranteed that the distance to id in the identifier space is halved at each step. As a result, it takes O(log N) hops to route a packet to the server storing the best matching trigger for the packet, irrespective of the starting point of the packet, where N is the number of i3 servers in the system.

In the current implementation, we assume that all identifiers that share the same k-bit prefix are stored on the same server. A simple way to achieve this is to set the last m - k bits of every node identifier to zero. As a result, finding the best matching trigger reduces to performing a longest prefix matching operation locally.

While i3 is implemented on top of Chord, in principle i3 can use any of the recently proposed P2P lookup systems such as CAN [22], Pastry [23] and Tapestry [12].

4.2 Public and Private Triggers

Before discussing i3's properties, we introduce an important technique that allows applications to use i3 more securely and efficiently. With this technique applications make a distinction between two types of triggers: public and private. This distinction is made only at the application level: i3 itself doesn't differentiate between private and public triggers.

4.3 Robustness

i3 relies on end-hosts periodically refreshing their triggers, so a trigger lost to a server failure is eventually reinserted. One potential problem with this approach is that although the triggers are eventually reinserted, the time during which they are unavailable due to server failures may be too large for some applications. There are at least two solutions to address this problem.

The first solution does not require i3-level changes. The idea is to have each receiver R maintain a backup trigger (id_b, R) in addition to the primary trigger (id, R), and have the sender send packets with the identifier stack (id, id_b). If the server storing the primary trigger fails, the packet will then be forwarded via the backup trigger to R.⁴ Note that to accommodate the case when the packet is required to match every trigger in its identifier stack (see Section 3.2), we use a flag in the packet header, which, if set, causes the packet to be dropped if the identifier at the head of its stack doesn't find a match. The second solution is to have the overlay network itself replicate the triggers and manage the replicas. In the case of Chord, the natural replication policy is to replicate a trigger on the immediate successor of the server responsible for that trigger [5]. Finally, note that when an end-host fails, its triggers are automatically deleted from i3 after they time-out.

4.4 Self-Organizing

i3 is an overlay infrastructure that may grow to large sizes. Thus, it is important that it not require extensive manual configuration or human intervention. The Chord overlay network is self-configuring, in that nodes joining the i3 infrastructure use a simple bootstrapping mechanism (see [26]) to find out about at least one existing i3 node, and then contact that node to join the i3 infrastructure. Similarly, end-hosts wishing to use i3 can locate at least one i3 server using a similar bootstrapping technique; knowledge of a single i3 server is all an end-host needs to use the infrastructure.

⁴Here we implicitly assume that the primary and backup triggers are stored on different servers. The receiver can ensure that this is the case with high probability by choosing id_b so that id and id_b do not share the same k-bit prefix.
4.5 Routing Efficiency

As noted in Section 2.4.1, the sender caches the i3 server storing the receiver's trigger. To support this, the sender can set a refreshing flag in the packet header; when the server storing the matching trigger sees the flag set, it returns its IP address back to the original sender. In turn, the sender caches this address and uses it to send the subsequent packets with the same identifier. The sender can periodically set the refreshing flag as a keep-alive message with the cached server responsible for this trigger.

Note that the optimization of caching the server which stores the receiver's trigger does not undermine the system robustness. If the trigger moves to another server (e.g., as the result of a new server joining the system), i3 will simply route the subsequent packets from the cached server to the new one. When the first packet reaches the new server, the sender will replace the old address with the new one in its cache. If the cached server fails, the client simply uses another known i3 server to communicate. This is the same fall-back mechanism as in the unoptimized case when the client uses only one i3 server to communicate with all the other clients. Actually, the fact that the client caches the server storing the receiver's trigger can help reduce the recovery time. When the sender notices that the server has failed, it can inform the receiver to reinsert the trigger immediately. Note that this solution assumes that the sender and receiver can communicate via alternate triggers that are not stored at the same server.

While caching the server storing the receiver's trigger reduces the number of i3 hops, we still need to deal with the triangle routing problem. That is, if the sender and the receiver are close by, but the server storing the trigger is far away, the routing can be inefficient. For example, if the sender and the receiver are both in Berkeley and the server storing the receiver's trigger is in London, each packet will be forwarded to London before being delivered back to Berkeley!

One solution to this problem is to have the receivers choose their private triggers such that they are located on nearby servers. This would ensure that packets won't take a long detour before reaching their destination. If an end-host knows the identifiers of the nearby servers, then it can easily choose triggers with identifiers that map on these servers. In general, each end-host can sample the identifier space to find ranges of identifiers that are stored at nearby servers. To find these ranges, a node can insert random triggers into i3, and then estimate the RTT to the server that stores each trigger by sending packets addressed to itself through that trigger. Note that since we assume that the mapping of triggers onto servers is relatively stable over time, this operation can be done off-line. We evaluate this approach by simulation in Section 5.1.

4.6 Avoiding Hot-Spots

When the rate of packets matching a trigger t exceeds a certain threshold, the server storing the trigger pushes a copy of t to another server. This process can continue recursively until the load is spread out. The decision of where to push the trigger is subject to two constraints. First, the server should push the trigger to the server most likely to route the packets matching that trigger. Second, it should try to minimize the state it needs to maintain; at the least, it needs to know the servers to which it has already pushed triggers in order to forward refresh messages for these triggers (otherwise the triggers will expire). With Chord, one simple way to address these problems is to always push the triggers to the predecessor server.

If there are more triggers that share the same k-bit prefix with a popular trigger t, all these triggers need to be cached together with t. Otherwise, if the identifier of a packet matches the identifier of a cached trigger t, we cannot be sure that t is indeed the best matching trigger for the packet.

4.7 Scalability

Since typically each flow is required to maintain two triggers (one for each end-point), the number of triggers stored in i3 is of the order of the number of flows plus the number of end-hosts. At first sight, this would be equivalent to a network in which each router maintains per-flow state. Fortunately, this is not the case. While the state of a flow is maintained by each router along its path, a trigger is stored at only one node at a time. Thus, if there are n triggers and N servers, each server will store n/N triggers on the average. This also suggests that i3 can be easily upgraded by simply adding more servers to the network. One interesting point to note is that these nodes do not need to be placed at specific locations in the network.

4.8 Incremental Deployment

Since i3 is designed as an overlay network, i3 is incrementally deployable. At the limit, i3 may consist of only one node that stores all triggers. Adding more servers to the system does not require any system configuration. A new server simply joins the system using the Chord protocol, and becomes automatically responsible for an interval in the identifier space. When triggers with identifiers in that interval are refreshed or inserted, they will be stored at the new server. In this way, the addition of a new server is also transparent to the end-hosts.
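The off-line sampling described in Section 4.5 can be sketched as follows. This is our own illustration: probe_rtt and insert_trigger are passed in as stand-ins for the send-to-self timing and the i3 primitive, so nothing here is i3's actual API:

    import random

    # Sketch of Section 4.5's off-line sampling: insert random triggers,
    # time a packet routed back to ourselves through each, keep the ids
    # whose responsible servers are closest.
    def sample_nearby_ids(probe_rtt, insert_trigger, our_addr,
                          num_samples=32, keep=4, m=256):
        """probe_rtt(id): RTT of a packet sent to id and looped back to us;
        insert_trigger(id, addr): the i3 primitive. Both are injected."""
        scored = []
        for _ in range(num_samples):
            ident = random.getrandbits(m)
            insert_trigger(ident, our_addr)      # temporary sampling trigger
            scored.append((probe_rtt(ident), ident))
        scored.sort()
        return [ident for _, ident in scored[:keep]]   # pool of close ids

Since the trigger-to-server mapping is assumed stable, the returned pool can be refreshed only occasionally and reused across connections.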
4.10 Security

Unlike IP, where an end-host can only send and receive packets, in i3 end-hosts are also responsible for maintaining the routing information through triggers. While this allows flexibility for applications, it also (and unfortunately) creates new opportunities for malicious users. We now discuss several security issues and how i3 addresses them.

We emphasize that our main goal here is not to design a bullet-proof system. Instead, our goal is to design simple and efficient solutions that make i3 not worse and in many cases better than today's Internet. The solutions outlined in this section should be viewed as a starting point towards more sophisticated and better security solutions that we will develop in the future.

4.10.1 Eavesdropping

Recall that the key to enabling multicast functionality is to allow multiple triggers with the same identifier. Unfortunately, a malicious user that knows a host's trigger can use this flexibility to eavesdrop the traffic towards that host by simply inserting a trigger with the same identifier and its own address. In addressing this problem, we consider two cases: (a) private and (b) public triggers (see Section 4.2).

Private triggers are secretly chosen by the application end-points and are not supposed to be revealed to the outside world. The length of the trigger's identifier makes it difficult for a third party to use a brute force attack. While other application constraints, such as storing a trigger at a nearby server, can limit the identifier choice, the identifier is long enough (i.e., 256 bits) that the application can always reserve a reasonably large number of bits that are randomly chosen. Assuming that an application chooses 128 random bits in the trigger's identifier, it will take an attacker 2^127 probes on the average to guess the identifier. Even in the face of a distributed attack of, say, one million hosts, it will take about 2^107 probes per host to guess a private trigger. We note that the technique of using random identifiers as probabilistic secure capabilities was previously used in [28, 37].

Furthermore, end-points can periodically change the private triggers associated with a flow. Another alternative would be for the receiver to associate multiple private triggers to the same flow, and the sender to send packets randomly to one of these private triggers. The alternative left to a malicious user is to intercept all private triggers. However, this is equivalent to eavesdropping at the IP level or taking control of the i3 server storing the trigger, which makes i3 no worse than IP.

With i3, a public trigger is known by all users in the system, and thus anyone can eavesdrop the traffic to such a trigger. To alleviate this problem, end-hosts can use the public triggers to choose a pair of private triggers, and then use these private triggers to exchange the actual data. To keep the private triggers secret, one can use public key cryptography to exchange the private triggers. To initiate a connection, a host encrypts its private trigger under the public key of the receiver, and then sends it to the receiver via the receiver's public trigger.⁵ The receiver decrypts the sender's private trigger, then chooses its own private trigger and returns it over the sender's private trigger.

4.10.2 Trigger Hijacking

A malicious user that learns a host's public trigger could remove it and thereby cut the host off. To guard against such an attack, one can add another level of indirection. Consider a server s that wants to advertise a public trigger with identifier id. Instead of inserting the trigger (id, s), the server can insert two triggers, (id, x) and (x, s), where x is an identifier known only by s. Since a malicious user has to know x in order to remove either of the two triggers, this simple technique provides effective protection against this type of attack. To avoid performance penalties, the receiver can choose x such that both (id, x) and (x, s) are stored at the same server. With the current i3 implementation this can be easily achieved by having id and x share the same k-bit prefix.

4.10.3 DoS Attacks

The fact that i3 gives end-hosts control on routing opens new possibilities for DoS attacks. We consider two types of attacks: (a) attacks on end-hosts, and (b) attacks on the infrastructure. In the former case, a malicious user can insert a hierarchy of triggers (see Figure 5) in which all triggers on the last level point to the victim. Sending a single packet to the trigger at the root of the hierarchy will cause the packet to be replicated and all replicas to be sent to the victim. This way an attacker can mount a large scale DoS attack by simply leveraging the i3 infrastructure. In the latter case, a malicious user can create trigger loops, for instance by connecting the leaves of a trigger hierarchy to its root. In this case, each packet sent to the root will be exponentially replicated!

To alleviate these attacks, i3 uses three techniques:

1. Challenges: i3 assumes implicitly that a trigger that points to an end-host R is inserted by the end-host itself. An i3 server can easily verify this assumption by sending a challenge to R the first time the trigger is inserted. The challenge consists of a random nonce that is expected to be returned by the receiver. If the receiver fails to answer the challenge, the trigger is removed. As a result, an attacker cannot use a hierarchy of triggers to mount a DoS attack (as described above), since the leaf triggers will be removed as soon as the server detects that the victim hasn't inserted them.

2. Resource allocation: Each server uses Fair Queueing [7] to allocate resources amongst the triggers it stores. This way the damage inflicted by an attacker is only proportional to the number of triggers it maintains. An attacker cannot simply use a hierarchy of triggers with loops to exponentially increase its traffic. As soon as each trigger reaches its fair share, the excess packets will be dropped. While this technique doesn't solve the problem, it gives i3 time to detect and to eventually break the cycles. To increase protection, each server can also put a bound on the number of triggers that can be inserted by a particular end-host. This will preclude a malicious end-host from monopolizing a server's resources.

3. Loop detection: When a trigger that doesn't point to an IP address is inserted, the server checks whether the new trigger doesn't create a loop. A simple procedure is to send a special packet with a random nonce. If the packet returns back to the server, the trigger is simply removed. To increase the robustness, the server can invoke this procedure periodically after such a trigger is inserted. Another possibility to detect loops more efficiently would be to use a Bloom filter to encode the set of servers along the packet's path, as proposed in the Icarus framework [34].

⁵Note that an attacker can still count the number of connection requests to a public trigger. However, this information is of very limited use, if any, to the attacker. If, in the future, it turns out that this is unacceptable for some applications, then other security mechanisms such as public trigger authentication will need to be used.
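The challenge mechanism of technique 1 amounts to a nonce echo before a trigger is accepted. A server-side sketch under our own assumptions (the CHAL message and the pending table are illustrative, not part of i3's protocol):

    import os

    # Sketch of trigger challenges (Section 4.10.3): a trigger pointing at
    # an IP address is kept only if that address echoes a random nonce.
    PENDING = {}   # (trig_id, addr) -> nonce awaiting echo

    def challenge(trig_id, addr, sock):
        nonce = os.urandom(16)
        PENDING[(trig_id, addr)] = nonce
        sock.sendto(b"CHAL" + nonce, addr)   # a spoofed victim never answers

    def on_challenge_reply(trig_id, addr, nonce, store):
        if PENDING.get((trig_id, addr)) == nonce:
            del PENDING[(trig_id, addr)]
            store.setdefault(trig_id, set()).add(addr)   # accept the trigger

Because the nonce travels to the trigger's target address, an attacker cannot plant triggers pointing at a victim it cannot receive for.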
4.11 Anonymity

Point-to-point communication networks such as the Internet provide limited support for anonymity. Packets usually carry the destination and the source addresses, which makes it relatively easy for an eavesdropper to learn the sender and the receiver identities. In contrast, with i3, eavesdropping the traffic of a sender will not reveal the identity of the receiver, and eavesdropping the traffic of a receiver will not reveal the sender's identity. The level of anonymity can be further enhanced by using chains of triggers or stacks of identifiers to route packets.

5. SIMULATION RESULTS

In this section, we evaluate the routing efficiency of i3 by simulation. One of the main challenges in providing efficient routing is that end-hosts have little control over the location of their triggers. However, we show that simple heuristics can significantly enhance i3's performance. The metric we use to evaluate these heuristics is the ratio of the inter-node latency on the i3 network to the inter-node latency on the underlying IP network. This is called the latency stretch.

The simulator is based on the Chord protocol and uses iterative style routing [26]. We assume that node identifiers are randomly distributed. This assumption is consistent with the way the identifiers are chosen in other lookup systems such as CAN [22] and Pastry [23]. As discussed in [26], using random node identifiers increases system robustness and load-balancing.⁶ We consider the following network topologies in our simulations:

- A power-law random graph topology generated with the INET topology generator [16] with 5000 nodes, with i3 servers randomly assigned to all nodes.

- A transit-stub topology with 5000 nodes, with i3 servers assigned only to the stub nodes.

5.1 End-to-End Latency

Consider a receiver R that inserts a trigger (id, R). As discussed in Section 4.5, once the first packet reaches the server storing the trigger (id, R), the sender caches that server and sends all subsequent packets directly to it. As a result, the packets will be routed via IP from the sender to the server and then from the server to R. The obvious question is how efficient is routing through this server as compared to routing directly from the sender to R. Section 4.5 presents a simple heuristic in which a receiver R samples the identifier space to find an identifier id' that is stored at a nearby server. Then R inserts the trigger (id', R).

Figure 8 plots the 90th percentile latency stretch versus the number of samples in a system with 16,384 servers. Each point represents the 90th percentile over 1000 measurements. For each measurement, we randomly choose a sender and a receiver. In each case, the receiver generates the given number of triggers with random identifiers. Among these triggers, the receiver retains the trigger that is stored at the closest server. Then we sum the shortest path latency from the sender to that server and from the server to the receiver, and divide it by the shortest path latency from the sender to the receiver to obtain the latency stretch. Sampling the space of identifiers greatly lowers the stretch. While increasing the number of samples decreases the stretch further, the improvement appears to saturate rapidly, indicating that in practice just 16 to 32 samples should suffice. The receiver does not need to search for a close identifier every time a connection is opened; in practice, an end-host can sample the space periodically and maintain a pool of identifiers which it can reuse.

Figure 8: The 90th percentile latency stretch vs. number of samples for the PLRG and transit-stub topologies with 5000 nodes.

⁶We have also experimented with identifiers that have location semantics. In particular, we have used space filling curves, such as the Hilbert curve, to map a 3-dimensional geometric space (which was shown to approximate the Internet latency well [20]) onto the one-dimensional Chord identifier space. However, the preliminary results do not show significant gains as compared to the heuristics presented in this section, so we omit their presentation here.
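The stretch metric of Figure 8 can be stated compactly. In this sketch (ours), dist stands in for shortest-path latency in the simulated topology and random_server for the server responsible for a freshly drawn random identifier:

    # Sketch of the Section 5.1 methodology: the receiver keeps the best of
    # num_samples random triggers; stretch compares the two-leg path through
    # that trigger's server against the direct path.
    def latency_stretch(sender, receiver, dist, random_server, num_samples):
        """dist(a, b): shortest-path latency; random_server(): the server
        responsible for a random identifier. Both are assumed helpers."""
        best = min((random_server() for _ in range(num_samples)),
                   key=lambda s: dist(receiver, s))
        via = dist(sender, best) + dist(best, receiver)
        return via / dist(sender, receiver)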
5.2 Proximity Routing in i3

While Section 5.1 evaluates the end-to-end latency experienced by data packets after the sender caches the server storing the receiver's trigger t, in this section we evaluate the latency incurred by the sender's first packet that matches trigger t. This packet is routed through the overlay network until it reaches the server storing t. While Chord ensures that the overlay route length is only O(log N) hops, where N is the number of i3 servers, the routing latency can be quite large. This is because server identifiers are randomly chosen, and therefore servers close in the identifier space can be very far away in the underlying network. To alleviate this problem, we consider two simple heuristics:

- Closest finger replica: In addition to each finger, a server maintains the r - 1 immediate successors of that finger. Thus, each node maintains references to about r log2 N other nodes for routing purposes. To route a packet, a server selects the closest node in terms of network distance amongst (1) the finger with the largest identifier preceding the packet's identifier and (2) the r - 1 immediate successors of that finger. This heuristic was originally proposed in [5].

- Closest finger set: Each server chooses log_b N fingers as successor(n + b^i), i = 1, 2, . . ., where 1 < b < 2. To route a packet, a server considers only the closest log2 N fingers in terms of network distance among all its log_b N fingers.

Figure 9 plots the 90th percentile latency stretch as a function of i3's size for the baseline Chord protocol and the two heuristics. The number of replicas r is 10, and b is chosen such that log_b N = r log2 N. Thus, with both heuristics, a server considers roughly the same number of routing entries. We vary the number of i3 servers, and in each case we average routing latencies over 1000 routing queries. In all cases the server identifiers are randomly generated.

Figure 9: The 90th percentile latency stretch in the case of (a) a power-law random network topology with 5000 nodes, and (b) a transit-stub topology with 5000 nodes. The i3 servers are randomly assigned to all nodes in case (a), and only to the stub nodes in case (b).

As shown in Figure 9, both heuristics can reduce the 90th percentile latency stretch up to 2 times as compared to the default Chord protocol. In practice, we choose the "closest finger set" heuristic. While this heuristic achieves comparable latency stretch with "closest finger replica", it is easier to implement and does not require an increase in the routing table size. The only change in the Chord protocol is to sample the identifier space using base b instead of base 2, and store only the closest log2 N fingers among the log_b N nodes sampled so far.
5.3 Implementation and Experiments

We have implemented a bare-bones version of i3 using the Chord protocol. The control protocol used to maintain the overlay network is fully asynchronous and is implemented on top of UDP. The implementation uses 256-bit (m = 256) identifiers and assumes that the matching procedure requires exact matching on the 128 most significant bits (k = 128). This choice makes it very unlikely that a packet will erroneously match a trigger, and at the same time gives applications up to 128 bits to encode application-specific information such as the host location (see Section 2.4.3).

For simplicity, in the current implementation we assume that all triggers that share the first 128 bits are stored on the same server. In theory, this allows us to use any of the proposed lookup algorithms that performs exact matching.

Both insert trigger requests and data packets share a common header of 48 bytes. In addition, data packets can carry a stack of up to four triggers (this feature isn't used in the experiments). Triggers need to be updated every 30 seconds or they will expire. The control protocol to maintain the overlay network is minimal: each server performs stabilization every 30 seconds (see [26]), and during every stabilization period the servers exchange a small number of control messages. Since in our experiments the number of servers is on the order of tens, we neglect the overhead due to the control protocol.

The testbed used for all of our experiments was a cluster of Pentium III 700 MHz machines running Linux. We ran tests on systems of up to 32 nodes, with each node running on its own processor. The nodes communicated over a shared 1 Gbps Ethernet. For time measurements, we use the Pentium timestamp counter (TSC). This method gives very accurate wall-clock times, but sometimes it includes interrupts and context switches as well. For this reason, the high extremes in the data are unreliable.

5.4 Performance

In this section, we present the overhead of the main operations performed by i3. Since these results are based on a very preliminary implementation, they should be seen as a proof of feasibility and not as a proof of efficiency. Other Chord-related performance metrics such as the route length and system robustness are presented in [5].

Trigger insertion: We consider the overhead of handling an insert trigger request locally, as opposed to forwarding a request to another server. Triggers are maintained in a hash table, so the time is practically independent of the number of triggers. Inserting a trigger involves just a hash table lookup and a memory allocation. The average and the standard deviation of the trigger insertion operation over 10,000 insertions are 12.5 μsec and 7.12 μsec, respectively. This is mostly the time it takes the operating system to process the packet and to hand it to the application. By comparison, memory allocation time is just 0.25 μsec on the test machine. Note that since each trigger is updated every 30 sec, a server incurs this insertion cost once per trigger every 30 seconds.
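The constant-time insertion measured above is consistent with a plain hash table keyed by the trigger identifier; a toy sketch (ours) of such a store with 30-second expiry:

    import time

    # Toy sketch of the trigger store behind the Section 5.4 measurement:
    # insertion is one dict lookup plus one small allocation, so its cost
    # is independent of how many triggers are already stored.
    store = {}

    def insert_local(trig_id, addr, lifetime=30.0):
        store.setdefault(trig_id, {})[addr] = time.time() + lifetime

    def lookup(trig_id, now=None):
        now = time.time() if now is None else now
        return [a for a, exp in store.get(trig_id, {}).items() if exp > now]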
[Figures 10 and 11 plot the 10th, 25th, 50th, 75th, and 90th percentiles of per-packet overhead in μsec.]

Figure 10: Per-packet forwarding overhead as a function of packet payload size. In this case, the header size is 48 bytes.

Figure 11: Per-packet routing overhead as a function of the number of i3 nodes in the system. The packet payload size is zero.