Internet Traffic Engineering
Number 532
Computer Laboratory
April 2002
JJ Thomson Avenue Cambridge CB3 0FD United Kingdom phone +44 1223 763500 http://www.cl.cam.ac.uk/
© 2002 Richard Mortier. This technical report is based on a dissertation submitted October 2001 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Churchill College. Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: http://www.cl.cam.ac.uk/TechReports/ Series editor: Markus Kuhn. ISSN 1476-2986
Abstract
Due to the dramatically increasing popularity of the services provided over the public Internet, problems with current mechanisms for control and management of the Internet are becoming apparent. In particular, it is increasingly clear that the Internet and other networks built on the Internet protocol suite do not provide sufficient support for the efficient control and management of traffic, i.e. for Traffic Engineering. This dissertation addresses the problem of traffic engineering in the Internet. It argues that traffic management techniques should be applied at multiple timescales, and not just at data timescales as is currently the case. It presents and evaluates mechanisms for traffic engineering in the Internet at two further timescales: flow admission control and control of per-flow packet marking, enabling control timescale traffic engineering; and support for load based inter-domain routeing in the Internet, enabling management timescale traffic engineering. This dissertation also discusses suitable policies for the application of the proposed mechanisms. It argues that the proposed mechanisms are able to support a wide range of policies useful to both users and operators. Finally, in a network of the size of the Internet consideration must also be given to the deployment of proposed solutions. Consequently, arguments for and against the deployment of these mechanisms are presented and the conclusion drawn that there are a number of feasible paths toward deployment.
The work presented argues the following: firstly, it is possible to implement mechanisms within the Internet framework that enable traffic engineering to be carried out by operators; secondly, that applying these mechanisms with suitable policies can ease the management problems faced by operators and at the same time improve the efficiency with which the network can be run; thirdly, that these improvements can correspond to increased network performance as viewed by the user; and finally, that not only the resulting deployment but also the deployment process itself are feasible.
Acknowledgements
I welcome the opportunity to thank my supervisor. During the course of this Ph.D., Ian Leslie read and discussed more versions of this dissertation than anyone should have had to; Ian Pratt was the instigator of much debate; and Simon Crosby provided a most enriching internship with Cplane Inc. I wish to acknowledge the work of Christopher Clark for the initial NS implementation of the DropTailMtk queue used in Section 3.4; the work of Ian Pratt in collecting the data for Table 3.1 describing the back-off behaviour of various TCP stacks and providing the complex traffic model for Chapter 3; and the work of Austin Donnelly in modifying the VIC tool to adapt in response to receive reports for its use as an RTP source in Section 3.5. This Ph.D. was initially funded by an EPSRC CASE award in conjunction with BT, and latterly by Marconi Research, Cambridge. Thanks are due to all who suffered to read drafts of this dissertation, but especially to Steve Hand, Rebecca Isaacs, Ian Leslie, and Ian Pratt. Any remaining errors are mine alone. Finally, I also wish to thank all those past and present members of the SRG who made this Ph.D. such an educational, rewarding, but above all alcoholic, experience. In particular: Paul Barham, Herbert Bos, Austin Donnelly, Steve Hand, Tim Harris, Rebecca Isaacs, Paul Jardetzky, Derek McAuley, Andrew Moore, Ian Pratt, Sean Rooney, Dave Stewart, and Neil Stratford. Cheers.
Chapter 1. Introduction
Chapter 1
Introduction
Data networks exist to transport information at the behest of users. Users receive value from the network based on the various properties of this transport, such as latency, throughput, and reliability. Network providers operate networks to provide value to users by carrying data under user-specified constraints. The process of managing the allocation of network resources to carry traffic subject to constraints is known as Traffic Engineering. The Internet currently provides for traffic engineering only at data timescales, with poor support for expression of policy. The thesis of this dissertation is that mechanisms for traffic engineering in the Internet are required at multiple timescales, and furthermore, pricing is a useful mechanism through which to express traffic engineering policies as it is both intuitive and flexible. This chapter motivates this thesis, and then summarises the contributions and outlines the remainder of the dissertation.
1.1 Traffic engineering
Traffic engineering is concerned with the performance optimization of networks [Xiao00]. It addresses the problem of efficiently allocating resource in the network so that user constraints are met and operator benefit is maximized. It can be performed automatically or through manual intervention, and is required at a variety of timescales discussed below.
One might consider that current technology trends remove the need for traffic engineering. Advances in optical networking are making ever-increasing amounts of bandwidth available, effectively reducing the marginal cost¹ of bandwidth to zero. The widespread deployment of such technologies is accelerating, and companies are able to sell high-bandwidth, trans-national and international connectivity simply by massive over-provisioning of their networks.
Notwithstanding such developments, traffic engineering remains important and efficient mechanisms for performing it are therefore valuable. There are a number of reasons it retains importance, perhaps principally that both the number of users and their expectations are increasing exponentially, in parallel with the exponential increase in available bandwidth. In addition, the bandwidth available to users at the edges of the network is undergoing dramatic increase with the deployment of technologies such as xDSL, Fibre-to-the-Curb, and Fibre-to-the-Home. Coupled with these increases in user numbers, expectation, and access bandwidth, it remains the case that companies that have invested in such over-provision of bandwidth need to recoup sunk costs. Service-differentiated pricing and usage-proportional charging are widely accepted mechanisms for doing so. Simple and cost-effective mechanisms for monitoring usage and ensuring that customers receive what they request are required to make usage-proportional charging practical. Consequently, traffic engineering still performs a useful function for network operators and customers. Enabling it to be performed in an efficient and consistent manner is valuable.
¹ Marginal cost is defined as the increase in cost of production resulting from a small increase in output.
A note on terminology
Throughout this dissertation, packet is used to mean the smallest unit of data considered by the network, whether a frame as in Ethernet, or an Internet datagram as in IP (INTERNET PROTOCOL). A network such as the Internet consists of links that connect pairs of nodes. If an interior node is capable of routeing IP packets, it is known as a router. A collection of routers controlled by a single administrative entity forms an AS (AUTONOMOUS SYSTEM). ASs which carry traffic for other ASs are known as transit ASs; those which do not are called stub ASs. An internet is a network that runs IP; the (public) Internet is a particular instance of an internet, formed by the interconnection of many ASs. A multiservice network is a network such as the Internet which attempts to offer multiple services directly over the same transport technology. This contrasts with older networks such as the telephone network, which have typically offered only a single basic service to users. When discussing pricing the following terms will be used:
Cost refers to the value expended in providing the service; in a sense the manufacturing cost. This can include the marginal cost of forwarding a particular packet, the sunk cost associated with the installation of fibres, and the costs of maintaining and managing an installed network.
Pricing is the process of associating a (potentially arbitrary) number with some service. In most real systems this will have some relation to the total real cost of the service and not, in fact, be arbitrary. Note that the consumer need not see the price, but will instead be charged some amount based on the price.
Charging is the mechanism by which the price is expressed to the consumer. The charge can be viewed as a function of the price and the consumer involved, and perhaps other parameters. This separation of pricing and charging gives greater flexibility for the operator to express policies such as time-, customer- or service-specific discounts, whilst keeping pricing mechanisms consistent and simple.
Billing is the mechanism by which charges are recovered from the customer. It is possible for this to be applied either pre- or post-consumption.
Although all these issues are connected, the research described in this dissertation concentrates on the application of pricing and charging to traffic engineering in the Internet, rather than on billing. Given the required information, billing is a problem associated with the support services provided to the operator by the network, and by the operator to the customer. The work described in this dissertation should have a positive impact on the problem of cost-effective billing, but this is not an explicit aim.
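The separation of pricing from charging described above can be illustrated with a minimal sketch. It assumes a single per-unit price set by the network and treats the charge as a function of that price, the consumer, and one further parameter; the `Consumer` class, the `discount` field, and the off-peak factor are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    name: str
    discount: float = 0.0  # hypothetical customer-specific discount, 0.0 to 1.0


def charge(price: float, consumer: Consumer, off_peak: bool = False) -> float:
    """Map a network-set price to the amount charged to one consumer."""
    amount = price * (1.0 - consumer.discount)
    if off_peak:
        amount *= 0.5  # hypothetical time-specific discount
    return round(amount, 6)
```

The pricing mechanism only ever produces `price`; operator policy such as customer or time-of-day discounts lives entirely in `charge`, so either side can change without affecting the other.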
1.2 Timescales
A common scheme, and that followed in this dissertation, is to classify network resource control into three timescales: data, control, and management [Hui88].
Data timescales are considered to be of the order of packet forwarding times. They concern the behaviour and effect of individual packets or packet-trains within the network. At these timescales resource control is usually concerned with controlling transient overload in the network, and with buffer management in the network switching elements and endpoints. For example, TCP (TRANSMISSION CONTROL PROTOCOL) provides both flow and congestion control in terms of segments transmitted. Dealing with resource control at these timescales is not part of the main contribution of this dissertation; it is however relevant and thus discussed in Chapter 2.
Control timescales concern flows, where a flow is a collection of closely-spaced packets travelling between two defined end-points. Closely-spaced is not a well-defined term, but can be taken in the Internet to mean that inter-packet gaps are of the order of a few seconds or less. Flows are not only generated through open-loop control, such as single file transfers, but also as related collections of transfers, such as those involved in downloading an entire web-page when using HTTP/1.0. At these timescales, traffic engineering techniques include CAC (CONNECTION ADMISSION CONTROL), and per-flow signalling and resource reservation. For example, RSVP (RESOURCE RESERVATION PROTOCOL) [RFC 2205] creates paths with resource guarantees through the network.
Management timescales concern large aggregates of traffic as might be routed between ASs in the Internet. The protocols that act on these scales are routeing protocols, such as BGP (BORDER GATEWAY PROTOCOL) and OSPF (OPEN SHORTEST PATH FIRST), operating at timescales of the order of minutes or hours. In addition, they include the longer-term deployment and provisioning decisions of the network operators at timescales of the order of days, weeks and longer.
All three timescales must be addressed since they all affect the service perceived by users and the ease and efficiency with which the network can be operated. Each has a space analogue in terms of the level of aggregation: data deals with packets; control deals with aggregates of packets, i.e. flows; and management deals with aggregates of flows.
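The notion of a flow as a run of closely-spaced packets between two end-points can be sketched as follows. This is an illustrative reading of the definition above, not a mechanism from the dissertation: the 2-second `max_gap` default and the `(timestamp, src, dst)` packet representation are assumptions.

```python
from collections import defaultdict


def split_into_flows(packets, max_gap=2.0):
    """Group (timestamp_seconds, src, dst) packets into flows.

    Packets between the same end-point pair belong to one flow while
    consecutive inter-packet gaps stay at or below max_gap seconds.
    """
    by_pair = defaultdict(list)
    for ts, src, dst in packets:
        by_pair[(src, dst)].append(ts)

    flows = []
    for pair in sorted(by_pair):
        times = sorted(by_pair[pair])
        current = [times[0]]
        for ts in times[1:]:
            if ts - current[-1] > max_gap:
                flows.append((pair, current))  # gap too large: close the flow
                current = [ts]
            else:
                current.append(ts)
        flows.append((pair, current))
    return flows
```

Aggregating the resulting flows by source and destination AS would give the coarser traffic aggregates that management timescales deal with.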
than simply using the network as they desire. This results in incentives for users to be non-conforming, increasing policing problems for the service provider. Technology oriented approaches typify the "one bit fits all" paradigm, and assume that all users of a particular technology place the same value in the use of that technology. They can be further classified as user approaches or network approaches. User approaches rely on the user accurately expressing the requirements of their traffic in some manner dictated by the technology, such as the complex traffic specifications required by the ATM Forum service classes. Experience with such systems suggests that most users are unable to specify their requirements to the required degree of accuracy, as they do not fully understand the characteristics of their traffic. Network approaches attempt to treat all traffic equally, relying on the protocols for fairness of resource allocation. For example, the Internet largely relies on TCP to perform per-connection resource allocation. By mandating that implementations comply with the standard, and furthermore, by recommending that newly developed Internet transport protocols implement TCP-friendly congestion control, the network explicitly aims to give no flow preference over another. Technology oriented approaches have a number of flaws. Specifically, different users typically place different valuations on transfer of data, giving many users an incentive to misuse the protocols. Ensuring that users do not do so is difficult. More generally, it is very hard to design and implement such protocols in a robust and efficient manner: even if there is no malicious intent, one implementation can give a user an unfair advantage over another. In the Internet, these protocols also give the user very little control over the service that they receive, so users have very little incentive not to use non-conformant implementations to their advantage.
The IETF's on-going DIFFSERV (DIFFERENTIATED SERVICES) effort does provide simple mechanisms to enable users to express different service requirements for their traffic. However, it does not provide any firm guarantees as to the service traffic will receive, concentrating instead on allowing simple differentiation between the service individual packets receive. Furthermore, it does not address the problems of network interconnection, of how to translate between service classes at network borders, or of how DIFFSERV should be used to build end-to-end services. Even hybrids of the above approaches are not sufficient to perform resource allocation satisfactorily in multi-service networks. For example, the phone network could be considered a mixture of the service approach and the network approach. Its charging infrastructure has allowed operators to control the load on the network and to inform the provisioning of the network successfully in the past. However, the phone network has offered the same basic service, largely unchanged for over 50 years, allowing detailed statistics to be built up about traffic patterns. The Internet does not follow the same traffic patterns, and there is increasing evidence that the growth of the Internet is even changing traffic patterns in the phone network. These effects are leading to severe problems for operators still using old models and it may even be the case that Internet traffic is not susceptible to such techniques [Leland93, Leland94]. In addition, new services are introduced at a much higher rate and with much less operator control. These problems seem unlikely to be solvable by accruing statistics and building traffic models.
1.4 Contribution
This dissertation contends that existing approaches to traffic engineering in the Internet are not sufficient for multi-service networks, as the Internet is increasingly becoming. Although the Internet provides end-to-end connectivity, it guarantees nothing more as soon as administrative boundaries are crossed. Provision for QOS (QUALITY OF SERVICE) in the Internet is currently dependent on two mechanisms: existing data timescale approaches to resource allocation, which are typically restricted to TCP-friendly congestion control mechanisms; and SLA (SERVICE LEVEL AGREEMENT) negotiation between network operators, which typically occurs very slowly. A more flexible approach is required to address these two problems. Firstly, traffic engineering should be considered from multiple timescales: data, control, and management. Secondly, a mechanism to allow the requirements of many different traffic types to be specified and compared is required. Finally, such a mechanism should give users incentives to behave truthfully in specifying their constraints to the network to help reduce management and policing costs.
There are two principal contributions of this dissertation. Firstly, it proposes and evaluates mechanisms for control timescale traffic engineering in the form of admission control for TCP and an ECN (EXPLICIT CONGESTION NOTIFICATION) proxy for RTP (REAL-TIME TRANSPORT PROTOCOL). These allow the operator to control contention for network resources on a per-flow basis, enabling them to offer different levels of service at a more useful granularity than per-packet differentiation. Secondly, management timescale traffic engineering is discussed, leading to proposal and evaluation of a price path attribute for BGP. This enables operators to advertise prices for transit of traffic, allowing them greater control over the flow of traffic between ASs, as well as more automated SLA settlement, reducing network management costs.
Additionally, this dissertation discusses possible consequences of deployment of these proposals, and the effects their deployment might have on the network. In general, such mechanisms should allow information about both the users' value judgements and the network's state to flow freely, and to be used by many layers of the protocol stack.
1.5 Outline
The detailed structure of this dissertation is as follows. Chapter 2 describes background and related work, covering the basic Internet protocols, approaches to resource control in networks and the Internet in particular, and pricing approaches to network resource control. Additionally, the assumptions made about the structure of the network for the work described in this dissertation are detailed. Chapter 3 discusses the application of control timescale traffic engineering to the Internet in the form of flow admission control for TCP and an RTP-ECN-proxy. It evaluates these mechanisms and also discusses the use of pricing to implement flow admission policies. Management timescales are considered in Chapter 4, and the application of pricing to inter-AS traffic engineering is discussed. Results from a prototype implementation of a price path attribute for BGP are presented. Deployment and integration issues and consequences are considered in Chapter 5, and finally conclusions are drawn and further work suggested in Chapter 6.
Chapter 2. Background
Chapter 2
Background
This chapter provides general background to the work presented in this dissertation. It briefly introduces the principal protocols and technologies referenced throughout this dissertation, in addition to discussing prior approaches to resource control and pricing in networks. It also notes underlying assumptions about the structure of the network.
2.1 Internet protocols
This section describes the basic Internet protocols relevant to the rest of the work in this dissertation. As alluded to in Section 1.1 the Internet is very loosely structured, functioning as an ad hoc collection of ASs, providing connectivity between networks and thus users. This looseness of structure is often considered in large part responsible for the success of the Internet and its associated technologies. By concentrating on connectivity, and by making few assumptions about the services to be run, there is a great deal of flexibility in development and deployment of new services. At the same time, the basic service of enabling communication between all connected users remains well-supported, as demonstrated by the continuing increase in popularity of services such as email [Odlyzko00].
individual hosts. Due to its simplicity and the few assumptions it makes about the underlying network layers, IP can be implemented over almost any network layer and has therefore been extensively deployed. An IP packet consists of a header and payload. The header contains the source and destination addresses, and the TOS (TYPE OF SERVICE) for the packet, along with other information relevant to the transport of the packet. In classical IP, a packet traverses the network based solely on its destination address. A decision is taken at each router as to where the packet should next be sent based on the destination address contained in the packet header and the current contents of the router's routeing tables. These tables may be maintained manually, or by a separate routeing protocol as discussed in Section 2.4. Packet delivery is best effort, with the TOS byte providing only a hint to the router concerning the packet's desired treatment.
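The per-router forwarding decision described above can be sketched as a table lookup keyed only on the destination address. The sketch assumes longest-prefix matching, which is the usual rule in IP routers but is not spelled out in the text; the table contents and next-hop names are hypothetical.

```python
import ipaddress


def next_hop(table, destination):
    """Return the next hop for a destination address, or None if no route.

    table maps prefix strings (e.g. "10.0.0.0/8") to next-hop names.
    The most specific (longest) matching prefix wins.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

Note that nothing other than the destination address enters the decision: the source address and the TOS byte play no part, which is the limitation the text goes on to discuss.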
received, and port numbers to enable the receiver to distinguish between multiple sources or destinations at the same IP address. This allows the transmitter's operating system to multiplex the traffic of multiple concurrently executing processes into the network, and the receiver's operating system correspondingly to demultiplex this traffic. UDP is commonly used where the reliable ordered byte stream nature of TCP is inappropriate, such as when real-time data is being transported, or where the latency of the handshake process in TCP is too great for the application in question.
Laevens00], as was discussed in Section 2.1.2. The connection will subsequently linearly increase its transmission rate until another loss event is detected. This gives TCP its characteristic sawtooth transmission pattern, as it probes for bandwidth, experiences loss, backs off, and repeats the cycle. A more recent attempt to improve the congestion behaviour of TCP resulted in Vegas [Ahn95, Brakmo95, Low01]. Rather than continually probing the network to see if it may increase its congestion window when it reaches steady state, it attempts to estimate the correct congestion window size. This is done by accurate estimation of the RTT using the assumption that the lowest RTT is the RTT that the network would allow if it was not carrying the traffic associated with this connection. This allows the protocol to estimate the amount of data it has in flight, and then to adjust its transmission rate (and RTT estimate) to ensure that its estimated fair share is not exceeded. This behaviour only applies in the congestion avoidance, or steady-state, phase; when loss is detected, the standard TCP congestion control mechanisms are applied. In Chapter 3 it will be shown that there are situations in which neither the behaviour of standard TCPs nor that of Vegas TCP is sufficient to guarantee acceptable performance of the network.
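The two behaviours contrasted above can be sketched side by side. Both functions are simplified illustrations under stated assumptions rather than faithful TCP implementations: `aimd_trace` models loss as occurring whenever the window exceeds a fixed `capacity`, and `vegas_adjust` uses the commonly described Vegas thresholds `alpha` and `beta` (in packets), whose default values here are assumptions.

```python
def aimd_trace(capacity, rounds, cwnd=1.0, increase=1.0, decrease=0.5):
    """Additive-increase, multiplicative-decrease window, one value per RTT."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:              # loss event: back off
            cwnd = max(1.0, cwnd * decrease)
        else:                            # no loss: probe for more bandwidth
            cwnd += increase
    return trace


def vegas_adjust(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """One Vegas-style congestion-avoidance step.

    base_rtt is the lowest RTT observed, taken as the RTT of an idle path.
    The gap between expected and actual throughput estimates how many
    packets this connection has queued inside the network.
    """
    expected = cwnd / base_rtt
    actual = cwnd / rtt
    backlog = (expected - actual) * base_rtt
    if backlog < alpha:
        return cwnd + 1                  # too little queued: speed up
    if backlog > beta:
        return cwnd - 1                  # too much queued: slow down
    return cwnd
```

Calling `aimd_trace(4, 8)` yields the sawtooth: the window climbs by one per round, halves once it overshoots the capacity, then climbs again.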
2.1.6 Discussion
The Internet is a worldwide network supporting an enormous number of users and services, from simple data transfer to more demanding soft real-time multimedia applications. In large part the success of the Internet has been attributed to the simplicity of the service provided by IP, and the flexibility that this allows [Odlyzko00]. However, this flexibility does come at a price: precisely because IP is so simple, it provides very little support for more demanding applications. For example, congestion control had to be implemented in TCP after serious problems with congestion in the Internet arose [Jacobson88], and the schemes implemented could not rely on support from IP. This led to congestion control schemes reliant on packet loss as the congestion signal, and gave rise to schemes which tend to react suddenly and harshly to congestion. Similarly, RTCP transmitters have to rely on loss information from the RTP stream being returned to them in RRs. Moreover, the TOS byte and DIFFSERV notwithstanding, classical IP traditionally provides little support for differentiated forwarding treatment of packets. ECN attempts to retro-fit support for congestion avoidance to IP by enabling packets to be marked as having been in the network at a time when routers were becoming overloaded. This enables ECN-aware applications to behave more intelligently, as they can now choose to react to congestion before it becomes serious rather than relying solely on packet loss to signal that the network is busy. As well as potentially allowing a smoother reaction to the onset of congestion, it also allows applications to better hide the onset of congestion from users. For example, RTP applications can still display the information contained in marked packets, whilst noting that the codec should perhaps alter its behaviour in the face of oncoming congestion. Those ECN-aware protocols that mandate specific behaviour in the face of received marks still restrict user responses in the face of congestion.
They do not allow different users to express their differing valuations of the traffic they transmit: all marked packets are treated identically, as mandated by the protocol specification. Additionally, both the newer ECN-aware protocols and earlier protocols running over IP rely on the end-system to ensure that users receive only their fair share of network resources. The network has few mechanisms for enforcing such behaviour, and users have little or no incentive to conform. More recent approaches within this paradigm have exposed the congestion information as a price to be charged to users. This allows users to make the decision as to whether they should continue using the network based on the current price of the resource. The price of the resource is calculated based on the mark probability, and users are effectively charged according to the marks they cause to be generated. This creates an incentive for users to behave fairly. The rationale is that every marked packet that a user receives contributed to causing congestion, otherwise it would not have been marked. Since the user receives benefit by receiving that packet, they should pay for it. When there is no congestion in the network, no packets will be marked, and so no-one pays any per-packet fee; this assumes that the marginal cost to the network operator of carrying packets is zero. It also assumes that the utility to a user of a marked packet is identical at a given instant, for all marked packets and all users. Hence, even though the actual price charged may change in time with network conditions, and the marking policy may vary from router to router, neither the user nor the network has any way of explicitly differentiating to the network between two packets emitted at the same time to the same destination. Support for this type of differentiation is the subject of the next sections, which discuss network resource control in general, and then specifically in the Internet.
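The charging rule described above, in which a user pays only for the marked packets they receive, reduces to a very small calculation. The sketch below assumes a fixed `price_per_mark` for the accounting period, which is a simplification: the text notes that the actual price may vary with network conditions.

```python
def congestion_charge(packets_marked, price_per_mark):
    """Charge for a sequence of received packets.

    packets_marked holds one boolean per received packet: True if the
    packet arrived carrying a congestion mark. Unmarked packets are free,
    reflecting the assumed zero marginal cost of carrying them.
    """
    marks = sum(1 for marked in packets_marked if marked)
    return marks * price_per_mark
```

With no congestion there are no marks and the charge is zero, however many packets were carried. Every mark costs the same, which is exactly the lack of differentiation the paragraph above points out.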
equal to the minimum such share. A flow can enter the network at any time, but should attempt to discern the bandwidth it may use and not exceed this amount. Simultaneously, flows that have already entered the network must ensure that they detect when their fair share allocation has reduced, and reduce their use accordingly. TCP is perhaps the most widely-known example of a protocol that follows such rules. More recent work has developed schemes whereby traffic transported over UDP can also be made to behave fairly. Such schemes can be generally split into two: equation/model based, and sender/receiver based. Model based schemes use mathematical models of TCP to define behaviour that leads to a fair share resource allocation [Padhye99, Floyd00]. Sender/receiver based schemes perform rate control as in TCP, utilising some system of acknowledgement for successfully received data [Sisalem98, Rejaie99, Rhee00]. This enables the relevant end-point to implement an additive-increase, multiplicative-decrease rate control in a similar manner to TCP. In order to make both TCP and UDP based schemes more fair, a variety of router marking disciplines has been investigated. Most use various queue properties to infer which flows are receiving more than their fair share and to penalise them through preferential marking or dropping. Examples include RED [Floyd93], FRED [Lin97], WRED [Bodin00], and SFB [Feng99]. Other schemes use more active methods, such as matching incoming packets against already queued packets to see if two packets are from the same flow [Pan00], or using ICMP (INTERNET CONTROL MESSAGE PROTOCOL) source quench messages to allow routers to control transmitters based on router queue occupancy [Rangarajan99].
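As an illustration of a queue-based marking discipline, the following is a simplified sketch in the style of RED. It keeps only the two core ideas, an exponentially weighted average of queue length and a marking probability that rises linearly between two thresholds, and omits the refinement in which the probability also depends on the number of packets since the last mark. The parameter names and the default weight are assumptions chosen for illustration.

```python
def update_average(avg_queue, sample, weight=0.002):
    """Exponentially weighted moving average of the instantaneous queue length."""
    return (1.0 - weight) * avg_queue + weight * sample


def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Probability of marking (or dropping) an arriving packet.

    Below min_th nothing is marked; at or above max_th everything is;
    in between the probability rises linearly from 0 to max_p.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

Because flows sending more packets see proportionally more marks, a discipline of this kind penalises heavier flows without keeping per-flow state.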
Proposals for distributed admission control exist and are discussed in Section 2.2.4.
conventional telephony systems is simplified by the fact that connections require unit resource and are established end-to-end. This makes it easy for the network to know if it may accept a connection, since it is of a known, constant bandwidth, with a route determined at connection setup time. Any switch on the route may reject a connection during the connection setup phase. Typical ATM (ASYNCHRONOUS TRANSFER MODE) signalling methods [ATMF-UNI96] use a similar end-to-end system, but require that connections desiring QOS guarantees should declare certain parameters such as the peak and sustained transmission rates in order that resources may be reserved at connection setup [ATMF-TM99].
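The telephony case described above, where each connection needs one unit of resource at every switch on a fixed route, can be sketched in a few lines. The `free_circuits` dictionary and the switch names are hypothetical; the sketch checks every switch before reserving anything, so that a rejected call leaves no partial reservation behind.

```python
def admit_call(route, free_circuits):
    """Try to set up a unit-bandwidth call along route.

    route is a list of switch names; free_circuits maps each switch to
    its number of unused circuits and is updated in place on success.
    Any switch with no free circuit causes the call to be rejected.
    """
    if any(free_circuits[switch] < 1 for switch in route):
        return False
    for switch in route:
        free_circuits[switch] -= 1   # reserve one circuit per switch
    return True
```

The ATM variant would differ mainly in the admission test: instead of comparing against a unit resource, each switch would compare the declared peak and sustained rates against its remaining capacity.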
All attempts to provide an incentive for users to co-operate to prevent congestion, rather than simply mandating co-operation, lead to the associated problem of enforcement. At the same time they enable users to use the network even during times of congestion simply by paying for their contribution to the congestion. This gives greater flexibility in network access than allowed by traditional CAC schemes. A variety of pricing algorithms have been proposed and studied, based on effective bandwidth [Courcoubetis97, Courcoubetis98a, Courcoubetis98c], traffic priority [Sairamesh95, Odlyzko99a, Odlyzko99b], and on achieving proportional fairness [Kelly97a, Kelly98]. Implementation varies based on the underlying technology. Proposed schemes for ATM networks include use of pricing at connection admission to encourage users to correctly declare QOS parameters [Kelly97b], and use of prices to cause users to exert control over their cell transmission rate [Murphy94]. In the Internet, schemes have been proposed that would allow users to place the price that they would be willing to pay for a packet's transmission into the packet's header [MacKie-Mason95]. The network would then make a decision as to whether the packet should be transmitted or dropped, and would charge users the highest price of all the dropped packets. Alternatively, a price can be charged at connection setup time under RSVP [Tassel97] and TCP [Edell95]. With the advent of packet marking techniques in the Internet, where individual routers can mark packets based on their own load and hence the congestion in the network, a number of related pricing schemes have been proposed. They make use of a small number of bits in the IP TOS byte to signal to end-systems that a packet caused congestion [Key99a, Kelly00]. These signals can either be used purely for congestion control at end-points, or can be used by ISPs to charge users for emitting traffic at times of congestion.
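The per-packet bidding scheme attributed above to [MacKie-Mason95] can be sketched as a simple auction over one transmission interval. This is an illustrative reading of the description in the text, under the assumptions that the link can forward a fixed number of packets per interval and that ties are broken by arrival order; the function and parameter names are hypothetical.

```python
def smart_market(bids, capacity):
    """Decide which packets to forward and what to charge for each.

    bids holds the price carried in each packet's header, in arrival
    order. The capacity highest bids are forwarded; every forwarded
    packet is charged the highest bid among the dropped packets, or
    nothing if no packet had to be dropped.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    admitted = sorted(order[:capacity])      # indices of forwarded packets
    dropped = order[capacity:]
    price = max((bids[i] for i in dropped), default=0.0)
    return admitted, price
```

Because the charge is set by the best losing bid rather than by a packet's own bid, a user gains nothing by understating their willingness to pay, and an uncongested link charges nothing.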
By combining incentive compatible pricing with admission control, distributed admission control schemes are produced [Gibbens99a, Breslau00, Kelly00]. These allow the edge nodes to probe the network and then make their own admission control decisions as to whether or not to transmit at this time. These decisions can be based on the reported price and implemented at edge devices [Gibbens99a], or as part of the end-system protocol [Kelly00]. This moves the problems of per-flow monitoring and traffic characterization to the edge of the network, either to edge devices or to the end-systems themselves. In situations where users wish more control over their traffic, schemes such as WTP (WILLINGNESS TO PAY) [Key99a] are appropriate. By paying based on a price set by the network for marked packets received, users can use their willingness to pay as a signal that they assign traffic a high or low value. This allows differentiation between users' traffic by the network, and allows
users flexibility in choosing the service they receive. The price for this is that users now have to deal directly with fluctuation in the price being applied. To ameliorate the complexity associated with the more dynamic resource pricing schemes, agent based systems [Courcoubetis98b, Courcoubetis98d] automate the process of dealing with price fluctuation. This insulates the user from the details of short timescale fluctuations in price, while still allowing them to make decisions to transmit based on the network advertised price, and hence based on congestion in the network. Whether the transmitter or receiver pays generally depends on the service being provided; out-of-band mechanisms for settlement may be used. For example, users might pay monthly subscription charges for a real-time media service, with the service provider (as the originator of the majority of the traffic) paying network operators for marked packets on a day-to-day basis. This simplifies the billing problem from the service provider's point of view as they now only have to bill users a fixed monthly amount and deal with traffic monitoring and fluctuating prices from the far smaller number of operators. Additionally, it makes the service much easier to use from the user's point of view as they no longer have to deal with rapidly varying prices for services.
2.2.5 Discussion
This section briefly presented the principal categories of resource control in computer networks. The first three (fair share resource allocation, admission control, and measurement based admission control) have different trade-offs, dependent on whether the extra accounting work required by admission control is an acceptable price for the tighter control of resource allocations, and on how easily and accurately traffic sources may be characterized. It should be noted that the choice is not exclusive: it is possible to deploy protocols that implement different resource allocation schemes within the same network. Specific applications of network resource control to the Internet are discussed in the following section.
Techniques such as queue management and TCP-friendly rate control discussed in Section 2.2.1 are orthogonal to the admission control mechanisms. Queue management and rate control aim to share bandwidth fairly and smoothly between competing flows, whereas admission control aims to ensure that there are not so many flows competing for the resource that the fair share mechanisms fail.
The final category, incentive compatible pricing, is most akin to the work developed in this dissertation. Its principal failing is in the complexity of the implementation details. Although much theoretical work has been done on
pricing algorithms and marking strategies, this has only addressed the problem from the end-to-end point of view of the network, and has not considered interconnection of different operators. It is also still not clear how accurately the theory models reality, and how well many of the results concerning such attributes as price and network stability will translate into implementation. Furthermore, the implementation details of user interaction with the various network pricing and brokerage schemes have yet to be fully addressed [Oliver00]. Aggregation of marks, futures schemes [Semret99], user interfaces [Bouch99, Bouch00], and billing systems [Edell95, Chu99] are all implementation details still being addressed.
2.3
This section briefly describes proposals for extending IP and the Internet protocols in general to better support traffic engineering.
that much coarser service differentiation will be satisfactory given the plentiful nature of bandwidth in the future. Using parts of the TOS byte, or DSCP (DIFFERENTIATED SERVICES CODE POINT), as identification, PHBs (PER HOP BEHAVIOURS) are defined which enable routers to give different levels of service to packets carrying different DSCPs. Standardized PHBs are mapped onto DSCPs by the IETF, and currently consist of best effort (standard Internet service with the added requirement that this class must not be starved); expedited forwarding [RFC 2598], a low latency, low jitter, low loss service; and assured forwarding [RFC 2597], a low loss, ordered service. Experimental PHBs may also exist, but are not guaranteed to be supported. Different operators may implement the PHBs differently, the only proviso being that the standardized PHBs must be represented by their mandated DSCPs. Traffic between operators will be managed through SLAs, with promotion and demotion of traffic between PHBs allowed in order to meet the specified SLAs.
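As a hedged illustration of how a router might map the TOS byte onto the standardized PHBs named above, the following sketch extracts the six-bit DSCP and classifies it. The codepoint values used are the recommended ones from RFC 2597 and RFC 2598; the function names and the returned strings are assumptions made for this example only.

```python
EF_DSCP = 0b101110          # 46, recommended expedited forwarding codepoint
BEST_EFFORT_DSCP = 0b000000  # default (best effort) codepoint


def dscp_from_tos(tos_byte: int) -> int:
    """Return the DSCP, i.e. the upper six bits of the former TOS byte."""
    return (tos_byte & 0xFF) >> 2


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Recommended codepoint for assured forwarding class x, drop precedence y."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return 8 * af_class + 2 * drop_precedence


def classify(tos_byte: int) -> str:
    """Name the standardized PHB selected by a TOS byte (sketch only)."""
    dscp = dscp_from_tos(tos_byte)
    if dscp == EF_DSCP:
        return "expedited forwarding"
    if dscp == BEST_EFFORT_DSCP:
        return "best effort"
    af_class, rest = divmod(dscp, 8)
    if 1 <= af_class <= 4 and rest in (2, 4, 6):
        return f"assured forwarding AF{af_class}{rest // 2}"
    # Experimental or locally defined codepoints are not handled here.
    return "unrecognised"


print(classify(0xB8))                # expedited forwarding
print(classify(af_dscp(1, 1) << 2))  # assured forwarding AF11
```

Because operators may implement PHBs differently, a real classifier would typically be driven by a configurable table rather than the fixed rules used here.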
an LDP (LABEL DISTRIBUTION PROTOCOL), which may be classified according to how LIB entries are created:
Request-driven LDPs use the messages of a control protocol such as traditional ATM signalling [ATMF-UNI96, ATMF-PNNI96], or RSVP [RFC 2205].
Topology-driven LDPs use information derived from network layer routeing protocols, such as BGP [RFC 1771], OSPF [RFC 2328], or the generic MPLS-LDP [RFC 3036].
Traffic-driven LDPs use information gathered by monitoring the traffic streams being switched, as with IP switching [Newman98] for example.
The separation in MPLS of these three network functions (packet classification, packet forwarding and label distribution) simplifies the data path, and allows greater service differentiation between aggregates. Service differentiation at a range of packet aggregation granularities is supported, as is additional functionality such as traffic engineering. MPLS also provides a platform for building VPNs (VIRTUAL PRIVATE NETWORKS) to which resource guarantees may be given [RFC 2764, Isaacs00], and hence for supporting multiple control systems with resource partitioning [Mortier01].
2.3.4 Discussion
The IETF's INTSERV effort aims to extend the Internet service model to support multimedia and data traffic within the same infrastructure. Although partially successful, in that the RSVP signalling protocol was defined and implemented, this approach has not been widely adopted. There are two principal problems: QOS is signalled on a per-flow basis, and this is believed to be unscalable in the Internet; and due to the need to support the new signalling protocol no benefits can be seen from a partial deployment. If just one router on the desired path does not support RSVP, no guarantee can be given concerning the QOS that traffic on that path will receive. There is also considerable debate as to whether or not per-flow QOS is suitable for the Internet in any case. Given the widespread use of elastic protocols such as TCP and the proliferation of adaptive real-time protocols such as RTP and associated adaptive applications, it seems that adequate performance does not require per-flow guarantees.
In response to this, the DIFFSERV effort aims to provide a much simpler, more evolutionary solution. The result is that benefits can be seen with only partial deployment, and without requiring end-to-end signalling. However, DIFFSERV is very much work in progress and a number of important questions have yet to be resolved. For example, the precise definitions of the standardized PHBs are still being discussed, and further standardized PHBs
may be required. The definition and implementation of policies to prevent users requesting that all their traffic is given the most resource hungry PHB is a problem yet to be addressed. In addition, the DIFFSERV architecture [RFC 2475] explicitly makes no statement as to how PHBs should be implemented, leading to concern about the strength of guarantee that can be made about the QOS that traffic allocated to a given PHB will receive as it crosses administrative boundaries.
MPLS was originally intended to improve the performance of IP over connection oriented networks such as ATM. However, as IP router performance has improved, it is now principally intended to enable more extensive traffic engineering capabilities for IP traffic. The mapping of IP traffic into FECs and then of FECs onto LSPs allows aggregates of IP traffic to be treated together, avoiding some of the scalability problems associated with INTSERV. Considerable thought has also been given to efficient support within MPLS for a variety of network types such as ATM and frame relay, as well as support for newer network services, such as VPNs. As such it can be considered a supporting layer for the INTSERV and DIFFSERV efforts discussed above. However, concern remains about how MPLS will support resource management between domains as well as within domains [Mortier01].
router to keep an LSA (LINK STATE ADVERTISEMENT) for every node in the network, where each LSA contains information about the neighbours of that node. The network bandwidth and computational scalability of the two protocol types in an arbitrary network is less clear, but both are considered to require only a modest amount of bandwidth and to have incremental computation versions. However, link state protocols do provide more functionality, and generally converge faster than distance vector protocols [Zaumen91, Zaumen92]. Consequently, link state protocols are usually considered preferable in situations where the increased memory requirements are not prohibitive [Perlman00].
Routeing in the Internet is effectively performed hierarchically, with routeing within an AS, or intra-AS routeing, performed by the OSPF (OPEN SHORTEST PATH FIRST) or ISIS (INTERMEDIATE-SYSTEM TO INTERMEDIATE-SYSTEM) protocols, and routeing between ASs, or inter-AS routeing, performed almost exclusively by BGP. Since OSPF and ISIS are very similar, both being link state protocols for intra-AS IP routeing, only OSPF will be described. As the only widely deployed inter-AS routeing protocol and the basis for the work described in Chapter 4, BGP is also described. Additionally, as an example of a previous Internet routeing protocol that dynamically calculated route metrics based on congestion, the HELLO protocol is described.
[Figure 2.1: An OSPF ... Hosts 2 and 6 are the inter-area routers, routeing packets between Area 1 (containing hosts 1, 2, 3, and 4), and Area 2 (containing hosts 6 and 7) via the backbone area which contains hosts 2, 5, and 6.]
single area or within the backbone run a single copy of the OSPF algorithm. Routers at the boundaries of areas and of the backbone run multiple copies. Each router builds up a view of the topology of its area using the LSAs from the other routers. The process starts with the routers announcing themselves through the OSPF hello protocol, allowing the adjacencies and capabilities of routers to be established. Once a router establishes adjacency with another router, it waits for LSAs from that router, whilst sending its own advertisements to other routers. These advertisements allow the current area topology to be distributed to all routers within the area. Border routers summarise the topology of their areas, and distribute these summaries into the backbone and thence to the border routers of other areas.
The routeing process itself is based on running the shortest path first algorithm over the routeing tables constructed from the LSAs from each router. This forms shortest paths from the router running the algorithm to all routers and networks within the area, and to all the border routers for routes outwith the area. The length of a path is calculated based on its metric, of which there are two types. Type 1 metrics are equivalent to the link state metrics used for internal routes, and are guaranteed to be less than Type 2 metrics used for routes learnt from sources external to the area. This ensures that internal links are always used in preference to external links, under the assumption that routeing between areas and ASs always costs more than routeing within an area or AS. OSPF also originally allowed routers to provide a separate set of
[Figure 2.2: A BGP ... ASs. AS1 contains R1, R2, R3, and R4; AS2 contains R5, R6, and R7; and AS3 contains R8, R9, and R10. Inside the ASs, interior BGP is operating as a fully connected mesh of BGP peering arrangements with a session between each pair of ASs.]
routes for each IP TOS, for cases where metrics vary based on the TOS. This ability was removed in a later revision due to lack of implementation experience [RFC 2178]. However, the LSAs can still carry the required information for compatibility reasons.
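The shortest path first computation that OSPF runs over its link state information can be sketched as Dijkstra's algorithm. This is an illustration only: the link state database is reduced to a dictionary of neighbour costs, areas and Type 1 or Type 2 metrics are ignored, and the node names and costs are hypothetical.

```python
import heapq
from typing import Dict

# Hypothetical link state database: node -> {neighbour: link cost}.
LinkStateDB = Dict[str, Dict[str, int]]


def shortest_paths(lsdb: LinkStateDB, root: str) -> Dict[str, int]:
    """Return the cost of the shortest path from `root` to each reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbour, link_cost in lsdb.get(node, {}).items():
            candidate = cost + link_cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


example_lsdb = {
    "H1": {"H2": 1, "H3": 4},
    "H2": {"H1": 1, "H3": 1, "H5": 5},
    "H3": {"H1": 4, "H2": 1},
    "H5": {"H2": 5},
}
print(shortest_paths(example_lsdb, "H1"))
# {'H1': 0, 'H2': 1, 'H3': 2, 'H5': 6}
```

Each router runs the same computation with itself as the root, so all routers in an area that hold the same LSAs arrive at consistent paths.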
and UPDATE. The first two respectively enable two peers to confirm that the session between them is still alive, and to inform each other of errors during the lifetime of the session. The principal means for communicating route information is the UPDATE message, used to advertise and withdraw prefixes. Additionally, an UPDATE message may contain a number of path attributes which apply to all the prefixes being advertised in that message.
Receipt of an UPDATE message can cause the BGP peer to recalculate its routeing table. Assuming that the filtering policies applied to the other peer allow this peer to accept the route information contained in the message, this calculation is performed in two parts. The first is based on longest prefix match, so the most specific route to a prefix is always used. If there is a tie (that is, if two ASs are advertising exactly the same prefix) then the path attributes associated with the prefix are considered, with the route selection process stopping as soon as the tie is broken and a unique route discovered. The two principal attributes considered are:
LOCAL-PREF: a locally valid metric (higher is better) associated with a path. If a route is learnt via an external BGP peer or via static configuration, this metric will be recomputed at the learning router, otherwise it is learnt from the advertising router.
AS-PATH: this field has the AS number prepended as the route is advertised throughout the network. Shorter AS-PATHs are selected over longer ones if there is no unique route remaining after the LOCAL-PREF attribute has been considered.
Other attributes are standardized and can be used if the two above are insufficient. Eventually all ties will be resolved, ultimately by choosing the route learnt from the router with the lowest BGP identifier (typically the highest IP address associated with the router).
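The selection order described above can be sketched as a single comparison key. This is a simplified illustration: it models only longest prefix match, LOCAL-PREF, AS-PATH length and the final BGP identifier tie-break, omits the other standardized attributes, and uses a hypothetical `Route` record.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str               # advertised prefix, e.g. "10.1.0.0/16"
    local_pref: int           # higher is better
    as_path: Tuple[int, ...]  # AS numbers, most recently prepended first
    peer_id: str              # BGP identifier of the advertising router


def select_route(routes: List[Route], destination: str) -> Optional[Route]:
    """Pick the route used for `destination`, or None if no prefix covers it."""
    addr = ip_address(destination)
    candidates = [r for r in routes if addr in ip_network(r.prefix)]
    if not candidates:
        return None
    # Tuples compare element by element, so later fields only break ties.
    return min(
        candidates,
        key=lambda r: (
            -ip_network(r.prefix).prefixlen,  # longest prefix first
            -r.local_pref,                    # then highest LOCAL-PREF
            len(r.as_path),                   # then shortest AS-PATH
            int(ip_address(r.peer_id)),       # finally lowest BGP identifier
        ),
    )


routes = [
    Route("10.0.0.0/8", 200, (1,), "192.0.2.1"),
    Route("10.1.0.0/16", 100, (2, 3, 4), "192.0.2.2"),
    Route("10.1.0.0/16", 100, (5, 6), "192.0.2.3"),
]
print(select_route(routes, "10.1.2.3").as_path)  # (5, 6)
```

Note that the more specific /16 wins even though the /8 carries a higher LOCAL-PREF, since attributes are only consulted once prefix length has failed to separate the candidates.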
for a given destination. Lower RTT estimates signified lower congestion and so were preferred; changes in estimate of less than 100 ms were rounded up to 100 ms. In the case of an update being received for a connected network, the gateway node simply used its own estimate rather than concerning itself with the estimate contained in the route update.
Although the HELLO protocol initially worked well, it was found to suffer from a number of problems. As the network grew in size, it did not scale well (see footnote 10). Also, as packet timescales within the network altered and the load characteristics of the network changed, it became clear that RTT was not a good basis for a routeing metric. At times of high load, routeing changes can lead to large changes in load, and hence queueing delay, on links. This consequently causes too high a degree of oscillation in the routeing tables. The fundamental problem is that the RTT varies on too short a timescale to be generally useful as a measurement of load. In particular, if the network becomes heavily loaded, the lengths of queues in the network can become very large, causing a similar increase in the RTT. This can cause a positive feedback cycle to occur, where a small increase in traffic in the network causes the routeing protocol to attempt to reroute traffic down unloaded links. This causes those links in turn to become congested, and so the protocol tries again to redistribute traffic. The effect of this is to cause the RTT on these links to begin to oscillate wildly, further increasing the frequency at which the protocol reroutes traffic.
The routeing metric was revised in 1987 [Khanna89] to smooth the measured delay values and limit the relative change in metric between successively reported values; to normalize the reported costs to take into account how the network might react to such a change; and to cause the algorithm to shed load from overloaded links more gradually.
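To make the first of these revisions concrete, the following sketch smooths a measured delay and limits how far the reported metric may move in one step. It is not the algorithm of [Khanna89]: the exponential smoothing, the function name and the values of `alpha` and `max_step` are all assumptions chosen for illustration.

```python
def revise_metric(previous: float, measured_delay: float,
                  alpha: float = 0.5, max_step: float = 0.25) -> float:
    """Return the next reported metric given the last one and a new delay sample.

    alpha    -- weight given to the new sample when smoothing (assumed value)
    max_step -- largest relative change allowed per report (assumed value)
    """
    # Smooth the raw measurement so one long queue does not dominate.
    smoothed = alpha * measured_delay + (1 - alpha) * previous
    # Clamp the result so successive reports differ by at most max_step.
    lower = previous * (1 - max_step)
    upper = previous * (1 + max_step)
    return min(max(smoothed, lower), upper)


# A sudden fivefold rise in delay moves the reported metric by only 25%.
print(revise_metric(100.0, 500.0))  # 125.0
```

Bounding the change per report damps the positive feedback cycle described above, at the cost of reacting more slowly to genuine, lasting changes in load.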
However, as the Internet increased in size and competing commercial entities began to take part, use of a common delay metric in this manner was dropped since a suitable trusted metric could not be decided upon [RFC 975]. Chapter 4 will discuss use of measures of congestion as drivers for BGP routeing metrics.
2.4.4 Discussion
As stated previously, OSPF is intended as an IGP. Consequently, whilst it has support for metrics enabling choice between multiple routes, its routeing hierarchy, which requires that all packets travelling between two areas do so via the backbone, is not sufficiently scalable for use as an EGP. Additionally, more recent revisions of the OSPF v2 standard [RFC 2178, RFC 2328]
Footnote 10: Indeed, it also seems to have been a precursor to the EGP [RFC 904] protocol, the scaling properties of which led to the initial development of BGP.
recommend that implementations should continue to accept PDUs (PROTOCOL DATA UNITS) with TOS routeing options, but that this information should not be utilized due to lack of the implementation experience required by the IETF's standardization procedure. Notwithstanding this, recent research [Fortz00, Wang01, RFC 2676] has suggested dynamically setting OSPF weights to achieve traffic engineering objectives. Similar techniques have been proposed for MPLS [Elwalid01].
BGP is designed as an EGP and therefore does not impose such a restrictive hierarchy as OSPF. However, due to the peer-to-peer nature of BGP routeing and the desire to offload traffic to another AS as soon as possible, route asymmetries are common [Paxson97]. This can lead to situations where one direction of the path between two end-points has vastly different characteristics to the other. Furthermore, BGP lacks a globally useful metric and so provides poor support for situations where a choice between routes must be made. Current deployments rely on artificially increasing the length of the AS-PATH by prepending multiple copies of their own AS number to those routes they wish to discourage; study of a sample BGP database from the KPN-Qwest peering point suggests that approximately 8% of best routes have been so treated. This implies that there does exist a desire amongst operators to be able to influence routes taken by others. Additionally, such techniques appear to have a detrimental effect on the efficiency of the network: interaction with filtering policies appears to cause approximately 20% of paths to be inflated by 50% or more, since not all routes are available to all ASs [Tangmnarunkit01]. A globally valid path attribute for BGP, the destination preference attribute, appears to have been discussed in the IETF between 1994 and 1996 but seems not to have become sufficiently advanced to be standardised in an RFC (see footnote 11).
More recent work has suggested use of an avoidance level attribute for routes, to enable safe backup routeing [Gao01]. This is similar to the price path attribute proposed in Section 4.2.3, but intended to be used in a more restricted manner.
Situations where there is a choice between routes seem likely to arise due to the desirability of multi-homing for reasons of reliability and performance for customers. Multi-homing occurs when a customer connects to the Internet in more than one place, as depicted in Figure 2.3. This can have strong implications for the aggregatability of addresses, generally considered to be of utmost importance in the Internet. As the network has grown and address space become more fragmented, one of the most significant contributions of BGP is that it enables routeing by address aggregates, meaning that not every
Footnote 11: Notwithstanding this, it does appear to be available in at least two deployed BGP implementations.
[Figure 2.3: BGP multi-homing configurations. (c) Customer is multi-homed to two ISPs which connect via a third ISP.]
router need know about every routeable address. This is true both for those routers within an AS and those that interconnect ASs. Multi-homing can occur either by the customer connecting to a single ISP in multiple places, or by them connecting to the Internet through more than one ISP. In the former case the ISP can attempt to influence the customer's choice of route in two ways. If the customer has multiple egress routers which connect to the ISP's network as in Figure 2.3(a), the ISP can make use of the MULTI-EXIT-DISCRIMINATOR or LOCAL-PREF attributes to control traffic distribution. Alternatively, if the customer has one egress router connecting to the ISP in multiple places as in Figure 2.3(b), different prefixes can be advertised as reachable from the different points in the ISP's network, allowing the ISP to control the distribution of inbound customer traffic. The most problematic case is where a customer wishes to use multiple providers for Internet connectivity as in Figure 2.3(c). In this case care must be taken
over who owns the IP addresses that the customer will use. If the customer uses addresses delegated by one ISP then it is likely that they will be announced by that ISP as part of an aggregated block, but by the other ISPs as addresses specific to that customer (since those ISPs cannot aggregate the addresses delegated by the first ISP). In this case the longest-prefix-match behaviour will take over and potentially cause traffic for the customer to arrive via all but the first ISP. Similar problems can arise if the customer is delegated addresses out of all the ISPs' address spaces: care must be taken to avoid magnetic longest-prefix-match behaviour if the addresses delegated from one ISP are advertised to the others to give increased reliability for the customer. The alternative is for the customer to get its IP addresses from some other registry. However, this decreases the aggregatability of routes in the Internet, since all ISPs are then unlikely to be able to aggregate the customer's addresses into their existing address blocks.
Some of these problems can be solved by using the BGP community attribute. This is an optional, transitive attribute containing policy information concerning the associated routes. There are three values standardized: NO-ADVERTISE, NO-EXPORT-SUBCONFED, and NO-EXPORT. All control the scope over which a route will be advertised. The first restricts advertisement to the router receiving the advert, the second to the sub-AS that receives the advert for a confederation, and the last to the AS receiving the advert.
These community attributes notwithstanding, problems remain with the expression of policy in BGP. Although the standardized community values allow expression of policies that solve a number of the more common problems arising from such situations as multi-homing, they are not flexible.
As they can only express whether or not a route should be re-advertised, they cannot easily be used to choose between possible routes, and as a consequence they are not useful for more general traffic engineering. Currently they are also manually configured and configuration cannot easily be automated, leading to extra network management burden.
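The scoping effect of the three standardized community values can be sketched as a single re-advertisement check. The numeric values are the well-known community values defined in RFC 1997; the `Peer` classification and the function name are hypothetical simplifications introduced for this example.

```python
from enum import Enum
from typing import Optional


class Community(Enum):
    NO_EXPORT = 0xFFFFFF01
    NO_ADVERTISE = 0xFFFFFF02
    NO_EXPORT_SUBCONFED = 0xFFFFFF03


class Peer(Enum):
    INTERNAL = "internal"         # same AS, or same sub-AS of a confederation
    CONFED_EXTERNAL = "confed"    # different sub-AS within the same confederation
    EXTERNAL = "external"         # a different AS altogether


def may_advertise(community: Optional[Community], peer: Peer) -> bool:
    """Decide whether a route carrying `community` may be sent to `peer`."""
    if community is Community.NO_ADVERTISE:
        return False                      # stays at the receiving router
    if community is Community.NO_EXPORT_SUBCONFED:
        return peer is Peer.INTERNAL      # stays within the receiving sub-AS
    if community is Community.NO_EXPORT:
        return peer is not Peer.EXTERNAL  # stays within the receiving AS
    return True                           # no scoping community attached


print(may_advertise(Community.NO_EXPORT, Peer.EXTERNAL))  # False
```

The function returns only a yes or no answer per peer, which reflects the limitation noted above: these values scope a route's propagation but carry no information that would help a router rank one route against another.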
2.5.4 Discussion
This decentralized structure, with ISPs connecting at many points, potentially to many other ISPs, allows a great deal of flexibility in the network. However, it is not without problems, principally boiling down to accountability. For valid historical and technological reasons, the Internet does not provide good mechanisms for accountability [Clark88]. Originally designed to provide end-to-end connectivity for co-operating users across a small number of co-operating public networks, it has little inherent support for monitoring, authentication or policing of network use. As described above, protocols such as TCP rely on compliant implementations to ensure that some approximation to fairness for users is achieved.
As the Internet becomes more commercial and provides more socially fundamental services such as telephony, this problem manifests itself in two ways. The first is of a commercial nature: those responsible for the interconnecting networks need to know the quality of the service they are providing and receiving in order that they can effectively manage their networks and the agreements they enter into with other operators. The second is that as the network increases in popularity and use, government agencies and regulators become involved. This involvement typically leads to extra requirements on operators to provide audit trails and information to allow the regulators to ensure that an acceptable service is being provided at an acceptable price. One aim of the work in this dissertation is to address some of the problems
posed by the way these changes have altered the way the network is structured and consequently operated.
2.6 Summary
This chapter has provided some background to the technologies on which the work described in this dissertation relies. The relevant Internet protocols were described, along with current techniques for providing resource allocation within networks in general and the Internet in particular. In addition, techniques for implementing routeing within the Internet were discussed and the structure of the resulting Internet described.
This chapter illustrates that the requirements of networks have evolved past the simple services provided by IP. Better control of the network and more facilities for resource management, particularly resource management between operators, have become requirements on the network. Although individual users are perhaps more easily satisfied as bandwidth becomes plentiful, there is still a requirement that inter-operator management be performed due to the large volumes of traffic and money involved. SLAs between operators are still specified and must be managed and met. Having illustrated the continuing requirement for traffic engineering at multiple timescales, the next two chapters present attempts to implement this for the Internet, beginning with control timescales in the following chapter.
Chapter 3
3.1
Current models for congestion control in the Internet rely solely on congestion control at data timescales. Congestion is controlled via the individual packets allowed to enter the network. Information is provided to hosts concerning the state of the network, as seen by individual packets. This can be inferred as in TCP where the protocol detects the onset of congestion by attempting to detect if a packet was dropped [Jacobson88, Fall96], or explicit as in ECN [Floyd94]. Alternatively, protocols such as RTP [RFC 1889] attempt to use more explicit information about the delay on the path between transmitter and receiver to detect when the network is becoming congested. This information can be aggregated and sent to the transmitter by the receiver, in order that the transmitter can alter its transmission rate or coding scheme appropriately. Schemes providing more information to the hosts have been suggested, as
described in the ECN modifications to IP [RFC 2481] for example. This enables information to be provided to hosts before drops occur and also to be provided more smoothly, i.e. at a finer granularity.
successfully downloaded pages the connection and session level goodputs respectively. When a user aborts a flow due to poor performance, bandwidth has effectively been wasted at the very time it was most scarce, since the data already transferred is of little or no use, and restarting the flow will usually require that this data be retransmitted.
[Figure 3.1: ... TCP model. Series plotted: mean and median per-flow throughput for RED and DropTail queues. (b) Coefficient of variation (the ratio of standard deviation to mean) of throughput vs. number of active flows.]
[Figure 3.2: ... TCP Vegas model. Series plotted: mean and median per-flow throughput for DropTail and RED queues. (b) Coefficient of variation (ratio of standard deviation to mean) of throughput vs. number of active flows.]
This is supported by Figure 3.1(b), which plots on linear axes the coefficient of variation (the ratio of the standard deviation to the mean) of the per-flow throughput against the number of active flows. It shows that this ratio is increasing, again with the effect much more marked for the DropTail case. This suggests that the proportional variability of per-flow throughput becomes much greater as the number of flows increases.

The same experiments, using both DropTail and RED queues, were then run using the NS re-implementation of TCP Vegas congestion control. This does not model TCP as completely as the full TCP agent, but does use the more sophisticated Vegas congestion control algorithm. Figure 3.2 shows results similar to the full TCP case, indicating that even with the smoother congestion avoidance behaviour of TCP Vegas, congestion collapse due to excess flows is still exhibited.

Using both the full TCP and Vegas TCP models, variability between and within flows over shorter timescales is also apparent. With just 100 flows competing against each other, individual flows can achieve very low or even zero goodput whilst others achieve more than their fair share for tens of seconds at a time. Similarly, within an individual flow the goodput achieved can vary substantially over the duration of the simulation. All the above experiments were also performed taking bandwidth estimates over 1 and 10 second periods, with no significant differences in the results.

Under both regimes collapse is due to the over-reaction of TCP to congestion. As more flows compete, losses become harder to recover from using the fast-retransmit mechanism. It then becomes more difficult for a flow that reduces its window to recover using slow-start, so flows that reduce their windows are likely to retain small windows for relatively long periods of time, whereas flows that start increasing their windows will increase them very quickly. Consequently, although the link is constantly utilised, individual flows experience short intense bursts of activity followed by long quiet periods.

Thus, as Massoulié and Roberts [Massoulié99] and Kumar et al. [Kumar00] argue more abstractly, it makes sense to allow operators to control the admission of traffic at a variety of levels and specifically at the flow level, rather than just at the packet level. This should help to temper the effects of congestion, and ensure that bottlenecks never become so heavily overloaded that real-time services and interactive applications over TCP can make no useful progress. The following section discusses the application and impact of flow admission control in the Internet.
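The coefficient of variation used for Figure 3.1(b) can be illustrated with a minimal sketch. The function name and the sample throughput values below are hypothetical and are not taken from the simulations; the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def coefficient_of_variation(throughputs):
    """Ratio of the standard deviation to the mean of per-flow throughput."""
    m = mean(throughputs)
    if m == 0:
        raise ValueError("mean throughput is zero")
    # Population standard deviation is an assumption of this sketch.
    return pstdev(throughputs) / m

# Hypothetical per-flow throughputs sharing the same total bandwidth.
fair = [100.0, 100.0, 100.0, 100.0]
skewed = [390.0, 5.0, 5.0, 0.0]

print(coefficient_of_variation(fair))
print(coefficient_of_variation(skewed))
```

A perfectly fair allocation gives a ratio of zero, while an allocation in which one flow takes almost everything gives a ratio above one, which is the kind of growth the figure reports as the number of flows increases.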
3.2 Internet flow management
This section discusses the requirements for control timescale traffic engineering, and considers different implementation approaches. Mechanisms for flow detection, and flow admission and denial are discussed, followed by consideration of suitable policies to implement using these mechanisms.
3.2.1 Requirements
Protocols such as TCP are considered elastic in their resource demands since they operate relatively satisfactorily within a wide range of resource allocations. Real time protocols are typically inelastic in their resource demands, having a much smaller useful operating range, which introduces further complications. They often also place constraints on the amount of delay in the network. In the case of an elastic protocol such as TCP, the delay constraints are not very stringent: users care about the time for a web page download to complete, not the time for a given packet to arrive. In the case of real time traffic however, the delay of a given packet can noticeably degrade the quality of the media stream. To avoid this, applications must use either extensive buffering, leading to additional latency, or redundant coding and error correction schemes, leading to wasted bandwidth.

To avoid these problems, some means of differentiating between traffic associated with different services is required. Traffic carrying data with real time constraints should not be buffered behind traffic carrying data without real time constraints, but should instead be expedited through the network, or dropped if this is not possible. As stated in Chapter 1, computer communication is predominantly flow based so it is often the case that dropping entire flows is of more benefit to the network and the user than allowing the flow to begin and then restricting the bandwidth it can achieve. The remainder of this section considers the design of suitable admission controllers in more detail.
discussed in the following section. This technique was used in the proxy discussed in Section 3.5.
In certain circumstances the second approach has benefits over the first, due to its more explicit knowledge of flow initiation and completion. This may be of use where the network element performing the estimation is known to be the single point of connection to the network, for example a firewall connecting a corporate LAN to the Internet. In more general topologies however, some timeout mechanism is still required, since traffic belonging to a flow may travel through different sequences of network elements. Ensuring that the network element implementing the admission control function has up-to-date knowledge of all the required protocols could also be a problem, unless an additional complementary approach were used to deal with currently unsupported protocols. However, it does have significant benefits in terms of the state required compared to the first approach.

The third approach is based around measurement techniques and is similar in spirit to MBAC. Estimation of the number of active flows in a router is performed using statistical information provided by the router, such as packet drops or queue dynamics. More generally, the load on the router is estimated through such statistics, and then assumptions about the traffic mix are applied to generate an estimate of the number of active flows. This approach has a number of advantages over the two previously discussed and was used for the implicit admission control work presented in Section 3.3. It has very low state overheads, and these overheads do not usually scale with the number of flows. It also does not in general require per-flow or per-packet calculation, as information is typically aggregated in blocks over time, and calculation performed after a block has been gathered. The challenge with such methods is making good assumptions about the traffic mix, and finding relationships between the measured statistics and the number of flows.
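The third, measurement-based approach can be sketched as follows. This is only an illustration of block-based aggregation with constant state; it is not the estimator used in Section 3.3. The class name, the use of a byte counter as the measured statistic, and the assumed mean per-flow rate are all hypothetical.

```python
class BlockFlowEstimator:
    """Estimate the number of active flows from load aggregated per block.

    State is a single byte counter, so it does not grow with the
    number of flows. The traffic-mix assumption is reduced here to one
    hypothetical parameter: the mean rate of a single flow.
    """

    def __init__(self, block_seconds, assumed_flow_rate_bps):
        self.block_seconds = block_seconds
        self.assumed_flow_rate_bps = assumed_flow_rate_bps
        self.bytes_in_block = 0

    def on_packet(self, size_bytes):
        # Per-packet work is a single addition; no per-flow lookup.
        self.bytes_in_block += size_bytes

    def close_block(self):
        # Calculation is performed only once the block has been gathered.
        load_bps = 8 * self.bytes_in_block / self.block_seconds
        self.bytes_in_block = 0
        return load_bps / self.assumed_flow_rate_bps

estimator = BlockFlowEstimator(block_seconds=10.0, assumed_flow_rate_bps=100_000.0)
for _ in range(10_000):
    estimator.on_packet(1250)
estimated_flows = estimator.close_block()
print(estimated_flows)
```

The quality of the estimate depends entirely on the assumed per-flow rate, which is the difficulty with traffic-mix assumptions noted above.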
A more efficient solution is to signal to the endpoint generating the flow that the flow has been denied. One mechanism would be to introduce a new protocol or to extend an existing protocol, allowing a router within an AS that wished to deny a flow to signal to the flow's ingress router that packets for that flow should be dropped. While effective, such protocols introduce yet more control traffic to the Internet, and require implementation on all routers within an AS (if not throughout the entire Internet) to be useful.

A compromise between dropping the packets of a flow, and explicitly signalling denial of admission to a flow, is termed Implicit Admission Control [Mortier00]. This uses protocol specific knowledge to notice packets associated with flow setup, such as SYN packets in TCP, and then either drop these packets, or generate the correct protocol specific message, such as a RST packet in TCP, to prevent setup of the connection. This approach has the advantage that it denies access to a flow early in its lifetime, preventing bandwidth being wasted, but without introducing further complexity in the network in the form of new control protocols. However, as with the second approach to flow detection discussed in the previous section, it does require that admission controllers be kept up-to-date with new or updated protocols.
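A minimal sketch of the per-packet decision in implicit admission control for TCP is given below. The flag values are the standard TCP header flag bits; the function names and the three string results are hypothetical, and the admission decision itself is assumed to be supplied by a separate estimator.

```python
# Standard TCP header flag bits.
TCP_FIN, TCP_SYN, TCP_RST, TCP_ACK = 0x01, 0x02, 0x04, 0x10

def is_connection_setup(flags):
    """True for an initial SYN (SYN set, ACK clear)."""
    return bool(flags & TCP_SYN) and not (flags & TCP_ACK)

def handle_packet(flags, admit_new_flows, send_reset=False):
    """Return the action for one packet: 'forward', 'drop' or 'reset'.

    Only flow setup packets are ever denied; packets of existing flows
    are always forwarded.
    """
    if not is_connection_setup(flags):
        return "forward"
    if admit_new_flows:
        return "forward"
    return "reset" if send_reset else "drop"

print(handle_packet(TCP_SYN, admit_new_flows=False))
print(handle_packet(TCP_ACK, admit_new_flows=False))
```

Because the test is on the setup packet only, no per-flow state is needed to apply the decision.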
users receive the QOS they desire. At the same time, the network must attempt to efficiently and fairly share the available resource among different users. Current trends suggest that packet marking according to congestion experienced is a useful mechanism for achieving fair sharing at the same time as allowing service differentiation. WTP [Key99a] is a mechanism that enables users to express the relative importance of their traffic by causing them to pay for marked packets received, and enabling them to express different per-flow packet mark rates to the endsystem. This allows the customer to pay more for the flows they believe to be important and less for those they don't. By also controlling the number of flows entering the router, elastic protocols (principally TCP at this time) are able to remain in an operating region which provides some assurance of progress for users.

Alternatively, by using mark-proxies such as the one described in Section 3.5, the ISP can exert control over customers' resource use. By translating received marks (i.e. congestion signals) into harder congestion signals (e.g. either dropping the packet or clearing the mark for TCP, or rewriting the RR loss field for RTP), the ISP can influence users' behaviour. In conjunction with the admission control mechanisms previously described, this allows ISPs greater control over users' resource use.

In a network supporting packet marking, WTP and mark-proxies are two schemes at opposite ends of the spectrum of control. The first allows users fine grained control over their resource use, enabling them to express on a per-flow basis their desires to the network. The cost of such control is that they may have to deal with rapidly varying prices, and the possibility of being starved of access to the network. The second allows the ISP to use the congestion information received from the network to influence user behaviour, to ensure that the network can maintain a given level of service, both per-flow and per-user.
This allows them to offer simpler pricing schemes to those users unwilling to deal with the complexity of a completely unregulated WTP scheme. In the latter case, it might make sense to offer a flat rate per TCP flow pricing scheme, or to cap the number of flows a user is allowed. It is then the task of existing protocols such as TCP, or more expressive approaches such as WTP, to share bandwidth between a given user's flows. This could also have consequences for the complexity of the ISP's billing system: rather than having to bill a user for the number of marks generated, with each mark potentially costing a different amount, users can be billed for their contracted number of flows, a much simpler problem.

This section first discussed the design of the two mechanisms involved in implementing admission control: estimation of the number of flows, and the admission and denial of individual flows. It then discussed policies that could be implemented over these mechanisms, and the expression of these policies through pricing. The following sections now describe the simulation and implementation work carried out to validate the proposed mechanisms.
[Figure: (a) Architecture, showing five installed filters and associated code for the FORWARD chain only. (b) MTK/TCP implementation. Diagram labels: IN, OUT, INPUT chain, FORWARD chain, OUTPUT chain, Sanity check, Checksum, De-masquerade, Route, isSYN?, /proc/mtk, admit, deny, genRST, RST transmitted, Admission control user-space process.]
Implementations listed: NetBSD 1.3, OSF/1 3.2D, SunOS 5.5.1, SunOS 5.6, Windows 98, Windows NT4.0

SYN RTO interval sequence (s) [Total (s)]; Data RTO interval sequence (s) [Total (s)]
SYN: 2.8, 6.0, 12.0, 24.0 [44.8]; Data: 1.4, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0*7 [511.4]
SYN: 3.7, 10.1, 24.0 [37.8]; Data: 0.5, 0.5, 4.0, 8.0, 16.0, 32.0, 64.0*7 [509.0]
SYN: 3.0, 6.0, 12.0, 24.0, 48.0, 96.0, 120.0*5 [789.0]; Data: 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6, 51.2, 102.4, 120.0*6 [924.6]
SYN: 6.0, 12.0, 24.0 [42.0]; Data: 1.0*11 [11.0]
SYN: 0.7, 3.0, 6.0, 12.0, 24.0 [45.7]; Data: 1.4, 3.0, 6.0, 12.0, 24.0, 48.0, 64.0*8 [606.6]
SYN: 1.7, 5.1, 11.8, 25.3, 52.3, 106.3, 162.6 [365.1]; Data: 0.9, 0.8, 1.5, 3.0, 6.0, 12.0, 24.0, 48.0, 56.3*6 [434.0]
SYN: 3.5, 6.4, 12.8, 25.6, 51.2 [99.5]; Data: 0.2, 0.5, 1.0, 3.8, 7.6, 15.3, 30.6, 61.2, 122.4 [242.6]
SYN: 2.9, 6.0, 12.0 [20.9]; Data: 0.3, 0.6, 1.2, 2.4, 4.8 [9.3]
SYN: 3.2, 6.6, 13.1 [23.0]; Data: 0.6, 0.9, 1.8, 3.5, 7.0 [13.8]

x, y means that packet p_y was retransmitted y seconds after packet p_x. x*n means that n packets were retransmitted at intervals of x seconds.
Table 3.1: Measurements of packet retransmission intervals for some TCP implementations following SYN and data loss.
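The totals in Table 3.1 follow directly from the interval notation described in its key. As a small worked check, the sketch below expands the x*n shorthand and sums a sequence; the function name is hypothetical and the two sequences are the first SYN and data entries of the table.

```python
def total_retransmission_time(sequence):
    """Sum an RTO interval sequence written in the table's notation.

    'x*n' stands for n retransmissions at intervals of x seconds.
    """
    total = 0.0
    for item in sequence.split(","):
        item = item.strip()
        if "*" in item:
            interval, count = item.split("*")
            total += float(interval) * int(count)
        else:
            total += float(item)
    return total

print(round(total_retransmission_time("2.8, 6.0, 12.0, 24.0"), 1))   # 44.8
print(round(total_retransmission_time("1.4, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0*7"), 1))   # 511.4
```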
behaviour to keep a running estimate of the current probability of a packet being dropped, returns its decision. If the decision is to allow the flow to enter the network, the control packet is returned to the usual forwarding path and continues further into the network. If the decision is to deny access to the flow, then the packet is either dropped, or rewritten as a valid TCP RST packet, its source and destination addresses and ports swapped, and then returned to the forwarding path. This results either in the originating host detecting the loss of the connection setup packet and backing off, or in it receiving an explicit reset for that connection setup attempt.
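The rewrite of a denied SYN into a reset can be sketched as below. The packet is modelled as a plain dictionary with hypothetical field names, and the sequence and acknowledgement values follow the standard TCP rule for resetting a segment that carries no ACK (sequence number zero, acknowledging the SYN); the text does not state which values the implementation used, so this is an assumption.

```python
TCP_RST = 0x04
TCP_ACK = 0x10

def make_rst_for_syn(syn):
    """Build a reset answering a SYN, with addresses and ports swapped."""
    return {
        "src_ip": syn["dst_ip"],
        "dst_ip": syn["src_ip"],
        "src_port": syn["dst_port"],
        "dst_port": syn["src_port"],
        "seq": 0,
        # A SYN occupies one sequence number; wrap at 32 bits.
        "ack": (syn["seq"] + 1) % 2**32,
        "flags": TCP_RST | TCP_ACK,
    }

# Hypothetical SYN using documentation address ranges.
syn = {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7",
       "src_port": 40000, "dst_port": 80, "seq": 1000, "flags": 0x02}
rst = make_rst_for_syn(syn)
print(rst)
```

A real implementation would also need to recompute the IP and TCP checksums after rewriting, which is omitted here.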
tions; it demonstrates that although the mandated behaviour is not always followed, no implementation would cause the link to be overloaded with SYN packets. The decision to retry may also be taken by the application or user, rather than the protocol. Other suggested methods of denying admission to a connection include using ICMP source quench and ICMP reject: unknown protocol messages [Kumar00]. The former has the advantage that it also allows the operator to control the throughput of active connections since it reduces the receiver's congestion window to one. The latter has the disadvantage that in addition to denying the requesting flow access, it can also cause existing flows between the same endpoints to break.
3.3.4 Conclusions
Even a relatively heavy-weight flow estimator such as MTK uses less than 10% CPU when the machine is forwarding at line rate (100 Mb/s). As previously stated, the MTK estimator does not keep per-flow state. The results concerning TCP stack behaviour demonstrate that deployment of implicit admission control would not cause a significant increase in the amount of TCP control traffic in the network. The results concerning application behaviour demonstrate that existing applications respond satisfactorily to the denial of their flows. In all, the results from this section demonstrate that implicit admission control is feasible from the points of view of router, end-system, and user behaviour.

The following section considers the performance impact on the network and users' flows of implicit admission control using the MTK estimator.
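[Figure 3.5: Dumbbell simulation topologies, (a) with constant delay links and (b) with links of varying delay. Diagram labels: 20 src nodes, 10 sink nodes, a bottleneck between nodes in and out; link annotations 10Mb/s; 2ms link delay, 10Mb/s; 5ms link delay, 10Mb/s; 10ms link delay, 34Mb/s; 200ms link delay.]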
Two topologies were studied initially, shown in Figure 3.5. Both are dumbbell topologies, one with constant delay links, the other with links of varying delay to simulate flows with different RTTs. Although topologies such as these are too simple to satisfactorily model a network such as the Internet, it is also the case that the natural place to position systems such as implicit admission control, or the RTP-ECN-proxy discussed in Section 3.5, is at the ingress to an ISP's network, or at the egress from a stub AS. Such places are likely to be the principal bottlenecks that user traffic will see, since the core network is dimensioned to keep ahead of demand. Similarly, LAN technologies are such that users are unlikely to experience congestion within their own network. Consequently modelling the network as a dumbbell is not unrealistic from the user's perspective.

To complement the two topologies, two traffic models were also used. The first is simple constant size bulk data transfer with a Poisson arrival process, with each flow transferring 1 MB of data. This is configured explicitly to heavily overload the link. The second is a more complex model with flow lengths generated from a distribution constructed from data obtained by analysis of web server logs from a variety of sources; this mixes many short flows with a few much longer flows, leading to the heavy-tailed flow length distributions commonly reported [Paxson94a, Paxson94b].
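The first traffic model can be sketched as follows: flow start times form a Poisson process, generated from exponentially distributed inter-arrival times, and every flow has the same size. The function name, the arrival rate and the seed are hypothetical, and 1 MB is taken here to mean 10^6 bytes; none of these values come from the simulation scripts.

```python
import random

def poisson_flow_arrivals(rate_per_second, duration_seconds,
                          flow_bytes=1_000_000, seed=1):
    """Return (start_time, size_bytes) pairs for constant-size flows
    arriving as a Poisson process over the given duration."""
    rng = random.Random(seed)
    t = 0.0
    flows = []
    while True:
        # Inter-arrival times of a Poisson process are exponential.
        t += rng.expovariate(rate_per_second)
        if t >= duration_seconds:
            return flows
        flows.append((t, flow_bytes))

# Hypothetical parameters: 5 flows/s for 900 s of simulated time.
flows = poisson_flow_arrivals(rate_per_second=5.0, duration_seconds=900.0)
offered_bps = 8 * sum(size for _, size in flows) / 900.0
print(len(flows), offered_bps)
```

With these illustrative numbers the mean offered load is about 40 Mb/s, so choosing the arrival rate relative to the bottleneck capacity is what determines how heavily the link is overloaded.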
[Figure 3.6 plots (two panels): data (bytes/s) on the vertical axis, from 0 to 7e+06, with the horizontal axis running to 900. Series: offered load, no AC; dropped load, no AC; retransmitted load, no AC; dropped load, AC thresh = 0.1.]
Figure 3.6: Offered, dropped and retransmitted load, with and without admission control. In both graphs, the retransmitted load is negligible in the admission control with threshold 0.1 case.
[Figure: two panels plotting normalised frequency against time (s). Series: No admission control; Admission control, threshold 0.5; Admission control, threshold 0.1; Admission control, threshold 0.01.]

(b) Variable link delays as in Figure 3.5(b). Frequency counts were made over buckets of 2 seconds, and normalized to the total number of flows which complete. Note that the x-axis has been truncated for clarity; due to the no admission control case, it actually extends to 894 seconds.
(a) Flow durations: identical link delays as in Figure 3.5(a), simple traffic model.
(b) Flow durations: differing link delays as in Figure 3.5(b), simple traffic model.
Table 3.2: The number of flows completed, packets transferred by completed flows, the total number of packets retransmitted, and the duration means and standard deviations for the completed flows. Simulations were run for 900 seconds.
Based on these results, Figure 3.7 shows histograms of the time to successful completion for flows and Table 3.2 shows the means and standard deviations of their durations. These demonstrate that employing admission control can greatly increase the number of flows that successfully complete in a given time interval by allowing flows to complete substantially faster. Without admission control most flows do not complete, and those that do have a mean duration of 509 seconds and a standard deviation of approximately half the mean. Conversely, completion times when admission control is applied as leniently as the current estimator allows have a mean duration of 135 seconds, and a correspondingly lower standard deviation, and nearly 20 times more flows complete.

Since TCP is greedy, admitted flows will attempt to use the available bandwidth in the bottleneck and the link remains at near full utilisation even with admission control in place. This is shown by the offered load results in Figure 3.6. In conjunction with those, the results shown in Figure 3.7 and Table 3.2 demonstrate that many applications will achieve higher utility if admission control is applied. Users may be prepared to wait for 1 minute for a large download to complete; they are less likely to be prepared to wait for 15 minutes. In effect the results demonstrate that it is possible for the network operator to tune the network based on users' application requirements, in order that users receive higher utility.
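The histograms and summary statistics referred to above can be reproduced from a list of flow completion times along the following lines. This is a sketch only: the function name and sample durations are hypothetical, and the use of the population standard deviation is an assumption. The 2 second bucket width and the normalisation by the number of completed flows follow the figure captions.

```python
from collections import Counter
from statistics import mean, pstdev

def duration_histogram(durations, bucket_seconds=2.0):
    """Map each bucket's start time to the fraction of completed flows
    whose duration falls in that bucket."""
    counts = Counter(int(d // bucket_seconds) for d in durations)
    total = len(durations)
    return {b * bucket_seconds: c / total for b, c in sorted(counts.items())}

# Hypothetical completion times in seconds.
durations = [1.0, 1.5, 2.5, 3.0, 7.9]
histogram = duration_histogram(durations)
print(histogram)   # {0.0: 0.4, 2.0: 0.4, 6.0: 0.2}
print(mean(durations), pstdev(durations))
```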
[Figure 3.8(a) plot: load against time (s), from 0 to 900 seconds. Series: dropped load, no AC; retransmitted load, no AC; dropped load, AC thresh = 0.1; offered load, AC thresh 0.1.]
(a) Offered load, drops and retransmissions, without admission control and with an admission threshold on the target loss probability of 0.1. Again, retransmitted load is negligible in the admission control with threshold 0.1 case.
[Figure 3.8(b) plot: normalised frequency against time (s), from 0 to 40 seconds. Series: No admission control; Admission control, threshold 0.5; Admission control, threshold 0.1; Admission control, threshold 0.01.]
(b) Flow durations, where the frequency count has been made over buckets of 2 seconds, and normalized to the total number of flows to complete.
Figure 3.8: Results for the complex traffic model with the topology shown in Figure 3.5(a).
Flow durations: identical link delays, complex traffic model. The simulations were run for 900 seconds, using the complex web-cache log based traffic model. The topology shown in Figure 3.5(a) was used.
Table 3.3: Number of flows completed, packets transferred by completed flows, total number of packets retransmitted, and mean and standard deviations of the durations of the completed flows.
Threshold   Completed flows   Good flows (%)   Bad flows (%)
None        15219             4595 (30%)       10624 (70%)
1.0         12065             7255 (60%)       4810 (40%)
0.5         11427             7968 (70%)       3459 (30%)
0.1         10192             8591 (84%)       1601 (16%)
0.05        9900              8634 (87%)       1266 (13%)
0.01        9157              8618 (94%)       539 (6%)

Only flows that started after the first 100 seconds had passed were counted, in order to remove initial transient behaviour.
Table 3.4: The number of completed flows with the number that met a target of 10 packets per second over their lifetime (good flows), and the number that failed to meet this target (bad flows).

excess short flows are unable to enter the link and cause the long flows to experience excessive loss. Examining Table 3.4 provides further insight. Picking a target of 10 packets per second per flow as a measure of useful goodput, the application of admission control nearly doubles the number of good flows that complete. This suggests that a large number of the extra flows that manage to complete with no admission control are receiving very low transfer rates, and are hence of less use.

This table also suggests a manner in which the operator could set the threshold. One might choose a target throughput and then adjust the admission threshold to achieve it, the particular value depending on the traffic mix and on the level of service the operator wishes to provide for its customers. Better estimators might give one a more controllable parameter with greater dynamic range; a weakness of MTK in these circumstances is that its maximum threshold is 1.0, which leaves quite a large gap in system behaviour between no admission control and admission control at its most lenient. However, MTK does appear to give a reasonable range of values for the operator to tune, which is encouraging given that the link is only experiencing overload of approximately 20%.

Additionally, appropriate use of dynamic pricing schemes as suggested in Section 3.2.4 might give the operator some indication of how to set the admission parameters. Alternatively, more static pricing schemes simply require that the number of flows be limited. This could be achieved by adjusting the threshold in response to an estimator for the number of flows.
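The good/bad classification behind Table 3.4 can be sketched as below. The function name and the sample flows are hypothetical; the only values taken from the text are the target of 10 packets per second over a flow's lifetime and the table counts used in the final check.

```python
def classify_flows(flows, target_pps=10.0):
    """Split completed flows into (good, bad) counts.

    Each flow is a (packets_transferred, duration_seconds) pair; a flow
    is good if its lifetime average meets the target packet rate.
    """
    good = sum(1 for packets, duration in flows
               if packets / duration >= target_pps)
    return good, len(flows) - good

# Hypothetical flows averaging 20, 5 and 10 packets per second.
sample = [(1000, 50.0), (100, 20.0), (300, 30.0)]
good, bad = classify_flows(sample)
print(good, bad)   # 2 1

# Percentage check against the threshold 0.1 row of Table 3.4.
print(round(100 * 8591 / 10192))   # 84
```

An operator tuning the threshold as described above would recompute these counts for each candidate threshold and pick the one that meets the chosen target.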
3.4.4 Conclusions
This section presented results from an implementation of implicit admission control for the NS simulator. The results show that with admission control in place flows experience lower completion times when the link is overloaded. The data presented demonstrate that even for relatively elastic protocols such as TCP there is benefit to be had from limiting the number of flows competing for a congested resource. There is also evidence to suggest that admission control may be of benefit even where the resource is not continuously overloaded, in that doing so may provide greater fairness in resource allocation to flows and users. The following section discusses implementation of a different approach to control timescale traffic engineering in the Internet: an RTP-ECN-proxy.
3.5 An RTP-ECN-proxy
This section presents the design of, and results from, an implementation of an RTP-ECN-proxy for Linux. This enables applications that are not ECN-aware to be made aware of the congestion information being given to them as ECN marks. The proxy was implemented for the RTP protocol and utilises IPChains as shown in Figure 3.9, more dynamically than the implicit admission control implementation. The proxy watches for the initial port negotiation procedure of the RTSP protocol to enable it to see which ports will be used for the RTP conversation. Having discovered this information, it imposes a filter on the forwarding path to enable it to capture incoming ECN-marked packets, and outgoing RRs (RECEIVER REPORTS). The proxy can then account incoming ECN-marked packets to the relevant flow.
[Figure 3.9: RTP-ECN-proxy implementation. Diagram labels: IN, INPUT chain, OUT, isRTSP?, isKnownFlow?, reinsert, BYE, remove filter, SR, RR, rewrite, parse, <data>.]
When the receiver sends an RR (containing packet loss statistics) back to the sender, the proxy captures the RR, and rewrites the packet loss field to take account of the ECN-marked packets it has seen on that flow. This then causes the sender to adjust its rate as if the marked packets had been dropped. A more complete implementation would extend the RTP protocol to allow the RR to separate the packet lost and packet marked information, enabling the sender to make a more intelligent rate adjustment decision.
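The rewrite of the loss field can be illustrated with a small sketch. In an RTP receiver report the fraction lost is an 8-bit fixed-point value, i.e. the loss fraction multiplied by 256. The function below adds the marked packets to that fraction; its name and the assumption that the proxy knows how many packets the receiver expected in the reporting interval are hypothetical and not taken from the implementation described here.

```python
def rewrite_fraction_lost(reported_fraction, marked, expected):
    """Return a new 8-bit fraction-lost value that counts ECN-marked
    packets as if they had been lost.

    reported_fraction: value from the RR (0..255).
    marked: ECN-marked packets seen on the flow in this interval.
    expected: packets the receiver expected in this interval (assumed
              to be known to the proxy).
    """
    if expected <= 0:
        return reported_fraction
    extra = (marked << 8) // expected
    # The field is 8 bits wide, so saturate rather than overflow.
    return min(255, reported_fraction + extra)

print(rewrite_fraction_lost(0, marked=10, expected=100))   # 25
```

Here 10 marked packets out of 100 expected become a reported loss of 25/256, or roughly 10%, even though nothing was actually dropped.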
When used in conjunction with TCP admission control schemes, this might be considered one way in which QOS could be maintained for real-time traffic and existing TCP flows during times of congestion. Rather than allow even more best-effort TCP traffic to enter the network, the threshold for the admission control system could be modified to deny access to more flows. Existing flows would not be harmed by the onset of congestion collapse, and real-time traffic would be able to maintain a reasonable frame rate at the expense of resolution (or other suitable trade-off, dependent on the application and user preferences).
3.5.2 Conclusions
As with the implicit admission control implementation, CPU use for the RTP-ECN-proxy is low, so it is to be hoped that edge routers where such schemes might be deployed have sufficient spare computing power. Memory use for the proxy is approximately 100 bytes per flow, so such a system should scale to a reasonable number of flows.
3.6 Summary
This chapter discussed traffic engineering at control timescales within the Internet. It began by demonstrating the need for control timescale traffic engineering caused by TCP's unfairness and eventual congestion collapse due to too many flows contesting a restricted resource. It then discussed the implications of performing this sort of traffic engineering, and went on to consider design issues and possibilities. Finally, results to support the claims made were presented, and demonstrated that per-flow admission control and mark-proxies can be used to implement control timescale traffic engineering in the Internet, and that doing so has benefits for users. The following chapter now discusses management timescale traffic engineering in the Internet.
Chapter 4
4.1 Scope
This section discusses the scope of this chapter. It describes the various components of routeing in the Internet, notes their application to management timescale traffic engineering, and their relation to data timescale traffic engineering as presented in Section 2.2.4. The following section then concentrates on inter-AS routeing and BGP, and modifications to BGP for implementing management timescale traffic engineering.
matching a particular prefix should be sent. An IP prefix consists of an IP address and a prefix length which states how many bits of the address are significant.

The routeing components can be split based on the origination of the prefixes advertised and the receivers of the adverts. As previously noted in Section 2.4, the initial split can be made based on whether or not the prefixes advertised originate within the AS. Prefixes originating within an AS are advertised via the IGP (INTERIOR GATEWAY PROTOCOL). These protocols are typically restricted to the local area and examples include OSPF and IS-IS. Conversely, EGPs (EXTERIOR GATEWAY PROTOCOLS) are used to advertise prefixes originating outwith the AS, and the only widely deployed EGP is BGP.

A further separation can be made based on the use to which BGP is put. If it is used to advertise prefixes across administrative boundaries (i.e. between ASs), it is referred to as EBGP (EXTERNAL-BGP). Conversely, if it is used to re-advertise externally learnt prefixes within an AS, it is referred to as IBGP (INTERNAL-BGP). This chapter concentrates principally on BGP in its EBGP form, although the application of IGPs and IBGP to management timescale traffic engineering is briefly discussed in the following subsection.
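The definition of an IP prefix given above, an address together with the number of significant bits, can be made concrete with a short sketch. It uses the Python standard library `ipaddress` module for parsing; the function name and the example addresses (taken from the documentation address ranges) are hypothetical.

```python
import ipaddress

def matches_prefix(address, prefix):
    """True if the address agrees with the prefix on its significant bits."""
    network = ipaddress.ip_network(prefix)
    addr = ipaddress.ip_address(address)
    if addr.version != network.version:
        return False
    # Discard the bits beyond the prefix length and compare the rest.
    shift = network.max_prefixlen - network.prefixlen
    return (int(addr) >> shift) == (int(network.network_address) >> shift)

print(matches_prefix("192.0.2.130", "192.0.2.128/25"))   # True
print(matches_prefix("192.0.2.1", "192.0.2.128/25"))     # False
```

The same check is available directly as `ipaddress.ip_address(a) in ipaddress.ip_network(p)`; the explicit shift is written out only to show what the prefix length means.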
measuring the load of the links over which it runs and using these measurements to calculate a price. This price can then be used to calculate values for the protocol's routeing metrics. The intent of doing so is to cause traffic to be distributed efficiently and automatically throughout the AS, much as was attempted by the HELLO protocol.

The HELLO protocol and its use of RTT as a metric was discussed in Section 2.4.3; its fundamental problem is that the RTT is not very suitable as a measure of congestion for this purpose. A better alternative is available through packet marking schemes as discussed in Sections 2.1.2 and 2.2.4. Marking strategies and pricing schemes that damp the oscillations experienced with RTT can be implemented. Also, since the majority of intra-AS routeing protocols are link-state unlike the HELLO protocol, more explicit information is available for the shortest path computation. The price in this context could equally be viewed simply as the calculated load metric [Fortz00] as it is unlikely to be transformed into a charge since settlement within the AS is generally unnecessary.

In conjunction with the generally better convergence properties of link-state protocols over path- and distance-vector protocols, it thus seems reasonable to suppose that intra-AS pricing could be implemented so as to avoid oscillatory behaviour. Consequently, the work described in the remainder of this chapter concentrates on the use of pricing for management timescale traffic engineering within BGP, a path vector protocol.
The aims of management timescale traffic engineering differ in a number of important ways. Management timescale traffic engineering aims to make the distribution of traffic through the network more efficient given the traffic that has already entered the network. Management timescale traffic engineering does not deal with transient overload in the network; this is dealt with through data timescale traffic engineering approaches and prevented through control timescale traffic engineering approaches. In this sense the problem of calculating suitable prices is much simpler: there is no optimum price to be reached. Rather, the concern is with the stability of the calculated routes and the smoothness of the traffic distribution. Furthermore, prices in the context of this chapter are only directly distributed between BGP peers and not to end-systems. The following section discusses the use of BGP for management timescale traffic engineering in more detail.
for a prefix are available, the receiving router then bases its choice of best route on the values of these path attributes. A commonly seen example is the use of the length of the AS-PATH attribute, the number of ASs listed in the AS-PATH attribute associated with an advertised prefix. Given a choice, a router will generally prefer the route with the smallest number of such hops.

Further control is available through the community attribute. This is a path attribute consisting of four octets, the first two of which are the AS number of the advertising router by convention. The values of the other two octets then encode either IETF standardised values, or have semantics defined bilaterally between the advertising and receiving ASs. Using this mechanism, an advertising AS can instruct a co-operating receiving AS to prefer one route to another, or to perform some operation on a route before re-advertising it. In this way more complex policies such as multi-homing are currently implemented [RFC 1998].
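Both mechanisms described above can be illustrated with a short sketch: choosing the route with the shortest AS-PATH, and packing a community value into four octets with the AS number in the first two. The route records, function names and AS numbers are hypothetical (the AS numbers are drawn from the private-use range), and real BGP route selection considers several other attributes before and after path length.

```python
def prefer_shortest_as_path(routes):
    """Pick the route whose AS-PATH lists the fewest ASs."""
    return min(routes, key=lambda route: len(route["as_path"]))

def encode_community(as_number, value):
    """Pack a community as four octets: AS number, then a 16-bit value."""
    if not (0 <= as_number <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in two octets")
    return (as_number << 16) | value

def decode_community(community):
    return community >> 16, community & 0xFFFF

routes = [
    {"next_hop": "192.0.2.1", "as_path": [64512, 64513, 64514]},
    {"next_hop": "192.0.2.2", "as_path": [64515, 64514]},
]
best = prefer_shortest_as_path(routes)
community = encode_community(64512, 100)
print(best["next_hop"], hex(community), decode_community(community))
```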
Although there are a number of possible network metrics on which to base the price, the work described in this dissertation concentrates on the use of packet marks for this purpose: routers will monitor the congestion that they are experiencing, and calculate a price based on this. It is believed that the mark rate is a useful measure of congestion since it takes into account both the remaining capacity and queueing delay on the link, but should not (given a sensible mark scheme) induce oscillatory behaviour. The price will thus be based on, and usually proportional to, the congestion that the router is experiencing. The price is calculated per-node based on the per-link load, and not per-link, since a price will be advertised on both IBGP and EBGP sessions. Although it may seem more natural to have a per-link price, there are a number of reasons why this is not appropriate. First, BGP does not have a natural notion of a link. Rather, BGP sessions are transported over TCP and so use the underlying IP network and IGP routeing information to effectively create virtual links between every pair of peering routers. Consequently, two apparently different links (i.e. two prexes with different NEXTHOP attributes) may overlap. Furthermore, depending on lower layer conguration, trafc apparently destined for the same NEXTHOP may actually traverse different links. Second, the price is used in both IBGP and EBGP peering sessions. When advertised in IBGP sessions, it is not mapped to a charge, but is essentially used to generate a per-AS price. When advertised in an EBGP session, it is mapped into a charge, but is attempting to advertise a per-AS charge, rather than a per-link charge. This makes settlement in situations where two ASs connect in multiple locations simpler, since the same charge will be advertised to both. 
Finally, the AS receiving the trafc remains at liberty to use other mechanisms such as MULTI-EXIT-DISCRIMINATOR to attempt to inuence the transmitting ASs choice of egress link, and to distribute trafc within its AS as it sees t. Before the calculated price is advertised to the peer through the UPDATE message, it is transformed according to local policy into a charge. As previously mentioned, this enables the operator to apply policy to inuence the inux of trafc to their network. Finally, the charge is advertised to the BGP peers through the UPDATE message. Charges received from peers are then used to calculate LOCAL-PREF values, allowing policy to inuence the efux of trafc from their networks1 .
Footnote 1: An alternative available in most BGP implementations is to map the received charges into the administrative weight, a value associated with a route according to some per-router policy. This value is not advertised by the router, and indeed, is not documented in any current RFC.
4.2.4 Settlement
Having received price path attributes associated with prefixes, operators can choose their best routes on the basis of the charges they will incur. Hence there must be a basis for settlement, where the charges associated with neighbouring prefixes are transformed into bills. Although there are a variety of metrics on which settlement might be based, this dissertation proposes use of the traffic volume exchanged between peers. Traffic volume has a number of advantages: it is straightforward to understand and to measure; it is generally slowly varying between ASs, allowing operators to make relatively accurate predictions about future bills; and many operators already have to collect such information in order to police the SLAs into which they have entered.

Of course, scope exists for more complex settlement schemes. For example, if suitable feedback could be arranged, settlement might be performed based on the number of packets marked. Although this links the final bill more closely to congestion (since charges will not be levied unless congestion is occurring and hence packets being marked), such a scheme is more complex to understand and predict, and requires more infrastructure to support. The following section considers the detailed design of the price path attribute.
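As a minimal sketch of volume-based settlement, assume that the bill owed to each peer is simply the charge that peer advertised multiplied by the traffic volume sent through it during the settlement period. The function name, the units, and the figures below are illustrative assumptions and are not taken from the dissertation.

```python
def settlement_bill(charge_per_unit, volume_by_peer):
    """Return the bill owed to each peer: advertised charge x traffic volume.

    charge_per_unit: dict peer -> charge per unit of traffic (hypothetical units)
    volume_by_peer:  dict peer -> traffic volume sent via that peer in the period
    """
    return {peer: charge_per_unit[peer] * volume
            for peer, volume in volume_by_peer.items()}


# Hypothetical figures for two neighbouring ASs.
charges = {"AS100": 2.0, "AS200": 3.5}
volumes = {"AS100": 120.0, "AS200": 40.0}
bills = settlement_bill(charges, volumes)
```

Because traffic volume varies slowly between ASs, an operator could run the same calculation on forecast volumes to estimate a future bill.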
Since charges are advertised by an optional non-transitive path attribute, it is immaterial whether they are mapped into weight or LOCAL-PREF.
Pricing in the context of management timescales aims to reduce the manual intervention required to manage SLAs, enabling greater automation, and to allow operators to make informed routeing choices for aggregates of traffic, where such choices exist. These choices may be driven by the type of network service the operator wishes to offer, and hence driven indirectly by end-user desires, but end-users do not directly influence such routeing decisions. As a consequence, prices calculated for traffic engineering at management timescales have different requirements to prices calculated for data timescales: the price itself should stabilise, but the key point is that the BGP routeing tables should converge. Operators may be able to deal with oscillating prices since actual settlement will not be performed continuously, but network engineering considerations do require that BGP has no worse stability properties than at present.

The first consideration is the information available at a node for calculating the price. As described so far, a node has knowledge of the load its links are experiencing and the charges advertised to it from its peers. The price, $p_i$, at a node, $N_i$, may thus be viewed as a function,

$p_i = p(l_i^j, c_j^i), \quad j \neq i$

where $l_i^j$ is the load between nodes $N_i$ and $N_j$, and $c_j^i$ is the charge node $N_j$ advertises to node $N_i$. Note that although these parameters are all time dependent, this is not made explicit in the notation. Denoting the load at a node $N_i$ by $l_i = \sum_j l_i^j$, there are then a number of reasonable constraints that exist concerning the price:

1. $p_i > 0$, since a negative price is nonsensical;
2. $\frac{d p_i}{d l_i} > 0$, since the price should rise as load rises;
3. $\frac{d^2 p_i}{d l_i^2} \le 0$, to make the price less sensitive to load changes as load, and hence the price, increases.

The first two constraints are trivial, but the third bears some explanation. As the network becomes excessively congested, changing the selected best route will have a progressively more disruptive effect, and so should become harder to do. The reason is that route stability is likely to be a more important constraint than maximising the revenue generated by these price-based mechanisms: operators have other means to generate revenue, and there is not much that the routeing protocol can do to deal with a network which is simply overloaded. At this point, measures such as admission control and the end-to-end congestion control mechanisms of the transport protocols must play their part to reduce congestion. Although this may be implemented through pricing visible to end users, it is not within the remit of the routeing protocol to calculate or advertise these prices.
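To make the three constraints concrete, the sketch below uses a logarithmic price function, which is positive, increasing, and concave in the load. The functional form, the `base` and `scale` parameters, and the way received charges are simply added on are assumptions chosen for illustration; the text above does not prescribe a particular function.

```python
import math


def price(load, received_charges, base=1.0, scale=4.0):
    """Hypothetical price: positive, increasing and concave in the load."""
    return base + scale * math.log1p(load) + sum(received_charges)


def check_constraints(loads, h=1e-3):
    """Numerically check p > 0, dp/dl > 0 and d2p/dl2 <= 0 at each load."""
    for load in loads:
        p0 = price(load, [])
        p1 = price(load + h, [])
        p2 = price(load + 2 * h, [])
        first = (p1 - p0) / h                  # forward difference for dp/dl
        second = (p2 - 2 * p1 + p0) / (h * h)  # second difference for d2p/dl2
        if not (p0 > 0 and first > 0 and second <= 0):
            return False
    return True


constraints_hold = check_constraints([0.0, 1.0, 10.0, 100.0])
```

With such a function, a change of one unit of load moves the price far less at high load than at low load, which is the damping effect that the third constraint is intended to provide.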
    (at router $R_i$)
1   record number of marked packets $l_i$
2   calculate price $p_i$, based on load and received charges, $c_j^i$, $j \neq i$
3   calculate LOCAL-PREF values
4   calculate best routes
5   FOR-EACH IBGP session:
6       advertise routes with price $p_i$ to internal peers $R_j$, $j \neq i$
7   FOR-EACH EBGP session:
8       transform price $p_i$ into a (session-specific) charge $c_i^j$, $j \neq i$
9       advertise routes with charge $c_i^j$ to external peers $R_j$, $j \neq i$

Figure 4.1: Basic pricing algorithm.
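The steps of Figure 4.1 can be sketched as a single function executed at one router. This is a simplified illustration for a single destination, not the implementation used in the dissertation: the names `pricing_round`, `price_fn`, `charge_policy`, and `BASE_PREF` are hypothetical, and the mapping from charge to LOCAL-PREF (a constant minus the charge, so that cheaper routes are preferred) is an assumption.

```python
BASE_PREF = 1000  # hypothetical ceiling so that LOCAL-PREF stays non-negative


def pricing_round(marked_packets, received_charges, ibgp_peers, ebgp_peers,
                  price_fn, charge_policy):
    """One pass of the basic pricing algorithm at a single router.

    received_charges: dict route -> charge advertised by the peer offering it
    price_fn:         maps (load, received charges) to a price
    charge_policy:    maps (price, EBGP peer) to a session-specific charge
    """
    load = marked_packets                          # record the load
    price = price_fn(load, received_charges)       # calculate the price
    # Cheaper routes receive a higher LOCAL-PREF (higher is preferred in BGP).
    local_pref = {route: BASE_PREF - charge
                  for route, charge in received_charges.items()}
    best_route = max(local_pref, key=local_pref.get) if local_pref else None
    adverts = {peer: price for peer in ibgp_peers}  # IBGP: advertise the price
    for peer in ebgp_peers:                         # EBGP: advertise a charge
        adverts[peer] = charge_policy(price, peer)
    return best_route, adverts


best, adverts = pricing_round(
    marked_packets=8,
    received_charges={"via-T1": 9, "via-T2": 7},
    ibgp_peers=["R2"],
    ebgp_peers=["S1"],
    price_fn=lambda load, charges: load,
    charge_policy=lambda price, peer: price + 1,
)
```

Keeping `charge_policy` separate from `price_fn` mirrors the distinction drawn in the text between the price, which reflects congestion, and the charge, which reflects local policy.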
A further constraint not considered here is that the revenue that the operator generates from the network should be positive. This may not require that the traffic based charges discussed here actually render a net positive result, since operators may generate revenue through other means, but is certainly something that should be considered in a real deployment. Similarly, if traffic based revenue is to be used to recover sunk costs, this factor might be taken into account when calculating the charge.

In summary, the price should be positive, increase as the load on the router increases, and be related to the charges advertised by external peers. Neither the price nor the policies that might be implemented using it should damage BGP stability, and at a given level of load on the network the price should itself converge to a stable value.
monitoring of traffic between routers and settlement of any difference in the load-charge product.
In fact, companies offering such Internet performance measurement services now exist [Keynote01, Matrix01].
4.3.5 Discussion
There are a number of potential issues with the price path attribute that bear some discussion. They can be split into four: route disaggregation; route oscillation; price oscillation; and interaction with pricing applied at other layers. These will be dealt with in turn.
Route disaggregation is a problem affecting the scalability and resource usage of BGP. It occurs since operators desire greater control over the traffic they carry. Since routeing is currently performed on a longest prefix match basis, the only way for providers to exercise finer grained control over traffic aggregates is to disaggregate prefixes in order to separate traffic aggregates. This allows them (and their neighbouring ASs) to choose different best routes for these smaller aggregates. Such disaggregation has traditionally been avoided at all costs, as it can increase both the number and size of UPDATE messages, and perhaps more importantly, it increases the size of routeing tables. However, methods to automatically create equivalent forwarding tables containing the (provably) minimum number of prefixes exist [Draves99] and can yield a 45% reduction in the number of prefixes in the forwarding table. Simultaneously, the computational power and memory capabilities of routers have dramatically increased. These facts, coupled with the fact that disaggregation of this nature is occurring in any case due to multi-homing, suggest it is not as serious a problem as it might at first appear.

Route oscillation is undesirable for a variety of reasons: it increases the routeing protocol's resource usage; it can cause substantial variation in the path and hence network characteristics that end-to-end traffic experiences; and under heavy load conditions it makes network management more difficult since large quantities of traffic may be moved between links. Section 4.5 will discuss how dynamic routeing based on advertised prices may be implemented so as not to increase the potential for oscillatory behaviour of the protocol. In fact, it can also be speculated that it might actually cause a decrease. It is already known that BGP can suffer from oscillatory behaviour [Labovitz97, Labovitz98, Griffin99, Labovitz00].
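The longest prefix match rule that motivates disaggregation, described above, can be illustrated with a small sketch. The prefixes and next-hop names are made up for the example, and a linear scan over a dictionary stands in for a real forwarding table.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    matches = [(net.prefixlen, hop) for net, hop in table.items() if addr in net]
    return max(matches)[1] if matches else None


# An aggregate routed via AS-A, with one disaggregated half routed via AS-B.
table = {
    ipaddress.ip_network("10.0.0.0/16"): "AS-A",
    ipaddress.ip_network("10.0.128.0/17"): "AS-B",
}
upper_half = longest_prefix_match(table, "10.0.200.1")
lower_half = longest_prefix_match(table, "10.0.1.1")
```

Advertising the more specific /17 is what allows traffic for that half of the aggregate to follow a different best route, at the cost of an extra table entry.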
Previous work on the stability of BGP suggests that instability results from incompatible routeing policies [Griffin99, Varadhan00, Labovitz01]. Incompatibilities arise due to the application of different LOCAL-PREF values to routes from different providers, removing the monotonically increasing metric required to guarantee stability and provided by the AS-PATH length. The addition of a globally valid metric (i.e. the price) that will be monotonically increasing on the majority of paths should reduce the likelihood of instability due to policy incompatibility by restricting the set of implemented policies. A further benefit of pricing in such situations is that any conflicts arising due to prices (for example, two multi-homed ASs disagreeing over which of the two transit ASs should be used as depicted in Figure 4.4(a)) should be resolved precisely in the direction of the party placing the most value in favourable resolution. Finally, it should be noted that the price is being used only as a metric to
decide between multiple available routes to a destination. As a consequence, reachability should be maintained even when routes are oscillating due to prices changing: oscillating prices only affect the choice of route (from many available routes) to a destination, not the reachability of the destination.

It is noted above that price oscillation might also cause a decrease in the stability of the routeing tables. However, a more serious problem from a financial standpoint is how the operator should deal with rapidly fluctuating charges when these are being used as the basis of inter-operator settlement. Since the volume of traffic involved may be large, so might the amounts of money. An ISP could end up significantly over-spending in the interval between a neighbouring AS increasing its price and the ISP being able correspondingly to increase and advertise its price, particularly if UPDATE messages are being rate limited for route stability reasons. Such a situation could arise naturally, or as a result of a (distributed) denial of service attack by customers of the ISP receiving the advertised price; in either case network administrators should be informed through some mechanism. This situation allows a number of possible remedies. To cover all such risks providers might wish to buy futures to protect themselves against such situations. Less heavyweight solutions include capping the size of price increments to prevent one ISP getting too far out of step with others. This could have further beneficial impact with respect to the stability of routes, since such capping could help reduce the frequency at which routes change due to price changes.

The final point to note, but one which is not discussed in detail here, is the complexity of potential interaction between application level and routeing level reaction to marking.
This might be of particular interest where the application is also charging or being charged for marks, or where the application needs to reflect marks it generates back to the user. A standard example of this situation is a web server which may cause many marks to be generated by transmitting requested data toward a user's web-browser. There must be some way that the value the user receives from the marks generated by the web-server can be recovered by the operator of the web-server.

Subsequent sections discuss the implementation and simulation framework, and describe results of some simple simulations.
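The capping of price increments suggested earlier in this discussion can be sketched in a few lines. The function below is an illustrative assumption rather than a mechanism specified in the dissertation: it limits only increases, and lets decreases through unchanged.

```python
def capped_price(previous, target, max_step):
    """Advertise target, but never more than max_step above the previous price."""
    return min(target, previous + max_step)


# A sudden jump in the computed price from 10 to 25 is spread over several
# advertisements when increments are capped at 5 units.
advertised = [10]
for _ in range(4):
    advertised.append(capped_price(advertised[-1], 25, 5))
```

A cap of this kind bounds how far one ISP's advertised price can move between successive UPDATE messages, which also limits how often the price change alone can alter a neighbour's choice of route.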
4.4 Implementation
This section discusses the implementation work carried out to validate the designs presented in this chapter. A BGP simulator is presented with an associated simulation description language. The following section presents the
[Figure: BGP daemon, OSPF daemon, RIP daemon, and ZEBRA.]
in the BGP daemon so that it would explicitly bind to a local address. By then using the facility in Linux for virtual IP interfaces, multiple copies of the ZEBRA and BGP daemons could be instantiated on a single machine, and could communicate with other instances, each instance believing itself to be running on an independent router. A discrete event simulator harness was written to enable the instantiation of numerous BGP daemons within a single Unix process. These instances run in the same way as the standard daemons, using the BSD sockets API to communicate and executing the standard BGP route management and preference code; the discrete event harness deals with scheduling the BGP instances. Finally, the ZEBRA daemon was modified to log rather than modify the kernel forwarding tables.
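The scheduling role of a discrete event harness can be illustrated with a minimal event loop. This is a generic sketch and not the harness written for the dissertation: the class and method names are hypothetical, and the callbacks merely append strings to a log rather than driving real BGP instances.

```python
import heapq
import itertools


class EventScheduler:
    """Minimal discrete event loop: run callbacks in simulated-time order."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def schedule(self, delay, callback, *args):
        """Queue callback(*args) to run `delay` time units after the current time."""
        heapq.heappush(self._queue,
                       (self.now + delay, next(self._counter), callback, args))

    def run(self):
        """Pop events in time order, advancing the simulated clock to each one."""
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)


log = []
sched = EventScheduler()
sched.schedule(5.0, log.append, "B: keepalive")
sched.schedule(1.0, log.append, "A: open")
sched.schedule(1.0, log.append, "B: open")
sched.run()
```

Events with equal timestamps run in the order in which they were scheduled, so message ordering in such a harness is deterministic but depends on scheduling order.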
<SimSpec>      ::= <BaseConfig>+ [<ZebraConfig>+] [<BgpdConfig>+]
<BaseConfig>   ::= base [ configfile | logfile ] = FileName
                 | base port = PortNumber
                 | base [ presleep | postsleep ] = SleepTime
<ZebraConfig>  ::= zebra debugging = <ZebraDebug>
<ZebraDebug>   ::= events | packets
<BgpdConfig>   ::= bgpd debugging = <BgpdDebug>+
                 | [ bgpd | IPAddress ] <BgpdTimer> = TimerValue
                 | ASn contains IPAddress+
                 | ASn advertises IPSubnet+
                 | IPAddress peers IPAddress
                 | IPAddress time <BgpdTimeEvt>
<BgpdDebug>    ::= events | filter | fsm | load
<BgpdTimer>    ::= holdtime | checkload | keepalive | connect
<BgpdTimeEvt>  ::= TimerValue [ withdraws | advertises ] IPSubnet
The result is the simulation of the distribution of load throughout the network by a real implementation of the routeing protocol. Although the simulation of load makes these simplifying assumptions about the behaviour of traffic in the network, the state machine and routeing protocol behaviour are not simplified in any way.
4.5 Results
Results for three simple scenarios are now presented and discussed. In each case the stub ASs do not carry transit traffic, and so carry a total load equal to the number of other ASs in the topology (the load that they sink) plus the total number of ASs in the topology (the load that they source). In all simulations, each AS contains only one router; the effect of pricing on IBGP operation is not addressed here.

Pricing was applied using the same price and charge mappings for all nodes: $p_i = l_i$, $c_i^j = p_i + c_j^i$, where $c_j^i$ is the charge advertised to node $N_i$ by the node to be used as best route, $N_j$. The charge for a route was mapped linearly into the LOCAL-PREF, causing the route selection policy to be the most obvious dynamic policy, prefer the cheapest route.

It should be noted that ZEBRA version 0.91a, on which the simulator was based, contains a modification to the standard BGP route selection process to reduce route flap: rather than always breaking ties using the BGP identifier with the lowest value winning, it prefers the first received route. The effects of this are discussed in the presentation of results for each scenario. The control results presented in Figures 4.5(a), 4.6(a), and 4.7(a) have this modification removed so as to follow the published BGP specification. The modified results presented in Figures 4.5(b), 4.6(b), and 4.7(b) have both this ZEBRA modification and pricing applied. Additional experiments were carried out for the cases where pricing is applied without the ZEBRA modification, and where the ZEBRA modification is applied without pricing. Results from these experiments are discussed in the text, but the results themselves are not presented.
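The charge mapping and the two tie-breaking rules described above can be sketched as follows. The function names and the representation of a route as an (identifier, charge) pair are assumptions made for illustration; in particular, "first received" is modelled simply by the order of the candidate list.

```python
def advertised_charge(load, best_route_charge):
    """Charge mapping used in the simulations: own price (the load) plus the
    charge received for the route selected as best."""
    return load + best_route_charge


def select_route(candidates, first_received=True):
    """Pick the cheapest route and break ties between equally cheap routes.

    candidates: list of (bgp_identifier, charge) pairs, in order of receipt.
    first_received=True models the ZEBRA modification; False models the
    specification's rule that the lowest BGP identifier wins.
    """
    cheapest = min(charge for _, charge in candidates)
    tied = [ident for ident, charge in candidates if charge == cheapest]
    return tied[0] if first_received else min(tied)


# Two equally priced routes, the one from identifier 2 having arrived first.
routes = [(2, 8), (1, 8)]
zebra_choice = select_route(routes, first_received=True)
spec_choice = select_route(routes, first_received=False)
```

The two rules differ only when charges are exactly equal, which is precisely the situation that arises in symmetric topologies.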
[Figure 4.4: (a) Scenario 1: basic multi-homing. (b) Scenario 2. (c) Scenario 3.]
[Plots of per-node load (dimensionless) against time (s): (a) Unmodified BGP; (b) Modified BGP.]
Figure 4.5: Per-node load distributions for Scenario 1 shown in Figure 4.4(a).
The large spike to 13 units in the initial portion of the graph is effectively an artifact of the message ordering imposed by the scheduling of the discrete event simulator harness. Due to the ordering of BGP messages, S1, S2, and T2 discover T1 in advance of the direct routes to each other. Consequently, they all initially use T1 to reach each other. As the simulation progresses, BGP information propagates through the network and the routers make more reasonable routeing choices, leading to the final load distribution shown.

With pricing and no ZEBRA modification, the system persistently oscillates. Application of pricing does cause the balanced distribution to be reached. However, since both the topology and load distributions are symmetric in this simulation, the prices are also equal and hence so are the LOCAL-PREF values. As a consequence the BGP decision process passes over the LOCAL-PREF attribute, and the tie is broken in favour of the router with the lowest BGP identifier, T1. This causes the price on T1 to increase, leading to one or both of S1 and S2 ceasing to prefer it for transit to the other. This results in one of two cases: either T2 is preferred by both S1 and S2, or T2 is preferred by only one of S1 and S2. In the first case, the price advertised by T2 becomes higher than that advertised by T1. In the second case, the prices advertised by T1 and T2 become equal, which leads again to the symmetric situation where the LOCAL-PREF values are equal, and so the tie is broken in favour of T1. In either case, it can be seen that the system will continue to oscillate.

With pricing and the ZEBRA modification applied, the system always stabilises to a balanced distribution of load, with both T1 and T2 carrying 8 units as shown in Figure 4.5(b).
However, the convergence time is approximately double that in the control case, and the process involves approximately 4 times as many BGP messages. Additionally, the choice of route between S1 and S2 is non-deterministic: the route from S1 to S2 may involve either T1 or T2; this may be made deterministic by application of policy through the price-to-charge mapping. For example, if S2 wished to encourage traffic to travel towards it via T2, it could do so by making the charge it advertised to T1 higher than that advertised to T2 for a given price. Assume that both T1 and T2 implement some rational policy such as "choose cheapest route" and are taking account of charges advertised to them in setting their own prices, as in these simulations. The effect will be that S1 will prefer to use T2 to reach S2, since the charge advertised by T2 to S1 for the prefixes associated with S2 will be lower than that advertised by T1.
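The effect of the tie-breaking rule in Scenario 1 can be reproduced in a deliberately small toy model. It is an illustration only and not the simulator described in Section 4.4: the two stubs update strictly in turn, prices are seen instantly, and each transit's price is assumed to be a base load of 7 units plus one unit per stub routed through it, so that the balanced state has 8 units on each transit as in the text.

```python
def simulate(sticky, steps=20):
    """Toy model of Scenario 1: stubs S1 and S2 pick transit T1 or T2 in turn.

    sticky=True keeps the route already in use when prices tie (in the spirit
    of the ZEBRA modification); sticky=False lets the lowest identifier win.
    Returns the per-step route-change flags and the final choices.
    """
    choice = {"S1": "T1", "S2": "T1"}
    changes = []
    for step in range(steps):
        stub = ("S1", "S2")[step % 2]
        price = {t: 7 + sum(1 for c in choice.values() if c == t)
                 for t in ("T1", "T2")}
        cheapest = min(price.values())
        tied = [t for t in ("T1", "T2") if price[t] == cheapest]
        if sticky and choice[stub] in tied:
            new = choice[stub]      # keep the route already in use
        else:
            new = min(tied)         # lowest identifier wins the tie
        changes.append(int(new != choice[stub]))
        choice[stub] = new
    return changes, choice


spec_changes, _ = simulate(sticky=False)
zebra_changes, zebra_choice = simulate(sticky=True)
```

In this toy model the lowest-identifier rule keeps pulling a stub back to T1 whenever prices are equal, so routes never stop changing, whereas the sticky rule settles after a single change with one stub on each transit.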
[Plots of per-node load (dimensionless) against time (s): (a) Unmodified BGP; (b) Modified BGP.]
Figure 4.6: Per-node load distributions for Scenario 2 shown in Figure 4.4(b).
points to be achieved, dependent on the ordering of the BGP messages. Similarly, with pricing and no ZEBRA modification, the default BGP tie-breaker causes any stable, balanced allocation achieved to be destroyed, inducing persistent oscillation as described for Scenario 1.

Results when pricing is applied are shown in Figure 4.7(b). Here, the imbalance in loads experienced by T1, T2, and T3 causes different prices to be advertised. This results in all three of these transit nodes finishing with 15 units apiece. Again, the convergence time approximately doubles, and the number of BGP messages increases by a factor of 13 from approximately 240 to 3200. A more interesting feature of this result is that the approach to the stable point is much smoother. This is caused by the modifications to the pricing algorithm discussed in more detail in Section 4.5.4. Essentially, the increase in the number of nodes (and hence routes in the network) and the more conservative load-shedding policy cause each node to react less violently to an alteration in the distribution of load around the network. This causes the system to behave more smoothly.
4.5.4 Discussion
The results above have demonstrated that it is possible to achieve route stability and a more efficient distribution of load using BGP with pricing and the ZEBRA modification. However, a number of issues became clear in the course of testing the simulator and running these experiments. These principally affect the traffic model used and the policies applied when redistributing load and are discussed below.

First, even in the stable cases presented above, the number of BGP messages increases as change in the load causes change in the price. These changed prices must then be advertised to peers, requiring BGP messages. With more realistic sizes of network and routeing tables this might become a problem and so deserves further investigation.

Second, correct choice of which routes to move to the cheaper AS can be difficult. If an AS advertises a reduction in its charge, the natural reaction is to cause as many routes as possible to use that AS as transit. However, doing so can increase the load on that AS to the extent that the price reduction is destroyed, and replaced by a price increase. This can cause the AS receiving the advert to now choose to move its routes back, resulting in needless route flap. This problem can be addressed in two ways. Firstly, the assumption that each AS sources traffic from only one prefix means that BGP has no flexibility over how much traffic to shift: it must move all or nothing. In a real deployment, a single AS is unlikely to both source sufficient traffic and do so toward a single prefix to cause this effect; where such a situation occurs, dynamic SLAs can be considered inappropriate without application of other techniques such as disaggregation. Secondly, route flap damping [RFC 2439] can be used to rate limit adverts in such situations.

[Plots of per-node load (dimensionless): (a) Unmodified BGP; (b) Modified BGP.]
Figure 4.7: Per-node load distributions for Scenario 3 shown in Figure 4.4(c).

[Panels (a) to (f): prices at S1, S2, T1, and T2.]
Figure 4.8: Example of persistent oscillation.

Finally, simulation of Scenarios 1 and 3 demonstrated an issue with the algorithm as presented in Figure 4.1. It is possible for the routers to persistently oscillate, particularly in the more symmetric topologies. An example using Scenario 1 is shown in Figure 4.8. Although the optimum distribution has been reached in Figure 4.8(a) with 8 units through each transit node, old UPDATE messages still propagating through the network cause this distribution to be unstable. The current price at a node is shown in black, and old price information still propagating through the network is shown in grey. Figure 4.8(a) shows that the balanced state has been reached, with a price of 8 units advertised by each transit node. However, due to out-of-date information still propagating through the network, S1 and S2 come to believe that the current prices are 7 for T1 and 9 for T2. This causes them to change their preferred route
choice as shown in Figure 4.8(b). Due to the ZEBRA modification, this is unaffected by receipt of the now out-of-date prices of 8 units for T1 and T2 in Figure 4.8(b). In Figure 4.8(c) S1 and S2 receive the now correct prices of 9 for T1 and 7 for T2. In this example, this causes both S1 and S2 to change their preferred routes to use T2 rather than T1. This leads to the situation shown in Figure 4.8(d). Subsequently, Figure 4.8(e) shows S1 and S2 detecting the new prices of 7 units for T1 and 9 units for T2. This time S2 decides to change its preferred route, and does so such that the effects of this change reach S1 before S1 next makes its route preference choices. This causes the prices at T1 and T2 to become equal at 8 units, shown in Figure 4.8(f), and the oscillation may repeat.

Although this synchronization can be destroyed simply due to the timing of BGP messages, a mechanism that guarantees to break up this synchronization is required. This is achieved by making two modifications to step 5 of the algorithm. Firstly, the number of routes that may have their LOCAL-PREF altered on the basis of a change in price is limited, in this case to one. This implements a more conservative load shedding policy; changes in price are still re-advertised as soon as they are processed. Secondly, modification of the LOCAL-PREF is only allowed to take place after a delay proportional to the maximum AS-PATH length in the network. These changes attempt to ensure that changes in price have a chance to propagate throughout the network so that routes are not changed on the basis of out-of-date prices, to prevent situations such as that shown in Figure 4.8 occurring. An alternative, less pessimistic, scheme would be to choose the delay randomly from [0, n] where n is proportional to the diameter of the network; this should decrease convergence times while still preventing synchronisation.
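The two modifications, together with the randomised alternative, can be sketched as a pair of helper functions. The function names, the `unit` parameter (the assumed constant of proportionality), and the use of a uniform distribution for the random delay are illustrative assumptions rather than details taken from the dissertation.

```python
import random


def pref_update_delay(network_diameter, unit=1.0, randomised=True, rng=random):
    """Delay before LOCAL-PREF may be altered, bounded by n = unit * diameter.

    randomised=False gives the fixed, pessimistic delay n;
    randomised=True draws the delay uniformly from [0, n].
    """
    n = unit * network_diameter
    return rng.uniform(0.0, n) if randomised else n


def routes_to_move(candidate_routes, limit=1):
    """Conservative load shedding: alter at most `limit` routes per price change."""
    return candidate_routes[:limit]


fixed = pref_update_delay(4, unit=2.0, randomised=False)
jittered = pref_update_delay(4, unit=2.0, rng=random.Random(0))
moved = routes_to_move(["prefix-a", "prefix-b", "prefix-c"])
```

Drawing independent random delays means that two routers reacting to the same price change are unlikely to alter their routes at the same instant, which is what breaks the synchronisation.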
Notwithstanding these issues, the results presented do demonstrate that it is possible to implement pricing in BGP such that the protocol converges to a more even distribution of traffic through the network. The resulting distribution can be controlled according to policies considered desirable by the network operator.
4.6 Summary
This chapter discussed management timescale approaches to traffic engineering in the Internet. It began by considering the scope of management timescale traffic engineering, noting that user utility maximisation and network congestion control are more appropriately achieved using data and control timescale approaches to traffic engineering. It continued by considering inter-AS routeing and pricing, and how current practice relates them to traffic engineering. Subsequently, a new path attribute was proposed and its detailed design presented. Finally, the implementation of a BGP simulator was described, and initial evaluation of the new path attribute performed in three simple scenarios.

Although detailed evaluation is beyond the scope of this thesis, the simulations gave some insight into the possible behaviour of the protocol when extended with the new path attribute. This insight was used to successfully modify the algorithm; however, it is clear that further investigation is required here. Detailed investigation of different pricing and charging regimes is necessary before deployment could be considered. Similarly, the interactions when different ASs use different pricing and charging functions are unknown and require investigation. The final area for further investigation concerns more operational details of BGP: the behaviour of IBGP with pricing, and algorithms for combining IBGP advertised prices to achieve a price for the AS, should be studied. Implementing more complex dynamic policies involving quality estimates of neighbouring ASs, interaction between ASs applying different policies, and interactions between static and dynamic policies should also all be studied further.

This chapter and the preceding chapter have presented two mechanisms for performing traffic engineering at control and management timescales in the Internet. The following chapter now considers how and why these mechanisms, along with data timescale traffic engineering mechanisms, could be deployed in the Internet, and the effects of such deployment.
Chapter 5
5.1
This section considers the requirements that the various participants in the network have with respect to traffic engineering.
in a manner that maximises the amount of traffic carried whilst attaining the levels of service users desire. This requires that protocols receive timely usage information so that correct traffic distribution decisions may be taken subject to the constraints imposed by users. Without accurate and timely usage information, no routeing protocol is able to make routeing decisions that correctly balance traffic through the network. Furthermore, in order for traffic to be efficiently distributed through the network, it seems clear that rich peering between network operators should be encouraged, as should the accurate expression of users' desires. Given the problems of effectively managing networks using existing protocols, the incentives for rich peering are not strong. As discussed in Chapter 1, there are also few mechanisms for customers to express their desires clearly to the network.
The first is through transit agreements. In this case the smaller operator becomes a customer of the larger, with the larger operator agreeing to advertise routes to and from the smaller so that traffic can be routed to and from the smaller operator's IP addresses. The second way that operators interconnect is peering, controlled by the SLAs (SERVICE LEVEL AGREEMENTS) into which the operators enter.

In both cases arrangements are managed via SLAs. These are legal, rather than technical, agreements and have two forms: SLAs for bilateral private peering arrangements between two operators who wish to exchange traffic; and multilateral peering arrangements between groups of operators all peering together at some exchange point. They specify the requirements that each party places on the other, the service that each party will provide to the other, any costs a party may incur, and the grounds on which a party may terminate the SLA. A number of SLAs are publicly available [SLA-SPRINT00, SLA-GIGABELL01, SLA-GENUITY01, SLA-LEVEL3, SLA-UUNET00, SLA-UUNET01, SLA-MAE01], and the requirements stated in these agreements can be roughly classified as follows:

Operational support
Operational support covers the more mundane details of interconnecting networks, such as suitable access by the respective operators to the peering point and machines, 24×7 staff support at the network operations centre, rack space for installation of equipment, and power supply.

Network size
Network size covers specification of the geographic diversity of the network, often in terms of a minimum number of peering points in the region covered by the peer, and interconnection bandwidth available at those peering points.

Network capacity
This concerns the network's capacity in the region under consideration, in terms of the network's bandwidth (as opposed to the interconnection bandwidth referred to above), and the maximum allowed average busy hour load.
For example, the Worldcom-UUnet [SLA-UUNET00, SLA-UUNET01] agreement specifies connectivity with at least 50% of the peering points in the relevant region (at least 15 states in the US, at least 8 countries in Europe, or at least 2 countries in Asia-Pacific); fully redundant backbone at speed dependent on the region (622 Mb/s in the US, 45 Mb/s in Europe, and 12 Mb/s for Asia-Pacific); and maximum utilization of not more than 50% during the average busy hour.

Total ingress/egress traffic
Total ingress/egress traffic refers to the amount of traffic to be exchanged under the agreement. Since these are peering agreements this usually limits the ratio of ingress to egress traffic so that the imbalance is not too high. It also often includes some statement about the minimum rate of traffic to be exchanged. For example, the same Worldcom-UUnet agreement specifies 40 Mb/s minimum traffic exchange, and that the ratio of traffic exchanged not exceed 1:1.5 in either direction.

Route control
Route control concerns the manner in which route information to and from the two peers will be treated. Route exchange between peers is via BGP, although different operators may choose to use different protocols as their IGP. Some operators do place constraints on the policy to be expressed through the IGP, such as shortest exit policy, requiring that the transmitting AS route traffic to the exit closest to the receiver. Also specified are policies concerning redistribution and use of routes, and other routeing support. For example, Genuity [SLA-GENUITY01] mandate that their peers support IP's loose source record route option at the edges of the network.
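Terms of the kind quoted above lend themselves to automated checking. The sketch below tests measured traffic against the three numeric terms from the Worldcom-UUnet example (40 Mb/s minimum exchange, a ratio no worse than 1:1.5, and at most 50% busy hour utilisation). The function and its parameter names are hypothetical, and applying the minimum to each direction separately is an assumption, since the text does not say whether it applies per direction or in total.

```python
def check_peering_sla(ingress_mbps, egress_mbps, busy_hour_utilisation,
                      min_exchange_mbps=40.0, max_ratio=1.5,
                      max_utilisation=0.5):
    """Return the list of violated terms (empty if the peer complies)."""
    violations = []
    low, high = sorted((ingress_mbps, egress_mbps))
    # Assumption: the minimum applies to the smaller of the two directions.
    if low < min_exchange_mbps:
        violations.append("minimum traffic exchange")
    # The larger direction may be at most max_ratio times the smaller.
    if high > max_ratio * low:
        violations.append("traffic ratio")
    if busy_hour_utilisation > max_utilisation:
        violations.append("busy hour utilisation")
    return violations


compliant = check_peering_sla(50.0, 70.0, 0.4)
unbalanced = check_peering_sla(50.0, 80.0, 0.4)
```

A check of this kind covers only the measurable traffic terms; operational support and route control requirements would still need separate verification.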
As described in Section 2.4, routeing in the Internet is currently performed principally by three protocols: OSPF or ISIS for routeing within an AS, and BGP for routeing between ASs. The principal mechanism available to operators to allow them to implement the SLAs they enter with other operators is therefore BGP. Correspondingly, OSPF and ISIS are the principal mechanisms by which operators can manage traffic within their networks to ensure that, both whilst it is under their control and at the point that it exits their control, it is being treated in a manner which meets applicable SLAs.

Although the original specification of OSPF [RFC 1583] included support for calculation of separate routes based on the IP TOS byte, this has since been removed [RFC 2178], due to a lack of requisite implementation experience. The current specification uses assignment of metrics to paths to compute shortest paths, but a given path is only allowed a single metric. This prevents separate treatment for different traffic types. When entering into multiple varied SLAs with many other operators, it is likely to be desirable for operators to have the ability to apply different treatment to traffic from different operators.

The DIFFSERV proposals enable operators to treat traffic differently, and to use these different treatments in the specification of SLAs. However, DIFFSERV is only intended to allow the use of forwarding and queueing behaviour at nodes to differentiate between traffic; routeing treatment of traffic is intended to be unaffected.

BGP allows different prefixes to have different preferences within an AS, but provides no way for a receiving AS to advertise a cost to a transmitting AS for carrying its traffic. The MULTI-EXIT-DISCRIMINATOR path attribute can be used by the receiver to influence the transmitting AS's choice of entry point into the receiving AS; however this can be, and often is, ignored by the transmitting AS since it has no incentive to trust it.
As described in Section 4.2.1, the most common way of achieving the desired effect is for the receiving AS to prepend multiple copies of its own AS number to the ASPATH when it advertises the prefix to its peers.
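The effect of prepending can be illustrated with a deliberately reduced model. The sketch below assumes that the transmitting AS compares only AS path length, which is a simplification: the real BGP decision process applies other criteria, such as local preference, before path length. The function names, the entry point labels and the AS number (taken from the private-use range) are illustrative assumptions.

```python
def prepend(as_path, own_as, extra_copies):
    """Return a copy of as_path with own_as prepended extra_copies more times."""
    return [own_as] * extra_copies + list(as_path)


def shortest_path_entry(advertisements):
    """Return the entry point whose advertised AS path is shortest.

    Simplification: only path length is compared.
    """
    return min(advertisements, key=lambda entry: len(advertisements[entry]))


RECEIVER_AS = 64512  # private-use AS number, chosen for illustration
plain = [RECEIVER_AS]

# The receiving AS advertises the same prefix at two entry points and
# lengthens the path at the one it wants the sender to avoid.
advertised = {
    "west": prepend(plain, RECEIVER_AS, 2),  # made less attractive
    "east": plain,                           # left unmodified
}
chosen = shortest_path_entry(advertised)
```

With these inputs `chosen` is the unpadded entry point. As with the MULTI-EXIT-DISCRIMINATOR, nothing obliges the transmitting AS to honour this hint.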
5.2.3 Discussion
Operators specify a number of requirements in SLAs when entering peering arrangements. As the preceding discussion notes, there are currently few mechanisms available for the implementation of such agreements. Although BGP allows operators some ability to implement policy between ISPs and the DIFFSERV proposals enable individual ISPs to differentiate between traffic at individual nodes, these mechanisms are unsatisfactory.

It is difficult to automate the implementation of policy within BGP, and manual implementation is prone to error and to potential conflicts between ISPs leading to persistent oscillation of routeing tables. Attempts to provide policy repositories where operators register the policies they wish to implement using a recognised policy specification language have proved only partially successful, and do not address the problems of automation of policy implementation.

Correspondingly, although DIFFSERV addresses the problem of enabling individual nodes to implement differential packet forwarding, it explicitly does not specify how such differentiation should be implemented. Furthermore, it does not address the translation of DSCPs (DIFFERENTIATED SERVICES CODE POINTS) at network boundaries, leaving it to bilateral agreements between operators as to how traffic sporting a particular DSCP should be treated by the receiving network. The DIFFSERV proposals also do not consider how such agreements are to be implemented and managed.

Finally, MPLS (MULTI-PROTOCOL LABEL SWITCHING) grants greater control over traffic aggregates to operators, allowing the implementation of finer grained SLAs. Consequently, a mechanism to automate the parameterization and settlement of such SLAs is valuable; use of BGP as an LDP for MPLS enables the price path attribute to be used in MPLS networks.
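The DSCP translation that DIFFSERV leaves to bilateral agreement amounts to a re-marking table applied at the network boundary. The following sketch shows one hypothetical agreement; the code point values are the standard ones for expedited forwarding, AF41, AF11 and the default behaviour, but the mapping itself and the function name `remark` are assumptions made purely for illustration.

```python
# Standard DSCP values.
EF, AF41, AF11, BEST_EFFORT = 46, 34, 10, 0

# Hypothetical bilateral agreement: how the receiving network re-marks
# traffic arriving from one particular neighbour.
AGREEMENT = {
    EF: AF41,    # the neighbour's premium traffic gets an assured class here
    AF41: AF41,  # carried unchanged
}


def remark(dscp, agreement=AGREEMENT, default=BEST_EFFORT):
    """Return the DSCP the receiving network applies to an arriving packet.

    Code points not covered by the agreement fall back to the default.
    """
    return agreement.get(dscp, default)
```

In this sketch a packet marked `EF` by the neighbour is carried as `AF41`, and one marked `AF11`, which the agreement does not mention, is carried as best effort. How such a table is negotiated and kept consistent between operators is exactly the management problem the text identifies.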
5.3 Deployment
This section discusses the use of the mechanisms presented in Chapters 3 and 4 to implement SLAs. These mechanisms not only allow more flexible SLAs to be specified, but also allow the automation of their management. Before any such implementation might be undertaken, aspects such as engineering the code, interfacing with network management tools, and so on would have to be dealt with. Although important, such details are not considered further here as they are not relevant to the thesis being presented.
5.3.1 Disincentives
Objections to the deployment of the mechanisms discussed in Chapter 3 focus on the end-to-end nature of the Internet. It is generally believed that the Internet should only operate at the packet level, and as such, interior nodes should not consider traffic at any other granularity.[1] Flow admission schemes are considered inappropriate since they place extra computation and state within the network, and it is assumed that doing so will undermine the Internet's scalability. Furthermore, since flow admission control must involve the denial of access to the network to some flows, such schemes also violate the assumption of connectivity through the network. Finally, it is assumed that flow admission mechanisms will increase the management effort required by the network operator, making the network more expensive to run.

[1] It should be pointed out that the restrictions this places on the functionality of the Internet, particularly in terms of accountability, have been noted for some time [Clark88].

Implementation of the mechanisms discussed in Chapter 4 requires two things: first, the deployment of the technology presented; and second, co-operation at a management level between providers. The technological issues consist of design and implementation of suitable pricing and charging functions. This is a relatively large problem, but it is hoped that elements of Chapter 4 go some way towards a solution.

Potential disincentives toward deployment of these mechanisms can be divided into two categories: technological and managerial. The basis for the technological disincentives is that routeing in the Internet is an extremely complex, ill-understood system. It is implemented over a wide variety of platforms, both in terms of the hardware and software used in end-systems and routers, and in terms of the support systems in place to deal with issues such as billing and traffic monitoring. This makes controlled deployment of alterations to the structure of the Internet difficult and costly, as evinced by the problems faced in the largely abortive deployment of RSVP, and the continuing deployment problems faced by IPv6 and multicast. In particular, any modification to BGP that may increase fragmentation of the IP address space is seen as unreasonable.

These problems lead in turn to the managerial disincentives. Many people do not see the need for the capabilities offered by improved traffic engineering. They claim that over-provisioning of the network is sufficient for its foreseeable future uses, and point to the failure of modifications like RSVP to provide sufficient benefit to outweigh the associated costs.
As well as the obvious costs associated with any such upgrade, there are hidden costs such as the retraining of support staff, the modification of support systems and so on. It also seems to be the case that many people have more emotional reasons for avoiding pricing and charging for the Internet, believing that it should be a free service for all. Finally, Metcalfe's Law observes that the utility of a network tends to increase as the square of the number of participants. Consequently, deployment of a network-wide change struggles: its utility is not obvious while it is used by only a small number of the network participants. In particular, if there is a high entry cost associated with starting to use the new network, there is a catch-22 situation. Whilst there are only a small number of users the cost of joining is high, and the benefits of joining are perceived to be low. This applies especially to the Internet given its current structure where traffic may
5.3.2 Incentives
The incentives for deployment of the mechanisms presented in Chapter 3 focus on network performance and service differentiation. There are a number of performance benefits to be gained by flow management in the Internet, from the points of view of the user and the network operator. Most fundamentally, allowing the network to deny access to flows gives it another mechanism to deal with congestion. This can help to prevent congestion collapse situations where elastic connection oriented protocols, principally TCP, are not elastic enough and would be forced into a bandwidth region in which they cannot usefully operate.

Furthermore, by restricting the number of flows in the network the operator can provide what might be termed soft bandwidth partitioning. Since users are generally expected to run compliant implementations of protocols, in many cases simply restricting the number of active flows of a given protocol can be enough to provide a soft guarantee of the service each flow will receive. This allows users more freedom to specify the value they are placing on a particular use of the network.

Finally, as previously stated, by allowing users to be more explicit about their requirements from the network, and by allowing network operators to better control the flow of traffic through their networks, billing and management should become more straightforward. As data transfer is predominantly flow based, giving operators easy access to the value and duration of a flow gives them the information required to provide more flexible billing.

Incentives for the deployment of the mechanisms presented in Chapter 4 split into two parts: those applicable to small ISPs, who typically do not currently enter into peering arrangements; and those applicable to large ISPs, who typically already enter into peering arrangements. The principal incentive for a smaller ISP to deploy the mechanisms proposed in this dissertation is to enable them to provide increased service differentiation.
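The idea of soft bandwidth partitioning described above can be made concrete with a counter that caps the number of active flows. This is a sketch only, and it is not the implicit admission control scheme of Chapter 3, which bases its decisions on measured traffic: here the cap is a fixed number, and the class and method names are hypothetical. The soft guarantee relies on the assumption, stated in the text, that flows run compliant protocols and so share the link roughly equally.

```python
class FlowLimiter:
    """Admit flows until a fixed cap is reached (hypothetical helper)."""

    def __init__(self, capacity_mbps, max_flows):
        if max_flows < 1:
            raise ValueError("max_flows must be at least 1")
        self.capacity_mbps = capacity_mbps
        self.max_flows = max_flows
        self.active = set()

    def admit(self, flow_id):
        """Return True and record the flow if there is room, else False."""
        if flow_id in self.active:
            return True
        if len(self.active) >= self.max_flows:
            return False
        self.active.add(flow_id)
        return True

    def release(self, flow_id):
        """Forget a flow that has finished."""
        self.active.discard(flow_id)

    def floor_share_mbps(self):
        """Equal share per flow when the cap is fully used.

        Assumes compliant flows divide the capacity evenly; this is the
        'soft guarantee', not a reservation.
        """
        return self.capacity_mbps / self.max_flows


limiter = FlowLimiter(capacity_mbps=10.0, max_flows=2)
first = limiter.admit("a")
second = limiter.admit("b")
third = limiter.admit("c")  # refused: the cap has been reached
```

With a 10 Mb/s link and a cap of two flows, each admitted flow can expect about 5 Mb/s under the equal-sharing assumption, and a third flow is refused until one of the others is released.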
Since communication networks generally grow in utility with size, for a small ISP to be successful it should be more aggressive in the services it offers to counter-balance the problem of its small size. By deploying mechanisms such as those discussed in Chapters 3 and 4, greater service differentiation may be offered to users. Furthermore, in cases where the ISP really only exists to provide connectivity for a single content provider, the ability to ensure the quality of the content distribution channel is valuable and provided by mechanisms such as those presented in Chapter 4. Finally, by making peering more automated and manageable, it becomes feasible for smaller ISPs to form co-operative groups able to leverage their aggregate size to satisfy the requirements discussed in Section 5.2.1.

The case for larger ISPs to deploy such mechanisms is more subtle. Although those who also act as user facing ISPs may benefit from the admission control techniques suggested in Chapter 3, the techniques of Chapter 4 are of equal relevance. Whilst these techniques do allow more effective competition from the smaller ISPs, they should provide a reduction in management costs, both in terms of administration of SLAs and in terms of the operational management costs of running the network. Furthermore, should co-operatives of smaller ISPs form, it becomes beneficial for the larger ISPs to peer with them; this increases the benefit of reducing the costs associated with peering. Similar incentives concerning control over traffic leaving the ISP's network also apply.
5.3.3 Discussion
In response to the perceived problems with per-flow admission control, it should be noted that whilst it is true that the Internet provides end-to-end connectivity, as soon as traffic crosses administrative boundaries this is all that it provides. Any provision for QOS in the Internet therefore requires a process of discussion and agreement between the administrators of the networks over which the service is to be provided. Per-flow control need have neither excessive state nor computation requirements, as demonstrated in Sections 3.3 and 3.5. Furthermore, admission control should only cause a flow to be dropped where it was likely that the flow would achieve such low bandwidth as to be of no use to the user. Finally, admission controllers such as the MTK controller evaluated in Section 3.4 have simple parameterizations that can be easily understood and tuned by operators.

Deployment of these mechanisms is straightforward. It simply requires that the ISP identify any bottleneck links they may have, and install the admission control device at the relevant points. The only difficult aspect of this deployment is the choice of estimator and parameterization for the admission controller. As discussed in Section 3.4.3, the MTK estimator is probably not the most suitable estimator to use here; investigation of more suitable estimators and feedback schemes to dynamically parameterize them is left as an area for further research.

Fundamentally, the response to the disincentives related to the mechanisms presented in Chapter 4 is simply that many of the problems posed in the
previous section are already being faced by operators, and must therefore be dealt with. Problems associated with operational system support are becoming more important as more people make use of the Internet, and particularly as more people and businesses start to rely on it as a core infrastructure service. Increased support within the Internet infrastructure to deal with such problems is becoming a necessity.

As discussed in Chapter 1, simple over-provisioning of the network is not a satisfactory solution. Consumer demand and expectations rise in line with the increased capabilities of the technology, and similarly show no signs of slowing down. Furthermore, over-provisioning is not possible everywhere. Although network technologies such as dense wavelength division multiplexing do allow for huge bandwidths in the core of the network, and thus over-provisioning may be reasonable for certain core network providers, bottlenecks will still exist elsewhere. Additionally, not only is the available bandwidth not homogeneous throughout the network, but neither is the traffic load. Even with future technologies, it may not be possible, and certainly not financially reasonable, to ensure global over-provisioning. Since traffic demand can change dramatically from day to day, and even from hour to hour, having mechanisms to deal with such fluctuations seems valuable.

The technical problems of controlled and incremental deployment can be addressed through techniques and tools such as the BGP simulator presented in Section 4.4.1. This allows for such modifications to be tested, at least in part, before deployment need begin. Similarly, pre-deployment testing of modifications such as those presented in Chapter 3 is well-established through the use of tools like NS. Using such tools, operators can attempt to gain some confidence in their proposed policies before deployment.
All of the techniques discussed in Chapters 3 and 4 (implicit admission control, the RTP-ECN-proxy, and the price path attribute) allow for straightforward incremental deployment. Chapter 3 demonstrated that implicit admission control need not adversely affect user applications. Such techniques are also most naturally implemented at particular bottlenecks at the edges of the network, where other middle boxes [Carpenter01] such as firewalls are currently deployed. These places are generally under the control of a single administrative domain, and thus the deployment of implicit admission control need not require co-operation between operators. The extensions to BGP are naturally incremental in that path attributes are intended precisely for extending the protocol whilst retaining interoperability with prior versions.

Consequently, none of the objections to the mechanisms presented in Chapters 3 and 4 are so strong as to make deployment unreasonable. Furthermore, the incentives presented give positive reasons why deployment of these
mechanisms is desirable, and the process of deployment is itself feasible. The following section discusses the application of the presented mechanisms to the provision of services by ISPs to users.
5.4 Service provision
This section discusses how the mechanisms presented in Chapters 3 and 4 might be used to provide services. It considers service provision from the two principal points of view: users and operators. Finally, it presents a concrete example of service provision using the previously described mechanisms.
seems easier to understand than whether or not packets for this application require expedited or assured forwarding, or some other ISP-specific PHB. For users that desired the extra flexibility provided by DIFFSERV, services could be provided which limited the number of high quality PHB streams. Alternatively, the protocol type or port number could be used to assign flows transparently to service classes. This allows traffic associated with different uses to be given an appropriate QOS. Section 5.4.3 discusses an example in more detail.

The content services to which the user subscribes also have a significant impact on their perception of the Internet. Such subscriptions could be provided as part of the service offered by the ISP; alternatively, the user could subscribe to them directly. Interaction of such content subscription with the mechanisms presented in Chapter 4 is discussed in the following subsection.
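Transparent assignment of flows to service classes by protocol type or port number, as mentioned above, is essentially a table lookup. The sketch below is illustrative only: the class names and the policy table are invented for the example, and an ISP would choose its own. The port numbers used are the conventional ones for HTTP and SMTP, plus a UDP port commonly registered for RTP.

```python
# Hypothetical ISP policy: (protocol, destination port) -> service class.
POLICY = {
    ("tcp", 80): "interactive",  # web traffic
    ("tcp", 25): "bulk",         # mail transfer
    ("udp", 5004): "realtime",   # RTP media stream
}


def classify(protocol, dst_port, policy=POLICY, default="best-effort"):
    """Return the service class for a new flow.

    Flows that match no policy entry receive the default class.
    """
    return policy.get((protocol.lower(), dst_port), default)
```

A flow to TCP port 80 would be placed in the "interactive" class, while a flow to an unlisted port falls back to "best-effort". The user never sees code points or PHBs, which is the point the text makes about ease of understanding.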
[Figure 5.1: A concrete example. Labels recoverable from the figure: CheapISP, Charles, BajaVista.]

being presented through the modified BGP protocol would allow operators to automate parameter tuning.

Content providers might wish to subscribe to a service that enabled them to improve the transport of traffic to and from their site for all customers. This can be achieved by the content provider effectively paying for the policy the ISP will apply when advertising the content provider's prefixes; by offering to carry traffic cheaply for those prefixes, the ISP can attempt to make itself and its chosen neighbours the preferred route for reaching that content provider. Conversely, traffic from the content provider would be assured that it would take the highest quality path currently available by using DSCPs and control of IGP link weights to choose the route it takes, where possible.

It is worth noting that all of these proposed mechanisms are backward compatible with currently deployed systems. Furthermore, they all offer visible incremental improvements, unlike solutions such as INTSERV with RSVP, which provide no guaranteed improvement as soon as there is a single path element that does not support the new capability.
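The way a cheaply advertised prefix attracts traffic can be sketched as a route comparison. The details of the price path attribute belong to Chapter 4 and are not reproduced here, so this sketch assumes, purely for illustration, that the price is a single number, that a lower price is preferred, and that AS path length breaks ties. The `Route` type, its fields and the AS numbers (from the private-use range) are hypothetical; the prefix is from a block reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical view of one advertisement for a prefix."""
    prefix: str
    next_hop_as: int
    as_path_len: int
    price: float  # assumed: a single advertised price, lower is better


def cheapest(routes):
    """Pick the lowest-priced route, breaking ties on AS path length."""
    return min(routes, key=lambda r: (r.price, r.as_path_len))


candidates = [
    Route("192.0.2.0/24", next_hop_as=64512, as_path_len=2, price=5.0),
    Route("192.0.2.0/24", next_hop_as=64513, as_path_len=3, price=1.0),
]
best = cheapest(candidates)
```

Here the route via AS 64513 wins despite its longer path, because it carries the lower price. An ISP paid by a content provider to advertise cheaply would be relying on its neighbours making this kind of comparison.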
paid for.
Since Alice subscribes to such a high quality service, she is allowed as many premium quality flows as she likes, subject to some total limit applied by QualityISP to ensure that all flows can still make good progress. On the other hand, Robert has a per-user limit applied to the number of high quality flows he can introduce; assuming QualityISP was providing a higher quality PHB for real time media streams, this might translate to limiting the number of such streams Robert could achieve, or to utilising a mechanism such as the RTP-ECN-proxy presented in Section 3.5 to limit the quality Robert's streams could achieve. Finally, since CheapISP offers no limit on the number of streams its subscribers can have, Charles can use as many streams as he wishes, but may see extremely poor service at times of high load.

Settlement for the service BajaVista provides will be provided by QualityISP for Alice and Robert as part of their standard service. Charles would have to subscribe to BajaVista directly. In all cases, the actual cost BajaVista incurs by attempting to appear to all users as if they were well-connected to it could be monitored in terms of the number of marks arriving at the destination ISP, either QualityISP or CheapISP. The ISP could then either settle itself, in the case of QualityISP, Alice and Robert, or pass the bill on to the user, in the case of CheapISP and Charles.

BajaVista desires that all users see reasonable service. It subscribes to TopTransit specifying its desires. In turn, TopTransit advertises BajaVista's prefixes with a low associated cost, and furthermore, it advertises them to other transit ISPs providing a high quality service. This should result in traffic for those prefixes typically following a high quality path to BajaVista. Conversely, BajaVista also subscribes to a high quality service from TopTransit, so that traffic from BajaVista will be transmitted efficiently toward the requester, be they Alice, Robert or Charles.
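The settlement arrangement in this example can be expressed as a small accounting step over mark counts. The sketch below is an assumption-laden illustration: the function name `settle`, the flat price per mark and the mark counts are all invented, and the text does not specify how marks are converted into charges.

```python
def settle(marks_by_user, isp_of, absorbing_isps, price_per_mark):
    """Return who pays how much, given marks counted at the destination ISP.

    An ISP listed in absorbing_isps settles on behalf of its users;
    otherwise the bill is passed on to the user.
    """
    bills = {}
    for user, marks in marks_by_user.items():
        isp = isp_of[user]
        payer = isp if isp in absorbing_isps else user
        bills[payer] = bills.get(payer, 0) + marks * price_per_mark
    return bills


# Invented mark counts for the three users in the example.
marks = {"Alice": 10, "Robert": 5, "Charles": 7}
isp_of = {"Alice": "QualityISP", "Robert": "QualityISP", "Charles": "CheapISP"}
bills = settle(marks, isp_of, absorbing_isps={"QualityISP"}, price_per_mark=2)
```

With these figures QualityISP is billed for the combined marks of Alice and Robert, while Charles is billed directly, matching the two settlement paths described above.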
5.5 Consequences
This section discusses some of the consequences of deployment of the work presented in this dissertation. It divides into two parts: how the network's structure might be affected, and how the economic structures associated with network operation and management might be affected.
By removing many of the barriers to network interconnection, the natural incentive for ISPs to richly peer should come to the fore. This has benefits both for the robustness of the network, since the higher degree of connectivity makes routeing around failure easier, and in terms of performance, since the higher degree of connectivity at an AS level should lead to a lower diameter network. It is also reasonable that improved traffic engineering will lead to more efficient network use in terms of the distribution of traffic, further improving performance for users. It seems likely that widespread deployment of admission control could lead to a greater number of long lived flows, which are generally easier to manage and route, and which map more efficiently onto newer network technologies such as MPLS and pure optical networks. Additionally, new capabilities are made available to network customers in terms of their ability to specify desired levels of service. In the past, the addition of such capabilities has often led to the development of new services and applications able to make use of the richer network semantics now available.
5.6 Summary
This chapter has discussed issues related to the deployment of the ideas and mechanisms presented in Chapters 3 and 4 of this dissertation. It began by considering the requirements placed upon Internet traffic engineering along with the state of the art of Internet traffic engineering. It continued with discussion of the disincentives and incentives toward deployment of the presented work. This was followed by a discussion of how user and operator perceptions of services offered over the Internet might change, and of the network and economic consequences of deployment of the proposed mechanisms. In conclusion, this chapter has argued that the ideas and mechanisms presented in this dissertation are both useful and deployable, with benefits for both small and large network operators, and network users. The final chapter now concludes the dissertation and considers areas where further work would be useful.
Chapter 6
Conclusions
This chapter concludes the dissertation by summarising the work it described, and noting areas in which further work is required.
6.1 Summary
This dissertation has addressed issues of traffic engineering in the Internet at multiple timescales. Chapter 1 began by motivating the continuing need for traffic engineering in the Internet. It argued that current approaches are unsatisfactory, and proposed that successful traffic engineering requires consideration of network behaviour at both control and management timescales in addition to data timescales. It concluded by proposing pricing as a useful mechanism for implementing and unifying traffic engineering across all timescales.

Chapter 2 then considered background and related work to the problem of Internet traffic engineering. The relevant Internet protocols were reviewed and it was argued that they do not provide sufficient information to enable efficient traffic engineering. The chapter went on to consider resource allocation mechanisms for networks, introducing pricing in particular as such a mechanism. The specific case of resource control in the Internet was then discussed and current proposals were shown to be unsatisfactory. Both intra- and inter-AS Internet routeing protocols were then considered and the principal inter-AS protocol, BGP, argued to be too restrictive in its operation to enable automated inter-AS traffic engineering. Finally, this chapter noted the context of the work described in this dissertation, in terms of the structure of the network and the assumptions made about it.

The bulk of the contribution of this dissertation was reported in the following three chapters. Chapter 3 considered control timescale traffic engineering,
i.e. dealing with connections, concentrating on the TCP and RTP protocols. It began by demonstrating that current approaches to congestion control in TCP can fail in extreme cases to ensure that all users achieve reasonable goodput through the network. It also showed that even if such failure does not occur, TCP allocates resource in a highly variable and potentially unfair manner. To alleviate these problems, admission control, and specifically implicit admission control, was proposed. The potential impact of this was discussed, followed by design considerations for such a system. Implementation of such a system in the Linux operating system was then presented, demonstrating the feasibility of this approach. Simulation work reporting an implementation of implicit admission control based on measured traffic statistics in the NS simulator followed, and showed that implicit admission control for TCP substantially improves the performance of the network at times of overload. Finally, implementation of an RTP-ECN-proxy demonstrated the feasibility of an alternative to admission control. The presented mechanisms were shown to improve the performance of the network for users, and the controllability of traffic within the network for operators.

Chapter 4 discussed issues related to management timescale traffic engineering, i.e. dealing with aggregates of traffic between ISPs. It described current mechanisms within the Internet for performing this and discussed the relation with data timescale traffic engineering. It then looked in more detail at inter-AS traffic engineering using the BGP protocol, and proposed the price path attribute as a mechanism that improves the facility for management timescale traffic engineering. Design considerations for the price path attribute were then detailed and implementation within a BGP simulator described. Finally, results of simulations using this simulator were presented and discussed.
Chapter 5 presented the case for evolving from the state of the current Internet toward that presented in this dissertation. It began by describing the requirements users and operators have for Internet traffic engineering, and the state of the art of their implementation. Having previously demonstrated the benefits of the mechanisms presented in Chapters 3 and 4, arguments for and against the deployment process were presented. The deployment process itself was shown to be desirable; this was followed by a discussion of user and operator perceptions of service provision and a concrete example of the services that deployment of these mechanisms would allow. Finally, consequences from the point of view of the network and associated economic structures were presented.
6.2 Further work
This section notes areas where further work is required, and future directions related work could take. Leaving aside the issues of engineering the prototypes described in this dissertation before deployment could occur, there are a number of areas where further work is required.

The first such area, and one which applies across both Chapters 3 and 4, is the design of suitable marking schemes. Mark rate was used as a congestion measure as it is believed to be a suitably smooth and accurate congestion indicator; however, this deserves further investigation, particularly investigation of whether or not current marking schemes are satisfactory, and how different marking schemes interact. Application to other current network technologies such as wireless networks, IPv6, and MPLS also bears further investigation.

There are three principal pieces of work arising from Chapter 3. The first is the need for accurate and timely estimation of the number of flows a router carries. Each of the three suggested approaches requires further understanding of Internet traffic mixes and behaviour, to a greater or lesser degree. Connected with this is the second area: development of pricing functions suitable for per-flow pricing, their interfacing and interaction with end-system operating systems, and the presentation of the generated information to the user. The latter two areas are already being studied in the context of data timescale packet marking, discussed in Section 2.2.4. Finally, more flexible mechanisms for performing flow deferral, rather than denial, could also be investigated; possibilities include the splicing of TCP connections to allow the admission controller to truly defer the end-to-end connection setup without requiring the end-system to retry.

There are a number of areas for further work suggested by Chapter 4. The largest is intra-AS pricing, only briefly touched upon in this dissertation.
Modification of IGPs, design of suitable pricing functions, and integration with BGP are key areas for further work. More detailed study of IBGP interaction, along with study of pricing functions, policies to be expressed through the price-to-charge mapping, and routeing and price stability for BGP in general are also required. In general, issues concerning the control and management of resources between operators need more study.

More radical modifications to certain of the Internet protocols could enable much greater control over traffic distribution within the Internet. This includes mechanisms for efficiently managing prefix disaggregation, and the corresponding increase in routeing table size. Possibilities here rely on more extensive modification to BGP and the way in which prefixes are advertised; by more efficient encoding of prefixes in the protocol, it becomes possible to refer to groups of prefixes in routeing tables, restricting routeing table size. Additionally, tighter integration with end-systems and per-packet marking schemes could greatly increase the ability of users to specify their requirements to the network.
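The link between prefix disaggregation and routeing table size can be seen with conventional address aggregation. The sketch below does not implement the more efficient prefix encoding suggested above, which would require protocol changes; it only shows, using Python's standard `ipaddress` module, how many entries a table needs before and after adjacent prefixes are merged. The function name `aggregate` is hypothetical and the prefixes come from blocks reserved for documentation.

```python
import ipaddress


def aggregate(prefixes):
    """Merge adjacent or overlapping prefixes into the fewest covering ones."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# A /24 advertised as two /25 halves (disaggregated), plus an unrelated /24.
advertised = ["198.51.100.0/25", "198.51.100.128/25", "203.0.113.0/24"]
merged = aggregate(advertised)
```

Here three advertised prefixes collapse to two table entries. Disaggregating for traffic engineering purposes moves in the opposite direction, which is why the text treats table growth as a cost to be managed.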
6.3 Conclusion
To conclude, it is the thesis of this dissertation that traffic engineering is required at multiple timescales within the Internet, and that current provision for it is unsatisfactory. Users are unable to express their desires to the network, and in any case operators do not have sufficient control over traffic within the network to meet these desires. Furthermore, given mechanisms for multi-timescale traffic engineering, a suitable unifying framework for the policies to be expressed is required.

This dissertation has presented and evaluated mechanisms to achieve this by enabling operators to control access to the Internet on a per-flow as well as a per-packet basis; by providing mechanisms to allow for automated settlement between operators; and by discussing structures within which these mechanisms can be used to increase service differentiation throughout the network, enabling a better match to be achieved between the desires of users and the capabilities of the network. Pricing has been presented as a unifying framework in which users and operators can express desired policies. It was argued that pricing is well-suited to this task as it is both flexible and intuitive, and it provides both parties with incentives for appropriate behaviour.

In summary, this dissertation has argued that deployment of the presented mechanisms with pricing as a policy framework would help satisfy both user and operator requirements for Internet traffic engineering.
Bibliography
[Ahn95] J. Ahn, P. Danzig, Z. Liu, and L. Yan. Evaluation of TCP Vegas: Emulation and Experiment. Computer Communication Review, 25(4):185-195, August 1995. Proceedings of ACM SIGCOMM 1995. (p 16)
[ATMF-PNNI96] The ATM Forum Technical Committee. Private Network-Network Interface Specification 1.0, March 1996. af-pnni-0055.000; see also addendum "Addendum to PNNI v1.0 for ABR parameter negotiation", af-pnni-0075.000. (p 25)
[ATMF-TM99] The ATM Forum Technical Committee. Traffic Management Specification 4.1, March 1999. af-tm-0121.000; see also addendum "Differentiated UBR", af-tm-0149.000. (p 20)
[ATMF-UNI96] The ATM Forum Technical Committee. ATM User-Network Interface Signalling Specification 4.0, July 1996. af-sig-0061.000; see also addendum "Signalling ABR addendum", af-sig-0076.000. (pp 19, 20, 25)
[Bodin00] U. Bodin, O. Schelen, and S. Pink. Load-tolerant Differentiation with Active Queue Management. Computer Communication Review, 30(3):4-16, July 2000. (p 19)
[Bouch99] A. Bouch and M.A. Sasse. It Ain't What You Charge It's The Way That You Do It: A User Perspective of Network QoS and Pricing. In M. Sloman, S. Mazumdar, and E. Lupu, editors, Proceedings of IFIP/IEEE International Symposium on Integrated Network Management (IM'99), pages 639-655, May 1999. (p 23)
[Bouch00] A. Bouch, M.A. Sasse, and H.G. DeMeer. Of Packets and People: A User-Centred Approach to Quality of Service. In Proceedings of 8th International Workshop on Quality of Service (IWQoS'00), Pittsburgh, PA, USA, June 2000. (p 23)
[Brakmo95]
L.S. Brakmo and L.L. Peterson. TCP Vegas: End to End Congestion Avoidance on a Global Internet. IEEE Journal on Selected Areas in Communications, 13(8):1465 1480, October 1995. (p 16) L. Breslau, E.W. Knightly, S. Shenker, I. Stoica, and H. Zhang. Endpoint Admission Control: Architectural Issues and Performance. Computer Communication Review, 30(4):5769, October 2000. Proceedings of ACM SIGCOMM 2000. (p 21) B. Carpenter and S. Brim. Middle boxes: Taxonomy and Issues. Internet Draft, July 2001. <draft-carpentermidtax-02.txt>. (p 104) K.M. Chandy and J. Misra. Distributed Computation on Graphs: Shortest Path Algorithms. Communications of the ACM, 25(11):833837, November 1982. (p 26) K. Chu. Demand for Different Qualities of Service for Internet Access: INDEX Findings. In Network Modelling in the 21st Century: Royal Society Discussion Meeting. Royal Society, December 1999. Available from http://www.statslab.cam.ac.uk/richard/ research/topics/royalsoc1999/index.html.
(p 23)
[Breslau00]
[Carpenter01]
[Chandy82]
[Chu99]
[Clark88]
D.D. Clark. The Design Philosophy of the DARPA Internet Protocols. Computer Communication Review, 18(4):106114, August 1988. Proceedings of ACM SIGCOMM 1988. (pp 36, 100) Scott Clearwater, editor. Market Based Control: A Paradigm for Distributed Resource Allocation. World Scientic, 1996. (p 20) R. Cocchi, D. Estrin, S. Shenker, and L. Zhang. A Study of Priority Pricing in Multiple Service Class Networks. Computer Communication Review, 21(4):123 132, September 1991. Proceedings of ACM SIGCOMM 1991. (p 20) R. Cocchi, D. Estrin, S. Shenker, and L. Zhang. Pricing in Computer Networks: Motivation, Formulation
[Clearwater96]
[Cocchi91]
[Cocchi93]
116
BIBLIOGRAPHY
BIBLIOGRAPHY
and Example. IEEE/ACM Transactions on Networking, 1(6):614627, December 1993. (p 20)
[Courcoubetis97]
C. Courcoubetis, F.P. Kelly, and R.R. Weber. Measurement-based Charging in Communication Networks. Technical Report 19, Statistical Laboratory, University of Cambridge, 1997. (p 21)
[Courcoubetis98a] C. Courcoubetis, F. P. Kelly, V. A. Siris, and R. Weber. A Study of Simple Usage-based Charging Schemes for Broadband Networks. In Proceedings of IFIP TC6 International Conference on Broadband Communications (BC98), Stuttgart, Germany, April 1998. (p 21) [Courcoubetis98b] C. Courcoubetis, C. Manolakis, and G.D. Stamoulis. An Intelligent Agent for Negotiating QoS in Priced ABR Connections. In Proceedings of International Conference on Telecommunications (ICT98), Halkidiki, Greece, June 1998. (p 22) [Courcoubetis98c] C. Courcoubetis and V.A. Siris. An Evaluation of Pricing Schemes that are based on Effective Usage. Technical Report 214, Institute of Computer Science, Foundation for Research and Technology, Hellas (ICS FORTH), February 1998. (p 21) [Courcoubetis98d] C. Courcoubetis, G.D. Stamoulis, C. Manolakis, and F.P. Kelly. An Intelligent Agent for Optimizing QoSfor-Money in Priced ABR Connections. Telecommunication Systems, Special Issue on Network Economics 1998. (p 22) [DiffServ01] IETF: Differentiated Services Working Group. http://www.ietf.org/html.charters/ diffserv-charter.html, January 2001. (p 23) E.W. Dijkstra. A Note on Two Problems in Connexion with Graphs. Numerische Mathematik, 1:269271, 1959. (p 26) R.P. Draves, C. King, S. Venkatachary, and B.N. Zill. Constructing Optimal IP Routing Tables. In Proceedings of IEEE Infocom 1999, New York, March 1999. Also available as Microsoft Technical Report MSR-TR98-59. (p 79) R. Edell, P. Varaiya, and N. McKeown. Billing Users and Pricing for TCP. IEEE Journal on Selected Areas in
[Dijkstra59]
[Draves99]
[Edell95]
117
BIBLIOGRAPHY
BIBLIOGRAPHY
Communications, 13(7):11621175, September 1995.
(pp 21, 23)
[Elwalid01]
A. Elwalid, C. Jin, S. Low, and I. Widjaja. MATE: MPLS Adaptive Trafc Engineering. In Proceedings of IEEE Infocom 2001, pages 13001309, Anchorage, Alaska, April 2001. (p 32) Ensim Corp. Ensim Corporation. ensim.com/, 2000. (p 35) http://www.
[Ensim00] [Falkner00]
M. Falkner, M. Devetsikiotis, and I. Lambadaris. An Overview of Pricing Concepts for Broadband IP Networks. IEEE Communications Surveys, Q2 2000. Available from http://www.comsoc.org/ pubs/surveys/. (p 20) K. Fall and S. Floyd. Simulation-based Comparisons of Tahoe, Reno, and SACK TCP. Computer Communication Review, 26(3):521, July 1996. (pp 15, 39, 54) W. Feng, D. Kandlur, D. Saha, and K. Shin. Blue: A New Class of Active Queue Management Algorithms. Technical Report CSE-TR-387-99, University of Michigan, April 1999. Available from http://www.eecs. umich.edu/wuchang/blue/. (p 19) S. Floyd and V. Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking, 1(4):397413, August 1993. (pp 19, 41) S. Floyd. TCP and Explicit Congestion Notication. Computer Communication Review, 24(5):1023, October 1994. (pp 14, 15, 39) S. Floyd. Comments on Measurement-based Admission Control for Controlled-Load Services. Technical Report, Lawrence Berkeley National Laboratory, July 1996. (p 20) S. Floyd, M. Handley, J. Padhye, and J. Widmer. Equation-based Congestion Control for Unicast Applications. Computer Communication Review, 30(4):34 56, October 2000. Proceedings of ACM SIGCOMM 2000. (p 19)
[Fall96]
[Feng99]
[Floyd93]
[Floyd94]
[Floyd96]
[Floyd00]
118
BIBLIOGRAPHY
[Fortz00]
BIBLIOGRAPHY
B. Fortz and M. Thorup. Internet Trafc Engineering by Optimizing OSPF Weights. In Proceedings of IEEE Infocom 2000, Tel Aviv, Israel, March 2000. (pp 32, 69) L. Gao, T. Grifn, and J. Rexford. Inherently Safe Backup Routing with BGP. In Proceedings of IEEE Infocom 2001, pages 547556, Anchorage, Alaska, April 2001. (p 32) R. Gibbens, F. Kelly, and P. Key. A Decision-theoretic Approach to Call Admission Control in ATM Networks. IEEE Journal on Selected Areas in Communications, 13(6):11011114, 1995. Special issue on Advances in the Fundamentals of Networking. (p 20) R.J. Gibbens and F.P. Kelly. Measurement-based Connection Admission Control. In V. Ramaswami and P.E. Wirth, editors, Teletrafc Contributions for the Information Age: Proceedings of the 15th International Teletrafc Congress, Washington, DC, pages 879888, 1997. (p 20) R.J. Gibbens and F.P. Kelly. Distributed Connection Acceptance Control for a Connectionless Network. In Key and Smith [Key99b], pages 941952. (p 21) R.J. Gibbens and F.P. Kelly. Resource Pricing and the Evolution of Congestion Control. Automatica, 35:19691985, 1999. (p 14) T. Grifn and G.T. Wilfong. An Analysis of BGP Convergence Properties. Computer Communication Review, 29(4):277288, October 1999. Proceedings of ACM SIGCOMM 1999. (pp 71, 79, 79) J.Y. Hui. Resource Allocation for Broadband Networks. IEEE Journal on Selected Areas in Communications, 6(9):15981608, December 1988. (p 7) IETF: Integrated Services Working Group. http://www.ietf.org/html.charters/ intserv-charter.html, September 2000. (p 23) R. Isaacs. Dynamic Provisioning of Resource-Assured and Programmable Virtual Private Networks. PhD thesis, University of Cambridge Computer Laboratory, December 2000. (p 25)
[Gao01]
[Gibbens95]
[Gibbens97]
[Gibbens99a]
[Gibbens99b]
[Grifn99]
[Hui88]
[IntServ00]
[Isaacs00]
119
BIBLIOGRAPHY
[Jacobson88]
BIBLIOGRAPHY
V. Jacobson and M. Karels. Congestion Avoidance and Control. Computer Communication Review, 18(4):314329, 1988. Proceedings of ACM SIGCOMM 1988. (pp 15, 17, 39) S. Jamin, S.J. Shenker, and P.B. Danzig. Comparison of Measurement-based Admission Control Algorithms for Controlled-Load Service. In Proceedings of INFOCOM97, April 1997. (p 20) S. Jamin, S.J. Shenker, and P.B. Danzig. Measurementbased Admission Control Algorithms for ControlledLoad Service: A Structural Examination. Technical Report CSE-TR-333-97, University of Michigan, April 1997. (p 20) F. Kelly. Charging and Rate Control for Elastic Trafc. European Transactions on Telecommunications, 8:33 37, 1997. (p 21) F. Kelly. Internet Economics, chapter Charging and Accounting for Bursty Connections, pages 253278. MIT Press, 1997. (p 21) F. Kelly, A. Maulloo, and D. Tan. Rate Control in Communication Networks: Shadow Prices, Proportional Fairness and Stability. Journal of the Operational Research Society, 49:237252, 1998. (pp 21, 69) F.P. Kelly, P.B. Key, and S. Zachary. Distributed Admission Control. IEEE Journal on Selected Areas in Communications, 18(12):26172628, 2000. (pp 14, 21, 21,
21)
[Jamin97a]
[Jamin97b]
[Kelly97a]
[Kelly97b]
[Kelly98]
[Kelly00]
[Key99a]
P. Key, D. McAuley, P. Barham, and K. Laevens. Congestion Pricing for Congestion Avoidance. Technical Report MSR-TR-99-15, Microsoft Research, February 1999. http://www.research.microsoft.com/ research/network/disgame.htm. (pp 14, 21, 21,
49)
[Key99b]
P. Key and D. Smith, editors. Teletrafc Engineering in a Competitive World: Proceedings of ITC-16, volume 3b of Teletrafc Science and Engineering. Elsevier Science B.V., June 1999. (pp 119, 127) Keynote.com. Keynote.com. http://www.keynote. com/, 2001. (p 77)
[Keynote01]
120
BIBLIOGRAPHY
[Khanna89]
BIBLIOGRAPHY
A. Khanna and J. Zinky. The Revised ARPANET Routing Metric. Computer Communication Review, 19(4):4556, September 1989. Proceedings of ACM SIGCOMM 1989. (p 31) A. Kumar, M. Hegde, S.V.R. Anand, B.N. Bindu, D. Thirumurthy, and A.A. Kherani. Nonintrusive TCP Connection Admission Control for Bandwidth Management of an Internet Access Link. IEEE Communications Magazine, pages 160167, May 2000. (pp 45, 53) C. Labovitz, G.R. Malan, and F. Jahanian. Internet Routing Instability. Computer Communication Review, 27(4):115126, October 1997. Proceedings of ACM SIGCOMM 1997. (p 79) C. Labovitz, G.R. Malan, and F. Jahanian. Internet Routing Instability. IEEE/ACM Transactions on Networking, 6(5):515528, October 1998. (p 79) C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed Internet Routing Convergence. Computer Communication Review, 30(4):175187, October 2000. Proceedings of ACM SIGCOMM 2000. (p 79) C. Labovitz, A. Ahuja, R. Wattenhofer, and S. Venkatachary. The Impact of Internet Policy and Topology on Delayed Routing Convergence. In Proceedings of IEEE Infocom 2001, Anchorage, Alaska, April 2001. (pp 71,
79)
[Kumar00]
[Labovitz97]
[Labovitz98]
[Labovitz00]
[Labovitz01]
[Laevens00]
K. Laevens, P. Key, and D. McAuley. An ECN-based End-to-End Congestion Control Framework: Experiments and Evaluation. Technical Report MSR-TR-2000-104, Microsoft Research, October 2000. ftp://ftp.research.microsoft.com/ pub/tr/tr-2000-104.ps. (p 16) W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson. On the Self-Similar Nature of Ethernet Trafc. Computer Communication Review, 23(4):183193, October 1993. Proceedings of ACM SIGCOMM 1993.
(p 10)
[Leland93]
[Leland94]
W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson. On the Self-Similar nature of Ethernet Trafc (extended version). IEEE/ACM Transactions on Networking, 2(1):115, February 1994. (p 10)
121
BIBLIOGRAPHY
[Lin97]
BIBLIOGRAPHY
D. Lin and R. Morris. Dynamics of Random Early Detection. Computer Communication Review, 27(4):127 138, September 1997. Proceedings of ACM SIGCOMM 1997. (p 19) Linx. The London Internet Exchange. http://www. linx.net/, 2001. (p 36) S. Low, L. Peterson, and L. Wang. Understanding TCP Vegas: A Duality Model. In Proceedings of ACM SIGMETRICS, June 2001. (p 16)
[Linx01] [Low01]
[MacKie-Mason95] J.K. MacKie-Mason and H.R. Varian. Pricing Congestible Network Resources. IEEE Journal on Selected Areas in Communications, 13(7):11411149, September 1995. (pp 20, 21) [Mae01] [Massouli 99] e Worldcom Inc. MAE Information Site. http://www. mae.net/, 2001. (p 36) L. Massouli and J.W. Roberts. Arguments in Favour e of Admission Control for TCP Flows. In P. Key and D. Smith, editors, Teletrafc Engineering in a Competitive World: Proceedings of ITC-16, volume 3a of Teletrafc Science and Engineering, pages 3344. Elsevier Science B.V., June 1999. (pp 45, 46) Matrix.net. Matrix.net. http://www.matrix.net/, 2001. (p 77) Measure Web Page. http://www.cl.cam.ac. uk/Research/SRG/netos/old-projects/ measure/, 1998. (p 50) R. Morris. TCP Behaviour with Many Flows. In IEEE International Conference on Network Protocols, Atlanta, Georgia, October 1997. (p 40) R. Morris. Scalable TCP Congestion Control. PhD thesis, Harvard University, January 1999. (pp 15, 40) R. Mortier, I. Pratt, C. Clark, and S. Crosby. Implicit Admission Control. IEEE Journal on Selected Areas in Communications, 18(12):26292639, December 2000.
(p 48)
[Matrix01] [Measure98]
[Morris97]
[Morris99] [Mortier00]
[Mortier01]
R. Mortier, R. Isaacs, and K. Fraser. Switchlets and Resource-Assured MPLS Networks. Technical Report
122
BIBLIOGRAPHY
BIBLIOGRAPHY
510, University of Cambridge Computer Laboratory, Cambridge, U.K., January 2001. (pp 25, 26)
[MPLS] [Murphy94]
J. Murphy and L. Murphy. Bandwidth Allocation By Pricing In ATM Networks. In Second International IFIP Conference on Broadband Communications, BB94, March 1994. (p 21) P. Newman, G. Minshall, and T. Lyon. IP Switching: ATM Under IP. IEEE/ACM Transactions on Networking, 6(2):117129, April 1998. (p 25)
VINT . The UCB / LBNL / VINT Network Simulator, version 2. http://www.isi.edu/nsnam/ns/, 2000.
[Newman98]
[NSv2]
[Odlyzko99a]
A.M. Odlyzko. Paris Metro Pricing for the Internet. In Proceedings ACM Conference on Electronic Commerce (EC99), pages 140147, 1999. (p 21) A.M. Odlyzko. Paris Metro Pricing: The Minimalist Differentiated Services Solution. In Proceedings of the 7th International Workshop on Quality of Service (IWQoS99), pages 159161, London, UK, May 1999.
(p 21)
[Odlyzko99b]
[Odlyzko00]
A.M. Odlyzko. The History of Communications and its Implications for the Internet. Available from http://www.research.att.com/amo/doc/ history.communications0.ps., June 2000.
(pp 13, 17)
[Oliver00]
H. Oliver and D. Songhurst. Market Managed Multiservice Internet. Telektronikk, 96(2):3844, 2000. Project home page at http://www.m3i.org/. (p 23) J. Padhye, J. Kurose, D. Towsley, and R. Koodli. A Model Based TCP-Friendly Rate Control Protocol. In Proceedings of the Ninth International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV99), July 1999. (p 19) R. Pan, B. Prabhakar, and K. Psounis. CHOKE, A Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation. In Proceedings of
[Padhye99]
[Pan00]
123
BIBLIOGRAPHY
BIBLIOGRAPHY
IEEE Infocom 2000, pages 942951, Tel Aviv, Israel, March 2000. (p 19)
[Paschalidis00]
I. Paschalidis and J. Tsitsiklis. Congestion-Dependent Pricing of Network Services. IEEE/ACM Transactions on Networking, 8(2):171184, April 2000. (p 20) V. Paxson. Empirically-Derived Analytic Models of Wide-Area TCP Connections. IEEE/ACM Transactions on Networking, 2(4):316336, August 1994. (p 56) V. Paxson. Growth Trends in Wide-Area TCP Connections. IEEE Network Magazine, 8(4):817, July/August 1994. (p 56) V. Paxson. Measurements and Analysis of End-to-End Internet Dynamics. PhD thesis, Computer Science Division, University of California at Berkeley, April 1997. LBNL-40319; UCB//CSD-97-945. (p 32) R. Perlman. Interconnections. Addison Wesley Longman, 2nd edition, 2000. (pp 26, 27) A. Rangarajan. Early Regulation of Unresponsive Flows. Masters thesis, University of California at Santa Barbara, July 1999. Technical Report TR-CS-99-26.
(p 19)
[Paxson94a]
[Paxson94b]
[Paxson97]
[Perlman00] [Rangarajan99]
[Rejaie99]
R. Rejaie, M. Handley, and D. Estrin. RAP:An Endto-end Rate-based Congestion Control Mechanism for Realtime Streams in the Internet. In Proceedings of IEEE Infocom 1999, March 1999. (p 19) J. Postel. User Datagram Protocol. August 1980. (p 14) J. Postel. Internet Protocol. 1981. (p 13)
RFC RFC
[RFC 768] [RFC 791] [RFC 793] [RFC 891] [RFC 904] [RFC 975]
768, IETF,
J. Postel. Transmission Control Protocol. IETF, September 1981. (p 15) D.L. Mills. DCN local-network protocols. IETF, December 1983. (p 30)
793, 891,
RFC
D.L. Mills. Exterior Gateway Protocol formal specication. RFC 904, IETF, April 1984. (pp 29, 31) D.L. Mills. Autonomous confederations. IETF, February 1986. (p 31)
RFC
975,
124
BIBLIOGRAPHY
[RFC 1122]
BIBLIOGRAPHY
R. Braden and Ed. Requirements for Internet Hosts Communication Layers. RFC 1122, IETF, October 1989. (p 52) D. Oran and Ed. OSI IS-IS Intra-domain Routing Protocol. RFC 1142, IETF, February 1990. (p 27) David L. Mills. Network Time Protocol (Version 3) Specication, Implementation. RFC 1305, IETF, March 1992. (p 30) J. Moy. OSPF Version 2.
(p 99)
RFC
R. Braden, D. Clark, and S. Shenker. Integrated Services in the Internet Architecture: an Overview. RFC 1633, IETF, June 1994. (p 23) Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4). RFC 1771, IETF, March 1995. (pp 25, 29) Audio-Video Transport Working Group, H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. RFC 1889, IETF, January 1996. (pp 16,
39)
[RFC 1998]
E. Chen and T. Bates. An Application of the BGP Community Attribute in Multi-home Routing. RFC 1998, IETF, August 1996. (p 71) J. Moy. OSPF Version 2.
(pp 29, 31, 99)
RFC
R. Braden, Ed., L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource ReSerVation Protocol (RSVP) Version 1 Functional Specication. RFC 2205, IETF, September 1997. (pp 8, 23, 25) J. Moy. OSPF Version 2.
(pp 25, 27, 31)
RFC
C. Villamizar, R. Chandra, and R. Govindan. BGP Route Flap Damping. RFC 2439, IETF, November 1998. (p 92) S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An Architecture for Differentiated Service. RFC 2475, IETF, December 1998. (pp 23, 26)
[RFC 2475]
125
BIBLIOGRAPHY
[RFC 2481]
BIBLIOGRAPHY
K. Ramakrishnan and S. Floyd. A Proposal to add Explicit Congestion Notication (ECN) to IP. RFC 2481, IETF, January 1999. (pp 14, 15, 40) J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski. Assured Forwarding PHB Group. RFC 2597, IETF, June 1999. (p 24) V. Jacobson, K. Nichols, and K. Poduri. An Expedited Forwarding PHB. RFC 2598, IETF, June 1999. (p 24) C. Alaettinoglu, C. Villamizar, E. Gerich, D. Kessens, D. Meyer, T. Bates, D. Karrenberg, and M. Terpstra. Routing Policy Specication Language (RPSL). RFC 2622, IETF, June 1999. (p 71) G. Apostolopoulos, S. Kama, D. Williams, R. Guerin, A. Orda, and T. Przygienda. QoS Routing Mechanisms and OSPF Extensions. RFC 2676, IETF, August 1999.
(p 32)
[RFC 2597]
[RFC 2676]
S. Herzog. RSVP Extensions for Policy Control. 2750, IETF, January 2000. (p 23)
RFC
B. Gleeson, A. Lin, J. Heinanen, G. Armitage, and A. Malis. A Framework for IP Based Virtual Private Networks. RFC 2764, IETF, February 2000. (p 25) T. Bates, R. Chandra, and E. Chen. BGP Route Reection An Alternative to Full Mesh IBGP. RFC 2796, IETF, April 2000. (p 78) S. Floyd. Congestion Control Principles. IETF, September 2000. (p 15)
RFC
[RFC 2796]
2914,
E. Rosen, A. Viswanathan, and R. Callon. Multiprotocol Label Switching Architecture. RFC 3031, IETF, January 2001. (p 24) L. Andersson, P. Doolan, N. Feldman, A. Fredette, and B. Thomas. LDP Specication. RFC 3036, IETF, January 2001. (p 25) P. Traina, D. McPherson, and J. Scudder. Autonomous System Confederations for BGP. RFC 3065, IETF, February 2001. (p 78) I. Rhee, V. Ozdemir, and Y. Yi. TEAR: TCP Emulation at Receivers. Technical Report, Department
[RFC 3036]
[RFC 3065]
[Rhee00]
126
BIBLIOGRAPHY
BIBLIOGRAPHY
of Computer Science, NCSU, April 2000. Available from http://www.csc.ncsu.edu/faculty/ rhee/export/tear_page/. (p 19)
[Sairamesh95]
J. Sairamesh, D. F. Ferguson, and Y. Yemini. An Approach to Pricing, Optimal Allocation and Quality of Service Provisioning in High-Speed Packet Networks. In Proceedings of IEEE Infocom 1995, pages 1111 1119, June 1995. (p 21) N. Semret and A.A. Lazar. Spot and Derivative Markets in Admission Control. In Key and Smith [Key99b], pages 757766. (p 23) S. Shenker, D. Clark, and L. Zhang. A Scheduling Service Model and a Scheduling Architecture for an Integrated Services Packet Network. Technical Report, Xerox PARC, August 1993. (p 20) S. Shenker. Making Greed Work in Networks: A Game-Theoretic Analysis of Switch Service Disciplines. Computer Communication Review, 24(4):4757, August 1994. Proceedings of ACM SIGCOMM 1994.
(p 20)
[Semret99]
[Shenker93]
[Shenker94]
[Shenker95]
S. Shenker. Service Models and Pricing Policies for an Integrated Services Internet. In Public access to the Internet. MIT Press, Cambridge, MA, USA, 1995. (p 20) S. Shenker, D. Clark, D. Estrin, and S. Herzog. The Internet and Telecommunications Policy, chapter Pricing in Computer Networks: Reshaping the Research Agenda. Lawrence Erlbaum Associates, 1996. (p 20) D. Sisalem and H. Schulzrinne. The Loss-Delay Based Adjustment Algorithm: A TCP-Friendly Adaptation. In Proceedings of the 8th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV98), July 1998. (p 19) Genuity Inc. Internet Interconnection Guidelines for Genuity. http://www.genuity.com/ infrastructure/interconnection.htm, 2001.
(pp 97, 98)
[Shenker96]
[Sisalem98]
[SLA - GENUITY01]
[SLA - GIGABELL01] Gigabell AG. Gigabell Peering Policy. http://rs1. gigabell.net/public/peer.html, 2001. (p 97)
127
BIBLIOGRAPHY
[SLA - LEVEL 3]
BIBLIOGRAPHY
Level 3 Communications. Global IP Interconnection Peering Policy. http://www.level3.com/us/ info/network/interconnection/, 2001. (p 97) MCIWorldcom Inc. MAE Connection Guidelines. http://www.mae.net/doc/maecheck.html, 2001. (p 97) Sprint Corporation. Sprints Bi-Lateral Peering Policy. http://gullfoss2.fcc.gov/prod/ ecfs/retrieve.cgi?native_or_pdf=pdf&id_ document=6011256512, 2000. Filed with the FCC as a result of the proposed Sprint-MCIWorldcom merger. (p 97) MCIWorldcom Inc. UUnet North American Peering Policy. http://gullfoss2.fcc.gov/prod/ ecfs/retrieve.cgi?native_or_pdf=pdf&id_ document=6011256523, 2000. Filed with the FCC as a result of the proposed Sprint-MCIWorldcom merger. (pp 97, 98) MCIWorldcom Inc. WorldCom Policy for SettlementFree Interconnection with Internet Networks. http: //www.uu.net/peering/, 2001. (pp 97, 98) Sprint. Sprint Press Release. http: //www.sprintbiz.com/press/0003/ 000322roundtrip.html, March 2000. (p 48) J.W. Stewart III. BGP4 Inter-Domain Routing in the Internet. Addison Wesley Longman, 1999. (p 29)
[SLA - MAE01]
[SLA - SPRINT00]
[SLA - UUNET00]
[Sprint00]
[Stewart99]
[Tangmnarunkit01] H. Tangmnarunkit, R. Govindan, D. Estrin, and S. Shenker. The Impact of Routing Policy on Internet Paths. In Proceedings of IEEE Infocom 2001, Anchorage, Alaska, April 2001. (p 32) [Tassel97] J. Tassel, B. Briscoe, and A. Smith. An End to End PriceBased QoS Control Component Using Reective Java. Lecture Notes in Computer Science, 1356:1832, 1997.
(p 21)
[UKERNA01]
128
BIBLIOGRAPHY
[Varadhan00]
BIBLIOGRAPHY
K. Varadhan, R. Govindan, and D. Estrin. Persistent Route Oscillations in Inter-Domain Routing. Computer Networks, March 2000. Also Technical Report 98-631, Computer Science Department, University of Southern California, September 1997. (p 79) Lawrence Berkeley National Laboratory/UCB. The VIC Video-Conferencing Tool. http://www-mice.cs. ucl.ac.uk/multimedia/software/vic/, 2001.
(p 64)
[Vic01]
[Wang99]
J.L. Wang and A. Erramilli. A Connection Admission Control Algorithm for Self-Similar Trafc. In Proceedings of IEEE Globecom 1999: Symposium on High Speed Networks, pages 16231628, December 1999.
(p 20)
[Wang01]
Z. Wang, Y. Wang, and L. Zhang. Internet Trafc Engineering without Full Mesh Overlaying. In Proceedings of IEEE Infocom 2001, pages 565571, Anchorage, Alaska, April 2001. (p 32) D. Wetherall. OTcl Object Oriented Extensions to Tcl. ftp://ftp.tns.lcs.mit.edu/pub/otcl/ README.html, 2000. (p 54) Xipeng Xiao, A. Hannan, B. Bailey, and L.M. Ni. Trafc Engineering with MPLS in the Internet. IEEE Network Magazine, 14(2):2833, March/April 2000. (p 5) W.T. Zaumen and J.J. Garcia-Luna-Aceves. Dynamics of Distributed Shortest-Path Routing Algorithms. Computer Communication Review, 21(4):3142, September 1991. Proceedings of ACM SIGCOMM 1991. (p 27) W.T. Zaumen and J.J. Garcia-Luna-Aceves. Dynamics of Link-State and Loop-Free Distance-Vector Routing Algorithms. Internetworking: Research and Experience, 3(4):161188, December 1992. (p 27) DML Networks, Inc. The GNU Zebra Routeing Protocol Suite. http://www.zebra.org/, 2000. (p 81)
[Wetherall00]
[Xiao00]
[Zaumen91]
[Zaumen92]
[Zebra00]
129