BGP 1
DD2491, p2 2009
Literature
• Practical BGP
– Follow reading instructions
• RFC 4271
• Many vendor pages
Inter-domain routing
• The objective of inter-domain routing is to bind together
all the thousands of independent IP networks that
constitute the Internet
• Perspective from one network
– Decide how to receive information (packets) from the outside
world
– Decide how to spread information to the outside world
• Handling of prefixes
– Receive and choose (filter) between prefixes from other
domains
– Announce prefixes to other domains
• Address aggregation
What is BGP?
• Border Gateway Protocol version 4
• Defined in RFC 4271
• An inter-domain routing protocol
• Uses the destination-based forwarding paradigm
– Forwarding cannot depend on other properties such as source address, ToS, or link load
IGP/EGP
[Figure: an ISP and a customer network, each running an IGP internally, connected by an EGP]
• EGP: Exterior Gateway Protocol
– Runs between networks/domains (inter-domain)
– Examples: BGP, static routing
– Note that BGP can also run internally in a network: IBGP
• IGP: Interior Gateway Protocol
– Runs within a network/domain (intra-domain)
– Examples: RIP, OSPF, IS-IS
Why can't we use an IGP?
• On page 3 in the book, there is a chapter:
– Why not use a single protocol for both internal and external
routing?
Autonomous Systems (AS)
• A set of routers under a single technical administration that
has a single routing policy
– A single network or group of networks
– University, business, organization, operator
• The outside world sees this set as one Autonomous System
– All interior policies, protocols, etc. are hidden within the AS
• Identified in the Internet by an Autonomous System
Number (ASN), originally a 16-bit number: 0-65535
– Example: ASN 1653 for SUNET
General AS graph
[Figure: AS graph with AS1 and AS2]
Whois example
gelimer.kthnoc.net> whois -h whois.ripe.net AS1653
aut-num: AS1653
as-name: SUNET
descr: SUNET Swedish University Network
import: from AS42 accept AS42
export: to AS42 announce AS-SUNET
import: from AS702 accept AS702:RS-EURO AS702:RS-CUSTOMER
export: to AS702 announce AS-SUNET
import: from AS2603 accept any
export: to AS2603 announce AS-SUNET
import: from AS2831 accept AS2831 AS2832
export: to AS2831 announce any
import: from AS2833 accept AS2833
export: to AS2833 announce any
import: from AS2834 accept AS2834
export: to AS2834 announce any
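The import/export lines above are RPSL policy statements. As a rough illustration of how they map neighbors to filters, the sketch below parses only the simple "from X accept Y" / "to X announce Y" form used in this object; it is not a full RPSL parser, and the function name parse_policy is hypothetical.

```python
import re

# import/export lines of the aut-num object shown above
LINES = (
    "import: from AS42 accept AS42",
    "export: to AS42 announce AS-SUNET",
    "import: from AS702 accept AS702:RS-EURO AS702:RS-CUSTOMER",
    "export: to AS702 announce AS-SUNET",
    "import: from AS2603 accept any",
    "export: to AS2603 announce AS-SUNET",
    "import: from AS2831 accept AS2831 AS2832",
    "export: to AS2831 announce any",
    "import: from AS2833 accept AS2833",
    "export: to AS2833 announce any",
    "import: from AS2834 accept AS2834",
    "export: to AS2834 announce any",
)

IMPORT = re.compile(r"import:\s+from\s+(AS\d+)\s+accept\s+(.+)")
EXPORT = re.compile(r"export:\s+to\s+(AS\d+)\s+announce\s+(.+)")


def parse_policy(lines):
    """Map each neighbor AS to what is accepted from it and announced to it."""
    policy = {}
    for line in lines:
        if (m := IMPORT.match(line)):
            policy.setdefault(m[1], {})["accept"] = m[2].split()
        elif (m := EXPORT.match(line)):
            policy.setdefault(m[1], {})["announce"] = m[2].split()
    return policy
```

Reading the result this way shows the pattern: upstreams (AS702, AS2603) only get AS-SUNET, while the AS283x neighbors are announced everything.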
RIPE
• There are two “classes” of IP addresses
– Provider Independent (PI)
– Provider Aggregatable (PA)
RIPE
• PI space consists of addresses that end customers can request
directly from RIPE.
– Good for the end customer, but bad for the ISP.
– RIPE will also start charging for PI space
AS graph and peering relations
[Figure: AS hierarchy. Tier 1 (full Internet connectivity): AS1, AS2. NSPs/ISPs: AS3, AS4, AS5. Stubs/customers: AS6, AS7, AS8, AS9. Links are marked as transit, peer, or customer relations.]
Cost and peering relations
[Figure: cost of the peering relations in the AS hierarchy, down to the stubs/customers AS6, AS7, AS8, AS9]
Traffic patterns
[Figure: traffic patterns between NSPs/ISPs (AS3, AS4, AS5), with peer and customer relations, and the stubs/customers AS6, AS7, AS8, AS9]
Peering relations
• One abstract way of defining peering relations is the following:
• Prefix sets:
– Define a customer set, a peering set and a transit set
• Example rules:
– Customer prefixes should be announced to transit and peers
– Peer and transit prefixes should be announced to customers
– Prefer prefixes from peers over prefixes from transit
– Do not accept illegal prefixes (e.g., RFC 1918 private addresses) or unknown
prefixes from customers
– Load balance over several transit providers
– Filter traffic (e.g., on source addresses) according to the prefixes announced
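A few of these rules can be written down directly. The sketch below is a minimal, hypothetical encoding: relations are plain strings, the customer's registered prefixes are passed in as a list, and the preference numbers only express "peer beats transit" (they are assumptions, not values from the slides).

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# "Prefer prefixes from peers over prefixes from transit" (assumed numbers, higher wins)
PREFERENCE = {"peer": 200, "transit": 100}


def accept_from_customer(prefix, registered):
    """Import filter for a customer session: reject RFC 1918 space and unknown prefixes."""
    net = ipaddress.ip_network(prefix)
    if any(net.subnet_of(private) for private in RFC1918):
        return False
    return any(net.subnet_of(ipaddress.ip_network(r)) for r in registered)


def export_to(learned_from, neighbor_relation):
    """Customer routes go to everyone; peer and transit routes go only to customers."""
    if learned_from == "customer":
        return True
    return neighbor_relation == "customer"
```

Keeping the rules as functions of the relation, rather than of individual prefixes, is what makes the prefix-set approach scale.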
Customer / ISP Relations: Stub AS
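[Figure: a customer with networks n1 and n2 connected to a single provider. The customer announces n1, n2; the provider announces a default route. Traffic flows in the opposite direction of the announcements.]
• Typical customer topology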
Multi-homed customer
[Figure: multi-homed customer topologies. A customer (n1, n2) connected to one provider over several links, or to two providers ISP1 (n3, n4) and ISP2 (n5, n6); and a multi-homed transit AS. Each link is labelled with the prefixes announced over it.]
BGP sessions
• BGP connections (peerings) are set up manually, over TCP
• Two peers must have IP connectivity
• Things to think about (see RTE):
– How are routes imported into AS2?
[Figure: routers RTA-RTF in AS1, AS2, and AS3, connected by BGP sessions]
Example JunOS configuration
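routing-options {
    autonomous-system 1653;
}
protocols {
    bgp {
        /* EBGP session to the neighbor in AS42 */
        group external-peers {
            type external;
            peer-as 42;
            neighbor 192.168.200.13;
        }
        /* IBGP sessions, sourced from the local address */
        group internal-peers {
            type internal;
            local-address 192.168.24.1;
            neighbor 192.168.16.1;
            neighbor 192.168.6.1;
        }
    }
}
[Figure: AS1653 with an EBGP session to AS42 and IBGP sessions to its internal peers]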
Path vector protocol
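• In a distance-vector protocol, vectors with destination
information are distributed between routers:
• Example:
– <dst: 10.1.10/24, metric: 5, nexthop: 10.2.3.4>
• In a path-vector protocol such as BGP, the vector also carries the
full path of ASes to the destination, which makes loop detection possible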
[Figure: two BGP neighbors/peers establish a session over TCP port 179, then exchange UPDATE, KEEPALIVE, and NOTIFICATION messages]
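The loop prevention that the AS path enables comes down to two operations. The minimal sketch below models an AS path as a list of ASNs (ignoring AS_SET segments); the function names are hypothetical, and ASNs 1653 and 42 are taken from the configuration example.

```python
def accept(own_asn, as_path):
    """A route whose AS path already contains our own ASN would form a loop: drop it."""
    return own_asn not in as_path


def advertise_ebgp(own_asn, as_path):
    """When sending a route to an EBGP peer, prepend our own ASN to the path."""
    return [own_asn] + list(as_path)


# AS1653 learns a route from AS42 and passes it on; if it ever comes back, it is rejected.
sent = advertise_ebgp(1653, [42])
```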
BGP protocol operation
• OPEN messages to initiate a connection and exchange capabilities
• UPDATE messages contain
– A set of path attributes
– A set of prefixes sharing the path attributes
– A set of withdrawn routes
• BGP compares the AS path and other attributes to select the best path
for a prefix
– Same prefix may be received from several peers
• Path attributes describe properties of the route
– How it was generated, its next hop, various metrics, etc.
• NOTIFICATION to signal errors
• KEEPALIVE to check that the peer is still alive
BGP Connections
• All updates are incremental
– Basic case: no refreshes between peers!
– Extension: RFC 2918: Route Refresh Capability for BGP-4
• This assumes IP connectivity
– But via some mechanism other than BGP
BGP Finite State Machine
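[Figure: BGP finite state machine with the states Idle, Connect, Active, OpenSent, OpenConfirm, and Established. The TCP connection is set up from Connect/Active; sending an OPEN leads to OpenSent, receiving the peer's OPEN leads to OpenConfirm, and a KEEPALIVE leads to Established, where KEEPALIVE and UPDATE messages are exchanged. A NOTIFICATION returns the peer to Idle.]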
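The main path through the state machine can be sketched as a transition table. This is a deliberately simplified model: it keeps the RFC 4271 state names but only the main transitions, the event names are hypothetical, and every unexpected event is treated like an error that returns the peer to Idle.

```python
# (state, event) -> next state; only the main transitions are modelled
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",   # TCP up, our OPEN is sent
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "retry_timer"): "Connect",
    ("OpenSent", "open"): "OpenConfirm",          # peer's OPEN received, KEEPALIVE sent
    ("OpenConfirm", "keepalive"): "Established",
    ("Established", "keepalive"): "Established",
    ("Established", "update"): "Established",
}


def step(state, event):
    """Next state; a NOTIFICATION (or any unexpected event) falls back to Idle."""
    if event == "notification":
        return "Idle"
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state="Idle"):
    for event in events:
        state = step(state, event)
    return state
```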
BGP message header
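• BGP message header format
– Marker: 16 octets
– Length: 2 octets (total message length, 19-4096 octets)
– Type: 1 octet (1 = OPEN, 2 = UPDATE, 3 = NOTIFICATION, 4 = KEEPALIVE)
• Marker field (original purpose):
– Authentication of incoming BGP messages
– Detect loss of synchronization between two BGP peers
– In RFC 4271 the marker must be set to all ones; it is kept for compatibility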
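The header layout can be checked with a few lines of struct packing. This is a minimal sketch following RFC 4271 (all-ones marker, Length counting the 19-octet header itself); the helper names are hypothetical.

```python
import struct

MARKER = b"\xff" * 16        # RFC 4271: marker is all ones
HEADER_LEN = 19              # 16 (marker) + 2 (length) + 1 (type)
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4


def encode_message(msg_type, body=b""):
    """Prepend the BGP header; the Length field includes the header."""
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), msg_type) + body


def decode_header(data):
    """Return (length, type) after checking the marker."""
    if data[:16] != MARKER:
        raise ValueError("connection not synchronized: bad marker")
    return struct.unpack("!HB", data[16:HEADER_LEN])


keepalive = encode_message(KEEPALIVE)   # a KEEPALIVE is just the header
```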
OPEN message
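• OPEN message fields (after the header):
– Version: 1 octet (4 for BGP-4)
– My Autonomous System: 2 octets
– Hold Time: 2 octets
– BGP Identifier: 4 octets
– Opt Parm Len: 1 octet
– Optional Parameters: variable length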
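Packing the OPEN fields in order gives a 10-octet body, so an OPEN without optional parameters is 29 octets long. In the sketch below, ASN 1653 and the identifier 192.168.24.1 are borrowed from the JunOS example, and the hold time of 90 s is an assumption.

```python
import ipaddress
import struct


def encode_open(my_as, hold_time, bgp_id, opt_params=b""):
    """Build a complete OPEN message (header + body); bgp_id is a dotted-quad string."""
    body = struct.pack("!BHH4sB", 4, my_as, hold_time,
                       ipaddress.IPv4Address(bgp_id).packed,
                       len(opt_params)) + opt_params
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 1)  # type 1 = OPEN
    return header + body


msg = encode_open(1653, 90, "192.168.24.1")
```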
Withdrawn Routes
• Withdrawn Routes Length: total length of Withdrawn Routes
field
– 0 means that no routes are being withdrawn and that the Withdrawn Routes
field is not present in this UPDATE message
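Each withdrawn route (and each NLRI entry) is encoded as a prefix length in bits followed by only as many prefix octets as that length needs. A minimal IPv4-only sketch, with hypothetical helper names:

```python
import ipaddress
import struct


def encode_prefixes(prefixes):
    """Encode prefixes as <length (1 octet), prefix (ceil(length/8) octets)>."""
    out = b""
    for p in prefixes:
        net = ipaddress.IPv4Network(p)
        nbytes = (net.prefixlen + 7) // 8
        out += bytes([net.prefixlen]) + net.network_address.packed[:nbytes]
    return out


def withdrawn_routes_field(prefixes):
    """Withdrawn Routes Length (2 octets) plus the encoded routes; length 0 means none."""
    routes = encode_prefixes(prefixes)
    return struct.pack("!H", len(routes)) + routes
```

For example, 192.16.0.0/23 needs three prefix octets, so the field becomes 00 04 17 c0 10 00.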
Path Attributes
• Total Path Attribute Length: total length of the Path Attributes
field
– 0 indicates that neither the NLRI field nor the Path Attributes field is
present in the UPDATE message
• Path Attributes:
– A sequence of path attributes is present in every UPDATE message,
except messages that carry only withdrawn routes
– Each path attribute is a triple
<attribute type, attribute length, attribute value>
Path Attribute type
• Attribute type: consists of type code and flags
• Type code: contains the attribute code maintained by IANA
• Flags
– Bit 0: well-known (0), i.e. universally recognized, or optional (1)
– Bit 1: non-transitive (0) or transitive (1)
– Bit 2: complete (0) or partial (1)
– Bit 3: attribute length is one octet (0) or two octets (1)
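Bit 0 is the high-order bit of the flags octet, so the flags map to the masks 0x80, 0x40, 0x20, and 0x10. The sketch below parses one attribute from raw bytes; the function name and the dictionary layout are hypothetical.

```python
import struct


def parse_attribute(data):
    """Parse one path attribute; return (info dict, number of bytes consumed)."""
    flags, type_code = data[0], data[1]
    if flags & 0x10:                         # bit 3: two-octet length
        (length,) = struct.unpack("!H", data[2:4])
        offset = 4
    else:
        length, offset = data[2], 3
    info = {
        "optional": bool(flags & 0x80),      # bit 0
        "transitive": bool(flags & 0x40),    # bit 1
        "partial": bool(flags & 0x20),       # bit 2
        "type_code": type_code,
        "value": data[offset:offset + length],
    }
    return info, offset + length


# ORIGIN = IGP: well-known transitive, type 1, one-octet value 0
origin, used = parse_attribute(bytes([0x40, 1, 1, 0]))
```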
NOTIFICATION message
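• NOTIFICATION message fields (after the header):
– Error code: 1 octet
– Error subcode: 1 octet
– Data: variable length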
KEEPALIVE message
• Periodically sent to determine whether peers are
reachable
• Sent at a rate that ensures that hold time will not
expire
– A reasonable interval is one third of the Hold Time
– Must not be sent more frequently than once per second
– If the Hold Time is 0, periodic KEEPALIVEs must not be sent
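The hold time actually used is the smaller of the two values exchanged in the OPEN messages, and RFC 4271 only allows 0 or at least 3 seconds. The timer arithmetic can be sketched as follows (whole seconds assumed; helper names hypothetical):

```python
def negotiated_hold_time(local, remote):
    """Use the smaller Hold Time; values of 1 or 2 seconds are not acceptable."""
    hold = min(local, remote)
    if 0 < hold < 3:
        raise ValueError("unacceptable hold time")
    return hold


def keepalive_interval(hold_time):
    """One third of the hold time, but at least 1 s; None means no periodic KEEPALIVEs."""
    if hold_time == 0:
        return None
    return max(1, hold_time // 3)
```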
Path attributes categories
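• Path attributes are divided into categories:
• Well-known: all BGP implementations must recognize them
– Mandatory: must always be present in all updates
– Discretionary: may or may not be sent in an UPDATE
• Optional: need not be recognized by all implementations
– Transitive: passed on to other peers even if not recognized
– Non-transitive: dropped if not recognized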
Early BGP path attributes
• Type code 1: ORIGIN (RFC4271)
• Type code 2: AS_PATH (RFC4271)
• Type code 3: NEXT_HOP (RFC4271)
• Type code 4: MULTI_EXIT_DISC (RFC4271)
• Type code 5: LOCAL_PREF (RFC4271)
• Type code 6: ATOMIC_AGGREGATE (RFC4271)
• Type code 7: AGGREGATOR (RFC4271)
• Type code 8: COMMUNITY (RFC1997)
ORIGIN
• Well-known mandatory
• Defines the origin of the path information
• Types:
– IGP (0): NLRI is internal to the originating AS
(e.g., learnt via an IGP)
– EGP (1): NLRI is learnt via the (historical) EGP protocol
– INCOMPLETE (2): NLRI is learnt by some other means
(e.g., a static route)
ORIGIN example
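[Figure: RTB and RTD in AS2 run IBGP; AS1 is connected via EBGP. Routes R1-R4 are exported into BGP as follows:]
– R1: exported direct route, origin INCOMPLETE
– R2: exported static route, origin INCOMPLETE
– R3: learnt via EBGP, origin as defined by the originating AS
– R4: exported IGP route, origin IGP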
AS_PATH (cont.)
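[Figure: routes for 192.16.1.0/24, 192.16.2.0/23, and 192.16.3.0/24 propagating between AS1, AS2, AS3, and AS4. An aggregate is announced with AS Path: 3 {1 2}, where {1 2} is an AS_SET; 192.16.3.0/24 is announced with AS Path: 2.]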
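When paths are compared by length, RFC 4271 counts each ASN in an AS_SEQUENCE as 1 but a whole AS_SET as 1. In the sketch below an AS path is modelled (as an assumption) as a list of ("SEQ", list) and ("SET", set) segments, using the aggregate path from the figure.

```python
def as_path_length(segments):
    """Each ASN in an AS_SEQUENCE counts 1; a whole AS_SET counts 1."""
    return sum(len(asns) if kind == "SEQ" else 1 for kind, asns in segments)


def as_path_text(segments):
    """Render segments the way the figure does, e.g. '3 {1 2}'."""
    parts = []
    for kind, asns in segments:
        if kind == "SEQ":
            parts.extend(str(a) for a in asns)
        else:
            parts.append("{" + " ".join(str(a) for a in sorted(asns)) + "}")
    return " ".join(parts)


aggregate_path = [("SEQ", [3]), ("SET", {1, 2})]
```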
AS_PATH Manipulation
• The AS_PATH can be manipulated to affect inter-domain
routing behavior
• The AS_PATH can be lengthened to make a path less
preferable
• This affects all ASes that receive the prefix update
• Unlike the MED, which can only affect how a neighboring
AS sends traffic to you
• Affects how incoming traffic is routed
• Achieved by prepending extra copies of an ASN (typically your own)
to the AS_PATH
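Prepending only works because other ASes prefer shorter AS paths when all else is equal. The sketch uses the ASNs from the following example (AS10 originates, AS40 compares two paths); the number of extra copies is an assumption, chosen so that the prepended path becomes strictly longer.

```python
def prepend(own_asn, as_path, extra=0):
    """Prepend our ASN once (normal EBGP behavior) plus `extra` additional copies."""
    return [own_asn] * (1 + extra) + list(as_path)


def shortest(paths):
    """Pick the path with the fewest ASNs (all other attributes assumed equal)."""
    return min(paths, key=len)


direct = prepend(10, [], extra=3)     # AS10 -> AS40 over the 1G link, prepended
via_as30 = [30, 20, 10]               # path over the 10G links
```

With three extra copies AS40 now picks the path via AS30; with only two extra copies the lengths tie and other tie-breakers decide.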
Routing case before Manipulation
[Figure: AS10 originates 192.16.0.0/24 and is connected to AS20, AS30, and AS40 via 10G links, an IX, and a 1G link to AS40. Announced AS paths: 10, 20 10, 30 20 10, and 40 10.]
Routing case after manipulation
[Figure: the same topology after manipulation. AS10 prepends its ASN on the 1G link, so AS40 receives 10 10 10 and announces 40 10 10 10; the other paths (20 10, 30 20 10) are unchanged.]
[Figure: AS1 with routers RTA and RTC (addresses 1.1.1.1, 2.2.2.2, 3.3.3.3), an EBGP session, and prefixes 192.16.0.0/24 and 192.16.1.0/24]
NEXT_HOP on Multiaccess Media
• When a route is advertised on a multi-access medium, the
next hop can be the IP address of the interface of the
router on that medium that originated the route, not
necessarily the address of the advertising router
– This is called third-party next-hop
[Figure: routers .1, .2, and .3 on the shared subnet 10.0.0.0/24, with an EBGP session and OSPF; 11.0.0.0/24 is advertised via 10.0.0.3]
MULTI-EXIT-DISCRIMINATOR (MED)
• Optional non-transitive
• Used on external links to discriminate among
multiple links to the same neighboring AS
• Lower MED is preferred
• A MED received from an external peer must not be
propagated to other neighboring ASes
MULTI-EXIT-DISCRIMINATOR (cont.)
[Figure: AS1 (RTA, RTB) is connected to AS2 (RTC, RTD, RTE) by Link A (MED=70) and Link B (MED=120) for 192.16.0.0/24; a second panel shows AS2 with 192.16.0.0/24 and 192.16.1.0/24]
MULTI-EXIT-DISCRIMINATOR (cont.)
[Figure: AS1 (RTA) learns 192.16.0.0/24 from AS2 via RTB (MED=70) and RTC (MED=120), and from AS3 via RTD (MED=50); AS4 (RTE) is also shown]
• AS1 will select RTB over RTC, but chooses between RTB and
RTD using other means
• MEDs cannot be compared between routes from different ASes!
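The rule that MEDs are only compared within one neighboring AS can be sketched by grouping routes per neighbor AS before looking at the MED. Router names and MED values follow the figure; the dictionary layout and the helper name are assumptions.

```python
from itertools import groupby


def med_filter(routes):
    """Per neighboring AS, keep only the routes with the lowest MED.
    Routes from different ASes are never compared on MED."""
    survivors = []
    key = lambda r: r["neighbor_as"]
    for _, group in groupby(sorted(routes, key=key), key=key):
        group = list(group)
        best_med = min(r["med"] for r in group)
        survivors += [r for r in group if r["med"] == best_med]
    return survivors


routes = [
    {"router": "RTB", "neighbor_as": 2, "med": 70},
    {"router": "RTC", "neighbor_as": 2, "med": 120},
    {"router": "RTD", "neighbor_as": 3, "med": 50},
]
```

RTC is eliminated by RTB, but RTD survives despite its lower MED: choosing between RTB and RTD is left to the later tie-breakers.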
Using MED as tie-breaker
• The use of MED as a tie-breaker is controlled by several
settings
• Cisco, for example, by default compares routes pairwise in the
order they were received (newest first), so the result can depend
on arrival order (non-deterministic)
– cisco-non-deterministic parameter in JunOS
– bgp deterministic-med in Cisco IOS groups routes by neighboring AS
before comparing them
• You can also set always-compare-med to compare MEDs
between routes from different ASes
– Can be useful if you are among a group of ASes that trust each
other
LOCAL_PREF
• Well-known discretionary
• Used by local policy to set the degree of preference of
routes announced to other internal peers
• Used locally within the AS
• A higher local preference is preferred(!)
LOCAL_PREF (cont.)
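[Figure: AS4 reaches 192.16.0.0/24 in AS1 via AS2 (T1 link, RTB sets LOCAL_PREF = 200) and via AS3 (T3 link, RTC sets LOCAL_PREF = 300). RTB and RTC are IBGP peers, so all of AS4 prefers the exit via AS3.]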
MED versus LOCAL_PREF
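• MED is announced to other ASes
– Used by your neighbors to tell you how they want to receive
traffic
• LOCAL_PREF is only used within your own AS
– Used by you to decide how your traffic leaves the AS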
ATOMIC_AGGREGATE
• Well-known discretionary
• Set to indicate information loss
– The aggregate may cover more specific prefixes from ASes that are not listed in the AS_PATH
– Alternative to using AS_SET
AGGREGATOR
• Used in combination with ATOMIC_AGGREGATE
• Optional transitive
• Contains ASN and IP address of BGP speaker that
aggregates the route
[Figure: AS1, AS2, and AS3. RTA (130.247.203.1) in AS3 aggregates 192.16.0.0/24 into 192.16.0.0/23, which is announced with ATOMIC_AGGREGATE and AGGREGATOR=(3, 130.247.203.1)]
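The aggregation step itself is prefix arithmetic, which Python's ipaddress.collapse_addresses already performs. The sketch assumes that the second more-specific route is 192.16.1.0/24 (only 192.16.0.0/24 is visible in the figure) and attaches the two attributes as in the example, without building an AS_SET.

```python
import ipaddress


def aggregate(prefixes, asn, router_id):
    """Collapse prefixes into covering aggregates and build the attributes
    an aggregator attaches when it does not use an AS_SET."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    aggregates = list(ipaddress.collapse_addresses(nets))
    attrs = {"ATOMIC_AGGREGATE": True, "AGGREGATOR": (asn, router_id)}
    return aggregates, attrs


aggs, attrs = aggregate(["192.16.0.0/24", "192.16.1.0/24"], 3, "130.247.203.1")
```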
Vendor specific tie-break
• Cisco has several rules of its own
– Weight (Cisco-specific)
– By default, routes are compared pairwise in the order
they arrived (non-deterministic)
Example:
entry1: AS(PATH) 500, med 150, external, routerid 172.16.13.1
entry2: AS(PATH) 100, med 200, external, routerid 1.1.1.1
entry3: AS(PATH) 500, med 100, internal, routerid 172.16.8.4
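The three entries can be run through a simplified comparison to see how the settings differ. The sketch only models MED (same neighboring AS unless always-compare-med is set), then EBGP over IBGP, then the lowest router ID. Weight, LOCAL_PREF, origin, IGP metric, and route age are left out as an assumption, and all three paths have length 1.

```python
import ipaddress

# The three entries from the example, newest first
ENTRIES = [
    {"name": "entry1", "as": 500, "med": 150, "external": True, "router_id": "172.16.13.1"},
    {"name": "entry2", "as": 100, "med": 200, "external": True, "router_id": "1.1.1.1"},
    {"name": "entry3", "as": 500, "med": 100, "external": False, "router_id": "172.16.8.4"},
]


def better(a, b, always_compare_med=False):
    """Simplified pairwise step: MED, then EBGP over IBGP, then lowest router ID."""
    if (a["as"] == b["as"] or always_compare_med) and a["med"] != b["med"]:
        return a if a["med"] < b["med"] else b
    if a["external"] != b["external"]:
        return a if a["external"] else b
    rid = lambda e: ipaddress.IPv4Address(e["router_id"])
    return a if rid(a) < rid(b) else b


def best_pairwise(entries, always_compare_med=False):
    """Compare in arrival order, carrying the current winner forward."""
    best = entries[0]
    for e in entries[1:]:
        best = better(best, e, always_compare_med)
    return best


def best_deterministic(entries):
    """Deterministic MED: best route per neighboring AS first, then compare the winners."""
    per_as = {}
    for e in entries:
        per_as[e["as"]] = better(per_as[e["as"]], e) if e["as"] in per_as else e
    return best_pairwise(list(per_as.values()))
```

Under these assumptions the default and deterministic modes both pick entry2 (external beats internal), while always-compare-med lets entry3's MED of 100 win.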
Peering sessions
• Neighbor negotiation is the same for IBGP and EBGP
– IBGP peering: within an AS
– EBGP peering: between AS:s
• Two peers must have IP connectivity
– Simple check: they should be able to ping each other
• If EBGP peers are not physically connected: Multihop EBGP
[Figure: AS1, AS2, and AS3 with routers RTA-RTF, showing physical links and logical sessions: an EBGP session, an EBGP multihop session across a router that does not run BGP, and an IBGP session between RTE and RTC]
IBGP loopback peering
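[Figure: IBGP loopback peering in AS2 between routers RTB, RTC, RTD, and RTE; RTB has two physical interfaces (eth0, eth1) and a loopback interface (lo)]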
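EBGP peering
[Figure: RTA (AS1, .1) and RTB (AS2, .2) peer over EBGP across the DMZ 192.168.200.0/30; RTC is an IBGP peer of RTA inside AS1, which runs an IGP. RTC receives routes whose NEXTHOP is 192.168.200.2: how does it reach that address?]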
EBGP: Next-hop self
[Figure: the same topology, but RTA (loopback lo: 10.0.0.1) announces routes to RTC with NEXTHOP 10.0.0.1, while the routes RTA receives from RTB have NEXTHOP 192.168.200.2 on the DMZ]
• Alternative:
– Set next-hop-self
– Announce routes using the loopback address of the border
router as next-hop
– DMZ does not need to be distributed within the AS
• But RTA itself still forwards to the directly connected DMZ address (192.168.200.2)
EBGP nexthop: recursive lookup
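[Figure: recursive next-hop lookup. AS1 contains RTC, RTA (lo: 10.0.0.1), and RTD, running IBGP and an IGP (link 12.0.0.0/30); RTA peers over EBGP with RTB in AS2 across the DMZ 192.168.200.0/30; prefix 130.2.3.0/24]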
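A BGP next hop is usually not directly connected, so the router looks up the next hop itself, possibly several times, until it hits a connected route. The RIB below is hypothetical: the IGP route to 10.0.0.1/32 and its next hop 12.0.0.1 are assumptions loosely based on the figure.

```python
import ipaddress

# Hypothetical RIB on RTC: prefix -> (source, next hop)
RIB = {
    "130.2.3.0/24": ("bgp", "10.0.0.1"),    # learnt over IBGP, next hop = RTA's loopback
    "10.0.0.1/32": ("igp", "12.0.0.1"),     # IGP route to the loopback (assumed)
    "12.0.0.0/30": ("connected", None),     # directly connected link (assumed)
}


def longest_match(addr):
    """Return the RIB entry with the longest prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [ipaddress.ip_network(p) for p in RIB if ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    return RIB[str(max(matches, key=lambda n: n.prefixlen))]


def resolve(dst, max_depth=8):
    """Follow next hops until a connected route is found; return that address."""
    addr = dst
    for _ in range(max_depth):
        entry = longest_match(addr)
        if entry is None:
            raise LookupError(f"no route to {addr}")
        source, next_hop = entry
        if source == "connected":
            return addr
        addr = next_hop
    raise LookupError("recursion too deep")
```

Traffic to 130.2.3.7 thus resolves first to 10.0.0.1 (BGP) and then to 12.0.0.1 (IGP), which is on a connected link.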
Alternative EBGP peering: multi-hop
[Figure: RTA and RTB connected by an EBGP multihop session]
Alternative EBGP peering: redundancy/load balancing
[Figure: RTA in AS1 and RTB in AS2 connected by EBGP]
Alternative EBGP peering: redundancy/load balancing
[Figure: AS1 and AS2 connected by EBGP]
How to transit traffic?
[Figure: transit AS1 (with RTA and RTC), an EBGP session from RTA to RTB in AS2, and AS3]
• How do RTC and RTD know how to forward transit traffic
between RTA and RTF?
• You cannot use default routes. Why?
• You may use an IGP to distribute external routes.
• But the most common approach nowadays is to use IBGP to
distribute external routes internally.
How to transit traffic using IGP
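[Figure: transit AS1 with border routers RTA and RTE and internal routers RTC and RTD, all running an IGP, and an IBGP session between the border routers; RTA has an EBGP session with RTB in AS2, and RTE with RTF in AS3]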
• You can inject all BGP routes into your IGP as external routes
• Scales badly: high memory consumption in the IGP, and the IGP
takes longer to converge
• There is also a problem with synchronization between the IGP
and EBGP
– Can you announce a route even though your IGP has not
converged?
Synchronization between IGP and BGP
[Figure: the same transit topology, with 192.16.124.0/24 learned over EBGP at one border router and carried over IBGP to the other. The border router asks: "Is 192.16.124.0/24 propagated in my AS?"]
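The synchronization rule can be stated as a single check. This is a simplified sketch: the IGP table is an assumed set of prefix strings and only exact matches count, and the `synchronization` flag stands for the router setting that turns the rule on or off.

```python
def may_advertise_to_ebgp(prefix, learned_via_ibgp, igp_prefixes, synchronization=True):
    """A route learned over IBGP is only announced to EBGP peers once the IGP
    also carries the prefix, so that transit traffic can actually cross the AS."""
    if not learned_via_ibgp or not synchronization:
        return True
    return prefix in igp_prefixes
```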
Using IBGP: Full mesh
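[Figure: AS1 with IBGP sessions between all its routers (including RTA and RTC); RTA has an EBGP session with RTB in AS2; AS3 is also shown]
• Rule: never re-announce a route learned from an IBGP peer to another IBGP peer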
IBGP full mesh
• So IBGP needs to be fully meshed in order for:
– All internal routers to receive all external routes
– Loop prevention (the AS_PATH does not change inside the AS,
so it cannot detect internal loops)
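Because of the IBGP split-horizon rule, every router needs its own session to every other router, and the number of sessions grows quadratically. A small sketch (hypothetical helper names):

```python
def ibgp_sessions(n):
    """Number of IBGP sessions in a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2


def readvertise(learned_from, to):
    """IBGP split horizon: a route learned from an IBGP peer is not sent to another IBGP peer."""
    return not (learned_from == "ibgp" and to == "ibgp")
```

Ten routers already need 45 sessions, which is the scaling problem that route reflection (RFC 4456) and confederations (RFC 5065) address.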
Route advertisement rules
• BGP next-hop must be reachable
– Consequence: if the IGP cannot reach the next hop, the BGP route is not announced
Private ASes
• Sometimes it is not necessary to have a public ASN
• IANA has reserved the ASN range 64512 – 65535 for
internal use within a system
• Can be used for customers that are single-homed or
multi-homed to the same provider
• Private ASNs must not be announced globally
• Providers must strip private ASNs before announcing the
prefixes on to the rest of the Internet
• The purpose of private ASNs is to conserve AS numbers and to
hide networks
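Stripping private ASNs before announcing a prefix to the rest of the Internet is a simple filter on the AS path. The sketch uses the range as given on the slide; the customer ASN 65001 in the usage line is hypothetical.

```python
PRIVATE_MIN, PRIVATE_MAX = 64512, 65535   # range as given on the slide


def is_private(asn):
    return PRIVATE_MIN <= asn <= PRIVATE_MAX


def strip_private(as_path):
    """Remove private ASNs before announcing the path to the rest of the Internet."""
    return [asn for asn in as_path if not is_private(asn)]


# Provider AS1 has prepended itself to a customer route from (hypothetical) private AS65001
announced = strip_private([1, 65001])
```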
Private ASes, cont’d
[Figure: 192.16.0.0/24 is announced via AS1 (router R2) to AS7 (router R3), where it appears with AS path 1: the private ASN has been removed]
Extensions
• BGP is under constant development
• New operational problems and new technologies
require extensions to the protocol
• Extensions are introduced, standardized, and
implemented
• Implementation of extensions:
– Negotiated via BGP capabilities when peering is set up.
– Sent as optional transitive attributes and either recognized
or not
Extensions example
• BGP extensions
– BGP communities attribute (RFC1997)
– Route refresh capability (RFC2918)
– Carrying label information in BGP-4 (RFC3107)
– Capabilities advertisement (RFC3392)
– BGP route reflection (RFC4456)
– Multi-protocol extensions (RFC4760)
– Graceful restart (RFC4724)
– Four-byte AS (RFC4893)
– Autonomous system confederations (RFC5065)
• TCP extension
– TCP MD5 signature option (RFC2385)
Capabilities Advertisement
• An 'extension to negotiate extensions'
• Announce supported capabilities to the peer with OPEN
message using options parameter
• Some capabilities (roughly the same as on the previous slide):
Value     Description                                            Reference
0         Reserved                                               RFC3392
1         Multiprotocol Extensions for BGP-4                     RFC4760
2         Route Refresh Capability for BGP-4                     RFC2918
3         Cooperative Route Filtering Capability
4         Multiple routes to a destination capability            RFC3107
5-63      Unassigned
64        Graceful Restart Capability                            RFC4724
65        Support for 4-octet AS number capability               RFC4893
66        Deprecated (2003-03-06)
67        Support for Dynamic Capability (capability specific)
68-127    Unassigned
128-255   Vendor Specific
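Capabilities travel inside the OPEN message as an optional parameter of type 2, each one encoded as code, length, value. The sketch builds two of the capabilities from the table; ASN 70000 is borrowed from the four-byte AS example later on, and the helper names are hypothetical.

```python
import struct


def capability(code, value):
    """One capability: code (1 octet), length (1 octet), value."""
    return struct.pack("!BB", code, len(value)) + value


def capabilities_param(caps):
    """Wrap capabilities in an OPEN optional parameter of type 2 (Capabilities)."""
    body = b"".join(caps)
    return struct.pack("!BB", 2, len(body)) + body


def mp_ipv4_unicast():
    """Capability 1: AFI (2 octets), reserved (1 octet), SAFI (1 octet)."""
    return capability(1, struct.pack("!HBB", 1, 0, 1))


def four_octet_as(asn):
    """Capability 65: the speaker's 4-octet ASN."""
    return capability(65, struct.pack("!I", asn))
```

The resulting bytes go into the Optional Parameters field of the OPEN message, with Opt Parm Len set to their total length.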
The COMMUNITY attribute
• RFC 1997 defines the COMMUNITY attribute (optional transitive),
which carries one or more 4-byte community values
• A community is a group of destinations that share some common property
• Used to simplify routing policies based on logical property
rather than IP prefix or ASN
• Format
– First 2-bytes ASN, last 2-bytes defines a value (ASN:value)
– Example 5678:90 (0x162E005A)
• A route can carry more than one community value
• A BGP speaker can add or modify communities before
passing routes on to other peers
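The ASN:value notation is just the two 16-bit halves of one 32-bit number, which the example 5678:90 = 0x162E005A illustrates. A small conversion sketch (helper names hypothetical):

```python
def encode_community(text):
    """'ASN:value' -> 32-bit community (high 16 bits ASN, low 16 bits value)."""
    asn, value = (int(x) for x in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def decode_community(number):
    return f"{number >> 16}:{number & 0xFFFF}"
```

The well-known communities are reserved values of this form, for example NO_EXPORT = 0xFFFFFF01, which decodes to 65535:65281.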
NO_EXPORT example
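[Figure: 192.16.0.0/23 is announced together with the more specific 192.16.1.0/24 tagged NO_EXPORT; the receiving AS uses 192.16.1.0/24 but announces only 192.16.0.0/23 onward]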
• Other well-known:
– NO_ADVERTISE: a route should not be advertised to other BGP
peers
Extended communities
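• Format (8 bytes in total):
– Type: 1 byte, including the I (IANA authority) and T (transitive) bits
– Subtype: 1 byte (not present for all types)
– Data: 6-7 bytes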
Use of Communities
• Communities are used extensively in modern networks for
defining policies
– Both internally and between networks (if they have agreed)
Community configuration example (1)
Tagging a community at the edge (or by the other peer):
Community configuration example (2)
Using the community to implement a policy:
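The idea behind both steps, tagging at the edge and writing policy against the tag instead of the prefix, can be sketched in Python. The community values 1653:100 and 1653:200 and the function names are hypothetical, and this is not router configuration syntax.

```python
# Hypothetical community values, for illustration only
CUSTOMER = "1653:100"
PEER = "1653:200"


def tag_at_edge(route, relation):
    """Ingress: record where the route was learned as a community."""
    tag = {"customer": CUSTOMER, "peer": PEER}[relation]
    return {**route, "communities": set(route.get("communities", ())) | {tag}}


def export_to_peer(route):
    """Egress policy written on communities, not prefixes: only customer routes go to peers."""
    return CUSTOMER in route.get("communities", ())
```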
Multiprotocol extension for BGP-4
• Supports routing of network-layer protocols other than IPv4
• The NLRI and NEXT_HOP fields in the UPDATE message are IPv4
specific
• Uses a generalized address format, identified by:
• AFI - Address Family Identifier
• SAFI - Subsequent Address Family Identifier
Multiprotocol extension for BGP-4
• Examples
– IPv4 unicast: AFI=1, SAFI=1
– IPv4 multicast: AFI=1, SAFI=2
– L3VPN: AFI=1, SAFI=128
– IPv6 unicast: AFI=2, SAFI=1
– IPX: AFI=11
Route refresh capability
• By default, BGP has no mechanism to dynamically request
re-advertisement of routes from a peer
• If a route (its attributes) does not change, a BGP speaker does
not re-announce it
• Therefore, a receiver needs to cache all previous routes
– Even if not required at a specific moment
• This places a lot of load (memory) on the receiver
• Suppose a router changes input policy and does not want to
store all data from all neighbors “just in case”
• With the ROUTE-REFRESH message, a router can ask a neighbor to
re-send its complete Adj-RIB-Out
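ROUTE-REFRESH is BGP message type 5 (RFC 2918), and it may only be sent once both peers have advertised capability 2. Its body names the address family to refresh, so the whole message is 23 octets. A minimal sketch:

```python
import struct


def route_refresh(afi=1, safi=1):
    """ROUTE-REFRESH message: header (type 5) + AFI (2), reserved (1), SAFI (1)."""
    body = struct.pack("!HBB", afi, 0, safi)
    return b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 5) + body
```

The default arguments request IPv4 unicast (AFI=1, SAFI=1).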
TCP MD5 Signature option
• Provides a mechanism for TCP to carry an MD5 digest in
each TCP segment, computed with a shared secret
– MD5 message digest algorithm
– Verification of authenticity (no encryption)
– The shared secret is configured manually
• Graceful restart:
– Tell your peers: "I am going down but will be back; please
continue forwarding to me until I am back."
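RFC 2385 specifies which bytes go into the digest and in which order. The sketch follows that order for IPv4. It assumes the caller passes the total TCP segment length (header, options, and data) and the raw TCP header, and it does not place the digest into a TCP option.

```python
import hashlib
import ipaddress
import struct


def tcp_md5_digest(src_ip, dst_ip, tcp_header, segment_len, payload, key):
    """MD5 over the pseudo-header, the TCP header without options and with a
    zero checksum, the segment data, and the shared key, in that order."""
    pseudo = (ipaddress.IPv4Address(src_ip).packed
              + ipaddress.IPv4Address(dst_ip).packed
              + struct.pack("!BBH", 0, 6, segment_len))   # zero, protocol 6 (TCP), length
    header = bytearray(tcp_header[:20])   # fixed part only: options are excluded
    header[16:18] = b"\x00\x00"           # checksum field set to zero
    return hashlib.md5(pseudo + bytes(header) + payload + key).digest()
```

A segment whose digest does not match, for example because it was forged by a third party without the secret, is dropped before BGP ever sees it.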
Route flap damping example
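[Figure: route flap damping, penalty (flaps) versus time. The penalty increases with each flap and decays over time; when it exceeds the suppress limit the route stops being announced, and when it has decayed below the reuse limit the route is announced again.]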
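The curve can be reproduced with an exponentially decaying penalty. The parameter values in the sketch (1000 per flap, suppress limit 2000, reuse limit 750, half-life 15 minutes) are assumptions, commonly used defaults rather than values from the slide, and time is in minutes.

```python
class FlapDamping:
    """Penalty-based route flap damping with exponential decay (assumed parameters)."""

    def __init__(self, penalty_per_flap=1000, suppress_limit=2000,
                 reuse_limit=750, half_life=15.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life = half_life
        self.penalty = 0.0
        self.last_time = 0.0
        self.suppressed = False

    def _decay(self, now):
        self.penalty *= 0.5 ** ((now - self.last_time) / self.half_life)
        self.last_time = now

    def flap(self, now):
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress_limit:
            self.suppressed = True          # stop announcing

    def is_suppressed(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False         # start announcing again
        return self.suppressed
```

With these numbers, three flaps one minute apart suppress the route; it is still suppressed 18 minutes after the last flap and is reused again after 38 minutes.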
Four-byte AS numbers
• 2-byte ASNs are quickly running out
• 4-byte ASNs have been standardized (RFC 4893), reusing the
AS_PATH attribute between NEW speakers, with a migration technique
based on a special 2-byte ASN: 23456 (AS_TRANS).
• The migration technique maps all 4-byte ASNs to 23456
when NEW speakers (that have four-byte AS capability)
talk to OLD speakers (those that do not have 4-byte AS
capability)
AS numbers in BGP
• Where does BGP carry AS numbers?
– In the OPEN message (My Autonomous System)
– In the AS_PATH attribute
– In the Aggregator attribute
– In Communities attributes
New attributes
• Two new attributes (optional transitive)
– AS4_PATH
– AS4_AGGREGATOR
NEW speaking with OLD
• When talking to OLD, NEW replaces every 4-byte ASN in the AS_PATH with 23456
• NEW creates the attribute AS4_PATH to “tunnel” the
4-byte AS-path to other NEW speakers.
• When NEW receives a route from OLD with an
AS4_PATH attribute, it constructs a new AS_PATH
replacing all 23456 with the corresponding 4-byte
AS:s in AS4_PATH.
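The round trip can be sketched with the ASNs of the following example, under the assumption (not stated on the slides) that AS70000 and AS100 are NEW speakers and AS50 is OLD. Instead of substituting position by position, the reconstruction follows the RFC 4893 approach: the leading ASNs that OLD speakers prepended exist only in AS_PATH, so they are kept and AS4_PATH is appended. AS_SETs and confederation segments are ignored.

```python
AS_TRANS = 23456


def to_old(as4_path):
    """NEW -> OLD: 2-byte AS_PATH with AS_TRANS for large ASNs, plus AS4_PATH."""
    as_path = [asn if asn <= 0xFFFF else AS_TRANS for asn in as4_path]
    return as_path, list(as4_path)


def reconstruct(as_path, as4_path):
    """OLD -> NEW: keep the extra leading ASNs of AS_PATH and append AS4_PATH."""
    if len(as_path) < len(as4_path):
        return list(as_path)            # inconsistent AS4_PATH: ignore it
    extra = len(as_path) - len(as4_path)
    return list(as_path[:extra]) + list(as4_path)


as_path, as4_path = to_old([70000])     # AS70000 (NEW) announces to AS50 (OLD)
as_path = [50] + as_path                # AS50 prepends itself, passes AS4_PATH on unchanged
```

AS100 therefore sees AS_PATH 50 23456 plus AS4_PATH 70000, and rebuilds the real path 50 70000.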
Example
AS70000 AS50 AS100