Day One: Seamless EVPN-VXLAN Tunnel Stitching for DC and DCI Network Overlay
The various data center interconnect options covered in this book show how flexible the EVPN control plane has become over the last couple of years. It proves that it can be used in greenfield scenarios, stitching VXLAN-to-VXLAN, as well as brownfield scenarios where VXLAN-to-MPLS stitching is sometimes a better choice to quickly interconnect remote data centers, larger scale POP locations, or remote campus sites, allowing them to access data from different data center locations.

This book covers six different DCI options with a larger focus on EVPN-VXLAN-to-EVPN-VXLAN for L2 bridging/switching purposes, and EVPN-VXLAN-to-EVPN-VXLAN for L3 IP prefix advertisement purposes. While the MPLS DCI is still very popular, the authors contend that the VXLAN-to-VXLAN option fulfills most site requirements, opening it to additional emerging use cases where Group Based Policy (GBP) profiles are also extended between DC sites for micro-segmentation purposes.
Day One: Seamless EVPN-VXLAN Tunnel Stitching for DC and DCI Network Overlay is a thorough examination of Junos tunnel stitching techniques for data centers by some of the leading experts at Juniper Networks.
“The data center space is hot with emerging technology. Indeed, data center design has fundamentally changed over the last several years as new technology has brought new capabilities together with better operational design. But navigating new technology can be difficult. How do you know which designs to use and how to avoid pitfalls of working in only abstract ideas? This book introduces various options for handling DCI, providing step-by-step instruction on how to decide, design, deploy, and manage. This is a difference-maker for network architects, engineers, and operators.”
Michael Bushong, GVP Cloud-Ready Data Center, Juniper Networks
IT’S DAY ONE AND YOU HAVE A JOB TO DO: Optimize your overlay data center interconnections with EVPN-VXLAN and EVPN-MPLS tunnel stitching techniques.
Understand the new DCI overlay options
Configure seamless stitching for different forms of encapsulation
Learn advanced EVPN topics related to the fabric deployment on different data center sites
Deploy EVPN-VXLAN to EVPN-VXLAN and EVPN-VXLAN to EVPN-MPLS seamless stitching
Verify the operation of EVPN-VXLAN fabrics and DCI
© 2023 by Juniper Networks, Inc. All rights reserved. Juniper Networks and Junos are registered trademarks of Juniper Networks, Inc. in the United States and other countries. The Juniper Networks Logo and the Junos logo are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.

Published by Juniper Networks Books
Authors: Elisabeth Rodrigues, Michal Styszynski, and Kishore Tiruveedhula
Reviewers: Sean Clarke, Adrien Desportes
Editor in Chief: Patrick Ames
Version History: v1, February 2023

About the Authors

Elisabeth Rodrigues is a technical solution consultant specialized in data center networking. She supports the sales teams in their data center projects by providing guidance in the design of solutions, delivering technical deep-dive presentations and demonstrations to customers, and leading Proof-of-Concept tests. She has been working for Juniper for 8 years and has been a Junos fan for more than 15 years.

Michal Styszynski is a Sr Product Manager in CRDC - the Cloud-Ready Data Center business unit at Juniper Networks, focusing on Junos and Junos Evolved for the QFX product line. Michal joined Juniper Networks over 10 years ago. Before his current PLM role, he also worked in technical marketing and product consulting, focusing on data center and storage networking projects for major telcos and large enterprises. Before Juniper, he worked for about 10 years at Orange/FT R&D and TPSA. Michal graduated from the Electronics & Telecommunication faculty at Wroclaw University of Science & Technology. He is JNCIE-DC#523 and PEC, PLC, PMC certified from the Product School, San Francisco, California.

Kishore Tiruveedhula is a Senior Staff Engineer with Juniper Networks. He has over 20 years of experience designing and implementing routing protocol features in MPLS and VPN technologies. Most recently he developed and implemented multiple EVPN DCI features, such as seamless interconnection of EVPN-VXLAN to EVPN-VXLAN, EVPN-VXLAN to EVPN-MPLS, and DCI Type-5 stitching, which are covered in this Day One book.

Authors’ Acknowledgments

Elisabeth Rodrigues thanks her beloved partner, Sylvain, and her children, Rachel and Ruben, for their patience and love. A special thanks to Michal Styszynski for accepting her contribution to this book. A big thank you to her managers and directors, Dirk van den Borne, Selmane B. Slama, and Tom Ruban for their support.

Michal Styszynski would like to thank his wife Kasia and sons Ernest and Marcel for their love, patience, and understanding. He thanks his current managers Michael Bushong, Subramaniam Vinod, and Mahesh Subramaniam, as well as his former managers, Tim McCarthy and Praful Lalchandani, for their support when writing this book.

Kishore Tiruveedhula would like to thank Michal Styszynski for starting this great Day One book initiative and for the opportunity to contribute to this book. He would like to thank Wen Lin, Distinguished Engineer, and Selvakumar Sivaraj, Distinguished Engineer, for their valuable suggestions while implementing the DCI features. He thanks managers Vrishab Sikand, Raveendra Torvi, and Manish Gupta for giving him the opportunity to work on implementing these EVPN DCI features.

Everything shared in this book was possible thanks to the hard work of Juniper’s software development and test engineering colleagues. Special thanks go to Wen Lin and Selvakumar Sivaraj from the Distinguished Engineering team for their leadership in architecting EVPN solutions. Appreciation also goes to our Editor in Chief, Patrick Ames, and reviewers Sean Clarke, Sr. Manager for Worldwide Proof-of-Concept, and Adrien Desportes, Sales Engineering Manager at Juniper Networks.
Welcome to Day One
You can build out the lab topologies used in this book without detailed instructions.
Learn advanced EVPN topics related to the fabric deployment on different data
center sites
Deploy EVPN-VXLAN to EVPN-VXLAN and EVPN-VXLAN to EVPN-MPLS
seamless stitching
Verify the operation of EVPN-VXLAN fabrics and DCI.
NOTE: Seamless stitching for EVPN pure Type-5 to Type-5 route-based VXLAN tunnels is supported starting with release 22.4R1 for the Junos and Junos Evolved based platforms mentioned here.
Preface
Secure and ‘always-on’ access to data has been important for the last 20 years, but with the global pandemic outbreak many organizations realized even more clearly the significance of getting access to the most relevant and up-to-date information.

The critical part in getting information on time is the data center ecosystem, where massive files are stored and where servers offer ultra-fast data processing. Interconnecting data center networks located in different geographic regions is likewise important to ensure the data gets replicated between regions and access to the information is still offered in case of natural disasters (wildfires, floods), power outages, or a persistent security attack at one of the data center network locations. For many organizations, 24/7 secure access to data through private data center interconnect investments represents an important competitive advantage.
To ensure multi-site data center interconnect is delivered in a simple, secure, and agile
way, the use of network virtualization overlay (NVO) techniques recently became more
popular.
This book focuses on how to unify and optimize the overlay data center interconnections with EVPN-VXLAN and EVPN-MPLS tunnel stitching techniques. We’ll walk through the main use cases and architectures where the NVO seamless tunnel stitching technique can become even more relevant. The implementation and verification tasks using prescriptive DC and DCI designs should help you understand precisely how to use it in your production networks.
Glossary
ARP = Address Resolution Protocol
aka = also known as
AS = Autonomous System
ASN = Autonomous System Number
BGP = Border Gateway Protocol
BO = Bridged Overlay
CE = Customer Edge
CFM = Connectivity Fault Management
CLI = Command Line Interface
CRB = Centrally-Routed Bridging
DC = Datacenter
DCI = Datacenter Interconnect
iESI = Interconnect ESI (Ethernet Segment Identifier)
LACP = Link Aggregation Control Protocol
LAG = link aggregation group
LAN = Local Area Network
MAC = Media Access Control
MAN = Metropolitan Area Network
MC-LAG = Multi Chassis LAG
MPLS = Multiprotocol Label Switching
NLRI = Network Layer Reachability Information
VNI = VXLAN network identifier
VPLS = Virtual Private LAN Service
VPN = Virtual Private Network
VRF = Virtual routing and forwarding
VTEP = Virtual Tunnel End Point
VXLAN = Virtual Extensible LAN
WAN = Wide Area Network
Chapter 1: Data Center Overlay and EVPN
NVOs (Network Virtualization Overlays) have become popular in the DC and DCI network infrastructure context mainly because they offer design agility for quicker, simpler service delivery, efficiently crossing the underlay network boundaries with limited requirements for the underlay IP routing. The overlay’s distributed state typically offers much higher scale and better growth over time than the traditional 3-tier LAN DC network design, along with a reduced failure domain.
Using the overlay networking approach helps build unified LAN DC and DCI ecosys-
tems with the common BGP EVPN overlay signalization protocols in both domains while
offering various tunneling transport options for the data center interconnect.
With the newer EVPN-VXLAN overlay techniques, interconnecting data centers became much easier and faster to implement because there is less dependency on core IP capabilities. For example, when comparing EVPN-VXLAN to traditional VPLS-MPLS L2 networking, we can highlight that the newer overlay technology is no longer reserved for telcos and larger enterprises; small to medium enterprises can now use it directly at the DC fabric level, without additional CAPEX/OPEX spending on dedicated gateways and licenses, and without asking the WAN teams to deliver such connectivity for service extension and high availability.
Before focusing on the main topic of this book – the DCI – and to better understand the DC interconnect part, let’s review the characteristics, requirements, and outcomes of the modern DC fabric where EVPN-VXLAN is used as the main technology for Layer 2 and Layer 3.
Active/Active forwarding
VM server mobility
Some of the requirements mentioned above are not completely new and have been around in the DC networking industry for a long time. With overlay networks, the way these requirements are fulfilled is solved in the protocol design and is done automatically within BGP EVPN signalization, instead of a monolithic spanning-tree combined with multi-VRF access/aggregation/core designs.
The following EVPN feature set is needed to ensure all requirements can be combined under the same overlay network in the data center.
Multi-homing
All-active redundancy mode is a strong requirement in data center networks, and it must
use an algorithm that ensures that the frames are delivered in order for a given traffic
flow.
Traffic forwarding must be optimized for a multi-homed group. It should not be forwarded between PE devices that are members of a multi-homed group unless the destination CE is attached to one of the multi-homed PEs.
Figure 1.1 is an example topology where leaf L2 is a member of two different ESI-LAGs
with two different leaves – one ESI towards server1 and the other ESI to server2.
In this scenario the local bias will act for the flows sent to leaf2 and destined to server1 or server2. All ECMP paths to remote leaf nodes are typically used from leaf to spines, but also from each spine to destination leaf nodes, when the destination MAC address is on a multihomed server connected via an ESI-LAG.
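To make the multi-homing discussion concrete, here is a minimal ESI-LAG sketch in Junos set format; the interface names, ESI value, LACP system-id, and VLAN name are illustrative placeholders and not taken from the lab in this book:

set interfaces xe-0/0/10 ether-options 802.3ad ae11
set interfaces ae11 esi 00:11:11:11:11:11:11:11:11:01
set interfaces ae11 esi all-active
set interfaces ae11 aggregated-ether-options lacp active
set interfaces ae11 aggregated-ether-options lacp system-id 00:00:00:11:11:01
set interfaces ae11 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae11 unit 0 family ethernet-switching vlan members vlan1001

The same ESI and LACP system-id would be configured on each leaf that is a member of the given Ethernet segment, so the server sees a single LAG while the fabric runs all-active forwarding.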
Ease of Provisioning
The PEs leaf nodes belonging to a given VPN or a given redundancy multi-homed
group must be auto discovered. Different ESI types were defined by IETF and can help
delivering the L2 multi-homing in much easier way, for example the ESI type-1 (not to
confuse with EVPN route-type 1) generates the ESI 10 bytes value based on the LACP
system-id information received from the server or the CE switch. Same goes for the ESI
type-3 which can build the ESI value automatically based on the system MAC@ and
local discriminator.
Otherwise, the ease of provisioning is brought by fabric managers such as Apstra or via
the auto-EVPN options delivered across the fabric using the RIFT protocol as an
underlay.
Fast convergence
Virtualized applications increase the volume of MAC addresses. The network convergence time upon failure must be independent of the number of MAC addresses learned by the PE and it should be independent of the number of service instances.
Flood suppression
The data center network minimizes the amount of flooding of multi-destination frames
and eliminates unnecessary flooding upon topology changes, especially in the case of a
multi-homed site.
Multicast optimizations
By default, when ingress replication is in use, multicast traffic is flooded to remote PEs of
a given bridge domain.
To optimize multicast L2 traffic in each bridge domain, PEs (leaf nodes) can act as
IGMP/MLD proxies to make sure multicast traffic is only sent to interested receiver
hosts in an efficient way. Each time a host is interested in a multicast group, it sends an
IGMP/MLD Membership Report. Then, the IGMP/MLD router periodically sends
Membership Queries.
The goal of IGMP / MLD proxy solution in EVPN is to reduce the flooding of IGMP/
MLD messages and act as a distributed multicast router forwarding multicast traffic only
to PEs that have hosts interested in the multicast group.
To achieve IGMP/MLD proxy, three new route types are described in RFC9251:
RT6: Selective Multicast Ethernet Tag route (SMET) ensures that each PE sends its
local IGMP membership requests and leave group state. Instead of flooding the
IGMPv2 membership report coming from the receiver server into the fabric, it’s
transformed into the EVPN Type-6 route to advertise the intent to receive the given
multicast feed.
RT7 and RT8: Multicast Membership Report Sync route / Multicast Leave Sync route are used to coordinate IGMP states in the case of multi-homing, where PEs share a given Ethernet Segment.
When the fabric is using distributed IP anycast gateways the edge-routed multicast is also
sometimes required to preserve the leaf to spine bandwidth or reduce the latency when
sources and receivers are connected to the same node. This is something EVPN fabrics
are typically delivering through the OISM (Optimized Inter-Subnet Multicast) at the leaf
and border-leaf level.
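As a rough illustration of the IGMP proxy behavior described above, the following hedged sketch enables IGMP snooping with proxy mode for one VLAN inside a MAC-VRF; the instance and VLAN names are hypothetical and the exact hierarchy may vary per platform and release:

set routing-instances MACVRF-1001 protocols igmp-snooping vlan vlan1001 proxy

With snooping and proxy enabled, a leaf summarizes local IGMP membership into EVPN Type-6 (SMET) routes instead of flooding the raw membership reports across the fabric.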
Security Considerations
With an EVPN control plane, the network can detect and properly handle duplicated
MAC addresses and it can associate a MAC address with specific flags. For example, when
the EVPN MAC address is detected as a duplicate (for example because of an Ethernet
loop, introduced by wrong leaf/server cabling) it can be set as a blackhole MAC@ and
discard the packets destined or sourced at that address.
Alternatively, in some cases the CFM based enhanced loop-detection can be used to react
quicker to the loop and simply disable the interfaces involved in the loop.
Other important security features can be added to data center networks: for example, MACsec encryption on all leaf/spine links, L2-L3 traffic isolation, stateless ACLs, service chaining to an L4-L7 firewall, or the VXLAN Group Based Policy to deliver micro-segmentation.
NOTE This book will also examine additional security requirements in its scope of DCI.
NOTE EVPN is constantly evolving with the addition of new capabilities; this chapter focuses mainly on RFC7432, RFC8365, RFC9135, and RFC9136.
multiple PEs and, at the DC-GW level, it allows a given leaf node to load-balance
the traffic to multiple border-leaf nodes.
Fast convergence is offered using MAC mass withdrawal in case of PE-CE link failure. A single route is used to notify remote PEs.
VM mobility is tracked with sequence numbers.
The use of Route Targets allows different logical topologies (mesh, hub and spoke, and extranet).
Ethernet loop detection is built into the EVPN control plane.
Figure 1.2 VXLAN Header Used to Transport Data Inside the DC and in DCI
Outer IP Header
This is the outer IP header, with the source and destination IP addresses indicating the IP addresses of the VTEPs.
They are identified by their corresponding Route Target (common between fabric nodes sharing the same MAC-VRF) and Route Distinguisher (unique per instance and per node).
They can enable different service-types for Layer 2 isolation and virtualization.
Here are the three principal L2 service types offered in Junos and Junos Evolved as part
of the MAC-VRF implementation – VLAN-based, VLAN-bundle, and VLAN-aware.
Some of the main characteristics of these services are covered in the next section.
VLAN-based service-type
VLAN based service-type is an EVPN instance that consists of a single VLAN broadcast
domain and single bridge-table per EVI/MAC-VRF. It is a one-to-one mapping between
a VLAN-id: VNI and MAC-VRF/EVI – one VLAN mapped to one VNI inside the giv-
en MAC-VRF instance name.
This service-type is sometimes preferred but also means more auto-discovery routes in the fabric, and thus higher RIB utilization, when all VLAN-VNIs are enabled at all server leaf nodes. Nevertheless, this option is sometimes used to offer complete isolation of two workloads – something we highlight in the network virtualization chapter.
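As a minimal sketch of a VLAN-based MAC-VRF in Junos set format (instance name, RD, route-target, VLAN, and VNI values are illustrative assumptions, not the values used in this book’s lab):

set routing-instances MACVRF-1001 instance-type mac-vrf
set routing-instances MACVRF-1001 service-type vlan-based
set routing-instances MACVRF-1001 protocols evpn encapsulation vxlan
set routing-instances MACVRF-1001 vtep-source-interface lo0.0
set routing-instances MACVRF-1001 route-distinguisher 10.0.0.11:1001
set routing-instances MACVRF-1001 vrf-target target:65000:1001
set routing-instances MACVRF-1001 interface ae11.0
set routing-instances MACVRF-1001 vlans vlan1001 vlan-id 1001
set routing-instances MACVRF-1001 vlans vlan1001 vxlan vni 51001

Because the service-type is vlan-based, each additional VLAN-VNI would get its own MAC-VRF instance of this shape.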
VLAN-aware service-type
This option is the most popular in enterprises that just want to define one Layer 2 MAC-VRF/EVI and add more VLANs into it later if needed.
It’s also slightly more efficient from the RIB point of view when all VLAN-VNIs are enabled everywhere, because we reduce the number of unique auto-discovery routes within the fabric.
7 Multicast Join Sync route – Used in a multihoming (MH) scenario: when a receiver is connected to two leaf nodes and the nDF receives the IGMPv2 membership report, the Join Sync EVPN route type-7 is sent to the DF node elected for the given ESI so that it also originates the SMET route for the given multicast group.
8 Multicast Leave Sync route – Used in a multihoming (MH) scenario: when a receiver is connected to two leaf nodes and the nDF receives the IGMPv2 leave message, the Leave Sync EVPN route type-8 is sent to the DF node elected for the given ESI.
RFC7432 describes the first four BGP EVPN route types and RFC 9136 introduces the
IP Prefix advertisement option into EVPN. The multicast EVPN route types RT-6, RT-7,
and RT-8 are introduced in RFC9251, and we already briefly described those routes.
To better understand the usage of each route, let’s review the main purposes and explain
when and why these route types are typically used.
EVPN Route Type 1 (RT-1): Ethernet auto-discovery route
Each leaf node in the data center fabric advertises RT-1 routes for each locally attached
Ethernet segment for a given EVI.
When there is a connectivity failure to the attached segment, the PE withdraws the corresponding set of Ethernet A-D per ES routes.
If the leaf nodes (aka PEs) that receive the withdrawal have another A-D route for this
segment, they will update the next hop accordingly. If not, they will invalidate the MAC
entries for that segment.
This route type brings fast convergence, aliasing, backup path, and split horizon
protection.
EVPN Route Type 2 (RT-2): MAC/IP advertisement route
A PE learns remote MAC addresses that sit behind other PEs using EVPN control-plane learning.
If the optional IP field is set, this route can be used to minimize the flooding of ARP or
Neighbor Discovery. When a leaf node receives an ARP request for an IP address and it
has the MAC address binding, it should perform ARP proxy by responding to the ARP
request. Proxy ARP/ND, and ARP/ND suppression are benefits of RT-2.
Additionally, when symmetric inter-IRB routing is enabled in the data center fabric, the MAC/IP RT-2 is also used to perform the local IP routing operation at the ingress and egress leaf nodes.
Interface-less model
In this model, RT-5 is used to advertise IP prefixes along with a router’s MAC extended
community.
A VNI will be associated with the VRF and advertised with RT-5. In this case there is no requirement for any specific server interface to be associated with the given VRF to advertise a prefix. In the context of DCI, the interface-less model is used when stitching Type-5 to Type-5 VXLAN tunnels – there are no IFL interfaces associated with the various prefixes and everything is handled, from the forwarding point of view, using next-hop recursion for the given destination IP prefix.
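For reference, a minimal interface-less (pure Type-5) VRF sketch in Junos set format could look like the following; the instance name, RD, route-target, and VNI are assumptions for illustration only:

set routing-instances T5-VRF-44 instance-type vrf
set routing-instances T5-VRF-44 interface irb.1001
set routing-instances T5-VRF-44 route-distinguisher 10.0.0.11:5044
set routing-instances T5-VRF-44 vrf-target target:65000:5044
set routing-instances T5-VRF-44 protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances T5-VRF-44 protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances T5-VRF-44 protocols evpn ip-prefix-routes vni 9044

The advertise direct-nexthop statement corresponds to the interface-less model: prefixes are advertised with the router’s MAC extended community and resolved directly on the advertising VTEP.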
Interface-ful models
In these models, RT-5 advertises IP prefixes whereas RT-2 advertises the MAC/IP addresses of each SBD IRB interface.
RT-5 requires a recursive lookup resolution to an RT-2 route. VNI from RT-2’s MPLS
label 1 field is used when forwarding packets.
The Supplementary Broadcast Domain (SBD) is created for the recursive lookup with no
attachment circuit (VLAN) and it has an IRB interface that connects the SBD to the
IP-VRF.
Table 1.2 Fields Required for Each Model in the RT-5

Interface-less model: No SBD and no Overlay Indexes are required. RT-5 fields: Route Distinguisher; Ethernet-tag-id = 0; IP prefix length; GW IP = 0; VNI = VRF VNI; Router-MAC community; VRF RT.

Interface-ful model with an SBD IRB: Each SBD IRB has an IP and a MAC address, and the IP is reachable. RT-5 is used to advertise IP prefixes and RT-2 to advertise the MAC/IP of each SBD IRB. RT-5 fields: Route Distinguisher; Ethernet-tag-id = 0; IP prefix length; GW IP = IRB IP of the SBD; VNI = 0 (the RT-2 VNI will be used); no Router-MAC community; VRF RT.

Interface-ful model with an unnumbered SBD IRB: Each SBD IRB has a MAC address only and no IP. RT-5 is used to advertise IP prefixes and RT-2 to advertise the MAC of each SBD IRB. RT-5 fields: Route Distinguisher; Ethernet-tag-id = 0; IP prefix length; GW IP = 0; VNI = 0 (the RT-2 VNI will be used); Router-MAC community; VRF RT.
Figure 1.6 EVPN-VXLAN DC Fabric NVO – Tenant Distribution Example Within the DC Fabric
The example tenant distribution illustrated in Figure 1.6 shows tenant44 and tenant55
each having dedicated EVPN-VXLAN EVI (MAC-VRFs) Layer 2 contexts.
Based on the decision of the admin, two DC racks were dedicated to tenant44, one rack
to tenant55, and the fourth rack was enabled for both tenant services. The tenant can be
also viewed here as a different service-id. In many DCs that type of tenant/service distribution does not stay that way for long, because servers are simply moving from one rack to another or are added based on decommissioning, maintenance, memory upgrade cycles, and so on. That’s also why, when any changes are needed in the overlay fabric infrastructure, they can be easily adapted at the leaf-to-server port level, without any changes in the rest of the infrastructure.
For example, when tenant55/service55 must move to rack3, the fabric manager tool, or
simply the admin, can quickly add that tenant only to that new pair of leaves, without
changing anything in the rest of the fabric. This was not possible with the traditional
LAN data center infrastructures where the tenant state was propagated manually on each
node of the fabric instead of being fully distributed via the control plane protocol. In fact,
EVPN as a standard was designed with built-in automation so the propagation of the
information of a new connected tenant is automatically done via the BGP family EVPN
advertisement to the rest of the fabric nodes. Based on the BGP EVPN control plane
information (different EVPN route types) exchange between the leaf nodes, the VXLAN
tunnels are automatically established between the fabric nodes, when they are sharing the
same MAC-VRF EVPN instance(s).
Regarding the tenant network overlay infrastructure segmentation and virtualization options supported in Junos/Junos Evolved on the QFX family of products, consider the following network virtualization mapping options shown in Figure 1.7.
Option-1 isolates the tenant services at the L2 level but points to the same Type-5
instance through which the tenant44/tenant55 can communicate using local IP
routing in the leaf.
Option-2 offers a 1:1 mapping of L2 MAC-VRFs to Type-5 L3 VRFs, where each tenant or service gets a MAC as well as an IP context; in this case there is no communication between the two services within the given node.
Option-3 adds the possibility to communicate between the two services, but via the external firewall for advanced L4-L7 inspection.
Option-4 still offers a 1:1 mapping between the L2 and L3 context but, instead of
fully isolating two services, it leaks a specific prefix between the two tenants – for
example a default route.
The good thing about the proposed network virtualization options is that one set of tenants/services may follow option-1 while another set of services follows another option.
Using the mapping concepts highlighted above, the following service options can be considered within EVPN-VXLAN fabrics.
The service options suggested in Table 1.3 can be used within the given enterprise or telco-cloud EVPN-VXLAN fabric and within the same node. This is visualized in Figure 1.8’s fabric diagram, where leaf1 communicates with leaf2 and the same physical 10GbE interface xe-0/0/1 is used to offer all three service options, each with different characteristics.
Figure 1.9 Tenant Virtualization and EVPN Service-types Implications on VXLAN Tunnel Establishment
On the other hand, if the EVPN-VXLAN fabric design requires all VLAN-VNIs to be enabled on all leaf nodes in the fabric, the selective MAC-VRF provisioning highlighted above is not so efficient anymore and, moreover, more auto-discovery routes will be present in the RIB of each leaf node. Indeed, when the fabric design requires all VLANs to be present everywhere, the VLAN-aware service-type grouping multiple VLAN-VNIs under the same MAC-VRF instance will contribute to reducing the RIB (Routing Information Base) utilization at the control plane level. Because in VLAN-based EVPN service-types each MAC-VRF uses just one VLAN-VNI, each new VLAN-VNI enabled on all leaves generates an independent auto-discovery route (AD route), slightly increasing the number of control plane entries distributed across the fabric.
Intra-Datacenter Architectures
Modern data center networks use multistage topologies inspired by Charles Clos, who designed a 3-stage non-blocking network to switch telephone calls. Let’s review three data center topologies: 3-stage Clos, 5-stage Clos, and collapsed.
3-stage IP Clos
Each leaf is connected to all spines but the leaves are not connected to each other. The
hosts are connected to the leaves and there’s a maximum of three hops between any two
hosts connected to the fabric: leaf – spine – leaf. The uplink bandwidth of each leaf can
be increased by adding new spines.
At the leaf layer, if the aggregated bandwidth of the downlink interfaces is equal to the aggregated bandwidth of the uplink interfaces, then there is no oversubscription and the architecture is non-blocking. The maximum number of spines is the number of a leaf’s uplink interfaces. The maximum number of leaves is the number of a spine’s interfaces.
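As an illustrative calculation (the numbers are assumptions, not taken from the lab in this book): a leaf with 48 x 10GbE server-facing ports (480 Gbps) and 6 x 100GbE uplinks (600 Gbps) is non-blocking, while the same leaf with only 4 x 100GbE uplinks (400 Gbps) runs at a 480:400, or 1.2:1, oversubscription ratio.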
5-stage IP Clos
When a 3-stage topology reaches its limit and there is a need for additional leaves, or for cabling considerations, it can evolve into a 5-stage Clos. See Figure 1.11.
A 5-stage Clos interconnects two or more 3-stage PoDs with the use of an additional layer called the super spine. The super spines are connected to the spines from each PoD. In the following diagram, SS1 and SS2 are the super spines interconnecting the two fabric PoDs.
There’s a maximum of five hops between any two servers connected to the fabric: leaf >
spine > super spine > spine > leaf. For a non-blocking architecture, there must be no
oversubscription at both the leaf and spine layers. Evolving from a 3-stage to a 5-stage
architecture is an easy and efficient way to increase the scalability of a data center and
add more redundancy to it. For example, two different DC rooms can be interconnected
using super-spine blocks of architecture.
Similarly to the 3-stage architecture, in each of the PoDs you can deploy a BO or ERB EVPN-VXLAN design and have, additionally, in each PoD the border-leaf block connecting to the core IP network. In most cases, the spines and super-spines will be used for IP forwarding and EVPN route-server purposes, with the exception of multicast assisted replication, when there is a reason to connect the existing L2 domain directly to the spines, or when the spines in each PoD are used for the VXLAN-to-VXLAN stitching function, discussed in greater detail in upcoming chapters.
When using a collapsed core topology, the seamless stitching techniques are typically not required because there are not many fabric VXLAN tunnels besides the one between the two leaf nodes, like L1 and L2 in Figure 1.12. In that ESI-LAG scenario, leaf L1 and L2 will have one VXLAN tunnel back-to-back and, when required, will establish VXLAN tunnels towards remote sites.
In the scenario where the collapsed core is connected directly to an MPLS core, the ESI-LAG will be used as a server multihoming technology; however, EVPN-MPLS will be used on the MPLS network and EVPN-VXLAN won’t be required.
When connecting a collapsed core topology to the core IP but without a direct MPLS connection, the ESI-LAG is still used for active/active multihoming, but with VXLAN as the encapsulation and EVPN BGP as the signalization.
The nice part of the collapsed core topology is that it’s not limited to two nodes and can be enabled with four nodes or more in a full mesh topology, bringing an additional level of redundancy and bandwidth to servers or appliances.
NOTE In the case of collapsed spine topology, you typically don’t use any of the
seamless stitching techniques covered later in the book as the tunneling part starts from
the collapsed spine to the remote location.
example, at the existing DC-GW devices in the given PoP location or when enabling the
first hop IP gateway at the firewalls.
In the case of the BO, the traditional VRRP is used as the external DC-GW to deliver the IP gateway function to the servers connected to the server-leaf EVPN-VXLAN Layer 2 fabric.
In the case of the ERB design, we get the benefit of a reduced blast radius – a reduced failure domain thanks to the distributed anycast IP gateway model – while in the case of the BO fabric design, we maintain the centralized IP first-hop gateway model.
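To illustrate the distributed anycast gateway idea, here is a hedged sketch of an IRB interface configured identically on every server leaf in an ERB design; the addresses, the MAC (taken from the documentation range), and the instance names are illustrative assumptions:

set interfaces irb unit 1001 family inet address 10.10.10.1/24
set interfaces irb unit 1001 mac 00:00:5e:00:53:01
set routing-instances MACVRF-1001 vlans vlan1001 l3-interface irb.1001
set routing-instances T5-VRF-44 interface irb.1001

Because every leaf answers for the same gateway IP and MAC locally, a workload keeps its default gateway unchanged when it moves between racks.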
In both cases, ERB and BO, when connecting to external devices, like existing firewalls,
existing DC gateways, or load-balancers, it is typically done through the border-leaf
block as highlighted in Figure 1.13.
In both cases, the spines may play the role of IP forwarders and route-servers for EVPN
and VTEP tunnel termination (NVE) will only happen at the server-leaf and border-leaf.
There are, however, situations where the spines are also used in the role of border spines – this design is called CRB and can also be a useful way of optimizing the EVPN-VXLAN DC fabric provisioning and scale. When most of the traffic is north-south, firewalls may connect to the spines directly; or the spines may be used for assisted replication, as in the case of heavy multicast deployments with a lot of elephant multicast traffic. The spine will then also become an NVE and will be used for VXLAN tunnel termination.
Overlay Protocol
The overlay will be BGP-based as it handles EVPN signaling. For that, there are two
choices: eBGP or iBGP.
iBGP:
A different protocol is needed for the underlay, or the underlay uses eBGP with a different ASN per node.
eBGP:
The same routing protocol can be used for both underlay and overlay.
Each fabric node is identified by an ASN, which provides AS-PATH loop detection.
Underlay Protocols
The underlay will oversee advertising the loopback addresses used by the overlay and
perform ECMP. With the use of ECMP on top of an efficient routing protocol, all links
of a data center fabric are used making the data center network performant. For the un-
derlay, an IGP can be chosen, mainly OSPF or ISIS, or an eBGP as well.
Note that eBGP has many advantages:
The same protocol can be chosen for both underlay and overlay: each device will have a single ASN and it will be easier to configure.
eBGP is also the protocol used to peer the fabric with WAN routers, so knowing only one protocol is sufficient.
eBGP is more scalable than OSPF and IS-IS and will be a better choice for very large scale fabrics.
eBGP is less complex than a link-state IGP.
The operator decides what routes to advertise using simple routing policies.
Although eBGP is known for slower routing convergence compared to an IGP, some mechanisms make it as fast as an IGP. For the underlay, eBGP peerings are configured on point-to-point fiber connections and a physical interface failure triggers a BGP reconvergence in milliseconds.
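The following hedged sketch shows, in Junos set format, what an eBGP underlay peering plus a multihop eBGP EVPN overlay peering on a leaf could look like; all ASNs, addresses, and policy names are illustrative assumptions and not the values used in this book’s lab:

set routing-options autonomous-system 65021
set policy-options policy-statement EXPORT-LO0 term 1 from interface lo0.0
set policy-options policy-statement EXPORT-LO0 term 1 then accept
set protocols bgp group UNDERLAY type external
set protocols bgp group UNDERLAY family inet unicast
set protocols bgp group UNDERLAY export EXPORT-LO0
set protocols bgp group UNDERLAY multipath multiple-as
set protocols bgp group UNDERLAY neighbor 172.16.1.1 peer-as 65011
set protocols bgp group OVERLAY type external
set protocols bgp group OVERLAY multihop ttl 2
set protocols bgp group OVERLAY local-address 10.0.0.21
set protocols bgp group OVERLAY family evpn signaling
set protocols bgp group OVERLAY neighbor 10.0.0.1 peer-as 65011

The underlay session exports the loopback so the overlay session, which runs loopback-to-loopback, can be established; family evpn signaling carries the EVPN NLRI.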
Sub-optimal Forwarding
Traffic forwarded to a dual-homed server (server2 in Figure 1.14) is load-balanced between both leaves (L2 and L3).
In some cases of multiple failures, like the example in Figure 1.14, the traffic between L1 and L3 is sub-optimal and goes through another leaf of the fabric (L2 in this case). If the same ASN is configured on both spines S1 and S3, a routing loop will be detected and the route leading to the sub-optimal path will be rejected. But in the case of a server single-homed to L3 (server3), you need that sub-optimal forwarding to happen. To address this dual-failure scenario, you need to either authorize an AS loop or configure different ASNs on the spines.
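One way to authorize such an AS loop on Junos is the loops option under the local autonomous system, sketched below with an illustrative ASN; treat this as an assumption to verify for your platform and release rather than the book’s prescribed configuration:

set routing-options autonomous-system 65011 loops 2

This allows a received route whose AS path already contains the local ASN to be accepted, which covers the dual-failure case described above; the alternative is simply to give each spine its own ASN.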
MPLS Underlay
In the case of MPLS, an IGP is typically needed for loopback reachability and label signaling (SPRING); OSPF and IS-IS are the most deployed IGPs. As all nodes must belong to the same ASN, iBGP will be preferred for the EVPN-MPLS scenario. This is also demonstrated in the implementation section of the EVPN-VXLAN to EVPN-MPLS lab (Chapter 6).
Chapter 2: Network Overlays and DCI
Figure 2.1 EVPN-VXLAN Network Virtualization and Data Center Interconnect (DCI)
This chapter describes several DCI options improving the disaster recovery design, while preserving the same data isolation and data access models across multiple geographical locations.
The newer seamless stitching Layer 2 DCI options (option-3/option-4) will be covered in more detail in the next chapters from the control plane and data plane perspective. The implementation part will also be detailed in the dedicated lab chapter.
The summary of DCI options shown here highlights the main characteristics of each option and helps to identify the differences. For example, between options 1/2/3/4 and options 5/6, the main difference is that the first four design options are to be considered for L2 stretching requirements, and the last two options are for IP-only based DCI. It can also be noticed that even if one option may appear easy to implement in the early phase, over time, when fabrics grow or when the number of DC sites increases, that easy implementation may become problematic from the scaling, operational, and control points of view.
It is possible to mix several options to have a fully featured DCI. For example, you could mix options 3 and 5 to have both L2 and L3 stretching with VXLAN-to-VXLAN stitching, or options 4 and 6 to have both L2 and L3 stretching with a VXLAN-to-MPLS DCI. Table 2.1 highlights some additional capabilities that should be considered during the design decision process.
The capabilities compared per option in Table 2.1 include: controlled VTEP scale, same subnet everywhere, built-in DCI, VLAN translation, VMTO, ARP/ND suppression, flood reduction between sites, highly scaled DCI for multisite, and L3 full isolation.
The first four options from Table 2.1 offer L2 connectivity between the sites, while option-5 and option-6 can either be deployed as the main DCI solution without any L2 extensions or added as an additional L3 service when one of the first four options is used. When IP routing of the tenants is not handled by the fabric (bridged overlay design), option-5 and option-6 are not typically considered for the DCI. In the next sections, we will describe some of these options in more detail.
When L3 is involved, it adds even more complexity. At the gateway level, each VRF must
advertise its routes to the other site, and this can be done by configuring an external BGP
peering inside each VRF. In this scenario, distributed routing, meaning a distributed
gateway for stretched VLANs is very complex because we cannot benefit from the EVPN
anycast gateway capability. For example, we would have to configure VRRP with a knob
to make it active/active between datacenters.
The way the handoff is enabled at the border-leaf nodes in each DC site is shown in Figure 2.3’s simplified topology diagram.
You can see that the VXLAN tunnels always terminate at the border-leaf level, and it’s the role of the external DC-GW to deliver the DCI services.
The need to have unique control plane and data plane values across all sites: RT, RD, ESI, and VNIs must be unique and have global significance.
Blast radius: if a configuration error is made or a problem occurs on one site, it could potentially impact all sites at the same time.
Figure 2.4 OTT DCI EVPN-VXLAN Solution and VXLAN Full Mesh of Tunnels Between DC Sites
In fact, in the case of the seamless stitching solution, the leaf from the given PoD will terminate the inter-PoD tunnel at the interconnect gateway (iGW) spine and use the EVPN-signalized interconnect tunnel that had been established even before any additional new leaf gets connected.
From the tenant distribution point of view, the seamless EVPN-VXLAN stitching gives an additional centralized control point where you can be more selective about which tenants’ L2 networks are stretched between the PoDs or DC sites.
From the virtualization point of view, the new seamless DCI techniques also offer a method for additional VLAN translation between points of delivery within the DC, or between the DC sites, by introducing translation-VNI capabilities between DC and DCI.
The site origin identification using the interconnect ESI, aka iESI, can help track workload mobility between the sites and apply a specific policy regarding the acceptance of workloads tagged with a specific ESI value – using Junos advanced EVPN route policy-statements, aka EVPN route-maps.
Another important reason for tunnel stitching techniques is the fact that two transport techniques, for example VXLAN and MPLS, can be merged under the same control plane and connected seamlessly between these domains inside the same virtual context. It helps control which types of workloads are interconnected between the data center sites, but it also simplifies the operational aspect of service delivery. For example, one service can be restricted to operate in just one data center by being enabled in a dedicated, separate MAC-VRF, while another service is enabled in a MAC-VRF that has a multi-site profile.
Let’s summarize the logic behind the new seamless DCI techniques for NVO
architectures:
Simplified and unified implementation of DCI Layer2 and Layer3 extensions.
Better scaling of the control plane (fewer EVPN routes between the domains) and data plane (lower number of VXLAN tunnels).
Efficient multicast flooding between sites or PoDs.
New VNI translation options – customer VNI and provider VNI option.
RIB/FIB perspective: the leaf in each DC or in each PoD does not install all the EVPN routes from sites to which it’s not directly connected, efficiently reducing the RIB and, in consequence, the FIB utilization.
This is visualized in Figure 2.8, where a data center was deployed with two types of PoDs: a larger one with many servers and, added after some time, a smaller PoD connected with a lower number of servers to offer additional redundancy just for the most critical services.
In this scenario, even if the larger PoD2 continues connecting servers, the smaller PoD1 is not directly impacted. The leaf in the smaller PoD keeps consistent scaling from the unicast, flood next hop, or VXLAN tunnel perspective, highly reducing the TCAM utilization. The outcome of this approach is mainly a reduction of the leaf scaling requirements, thanks to the stitching happening at the higher-end interconnect gateway level.
Figure 2.10 Inter-PoD DC Fabric with 240 Leaves Using Seamless EVPN-VXLAN Tunnel Stitching
at the Border-Spines
In the NVO seamless tunnel stitching solution used in the multi-PoD architecture shown in Figure 2.10, the leaf devices from one PoD do not establish any direct VXLAN tunnel to the leaf devices in the other PoD of the same fabric, although they can all share the same broadcast domain for the given tenant network. This is achieved thanks to the Interconnect Gateway (iGW) role of the spines at every point of delivery (PoD). This use case will be explained in more detail in Chapter 4, where the seamless stitching at the iGW is incorporated at the border-leaf or super-spine level instead of the spines.
Figure 2.11 Use Case 2 – Seamless DCI Using NVO VXLAN Tunnel Stitching
As you can see, in the DCI use case two main scenarios are possible when it comes to tunnel stitching:
Scenario 1: Stitching from the VXLAN LAN domain in Overlay 1 to the VXLAN interconnect, highlighted in red in the above diagram.
Scenario 2: Stitching from the VXLAN LAN domain in Overlay 1 to MPLS interconnect LSPs.
In scenario 2, the border-leaf nodes connect directly to the MPLS backbone and become PE nodes. They are connected directly to a remote site PE, or connected via an existing P node to the MPLS domain, for reachability to remote sites.
Even if the main use case for the stitching of tunnels described in RFC9014 is for Ethernet Layer 2, the tunnel stitching for the IP Layer 3 part, with EVPN Type-5 VXLAN and EVPN Type-5 MPLS, is also important and contributes to the seamless introduction of IP services.
The advantage of MACsec over IPsec is that this encryption can be done directly by the border-leaf switch at wire speed. For example, in the Juniper portfolio, we offer QFX and PTX devices with line-rate MACsec encryption.
On the other hand, MACsec is limited to use cases where the data center interconnection uses a direct link.
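As a rough sketch of what enabling MACsec on a border-leaf DCI link can look like in Junos set format (the connectivity-association name, key values, and interface are placeholders, not values from this book’s lab):

set security macsec connectivity-association CA-DCI security-mode static-cak
set security macsec connectivity-association CA-DCI pre-shared-key ckn <hex-key-name>
set security macsec connectivity-association CA-DCI pre-shared-key cak <hex-secret-key>
set security macsec interfaces et-0/0/26 connectivity-association CA-DCI

With a static CAK, both ends of the point-to-point DCI link must be configured with the same CKN/CAK pair; the MKA session then negotiates the actual encryption keys.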
Figure 2.13 Border leaf Doing Both EVPN VXLAN and IPSec
This means the firewall must support both IPsec and EVPN-VXLAN features. If that’s not the case, then two separate devices will be needed: a border leaf dedicated to EVPN-VXLAN and a firewall dedicated to IPsec.
Instead of using IKEv2 for key exchange, the new draft-sajassi-bess-secure-evpn describes a method that uses BGP for IPsec tunnel establishment; this method also provides privacy, integrity, and authentication.
Protect L2 and L3 tenant’s data and control traffic.
All tenant’s traffic must be protected by IPSec: unicast, unknown unicast, broadcast,
and multicast.
BGP P2MP signaling for setup and maintenance.
Granularity of Security Association Tunnels : per PE, per tenant, per subnet, per L3
flow, per L2 flow, per AC pair.
Support single or multiple policies and DH groups for all SAs.
Figure 2.14 Secure Border Using Inline Firewall with OTT DCI
Use case 2: a border leaf performs DCI EVPN seamless stitching and a firewall is added between the border leaves to provide L4-L7 security. In that case as well, the firewall inspects VXLAN-encapsulated traffic and, moreover, you benefit from the seamless stitching feature provided by the border leaf. See Figure 2.15.
NORTH-SOUTH Traffic
When traffic ingresses data centers from remote locations, it often requires advanced security features provided by L4-L7 firewalls. This firewall service must be redundant as well, which means that it is deployed in each data center and that traffic must be handled symmetrically.
In order to guarantee symmetrical routing, there are several configuration options.
Option 1: active / passive implementation
All traffic ingresses and egresses the same data center. This also means sub-optimal routing for the passive site, where all of its traffic ingresses and egresses the other data center. The requirement is to have enough bandwidth between the data centers and to stretch all L2 and L3 services between the data centers. The default route advertised by the active firewall cluster is preferred over the one advertised by the passive cluster.
Chapter 3: Deep Dive into EVPN Seamless Stitching

Many aspects related to tunnel stitching were designed to be done automatically via BGP EVPN signalization. However, it’s always good to know exactly what’s happening underneath, from the theoretical point of view as well as for the mindful design and implementation of more complex scenarios with many DC sites involved.
In this chapter, we will review in more detail the control plane and data plane aspects involved in seamless EVPN-VXLAN to EVPN-VXLAN tunnel stitching as well as EVPN-VXLAN to EVPN-MPLS.
Figure 3.2 Seamless EVPN-VXLAN T2 Tunnel Stitching - Summary of Control Plane Steps
Based on the automatic translation of EVPN routes Type-1, Type-3, Type-4, and Type-2
for the mac@1 at border-leaf1 in site-A, the border-leaf3 will establish a VXLAN tunnel
to border-leaf1. The translation of route-target, route-distinguisher, and ESI value will
happen seamlessly for the EVPN routes at the border-leaf.
The original EVPN routes received from leaf1 are not used at the remote DC sites, because the border-leaf advertises new EVPN routes for the L2 VNI segments and MAC addresses learned from the local fabric leaf nodes.
The same operation of EVPN route translation will continue to happen for mac@1 from
the example topology at the border-leaf3 in site-B, where before advertising the Type-2
route to leaf7, it will rewrite the EVPN route attributes, such as route-target to the site-B
local site target:1:9999, the protocol next-hop to its local IP@, the route distinguisher to
its local value, and the ESI value to its local site-B iESI-2.
Based on that control plane information, leaf7 will establish a tunnel to its local site border-leaf3, instead of an over-the-top direct tunnel to leaf1. This is something that can be depicted on the block-level diagram (Figure 3.2), where VLAN1001 is extended between site-A and site-B using VNI51001.
By using the EVPN route translation at border-leaf1/3, the tunnels between the leaf nodes in two different DC sites are never directly established; the leaves always use the IP next hops of the local site border-leaf to reach the MAC addresses from the remote data center site. This is a significant improvement for larger scale deployments with many leaf nodes, as it helps to control the number of route processing cycles at the leaf nodes and reduces the number of next hops installed in the PFE (only the local border-leaf IP next hop will be installed for remote sites’ MAC address reachability).
The border-leaf level VNI rewrite operation can be useful in case the operator of the data centers would like to have a different provider VNI value for data center interconnect purposes, whilst preserving the same bridging and the same broadcast domain end-to-end from the forwarding point of view. When this is enabled at the configuration level, a new VNI value is advertised by the border-leaf for the MAC addresses learned from the leaf.
The translations that occur at the border-leaf are seamless and don’t require explicit policies.
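As a hedged illustration of where these DCI attributes live in the configuration, the sketch below shows an interconnect stanza added to a MAC-VRF on a border-leaf in Junos set format; the interconnect route-target, route-distinguisher, iESI, and VNI values are placeholders, and the exact options available depend on platform and release:

set routing-instances MACVRF-1001 protocols evpn interconnect vrf-target target:65000:9999
set routing-instances MACVRF-1001 protocols evpn interconnect route-distinguisher 10.0.0.101:9999
set routing-instances MACVRF-1001 protocols evpn interconnect esi 00:aa:aa:aa:aa:aa:aa:aa:aa:01
set routing-instances MACVRF-1001 protocols evpn interconnect esi all-active
set routing-instances MACVRF-1001 protocols evpn interconnect interconnected-vni-list 51001

The fabric-facing RD/RT of the MAC-VRF stay as they are; the interconnect values are the ones re-advertised towards the remote site, which is what produces the route rewrite described above.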
When it comes to the control plane aspects and Junos/Junos Evo, Figure 3.3’s block dia-
gram highlights the sequence in which the logical tunnels are created, based on the re-
ceived EVPN control plane information.
Figure 3.3 Junos/Junos Evo L3 Routing and L2 Bridging Daemon Interaction for the VXLAN Tunnel Creation
The first step of the state machine for the VXLAN bridging tunnel creation involves processing of the EVPN route at the RPD routing daemon level. This triggers the ‘tunnel create’ message sent to the Layer 2 bridging daemon, l2ald, which then sends, via the kernel, a call for the local fabric tunnel creation, the DCI tunnel creation, and the installation of the flooding mesh-groups at the PFE level, which is needed for the processing of BUM traffic.
Because the border-leaf EVI is processing the routes from the local fabric as well as
routes from the remote site, the RT-1/RT-2/RT-3 coming from two different domains
will contribute to the creation of the LAN VXLAN tunnels as well as the DCI VXLAN
tunnels. This is something we will explain in more detail in the next chapter.
When stitching from one VXLAN tunnel to another at the border-leaf, the operation of decapsulating and encapsulating packets takes place at the PFE level. The data plane part of the stitching also involves handling the broadcast and multicast packets when dealing with L2 DCI, to maintain a loop-free Ethernet solution. In Junos/Junos Evo this is implemented using the concept of mesh-groups associated with LAN tunnels, WAN tunnels, and local node interfaces.
Figure 3.4 reveals the way remote site and local site VXLAN tunnels get associated with different mesh-groups, aka interface-groups (IFG), to respect the split-horizon rule, which does not allow the same BUM traffic to be sent back to the interface on which it was originally received.
Figure 3.4 Layer 2 Data-Plane and Tunnel Stitching – Tunnels to Mesh-Groups Mapping
For each mesh group of a given bridge domain, a flood route is installed, which points to a composite next hop containing all the interface next hops to which the packet needs to be flooded. For each VLAN/VNI, each mesh group has different flood NH groups and a flood route. The concept of mesh groups is specific to Layer 2 traffic and is not used in the case of pure IP DCI solutions (DCI option-5/option-6).
The mesh-group installation per MAC-VRF and per VLAN name can be verified at the
QFX5130 and QFX5700 border-leaf level using the following command.
via DCI
0x40002/51 FLOOD_GRP_COMP_NH __re_flood__ comp 57021
root@border-leaf3>
We can then also check which physical interface is used for the given mesh-group next hop. For example, here is how to verify the outgoing DCI flood interface at the QFX5130 border-leaf3 level. The composite NH index 57067, learned from the previous command for the L2 flooding mesh-group called __wan_flood__, points to the DCI interface et-0/0/26 at border-leaf3 – used to connect to DC A from DC B.
At the unicast data plane level, when the VXLAN packet is received at the border-leaf, it is either decapsulated locally and sent to a local server port (for example, when there are servers or appliances locally connected to the border-leaf), or it is decapsulated from the ingress VNI and encapsulated in the new DCI VXLAN tunnel using the same or a new VNI value (if VNI translation is set) inside the VXLAN tunnel header.
You can see in Figure 3.5 that the iESI in the case of pure Type-5 to Type-5 stitching is
not used for load balancing purposes and only the overlay IP ECMP will be used to load
balance the traffic destined to the given DC site. The original router-mac (RMAC) from
leaf1, the route-distinguisher, and the route-target are changed at the border-leaf level
but the routing VNI info is by default preserved, copied from the original EVPN T5 route. Figure 3.5 illustrates the pure Type-5 scenario, aka interface-less, where there is no need to also advertise Type-2 MAC routes for load-balancing purposes. The interface-less T5 mode is in fact the most popular industry implementation of the EVPN prefix advertisement standard, RFC9136.
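Junos provides operational commands that are typically used to check pure Type-5 state; the sketch below assumes a hypothetical VRF name and is offered as a starting point rather than the book’s prescribed verification flow:

show evpn ip-prefix-database l3-context T5-VRF-44
show route table T5-VRF-44.inet.0 protocol evpn

The first command lists the EVPN Type-5 prefixes exported from and imported into the given L3 context, while the second shows how those prefixes were installed in the tenant routing table.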
When it comes to the Junos and Junos Evo implementation of Type-5 stitching, Figure 3.6 helps illustrate the interaction between the kernel and the routing daemons.
Figure 3.6 Pure Type-5 VXLAN Tunnel Stitching - RPD Daemon And Kernel State Machine
When comparing the block diagram for Type-5 stitching with the one for Type-2 stitching, you can observe that in the case of Type-5 stitching the local border-leaf will not create any additional logical interfaces, aka IFLs, and the L2 daemon, l2ald, is no longer engaged in the creation of the Type-5 tunnels.
When using the iESI from DC site-A, the MPLS aliasing label will also act for load balancing purposes when the given site has two or more border-leaf nodes.
Figure 3.7 Seamless EVPN-VXLAN to EVPN-MPLS Stitching - Data Plane and Control-Plane
Operations
The data plane operation highlighted above is possible thanks to the BGP EVPN control plane, where the LAN route-distinguisher (RD) and the WAN (DCI) route-distinguisher (called iRD), as well as the LAN route-target (RT) and the WAN route-target (iRT), within the same EVI (MAC-VRF), are part of the MAC-VRF configuration at the border-leaf in each DC site.
The ESI change takes place at the border-leaf node, the same as for the VXLAN-to-VXLAN stitching – all the LAN fabric MAC addresses will be represented on the remote site border-leaf via the site identifier iESI, which is specific to each site per EVI.
The border-leaf nodes advertise EVPN Type-1 routes (AD per ESI and AD per EVI) towards both
the DC and the DCI side, using the fabric-wide RD and RTs (shared with the server leaf nodes)
on the DC side and the iRD and iRT (shared with the remote site border-leaf nodes) on the DCI
side. In the context of VXLAN to MPLS, the AD route per EVI towards the DCI side is advertised
with an aliasing label, which the remote border-leaf uses for load balancing of unicast
traffic.
The Type-3 IM route towards the DCI side will also carry the vlan-id instead of the VNI used
toward the leaf nodes on the fabric side. The Type-4 route (ES-Import route) in the DCI
context is advertised with the new iESI (interconnect ESI) and an MPLS label, which is used
for the split-horizon function between multihoming border-leaf nodes. The DF/nDF election is
done by default using the MOD algorithm but can also be set with hard-coded preference-based
values.
Handling of BUM packets at the data plane level is consistent with what was described in
the VXLAN-to-VXLAN section above, where VXLAN IFL interfaces and MPLS LSPs are associated
with specific mesh-group IDs for which the split-horizon rules are enabled, to avoid L2
Ethernet flooding loops. In the section dedicated to implementation, we further break down
the control plane and data plane aspects of EVPN-VXLAN to EVPN-MPLS stitching for bridging.
In some cases, the EVPN-VXLAN fabric is deployed, in parallel with the bridging part
mentioned before, with IP prefix advertisement of prefixes specific to the given DC site
location. In that case, there's no need to also stretch bridging between the sites; only full
IP reachability is required between the DCs. This is highlighted in Figure 3.8, where each DC
fabric is deployed with an ERB design and generates site-specific prefixes (not also used in
the other DC site) which need to be reachable between the sites and potentially also from DC
users sitting behind the core IP. In this case, the border-leaf nodes in each fabric simply
advertise the IP prefixes received from the distributed leaf nodes (for example, QFX5120 or
QFX5130) as EVPN-VXLAN Type-5 routes but transform them into IPVPN-MPLS advertisements. This
way, any existing remote PE node in the IPVPN-MPLS network gets reachability to the DC fabric.
The border-leaf doesn't have to advertise all the EVPN Type-5 prefixes from the given site
location; only a summary route for the given tenant needs to be sent to the rest of the PE
nodes. This is visualized in Figure 3.8.
The deployment highlighted above is interesting when the organization has already deployed
MPLS in its core infrastructure and the data center services in the different locations are
reachable by users via IPVPN-MPLS. In this case, in order to simplify the DCI deployment,
the same approach is taken and the site-specific prefixes – here
10.10.10.0/24 for site A and 10.10.30.0/24 for site B – are simply advertised from the
border-leaf PE nodes into the existing IPVPN-MPLS infrastructure. This way they are reachable
not only between the DC sites but also from all the existing POP locations for the given
tenant/service id. The operation behind the stitching from VXLAN Type-5 tunnels to MPLS LSPs
is also explained in the draft-ietf-bess-evpn-ipvpn-interworking document.
In order to protect the network from IP routing loops, the domain path (D-PATH) attribute,
specifically set for DC-to-DC communication, can optionally be introduced.
A situation where part of the IP prefixes are site specific and others are stretched between
the two sites may also happen. In this L2 DCI stretched scenario, the PoP (Point of Presence)
locations will, for most optimal IP reachability, also receive the host IP routes via the
IPVPN in order to reach the right DC location directly in an optimal way.
Chapter 4: DCI and Multipod: Underlay Architecture Options
This chapter focuses on different underlay options for interconnecting two or more PoDs or
data center sites. Depending on the number of sites to interconnect, the expected east-west
oversubscription ratio, or the existing site-to-site IP underlay routing, the decision on the
type of underlay architecture will be based on:
Where exactly the seamless stitching will take place – at the border-spine or at the
border-leaf level
What type of underlay transport approach will be used – dark fiber, super-spines, or
existing IP domain
The super-spines peer via eBGP with both PoD1/PoD2 spines to deliver full IP reachability
between S1/S2 in PoD1 and S3/S4 in PoD2. The route-server role for overlay eBGP peerings is
optional, as the spines from each PoD can simply establish a full mesh of overlay multihop
eBGP peerings between themselves. For more than two PoDs or two sites, however, the iBGP
full mesh is more convenient from a provisioning point of view.
The seamless stitching of tunnels would still happen in that case at the spine level in each
PoD. The leaves from PoD1 would not enable a direct tunnel to the leaf devices in PoD2.
Host-1 to Host-2, provisioned within the same VNI 51001, would communicate via the
interconnect gateways local to each PoD. In the example shown in Figure 4.1, the vlan-id
used at the leaf level is 1001 but, with the built-in translation capabilities, both VLAN
and VNI can be different in each PoD, while still delivering the same L2 broadcast domain
between the PoDs.
With the inter-PoD design shown in Figure 4.1, the super-spines usually use 100GE/400GE
PoD-to-PoD connections for low-latency and data-replication requirements. This design can
still use the border-leaf block in each PoD for end-user access to the data via the WAN.
With the design shown in Figure 4.2, you also skip adding a dedicated block of border-leaf
nodes, which, for significant north-south data volumes, would require a higher number of
leaf-spine 100GE links overall in the fabric. Compared to the previous example, in this case
you typically don't use any additional border-leaf block, as both the PoD-to-PoD and the
core-IP-to-DC connectivity are combined at the border-spine level. The intermediate IP domain
connected to the border-spines uses the edge routers for IP forwarding purposes, and the
overlays between the border-spines S1/S2 and S3/S4 form a full mesh of eBGP multihop
peerings.
With this design, we similarly put the stitching function in place at the border-spine level.
However, the BGP underlay interconnection is done directly with the remote site spine nodes.
The link between the spine nodes is optional but sometimes preferred for higher redundancy
requirements. Because the design doesn't have any intermediate super-spine or core-IP edge
routers involved, the latency between the PoDs is lower compared to the two previous design
options.
Typically, with dark fiber, link failure detection is also direct between the two PoDs.
However, in the EVPN context, where most BGP peerings use 3x300ms BFD, it's better to keep
the same timers in the back-to-back dark fiber scenario as well, for fast convergence
consistency between the two PoDs.
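For reference, this is how such BFD timers are typically attached to a BGP peering in Junos; the group name and neighbor address here are only illustrative, while the 300 ms interval with a multiplier of 3 matches the 3x300ms mentioned above:
set protocols bgp group underlay neighbor 192.168.228.2 bfd-liveness-detection minimum-interval 300 >> 300 ms transmit/receive interval
set protocols bgp group underlay neighbor 192.168.228.2 bfd-liveness-detection multiplier 3 >> session declared down after three missed intervals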
Figure 4.4 DCI EVPN-VXLAN Stitching Using Border-leaf – VXLAN to VXLAN Option
This DCI deployment option is especially popular when the fabric PoD is deployed using the
edge-routed bridging (ERB) architecture, where all tenant virtualization is delivered at the
edge and a higher segmentation ratio between the workloads is generally used within the
fabric.
Comparing the border-leaf design with the previously discussed border-spine design, the main
difference relates to the latency and bandwidth used for the north-south connection, as well
as for the replication/synchronization between the DCs. As long as the border-leaf has enough
links to the lean spines to cover the site-to-site replication and the north-south connection,
or when most of the traffic simply stays within the DC site, this design option is more
interesting than the ones previously mentioned. Mainly because the site-local forwarding
functions delivered by the lean spines are not mixed with the DCI function, the design offers
a better demarcation point between the two distinct functions and reduces the blast radius of
any lean spine or border-leaf operations such as upgrades or node replacements.
When using the core IP to connect the border-leaf blocks of each DC site, you can occasionally
have a situation where IP connectivity to the remote border-leaf VTEPs is offered via an
existing PE IPVPN-MPLS service. This is still a possible underlay connectivity option, although
in this case we recommend extending the core IP IGP domain down to the border-leaf and simply
advertising the local border-leaf loopbacks into the existing IGP domain.
Figure 4.5 DCI EVPN-VXLAN Stitching Option Using Border-leaf VXLAN-to-MPLS Option
The benefit of the seamless stitching techniques in this case is that the interconnect option
is enabled within the same tenant MAC-VRF and brings the same deterministic L2 extension
capabilities, instead of a default full mesh of VXLAN overlay between the DC sites leveraging
MPLS for the underlay.
In the case of VXLAN-to-MPLS stitching, besides the automatic change of transport type from
VXLAN in the fabric to MPLS for DCI, most of the EVPN control plane stays the same. However,
instead of generating VNI-specific IM routes to the remote sites, the border-leaf originates
the bridge-domain VLAN-related routes and associates them with the EVI MPLS label. This means
that every EVI instance at the border-leaf also originates an MPLS aliasing label for
active/active load-balancing purposes from DC site-A to DC site-B. At the border-leaf node,
the split-horizon rules are triggered for any BUM traffic from the core to make sure it is
not flooded back to the originating domain.
Chapter 5: Seamless EVPN-VXLAN to EVPN-VXLAN Stitching – Implementation and Verification
To better understand the low-level implementation and verification aspects of the seam-
less EVPN-VXLAN to EVPN-VXLAN stitching, the simplified lab topology in Figure
5.1 is used.
Use Case
In the proposed example, there is a larger data center at site A and a smaller one at site B,
interconnected via VXLAN-to-VXLAN DCI tunnel stitching as part of a scaling and admin
operations optimization project.
The two DC sites are connected using 100Ge links. Note that the crossed links towards
both remote site border-leaf devices are not used here in the lab topology but are recom-
mended for even better redundancy.
Table 5.1 summarizes the main configuration components used for the provisioning of the DCI
infrastructure shown in Figure 5.1, using the VLAN-aware EVPN Instance (EVI) called MACVRF101
and eBGP underlay/overlay peering.
Table 5.1 BGP Overlay/Underlay ASN, Loopback, RD, RT Information Inside the Fabric
When considering DCI interconnections, each site is enabled with a site-id iESI at the
MAC-VRF level. Because in the suggested example we use only one MAC-VRF (EVI), called
MACVRF101, the DC site-id ESIs listed in Table 5.2 are used.
Table 5.2 ESI - Site Identifiers Used At The Interconnect Gateway Border-leaf Nodes
The ESIs for DCI site identification shown in Table 5.2 are used only at the border-leaf
level in each DC and are not enabled at the server leaf nodes. As explained in the previous
chapters, these ESI values are rewritten for EVPN route Type-2 at the border-leaf level.
This way, the remote site border-leaf nodes use a single iESI value for a given MAC-VRF to
reach all the MAC addresses located in the originating site. The leaf nodes in each site use
the local border-leaf iESI value to reach the remote MAC addresses, effectively reducing the
number of next hops installed at the PFE level.
The following VLANs and VNIs are enabled inside the EVI MACVRF101 and extended between the
two DC sites to connect all servers together. The servers (server1/2/3 in DC1 and
server4/server5 in DC2) run standard VMware virtual machines, so multiple VLANs must be
enabled towards each of the servers; however, only VLAN1001 and VLAN1002 are extended between
the two DC sites, because only those services require site-level redundancy.
The vlan-id information shown in Table 5.3 is used to provision each leaf node locally.
However, the vlan-id is stripped at the server leaf nodes before the traffic is sent into the
VXLAN tunnel towards the border.
Because the DC sites are also deployed with the anycast IP gateway model (aka EVPN-VXLAN
ERB), the IRB interface IP addresses listed in Table 5.4 are used at the leaf nodes. The
border-leaf nodes don't have the same IRB interfaces enabled, although in a production
environment some IRB interfaces would typically be present at the border-leaf level, mainly
to connect an external firewall cluster or other appliances. The IRB interfaces are placed in
the same IP VRF, called T5-VRF-1. See Table 5.4.
On the server-facing interfaces (aka "PE-CE" ports), the servers listed in Table 5.5 are
connected and extended between the two data centers in VLAN1001.
Table 5.5 Server connectivity details (VLAN-1001, vlan-id 1001, VNI 51001)

Servers in DC site-A
server1 – MAC@ 00:50:56:AB:01:01, IP@ 10.10.0.101/24, ESI-LAG 00:00:88:88:88:88:88:88:01:01, LACP system-id 00:01:88:88:01:01, physical interfaces et-0/0/50 (leaf1) and et-0/0/4 (leaf2)
server2 – MAC@ 00:50:56:AB:01:02, IP@ 10.10.0.102/24, ESI-LAG 00:00:88:88:88:88:88:88:02:02, LACP system-id 00:01:88:88:02:02, physical interfaces et-0/0/50 (leaf3) and et-0/0/50 (leaf4)
server3 – MAC@ 00:50:56:AB:01:03, IP@ 10.10.0.103/24, ESI 00:00:00:00:00:00:00:00:00:00 (single homed, no LACP), physical interface et-0/0/50 (leaf5)

Servers in DC site-B
server4 – MAC@ 00:50:56:AB:01:04, IP@ 10.10.0.104/24, ESI-LAG 00:00:88:88:88:88:88:88:04:04, LACP system-id 00:01:88:88:04:04, physical interfaces xe-0/0/14:0 (leaf6) and xe-0/0/14:0 (leaf7)
server5 – MAC@ 00:50:56:AB:01:05, IP@ 10.10.0.105/24, ESI-LAG 00:00:88:88:88:88:88:88:05:05, LACP system-id 00:01:88:88:05:05, physical interfaces xe-0/0/12:0 (leaf6) and xe-0/0/12:0 (leaf7)
Table 5.6 tracks the verification outputs shown in the next section. For example, the ESI-LAG
values highlighted above are used within each fabric, but when stitching at the border-leaf,
they are replaced with the site identifier iESI values. So, depending on the device where the
commands are executed, the ESI changes for a given end host MAC address. For the
VXLAN-to-VXLAN stitching lab, VLAN1001 is our main focus in the next sections, even though
VLAN1002 is also enabled in the same MACVRF101 EVI and stretched between the two data center
sites.
The data shared in the previous tables is also included in the following reference lab
topology, Figure 5.2, where you can see the server MAC and IP information for VLAN1001, which
is mapped inside each fabric and between the DC sites to VNI 51001.
Figure 5.2 Seamless EVPN-VXLAN Stitching Lab Diagram - Local Site ESI and Interconnect ESI
Values
In the upcoming sections the focus is on the configurations and verifications used at the
border-leaf nodes. We also show selected leaf device MAC-VRF configurations and verifications
for reference purposes.
As highlighted in Figure 5.3, the eBGP underlay and iBGP overlay approach is extended at the
DCI level. However, when there's an existing core IP routing, the underlay IP routing may be
different, for example when the core IP is already running an IGP. As long as the underlay IP
routing solution delivers full IP reachability between the loopback addresses of the
interconnect gateways, an IGP can also be a valid design option.
In our lab, for simplicity and consistency with the fabric design, we used eBGP underlay and
eBGP overlay inside the fabric and eBGP underlay and iBGP overlay between the DC sites.
The loopbacks of the server leaf nodes and their end-to-end reachability are mainly needed
within the fabric; they are not needed between the DC sites. Indeed, the interconnect gateway
(the border-leaf in our lab topology) changes the routes originated at the server leaf nodes
by replacing the protocol next hop (PNH) IP with the one used locally at the iGW. Thereby,
site-to-site underlay loopback reachability is mainly required for the interconnect gateways,
the border-leaf nodes.
Here's an example of the underlay and overlay provisioning between leaf5 (QFX5120) and
spine1/spine2 in site A.
set protocols bgp group underlay type external >> we explicitly say what type of BGP peering we’ll be
using
set protocols bgp group underlay export my_underlay_export >> this route policy-statement is for local
loopback adv.
set protocols bgp group underlay local-as 65005 >> unique BGP ASN for underlay
set protocols bgp group underlay multipath multiple-as
set protocols bgp group underlay neighbor 192.168.220.1 peer-as 65100 >> spine1 underlay peering using
the interface et-0/0/20 IP@
set protocols bgp group underlay neighbor 192.168.222.1 peer-as 65100 >> spine2 underlay peering using
the interface et-0/0/20 IP@
set protocols bgp group overlay type external
set protocols bgp group overlay multihop >> we need to specify it explicitly because our peering is
using loopback interfaces
set protocols bgp group overlay local-address 172.16.7.5 >> corresponding to local loopback0.0 IP
address
set protocols bgp group overlay family evpn signaling >> at the server leaf only the EVPN address-
family is used
set protocols bgp group overlay local-as 65005 >> unique BGP ASN for overlay
set protocols bgp group overlay multipath multiple-as
set protocols bgp group overlay neighbor 172.16.7.100 peer-as 65100 >> spine1 overlay peering to its loopback
set protocols bgp group overlay neighbor 172.16.7.101 peer-as 65100 >> spine2 overlay peering to its loopback
set protocols bgp group overlay vpn-apply-export >> this is needed when implementing any export
policy-statements
set policy-options policy-statement my_underlay_export term term1 from route-filter 172.16.7.0/24
prefix-length-range /32-/32
set policy-options policy-statement my_underlay_export term term1 then accept
To make sure we load-balance the traffic from the leaf to both spines using IP ECMP, the
following configurations are required at the routing-options level for the underlay IP
routing, where the policy-statement called LB is exported at the routing-options forward-
ing-table level.
set routing-options router-id 172.16.7.5 >> it’s better to use the value corresponding to local
loopback IP@ and RD used in VRFs
set routing-options forwarding-table export LB
set routing-options forwarding-table chained-composite-next-hop ingress evpn >> this part is for
qfx5120 leaf when T5 instance is used
## the following policy-statement is used to deliver the IP ECMP underlay towards spines
set policy-options policy-statement LB term term1 from protocol evpn
set policy-options policy-statement LB term term1 then load-balance per-packet
set policy-options policy-statement LB term term1 then accept
set policy-options policy-statement LB term term2 then load-balance per-packet
set policy-options policy-statement LB term term2 then accept
Similar eBGP underlay and overlay configurations are repeated on the other leaf nodes, where
only the local-address, the BGP local-as, and the underlay eBGP peering IP addressing change.
For example, here's the BGP configuration of server leaf1.
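A minimal sketch of that leaf1 configuration follows; it simply reuses the leaf5 template above. The loopback 172.16.7.1 matches the leaf1 address seen in the route outputs later in this chapter, while the local-as 65001 and the point-to-point subnets towards spine1/spine2 are illustrative assumptions, not values taken from the lab:
set protocols bgp group underlay type external
set protocols bgp group underlay export my_underlay_export
set protocols bgp group underlay local-as 65001 >> assumed unique underlay/overlay ASN for leaf1
set protocols bgp group underlay multipath multiple-as
set protocols bgp group underlay neighbor 192.168.200.1 peer-as 65100 >> assumed spine1 point-to-point IP@
set protocols bgp group underlay neighbor 192.168.202.1 peer-as 65100 >> assumed spine2 point-to-point IP@
set protocols bgp group overlay type external
set protocols bgp group overlay multihop
set protocols bgp group overlay local-address 172.16.7.1 >> leaf1 loopback0.0 IP@
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay local-as 65001
set protocols bgp group overlay multipath multiple-as
set protocols bgp group overlay neighbor 172.16.7.100 peer-as 65100 >> spine1 overlay peering to its loopback
set protocols bgp group overlay neighbor 172.16.7.101 peer-as 65100 >> spine2 overlay peering to its loopback
set protocols bgp group overlay vpn-apply-export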
The loopback IP@ advertised from each node in the fabric via the underlay routing is, in
fact, used for the BGP overlay multihop peerings with family EVPN. Moreover, it is also used
as the IP@ for VXLAN tunnel termination within the given site fabric. Within the given
EVPN-VXLAN fabric this means that leaf1 and leaf5 use their local loopback IP addresses for
the VXLAN tunnels. The same address is also the protocol next hop used in the overlay BGP
advertisements for all EVPN route types within the fabric. However, for the MAC@ reachable in
remote destinations in site B, local leaf1 and leaf5 use the BGP protocol next hop of the
local site border-leaf1/border-leaf2 instead of the protocol next hop of leaf6/leaf7 directly.
From the lean spine perspective, the BGP configuration also involves underlay and overlay
peerings. However, because there isn't any MAC-VRF configuration on these spines, VXLAN
tunnel termination won't take place and the spines act as pure IP forwarders and route
servers. From the BGP point of view, the lean spines need to be provisioned with the explicit
multihop no-nexthop-change knob to override the default eBGP behavior of rewriting the BGP
protocol next hop with the local peering IP value.
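As a minimal sketch, assuming spine1 keeps the loopback 172.16.7.100 and ASN 65100 seen in the leaf peering configuration above (the group name and the exact neighbor list are illustrative), the overlay route-server group on a lean spine would look similar to this:
set protocols bgp group overlay type external
set protocols bgp group overlay multihop no-nexthop-change >> preserve the original leaf/border-leaf protocol next hop
set protocols bgp group overlay local-address 172.16.7.100 >> spine1 loopback0.0 IP@
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay local-as 65100
set protocols bgp group overlay multipath multiple-as
set protocols bgp group overlay neighbor 172.16.7.5 peer-as 65005 >> leaf5 overlay peering
set protocols bgp group overlay neighbor 172.16.7.113 peer-as 65113 >> border-leaf1 overlay peering
set protocols bgp group overlay neighbor 172.16.7.114 peer-as 65114 >> border-leaf2 overlay peering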
At the border-leaf level, we use the same BGP ASN value for underlay and overlay, but unique
per node on the fabric side when peering eBGP to the spines: ASN 65113 for underlay and
overlay at border-leaf1 and ASN 65114 for underlay and overlay at border-leaf2.
The multihop no-nexthop-change knob, in the context of seamless DCI, is not needed at the
border-leaf level because the border-leaf rewrites the EVPN routes' protocol next hop with
its own loopback IP address by default.
For example, here is the peering configuration on border-leaf1 (in DC site-A) towards the
local spines and towards border-leaf3 (in DC site-B).
set protocols bgp group underlay neighbor 192.168.53.1 peer-as 65100 >> underlay peering to spine1
set protocols bgp group underlay neighbor 192.168.53.1 import my_underlay_import
set protocols bgp group underlay neighbor 192.168.63.1 peer-as 65100 >> underlay peering to spine2
set protocols bgp group underlay neighbor 192.168.63.1 import my_underlay_import
set protocols bgp group underlay neighbor 192.168.228.2 peer-as 65215 >> underlay peering to border-
leaf3 in DC site B
set protocols bgp group overlay type external
set protocols bgp group overlay multihop
set protocols bgp group overlay local-address 172.16.7.113
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay local-as 65113 >> same ASN as underlay, unique per border-leaf node in site A
set protocols bgp group overlay multipath multiple-as
set protocols bgp group overlay neighbor 172.16.7.100 peer-as 65100 >> overlay peering to spine1
set protocols bgp group overlay neighbor 172.16.7.101 peer-as 65100 >> overlay peering to spine2
set protocols bgp group overlay vpn-apply-export
The border-leaf2 underlay/overlay eBGP configuration is consistent with border-leaf1; each
border-leaf uses its own overlay ASN, which helps identify where routes were injected inside
the fabric.
Config. 6 border-leaf2 underlay/overlay peering – the same ASN is used for underlay and overlay on each border-leaf node
The following import policy statements were used on both border-leaf nodes in DC site A. The
import policy ensures that the remote site loopbacks (used to terminate the VXLAN tunnels)
are not also learned via the local spines. Without this restriction in the underlay BGP group
import policy, there could be suboptimal paths via the local spine nodes.
set policy-options policy-statement my_underlay_import term term1 from route-filter 172.16.7.215/32 exact
set policy-options policy-statement my_underlay_import term term1 from route-filter 172.16.7.216/32 exact
set policy-options policy-statement my_underlay_import term term1 then reject
set policy-options policy-statement my_underlay_import term term2 then accept
The export policy simply makes the underlay advertise the local loopback IP@ to the rest of
the fabric and to the remote location. Advertising the local fabric server leaf loopbacks to
the remote location is optional, because besides the border-leaf-to-border-leaf loopback
reachability between the sites, the server leaves in site A won't establish any direct VXLAN
tunnel to the server leaves in site B.
set policy-options policy-statement my_underlay_export term term1 from route-filter 172.16.7.0/24 prefix-length-range /32-/32
set policy-options policy-statement my_underlay_export term term1 then accept
When it comes to DCI overlay peering from the border-leaf nodes, a new overlay BGP group
called DCI is introduced with a full mesh of iBGP peerings. We decided to use a full mesh of
iBGP because only two DC sites are used in our example; with multiple sites, an in-path or
off-path route reflector would typically be used instead.
At the border-leaf nodes in site A, we used the following overlay DCI BGP peering
configuration:
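A minimal sketch of that configuration on border-leaf1 follows; the common iBGP ASN 65000 is inferred from the route outputs shown later in this chapter, and the group name DCI plus the neighbor set reflect the full-mesh description above:
set protocols bgp group DCI type internal >> iBGP full mesh between all border-leaf nodes
set protocols bgp group DCI local-address 172.16.7.113 >> border-leaf1 loopback0.0 IP@
set protocols bgp group DCI family evpn signaling
set protocols bgp group DCI local-as 65000 >> common DCI overlay ASN
set protocols bgp group DCI neighbor 172.16.7.114 >> border-leaf2, same site
set protocols bgp group DCI neighbor 172.16.7.215 >> border-leaf3, DC site B
set protocols bgp group DCI neighbor 172.16.7.216 >> border-leaf4, DC site B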
The border-leaf nodes in DC site B use a consistent approach for the iBGP full-mesh
configuration, peering with the local border-leaf as well as the remote border-leaf nodes in
site A.
You can see in the above configuration examples that, even though our site-to-site connection
uses a partial mesh of links, we enabled a full mesh of peerings to make the setup ready for
any additional links between the DC sites.
The leaf1 interface configuration connected to server1 uses the enterprise-style
configuration with a local ESI value, significant within DC1.
set interfaces ae0 unit 0 family ethernet-switching vlan members 10 >> VLAN local to DC1
set interfaces ae0 unit 0 family ethernet-switching vlan members 20 >> VLAN local to DC1
set interfaces ae0 unit 0 family ethernet-switching vlan members 1001-1002 >> VLANs stretched between
DC1 and DC2
set interfaces et-0/0/50 ether-options 802.3ad ae0 >> associating physical interface towards the
server1 with ESI-LAG
set chassis aggregated-devices ethernet device-count 2 >> to be enabled if not yet used globally at
the given server-leaf
When the server leaf is a QFX5120, global forwarding options are also required to enable the
MAC-VRF based EVI implementation (they are shown later for leaf2 and apply to leaf1 as well).
The anycast IRB gateway configuration used at leaf1 is the following:
set interfaces irb unit 10 family inet address 10.10.10.1/24 >> same IP@ is used across all server leaf
nodes
set interfaces irb unit 10 mac 00:00:01:01:01:01 >> same MAC@ is used across all server leaf nodes
set interfaces irb unit 20 family inet address 10.10.20.1/24
set interfaces irb unit 20 mac 00:00:01:01:01:01
set interfaces irb unit 1001 family inet address 10.10.0.1/24
set interfaces irb unit 1001 mac 00:00:01:01:01:01
set interfaces irb unit 1002 family inet address 10.10.1.1/24
set interfaces irb unit 1002 mac 00:00:01:01:01:02
NOTE These IRB interfaces will later be placed inside the IP Type-5 VRF for the Type-5 tunnel
stitching scenario. In case the fabric is deployed purely as an L2 EVPN-VXLAN fabric, the IRB
interfaces are not required, and the IP default gateways (such as an MX or SRX external DC GW)
can be connected to the border-leaf nodes.
For completeness, we also show the MAC-VRF and interface configurations of leaf2, which is
also connected to server1. For the ESI-LAG interface configuration needed to connect server1,
notice that it is the same as on leaf1: same ESI value, same LACP system-id, and same list of
VLANs.
Similarly to leaf1, you must enable the following global forwarding options.
set forwarding-options evpn-vxlan shared-tunnels >> this knob is a pre-requisite when enabling
MAC-VRF/ reboot needed
set forwarding-options vxlan-routing next-hop 32768
set forwarding-options vxlan-routing interface-num 8192
set forwarding-options vxlan-routing overlay-ecmp >> used when deploying the ERB anycast IRB gateway
– T5 instances
The leaf2 connected with leaf1 to server1 is also enabled with an IP anycast gateway:
set interfaces irb unit 10 family inet address 10.10.10.1/24 >> same IP anycast used at leaf1 in DC A
set interfaces irb unit 10 mac 00:00:01:01:01:01 >> same MAC@ used at leaf1 in DC A
set interfaces irb unit 20 family inet address 10.10.20.1/24
set interfaces irb unit 20 mac 00:00:01:01:01:01
set interfaces irb unit 1001 family inet address 10.10.0.1/24
set interfaces irb unit 1001 mac 00:00:01:01:01:01
set interfaces irb unit 1002 family inet address 10.10.1.1/24
set interfaces irb unit 1002 mac 00:00:01:01:01:02
With IRB interfaces local to the server leaf, all ARP/ND resolutions are performed locally.
In case the same server leaf is enabled with an IP Type-5 instance for prefix advertisement
(ERB design), the subnet associated with the IRB and, in some cases, also Type-5 host routes
are advertised into the fabric and used at the border-leaf. When a given server leaf receives
both a MAC-IP Type-2 route and the corresponding Type-5 host route, the Type-5 host route is
always preferred for forwarding by the server leaf.
The border-leaf nodes in each DC site are used as the interconnect gateways (iGW), unifying
the border-leaf and gateway roles within the same device. That also efficiently reduces the
total cost and maintenance of the DCI infrastructure.
The tunnel stitching implementation is done inside the local site MAC-VRF called MACVRF101,
where a new block of configuration called 'interconnect' is dedicated to data center
interconnect purposes. Here's how the DC site A border-leaf1 and border-leaf2 seamless
stitching configuration is enabled to stretch VNI 51001 and 51002, with an explicit list of
the VNIs that we want to extend between the sites.
Config. 15 border-leaf1 - EVPN-VXLAN to EVPN-VXLAN seamless stitching configuration
set routing-instances MACVRF101 route-distinguisher 172.16.7.113:1 >> unique regular RD for the fabric
LAN purposes
set routing-instances MACVRF101 vrf-target target:1:8888 >> LAN fabric route-target shared with server
leaf nodes
set routing-instances MACVRF101 vlans vlan1001 vlan-id 1001 >> VLAN to VNI mapping
set routing-instances MACVRF101 vlans vlan1001 vxlan vni 51001 >> local site A VNI value shared with
server leaf nodes
set routing-instances MACVRF101 vlans vlan1001 vxlan translation-vni 61001 >> explicit but optional VNI translation for DCI purposes
set routing-instances MACVRF101 vlans vlan1002 vlan-id 1002
set routing-instances MACVRF101 vlans vlan1002 vxlan vni 51002
set routing-instances MACVRF101 vlans vlan1002 vxlan translation-vni 61002
## at the global protocol EVPN level we also explicitly provision the peer IP loopback address of the
border-leaf from same DC site
set protocols evpn interconnect-multihoming-peer-gateways 172.16.7.114 >> required for loop free
forwarding
Because DC site A has two border-leaf nodes, the same configuration pattern is also used at
border-leaf2:
Config. 16 border-leaf2 - EVPN-VXLAN to EVPN-VXLAN seamless stitching configuration
set routing-instances MACVRF101 instance-type mac-vrf
set routing-instances MACVRF101 protocols evpn encapsulation vxlan
set routing-instances MACVRF101 protocols evpn default-gateway no-gateway-community
set routing-instances MACVRF101 protocols evpn extended-vni-list 51001 >> VNI defined for LAN fabric
purposes in an explicit way
set routing-instances MACVRF101 protocols evpn extended-vni-list 51002
## the new configuration block called ‘EVPN interconnect’ is used for seamless stitching DCI using
qfx5130/qfx5700
set routing-instances MACVRF101 protocols evpn interconnect vrf-target target:1:101 >> DCI dedicated
route-target shared with remote border
set routing-instances MACVRF101 protocols evpn interconnect route-distinguisher 172.16.7.113:101 >>
unique iRD per border-leaf
set routing-instances MACVRF101 protocols evpn interconnect esi 00:00:11:11:11:11:11:11:11:11 >> the DC
site identifier ESI unique per site
set routing-instances MACVRF101 protocols evpn interconnect esi all-active >> all-active is the only
option but is typically preferred in DC
set routing-instances MACVRF101 protocols evpn interconnect interconnected-vni-list 61001 >> we defined
which VNI to extend between DCs
set routing-instances MACVRF101 protocols evpn interconnect interconnected-vni-list 61002
set routing-instances MACVRF101 vtep-source-interface lo0.0
set routing-instances MACVRF101 service-type vlan-aware >> we keep the EVPN service type consistent with
what was enabled at server leaf
set routing-instances MACVRF101 route-distinguisher 172.16.7.113:1 >> unique regular RD for the fabric
LAN purposes
set routing-instances MACVRF101 vrf-target target:1:8888 >> LAN fabric route-target shared with server
leaf nodes
set routing-instances MACVRF101 vlans vlan1001 vlan-id 1001 >> VLAN to VNI mapping
set routing-instances MACVRF101 vlans vlan1001 vxlan vni 51001 >> local site A VNI value shared with
server leaf nodes
set routing-instances MACVRF101 vlans vlan1001 vxlan translation-vni 61001 >> explicit but optional VNI translation for DCI purposes
set routing-instances MACVRF101 vlans vlan1002 vlan-id 1002
set routing-instances MACVRF101 vlans vlan1002 vxlan vni 51002
set routing-instances MACVRF101 vlans vlan1002 vxlan translation-vni 61002
## at the global protocol EVPN level we also explicitly provision the peer IP loopback address of the
border-leaf from same DC site
set protocols evpn interconnect-multihoming-peer-gateways 172.16.7.114 >> required for loop free
forwarding
When it comes to DC site B, the border-leaf configurations follow the same principle as in
site A. However, they use a new interconnect iESI, 00:00:22:22:22:22:22:22:22:22, for
load-balancing the L2 traffic across border-leaf3/border-leaf4. Because the iESI is used
within the MAC-VRF for interconnect DCI purposes, a DF/nDF election also takes place.
In this example, we keep the default MOD election, but in Junos/Junos Evolved a
preference-based DF/nDF election can optionally be used. This is especially useful when more
MAC-VRFs are enabled: with preference-based election, we can control which of the two
border-leaf nodes is responsible for multicast forwarding for which set of VLANs.
As you can see, there are no IRB routing interfaces defined at the border-leaf and mapped to
the different VLANs; this is because our design doesn't have any server connected to the
border-leaf nodes.
The translation-vni function is interesting when the goal is to maintain the same broadcast
domain between the sites while the remote site uses a different VNI value for a given VLAN.
In this case, we can normalize the VNI value at the border-leaf level.
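For example, if the site B border-leaf nodes kept their own local VNI for vlan1001, they would simply map it to the same interconnect VNI 61001 used by site A; in our lab both sites happen to use 51001 locally, so the normalization looks like this:
set routing-instances MACVRF101 vlans vlan1001 vlan-id 1001
set routing-instances MACVRF101 vlans vlan1001 vxlan vni 51001 >> site B local VNI for vlan1001
set routing-instances MACVRF101 vlans vlan1001 vxlan translation-vni 61001 >> normalized VNI carried on the DCI VXLAN tunnel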
To verify the BGP attributes associated with the local DC site server1 MAC@, use the
following Junos command at the border-leaf2:
root@border-leaf2> show route receive-protocol bgp 172.16.7.101 evpn-mac-address 00:50:56:ab:01:01
table MACVRF101.evpn.0 detail
MACVRF101.evpn.0: 101 destinations, 102 routes (101 active, 0 holddown, 0 hidden)
* 2:172.16.7.1:1::51001::00:50:56:ab:01:01/304 MAC/IP (1 entry, 1 announced)
Import Accepted
To check how this control plane EVPN advertisement is decoded inside the Wireshark packet
analyzer, we simply enable port mirroring at border-leaf2. The MAC@ 00:50:56:ab:01:01 of
server1, as received at border-leaf2 from spine2, is shown in Figure 5.4.
Figure 5.4 Wireshark Decoded EVPN Route-Type-2 MAC/MAC-IP Received at The Border-leaf2
From Spine2 in DC-A , Originated at leaf1
Now, on border-leaf2, let's check the routes advertised to border-leaf4 for server1
(MAC@ 00:50:56:ab:01:01); the site A originated MAC@ are all placed behind the iESI
00:00:11:11:11:11:11:11:11:11.
root@border-leaf2> show route advertising-protocol bgp 172.16.7.216 evpn-mac-address
00:50:56:ab:01:01 table MACVRF101.evpn.0 detail
MACVRF101.evpn.0: 101 destinations, 102 routes (101 active, 0 holddown, 0 hidden)
* 2:172.16.7.114:101::51001::00:50:56:ab:01:01/304 MAC/IP (1 entry, 1 announced)
BGP group overlay type External
Route Distinguisher: 172.16.7.114:101
Route Label: 61001 >> original VNI value is rewritten at the border-leaf for DCI purposes
ESI: 00:00:11:11:11:11:11:11:11:11 >> ESI rewritten with Interconnect ESI value
Nexthop: Self
Flags: Nexthop Change
AS path: [65000] I >> AS Path rewritten with own ASN at border-leaf2/border-leaf1
Communities: target:1:101 encapsulation:VXLAN(0x8) >> the new route-target associated with the
EVPN MAC route
* 2:172.16.7.114:101::51001::00:50:56:ab:01:01::10.10.0.101/304 MAC/IP (1 entry, 1 announced)
BGP group overlay type External
Route Distinguisher: 172.16.7.114:101
Route Label: 61001
ESI: 00:00:11:11:11:11:11:11:11:11
Nexthop: Self
Flags: Nexthop Change
AS path: [65000] I
Communities: target:1:101 encapsulation:VXLAN(0x8)
The advertisement shown in the Junos output above is also decoded with Wireshark to precisely
observe the attribute changes that happen at border-leaf2 (172.16.7.114) when the server1
MAC@ is advertised towards border-leaf4 (172.16.7.216). See Figure 5.5.
You can see, for example, that the ESI changes from 00:00:88:88:88:88:88:88:01:01 to the new
interconnect ESI 00:00:11:11:11:11:11:11:11:11. The same goes for the AS-PATH as well as the
next hop; both BGP attributes are changed for the EVPN route Type-2.
On border-leaf2, we also verify the MAC-IP table. You can see that border-leaf2 uses the
site B site identifier iESI 00:00:22:22:22:22:22:22:22:22 to reach server4/server5. From the
border-leaf2 and border-leaf1 perspective, server1/server2 continue to be reachable via the
original server ESIs. Because server3 is single-homed to leaf5, its ESI is all-zero, as per
the standard, and only the RVTEP IP@ information is registered.
Figure 5.5 Wireshark Decoded EVPN MAC Route-Type2 for Server1, Sent From Border-leaf2 to
Border-leaf4 for DCI Purposes
On border-leaf4, we check the type of tunnels created (WAN-VTEP for DCI, RNVE for the fabric
side tunnels, and the I-ESI peer tunnel) and the entries learned from the remote DCI gateway
in the EVPN database, using a new knob that traces only the DCI-learned MAC addresses.
You can see, for example, at border-leaf4 (DC site B), that the server1/server2/server3 MAC
addresses are reachable via the site A iESI 00:00:11:11:11:11:11:11:11:11, also used as the
site identifier.
root@border-leaf4# run show mac-vrf forwarding vxlan-tunnel-end-point remote summary
Logical System Name Id SVTEP-IP IFL L3-Idx SVTEP-Mode ELP-SVTEP-IP
<default> 0 172.16.7.216 lo0.0 0
RVTEP-IP IFL-Idx Interface NH-Id RVTEP-Mode ELP-IP Flags
172.16.7.113 19309 vtep.32773 55058 Wan-VTEP
172.16.7.114 19308 vtep.32772 55043 Wan-VTEP
172.16.7.206 19307 vtep.32771 55022 RNVE
172.16.7.215 19312 vtep.32774 55063 I-ESI-Peer
RVTEP-IP L2-RTT IFL-Idx Interface NH-Id RVTEP-Mode ELP-IP Flags
172.16.7.114 MACVRF-techfest22 671547396 vtep-56.32772 55043 RNVE
172.16.7.215 MACVRF-techfest22 671547398 vtep-56.32774 55063 I-ESI-Peer
172.16.7.113 MACVRF101 671555589 vtep-57.32773 55058 Wan-VTEP >> DCI tunnel
172.16.7.114 MACVRF101 671555588 vtep-57.32772 55043 Wan-VTEP >> DCI tunnel
172.16.7.206 MACVRF101 671555587 vtep-57.32771 55022 RNVE
172.16.7.215 MACVRF101 671555590 vtep-57.32774 55063 I-ESI-Peer
root@border-leaf4#
root@border-leaf4> show evpn database instance MACVRF101 origin dci-remote
Instance: MACVRF101
VLAN DomainId MAC address Active source Timestamp IP address
51001 00:00:01:01:01:01 00:00:11:11:11:11:11:11:11:11 May 31 06:35:20 10.10.0.1
51001 00:50:56:ab:01:01 00:00:11:11:11:11:11:11:11:11 May 31 06:35:20 10.10.0.101 >>
Server 1
51001 00:50:56:ab:01:02 00:00:11:11:11:11:11:11:11:11 May 31 06:35:20 10.10.0.102 >>
Server 2
51001 00:50:56:ab:01:03 00:00:11:11:11:11:11:11:11:11 May 31 06:35:20 10.10.0.103 >>
Server 3
On border-leaf4, let's check the routes received from border-leaf2 for server1:
root@border-leaf4> show route receive-protocol bgp 172.16.7.114 evpn-mac-address 00:50:56:ab:01:01
table MACVRF101.evpn.0 detail
MACVRF101.evpn.0: 68 destinations, 68 routes (68 active, 0 holddown, 0 hidden)
* 2:172.16.7.114:101::51001::00:50:56:ab:01:01/304 MAC/IP (1 entry, 1 announced)
Import Accepted
Route Distinguisher: 172.16.7.114:101
Route Label: 61001 << the VXLAN VNI translation value from 51001 for DCI purposes
ESI: 00:00:11:11:11:11:11:11:11:11 >> Interconnect ESI value rewritten by border leaf 2
Nexthop: 172.16.7.114
AS path: 65113 I >> ASN received is 65000 same as local-as so it's just set to internal flag I
Communities: target:1:101 encapsulation:VXLAN(0x8)
* 2:172.16.7.114:101::51001::00:50:56:ab:01:01::10.10.0.101/304 MAC/IP (1 entry, 1 announced)
Import Accepted
Route Distinguisher: 172.16.7.114:101
Route Label: 61001
ESI: 00:00:11:11:11:11:11:11:11:11
Nexthop: 172.16.7.114
AS path: I >> ASN received is 65000 same as local-as so it's just set to internal flag I
Communities: target:1:101 encapsulation:VXLAN(0x8)
Once the rewriting of EVPN routes has taken place at border-leaf3/border-leaf4 from the DCI
towards the DC fabric side in site B, the leaf6 view of the site A originated MAC@ is the
following:
root@leaf6> show mac-vrf forwarding mac-table instance MACVRF101 vlan-id 1001
MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)
Ethernet switching table : 7 entries, 7 learned
Routing instance : MACVRF101
You can see that server1/server2/server3, from leaf6's point of view, are all reachable via
the border-leaf3/border-leaf4 iESI value 00:00:22:22:22:22:22:22:22:22 in site B. So, even if
we receive a significant number of MAC@ from site A, they are all reachable via the same pair
of next hops.
In addition to the MAC-VRF MAC@ learning state, we can verify that leaf6, thanks to the
seamless stitching techniques, is not using any direct tunnel to leaf1, which originated the
MAC@ 00:50:56:ab:01:01 for server1 in DC site A:
root@leaf6> show mac-vrf forwarding vxlan-tunnel-end-point esi
ESI RTT VLNBH INH ESI-IFL LOC-IFL #RVTEPs
00:00:22:22:22:22:22:22:22:22 MACVRF101 1769 524291 esi.1769 2 Aliasing
RVTEP-IP RVTEP-IFL VENH MASK-ID FLAGS MAC-COUNT
172.16.7.215 vtep-9.32770 1773 0 2 7
172.16.7.216 vtep-9.32771 1786 1 2 7
ESI RTT VLNBH INH ESI-IFL LOC-IFL #RVTEPs
00:00:88:88:88:88:88:88:04:04 MACVRF101 1784 524286 esi.1784 ae0.0, 1 Aliasing
RVTEP-IP RVTEP-IFL VENH MASK-ID FLAGS MAC-COUNT
172.16.7.207 vtep-9.32772 1760 0 2 1
ESI RTT VLNBH INH ESI-IFL LOC-IFL #RVTEPs
00:00:88:88:88:88:88:88:05:05 MACVRF101 1785 524288 esi.1785 ae1.0, 1 Aliasing
RVTEP-IP RVTEP-IFL VENH MASK-ID FLAGS MAC-COUNT
172.16.7.207 vtep-9.32772 1760 0 2 1
root@leaf6>
Leaf6 doesn't have any direct VXLAN tunnel to leaf1 or to any other server leaf node in DC
site A. This behavior is expected when seamless stitching is enabled at the border-leaf nodes
in each DC site. Leaf6 still uses EVPN-VXLAN, but it uses its local border-leaf3
(172.16.7.215) and border-leaf4 (172.16.7.216) as next hops for reachability of the server1
MAC@ 00:50:56:ab:01:01.
The previous output helps determine which RVTEP IP addresses stand behind the esi.1769
logical interface and the ESI 00:00:22:22:22:22:22:22:22:22. Leaf6 continues to use direct
tunnels to reach the other leaf nodes located in its local DC.
For the cases where the destination MAC@ is reachable via a local interface member, for
example when server4 communicates with server5, local bias is used instead of the EVPN-VXLAN
tunnel. This is because both leaf6 and leaf7 are multihomed to these two servers, and the
default behavior is to prefer local-bias forwarding.
On the data plane side, a server1 (00:50:56:ab:01:01) packet destined to server4
(00:50:56:ab:01:04) is first received by leaf1 (172.16.7.1) and then sent to border-leaf1
(172.16.7.113).
From leaf1 to border-leaf1/2, you can see that the customer vlan-id is stripped before the
Layer 2 Ethernet frame is pushed into the tunnel. For border-leaf1/2, this means the
processing of the data packet is based on the ingress VNI value. In the example of VLAN1001,
we see the packet carries VNI 51001.
Figure 5.6 shows a pcap data frame captured at the border-leaf1 level before the stitching to
the DCI VXLAN tunnel takes place:
Figure 5.6 VXLAN Data Packet Destined to Server4 in DC Site B Sourced in DC A On Server1 -
Before Stitching
Because the packet is destined to server4 in DC site B, border-leaf1 decapsulates the packet
received from leaf1 and encapsulates it into the new WAN VXLAN tunnel destined to the site B
border-leaf3 (172.16.7.215), performing the tunnel stitching function (we can see the VXLAN
VNI changed to 61001), as shown in Figure 5.7.
Figure 5.7 VXLAN Data Packet Destined To Server4 Sourced On Server1 - After Stitching To Inter-
connect VXLAN At The Border-leaf1
When more in-depth verification is required at border-leaf3/4, which in our lab are
QFX5130-32cd switches, we can verify whether a DCI-reachable MAC@ is installed in the PFE
(Packet Forwarding Engine):
root@border-leaf3> start shell user root
[VRF:none] root@border-leaf3:~# cli-pfe >> enter the PFE CLI on Junos Evolved based switches such as the QFX5130-32cd
root@border-leaf3:pfe>
root@border-leaf3:pfe> show l2 manager bridge-domains
index name rtt index flags
------------------------------------------------------------
2 VLAN1001+1001 51 0x0
index name rtt index flags
------------------------------------------------------------
5 VLAN1002+1002 51 0x0
index name rtt index flags
------------------------------------------------------------
6 default+1 50 0x0
root@border-leaf3:pfe>
root@border-leaf3:pfe>
root@border-leaf3:pfe> show evo-pfemand layer2 bd-table
-------------- ----------
Layer2 Token Bd-Id
-------------- ----------
4096 2
4097 5
1 6
root@border-leaf3:pfe> show evo-pfemand layer2 mac-fdb-table
Mac Address Vlan Port isLag isStatic HW Synced HW Prog isRemote
00:00:01:01:01:01 4096 0xb0000104 False False 1 1 True
00:50:56:ab:01:01 4096 0xb0000104 False False 1 1 True >> server1 MAC@
at the pfe level
00:50:56:ab:01:02 4096 0xb0000104 False False 1 1 True
00:50:56:ab:01:03 4096 0xb0000104 False False 1 1 True
00:50:56:ab:01:04 4096 0xb0000103 False False 1 1 True
00:50:56:ab:01:05 4096 0xb0000103 False False 1 1 True
18:2a:d3:57:b1:80 4096 0xb0000102 False False 1 1 True
9c:8a:cb:05:64:00 4096 0xb0000103 False False 1 1 True
00:00:01:01:01:02 4097 0xb0000104 False False 1 1 True
18:2a:d3:57:b1:80 4097 0xb0000102 False False 1 1 True
9c:8a:cb:05:64:00 4097 0xb0000103 False False 1 1 True
c0:03:80:1c:7e:e0 4097 0xb0000104 False False 1 1 True
c0:03:80:1c:b5:e0 4097 0xb0000104 False False 1 1 True
Global Mac Table Size : d
root@border-leaf3:pfe>
root@border-leaf3:pfe> show evo-pfemand virtual vtep
vtep-ifl(Count) ifl-idx ifl-name ip vtepFlavor vPortToken
isTunnelInstalled ucTunnelId ucEgressIf
----------------- -------------- -------------- -------------- -------------- -------------- --------
----------- -------------- --------------
source-ifl(3) 19304 vtep 172.16.7.215 Default 0x0 Yes
0x0 0x206a1 (Discard)
remote-ifl(1) 19305 vtep.32770 172.16.7.206 Default 0xb0000102 Yes
0x4c100102 0x206a2
remote-ifl(1) 19306 vtep.32771 172.16.7.207 Default 0xb0000103 Yes
0x4c100103 0x206a2
remote-ifl(1) 19307 vtep.32772 172.16.7.113 DCI-WAN 0xb0000104 Yes
0x4c100104 0x206a3
root@border-leaf3:pfe>
To verify which physical interface is associated with the ifl-index 19307 used for VXLAN
tunneling, you can check this recursively at QFX5130 border-leaf3/4 using the following
command:
root@border-leaf3:pfe> show evo-pfemand virtual vtep iflIndex 19307
vtep-ifl(Count) ifl-idx ifl-name ip vtepFlavor vPortToken
isTunnelInstalled ucTunnelId ucEgressIf
----------------- -------------- -------------- -------------- -------------- -------------- --------
----------- -------------- --------------
remote-ifl(1) 19307 vtep.32772 172.16.7.113 DCI-WAN 0xb0000104 Yes
0x4c100104 0x206a3
Tunnel SIP: 172.16.7.215
Tunnel DIP: 172.16.7.113
Source Vtep’s IP: 172.16.7.215
VENH info
----------
Sw VTEP Ref Count: 1
Sw Venh Id: 57050
Hw Venh Id: 132771(0x206a3)
Hw Venh Id Ref count: 1
Underlay forwarding interface (Unicast):
interface: et-0/0/24.0, (port-id: 120)
Multihoming Local Bias Filter Info:
-----------------------------------
IfpVXLAN-EsiLB Filter is not Installed
Vtep Local Bias Info:
----------------------------
No rvtep is enabled for local bias
root@border-leaf3:pfe>
Notice that for the RVTEP tunnel destination 172.16.7.113, the interface et-0/0/24 is used as
the outgoing physical interface. As we can see in the output, the DCI-WAN flag is also
associated with the given tunnel IFL, indicating it is going to be used for DCI purposes.
This type of in-depth verification is not required on a day-to-day basis; we introduce it
here as an additional reference for verification tasks related to the data center
interconnect when using the QFX5130-32cd as a border-leaf node.
In these scenarios, some pairs of leaf nodes are enabled with rack-specific IP subnets at the
IP VRF Type-5 routing-instance level and advertise their reachability using pure Type-5 EVPN
routes. This helps scale out the fabric by advertising only the Type-5 LPM IP prefix instead
of each MAC and MAC-IP advertisement.
Besides the intra-fabric scenario, where you restrain the MAC/MAC-IP EVPN route propagation
between the ToR nodes, you can have a requirement where the fabric in a given site has Type-2
MAC/MAC-IP as well as Type-5 EVPN advertisements, but, for DCI purposes, only the Type-5 IP
prefix advertisements are required for full reachability between the data center sites.
For such scenarios, in larger scale deployments and when more operational control is
required, you can, similarly to the Type-2 stitching described in the previous sections,
perform Type-5 tunnel stitching and originate, at the border-leaf level, a new DCI Type-5
tunnel to the remote site after terminating the local Type-5 tunnels.
Let's compare pure Type-5 stitching to Type-2 EVPN-VXLAN tunnel stitching and highlight the
differentiators:
There's no IFL tunnel interface creation in the case of pure Type-5 tunnels (interface-less);
the IP traffic is forwarded using overlay composite next-hop resolution.
The load balancing for pure Type-5 tunnels for IP prefix reachability is based on overlay IP
ECMP and does not use the iESI, which was used in the case of Layer 2 (MAC) stitching.
The VNI value change/rewrite is not performed, and the same routing VNI value is used for DCI
as well as the LAN fabric.
The router MAC (RMAC) in the Type-5 EVPN route is changed to the local border-leaf value.
Similar to the Type-2 EVPN-VXLAN stitching implementation, a new route target (iRT), a new
route distinguisher (iRD), and an IP next-hop change are applied at the border-leaf nodes in
each DC site.
When it comes to the allocation of the route targets, it's recommended to keep the local
fabric Type-5 route targets unique per site within the given T5 IP VRF and then use a common
route target for DCI purposes in the interconnect section. Each node will obviously have a
different route distinguisher for fabric and for DCI purposes.
From the design point of view, when the distributed IP gateway at the leaf/ToR is deployed
within the DC site, the Type-2 and Type-5 tunnel stitching are often conducted in parallel,
to better scale the solution and keep it consistent for Layer 2 as well as Layer 3 DCI.
As the pure Type-5 interface-less EVPN routes do not use an ESI value, for site identification
purposes it is highly recommended to consider the new BGP attribute called
D-PATH (domain path). This attribute identifies the EVPN prefix origin and helps avoid
routing loops between the DC sites or any suboptimal forwarding. Each domain segment consists
of <domain segment length, domain segment value>, where the domain segment value is a
sequence of one or more domains. All domain IDs included in the D-PATH BGP attribute are
compared with the local domain IDs after the route-target verification has passed.
Managing the DCI solution using traditional communities can become too complex when dealing
with many T5 IP VRFs, so the new D-PATH approach is considered more user friendly in terms of
implementation.
Here's an example of how to enable D-PATH at the border-leaf level in site A; the same
domain-id 101:1:EVPN is enabled at border-leaf1/border-leaf2. The same approach is taken for
the site B border-leaf3/border-leaf4 to enable the BGP D-PATH attribute.
To better understand the implementation and verification of pure Type-5 tunnel stitching, the
lab topology in Figure 5.8 is introduced.
Figure 5.8 Seamless EVPN-VXLAN Stitching Lab for Pure Type-5 Tunnel Stitching
You can see in Figure 5.8 that VLAN10/VLAN20 are present only in DC site A and VLAN30/VLAN40
are specific to site B, while VLAN1001 is still stretched between the two data center sites.
In this case, the IP prefixes associated with the site-specific VLANs are advertised to the
remote site, not the MAC/MAC-IP information.
From the BGP underlay and overlay implementation point of view, in the case of pure Type-5
stitching, you continue using the same principles described in more detail in the previous
sections. Both border-leaf nodes use their overlay eBGP ASN numbers, ensuring that optimal
routes are installed and that the default eBGP routing loop prevention mechanisms are in
place. This means that when, for some reason, an EVPN prefix originated in site A is sent
back as an advertisement from site B, the local border-leaf1/border-leaf2 drops that
advertisement by default and only prefixes originated at the remote site are installed in the
RIB.
The VRF routing-instance used for pure Type-5 tunnel stitching carries the additional
interconnect block. A policy-statement is used to explicitly define the local site prefixes
and the remote prefixes that are advertised towards the local site leaf nodes as well as to
the remote locations.
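A minimal sketch of such a policy-statement at the site A border-leaf is shown here; the policy and term names are illustrative, while 10.10.10.0/24 and 10.10.30.0/24 are the site A and site B subnets used in this lab:
set policy-options policy-statement T5-DCI-EXPORT term site-a-local from route-filter 10.10.10.0/24 orlonger >> prefixes originated in the local site A fabric
set policy-options policy-statement T5-DCI-EXPORT term site-a-local then accept
set policy-options policy-statement T5-DCI-EXPORT term site-b-remote from route-filter 10.10.30.0/24 orlonger >> site B prefixes learned over the DCI
set policy-options policy-statement T5-DCI-EXPORT term site-b-remote then accept
set policy-options policy-statement T5-DCI-EXPORT term final then reject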
The border-leaf2 in site A follows the same approach as border-leaf1 for the pure Type-5 VRF
implementation; only the local RD/iRD information changes.
After the IP VRF Type-5 EVPN implementation, leaf1 in site A won't create direct Type-5
tunnels towards site B but instead goes through border-leaf1/border-leaf2 and simply installs
the two next hops towards its local site border-leaf nodes.
This is something you can verify using the following command at the border-leaf1 level.
root@border-leaf1# run show evpn ip-prefix-database l3-context T5-VRF1 extensive prefix 10.10.30.0/24
L3 context: T5-VRF1
IPv4->EVPN Exported Prefixes
Prefix: 10.10.30.0/24
EVPN route status: Created >> will be advertised to leaf1/leaf2/leaf3 in site A
Change flags: 0x0
Advertisement mode: Direct nexthop
Encapsulation: VXLAN
VNI: 1100
Router MAC: 0c:59:9c:22:6d:a2 >> local router MAC (RMAC) corresponding to IRB MAC/chassis-mac@
EVPN->IPv4 Imported Prefixes
Prefix: 10.10.30.0/24, Ethernet tag: 0
Change flags: 0x0
Remote advertisements:
Route Distinguisher: 172.16.7.215:102
VNI: 1100
Router MAC: 00:cc:34:bb:8c:2d >> RMAC info received from border-leaf3
BGP nexthop address: 172.16.7.215 >> next-hop IP@ of the border-leaf3
IP route status: Created
root@border-leaf1#
root@border-leaf1#
root@border-leaf3# run show interfaces irb | match Hardware
Current address: 00:cc:34:bb:8c:2d, Hardware address: 00:cc:34:bb:8c:2d >> border-leaf3 uses it for
RMAC in Type-5 route
root@border-leaf3#
The site B border-leaf3 has the symmetrical view of the remote prefixes when it comes to the
RMAC and IP next-hops.
After the reception of the prefix 10.10.10.0/24 from site A at the border-leaf3 in site B,
you can check whether that prefix is also advertised to the route server spine4 and then to
the server leaf6.
Config 10 Pure Type-5 verification of routes advertised to spine4 (local site route-server/reflector)
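The captured output for Config 10 is not reproduced here; on border-leaf3, a command of the
following form lists the routes advertised to the local route server (the spine4 peering address is
a placeholder, as it is not given in this chapter):
run show route advertising-protocol bgp <spine4-peering-address> table bgp.evpn.0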
All of the IP prefix advertisements between the sites happen because server leaf1 in DC
site A and leaf6 in DC site B are also provisioned with Type-5 IP routing instances whose
route-targets are unique per site. Each server leaf advertises the IP prefixes corresponding
to its local IRB interfaces as Type-5 EVPN routes to the local spine route servers; these
routes are then received at the border-leaf and re-originated with some of the EVPN BGP
attributes changed.
Here’s an example of the Type-5 instance configuration from leaf1 in site A where the
prefix 10.10.10.0/24 originated.
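The listing is sketched below rather than reproduced exactly; the server-leaf instance follows the
same ip-prefix-routes pattern as the border-leaf but without the interconnect block, and the RD,
local route-target, and IRB numbering shown here are assumptions based on the addressing used
elsewhere in this chapter:
set routing-instances T5-VRF1 instance-type vrf
set routing-instances T5-VRF1 interface irb.10 >> IRB carrying the 10.10.10.0/24 prefix
set routing-instances T5-VRF1 interface irb.20
set routing-instances T5-VRF1 route-distinguisher <leaf1-loopback>:100 >> unique per node
set routing-instances T5-VRF1 vrf-target target:1100:1100 >> site A local route-target (assumed value)
set routing-instances T5-VRF1 protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances T5-VRF1 protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances T5-VRF1 protocols evpn ip-prefix-routes vni 1100
set routing-instances T5-VRF1 protocols evpn ip-prefix-routes export my-t5-export-VRF1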
For the site B server leaf6 the configuration template is very similar; however, as you can
see below, the site-local route-target (RT) is target:1200:1200 and corresponds to the one
used at border-leaf3/border-leaf4 in site B. This RT is not the one used for DCI purposes,
target:102:102, which is used only at the border-leaf nodes communicating with each other
through Type-5 EVPN advertisements.
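Only the lines where leaf6 differs from the leaf1 sketch above are shown here; the addresses are
again assumptions for illustration:
set routing-instances T5-VRF1 interface irb.30
set routing-instances T5-VRF1 interface irb.40
set routing-instances T5-VRF1 interface lo0.1 >> optional loopback whose prefix is exported by the policy below
set routing-instances T5-VRF1 route-distinguisher 172.16.7.206:100 >> leaf6 loopback-based RD (assumed format)
set routing-instances T5-VRF1 vrf-target target:1200:1200 >> site B local route-target, matching border-leaf3/border-leaf4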
Leaf6 also defines explicitly which prefixes it advertises to the local spine4 route server
and then to border-leaf3/border-leaf4.
Config 14 Type-5 policy-statement to define which prefixes are advertised at the server leaf6 level in
DC site B
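The Config 14 listing is sketched below; the exact prefixes, the lo0.1 address, and the static-route
term are assumptions consistent with the description that follows:
set policy-options policy-statement my-t5-export-VRF1 term term1 from route-filter 172.16.100.206/32 exact >> lo0.1 prefix of leaf6 (assumed address)
set policy-options policy-statement my-t5-export-VRF1 term term1 then accept
set policy-options policy-statement my-t5-export-VRF1 term term2 from route-filter 10.10.30.0/24 orlonger >> orlonger also exports the server5 host route 10.10.30.105/32
set policy-options policy-statement my-t5-export-VRF1 term term2 from route-filter 10.10.40.0/24 orlonger
set policy-options policy-statement my-t5-export-VRF1 term term2 then accept
set policy-options policy-statement my-t5-export-VRF1 term term3 from protocol static >> optional static-route advertisement
set policy-options policy-statement my-t5-export-VRF1 term term3 then accept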
The additional prefix advertisements for loopback0.1 and the static route are optional. They
are introduced here just to show how to advertise prefixes that are not tied to any connected
IRB interface.
The keyword orlonger inside the policy statement is optional but is used here to also advertise
the Type-5 host route corresponding to server5. This means the border-leaf receives not only an
LPM entry but also the most specific server5 prefix, 10.10.30.105/32. This is optional but useful
when the same subnet is enabled on multiple top-of-rack switches, because the border-leaf then
knows exactly which server leaf node should receive traffic coming from the core IP. If the subnet
10.10.30.0/24 were enabled only at leaf6, the host route 10.10.30.105/32 advertisement would not
be needed for full IP reachability.
Once this is in place, we can see how the server leaf6 in site B processes the prefix
10.10.10.0/24 originated in site A at the server leaf1.
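The corresponding leaf6 output is not reproduced here; assuming the same T5-VRF1 instance name is
used on leaf6, that view can be inspected with the same prefix database command already used on
border-leaf1:
run show evpn ip-prefix-database l3-context T5-VRF1 extensive prefix 10.10.10.0/24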
We can also verify the tunnel installation itself from the border-leaf3 and border-leaf4
point of view by entering the PFE (Packet Forwarding Engine) CLI and calling the evo-
pfemand commands.
Based on the EVPN Type-5 control plane routes exchanged with the remote border-leaf1
(172.16.7.113) and the local site B leaf6 (172.16.7.206), we can clearly see the tunnels of
type tunnelType5 installed in the forwarding table of the Trident4 chip. The same output
shows the Type-2 tunnels, whose names start with vtep.xxx. The key difference is that a
tunnelType5 entry is not attached to any IFL (logical interface) index, while the Type-2
EVPN-VXLAN tunnels also create a logical interface instance in the PFE.
After all that detailed verification, we need to go back to basics and at least run a basic ping
from site A to site B between server1 in subnet 10.10.10.0/24 and server5 (10.10.30.105).
{master:0}[edit]
root@server1# >
The format of the data packet received at border-leaf1 via the pure Type-5 EVPN-VXLAN
tunnel from server leaf1 is shown below. The MAC addresses used inside the tunnel are the
chassis IRB MACs of leaf1 and border-leaf1 instead of those of the original server1 and
server5 – this is based on the RMAC community advertised inside the EVPN Type-5 route.
This way, server1-to-server5 communication between DC site A and DC site B only requires
the pure Type-5 prefixes to be advertised instead of the end-host MAC addresses, which
would be the case for an L2 stretched DCI with Type-2/Type-2 EVPN-VXLAN stitching,
covered in the previous chapter.
Figure 5.9 Type-5 EVPN-VXLAN tunnel encapsulated data packet sent from leaf1 to border-leaf1
– before pure Type-5 tunnel stitching on border-leaf1
Figure 5.10 Type-5 EVPN-VXLAN tunnel encapsulated data packet sent from border-leaf1 to
border-leaf3 – after pure Type-5 tunnel stitching on border-leaf1
After the Type-5 stitching on border-leaf1, we can also observe the changes in the data
packet format – new RMAC (Router MAC) information is used (that of the border-leaf nodes).
Figure 5.10 shows the data plane result of the tunnel stitching at border-leaf1 for pure
Type-5 tunnels – the fabric T5 tunnel received from server leaf1 is stitched to the DCI
Type-5 tunnel destined to the remote location border-leaf3.
Chapter 6: Seamless EVPN-VXLAN to EVPN-MPLS Stitching - Implementation and Verification
The cool thing about the new RFC 9014 seamless stitching techniques is that they can be
used with different types of transport requirements. Besides the EVPN-VXLAN to
EVPN-VXLAN stitching scenario covered in the previous sections, the second very popular
scenario for DCI is when the same EVI is used to seamlessly deliver Ethernet bridging
between multiple data center sites, using VXLAN in the DC LAN and MPLS in the WAN for
DCI purposes.
This is something we can explain in detail using the following lab scenario where the in-
termediate EVPN domain is using the MPLS encapsulation while the local site domains
are using EVPN-VXLAN.
Figure 6.1 DCI EVPN-VXLAN to EVPN-MPLS Seamless Tunnel Stitching - Lab Topology
In the scenario shown in Figure 6.1, the border-leaf nodes, aka gateways, are part of the
fabric and will seamlessly change the transport format from VXLAN to MPLS. This will
also be done automatically using built-in BGP EVPN attributes. We’ll use this topology
for further configuration discussion and verifications.
Before we go into the details, it's also worth reminding the reader of the situations in which
this option can be considered. Here are the most common reasons some organizations may
prefer MPLS over VXLAN in the DCI scenario:
- The existing core network is already running MPLS, and P router connectivity is directly available in the DC room.
- Smaller satellite DC fabric sites are deployed in the regions, and adding a separate PE to connect to MPLS is not cost effective.
- Existing PE devices are integrated as border-leaf nodes into the fabric, and they already have long-reach MPLS connectivity to the remote sites.
- VPLS was used so far as the DCI technique, so migrating to EVPN-MPLS DCI can offer fast convergence capabilities.
For the MPLS connectivity, we also need to explicitly enable the LDP and MPLS protocols
on the xe-0/2/0 interface connected to the P1_transit router so that the transport labels
get allocated, as sketched below.
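A sketch of that border-leaf configuration follows; it mirrors the ISIS/LDP/MPLS listing used later
for the Chapter 7 lab, with the interface addressing omitted here:
set protocols isis interface xe-0/2/0.0 >> underlay interface towards the P1_transit router
set protocols isis interface lo0.0
set protocols ldp interface xe-0/2/0.0 >> LDP allocates the transport labels
set protocols ldp interface lo0.0
set protocols mpls interface xe-0/2/0.0
set interfaces xe-0/2/0 unit 0 family iso >> ISO family needed for the ISIS underlay
set interfaces xe-0/2/0 unit 0 family mpls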
We can then verify the basic state of ISIS and LDP using the following commands. This is
recommended before starting any EVI and overlay BGP configuration.
root@border-leaf6_re> show isis adjacency
Interface System L State Hold (secs) SNPA
xe-0/2/0.0 P1_transit_re 2 Up 26 2c:6b:f5:4c:cb:52
root@border-leaf6_re> show ldp database
Input label database, 172.16.7.116:0--172.16.7.20:0
Labels received: 5
Label Prefix
3 172.16.7.20/32
299936 172.16.7.115/32
299920 172.16.7.116/32
299888 172.16.7.217/32
299952 172.16.7.218/32
Output label database, 172.16.7.116:0--172.16.7.20:0
Labels advertised: 5
Label Prefix
299840 172.16.7.20/32
300832 172.16.7.115/32
3 172.16.7.116/32
299952 172.16.7.217/32
301088 172.16.7.218/32
root@border-leaf6_re>
root@border-leaf6_re> show ldp interface
The MPLS label 299888 associated with border-leaf7 loopback IP @ 172.16.7.217 will
be used as an outer/transport label when sending packets from DC site A to site B. You
will see in the next section how the data packet is built when the EVPN EVI label and
transport label are both used.
At the iBGP level, the local-as number 65000 will be used on all border-leaf nodes and
they will peer with the core IP route-reflector. Here, the route reflector loopback IP ad-
dress is 172.16.7.20.
Config 22 Border-leaf5 iBGP configuration for the connectivity with route-reflector P1_transit
Config 23 Border-leaf6 iBGP configuration for the connectivity with route-reflector P1_transit
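The Config 22/Config 23 listings follow this pattern; the sketch below shows border-leaf5, with
border-leaf6 differing only in its local-address. Whether AS 65000 is applied globally or per group
with local-as is an assumption here:
set protocols bgp group WAN type internal
set protocols bgp group WAN local-address 172.16.7.115 >> border-leaf5 loopback, 172.16.7.116 on border-leaf6
set protocols bgp group WAN local-as 65000
set protocols bgp group WAN family evpn signaling
set protocols bgp group WAN neighbor 172.16.7.20 >> core IP route-reflector P1_transit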
The route-reflector in the lab example is in-path, but if there's an existing off-path
route-reflector in the core IP network it can also be used, and only the additional EVPN
address family signaling has to be added at the RR level.
set protocols bgp group WAN neighbor 172.16.7.115 >> border-leaf5 in site A
set protocols bgp group WAN neighbor 172.16.7.116 >> border-leaf6 in site A
set protocols bgp group WAN neighbor 172.16.7.217 >> border-leaf7 in site B
set protocols bgp group WAN neighbor 172.16.7.218 >> border-leaf8 in site B
The route-reflector has a local cluster value enabled and peers with all border-leaf nodes;
if the cluster-id is not set, the reflection of EVPN routes between the data center sites
won't work in this iBGP scenario.
The border-leaf nodes in each site must also connect to the local fabric spines to learn the
EVPN routes originated at the server leaf nodes. The following configuration integrates
the border-leaf into the fabric LAN EVPN-VXLAN side.
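A sketch of that fabric-side peering on border-leaf6 follows. The group name, ASNs, and neighbor
addresses are taken from the outputs shown below; the remaining statements (for example multihop
no-nexthop-change) are assumptions about how the loopback-based eBGP overlay is built:
set protocols bgp group overlay type external
set protocols bgp group overlay multihop no-nexthop-change
set protocols bgp group overlay local-address 172.16.7.116 >> border-leaf6 loopback
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay local-as 65115 >> same overlay ASN on border-leaf5 and border-leaf6
set protocols bgp group overlay peer-as 65101
set protocols bgp group overlay neighbor 172.16.7.105 >> spine route server in site A
set protocols bgp group overlay neighbor 172.16.7.106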
From the verification point of view, we can use the well-known BGP commands to check that
the peers for the fabric as well as for DCI are in the 'Established' state.
root@border-leaf6_re> show bgp summary
Threading mode: BGP I/O
Default eBGP mode: advertise - accept, receive - accept
Groups: 3 Peers: 5 Down peers: 0
Table Tot Paths Act Paths Suppressed History Damp State Pending
bgp.evpn.0
118 80 0 0 0 0
inet.0
8 8 0 0 0 0
Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/
Accepted/Damped...
172.16.7.20 65000 2533 2093 0 4 15:21:17 Establ >> overlay peering to
the route-reflector
bgp.evpn.0: 61/80/61/0
MACVRF101.evpn.0: 26/43/26/0
__default_evpn__.evpn.0: 1/1/1/0
172.16.7.105 65101 2021 2421 0 4 15:21:13 Establ >> peering to the fabric
spine1 in site A
bgp.evpn.0: 17/19/19/0
MACVRF101.evpn.0: 16/17/17/0
__default_evpn__.evpn.0: 0/0/0/0
172.16.7.106 65101 2021 2422 0 4 15:21:13 Establ >> peering to the fabric
spine2 in site A
bgp.evpn.0: 2/19/19/0
MACVRF101.evpn.0: 1/17/17/0
__default_evpn__.evpn.0: 0/0/0/0
192.168.9.1 65101 2004 2027 0 4 15:21:17 Establ >> underlay peering to
the fabric spine1 in site A
inet.0: 4/4/4/0
192.168.10.1 65101 2004 2027 0 4 15:21:17 Establ >> underlay peering to
the fabric spine2 in site A
inet.0: 4/4/4/0
root@border-leaf6_re>
In these verification tasks, we see that both border-leaf nodes share the same BGP ASN for
the EVPN-VXLAN peering towards the spines, while still maintaining eBGP as the type of
peering towards spine5/6. This helps prevent sub-optimal routes from being installed in
the RIB, which could happen if the two border-leaf nodes were using different overlay ASNs.
The interconnect VLAN list also determines which VLANs are stretched and which are staying
local. This is important as, in many cases, there's a customer requirement to keep some data
local only – for example, when the requirement is not to stretch it between two countries.
In this case the admin will define for interconnect purposes precisely those VLANs that have
high DRS (disaster recovery solution) requirements and are not restricted to site-to-site
data storage replication.
Before we go into more details on the configurations and verification, we will quickly re-
fresh the simplified topology used for our EVPN-VXLAN to EVPN-MPLS lab in Figure
6.2.
Here’s an example of the EVI (EVPN instance) configuration used at the border-leaf5
and border-leaf6 from data center site A. Comparing this to the previous use case you can
see that within the interconnect block of configurations the encapsulation format is now
MPLS and the instance type is virtual-switch for the MX border-leaf.
set routing-instances MACVRF101 instance-type virtual-switch >> will enable the VLAN-aware service-
type on MX
set routing-instances MACVRF101 protocols evpn encapsulation vxlan >> encapsulation used in LAN fabric
to server-leaf
set routing-instances MACVRF101 protocols evpn extended-vni-list all
set routing-instances MACVRF101 protocols evpn interconnect vrf-target target:1:101 >> route-target
for DCI EVPN-MPLS
set routing-instances MACVRF101 protocols evpn interconnect route-distinguisher 172.16.7.115:101 >>
iRD unique per node
set routing-instances MACVRF101 protocols evpn interconnect esi 00:00:11:11:11:11:11:11:11:11 >>
site-id iESI in DC A
set routing-instances MACVRF101 protocols evpn interconnect esi all-active
set routing-instances MACVRF101 protocols evpn interconnect interconnected-vlan-list 1001
set routing-instances MACVRF101 protocols evpn interconnect encapsulation mpls
set routing-instances MACVRF101 vtep-source-interface lo0.0
set routing-instances MACVRF101 bridge-domains bd1001 domain-type bridge
set routing-instances MACVRF101 bridge-domains bd1001 vlan-id 1001
set routing-instances MACVRF101 bridge-domains bd1001 routing-interface irb.1001
set routing-instances MACVRF101 bridge-domains bd1001 vxlan vni 51001 >> vxlan vni value we stitch to
MPLS
set routing-instances MACVRF101 bridge-domains bd1001 vxlan ingress-node-replication
set routing-instances MACVRF101 route-distinguisher 172.16.7.115:1 >> unique LAN EVPN-VXLAN RD per
node
set routing-instances MACVRF101 vrf-target target:1:8888 >> route-target for LAN EVPN-VXLAN in fabric
site A
A similar configuration is enabled at border-leaf6 in site A, where the iESI is the same to
allow for site identification but also for load-balancing whenever the site B hosts try to
reach the site A hosts. The EVPN-MPLS interconnect route-target used for DCI purposes,
target:1:101, is the same on all border-leaf nodes enabled for DCI, and the local LAN
EVPN-VXLAN route-target target:1:8888 is likewise shared between the border-leaf nodes and
the server-leaf nodes.
set routing-instances MACVRF101 instance-type virtual-switch >> will enable the VLAN-aware service-type
on MX
set routing-instances MACVRF101 protocols evpn encapsulation vxlan >> encapsulation used in LAN fabric to
server-leaf
set routing-instances MACVRF101 protocols evpn extended-vni-list all
set routing-instances MACVRF101 protocols evpn interconnect vrf-target target:1:101 >> route-target for
DCI EVPN-MPLS
set routing-instances MACVRF101 protocols evpn interconnect route-distinguisher 172.16.7.116:101 >> iRD
unique per node
set routing-instances MACVRF101 protocols evpn interconnect esi 00:00:11:11:11:11:11:11:11:11 >> site-id
iESI in DC A
set routing-instances MACVRF101 protocols evpn interconnect esi all-active >> only all-active is
currently supported
set routing-instances MACVRF101 protocols evpn interconnect interconnected-vlan-list 1001 >> explicit
list of stretched VLANs
set routing-instances MACVRF101 protocols evpn interconnect encapsulation mpls >> change of
encapsulation to MPLS
set routing-instances MACVRF101 vtep-source-interface lo0.0
set routing-instances MACVRF101 bridge-domains bd1001 domain-type bridge
set routing-instances MACVRF101 bridge-domains bd1001 vlan-id 1001
set routing-instances MACVRF101 bridge-domains bd1001 vxlan vni 51001 >> vxlan vni value we stitch to
MPLS
set routing-instances MACVRF101 bridge-domains bd1001 vxlan ingress-node-replication
set routing-instances MACVRF101 route-distinguisher 172.16.7.116:1 >> unique LAN EVPN-VXLAN RD per node
set routing-instances MACVRF101 vrf-target target:1:8888 >> route-target for LAN EVPN-VXLAN in fabric
site A
Compared to the previous stitching example, here we specify in the interconnect section the
list of VLANs we want to stretch, using the interconnected-vlan-list statement. This is
because when stitching from VXLAN to MPLS we no longer use the notion of a VNI.
The data center site B configurations are consistent and share the same interconnect route-
target, target:1:101; however, for the LAN EVPN-VXLAN we allocate a different route-target
than the one used in site A. Here's an example configuration of the DC site B EVI for the
border-leaf nodes.
set routing-instances MACVRF101 instance-type virtual-switch >> will enable the VLAN-aware service-type
on MX
set routing-instances MACVRF101 protocols evpn encapsulation vxlan >> encapsulation used in LAN fabric to
server-leaf
set routing-instances MACVRF101 protocols evpn extended-vni-list all
set routing-instances MACVRF101 protocols evpn interconnect vrf-target target:1:101 >> route-target
for DCI EVPN-MPLS
set routing-instances MACVRF101 protocols evpn interconnect route-distinguisher 172.16.7.217:101 >>
iRD EVPN-MPLS
set routing-instances MACVRF101 protocols evpn interconnect esi 00:00:22:22:22:22:22:22:22:22 >>
site-id iESI in DC B
set routing-instances MACVRF101 protocols evpn interconnect esi all-active
set routing-instances MACVRF101 protocols evpn interconnect interconnected-vlan-list 1001 >> explicit
list of stretched VLANs
set routing-instances MACVRF101 protocols evpn interconnect encapsulation mpls >> change of
encapsulation to MPLS
set routing-instances MACVRF101 vtep-source-interface lo0.0
set routing-instances MACVRF101 bridge-domains bd1001 domain-type bridge
set routing-instances MACVRF101 bridge-domains bd1001 vlan-id 1001
set routing-instances MACVRF101 bridge-domains bd1001 vxlan vni 51001 >> vxlan vni value we stitch to
MPLS
set routing-instances MACVRF101 bridge-domains bd1001 vxlan ingress-node-replication
set routing-instances MACVRF101 route-distinguisher 172.16.7.217:1 >> LAN EVPN-VXLAN route-
distinguisher
set routing-instances MACVRF101 vrf-target target:1:9999 >> route-target for LAN EVPN-VXLAN in fabric
site B
set routing-instances MACVRF101 instance-type virtual-switch >> will enable the VLAN-aware service-
type on MX
set routing-instances MACVRF101 protocols evpn encapsulation vxlan >> encapsulation used in LAN
fabric to server-leaf
set routing-instances MACVRF101 protocols evpn extended-vni-list all
set routing-instances MACVRF101 protocols evpn interconnect vrf-target target:1:101 >> route-target
for DCI EVPN-MPLS
set routing-instances MACVRF101 protocols evpn interconnect route-distinguisher 172.16.7.218:101 >>
iRD EVPN-MPLS
set routing-instances MACVRF101 protocols evpn interconnect esi 00:00:22:22:22:22:22:22:22:22 >>
site-id iESI in DC B
set routing-instances MACVRF101 protocols evpn interconnect esi all-active
set routing-instances MACVRF101 protocols evpn interconnect interconnected-vlan-list 1001 >> explicit
list of stretched VLANs
set routing-instances MACVRF101 protocols evpn interconnect encapsulation mpls >> change of
encapsulation to MPLS
set routing-instances MACVRF101 vtep-source-interface lo0.0
set routing-instances MACVRF101 bridge-domains bd1001 domain-type bridge
set routing-instances MACVRF101 bridge-domains bd1001 vlan-id 1001 >> vlan-id used for local pe-ce
ports
set routing-instances MACVRF101 bridge-domains bd1001 vxlan vni 51001 >> vxlan vni value we stitch to
MPLS
set routing-instances MACVRF101 bridge-domains bd1001 vxlan ingress-node-replication
set routing-instances MACVRF101 route-distinguisher 172.16.7.218:1 >> LAN EVPN-VXLAN route-
distinguisher
set routing-instances MACVRF101 vrf-target target:1:9999 >> route-target for LAN EVPN-VXLAN in fabric
site B
DC site A and DC site B configurations are very similar, so automating them using Ansible
or Python shouldn't be a big problem. Mainly the interconnect iESI and the local LAN
route-targets change, besides the route-distinguishers, which are unique per node in sites
A and B – on each node there is one RD for the LAN fabric part and one for the WAN DCI.
Border-leaf config for DCI purposes is the focus of this Day One book, however it’s also
important to bring up what the server-leaf EVPN-VXLAN configuration in each data
center site looks like. This will help you understand later how the RT and ESI values are
changing for the MAC addresses that are advertised through EVPN between the two
data center sites.
Here, for example, is the server-leaf LEAF11 and LEAF12 configuration of the EVI
(MAC-VRF) when connecting server1.
set routing-instances MACVRF101 instance-type mac-vrf >> instance type MACVRF is used on qfx series of
switches
set routing-instances MACVRF101 protocols evpn encapsulation vxlan
set routing-instances MACVRF101 protocols evpn default-gateway no-gateway-community
set routing-instances MACVRF101 protocols evpn extended-vni-list 51001 >> we define which VNIs are
enabled explicitly
set routing-instances MACVRF101 vtep-source-interface lo0.0
set routing-instances MACVRF101 service-type vlan-aware >> explicit definition of the service-type
set routing-instances MACVRF101 interface ae0.0 >> we enable the interface connected to server1 in the
mac-VRF
set routing-instances MACVRF101 route-distinguisher 172.16.7.11:1 >> unique LAN level Route-
Distinguisher (RD)
set routing-instances MACVRF101 vrf-target target:1:8888 >> same value used on all server-leaf nodes
sharing that mac-VRF
set routing-instances MACVRF101 vlans vlan1001 vlan-id 1001
set routing-instances MACVRF101 vlans vlan1001 l3-interface irb.1001
set routing-instances MACVRF101 vlans vlan1001 vxlan vni 51001
set interfaces ae0 description “EP-style to server1”
set interfaces ae0 mtu 9100
set interfaces ae0 esi 00:00:88:88:88:88:88:88:11:12 >> local LAN EVPN-VXLAN ESI used for server1
multihoming
set interfaces ae0 esi all-active >> QFX server leaf only support all-active ESI but we enable it
explicitly as well
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:01:88:88:01:01 >> same value as on
LEAF12
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members 1001
set interfaces xe-0/0/0 description member-link_ae0
set interfaces xe-0/0/0 ether-options 802.3ad ae0 >> physical interface connected to server1 and
member of AE0 LAG
You can see that on the QFX server-leaf node the instance-type is mac-vrf, while on the MX
we're still using the traditional virtual-switch instance type. This is because, at the time
of writing, the Junos release used at the border-leaf MX devices (21.4R2) officially
qualified only the traditional virtual-switch instance type for MPLS purposes.
For completeness, we also show the MAC-VRF configuration used at the server-leaf LEAF12 in
data center site A, as well as at LEAF21 in site B.
set routing-instances MACVRF101 instance-type mac-vrf >> this instance type is used on server-leaf
nodes such as QFX
set routing-instances MACVRF101 protocols evpn encapsulation vxlan
set routing-instances MACVRF101 protocols evpn default-gateway no-gateway-community
set routing-instances MACVRF101 protocols evpn extended-vni-list 51001
set routing-instances MACVRF101 vtep-source-interface lo0.0
set routing-instances MACVRF101 service-type vlan-aware >> explicitly set on QFX but border-leaf MX
use virtual-switch
set routing-instances MACVRF101 interface ae0.0
set routing-instances MACVRF101 route-distinguisher 172.16.7.12:1
set routing-instances MACVRF101 vrf-target target:1:8888 >> this value was also used at the border-
leaf5/6 for LAN section
set routing-instances MACVRF101 vlans vlan1001 vlan-id 1001
set routing-instances MACVRF101 vlans vlan1001 l3-interface irb.1001
set routing-instances MACVRF101 vlans vlan1001 vxlan vni 51001
set interfaces ae0 description “EP-style to server1”
set interfaces ae0 mtu 9100
set interfaces ae0 esi 00:00:88:88:88:88:88:88:11:12 >> local LAN EVPN-VXLAN ESI used for server1
multihoming
set interfaces ae0 esi all-active
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:01:88:88:01:01 >> same value as on
LEAF11
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members 1001
set interfaces xe-0/0/0 description member-link_ae0
set interfaces xe-0/0/0 ether-options 802.3ad ae0 >> physical interface connected to server1 and
member of AE0 LAG
set chassis aggregated-devices ethernet device-count 2 >> defines the number of AE interfaces to
servers, it can be higher
set forwarding-options evpn-vxlan shared-tunnels >> this global config is specific to qfx5120 to
enable optimal MAC-VRF
# The following configs are also required when the same qfx5120 server-leaf node runs the T5 IP VRF
instances
set routing-options forwarding-table chained-composite-next-hop ingress EVPN
set forwarding-options vxlan-routing next-hop 32768 >> for qfx5120 specific to enable more overlay
next-hops
set forwarding-options vxlan-routing interface-num 8192 >> for qfx5120 leaf when dealing with IPv4 &
IPv6 IRBs
set forwarding-options vxlan-routing overlay-ecmp >> This is only required when in parallel the Type-5
instance is used
The configuration of the server-leaf in site B, leaf21, can also be highlighted, mainly to
emphasize again that it must not use the same route-target as the ones used in site A.
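Only the lines where leaf21 differs from the site A server-leaf template shown above are sketched
here; the RD is an assumed value, while the site B LAN route-target matches the one used at
border-leaf7/8:
set routing-instances MACVRF101 route-distinguisher 172.16.7.221:1 >> leaf21 loopback-based RD (assumed value)
set routing-instances MACVRF101 vrf-target target:1:9999 >> site B LAN route-target, not the site A value target:1:8888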
The server4-facing interface configuration defines the ESI on interface ae0, used locally
within the DC site B EVPN-VXLAN fabric and explicitly enabled.
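A sketch of that server4-facing interface configuration on LEAF21/LEAF22 follows; the ESI matches
the value seen in the EVPN outputs later in this chapter, while the LACP system-id is an assumed
value:
set interfaces ae0 description "EP-style to server4"
set interfaces ae0 mtu 9100
set interfaces ae0 esi 00:00:88:88:88:88:88:88:21:22 >> local LAN ESI for server4 multihoming, same on LEAF21/LEAF22
set interfaces ae0 esi all-active
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:01:88:88:02:01 >> assumed value, identical on both leafs
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members 1001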
Once the EVI configs shown above were enabled at the border-leaf nodes in each DC site as
well as at the server leaf nodes, the following prefix advertisement is received at
border-leaf5 from leaf11 in site A within the EVPN-VXLAN fabric domain. The same MAC
address will then be re-originated for seamless DCI stitching purposes, but this time with
MPLS encapsulation information: a label value is used instead of the VNI value, and a
community attribute indicates the encapsulation type as part of the BGP EVPN advertisement.
Figure 6.3 Server4 MAC EVPN Route Received on the Border-leaf7 From leaf22 in DC Site B
The border-leaf7 verification of the local fabric route received from LEAF22 in data center
site B also confirms that it is learned from the spine7/8 route servers with a next-hop IP
corresponding to LEAF21/LEAF22 – here the next hop for LEAF22 is 172.16.7.222 and the
neighbor spine7 IP is 172.16.7.207:
root@border-leaf7_re# run show route receive-protocol bgp 172.16.7.207 table MACVRF101.
evpn.0 active-path next-hop 172.16.7.222
MACVRF101.evpn.0: 45 destinations, 59 routes (45 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
1:172.16.7.207:1::8888888888882122::0/192 AD/EVI
* 172.16.7.222 65201 65222 I
1:172.16.7.222:0::8888888888882122::FFFF:FFFF/192 AD/ESI
* 172.16.7.222 65201 65222 I
2:172.16.7.207:1::51001::00:00:01:01:01:01/304 MAC/IP
* 172.16.7.222 65201 65222 I
2:172.16.7.207:1::51001::00:50:56:ab:01:04/304 MAC/IP
* 172.16.7.222 65201 65222 I
2:172.16.7.207:1::51001::00:00:01:01:01:01::10.10.0.1/304 MAC/IP
* 172.16.7.222 65201 65222 I
2:172.16.7.207:1::51001::00:50:56:ab:01:04::10.10.0.104/304 MAC/IP
* 172.16.7.222 65201 65222 I
3:172.16.7.207:1::51001::172.16.7.222/248 IM
* 172.16.7.222 65201 65222 I
[edit]
root@border-leaf7_re#
To verify the detailed information for one of the EVPN prefixes local to site B before it
gets re-originated into the MPLS domain, we run the following command:
root@border-leaf7_re# run show route table MACVRF101.evpn.0 match-prefix 2:172.16.7.207:1::51001::00:
50:56:ab:01:04/304 detail active-path
MACVRF101.evpn.0: 45 destinations, 59 routes (45 active, 0 holddown, 0 hidden)
2:172.16.7.207:1::51001::00:50:56:ab:01:04/304 MAC/IP (2 entries, 1 announced)
*BGP Preference: 170/-101
Route Distinguisher: 172.16.7.207:1
Next hop type: Indirect, Next hop index: 0
Address: 0x76b6160
Next-hop reference count: 30
Source: 172.16.7.207
Protocol next hop: 172.16.7.222
Indirect next hop: 0x2 no-forward INH Session ID: 0
State: <Secondary Active Ext>
Peer AS: 65201
Age: 17:36 Metric2: 0
Validation State: unverified
Task: BGP_65201_65217.172.16.7.207
Announcement bits (1): 0-MACVRF101-EVPN
AS path: 65201 65222 I
Communities: target:1:9999 encapsulation:VXLAN(0x8)
Import Accepted
Route Label: 51001
ESI: 00:00:88:88:88:88:88:88:21:22
Localpref: 100
Router ID: 172.16.7.207
Primary Routing Table: bgp.evpn.0
Thread: junos-main
[edit]
root@border-leaf7_re#
Indeed, you can observe at border-leaf7 in DC site B that the server4 MAC address, as
originated at LEAF22, still carries the VXLAN encapsulation community and the original ESI
value of 00:00:88:88:88:88:88:88:21:22.
Once the server4 MAC EVPN route is received, it will be re-originated from the same EVI into
the DCI domain but with new BGP EVPN attribute values. The new interconnect ESI is set at the
border-leaf7 level, and because we changed the encapsulation format from VXLAN to MPLS, at
the control plane level the server4 MAC address (00:50:56:ab:01:04) now has a route label
associated with it – here the route label 299808 is used. This information will then be used
by border-leaf5/6 in site A to build the MPLS data packet with label 299808 as the VPN label.
The VNI-related information is no longer part of the advertisement from border-leaf7/8 in
this DCI use case, because the VXLAN-to-MPLS stitching is taking place. The vlan-tag
information corresponds to the VLAN we configured at the interconnect EVPN level – here we
used the value 1001, which happens to match the local fabric VNI value; however, if the local
fabric VNIs were above 4092, we would need to allocate a value from the 4K VLAN range in the
configuration when stitching to the MPLS domain.
root@border-leaf7_re# run show route advertising-protocol bgp 172.16.7.20 table MACVRF101.evpn.0
evpn-mac-address 00:50:56:ab:01:04 active-path
MACVRF101.evpn.0: 45 destinations, 59 routes (45 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
2:172.16.7.217:101::1001::00:50:56:ab:01:04/304 MAC/IP
* Self 100 I
2:172.16.7.217:101::1001::00:50:56:ab:01:04::10.10.0.104/304 MAC/IP
* Self 100 I
[edit]
root@border-leaf7_re#
root@border-leaf7_re# run show route advertising-protocol bgp 172.16.7.20 table MACVRF101.evpn.0
evpn-mac-address 00:50:56:ab:01:04 active-path detail
MACVRF101.evpn.0: 45 destinations, 59 routes (45 active, 0 holddown, 0 hidden)
* 2:172.16.7.217:101::1001::00:50:56:ab:01:04/304 MAC/IP (1 entry, 1 announced)
BGP group WAN type Internal
Route Distinguisher: 172.16.7.217:101
Route Label: 299808
ESI: 00:00:22:22:22:22:22:22:22:22
Nexthop: Self
Flags: Nexthop Change
Localpref: 100
AS path: [65000] I
Communities: target:1:101
* 2:172.16.7.217:101::1001::00:50:56:ab:01:04::10.10.0.104/304 MAC/IP (1 entry, 1 announced)
BGP group WAN type Internal
Route Distinguisher: 172.16.7.217:101
Route Label: 299808
ESI: 00:00:22:22:22:22:22:22:22:22
Nexthop: Self
Flags: Nexthop Change
Localpref: 100
AS path: [65000] I
Communities: target:1:101
[edit]
root@border-leaf7_re#
To follow the control plane part of the verification: once border-leaf6 in site A receives
the server4 MAC address information from the remote site B, it processes it and re-advertises
it into the local fabric A with new EVPN BGP attributes specific to the local site A
configuration of border-leaf5/6. It advertises the route to the spine5/6 route servers using
the local interconnect iESI value of 00:00:11:11:11:11:11:11:11:11, and the route label
information now corresponds to the VXLAN VNI 51001 instead of the route label value 299808
used when the route was originally received from the MPLS DCI domain route-reflector.
root@border-leaf6_re> show route advertising-protocol bgp 172.16.7.105 table MACVRF101.evpn.0 active-
path evpn-mac-address 00:50:56:ab:01:04
MACVRF101.evpn.0: 48 destinations, 83 routes (48 active, 0 holddown, 17 hidden)
Prefix Nexthop MED Lclpref AS path
2:172.16.7.116:1::51001::00:50:56:ab:01:04/304 MAC/IP
* Self I
2:172.16.7.116:1::51001::00:50:56:ab:01:04::10.10.0.104/304 MAC/IP
* Self I
root@border-leaf6_re>
root@border-leaf6_re> show route advertising-protocol bgp 172.16.7.105 table MACVRF101.
evpn.0 active-path evpn-mac-address 00:50:56:ab:01:04 detail
MACVRF101.evpn.0: 48 destinations, 83 routes (48 active, 0 holddown, 17 hidden)
* 2:172.16.7.116:1::51001::00:50:56:ab:01:04/304 MAC/IP (1 entry, 1 announced)
BGP group overlay type External
Route Distinguisher: 172.16.7.116:1
Route Label: 51001
ESI: 00:00:11:11:11:11:11:11:11:11
Nexthop: Self
Flags: Nexthop Change
AS path: [65115] I
Communities: target:1:8888 encapsulation:VXLAN(0x8)
* 2:172.16.7.116:1::51001::00:50:56:ab:01:04::10.10.0.104/304 MAC/IP (1 entry, 1 announced)
BGP group overlay type External
Route Distinguisher: 172.16.7.116:1
Route Label: 51001
ESI: 00:00:11:11:11:11:11:11:11:11
Nexthop: Self
Flags: Nexthop Change
AS path: [65115] I
Communities: target:1:8888 encapsulation:VXLAN(0x8)
root@border-leaf6_re>
We can also observe that the DCI community target:1:101 used by all border-leaf nodes
for DCI purposes is replaced with the site local route-target target:1:8888. In fact, each
site should use per EVI/MAC-VRF a site-specific local route-target – here the site A uses
target:1:8888 and site B is using target:1:9999.
At the data plane level, when server1 pings server4, the packet reaches the border-leaf6
interfaces xe-0/3/0 and xe-0/3/ before stitching to MPLS; at that point the data packet
looks like Figure 6.4.
Figure 6.4 Server1 Sourced Data Frame Destined To Server4, When Received at the Border-leaf6
Before Stitching VXLAN To MPLS
You can see that VXLAN encapsulation is used and that the local site VNI value of 51001 is
also used at the forwarding level. Note that by default the original vlan-id is stripped, and
only the VNI value in the VXLAN header is used when the packet is sent from server
LEAF11/LEAF12 to border-leaf5/6. See Figure 6.5.
Figure 6.5 Server1 Sourced Data Frame Destined to Server4, When Received at the Border-leaf6
After Stitching VXLAN To MPLS
After the VXLAN-to-MPLS stitching, border-leaf5/6 uses the inner (EVI) label 299808, the
MPLS label mapped to the remote site B interconnect iESI, to reach the server4 MAC address,
and it also pushes the transport MPLS label 299888.
In DC site A, to analyze the data plane for specific MAC@ of server4 originated in DC
site B, use the following Junos commands:
root@border-leaf6_re> show route table MACVRF101.evpn.0 evpn-mac-address 00:50:56:ab:01:04 active-
path next-hop 172.16.7.217
MACVRF101.evpn.0: 39 destinations, 57 routes (39 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2:172.16.7.217:101::1001::00:50:56:ab:01:04/304 MAC/IP
*[BGP/170] 2d 06:03:06, localpref 100, from 172.16.7.20
AS path: I, validation-state: unverified
> to 192.168.15.1 via xe-0/2/0.0, Push 299888
2:172.16.7.217:101::1001::00:50:56:ab:01:04::10.10.0.104/304 MAC/IP
*[BGP/170] 2d 06:03:06, localpref 100, from 172.16.7.20
AS path: I, validation-state: unverified
> to 192.168.15.1 via xe-0/2/0.0, Push 299888
root@border-leaf6_re> show route label 299888
mpls.0: 15 destinations, 16 routes (15 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
299888 *[EVPN/7] 2d 06:03:11, remote-pe 172.16.7.217, routing-instance MACVRF101, route-type
Egress-MAC
> to 192.168.15.1 via xe-0/2/0.0, Push 299808, Push 299888(top)
root@border-leaf6_re>
You can see that MPLS labels are allocated for both the inner label 299808 (related to the
destination EVI/iESI) and the outer label 299888, the top label used for transport purposes.
Because the destination server4 MAC address in site B is reachable via two border-leaf
nodes, border-leaf7/8, the EVPN aliasing feature (load balancing per destination ESI at the
EVI level) is used:
root@border-leaf6_re> show route table mpls.0 protocol EVPN | grep “Egress-MAC”
301056 *[EVPN/7] 00:00:49, routing-instance MACVRF101, route-type Egress-MAC, ESI
00:00:22:22:22:22:22:22:22:22
301072 *[EVPN/7] 02:43:21, remote-pe 172.16.7.217, routing-instance MACVRF101, route-type
Egress-MAC
301120 *[EVPN/7] 00:00:49, remote-pe 172.16.7.218, routing-instance MACVRF101, route-type
Egress-MAC
root@border-leaf6_re> show route label 301056
mpls.0: 16 destinations, 17 routes (16 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
301056 *[EVPN/7] 00:00:51, routing-instance MACVRF101, route-type Egress-MAC, ESI
00:00:22:22:22:22:22:22:22:22
to 192.168.15.1 via xe-0/2/0.0, Push 299808, Push 299888(top)
> to 192.168.15.1 via xe-0/2/0.0, Push 299808, Push 299952(top)
root@border-leaf6_re>
When it comes to bridge MAC table verification at the MX border-leaf level, you can also
check which ESIs are associated with the server MAC addresses:
root@border-leaf6_re> show bridge mac-table instance MACVRF101
MAC flags (S -static MAC, D -dynamic MAC, L -locally learned, C -Control MAC
O -OVSDB MAC, SE -Statistics enabled, NM -Non configured MAC, R -Remote PE MAC, P -Pinned MAC, FU
- Fast Update)
Routing instance : MACVRF101
Bridging domain : bd1001, VLAN : 1001
MAC MAC Logical NH MAC active
address flags interface Index property source
00:00:01:01:01:01 DRP vtep.32769 172.16.7.11
00:50:56:ab:01:01 DR esi.624 00:00:88:88:88:88:88:88:11:12
00:50:56:ab:01:03 DR vtep.32770 172.16.7.13
00:50:56:ab:01:04 DC .local..1048589 1048589 00:00:22:22:22:22:22:22:22:22
root@border-leaf6_re>
To further trace the history of MAC@ within the EVPN database you can specify the
origin of the server4 mac address you want to review:
root@border-leaf6_re> show evpn database origin dci-remote
Instance: MACVRF101
VLAN DomainId MAC address Active source Timestamp IP address
1001 00:00:01:01:01:01 00:00:22:22:22:22:22:22:22:22 Jun 27 05:56:31
1001 00:50:56:ab:01:04 00:00:22:22:22:22:22:22:22:22 Jun 27 05:56:31 10.10.0.104
root@border-leaf6_re> show evpn database origin dci-remote mac-address 00:50:56:ab:01:04 extensive
Instance: MACVRF101
VLAN ID: 1001, MAC address: 00:50:56:ab:01:04
Nexthop ID: 1048589
State: 0x0
Source: 00:00:22:22:22:22:22:22:22:22, Rank: 1, Status: Active
Remote origin: 172.16.7.217
Remote state: <Mac-Only-Adv Interconnect-DC>
Remote origin: 172.16.7.218
Remote state: <Mac-Only-Adv Interconnect-DC>
Mobility sequence number: 0 (minimum origin address 172.16.7.217)
Timestamp: Jun 27 05:56:31.636360 (0x62b9a8ff)
State: <Remote-To-Local-Adv-Done>
MAC advertisement route status: Not created (no local state present)
Interconn advertisement route status: DC route created
IP address: 10.10.0.104
Flags: <Sent-to-l2ald Interconnect-DC>
Remote origin: 172.16.7.217
Remote state: <Interconnect-DC>
Remote origin: 172.16.7.218
Remote state: <Interconnect-DC>
Interconn advertisement route status: DC route created
History db: <No entries>
root@border-leaf6_re>
root@border-leaf6_re>
The "DC route created" status indicates that even though we received the MAC address from
the interconnect (remote state), the local DC route was also created and will be advertised
to the local site spines and server leaf devices.
In the EVPN-VXLAN-to-EVPN-MPLS stitching ethernet bridging scenario, similarly
to what was discussed in the EVPN-VXLAN-to-EVPN-VXLAN stitching, the flooding
mesh-groups will be installed at all the border-leaf nodes for the given EVI and bridge-
domain. This is something you can also verify at the MX level using the following
command:
root@border-leaf6_re> show bridge flood instance MACVRF101 detail
Name: MACVRF101
CEs: 0
VEs: 6
Bridging domain: bd1001
Flood Routes:
Prefix Type Owner NhType NhIndex
0x30004/51 FLOOD_GRP_COMP_NH __ves__ ulst 1048580
0x30006/51 FLOOD_GRP_COMP_NH __wan_flood__ ulst 1048579
0x30003/51 FLOOD_GRP_COMP_NH __re_flood__ ulst 1048584
root@border-leaf6_re>
In the case of EVPN-VXLAN-to-EVPN-MPLS stitching, the flood groups have part of their
member logical interfaces on the VXLAN domain side and part on the MPLS domain side:
root@border-leaf6_re> show bridge flood instance MACVRF101 extensive
Name: MACVRF101
CEs: 0
VEs: 6
Bridging domain: bd1001
Flood route prefix: 0x30004/51
Flood route type: FLOOD_GRP_COMP_NH
Flood route owner: __ves__
Flood group name: __ves__
Flood group index: 0
Nexthop type: ulst
Nexthop index: 1048580
Flooding to:
Name Type NhType Index
__wan_flood__ Group comp 609
Composition: flood-to-all
Component flood-nh(s) (for flooding to EVPN core):
Index Peer NH-Type
608 172.16.7.217 comp (IM/SH)
637 172.16.7.218 comp (IM/SH)
Flood route prefix: 0x30006/51
Flood route type: FLOOD_GRP_COMP_NH
Flood route owner: __wan_flood__
Flood group name: __wan_flood__
Flood group index: 15
Nexthop type: ulst
Nexthop index: 1048579
Flooding to:
Name Type NhType Index
__ves__ Group comp 625
Composition: flood-to-all
Flooding to:
Name Type NhType Index RVTEP-IP
vtep.32768 CORE_FACING venh 621 172.16.7.12
vtep.32769 CORE_FACING venh 622 172.16.7.11
vtep.32770 CORE_FACING venh 623 172.16.7.13
vtep.32771 CORE_FACING venh 640 172.16.7.115
Flood route prefix: 0x30003/51
Flood route type: FLOOD_GRP_COMP_NH
Flood route owner: __re_flood__
Flood group name: __re_flood__
Flood group index: 65534
Nexthop type: ulst
Nexthop index: 1048584
Flooding to:
Name Type NhType Index
__ves__ Group comp 625
Composition: flood-to-all
Flooding to:
Name Type NhType Index RVTEP-IP
vtep.32768 CORE_FACING venh 621 172.16.7.12
vtep.32769 CORE_FACING venh 622 172.16.7.11
vtep.32770 CORE_FACING venh 623 172.16.7.13
vtep.32771 CORE_FACING venh 640 172.16.7.115
Flooding to:
Name Type NhType Index
__wan_flood__ Group comp 609
Composition: flood-to-all
Component flood-nh(s) (for flooding to EVPN core):
Index Peer NH-Type
608 172.16.7.217 comp (IM/SH)
637 172.16.7.218 comp (IM/SH)
root@border-leaf6_re>
Finally, after all the border-leaf level verification, you may want to quickly verify the
information regarding the site B originated MAC addresses at the server-leaf level in site A.
root@LEAF11_re> show mac-vrf forwarding mac-table instance MACVRF101
MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)
Ethernet switching table : 3 entries, 3 learned
Routing instance : MACVRF101
VLAN MAC MAC Logical SVLBNH/ Active
name address flags interface VENH Index source
VLAN1001 00:50:56:ab:01:01 DLR ae0.0
VLAN1001 00:50:56:ab:01:03 DR vtep.32770 172.16.7.13
VLAN1001 00:50:56:ab:01:04 DR esi.1764
00:00:11:11:11:11:11:11:11:11
{master:0}
root@LEAF11_re>
You can see that from the site A server-leaf LEAF11 perspective, the server4 located in
DC site B is reachable via the local fabric border-leaf5/6 iESI
00:00:11:11:11:11:11:11:11:11 and not through its original ESI
00:00:88:88:88:88:88:88:21:22 enabled at LEAF21/LEAF22 in site B.
This means that even if we learn many MAC addresses from site B, the local server leaf
installs just the local site border-leaf iESI and local next-hops, instead of all the
remote site ESIs and next-hops. This helps optimize the TCAM resources used at the low-end
server leaf nodes and lets you simply connect more servers in each data center site without
directly impacting switch scale across the two data center sites.
Additionally, you can confirm the local site A server leaf LEAF11 tunnels:
root@LEAF11_re> show mac-vrf forwarding vxlan-tunnel-end-point remote instance MACVRF101
Logical System Name Id SVTEP-IP IFL L3-Idx SVTEP-Mode ELP-SVTEP-IP
<default> 0 172.16.7.11 lo0.0 0
RVTEP-IP L2-RTT IFL-Idx Interface NH-Id RVTEP-Mode ELP-
IP Flags
172.16.7.12 MACVRF101 557 vtep.32771 1762 RNVE
VNID MC-Group-IP
51001 0.0.0.0
RVTEP-IP L2-RTT IFL-Idx Interface NH-Id RVTEP-Mode ELP-
IP Flags
172.16.7.13 MACVRF101 556 vtep.32770 1722 RNVE
VNID MC-Group-IP
51001 0.0.0.0
RVTEP-IP L2-RTT IFL-Idx Interface NH-Id RVTEP-Mode ELP-
IP Flags
172.16.7.115 MACVRF101 555 vtep.32769 1716 RNVE
VNID MC-Group-IP
51001 0.0.0.0
RVTEP-IP L2-RTT IFL-Idx Interface NH-Id RVTEP-Mode ELP-
IP Flags
172.16.7.116 MACVRF101 568 vtep.32772 1763 RNVE
VNID MC-Group-IP
51001 0.0.0.0
{master:0}
root@LEAF11_re>
As expected, LEAF11 in site A only has local site server-leaf and border-leaf tunnels,
which reduces the load on the local node. The tunnels to RVTEPs 172.16.7.115 and
172.16.7.116 are the tunnels used to reach border-leaf5/6, where the stitching to the MPLS
domain then happens in a seamless way.
Chapter 7: EVPN-VXLAN T5-to-IPVPN-MPLS Internetworking Implementation and Verification
In the previous scenario we covered the DCI use case where VXLAN is stitched to MPLS for
L2 bridging/switching purposes and the same VLAN/bridge-domain is stretched between the
DC sites.
In some cases, certain IP prefixes may be specific to a given DC site and be used by
external users located in the POP locations to reach services running in that DC.
In this case, when there is an existing MPLS core, the border-leaf nodes advertise the
site-specific IP prefixes as IPVPN NLRI and the encapsulation format changes from the
original EVPN-VXLAN (Type-5 route signalized within the fabric) to MPLS IPVPN at
the DCI level. This is the DCI option-6 we highlighted in the summary of the DCI op-
tions early on in Chapter 2.
We will break down this option in the following example where two DC sites are com-
municating using just IP prefixes for a tenant/service. Both sites keep the tenant IP virtu-
alization from the server leaf nodes to the border-leaf. The border leaf changes the
encapsulation format from VXLAN to MPLS. In this case, the border-leaf also becomes
a PE node from the IPVPN MPLS network perspective.
The routing instance used for this scenario at the border-leaf is the same one used for
Type-5 IP prefix advertisement; however, because the peering between the sites also uses
the inet-vpn unicast family, the IP prefixes received from the remote location PE take
precedence with the MPLS encapsulation and are installed in the forwarding table with the
LSP label push/pop information.
The topology used is shown in Figure 7.1, where two sites are connected to each other via
the core MPLS network.
Prefixes from one site to the other are reflected by the core route-reflector which is also
used as a P device. The intermediate MX tap node in the middle is used here for lab ex-
amination of the encapsulation and is usually not used in real deployments.
set routing-instances MACVRF101 vrf-target target:1:8888 >> shared route-target between leaf11/leaf12
set interfaces ae0 mtu 9100
set interfaces ae0 esi 00:00:88:88:88:88:88:88:11:12 >> ESI value for server1 L2 multihoming
set interfaces ae0 esi all-active
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:01:88:88:01:01
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 unit 0 family bridge interface-mode trunk
set interfaces ae0 unit 0 family bridge vlan-id-list 10 >> vlan-id for the IRB.10 with prefix
10.10.10.0/24
set interfaces ae0 unit 0 family bridge vlan-id-list 20
set interfaces ae0 unit 0 family bridge vlan-id-list 1001
set interfaces xe-0/3/0 gigether-options 802.3ad ae0 >> we associate the given physical interface with
the AE0 aggregate
set chassis aggregated-devices ethernet device-count 2 >> we just decided to use 2 AE interfaces
typically around 24 or 48 AE aggregated interfaces can be enabled locally to connect servers
set chassis network-services enhanced-ip >> needed only when MX is used as a leaf node
As border-leaf5 is not enabled for any L2 stretch purposes, we don’t have to provision any
MAC-VRF instance.
A similar configuration is deployed on border-leaf6, where only the IP-VRF and the overlay
iBGP EVPN signaling towards the core-IP RR are needed.
To enable internetworking between VXLAN and MPLS, the iBGP overlay peering is
used towards the core IP WAN route-reflector 172.16.7.20.
Config 41 border-leaf5 iBGP WAN DCI peering for IPVPN internetworking use-case
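A sketch of the Config 41 listing follows; the key point is the addition of family inet-vpn unicast
next to the EVPN family, which carries the IPVPN NLRI described above. The WAN group name is assumed
to match the one used in the previous chapter:
set protocols bgp group WAN type internal
set protocols bgp group WAN local-address 172.16.7.115 >> border-leaf5 loopback
set protocols bgp group WAN family evpn signaling
set protocols bgp group WAN family inet-vpn unicast >> IPVPN NLRI used for the DCI prefixes
set protocols bgp group WAN neighbor 172.16.7.20 >> core IP WAN route-reflector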
The border-leaf nodes in sites A and B are connected as PE nodes to the core IP MPLS, so
they use an ISIS underlay to reach the route-reflector and the remote site loopback IP
addresses.
Here are the configurations used for ISIS and MPLS.
set protocols isis interface xe-0/2/0.0 >> underlay interface used to connect to the core-IP MPLS
set protocols isis interface lo0.0 >> loopback used for IPVPN-MPLS purposes
set protocols isis level 1 disable >> we decided to use only level 2 ISIS for simplicity
set protocols ldp interface xe-0/2/0.0 >> to advertise the MPLS labels we use the LDP at the border-
leaf in both DC sites
set protocols ldp interface lo0.0
set protocols mpls interface xe-0/2/0.0 >> interface connected to the core MPLS is enabled at the
MPLS protocol level
set interfaces xe-0/2/0 mtu 9216
set interfaces xe-0/2/0 unit 0 family inet address 192.168.14.2/24
set interfaces xe-0/2/0 unit 0 family iso >> to run the ISIS underlay we enable that family
explicitly at the underlay interface
set interfaces xe-0/2/0 unit 0 family mpls >> we also specify the MPLS family for the underlay port
connected to the core
set interfaces xe-0/3/0 unit 0 family inet address 192.168.7.2/24
set interfaces xe-0/3/1 unit 0 family inet address 192.168.8.2/24
set interfaces lo0 unit 0 family inet address 172.16.7.115/32 primary
set interfaces lo0 unit 0 family iso address 49.0001.1720.1600.7115.00 >> is required for the ISIS to
be used for DCI underlay
set interfaces lo0 unit 1 family inet address 172.16.100.115/32
set routing-instances T5-VRF1 protocols evpn ip-prefix-routes export my-t5-export-VRF1 >> define which
prefixes to inject
set routing-instances T5-VRF1 interface lo0.1
set routing-instances T5-VRF1 route-distinguisher 172.16.7.217:100
set routing-instances T5-VRF1 vrf-target target:5:8888 >> the same route-target in site B is used as
in the site A
set routing-instances T5-VRF1 vrf-table-label
##define which IP prefixes from site A will be injected as Type-5 EVPN prefixes on site B
set policy-options policy-statement my-t5-export-VRF1 term term1 from route-filter 172.16.100.217/32
exact
set policy-options policy-statement my-t5-export-VRF1 term term2 from route-filter 10.10.100.217/32
exact
set policy-options policy-statement my-t5-export-VRF1 term term2 from route-filter 10.10.10.0/24
orlonger
set policy-options policy-statement my-t5-export-VRF1 term term2 from route-filter 10.10.20.0/24
orlonger
set policy-options policy-statement my-t5-export-VRF1 term term2 then accept
set chassis network-services enhanced-ip
Once all the configurations are in place, we want to verify the data plane part before and
after the stitching from VXLAN to MPLS at border-leaf5/6. We send an ICMP echo
request from server1 located in DC A to server4 located in DC B. And we observe the
following data plane packet capture (Figure 7.2) with two different encapsulation
formats.
First, with VXLAN, when the packet arrives at border-leaf5 it is using VNI 1100 (Figure 7.2).
Then we observe it at the MX_tap node, inside the core MPLS network, after the stitching
occurs (Figure 7.3).
Figure 7.2 The ICMP Packet Before IP Stitching From Type-5 EVPN-VXLAN Domain To IPVPN-
MPLS (Ipvpn Internetworking)
After stitching from VXLAN to MPLS, using IPVPN EVPN internetworking, the data
packet is using transport MPLS label 299952 and label 18 for the given IPVPN. See Fig-
ure 7.3.
Figure 7.3 The ICMP Packet After IP Stitching from Type-5 EVPN-VXLAN Domain to IPVPN-
MPLS (Ipvpn Internetworking)
We are also able to confirm, on border-leaf5, which MPLS label is used towards the core
IP to reach server4 in DC site B:
root@border-leaf5# run show route forwarding-table destination 10.10.30.104/32 vpn T5-VRF1 detail
Routing table: T5-VRF1.inet
Internet:
Destination Type RtRef Next hop Type Index NhRef Netif
10.10.30.104/32 user 0 ulst 1048577 8
indr 1048587 5
192.168.14.1 Push 18, Push 299936(top) 653 2 xe-0/2/0.0
indr 1048582 4
192.168.14.1 Push 18, Push 299952(top) 652 2 xe-0/2/0.0
[edit]
root@border-leaf5#
We can also see two different transport MPLS labels, 299936 and 299952, because the
destination prefix 10.10.30.104/32 (server4 in site B) is reachable through two border
nodes, border-leaf7/8. The inner label 18 allocated to the given IPVPN is common and is
used for the final recursive resolution at border-leaf7/8 in DC site B.
Getting to the bottom of the MPLS and the IP destination is important but what about
checking the source IP 10.10.10.101/32 from the border-leaf5 perspective?
You can trace this back using the same approach as before by additionally calling the pfe
for recursive resolution of the next-hop-id:
root@border-leaf5> show route forwarding-table destination 10.10.10.101/32 vpn T5-VRF1
Routing table: T5-VRF1.inet
Internet:
Destination Type RtRef Next hop Type Index NhRef Netif
10.10.10.101/32 user 0 ulst 1048592 6
indr 1048585 3
comp 591 2
indr 1048590 3
comp 594 2
root@border-leaf5> request pfe execute command “show nhdb id 591 det” target fpc0
SENT: Ukern command: show nhdb id 591 det
ID Type Interface Next Hop Addr Protocol Encap MTU Flags PFE
internal Flags
----- -------- ------------- --------------- ---------- ------------ ---- ------------------
------------------
591 Compst - - IPv4 - 0 0x0000000000000000
BFD Session Id: 0
Composite NH:
Function: Tunnel Function
Hardware Index: 0x0
Composite flag: 0x0
Composite pfe flag: 0xe
Lower-level NH Ids:
Derived NH Ids:
Tunnel Data:
Type : VXLAN
Tunnel ID: 806354950
Encap VRF: 0
Decap VRF: 8
MTU : 0
Flags : 0x0
AnchorId : 0
Encap Len: 53
Mode : Encap-Decap
Encap : 0x01 0x73 0x07 0x10 0xac 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x0b 0x07 0x10 0xac 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x4c 0x04 0x00 0x00 0x4c 0x04 0x00
0x00 0x2c 0x6b 0xf5 0xe7 0xb1 0xf0 0x2c
0x6b 0xf5 0xe8 0xc1 0xf0
Data Len : 0
Encap VXLAN id: 1100
Decap VXLAN id: 1100
Src mac: 2c.6b.f5.e8.c1.f0
We can observe that from border-leaf5, to reach server1, VXLAN encapsulation is used
towards leaf11 (172.16.7.11) and is mapped as a dynamic tunnel for the Type-5 IP-VRF
called T5-VRF1.
Chapter 8
After all the implementation examples from the previous chapters covering the VXLAN-to-
VXLAN and VXLAN-to-MPLS data center interconnect options, we think most readers will be
comfortable discussing more specific implementations and will be able to replicate the
examples in more complex scenarios – for example, when more than two data center sites are
used.
The various data center interconnect options covered in this book show how flexible the
EVPN control plane has become over the last couple of years. It proves that it can be
used in greenfield scenarios, stitching VXLAN-to-VXLAN, as well as brownfield where
sometimes VXLAN-to-MPLS stitching is a better choice to quickly interconnect remote
data centers, as well as larger scale POP locations or remote campus sites, allowing
them to access data from different data center locations.
From the outcomes point of view, we think the scaling optimization (reduced number of
tunnels, next hops, logical interfaces) is one part of the story, while the second is purely
related to operations, where we fully control which workloads get extended to which remote
location.
As a matter of fact, some data may need to be replicated in only some locations, while other
data may need to go to a different location due to the legislation rules applicable to the
given organization.
This book covers six different DCI options with a bigger focus on option-3 (EVPN-
VXLAN-to-EVPN-VXLAN for L2 bridging/switching purposes) and option-5 (EVPN-
VXLAN-to-EVPN-VXLAN for L3 IP prefix advertisement purposes). While MPLS
DCI is still very popular, we think that in many greenfield environments the VXLAN-
to-VXLAN options 3 and 5 are sufficient and fulfill most requirements, opening up
additional emerging use cases where Group Based Policy (GBP) profiles are also extended
between DC sites for micro-segmentation purposes.
We believe DCI options may also evolve to SRv6 use cases whenever the new core-IP
infrastructure moves to SRv6-based signaling. This may happen for greenfield 5G core
infrastructures and will mainly be needed for IP prefix site and domain segmentation.
We also covered the new domain path BGP attribute (D-PATH) which automatically
protects the DCI solution from any routing loops and avoids implementation complexi-
ties at the border-leaf level.
The security of the DCI was also highlighted and can become more important for some
organizations, for example when, instead of traditional service chaining, firewalls are
used as the border-leaf devices.
The DCI options covered in this book, when deployed across a larger number of sites, are
most successful when automated with the Apstra intent-based networking tool.
For operation and management, additional protocols like BFD inside VXLAN may be used,
while in the case of MPLS stitching the traditional MPLS OAM toolset will remain very
important for monitoring the state of the DCI.