Cisco ACI: Zero to Hero
A Comprehensive Guide to Cisco ACI Design, Implementation, Operation, and Troubleshooting
Jan Janovic
Prague, Czech Republic
About the Author
Jan Janovic, 2x CCIE #5585 (R&S|DC) and CCSI #35493, is
an IT enthusiast with 10+ years of experience with network
design, implementation, and support for customers from
a wide variety of industry sectors. Over the past few years,
he has focused on datacenter networking with solutions
based mainly, but not exclusively, on Cisco Nexus platforms:
traditional vPC architectures, VXLAN BGP EVPN network
fabrics, and Cisco ACI software-defined networking, all with
an emphasis on mutual technology integration, automation,
and analytics tools. Another significant part of his job is the delivery of professional
training for customers all around Europe.
He holds a master’s degree in Applied Network Engineering – Network Infrastructure
from the Faculty of Management Science and Informatics, University of Zilina, Slovakia.
During his university studies, he led a group of students in successfully developing
the world’s first open-source EIGRP implementation for the Quagga Linux package
(development currently continued under the name FRRouting). He also contributed to
OSPF features there. His technical focus additionally expands to system administration
of Windows and Linux stations plus public cloud topics related to the design and
deployment of AWS solutions.
He maintains the following certifications: Cisco Certified Internetwork Expert in
Routing and Switching, Cisco Certified Internetwork Expert in Data Center, Cisco
Certified DevNet Professional, Cisco Certified Systems Instructor, AWS Certified
Solutions Architect – Associate, AWS Certified Developer – Associate, AWS Certified
SysOps Administrator – Associate, and ITIL 4.
About the Technical Reviewer
David Samuel Peñaloza Seijas works as a Principal
Engineer at Verizon Enterprise Solutions in the Czech
Republic, focusing on Cisco SD-WAN, ACI, OpenStack,
and NFV. Previously, he worked as a Data Center Network
Support Specialist in the IBM Client Innovation Center in
the Czech Republic. As an expert networker, David has a
wide range of interests; his favorite topics include data
centers, enterprise networks, and network design, including
software-defined networking (SDN).
Acknowledgments
First and foremost, I would like to again express my deep and sincere gratitude to my
family and friends, who are one of the most important pillars of achieving anything in
life, even writing a book like this. Thank you, my darlings Simona and Nela. You were
the best support in this challenging feat, throughout all those endless days and nights
when writing on the road, in the hospital, or even during a rock concert. Thank you, my
parents, Vladimír and Libuša, and brother, Vladimir, for giving me the best possible
upbringing, supporting my higher education, being my positive idols, and always
encouraging me to follow my dreams.
Sometimes there are moments in life that completely change its direction and
become a foundation for a lifelong positive experience. For me, this happened when I
first enrolled in Peter Paluch’s networking classes during my university studies in 2011.
Thank you, Peter, for sparking a huge networking enthusiasm in me and for being my
mentor and closest friend ever since. You gave me unprecedented knowledge and wisdom
to become professionally as well as personally who I am now.
I cannot be more grateful to the Apress team, especially Aditee Mirashi, for giving
me the opportunity of a lifetime to put my knowledge and practical skills gained over the
years into this book and for guiding me through the whole publishing process.
It was an honor to cooperate with my friend David Samuel Peñaloza Seijas, who
as a technical reviewer always provided spot-on comments and helped me bring the
content of this book to the highest possible level. Additionally, thanks a lot, Luca Berton,
for your help with the automation content and language enhancements.
Big appreciation goes to my employer ALEF NULA, a. s. and my coworkers for their
constant support and for providing me with all the necessary equipment and significant
opportunities to grow personally as well as professionally and gain the knowledge and
skills which ended up in this book.
I cannot forget to express my gratitude to my alma mater and its pedagogical
collective, Faculty of Management Science and Informatics, University of Zilina,
Slovakia, for giving me rock-solid knowledge and a lot of practical skills, thus thoroughly
preparing me for my professional career.
Introduction
Dear reader, whether you are a network architect, engineer, administrator, developer,
student, or any other IT enthusiast interested in modern datacenter technologies and
architectures, you are in the right place. Welcome!
In this book, you will explore the ongoing datacenter network evolution driven by
modern application requirements, leading to the next-generation networking solution
called Cisco Application Centric Infrastructure (ACI). It doesn't matter if you are
completely new to ACI or already have some experience with the technology; my
goal in this book is to guide you through the whole implementation lifecycle. I will
show you how to simplify the deployment, operation, and troubleshooting of your
multi-datacenter networking environment using ACI, how to integrate it effectively
with L4-L7 devices, virtualization and containerization platforms, and how to provide
unprecedented visibility and security, all with an emphasis on programmability and
automation.
You will start “from zero” and discover the story behind ACI. You will build strong
foundational knowledge of its main components and explore the advantages of the
hardware-based leaf-spine architecture composed of Nexus 9000 switches. During the
first chapters, I describe all of ACI’s design options with their specifics, followed by a
detailed guide of how to deploy and initially configure the whole solution according to
best practices. You will then assemble all the necessary “access policies” for connecting
end hosts to ACI and utilize its multi-tenancy capabilities to create logical application
models and communication rules on top of the common physical network. You will
learn about the control plane and data plane mechanisms running under the hood, resulting
in correct forwarding decisions, and I will help you with troubleshooting any potential
problems with end host connectivity in ACI. I will also describe both Layer 2 and
Layer 3 external connectivity features to expose datacenter applications and services
to the outside world. I will cover integration capabilities with L4-L7 security devices
or load balancers as well as virtualization solutions. At the end, you will learn how to
effectively use ACI’s REST API to further automate the whole solution using the most
common tools: Postman, Python, Ansible, or Terraform.
Many times throughout the book I provide my own recommendations and views
based on knowledge and experience gained from real-world implementations,
combined with best practices recommended directly by Cisco. I try as much as
possible to support each topic with practical examples, GUI/CLI verification tools,
and troubleshooting tips to ultimately build the confidence you need to handle any
ACI-related task.
Although this book is not directly related to any Cisco certification, I hope it
will become a valuable resource for you if you are preparing for the CCNP or CCIE
Datacenter exams and that you will use it as a reference later for your own ACI projects.
Let’s now dive into the first chapter, which is related to network evolution from
legacy architecture to Cisco ACI.
CHAPTER 1
Introduction: Datacenter
Network Evolution
What is a datacenter (or “data center”; the terms are used interchangeably throughout
the book)? To many IT professionals it seems a fairly simple question at first blush, but let's
start with a proper definition of the term.
From a physical perspective, a datacenter is a facility used by various organizations
to host their mission-critical data. Companies can invest in their own hardware and
software resources, which will stay completely under their control, maintenance, and
operation, or consume services from public clouds in the form of XaaS (Infrastructure/
Platform/Software as a Service).
Internally, we can divide a datacenter into a complex multi-layered architecture
of compute, storage, and networking equipment, interconnected to provide fast and
reliable access to shared application resources (see Figure 1-1). The networking layer
of each datacenter is especially important for the whole system to work correctly. Keep
in mind that you can have the most powerful blade server chassis with petabytes of
storage space, but without robust, reliable, and secure networking, you won’t change
the world. Some IT architects often underestimate the importance of this “datacenter
undercarriage.” My goal in this publication is to give you all the necessary knowledge
and skills needed to design, configure, and operate one of the industry-leading data
center SDN networking solutions, Cisco Application Centric Infrastructure (ACI).
© Jan Janovic 2023
J. Janovic, Cisco ACI: Zero to Hero, https://doi.org/10.1007/978-1-4842-8838-2_1
[Figure 1-1. Datacenter layered architecture (top to bottom): Enterprise Applications (Monitoring, Management, Automation, Development, Information Systems, Content Delivery); Middleware (Web Servers, Application Servers, Content Management); Operating Systems (Windows, Linux, Virtualization Hypervisors); Server and Storage Equipment (Rackmount Servers, Blade Chassis, and Virtualized HW); Datacenter Network Equipment (LAN and SAN Infrastructure to Interconnect All Resources); Electricity, Cooling, Power Backup (Redundant PDUs, Backup Electricity Generators); Datacenter Physical Building/Facility (Physical Access Security, Location, Expanse)]
[Figure: Traditional three-tier datacenter design with the L2/L3 boundary at the Core Layer, an Aggregation Layer, and an Access Layer connecting blade and rack servers]
In a datacenter, each part of such a network design had its specific purpose:
• Core layer
• Aggregation layer
• Access layer
• Admission control
The traditional design taken from campus networks was based on Layer 2
connectivity between all network parts, segmentation was implemented using VLANs,
and the loop-free topology relied on the Spanning Tree Protocol (STP). All switches
and their interfaces were implemented as individuals, without the use of virtualization
technologies. Because of this, STP had to block redundant uplinks between core and
aggregation, and we lost part of the potentially available bandwidth in our network.
Scaling such an architecture implies growth of broadcast and failure domains as well,
which is definitely not beneficial for the resulting performance and stability. Imagine
each STP Topology Change Notification (TCN) message causing MAC table aging in
the whole datacenter for a particular VLAN, followed by excessive BUM (Broadcast,
Unknown Unicast, Multicast) traffic flooding until all MACs are relearned. The impact
of any change or disruption could be unacceptable in such a network. I’ve seen multiple
times how a simple flapping interface in a DC provider L2 network caused constant
traffic disruptions. Additionally, the 12-bit VLAN ID field in the 802.1Q header limits the
number of possible isolated network segments for datacenter customers to 4,096 (often
even fewer due to internally reserved switch VLANs).
One of the solutions recommended and implemented for the mentioned STP
problems in the past was to move the L3 boundary from the core layer to aggregation
and divide one continuous switched domain into multiple smaller segments. As shown
in Figure 1-3, by eliminating STP you can use ECMP load balancing between aggregation
and core devices.
[Figure 1-3. L3 boundary moved to the Aggregation Layer: routed (L3) links between the Core and Aggregation Layers, L2 only below the Aggregation Layer toward the Access Layer and the blade and rack servers]
Let’s Go Virtual
In 1999, VMware introduced the first x86 virtualization hypervisors, and the network
requirements changed. Instead of connecting many individual servers, we started to
aggregate multiple applications/operating systems on common physical hardware.
Access layer switches had to provide faster interfaces and their uplinks had to scale
accordingly. For new features like vMotion, which allows live migration of virtual
machines between hosts, the network had to support L2 connectivity between all
physical servers and we ended up back at the beginning again (see Figure 1-4).
[Figure 1-4. Server virtualization stretching Layer 2 again across the Core, Aggregation, and Access Layers connecting blade and rack servers]
To overcome the already-mentioned STP drawbacks and to keep support for server
virtualization, we started to virtualize network infrastructure too. As you can see in
Figure 1-5, switches can be joined together into switch stacks, VSS, or vPC architectures.
Multiple physical interfaces can be aggregated into logical port channels, virtual port
channels, or multi-chassis link aggregation groups (MLAG). From the STP point of view,
such an aggregated interface is considered a single logical link, therefore not causing
any L2 loop. In a virtualized network, STP does not block any interface but runs in the
background as a failsafe mechanism.
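To make the idea more concrete, here is a rough NX-OS sketch (not an example from this book) of how a pair of Nexus switches can be virtualized into a vPC domain with a multi-chassis port channel toward a dual-homed server; the domain ID, keepalive addresses, and interface numbers are placeholders:

feature lacp
feature vpc

vpc domain 10
  peer-keepalive destination 192.168.0.2 source 192.168.0.1

! vPC peer link between the two switches
interface port-channel1
  switchport
  switchport mode trunk
  vpc peer-link

! Multi-chassis port channel toward a dual-homed server
interface port-channel20
  switchport
  switchport mode trunk
  vpc 20

interface Ethernet1/20
  switchport
  switchport mode trunk
  channel-group 20 mode active

The same configuration (with a matching vpc 20 number) is applied on the second peer switch, so the server sees one logical LACP port channel while STP sees no redundant, blockable link.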
Virtualization is deployed on higher networking levels as well. Let’s consider
virtual routing and forwarding instances (VRFs): they are commonly used to address
network multi-tenancy needs by separating the L3 address spaces of network consumers
(tenants). Their IP ranges can then overlap without any problems.
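As a brief, hypothetical NX-OS illustration (tenant names, VLANs, and addresses are invented for this sketch), two tenants can safely reuse the same subnet because each SVI lives in its own VRF routing table:

feature interface-vlan

vrf context TENANT-A
vrf context TENANT-B

interface Vlan100
  vrf member TENANT-A
  ip address 10.1.1.1/24    ! same prefix as TENANT-B, isolated routing table
  no shutdown

interface Vlan200
  vrf member TENANT-B
  ip address 10.1.1.1/24
  no shutdown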
These various types of virtualizations are still implemented in many traditional
datacenters, and they serve well for smaller, not-so-demanding customers.
[Figure 1-5. Virtualized network infrastructure: Core, Aggregation, and Access Layers built from switch stacks/VSS/vPC pairs with aggregated links toward blade and rack servers]
Server virtualization also introduced a new layer of virtual network switches inside
the hypervisor (see Figure 1-6). This trend increased the overall administrative burden,
as it added new devices to configure, secure, and monitor. Their management was often
the responsibility of a dedicated virtualization team and any configuration changes
needed to be coordinated inside the company. Therefore, the next requirement for a
modern datacenter networking platform is to provide tools for consistent operation
of both the physical network layer and the virtual one. We should be able to manage
encapsulation used inside the virtual hypervisor, implement microsegmentation, or
easily localize any connected endpoint.
[Figure 1-6. Virtual switching layer inside the hypervisor: Core, Aggregation, and Access Layers connecting physical servers acting as virtualization hosts]
Along with continuous changes in IT and network infrastructures, we can see another
significant trend in datacenter networking: a higher adoption of containerization,
microservices, cloud technologies, and deployment of data-intensive applications is causing
the transition from a north-south traffic pattern to a majority of east-west traffic. According
to Cisco Global Cloud Index, more than 86% of traffic stayed inside datacenters in 2020.
[Figure: Leaf-spine fabric built from spine switches and leaf switches]
In order to use all the available bandwidth between leaves and spines, their
interconnection needs to be based on Layer 3 interfaces. This eliminates the need for STP and
provides support for ECMP load balancing. However, to support virtualization and various other
application features, we still need to ensure Layer 2 connectivity somehow. The solution here
is to simply encapsulate any (Layer 2 or Layer 3) traffic at the edge of the network into a new,
common protocol called Virtual eXtensible Local Area Network (VXLAN). See Figure 1-8.
[Figure 1-8. VXLAN encapsulation of the original Ethernet frame]
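ACI builds and operates its VXLAN overlay for you automatically, but as a hedged illustration of the underlying idea, a standalone NX-OS VTEP maps a local VLAN to a 24-bit VXLAN network identifier (VNID) roughly like this (names, numbers, and addresses are illustrative, and loopback0 is assumed to exist as the VTEP source):

feature nv overlay
feature vn-segment-vlan-based

vlan 100
  vn-segment 10100            ! map local VLAN 100 to VXLAN VNID 10100

interface nve1
  no shutdown
  source-interface loopback0
  member vni 10100
    ingress-replication protocol static
    peer-ip 10.255.255.2      ! remote VTEP (illustrative address)

The 24-bit VNID raises the theoretical number of segments from 4,096 VLANs to roughly 16 million, at the cost of about 50 bytes of additional headers per packet.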
So why are we even discussing Cisco ACI in this book? All of the previously described
requirements can be solved by implementing this centrally managed, software-defined
networking in your datacenter.
Summary
In subsequent chapters, I will cover each ACI component in detail and you will learn
how to design and implement this solution while following best practices and many of
my personal practical recommendations, collected during multiple ACI deployments for
various customers. They won’t always necessarily match according to the situation they
are considered in, but the goal is the same: to provide you with a comprehensive toolset
to become confident with various ACI-related tasks.
CHAPTER 2
ACI Fundamentals:
Underlay Infrastructure
In this chapter, you will focus on establishing a proper understanding of the main ACI
components: the underlay network infrastructure and APIC controllers with their
architectural deployment options. In ACI, hardware-based underlay switching offers a
significant advantage over various software-only solutions due to specialized forwarding
chips. Thanks to Cisco’s own ASIC development, ACI brings many advanced features
including security policy enforcement, microsegmentation, dynamic policy-based
redirect (to insert external L4-L7 service devices into the data path) or detailed flow
analytics—besides the huge performance and flexibility.
• Multi-speed 100M/1/10/25/40/50/100G/400G
• Intelligent buffering
• Encryption technologies
• Interface/queue info
• Flow latency
• Along with flows, you can also gather the current state of the
whole switch, utilization of its interface buffers, CPU, memory,
various sensor data, security status, and more.
[Figure: CloudScale ASIC slice structure with independent ingress and egress portions (Ingress/Egress Slice 2 through Slice n)]
Let’s dig deeper and look at the packet processing workflow inside the individual
CloudScale slice, as shown in Figure 2-3. On the ingress part of a slice, we receive the
incoming frame through the front switch interface using the Ingress MAC module. It’s
forwarded into the Ingress Forwarding Controller, where the Packet Parser identifies
what kind of packet is inside it. It can be IPv4 unicast, IPv6 unicast, multicast, VXLAN
packet, and so on. Based on its type, just the necessary header fields needed for the
forwarding decision are sent in the form of a lookup key to the lookup pipeline. The
switch doesn’t need to process all of the payload inside the packet. However, it will
still add some metadata to the lookup process (e.g., source interface, VLAN, queue
information).
Once we have a forwarding result, the ASIC sends the information together with the
packet payload via the slice interconnect to the egress part. There, if needed, the packet
is replicated, buffered in a specific queue according to the Quality of Service
configuration, and, in the case of ACI, a defined security policy can be applied. After that, the ASIC
rewrites the fields and puts the frame on the wire.
Each result of this complex forwarding decision for a packet can be analyzed in
detail using advanced troubleshooting tools like the Embedded Logic Analyzer Module
(ELAM). By using it, you can easily monitor if the packet even arrived in the interface,
and if so, what happened inside the ASIC while doing these forwarding actions. You will
find more about ELAM in Chapter 6.
[Figure 2-3. Packet processing inside a CloudScale ASIC slice: ingress packets enter through the Ingress MAC, the Packet Parser in the Ingress Forwarding Controller builds a lookup key from the headers while the packet payload waits, the Lookup Pipeline returns the lookup result, and the packet crosses the Slice Interconnect toward egress replication]
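As a quick preview of what such an ASIC-level look resembles (the full procedure is covered in Chapter 6), an ELAM capture on a CloudScale leaf is typically driven from the line card shell; treat the exact trigger options below as an illustrative sketch, since they differ between ASIC generations and software versions:

vsh_lc
debug platform internal tah elam asic 0
trigger reset
trigger init in-select 6 out-select 0
set outer ipv4 src_ip 10.0.0.10 dst_ip 10.0.0.20
start
status
report

Once the status changes from Armed to Triggered, the report shows whether the frame arrived on the interface, which lookups it hit, and the resulting forwarding decision.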
Now, let’s have a look at the wide palette of device options when choosing from the
Nexus 9000 family of switches to build the ACI underlay fabric.
Each chassis consists of multiple components, and while some of them are common
for all chassis models, others are device-specific.
Chassis-Specific Components
Based on the size of the chassis, you have to choose specific fabric and fan modules. In
the following section, I will describe them in more detail.
Fabric Module
Internally, Nexus 9500 is designed (architecture-wise) as a leaf-spine topology itself.
Each fabric module (FM) inserted from the rear of the chassis comes with one or
multiple CloudScale ASICs and creates a spine layer for the internal fabric. On the
other hand, line cards with their set of ASICs act as leaf layer connected with all fabric
modules. Together they provide non-blocking any-to-any fabric and all traffic between
line cards is load balanced across fabric modules, providing optimal bandwidth
distribution within the chassis (see Figure 2-5). If a fabric module is lost during the
switch operation, all others continue to forward the traffic; they just provide less
total bandwidth for the line cards. There are five slots for fabric modules in total. If
you combine different line cards inside the same chassis, the fifth fabric module is
automatically disabled and unusable. This is due to the lack of a physical connector for
the fifth module on all line cards (at the time of writing) except for X9736C-FX, which
supports all five of them. Generally, you can always use four fabric modules to achieve
maximal redundancy and overall throughput.
[Figure 2-5. Nexus 9500 internal leaf-spine design: line cards such as the X9732C-EX and X9736C-FX connect to fabric modules FM 2 through FM 6, with the fifth fabric module optional depending on the line card]
• FM-G for 4-slot and 8-slot chassis with LS6400GX ASIC that provides
up to 1.6 Tbps capacity per line card slot.
• FM-E2 for 8-slot and 16-slot chassis with S6400 ASIC that provides up
to 800 Gbps capacity per line card slot.
• FM-E for 4-slot, 8-slot, and 16-slot chassis with older ALE2 ASIC that
provides up to 800 Gbps capacity per line card slot.
Note All fabric modules installed in the chassis should be of the same type.
Fan Module
The Nexus 9500 chassis uses three fan modules to ensure proper front-to-back airflow.
Their speed is dynamically driven by temperature sensors inside the chassis. If you
remove one of the fan trays, all the others will speed up to 100% to compensate for the loss of
cooling power.
There are two types of fan modules available that are compatible with specific fabric
modules:
System Controller
As you already know, the supervisor is responsible for handling all control plane
protocols and operations with them. A redundant pair of system controllers further
helps to increase the overall system resiliency and scale by offloading the internal non-
datapath switching and device management functions from the main CPU in the supervisor
engines. The system controllers act as multiple internal switches, providing three main
communication paths:
• Ethernet Out of Band Channel (EOBC): 1 Gbps switch for intra node
control plane communication. It provides a switching path via its
own switch chipset, which interconnects all HW modules, including
supervisor engines, fabric modules, and line cards.
• Ethernet Protocol Channel (EPC): 1 Gbps switch for intra node data
protocol communication. Compared to EOBC, EPC only connects
fabric modules to the supervisor engines. If any protocol packets
need to be sent to the supervisor, line card ASICs utilize the internal
data path to transfer the packets to fabric modules. They then
redirect the packet through EPC to the supervisor.
Line Cards
Several line card options are available and universally compatible with any chassis
size. For the purpose of this book, I will list only models related to the Cisco ACI
architecture. As I’ve already described, they need to be based on CloudScale ASICs and
you can deploy them only as ACI spine switches. The choice depends on the number of
interfaces, their speeds, or, for example, CloudSec DCI encryption support needed in
your final design. Internal TCAM scalability for the purpose of ACI is consistent across
all models and allows you to handle 365,000 database entries. A TCAM entry stores the
MAC, IPv4, or IPv6 address of each endpoint visible in the fabric, that is, of any host
connected to ACI. See Table 2-2.
Note *Inside one chassis you can combine different line cards, but in such
a case, the fifth fabric module will be disabled and you won’t get maximum
throughput for X9736C-FX.
93360YC-FX2    96x 1/10/25G SFP28 and 12x 40/100G QSFP28                      7.2 Tbps     Leaf only    Yes
93180YC-FX3    48x 1/10/25G SFP28 and 6x 40/100G QSFP28                       3.6 Tbps     Leaf only    Yes
93108TC-FX     48x 100M/1/10GBASE-T and 6x 40/100G QSFP28                     2.16 Tbps    Leaf only    Yes
9348GC-FXP     48x 100M/1G BASE-T, 4x 1/10/25G SFP28 and 2x 40/100G QSFP28    969 Gbps     Leaf only    Yes
93108TC-FX3P   48x 100M/1/2.5/5/10G BASE-T and 6x 40/100G QSFP28              2.16 Tbps    Leaf only    Yes
Now with all this knowledge about the Nexus 9000 hardware platform, let’s have a
closer look at how it's used for ACI.
The leaf-spine topology offers simple scalability. The overall bandwidth capacity
of the whole ACI fabric is based on the number of spine switches used. Each spine
adds another concurrent path for traffic sent between any pair of leaf switches. For a
production ACI fabric, the general recommendation is to use at least two spines. This
preserves the necessary redundancy in case of failure and allows you to upgrade
them one by one without affecting the production traffic. Adding a new spine switch
is just a matter of connecting it correctly to the fabric; after registration in APIC, it will
automatically become a fabric member without disrupting already deployed devices.
On the other hand, if you need to scale your access interfaces, all you need to do
is deploy a new leaf switch (or pair), similar to the previous case. Each new leaf will
become part of the fabric and the control plane will converge automatically, all without
affecting already running devices.
This simplicity applies analogously when removing any device from ACI. To make
sure you won't cause any service disruption in your production network, ACI offers
maintenance mode (also commonly known as the graceful insertion and removal
feature). When enabled on an ACI switch, that switch is no longer considered part of the
underlay network and all control plane protocols are gracefully brought down.
Additionally, all host interfaces are shut down.
[Figure: ACI leaf-spine fabric with minimum 40G links between every leaf and every spine and an IS-IS/COOP control plane]
To summarize each layer’s purpose for ACI, leaf switches provide the following:
In general, these are the last six high-speed QSFP interfaces of the individual leaf
switches. Besides them, if the leaf is already registered to the fabric and managed by the
APIC, you can convert its downlinks (end-host interfaces) to fabric ports as well. And the
same applies the opposite way: if needed, you can convert a default fabric interface to a
downlink, except for the last two interfaces of each leaf. All these interface manipulations
require a switch reload before the change can take effect.
For the interconnection itself, you can use two main cable types:
a. Multimode fiber: Multimode allows you to use either MPO-12 (12 fibers)
or LC-LC (2 fibers) connectors, both with a maximum reach of 100m when
connected with OM4 cable standard.
[Figure: ACI fabric internals: L3 point-to-point interfaces between leaves and spines, IS-IS, COOP, and MP-BGP as the control plane protocols, and VXLAN data plane encapsulation in which outer VTEP DA/SA headers carry the original Ethernet frame and payload]
Spine switches act in ACI as the endpoint knowledge base. As soon as any leaf
identifies a new locally connected endpoint, it will update a spine endpoint database
with this information using Council of Oracles Protocol (COOP). COOP is the second
part of ACI’s main control plane together with IS-IS. As a result, spines in ACI always
know the entire endpoint information and they always synchronize any changes
between them via COOP. If neither leaf nor spine is aware of a particular endpoint, you
won't be able to communicate with it. In some cases, covered later in this book, a spine
can still proactively try to resolve unknown endpoint addresses using the so-called ARP
gleaning process, but if that doesn't help either, the communication is dropped.
Then we have the third important control plane operation: external information
management. External prefixes and endpoint information from outside the particular
fabric are distributed between leaves and spines using Multi-Protocol Border Gateway
Protocol (MP-BGP). Prefixes are mutually redistributed on border leaves between
MP-BGP and static information or dynamic routing protocols based on configured
application policies. MP-BGP uses the concept of redundant route reflectors (RRs),
represented by spines, to optimize the number of BGP peerings needed inside the fabric. You
will look deeper at control/data plane operations and forwarding inside the fabric in
Chapter 6.
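As a hedged preview of the verification used later, each of these three control plane building blocks can be checked directly from a leaf or spine CLI; the commands below are a sketch and their exact output varies by software version:

show isis adjacency vrf overlay-1              (IS-IS neighbors in the infra VRF, on a leaf or spine)
show coop internal info ip-db                  (COOP endpoint database, on a spine)
show bgp vpnv4 unicast summary vrf overlay-1   (MP-BGP sessions toward the RR spines, on a leaf)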
ACI Architecture
Even though some ACI implementations start small, by deploying just one fabric,
organizations sooner or later want to expand their datacenter infrastructure. It always
makes sense to have geo-redundant datacenter facilities, with flexible networking
available for each site. But how to get there? Ideally as quickly and simply as possible,
without affecting already deployed resources or compromising already created security
policies. You have a wide variety of options when deploying a multi-datacenter ACI fabric or
expanding the already existing implementation. The following sections will describe all
of them in detail.
between distant datacenter rooms or floors and you cannot reach the spines directly, or
when there is not enough horizontal cabling available in the datacenter to connect more
distant leaves with all spines.
[Figure: Multi-tier ACI topology: Nexus spines, Tier-1 ACI leaves, and Tier-2 ACI leaves]
I have used Tier-2 leaves several times to implement a server management layer
using “low-speed” copper RJ-45 switches (Nexus 9348GC-FXP), while the server data
interfaces were connected to the main, “fast” Tier-1 leaves. When there is no
dedicated out-of-band infrastructure available for servers in the DC, this design brings
a reliable and reasonably redundant way to access your servers, all centrally managed
using APICs, without increasing the administration load.
Another physical deployment use case for multi-tier ACI is positioning spines to
dedicated network racks in the datacenter room. Then the Tier-1 leaves are installed as
end-of-row (EoR) “aggregation” for each compute rack row and the Tier-2 leaves serve
as a top-of-rack (ToR) switches. This way you can shorten cable runs inside the DC room
and optimize your use of horizontal cabling (many times including patch panels in each
rack). See Figure 2-9.
[Figure 2-9. Multi-tier ACI physical layout: spines and Tier-1 leaves placed in dedicated network racks as end-of-row aggregation, with Tier-2 leaves acting as top-of-rack switches in each row of compute racks]
Each Tier-2 leaf can be easily connected with more than two Tier-1 leaves (an
advantage compared to vPC). The limiting factor here is just the ECMP load balancing,
currently supporting 18 links.
Configuration-wise, you have to use fabric interfaces for the interconnection. In ACI,
each Nexus 9K has some default fabric interfaces (usually high-speed uplinks, or the last
four to six interfaces), and you can additionally reconfigure any host port to a fabric uplink
after the switch gets registered with APIC. Here is the recommendation when deploying
a Tier-2 architecture:
• If you plan to connect APICs to the Tier-2 leaf, at least one default
fabric interface of the Tier-2 leaf has to be connected with the default
fabric interface of Tier-1 leaf. Otherwise, APIC won’t be able to
discover and provision the Tier-1.
From a functional point of view, there is no difference between the two leaf layers. You
can connect APICs, endpoints, service devices, or external switches/routers to them.
[Figure: ACI stretched fabric: transit leaves in each site interconnect the locations, with a single APIC cluster distributed across them]
The stretched fabric is still a single ACI fabric. All datacenters are part of the
common administration domain and one availability zone. They are configured and
monitored by a single APIC cluster as the one entity, preserving all the fabric capabilities
across the different locations. A maximum of three sites can be part of this stretched
fabric design.
In each fabric, choose at least one, but optimally two, transit leaves, which will be
interconnected with all spine switches at a remote site. 40G interface speed is required
and enforced for these links, similar to standard ACI fabric.
From a redundancy point of view, it’s recommended to provision at least a three-
node APIC cluster with two nodes deployed at the main site and one node in the
other site. The main site can be considered the one with the majority of switches, the
active site, or the main from other operational perspectives. Every change in the ACI
configuration is automatically replicated to all APIC cluster nodes in the stretched fabric.
This synchronous replication of APIC databases brings us to the important latency factor
for proper cluster performance. You have to ensure a maximum of 50ms round-trip time
(RTT) between APIC nodes.
Maybe you are asking yourself, what happens when the interconnection between
these stretched sites is lost? Can I possibly lose the configuration or any data? The
answer is no. In such a case, both fabrics continue to work independently and traffic
forwarding inside them is not affected. Information about new endpoints is distributed
using Council-of-Oracles Protocol (COOP) only to local spines, but as soon as the
interconnection is restored, they will merge their databases. What will be affected during
the outage, though, is configurability of the fabric. An APIC cluster needs a quorum
in order to be configurable; the majority of cluster nodes have to be available, or the
configuration becomes read-only. Therefore, if we lose an interconnect, only the first
site with two APIC nodes will be configurable; the second one can be monitored by the
local APIC, but without permission to do any changes. After the link is restored, their
databases will be synchronized as well. A more serious situation is if the first datacenter
is lost completely, due to some natural disaster, for example. Then we end up with only the
second fabric, in read-only mode. We can avoid this condition by deploying a fourth standby APIC
node in the second site and, in case of need, adding it to the cluster to regain the quorum.
For more information about APIC clustering, refer to a later section of this chapter.
From a configuration point of view, stretched fabric has no specific requirements;
the control plane is provisioned automatically. You just need to cable the devices
accordingly.
Even though the stretched fabric offers a quite simple architecture, it's not always
feasible to implement it and there are drawbacks. Many times, you don't have enough
available fiber cables between the datacenter rooms. Sometimes the sites aren't
even connected directly but through an intermediate service provider offering just L3
services. Or a more scalable architecture than three locations is needed. With the
release of newer software versions for ACI, the drawbacks of stretched fabric were
addressed by introducing new design options. Now we have capabilities to extend the
DC infrastructure in a simpler yet more powerful way (feature-wise). Therefore, in the next
sections, you will look at Multi-Pod and Multi-Site ACI architectures, which are more optimal
from a practical point of view: they support a broader set of features, and their
scalability is increased to 14 sites, each consisting of up to 25 Pods (at the time of writing this
book). During the design phase of a multi-DC ACI deployment, I definitely recommend
going for one of the following options rather than stretched fabric if possible.
Note There is no need to have an APIC in each and every Pod. Even without a local controller,
as long as IPN connectivity is working, the Pod is manageable and fully configurable.
[Figure: ACI Multi-Pod architecture: Pod 1 and Pod 2 connected over the Inter-Pod Network (IPN), each Pod running its own IS-IS and COOP instances, with MP-BGP and VXLAN extending the control and data planes between Pods and the APIC cluster stretched across them]
To ensure proper Inter-Pod forwarding, the ACI control plane utilizes a new set of
MP-BGP peerings between the spines of Pods, exchanging the routing, endpoint and
multicast group information (in the form of BGP VPNv4/VPNv6 and L2VPN EVPN
NLRIs). When deploying Multi-Pod, you can either choose to use full mesh BGP peerings
or route reflector servers (RR). The general recommendation from Cisco is to use the first
option whenever possible: full mesh peerings, with only two spines acting as BGP
peers in each Pod, independent of the overall number of spines across all the Pods.
Changes to the BGP database are internally distributed to non-BGP spines using COOP.
The Multi-Pod data plane uses VXLAN encapsulation for packet forwarding between
Pods to achieve seamless L2 and L3 services across the common infrastructure. The VXLAN
header carries the same, unaltered information end to end. For detailed information about
control and data plane operation in a Multi-Pod design, see Chapter 6.
Currently, you can deploy as many as 25 Pods, which makes this architecture
perfect for scaling further from single datacenter deployments. Multi-Pod commonly
finds its usage in various campus datacenters or colocation facilities consisting of
multiple buildings that comprise a single logical datacenter. A disaster recovery
datacenter is another use case, and you gain easier and faster recovery times by avoiding
configuration inconsistencies. In public cloud terminology, the ACI Multi-Pod is similar
to availability zones inside some particular region.
• Quality of Service (QoS): ACI allows you to classify and apply QoS
policies to six user-defined traffic levels plus four internal control
plane levels. It’s important when deploying ACI Multi-Pod to
consider both the APIC and the IPN QoS configuration to ensure proper
end-to-end prioritization of important traffic, especially the control
plane. Although QoS configuration is not mandatory for Multi-Pod to
work, it’s highly recommended.
Sometimes all these features may seem overwhelming at first sight, and you may
think they add too much complexity to the architecture, but at the end of the day,
when we go through the whole IPN configuration later, you will realize how nicely they
cooperate together to provide robust, flexible, and optimal forwarding between the
distant datacenters.
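Chapter 3 walks through the full IPN configuration step by step, but purely as a hedged preview, an IPN-facing interface on a standalone NX-OS switch typically combines these requirements on a VLAN 4 subinterface; every address, process name, and RP value below is illustrative only:

feature ospf
feature pim
feature dhcp

service dhcp
ip dhcp relay
router ospf IPN
ip pim rp-address 172.16.0.100 group-list 225.0.0.0/15 bidir

interface Ethernet1/1
  no switchport
  mtu 9150
  no shutdown

interface Ethernet1/1.4
  description Link to Pod 1 spine
  mtu 9150
  encapsulation dot1q 4
  ip address 172.16.1.1/30
  ip ospf network point-to-point
  ip router ospf IPN area 0.0.0.0
  ip pim sparse-mode
  ip dhcp relay address 10.0.0.1    ! example APIC address used for remote Pod discovery
  no shutdown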
shut down the entire datacenter (while you have only two of them), that is exactly the
moment when you want to have 100% confidence that the other is working as expected
on its own.
[Figure: ACI Multi-Site architecture: individual sites (including a Multi-Pod site with its own IPN) connected over a generic Inter-Site Network (ISN) providing WAN/IP connectivity, each site with its own APIC cluster]
In order to achieve connectivity between spines of different sites, this time we use
a generic inter-site network (ISN). Compared with an IPN, an ISN doesn’t need any
multicast or DHCP relay support. Although the multi-destination BUM traffic within
each site is still delivered using multicasts, when a multicast packet needs to be sent to
any remote site, spines perform so called head-end replication. Each multicast packet
is replicated to multiple unicast packets, and they are sent to destination ACI sites via
the ISN. What we still have to account for, though, is the increased MTU due to additional
VXLAN headers and dynamic routing over a statically defined VLAN 4 between the spine
and ISN switch. A back-to-back spine connection is supported as well, but between two
sites only.
A significant emphasis is put on security in transit among the datacenters when
deploying Multi-Site. Since ACI version 4.0(1), the CloudSec feature is available,
allowing you to line-rate encrypt multi-hop L3 traffic between spine VTEP addresses
using a 256-bit AES algorithm.
Remember that each site uses its own APIC cluster in Multi-Site. But still, in case of
need, we have to achieve configuration, security, and administration consistency among
them. Therefore, a new component needs to be introduced into the architecture: Nexus
Dashboard Orchestrator.
• Network Insights: Big data analytics platform used for gathering and
processing telemetry data from Nexus 9000 switches in NX-OS and ACI modes
Now let’s get back to ACI Multi-Site. The Nexus Dashboard Orchestrator (NDO)
component acts as the ACI inter-site policy manager and centralized management layer
to provision, monitor, and handle the full lifecycle of networking policies within the
connected sites (as shown in Figure 2-13). Additionally, it ensures policy consistency.
[Figure 2-13. ACI Multi-Site with Nexus Dashboard Orchestrator: the Orchestrator manages the APIC clusters of all sites, which are interconnected over the Inter-Site Network (ISN) and WAN/IP connectivity]
It’s the Orchestrator that automatically configures spines for ISN features and
runs the MP-BGP peering between each other. You just establish a connection with
the management interfaces of the APIC clusters, and from then on, part of the ACI
configuration will happen in NDO instead of individual APICs. Based on your needs, you
can choose to which sites the configuration will be pushed.
There are three deployment options available for the Nexus Dashboard platform:
• Public cloud: AWS or Azure can be used as well to host the Nexus
Dashboard
Nexus Dashboard Orchestrator uses the Representational State Transfer (REST) API
when communicating with APICs. It needs either connectivity with APIC’s out-of-band
interface (which is highly recommended from my point of view), an in-band interface,
or both. By using a dedicated and separate out-of-band infrastructure for NDO-to-
APIC communication, you ensure manageability of all ACI sites even during any issues/
outages of internal ACI forwarding or problems with in-band access, which can be
caused outside of ACI.
NDO additionally brings another open northbound REST API, so you can include the
whole ACI Multi-Site architecture further in the orchestration layer of your preference.
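To give a flavor of what these REST exchanges look like (the same style of call you will later issue yourself from Postman, Python, Ansible, or Terraform), authentication against an APIC is a single POST with a JSON body; the address and credentials below are placeholders:

POST https://<apic-address>/api/aaaLogin.json

{
  "aaaUser": {
    "attributes": {
      "name": "admin",
      "pwd": "<password>"
    }
  }
}

A successful response returns a session token that is then carried as the APIC-cookie in every subsequent request against the /api/mo/... and /api/class/... endpoints.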
Cloud ACI
Nowadays, for a majority of companies it's a must to somehow consume public cloud
services for their IT architecture. From a business, functional, or operational perspective,
public clouds bring significant improvements in reliability, flexibility, elasticity, and
costs (mainly in the form of operating expenses, OPEX) compared with running your own
infrastructure.
The ACI Multi-Site architecture, thanks to Nexus Dashboard Orchestrator, is not
limited to automating on-premises ACI deployments only. We can extend the whole
networking infrastructure to the cloud as well (as shown in Figure 2-14). As of ACI
version 5.2(3), Amazon Web Services and Microsoft Azure are supported, with other
cloud vendors on the roadmap.
[Figure 2-14. Cloud ACI: the on-premises ACI site connected over WAN/IP with CSR1000v routers running in Amazon Web Services and Microsoft Azure, all orchestrated by Nexus Dashboard Orchestrator]
With a Cloud ACI deployment, we reuse all the concepts discussed before in the
Multi-Site architecture. The on-premises leaf-spine ACI fabric is connected via a generic L3 ISN
with virtual CSR1000v routers running in the cloud environment. Their
lifecycle, from creation through maintenance and upgrades to termination, is completely
managed by Nexus Dashboard Orchestrator. NDO is used to set up the control and
data plane protocols (OSPF, MP-BGP, and VXLAN) as well as IPsec encryption between
environments.
In the public cloud, there's a new component called the Cloud APIC (cAPIC)
appliance. cAPIC is an adapted version of the standard APIC, intended for managing the most
important cloud network resources. It acts as an abstraction layer to translate ACI
[Figure: ACI remote leaf: a remote location connected to the main datacenter's spines over an Inter-Pod Network (IPN)]
An intermediate L3 network connects remote leaves with the main datacenter and
shares similar requirements with the Inter-Pod Network already discussed:
• Increased MTU: All packets sent between remote leaves and ACI
spines are encapsulated in VXLAN, so account for an additional 50 bytes in
transport.
BiDir PIM multicast support is not required for remote leaves. The maximum RTT can go
up to 300ms and the minimum uplink speed is as low as 10 Mbps.
To communicate with fabric switches, APIC uses an encrypted Layer 3 channel built
over an infrastructure VLAN (user configurable during the APIC initialization) and based
on assigned VTEP IP addresses. Transmitted instructions for switches are formatted
into the standardized OpFlex protocol. OpFlex is responsible for translating your
intents (described by the application model) to the fabric and for declaratively instructing
the involved switches on how they should configure themselves. Remember that APIC is just
a management plane. It's not directly involved in the control plane operation in any way
(as opposed to OpenFlow). Switches have their own “brain” and they run an OpFlex
agent under the hood. This agent takes the model description from APIC's southbound
API and implements it in a form of well-known networking constructs: interface (and
protocol) configuration, VLANs, IP interfaces, routing protocols, VXLAN configuration,
security policies, and such. In fact, the majority of ACI’s configuration is in the end
translated to the standard networking protocols.
From my own path to becoming an ACI “hero,” I found the most difficult part to
grasp was the Application Policy model concepts and mapping them mentally to already
known networking principles from the legacy infrastructure. As soon as you reach that
state and find your way around how each building block fits together, ACI will become
a piece of cake for you. And hopefully we will reach that level together in
this book.
Hardware Equipment
So far, there have been three generations of APIC appliances available, each of them based on
a Cisco UCS C220 rack mount server running CentOS with specialized software on
top of it. Currently, only the third generation is fully supported and orderable; the others are
past their End of Sale (EoS) dates already.
• First generation: Cisco UCS C220 M3 based server – APIC M/L1 (EoS
04/2016)
According to your needs, you can choose from the medium (M) or large (L) version.
As described in Table 2-4, the difference lies in the hardware components used inside
the chassis and the number of end host interfaces supported in ACI. If you plan to scale
above 1,200 access interfaces, you should go for the large APIC. Otherwise, choose
the medium one and it will fulfill all the ACI requirements just fine.
Processor: 2x 1.7 GHz Xeon Scalable 3106/85W 8C/11MB Cache/DDR4 2133MHz (M3) vs. 2x 2.1 GHz Xeon Scalable 4110/85W 8C/11MB Cache/DDR4 2400MHz (L3)
Memory: 6x 16GB DDR4-2666-MHz RDIMM/PC4-21300/single rank/x4/1.2v (M3) vs. 12x 16GB DDR4-2666-MHz RDIMM/PC4-21300/single rank/x4/1.2v (L3)
Hard Drive: 2x 1 TB 12G SAS 7.2K RPM SFF HDD (M3) vs. 2x 2.4 TB 12G SAS 10K RPM SFF HDD (4K) (L3)
PCI Express (PCIe) slots: Cisco UCS VIC 1455 Quad Port 10/25G SFP28 CNA PCIE or Intel X710 Quad-port 10GBase-T NIC, RJ-45 (both versions)
3. APIC Data interfaces: Each APIC node comes with either a Cisco
UCS VIC card consisting of 4x SFP28 interfaces and supporting
10/25G speeds, or a 4x 10GBase-T RJ-45 Intel NIC. For redundancy,
you have to connect two of the APIC's data ports to different leaves,
specifically to their end host interfaces. By default, all leaf host
interfaces are up and, even without configuration, ready to
connect the controller.
By default, in the third generation of APICs, their four physical data interfaces are
grouped together in CIMC to form two logical uplinks of eth2-1 and eth2-2 (as shown in
Figure 2-16). Uplinks are further bundled into a single active/standby bond0 interface
from the OS perspective and used to manage the ACI. In order to achieve correct LLDP
adjacency with leaf switches, you have to connect one interface from both uplink pairs
(e.g., ETH2-1 and ETH2-3, or ETH2-2 and ETH2-4); otherwise, the APICs won’t form a
cluster. Another option I like to use during ACI implementations is to turn off the default
hardware port-channel in CIMC. After logging into the CIMC GUI, left-click the menu
button and navigate to Networking -> Adapter Card 1 -> General Tab -> Adapter Card
Properties. Here, uncheck the Port Channel option and click Save Changes. Now you
can use any two physical interfaces of your choice.
[Figure 2-16. APIC data interfaces: the four physical ports form two logical uplinks (eth2-1 and eth2-2), bundled into the single active/standby bond0 logical OS data interface]
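A quick sanity check that the cabling and bonding behave as expected is a hedged pair of commands like the following, run on a connected leaf and on the APIC itself; node names and outputs will differ in your environment:

show lldp neighbors              (on the leaf: the APIC should be listed as a neighbor, with eth2-1/eth2-2 as its port IDs)
cat /proc/net/bonding/bond0      (on the APIC shell: shows the bond members and which one is currently active)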
But be aware: if you increase the number of APIC cluster nodes to more than three, you
will still have three replicas of each shard. Therefore, more nodes ≠ more reliability
and redundancy. Each additional APIC just means increased scalability and the ability to
manage more fabric switches.
Let's consider a larger, five-node APIC cluster during a failure scenario. If we lose
two nodes, formally we will still have a quorum, but have a look at Figure 2-18. With
such a shard layout, some shards will end up in a read-only state and others will still be
writable. It's highly recommended not to make any changes in this situation and to restore
the cluster as soon as possible. If more than two nodes are lost, there is actually a high
probability that some information will be irreversibly lost.
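In any of these failure scenarios, the cluster and shard state can be inspected from the APIC CLI; the commands below are a rough, hedged sketch, since their availability and output format differ slightly between versions:

acidiag avread    (appliance vector: cluster size and health of each APIC node)
acidiag rvread    (replica vector: state of the individual data shards and their replicas)

The same cluster view is also available in the GUI under System -> Controllers.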
Your goal should always be to distribute APICs around the ACI fabric in a way that
protects against the failure of more than two nodes at a time. Of course,
within a single fabric you don't have too much maneuvering space. In that case, try to
spread APIC nodes across different racks (or rows), and connect each to independent power
outlets.
In Multi-Pod architecture, it’s recommended to connect two APICs in Pod 1 and
the third APIC in one of the other Pods. In a worst-case scenario, there is a potential
for losing two nodes. In such a case, you would end up without a quorum and with
read-only ACI fabric. To overcome this problem, there is an option to prepare and
deploy a standby APIC in advance (as shown in Figure 2-19). This is the same hardware
appliance with a specific initial configuration, instructing the APIC not to form a cluster
with the others but rather to stay in the “background” for a failure situation. In the standby
state, the APIC does not replicate any data or participate in any ACI operation. However, the
main cluster is aware of the standby APIC's presence and allows you to replace any failed
node with it in case of need. The active APIC will then replicate its database to the standby
one and again form a read-write cluster to fulfill the quorum.
If you plan to deploy between 80 and 200 leaves across the Multi-Pod environment,
the best performance and redundancy can be achieved using a four-node cluster
spread as much as possible around the Pods. In a two-Pods-only design, put an additional
standby APIC in each Pod.
For a five-node cluster in Multi-Pod, see the recommended distribution shown in
Table 2-5.
* Note: When you completely lose some shards during a Pod 1 outage, there is still a chance for
data recovery using a procedure called ID Recovery. It uses configuration snapshots and has to be
executed by the Cisco Business Unit or Technical Support.
For a highly scaled ACI deployment consisting of more than 400 switches, a
seven-node APIC cluster distribution should be implemented as shown in Table 2-6.
Personally, I wouldn’t implement this number of switches and APICs in two Pods only.
You would end up with four nodes in the first Pod, and in case of failure, there is no
guarantee that the configuration will be recoverable. A safe minimum for me would be
at least four Pods.
APIC node placement (Pod 1 / Pod 2 / Pod 3 / Pod 4 / Pod 5 / Pod 6):
Two Pods       -   -   -   -   -   -
Three Pods *   3   2   2   -   -   -
Four Pods      2   2   2   1   -   -
Five Pods      2   2   1   1   1   -
Six+ Pods      2   1   1   1   1   1
ACI Licensing
As with many other network solutions, Cisco ACI is not only about hardware; you also need
the right software licenses to ensure genuine vendor support of the whole architecture in case
of need. Even though ACI uses so-called “honor-based” licensing (meaning that all its features
are available and working right out of the box), to receive any technical assistance from Cisco
Customer Experience (CX) centers later, you need to buy the correct software licenses.
Both standalone Cisco NX-OS and ACI switch modes consume tier-based
subscription licenses. Simply put, you must have a Cisco Smart Account (a centralized
license repository in Cisco's cloud) and allow your infrastructure to talk with this
licensing service to consume a license. This can be handled via a direct internet
connection, a company proxy server, or a Smart Software Manager (SSM) Satellite
server. Only the SSM then needs internet connectivity, and all your devices can license
themselves “offline” through this proxy component.
The advantage of the Smart Licensing architecture is license portability and
flexibility. Licenses are not directly tied to any specific hardware anymore; you can
upgrade the licensing tiers anytime during the subscription period, and you always
have access to all of the newest features and security patches. Consider it a pool of
generic licenses that are consumed over time by your hardware infrastructure.
At the time of writing this book, subscription licenses are offered in three-, five-, and
seven-year terms and come in these predefined tiers, based on features needed:
In ACI, spine switches don’t require any licenses at all; you have to buy them just for
each leaf switch. The minimum for running ACI is always Essentials. If you want to use
ACI Multi-Pod with remote leaves functionality, only remote leaves need Advantage; the
rest of the fabric can stay on Essentials. In case of Multi-Site or additional analytic tools,
all the leaf switches have to consume the Advantage + Day-2 Ops or Premier license.
• If the under-50ms round-trip time (RTT) requirement is fulfilled, you can reach for
a Multi-Pod architecture; otherwise, go for Multi-Site.
• How many racks with DC resources will you need to cover with
networking infrastructure?
A great way to make sure everything is in place for ACI, and you haven't forgotten
something, is to draw a high-level networking schema. I prefer to use MS Visio, but there
are many other (and free) tools. In the design drawing, I describe the datacenters' layout,
their interconnection, the number and type of leaf-spine switches, APICs, and of course
all the interfaces used across the whole infrastructure. Mark their speeds, types, QSA
adapters used (QSFP to SFP) where applicable, and such. From my experience, when
you see the network design schema in front of you, you will usually notice things and
connections otherwise easily missed.
In the end, when I'm satisfied with the design, I create a Bill of Materials (BoM),
which is a list of all hardware and software components, licenses, and support contracts.
Again, it's great to use the already created network schema and go device by device and
interface by interface to make sure you've included everything needed in the BoM.
Summary
In this chapter, you had the opportunity to explore ACI’s main underlay components: a
leaf-spine network based on Nexus 9000 switches and their CloudScale ASICs. You saw
various ACI architectures covering multi datacenter environments, remote locations,
and public cloud services and you explored ACI’s centralized management plane, APIC.
In the following chapter, you will take all this theoretical knowledge gained so far and
start applying it practically. You will look at how to initialize the whole fabric, register all
physical switches, configure various management protocols, and prepare IPNs/ISNs to
support multi datacenter communications.
CHAPTER 3
Fabric Initialization
and Management
After two mainly theoretical chapters about ACI concepts and components, you will
finally get to practice this new knowledge. My goal for this chapter is to help you deploy
the ACI fabric with a best practice configuration and optimize it for maximal durability,
all with an emphasis on troubleshooting tools and verification commands available for
effective issue resolution. You will start with a switch conversion to ACI mode, the APIC
cluster initialization for either single or multi fabric deployment, followed by a dynamic
fabric discovery and control plane spin up. Then, you will configure out-of-band and/
or in-band management access for your infrastructure and prepare basic fabric policies
(NTP, DNS, MP-BGP, Syslog, SNMP, or NetFlow). Last but not least, I will show you
how to correctly configure an Inter-Pod network consisting of Nexus 9000 devices in
standalone NX-OS mode.
For the next sections, let’s assume you have already done the entire hardware
racking and cabling, your switches and APICs are properly connected to the
management network(s) mentioned in the previous chapter, and you have physical
access to all equipment. Now you are ready to start the ACI configuration itself.
To make it simple, Cisco provides exactly the same ACI image file for every Nexus 9000
model, and you can obtain it directly from the Cisco Software Download page together
with the APIC firmware: https://software.cisco.com/. It requires an account linked with
a service contract. The file naming convention has the structure shown in Figure 3-1.
Figure 3-1. ACI image naming convention: aci-n9000-dk9.<version>.bin for switches and aci-apic-dk9.<version>.iso for APIC
The APIC version should ideally be the same or newer than the switches in your
fabric. A major version is released around once per year and there are significant features
introduced, such as various user interface improvements or new hardware generation
support. Then there are several minor updates per year, also enabling new features and
hardware support, fixing a bigger number of issues, and so on. Maintenance releases
are published one to two times per month and they fix mainly open bugs and identified
security vulnerabilities.
At the time of writing this book, there are two long-lived recommended version trains
with assured vendor support for the upcoming period. If you don't have a reason to
use another version, make sure to implement one of these:
• ACI 4.2
• ACI 5.2
When you hover your mouse over a particular file on the download page, the Details
popup shows its MD5 and SHA512 hashes. By comparing one of these hashes with the
file uploaded to the Nexus switch, you can avoid image integrity issues that would
otherwise show up as boot failures.
• Plug the USB stick in switch and verify its content using dir usb1: or
dir usb2:.
• Copy the image from the USB stick to the switch with copy
usb1:<aci-image-name>.bin bootflash:.
After the Nexus wakes up in ACI mode for the first time, there is one more and often
neglected step: setting the boot variables to ensure that it stays in ACI mode even after a
reload. An unregistered ACI switch allows you to log in as the admin user without a
password.
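A minimal sketch of this step, assuming the setup-bootvars.sh helper used in Cisco's documented conversion procedure on ACI-mode switches (the filename is an example):

(none)# setup-bootvars.sh aci-n9000-dk9.15.2.3e.bin

This writes the ACI image into the boot variables so the switch boots back into ACI mode after a reload.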
Alternatively, you can use some transfer protocol (e.g., SCP or SFTP) to upload the
ACI image remotely. The prerequisite is to have already working IP connectivity to the
switch management interface.
2. Enable the SCP server feature on the Nexus switch with switch
(config)# feature scp-server.
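With the SCP server enabled on the switch, one way to push the image from your workstation could be (hypothetical IP address and filename; the exact target path can vary):

workstation$ scp aci-n9000-dk9.15.2.3e.bin admin@10.0.0.10:aci-n9000-dk9.15.2.3e.bin

The file should land in bootflash:, from where you can continue with the conversion exactly as with the USB approach.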
Note After this NX-OS conversion, ACI can sometimes notify you about degraded
switch health due to a mismatch of the BIOS and FPGA firmware versions compared
with the main ACI image. This is not an issue from a short-term perspective.
It will be fixed during the next standard upgrade administered by APIC. You can
avoid it by converting the switch to version N-1 relative to the version currently running
on APIC and upgrading it right away after registration to the fabric.
Start with connecting a keyboard to USB and a monitor to the VGA interface of an
APIC. Turn on the server, and during its bootup, keep pressing F8 to enter the CIMC
configuration utility (shown in Figure 3-2).
Make sure that NIC Mode is set to Dedicated, NIC Redundancy to None, configure the
IPv4 properties, and optionally VLAN if used. Press F10 to save the configuration and exit
the CIMC utility with ESC.
After the APIC reboot, you can either continue using the keyboard and monitor or
connect to the CIMC IP configured recently and enter the Virtual KVM console (see
Figure 3-3).
Whichever way you choose, APIC without any initial configuration in the factory
default state will offer you a configuration wizard. Filling in its fields is a key step to
configure APIC for single or Multi-Pod operation.
In the following section, I’ll explain each configuration parameter in more detail
(with default values in brackets):
Fabric name (ACI Fabric1)
Common name for the ACI fabric and APIC cluster. Has to be configured consistently
for all nodes of a single cluster.
Fabric ID (1)
Consistent ID across all APICs of a particular fabric. Cannot be changed unless you
perform a factory reset.
Number of active controllers (3)
Initial expected APIC cluster size. Production fabric should have at least three nodes
due to database sharding and data replication. However, you can later change its size
both ways: up and down. In a lab, it’s perfectly fine to use only one APIC node. If you are
increasing the size of a cluster, add the next available controller ID. On the other hand,
when removing a cluster member, remove the highest one.
Pod ID (1)
This field will become significant when deploying ACI Multi-Pod. Set it according to
the Pod where this APIC will be connected.
Make sure not to overlap the VTEP pool with any other subnet used in your
production network. It's not a problem from a fabric-functionality point of view, but
you won't be able to access APIC OOB management from IPs inside such a subnet:
although APIC receives the incoming packets on its management interface, the reply is
routed back into the fabric due to the installed TEP pool static route.
Finally, avoid configuring 172.17.0.0/16 as the TEP pool. This one is already used by
default for internal Docker networking between APIC nodes to run additional installable
APIC apps.
Infrastructure VLAN (4093)
Dedicated VLAN used between APIC and leaf switches (or IPN since 5.2(1)). There
are integration use cases (e.g., Kubernetes) when this VLAN is extended further from ACI
to servers or blade chassis, so make sure it’s not overlapping with any already existing
VLANs in your network. Infra VLAN cannot be changed later unless a factory reset is
performed.
IP pool for bridge domain multicast address (GIPo) (225.0.0.0/15)
Each created bridge domain (L2 segment in ACI) will be assigned with one multicast
group address from this pool to ensure delivery of multi-destination traffic across
ACI fabric. Different Pods must have this pool consistently set. A valid range for GIPo
addresses is 225.0.0.0/15 to 231.254.0.0/15 and mask /15 is mandatory.
Out-of-band configuration (-)
Fill in the IP address, mask, default gateway, and speed/duplex configuration of APIC
management interfaces. APIC’s main GUI will be reachable via this IP. Each APIC node
has to have unique management IP address, of course.
Strong password (Y)
Enforce stronger username passwords.
Administrator password
Initial admin password, can be changed later using the GUI or CLI.
Note If your APIC was for any reason already preconfigured, or the person
configuring it made a mistake and you would like to return the server to the
factory default state and launch the configuration wizard again, issue the following
commands after logging into the APIC console:
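A commonly documented sequence for this clean reset (verify it against the documentation for your APIC version before running it, since it wipes the node) is:

apic# acidiag touch clean
apic# acidiag touch setup
apic# acidiag reboot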
Congratulations! After entering all the initial configuration parameters on all your
APIC nodes, they are ready to discover a fabric, register the switches, and form a cluster.
If your out-of-band connectivity is already up and working, try opening the management
IP address configured in a previous step in a web browser (Chrome and Firefox are fully
supported). You should get the initial APIC login screen shown in Figure 3-4. By entering
the initial admin credentials, you can enter the main ACI dashboard.
Tip To simply show and review all the configured values from the initial script,
issue the following command:
apic# cat /data/data_admin/sam_exported.config
At the same time, you can use Secure Shell (SSH) to access APIC’s CLI on a
management IP. Log in there and verify its network configuration by issuing the
ifconfig command. See Listing 3-1.
apic1# ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
inet6 fe80::72e4:22ff:fe86:3a0d prefixlen 64 scopeid 0x20<link>
ether 70:e4:22:86:3a:0d txqueuelen 1000 (Ethernet)
RX packets 346486101 bytes 121947233810 (113.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 325970996 bytes 131644580111 (122.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The oobmgmt interface is used for the current SSH session as well as for the GUI and API
services. The subinterface configured on top of the bond0 interface always represents
the main data channel between APIC and the leaf switches for management plane traffic
(LLDP, DHCP, OpFlex, etc.). It uses VTEP IP addresses from the initially specified pool
and Infra VLAN dot1q encapsulation.
Individual menu items are summarized in the following section, and you will learn
about most of them gradually through the book.
• ALL: List of all available and visible ACI tenants for the currently
logged user. Clicking a particular tenant opens its configuration.
• Inventory: Here you can verify the status of the whole ACI
network for all Pods, create new Pods, or register new switches
into ACI fabric.
• Fabric Policies: Settings affecting the whole ACI fabric,
global switch policies, and leaf-spine uplink configuration
options.
Along with these main menu items, you will find many configuration objects and
policies in the left blue panel arranged in a tree hierarchy (as shown in Figure 3-6).
Individual policies usually have their default object present and used by ACI in case you
haven’t created any of your own. To add a new object, just right-click the parent folder
where the object should belong and fill in the related object form.
Pay close attention to the Node ID field. The Node ID can go from 101 to 4000, and
it represents a unique ACI switch identifier used as a reference to this device in various
ACI policies all around the system. If you want to configure an interface, some switch
feature, routing, or create a static mapping for an endpoint behind this switch, you will
always refer to its Node ID, not its hostname or a serial number. Additionally, be aware
that you cannot change it later without returning the switch to the factory default state
and rediscovering it once again. Therefore, it is crucial to prepare a “naming” (or better,
numbering) convention for the whole ACI fabric beforehand.
If the number of leaf switches per Pod won't exceed 80, you can use the following
convention:
For larger deployments (let’s assume 200 leaves per Pod) it can go like this:
After all, these numbering and naming decisions are completely up to you and
your preference. Just try to make them simple. It will ease ACI troubleshooting and
operation later.
In case of need, you can even pre-provision a currently non-existing switch in the
ACI fabric. Here at the Registered Nodes tab, just click the symbol and choose
Create Fabric Member Node. After filling in the Pod ID, serial number, node ID, and
hostname, this switch will pop up in Nodes Pending Registration, but without the need
for a manual registration later. You can further refer to its NodeID in all the policies
around ACI as for an already available switch. After its discovery, it will be automatically
configured according to the existing policies and added to the fabric without your
intervention.
6. Finally, using the Intra Fabric Messaging (IFM) software layer, the
whole configuration policy is downloaded to the switch and its
control plane is configured accordingly. IFM is the main method
to deliver any system messages between software components
internally within APIC as well as between APIC and leaf switches.
The first leaf switch discovery process is shown in Figure 3-9. It applies analogously
to the other switches that are not directly connected to APIC; their broadcast DHCP
messages are simply relayed to APIC as unicast packets.
Figure 3-9. Discovery process between APIC and ACI leaf/spine switch
Now, since you have discovered and provisioned the first leaf switch, the discovery
process continues similarly with the rest of the fabric within a particular Pod (see
Figure 3-10):
2. The discovery process continues between the first leaf and all
spine switches. LLDP runs on their fabric interfaces and the
first leaf automatically relays spine DHCP Discover messages to
APIC. After the spine nodes registration, APIC ensures their VTEP
address assignment and provisioning.
4. As soon as the leaf identifies other APIC cluster members via LLDP
behind its interfaces, it installs static host /32 routes in the IS-IS
routing table pointing to their APIC VTEP addresses, and therefore
the first member can form a cluster with them.
For Multi-Pod discovery, you choose a "seed" Pod: the initial Pod that is completely
discovered first and from which the discovery of all the others is made. The TEP pool in
the initial APIC setup script should always be set to this seed Pod's pool. Even when an
APIC is connected to a non-seed Pod, it will still get a seed Pod VTEP IP address. The overall
Multi-Pod discovery process then follows these steps (as depicted in Figure 3-11):
1. The first leaf switch is discovered and provisioned in the seed Pod.
2. The rest of the fabric in the seed Pod is discovered and the spine
switches configure their IPN-facing interfaces and activate
forwarding with an appropriate routing protocol in place over
the VLAN 4 subinterface. The prefix of a seed Pod TEP pool is
propagated to IPN, and PIM Bidir adjacencies are created.
3. Even though they aren’t forwarding any data yet, spine switches
from other Pods are sending at least DHCP Discover messages on
their IPN-facing subinterfaces in VLAN 4. IPN routers, thanks to
the configured DHCP relay, proxy these messages to APIC in the
first Pod (using unicast), and the remote spines are discovered.
Figure 3-11. Multi-Pod discovery across the Inter-Pod Network (IPN) from the seed Pod with the APIC cluster
For more detailed information about how the IPN should be configured to provide
all the necessary functions for ACI Multi-Pod and to ensure dynamic discovery, please
refer to the last section in this chapter.
you with a comprehensive output including a set of 16 checks (in ACI version 5.2.3),
precisely analyzing every step related to the ongoing discovery. Not all checks
necessarily have to pass in order for the whole process to succeed (as you can see in
Listing 3-2).
===========================================================================
Check 11 Reachability to APIC
===========================================================================
Test01 Ping check to APIC PASSED
[Info] Ping to APIC IP 10.11.0.2 from 10.11.72.64 successful
===========================================================================
Check 12 BootScript Status
===========================================================================
Test01 Check BootScript download PASSED
[Info] BootScript successfully downloaded at
2022-03-10T10:29:17.418+01:00 from URL http://10.11.0.2:7777/fwrepo/boot/
node-FDO22160HJ7
===========================================================================
Check 13 SSL Check
===========================================================================
Test01 Check SSL certificate validity PASSED
[Info] SSL certificate validation successful
===========================================================================
Check 14 AV Details
===========================================================================
Test01 Check AV details PASSED
[Info] AppId: 1 address: 10.11.0.1 registered: YES
[Info] AppId: 2 address: 10.11.0.2 registered: YES
[Info] AppId: 3 address: 10.11.0.3 registered: YES
===========================================================================
Check 15 Policy Download
===========================================================================
Test01 Policy download status PASSED
[Info] Registration to all shards complete
[Info] Policy download is complete
[Info] PconsBootStrap MO in complete state
===========================================================================
Check 16 Version Check
===========================================================================
Test01 Check Switch and APIC Version PASSED
[Info] Switch running version is : n9000-15.2(3e)
[Info] APIC running version is : 5.2(3e)
When you hit any issues during the discovery phase, focus your attention on these
main areas:
• LLDP adjacencies
• Firmware versions
For LLDP, you can use a simple show command and compare the output with the
cabling matrix used during the hardware installation phase. So far, all the LLDP-related
problems I've seen stemmed from misconnected ACI underlay cabling. See Listing 3-3.
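A minimal check is the standard LLDP neighbor table on the switch, for example:

leaf101# show lldp neighbors

Each fabric interface should show the expected peer leaf, spine, or APIC from your cabling matrix.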
If this is the first leaf to be discovered, and it’s not appearing in APIC GUI or you
don’t see any APIC in its LLDP neighbors, you can check the protocol from the APIC
point of view as well using the acidiag run lldptool out eth2-1 and acidiag
run lldptool in eth2-1 commands. These outputs provide detailed information about
all sent and received LLDP TLVs.
After getting LLDP running, Infra VLAN should be announced to the switches,
installed in their VLAN database, and configured on interfaces facing APICs. To check
that it happened, use the commands shown in Listing 3-4.
In case of any problem with an Infra VLAN or wiring in general, try also verifying the
useful output shown in Listing 3-5.
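One way to pull an object like the one in the following output, assuming the moquery utility available on the switch CLI, is:

leaf101# moquery -c lldpIf

Any value in the wiringIssues property points you directly at the miscabled or misconfigured interface.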
childAction :
descr :
dn : sys/lldp/inst/if-[eth1/1]
lcOwn : local
mac : 70:0F:6A:B2:99:CF
modTs : 2022-03-10T10:27:04.254+01:00
monPolDn : uni/fabric/monfab-default
rn : if-[eth1/1]
status :
sysDesc :
wiringIssues : infra-vlan-mismatch
Next, should the switch indicate an inability to receive an IP address from APIC, check
whether the DHCP exchange is happening by using the well-known tcpdump Linux utility.
The kpm_inb interface represents the switch CPU's in-band channel for all control plane
traffic. This approach is useful in general when troubleshooting any control plane protocol
(such as LACP, dynamic routing, COOP, or monitoring protocols). See Listing 3-6.
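A minimal capture for the DHCP exchange on this interface could look like this:

leaf101# tcpdump -ni kpm_inb port 67 or port 68

You should see the switch's DHCP Discover/Request packets and the corresponding offers relayed from APIC.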
To secure all control plane traffic between ACI nodes, ACI uses SSL
encryption. The certificate for this purpose is generated from the device serial
number during manufacturing and is burned into a specialized hardware chip. The
commands shown in Listing 3-7 let you review the content of the certificate subject, and
with the -dates flag, verify its time validity. Any issue with SSL can be
resolved only in cooperation with a technical assistance engineer directly from Cisco.
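As a sketch of such a check (the certificate path is an assumption based on common ACI troubleshooting guides):

leaf101# cd /securedata/ssl
leaf101# openssl x509 -noout -subject -in server.crt
leaf101# openssl x509 -noout -dates -in server.crt

The subject should contain the switch serial number, and the validity dates should cover the current time.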
After the switch obtains IP connectivity with APIC, it should download its policies
and acknowledge that this process is successfully completed. The actual state of
bootstrapping can be verified by querying the following object, shown in Listing 3-8.
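One way to query it from the switch CLI, assuming the moquery utility, is:

leaf101# moquery -c pconsBootStrap

The output should resemble the following, with the state reported as completed: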
# pcons.BootStrap
allLeaderAcked : yes
allPortsInService : yes
allResponsesFromLeader : yes
canBringPortInService : yes
childAction :
completedPolRes : yes
dn : rescont/bootstrap
lcOwn : local
modTs : 2022-03-10T10:31:02.929+01:00
policySyncNodeBringup : yes
rn : bootstrap
state : completed
status :
timerTicks : 360
try : 0
worstCaseTaskTry : 0
Any potential problems with the reachability between the switch and APIC, resulting
in the inability to download bootstrap policies, can be identified using the iping utility,
as in Listing 3-9.
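For example, to test reachability of an APIC TEP address from the switch's overlay-1 VRF (IP taken from the earlier discovery output):

leaf101# iping -V overlay-1 10.11.0.2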
Finally, if you encounter some problems with a discovery, compare the version of
your image running on APICs vs. the version on the switches using the show version
command. The APIC version must be the same as or newer than the one used in the ACI
fabric. Otherwise, you won't be able to register these switches.
To wrap up this topic, from my experience, when the fabric is correctly cabled,
don't expect many issues with this first step in its lifecycle. The discovery process is very
reliable. I've encountered a few problems related to wrong versions or images running
on the Nexus switches. One time I saw malformed SSL certificates on APICs delivered from
the factory (but we were notified about this via a Field Notice), and even when we hit the
occasional bug, a reload always resolved the issue.
In the contract object, create at least one subject and add a set of filters defining all
allowed protocols on OOB interfaces. It’s like creating legacy ACL entries. Then the OOB
contract has to be associated with two entities, as displayed in Figure 3-14.
1. Default Out-Of-Band EPG found under Tenant -> mgmt -> Node
Management EPGs. Here, click the symbol in the Provided
Out-Of-Band Contracts field and associate the chosen contract.
This OOB EPG represents all mgmt0 and Eth2-1/Eth2-2 interfaces
in the ACI fabric.
I won’t go into more details of contract or tenant configuration here since I will cover
them completely in Chapter 5.
Figure 3-14. Out-of-band contract in the mgmt tenant applied between the Out-of-Band EPG (mgmt0 interfaces of all leaf, spine, and APIC nodes) and the External Management Network Instance Profile EPG representing the administrator's subnet(s)
Warning! Be careful when applying an OOB contract for the first time, because
this is the moment when you can lose management connectivity to all ACI
devices, including the APIC GUI (due to incorrect contract filters or external subnet
definition).
1. Connect to the CIMC interface of any APIC node from the cluster
and open the virtual KVM console over HTTP/S.
2. Log into the APIC CLI and follow the configuration steps
described in Figure 3-15.
Next, you have to create a new in-band EPG under the Node Management
EPGs folder, associate it with the inb bridge domain, and specify an in-band VLAN
(see Figure 3-17). This VLAN will become significant especially for APIC in-band
connectivity soon.
After the previous configuration steps, you should see a new in-band VRF present on
each ACI switch with one VLAN SVI interface. Its primary address has to match the default
gateway IP and the secondary address is the actual in-band IP of that switch. In other
words, each switch acts as a default gateway for itself. APIC also receives a new subinterface
in the corresponding in-band VLAN on top of the bond0 interface. See Listing 3-11.
apic1# ifconfig
bond0.9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1496
inet 192.168.3.252 netmask 255.255.255.0 broadcast 192.168.3.255
inet6 fe80::72e4:22ff:fe86:3a0d prefixlen 64 scopeid 0x20<link>
ether 70:e4:22:86:3a:0d txqueuelen 1000 (Ethernet)
RX packets 5841 bytes 275040 (268.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9009 bytes 1382618 (1.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
When ACI needs to connect any endpoint to its fabric in a particular VLAN, you have
to create a complete set of access policies: objects defining the exact leaf or spine interface
configuration along with the available encapsulation resources. In this special case (and
only in this one), each APIC node needs access policies as well to enable connectivity
over the in-band VLAN. In Figure 3-18, you can see an example of access policies for an
APIC connected behind leaf 101 and 102 interface Eth1/1. Each object will be
discussed in detail in Chapter 4.
Figure 3-18. Access policies for APIC allowing in-band VLAN connectivity: a Leaf Profile (with Leaf Selector) and a Leaf Interface Profile (with Port Selector and Port Block) tied to an Access Interface Policy Group (LLDP enabled), which references an AAEP, a Physical Domain, and a static VLAN Pool containing the in-band VLAN encap block
Finally, you have to allow the connectivity to the in-band IP addresses by applying a
contract between in-band EPG and any other internal or external EPG configured in the
ACI (see Figure 3-19). If you want to reach the in-band subnet from the external network,
it will require properly configured routing between ACI and the external network using a
L3OUT object. I will cover these features in Chapter 7.
Figure 3-19. Contract in the mgmt tenant between the in-band EPG (in-band SVI IPs of the leaf and spine switches) and external EPGs behind standard L3OUTs
In the default settings, APIC prefers the in-band interface as the outgoing one for all packets
sourced from its own IP address; under the hood, this setting alters the metrics of the
default routes. See Listing 3-12.
apic1# bash
admin@apic1:~> ip route
default via 192.168.3.1 dev bond0.9 metric 8
default via 10.17.87.1 dev oobmgmt metric 16
When you switch the preference to ooband, Listing 3-13 shows the routing
table output.
apic1# bash
admin@apic1:~> ip route
default via 10.17.87.1 dev oobmgmt metric 16
default via 192.168.3.1 dev bond0.9 metric 32
The previous setting applies exclusively to connections initiated from APIC toward
remote, not directly connected destinations. In the opposite direction, if a packet
enters a particular APIC interface from the outside, APIC will reply using the same
interface, regardless of this setting. Hence, even if you prefer in-band connectivity, you
can still reach the APIC GUI and CLI using the OOB IP address without any problem.
Figure 3-21. Date Time Policy configuration in the Pod policy group
The final step is to apply a policy group to a profile, selecting the exact switch
resources where the defined configuration should be deployed. In your case, open
Fabric -> Fabric Policies -> Pods -> Profiles and update the Policy Group field in
the Pod Selector (refer to Figure 3-22). Here you can choose exactly which Pod the
configuration specified in the policy group will be applied to (based on individual Pod
IDs or ALL). By Pod, ACI means all switches within it.
Eventually APIC instructs all the switches in selected Pods to configure the defined
NTP servers and you can verify it using the show ntp commands shown in Listing 3-14.
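Typical candidates for this verification (standard NX-OS-style NTP commands available on ACI switches) are:

leaf101# show ntp peers
leaf101# show ntp peer-status

The configured servers should be listed, with the selected peer marked with an asterisk.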
As before, this configuration needs to be deployed to the fabric using fabric policies.
Go to Fabric -> Fabric Policies -> Pods -> Profiles and make sure you have some Pod
policy group with the default BGP Route Reflector Policy applied to all your Pods. If you
configured NTP consistently for the whole fabric in the previous step, the same policy by
default ensures MP-BGP is running there as well.
To verify the MP-BGP sessions between leaf and spine switches, you can use the well-
known BGP show commands directly in the CLI. Look at the VPNv4 or VPNv6 address
families, as a single MP-BGP session transports prefixes of all VRFs available in ACI
(similar to MPLS L3VPN). See Listing 3-15.
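A sketch of such a verification on a leaf switch (the VPNv4 session lives in the infrastructure VRF overlay-1) might be:

leaf101# show bgp vpnv4 unicast summary vrf overlay-1

The spine route reflectors should appear as established neighbors.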
This time, it's not necessary to include the DNS Profile in any Pod policy group. Instead,
go to Tenants -> mgmt -> Networking -> VRFs -> oob and click the Policy tab in the
upper-right menu. Scroll down in the form and set the "default" DNS label as shown in
Figure 3-25.
This label association relates to the name of the DNS Profile from fabric policies.
APIC will apply the configuration to all switches where this particular VRF is present.
That’s why it is beneficial to use the mgmt tenant and the out-of-band management VRF,
which is pervasive across the whole fabric.
Note The DNS configuration will be applied to APIC only if you use the default
DNS Profile object, not your own.
To check the results of the DNS configuration, view the contents of the /etc/resolv.conf
file on APIC or the switches. See Listing 3-16.
nameserver 10.11.0.1
nameserver 8.8.8.8
Non-authoritative answer:
Name: google.com
Address: 142.251.36.78
If you have edited the default policy and you already have a Pod policy group associated with
a profile from a previous protocol configuration, any change will be effective immediately.
Just a small note: usually after submitting this form, the APIC web server needs to
restart, so it will suddenly disconnect you from the GUI. Don't be surprised.
TCAM memory. By enabling IP aging, IP entries get their own independent timers. At
75% of the retention timer, ACI sends an ARP/ND request three times; if no reply is received,
the IP entry is removed from the endpoint database.
Rogue Endpoint Control
Another best practice recommendation is found in System -> System Settings ->
Endpoint Controls -> Rogue EP Control. After enabling this feature, ACI will declare
an endpoint rogue when it is flapping between multiple leaf interfaces more times than
the configured “Detection Multiplication Factor” during a “detection interval.” A rogue
endpoint is temporarily statically pinned to the last seen interface for a “hold time”
interval so we can avoid rapid, unsolicited updates to the control plane and make the
ACI much more stable. After the hold time expiration, the endpoint is removed from the
database.
QoS Setting – DSCP Translation
Go to Tenant -> infra -> Policies -> Protocol -> DSCP class-CoS translation policy
for L3 traffic and enable it. This ensures that all ACI QoS classes are translated
to DSCP values in the outer IP header of a VXLAN packet when sending it through the
IPN/ISN. There, you can ensure priority forwarding or other manipulation of the
specific ACI traffic classes. Generally, it's recommended to use DSCP values that are currently
free and not already classified for other purposes in the IPN/ISN. And make sure to prioritize
the Control Plane Traffic class between your Pods, which ensures stable and reliable
MP-BGP and APIC communication.
Fabric Port Tracking
This feature enables monitoring of a leaf’s fabric interfaces (uplinks). If the number
of active ones falls below the configured threshold, the leaf will disable all its host-facing
interfaces. The expectation is that this forces the end hosts to fail over to a healthy, active
leaf. A best practice recommendation is to set the mentioned threshold to 0 active fabric
interfaces. You can find this knob in System -> System Settings -> Port Tracking.
Global AES Encryption
From both a security and operational point of view, it’s highly recommended to
globally enable encryption of sensitive data in ACI (e.g., passwords or credentials for ACI
integration). Without this option, when you create an ACI backup, the configuration is
exported without any passwords. Therefore, after the potential import, you could break
multiple ACI functionalities. Encryption can be enabled in System -> System Settings ->
Global AES Passphrase Encryption Settings. Set a minimum 16-character passphrase
and you are ready to go.
ACI supports SNMPv1, v2c, and v3 with a wide selection of both standardized
SNMP MIBs and Cisco-specific MIBs, all of which can be found in the ACI MIB
Support List: www.cisco.com/c/dam/en/us/td/docs/Website/datacenter/aci/mib/
mib-support.html.
ACI doesn't allow you to configure anything using SNMP, so the only supported
operations are read queries and the sending of SNMP traps.
Note If you have implemented out-of-band contracts from the previous section,
don’t forget to add the filters allowing UDP ports 161 (SNMP agents) and 162
(SNMP servers); otherwise, you won’t be able to connect to the ACI switches and
APICs using SNMP or receive SNMP traps from them.
The SNMP configuration in ACI consists of a fairly simple three-step process. For
SNMP traps, you need to configure receiver and data sources; for SNMP read, you will
create and apply one more SNMP fabric policy.
First, go to Admin -> External Data Collectors -> Monitoring Destinations ->
SNMP Tab and click to create a SNMP monitoring destination group. Here you will
define one or more SNMP trap receiver servers, their UDP port, version, community, and
management EPG preference (as shown in Figure 3-27).
Now you need to associate SNMP trap destinations with several data sources. By
doing so, you basically define which objects you want to receive events about on your
collector. Due to ACI's differentiation between access interfaces, fabric uplinks, and
tenant objects, the trap sources have to be configured in several places:
• Fabric -> Fabric Policies -> Policies -> Monitoring -> default (or
common or your own) -> Callhome/Smart Callhome/SNMP/
Syslog -> SNMP Tab
• Fabric -> Access Policies -> Policies -> Monitoring -> default (or
your own) -> Callhome/Smart Callhome/SNMP/Syslog ->
SNMP Tab
The general recommendation is to configure the monitoring for all objects in each
category. You don’t want to miss any important information, after all. For simplicity, the
SNMP trap source configuration form is always identical. ACI allows you to granularly
apply different policies for each individual object if you click the little symbol, or you
can leave the same setting for ALL (as shown in Figure 3-28).
To deploy these policies to the infrastructure, in Tenants, click the main tenant object
(its name) and open the Policy tab. There you will find the Monitoring Policy field with
the ability to associate a created policy. APIC will automatically configure SNMP traps on
all switches where its logical constructs (application policies) are instantiated.
In both fabric sections you can put individual monitoring policies into an Interface
Policy Group, which is then associated with the Interface Profile and Switch Profile (see
Figure 3-29). Detailed theory behind these objects will follow in Chapter 4. For now,
consider it as a configuration for individual leaf/spine physical interfaces.
A legitimate complaint is that this approach can become harder to maintain over
time: three different places, plus a monitoring policy for each Interface Policy Group
in Fabric and Access Policies. Why can't you use one common policy for the whole
ACI, for the entire set of objects? In fact, you can. If you noticed, there is a common
monitoring policy in Fabric -> Fabric Policies -> Policies -> Monitoring. If you apply
any configuration there, it will instantly have a global impact on all ACI objects, even
without any additional association. The common policy is resolved as a last resort, when ACI
cannot find any other, more specific, default or custom monitoring policy object.
Last but not least, you must configure standard read access using SNMP. There is
only one additional policy needed, located in Fabric -> Fabric Policies -> Policies ->
Pod -> SNMP -> default (or create your own). Have the Admin state set to Enabled.
In the client group policies, you have to specify the source IPs allowed to communicate
with your devices using SNMP (consider it a kind of SNMP ACL), and finally set an SNMP
community or SNMPv3 user in the respective fields. This object is, as before, applied in
Fabric -> Fabric Policies -> Pods -> Policy Groups and subsequently to the Pod profile.
If you use the default objects and have them already associated from previous configuration
tasks, you don't need to do anything more here.
Now you should be able to access all your ACI switches and APICs via SNMP. The
configuration can be verified using the commands shown in Listing 3-17.
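The outputs below come from SNMP show commands on the switch and the APIC; likely candidates, assuming the standard ACI CLI, are:

leaf101# show snmp summary
apic1# show snmp hosts
apic1# show snmp clientgroups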
----------------------------------------------------------------------
Community Context Status
----------------------------------------------------------------------
snmpcom ok
----------------------------------------------------------------------
User Authentication Privacy Status
----------------------------------------------------------------------
----------------------------------------------------------------------
Context VRF Status
----------------------------------------------------------------------
----------------------------------------------------------------------
Client VRF Status
----------------------------------------------------------------------
10.17.84.48 management ok
---------------------------------------------------------------------
Host Port Ver Level SecName VRF
---------------------------------------------------------------------
10.10.1.1 162 v2c noauth snmpcom management
----------------------------------------
Community Description
----------------------------------------
snmpcom
------------------------------------------------------------
User Authentication Privacy
------------------------------------------------------------
------------------------------------------------------------
Client-Group Mgmt-Epg Clients
------------------------------------------------------------
Zabbix default (Out-Of-Band) 10.17.84.48
------------------------------------------------------------
Host Port Version Level SecName
------------------------------------------------------------
10.10.1.1 162 v2c noauth snmpcom
Faults
Faults are objects describing any kind of problem in the system. Each fault is a child
object of the affected object, with a unique fault code, severity, description, and
lifecycle, transitioning between the following states:
• Retaining: The resolved fault is left in the retaining state for 3600
seconds by default and then it is moved to a history database.
To find the currently present faults in ACI, you can navigate to their central list in
System -> Faults. Besides this, each object has its own Faults tab in the upper-right
corner of its configuration form. In Figure 3-30, you can see the properties of a fault
object. When the description isn't clear enough, look in the Troubleshooting tab for more
information about the recommended procedure.
Events
An event represents a more traditional type of object, similar to a log message. It
describes a condition that occurred at a specific point in time, without the necessity for
user attention. It doesn't have any lifecycle, so once created, it is never changed. You can
find the list of all events either in System -> History -> Events, or selectively in the History
tab of a specific object. The content of an event object is shown in Figure 3-31.
Audit Logs
Accounting entries describe which ACI user did what action in APIC, such as the creation,
modification, or deletion of any object (as shown in Figure 3-32). The complete list
of these logs is in the traditional place of System -> History -> Audit Logs or in the
History tab of a particular object.
Session Logs
As you can see in Figure 3-33, the last type of logging entries are session logs. They
basically inform an administrator about both successful and failed authentication to the
ACI devices using SSH or the REST API (they include standard GUI logins as well). They
can be found only in one place in ACI: System -> History -> Session Logs.
Syslog Configuration
All previously described objects have only limited retention storage available
when saved locally on the APIC cluster. You can configure these properties in
System -> Controllers -> Retention Policies, but the maximum numbers of log objects
are 1 million entries for audit logs and 500,000 for events and faults.
A much better approach, and definitely a best practice, is to translate these native ACI
monitoring objects into standardized Syslog messages and export them to an external collector.
The configuration itself follows a very similar principle to SNMP traps. You start with
the definition of an external Syslog collector in Admin -> External Data Collectors ->
Monitoring Destinations -> Syslog. Create a new policy. I recommend checking both
the Show Milliseconds in Timestamp and Show Time Zone in Timestamp options. Here you
can also enable logging to the console and to a local file for a defined log severity. On the
next page, fill in the information about the Syslog server: IP address, minimal exported log
severity, transport protocol, port, and preferred management EPG to reach the server.
The second part of the configuration consists of defining data sources for Syslog
messages. Similar to SNMP traps, you have to configure it either three times in three
different places, or use a single common policy in Fabric -> Fabric Policies:
• Fabric -> Fabric Policies -> Policies -> Monitoring -> default (or
common or your own) -> Callhome/Smart Callhome/SNMP/Syslog ->
Syslog Tab
• Fabric -> Access Policies -> Policies -> Monitoring -> default
(or your own) -> Callhome/Smart Callhome/SNMP/Syslog ->
Syslog Tab
You can configure the same Syslog source policy for ALL ACI objects or choose
individual settings for different objects by clicking the little symbol in the policy
configuration window. Finally, don't forget to apply the created monitoring policies to
the respective objects (tenant policies or interface policy groups and profiles).
Verification of the Syslog configuration can be performed by looking at the Syslog
collector or by listening for Syslog messages on the eth0 (the physical mgmt0) interface using
the tcpdump utility (if you chose the OOB EPG during the configuration). See Listing 3-18.
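For example, assuming the OOB EPG and the default UDP Syslog port:

apic1# tcpdump -ni eth0 udp port 514

Generated faults and events should show up as Syslog packets toward your collector's IP.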
You can also generate a test Syslog message with the severity of your choice directly
from APIC, as shown in Listing 3-19.
NetFlow
The last, but not less important, monitoring protocol I will describe is NetFlow. From
the previous chapter, you know that Nexus 9000s contain a hardware-based flow table,
collecting metering information about all traffic seen in the data plane, such
as source and destination MAC/IP addresses, TCP/UDP ports, VLANs, Ethertype, IP
protocol, source interfaces, number of packets transferred, and more. In ACI mode, with
the assistance of the CPU, you can transform this flow table data into standardized
NetFlow (v1, v5, or v9) records and continuously export them to a collector for further
analysis and processing. NetFlow information can be used, for example, for network
monitoring and planning, usage billing, anomaly detection, or general traffic accounting.
In ACI, you can gather NetFlow data only on the ingress of individual leaf host
interfaces. However, the flows are collected for all packets even before any security
policy is applied to them. Therefore, you are also able to see in your NetFlow collector
information about flows that were actually denied by the switch due to an applied or
missing contract and thus not forwarded further through the fabric.
A prerequisite for enabling NetFlow in ACI is to reconfigure the default Fabric Node
Control policy located in Fabric -> Fabric Policies -> Policies -> Monitoring -> Fabric
Node Controls -> default. This setting by default instructs Nexus 9000 HW TCAMs to
prefer and install the Cisco Tetration Analytics sensor, which is mutually exclusive with
the hardware NetFlow collection. Prior to deploying NetFlow, configure this policy to
NetFlow Priority (see Figure 3-34).
The NetFlow configuration follows the Flexible NetFlow structure used in Cisco IOS
devices, so you will need to create Record, Monitor, and Exporter policies and apply
the results either to the leaf host interface or the bridge domain (logical L2 segment).
All three NetFlow configuration objects are located in Tenants -> <tenant_name> ->
Policies -> NetFlow or Fabric -> Access Policies -> Policies -> Interface -> NetFlow.
NetFlow Exporter
The first component to configure is NetFlow Exporter. In this policy, you define how to
reach the external data collector, which can be connected in any internal application
EPG or behind the external L3 network (in ACI represented by L3OUT and external
EPG). Most of the fields are self-explanatory, as you can see in Figure 3-35. In the Source
Type and Source IP Address you can define what IP address the leaf will use when
sending the flow information to the collector to identify itself (TEP, OOB, in-band, or
custom). In case of a custom source IP, make sure to use a subnet that leaves at least
12 host bits (a /20 mask or less specific), because ACI switches encode their Node ID
into the last 12 bits of the defined subnet to form the source address.
NetFlow Record
Next, you configure NetFlow Record, where you specify which flow information and
statistics you will match and collect (see Figure 3-36). Match parameters must be set with
respect to the IPv4, IPv6, or Ethernet address families. You can use these combinations
or their subsets:
NetFlow Monitor
In the final object, NetFlow Monitor, you join the flow record together with the flow
exporters (as shown in Figure 3-37). Only two simultaneously active flow exporters are
supported on a particular leaf switch.
The finished NetFlow Monitor is then applied to an individual Interface Policy
Group in Fabric -> Access Policies, or to Tenants -> <tenant_name> -> Networking ->
Bridge Domains -> <BD_name> -> Policy tab -> Advanced/Troubleshooting tab (as
shown in Figure 3-38). If you apply a NetFlow policy to the bridge domain, APIC will
automate the configuration on all leaf interfaces belonging to it and filter the information
about incoming traffic to the related BD subnets.
To verify the state of your flow cache, you can issue the command on the leaf switch
shown in Listing 3-20.
One of ACI's significant advantages lies in its automation capabilities. You can
apply a configuration to hundreds of switches across multiple datacenters from one
centralized place. And in the same way, you can very easily back up the whole
configuration of your datacenter networking. Thanks to the object information
model used in ACI, all configuration components can be represented and described
in text form (JSON or XML) and therefore easily exported into one single text file.
APIC additionally compresses it into a tar.gz archive, so the resulting size of the whole
configuration is usually a few MB at most.
A backup in ACI can be configured as a one-time job, or you can granularly schedule
recurring exports on an hourly, daily, or weekly basis. The resulting archive is either
saved locally on APIC or automatically uploaded to a remote location using FTP, SFTP,
or SCP. Remember to enable global AES encryption, described in a previous section, to
ensure that sensitive data is also exported as part of the configuration file (of course,
in encrypted form).
The easiest and fastest way to create a one-time local backup is to navigate to
Admin -> Config Rollbacks. There, select Config Rollback for Fabric, optionally add a
description, and click Create a snapshot now (as shown in Figure 3-39).
When you select a specific snapshot, returning to this previous state of the ACI
configuration is a matter of one click on the Rollback to this configuration button.
Optionally, you can further compare the selected snapshot with others and see
the changes. I highly recommend creating such a snapshot before implementing any
significant change. When something goes wrong or a change causes an outage, you don't
want to spend too much time troubleshooting. The best solution in such a case is a fast
rollback.
Let’s now create a remote export location in Admin -> Import/Export -> Remote
Locations (as shown in Figure 3-40). Fill in the IP address of a remote server, protocol,
path, credentials, and, as usual, the preferred management EPG to reach it.
In the Operational tab of the Configuration Export Policy, you can check the
outcome of the implemented backup. If something failed during the process, there you
will find the error messages and continue with troubleshooting. The most common
problems with backups are related to server connectivity, credentials, or a wrong remote
path specified.
The reference IPN topology for this section consists of two ACI Pods interconnected by four IPN devices: each Pod has its internal TEP pool plus an external routable TEP pool, the spines peer with the IPN over VLAN 4 subinterfaces running OSPF/eBGP in area 0.0.0.0 and PIM BiDir, primary and backup phantom RPs are spread across the IPN devices, and each spine's MP-BGP Router ID is taken from its Pod's external routable TEP pool.
Before the ACI Multi-Pod deployment itself, prepare the following IP information:
• Internal TEP pool (/16 to /23): One per Pod (the first "seed" Pod should
already be addressed)
• External routable TEP pool (/24 to /29): One per Pod, used for
control plane and data plane forwarding between Pods. For each
spine, an MP-BGP Router ID will be chosen from this pool during the
configuration, and APIC reserves three more IP addresses: Anycast-MAC
External TEP, Anycast-v4 External TEP, and Anycast-v6 External
TEP. These are used for Multi-Pod VXLAN bridging and routing purposes.
really matter which devices you use as long as they support the necessary features. Each
configuration part is meant to be implemented on all IPN devices unless stated otherwise.
OSPF/eBGP Process
Since ACI version 5.2(3), both OSPF and eBGP are supported as the IPN underlay
routing protocol, so choose based on your preference or other operational aspects. If
you've created a dedicated IPN VRF, these protocols need to run as part of it. See Listing 3-23.
feature ospf
router ospf IPN
  vrf IPN
    router-id 1.1.1.1
    log-adjacency-changes
    passive-interface default
At the time of writing, only eBGP peerings are supported between the IPN and the
spines, so make sure to specify the correct neighbor IP addresses and autonomous
system. On the ACI side, the BGP AS is configured during the internal MP-BGP provisioning. See
Listing 3-24.
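A sketch of the corresponding NX-OS configuration on an IPN device (the AS numbers are examples, and the neighbor address is the spine side of the /31 used on Ethernet1/1.4 below):

! AS 65001 (IPN) and AS 65000 (ACI) are example values
feature bgp
router bgp 65001
  vrf IPN
    router-id 1.1.1.1
    neighbor 192.168.11.0
      remote-as 65000
      description Spine118
      address-family ipv4 unicast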
Then it's important to set the MTU to a value at least 50 bytes higher than the
maximum size used on your ACI host-facing interfaces or the control plane MTU setting,
whichever is higher. The default MTU in ACI for the control plane and endpoints is 9000 bytes.
See Listing 3-25.
interface Ethernet1/1
  mtu 9216
  no shutdown

interface Ethernet1/1.4
  description Spine118
  mtu 9216
  encapsulation dot1q 4
  vrf member IPN
  ip address 192.168.11.1/31
  no shutdown

interface Ethernet1/2
  description IPN2
  mtu 9216
  vrf member IPN
  ip address 192.168.0.0/31
  no shutdown

interface Ethernet1/3
  description IPN3
  mtu 9216
  vrf member IPN
  ip address 192.168.0.2/31
  no shutdown
This PIM variant creates common bidirectional shared trees for multicast traffic
distribution from the rendezvous point (RP) to receivers and from the sources to the
rendezvous point. Both sources and receivers are represented by ACI spines when
sending any broadcast, unknown unicast, or multicast (BUM) traffic between Pods. It is
optimally scalable without the need for thousands of (S, G) entries in multicast routing
tables of IPN devices. Only (*, G) entries are present instead. Furthermore, multicast
trees in IPN are created thanks to IGMP as soon as the new bridge domain is activated in
ACI, so you don’t need to wait for the first multicast packet.
PIM Bidir cannot use standard Anycast RP, AutoRP, or BSR mechanisms for dynamic
RP announcements and redundancy. Instead, you will use the concept of statically
configured phantom RPs. Only one particular IPN device acts as the RP for the whole
ACI /15 multicast pool, or for the specific subset of multicast IP addresses, and in case of
failure, another IPN will take over. The RP placement and failover completely relies on
the basic longest prefix match routing to the RP IP address, which in fact isn’t configured
anywhere. From an addressing point of view, it just belongs to a loopback subnet with
variable mask lengths configured across the IPN devices. The one with the most specific mask
(longest prefix match) on its loopback interface acts as the current RP. That's the reason
for calling it "phantom." If it isn't already, this will become clearer when you go through the
configuration example below.
Additionally, in your sample architecture with four IPN devices, assuming the default
GIPo multicast pool was used during APIC initialization, you can quite easily load-balance
the multicast traffic while ensuring redundancy. Just split the /15 pool
into two /16s and implement the RP configuration like this:
• IPN1 is the primary RP for multicast range 225.0.0.0/16 and the first
backup RP for 225.1.0.0/16
• IPN2 is the primary RP for multicast range 225.1.0.0/16 and the first
backup RP for 225.0.0.0/16
• IPN3 is the second backup RP for multicast range 225.0.0.0/16 and the
third backup RP for 225.1.0.0/16
• IPN4 is the second backup RP for multicast range 225.1.0.0/16 and the
third backup RP for 225.0.0.0/16
But why have second and third backups with just two ACI Pods? If you lose both
IPN 1 and 2, the backups won't actually be used for anything. But imagine having these four
IPN devices interconnecting three or more ACI Pods. Then, even after losing the two main IPN
devices, you want to ensure that multicast keeps working and the additional backups take
over. I used to implement the phantom RP this way for our customers in production. See
Listings 3-26 through 3-28.
IPN1:
interface loopback1
 description BIDIR Phantom RP 1
 vrf member IPN
 ip address 192.168.254.1/28
 ip ospf network point-to-point
 ip router ospf IPN area 0.0.0.0
 ip pim sparse-mode
interface loopback2
 description BIDIR Phantom RP 2
 vrf member IPN
 ip address 192.168.255.1/27
 ip ospf network point-to-point
 ip router ospf IPN area 0.0.0.0
 ip pim sparse-mode

IPN2:
interface loopback1
 description BIDIR Phantom RP 1
 vrf member IPN
 ip address 192.168.254.1/27
 ip ospf network point-to-point
 ip router ospf IPN area 0.0.0.0
 ip pim sparse-mode
interface loopback2
 description BIDIR Phantom RP 2
 vrf member IPN
 ip address 192.168.255.1/28
 ip ospf network point-to-point
 ip router ospf IPN area 0.0.0.0
 ip pim sparse-mode

IPN3 and IPN4 are configured analogously, with progressively less specific masks on their loopbacks so they act as the further backups.
interface Ethernet1/2
ip pim sparse-mode
no shutdown
interface Ethernet1/3
ip pim sparse-mode
no shutdown
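To confirm that the phantom RP design behaves as intended, a few standard NX-OS multicast show commands are usually enough (shown here as a sketch, run in the IPN VRF):

show ip pim rp vrf IPN
show ip mroute vrf IPN
show ip route vrf IPN

The device currently advertising the most specific prefix covering the (unconfigured) RP address should be reported as the active RP, and after its failure the unicast route toward the RP should converge to the loopback with the next-longest mask.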
For your purposes, the most important traffic classes are Control Plane Traffic with CS4 (DSCP value 32) and Policy Plane Traffic with Expedited Forwarding (DSCP value 46). You will match these two in the Nexus 9000 QoS CLI. If you have deployed other networking devices as IPNs, just adapt the configuration to their CLI.
First, create a QoS class map matching the control plane DSCP values and a policy map classifying the traffic into internal N9K qos-group 7. The policy map is then applied to all IPN ingress interfaces where the spines are connected. See Listing 3-30.
interface Eth1/1.4
description Spine291
service-policy type qos input ACI_QoS_Classification
interface Eth1/2.4
description Spine292
service-policy type qos input ACI_QoS_Classification
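The classification policy referenced in Listing 3-30 could be built roughly as follows. This is only a sketch: the class map name is arbitrary, while the DSCP values, the qos-group, and the policy map name match what is shown in the verification output of Listing 3-33:

class-map type qos match-any ACI_CP_Classes
  match dscp 32,46
policy-map type qos ACI_QoS_Classification
  class ACI_CP_Classes
    set qos-group 7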
Next, you need to prepare the queuing configuration, ensuring that the matched control plane traffic gets priority in case of any interface congestion. The class names in the following example are predefined on Nexus 9000 and map to the internal QoS groups. So, if you classify ACI control plane traffic into qos-group 7, it will be part of class c-out-8q-q7 in the following queuing policy map. Any control plane traffic in this class will be put into the priority queue. See Listing 3-31.
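A minimal sketch of such a queuing policy map is shown below. The policy map name matches the one applied globally in Listing 3-32, the priority assignment for class c-out-8q-q7 is the key part, and the bandwidth value for the remaining default class is just a placeholder:

policy-map type queuing ACI-8q-out-policy
  class type queuing c-out-8q-q7
    priority level 1
  class type queuing c-out-8q-q-default
    bandwidth remaining percent 70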
Note As you can see, Nexus 9000s support eight different traffic classes for
queuing. In the same way as you configured priority for the control plane, you
can reserve bandwidth for any other ACI traffic class defined in APIC. Just match
different DSCP values, put the traffic in the respective qos-group, and define the
amount of bandwidth available for a particular queue in the previous queuing
policy map.
The completed queuing policy map now needs to be globally assigned on the switch
using the commands shown in Listing 3-32.
system qos
service-policy type queuing output ACI-8q-out-policy
To verify QoS operation, first have a look at ingress interfaces to see if there are
matched control plane packets, defined in your class map. See Listing 3-33.
Ethernet1/1.4
Service-policy (qos) input: ACI_QoS_Classification
SNMP Policy Index: 285213859
Slot 1
156729 packets
Aggregate forwarded :
156729 packets
Match: dscp 32,46
set qos-group 7
Second, check the IPN output interface facing the IP network between the Pods from a queuing perspective, as shown in Listing 3-34.
Listing 3-34. IPN Custom Queuing Verification for ACI Control-Plane Traffic
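On Nexus 9000, the per-class queuing counters on a Pod-facing uplink are typically checked with commands along these lines (the interface ID here is only an example):

show queuing interface ethernet 1/49
show policy-map interface ethernet 1/49 type queuing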
Now, with all the previous IPN configuration in place, you are ready to create
additional necessary objects in ACI and enable Multi-Pod features.
• Page 2 “Pod Fabric:” Allows you to configure the Pod ID, TEP pool,
and L3 interfaces used between spines and the IPN
After submitting the whole form, repeat the process for all other Pods present in your
ACI infrastructure.
Check that each IPN-facing spine switch has the following IP interfaces configured
for Multi-Pod operation (loopback numbers may vary):
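A quick way to list them is directly from the spine CLI, for example (a sketch):

show ip interface brief vrf overlay-1

You should see the sub-interfaces toward the IPN together with the loopback interfaces used as Multi-Pod tunnel endpoints.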
Next, you should find the MP-BGP peerings in VRF overlay-1 on spines for address
families VPNv4, VPNv6, and L2VPN EVPN. See Listing 3-37.
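On the spine CLI, these sessions can be listed per address family, for example (a sketch):

show bgp l2vpn evpn summary vrf overlay-1
show bgp vpnv4 unicast summary vrf overlay-1
show bgp vpnv6 unicast summary vrf overlay-1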
For endpoint connectivity to work between ACI Pods, both local and remote
endpoint entries should be visible in the MP-BGP database (see Figure 3-44). The EVPN
information is quite complex, but I will cover it later in detail as part of Chapter 6.
Figure 3-44. Remote endpoint entry in the spine MP-BGP EVPN table (spine# show bgp l2vpn evpn vrf overlay-1): the output shows the route distinguisher and VNI, the BGP path attributes, and is annotated with the external TEP IP and the control-plane TEP, plus the route target, SoO, and ENCAP (VXLAN VNI) extended communities
And finally, you should see the same information redistributed from MP-BGP to
the local COOP database, which can be checked by issuing the command shown in
Listing 3-38.
IP address : 192.168.33.10
Vrf : 2850817
Flags : 0x2
EP bd vnid : 16187326
EP mac : 00:50:56:89:45:86
Publisher Id : 172.16.22.1
Record timestamp : 01 01 1970 00:00:00 0
Publish timestamp : 01 01 1970 00:00:00 0
Seq No: 0
Remote publish timestamp: 03 19 2022 17:16:03 301209130
URIB Tunnel Info
Num tunnels : 1
Tunnel address : 10.22.0.34
Tunnel ref count : 1
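The record above is typically retrieved on a spine with a command along these lines, keyed by the bridge domain VNID and the endpoint IP visible in the output (a sketch):

show coop internal info ip-db key 16187326 192.168.33.10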
Summary
By the end of this chapter, you should have gained the necessary knowledge and
practical skills to initialize ACI for both single-fabric and Multi-Pod architectures.
You’ve gone through the automated switch discovery process, out-of-band and in-band
management access, best practice recommendations for the initial fabric configuration,
and protocols related to the fabric management, operation, and monitoring. You also
learned how to correctly configure an Inter-Pod Network to support all the necessary
features for ACI.
Since the fabric is currently up and running, you can start connecting endpoints to it.
The next chapter will focus on ACI access policies representing configuration options for
the physical underlay leaf host interfaces.
CHAPTER 4
ACI Fundamentals:
Access Policies
ACI, due to its application-centric philosophy and multi-tenancy, differentiates between
the common physical underlay network consisting of leaf/spine switches and the
logical network model describing tenant application policies and the logical network
configuration (VRFs, BDs, EPGs, etc.). In this chapter, you will look closer at underlay
access policies, the first and very important configuration aspect to ensure end host
connectivity.
Access policies are a group of universal configuration objects, associated with each other, that completely describe the settings for individual end-host-facing interfaces on leaf or spine switches. This implies that all end hosts, without exception, need them.
It doesn’t matter if you plan to connect a bare-metal server, virtualization hypervisor,
external router, legacy switch, firewall, load balancer, or IPN/ISN device. All of them
must have an entire set of access policies in place to communicate through ACI.
Additionally, they describe encapsulation resources (VLAN IDs or VXLAN VNIDs)
usable on specified interfaces. The important term is usable. In access policies you
don’t define actually used encapsulation or how it will be specifically configured on
particular interface (e.g., mode access or trunk). You just grant a usage permission for
encapsulation IDs and the mapping itself will happen later in the tenant policies.
Access policies are global and usually configured by the global ACI administrator.
This especially applies to datacenter service providers. On the other hand, tenants can
be maintained by completely different subjects without admin rights to the whole ACI
configuration. By the proper design of access policies, the global admin can easily ensure
separation of hardware and encapsulation resources between ACI tenants, preconfigure
the underlay networking for them, and simply expose all the settings using the single
object mapping I will describe in the following sections.
In Figure 4-1, you can view the complete set of access policies and their relation to tenant
policies. I will gradually focus on each object and describe its significance for the whole.
Figure 4-1. Overview of ACI access policies with relation to tenant EPGs
The philosophy of access policy configuration is the same as with fabric policies in the
previous chapter. First, you need to create individual policy objects where each policy is
related to a specific protocol or functionality, but without an exact affiliation to a concrete
switch or interface. Consider them to be universal building blocks. Then you create lists of
individual policies (kind of a policy template) called policy groups. Finally, the policy group
is applied to a switch or interface profile. In general, it doesn’t matter in which order you
create individual access policy objects, just make sure to have all of them at the end.
Switch Policies
You will start the configuration by creating switch policies. These objects define global
switch settings related to access connectivity and vPC domain parameters. Most
importantly, they allow you to specify switch node IDs on which you will later configure
individual access interfaces.
Mostly I don’t customize default individual switch policies if not explicitly required,
but there is one exception almost always. As compared to legacy NX-OS mode, when
you plan to create vPC port channels, a pair of switches has to be bundled into a vPC
domain. For this purpose, in switch policies, navigate to the Virtual Port Channel
default object. Here you can create so-called explicit VPC protection groups (as shown
in Figure 4-2). The logical pair ID corresponds to the vPC domain ID.
Notice that each pair of leaves in the vPC receives another common "Virtual IP" from the respective Pod VTEP pool. This IP represents their anycast VTEP address: from that moment on, all traffic originating from or destined to a vPC port channel will be routed inside the fabric to this virtual anycast VTEP address instead of the individual switch-specific addresses.
vPC policy doesn’t need to be applied anywhere; it’s sufficient to create a protection
group, and the configuration is immediately pushed to the fabric. On ACI leaves, the vPC
domain is implemented in a very similar manner compared to NX-OS, just without need
for a physical vPC peer-link and peer-keepalive between leaves. The vPC control plane is
established via spine switches instead. To verify the configuration deployment, you can
use well-known show commands from NX-OS, as shown in Listing 4-1.
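For instance, a quick check directly on one of the vPC leaves could look like this (a sketch; the output closely resembles classic NX-OS):

show vpc

It should report the vPC domain ID configured through the protection group and the peer status, even though no dedicated physical peer-link exists.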
Switch Profile
In Fabric -> Access Policies -> Switches -> Leaf Switches -> Switch Profiles, you can
create the most important object of switch policies: the switch profile. It allows you to
map switch policy groups to concrete leaf or spine switches based on their Node IDs
and at the same time it defines on which switches you will configure interfaces when the switch profile is associated with the interface profile (as shown in Figure 4-4).
One switch can be part of multiple profiles; you just can’t overlap the interfaces
configured in them. ACI displays an error message in such a case.
Figure: Switch profile structure: the leaf switch profile (Lf101_102_SwProf) contains a leaf selector (Lf101_102_SwSel) with the node IDs and references a switch policy group (Lf101_102_PolGrp), which points to optional custom policies such as a BFD policy (IPv4_BFD_Enable), a forwarding scale profile (High_LPM_ScaleProfile), and a CoPP leaf policy (CoPP_Permissive), alongside the explicit vPC protection group (Leaf101_102_vPC).
Interface Policies
Switch policies universally describe ACI leaves or spines and their global access settings,
but not how their interfaces should be configured. For this purpose, you will use
interface policies, another set of stand-alone ACI objects. Only after associating interface
profiles with switch profiles will you achieve the resulting access interface configuration.
Individual interface policies range from simple protocol settings (such as enabling or disabling the CDP or LLDP protocols) to more complex types such as a port channel or data plane policing, involving various configurable fields, values, and thresholds. Similar to switch policies, make sure to use descriptive object names here as well. It will ease your life later during policy group definition.
If you are wondering what the most important individual policies for access
interfaces are, I created this basic set of objects during the initial ACI provisioning for
customers:
• L2 Interface: VLAN_Port_Local_Scope
All these objects must be created only once at the beginning of ACI provisioning and
then you just keep reusing them in interface policy groups.
Another significance of interface policy groups lies in their types. In most cases, you will choose from the three main options: a leaf access port policy group (for individual interfaces), a PC interface policy group (for port channels), or a VPC interface policy group (for virtual port channels).
Tip Pay special attention to the name of the PC and VPC interface policy group. ACI will display it in most show commands and GUI outputs related to port channels instead of the actual physical interface name or port channel ID. Therefore, it's good practice to encode at least the node ID and the corresponding physical interface on that switch into the interface policy group name. Example: Lf101_102_05_VPC, where 101 and 102 stand for the node IDs of the vPC pair and 05 is the physical vPC member port on both of them.
With interface policy groups you can also configure interface breakout (to split,
for example, a 100G interface into 4x25G) or SAN interfaces such as a fiber channel
individual or port channel. These are minor options, and they are not used often.
Interface Profile
The ACI access configuration philosophy leads us to interface profiles, located in
Fabric -> Access Policies -> Interfaces -> Leaf Interfaces -> Profiles. As shown in
Figure 4-7, now you create object mapping between interface policy groups and concrete
interface identifiers (e.g., Ethernet 1/5). However, it is another universal object, so you
don’t specify on which switch these interfaces are configured yet.
Each interface profile contains multiple interface selectors, which can further contain either a single interface entry or a range of multiple interfaces. I always put only one interface per selector, though. The main reason is to preserve the ability to change the configuration of any individual interface in the future. Otherwise, if you're using interface ranges and a sudden single-interface change requirement arises, you will need to delete the whole interface selector range and create multiple entries instead, resulting in a potentially undesirable network disruption.
In Figure 4-8, you can see the summary of all interface-related access policies.
Figure 4-8. Summary of interface-related access policies: an interface profile with a port selector (eth1_5, port block 1/5) associated with the Lf101_102_05_VPC interface policy group, which references the port channel (LACP_Enabled) and storm control (BUM_10_Drop) policies
The completed interface profile now needs to be associated with the switch profile,
which finally tells ACI on which specific switch the related interface policies will be
configured. This can be done either during the creation of a switch profile or later in the
profile configuration.
Design-wise, I recommend you exactly match the structure of your switch profiles.
If you have created one switch profile per switch before, you should create one interface
profile per switch (see Figure 4-9). This will keep your policies consistent and clear.
Attachable Access Entity Profile (AAEP)
Figure: The AAEP object is associated with a physical domain, which in turn references a static VLAN pool and its VLAN encapsulation blocks.
How many AAEP objects do you need for ACI? It depends. You need at least one per
fabric to effectively provide access to physical resources for your tenants. Sometimes it
makes sense to create more of them for a single tenant, especially when a tenant uses
consistently configured clusters of virtualization servers with the same encapsulation
requirements. Instead of constructing tons of individual VLAN mappings to all EPGs
and all interfaces, you can do it in the AAEP with a single entry (as shown in the EPG Deployment section of Figure 4-11). An analogy in a legacy network is adding a particular VLAN to a range of trunk or access interfaces using a single configuration command. I will discuss this topic in Chapter 5.
When troubleshooting physical connectivity to ACI, always make sure that all your interface policy groups have the correct AAEP associated. If you refer back to Figure 4-7, it's the first associated object in the list.
Figure: Domains as the link between tenant and access policies: EPGs (e.g., Web_EPG) consume a physical domain, L3OUTs an L3 domain, L2OUTs an external bridged domain, and SAN EPGs (e.g., SAN_EPG) a Fibre Channel domain, all of them tied to the AAEP.
Domains provide the only existing relation between the otherwise abstracted tenant and access policies. For different use cases in tenant policies, you must differentiate between the four types of domains shown in the figure above. Each ACI tenant has to have at least one physical domain to attach internal endpoints and one L3 domain to ensure external connectivity, if applicable.
Each encapsulation block inside a pool has its own allocation mode as well. By
default, it inherits the mode from a parent pool, but sometimes there is a need to
explicitly change it to something other than that. For VMM integration (discussed in
Chapter 9), there is still an option to use static VLAN allocation and mappings for EPGs
instead of dynamic. In such a case, you create a dynamic main VLAN pool object, but the
range of VLANs used for static mapping is explicitly set to static. If needed, you can even
combine both approaches, as shown in Figure 4-14.
The most common issues during ACI operation with access policies actually relate to incomplete or misconfigured VLAN pools. When configuring any VLAN mapping in tenant policies, especially when adding a new VLAN or creating new L3 external connectivity over SVI interfaces, always remember to add these VLANs to the VLAN pool first, with the correct allocation mode (mostly static).
• The server is connected behind interfaces e1/25 of both Leaf 201 and
Leaf 202.
• Protocol policies: LACP rate fast, LLDP, CDP, and MCP are enabled
and storm control shouldn’t allow more than 10% of interface
bandwidth for BUM traffic.
• The tenant expects static VLAN mappings for this server, with IDs 9-11.
Figure: The web server is dual-homed via a vPC to interface e1/25 on both Leaf 201 and Leaf 202.
You will now go through a step-by-step process of creating all the necessary access
policies and ensuring their deployment on the leaf switches.
2. You know that all of the following access policies are intended for
a new tenant, so let’s create separate objects where applicable.
For VLANs 9-11, you have to prepare a new VLAN pool in
Fabric -> Access Policies -> Pools -> VLAN. Set its mode to static
and create individual encapsulation blocks for each VLAN.
5. Now go to Fabric -> Access Policies -> Policies -> Interface and
create all the necessary partial protocol policies (if not already
present):
6. Create a vPC interface policy group in Fabric -> Access Policies ->
Interfaces -> Leaf Interfaces -> Policy Groups -> VPC Interface
and list all previously created protocol policies together with AAEP
association in it. The port channel member policy goes to the
override access policy group section in the advanced tab.
After the configuration, you should end up with the structure of access policies
objects shown in Figure 4-16.
Figure 4-16. Resulting access policy object structure: switch and interface profiles with the node IDs and port blocks, the explicit vPC protection group, a vPC interface policy group referencing the MCP (MCP_Enabled), CDP (CDP_Enabled), LLDP (LLDP_Enabled), and storm control (BUM_10_Drop) policies, and the static VLAN pool (TenantX_StVLAN) covering VLANs 9-11
A good practice is also to check for any new faults related to these objects after their
configuration and verify the deployment of individual policies using the CLI on leaf
switches. See Listing 4-2.
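Commands along the following lines are typically used on the leaf CLI for this check (a sketch; the vPC excerpt below is one example of such output):

show lldp neighbors
show cdp neighbors
show port-channel summary
show vpc
show vlan extended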
vPC status
----------------------------------------------------------------------
id Port Status Consistency Reason Active vlans
-- ---- ------ ----------- ------ ------------
684 Po3 up success success 9-11
Summary
In this chapter, you had the opportunity to explore all the necessary objects related to
ACI access policies, which are (as the name suggests) used for an individual end host
interface configuration as well as switch global access-related settings. They bring a
new, object-oriented philosophy to the underlay network configuration and together
with physical domains and encapsulation pools define the exact hardware consumable
resources for ACI tenants. Access policies are mandatory prerequisites for any type of
endpoint connected to the fabric.
My goal for the next chapter is to follow up on this underlay configuration and actually consume it in tenant policies. You are going to create logical, segmented application and network policies on top of the shared physical resources and ensure proper L2 switching as well as L3 routing over ACI for the connected endpoints.
CHAPTER 5
ACI Fundamentals:
Application Policy Model
After the previous chapter, you should have a good idea about how to prepare the
physical underlay network for ACI to connect your endpoints to the fabric. Each of them
needs to have the entire set of access policies in place to define both access interface
settings and encapsulation resources. This alone won’t provide any mutual IP (or FC)
connectivity between endpoints yet.
You have certainly noticed that you haven’t any access or trunk interface types, you
haven’t specified actually used VLANs on them, IP configurations, default gateways for
end hosts, and such. In order to do that, you need to define application (many times
referred to as tenant) policies. Remember, ACI uses an allowlisting model, so what is not
defined or explicitly allowed will be denied by default.
ACI uses this differentiation between physical access and logical application policies to abstract the application needs from the underlying network. In fact, from the application policies' perspective, it's absolutely insignificant what hardware is running behind the scenes or how the physical switches are configured. ACI tenants transparently build their logical policies on top of that, and APIC is responsible for translating the application needs into various configuration objects to make the connectivity happen.
Figure 5-1. The ACI object model tree: from the Universe root, one branch covers access policies (physical domains and interface configuration) and another covers tenants (the logical configuration) with their child and associated objects, such as VRFs, bridge domains with subnets, application profiles with endpoint groups, filters, contracts with subjects, and external connectivity (L2OUT/L3OUT) with external endpoint groups. Solid lines represent parent-child relationships; dashed lines represent associations between objects.
ACI’s object model is organized into a tree structure with a common tree root
called Universe. From Universe, it expands to all areas related to ACI configuration and
operations. We have already gone through one of the tree branches, access policies (the
physical configuration), which “grows” right from the root. Similarly, we will now explore
another tree branch, application policies (the logical configuration).
Application policies start with the definition of tenants, which are the logical administrative domains belonging to a particular ACI consumer. For each tenant, we
further create Virtual Routing and Forwarding instances (VRFs) to differentiate between
their separated L3 routing tables, bridge domains (BDs) to define L2 broadcast domains,
segment our endpoint resources into endpoint groups (EPGs), connect them to the
external networks, and enable the communication between them using contracts. For
EPGs and external connectivity, we have to create a reference to access policies in order
to consume some of the ACI’s physical fabric resources, but besides that these two
worlds are significantly independent and not directly related.
Many objects, as you can see in a model tree, have parent-child relationships (the
solid line), where parents can have multiple children of the same type, but the child
object can have only exactly one parent. In addition, we will often create “n to n”
associations (or references) between them (the dashed line).
ACI Tenants
Your first object to focus on, the tenant, represents the separated logical container for all
other application policies created inside it. ACI, as you know already, builds its operation
around a multi-tenancy concept to differentiate between potential datacenter network
consumers. Tenants can be managed by a single administrator entity together with all
other ACI policies or theoretically they can be completely unrelated to the main ACI
administrator. As you can see in Figure 5-2, the most common tenant designs isolate
distinct companies, internal departments, or various environments (e.g., test, prod, dev).
Alternatively, you can always implement a combination of the above.
Figure 5-3 describes combinations of security domains with tenants that effectively secure user access. The same concept can be applied to physical domains or fabric leaf switches as well.
Figure 5-3. Security domains in practice: the global_admin user, assigned the ALL_SD security domain, can access all tenants (Apress, ALEF, Dev, Test), while the dev_test_admin user, assigned the DevTest_SD domain, can access only the Dev and Test tenants
System Tenants
In the default ACI state, without any user application policies, there are three
preconfigured system tenants in the Tenants main menu section. You cannot delete
them; you can only make use of them.
Tenant common
As mentioned, all tenants in ACI are entirely separated and their endpoints cannot
communicate with each other in the default state without unique contract interfaces,
which I will discuss later. The common tenant, however, is an exception to this rule. The
entire set of its objects is fully visible and available for all other tenants. Using standard
contracts, even communication between resources in common and other tenants is
allowed (I will cover contracts later in this chapter). The common tenant also serves
as a default policy repository for APIC to resolve if no other more specific user-created
policies are available in the user tenant.
The common tenant is thus perfect for managing shared endpoint resources
accessed from all others such as central network services like NTP servers, DNS servers,
DHCP servers, and L4-L7 devices used for policy-based redirect in service graphs
(covered in Chapter 8) or shared external L2/L3 connectivity.
With endpoints there comes another frequent use case: creating shared application
policies under the common tenant and inheriting them in standard user-created tenants.
These can include any object ranging from networking constructs like VRFs, BDs, and
EPGs to contract definitions, contract filters, various L3OUT settings, or monitoring
policies.
Tenant infra
The main purpose of the infra tenant is to configure policies related to the enablement
of datacenter interconnect (DCI) connectivity over VRF overlay-1 between distinct ACI
fabrics for Multi-POD, Multi-Site, or Remote Leaf architectures. In most cases, but not
exclusively, all these policies are created automatically, and you don’t need to intervene
with them manually. When deploying Multi-POD and Remote Leaf, infra tenant policies
will be configured by the APIC wizard and in the Multi-Site case from Nexus Dashboard
Orchestrator.
For corner case scenarios, if you’re using virtual L4-L7 devices (in ACI 5.X ASAv
and Palo Alto FWs are supported) for policy-based redirect together with virtualization
(VMM) integration, the infra tenant encompasses special policies allowing APIC to
dynamically spin up new VM instances according to the PBR needs.
Tenant mgmt
You have already met the mgmt tenant in Chapter 3 during the ACI fabric initialization.
As the name suggests, it is primarily used to configure out-of-band or in-band access to
fabric switches and APICs. By default, it already includes and manages the mgmt:oob and mgmt:inb VRFs. Only inside the mgmt tenant can you configure the individual
management IP addresses, enhance the management access security, and provide
access to in-band resources from an external network if needed. For more information
about these ACI configuration aspects, refer to Chapter 3.
User Tenants
Along with the three already created system tenants, you can add up to 3,000 user tenants (with a five- or seven-node APIC cluster) for arbitrary use. To create a new tenant, navigate to the Tenant -> Add Tenant menu option. There is not much to configure yet, as you can see in Figure 5-4. Just set its name; you can also add a security domain to later restrict ACI user access to only the objects of this particular tenant.
After the creation, you will directly enter its configuration with the following menu options in the left pane:
Tenant Monitoring
As discussed in Chapter 3, for each tenant I highly recommend that you create
a monitoring policy in Tenants -> Tenant_name -> Policies -> Monitoring (or
alternatively use the common one from tenant common). There you can enable SNMP
traps, create Syslog data sources for tenant objects, configure Callhome and statistics collection, or, for example, alter the fault and event severities. The monitoring policy then has to be applied in the main tenant object's Policy tab.
The collection of this data is especially important from the long-term visibility point of view, so that you have historical information at hand when troubleshooting needs arise.
Figure: Tenant Apress with its VRFs; there is no communication between VRFs by default.
To create a new VRF, navigate to Tenant -> Tenant_name -> Networking -> VRFs
and right-click Create VRF. In the form (shown in Figure 5-6), there is no need to change
anything from the default state. Just uncheck the Create a Bridge Domain option (I will
cover bridge domain objects later in detail).
VRF’s various configuration knobs mostly relate to other ACI features and their
behavior inside a VRF, so I will return to them in more detail during the respective
sections. Here you can find at least a brief overview:
At the moment of creation, VRF isn’t instantiated anywhere in the fabric yet because
you don’t have any interfaces or endpoints associated with it. APIC dynamically creates
VRFs only where needed to optimize switch resource utilization. Later, when deployed,
you can check its presence on respective leaf switches using the commands in Listing 5-1.
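A minimal sketch of such a check on the leaf CLI:

show vrf
show vrf Apress:PROD detail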
VRF instances on fabric switches always use the name format Tenant:VRF (in this
case Apress:PROD). All names are case-sensitive, and you should pay attention to their
length when designing the tenant object structure. Long names can become opaque
and create an unnecessary administrative burden when you have to work with them
repeatedly in the switch CLI during verification or troubleshooting tasks. The VRF route
distinguisher (RD) on each switch consists of NodeID:L3VNI. I will cover this in more
detail in Chapter 6, which describes ACI fabric forwarding.
Bridge Domains
Even though in a particular tenant you can have multiple VRFs, let’s stay for a while
inside a single VRF only.
In traditional networking, to implement network segmentation and enhance the
security, we use VLANs primarily. However, VLAN encapsulation wasn’t originally
meant to be tied with a single IP prefix and associated communication rules (ACLs, FW
rules, etc.). Its main intention was just to split up large broadcast domains on a single
device or cable. The ACI logical model returns to this concept but with bridge domains.
A bridge domain represents a separate Layer 2 broadcast domain inside a VRF that
can span multiple switches or even distinct ACI fabrics, but without a direct relation to
traditional VLANs. Each BD receives its own VXLAN L2 VNID, which ensures transparent
forwarding of any unicast, multicast, or broadcast frames over the fabric. Endpoints are
put into a specific BD by associating EPGs with the BD, and VLAN affiliation happens
at EPG-to-leaf interface mappings. As a result, traditional VLANs became just a local
encapsulation identifier between the connected endpoint and ACI leaf switch. All main
forwarding decisions in ACI are instead related to bridge domains, VRFs and EPGs.
A bridge domain offers two types of network services based on an enabled or
disabled unicast routing configuration option:
In Figure 5-7, you can see the bridge domain configuration without unicast routing enabled. In that case, each bridge domain has to have its own L3 default gateway connected somewhere in the fabric, and ACI provides just L2 bridging between the endpoints.
On the other hand, with routing enabled for a bridge domain, the ACI fabric
becomes a distributed router logically, allowing you to configure one or even multiple
default gateways for each Layer 2 segment (as shown in Figure 5-8). The default gateway
is created directly on leaf switches in the form of an SVI interface.
Figure 5-9. Layer 3 enabled vs. Layer 2 only bridge domain configuration
The second step in the BD creation process is related to the L3 configuration (as
shown in Figure 5-10). By default, each bridge domain has the Unicast Routing knob
enabled and thus supports the IP default gateway configuration. ARP flooding is enabled
as well by default. L3 bridge domains natively limit local endpoint learning to configured
subnets (this option is visible at the bottom and will be covered shortly).
To make the bridge domain Layer 2 only, uncheck the Unicast Routing option. Other
configuration settings on this page related to IPv6 or external connectivity using L3OUTs
are not important for you at this point, and you can leave them in the default state.
VLAN Name                 Encap                        Ports
---- -------------------- ---------------------------- ------------
14   infra:default        vxlan-16777209, vlan-3967    Eth1/1
15   Apress:Backend_BD    vxlan-16318374               Eth1/20, Po1
17   Apress:Frontend_BD   vxlan-16089026               Eth1/20, Po1
19   Apress:Database_BD   vxlan-16449430               Eth1/20, Po1
Notice that if you define multiple subnets for a single bridge domain, APIC will
create just one SVI interface and all additional IP addresses are included as a part of it in
the form of secondary IPs.
ARP Handling
Correctly delivered ARP between hosts is a crucial part of network communication in
IPv4. In ACI version 5.X, ARP is by default always flooded within a BD (regardless of
unicast routing configuration). Therefore, its delivery is ensured in every situation, even
if you have silent hosts connected to ACI, whose endpoint information is not yet learned
by the control plane in the COOP database. The best practice recommendation for a L2
bridge domain is never to turn off ARP flooding.
With unicast routing enabled, on the other hand, you can optimize ARP delivery
by unchecking the ARP flooding configuration option. Then, instead of sending ARP
broadcasts to all interfaces mapped to a particular bridge domain, ARP requests are
routed directly and only to the individual connected host based on the information in
the leaf local and spine COOP endpoint database. If some silent host is present in the
ACI fabric and the switches don’t have its endpoint information yet, the spines will
drop the original ARP packet and generate a so-called ARP Glean message instead.
Glean is sent to all leaves with the appropriate bridge domain deployed, instructing
them to create and locally flood the artificial ARP request destined to the original host
and sourced from BD’s SVI interface. ACI expects that this should “wake up” the silent
host, which will subsequently respond to ARP. The leaf switch will learn its location and
update the spine COOP database. All following ARPs will be delivered between the end
hosts directly.
For internal forwarding, ACI doesn’t build and maintain a traditional ARP table. It
relies on the endpoint database filled by information learned from data plane traffic to
obtain a next-hop MAC address for a particular IP.
This approach isn’t scalable enough for external endpoints and prefixes learned
through L3OUTs, though. Imagine having tens of thousands of /32 IP endpoints behind
the same external router MAC address (while current ACI scalability in 5.X version is
only 4,096 host IP entries behind one MAC address). For this case only, ACI uses the
traditional routing information base (routing table) and ARP table to resolve the external
MAC address to IP next-hop relationship.
If there is a potential that some of your hosts (especially VMs) will change their MAC address over time while retaining the same IP behind a single interface, note that in the default state ACI won't trigger endpoint relearning; you need to wait for the endpoint aging timer to expire. In such a case, you can enable GARP-based endpoint move detection in the bridge domain's L3 configuration section. As soon as the end host sends a gratuitous ARP with the new MAC information, ACI will instantly update its endpoint databases.
Application Profiles
With bridge domains in place, you can move further to application profiles and EPGs.
Application profiles act as a logical container grouping together multiple endpoint
groups. To create an application profile, go to Tenants -> Tenant_name -> Application
Profiles. This object has only a single configuration option available: the selective
monitoring policy applicable to all EPGs included inside (as shown in Figure 5-11).
Endpoint Groups
One of the main segmentation tools and concepts in ACI is represented by endpoint
groups. The idea behind application profiles and EPGs is to enforce network
communication policies between resources connected to the fabric, but without direct
dependencies on VLANs, subnets, or a physical network. As the name suggests, EPGs
allow the administrator to create separated groups of connected resources in the fabric
named endpoints.
An endpoint is simply any host (bare-metal, virtualized, or containerized) with a
unique MAC and IP address pair (in L2 bridge domains, ACI learns only MACs). The
fabric doesn’t differentiate between various types of connected resources, so each
packet is delivered uniformly and solely based on endpoint information learned and
saved in leaf endpoint tables and the spine COOP database.
As shown in Figure 5-12, the EPG is associated with exactly one bridge domain,
which defines its L2 broadcast boundary, and one BD can provide forwarding services
for multiple EPGs. Compared to traditional networks, there is no fixed relation between the endpoints in an EPG and their IP addressing. You can split one subnet into multiple EPGs (the frontend EPGs) or place endpoints from multiple subnets into a single EPG (the backend EPG).
The ACI fabric by default allows all communication between endpoints within the same EPG, but without additional contract policies in place, no
communication can happen between different EPGs. This allowlist philosophy further
enhances datacenter network security from day 0 and makes sure only explicitly
permitted traffic will flow through the fabric.
In order to create an EPG, navigate to Tenant -> Tenant_name -> Application
Profiles -> Profile_name -> Application EPGs and right-click this folder. Within the
configuration form shown in Figure 5-13, there are only two mandatory fields: Name and
the bridge domain association.
Mostly you don’t need to alter the default EPG configuration, but I will briefly
describe the configuration knobs available during its creation. I will touch on some of
them later during this and the following chapters:
• Qos Class: Traffic classification of this EPG into one of QoS’ six
classes (levels)
• Data-Plane Policer: QoS tool to limit the traffic rate on all interfaces
for this EPG. You can choose from One-Rate Two-Color, or Two-Rate
Three-Color policers.
• Preferred Group Member: In each VRF, you can create a set of EPGs
called a preferred group, which won’t enforce any contracts mutually
between them. Contracts will be enforced only to EPGs external to
the preferred group.
1. Static: The administrator will statically define the exact VLAN and
interface associations to the EPG
Both options come with pros and cons, so let’s have a closer look at them.
Figure: Static path mapping example: the FrontEnd_EPG (tenant Apress, application profile PROD_AP) is mapped to a static path on a vPC interface policy group with a specific VLAN encapsulation in trunk mode, and it consumes the access policies through the associated physical domain Apress_PhysDom.
The endpoints are always connected behind some physical leaf interfaces and, as
you already know, you must create access policies to enable physical access. Altogether, they grant interface and encapsulation resources to a particular tenant.
Now, from the tenant perspective, you will consume the access policies by associating
one or more physical domains with EPGs. That’s the only existing relationship between the
logical application (tenant) policies and the physical access policies. To do so, right-click
the EPG and choose Add Physical Domain Association. There’s nothing much to configure
there, as you can see in Figure 5-15. This will just unlock the access to ACI’s physical and
encapsulation resources. In other words, by creating this association you say, “Endpoints of
this EPG can potentially use any interface mapped to the physical domain and they can use
any VLAN (VXLAN) defined in VLAN (VXLAN) pool in that physical domain.”
The second step is to specify the exact static path to the particular interface(s) and
VLANs used for this EPG. Right-click the EPG and choose Deploy Static EPG on PC, VPC,
or Interface to open the configuration form (shown in Figure 5-16).
Here you need to choose the Path Type, the individual interface or (vPC) port
channel. Based on your choice, the Path field will allow you to configure either a leaf
switch with one particular interface or a reference to the PC/vPC interface policy group.
Then fill in the Port Encap, which is the actual VLAN on the wire that the endpoints
are expected to use when communicating with ACI. For the only one VLAN on each
interface, Mode can be set to Access (Untagged); this defines either the traditional
untagged access interface or the native VLAN on a trunk port. All other VLANs must
have the trunk mode configured and ACI will expect dot1q tagged communication.
By creating this static mapping, you have exactly defined that all endpoints behind
this interface and VLAN will belong to this EPG. And the same concept applies to all
other EPGs.
There is one leaf TCAM optimization option called Deployment Immediacy. As you
can see in Figure 5-16, two settings are available:
• Immediate: The leaf installs to its hardware ASIC all the policies
(contracts) related to this EPG as soon as the mapping is created and
submitted in APIC.
Alternatively, you can create one mapping per VLAN and EPG for all interfaces that are part of an AAEP. To do so, just navigate to Fabric -> Access Policies -> Policies -> Global -> Attachable Access Entity Profiles -> Policy (see Figure 5-17).
199
Chapter 5 ACI Fundamentals: Application Policy Model
For CLI verification of endpoint learning on APIC or a leaf switch, you can use the
commands shown in Listing 5-3.
Listing 5-3. EPG Endpoint Learning Verification on APIC and the Leaf
Switch CLI
Dynamic Endpoints:
Tenant : Apress
Application : PROD_APP
AEPg : Backend_EPG
Tenant : Apress
Application : PROD_APP
AEPg : Database_EPG
Tenant : Apress
Application : PROD_APP
AEPg : Frontend_EPG
E - shared-service m - svc-mgr
+--------------+---------------+-----------------+-----------+------------+
VLAN/ Encap MAC Address MAC Info/ Interface
Domain VLAN IP Address IP Info
+--------------+---------------+-----------------+-----------+------------+
28 vlan-10 0050.56b3.0b92 L eth1/20
Apress:PROD vlan-10 10.2.2.10 L eth1/20
28 vlan-10 0050.56b3.d400 L eth1/20
Apress:PROD vlan-10 10.3.3.10 L eth1/20
20 vlan-11 0050.56b3.0850 L eth1/20
Apress:PROD vlan-11 10.4.4.10 L eth1/20
29 vlan-9 0050.56b3.86dd L eth1/20
Apress:PROD vlan-9 10.1.1.10 L eth1/20
::::
output omitted
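The leaf table above comes from the local endpoint manager; a sketch of the commands typically used to display it:

show endpoint vrf Apress:PROD
show system internal epm endpoint ip 10.1.1.10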
Note ACI will learn both MAC and IP information if the bridge domain is
configured with unicast routing enabled, as in this case. Otherwise, you should see
only the MAC address.
If you cannot see any endpoints in EPG after configuring path mappings, check the
following:
• Faults related to EPG: If present, they will usually tell you exactly
what the problem is. In most cases, administrators are missing the
EPG to physical domain association or a configured VLAN in the
VLAN pool.
• Silent Host? ACI will learn about an endpoint as soon as its first
packet is seen. If the endpoint is silent, this can result in a missing
entry in the database as well.
Network Centric
Especially during a migration phase to ACI, it makes sense to “imitate” your current
segmentation deployed in a legacy network and create a 1:1 mapping between VLANs
and ACI constructs. For each legacy VLAN you configure exactly one bridge domain
and one associated EPG (see Figure 5-19). All static or dynamic path mappings also use
only one VLAN encapsulation. As a result, endpoints from the legacy network can be
physically moved to ACI, while preserving their configuration, IP addressing, and logical
placement. Initially, such a bridge domain can stay at Layer 2 only with the default
gateway connected externally; later, it can take over with Layer 3 services as well. For my
customers, this is the first and preferred step into ACI.
Figure 5-19. Network-centric design: each legacy VLAN from the existing network is mapped 1:1 to its own bridge domain and EPG pair in tenant Apress (VRF PROD)
Application Centric
If you are implementing a greenfield ACI, or all your endpoints are already migrated
into it, you can start thinking about moving to an application-centric EPG design.
For this approach, it’s expected that you have information about the architecture of
your applications and their communication matrix. Then you can fully utilize ACI’s
tenant policies to exactly describe the individual application tiers and their network
requirements without a direct relation to the legacy VLAN segments or IP subnets.
As shown in Figure 5-20, each application tier is mapped to its own EPG, and you can
differentiate between individual servers based on their encapsulation and the interface
used in the static or dynamic path to EPG mappings. VLAN IDs will become just a locally
significant EPG identifier between servers and leaf switches instead of broadcast domain
boundaries with closely coupled IP subnets. Just make sure not to use the same VLAN
identifier for multiple EPGs on the same pair of leaf switches. ACI will raise a fault in that
case. By default, without additional configuration, VLAN IDs are global per switch or
vPC pair.
Figure 5-20. Application-centric design: the Warehouse_APP application profile in tenant Apress (VRF PROD) maps each application tier to its own EPG (e.g., APIGateway_EPG), with VLAN IDs acting only as locally significant identifiers on the leaf interfaces
Microsegmentation uEPGs
Standard application EPGs and their endpoint segmentation based on VLANs and
interface mappings can support many use cases, but sometimes, especially with an
application-centric philosophy, you need to go even further and differentiate between
endpoints more granularly. As displayed in Figure 5-21, APIC enables you to configure
micro endpoint groups (uEPG), separating individual endpoints based on these specific
attributes:
• IP and MAC addresses: In case of endpoints connected to ACI using
the physical domain (bare-metal hosts)
Figure 5-21. Microsegmentation: database VMs from the main database EPG are reclassified into uEPGs such as Mongo_uEPG and Quarantine_uEPG based on their attributes, regardless of the VLAN encapsulation or interface used
• Endpoints have to be first learned in the main EPG before you can
microsegment them into uEPGs. Then the uEPG mapping happens
regardless of the encapsulation/interface.
• A new uEPG needs to be put inside the same bridge domain as the
parent EPG.
• The new uEPG has to be associated with the same physical (or VMM)
domain as the parent EPG.
• If the physical domain is used, you need to statically map the leaves
to the uEPG.
To demonstrate the functionality, let’s assume you have three endpoints (from
Figure 5-21) in the main Database_EPG associated with Database_BD. See Listing 5-4.
Tenant : Apress
Application : PROD_APP
AEPg : Database_EPG
00:50:56:B3:27:0B 10.4.4.30 learned 102
eth1/9 vlan-11
00:50:56:B3:2F:07 10.4.4.20 learned 102
eth1/10 vlan-11
They can freely communicate between each other without restrictions, as you can
see in Figure 5-22, when pinging from DB_VM1 to DB_VM2 and DB_VM3.
Finally, set the uSeg Attributes in the uEPG configuration, where you can choose
from Match Any (OR) or Match All (AND) logical expressions and the attributes
themselves (as shown in Figure 5-24). In the case of the IP attribute, you can specify the
single end host IP or a subnet configured under the uEPG.
A few moments after submitting the previous configuration, you will see that the specified endpoint(s) have been learned inside this new micro EPG, as shown in Listing 5-5.
Dynamic Endpoints:
Tenant : Apress
Application : PROD_APP
AEPg : Mongo_uEPG
ESGs are configured in Tenant -> Tenant_name -> Application Profile -> App_
name -> Endpoint Security Groups. In the first step, you just specify the name and VRF
for ESG (Figure 5-26).
During the second step, you need to add a combination of ESG selectors. For the simple demonstration shown in Figure 5-27, I picked one endpoint from each of the previously used EPGs and put it inside an IP tag policy (Tenant -> Policies -> Endpoint Tags). Each endpoint entry is configured with a key-value tag in the format EPG_Name: <name>.
These policy tags are now used in an ESG configuration for an endpoint
classification (see Figure 5-28).
In the third step, there are advanced contract settings (to be covered later in this
chapter), so leave the default configuration in place and submit the finished form. Right
after that you should see learned endpoints according to your specification (shown in
Figure 5-29).
ACI Contracts
Now, since you have your ACI resources in the fabric segmented into EPGs, micro EPGs, or ESGs, the last and crucial component of an application policy is the contract, which specifies the communication needs between them. Contracts represent the implementation of a fundamental security architecture in ACI, using an allowlist to explicitly permit the required traffic between EPGs. Contracts are directional, reusable, and universally describe protocols with their L4 ports (where applicable).
For better understanding, consider contracts as non-stateful access lists (ACLs) with
dynamic MAC/IP entries based on currently learned endpoints in related EPGs. ACI in
this regard acts as a distributed, non-stateful, zone-based firewall.
By default, no communication between EPGs, uEPGs, or ESGs is permitted thanks to
the per-VRF configurable option Policy Control Enforcement Preference set to Enforced.
If you set this knob to Unenforced, ACI completely stops using contracts and all traffic
between endpoints in a particular VRF will be allowed. This feature can come in handy
to quickly troubleshoot non-working communication between existing endpoints in
ACI. If it doesn’t go against your security policy, disabling the contract enforcement
momentarily can immediately show problems in contract definitions or associations if
the connectivity starts working.
As described in Figure 5-30, contracts have a pretty granular structure, allowing the administrator to thoroughly design the communication policies between endpoints. They consist of subjects, which group filters, which in turn contain individual filter entries.
Figure 5-30. Contract structure: contract Front_to_Back_CT contains subjects (e.g., AppModule_Subj and AppModule_PBR_Subj), each grouping filters (e.g., Allow_TCP_HTTPS) that consist of individual entries defining the protocol, source and destination ports, and optional logging
Each contract comes with two universal logical connectors: consumer and provider.
In order to work correctly, both of them have to be connected to some of the following
entities (I will cover external EPGs in Chapter 7):
214
Chapter 5 ACI Fundamentals: Application Policy Model
Caution At the time of writing, ACI in version 5.2 didn't support applying a contract between an ESG and an EPG, only between two ESGs.
A contract is, like many other application policy objects, a universal building block, so
you can attach it to multiple EPGs/ESGs and easily allow connectivity to shared resources
(as shown in Figure 5-31).
[Figure 5-31: A single reusable contract attached to multiple EPGs. EPG A, EPG B, and EPG C each consume (C) a contract that is provided (P) by a shared EPG.]
When designing contracts and their relations, I recommend consulting the official
scalability guide for your implemented ACI version, as their number is not unlimited
(although it is quite high). In ACI version 5.2, there can be a maximum of 100 EPGs
attached on each side of a single contract. Another important value is the maximum
TCAM space for the deployment of contracts to the leaf's hardware ASICs. Based on a given
leaf model and the currently configured Forwarding Scale Profile, the amount of policy
TCAM can go from 64,000 up to 256,000 entries, while the approximate consumption can be
calculated using the following formula:
Number of filter entries in a contract × Number of consumer EPGs × Number of
provider EPGs × 2
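As a quick illustration of the formula, the following short Python sketch estimates the TCAM consumption of a single contract. The numbers are just an example, and the result is only the approximation the formula gives, not an exact hardware figure.

# Approximate TCAM entries consumed by one contract, per the formula above:
# filter entries x consumer EPGs x provider EPGs x 2 (both directions).
def estimate_tcam_entries(filter_entries: int, consumer_epgs: int, provider_epgs: int) -> int:
    return filter_entries * consumer_epgs * provider_epgs * 2

# Example: a contract with 4 filter entries, consumed by 10 EPGs and provided by 2 EPGs.
print(estimate_tcam_entries(4, 10, 2))   # 160 entries

# Compare the sum over all contracts against the leaf limit given by its Forwarding
# Scale Profile (roughly 64,000 to 256,000 policy TCAM entries).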
Note Contracts are enforced on unicast traffic only. Broadcast, unknown unicast,
and multicast (BUM) traffic is implicitly permitted between EPGs.
[Figure 5-32: Contract between EPGs. In tenant Apress, VRF PROD, Frontend_EPG consumes and Backend_EPG provides the contract Front_to_Back_CT, whose subject applies both directions and reverses filter ports and whose filter permits ip tcp src: any (undefined) dst: 80 (http).]
The provider EPG is (as the name suggests) the provider of an application service
allowed by the contract. That means its endpoints are the destinations of the traffic defined
in the contract. In your example, you expect that endpoints in the provider EPG Backend_EPG
have the HTTP service enabled and open on port 80.
The consumer EPG, on the other hand, consumes the service allowed by the contract.
Endpoints in consumer EPGs are therefore the initiators of the traffic flows or TCP sessions. In
your example, servers from Frontend_EPG initiate communication using a random
source TCP port to Backend_EPG endpoints with destination port 80.
A contract generally describes the allowed traffic from consumers to providers, in the
case of TCP/UDP from a consumer source port to a provider destination port. But what
about the replies to such traffic? ACI isn't stateful, so it won't automatically create a
session to permit an incoming reply. Fortunately, thanks to the default contract subject
settings Apply Both Directions and Reverse Filter Ports, each contract is in fact applied
two times, consumer to provider and provider to consumer, but the second time with
reversed TCP/UDP ports. If you permit destination port 80 from any source, APIC will
also implement a permit for the reply from source port 80 to destination ANY. You don't
need to create multiple contracts or associate the same contract in both directions manually.
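To make the effect of Apply Both Directions and Reverse Filter Ports more tangible, here is a small conceptual Python model of what gets derived from a single filter. It only illustrates the logic described above and is not ACI's actual implementation.

# Conceptual model only: derive the two rules programmed from one filter
# (consumer -> provider on destination port 80, plus the reversed reply rule).
def expand_subject(src_port: str, dst_port: str) -> list[dict]:
    forward = {"dir": "consumer->provider", "src_port": src_port, "dst_port": dst_port}
    reply = {"dir": "provider->consumer", "src_port": dst_port, "dst_port": src_port}
    return [forward, reply]

for rule in expand_subject("any", "80"):
    print(rule)
# {'dir': 'consumer->provider', 'src_port': 'any', 'dst_port': '80'}
# {'dir': 'provider->consumer', 'src_port': '80', 'dst_port': 'any'}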
Contract Configuration
Configuration-wise, to create a new contract you need to go to Tenant -> Tenant_name
-> Contracts and right-click the Standard folder. As you can see in Figure 5-33, besides
the object name, you can set the contract scope and QoS properties and create a list of
subjects.
Contract Scope
By default, a contract is applicable and valid between any EPGs inside a single VRF. The
contract scope option, configured at the contract's root object level, controls this behavior.
You can choose from the following values: Application Profile, VRF (the default), Tenant,
or Global.
Contract Subject
Individual contract subjects group together traffic rules with a specific treatment. Here
you can apply different QoS policies just for a subset of traffic matched by the main
contract or differentiate between packets forwarded in a standard way through the ACI
fabric and the communications redirected to L4-L7 service devices using ACI’s service
graphs (covered in Chapter 8).
As shown in Figure 5-34, other important configuration knobs, enabled by default,
are the already-mentioned Apply Both Directions and Reverse Filter Ports. They ensure
that the subject will permit not only the traffic from the initiating consumer to the
destination provider EPGs, but also the replies in the opposite direction.
When adding multiple filter objects to the subject, you can choose which of them
will permit or deny the specified traffic. Optionally, you can enable logging for particular
filters using the Directives setting, or decide to conserve TCAM memory by choosing
the Enable Policy Compression directive. In that case, ACI won't save any statistics
about packets matched by that rule, in exchange for lower TCAM consumption.
Contract Filter
The last contract component is the filter (see Figure 5-35). Using filter entries, filters
describe exactly the type of traffic that is permitted or denied (based on the subject
settings) when the filter is applied.
Filters can be created and configured directly from subjects as part of the contract
object, or they can be prepared separately in advance and simply referenced later.
Individual filter objects can be found in Tenants -> Tenant_name -> Contracts ->
Filters.
The filter entry configuration options include, among others, the entry name,
EtherType, IP protocol, and the L4 source and destination port ranges.
Graphically, the result can be checked when you navigate to the contract object and
open the Topology tab. It describes just the associations of one contract (see Figure 5-37).
Note that the arrows characterize the relationship from a contract perspective: the arrow
goes from a provider EPG to a contract and from a contract to a consumer EPG. The
actual traffic between endpoints is initiated in the opposite direction.
Another way to view contract relations graphically is to open the application
profile object and navigate to its Topology tab (shown in Figure 5-38), where you'll find an
overview of the whole application structure.
In your case, Frontend_EPG has class ID 16386 and Backend_EPG has class ID
49156. The common scope 3080193 refers to the VRF instance they are both part of. Now,
when you apply a simple contract between them, allowing destination TCP port 80 with the
default subject configuration (apply both directions and reverse filter ports), the outcome is
documented in Figure 5-40.
Each contract relationship between two unique EPGs creates by default a number of
new entries in the zoning tables of all affected leaf switches where some static or
dynamic EPG-to-interface mapping exists. The number of entries depends on the number of
filter rules included in the contract. These zoning tables are basically the end result and
hardware implementation of security policies in ACI. As you can see in Figure 5-34, you
can view their current content on a particular leaf switch by issuing show zoning-rule.
The structure of a zoning table is the following:
• Dir: Direction of the rule. With apply both directions enabled, there
will always be bidir rules from the consumer to the provider EPG
and uni-dir-ignore rules for returning traffic from the provider to the
consumer EPG.
• operSt: Enabled/Disabled
In your case, with the contract permitting TCP destination port 80 using a single filter,
you can see two entries: rule ID 4117 for the consumer-to-provider direction and 4114 for
returning traffic. That's the result of setting Apply Both Directions in the contract subject.
There are, of course, many other entries in the zoning table, but I've omitted them for
clarity. You will always find entries implicitly permitting ARP in each VRF and implicitly
denying and logging all other traffic not matched by any more specific custom contract.
Implicitly denied entries have the lowest possible priority of 22, so all your custom
contracts will take precedence over them.
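If you prefer to collect zoning rules centrally instead of logging in to every leaf, the same data is exposed through the APIC object model. The sketch below reuses the authenticated session from the earlier REST example and assumes the zoning rules are published under the class name actrlRule (the managed object behind show zoning-rule); treat the class and attribute names as assumptions to verify on your ACI release.

# Minimal sketch: pull zoning rules fabric-wide from the APIC object model.
# Assumption: zoning rules are exposed as the "actrlRule" class with sPcTag/dPcTag attributes.
resp = session.get(f"{APIC}/api/class/actrlRule.json?page-size=100", verify=False)
resp.raise_for_status()

for mo in resp.json()["imdata"]:
    attrs = mo["actrlRule"]["attributes"]
    # The dn reveals which node renders the rule; sPcTag/dPcTag are the EPG class IDs.
    print(attrs["dn"], attrs.get("sPcTag"), attrs.get("dPcTag"),
          attrs.get("action"), attrs.get("prio"))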
Note Although rule 4117 (consumer to provider) from the previous example
is marked bi-dir in the zoning table, which might suggest that TCP destination
port 80 is also permitted the other way, from provider to consumer, ACI in fact
enforces this bi-dir entry only in the one correct direction. Lab tests showed
that all TCP SYNs to destination port 80 sent from provider to consumer are
dropped.
The Reverse Filter Ports option in a contract subject configuration causes ACI to
create, for each filter defined in the contract, two individual entries in the zoning filter
table with automatically switched TCP/UDP port definitions. Filter ID 27 from your
example defines a TCP connection from any source port to destination port 80, while the
automatically created Filter ID 28 reverses the definition to TCP source port 80 and any
destination port.
Just as a side note, when you consider contracts permitting "ip any any" (all filter
fields unspecified), thanks to this zoning rule and zoning filter implementation, it
actually doesn't matter which way the contract is applied between two EPGs. It will
permit all traffic in both directions, regardless of the consumer-provider association.
Only more specific port definitions have this directional behavior.
Leaf switches learn the MAC/IP addresses behind their local interfaces and update this
information on the spines using COOP as well. By default, the same learning process
happens for remote endpoints communicating with local ones. For zoning rule
enforcement, all these endpoints must be internally associated with the pcTags (sClass)
of their EPG/ESG/uEPG. This information can then be verified with the commands in
Listing 5-6.
When a packet arrives from the source endpoint, the leaf will internally associate it
with a specific policy tag (pcTag) based on previous endpoint learning. As a next step,
the contract enforcement decision is made.
If the leaf switch already knows about a destination endpoint, it can right away
search through its zoning rules, looking for matching source and destination pcTag
entries. Based on the result, the packet is either permitted for further processing
or denied. This happens thanks to the default setting of Policy Control Enforcement
Direction to Ingress in the VRF policy. That's why leaves prefer to enforce contracts on
ingress whenever possible.
If the leaf switch is not currently aware of a destination endpoint and thus it cannot
derive its pcTag, such a packet is forwarded through the ACI fabric to the egress leaf
switch and zoning rules are applied there instead.
[Figure: EPG preferred group. EPG A and EPG B in tenant Apress are members of the preferred group and communicate freely, while Backend_EPG outside the group is reachable only through the contract Pref_to_Back_CT (EPG A as consumer, Backend_EPG as provider).]
In order to configure the preferred group feature, you need to do two steps: first, enable
Preferred Group under the VRF's policy enforcement settings, and second, set Preferred
Group Member to Include on each EPG that should be part of the group.
[Figure: The vzAny object in tenant Apress, representing all EPGs within a given VRF.]
• Need to permit access from one specific EPG to all others in the same
VRF, or vice versa
The entries responsible for vzAny contract enforcement are rules 4139 and 4128. The
first one permits communication initiated from any EPG (sClass 0) to the provider
EPG (sClass 48452), and the returning traffic is then allowed by the second rule with
reversed filter ports.
Imagine you have implemented a universal permit for some TCP protocol where vzAny
is both the provider and the consumer, as shown in Figure 5-43.
[Figure 5-43: vzAny as both the consumer and the provider of the same contract within tenant Apress.]
The subsequent outcome in the leaf hardware tables would look like Listing 5-9.
In the last two entries, 4139 and 4128, you can see that there is a contract applied
without any specific sClass in either direction, resulting in permitting any-to-any
communication (sClass 0).
[Figure: Intra-EPG isolation. Endpoints inside Isolated_EPG (bridge domain Frontend_BD) cannot communicate with each other, yet the EPG still consumes the contract Isolated_to_Back_CT provided by Backend_EPG.]
Even though endpoints inside the isolated EPG can't communicate with each
other, the EPG can still be attached to other EPGs using standard application contracts.
To enable the endpoint isolation, open the EPG object, navigate to Policy -> General,
and switch the Intra-EPG Isolation knob to Enforced.
After enabling the isolation, in the zoning rules you will find one new entry denying all
traffic with the source and destination being the same sClass of the isolated EPG (49161).
See Listing 5-10.
If you still need to permit at least some traffic between the otherwise isolated
endpoints, ACI allows you to configure the intra-EPG contract (depicted in Figure 5-45).
It’s a standard contract applied using a special configuration option. Right-click the
isolated EPG object and choose Add Intra-EPG Contract.
[Figure 5-45: Intra-EPG contract. Isolated_EPG (bridge domain Frontend_BD) both consumes and provides the contract IntraEPG_CT, permitting selected traffic between its otherwise isolated endpoints.]
Two more entries are now added to the previous single zoning rule that effectively
isolates the endpoints, overriding the deny with a higher priority, but only for the traffic
specified in the filter of this intra-EPG contract. See Listing 5-11.
In this output you can see the information about denied attempts to establish an SSH
connection from 10.1.1.10 (the frontend endpoint in your example) to 10.4.4.10 (the DB
endpoint). By filtering the show command using grep based on the source or destination
IP address, you can pretty easily and quickly confirm whether the packet drops are caused by
contracts on this particular leaf switch.
A good practice is to export all of these logs from the entire ACI fabric to some
external syslog collector. Before ACI will do so, you first need to edit the fabric syslog
message policy found in Fabric -> Fabric Policies -> Policies -> Monitoring ->
Common Policy -> Syslog Message Policies -> default (see Figure 5-46). There, the
default Facility Filter has to be changed from alerts to information.
Tip This kind of logging can be alternatively enabled for permit rules in contracts
as well. Just choose the Log directive in the contract subject definition. In the CLI,
then use show logging ip access-list internal packet-log permit.
APIC contract_parser.py
Another great utility to help you with contract troubleshooting is a Python script located
natively in each leaf switch called contract_parser.py. It takes all the information from
the previous show commands, resolves the actual names of the VRFs and EPGs used in
zoning rules, and presents the consolidated output. See Listing 5-14.
Leaf-101# contract_parser.py
Key:
[prio:RuleId] [vrf:{str}] action protocol src-epg [src-l4] dst-epg [dst-l4]
[flags][contract:{str}] [hit=count]
Using the CLI, you can retrieve the same information by issuing the show
command in Listing 5-15.
--output omitted--
Policy stats:
=============
policy_count : 48
max_policy_count : 65536
policy_otcam_count : 84
max_policy_otcam_count : 8192
policy_label_count : 0
max_policy_label_count : 0
--output omitted--
Bridge domain (application-centric): <Purpose>_BD
Bridge domain for firewall PBR: <FW_name>FW_BD, for example XXXFW_BD
Application profile: <VRF_name>_APP or <Application_name>_APP, for example PROD_APP, TestApp_APP
EPG (legacy): VL<vlan_id>_EPG, for example VL0900_EPG
EPG (application-centric): <Purpose>_EPG, for example Frontend_EPG, DMZ_EPG, Application_EPG
Contract: <Purpose>_CT, or with the traffic flow direction in the name: <Consumer_EPG>_to_<Provider_EPG>_CT, for example Legacy_Permit_ANY_CT, Frontend_to_Backend_CT
Contract with L3OUT on the provider side: <App_EPG>_to_<L3OUT_name>_<External_EPG>_CT, for example Web_to_MPLS_Default_ExtEPG_CT
Contract with L3OUT on the consumer side: <L3OUT_name>_<External_EPG>_to_<App_EPG>_CT, for example MPLS_Default_ExtEPG_to_Web_CT
Subject: <Purpose>_(PBR)_Subj, for example ANY_Subj, Web_PBR_Subj
Filter: <Allow|Deny>_<Protocol>_<Port range|Custom ports>, for example Allow_HTTP_8080, Deny_HTTP, Allow_TCP_2459-3000
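If you want to keep these conventions consistent across a team, even a trivial helper can generate the names for you. The following sketch simply follows the patterns listed above; the object names in the example calls are placeholders.

# Small helpers following the naming conventions from the table above.
def contract_name(consumer_epg: str, provider_epg: str) -> str:
    # Traffic flow direction in the name: <Consumer_EPG>_to_<Provider_EPG>_CT
    return f"{consumer_epg}_to_{provider_epg}_CT"

def filter_name(action: str, protocol: str, ports: str) -> str:
    # <Allow|Deny>_<Protocol>_<Port range|Custom ports>
    return f"{action}_{protocol}_{ports}"

print(contract_name("Frontend", "Backend"))      # Frontend_to_Backend_CT
print(filter_name("Allow", "TCP", "2459-3000"))  # Allow_TCP_2459-3000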
Summary
The application policy model is one of the main ACI components, enabling the
network administrator to match the application connectivity requirements with the
actual network configuration. Its goal is to provide flexible and universal building
blocks to enable network multitenancy, describe application segmentation, or enforce
connectivity rules between the ACI endpoints connected behind leaf switches.
During this chapter you learned how to create and manage fundamental application
policy objects including tenants, application profiles, endpoint (security) groups, and
contracts. Additionally, you explored how bridge domains provide Layer 2 connectivity
inside an L3 space represented by VRFs for all internal ACI endpoints, with correct
mapping to their EPGs. This knowledge will become your basis for further
expansion to the areas of ACI internal forwarding as well as external connectivity or
L4-L7 service insertion.
The next chapter will focus on more details of ACI internal forwarding between the
endpoints and the logic behind the scenes. Understanding this is key for effective and
successful connectivity troubleshooting.
CHAPTER 6
Fabric Forwarding (and Troubleshooting)
The VXLAN encapsulation used inside the fabric ensures not only transparent Layer-2
frame shipment across the ACI fabric, but also the scalability of the available logical
segments. While traditional VLAN encapsulation allows you to address only 4,096
segments due to the 12-bit VLAN ID header field, VXLAN offers up to 16M segments
thanks to its 24-bit addressing space.
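The numbers follow directly from the width of the segment ID field; a quick check:

# Segment scalability given by the ID field width: 12-bit VLAN ID vs. 24-bit VXLAN VNID.
print(2 ** 12)   # 4096 VLAN segments
print(2 ** 24)   # 16777216 (roughly 16M) VXLAN segments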
As shown in Figure 6-1, when forwarding any traffic between two endpoints, ACI
generally performs so-called encapsulation normalization.
[Figure 6-1: Encapsulation normalization. The original end-host frame (dot1q tag, IP header, TCP/UDP header, payload) is carried across the fabric inside a normalized encapsulation consisting of an outer MAC header, outer IP header (VTEP addresses), UDP header, and VXLAN header, followed by the original frame.]
Regardless of the type or ID of the ingress encapsulation (dot1Q VLAN ID, VXLAN VNID),
when the ACI leaf has to forward the incoming frame to another leaf, it removes
the original encapsulation and replaces it with the common internal iVXLAN header.
Consequently, all traffic is forwarded across the fabric in a normalized way,
consistently using internal VXLAN VNID segments. When the frame arrives at the
egress leaf, that leaf strips the internal iVXLAN header, adds the correct egress
encapsulation, and delivers the frame to the connected end host. Thanks to this concept,
the original encapsulation and its ID are just locally significant values describing how
the connected endpoints communicate with their closest leaf switch. Normalization to
common VXLAN segments inside the fabric allows ACI to interconnect any endpoints,
even those not using the same VLAN IDs or not using the same encapsulation type at all.
This provides significant flexibility not otherwise available in a traditional network.
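Conceptually, normalization is just a per-leaf mapping between the locally significant ingress encapsulation and the fabric-wide VNID, and back again on egress. The tiny Python sketch below captures that idea; the interface and VLAN values are purely illustrative, while the VNID is borrowed from this chapter's examples.

# Conceptual sketch of encapsulation normalization (illustrative values only).
# Ingress leaf: local encap (interface, VLAN) -> fabric-wide bridge domain VNID.
ingress_map = {("eth1/20", "vlan-10"): 16318374}
# Egress leaf: the same VNID -> its own, possibly different, local encap.
egress_map = {16318374: ("eth1/5", "vlan-200")}

vnid = ingress_map[("eth1/20", "vlan-10")]   # strip dot1q, add iVXLAN with this VNID
out_intf, out_encap = egress_map[vnid]       # strip iVXLAN, re-add the egress encap
print(vnid, out_intf, out_encap)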
Figure 6-2 illustrates the internal iVXLAN header in detail with all field sizes listed.
[Figure 6-2: iVXLAN header format. An outer MAC header (14 bytes, plus an optional 4-byte 802.1Q tag), an outer IP header (20 bytes, protocol 0x11 UDP), a UDP header (8 bytes, with the source port derived from a hash of the inner frame headers), and a VXLAN header (8 bytes carrying flags, the source group, and the 24-bit Virtual Network Identifier) are prepended to the original Layer 2 frame, adding 50 (54) bytes of overhead; the MAC and IP headers form the underlay, the UDP and VXLAN headers the overlay.]
Although the iVXLAN header originates from the RFC standard, ACI expands its
capabilities by using many otherwise reserved fields for forwarding purposes. In the next
sections, I will go through them and describe their significance.
[Figure: Fabric control plane overview. Leaves and spines run IS-IS in VRF overlay-1 over L3 point-to-point uplinks for anycast VTEP IP exchange and multicast tree calculation, they establish IS-IS and MP-BGP peerings, and locally learned endpoints are announced from the leaves to the spines using COOP.]
For all its fabric links, ACI uses the IP unnumbered feature on point-to-point interfaces to
conserve IP space while still providing routing capabilities; this completely
eliminates any reliance on the Spanning Tree Protocol. Instead of individual IP
addresses, all uplinks borrow a single IP from Loopback 0 (the VTEP IP). Uplinks are always
configured as subinterfaces, but internally no VLAN encapsulation is used. See
Listing 6-1.
After fabric initialization, leaves and spines establish IS-IS L1 adjacencies between
each other over fabric interfaces in transport VRF overlay-1. This routing protocol is
primarily used to ensure a distribution of VTEP addresses around the fabric and create
multicast distribution trees (which I will describe later in this chapter). See Listing 6-2.
Thanks to the routing information about remote VTEP addresses, you know how to get
VXLAN-encapsulated traffic to the distant leaf switches or the spine proxy, but you still need
the information about who is behind them. The traditional MAC tables of legacy switches are
replaced with an endpoint database in ACI. Each leaf switch in its default state learns
about locally connected endpoints by observing their source packets, generated ARP
requests, or DHCP messages. MAC addresses (in the case of an L2 BD) and MAC+IP couples
(L3 BD) are put into the local endpoint database, together with additional metadata
like the encapsulation ID, source interface, or EPG/BD/VRF affiliation (sClass). In the
endpoint database, besides local entries, you can find remote MACs or IPs, which are
easily identifiable by a tunnel interface instead of a local EthX/X. This approach, called
conversational learning, is in place to optimize a leaf's HW resources by learning only
those remote endpoints that are actively communicating with the local ones. The source
of information about a remote endpoint can be a standard bridged or routed packet
incoming from the fabric, encapsulated in iVXLAN, or a flooded ARP (GARP) packet if
ARP flooding is enabled at the BD level.
ACI avoids stale endpoints by attaching aging timers to both local and remote entries.
They are configurable in the Endpoint Retention Policy located in Tenants -> Tenant_
name -> Policies -> Protocol -> Endpoint Retention and applied per bridge domain
in its general policy configuration. Timers are refreshed as soon as a leaf sees any traffic
from a particular endpoint. The default value for local endpoints is 900 seconds, while
for remote endpoints it's 300 seconds. After the timer expires, the related entry is removed from
the endpoint database. If the endpoint is learned in an L3-enabled BD with its default GW
in ACI, the leaves additionally perform host tracking. This feature proactively aims to
refresh the endpoint information after crossing 75% of its aging timer without any packet
seen during the interval. In case of need, the leaf sends three artificially generated ARP
requests destined to the end host IP, sourced from its bridge domain SVI interface. If the host is
still present, its reply to these ARP requests resets the aging timer. See Listing 6-3.
+------------+-------------+-----------------+--------------+-------------+
VLAN/ Encap MAC Address MAC Info/ Interface
Domain VLAN IP Address IP Info
+------------+-------------+-----------------+--------------+-------------+
29 vlan-10 0050.56b3.d400 L eth1/20
Apress:PROD vlan-10 10.3.3.10 L <<-local entry->> eth1/20
20 vlan-11 0050.56b3.2f07 O tunnel3
Apress:PROD vlan-11 10.4.4.20 O tunnel3
Apress:PROD 10.4.4.10 <<-remote entry->> tunnel19
30 vlan-9 0050.56b3.86dd L eth1/20
Apress:PROD vlan-9 10.1.1.10 L eth1/20
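Relating this back to the retention timers just described, the points at which a leaf acts are simple fractions of the configured values. A quick sketch with the defaults (900 seconds local, 300 seconds remote, ARP probing at 75% of the timer):

# Default endpoint retention behavior expressed as simple arithmetic.
LOCAL_AGE = 900    # seconds, default local endpoint aging
REMOTE_AGE = 300   # seconds, default remote endpoint aging

# Host tracking: the leaf sends its three ARP probes after 75% of the aging
# timer elapses without any traffic from the endpoint.
probe_at = 0.75 * LOCAL_AGE
print(probe_at)     # 675.0 seconds of silence before the ARP probes are sent
print(REMOTE_AGE)   # a silent remote entry simply ages out after 300 seconds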
Even though leaves are not aware of all endpoints in the ACI fabric, they have to be
able to send traffic to unknown ones as well. In such a case, and based on the bridge
domain setting (described later), they can either flood the iVXLAN-encapsulated traffic
through the fabric using multicast to all leaves with that bridge domain configured, or
send the packet to the spine acting as an endpoint resolution proxy. The spine, if possible,
resolves the destination endpoint in its distributed database and the iVXLAN packet is
forwarded to the egress leaf. The spine endpoint database is filled exclusively by the
control plane, and that's where the Council of Oracles Protocol (COOP) comes into play.
From a COOP terminology point of view, leaves are "citizens" periodically updating
the spine "oracle" endpoint repository. Each locally learned endpoint on a leaf is
immediately announced to the spines over the COOP adjacency, as is any subsequent change to it.
The spines are therefore expected to have absolute visibility into all known endpoints
within a single fabric. When checking the COOP database for a given endpoint, you have
to specify a key: the bridge domain VNID (segment ID) for a MAC address endpoint, or the
VRF VNID (segment ID) in the case of IP+MAC endpoints. See Listing 6-4.
IP address : 10.3.3.10
Vrf : 3080193
Flags : 0
EP bd vnid : 16318374
EP mac : 00:50:56:B3:D4:00
Publisher Id : 10.11.24.64
Record timestamp : 05 24 2022 15:38:17 482300
Publish timestamp : 05 24 2022 15:38:17 1155657
Seq No: 0
Remote publish timestamp: 01 01 1970 00:00:00 0
URIB Tunnel Info
Num tunnels : 1
Tunnel address : 10.11.24.64
Tunnel ref count : 1
IS-IS and COOP ensure intra-fabric forwarding information, but you can't
forget about external connectivity when discussing control plane operations. All external
prefixes, as well as endpoint information between distinct ACI fabrics, are distributed
using Multi-Protocol BGP (MP-BGP) with the spines acting as route reflectors for all leaf
switches. When ACI learns about a prefix via an L3OUT, it is redistributed within its VRF
on the border leaf switch into BGP in the VPNv4 or VPNv6 address family. For internal
BD prefixes being propagated to external routers via L3OUTs, it works vice versa. In
Multi-Pod and Multi-Site architectures, spines, besides external prefixes, also exchange
endpoint information in the L2VPN EVPN address family. More details about inter-
fabric connectivity await you later in this chapter, and the next one will be completely
dedicated to external connectivity, including L3OUTs. See Listing 6-5.
[Figure 6-5: End-to-end forwarding example. Server A in Backend_EPG (Backend_BD) behind Leaf 101 sends traffic to Server B in Database_EPG (Database_BD) behind the vPC pair Leaf 103/104 (VRF Apress:PROD). The ingress leaf classifies the packet into its EPG based on the incoming VLAN and interface, applies the contract zoning rules on ingress, finds the destination in its endpoint table, and encapsulates the packet into iVXLAN with the anycast vPC VTEP of Leafs 103/104 as the outer destination IP; the egress leaf decapsulates the packet, learns the remote endpoint behind the source leaf VTEP, and delivers it locally.]
6. The egress leaf will decapsulate the iVXLAN packet and save
the source endpoint location behind the source leaf as a remote
entry in its endpoint table. If not already applied, it will apply the contract
zoning rules.
• ARP Flooding: ARP has its own dedicated checkbox here, allowing
you to optimize the delivery in L3-enabled bridge domains. Instead
of flooding, you can start delivering ARP directly to the destination
endpoint, based on endpoint table information.
Layer 2 Forwarding
Let's start with a closer look at simple traffic bridging between two ACI endpoints inside
one bridge domain (as shown in Figure 6-6). The bridge domain has unicast routing
disabled, so there is no default gateway configured in ACI. Server A is connected behind
an individual interface using an access interface policy group, whereas Server B utilizes a vPC
port channel and a vPC interface policy group.
[Figure 6-6: Layer 2 bridging topology. Server A (MAC A) is connected to Leaf 101 on Eth1/20 via an individual access interface, and Server B (MAC B) is connected to the vPC pair Leaf 103/104 via port channel Po1 (vPC interface policy group Lf103_104_VPC), both in the same bridge domain.]
In the Layer 2 bridge domain, ACI is not learning IP addresses at all. From this
perspective, it acts similarly to a traditional switch, just looking at the source MAC
addresses from the data plane.
To deliver BUM (broadcast, unknown unicast, and multicast) traffic to the correct recipients,
instead of a unicast outer destination VTEP address, ACI uses
a Group IP outer (GiPo) address, dynamically assigned to each bridge domain after its
creation. Figure 6-7 illustrates this operation.
[Figure 6-7: BUM delivery using the bridge domain GiPo address. An ARP request from a host is flooded along an FTag multicast tree rooted on the spines and delivered to all leaves where the bridge domain is present.]
Note In fact, ACI supports 16 FTags, but IDs 13, 14, 15 are not used and are
implicitly disabled.
The placement of FTag roots can be verified using the spine CLI, together with the
configuration for individual GiPO addresses. See Listing 6-6.
Multicast data will be forwarded to all interfaces listed in the Outgoing Interface List (OIF).
There you should find local downlinks to leaf switches as well as an external interface
leading to the IPN if a Multi-Pod architecture is deployed.
[Figure 6-8: Known unicast Layer 2 forwarding in bridge domain Database_BD. Leaf 101 holds MAC A as a local entry on Eth1/20 and MAC B as a remote entry behind a tunnel interface pointing to the anycast vPC VTEP of Leafs 103/104; the frame from Server A to Server B is encapsulated into iVXLAN with the bridge domain segment ID (VNID) and sent directly to that VTEP.]
In this situation, when you view the endpoint table on Leaf 101, you should see
information about both endpoints: MAC A with an associated local interface and MAC
B behind a TunnelX interface. For the local endpoint, you can see the associated on-wire
encap VLAN 11, matching the static interface-to-EPG mapping. Notice there is another
VLAN 20 shown for this entry at the beginning of the row. That's the locally significant
internal VLAN. The leaf switch always operates with internal VLANs instead of the encap
ones due to the fact that you can have the same encap VLAN mapped to multiple interfaces,
each time in a different EPG. Internally, the leaf needs to differentiate between them,
and this is how it is done. See Listing 6-7.
The remote endpoint behind the tunnel interface uses VXLAN encapsulation with a
VNID corresponding to the BD segment ID. The tunnel itself represents a logical, non-
stateful VXLAN connection between two leaves, specifically their VTEP addresses. Tunnels
are created automatically on demand as soon as the leaf receives an iVXLAN packet
from a new VTEP IP address, and they are removed when the last endpoint mapped
to them expires. The tunnel's destination IP address should be routed in VRF overlay-1
towards all spines, utilizing ECMP load balancing. See Listing 6-8.
If you are wondering who the owner of the tunnel destination IP is, you can easily find its
node ID using the query command in the APIC CLI (Listing 6-9). Two entries mean it's an
anycast VTEP IP configured on both vPC leaves.
On the destination egress leaves 103-104, the situation is similar, just with the information
reversed: MAC B is a local entry and MAC A is learned as a remote one. Notice that the internal
VLANs are totally different and not related to the ingress leaf. The only common
denominator is the logical segment, the bridge domain VNID you are forwarding the
Layer 2 traffic in. See Listing 6-10.
[Figure 6-9: Unknown unicast forwarding with hardware proxy in bridge domain Database_BD. Leaf 101 knows only the local MAC A on Eth1/20; the frame from Server A destined to the unknown MAC B is encapsulated into iVXLAN with the bridge domain segment ID and sent to the spine anycast proxy VTEP, which resolves the destination endpoint from its COOP database.]
When the destination endpoint is missing in the endpoint table, you won’t see any
entry when searching on the ingress leaf. What you can check in such a case, though, is
the spine COOP database. Use the bridge domain VNID as a key and specify the MAC
address of the destination endpoint. If present in the COOP database with the correct
tunnel next hop, ACI should be able to deliver traffic to it. See Listing 6-11.
-- output omitted --
Repo Hdr flags : IN_OBJ ACTIVE
EP bd vnid : 16449430
EP mac : 00:50:56:B3:08:50
flags : 0x80
repo flags : 0x102
Vrf vnid : 3080193
PcTag : 0x100c005
EVPN Seq no : 0
Remote publish timestamp: 01 01 1970 00:00:00 0
Snapshot timestamp: 05 28 2022 11:48:28 483522044
Tunnel nh : 10.11.66.70
MAC Tunnel : 10.11.66.70
IPv4 Tunnel : 10.11.66.70
IPv6 Tunnel : 10.11.66.70
ETEP Tunnel : 0.0.0.0
-- output omitted --
[Figure 6-10: Layer 2 forwarding decision flowchart. Routed traffic is covered in the Layer 3 forwarding section; a known local destination MAC is forwarded locally, a known remote one is sent to the egress leaf, and an unknown destination MAC is handled according to the BD's L2 Unknown Unicast setting: either flooded to all leaves within the BD or sent to the spine hardware proxy, where it is forwarded if the MAC is known to the COOP database and dropped if not.]
Layer 3 Forwarding
As soon as you enable the Unicast Routing checkbox and configure subnet IP(s) in the
bridge domain L3 configuration (Tenants -> Tenant_name -> Networking -> Bridge
Domains -> BD_Name -> Policy -> L3 Configurations), ACI starts to act as a default
gateway for all endpoints in the related EPGs. It will deploy SVI interfaces with the BD subnet IPs
on all leaves where at least one EPG-to-interface mapping exists (depending on the
immediacy setting). Additionally, leaves will start learning MAC addresses together with
endpoint IP information.
Let’s now examine the forwarding behavior for two endpoints in different L3 bridge
domains as depicted in Figure 6-11.
[Figure 6-11: Layer 3 routing topology. Server A behind Leaf 101 Eth1/20 and Server B behind the vPC pair Leaf 103/104 (interface policy group Lf103_104_VPC) sit in different L3-enabled bridge domains within VRF Apress:PROD; the leaves host the pervasive default gateway SVIs for the locally present bridge domains.]
To ensure routing outside an endpoint's "home" bridge domain, you can check the
presence of the pervasive gateways on the related leaf switches. See Listing 6-12.
The next step before IPv4 communication between the endpoints can
happen is, as always, ARP resolution.
When communication outside its own bridge domain is needed, the
endpoint must use ARP to resolve the MAC address of its default gateway (in your case,
the ACI pervasive SVI interface). The ARP reply for such a request is always processed
and sent by the closest leaf switch; no further forwarding is done. You can verify that
this is happening from the ACI point of view by issuing the command in Listing 6-13.
In the output you can see that the leaf received the ARP request on physical interface
Ethernet1/20 from the end host with IP 10.3.3.10. The destination (target) IP was 10.3.3.1,
which matched ACI's own pervasive gateway on SVI VLAN 15. Therefore, the leaf generated
the ARP reply, enabling the end station to reach all the subnets outside of its bridge domain.
Another very useful tool for control plane operation verification in general, including
the ARP protocol, is the well-known tcpdump utility available on all ACI devices. However,
you can only capture and analyze traffic heading to the leaf/spine CPU, not standard
data plane packets. For data plane analysis I will soon provide you with other options.
Regarding ARP, with tcpdump you can see in real time whether the end host is sending
ARP requests to the leaf switch with the correct information. Even if you don't have access
to the end host itself, you can at least confirm the expected behavior or immediately identify
issues on the endpoint side and involve the related team in troubleshooting. See Listing 6-14.
Based on ARP packets, leaves will learn the placement of local endpoints and save
them to their endpoint table. In a Layer 3-enabled bridge domain you will find combined
MAC+IP entries. See Listing 6-15.
To wrap up ARP forwarding in ACI, the schema in Figure 6-12 illustrates the decision
process for ARP in an L3 bridge domain.
[Figure 6-12: ARP handling decision flowchart. In an L2-only bridge domain the ARP is flooded within the BD using the GiPo address. In an L3 bridge domain the leaf itself responds to ARP requests targeting the ACI gateway IP, floods within the BD when ARP flooding is enabled, and otherwise forwards the ARP to the egress leaf when the target IP is present in the endpoint table or sends it to the spine proxy when it isn't; the spine forwards the ARP to the egress leaf if the target IP is present in its COOP database, or drops the original ARP and uses the ARP glean mechanism if not.]
IP address : 10.3.3.10
Vrf : 3080193
Flags : 0
EP bd vnid : 16318374
EP mac : 00:50:56:B3:D4:00
Publisher Id : 10.11.66.65
Record timestamp : 05 29 2022 08:38:12 213146976
Publish timestamp : 05 29 2022 08:38:12 213833526
Seq No: 0
Remote publish timestamp: 01 01 1970 00:00:00 0
URIB Tunnel Info
Num tunnels : 1
Tunnel address : 10.11.66.65
Tunnel ref count : 1
IP address : 10.4.4.20
Vrf : 3080193
Flags : 0
EP bd vnid : 16449430
EP mac : 00:50:56:B3:2F:07
Publisher Id : 10.11.66.70
Record timestamp : 05 29 2022 08:38:09 962932484
Publish timestamp : 05 29 2022 08:38:09 963037266
Seq No: 0
Remote publish timestamp: 01 01 1970 00:00:00 0
URIB Tunnel Info
Num tunnels : 1
Tunnel address : 10.11.66.70
Tunnel ref count : 1
Even though the spine has all of the endpoint information, there was actually no
packet exchange between the endpoints themselves yet, just ARPing of their local
gateways, so the leaves don't have any remote endpoint entries in their tables yet. See
Listing 6-17.
In an L3 bridge domain, ACI doesn't flood unknown unicasts.
Instead, the leaf refers to the Routing Information Base (RIB), or simply the routing
table, and searches for the remote bridge domain subnet prefix, marked as pervasive.
Its presence is a mandatory prerequisite for forwarding the unknown unicast packet.
Forwarding itself is then handled according to the routing next-hop information.
See Listing 6-18.
And guess what actually represents this next-hop IP? Yes, it's the anycast spine VTEP
address dedicated to the IPv4 forwarding proxy. See Listing 6-19.
Based on the previous information, the ingress leaf will encapsulate the packet into
iVXLAN with the IPv4 anycast proxy address as the destination and send it to one of the
spine switches, indicating a need for endpoint resolution in the COOP database (as
shown in Figure 6-13). Notice that in the case of L3 routing between bridge domains, ACI
uses the VRF segment ID (VNID) inside the corresponding header of the iVXLAN-
encapsulated packet.
[Figure 6-13: Routing to an unknown endpoint via the spine proxy. Leaf 101 knows only the local endpoint (MAC A/IP A on Eth1/20); the packet from Server A (10.3.3.10, VLAN 10) to the unknown Server B (10.4.4.20, VLAN 11, behind the vPC pair Leaf 103/104, policy group Lf103_104_05_VPC) is encapsulated into iVXLAN with VRF VNID 3080193 (Apress:PROD) and the spine anycast IPv4 proxy VTEP as the outer destination IP.]
When the spine receives an iVXLAN packet destined to its own anycast VTEP address, it
will proceed with the endpoint lookup in the COOP database, and if a match is found,
it will rewrite the iVXLAN outer destination IP address to the egress leaf VTEP IP and
forward the packet towards its destination.
But what if the spine cannot find any endpoint? In a Layer 2 bridge domain, it would
drop such traffic. Here, with Unicast Routing enabled, the original traffic is dropped
as well, but ACI has an additional feature available called silent host tracking or ARP
gleaning (summarized in Figure 6-14).
[Figure: The original packet from Server A to the unknown Server B is dropped on the spine, which triggers three ARP requests generated from the bridge domain SVI on the leaves toward the silent host (VRF Apress:PROD).]
Figure 6-14. Silent host tracking or ARP gleaning for an unknown endpoint
The spine generates an ARP glean request for the unknown endpoint and sends it to
all leaves where the destination bridge domain resides. Each leaf then creates three
artificial ARP requests sourced from its pervasive gateway SVI interface, flooding them
to all interfaces in the given bridge domain and asking for the unknown target IP address.
The goal is to "nudge" the silent host and make it reply to ARP. This populates the
endpoint table on the local leaf, with subsequent updates to the COOP database on the spines.
The same ARP gleaning process works within one bridge domain as well, but just for
ARP traffic, not other data plane packets. If you disabled ARP flooding in an L3 bridge
domain and the source station is ARPing for an unknown silent host, the spine will
generate an ARP glean in the same way as discussed above.
[Figure: Routed delivery of the traffic from Server A to Server B. The ingress leaf encapsulates the packet into iVXLAN with the VRF L3 VNID and sends it toward the anycast vPC VTEP of Leafs 103/104 (VRF Apress:PROD).]
I won’t go into more details about external forwarding here as the all aspects of ACI
external connectivity will be covered in Chapter 7.
[Figure 6-16: Layer 3 forwarding decision flowchart. Bridged traffic is covered in the Layer 2 forwarding section; a routed packet with a known destination IP is forwarded locally or encapsulated into iVXLAN toward the egress leaf; an unknown destination covered by a pervasive BD subnet is sent to the spine proxy, which forwards the packet to the egress leaf if the destination is known to the COOP database or drops it and generates an ARP glean if not; destinations with external routing information are forwarded based on the IP routing table, and packets with no matching route are dropped.]
Multi-Pod Forwarding
So far, we have discussed various forwarding scenarios inside a single ACI fabric. When
the Multi-Pod architecture comes into play, in fact, not much changes. Multi-Pod is still
considered a single fabric with multiple availability zones under a single administration
domain, and it follows the same forwarding principles. In the following sections, you
will mainly examine the additions to control plane operations with regard to forwarding
between ACI Pods.
As soon as a spine learns about a new endpoint via the COOP protocol, in Multi-
Pod it will signal this information to all remote spines using MP-BGP in the L2VPN EVPN
address family. In Figure 6-17, you can examine the format of the L2VPN EVPN
Network Layer Reachability Information (NLRI).
[Figure 6-17: L2VPN EVPN route type 2 (MAC/IP) NLRI shown in the output of show bgp l2vpn evpn vrf overlay-1 on a spine. The NLRI carries the Ethernet segment and tag identifiers, the MAC/IP address lengths and values, the route distinguisher with the L2 VNI, extended communities with the route target, encapsulation, and VNID/VRF VNID, and the external (control-plane) TEP address of the originating Pod as the next hop.]
A single route type 2 entry can describe either a MAC address (for L2 BDs) or a MAC+IP
couple (in an L3-enabled BD). Route type 5 is then similarly used to exchange routing
information between spines in a given VRF (address families VPNv4 and VPNv6).
Notice that these external entries carry in the next-hop field the already-mentioned
external TEP address, configured on the spines during Multi-Pod initiation. External TEPs
are redistributed to the COOP database as the next hop for all external endpoints. As
you can see in Figure 6-18, it's the only significant difference from standard internal
endpoints.
[Figure 6-18: Multi-Pod endpoint learning. Endpoints from the remote Pod are installed in the COOP database with that Pod's external TEP address as the next hop, which is the only significant difference from standard internal endpoints.]
Leaf endpoint tables still use tunnel interfaces for remote endpoint reachability,
and their destination address for inter-Pod endpoints refers to the actual VTEP
address of the distant leaf switch.
[Figure 6-19: Known unicast forwarding between Pods. Each Pod's VTEP pool is mutually redistributed between the Pod-internal IS-IS and the IPN OSPF/BGP, so the ingress leaf in Pod 1 can encapsulate the payload from Server A into iVXLAN and send it across the IPN directly to the anycast vPC VTEP of the egress leaves in Pod 2.]
When the local leaf switch doesn't know about the destination endpoint and the bridge
domain is configured with hardware proxy for unknown L2 unicasts, the iVXLAN
packet is sent to the anycast spine VTEP address. There, if the endpoint information exists
in the COOP database, learned from the MP-BGP peering with the remote Pod, the outer
destination IP of the iVXLAN header is rewritten to the external TEP address and the packet
is sent over the IPN toward the remote spine switch. The remote spine will finally resolve
its local endpoint in COOP, rewrite the outer iVXLAN destination IP to the egress leaf VTEP,
and deliver the packet locally in a standard fashion. Hardware proxy forwarding is illustrated
in Figure 6-20.
For Layer 3 routed traffic, there is no special behavior compared to a single fabric or
the previously described data plane operations. Known L3 endpoints are directly routed
with the egress VTEP address in the iVXLAN IP header. For unknown L3 endpoints,
you need at least their bridge domain pervasive subnet in the routing table, pointing to
the spines, where the COOP resolution happens. And for entirely silent hosts not known to
the COOP database, the spines generate ARP glean messages, which are flooded through the
whole Multi-Pod environment where the destination bridge domain exists.
[Figure: BUM forwarding between Pods. A broadcast from Server A is encapsulated with the bridge domain GiPo as the outer destination IP on the ingress leaf in Pod 1 and carried across the IPN using multicast routing to the leaves in Pod 2.]
In order to deliver multicast traffic through the IPN as well, you need to utilize PIM-Bidir
multicast routing with phantom RPs, as discussed in previous chapters. The spines in each
Pod, thanks to IS-IS, internally elect one of them to become the authoritative forwarder
of multicast packets towards the IPN for a particular bridge domain. At the same time,
they send an IGMP report over a single chosen interface facing the IPN, indicating
interest in receiving multicast traffic for a given GiPo address. This way, they can simply
achieve BUM traffic load balancing.
When looking at the IPN multicast routing table, if the connected spine registered an
intention to receive multicast for a GiPo address, you will see the interface over which
the IGMP was received in the Outgoing Interface List. Alternatively, you can look in the
IGMP snooping table if it is enabled. See Listing 6-22.
[Figure: Bridge domains stretched across Pod 1 and Pod 2.]
Multi-Site Forwarding
In the ACI Multi-Site architecture, forwarding follows concepts similar to Multi-Pod, with a few
variations and additions. The ISN OSPF/BGP routing exchanges overlay VTEP IPs (similar
to external TEPs in Multi-Pod) between spines. You now differentiate between Unicast
Overlay TEP (O-UTEP) and Multicast Overlay TEP (O-MTEP) IPs, used for forwarding the
respective traffic types. The ISN doesn't require any multicast support; the spines
perform ingress replication of BUM packets instead. Multiple unicast copies of
previously broadcast/multicast traffic are sent to the O-MTEP IPs of the remote spines.
Endpoint information in the spine COOP databases (and subsequently in the leaf endpoint
tables) is synchronized thanks to MP-BGP between sites in the L2VPN EVPN address
family. If routing is needed, VPNv4 and VPNv6 entries are exchanged. As you can see in
Figure 6-23, remote endpoint entries always refer to O-UTEP IP addresses.
[Figure: Inter-site policy defined in Nexus Dashboard Orchestrator. A contract within VRF PROD connects an EPG in one ACI site with Database_EPG in the other site.]
When you push such a policy from NDO to both APIC clusters, the resulting
configuration will look like this:
1. ACI Site 1
2. ACI Site 2
Shadow EPG sClasses are then installed into the local zoning rule tables and mapped to
local EPGs, effectively simulating a local relationship. The output shown in Listing 6-23
would be seen in the zoning rule table of the Site 1 leaves. Notice especially the SrcEPG and
DstEPG fields.
If you check the leaves of Site 2, you get the output shown in Listing 6-24.
The second step is related to data plane traffic normalization. NDO has to create a
translation database called the Name-Space Normalization Table on all spine switches.
Based on it, the spines will rewrite the iVXLAN header fields when the inter-site packet is
being forwarded. Figure 6-25 and the following paragraph describe the whole process.
[Figure 6-25: Name-space normalization between sites. Nexus Dashboard Orchestrator programs a Name-Space Normalization Table on the spines of each site, mapping remote-site VNIDs (BD, VRF) and sClasses (EPG) to their local or shadow counterparts, so that traffic crossing the ISN between Backend_EPG in Site 1 and Database_EPG in Site 2 can be translated and the contract enforced.]
In this example, the backend endpoint from Site 1 sends a packet to the database
endpoint in Site 2. Ingress leaf 101/102 matches the zoning rules related to the local
backend EPG and the shadow database EPG. The permitted packet is encapsulated into
iVXLAN with the headers set to the Backend_EPG sClass 38273 and the Apress:PROD VRF VNID
3047424. The packet is forwarded to the destination O-UTEP of spines 298/299. The
destination spine identifies inter-site traffic, so its headers must be translated for local
use and delivery. According to the normalization table, the VRF VNID is translated
from the remote 3047424 to the local 2097153, and the remote sClass 38273 of the actual backend
EPG in Site 1 is translated to the artificial shadow sClass 45793 of the backend EPG in local
Site 2. The outer destination IP of the iVXLAN header is changed as well, to the anycast
VTEP of the local egress leaves. After receiving the packet, the egress leaf sees that the
contract was already applied in the source site, so it just delivers the payload to the
destination endpoint.
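The translation step itself can be pictured as a simple keyed lookup. The following conceptual sketch reuses the values from the example above (remote VRF VNID 3047424 and sClass 38273 rewritten to the local 2097153 and shadow sClass 45793); it only illustrates the role of the table and is not the spine's real data structure.

# Conceptual model of the Name-Space Normalization Table on the Site 2 spines,
# using the values from the example above (not the spine's actual data structure).
vnid_xlate = {3047424: 2097153}    # remote Apress:PROD VRF VNID -> local VRF VNID
sclass_xlate = {38273: 45793}      # remote Backend_EPG sClass -> local shadow sClass

def normalize(vrf_vnid: int, src_sclass: int) -> tuple[int, int]:
    # Rewrite the iVXLAN VNID and source class before local delivery in Site 2.
    return vnid_xlate[vrf_vnid], sclass_xlate[src_sclass]

print(normalize(3047424, 38273))   # (2097153, 45793)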
Name-space normalization tables can be verified on spine switches using the show
commands in Listing 6-25, illustrating the previous example as well.
Of course, this normalization works not only to interconnect separate ACI segments
between sites, but also, in the same way, to stretch them. All stretched objects existing in
both sites are directly mapped to each other without creating artificial shadow objects.
In the previous example, the VRF is such a case.
Endpoint Tracker
Navigating to Operations -> EP Tracker, you will find a useful GUI tool to simply and
quickly verify whether an endpoint is known to the fabric or not (as shown in Figure 6-26). In
the search field, you can enter a MAC address, an IP address, or, when integrated with
a VMM domain, even a VM name. If an endpoint is learned behind some leaf switch, you
will get one or more search results with information about where it is currently connected
and to which tenant/application profile/EPG it belongs.
ELAM (Embedded Logic Analyzer Module) lets you capture a single packet coming from the
end host/fabric to a particular interface, or you can dig deeper to see its handling. ELAM
is very useful for finding the outgoing interface(s) of chosen traffic, or even more for identifying
a drop reason when some communication is not working.
Originally, ELAM was a CLI utility, initiated in a special line card shell (vsh_lc), with
a quite complicated filter configuration and output ranging up to thousands of lines full
of internal codes and hex values. If you are not a Cisco CX (TAC) engineer, it can be hard
to find the one piece of information you currently need. Thankfully, help is provided in the form
of the additive ELAM Assistant app, which can be downloaded from the DC App Center store
(https://dcappcenter.cisco.com/). The package needs to be installed in the
Apps -> Installed Apps section of the ACI GUI and enabled before use (as you can see in
Figure 6-27).
ELAM Assistant significantly simplifies ELAM operation, providing a graphical way
to configure the utility for catching the correct packet. If the filter rule is triggered on a
specified ASIC, ELAM Assistant will further download the resulting detailed output and
parse it on your behalf, translating the raw hex values into human-readable and useful
information.
The fastest procedure to set up ELAM is to open the app, log in with your credentials,
choose Capture (Perform ELAM) in the left navigation pane, and click Quick Add to
choose up to three ACI switches where the traffic filter will be configured; then define
a wide variety of packet parameters to match (MAC, IP, CoS, DSCP, IP protocol number,
L4 source/destination port, ARP sender/target IP, ARP MAC, or its opcode). If needed,
you can even match the iVXLAN packet headers on a fabric interface (as shown in
Figure 6-28).
If the defined filter is hit, ELAM Assistant will start downloading the report file
from the ACI switch and parsing it for you. Click the Report Ready button in the Status
field and view the output. Besides the traffic header analysis at the beginning, the most
crucial part is the Packet Forwarding Information section (shown in Figure 6-30).
As you can see, ELAM offers confirmation of the forwarding decision on this node
(e.g., local delivery, unicast forwarding through the fabric to the egress leaf, flood, or
drop) together with contract hit information and a drop code if applicable.
fTriage
ELAM is a powerful tool for performing deep packet forwarding analysis, but as you've
seen, it's a one-shot capture and runs on at most three concurrent ACI nodes. How
great would it be if you could start ELAM on the given ingress node(s), specify the traffic
of interest, and some tool would gather all the outputs from the ELAM report and,
based on them, automatically perform further ELAMs along the whole path through the
ACI fabric (even in Multi-Pod/Multi-Site architectures)? All with comprehensive and
detailed documentation of the packet manipulation at each hop? That's exactly
what APIC's fTriage utility (written in Python) is for.
fTriage can analyze the whole forwarding path of a defined communication after you enter the initial information about where you expect the traffic to ingress. The command itself has a plethora of flexible options and flags that you can explore by issuing ftriage example or a specific -h flag for the bridged or routed variants. Some of the common examples are shown in Listing 6-26.
# Broadcast
apic1# ftriage -user admin bridge -ii LEAF:Leaf-101 -dmac FF:FF:FF:FF:FF:FF
Let's start fTriage and analyze the communication between endpoints 10.4.4.20 and 10.3.3.10 in different EPGs behind different leaf switches. Notice especially how much detail fTriage can provide about forwarding decisions along the path (I removed timestamps and a few less useful fields from each line to make the output more readable). See Listing 6-27.
apic1# ftriage -user admin route -ii LEAF:Leaf-102 -sip 10.4.4.20 -dip
10.3.3.10
Request password info for username: admin
Password:
main:2064 Invoking ftriage with username: admin
fcls:2379 Leaf-102: Valid ELAM for asic:0 slice:1 srcid:106 pktid:1586
main:1295 L3 packet Seen on Leaf-102 Ingress: Eth1/20 Egress:
Eth1/60 Vnid: 3080193
main:1337 Leaf-102: Incoming Packet captured with [SIP:10.4.4.20,
DIP:10.3.3.10]
main:1364 Leaf-102: Packet's egress outer [SIP:10.11.24.66,
DIP:10.11.24.64]
main:1371 Leaf-102: Outgoing packet's Vnid: 3080193
main:353 Computed ingress encap string vlan-11
main:464 Ingress BD(s) Apress:Database_BD
main:476 Ingress Ctx: Apress:PROD Vnid: 3080193
main:1566 Ig VRF Vnid: 3080193
main:1610 SIP 10.4.4.20 DIP 10.3.3.10
unicast:1607 Leaf-102: Enter dbg_sub_ig with cn HEA and inst: ig
unicast:1934 Leaf-102: Dst EP is remote
unicast:840 Leaf-102: Enter dbg_leaf_remote with cn HEA and inst: ig
misc:891 Leaf-102: caller unicast:976 DMAC(00:22:BD:F8:19:FF) same as
RMAC(00:22:BD:F8:19:FF)
misc:893 Leaf-102: L3 packet caller unicast:985 getting routed/bounced in HEA
misc:891 Leaf-102: caller unicast:996 Dst IP is present in HEA L3 tbl
misc:891 Leaf-102: caller unicast:1054 RwDMAC DIPo(10.11.24.64) is one of
dst TEPs ['10.11.24.64']
main:1770 dbg_sub_ig function returned values on node Leaf-102 done False,
nxt_nifs {Spine-199: ['Eth1/32']}, nxt_dbg_f_n nexthop, nxt_inst ig, eg_ifs
Eth1/60, Vnid: 3080193
main:958 Found peer-node Spine-199 and IF: Eth1/32 in candidate list
fcls:2379 Spine-199: Valid ELAM for asic:0 slice:0 srcid:56 pktid:31
main:1295 L3 packet Seen on Spine-199 Ingress: Eth1/32 Egress:
Eth1/31 Vnid: 3080193
Additionally, you can differentiate between multiple places where a SPAN session can be configured in ACI, each with its own specifics:
• The traffic is SPANed with iVXLAN headers and, due to this fact,
you can filter it based on VRF or BD (their VNIDs).
SPAN Configuration
Each SPAN type is configured in the same way, just in different places (access policies,
fabric policies, tenant policies). The configuration itself has two components: creating a
SPAN destination group and mapping it to the SPAN source group.
To create a SPAN destination in access policies, navigate to Fabric -> Access
Policies -> Policies -> Troubleshooting -> SPAN -> SPAN Destination Groups. Here
you will create an object describing either a single destination interface on a particular
leaf switch or an ERSPAN destination IP address connected in the chosen EPG (as shown
in Figure 6-31).
In the Source IP/Prefix field, you can choose from which IP address the ERSPAN traffic will arrive at the destination monitoring station. It doesn't need to be a real IP address available in ACI. If you enter a prefix (e.g., 10.0.0.0/24), ACI will incorporate the node ID of the originating leaf into the ERSPAN source IP (for Leaf-101, 10.0.0.101).
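To illustrate the documented prefix-plus-node-ID behavior, here is a minimal Python sketch (not an ACI tool, just arithmetic on the configured prefix) that derives the expected per-leaf ERSPAN origin IP:

import ipaddress

def erspan_origin_ip(prefix: str, node_id: int) -> str:
    # ACI inserts the leaf node ID as the host portion of the configured prefix
    net = ipaddress.ip_network(prefix, strict=True)
    return str(net.network_address + node_id)

print(erspan_origin_ip("10.0.0.0/24", 101))   # -> 10.0.0.101 for Leaf-101
print(erspan_origin_ip("10.0.0.0/24", 102))   # -> 10.0.0.102 for Leaf-102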
As the second step, you need to create a SPAN source group in Fabric -> Access Policies -> Policies -> Troubleshooting -> SPAN -> SPAN Source Groups. There you can configure one or more physical interfaces, PC/vPC interface policy groups, or even individual vPC member ports as traffic sources. Filtering is possible on top of the main object, related to all defined traffic sources, or individually per defined source (see Figure 6-32).
After submitting the form, SPAN is instantly deployed to the fabric without any additional association needed. To verify that it happened, you can connect to the affected leaf CLI and check the output shown in Listing 6-28.
Session State Reason Name
------- ----------- ---------------------- ---------------------
3 up active Apress_SPANSrc
state : up (active)
erspan-id : 1
granularity :
vrf-name : Apress:PROD
acl-name :
ip-ttl : 64
ip-dscp : ip-dscp not specified
destination-ip : 10.2.2.10/32
origin-ip : 10.0.0.101/24
mode : access
Filter Group : None
source intf :
rx : [Eth1/20]
tx : [Eth1/20]
both : [Eth1/20]
source VLANs :
rx :
tx :
both :
filter VLANs : filter not specified
filter L3Outs : filter not specified
After submitting the form, you will receive a graphical representation of mutual endpoint forwarding, with the possibility of analyzing traffic statistics, drops, related faults, events, and applied contracts, or of initiating a traceroute between the endpoints. All functions are accessible from the left menu (shown in Figure 6-34).
To view individual hardware drop counters for Nexus interfaces, you can attach to a
special line card shell (vsh_lc) and use the command in Listing 6-29.
Leaf-101# vsh_lc
module-1# show platform internal counters port 20
Stats for port 20
(note: forward drops includes sup redirected packets too)
IF LPort Input Output
Packets Bytes Packets Bytes
eth-1/20 20 Total 53458772 3831638779 946076 125371600
Unicast 82427 8736812 944230 125244266
Multicast 0 0 1846 127334
Flood 5270 366424 0 0
Total Drops 53371075 0
Storm Drops(bytes) 0
Buffer 0 0
Error 0 0
Forward 53371075
LB 0
Note The packet drop reasons described here can be viewed in ELAM as well, for a single packet.
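If you want to collect these drop counters centrally rather than per-switch from vsh_lc, the APIC REST API also exposes aggregated interface drop statistics. The following Python sketch is only an illustration and relies on assumptions you should verify on your version (the stats class name eqptIngrDropPkts5min, the hypothetical APIC address, and the placeholder credentials):

import requests, urllib3

urllib3.disable_warnings()        # lab APICs often use self-signed certificates

APIC = "https://apic"             # hypothetical APIC address
USER, PWD = "admin", "password"   # placeholder credentials

s = requests.Session()
s.post(f"{APIC}/api/aaaLogin.json", verify=False,
       json={"aaaUser": {"attributes": {"name": USER, "pwd": PWD}}})

# Query the assumed 5-minute ingress drop statistics class for every port
# and print only the objects reporting non-zero integer counters.
resp = s.get(f"{APIC}/api/node/class/eqptIngrDropPkts5min.json", verify=False)
for mo in resp.json().get("imdata", []):
    attrs = next(iter(mo.values()))["attributes"]
    non_zero = {k: v for k, v in attrs.items() if v.isdigit() and int(v) > 0}
    if non_zero:
        print(attrs["dn"], non_zero)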
Summary
Understanding the forwarding mechanisms, with their control plane and data plane operations, is a fundamental and highly important aspect of operating ACI, especially when it comes to troubleshooting non-working connectivity between distinct endpoints. In this chapter, you had the opportunity to explore all the main forwarding concepts for both bridged and routed unicast traffic, as well as for flooded unknown traffic and ARP operation. You covered intra-fabric forwarding together with Multi-Pod and Multi-Site ACI architectures, all supplemented with a wide range of tools and procedures to simplify forwarding troubleshooting. If you would like to continue expanding your knowledge in this area, I recommend reading through the official Cisco Endpoint Learning and Multi-Pod/Multi-Site whitepapers. Each of them is a great study resource.
You probably noticed that this chapter was entirely oriented to the east-west traffic pattern between internal endpoints. In the following chapter, you will start looking at the ways to interact with the ACI fabric from external L2 and L3 networks in various scenarios. I will cover Layer 2 EPG and bridge domain extensions for stretching ACI connectivity to a legacy network for migration purposes, as well as L3OUT objects representing the complete static and dynamic routing configuration of ACI with an external L3 network.
CHAPTER 7
External Layer 2 and Layer 3 Connectivity
Both variants have their pros and cons, but the end goal is ultimately the same: to provide adequate connectivity based on ACI policies for internal as well as external endpoints. In the following sections, I will describe and compare both of them. The configuration of Layer 2 extensions is completely abstracted from the underlay ACI network, so you can use either approach, even combined with the later-described external L3 routing over the same physical interfaces.
From a tenant policies point of view, this L2 external EPG is no different from a standard EPG. It has its own pcTag, and all contained endpoints must meet the zoning rule definitions from the contract in order to communicate outside of their EPG.
The similarity continues in how the external L2 EPG consumes physical fabric resources. Standard EPGs need to have a physical domain associated with them to gain access to physical interfaces and encapsulation VLAN IDs. For L2OUT, you need to create a special external bridged domain, found in the same place as the physical domains in the fabric access policies discussed in Chapter 4. The rest of the access policies remains the same; the external bridged domain is mapped to a static VLAN pool and an AAEP of your choice, listing the interfaces used for the legacy network interconnection.
Finally, for the bridge domain extension, navigate to Tenants -> Tenant_Name -> Networking -> L2OUTs and create the L2OUT object there. As shown in Figure 7-3, besides the name, mandatory information includes:
On the second page of the L2OUT configuration form, you must create at least one external EPG, with optional settings allowing you to change its QoS class or put it into the preferred group of its VRF (see Figure 7-4).
After creation, you should immediately see new endpoints learned in this L2_ExtEPG if they communicate with ACI. Check either the GUI, using the Operational tab on the external EPG object, or the APIC/leaf CLI (Listing 7-1).
+-------------+----------+-----------------+-----------+-----------+---------------------------------+
 VLAN/          Encap      MAC Address       MAC Info/   Interface   Endpoint Group
 Domain         VLAN       IP Address        IP Info
+-------------+----------+-----------------+-----------+-----------+---------------------------------+
 22             vlan-200   0050.56b3.362b    L           eth1/20     Apress:Frontend_L2OUT:L2_ExtEPG
 Apress:PROD    vlan-200   10.0.200.30       L           eth1/20
By default, these external endpoints will have connectivity only to the ACI BD gateway (if configured and if the extended BD has Unicast Routing enabled). Therefore, the final step in the process is to associate a contract with the external EPG and connect it to the internal resources defined in the application profiles, as shown in Figure 7-5.
Even though this L2 extension approach is fully supported and functional, due to the multiple required configuration steps and unnecessary complexity, it hasn't gained much popularity for connecting ACI to a legacy network. The unambiguous winner is the next option: the EPG extension.
extending the legacy VLAN to ACI and statically mapping it to the EPG (as shown in Figure 7-6). ACI will, without any additional configuration, apply exactly the same connectivity policies to external endpoints as defined in the application profile for the main EPG.
# stp.AllocEncapBlkDef
encapBlk : uni/infra/vlanns-[Apress_StVLAN]-static/from-[vlan-90]-to-
[vlan-110]
base : 8292
childAction :
descr :
dn : allocencap-[uni/infra]/encapnsdef-[uni/infra/vlanns-[Apress_
StVLAN]-static]/allocencapblkdef-[uni/infra/vlanns-[Apress_StVLAN]-static/
from-[vlan-90]-to-[vlan-110]]
from : vlan-90
to : vlan-110
If you, for example, deploy VLAN 100 from the above VLAN pool encap block, the resulting VNID for BPDU flooding will be 8292 + (100 – 90) = 8302. The CLI command in Listing 7-3 confirms the VNIDs; both the VNID used for BD forwarding and the per-encapsulation EPG flooding VNID are listed there:
+---------+------------+------------------------+--------------+--------+---------+----------+
 VLAN ID   Type         Access Encap             Fabric Encap   H/W id   BD VLAN   Endpoint
                         (Type Value)                                               Count
+---------+------------+------------------------+--------------+--------+---------+----------+
 14        Infra BD     802.1Q           3967    16777209       25       14        1
 21        Tenant BD    NONE             0       16744308       32       21        0
 22        FD vlan      802.1Q           62      8694           33       21        2
 26        Tenant BD    NONE             0       16023499       34       26        0
 27        FD vlan      802.1Q           100     8302           35       26        1
 29        FD vlan      802.1Q           10      8392           27       15        2
 30        FD vlan      802.1Q           9       8292           29       17        1
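As a quick sanity check of the arithmetic above, the following short Python sketch (a hypothetical helper, not part of any ACI tooling) applies the same base-plus-offset rule to predict the flood VNID for any VLAN in the encap block:

def flood_vnid(base: int, encap_vlan: int, block_start: int) -> int:
    # Flood (FD) VNID = encap block base + offset of the VLAN within the block
    return base + (encap_vlan - block_start)

# Values taken from the moquery output above: base 8292, block vlan-90 to vlan-110
print(flood_vnid(8292, 100, 90))   # -> 8302, matching the FD vlan row for encap 100
print(flood_vnid(8292, 90, 90))    # -> 8292, the first VLAN of the block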
To sum it up, when two leaves (or pairs of leaves) are supposed to flood STP BPDUs between each other, their interfaces have to
• Be part of the same physical domain using the same VLAN pool.
The outcome can be seen in Figure 7-7. BPDUs flooded through ACI between multiple physical interfaces in the same VLAN and EPG cause the external switches participating in STP to block their interface(s), effectively mitigating a Layer 2 loop and the broadcast storm that would follow.
Interface blocking is always a task for an external switch. ACI will never block its
interface due to STP. Therefore, make sure your legacy network devices are configured
correctly and optimally to minimize the potential for any unintended switching loop.
• Multiple Spanning Tree (MST): If you use this STP variant in your legacy network, make sure to configure and apply a special Spanning Tree policy with the MST region information, found in Fabric -> Access Policies -> Policies -> Switch -> Spanning Tree. Additionally, as MSTP sends BPDUs in the native VLAN without any tag, create a “native EPG” for them in order to correctly flood them through the ACI fabric.
The interface shown in the above output can point either to a local access port or to a bridge domain SVI. In the second case, it means the last TCN was flooded within the EPG from another leaf switch, and to find its source, you should look further around the fabric.
In general, your goal should be to minimize the number of TCNs as much as possible by implementing the following countermeasures:
• Permit only the VLANs actually in use on trunk interfaces; map only the VLANs actually in use to EPGs.
After applying this policy through the interface policy group, all VLANs mapped to a particular leaf interface can be reused on other interfaces as well. ACI will remap each overlapping VLAN to a different internal VLAN in order to differentiate between them. For this function to work, however, you have to separate the interfaces into different physical (or virtual) domains with their own dedicated VLAN pools carrying the overlapping VLANs. This is no problem and is supported for internal ACI endpoints. The issue can arise when combining overlapping VLANs with external L2 extensions, mainly if they consist of multiple external switches.
Imagine having two pairs of vPC leaves, as depicted in Figure 7-9. This time, multiple tenants are connected to them with their own L2 switches, both extending VLAN 100. This results in multiple physical domains configured on the same switches, with the same VLAN 100 encap block in the associated VLAN pools. After configuring the Layer 2 extension for these particular EPGs, you would find consistent VXLAN ID information on all leaf switches. See Listing 7-5.
+---------+------------+------------------------+--------------+--------+---------+----------+
 VLAN ID   Type         Access Encap             Fabric Encap   H/W id   BD VLAN   Endpoint
                         (Type Value)                                               Count
+---------+------------+------------------------+--------------+--------+---------+----------+
 24        Tenant BD    NONE             0       16236241       32       24        0
 25        FD vlan      802.1Q           100     8302           33       24        2
 26        Tenant BD    NONE             0       16991234       34       26        0
 27        FD vlan      802.1Q           100     8634           35       26        1
However, as soon as you upgrade (or wipe) one vPC pair, due to the overlapping VLAN pools and different physical domains, ACI can configure different base values for their encap blocks. This results in different VXLAN VNIDs, tearing apart BPDU flooding in the EPGs and ultimately causing a bridging loop, because the external switches won't block their interfaces without receiving BPDUs (see Figure 7-10).
At the beginning, ACI was considered just a stub network, but for a long time now (since version 1.1), we have been able to achieve transit routing as well, either between multiple L3OUTs in the same VRF or by route leaking among different VRFs. Generally, consider the ACI fabric to be a single distributed router (a single next hop) in the L3 path.
From a high-level perspective, we can summarize external Layer 3 routing in ACI in the following steps (as shown in Figure 7-11):
Figure 7-11. ACI Layer 3 external routing concept
I will now go through each area in more detail and focus on the configuration and
troubleshooting aspects of interaction between ACI and the external L3 world.
[Figure: L3OUT object structure – node profiles (border leaf definition, router IDs, loopbacks, static routing); routing protocol interface profiles (protocol-specific configuration, e.g., timers, metrics, authentication); external L3 EPGs (contracts, transit routing, route maps, VRF leaking)]
The object structure is (as usual) quite modular, with multiple design options I will
cover in subsequent sections. Additionally, in my opinion it’s convenient to familiarize
yourself with the L3OUT object hierarchy in the ACI GUI as well (shown in Figure 7-13).
ACI operations often include changes in external L3 routing, so it’s important to know
your way around.
At the end of the day, your ultimate goal with L3OUTs will be to completely describe the external L3 connectivity needs for the mutual exchange of routing prefixes and to ensure the enforcement of connectivity rules between external and internal ACI resources. Let's have a closer look at what configuration options L3OUT offers.
Take into account that this initial wizard actually won't show you all the options available for configuring a particular object, just the most important fields. For the entire configuration, you need to return manually to the L3OUT hierarchy after submitting the wizard form. I will start with an overview of the mandatory and global configuration fields in the wizard for a root L3OUT object:
• Name: I find it useful to include the name (in some form) of the external L3 device in the L3OUT name. If you have more than one L3OUT per VRF, you can easily differentiate what external connectivity they provide based on their names.
• VRF: Related routing context for which this L3OUT will provide
external connectivity
From the additional options available when navigating to the main L3OUT object after finishing the wizard, these are the most important ones worth mentioning (shown in Figure 7-15):
• Route Profile for Interleak: Later in this chapter I will cover how to create custom route maps (in ACI terminology, route profiles) to manipulate routing information attributes. There are multiple places where they can be attached, and the interleak route profile at the root L3OUT level is one of them. By the term interleak, ACI means redistribution from the external EIGRP/OSPF routing protocols into the BGP IPv4/IPv6 address families and then into internal MP-BGP (BGP itself is not influenced by this route map, as external BGP routing information goes directly into the BGP database). Interleak is especially useful for attaching route tags or altering other BGP path attributes during prefix import, which can then be transported via MP-BGP to another leaf and used for outbound filtering or transit routing.
In the end, the information provided in this step will be transformed into logical node profiles and related interface profiles. The main configuration options include
• Name: With the “Use Defaults” option checked (the default), the name of the node profile will consist of <L3OUT_Name>_NodeProfile. I prefer to uncheck this option and use LeafXXX_XXX_NodeProfile to instantly identify which leaf this L3OUT is configured on, even before opening the node profile itself.
• Path: Based on the previously chosen interface type, you will have to specify the path to the leaf nodes as well as their interfaces. Either choose the node IDs and interface IDs exactly, or they will be resolved automatically from the selected interface policy group. There is a + sign for both nodes and interfaces, allowing you to configure multiple objects using the same form (with the exception of vPC SVI, where the wizard allows you to configure only a single pair of vPC switches).
When dealing with node and interface profiles, there are multiple supported design options, all deployed in the same way to the fabric switches (see Figure 7-17). The L3OUT wizard, regardless of the number of leaves specified or interfaces configured for each leaf, will end up creating the first variant shown: a single node profile incorporating all leaves, with a single interface profile describing all used interfaces. Alternatively, you could create a single node profile but separate interface profiles (second option), or completely isolate both the nodes and the interfaces into their own profile objects (third option).
Personally, I prefer a combination of the first and third options. For vPC pairs, I usually create a single node and interface profile, reflecting the fact that these two switches are related and so is their configuration. For all other individual L3 interfaces, I go with the third option.
Note If using dual-stack IPv4 and IPv6 on the same interfaces, the two protocols will need separate interface profiles.
External EPG
The last section in the L3OUT wizard brings us to the crucial object for external connectivity, the external EPG (see Figure 7-19). For simplicity, the wizard allows you to create only a single EPG, and when you leave the option Default EPG for all external networks checked, any external IP address communicating via this L3OUT will be considered part of this EPG.
Like any other EPG, to permit communication in and out, you have to apply standard application contracts to it. As shown in Figure 7-20, each external EPG receives its own pcTag (or sClass); you can enable intra-EPG isolation or include it in the VRF preferred group. External EPGs are also part of the vzAny group of the parent VRF, together with all standard internal EPGs.
But how can ACI classify external endpoints into EPGs and differentiate between various external EPGs in the same L3OUT, or even between multiple L3OUTs? The mechanism here is slightly different from a standard application EPG. Instead of relying on VLAN and interface mappings, you need to define subnets (with their scope set by default to External Subnets for the External EPG). Be aware that, for endpoint
classification, the term “subnet” doesn't relate to the actual prefixes learned via the L3OUT. A subnet is perceived more as a range of IP addresses. Even if you receive only the default route 0.0.0.0/0 from the external router, a subnet can be a more specific range (e.g., 192.168.0.0/16). All endpoints within a defined subnet are logically part of the given external EPG, with the related contracts applied to their communication. If multiple subnets contain the source/destination IP address of the packet being forwarded, Longest Prefix Match (LPM) is used; the external EPG with the most specific subnet will be matched for contract policy enforcement. The subnet scope is the VRF (not the L3OUT), so you can't have the same subnet defined in multiple L3OUTs within the same VRF.
If you do not intend to differentiate between external endpoints, you can create a single subnet entry with 0.0.0.0/0, representing “any” IP address, often for clarity (like in this example). It can also serve as a last resort if the forwarding process hasn't found any other, more specific subnet matching the forwarded packet's addresses.
From an endpoint learning and forwarding point of view, it wouldn't be feasible or scalable for leaves to learn tens of thousands of external /32 IP endpoints. Therefore, when forwarding traffic to an L3OUT, ACI completely relies on the IP prefixes received from the external source and just learns the related next hops and their MAC addresses using ARP. It's the same as in a traditional L3 network. If the destination IP of the forwarded packet is not learned in a leaf's internal endpoint table and the leaf cannot find a pervasive gateway subnet for it there, the last option is a standard routing table lookup for external routing information (as shown in the forwarding decision diagrams in Chapter 6).
This checkbox, enabled by default, in the External EPG Classification section represents the already discussed mapping of external endpoints to the external EPG. Based on the subnets defined with this scope, ACI creates an internal mapping table per VRF on the border leaves. The IP headers of L3 packets forwarded via L3OUTs are then checked against it, searching for the EPG with the most specific subnet definition (Longest Prefix Match). That EPG's contracts are applied to the traffic.
The deployed mapping tables can be verified directly on the leaf switches using the vShell show command in Listing 7-6.
I created two subnets in the ExtEPG: 0.0.0.0/0 and 10.0.0.0/8. While the “any” subnet has a special common pcTag (class) of 15, used as a last resort in each VRF when no more specific subnet is available, all other, more specific definitions will be mapped to their respective ExtEPGs (32773 in this case).
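To make the Longest Prefix Match classification tangible, here is a small Python sketch (a simplified model of the mapping table above, not an ACI tool) that resolves an external IP to the pcTag of its most specific ExtEPG subnet:

import ipaddress

# Simplified model of the per-VRF ExtEPG mapping table from this example:
# subnet -> pcTag (0.0.0.0/0 always maps to the special "any" class 15)
EXT_EPG_SUBNETS = {
    "0.0.0.0/0": 15,       # last-resort "any" class
    "10.0.0.0/8": 32773,   # ExtEPG pcTag from the example
}

def classify(ip: str) -> int:
    # Return the pcTag of the most specific (longest prefix) matching subnet
    addr = ipaddress.ip_address(ip)
    matches = [(ipaddress.ip_network(n), tag)
               for n, tag in EXT_EPG_SUBNETS.items()
               if addr in ipaddress.ip_network(n)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(classify("10.1.2.3"))     # -> 32773, the /8 wins over the /0
print(classify("192.0.2.10"))   # -> 15, only the "any" subnet matches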
Caution Do not create multiple 0.0.0.0/0 subnets in multiple L3OUTs within the
same VRF. Even though the ACI configuration will allow such a step, the forwarding
could behave unexpectedly.
Besides creating the mapping tables, this subnet scope doesn't affect routing tables or the actual exchange of prefixes between ACI and external L3 devices.
Similar to the first subnet scope, this is another contract-related option. It ensures the correct propagation of ExtEPG pcTag information when leaking prefixes between L3OUTs in different VRFs. However, it cannot be used alone and has to cooperate with two other subnet scopes as well.
Imagine having L3OUT A with ExtEPG A in VRF A, and you want to leak prefix 192.168.10.0/24 from it to L3OUT B in VRF B. VRF leaking requires an explicit subnet definition in the source L3OUT with the following configuration:
applied contract relations between the ExtEPGs (with a contract scope of tenant or global). Of course, to achieve mutual connectivity, you need to apply the same configuration vice versa to the prefixes from the other VRF as well.
Next, the three route control scopes (export, import, and shared) actively affect the prefix exchange between ACI switches and other L3 devices. Each subnet created with these scopes will end up in a prefix list on the border leaf, matching the subnet definition exactly (not by Longest Prefix Match, as in the previous cases). If you configure subnet 10.0.0.0/8, exactly the 10.0.0.0/8 route will be matched, and not, for example, 10.0.0.0/16.
The export route control subnet is an explicit definition of a prefix from the routing table in the same VRF that can be advertised to the outside via this L3OUT. This option is mainly used for transit routing (to exchange prefixes between different L3OUTs), but it has the same effect on bridge domain subnets as well.
Let's say you have a bridge domain with Unicast Routing enabled and a configured subnet of 10.1.1.1/24 (scope advertised externally). By matching this prefix in the L3OUT ExtEPG subnet 10.1.1.0/24 with the export route control subnet scope instead of the default one, the bridge domain prefix will be advertised outside of ACI. The result can be seen on the border leaves in their automatically created redistribution route maps and prefix lists, as well as in the routing tables of the external routers. See Listing 7-7.
More information about ACI transit routing related to export route control subnets
will be discussed later in this chapter.
checkbox greyed out) because ACI doesn't filter incoming routing information at all. If you intend to start filtering inbound prefixes, do the following:
The deployment philosophy is similar to the previous export scope: each import route control subnet will be put into the route map and prefix list exactly as defined. For OSPF, it's a table map filtering which inbound information is installed into the RIB; in the case of BGP, the route map is applied to the BGP peer. I configured 10.10.10.0/24 as the import route control subnet, and you can see the result in Listing 7-8.
ip prefix-list IPv4-ospf-32773-3080193-exc-ext-inferred-import-dst-rtpfx:
1 entries
seq 1 permit 10.10.10.0/24
As already indicated, shared route control subnets enable you to leak defined external prefixes from the current originating VRF to another one (based on the contracts applied between ExtEPGs). The prefixes will again end up in a prefix list, used as a filter for the route-target export/import in MP-BGP.
Aggregate shared routes for route leaking between VRFs can be used together with the shared route control subnet, always creating a prefix list entry of the form X.X.X.X/M le 32. If you configure shared aggregation for 172.16.0.0/16, the result will be 172.16.0.0/16 le 32, matching, for example, 172.16.1.0/24, 172.16.2.0/24, and so on.
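If the le 32 semantics are new to you, this short Python sketch (plain prefix math, not an ACI command) shows which prefixes such an entry would and would not match:

import ipaddress

def matches_le32(aggregate: str, prefix: str) -> bool:
    # An "X.X.X.X/M le 32" prefix-list entry matches any prefix inside the
    # aggregate whose mask length is between M and 32
    agg = ipaddress.ip_network(aggregate)
    pfx = ipaddress.ip_network(prefix)
    return pfx.subnet_of(agg) and agg.prefixlen <= pfx.prefixlen <= 32

print(matches_le32("172.16.0.0/16", "172.16.1.0/24"))   # True  - inside the /16
print(matches_le32("172.16.0.0/16", "172.16.0.0/16"))   # True  - the aggregate itself
print(matches_le32("172.16.0.0/16", "172.17.0.0/24"))   # False - outside the /16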
Note Should you need a more granular approach to route filtering, but without
individual subnets in ExtEPG, you can create customized route maps with prefix
lists called route profiles in ACI. This topic will be covered at the end of this
chapter.
• In the global ACI policy found at System -> System Settings -> BGP Route Reflector, set the BGP AS number and choose at least two spines to act as MP-BGP route reflectors (RR). All leaves then automatically become RR clients and establish BGP peerings with the defined RRs.
We can summarize the external prefix exchange process into the following elements:
• All leaf switches deploy the BGP IPv4/IPv6 address family in user-created VRFs.
• All leaves and route reflector spines deploy the MP-BGP VPNv4/VPNv6 address family (AF) and establish mutual peerings in this AF.
2. The BD subnet prefix is permitted in the prefix list and route map applied to the redistribution from ACI into the L3OUT external routing protocol. This step can be achieved in three different ways:
In the following sections, I'll show you how to configure and verify each aspect of publishing bridge domain IP prefixes.
alone won't affect the leaf configuration or prefix advertisement yet. It always needs to be coupled with one of the additional configuration components described in the following sections.
can add multiple L3OUTs to the Associated L3 Outs list (as shown in Figure 7-24). In the default state, all BD subnets with the scope of advertised externally will be automatically advertised via the chosen L3OUTs and their routing protocols.
Advantage of this approach:
• Bridge domains are the “single source of truth” about which internal subnets are redistributed to which L3OUTs.
Disadvantage:
• In the L3OUT object, you don't see any information about the advertised subnets.
subnet is directly configured in the redistribution prefix lists, so if the bridge domain prefix is present in a border leaf routing table, it is advertised to the outside right away.
Advantage of this approach:
• The same external EPG subnet definitions are used for transit routing prefixes as well as BD subnets. There is no differentiation between them.
Disadvantage:
The contract is in the default state with the Apply Both Directions and Reverse Filter Ports options enabled, so the resulting zoning rule table on the border leaf switch looks as shown in Listing 7-9.
When there is at least one specific subnet in an external EPG, ACI will deploy two standard zoning rules (bi-dir and uni-dir-ignore) between the internal and external pcTags (sClasses). In Listing 7-9, see rule IDs 4151 and 4159.
With the “any” subnet 0.0.0.0/0, ACI follows a generic concept; there are two additional rules in the zoning tables:
• Rule ID 4176: If the external EPG with 0.0.0.0/0 is the destination, ACI always uses the same special pcTag 15.
• Rule ID 4127: If the external EPG with 0.0.0.0/0 is the source, ACI uses the VRF pcTag, in this case 49153.
OSPF
The Open Shortest Path First protocol (OSPF, RFC 2328), based on a link-state database and Dijkstra's shortest path calculation algorithm, is a widely deployed routing mechanism to exchange IP prefix information within a single autonomous system. The routing domain is split into areas, where area 0 is considered the “backbone” and all other non-zero areas have to be interconnected through it. All routers in a single area share complete topology information (link states) with each other in the form of Link State Advertisements (LSAs) and recalculate the best path to each subnet after a link-state change.
To dynamically discover OSPF neighbors, the protocol uses Hello packets sent to a multicast address (224.0.0.5) over enabled interfaces. However, a full OSPF adjacency is established only after these properties are correctly configured:
• IP subnet: The ACI leaf and the external L3 device must be on the same subnet with the same network mask.
• Timers: The Hello (default 10 seconds) and Dead (default 40 seconds) timers have to match.
• Router ID: Always make sure to have a unique router ID set for your border leaves in a particular VRF. Duplicate router IDs would prevent OSPF routers from becoming neighbors or, if they are not directly connected, would cause reachability issues in the network due to the inability to exchange routing information correctly.
When the OSPF protocol is enabled at the root L3OUT object, ACI allows the administrator to configure several policies related to its operation. In Figure 7-27, you can see the global (per-L3OUT) OSPF configuration.
• OSPF Area Control: Specific options meant for Stub or NSSA areas to filter advertised LSAs, making the area Totally Stubby
• OSPF Area Cost: The cost (metric) for the default route if the ACI border leaf generates it into the Stub area
As illustrated in Figure 7-29, for all L3 interfaces in the particular profile, you can
configure the following:
The last option in the previous policy, the OSPF interface policy, provides an additional modular object carrying a couple of configuration options that, in a legacy network, you would configure with individual commands on each interface (see Figure 7-30).
What I definitely recommend changing when using OSPF is the default reference bandwidth. ACI uses 40G, but with current Nexus 9000 models, it's nothing special to deploy 100G or even 400G interfaces. To take these higher speeds into account when calculating the OSPF cost, you need to change the reference bandwidth to some reasonable value, ideally the highest, 400G. Together with other, more advanced settings like administrative distance or fine-tuning of LSA timers, you can find this policy at the VRF object level in Tenants -> Tenant_name -> Networking -> VRF -> VRF_name -> OSPF Timer Policy.
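To see why the default matters, the following Python sketch (just the standard OSPF cost formula, cost = reference bandwidth / interface bandwidth, rounded down with a minimum of 1) compares the per-interface costs with a 40G versus a 400G reference:

def ospf_cost(reference_bw_mbps: int, interface_bw_mbps: int) -> int:
    # OSPF interface cost = reference bandwidth / interface bandwidth (minimum 1)
    return max(1, reference_bw_mbps // interface_bw_mbps)

for speed in (10_000, 40_000, 100_000, 400_000):   # 10G, 40G, 100G, 400G in Mbps
    print(f"{speed // 1000}G: ref 40G -> {ospf_cost(40_000, speed)}, "
          f"ref 400G -> {ospf_cost(400_000, speed)}")

# With the default 40G reference, every link of 40G and above collapses to cost 1;
# raising the reference to 400G restores the differentiation between link speeds.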
If multiple OSPF areas are required on the same border leaf, they need to be separated into different L3OUTs.
In case of problems with the protocol packet exchange, it's good to use the tcpdump utility on the affected border leaf, as shown in Listing 7-10.
EIGRP
Originally Cisco proprietary but now open and standardized, the Enhanced Interior Gateway Routing Protocol (RFC 7868, with an already existing open-source implementation) is an additional option in the dynamic routing rig of ACI. EIGRP belongs, together with OSPF, to the interior gateway protocol category, exchanging prefix information within a single autonomous system. However, instead of being a link-state protocol, it is an advanced distance vector protocol. EIGRP builds its topology table and calculates the best loop-free paths to all destinations using the DUAL algorithm, based on information from its neighbors' advertisements. The neighbor advertising the best loop-free path to a destination to the local EIGRP router is called the successor. If available and also loop-free, EIGRP can immediately calculate a backup path as well, the “feasible successor.” Compared to OSPF, when the topology changes, EIGRP calculates new paths only for the affected prefixes, instead of recalculating the whole link-state database. These specifics make EIGRP incredibly scalable, and it often achieves the best convergence performance compared to other routing protocols.
EIGRP requires these settings to match between neighbors in order to create an
adjacency:
EIGRP is not sensitive to timer mismatch, duplicate router IDs, or different IP MTUs.
However, as a best practice, I always ensure that these values are set consistently and
correctly.
Going back to ACI, you need to enable EIGRP, as always, at the root L3OUT object level. As you can see in Figure 7-31, there is not much to configure here. Just set the correct autonomous system number to match the external router and optionally configure the default route advertisement. Only a single EIGRP L3OUT can be deployed on the same border leaf per VRF, due to the single configurable autonomous system per EIGRP process.
Similar to OSPF, EIGRP needs an additional interface policy to enable its operation on given leaf interfaces. If the protocol wasn't enabled during the initial L3OUT wizard, you will need to create the policy manually by right-clicking the logical interface profile and choosing Create EIGRP Interface Profile (as shown in Figure 7-32).
Another configuration option available in the interface profile is the EIGRP interface
policy. Like in the case of OSPF, this object serves for specific settings related to EIGRP
that you would otherwise configure using the CLI per interface in a legacy network (see
Figure 7-34):
More advanced EIGRP settings, like administrative distances for internal and redistributed prefixes, narrow/wide metric style, or active intervals, can be found at the VRF level when you navigate to Tenants -> Tenant_name -> Networking -> VRF -> VRF_name -> EIGRP Context Per Address Family and apply a custom EIGRP address family context policy there.
Note It's good to know that when advertising BD subnets or prefixes from other L3OUTs for transit routing, EIGRP and OSPF share the same route map on the same border leaf and VRF. Changing a setting in one protocol can negatively affect the other and cause unwanted prefix leaking. Increased attention is recommended when dealing with such a use case; preferably, avoid it completely.
BGP
The last available routing protocol in ACI, the Border Gateway Protocol (RFC 4271), represents a path-vector, exterior routing mechanism, primarily designed to support advanced routing and path manipulation features between different domains (autonomous systems). Together with prefixes, it exchanges multiple path attributes, some used for routing loop prevention (AS Path), others for traffic engineering (Weight, Local Preference, Multi-Exit Discriminator), or customized communities (tags) for prefix filtering and policy enforcement. By default, BGP uses a standardized and deterministic path selection procedure, always identifying the single best path to the destination (but it is configurable for multi-path routing as well). The protocol is robust and scalable enough to support the whole Internet backbone routing, and thanks to its benefits it often becomes popular for internal networks as well.
BGP establishes its peerings reliably thanks to the TCP protocol on well-known port 179. It differentiates between external BGP (eBGP) peerings interconnecting distinct autonomous systems and internal BGP (iBGP) peerings within the same autonomous system.
When you enable the BGP protocol in the root L3OUT object, there are actually no configuration options available at this level. The BGP process is already running, thanks to the MP-BGP policies deployed in the ACI fabric Pod policies.
All the main BGP peer configuration options in ACI are grouped into the BGP Peer Connectivity Profile. The placement of this object defines which source interfaces will be used for the peering itself. There are two alternatives (as you can see in Figure 7-35).
• Node Profile: BGP uses the loopback IP addresses defined for each leaf node as the source for the peering session.
To create either type of BGP peer connectivity profile, simply navigate to L3OUT -> Logical Node Profiles -> Node Profile and add the required type of profile in the BGP Peer Connectivity section (shown in Figure 7-36).
• Allow Self AS: BGP accepts routes from eBGP peers even when its own AS appears in the AS_PATH attribute. Normally this is denied to prevent routing loops.
• Allowed Self AS Count: A field enabled when the Allow Self AS option is checked; it defines the maximum number of occurrences of the local AS number allowed in the AS_PATH.
• Peer Controls
• Remove Private AS: ACI will remove all private AS numbers from the AS_PATH, as long as it consists only of private ones.
• Remove all private AS: This time, ACI removes all private AS numbers regardless of the presence of public AS numbers in the AS_PATH.
• Site of Origin (SoO): ACI 5.2 added support for this BGP transitive extended community attribute, providing another way to avoid routing loops by looking at information about where the prefix originated. BGP can automatically deny relearning such a route in the source site.
Although you have seen a huge number of options for BGP, the minimal peer
configuration consists of the peer IP, remote AS number, and eBGP TTL multihop if
necessary.
Next hops are separate objects in ACI, and their configuration again consists of multiple options (as shown in Figure 7-39).
• Next Hop Type: If None is chosen, ACI will create a special static route pointing to Null0.
• Track Policy: The usability of the next hop can depend on some other IP, or a set of IPs, defined in the track policy.
To verify the deployment of static routes, just refer to the related border leaf routing table and check their presence there using the command in Listing 7-11.
Technically, to transit traffic via the ACI fabric, you need to advertise prefixes learned from one L3OUT to another L3OUT (in the same or a different VRF) and make sure that the zoning rules include mutual pcTag entries between the two external EPGs.
Figure 7-40 describes in detail the typical configuration and deployment principle of transit routing between two L3OUTs and, for simplicity, two prefixes. L3OUT A is learning prefix 172.16.0.0/16, L3OUT B is learning 10.0.0.0/8, and the goal is to ensure connectivity between their external endpoints via ACI.
First, you need to accurately classify both local prefixes. In each L3OUT, therefore, create an external EPG and a respective subnet with the scope of External Subnets for
External EPG. For L3OUT A, it will be 172.16.0.0/16, and in L3OUT B, it will be 10.0.0.0/8. Optionally, in this case, you could use a 0.0.0.0/0 subnet as well to match any prefix.
Then, from a route filtering perspective, you need to deploy prefix lists and route maps to enable mutual redistribution between the L3OUTs. This is done by adding a second subnet entry to each external EPG, specifying the remote, transiting prefix with the scope External Route Control Subnet. In L3OUT A, you need to advertise the remote prefix 10.0.0.0/8, and in L3OUT B, the remote prefix 172.16.0.0/16. At this point, all the prefixes should be exchanged between the border leaves via MP-BGP, and they will start to advertise them to their external peers. In the respective local prefix mapping tables for external EPGs (vsh -c "show system internal policy-mgr prefix"), you should see the remote prefixes with their pcTags.
Finally, for endpoint connectivity to work, you need to deploy a contract between the external EPGs, resulting in the installation of mutual zoning rules (pcTags 32773 and 23453).
The previous example described the most common use case of transiting external traffic between two L3OUTs on different border leaves. ACI actually supports more variations and topologies (as shown in Figure 7-41):
When using multiple OSPF instances on the same border leaves that are meant to be transitive, one of them needs to be configured in area 0, due to direct inter-area prefix distribution without MP-BGP involvement.
Another side note: the latter two supported topologies, with a single L3OUT, introduce a limitation on using the 0.0.0.0/0 subnet in an external EPG. The “any” subnet cannot match both the source and destination prefixes at the same time. You have to split the range at least into 0.0.0.0/1 and 128.0.0.0/1, with the scope External Subnets for the External EPG.
In the automatically created redistribution route maps for each L3OUT (when using the EIGRP or OSPF protocol), notice that all prefixes permitted by the prefix list are tagged. See Listing 7-12.
Should the same prefix already advertised from ACI return back to ACI in the same VRF, it will be automatically denied, as shown in Listing 7-13.
In the VRF object policy configuration (Tenants -> Tenant_name -> Networking -> VRF -> VRF_name -> Policy), there is an inconspicuous little option to set your own route tag in case of need (shown in Figure 7-42). This is especially useful when you would like to exchange prefixes between VRFs via an external network (often through some core network layer, or a firewall performing inter-zone inspection); without a custom, different route tag for each VRF, the prefixes would be denied on ingress.
Route profiles offer a much more granular and easier-to-maintain prefix manipulation tool compared with subnets in external EPGs or bridge domain to L3OUT associations. Each route profile, or more specifically its rules, will after configuration simply be appended to the already existing route maps created by ACI.
As depicted in Figure 7-43, the structure of a route profile is modular, imitating the CLI configuration of a traditional route map. At its root level, a route profile offers two ACI-specific type settings:
• Match Prefix AND Routing Policy: The resulting route maps will merge prefixes from the objects where they are applied (e.g., Export Route Control Subnets in the ExtEPG) and from the Match Rules in the route profile.
• Match Routing Policy Only: The route map will ignore prefixes from the associated objects and exclusively apply just the policy defined in the Match Rules.
Inside the route profile, you can configure a set of contexts, equivalent to sequence blocks with permit/deny statements in route maps. Each context includes match statements, configurable to match prefix lists or a collection of communities. Then, if the context is permissive, you can apply a wide selection of set rules to the matched routing information.
Configuration-wise, route profiles can be created at two levels:
With the L3OUT-level configuration, two special route profiles are available: default-import and default-export. You can choose them from the drop-down menu during object creation instead of typing a custom name. These objects automatically affect all the prefixes in the chosen direction related to a particular L3OUT. You don't need to apply them anywhere; just have them created. Often, they can be the simplest solution for custom routing manipulation, and personally, I prefer them whenever possible for our customers.
For a practical demonstration, say you need to filter the BGP prefixes advertised from the production network to ACI and set a particular BGP local preference on them when they come in via a specific L3OUT. The BGP peering is already up, and import route control enforcement in the global L3OUT configuration is enabled as well. To create the required policy, go to the Route Map for import and export route control folder in the L3OUT and choose the already mentioned default-import route profile name. Add a new context with Order 0 and action Permit (as shown in Figure 7-44).
When import route control is enforced for the L3OUT, you need to somehow match the incoming prefixes. Add a new match rule to the Associated Matched Rules list, where you will find three main options for prefix classification:
In your case, for simplicity, create a few prefix entries matching /24 le 32 masks (as shown in Figure 7-45).
Then, for the matched prefixes, create a new Set Rule with a local preference of, say, 200. As shown in Figure 7-46, there are many different options resembling NX-OS route maps. This step finishes the route profile object configuration.
Thanks to the default-import route profile, you don't need to apply it explicitly anywhere else; ACI will automatically update all the applicable route maps, for example, in the case of BGP, the inbound neighbor route map, as shown in Listing 7-14.
Match clauses:
ip address prefix-lists: IPv4-peer32773-3080193-exc-ext-in-default-
import2Set_LocalPref_2000PROD_Prefixes-dst
ipv6 address prefix-lists: IPv6-deny-all
Set clauses:
local-preference 200
Summary
ACI offers an extensive palette of features related to external connectivity, as you have seen throughout this chapter. To support migrations or coexistence with a legacy networking infrastructure, you can utilize Layer 2 EPG or bridge domain extensions, ensuring the same communication policies are applied to both internal and external endpoints. For publishing applications and services running inside ACI, you can then configure a plethora of Layer 3 routing features grouped into L3OUT objects, starting with static routes, going through the dynamic interior gateway protocols OSPF and EIGRP, up to the BGP protocol. You learned how ACI can offer route filtering and attribute manipulation tools, in the form of route profiles, comparable to those of any other Layer 3 router. And you saw that the fabric doesn't necessarily have to be a stub network only; you can also enable transit routing and automated internal VRF leaking, if needed.
Now, with both east-west and north-south traffic patterns in place, let's continue further and include additional L4-L7 devices in the picture to significantly enhance the security and performance of your applications. In the upcoming chapter, I will discuss another very important feature: dynamic L4-L7 service insertion between ACI endpoints using service graphs.
CHAPTER 8
Service Chaining
with L4-L7 Devices
Cisco ACI includes a significant number of tools to implement and enhance security and segmentation from day 0. I have already discussed tenant objects like EPGs, uEPGs, ESGs, and the contracts permitting traffic between them. Even though the ACI fabric is able to deploy zoning rules with filters and act as a distributed firewall itself, the result is more comparable to a stateless set of access lists (ACLs). These are perfectly capable of providing coarse security for traffic flowing through the fabric, but their goal is not to replace fully featured and specialized equipment with deep traffic inspection capabilities, like application firewalls, intrusion detection/prevention systems (IDS/IPS), or load balancers, which often secure the application workloads. These devices still play a significant role in datacenter design, and ACI offers multiple useful features to easily and dynamically include them in the data path between endpoints, no matter whether the traffic pattern is east-west or north-south.
In this chapter, I will focus on ACI's service graph and policy-based redirect (PBR) objects, which bring advanced traffic steering capabilities to universally utilize any L4-L7 device connected to the fabric, even without the need for it to be a default gateway for endpoints or part of a complicated VRF sandwich design with VLAN network stitching. And you won't be limited to a single L4-L7 appliance only; ACI is capable of chaining many of them together, or even load balancing between multiple active nodes, all according to your needs. You will get started with a little bit of essential theory and concepts about service insertion, followed by a description of the supported design options for both routed Layer 3 and transparent Layer 2 devices. Then you will review the configuration concepts and verification techniques for multiple use cases of PBR service graphs. Although my goal is not to cover every possible option and aspect of ACI service insertion (in fact, this topic is enough for another whole book), I'll definitely offer a strong foundation of knowledge for you to build on later.
4. ACI Service Graph: The last option is our current topic: the
utilization of service graphs. They allow you to insert the L4-L7
service device into a forwarding path by specifying a so-called
service graph template in an already existing contract between
EPGs. There’s no need to create new contract relationships or
new L4-L7 EPGs. This way you can focus on standard application
policies instead of complicated application profile changes to get
traffic to the L4-L7 appliance.
graph templates and interconnected with each other without any change to their
definition. You will take a practical look at the configuration of all mentioned objects
later in this chapter.
Figure 8-2 illustrates the standard contract between two EPGs with a service graph
applied in the subject, defining the chain of three function nodes for the communication
path. The sample L4-L7 logical device used in the load balancer function node internally
describes a HA cluster of two concrete (physical) load balancers connected behind
interfaces Eth1/1-2 of Leaf 101-102, over VLANs 10 and 20.
When such a contract is applied between EPGs, ACI “renders” a service graph
instance and implements all the necessary policies, interface configurations, VLANs, and
zoning rules on leaf switches to divert the endpoint traffic according to the description.
As you can see in Figure 8-3, APIC automatically deploys a new set of shadow EPGs for
L4-L7 device interfaces and implements additional contracts allowing communication
with them (or instructing leaves to perform the traffic redirect action instead of simple
forwarding based on learned endpoint information).
Since ACI’s first introduction to the market, it has developed various deployment
modes for service graphs and L4-L7 devices used inside. For better understanding, it’s
helpful to differentiate between two main service insertion categories, especially as they
might look similar:
1. Go-To Mode (routed mode): For the routed mode, we assume that
the L4-L7 device is always a default gateway for connected servers.
There are two bridge domains needed for this design: Client BD,
in which the outside interface of the L4-L7 device is connected,
and Server BD for the inside L4-L7 interface (always L2 only).
Based on the routing solution implemented from the external
clients to ACI, we can further divide this design into the following subcategories:
Transparent L4-L7 Services
Figure 8-8. Transparent L4-L7 node with L3 routing in the external ACI BD
Note For all previously described designs, always refer to the official ACI
whitepapers before implementing a particular one as various specifics and limits
may apply, sometimes related to the currently used ACI version. In the Appendix of
this book, you will find links to the most important and useful official resources.
offers basically everything you need to simply and effectively create a chain of L4-L7
service devices to which ACI forwards traffic deterministically based on predefined MAC
or MAC+IP next-hop addresses. All that with single (one-arm) or double (two-arm)
interfaces on L4-L7 appliances. Even when new contracts or provider/consumer EPGs
are configured in ACI’s application profile with service insertion, it won’t affect the
networking settings of the L4-L7 devices. Usually, you just need to add new firewall rules
or load balancer policies.
Due to its significant advantages and popularity, I will dedicate the rest of this chapter exclusively to policy-based redirect and analyze it in more detail.
[Figure artwork: traditional VRF sandwich design, with a CORE FW routed between VRF A (FRONTEND) and VRF B (BACKEND); each VRF reaches the firewall through its own L3OUT and dedicated firewall interface, with the Frontend and Backend EPGs and their bridge domains on the ACI leaves]
On the other hand, Figure 8-13 shows an example of a service graph with policy-
based redirect, which can divert the traffic selectively and consistently to the same
set of interfaces on L4-L7 devices (and BDs) even within the same VRF. New VRFs
or communicating EPGs/ESGs don’t result in new interfaces on the intermediate
service node.
traffic to its destination, Leaf 103 will incorrectly learn endpoint 10.1.1.10 as being connected behind Leaf 102 as well. In the end, this will disrupt not only the PBR traffic but all other traffic as well.
Note Before starting with the tenant configuration, don’t forget to prepare ACI’s
access policies for the L4-L7 service device. They are required, just like for any
other connected endpoint.
Consider this object as a universal building block usable in multiple service graphs later.
If the L4-L7 device is connected in two-arm mode, you will need two separate policies,
one for each logical connector (the provider interface and the consumer interface). In the case of
a one-arm device, a single policy is enough. Multiple active devices with different IPs can
be configured as destination next-hops and ACI will ensure symmetric load balancing
between them.
Figure 8-19 illustrates the configuration form for a FW outside interface in the model
topology. Analogically, one is created for the FW inside interface as well.
For PBR, the only mandatory field is the destination definition, but as you can see,
the L4-L7 policy-based redirect policy offers many other (useful) configuration options,
including the following:
In the two-arm FW example, you will end up creating two L4-L7 redirect
policies: FW_Outside with the outside interface destination IP+MAC (192.168.1.10,
00:58:82:A2:B8:01) and FW_Inside with the inside interface destination IP+MAC
(192.168.2.10, 00:58:82:A2:B8:02).
Note Since ACI version 5.2, if you enable IP SLA tracking of the destination IP address of the L4-L7 node, you don't need to fill in the MAC address in this policy object. Just put all zeros there or leave the field blank. ACI will automatically ARP for it during the IP SLA operation and use the received MAC address.
L4-L7 Device
Thanks to the previous PBR policy object, ACI knows where to redirect the traffic from
an addressing point of view. The L4-L7 device object further describes the logical service
node, its physical location in the ACI fabric, the encapsulation used on its interfaces,
and the definition of multiple physical nodes composing a logical device HA cluster.
This object is again a universal building block for later use in service graphs, usable for
both traditional and policy-based redirect approaches. You can create it in Tenants ->
Tenant_name -> Services -> L4-L7 -> Devices.
Figure 8-21 shows the sample configuration of the L4-L7 device for your two-
arm firewall, consisting of two physical active-standby nodes connected to Leaf-102
interfaces eth1/10-13.
• Device Type: You should leave this knob to default Physical mode for
any physical service appliance, or even an appliance running as a VM,
but without VMM integration with ACI (described in the following
chapter). If the virtualization platform is integrated with ACI and you
intend to use a virtual L4-L7 device, switch to Virtual here.
For your example, consider a physical firewall L4-L7 with a single context and GoTo
(routed) mode since you will use L3 PBR. A firewall HA pair is defined in the Devices
section by two entries: the primary and secondary nodes. Each has an outside and inside
interface definition (two-arm mode). Then two logical connectors for a service graph are
created. The consumer (client side) contains the outside interfaces of both FW nodes
with encapsulation VLAN 10 and the provider (server side) contains the inside interfaces
with VLAN 20. The firewall network configuration should, of course, match the defined
VLANs. At the same time, it’s a simple way to split a single physical service device into
multiple virtual device contexts (VDCs) if supported. Just create multiple L4-L7 device
objects with the same configuration except for VLANs, which will differentiate between
firewall VDCs on top of the same physical interfaces.
All these settings put together completely describe the firewall cluster and its
connectivity with ACI. Now it’s time to join the L4-L7 device object with previous L4-L7
redirect policies into a service graph template.
In the left pane, you will find all L4-L7 logical devices created in the previous step,
and by dragging-and-dropping them between the generic consumer and provider
EPG placeholders shown on the right you define the service graph operation. Multiple
devices can be chained here as well if needed. For your example, use the following
settings:
• Graph Type: New Graph. Optionally you can copy settings from the
existing one.
• Filters After First Node: This setting applies to filters in the zoning
rule that don’t have the consumer EPG’s Class ID as a source or
destination (meaning after the first service node). The default setting
of Allow All simply permits all traffic, directing it to consecutive
service nodes. Filters from Contract enables more granular traffic
control between multiple nodes if needed.
The first configuration step is related to the contract definition. Choose the
consumer and provider EPGs or ESGs where the contract with the service graph will
be applied (at the time of writing this book, the newest ACI 6.0 does not support mixed
EPG/ESG relationships). Then fill in the contract information; either choose an existing
contract subject or create a completely new contract. By default, no filter is predefined,
meaning that the contract will match all traffic, and all traffic will be redirected to the
L4-L7 device. A common option is to manually specify the matched traffic instead.
Proceeding to the second step will bring you to the configuration of the Logical
Interface Contexts as part of Devices Selection Policy object (see Figure 8-24). The
Devices Selection Policy will be later located in Tenants -> Tenant_Name -> Services ->
L4-L7 -> Devices Selection Policy. The main purpose of these objects is to set the
bridge domain affiliation and map the L4-L7 redirect policies with logical connectors of
each functional node in the service graph template. Based on Devices Selection Policy,
ACI will also deploy a new shadow service EPG for each interface of the service device.
For each L4-L7 logical device (DC_FW in your case), there will be two configuration
sections: the consumer and provider connectors. Both offer the following options:
After submitting the form, ACI will finally merge and deploy all the previously
defined settings and policies resulting in a service graph instantiation.
Next, expand the service graph instance object, where you will find a list of all
applied function nodes in general. In your example, you see just one logical FW node
named N1 (default name). After opening the function node object, you’ll see an
important overview of cluster logical and concrete interfaces, together with function
connectors and their configuration (refer to Figure 8-26).
The cluster interfaces section provides an opportunity to check if the actual physical
interfaces of the L4-L7 devices are mapped in a correct VLAN to the correct service
graph function node connector. Then, in the function connectors section, notice that
each connector finally received a ClassID. These are policy tags (pcTags) of shadow
EPGs, ensuring connectivity for firewall interfaces, and you will check them in a while in
the zoning rules deployed on leaf switches.
The correctly associated service graph can be also verified from a contract point of
view in its subject (as shown in Figure 8-27). Remember that service graphs are always
applied only on traffic matched by the filter inside a particular subject.
When you move to the CLI, one of the main verification tools on the provider and consumer leaves where the policy-based redirect should be enforced is the show service redir info command, whose output is shown in Listing 8-1.
List of destinations
Name                 bdVnid          vMac               vrf          operSt   HG-name
====                 ======          ====               ====         ======   =======
dest-[192.168.1.10]  vxlan-15859680  00:5B:82:A2:B8:01  Apress:PROD  enabled  Apress::FW_HG
dest-[192.168.2.10]  vxlan-16056266  00:5B:82:A2:B8:02  Apress:PROD  enabled  Apress::FW_HG
Consistent information should be present on all leaf switches where the provider or
consumer EPGs with their bridge domains are deployed. They are primarily performing
the policy-based redirect. Look especially for correctly configured destination IPs and
MACs, enabled health groups, and an operational state of Enabled. If you have IP SLA
configured (I highly recommend doing so) and there is some reachability issue with the
configured addresses, this output will notify you about the operational status of Disabled
with a further explanation of tracked-as-down in the operStQual column. Destination
group IDs shown at the beginning of each group line will play their role in zoning.
This standard contract in its default configuration with both options of Apply both
directions and Reverse filter ports enabled will result in two zoning rules shown in Listing 8-2.
Scope 3080193 refers to the VRF segment ID; SrcEPG and DstEPG then refer to each EPG's pcTag (also called sClass).
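If you want to inspect these rules yourself, you can do so directly on the service leaf CLI. As a hedged sketch (the prompt is illustrative and the exact output columns are version-dependent; only the scope value comes from this example):

Leaf-101# show zoning-rule scope 3080193
Leaf-101# show zoning-rule scope 3080193 | grep redir
Leaf-101# show service redir info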
Now consider the previously discussed PBR service graph applied between standard
EPGs (shown in Figure 8-29).
Figure 8-29. L4-L7 service graph deployed between two application EPGs
The same source and destination rules for application EPGs are still preserved, just
their action is changed from a simple permit to redir or threshold-redir. Redirect actions
further refer to PBR destination groups, which you saw in the show service redir info
output. Notice that new zoning rules are added with shadow EPG pcTags as source and
destination (RuleID 4230 and 4231 in Listing 8-3). The purpose of these rules is to permit
the forwarding from the egress interface of the L4-L7 device to the destination EPG in
each direction.
When an endpoint from the consumer EPG (pcTag 16386) initiates a connection
to the provider (pcTag 49156), its packets are matched by RuleID 4130, instructing the
leaf to perform a redirect to the destination group 2: the outside/consumer connector
of the L4-L7 device. The packet is then processed by the service device and returned
to the leaf via the inside/provider connector (shadow pcTag 49167). Thanks to the new
RuleID 4230, this traffic is permitted and delivered to the destination. The reply from
the provider EPG is redirected to destination group 3 based on RuleID 4131 and the
returning traffic from the service device via the outside/consumer connector (shadow
pcTag 16391) is permitted by RuleID 4231.
Tip To actually see the redirected traffic, it’s convenient to use the already
described ELAM tool or to configure SPAN sessions with source interfaces of L4-L7
devices. Alternatively, you can enable a traffic capture utility directly on the service device if it provides one.
1. Server A generates a packet with its own source IP address and the
destination IP of Server B. The packet is sent to Leaf 101, which
acts as a default gateway for the server’s bridge domain.
the spines based on the VRF routing table information (to spine
Anycast VTEP IP). In the VXLAN header, there is the sClass of the
source EPG carried as well.
4. The incoming VXLAN packet causes Leaf 103 to learn the source
endpoint with its source VTEP address of Leaf 101 and it will
create the dynamic VXLAN tunnel interface. Additionally, Leaf 103
will learn the EPG information of Server A thanks to the VXLAN
header field carrying the source sClass. Now, Leaf 103 has both
source and destination EPG information in its endpoint table, and
it can perform a zoning table lookup and enforce the PBR contract
applied between EPGs. Just to remind you, ingress Leaf 101
couldn’t enforce any contract due to missing information, so it set
Source/Destination Policy (SP 0/DP 1) flags in the VXLAN header
to instruct the destination leaf to apply the contract instead.
5. The PBR zoning rule instructs Leaf 103 to redirect traffic from
the consumer EPG based on a given destination group to the
consumer connector of the L4-L7 device. To do so, the VXLAN
VNID is rewritten to the L4-L7 consumer bridge domain VNID,
and the destination MAC address is rewritten to a L4-L7 consumer
interface. Leaf 103, however, doesn’t know where the destination
MAC address is located in the fabric. It has to forward the VXLAN
packet again to the spines for a second resolution. The spines
will, thanks to COOP data, resolve as a destination Leaf 102 VTEP
address and forward the packet there.
6. Leaf 102 won’t learn Server A’s IP address because the endpoint
data plane learning feature is disabled for service bridge domains.
The VXLAN packet is now finally decapsulated and delivered to
the PBR service device.
The first part of the forwarding process ensures the traffic delivery to the consumer
connector (outside interface) of the L4-L7 service device. After the device performs
any configured inspections or data manipulation, it will return the packet via the
provider connector (inside interface) back to the fabric. The next steps are illustrated in
Figure 8-31.
The second part of PBR traffic delivery consists of the following steps:
3. After receiving the VXLAN packet, Leaf 103 will apply the
automatically created zoning rule, permitting all traffic from the
provider L4-L7 connector to the provider EPG. It will decapsulate
the traffic and send it via the local port to the destination
endpoint.
You have successfully delivered the first packet in a flow between consumer and
provider endpoints with policy-based redirect applied. Let’s now continue with
examining the return traffic, as described in Figure 8-32.
Server B sends a reply to Server A and the following actions happen around the
ACI fabric:
2. Leaf 103 knows about both source and destination endpoints and
their EPG affiliation, so it will apply a zoning rule instructing them
to perform a redirect. Based on a related PBR destination group,
it will identify the MAC and IP address of the provider L4-L7
Figure 8-33 illustrates the final forwarding part for your traffic.
so the zoning policies are not applied on this node. The packet is
encapsulated into VXLAN with Source/Destination Policy (SP 0/
DP 1) flags set, VRF VNID, and sClass of the shadow consumer
connector EPG. The VXLAN packet is then sent to spine anycast
VTEP IP for a COOP resolution.
6. The spine has in its COOP entries information about the location
of Server A, so the VXLAN destination IP will be rewritten to the
Leaf 101 VTEP IP, and the packet will be sent towards it.
I hope this step-by-step explanation helps you better understand the process behind policy-based redirect forwarding and expands on the standard intra-fabric forwarding already described in Chapter 6. The same principles can be applied to other variations of PBR as well (e.g., L2 transparent or multi-node PBR).
Symmetric PBR
I will conclude this chapter with a policy-based redirect feature I find very interesting: symmetric PBR. Until now, we assumed that ACI uses only a single HA pair of L4-L7 devices for PBR, sharing a single common MAC or MAC+IP next hop. But it can actually do more. It supports multiple active devices with different IPs and MACs and provides symmetric load balancing between them.
This behavior is very simple to configure. All you need to do is add multiple
destinations in the L4-L7 policy-based redirect object (located in Tenants -> Tenant_
name -> Policies -> Protocol -> L4-L7 Policy-Based Redirect). Even with multiple
active devices, from the ACI point of view it’s still a single logical L4-L7 device.
When the traffic is redirected to a multi-node logical device, the leaf switch will
perform a hash from the IP traffic header and choose one active physical node. By
default, the hash is calculated based on the original (inner) source and destination
IP addresses together with the IP protocol number. In an L4-L7 policy-based redirect
object, you have the option to configure source-only or destination-only hashing. In that case, make sure to create two separate policies: one for the consumer PBR connector with source hashing and one for the provider PBR connector with destination hashing, to preserve traffic symmetry (as shown in Figure 8-34).
Summary
In this chapter, you have further expanded your knowledge of ACI by learning about
its significant feature, service insertion using L4-L7 service graphs and policy-based
redirect. ACI itself builds its operation on a security-first philosophy with an allowlist
model and zoning rules enforcement based on defining contracts between EPGs/ESGs.
It cannot, however, provide the stateful and deep inspection offered by next-gen security devices like specialized firewalls, load balancers, or IPS/IDS appliances.
You can either manually ensure that the traffic flows through such devices using network (VLAN) stitching or a routing configuration. However, this traditional approach may not scale well or stay maintainable with hundreds or thousands of bridge domain segments and EPGs in ACI.
Service insertion tools provide a diverse set of options, allowing you to automatically
permit the traffic to the specialized L4-L7 devices without actually changing the already
deployed contracts. You can stay focused on standard application profiles instead of
worrying how to incorporate an external firewall between your EPGs. ACI will handle
additional contracts and service EPG creation for you.
With policy-based redirect, which is the most popular (according to my experience)
form of service insertion, leaf switches are even able to directly steer the traffic flow by
implementing special redirect zoning rules, all in a selective manner, giving you the
ability to choose which traffic will go through the service appliance. At the end of the
day, this results in a significant saving of service device resources.
During the chapter, you saw different design options related to the service insertion
for Layer 3 routed appliances as well as transparent Layer 2 or wire-like Layer 1 devices.
You went through the policy-based redirect configuration and verification, including
a description of their deployment in the leaf’s zoning rule tables. Then you looked
closely at a forwarding process between two EPGs with PBR applied in between. Finally,
you explored symmetric PBR, allowing ACI to load balance between multiple active
service nodes.
In the next chapter, you will examine another very useful ACI integration feature,
providing additional automation and significant visibility into your virtualization and
container environments. Both network and virtualization teams can benefit from these
capabilities, resulting in shorter downtimes and troubleshooting sessions, followed by
the increased service quality offered to your customers.
CHAPTER 9
Integrating ACI with Virtualization and Container Platforms
managing the virtual networking layer in a similar manner as the ACI fabric itself. APIC
gets the ability to see the whole virtualization inventory, including hypervisors, their
uplink interfaces, virtual switches with associated networks, and virtual machines with
their entire properties. In the end, all your ACI application profiles consisting of various
EPGs can be dynamically mapped into the virtualization platform without the need for
complicated static encapsulation management or specific interface path definitions.
• Alarms
• Distributed Switch
• dvPort Group
• Folder
• Network
• Host
• Host.Configuration.Advanced settings
• Host.Configuration.Network configuration
• Virtual machine (the following VM privileges are needed only for L4-L7
service device VMs if you plan to use them in ACI service graphs after
the integration)
After vCenter preparation, Figure 9-1 describes all the steps involved in the
integration process itself with the explanation following.
[Figure 9-1 artwork: VMM integration workflow - APIC-vCenter handshake, APIC creates the distributed switch, vDS-to-hypervisor uplink mapping, dynamic ESXi discovery via LLDP/CDP, the ACI administrator creates application profiles and maps the VMM domain to EPGs, APIC creates the Web/App/DB port groups, VMs attach to them, and ACI policies are deployed to the leaves]
[Figure artwork: ESXi access policy chain - switch and interface profiles with an interface policy group carrying LLDP/CDP and port channel (MAC pinning/LACP) policies, tied to the ESXi_AAEP attachable access entity profile, the vCenter_Prod VMM domain, and the dynamic VLAN pool ESXi_DynVLAN]
The main difference from the access policies used for bare-metal endpoints is the missing physical domain (replaced by a VMM domain object) and the fact that you have to use a dynamic VLAN pool instead of a static one (as shown in Figure 9-3).
VLAN ranges should by default inherit the dynamic allocation mode from the parent pool, due to the way APIC creates port groups in VMware when the VMM domain is associated with an EPG. Each port group will automatically get one VLAN ID from this pool. For use cases where you still need a static, manual VLAN-to-port-group mapping, ACI allows that as well. In these situations, you have to include a static VLAN range in the dynamic VLAN pool. In any case, always make sure the VLAN pool contains enough VLANs for all your EPGs.
It doesn’t matter whether you use access or vPC interface policy groups, LACP,
or static port channeling. These settings just have to match later from the vDS switch
perspective as well. Actually, from my practical experience, avoiding port channels
in this case can significantly simplify your ACI access policies for the ESXi cluster.
When you consider that all ESXi interfaces are usually standardized with the same
configuration, all you need is a single (access) interface policy group mapped to all
interface profiles and consequently to all switch profiles. Server redundancy can still be
achieved by configuring vDS to use MAC pinning based on a VM source MAC hash. The
only downside of this approach is that each VM (and its MAC) is limited to the throughput of a single leaf interface. You cannot load-balance the traffic per flow on top of a port channel.
In contrast, if you decide to create an individual vPC interface policy group
for each vPC port channel, it will definitely be supported and working. However, it can
become more difficult to manage over time for hundreds of servers. Regardless of the
option you choose, at the end of the day, make sure all your access policies are in place
for each ESXi server to ensure the correct VMware integration.
There is also an option not to rely on LLDP/CDP discovery at all, for ESXi hosts connected to ACI via an intermediate switch layer that is not managed by APIC. The most common use case is a blade server chassis with blade switches that lack support for transparent LLDP/CDP forwarding (see Figure 9-4).
Without LLDP/CDP information, you can later ensure policy deployment simply to
all interfaces and leaf switches that are part of the particular AAEP object, without any
relation to the ESXi host presence behind them.
The decision to rely or not to rely on dynamic discovery protocols for particular EPG
endpoints happens later, during VMM domain to EPG mapping, and is individually
configurable for each EPG and VMM domain. I will cover it in the next sections.
connecting the VMM domain with access policies, add a dynamic VLAN pool, and
specify vCenter IP/hostname with credentials to be used for integration (as shown in
Figure 9-5).
In the Delimiter field, you can alter the naming convention of the automatically
created port groups in VMware. The default delimiter in ACI version 5.X and newer is
underscore (_), resulting in following name format: <TenantName>_<ApplicationProfile
Name>_<EPGName>. Therefore, if your EPG names contain underscores (mine always),
it won’t allow you to create a port group with a corresponding name. In such a case,
change the delimiter, for example, to pipe (|), which was actually the default prior to ACI
version 5.
At the bottom, starting with the Number of Uplinks field, you can find the vSwitch policies configuration. All these settings are deployed to the newly created virtual distributed switch in vCenter, and especially the Port Channel Mode and the vSwitch LLDP/CDP policy should exactly reflect the ESXi access policies created in the previous step. Otherwise, you will end up with connectivity and VM endpoint learning issues.
When talking about vDS, let’s stay a while at the form to define vCenter in the VMM
domain shown in Figure 9-6.
Figure 9-6. vCenter controller setting inside the VMM domain object
To avoid any issues with the integration, make sure to set the DVS version correctly.
The vCenter default means the actual version of your vCenter, which does not
necessarily have to be the same version running on your ESXi servers. DVS has to match
your ESXi platform, or you will get a compatibility error. Currently, with ACI v5.X+, all
vSphere versions from 5.1 to 7.0 are fully supported.
Also, check for any faults in the newly created VMM domain object in ACI. When
you expand ACI’s standard tree structure in the left menu, you should see all vCenter
inventory, and after opening the Operation -> General tab, vCenter has to be in the
Online state, with all the managed hypervisors visible (refer to Figure 9-8).
The same information is available through the CLI as well if needed for verification
or troubleshooting purposes. See Listing 9-1.
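As a rough sketch (treat the exact syntax as an assumption and verify it against your APIC version), the NX-OS style APIC CLI exposes the same inventory with commands along these lines:

apic1# show vmware domain
apic1# show vmware domain name <VMM_domain_name>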
In case of any problems, first check the APIC <-> vCenter connectivity and verify that
the vCenter user specified in the VMM domain has all the necessary privileges described
at the beginning of this chapter.
Now, from the moment of integration, the VMware administrator shouldn’t
manually change any properties of vDS, which were created and managed from
APIC. Each change will result in a fault in ACI. The entire list of managed settings is
summarized in Table 9-1.
If you choose to use one of the LLDP or CDP protocols, ACI should now see your
ESXi hosts, and no faults will be visible for the VMM domain. In other cases, APIC
will notify you which ESXi hosts are not detected on any leaf switch and you can start
troubleshooting access policies, which are probably causing this issue.
The Deploy Immediacy setting affects the moment when the ACI policies are pushed into
the hardware TCAM tables of leaf switches. The Immediate option programs the TCAM
instantly after the policy is downloaded from APIC. With On Demand mode, the leaf
will wait to see the first packet received through a configured data path for a connected
endpoint in this EPG. It helps to optimize the switch hardware resources.
Resolution Immediacy is then a very important setting related to the already
mentioned discovery protocols LLDP and CDP. Here you choose whether to use them
or not. The Immediate setting ensures that the policy is downloaded to all affected leaf
switches as soon as the adjacency is detected with the ESXi server. However, LLDP
or CDP has to be enabled and work correctly in order to dynamically deploy switch
policies. Otherwise, you receive an EPG fault telling you there is a problem with a server
discovery. The second option, On Demand, also looks for the ESXi adjacency before any policy is deployed on a leaf switch, but this time with one additional condition: APIC waits for the first VM to be connected to the port group in VMware corresponding to this EPG. The last option, Pre-Provision, completely ignores the availability of LLDP/CDP information and instead configures the EPG-related policies on all switches and interfaces described in the interface policy groups mapped to the AAEP object used by the VMM domain. This is the configuration option for the indirectly connected ESXi servers described earlier.
The VLAN mode is by default set to dynamic, which implies dynamic VLAN
assignment, from a dynamic VLAN pool to a port group created for this EPG in vCenter.
In case you need a static VLAN assignment instead, make sure there is a static VLAN
range in the VLAN pool in the VMM domain and by choosing this option, ACI will allow
you to configure the VLAN ID manually.
All other settings from Port Binding further down are properties of the VMware port
group itself and mostly you don’t need to alter their default values.
After submitting this form, APIC creates a new port group in the managed virtual
distributed switch with the default name <Tenant>_<Application_Profile>_<EPG>
(or with other delimiters if specified) and you should see its presence in the vCenter
interface. In Figure 9-11, you can see multiple such port groups created by APIC
corresponding to ACI EPGs.
Table 9-2 summarizes all APIC managed port group properties. The same
consideration about not manually changing any of them applies here as well as for vDS.
If you successfully applied all the settings so far, VMs should appear in ACI as a
learned endpoint from two different sources (refer to Figure 9-13):
• “learned” implies that the leaf switch sees these endpoints communicating to the ACI fabric in the data plane
Thanks to the previous steps, now you can see how easy it is to get your virtual
endpoints to any EPG, obtain detailed information about them and their location, with
enhanced monitoring handled directly by APIC, and most importantly, without the
need for any static path mapping anymore. This makes the VMM integration a very
powerful ACI feature for significant resource and operation optimization in virtualized
environments.
inside ACI, it would be highly desired for individual containers to achieve the same
level of visibility and segmentation as any other bare-metal server or virtual machine
endpoints. Your goal is to include containers in ACI application policies (EPGs,
contracts, etc.) like any other workload and as dynamically as with the previously
described VMM integration. Additionally, ACI can provide L4-L7 services and especially
distributed hardware-accelerated load balancing for containers with extremely simple
and automated application service exposure to the external network.
Many of our customers consider Docker and Kubernetes to be industry standards
for running containers. They usually deploy Kubernetes itself or some other enterprise
variants, including Rancher, Cisco Container Platform, and Openshift. Regardless of
your main platform, my goal in this chapter is to guide you through the deployment
of a simple Kubernetes cluster that is fully integrated and cooperating with ACI APIC
and publishing load-balanced services to the external network for user access. Since
all Kubernetes variations follow a very similar integration approach (even Cisco itself
recommends fully understanding basic Kubernetes integration first, before moving
further), this should be the best starting point for you.
During my recent ACI projects, I realized there is a lot of information available
directly from Cisco or other vendors regarding the Kubernetes topic, but it is commonly outdated, scattered, or lacking context. Many times, I found myself blindly
following the offered steps without a proper understanding of the relationships between
all the components. We need to change that. Hopefully, the next sections will help you
to gain confidence and a strong understanding of all the aspects of running Kubernetes
together with ACI and allow you to replicate the whole concept inside your own
environment.
monitoring, and more. Kubernetes uses a highly modular, loosely coupled architecture,
so all the components heavily rely on a centralized API server. The high-level overview of
the Kubernetes platform is depicted in Figure 9-14.
Kubernetes uses dedicated master nodes (host servers) to run all the control plane
processes, separating them completely from the main container workloads, which are
scheduled on worker nodes.
according to the instructions from control-plane nodes. The worker node status is announced to the master nodes every few seconds.
Kube-proxy acts as a load balancer and network proxy. It routes incoming packets to the correct containers based on IP and TCP/UDP port information from the packet header.
A container is the smallest executable entity, representing a microservice of an application together with its libraries and dependencies. One or more containers run inside a Pod in Kubernetes.
Addons represent a plethora of modular, installable components for each
Kubernetes node. These include CNI plugins for networking and security functions,
Service Discovery DNS server, Kubernetes Web Dashboard, or detailed issue reporting
modules.
CNI brings a specification for connecting container runtimes to the network with
all the software libraries required to do so and CLI tools to execute various CNI actions.
It acts as an abstraction layer between container network interfaces and the host OS
(external network). The CNI specification defines exactly how to configure container
network interfaces and how to provide them with the expected connectivity.
We differentiate between several networking constructs and network segments in
Kubernetes. First, we will look at the Pod network. A Pod is an elementary scheduling
unit and can contain one or more containers inside. From a networking perspective,
each Pod is automatically assigned an IP address from the initially defined Pod subnet (usually a /16). If multiple containers are part of a single Pod, they share a single IP address. In such a case, a developer has to make sure not to use overlapping TCP/UDP port numbers; otherwise, there would be a collision. Within a single Pod, all containers can communicate freely using localhost, reaching the other containers just by specifying a different port. When trying to reach a remote Pod, a container has to use the remote Pod's IP over the Pod network.
Then we have the node network. As the name suggests, each Kubernetes host node
has an interface in this segment and uses it primarily for API services. The API servers of the master nodes communicate with each kubelet running on the workers over the node network. You indicate the node network to Kubernetes by configuring a default route via a particular network interface. When Kubernetes components start, they look for the default route, and that interface is declared as the main node interface.
To expose any containerized application to the external world, Kubernetes uses an
abstraction concept of services. A service is a construct representing a VIP address (often called a cluster IP) or a chosen port, with a corresponding DNS entry, dynamically assigned to a group of Pods. It serves for external access and load balancing between Pods that are part of the same service. By default, the kube-proxy component of each Kubernetes server node is responsible for these functions. Sometimes (for example in your
ACI case), the CNI plugin can take over with its own service management. There are the
following types of services:
These were the most up-to-date versions of each component at the time, but the
whole concept is similar with others as well. There is a compatibility matrix available on
the Cisco website, so make sure you use a supported combination at least of ACI version,
ACI CNI version, and the container orchestration platform: www.cisco.com/c/dam/en/
us/td/docs/Website/datacenter/aci/virtualization/matrix/virtmatrix.html.
In Figure 9-15, you can see the high-level networking overview of resources needed
for running integrated Kubernetes in ACI.
You will first prepare some information, which will be needed during further
provisioning and working with the integrated solution. Create the following IP plan:
• Node Subnet (mask /16): Private subnet used for Kubernetes control
plane traffic and API services. Despite its privateness, it should have
network reachability with Cisco APIC and, for simplicity, it can be
used for management access to the Kubernetes nodes as well. Each
node server (master/worker) will have an interface with one IP from
this subnet. The default route of node servers has to point to this
subnet’s gateway. As shown in Figure 9-15, ACI will automatically
propagate this subnet using a predefined L3OUT later.
• Node Service Subnet (mask /24): Private subnet, used in ACI service
graphs for routing of hardware loadbalanced service traffic between
ACI leaves and Kubernetes server nodes. You will get in touch with
this subnet when deploying L4-L7 service graphs in ACI after the
integration.
• External Service Subnet Static (mask /24): The same as previous. The
only difference is its usage for a static allocation to Kubernetes services.
Note Network masks shown in the IP plan are the defaults and recommended values, but technically they can be of any size in Kubernetes. I used smaller subnets for lab purposes.
Now prepare several VLANs for the integration. They match the ones from
Figure 9-15 and will be used in ACI as well as Kubernetes:
The following information will be used in this lab. I write the prefixes (except the cluster subnet) directly in the form of default gateways for each subnet, because you will fill them in later in this format in the initial configuration YAML document. See Listing 9-2.
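The concrete values used throughout this lab mirror the configuration file shown later in this section (adjust them to your environment):

node_subnet:      192.168.100.1/24    # Node Subnet gateway
pod_subnet:       192.168.101.1/24    # Pod Subnet
node_svc_subnet:  192.168.102.1/24    # Node Service Subnet
extern_dynamic:   192.168.200.1/24    # External Service Subnet (dynamic)
extern_static:    192.168.201.1/24    # External Service Subnet (static)
kubeapi_vlan:     100                 # Node VLAN
service_vlan:     102                 # Service VLAN
infra_vlan:       3967                # ACI Infra VLAN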
Now deploy a simple server installation and connect four CentOS machines to
ACI – OrchestratorVM, Kubernetes_Master, Kubernetes_Worker1 and Kubernetes_
Worker2. They can be bare-metal servers or VMs running inside VMware with the ESXi host uplinks in ACI (the latter is my case). They don't need any exceptional resources for lab purposes; 1-2 vCPUs, 4-8GB RAM, a 32GB HDD, and 1 network interface will work just fine. Make sure each server has a unique Linux machine ID, otherwise the Kubernetes cluster will have problems with clustering (if you cloned them from a common VM template, change the machine ID manually). If you use
virtualization in between your Kubernetes nodes and ACI, make sure the virtual switch is
correctly configured to support the required MTU and transparently forwards any used
VLAN tag to your VMs. For VMware, there are recommended security settings for each
port group:
I ended up with the following lab topology for Kubernetes integration (as shown in
Figure 9-16).
Service VLANs. Servers themselves have a VLAN tagged subinterface for Infra VLAN
(dot1Q tag has to be added at OS level), and another tagged interface is part of the Node
VLAN. The default route of the servers has to point towards the Node VLAN gateway via its subinterface. Kubernetes, based on the default route, identifies the interface to run
its API on. Finally, ensure all machines have internet access (proxy is not a problem) for
package installation (Kubernetes nodes will gain external connectivity over ACI L3OUT
after the initial provisioning).
Next, you need to create several objects in ACI manually before we start
orchestrating the integration. They include
Fabric Access Policies:
Tenant Policies:
The next step is to provision ACI with the already prepared information. To do this, you have to use the acc-provision tool from the Cisco software downloads website at https://software.cisco.com/download/home. Navigate to Application Policy Infrastructure Controller (APIC) downloads and APIC OpenStack and Container Plugins. There, find the RPM package for ACI CNI Tools (acc-provision and acikubectl) and download it to the ACI Orchestrator VM. Then unpack and install it as shown in Listing 9-3.
Sometimes the RHEL system has to install an additional repository called Extra
Packages for Enterprise Linux (EPEL) to satisfy the required acc-provision tool
dependencies. Compared with the main RHEL packages, the EPEL project is maintained
by the community and there is no commercial support provided. Nevertheless, the
project strives to provide packages of high quality and stability. See Listing 9-4.
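On CentOS Stream 8, enabling EPEL is typically a single command (a sketch; RHEL proper pulls the epel-release package from the Fedora project instead):

$ sudo dnf install -y epel-release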
Now create a sample configuration file with the acc-provision tool. You need to
specify the “flavor” with the -f argument, matching your intended deployment. In our
case, it will be Kubernetes 1.22 (if needed, you can use acc-provision --list-flavors
to see available flavors). See Listing 9-5.
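As a hedged example (the exact flavor string may differ between acc-provision releases, so list the available flavors first):

$ acc-provision --list-flavors
$ acc-provision --sample -f kubernetes-1.22 > aci-containers-config.yaml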
Fill in all previously prepared information and ACI objects in the sample config file
aci-containers-config.yaml (be aware, fields are case-sensitive). Important changed
or added lines are marked as bold. I’ve added my own comments to some configuration
objects as well. See Listing 9-6.
#
# Configuration for ACI Fabric
#
aci_config:
  system_id: K8S                      # This has to be a unique name for each cluster
                                      # and will be included in ACI object names
  #apic-refreshtime: 1200             # Subscription refresh-interval in seconds; Max=43200
  apic_hosts:                         # Fill in all your APIC cluster nodes' OOB addresses
  - 10.17.87.60
  - 10.17.87.61
  - 10.17.87.62
  vmm_domain:                         # Kubernetes container domain configuration
    encap_type: vxlan                 # Encap mode: vxlan or vlan
    mcast_range:                      # Every opflex VMM must use a distinct range
      start: 225.20.1.1
      end: 225.20.255.255
    nested_inside:                    # Include if nested inside a VMM;
                                      # supported for Kubernetes
    # type: vmware                    # Specify the VMM vendor (supported: vmware)
    # name: myvmware                  # Specify the name of the VMM domain

#
# Networks used by ACI containers
#
net_config:
  node_subnet: 192.168.100.1/24       # Subnet to use for nodes
  pod_subnet: 192.168.101.1/24        # Subnet to use for Kubernetes Pods
  extern_dynamic: 192.168.200.1/24    # Subnet to use for dynamic external IPs
  extern_static: 192.168.201.1/24     # Subnet to use for static external IPs
  node_svc_subnet: 192.168.102.1/24   # Subnet to use for service graph
  kubeapi_vlan: 100                   # Node VLAN for mgmt access and Kubernetes API
                                      # (Kubernetes only)
  service_vlan: 102                   # Used by ACI L4-L7 service graph
  infra_vlan: 3967                    # The VLAN used by ACI infra
  interface_mtu: 8900                 # !!! Default 1600 is incorrect. Use MTU between 1700 and 9000
Note It’s possible and supported to combine both VMM and Kubernetes
integration. You can run your server nodes inside the VMware VMM domain and
connect them to the distributed vSwitch managed by APIC. To do so, specify the VMM domain inside the configuration above; look for the nested_inside: section inside vmm_domain:.
With this initial configuration file, we can orchestrate the ACI configuration. The
acc-provision script will create all the necessary objects for you and generate the Kubernetes deployment YAML. Save aci-containers.yaml (the -o switch in the command) as you will need it later on the Kubernetes master node to spin up the ACI CNI containers, the control and data plane components of ACI networking inside Kubernetes. Notice the instructions in the output on how to later deploy the ACI CNI or, alternatively, clean up an older deployment. See Listing 9-7.
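A sketch of the provisioning run itself (APIC credentials are placeholders; -a applies the configuration to APIC, -o writes the CNI deployment file):

$ acc-provision -c aci-containers-config.yaml -f kubernetes-1.22 -a -u <apic_admin> -p <password> -o aci-containers.yaml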
When you now check the ACI, you should see many new objects created (as shown
in Figure 9-17). All have special graphical symbols meaning “created from orchestrator.”
These include
Before we move further, as the last step we need to configure routing to both External
Service Subnet prefixes (static and dynamic). On your external router connected to
ACI border leaves (and the Kubernetes L3OUT), simply add static routes (in your case for 192.168.200.0/24 and 192.168.201.0/24) with the next hop pointing to ACI.
Caution Make sure both interfaces have MTU set at least to 1700, preferably
to the maximal value of 9000. The default value of 1600 (even listed in
documentation) is not enough to run the ACI CNI module. With 1600, I got the
following error later in the container crash log: "OpFlex link MTU must be
>= 1700" mtu=1600 name=ens224.3967 vlan=3967 panic: Node
configuration autodiscovery failed.
Maybe you are thinking, what about the third VLAN interface we specified for the
service node? This one will be created automatically and internally by the ACI CNI
plugin. All additional potential VLANs except for the previously mentioned Node and
Infra VLAN will be terminated inside ACI’s Open vSwitch container for each host and
bridged from the host’s main interface.
Next, configure all servers according to the following example. This is a permanent interface setting in CentOS Stream 8 (make sure to reflect your own interface ID, which can differ from mine; the default used to be ens192). See Listing 9-8.
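A minimal sketch of the two tagged subinterfaces under /etc/sysconfig/network-scripts/ (the node IP address is an example host from the Node Subnet; the gateway, VLAN IDs, and MTU follow the lab values):

# ifcfg-ens192.100  (Node VLAN 100, static addressing)
VLAN=yes
TYPE=Vlan
PHYSDEV=ens192
VLAN_ID=100
DEVICE=ens192.100
ONBOOT=yes
BOOTPROTO=none
# example node address from the Node Subnet
IPADDR=192.168.100.11
PREFIX=24
GATEWAY=192.168.100.1
MTU=9000

# ifcfg-ens192.3967  (ACI Infra VLAN 3967, VTEP address via DHCP from APIC)
VLAN=yes
TYPE=Vlan
PHYSDEV=ens192
VLAN_ID=3967
DEVICE=ens192.3967
ONBOOT=yes
BOOTPROTO=dhcp
MTU=9000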
Immediately after the interfaces are up, you should gain connectivity through ACI
via the Node VLAN to your servers from the external network thanks to L3OUT, and
each Kubernetes node will receive its VTEP address from APIC over the Infrastructure
VLAN. See Listing 9-9.
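A quick sanity check on each node might look like this (interface names follow the example above):

$ ip -4 addr show dev ens192.100     # static Node VLAN address
$ ip -4 addr show dev ens192.3967    # DHCP-assigned VTEP address from APIC
$ ip route show default              # must point to the Node VLAN gateway 192.168.100.1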
Add a permanent static route for the whole multicast IP range with a next-hop
interface of Infra VLAN subinterface. See Listing 9-10.
Listing 9-10. The Static Multicast Route via ACI infraVLAN Interface
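One way to do this permanently on CentOS Stream 8 is a route file for the Infra VLAN subinterface (a sketch; the interface name follows this lab):

$ sudo ip route add 224.0.0.0/4 dev ens192.3967        # immediate effect

# /etc/sysconfig/network-scripts/route-ens192.3967     (persistent across reboots)
224.0.0.0/4 dev ens192.3967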
Now you need to tune and override several requirements for the Linux DHCP client
to receive special options from the APIC. For each Kubernetes server, fill in the proper
Infra VLAN subinterface MAC address in the first line. See Listing 9-11.
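A sketch of such a dhclient configuration, stored for example as /etc/dhcp/dhclient-ens192.3967.conf (the MAC address and hostname are placeholders; verify the option list against the current ACI CNI host-preparation guidance):

send dhcp-client-identifier 01:<infra-subinterface-MAC>;
request subnet-mask, domain-name, domain-name-servers, host-name;
send host-name <node-hostname>;
option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;
option ms-classless-static-routes code 249 = array of unsigned integer 8;
option wpad code 252 = string;
also request rfc3442-classless-static-routes;
also request ms-classless-static-routes;
also request static-routes;
also request wpad;
also request ntp-servers;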
If you plan to deploy more than 20 EPGs inside the Kubernetes cluster, you should
tune the IGMP membership kernel parameter. Current kernels by default allow
joining only 20 multicast groups, a fact that can be verified by the command in
Listing 9-12.
Count one IGMP membership per EPG. To change the setting, edit the file
/etc/sysctl.d/99-sysctl.conf, add the following line, and load the setting, as shown
in Listing 9-13.
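A minimal sketch of the check and the persistent change follows; the value 100 is only an example, so size it to your expected number of EPGs.

# verify the current limit (the kernel default is 20)
sysctl net.ipv4.igmp_max_memberships

# raise it persistently and apply immediately
echo "net.ipv4.igmp_max_memberships = 100" | sudo tee -a /etc/sysctl.d/99-sysctl.conf
sudo sysctl -p /etc/sysctl.d/99-sysctl.conf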
Finally, use the DNS server or configure static DNS records for the individual
Kubernetes nodes on each server. See Listing 9-14.
Kubernetes Installation
Now that you have ACI and the server nodes prepared from the configuration and networking
point of view, you can move on to the Kubernetes installation itself. There are multiple
options for how to deploy the cluster, and none of them is recommended or preferred for
the ACI integration itself. The decision is up to you in the end. Personally, I have good
experience with the kubeadm tool, so in this chapter I will describe the process of
deploying Kubernetes this way.
Let’s start with the Docker installation. To make sure DNS resolution inside Docker
is working fine, you need to disable the Linux firewall on all servers. See Listing 9-15.
The latest Docker is available in an official repository, which you will add to the
packaging system. Then you can install Docker and make sure the daemon will start and
run after the reboot as well. See Listing 9-16.
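On CentOS Stream 8, this roughly follows Docker's upstream instructions; a hedged sketch (the repository URL is Docker's public one, and the firewall step reflects the note above):

# disable the local firewall so DNS resolution inside containers works
sudo systemctl disable --now firewalld

# add Docker's official repository and install the engine
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io

# start Docker now and after every reboot
sudo systemctl enable --now docker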
Note If you run Docker with your own user instead of root, create a "docker"
group and add your user to it to avoid the need for constant use of sudo with docker
commands:
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Docker uses the control group (cgroup) system to handle resource management
and to constrain resource allocation for individual container processes. There are
multiple cgroup drivers available, and Docker runs cgroupfs by default. However, from
kubeadm version 1.22, if the user does not specify any cgroup driver, kubeadm
defaults to systemd for the Kubernetes nodes. You need to match the two for the kubelet
modules to run correctly, which you can do by simply changing Docker's driver, as shown in Listing 9-17.
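A minimal sketch of switching Docker to the systemd cgroup driver (using the standard daemon.json location):

sudo mkdir -p /etc/docker
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl restart docker
docker info | grep -i "cgroup driver"   # should now report systemd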
If Docker is running with the correct cgroup driver, you can move forward to the
kubeadm utility. First, a few system preparation steps. Set SELinux to permissive mode
(effectively disabling it), as in Listing 9-18.
Ensure the br_netfilter kernel module is loaded in the system by issuing lsmod |
grep br_netfilter. Usually it is already in place, but if not, load it manually using the
modprobe br_netfilter command and ensure it is permanently available after a
restart as well. This enables the bridging network functionality needed later on the
Kubernetes host for the ACI CNI plugin. See Listing 9-19.
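A hedged sketch of both preparation steps on CentOS Stream 8:

# SELinux to permissive now and across reboots
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config

# load br_netfilter and keep it loaded after a restart
sudo modprobe br_netfilter
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf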
Now properly configure iptables to enable bridged traffic on the Kubernetes nodes
by adding the lines in Listing 9-20 to sysctl.d (I will explain the need for a network
bridge later):
The Kubelet module won’t work if the Linux swap is enabled, so you need to fix
that in advance. Turn off the swap momentarily by the command swapoff -a and
permanently disable it by commenting out the swap line in the /etc/fstab file. See
Listing 9-21.
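Both steps can be sketched as follows; the k8s.conf file name is just a convention, and your Listing 9-20/9-21 values may differ slightly.

# allow bridged traffic to be processed by iptables and enable IP forwarding
cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

# disable swap now and comment it out in /etc/fstab for future boots
sudo swapoff -a
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab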
Finally, let’s install the kubeadm utility and Kubernetes itself. As before, add the
official Kubernetes repository, install the necessary packages, and run the kubelet
module. See Listing 9-22.
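A hedged sketch of the installation step, assuming the upstream Kubernetes yum repository has already been added under /etc/yum.repos.d/ (its location has changed over time, so follow the current upstream instructions for your target version):

sudo dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet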
At this moment, kubelet will stay in a crash loop on all nodes, but don't worry:
as soon as you start the Kubernetes control plane and add workers to the cluster,
everything will be fine. This time, on the master node only, initialize the Kubernetes
control plane by issuing the command in Listing 9-23.
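The exact command in Listing 9-23 depends on your acc-provision file; a hedged sketch using the service subnet from this chapter follows (the Pod CIDR placeholder must match the pod_subnet you provisioned).

sudo kubeadm init \
  --pod-network-cidr=<pod_subnet from your acc-provision file> \
  --service-cidr=192.168.103.0/24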
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a Pod network to the cluster. Run kubectl apply -f
[podnetwork].yaml with one of the options listed at https://kubernetes.io/docs/
concepts/cluster-administration/addons/.
Then you can join any number of worker nodes by running the following on each
as root:
If you received the previous message at the end of the initialization, your Kubernetes
control plane is running correctly, and you are ready to start adding worker nodes.
As instructed, issue kubeadm join with the received token on your worker nodes. See
Listing 9-24.
You need to tweak the kubelet DNS resolution. As you've changed the default
Kubernetes cluster subnet (the default is 10.96.0.0/16 and yours is 192.168.103.0/24), you
have to let the module know about it; otherwise, it will keep trying to reach 10.96.0.10 for
DNS resolution. In addition, add the --network-plugin=cni definition to the KUBELET_
KUBECONFIG_ARGS= variable to instruct kubelet to use the CNI plugin located inside the
default folder /etc/cni/net.d. Add the lines in Listing 9-25 to the kubelet configuration file
on each node and restart the process.
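A hedged sketch of where and what to change; the drop-in path is the usual one for RPM-based kubeadm installs, and 192.168.103.10 assumes CoreDNS sits on the tenth address of your service subnet, so adjust both to your environment.

# edit the kubeadm drop-in on each node
sudo vi /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
#  - append --network-plugin=cni to the KUBELET_KUBECONFIG_ARGS= line
#  - add --cluster-dns=192.168.103.10 so Pods stop pointing at 10.96.0.10

sudo systemctl daemon-reload
sudo systemctl restart kubelet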
Put this kubectl reference permanently into your environment variables by adding
the line in Listing 9-26 to the local user shell script.
Reconnect to the master node, where you can check the current cluster status; you
should see all Kubernetes nodes there. A few minutes after joining, all workers
should end up in the Ready state. See Listing 9-27.
Now you have Kubernetes running but without any network CNI plugin. Remember
that YAML file generated by the acc-provision script at the beginning? It’s basically the
Kubernetes deployment specification file for CNI and you will use it now to provision all
ACI networking components into the running Kubernetes cluster. You need to upload
the file to the master node and apply it this way, as shown in Listing 9-28.
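A minimal sketch of applying the generated file and watching the ACI CNI containers come up (the namespace differs between acc-provision versions, so the grep simply filters by name):

kubectl apply -f aci-containers.yaml
kubectl get pods -A | grep aci-containers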
After a while, when the whole ACI machinery is deployed, you should see the
following containers running in your cluster (see Listing 9-29).
Congratulations! This is the goal you have been pursuing from the beginning: you've
achieved a working Kubernetes cluster with an ACI CNI plugin and full
integration. As shown in Figure 9-18, when you navigate in ACI to Virtual Networking ->
Kubernetes and open the K8s domain, you should see all your Kubernetes nodes
(in the online state), their VTEP addresses (ACI considers them basically as remote
virtual leaves now), and discovered container endpoints.
In this section, go through all the menu options to see the consolidated visibility
acquired from the ACI point of view. If you open a particular container endpoint, you will
get its MAC address, IP address, network veth interface, VXLAN VNID segment, labels,
associated ACI EPG, potential issues, and much more. In the Pods section, you can
see exactly on which Kubernetes node a particular Pod is running and which ACI leaf
interface is used for its communication (see Figure 9-19).
You can easily get even detailed statistics about individual containers, as you can see
in Figure 9-20. Go to the Pods section, click the number representing Inband Discovered
Endpoints for a chosen Pod. In the newly opened form, choose the Pod’s veth interface,
click the symbol, and open the Statistics tab.
Now consider the fact that ACI’s big advantage is an open API. Any object visible
inside the Kubernetes VMM domain is accessible through it, which gives you powerful
options to easily use this information inside your own management or monitoring
systems.
Figure 9-21 graphically summarizes the architecture of ACI CNI plugin components.
[Figure content not reproducible as text. Recoverable labels: etcd, Open vSwitch data plane, browser, UI (Angular) served by nginx, redis (yelb-cache), postgres (yelb-db), and the ports connecting these components.]
In ACI, navigate to your K8S tenant and create a new application profile called yelb_
app with four EPGs corresponding to application components: ui_epg, appserver_epg,
cache_epg, and db_epg. Each EPG is put inside the aci-container-K8S-pod-bd bridge
domain and associated with the K8S Vmm-Kubernetes domain.
ACI’s whitelisting model will now apply to Kubernetes Pods in the same way as with
any other workloads. By default, all the communication between Pods is denied right
at the Open vSwitch layer. Packets won’t even leave the interface of Kubernetes hosts.
In order to allow required Inter-Pod communication, you need to create three new
contracts with related filters, matching the application TCP flows shown in Figure 9-22.
Additionally, you need to ensure DNS resolution for each Pod with core-dns containers,
so you will consume the already existing DNS contract aci-containers-K8S-dns with each EPG.
Now download the Yelb Kubernetes deployment file on the master node and save it
to the yelb-lb.yaml file. See Listing 9-30.
You need to edit the individual deployment sections to specify their EPG affiliation
by using annotations. Annotations are a simple way to associate any Kubernetes Pod with the
EPG of your choice. Listing 9-31 shows the yelb-ui deployment section.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-ui
  namespace: yelb
  annotations:
    opflex.cisco.com/endpoint-group: '{ "tenant":"K8S", "app-profile":"yelb_app", "name":"ui_epg" }'
spec:
  selector:
    matchLabels:
      app: yelb-ui
  replicas: 3
... rest of the lines omitted – no changes there...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-server
  namespace: yelb
  annotations:
    opflex.cisco.com/endpoint-group: '{ "tenant":"K8S", "app-profile":"yelb_app", "name":"cache_epg" }'
... rest of the lines omitted – no changes there...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-db
  namespace: yelb
  annotations:
    opflex.cisco.com/endpoint-group: '{ "tenant":"K8S", "app-profile":"yelb_app", "name":"db_epg" }'
... rest of the lines omitted – no changes there...
And finally, see Listing 9-34 for the Yelb-appserver deployment with increased
container replicas to spread them across the Kubernetes cluster. You will later easily see
loadbalancing between them in action.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-appserver
  namespace: yelb
  annotations:
    opflex.cisco.com/endpoint-group: '{ "tenant":"K8S", "app-profile":"yelb_app", "name":"appserver_epg" }'
spec:
  selector:
    matchLabels:
      app: yelb-appserver
  replicas: 3
... rest of the lines omitted – no changes there...
Save the deployment file and apply the Yelb application in your cluster. After a few
moments, check that all Pods are ready and running as expected. See Listing 9-35.
Pods are already visible as learned endpoints inside the ACI fabric, with detailed
information about their MAC/IP addresses, location, physical interface, and the VXLAN
segment used for communication (as shown in Figure 9-24).
You've just achieved the same level of segmentation, visibility, and network
consistency for containers in ACI as for any other bare-metal or VM workloads. And this
integration allows you to go even further in a few moments.
But first, navigate to the Kubernetes VMM domain in ACI and go to the Services
tab, or enter kubectl get services -A on the Kubernetes master node. You will find a
LoadBalancer service listening on External IP 192.168.200.8 and TCP port 80 for the yelb-ui
Pods. This is an automatically assigned VIP for application access. See Listing 9-36.
It’s that simple. Try to refresh this site multiple times to observe changes in the
appserver container name listed at the bottom of the page. It’s the verification that
loadbalancing is working correctly and that it’s actually happening internally. But how
does ACI know how to route the traffic to this VIP?
During the Yelb app deployment, ACI automatically published yelb-ui Pods to the
external network through L3OUT in the K8S tenant and for clients, ACI even serves as a
hardware loadbalancer. To understand the magic behind this machinery, you need to
explore several new objects created by the ACI CNI Controller in APIC.
The key initiator for all the automation is a loadbalancer service definition in
your Kubernetes deployment file (for a service syntax, refer to yelb-lb.yaml). When
integrated Kubernetes sees the loadbalancer definition, the following steps will happen:
4. The ACI CNI creates a service graph instance with a redirect policy
listing all your Kubernetes worker nodes with their IPs and MAC
addresses from the service VLAN (102 in your example). This
object is automatically updated every time you add or remove
the Kubernetes cluster node where the related Pod is being
scheduled, which is very useful from an operation perspective.
Based on a hash, one of these nodes is chosen and the packet is
redirected there via the service VLAN in the service BD. Thanks
to that, ACI provides HW-accelerated loadbalancing capabilities.
5. The next-hop IP address (Node Service IP) in VLAN 102 for this
policy-based redirect is configured inside Open vSwitch, and all
received packets are forwarded to the Kubernetes cluster IP, where
the second tier of loadbalancing happens.
Figure 9-26 visualizes all the components involved in traffic
forwarding from the ACI L3OUT to a Kubernetes Pod. Hopefully, it will provide you with a
complete picture and mental map of this interesting feature.
[Figure 9-26 is not reproducible as text. It traces the path from Application Users through Cisco ACI (the K8S_L3OUT with its external EPGs, the K8S_svc_global L4-L7 device, service graph, and redirect policy) to the Kubernetes worker node: the ACI VTEP IP on the Infra VLAN, the Open vSwitch bridges (br_int_vxlan, br_access), DNAT, and the veth interfaces of the yelb-ui Pods.]
Summary
In this chapter, you had an opportunity to explore and gain practical experience with
ACI integration capabilities, allowing you to achieve significant benefits for the security,
visibility, and maintainability of the networking layers for your virtualized workloads
and containers in a data center. This way, you can handle bare-metal servers, virtual
machines, and containers as equal endpoints in the infrastructure and ensure that
consistent policies are applied to them across all data centers.
The next chapter will dive into ACI automation and programmability, covering how to
provision and operate the fabric through its REST API and the tools built on top of it.
CHAPTER 10
ACI Automation
and Programmability
If you made your way to this last chapter through the whole book, congratulations and
great job! I sincerely hope you learned a lot and that ACI is becoming (if not already) a strong
part of your skill set. Maybe you lean a lot (like me) toward network programmability,
automation, software development, or the DevOps philosophy, and you have jumped
here earlier, before finishing all the chapters, which is also perfect! If you are entirely new
to ACI, though, I definitely recommend building the solid basics first, as I
expect at least a basic understanding of ACI structure, objects, and their usage for this
chapter. In the upcoming pages, we will dive deeper together into a number of exciting
and important topics related to ACI's programmability features, which can greatly improve its
provisioning speed, operation, change management, and troubleshooting.
This chapter will differ from the preceding ones primarily in the way
you will interact with APIC. Instead of creating objects or verifying the configuration using
the GUI and CLI, you will utilize a lower layer: the application programming interface
(API) and the tools consuming it. This involves a plethora of options, from OS utilities to the
Python programming language, end-to-end orchestrator platforms, and more. You will
explore how ACI can simplify the creation of hundreds (even thousands) of objects and
policies in the blink of an eye, how you can write your own software to overcome limitations
and unsatisfactory outputs (if any) of the GUI/CLI, and how to prepare troubleshooting
workflows that gather and analyze large amounts of information automatically on your behalf.
For the following sections, I also expect some experience with Python. You don't need
to be a developer with years of experience at all, but at least minimal working proficiency
will be very beneficial. Personally, I encourage all the network engineers out there to
learn Python basics. It can immensely simplify your life and increase your market value
at the same time. Automation and programmability skills, in general, are in huge demand
and almost mandatory for a wide variety of IT positions around the industry.
[Figure content not reproducible as text: the APIC REST API sits at the bottom, consumed by the web GUI, the APIC CLI, the K8s CNI, the Cobra SDK, and external tools such as Ansible, Terraform, OpenStack, and Cisco Intersight.]
The Management Information Model (MIM) carries a logical description of the thousands of objects defining the ACI solution,
with all their attributes and mutual relationships. Then, any HTTP/S client in general can
be used to consume the REST API and access their actual instances in the APIC database.
REST APIs
The REST API implemented in ACI is a software technology providing a uniform
communication interface between clients and servers (APIC) to transport information
about ACI objects. It utilizes very similar concepts to the traditional World Wide Web
operation (as shown in Figure 10-2). RESTful APIs come with these specifics:
• HTTP methods: Similar to the Web, the REST API supports standard
HTTP methods of GET, POST, PUT, and DELETE.
REST clients, when communicating with API servers, must specify the object URI
and intended operation with it using HTTP methods (as illustrated in Figure 10-3):
• HTTP POST: Creates a new object. In the case of ACI, it is also used
for updating the existing one. The client has to send object attributes
formatted as JSON/XML in the REST call payload.
• HTTP GET: The client requests the object identified in the URI. The
server replies with the JSON/XML payload describing the object
attributes.
+7733267DSLPRXQLSD\ORDG
&UHDWH /RFDWLRQ
+773*(7DSLQRGHPRXQLWQ$SUHVVMVRQ
5HWUHLYH 3D\ORDG
+773'(/(7(DSLQRGHPRXQLWQ$SUHVVMVRQ
'HOHWH
• 1xx Informational
• 2xx Success
• 3xx Redirection
In each category, there are many individual status messages defined for HTTP,
but we will cover just the most important ones for the REST API. The first, informational
category has just a single representative, shown in Table 10-1.
101 – Switching Protocols: The ACI REST API sends it in response to an HTTP Upgrade message when switching from HTTP to the WebSocket protocol. More about this feature is in the "ACI Object Subscriptions" section of this chapter.
The success category confirms that your API operations were successfully done on
the server side. In software development, checking for 2XX status codes can be tied to
error identification and handling. See Table 10-2.
200 – OK: Any requested client action was carried out successfully. With 200 OK, the REST API should return a data object in the response body. Primarily used in ACI for any successful operation.
201 – Created: Notifies the client that the requested resource was successfully created. ACI actually doesn't implement 201 and sends 200 OK even when creating objects.
202 – Accepted: If the processing of a client's request takes a longer time, this status notifies the client that the request has been accepted and is being processed. Not implemented in ACI.
204 – No Content: Used if the REST API declines to send any message in the response body. Usually, the resource exists in the system, but it has no attributes to send to the client. This one is not implemented in ACI.
The third category of status codes is, to the best of my knowledge, not implemented
in ACI at all, but just for completeness, Table 10-3 shows their description in general for
REST APIs.
301 – Moved Permanently: This response indicates that the underlying object model was significantly redesigned, and the client should use a different URI for this request.
302 – Found: A common way to perform a redirection to another URL, provided in the response's Location header field. The client is expected to generate another REST call with the new URL.
304 – Not Modified: Similar to status code 204, the response from the server doesn't contain any message in the body. 304 is sent only when the client asks whether the resource was modified since the version specified in the If-Modified-Since or If-None-Match headers and the resource has not changed.
The client-side error status codes in Table 10-4 are all implemented in ACI as well,
and they are an important tool for implementing error handling in the client's automation
software.
400 – Bad Request: Generic error code returned when some mistake is made in the message body. It can be related to incorrect syntax or missing mandatory parameters.
401 – Unauthorized: Very common error returned when you are not authenticated to access the REST API. Usually caused by an expired, malformed, or missing authentication token.
403 – Forbidden: Even with a correct authentication token, if access to the requested object is prohibited by the server's role-based access control (RBAC), you will be provided with status 403.
404 – Not Found: Maybe the most "famous" status code, returned when you compose a request URI to a non-existing object.
And the last two, server-side errors could potentially be seen when there is some
problem with the execution of your call on the server, such as the web framework running
there throwing an exception. See Table 10-5.
500 – Internal Server Error: Failure on the server side, usually caused by an exception thrown by the web framework running there.
501 – Not Implemented: Returned if the server cannot process your otherwise correct request due to a missing implementation of the method used. Not implemented in ACI.
• Case sensitivity
XML
Extensible Markup Language (XML) is one of the data formats used for the ACI REST
API to describe structured object information. It’s very similar to HTML documents but
with strict syntax rules making it optimal for reliable processing by computer software.
However, from a human-readability point of view, it’s not preferred at all, and if possible,
I personally always reach for other options.
XML elements must always be closed or defined as empty. Either you can use a pair of
tags, <imdata></imdata>, or a short form, <fvTenant … … />. Both options can be seen
in the example from ACI in Listing 10-1.
JSON
JavaScript Object Notation (JSON) was derived from JavaScript to exchange data
between clients and servers in a human-readable format. Compared to XML, JSON is
highly preferred whenever possible thanks to its readability. Another advantage is the native
mapping of JSON data to dictionaries in multiple programming languages, especially in
Python, which is commonly used for ACI automation as well.
JSON objects use a key:value attribute format delimited by curly
braces, { }, as you can see in the ACI example in Listing 10-2.
{
    "totalCount": "1",
    "imdata": [
        {
            "fvTenant": {
                "attributes": {
                    "annotation": "",
                    "descr": "",
                    "dn": "uni/tn-Apress",
                    "lcOwn": "local",
                    "modTs": "2022-08-02T15:06:57.000+02:00",
                    "monPolDn": "uni/tn-common/monepg-default",
                    "name": "Apress",
                    "uid": "15374",
                    "userdom": ":all:"
                }
            }
        }
    ]
}
Both keys and (string) values have to use quotes, and attributes need comma
separation between each other, except for the last one. Whitespace between JSON
attributes and objects is not significant.
To create an array, multiple JSON objects just need to be defined between square
brackets, [ ], and separated by commas (in the previous example, the value of the
imdata key is an array).
YAML
Often referred to as Yet Another Markup Language or by the recursive acronym YAML
Ain't Markup Language, YAML is the third data format commonly used to serialize and transport
data in a standardized way. ACI doesn't directly support YAML for REST API operations,
but Ansible configuration files (playbooks) are based on its structure. Also,
the ACI object structure can be easily described using YAML, as you will see later in this
chapter.
YAML, similar to JSON, utilizes key:value attributes, this time without the obligatory
quotes. Whitespace is crucial, as the indentation denotes the hierarchical level and nested
objects, while each new line represents a new object or attribute. No commas are used.
See Listing 10-3.
aci_config:
  aci_tenant_structure:
    - name: Apress_Prod
      state: present
      vrf:
        - name: PROD
        - name: TEST
      app_profile:
        - name: WebApp
        - name: JavaApp
Lists or arrays in YAML are created by lines starting with a dash, -. Each dash
produces an individual list item, and everything indented one level to the right of the
dash belongs to that item.
[Figure content not reproducible as text: the MIT with the policy universe (uni) at the root, fvTenant below it, and 1:n child relationships to classes such as fvCtx (VRF), fvBD, fvAP, l3extOut, vzFilter, and vzBrCP (contract), which in turn parent fvSubnet, fvAEPg, and vzSubj objects.]
Each ACI object is situated in a precise place of the object tree hierarchy. This
place can be uniquely described by the path through the tree branches needed to get there
from the root, also called the universe (or just "uni"). And that path produces a unique
distinguished name (DN) for each object. The DN will become quite important soon, as
it directly enters the object's REST URI when working with the API. Besides the DN, each object
can also be referred to by a relative name (RN), meaning just its own MO name. The
naming convention for both DN and RN can be found for each object in the information
model documentation, which I will cover in the next section.
Figure 10-5 describes how to approach the naming of the well-known EPG objects:
[Figure content not reproducible as text: the tree walks ROOT (uni) -> fvTenant (tn-Apress) -> fvAP (ap-PROD_AP) -> fvAEPg (epg-FrontEnd), giving the EPG the distinguished name uni/tn-Apress/ap-PROD_AP/epg-FrontEnd, while its relative name is just epg-FrontEnd.]
Figure 10-5. ACI MOs names, class names, distinguished name, and
relative name
ACI Documentation
ACI is a completely open platform from an API standpoint. Any object described in the
internal object model can be accessed or modified using a REST call. However, this
huge object database wouldn't bring much advantage without proper user
documentation and the ability to search it reasonably. Fortunately, the entire list
of ACI objects, with extensive descriptions and other useful data, can be found in
two places:
1. Online: https://developer.cisco.com/site/apic-mim-ref-api/
Both variants provide exactly the same amount of information and the same user
interface. The main page allows you to perform a full-text search among all the policy
objects or faults. Names of objects refer to their class names, but thanks to the full-text
search, it's usually enough to specify a standard well-known object name and you will
find its class easily in the search results. The specific object of your choice can be further
examined by clicking the symbol shown in Figure 10-6 in the upper right corner.
That will open the object's detailed information.
The object’s main overview tab provides a lot of useful information including
• Access rights: Is the object configurable? Which ACI roles have write
or read access to it?
• RN: tn-{name}
• DN: uni/tn-{name}
The second documentation tab informs you about all the object's attributes
accessible through the REST API (as shown in Figure 10-7). By default, only configurable
attributes are shown, so make sure to click the switch on the upper right side. This list
is exactly the set of key-value pairs, with their variable types, that you can configure or
that the REST API returns.
In all other tabs, you can check the object's relationships, related faults, events, and
statistics. Statistics can be especially valuable from a monitoring point of view, if your
monitoring solution is capable of utilizing API access.
Note All objects described in the documentation are just models, just their
prescriptions or templates, if you will. Actual instances will be accessible
differently, which we will explore in the following sections.
Next, APIC provides a feature to show so-called debug info. The user needs to
enable it by clicking the settings icon (little cog wheel) and choosing Show Debug Info
from the context menu. Then, in the bottom little status bar, APIC starts showing the
current MO. The last word in the model path (before the left square bracket) is always
the object’s class name, and the string inside the brackets refers to the exact object’s
distinguished name (as shown in Figure 10-9).
Save-As Feature
Another way to get an object’s DN, together with its entire JSON or XML representation,
is by right-clicking the object’s name in APIC GUI and choosing Save As. In the opened
form, there are several options (as shown in Figure 10-10):
• Get only the object itself, or all its children objects as well.
• JSON or XML output data format
Some objects can be alternatively saved by clicking the little symbol in the upper
right part of their configuration form.
Listing 10-4 illustrates the resulting JSON file, which once again contains the two
most useful pieces of information for object identification: the class name (fvTenant) and the distinguished name (dn).
{
"totalCount": "1",
"imdata": [
{
"fvTenant": {
"attributes": {
"annotation": "",
"descr": "",
"dn": "uni/tn-Apress",
"name": "Apress",
"nameAlias": "",
"ownerKey": "",
"ownerTag": "",
"userdom": ":all:"
}
}
}
]
}
In fact, a downloaded object in JSON or XML format can be manually edited in case
of need and returned back to APIC using the Post option from the context menu shown
in Figure 10-10, without you even knowing anything about REST API calls.
Visore
In the documentation section, I’ve noted that it just describes object templates in the
ACI’s model database. However, there is another tool enabling you to actually browse
the object instances graphically and see all their attributes in “raw” data format: Visore
(also commonly referred to as object store).
Accessing Visore is simple; it’s a native part of APIC with a URL: https://<APIC-IP>/
visore.html.
This tool mostly expects that you already know either the class name, DN, or URL
of the requested object. An alternate way to access Visore for a single object without
knowing its API-related information is by right-clicking a given object in the APIC GUI
and choosing Open in Object Store Browser (this option is visible in Figure 10-10).
Visore allows you to filter returned objects based on multiple logical operations
related to the object’s attribute value. Figure 10-11 illustrates requesting all bridge
domains configured in ACI (based on the fvBD class name) that have “Apress” in their
distinguished name, effectively filtering just those that are part of the Apress tenant.
Even though it’s not so apparent at first sight, Visore has considerably more useful
features:
• The angle brackets, < >, bounding the DN of each object allow you to
interactively browse the object model tree. Either you can show the
parent object and move one level higher in the model hierarchy by
doing so or request all child objects of the current one.
There is also a hypertext link shown after running a Visore query: "Show URL
and response of the last query." It serves for a detailed examination of the query that was sent
to APIC and lets you check the raw text response (see Figure 10-12). In my opinion, this is a
great practical learning source for building your own REST calls with complex filtering/
sorting capabilities. Visore will assemble the main part of the REST call URL for you
automatically, and you can continue with additional enhancements.
API Inspector
The last graphical tool and, in fact, a brilliant ACI automation assistant is available from
the main settings context menu (the little cogwheel in the upper right GUI corner), when
you click Show API Inspector.”
You already know that all actions performed by the ACI administrator in the APIC’s
GUI are translated into a series of predefined REST calls. Wouldn’t it be awesome to have
an option to see in real time all the API calls flowing between your browser and APIC?
That’s exactly what API Inspector does (see Figure 10-13).
The API Inspector operation is simple. It will always open in a new browser window
with a blank space in the middle, waiting for any REST call to happen. Then all you need to
do is click on something in ACI or create an object; literally, perform any supported
GUI operation. The API Inspector will capture all the API communication needed to
process your request, show all returned resources on the page, or create the defined object.
The Inspector’s output consists of individual REST calls, with four lines for each:
Such a valuable insight about a GUI operation can be 1:1 replicated for your own
REST calls. So, if you don’t know how to get the required ACI information via the API, but
you are aware of where to find it in the GUI, have a look through the API Inspector, note
the call, and then reproduce it. The same applies to creating ACI objects. If you’re unsure
of the correct object name and its DN, create it in ACI while the API Inspector is open
and then search for POST call in it.
APIC CLI
Besides various GUI tools, APIC offers one CLI tool worth mentioning for object
querying: MoQuery. It has no browsing capabilities like Visore, but it still brings simple
use options on how to view objects will all their attributes. Under the hood, this utility
will generate REST calls on your behalf with extensive parametrization capabilities. You
can ask for a single object, the whole class, filter the output, or choose the output format.
MoQuery is able to return not only JSON and XML, but also data formatted into a table,
or just plain, human-readable text blocks. The following MoQuery helps illustrate all the
options. See Listing 10-5.
apic1# moquery –h
usage: Command line cousin to visore [-h] [-i HOST] [-p PORT] [-d DN]
[-c KLASS] [-f FILTER] [-a ATTRS]
[-o OUTPUT] [-u USER]
[-x [OPTIONS [OPTIONS ...]]]
optional arguments:
-h, --help show this help message and exit
-i HOST, --host HOST Hostname or ip of apic
-p PORT, --port PORT REST server port
-d DN, --dn DN dn of the mo
-c KLASS, --klass KLASS comma seperated class names to query
-f FILTER, --filter FILTER attribute filter to accept/reject mos
-o OUTPUT, --output OUTPUT Display format (block, table, xml, json)
-u USER, --user USER User name
-x [OPTIONS [OPTIONS ...]], --options [OPTIONS[OPTIONS ...]]
Extra options to the query
Let’s say you want to receive all distinguished names of EPG objects where the tenant
parent is Apress. With MoQuery, simply ask for the fvAEPg class and grep the desired
output, as shown in Listing 10-6.
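A hedged sketch of such a query follows; the block output prints one "dn : …" line per object, so a simple grep chain is enough (exact spacing in the output may vary between versions).

apic1# moquery -c fvAEPg | grep "dn" | grep "tn-Apress"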
[Figure content not reproducible as text: the anatomy of a REST URL, with mandatory parts (HTTP or HTTPS protocol, APIC IP or DNS name with optional port, the /api keyword, "mo" or "class" with optional "node", the MO distinguished name or class name, and the input/output data format) followed by optional filtering and sorting options.]
By appending /api to the standard APIC URL, you tell the web server to forward your
request to the REST subsystem. Followed by /node/mo or /node/class, you can then
choose whether the returned output will consist of a single managed object only or of all objects
belonging to the same class. The keyword /node/ is optional. Based on the previous
decision, the next part of the URL requires either the DN or the class name. As a
file extension at the end of the URL, you have to define the expected input or output data
format (.xml or .json). In fact, ACI's API ignores the standard HTTP Content-Type or Accept
headers otherwise used for content specification, so only the URL suffix matters.
The whole section described up to now can be considered mandatory in a REST
URL. However, you can optionally continue by expanding the definition with various
options summarized in Table 10-6.
query-target= {self | children | subtree}: The query returns the single object, its children, or the whole subtree.
target-subtree-class= <class name>: Returns subtree objects only of the defined class or classes, separated by commas.
query-target-filter= [eq|ge|le…](attribute,value): Returns only objects with attributes matching the defined expressions.
rsp-subtree= {no | children | full}: Another way to include children objects or the whole subtree in the response, with more additional options.
rsp-subtree-class= <class name>: Returns subtree objects only of the defined class or classes, separated by commas.
rsp-subtree-filter= <filter expression>: Only objects with attributes matching the defined expressions are returned.
rsp-subtree-include= <category>: Includes more related objects from the following categories: audit logs, event logs, faults, fault records, health, health records, relations, stats, tasks.
order-by= classname.property|{asc | desc}: Sorting based on a chosen object's attribute.
time-range= {24h | 1week | 1month | 3month | range}: Applicable to all log objects: faults, events, audit logs. Range in yyyy-mm-dd|yyyy-mm-dd format.
Listing 10-7 shows multiple query examples. Others can be constructed either on
your own or using already described tools Visore and API Inspector.
GET https://<APIC-IP>/api/mo/topology/pod-1/node-1/sys/ch/bslot/board/sensor-3.json
GET https://<APIC-IP>/api/mo/uni/tn-Apress.json?query-target=subtree&target-subtree-class=fvAEPg&rsp-subtree-include=faults
GET https://<APIC-IP>/api/class/fvBD.json?query-target=subtree&order-by=fvBD.modTs|desc
GET https://<APIC-IP>/api/class/fvTenant.json?rsp-subtree-include=health&rsp-subtree-filter=lt(healthInst.cur,"50")
For object creation in ACI, you will use the POST method, and there are two options for
constructing the URL:
Regardless of the chosen option, the body of the POST call has to carry JSON or XML
objects (based on the URL suffix), describing at least the minimum required attributes for a
particular object (in the case of a universal URL, it's the object's DN or name; with a standard
URL, you have to define the object class name, but no other attribute is mandatory).
Additionally, using a single POST call, you can also describe a whole object
hierarchy to be created, following the model tree. For such use cases, a "children" array of
objects is required in the JSON/XML body.
The subsequent examples in Listing 10-8 show all of these approaches.
POST https://<APIC-IP>/api/node/mo/uni.json
BODY:
{
    "fvTenant": {
        "attributes": {
            "dn": "uni/tn-Apress"
        },
        "children": []
    }
}

POST https://<APIC-IP>/api/node/mo/uni/tn-Apress/ctx-PROD.json
BODY:
{
    "fvCtx": {
        "attributes": {
            "name": "PROD"
        },
        "children": []
    }
}

POST https://<APIC-IP>/api/node/mo/uni.json
BODY:
{
    "fvBD": {
        "attributes": {
            "dn": "uni/tn-Apress/BD-FrontEnd_BD",
            "unkMacUcastAct": "flood",
            "arpFlood": "true",
            "unicastRoute": "false",
            "status": "created"
        },
        "children": [{
            "fvRsCtx": {
                "attributes": {
                    "tnFvCtxName": "PROD",
                    "status": "created,modified"
                },
                "children": []
            }
        }]
    }
}
• The first call creates a single tenant using a universal URL and the DN
in the call body.
• The second call creates a VRF using a standard REST URL, without
any attribute, just the VRF class name (fvCtx).
• Finally, the third call adds a bridge domain with VRF association in
one step using a children array and a universal URL again.
Listing 10-9 describes authentication to the REST API using both JSON and XML
objects with a response containing a token.
POST https://<APIC-IP>/api/aaaLogin.xml
BODY: <aaaUser name="admin" pwd="cisco123"/>
RESPONSE:
{
"totalCount": "1",
"imdata": [
{
"aaaLogin": {
"attributes": {
"token": "eyJhbGciOiJSUzI1NiIsImtpZCI6ImljeW5xYXF
tcWxnMXN5dHJzcm5oZ3l2Y2ZhMXVxdmx6IiwidHlwIjoiand0In0.
eyJyYmFjIjpbeyJkb21haW4iOiJhbGwiLCJyb2xlc1IiOjAsInJvb
GVzVyI6MX1dLCJpc3MiOiJBQ0kgQVBJQyIsInVzZXJuYW1lIjoi
YWRtaW4iLCJ1c2VyaWQiOjE1Mzc0LCJ1c2VyZmxhZ3MiOjAsImlhd
CI6MTY2MDM5ODMyNiwiZXhwIjoxNjYwNDAxMzI2LCJzZXNzaW9ua
WQiOiJKU0dhdjd4NlJoaUhpTnRuSnJENnpnPT0ifQ.iQ1i3SP6J_
LWiPWx79CUuDT_VYM0JF_x99zjr7h64hFGPj7Dd2Y2F0mO0s1-
vI33kRaNHQSJ2VyE8B5kd8dL5jNCv_UC0ca0-wILzZrb9y8_
itzagaLss6dLOV8Z8VQEht5K-1ys0awWtSTt0ORJQ-MvyZxc
XEr3cq-NO_0WRjTBDQcODYrRQEqYUZfX5hhEJF6izuVblm_
0lTAcVG133oti8uxepeRaNTgywyrDAgXZn6rg1tJuvKtS2Dh
HfkI3U5uXAY-qzNk-cC6oU4CzjCP5OD1ZFyNKSDYpDmty1ebe7
fyGode7wYZYh-zVn3N5JBn3YIKDvh6mGNIfL1OnQg",
"siteFingerprint": "icynqaqmqlg1sytrsrnhgyvcfa1uqvlz",
"refreshTimeoutSeconds": "600",
"maximumLifetimeSeconds": "86400",
"guiIdleTimeoutSeconds": "3600",
"restTimeoutSeconds": "90",
"creationTime": "1660398326",
"firstLoginTime": "1660398326",
"userName": "admin",
"remoteUser": "false",
"unixUserId": "15374",
"sessionId": "JSGav7x6RhiHiNtnJrD6zg==",
"lastName": "",
"firstName": "",
"changePassword": "no",
"version": "5.2(3e)",
"buildTime": "Sun Oct 17 03:10:47 UTC 2021",
"node": "topology/pod-1/node-1"
}
}
}
]
}
POST https://<APIC>/api/aaaLogin.json
BODY:
{
    "aaaUser": {
        "attributes": {
            "name": "apic:TacacsDomain\\userxxx",
            "pwd": "password123"
        }
    }
}
{
"totalCount": "2",
"imdata": [
{
"fvTenant": {
"attributes": {
"annotation": "",
"childAction": "",
"descr": "",
"dn": "uni/tn-common",
"name": "common",
<output omitted>
}
}
},
{
"fvTenant": {
"attributes": {
"annotation": "",
"childAction": "",
"descr": "",
"dn": "uni/tn-mgmt",
"name": "mgmt",
<output omitted>
}
}
}
]
}
How would you extract, for example, their names? The actual implementation
depends on the JSON processor, but most tools and programming languages navigate
through the JSON hierarchy using either square brackets, as in
response["imdata"][0]["fvTenant"]["attributes"]["name"], or dotted notation, as in
response.imdata[0].fvTenant.attributes.name.
Note that in order to read the tenant name, you need to first move into the imdata
key. An often missed fact: it is an array (denoted by square brackets, []). Therefore, you
need to address items in the array, starting at 0 for the first item and 1 for the second.
Then the object class name follows as the next key (fvTenant). Inside you will find the
attributes key and finally the name attribute. Besides attributes, ACI objects can
sometimes also contain a "children" key, which is another array of extra objects. In the
end, this philosophy is universally applicable to any object returned by APIC.
The next call will use the collected cookie and list all APIC tenants. Its output is piped
to a simple JSON processor utility named jq and it will extract their names. This way, you
can specifically read any object’s attribute. See Listing 10-13.
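A hedged sketch of the two calls follows; the cookie file name and credentials are examples, and the -c/-b switches simply store and reuse the APIC-cookie set by the login call.

# authenticate and store the APIC-cookie
curl -sk -c cookie.txt -X POST "https://<APIC-IP>/api/aaaLogin.json" \
  -d '{"aaaUser":{"attributes":{"name":"admin","pwd":"cisco123"}}}'

# list all tenants and extract only their names with jq
curl -sk -b cookie.txt "https://<APIC-IP>/api/node/class/fvTenant.json" \
  | jq -r '.imdata[].fvTenant.attributes.name'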
To create a new object, specify its JSON/XML body directly in the command or put a
file path containing the JSON/XML object as a -d (--data) argument. See Listing 10-14.
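A hedged sketch of both variants; the tenant name and payload file name are made up for illustration.

# JSON body inline
curl -sk -b cookie.txt -X POST "https://<APIC-IP>/api/node/mo/uni.json" \
  -d '{"fvTenant":{"attributes":{"name":"Apress_cURL"}}}'

# JSON body taken from a file
curl -sk -b cookie.txt -X POST "https://<APIC-IP>/api/node/mo/uni.json" \
  -d @new_tenant.json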
Tip If not already installed, add the cURL and jq utilities on Linux using the following commands:
Ubuntu/Debian
# apt-get install curl
# apt-get install jq
Centos/RHEL
# dnf install curl
# dnf install jq
Postman
Another very popular, this time graphical, tool for interacting with REST APIs is Postman.
Besides being an HTTP client, it is a comprehensive platform to design, test, and
document your own APIs. But for this use case, you will stay focused just on HTTP features.
Postman’s graphical interface is pretty intuitive. Figure 10-15 illustrates the main screen.
Using separate tabs, you are able to define individual REST calls. In the upper part of
the screen, choose the HTTP method, enter the desired URL, and fill in the body section
below for POST calls. The bottom field shows the response.
Each constructed REST call can be saved to Postman Collections (visible in the left
panel). Their execution is then easily automated using the Postman Runner feature.
When authenticating against APIC from Postman, you don’t need to manually
extract the token and add it to all following calls. Postman is intelligent enough to create
and use APIC-cookie automatically.
Start by creating a new Python source file with initial imports and global variables
used within the functions later. In addition to the requests library, you also import a
JSON library, which is a native part of Python to process JSON objects. See Listing 10-16.
#!/usr/bin/env python
import requests
import json

requests.packages.urllib3.disable_warnings()

# APIC Credentials:
APIC = "<PUT YOUR IP HERE>"
USER = "admin"
PASS = "cisco123"

# Login URL and body (assumed from the aaaLogin example shown earlier in this chapter)
AUTH_URL = "https://" + APIC + "/api/aaaLogin.json"
AUTH_BODY = {"aaaUser": {"attributes": {"name": USER, "pwd": PASS}}}

# Authentication to APIC
def apic_authenticate():
    login_response = requests.post(AUTH_URL, json=AUTH_BODY, verify=False).content
    response_body_dictionary = json.loads(login_response)
    token = response_body_dictionary["imdata"][0]["aaaLogin"]["attributes"]["token"]
    cookie = {"APIC-cookie": token}
    return cookie
Listing 10-18 shows a function that queries APIC for the list of tenants and returns an
already preprocessed JSON object in the form of a Python dictionary.
def get_tenants(cookie):
    url = "https://" + APIC + "/api/node/class/fvTenant.json"
    # send the APIC-cookie dictionary returned by apic_authenticate()
    response = requests.get(url, cookies=cookie, verify=False)
    return response.json()
The last function creates a new tenant object. As you are sending a POST call, the
JSON payload needs to be prepared. See Listing 10-19.
# Signature assumed: the cookie from apic_authenticate() and the new tenant's name
def create_tenant(cookie, tenant_name):
    # universal URL; the tenant name goes into the payload
    url = "https://" + APIC + "/api/node/mo/uni.json"
    payload = {
        "fvTenant": {
            "attributes": {
                "name": f"{tenant_name}",
            },
            "children": []
        }
    }
    response = requests.post(url, data=json.dumps(payload), cookies=cookie, verify=False)
    return response
Now, to execute all of the previous functions, you need to compose the Python main
function, serving as an entry point to the application. Listing 10-20 also shows the way to
iterate through an acquired Python dictionary with tenant objects.
if __name__ == "__main__":
    cookie = apic_authenticate()
    response = get_tenants(cookie)
    # iterate through the returned dictionary and print each tenant name
    for tenant in response["imdata"]:
        print(tenant["fvTenant"]["attributes"]["name"])
Note The source code for this book is available on GitHub via the book’s product
page, located at www.apress.com/ISBN.
Note Before using the previous commands, make sure you have the python3-pip
package installed on your Linux machine. Python versions higher than 3.4 should
include pip3 by default. For older installations, upgrade pip
in the following way:
# python3 -m pip install --upgrade pip
#!/usr/bin/env python
from cobra.mit.access import MoDirectory
from cobra.mit.session import LoginSession
from cobra.model.fv import Tenant
from cobra.model.pol import Uni
from cobra.mit.request import ConfigRequest
import requests.packages.urllib3
requests.packages.urllib3.disable_warnings()
USERNAME = "admin"
PASSWORD = "cisco123"
APIC_URL = "https://10.17.87.60"
def apic_authenticate():
    moDir = MoDirectory(LoginSession(APIC_URL, USERNAME, PASSWORD))
    moDir.login()
    return moDir
The next function will universally gather an array of ACI objects based on the
specified class name. This time, it’s not an array of JSON objects, but native Python
objects 1:1 mapping ACI’s model thanks to the acimodel library. See Listing 10-23.
The third function creates a new tenant object and illustrates how to use a generic
ConfigRequest to alter ACI’s configuration. See Listing 10-24.
Like in the previous case, you will finish by creating the Python main function, calling
all the others. The source code in Listing 10-25 first authenticates on APIC and queries for
the fvTenant class. Then it iterates through the returned array of objects, but this time
you can access all their attributes directly using dotted notation. Technically, you are
reading a Python object's variables. In the end, you call the function creating the tenant,
check the status code of the reply, and log out from APIC.
if __name__ == "__main__":
    mo = apic_authenticate()
    response = query_class(mo, "fvTenant")
    for tenants in response:
        print(tenants.name)
    response = create_tenant(mo, "Cobra_Tenant")
    print(response.status_code)
    mo.logout()
Note The source code for this book is available on GitHub via the book’s product
page, located at www.apress.com/ISBN.
[Figure content not reproducible as text: Ansible Core in the middle, with device-specific modules (ACI, NX-OS, IOS, ASA) plugged into it.]
Ansible is able to use both APIs and CLIs, without the need for any agent installation,
making it quite a universal tool not only to manage infrastructure devices but also
operating systems and various applications. Core libraries as well as individual modules
are written in Python, simplifying the expansion by the community and external
contributors.
Although it’s not my goal to explain every aspect of Ansible (in fact, that would
be a topic for another book), in the following sections I will go through some basic
components, structure, and operations. I will describe all the features needed to begin
building interesting ACI automation scripts on your own.
[Figure content not reproducible as text: an Ansible playbook contains plays, and each play contains an ordered list of tasks.]
Playbooks are formatted exclusively in YAML, which is why we spent time talking
about it at the beginning of this chapter. Remember to keep a close eye on your
whitespaces in playbooks. Everything needs to be correctly indented, or you will receive
an error when running such a playbook. Most of my students' mistakes and issues during
training are related just to text formatting. But don't worry. As with everything, it
only requires some time to get familiar with YAML (and consequently Ansible).
Playbook Structure
Let’s have a look at the structure of a simple playbook. Listing 10-26 gathers a list of ACI
tenant objects and saves them to the defined variable.
tasks:
  - name: Query all tenants
    aci_tenant:
      hostname: '{{ ansible_host }}'
      username: '{{ username }}'
      password: '{{ password }}'
      validate_certs: no
      state: query
    register: all_tenants
Each playbook carries a list of plays. Therefore, it starts with a dash, -, which denotes a
list in YAML. Ansible plays consist of several attributes:
In the tasks section, similar to plays, each task inside a list has the following
attributes:
Inventory File
So, if the playbook defines what will be performed on the managed infrastructure, how
does Ansible know which device(s) to use? That's what the inventory file is for. Written
again in YAML or in the INI format (which I find much more readable in this case), it
describes a set of managed devices, usually organized into groups, to which we can refer
from the playbook's hosts attribute. The generic format of an INI inventory file is shown
in Listing 10-27.
ipn-1.test.com
[APICs]
apic1.test.com
apic2.test.com
apic3.test.com
[NDOs]
ndo-node1 ansible_host=172.16.2.11
ndo-node2 ansible_host=172.16.2.12
ndo-node3 ansible_host=172.16.2.13
Ansible Variables
The next important function of an inventory file is its ability to define variables you can
later call inside your playbooks instead of repetitive static values. Variables are either
assignable to individual hosts or to the whole group. Listing 10-28 shows both options.
[APICs]
apic1 ansible_host=10.0.0.1 nodeID=1 serialNumber=FDOXXX1
apic2 ansible_host=10.0.0.2 nodeID=2 serialNumber=FDOXXX2
apic3 ansible_host=10.0.0.3 nodeID=3 serialNumber=FDOXXX3
[APICs:vars]
username = admin
password = cisco123
You can also put variables directly into the playbook itself, but they will become local
and significant only to that playbook, so it depends on the given use case.
Another common option to incorporate variables into Ansible operations (which I
personally use quite a lot) is creating special directories named host_vars and group_
vars. Ansible searches for them in a path relative to the inventory file (either in /etc/
ansible/ or your own defined location). Inside these directories, you can create
individual variable files for the device aliases or groups used in the inventory file. If I used
"apic1" through "apic3" in the "APICs" group from the previous example, the resulting folder
structure could look like the one in Listing 10-29.
/etc/ansible/group_vars/APICs.yml
/etc/ansible/host_vars/apic1.yml
/etc/ansible/host_vars/apic2.yml
/etc/ansible/host_vars/apic3.yml
Later, in playbooks, variables are referenced by their names, as shown in Listing 10-30
(the quotes or apostrophes around them are mandatory).
tasks:
  - name: Add a new tenant
    cisco.aci.aci_tenant:
      host: '{{ ansible_host }}'
      username: '{{ username }}'
      password: '{{ password }}'
Ansible Roles
In these practical examples you will be using roles as well. They group together multiple
Ansible resources, including playbooks, tasks, additional plugins, variables, and more.
The goal is to create easily reusable and shareable entities to be exchanged between
Ansible users. Roles have a predefined and mandatory folder structure you have to
adhere to. The structure has to be created in the working directory from which the role
will be called (included) in a playbook. Multiple folders further require a main.yml file,
serving as an initial entry point for the execution of the included files. Listing 10-31
shows the most important role components and the ones we have already talked about
(but the list is even wider).
roles/
  aci/            # <-- Name of the Role
    tasks/        # <-- Set of tasks
      main.yml    #
From now on, you can start using all ACI Ansible modules to create and modify ACI
objects.
Tip The complete Ansible module documentation for ACI collection can be found
at https://docs.ansible.com/ansible/latest/collections/cisco/
aci/index.html and https://galaxy.ansible.com/cisco/aci.
[aci_group]
aci
Point Ansible to use this inventory file instead of the default one in /etc/ansible by
creating the ansible.cfg file in the working directory. See Listing 10-34.
[defaults]
# Use local inventory file
inventory = inventory
Now create a new directory named host_vars and an aci.yml variable file inside it. Its name
is a reference to the alias "aci" from the inventory, so any variable present within will
be associated with the aci host. For now, you will put just the login information for
APIC there. See Listing 10-35.
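A minimal sketch of the variable file (the IP is the APIC address used earlier in this chapter; adjust all values to your environment):

mkdir -p host_vars
cat > host_vars/aci.yml <<'EOF'
# connection variables for the "aci" inventory alias
ansible_host: 10.17.87.60
username: admin
password: cisco123
EOF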
Tip To avoid using plain text credentials Ansible offers a security feature called
Ansible Vault. It’s out of scope of this book, but if you are interested, here is the
documentation: https://docs.ansible.com/ansible/latest/user_
guide/vault.html.
You are ready to prepare your first playbook, create_tenant.yml. In the hosts
attribute, there is a reference to the "aci" alias from the inventory file. Thanks to the
mapped variable file, instead of using a static IP, username, and password in each task,
you can substitute them with variable names. The cisco.aci.aci_tenant module used
in the following example will, based on its configured state (present, absent, query),
either create, remove, or list ACI tenants. In your case, you will create a single tenant
named Apress_Ansible thanks to the state set to "present." Note that even though
Ansible is primarily a procedural tool, the implementation of particular modules and the
nature of the destination managed system can sometimes make it declarative as well. ACI
modules will always ensure the desired state of the object defined in playbooks. See
Listing 10-36.
tasks:
  - name: Add a new tenant
    cisco.aci.aci_tenant:
      host: '{{ ansible_host }}'
      username: '{{ username }}'
      password: '{{ password }}'
      validate_certs: no
      tenant: Apress_Ansible
      description: Tenant created by Ansible
      state: present
    delegate_to: localhost
Try running this playbook from your working directory. If all the files are correctly prepared, you should receive output similar to Listing 10-37, together with the tenant created in ACI.
<output omitted>
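Before moving on, it may help to see roughly what the module did on the wire. The following is a minimal Python sketch (not one of the book's listings; the APIC address and credentials are placeholders) that achieves the same end result as Listing 10-36 by posting the tenant object directly to the REST API. Running it repeatedly leaves the tenant in the same state, which is exactly the guarantee the cisco.aci.aci_tenant module gives you.

import requests

requests.packages.urllib3.disable_warnings()

APIC = "https://10.17.87.60"   # placeholder APIC address
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "cisco123"}}}  # placeholder credentials

session = requests.Session()
session.verify = False

# 1. Authenticate; the session stores the returned APIC-cookie for later calls
session.post(f"{APIC}/api/aaaLogin.json", json=AUTH).raise_for_status()

# 2. Declare the desired state of the tenant; APIC creates or updates it as needed
tenant = {"fvTenant": {"attributes": {"name": "Apress_Ansible",
                                      "descr": "Tenant created by Ansible"}}}
response = session.post(f"{APIC}/api/node/mo/uni/tn-Apress_Ansible.json", json=tenant)
print(response.status_code, response.json())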
As a next step, you will create a new playbook named query_tenant.yml, using the same ACI tenant module, but this time with its state set to query, requesting the list of tenants in JSON format. Ansible can save the output of the module (if any) to a specified variable with the register keyword, in your case creating the variable all_tenants. See Listing 10-38.
tasks:
  - name: Query all tenants
    cisco.aci.aci_tenant:
      hostname: '{{ ansible_host }}'
      username: '{{ username }}'
      password: '{{ password }}'
      validate_certs: no
      state: query
    register: all_tenants
You will add another two tasks to the playbook query_tenant.yml. The first, a debug module, just prints the content of the variable all_tenants, and the second task performs a local copy action, creating a new file in the working directory for each returned tenant object. See Listing 10-39.
  - debug:
      var: all_tenants

  - local_action:
      module: copy
      content: "{{ item.fvTenant.attributes }}"
      dest: ./Tenant_{{ item.fvTenant.attributes.name }}_JSON
    with_items:
      - "{{ all_tenants.current }}"
In the previous local_action task, you can observe one of the commonly used options to implement iterations: with_items. When provided with a list of objects (all_tenants.current), it runs the related task once for each item in the list. By default, each item during an iteration is accessible from the task by referring to the local item variable (e.g., item.fvTenant.attributes). fvTenant and attributes are standard ACI JSON object keys. The item name is configurable, as you will see in the following example.
After running the finished playbook query_tenant.yml, you should see individual text files in your working directory, named after your tenants and carrying the content of their JSON objects.
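For comparison, the same iteration expressed directly in Python against the REST API might look like the following sketch (illustrative only; the APIC address and credentials are placeholders). It mirrors the item.fvTenant.attributes access pattern that with_items relies on.

import json
import requests

requests.packages.urllib3.disable_warnings()

APIC = "https://10.17.87.60"   # placeholder APIC address
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "cisco123"}}}  # placeholder credentials

session = requests.Session()
session.verify = False
session.post(f"{APIC}/api/aaaLogin.json", json=AUTH).raise_for_status()

# A class query returns a list shaped like all_tenants.current in the playbook
tenants = session.get(f"{APIC}/api/class/fvTenant.json").json()["imdata"]

for item in tenants:
    attributes = item["fvTenant"]["attributes"]
    with open(f"Tenant_{attributes['name']}_JSON", "w") as output_file:
        json.dump(attributes, output_file, indent=2)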
Note The source code for this book is available on GitHub via the book’s product
page, located at www.apress.com/ISBN.
app_profile:
  - name: WebApp
  - name: JavaApp
Now let's compose a specialized Ansible role called tenant_structure to help with creating defined tenants, VRFs, and application profiles. A significant advantage of using roles here is the ability to actually implement nested loops in Ansible. What for? Consider the tenant structure defined in the variables file from Listing 10-40. You have a list of tenants, which is the first level of iteration: you need to go through each tenant object and create it. Then, inside each tenant, you can have another two lists of multiple VRFs and application profiles; that is the second, nested level of iteration. To perform such a creation in one run, Ansible roles are the solution.
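Conceptually, the role implements nothing more than the nested iteration below, shown here as a plain Python sketch. The key names (state, vrf, app_profile) mirror an assumed variable structure and are not taken verbatim from the book's ./host_vars/aci.yml; the print statements stand in for the actual ACI modules.

# Assumed structure, loosely mirroring aci_config.aci_tenant_structure
aci_tenant_structure = [
    {"name": "Apress_Tenant1", "state": "present",
     "vrf": [{"name": "PROD"}],
     "app_profile": [{"name": "WebApp"}, {"name": "JavaApp"}]},
]

for tenant in aci_tenant_structure:                  # outer loop (tenant_create.yml)
    if tenant["state"] != "present":                 # the conditional (when:)
        continue
    print(f"create tenant {tenant['name']}")
    for vrf in tenant.get("vrf", []):                # nested loop (vrf_create.yml)
        print(f"  create VRF {vrf['name']} in tenant {tenant['name']}")
    for app in tenant.get("app_profile", []):        # nested loop (app_create.yml)
        print(f"  create application profile {app['name']} in tenant {tenant['name']}")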
In your working directory, create a new folder structure called ./roles/tenant_structure/tasks. Inside the tasks folder, four YAML files will be created: main.yml, tenant_create.yml, vrf_create.yml, and app_create.yml. main.yml is predefined by Ansible to serve as an initial point of execution when calling the role later. Therefore, it just consists of modular task includes. See Listing 10-41.
- include: tenant_create.yml
- include: vrf_create.yml
- include: app_create.yml
All other role YAMLs define just the tasks needed to create particular ACI objects. The tenant_create.yml file will use the already-known ACI module to create a new tenant. Note that the tenant's name isn't a static value, but rather the name attribute of a variable called tenant_item. This variable will be created in the main playbook later, with the content of your tenant structure from ./host_vars/aci.yml (aci_config.aci_tenant_structure), and passed to this role automatically. See Listing 10-42.
The second file, vrf_create.yml, implements a nested loop over the list of VRFs from the tenant structure in the ./host_vars/aci.yml variable file. Another option to iterate over a list, the loop keyword, is shown in Listing 10-43.
The third role task YAML file, app_create.yml, is analogous to the previous one; the module used here just creates application profile objects instead. See Listing 10-44.
      validate_certs: no
      state: present
    delegate_to: localhost
    loop: "{{ tenant_item.app_profile }}"
The completed role now needs to be called from some standard playbook. In the
main working directory, create deploy_tenant_structure.yml with the content in
Listing 10-45.
tasks:
  - name: Create Tenant structures
    include_role:
      name: tenant_structure
    loop: "{{ aci_config.aci_tenant_structure }}"
    when: tenant_item.state == 'present'
    loop_control:
      loop_var: tenant_item
The previous task isn't using any particular Ansible module; instead, it is calling the precreated role named tenant_structure. Additionally, it loops through the list in your tenant variable structure in ./host_vars/aci.yml: aci_config.aci_tenant_structure. For each item in the tenant list, all tasks defined inside the role will be executed. Further, this is an example of how conditionals can be implemented in Ansible; the when keyword checks whether a defined condition is true, and only then is the loop action performed. Finally, the loop_control and loop_var keywords enable you to change the default item name in the iteration from item to tenant_item in your case. That's the variable name you have already seen, which is automatically passed to the role tasks to be used there.
Try running deploy_tenant_structure.yml and you should receive the output shown in Listing 10-46, together with the main tenant objects created in ACI as described in the variable file.
Figure 10-18. Source CSV data file editable in Visual Studio Code
The first row in the CSV is considered to be the set of column names, later referenced from Ansible modules as variable key names. The structure of the CSV is completely up to you and your needs. Consider Figure 10-18 as an example to start with. I use several such tables to create different sets of objects in ACI.
It's useful to put each CSV file name into the variable file ./host_vars/aci.yml. If you later change the filename, only a single line in Ansible needs to be updated. See Listing 10-47.
bd_epg_csv_filename: "ACI_VLAN_BD_EPG.csv"
The playbook deploy_bds.yml will read the data from the CSV file using read_csv, a native Ansible module, and save the list of its rows into the specified variable bd_set. With that, you can easily iterate over each row and access any column's value by the column name. With two iterations through all rows, you will create bridge domain objects and their subnets. Note that all ACI object attributes are parameterized now; simply by changing the CSV source file, you alter the execution of the Ansible playbook. See Listing 10-48.
Listing 10-48. Playbook Creating ACI Bridge Domains from a CSV File
tasks:
  - name: Read data from CSV file {{ bd_epg_csv_filename }}
    read_csv:
      path: "{{ bd_epg_csv_filename }}"
      delimiter: ';'
    register: bd_set
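If it helps to visualize what read_csv hands back, here is a short plain-Python sketch (not an original listing) of the same parsing. The column names used in the print statement are just assumed examples; use whatever headers your CSV actually defines.

import csv

with open("ACI_VLAN_BD_EPG.csv", newline="") as csv_file:
    for row in csv.DictReader(csv_file, delimiter=";"):
        # Each row is a dictionary keyed by the column names from the first CSV line
        print(row["Tenant"], row["BD"], row["Subnet"])   # assumed column names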
Similarly, you will create new EPGs from CSV data and associate them with bridge
domains in playbook deploy_epgs.yml. See Listing 10-49.
tasks:
  - name: Read data from CSV file {{ bd_epg_csv_filename }}
    read_csv:
      path: "{{ bd_epg_csv_filename }}"
      delimiter: ';'
    register: epg_set
It's completely fine to run the previous playbooks individually whenever they are needed. However, an even better practice is to prepare a single main.yml and import them there in a modular fashion. See Listing 10-50.
##############################################################
##### Main playbook for ACI Configuration Management #####
##############################################################
Note The source code for this book is available on GitHub via the book’s product
page, located at www.apress.com/ISBN.
Figure: Terraform architecture. The Terraform Core consumes provider plugins (the ACI provider, Intersight provider, NX-OS provider, and UCS provider) that expose resources such as Tenant and VRF.
Figure: Terraform infrastructure state handling. The required state (CONFIG, based on the .tf files) is updated and diffed against the current infrastructure state (STATE, stored in the terraform.tfstate file) before a change deployment.
Before proceeding with the definition of managed resources in the config file, you
need to tell Terraform which providers you would like to use and usually configure their
authentication to the managed system. The provider in Listing 10-51 is used for ACI.
terraform {
  required_providers {
    aci = {
      source = "CiscoDevNet/aci"
    }
  }
}

provider "aci" {
  # cisco-aci user name
  username = "admin"
  # cisco-aci password
  password = "cisco123"
  # cisco-aci url
  url      = "https://10.17.87.60"
  insecure = true
}
Note As soon as you initialize the working directory later (using terraform init), Terraform will automatically install all the additional plugins needed for its operation on your behalf, thanks to the providers definition.
With providers in place, you can start creating managed resources. In Terraform,
each resource is organized into code blocks with the consistent syntax shown in
Listing 10-52.
In the case of ACI, a sample resource block creating a new Tenant object could look
like Listing 10-53.
Individual resources can be easily linked to each other by referring to their names (this is the reason for the name uniqueness requirement). This capability is handy in ACI, as many objects have to be associated with their parent. In Listing 10-54, instead of using the static distinguished name of the tenant in the VRF resource block, all you need to do is refer to the tenant's resource name (more specifically, its id).
A common question I get from students and customers is "If I start using Terraform for managing ACI, will it delete all other objects not specified in configuration files?" Of course not! Terraform will only take care of the objects listed in the resource blocks of its configuration files; all others will stay intact. So, you can easily combine multiple configuration and automation approaches together and only selectively apply Terraform to some parts of the ACI configuration.
Terraform Commands
For its operation, Terraform uses several commands to initiate automation jobs or check
the current state of the infrastructure.
terraform init
This is the first command to run inside the working directory with the .tf files. Terraform will initialize the directory, search for required providers and their config blocks, and, based on its findings, install all the necessary plugins. See Listing 10-55.
You may now begin working with Terraform. Try running "terraform plan" to
see any changes that are required for your infrastructure. All Terraform
commands should now work.
Now you can start deploying the resource objects into ACI.
Note The terraform init command needs to be rerun each time any change
is made to the provider definition.
terraform plan
The plan command merges together all .tf files in the working directory, downloads the current state of the defined resources from the actual infrastructure, and just informs the user about the differences it finds. It won't deploy any change yet (as illustrated in Figure 10-21).
Figure 10-21. terraform plan: (1) the local state file is refreshed to match the remote infrastructure, (2) Terraform calculates the differences between the desired config and the current state, (3) the differences are shown in the CLI.
You will get the following output after running terraform plan if the config files
specify a single new ACI Tenant. See Listing 10-56.
terraform apply
The apply command runs terraform plan first, presents you with the identified changes, and then interactively asks for a deployment confirmation. There is a switch called -auto-approve to skip the plan confirmation, but in the default state it's a useful measure to avoid deploying the configuration by mistake. If you confirm the changes, terraform apply implements them in the infrastructure right away (as shown in Figure 10-22).
Figure 10-22. terraform apply: (1) the local state file is refreshed to match the remote infrastructure, (2) Terraform calculates the differences between the desired config and the current state, (3) the differences are directly applied to the infrastructure.
Sample output from running the apply command can be found in Listing 10-57.
aci_tenant.Apress_Terraform_Tenant: Creating...
aci_tenant.Apress_Terraform_Tenant: Creation complete after 1s [id=uni/tn-Apress_Terraform]
terraform destroy
The last main Terraform command worth mentioning is destroy. As you can probably
tell already, when initiated, Terraform performs the plan again, presents you with
expected changes, and after your approval, it will this time remove all managed
resources defined in configuration .tf files (as shown in Figure 10-23).
Figure 10-23. terraform destroy: (1) the local state file is refreshed to match the remote infrastructure, (2) Terraform calculates the differences between the desired config and the current state, (3) all resources defined in the config are removed from the infrastructure.
provider "aci" {
  username    = "Apress"
  private_key = "path to private key"
  cert_name   = "user-cert-"
  url         = "https://my-cisco-aci.com"
  insecure    = true
}

provider "aci" {
  username    = "apic:TACACS_domain\\\\Apress"
  # private_key = "path to private key"
  # cert_name   = "user-cert"
  password    = "password"
  url         = "url"
  insecure    = true
}
terraform {
  required_providers {
    aci = {
      source = "ciscodevnet/aci"
    }
  }
}
Note The source code for this book is available on GitHub via the book’s product
page, located at www.apress.com/ISBN.
When you apply the previous source file, the objects in Listing 10-62 will be created
in ACI by Terraform (with information about their DNs).
<output omitted>
aci_tenant.Apress_Terraform_Tenant: Creating...
aci_tenant.Apress_Terraform_Tenant: Creation complete after 1s [id=uni/tn-Apress_Terraform]
aci_vrf.PROD: Creating...
aci_application_profile.Apress_AP: Creating...
aci_bridge_domain.Apress_FrontEnd_BD: Creating...
aci_application_profile.Apress_AP: Creation complete after 1s [id=uni/tn-Apress_Terraform/ap-Apress_AP]
aci_vrf.PROD: Creation complete after 2s [id=uni/tn-Apress_Terraform/ctx-PROD]
aci_bridge_domain.Apress_FrontEnd_BD: Creation complete after 2s [id=uni/tn-Apress_Terraform/BD-FrontEnd_BD]
aci_subnet.Apress_FrontEnd_BD_Subnet: Creating...
aci_application_epg.Apress_Frontend_EPG: Creating...
aci_subnet.Apress_FrontEnd_BD_Subnet: Creation complete after 4s [id=uni/tn-Apress_Terraform/BD-FrontEnd_BD/subnet-[10.1.1.1/24]]
aci_application_epg.Apress_Frontend_EPG: Creation complete after 4s [id=uni/tn-Apress_Terraform/ap-Apress_AP/epg-Apress_Frontend_EPG]
In the ACI GUI you can notice interesting behavior achieved by annotating all
objects with the orchestrator:terraform label. First, the tenant is visibly differentiated
from others, stating that it is managed by Terraform (as shown in Figure 10-24).
Secondly, when a user navigates inside the tenant, they will be warned about the
fact that the tenant has been created by and managed from an orchestrator. All other
annotated objects are also marked with a little symbol, meaning they are managed by an
external automation solution (shown in Figure 10-25).
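Because the annotation is just another object attribute, you can also use it from the API side. The following is an illustrative Python sketch (not an original listing; the APIC address and credentials are placeholders) that lists every tenant carrying the orchestrator:terraform annotation, in other words, the tenants Terraform is currently managing.

import requests

requests.packages.urllib3.disable_warnings()

APIC = "https://10.17.87.60"   # placeholder APIC address
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "cisco123"}}}  # placeholder credentials

session = requests.Session()
session.verify = False
session.post(f"{APIC}/api/aaaLogin.json", json=AUTH).raise_for_status()

# Filter the fvTenant class query on the annotation attribute
url = (f"{APIC}/api/class/fvTenant.json"
       '?query-target-filter=eq(fvTenant.annotation,"orchestrator:terraform")')
for item in session.get(url).json()["imdata"]:
    print(item["fvTenant"]["attributes"]["dn"])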
Tip All documentation related to the usage of the ACI Terraform provider can be found at https://registry.terraform.io/providers/CiscoDevNet/aci/latest/docs.
This command should create two files: a private key called Apress.key and an X.509 certificate called Apress.crt. Make sure to safely store the private key; you will need it later to sign every REST call.
Now display the generated certificate in the Privacy Enhanced Mail (PEM) format. The most important part of the output is the Base64-encoded certificate block starting with BEGIN CERTIFICATE and ending with END CERTIFICATE. See Listing 10-64.
Copy the whole block including leading and trailing lines to the system clipboard
and create a new local ACI user in Admin -> AAA -> Users. In the second step of the
user creation form, add a User Certificate and paste the previous Base64 output there (as
shown in Figure 10-26).
When you double-click a created user, scroll down to its certificate and open it again
by double-clicking. You will need to download the JSON representation of the certificate
object in ACI using the Save-As function (see Figure 10-27).
In the JSON object, focus only on a single attribute: the DN of the ACI user certificate (as shown in Figure 10-28). Note it and put it aside for now.
Finally, you are ready to start signing the REST API calls. As an example, you will get the list of tenants with a class call (a GET to /api/class/fvTenant.json).
On the Linux machine, create a simple text file called payload.txt (the filename doesn't matter). Its content needs to be in the format of the HTTP method (GET/POST), directly followed by the truncated REST call URL, just the section starting with /api/. I therefore recommend using the echo command in Listing 10-65 to make sure there are no hidden whitespace characters that could potentially get there by plain copy-pasting.
Based on the payload.txt file, you will now generate a signature. With the OpenSSL commands in Listing 10-66, first create a binary signature (using the private key generated earlier), then extract its Base64 text representation and let cat write it to the terminal.
Again, put the resulting signature aside and make sure you don't copy any additional whitespace with it.
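If you prefer to stay in Python instead of the OpenSSL CLI, the same signature can be produced with the cryptography library, as in the following sketch. It assumes the usual APIC signing scheme (an RSA PKCS#1 v1.5 signature over the payload with a SHA-256 digest, matching openssl dgst -sha256 -sign) and the example fvTenant class call used earlier; adjust the payload to whatever call you are signing.

import base64

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

# HTTP method directly followed by the truncated URL, exactly as in payload.txt
payload = b"GET/api/class/fvTenant.json"

with open("Apress.key", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

signature = private_key.sign(payload, padding.PKCS1v15(), hashes.SHA256())
print(base64.b64encode(signature).decode())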
After the previous steps, you are ready to assemble the REST call. The specifics of presigned REST calls lie in the four cookies they have to carry, which replace the standard authentication process that would otherwise precede them. Each cookie has a key=value format with the following content (the order is not important; you just have to include all of them; a Python example of such a call follows the list):
• APIC-Request-Signature=RbjWLfSRO/K9NnWyO9RRrY+BXSzOWkOcjA38skAK+Hz84X4r7mj3JIEEsuhXBcHk8ISOqt6impDM8BW/m5PWYU++vG2Rbpx9ir1s5OZOZyfReDsOy9RsKUJ0dEAqnIcVQhYzpUuaBVhndfbVPLBVb5yuEebKyjwTwGlrOnv2A5U=
• APIC-Certificate-DN=uni/userext/user-Apress/usercert-Apress
• APIC-Certificate-Algorithm=v1.0
• APIC-Certificate-Fingerprint=fingerprint
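Any HTTP client can carry these cookies. As a quick sketch (not an original listing; the APIC address is a placeholder and the signature value is the one you generated for this exact method and URL), here is how the presigned call could look with the Python requests library:

import requests

requests.packages.urllib3.disable_warnings()

APIC = "https://10.17.87.60"   # placeholder APIC address
cookies = {
    "APIC-Request-Signature": "<Base64 signature generated from payload.txt>",
    "APIC-Certificate-DN": "uni/userext/user-Apress/usercert-Apress",
    "APIC-Certificate-Algorithm": "v1.0",
    "APIC-Certificate-Fingerprint": "fingerprint",
}

# The URL must match the payload that was signed (here the fvTenant class call)
response = requests.get(f"{APIC}/api/class/fvTenant.json", cookies=cookies, verify=False)
print(response.status_code)
print(response.json())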
In order to test the presigned call, use any HTTP client of your choice. The cookie
attachment will be highly dependent on its implementation, so refer to the client’s
documentation. For Postman, click the Cookies hyperlink under the Send button (as
shown in Figure 10-29).
Inside the cookie’s configuration, fill in APIC’s IP address or its domain name and
click Add Domain. Subsequently, add the four prepared cookies. Always change just the
first section CookieX=value; you don’t need to touch the rest. In Figure 10-30 you can see
the example of a signature cookie. Create the rest accordingly.
The finished REST call with the cookies in place now just needs to be sent to APIC, and you should immediately receive a reply, without any prior authentication (see Figure 10-31).
Figure: WebSocket client and WebSocket server communication.
ACI implements WebSocket technology and actually uses it natively a lot in the GUI without you even knowing. I'm sure you have already noticed automatically refreshing graphs, tables, or lists of objects in the APIC GUI while configuring and automating the system. That's where WebSocket connections come into play, dynamically refreshing the data after each object change.
In order to enable a subscription to a particular ACI object, you need to append a special ?subscription=yes option to the GET URL. Then, with the refresh-timeout option, you can set the maximum amount of time for which the subscription will stay valid. If it expires without being refreshed, APIC stops sending updates to the WebSocket. Let's suppose you want to subscribe to your Apress tenant. The resulting URL will look like Listing 10-67.
https://<APIC_IP>/api/node/mo/uni/tn-Apress.json?subscription=yes&refresh-timeout=60
In the reply to such a call, APIC returns a Subscription ID along with the standard ACI JSON object. This ID uniquely identifies all messages related to this object sent over the WebSocket and enables you to track subscriptions and their refreshes.
import json, ssl, time, threading
import requests
import websocket  # from the websocket-client package

requests.packages.urllib3.disable_warnings()

# APIC credentials and login objects used by the functions below
APIC = "<APIC IP ADDRESS>"
USER = "<YOUR USERNAME>"
PASS = "<YOUR PASSWORD>"
AUTH_URL = "https://" + APIC + "/api/aaaLogin.json"
AUTH_BODY = {"aaaUser": {"attributes": {"name": USER, "pwd": PASS}}}
Several Python functions will follow. The first one simply authenticates the
application to APIC and returns a preformatted APIC-cookie. See Listing 10-69.
# Authentication to APIC
def apic_authenticate():
    login_response = requests.post(AUTH_URL, json=AUTH_BODY, verify=False).content
    response_body_dictionary = json.loads(login_response)
    token = response_body_dictionary["imdata"][0]["aaaLogin"]["attributes"]["token"]
    cookie = {"APIC-cookie": token}
    return cookie
Next, you define a function to request a subscription to the Apress tenant and return the Subscription ID. Note the ?subscription=yes option in the REST call URL in Listing 10-70.
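Listing 10-70 itself is not reproduced here; a sketch of such a function (reusing the globals from Listing 10-68 and the subscription URL from Listing 10-67) could look like this:

# Subscribe to the Apress tenant and return the Subscription ID from the reply
def subscribe(cookie):
    url = ("https://" + APIC +
           "/api/node/mo/uni/tn-Apress.json?subscription=yes&refresh-timeout=60")
    reply = requests.get(url, verify=False, cookies=cookie)
    return json.loads(reply.content)["subscriptionId"]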
With subscriptions, it's important to constantly watch for messages coming over the WebSocket and to keep refreshing the Subscription IDs, so the next function starts two threads responsible for these tasks. See Listing 10-72.
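Listings 10-71 and 10-72 are likewise not reproduced here. A sketch of what they need to do, assuming the websocket-client package and APIC's wss://<APIC>/socket<token> endpoint (which authenticates the WebSocket with the login token), could look like the following; refresh() is the function from Listing 10-73.

# Open the WebSocket toward APIC, authenticated by the aaaLogin token
def ws_socket(token):
    return websocket.create_connection("wss://" + APIC + "/socket" + token,
                                       sslopt={"cert_reqs": ssl.CERT_NONE})

# Print every message pushed by APIC over the WebSocket
def receive(wbsocket):
    while True:
        print(wbsocket.recv())

# Start the receiver and the subscription-refresh loop in two background threads
def thread_start(wbsocket, subscription_id, cookie):
    threading.Thread(target=receive, args=(wbsocket,), daemon=True).start()
    threading.Thread(target=refresh, args=(subscription_id, cookie), daemon=True).start()
    while True:          # keep the main thread alive until Ctrl+C
        time.sleep(1)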
Listing 10-73 shows the actual implementation of the associated target function in
each thread.
# Refresh subscription ID
def refresh(subscription_id, cookie):
    while True:
        time.sleep(30)
        refresh_id = "https://" + APIC + "/api/subscriptionRefresh.json?id={}".format(subscription_id)
        refresh_id = requests.get(refresh_id, verify=False, cookies=cookie)
And finally, you define the main function, serving as an entry point into the
whole application. The main function primarily calls all previously defined ones. See
Listing 10-74.
# Main function
if __name__ == "__main__":
    cookie = apic_authenticate()
    token = cookie["APIC-cookie"]
    print("*" * 10, "WebSocket Subscription Status & Messages", "*" * 10)
    wbsocket = ws_socket(token)
    subscription_id = subscribe(cookie)
    thread_start(wbsocket, subscription_id, cookie)
When the described Python script is run, you should get the Subscription ID of the tenant object specified in the source code. The application then runs indefinitely, waiting for any message incoming over the WebSocket, until you stop it with Ctrl+C. See Listing 10-75.
While the script is running, try changing any attribute of the monitored tenant object in ACI, for example, its description. Instantly after submitting the change, you should receive the following JSON data about the object over the WebSocket. It provides the related Subscription ID, the ACI object DN, the changed attribute, the modification timestamp, and a status telling you that the object has been "modified." See Listing 10-76.
{"subscriptionId":["72058908314894337"],"imdata":[{"fvTenant":{"attributes"
:{"childAction":"","descr":"WebsocketTest","dn":"uni/tn-Apress","modTs":"2
022-07-31T20:33:33.166+02:00","rn":"","status":"modified"}}}]}
This way you can literally subscribe to any ACI object, ranging from a physical leaf
interface, through its routing tables and endpoint database to ACI faults. The limit
here is just your creativity. Subscriptions, when implemented properly, can become a
powerful tool to construct a real-time monitoring system with automatic reactions to
various events.
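As a small illustration of such an automatic reaction (a sketch, not an original listing, assuming the message format shown in Listing 10-76), the following function flags every modified or deleted object arriving over the WebSocket:

import json

def handle_ws_message(raw_message):
    data = json.loads(raw_message)
    for entry in data.get("imdata", []):
        for object_class, body in entry.items():
            attributes = body["attributes"]
            if attributes.get("status") in ("modified", "deleted"):
                print(f"ALERT: {object_class} {attributes.get('dn', '(no dn)')} "
                      f"was {attributes['status']} at {attributes.get('modTs')}")

# Example using the message captured in Listing 10-76
sample = ('{"subscriptionId":["72058908314894337"],"imdata":[{"fvTenant":{"attributes":'
          '{"childAction":"","descr":"WebsocketTest","dn":"uni/tn-Apress",'
          '"modTs":"2022-07-31T20:33:33.166+02:00","rn":"","status":"modified"}}}]}')
handle_ws_message(sample)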
Summary
ACI automation and programmability is an extremely interesting and wide topic, as you saw throughout this chapter. APIC not only automates the deployment of various policies to the ACI fabric, but it also offers a completely open API to speed up the creation of these policies in the automation layer above. I hope this chapter has covered everything you need to do so, ranging from basic REST API concepts through ACI's object model to a wide range of practical automation aspects and tools.
Thanks to the single common API endpoint, at the end of the day it's up to you and your preference which approach or tool you use. They can all be combined, ranging from simpler but less powerful options like the APIC GUI/CLI itself to the practically unlimited possibilities of the Python programming language and the Cobra SDK. Or, if writing Python source code is not your cup of tea, you can still reach for a popular orchestrator like Ansible or Terraform.
Useful Cisco ACI Resources
Although we have covered quite a lot of topics related to ACI in this book, there is always
significantly more to explore. Here are additional useful study and technical resources
directly published by Cisco. I often use them during ACI implementation projects.
Cisco Application Centric Infrastructure Design Guide
www.cisco.com/c/en/us/td/docs/dcn/whitepapers/cisco-application-centric-infrastructure-design-guide.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-743951.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/guide-c07-743150.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-737855.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739714.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739571.html
Cisco ACI Multi-Site Architecture White Paper
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739609.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-743107.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-740861.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-2491213.html
www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739971.html
www.cisco.com/c/dam/en/us/td/docs/Website/datacenter/apicmatrix/index.html
www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/Cisco-ACI-Upgrade-Checklist.html