5.3 ACI Upgrade


ACI upgrade

Petru Sarcov, Stephane Romagny


Consulting Engineer
date
Summary

1. ACI inventory overview


2. Double version scenario
3. Maintenance window size. Estimated time.
4. HW vs SW compatibility matrix
5. Upgrade path
6. Check list before upgrade
7. Useful how-to
Upgrade Timeline
1. Cisco Integrated Management Controller (CIMC) and Application Policy Infrastructure Controller (APIC)
2. Spines
3. Leaves

Upgrade sequence:

• APIC: CIMC* first if necessary, then APIC
• Spines: upgrade spines one by one
• Leaves: separate leaves in 2 groups, then upgrade the groups one by one
ACI inventory normal behavior
[Figure: fabric diagram with APIC, 3 spines and 5 leaves]

All the devices are on the target version and correctly synchronized. This is the normal behavior for production environments.


Double version scenario 1
[Figure: fabric diagram with APIC, 3 spines and 5 leaves]

All the APICs and CIMCs were upgraded, but the rest of the fabric is running an old version. This scenario is supported but should not run for a long time in production. Depending on the releases, APIC may offer new features that cannot be implemented by the old switch software.
Double version scenario 2
[Figure: fabric diagram with APIC, 3 spines and 5 leaves]

All the APICs and CIMCs and the spines were upgraded, but the rest of the fabric is running an old version. This scenario is supported but should not run for a long time in production. Depending on the releases, APIC may offer new features that cannot be implemented by the old switch software.
ACI inventory. Double version scenario 2.
[Figure: fabric diagram with APIC, 3 spines and 5 leaves]

All the APICs and CIMCs, the spines and 1 upgrade group of the leaves were upgraded, but the rest of the fabric is running an old version. This scenario is supported as well, but it is the least reliable for a production environment. Server redundancy is provided by pairs of leaves, with features like vPC for example, so it is strongly recommended to keep them on the same SW release.
Double version scenario 3 (RMA use case)
[Figure: fabric diagram with APIC, 3 spines and 4 leaves]

All the APICs and CIMCs and the spines/leaves were upgraded, except one leaf/spine that is not upgraded for a short period. This scenario has a low risk for production, but it can cause operational limitations if there are issues with the redundancy of the fabric from a configuration point of view.
ACI upgrade – Risk assessment by type of device upgraded for DATA PLANE

                                                          APIC+CIMC   SPINES   LEAVES   RMA use case
General data plane forwarding risk                        Very Low    Low      High     Low
Impact on day-to-day operational changes during upgrade   Medium      High     High     Medium

Upgrade on MW (HNO): Recommended / Strongly recommended / Recommended

APIC clusters could be upgraded during working hours if no configuration is planned during that timeslot. This can provide more time to troubleshoot for operational teams.
ACI upgrade – Risk assessment by type of device upgraded for HW/SW ISSUE

                  APIC+CIMC   SPINES   LEAVES   RMA use case
HW issue risk     Medium      Medium   Medium   Low
SW issue risk     High        Medium   High     N/A

Upgrade on MW (HNO): Recommended / Strongly recommended / Recommended

APIC clusters could be upgraded during working hours if no configuration is planned during that timeslot. This can provide more time to troubleshoot for operational teams.
1. ACI upgrade - average timing:

General timings:
• APIC: 30-40 min per controller; 1h is the real time in a production environment, taking into consideration pre-checks, process, etc.
• Spines: baby spines: 15 min per node; double supp (supervisor) spines: 30 min per supp, so 1h per node
• Leaves: 6-10 min per node; 20 leaves max can be upgraded simultaneously

SSD Scenario

An SSD upgrade is sometimes required; this cannot be known in advance. In that case the leaves/spines need a second reload (double the time for a complete upgrade).
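The timings and the SSD caveat above can be turned into a rough estimate. The following is a minimal sketch, not a sizing tool: it assumes APICs and spines are upgraded one by one, leaves are split into equal groups upgraded one after the other, and the upper bound of each timing applies. The function name and its parameters are hypothetical.

```python
import math

# Per-node timings taken from the slide (upper bounds, in minutes).
APIC_MIN = 60              # real-life time per controller, pre-checks included
BABY_SPINE_MIN = 15
DOUBLE_SUP_SPINE_MIN = 60  # 30 min per supervisor, two supervisors
LEAF_MIN = 10
MAX_PARALLEL_LEAVES = 20   # leaves that can be upgraded simultaneously


def estimate_minutes(apics, spines, leaves, double_sup=False,
                     leaf_groups=2, ssd_reload=False):
    """Rough upgrade duration in minutes (assumption-based sketch)."""
    apic_time = apics * APIC_MIN
    spine_time = spines * (DOUBLE_SUP_SPINE_MIN if double_sup else BABY_SPINE_MIN)
    # Leaves: groups run one after the other, batches of 20 inside a group.
    per_group = math.ceil(leaves / leaf_groups)
    batches = math.ceil(per_group / MAX_PARALLEL_LEAVES)
    leaf_time = leaf_groups * batches * LEAF_MIN
    switch_time = spine_time + leaf_time
    if ssd_reload:
        # An SSD firmware upgrade forces a second reload of the switches.
        switch_time *= 2
    return apic_time + switch_time


print(estimate_minutes(apics=3, spines=3, leaves=21))                   # 245
print(estimate_minutes(apics=3, spines=3, leaves=77, double_sup=True))  # 400
```

The result covers only the upgrade itself. Pre-checks on the switches, validation between steps and troubleshooting time are not modelled, which is one reason planned maintenance windows are much longer.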
Understanding SSD usage leaf/spine
• Known facts:

• An SSD is located on ACI fabric devices (leaf/spine) and on the APIC

• Each SSD has its own lifetime, which depends on the SSD model

• Based on the vendor and model, ACI uses thresholds to determine whether the SSD lifetime is exceeded or whether the firmware should be upgraded. This action could be done in a separate maintenance window.

• Micron, for example, has a firmware update with several improvements (disk corruption, unable to read from partition, BIOS unable to detect SSD)

• Detailed SSD monitoring is transparent for network admins; faults or action events are raised by APIC when this is the case
REX CU1 use case:
• REX CU site1 number of devices:
  • 4 leaves
  • 3 double supp spines
  • 3 APICs
• REX CU site2 number of devices:
  • 77 leaves
  • 3 double supp spines
  • 3 APICs

Considering last upgrade feedback:
• 8h of MW for CIMC upgrade (if required**)
• 8h of MW for APIC clusters on both sites + site1 leaves/spines
• 8h of MW for site2 leaves/spines
REX CU2 use case:
• CU2 site1 number of devices:
  • 21 leaves
  • 3 baby spines
  • 3 APICs
• CU2 site2 number of devices:
  • 20 leaves
  • 3 baby spines
  • 3 APICs

Considering last upgrade feedback:
• 8h of MW for CIMC upgrade (if required**)
• 6h of MW for APIC clusters on both sites
• 8h of MW for leaves/spines on both sites
1. ACI upgrade - average timing:

• What could be done to reduce upgrade maintenance windows?

=> Split the MWs. An ACI fabric can run 2 firmware versions at the same time.

• Important considerations:
=> Risk assessment is required and should be done by analyzing the feature gap between releases.
=> Check the APIC upgrade/downgrade matrix (public):
https://www.cisco.com/c/dam/en/us/td/docs/Website/datacenter/apicmatrix/index.html#cur=1.3(2)&tar=3.2(7)
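The matrix link above carries the current and target releases in its fragment (`cur=` and `tar=`). A small sketch of building such a link for another release pair follows. The helper name is hypothetical, and it assumes the tool keeps accepting the same fragment format as the link quoted above.

```python
# Base address copied from the public matrix link quoted in the text.
MATRIX_BASE = (
    "https://www.cisco.com/c/dam/en/us/td/docs/Website/datacenter/"
    "apicmatrix/index.html"
)


def matrix_url(current, target):
    """Build a matrix link for a current/target release pair (assumed format)."""
    return f"{MATRIX_BASE}#cur={current}&tar={target}"


print(matrix_url("1.3(2)", "3.2(7)"))
```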
HW-SW compatibility 3.2.x and above
All generations of APICs
5. Upgrade Path
Public Online Tool:
Common upgrade problems

• Cisco CX has made a detailed analysis of these issues together with TAC and TME (already done)
• Most of the known steps to get out of these situations, if they are hit, have to be performed by TAC (root account needed)
• To be done: detailed bug analysis of the release notes of each release
6. Check List
Check List before the upgrade(1)
Clear All the Faults

Many faults in the ACI fabric indicate invalid or conflicting policies, disconnected interfaces, etc. Understand the trigger of each fault and clear it before you start the upgrade. Be aware that faults such as "encap already been used" or "Routed port is in L2 mode" could result in an unexpected outage: when you upgrade a switch, it downloads all the policies from APIC from scratch, so unexpected policies may take over from the expected policies, which could cause an outage.

Clear VLAN Pool Overlap

VLAN pool overlap means the same VLAN ID is part of two or more VLAN pools. If the same VLAN ID is deployed on multiple leaf switches from different VLAN pools, it gets a different VXLAN ID on these switches. Since ACI uses the VXLAN ID for forwarding, traffic destined to a particular VLAN may end up in a different VLAN or get dropped. Since the leaf downloads the configuration from APIC after its upgrade, the order in which the VLANs get deployed plays a major role. So, this could result in an outage or intermittent connectivity loss to endpoints in some VLANs.
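The overlap condition described above (one VLAN ID present in two or more pools) can be checked offline once the pool ranges are known. The following is a minimal sketch on hypothetical data: the pool names and ranges are invented for illustration, and collecting the real ranges from the fabric is left out.

```python
from itertools import combinations


def expand(ranges):
    """Turn a list of inclusive (start, end) VLAN ranges into a set of IDs."""
    ids = set()
    for start, end in ranges:
        ids.update(range(start, end + 1))
    return ids


def find_overlaps(pools):
    """Return {(pool_a, pool_b): [shared VLAN IDs]} for every overlapping pair."""
    overlaps = {}
    for (name_a, ranges_a), (name_b, ranges_b) in combinations(sorted(pools.items()), 2):
        shared = expand(ranges_a) & expand(ranges_b)
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)
    return overlaps


# Hypothetical pools: VLANs 150-160 are declared twice.
pools = {
    "pool-servers": [(100, 199)],
    "pool-storage": [(150, 160), (300, 310)],
    "pool-l3out": [(400, 410)],
}

for pair, vlans in find_overlaps(pools).items():
    print(pair, "share VLANs", vlans[0], "to", vlans[-1])
```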

Backup APIC Configuration

Make sure to export a configuration backup to a remote server before you start the upgrade. This exported backup file can be used to restore the configuration on the APICs if you lose all configuration or suffer data corruption after the upgrade.

Confirm APIC CIMC Access

Cisco Integrated Management Controller (CIMC) is the best way to get remote console access to the APIC. If the APIC doesn't come back up after a reboot or the processes are stuck, you may not be able to connect to the APIC through its out-of-band or in-band management. At this stage, you can log in to CIMC and connect to the KVM console of the APIC to perform some checks and troubleshoot the issue.
Check List before the upgrade(2)

Check and Confirm the CIMC Version Compatibility

Always make sure to run the Cisco recommended CIMC version compatible with the target ACI version before you start the ACI upgrade. Refer to Recommended APIC and CIMC Version.

Confirm APIC Process is not Locked

The Appliance Element (AE) process, which runs on the APIC, is responsible for triggering the upgrade on the APIC. There is a known bug in the CentOS Intelligent Platform Management Interface (IPMI) which could lock the AE process on the APIC. If the AE process is locked, the APIC firmware upgrade will not start. This process queries the chassis IPMI every 10 seconds. If the AE process has not queried the chassis IPMI in the last 10 seconds, that could mean the AE process is locked.

You can check the status of the AE process to find the last IPMI query. From the APIC CLI, run the command date to check the current system time. Then run the command grep "ipmi" /var/log/dme/log/svc_ifc_ae.bin.log | tail -5 and check the last time the AE process queried the IPMI. Compare that time against the system time to check whether the last query was within the 10-second window of the system time.

If the AE process has failed to query the IPMI in the last 10 seconds of the system time, contact Cisco.
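The comparison described above is a simple time-window test. The sketch below only shows that test. It assumes the two timestamps have already been read by hand from the `date` output and from the last matching log line, because the log line format is not shown here. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def ae_looks_locked(last_ipmi_query, system_time, window_seconds=10):
    """True if the last IPMI query is older than the expected polling window."""
    return system_time - last_ipmi_query > timedelta(seconds=window_seconds)


# Example values typed in by the operator (illustrative only).
now = datetime(2017, 8, 16, 15, 52, 10)
print(ae_looks_locked(datetime(2017, 8, 16, 15, 52, 4), now))   # recent query
print(ae_looks_locked(datetime(2017, 8, 16, 15, 51, 0), now))   # stale query
```

A stale result is only a hint that the AE process may be locked. The action stays the same as stated above: contact Cisco.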
Check List before the upgrade(3)
Check and Confirm the NTP Availability

From each APIC, ping and confirm the reachability to the NTP server to avoid known issues due to APIC time mismatch. More details on this can
be found in the troubleshooting section of this article.

Check APIC Health State

Check and confirm the health status of each APIC in the cluster before you start the upgrade. A health score of 255 means the APIC is healthy.
Run the command acidiag avread | grep id= | cut -d ' ' -f 9,10,20,26,46 from any APIC CLI to check the APIC health status. If the health score
is not 255 for any APIC, don't start the upgrade.

Evaluate the Impact of New Version

Before you start the upgrade, please review the Release Notes for your target ACI version and understand any behavioural changes that are
applicable to your fabric configuration to avoid any unexpected results after the upgrade.
Check List before the upgrade(4)
Things to Do Before Switch Upgrade:

Place Virtual Port Channel (vPC) and Redundant Leaf Pairs in Different Maintenance Groups

From a certain version onwards, ACI APIC has a mechanism to check and defer the upgrade of vPC pair leaf nodes.
However, it is best practice to put vPC pair switches in different maintenance groups to avoid both vPC switches
rebooting at the same time.

This point will be discussed later: it can make sense to create more than 2 maintenance groups depending on what
is connected to the leaves (server types).
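The grouping rule above (vPC peers never in the same maintenance group) can be sketched as follows. This is a minimal illustration for two groups on hypothetical node IDs. It does not create maintenance groups on the APIC, and the function name is an assumption.

```python
def split_maintenance_groups(vpc_pairs, standalone=()):
    """Split leaves into two groups so that vPC peers never share a group."""
    group_a, group_b = [], []
    for first, second in vpc_pairs:
        # One peer per group: the pair never reboots at the same time.
        group_a.append(first)
        group_b.append(second)
    for node in standalone:
        # Standalone leaves go to the currently smaller group.
        target = group_a if len(group_a) <= len(group_b) else group_b
        target.append(node)
    return group_a, group_b


# Hypothetical fabric: two vPC pairs and one standalone leaf.
group_a, group_b = split_maintenance_groups([(101, 102), (103, 104)], [105])
print("Group A:", group_a)
print("Group B:", group_b)
```

With more than 2 groups (for example one per server type), the same rule applies: the two peers of a vPC pair must land in different groups.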
7. Useful how-to
5.1 How To know the upgrade status from CLI

APIC1# show firmware upgrade status

Pod        Node       Current-Firmware     Target-Firmware      Status                    Upgrade-Progress(%)
---------- ---------- -------------------- -------------------- ------------------------- --------------------
1          1          apic-3.0(0.281)      apic-3.0(0.281)      success                   100
1          2          apic-3.0(0.281)      apic-3.0(0.281)      success                   100
1          3          apic-3.0(0.281)      apic-3.0(0.281)      success                   100
1          101        n9000-12.2(2.164)    n9000-13.0(0.115)    scheduled                 0
1          102        n9000-12.2(2.164)    n9000-13.0(0.115)    scheduled                 0
1          103        n9000-12.2(2.164)    n9000-13.0(0.115)    scheduled                 0
1          104        n9000-12.2(2.164)    n9000-13.0(0.115)    scheduled
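When the fabric is large, the table above is easier to follow if only the unfinished nodes are listed. The sketch below parses text in the column layout shown above. It assumes that layout stays the same across releases, and the function name is hypothetical.

```python
def pending_nodes(status_output):
    """List (node, status, progress) for nodes that are not at success/100."""
    pending = []
    for line in status_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric pod ID and a numeric node ID.
        if len(fields) < 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        node, status = int(fields[1]), fields[4]
        # The progress column can be missing on some rows.
        progress = int(fields[5]) if len(fields) > 5 and fields[5].isdigit() else None
        if status != "success" or progress != 100:
            pending.append((node, status, progress))
    return pending


sample = """\
Pod Node Current-Firmware Target-Firmware Status Upgrade-Progress(%)
---------- ---------- -------------------- --------------------
1 1 apic-3.0(0.281) apic-3.0(0.281) success 100
1 101 n9000-12.2(2.164) n9000-13.0(0.115) scheduled 0
1 104 n9000-12.2(2.164) n9000-13.0(0.115) scheduled
"""
for node, status, progress in pending_nodes(sample):
    print(node, status, progress)
```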


5.2 How To know Switch and APIC running version

apic1# moquery -c firmwareRunning

Total Objects shown: 5

# firmware.Running
biosTs : 2015-10-12T10:00:00.000+10:00
biosVer : 07.41
childAction :
descr : version 12.3(1f) [build 12.3(1f)]
dn : topology/pod-1/node-102/sys/fwstatuscont/running
internalLabel : 9cb4ce2302409bdc8b41729d52f170707977511e
ksFile : bootflash:aci-n9000-dk9.12.3.1f.bin
modTs : never
mode : normal
peVer : 2.3(1f)
rn : running
status :
sysFile : bootflash:///auto-s
ts : 2017-07-07T17:12:00.000+10:00
type : switch
version : n9000-12.3(1f)
...rest of output omitted...

Use moquery to get the APIC version

apic1# moquery -c firmwareCtrlrRunning

Total Objects shown: 3

# firmware.CtrlrRunning
childAction :
descr :
dn : topology/pod-1/node-1/sys/ctrlrfwstatuscont/ctrlrrunning
internalLabel : 9cb4ce2302409bdc8b41729d52f170707977511e
lcOwn : local
modTs : 2017-08-16T15:52:08.634+10:00
mode : normal
rn : ctrlrrunning
status :
tpmInUse : no
ts : 2017-08-16T15:52:06.281+10:00
type : controller
version : 2.3(1f)
...rest of output omitted...
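To spot a double version situation quickly, the moquery output above can be reduced to one version per node. The sketch below parses the "key : value" layout shown above from saved text. It assumes each object starts with a "# class" line, as in the sample. The function names are hypothetical.

```python
def parse_moquery(text):
    """Parse moquery text output into a list of dicts, one per object."""
    objects, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            # A "# firmware.Running" style line opens a new object.
            current = {"class": line[2:]}
            objects.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return objects


def versions_by_dn(objects):
    """Map each object's dn to its running version."""
    return {o["dn"]: o["version"] for o in objects if "dn" in o and "version" in o}


sample = """\
Total Objects shown: 2

# firmware.Running
dn : topology/pod-1/node-101/sys/fwstatuscont/running
version : n9000-12.3(1f)

# firmware.Running
dn : topology/pod-1/node-102/sys/fwstatuscont/running
version : n9000-12.2(2k)
"""
versions = versions_by_dn(parse_moquery(sample))
if len(set(versions.values())) > 1:
    print("More than one switch version is running:", sorted(set(versions.values())))
```

The node IDs and versions in the sample are invented for illustration.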
5.3 How Does Catalog Work and Select A Catalog from CLI
Catalog contains the information about various hardware versions of the switches and the compatibility matrix between various APIC and switch firmware releases.
Catalog images are part of the APIC images and extracted while loading the image into the repository.
When upgrading from version A to version B, the higher of the two catalog image versions (the one shipped with A or the one shipped with B) is selected and loaded on the APIC and the switches.

apic1# config
apic1(config) # firmware
apic1(config-firmware) # catalog-version <catalog-image-name>

5.4 How to Add Image to Repository from CLI


Each APIC's /firmware/fwrepos/fwrepo/ directory contains the images. Copying an image into this directory with scp is not enough to add it; the image must also be added to the repository:

apic1# firmware repository add <path/to/image>

5.5 How to confirm APIC is healthy before Upgrade


Issue the command below from apic1. A health score of 255 means the APIC is fully healthy.

apic1# acidiag avread | grep id= | cut -d ' ' -f 9,10,20,26,46


appliance id=1 version=2.3(1f) rK=(stable,present,0X207373642D687373) health=(applnc:255
appliance id=2 version=2.3(1f) rK=(stable,present,0X207373642D687373) health=(applnc:255
appliance id=3 version=2.3(1f) rK=(stable,present,0X207373642D687373) health=(applnc:255
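The output above can be checked automatically before an upgrade. The sketch below scans text in the format shown above and returns the IDs of the APICs whose health score is not 255. It assumes the output is pasted or captured as plain text, and the function name is hypothetical.

```python
import re

# Matches the appliance ID and the health score in one avread line.
LINE_RE = re.compile(r"appliance id=(\d+).*?health=\(applnc:(\d+)")


def unhealthy_apics(avread_output):
    """Return the IDs of APICs whose health score differs from 255."""
    bad = []
    for line in avread_output.splitlines():
        match = LINE_RE.search(line)
        if match and int(match.group(2)) != 255:
            bad.append(int(match.group(1)))
    return bad


sample = """\
appliance id=1 version=2.3(1f) rK=(stable,present,0X207373642D687373) health=(applnc:255
appliance id=2 version=2.3(1f) rK=(stable,present,0X207373642D687373) health=(applnc:255
appliance id=3 version=2.3(1f) rK=(stable,present,0X207373642D687373) health=(applnc:255
"""
bad = unhealthy_apics(sample)
print("OK to proceed" if not bad else f"Do not start the upgrade, check APIC(s): {bad}")
```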
