5.3 ACI Upgrade
Upgrade sequence:
• CIMC* first if necessary
• Upgrade APIC: APICs one by one
• Upgrade spines: spines one by one
• Upgrade leaves: separate leaves in 2 groups, upgrade groups one by one
ACI inventory normal behavior (figure: APIC and three spines)
SSD Scenario
An SSD firmware upgrade is sometimes required, and this cannot be known in advance. In that case the
leaves/spines need a second reload, which doubles the time for the complete upgrade.
Understanding SSD usage on leaf/spine
• Known facts:
• Based on the vendor and model, ACI uses thresholds to determine whether the SSD lifetime is exceeded or
whether the firmware should be upgraded. This action could be done in a separate maintenance window.
• Micron, for example, has a firmware update with several improvements (disk corruption, unable to read from
partition, BIOS unable to detect SSD)
• Deep dive: SSD monitoring is transparent for network admins; the APIC raises faults or action events when
this is the case
REX CU1 Use case:
Considering feedback from the last upgrade
• REX CU site1 with number of devices:
  • 4 leaves
  • 3 double-supervisor spines
  • 3 APICs
• 8h of MW for CIMC upgrade (if required**)
• 8h of MW for APIC clusters on both sites + site1 leaves/spines
• 8h of MW for site2 leaves/spines
• Important Considerations:
=> A risk assessment is required and should be done by analyzing the feature gap between releases.
=> Check the APIC upgrade/downgrade matrix (public):
https://www.cisco.com/c/dam/en/us/td/docs/Website/datacenter/apicmatrix/index.html#cur=1.3(2)&tar=3.2(7)
HW-SW compatibility, 3.2.x and above: all generations of APICs
5. Upgrade Path
Public Online Tool:
Common upgrade problems
Many faults in the ACI fabric indicate invalid or conflicting policies, disconnected interfaces, and so on. Understand what triggered them and clear them before
you start the upgrade. Be aware that faults such as "encap already been used" or "Routed port is in L2 mode" could result in an unexpected outage. When a
switch is upgraded, it downloads all the policies from the APIC from scratch. As a result, unexpected policies may take precedence over the expected policies,
which could cause an outage.
VLAN pool overlap means the same VLAN ID is part of two or more VLAN pools. If the same VLAN ID is deployed on multiple leaf switches from
different VLAN pools, it maps to a different VXLAN ID on those switches. Since ACI uses the VXLAN ID for forwarding, traffic destined to a particular VLAN
may end up in a different VLAN or get dropped. Since the leaf downloads the configuration from the APIC after its upgrade, the order in which the VLANs get
deployed plays a major role. This could therefore result in an outage or intermittent connectivity loss to endpoints in some VLANs.
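As a rough illustration of the overlap condition described above, the following Python sketch looks for VLAN IDs shared between pools. It is an offline check under stated assumptions: the pool names and ranges in VLAN_POOLS are hypothetical, and the data is assumed to have been exported from the fabric by other means. It does not query the APIC.

```python
from itertools import combinations

# Hypothetical inventory: pool name -> list of inclusive (start, end) VLAN ranges.
VLAN_POOLS = {
    "pool-phys": [(100, 199), (300, 349)],
    "pool-vmm": [(180, 220)],
    "pool-l3out": [(400, 410)],
}


def expand(ranges):
    """Return the set of VLAN IDs covered by a list of inclusive ranges."""
    ids = set()
    for start, end in ranges:
        ids.update(range(start, end + 1))
    return ids


def find_overlaps(pools):
    """Return {(pool_a, pool_b): sorted shared VLAN IDs} for each overlapping pair."""
    overlaps = {}
    for name_a, name_b in combinations(sorted(pools), 2):
        shared = expand(pools[name_a]) & expand(pools[name_b])
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)
    return overlaps


if __name__ == "__main__":
    for (pool_a, pool_b), vlans in find_overlaps(VLAN_POOLS).items():
        print(f"{pool_a} and {pool_b} share {len(vlans)} VLAN IDs: {vlans}")
```

Any pair reported here is only a candidate for the problem: the overlap matters when the shared VLAN ID is actually deployed from both pools.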
Make sure to export a configuration backup to a remote server before you start the upgrade. This exported backup file can be used to restore the configuration
on the APICs if all configuration is lost or data is corrupted after the upgrade.
Cisco Integrated Management Controller (CIMC) is the best way to get remote console access to the APIC. If the APIC doesn't come back up after a reboot
or its processes are stuck, you may not be able to connect to the APIC through out-of-band or in-band management. At this stage, you can log in to
CIMC and connect to the KVM console of the APIC to perform some checks and troubleshoot the issue.
Check List before the upgrade (2)
Before you start the ACI upgrade, always make sure to run the Cisco recommended CIMC version that is compatible with the target ACI version. Refer
to "Recommended APIC and CIMC Version".
The Appliance Element (AE) process, which runs on the APIC, is responsible for triggering the upgrade on the APIC. There is a known bug in the
CentOS Intelligent Platform Management Interface (IPMI) which could lock the AE process on the APIC. If the AE process is locked, the APIC firmware
upgrade will not start. This process queries the chassis IPMI every 10 seconds. If the AE process has not queried the chassis IPMI in the last
10 seconds, the AE process may be locked.
You can check the status of the AE process to find the last IPMI query. From the APIC CLI, run the command date to check the current system
time. Then run the command grep "ipmi" /var/log/dme/log/svc_ifc_ae.bin.log | tail -5 and check the last time the AE process
queried the IPMI. Compare that time against the system time to check whether the last query was within the 10 second window of the system time.
If the AE process has failed to query the IPMI in the last 10 seconds of the system time, contact Cisco.
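The comparison described above can be sketched in Python as follows. This is only an illustration: the two timestamps are hypothetical values typed in by hand from the output of date and of the grep command, and the sketch does not parse the AE log itself, whose line format is not shown here.

```python
from datetime import datetime, timedelta

# The AE process is stated above to query the chassis IPMI every 10 seconds.
IPMI_WINDOW = timedelta(seconds=10)


def ae_looks_locked(last_ipmi_query, system_time, window=IPMI_WINDOW):
    """Return True when the last IPMI query is older than the window."""
    return system_time - last_ipmi_query > window


if __name__ == "__main__":
    # Hypothetical values copied manually from the two command outputs.
    system_time = datetime.fromisoformat("2019-03-01 10:15:30")
    last_query = datetime.fromisoformat("2019-03-01 10:15:24")
    if ae_looks_locked(last_query, system_time):
        print("Last IPMI query is outside the 10 second window: contact Cisco.")
    else:
        print("AE process queried the IPMI within the last 10 seconds.")
```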
Check List before the upgrade (3)
Check and Confirm the NTP Availability
From each APIC, ping the NTP server and confirm that it is reachable, to avoid known issues caused by APIC time mismatch. More details on this can
be found in the troubleshooting section of this article.
Check and confirm the health status of each APIC in the cluster before you start the upgrade. A health score of 255 means the APIC is healthy.
Run the command acidiag avread | grep id= | cut -d ' ' -f 9,10,20,26,46 from any APIC CLI to check the APIC health status. If the health score
is not 255 for any APIC, don't start the upgrade.
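A minimal Python sketch of this go/no-go check is shown below. It assumes that each line of the saved command output contains an id=<n> token and a health=<n> token. The SAMPLE text, including its addresses and version strings, is hypothetical, and the real output layout may differ.

```python
import re

# Hypothetical sample; the real output of the acidiag/cut pipeline may differ.
SAMPLE = """\
id=1 address=10.0.0.1 version=3.2(7f) health=255
id=2 address=10.0.0.2 version=3.2(7f) health=255
id=3 address=10.0.0.3 version=3.2(7f) health=112
"""


def unhealthy_apics(text, healthy=255):
    """Return {apic_id: health} for every APIC whose health is not `healthy`."""
    result = {}
    for line in text.splitlines():
        apic = re.search(r"\bid=(\d+)", line)
        health = re.search(r"\bhealth=(\d+)", line)
        # Lines without both tokens are skipped rather than guessed at.
        if apic and health and int(health.group(1)) != healthy:
            result[int(apic.group(1))] = int(health.group(1))
    return result


if __name__ == "__main__":
    bad = unhealthy_apics(SAMPLE)
    if bad:
        print(f"Do not start the upgrade, unhealthy APICs: {bad}")
    else:
        print("All APICs report health 255.")
```

Because lines that lack either token are skipped, an empty result should be read together with the raw output before concluding that the cluster is healthy.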
Before you start the upgrade, please review the Release Notes for your target ACI version and understand any behavioural changes that are
applicable to your fabric configuration to avoid any unexpected results after the upgrade.
Check List before the upgrade (4)
Things to Do Before Switch Upgrade:
Place Virtual Port Channel (vPC) and Redundant Leaf Pairs in Different Maintenance Groups
From a certain version onwards, the APIC has a mechanism to check and defer the upgrade of vPC pair leaf nodes.
However, it is best practice to put the switches of a vPC pair in different maintenance groups so that both vPC switches
do not reboot at the same time.
This point will be discussed further; it can make sense to create more than 2 maintenance groups depending on what
is connected to the leaves (server types).
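The vPC separation rule above can be sketched in Python as a planning aid. The node IDs are hypothetical, and the sketch only builds two groups; splitting further by server type, as mentioned above, is left out.

```python
def split_vpc_pairs(vpc_pairs, standalone=()):
    """Return (group_a, group_b) with the two peers of each vPC pair separated."""
    group_a, group_b = [], []
    for first, second in vpc_pairs:
        group_a.append(first)
        group_b.append(second)
    # Alternate standalone leaves between the groups to keep them balanced.
    for index, node in enumerate(standalone):
        (group_a if index % 2 == 0 else group_b).append(node)
    return group_a, group_b


def peers_separated(vpc_pairs, group_a, group_b):
    """Check that no vPC pair has both peers in the same maintenance group."""
    return all(
        (first in group_a and second in group_b)
        or (first in group_b and second in group_a)
        for first, second in vpc_pairs
    )


if __name__ == "__main__":
    # Hypothetical node IDs: two vPC pairs and one standalone leaf.
    pairs = [(101, 102), (103, 104)]
    group_a, group_b = split_vpc_pairs(pairs, standalone=[105])
    print("Maintenance group A:", group_a)
    print("Maintenance group B:", group_b)
    print("vPC peers separated:", peers_separated(pairs, group_a, group_b))
```

The peers_separated helper can also be run against an existing group assignment to confirm that no vPC pair would reboot at the same time.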
7. Useful how-to
5.1 How To know the upgrade status from CLI
apic1# config
apic1(config)# firmware
apic1(config-firmware)# catalog-version <catalog-image-name>