Cisco Datacenter : vPC Failover Scenario

Today I am going to talk about vPC recovery in the case where the vPC peer link goes down and the peer keepalive fails as well. The importance of understanding the vPC election process cannot be overstated, especially in vPC recovery scenarios.

Let's start with a scenario where we have two Nexus switches in a typical vPC setup: Nexus-01 is the vPC Primary and Nexus-02 is the vPC Secondary. Both of them have their sticky bits set to FALSE by default.

Now Nexus-01 has a power outage and is isolated from the network. Nexus-02 promotes itself to vPC Primary, becoming the Operational Primary, and sets its vPC sticky bit to TRUE.


When Nexus-01 comes back online after power has been restored, Nexus-02 will retain the Operational Primary role regardless of its role priority (because it has a TRUE sticky bit), and Nexus-01 will take the Secondary role when it comes online. Only Nexus-01 will begin the vPC initialization process, whereas Nexus-02 will remain Primary and keep forwarding traffic as usual. Therefore, no network outage will be seen.
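
To make the election order easier to follow, here is a minimal Python sketch of the comparison described above. It is only a model, not NX-OS code: the Peer class, its field names and the elect_primary function are hypothetical, the "lower role priority value wins" rule is the usual vPC convention, and the final system MAC tie-break is an assumption added so the function always returns a result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    role_priority: int   # lower value is preferred for Primary (assumed convention)
    sticky_bit: bool
    system_mac: str = "00:00:00:00:00:00"  # hypothetical tie-breaker field


def elect_primary(a: Peer, b: Peer) -> Peer:
    """Return the peer this simplified model picks as Operational Primary."""
    # 1. A peer whose sticky bit is TRUE beats a peer whose sticky bit is FALSE.
    if a.sticky_bit != b.sticky_bit:
        return a if a.sticky_bit else b
    # 2. Same sticky bit on both peers: the lower role priority value wins.
    if a.role_priority != b.role_priority:
        return a if a.role_priority < b.role_priority else b
    # 3. Assumed tie-break: the lower system MAC wins (plain string comparison).
    return a if a.system_mac <= b.system_mac else b


# Nexus-01 returns from the power outage; Nexus-02 was promoted meanwhile.
nexus01 = Peer("Nexus-01", role_priority=100, sticky_bit=False)
nexus02 = Peer("Nexus-02", role_priority=200, sticky_bit=True)
print(elect_primary(nexus01, nexus02).name)  # Nexus-02
```

Even though Nexus-01 has the better role priority in this example, the TRUE sticky bit on Nexus-02 decides the election, which is why Nexus-02 keeps the Primary role.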

Fig 1.1-vPC Failover Scenario
There are two timers associated with the vPC initialization process on Nexus-01, which is now the vPC operational Secondary device:
  • delay restore SVI (10 seconds by default) 
  • delay restore (30 seconds by default)
As a result, you can expect a 40-second recovery time on Nexus-01 after it is re-introduced into the network as a vPC Secondary device. However, since Nexus-02 holds the Primary role and all traffic is now passing through Nexus-02, as mentioned above, no network outage will be seen.
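
The 40-second figure is simply the two default timers added together, as this article treats them. A tiny sketch of that arithmetic follows; the function name and parameter names are hypothetical and only the default values come from the list above.

```python
def secondary_recovery_seconds(delay_restore: int = 30,
                               delay_restore_svi: int = 10) -> int:
    """Recovery time of the returning Secondary, treating the timers as additive."""
    return delay_restore + delay_restore_svi


print(secondary_recovery_seconds())  # 40
```

If either timer has been tuned away from its default, pass the configured values to get the recovery time to expect under the same additive assumption.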


However, a network outage can occur after an isolated switch is introduced back into the vPC domain if the sticky bits are not set correctly on both Nexus switches. Before an isolated switch is introduced back into the vPC domain, its sticky bit must be set to FALSE. (See the procedures for replacing an N7K chassis.)

Fig 1.2-vPC Failover Scenario

When the peer keepalive (PKA) and Peer Link are restored, Nexus-02 will take the Primary role regardless of its role priority (because it has a TRUE sticky bit) and force Nexus-01 to become Secondary, so the vPC initialization process will begin on Nexus-01. Therefore, links E1/1 and E1/2 of Nexus-01 will be suspended by vPC and will come online only after the delay restore timers (40 seconds in total by default) expire. In this case, a 40-second network outage will be seen after the PKA and Peer Link are restored.


When re-introducing a Nexus switch into the vPC domain, we must ensure that there will be no vPC role change on the active vPC device. To avoid a vPC role change when the sticky bits of both switches are set to the same value, the active vPC device has to have a higher role priority (that is, a lower role priority value) for it to retain its Primary role.
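
As a rough pre-check before re-introducing a switch, the rule above can be written as a small Python sketch. This is a simplified model under stated assumptions, not a Cisco tool: the Peer tuple and the role_change_expected function are hypothetical names, a lower role priority value is assumed to be preferred, and equal priorities are treated as "no change" because any further tie-break is ignored here.

```python
from typing import NamedTuple


class Peer(NamedTuple):
    name: str
    role_priority: int  # lower value is preferred for Primary (assumed convention)
    sticky_bit: bool


def role_change_expected(active: Peer, returning: Peer) -> bool:
    """True if re-introducing `returning` would take Primary away from `active`."""
    # Different sticky bits: the peer with the TRUE sticky bit wins the election.
    if active.sticky_bit != returning.sticky_bit:
        return returning.sticky_bit
    # Same sticky bits: the returning peer wins only with a strictly lower value.
    return returning.role_priority < active.role_priority


active = Peer("Nexus-01", role_priority=100, sticky_bit=False)

# Isolated switch comes back with its sticky bit still TRUE: role change, outage.
print(role_change_expected(active, Peer("Nexus-02", 200, True)))   # True

# Same switch after its sticky bit has been set back to FALSE: no role change.
print(role_change_expected(active, Peer("Nexus-02", 200, False)))  # False
```

The first call corresponds to the outage case in Fig 1.2, and the second shows why the sticky bit of the returning switch should be FALSE before it rejoins.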