vPC failures can be disruptive. However, if you understand vPC well and follow Cisco's recommended vPC design, you can handle Virtual Port Channel (vPC) failure scenarios with confidence. In this lesson, I will discuss different vPC failure scenarios, their impact on the network, and how to resolve each one the Cisco-recommended way.

vPC Design:

Depending on your requirements, vPC can be designed as a one-sided (regular) vPC, a double-sided vPC, or a multilayer (DCI) vPC. Cisco's vPC design guide covers each of these options.

Figure: Cisco Nexus vPC

If you are new to vPC configuration, the following articles from this blog are recommended.

How to configure Cisco Nexus vPC
How to configure Double-Sided vPC in Cisco Nexus

vPC Failure Scenarios

I will cover the following failures.

  • vPC keep-alive link failure
  • vPC peer-link failure
  • Member port failure
  • vPC Peer switch failure
  • Dual Failure Scenarios
    ++ Case 1: Peer-link failure, followed by Keep-alive link
    ++ Case 2: Keep-alive link failure, followed by Peer-link

vPC keep-alive link failure:

Impact:
If only the keep-alive link fails, traffic forwarding is not affected: both peers keep their roles and all vPCs stay up. Only the heartbeat between the primary and secondary nodes is lost.

Figure: vPC failure scenario 1, keep-alive link fails

Solution:
Restore the link as soon as possible. While the keep-alive link is down, a subsequent peer-link failure turns into a double failure (see Case 2 under Dual Failure Scenarios).
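
Because a lone keep-alive failure has no visible impact on traffic, it is easy to miss. Below is a minimal sketch of a check, assuming you have already collected `show vpc` output as text by some means. The helper names `parse_show_vpc` and `keepalive_ok` are hypothetical, and the sample text only mirrors the field labels quoted in the comments on this article, not a complete NX-OS output.

```python
def parse_show_vpc(text):
    """Parse 'label : value' lines from show-vpc style text into a dict."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'label : value' shape
            fields[label.strip()] = value.strip()
    return fields


def keepalive_ok(fields):
    """True only when the keep-alive status line reports the peer as alive."""
    return fields.get("vPC keep-alive status") == "peer is alive"


# Sample text (assumption: abbreviated, illustrative output only)
SAMPLE = """\
vPC domain id : 70
Peer status : peer link is down
vPC keep-alive status : peer is alive
vPC role : primary
"""

fields = parse_show_vpc(SAMPLE)
if not keepalive_ok(fields):
    print("WARNING: vPC keep-alive is not healthy, restore it before anything else fails")
```

Any status other than "peer is alive" is treated as unhealthy here, so the sketch does not depend on the exact wording of the failure message.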

vPC peer-link failure

Impact:
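If the peer-link fails, all vPC member ports on the vPC secondary node are suspended. It is important to note that the keep-alive link is still active in this scenario, which allows the nodes to keep exchanging heartbeats. The secondary therefore knows the primary is alive and leaves forwarding to it.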

Figure: vPC failure scenario 2, peer-link fails

Solution:
Restore the peer-link and make sure it is up and running.

Recommendation:
The peer-link is critical, so it is a good idea to build its port channel from multiple ports on multiple modules. That way, if a link or module fails, the other links remain up and active.
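
The recommendation above can be turned into a quick sanity check. This is a rough sketch, assuming interface names follow the `Ethernet<module>/<port>` pattern; the function names and the example interface lists are hypothetical and only illustrate the idea.

```python
import re


def modules_used(members):
    """Return the set of module (slot) numbers behind names like 'Ethernet1/1'."""
    slots = set()
    for name in members:
        match = re.fullmatch(r"[A-Za-z]+(\d+)/\d+(?:/\d+)?", name)
        if match is None:
            raise ValueError(f"unrecognized interface name: {name!r}")
        slots.add(int(match.group(1)))
    return slots


def peer_link_is_resilient(members):
    """True when the peer-link has at least two ports on at least two modules."""
    return len(members) >= 2 and len(modules_used(members)) >= 2


# Hypothetical peer-link member lists
print(peer_link_is_resilient(["Ethernet1/1", "Ethernet2/1"]))  # True
print(peer_link_is_resilient(["Ethernet1/1", "Ethernet1/2"]))  # False
```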

Member port failure

Impact:
If a member port fails for a particular end host, only that host is affected; all other vPCs remain operational. If one of the host's links fails, its traffic flows through the remaining link. If both links fail, that end host suffers a full outage.

Figure: vPC failure scenario 3, member port fails

Solution:
Make sure the member ports are up and running.
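
The impact on a dual-homed host can be summarized in a few lines. The function below is only an illustration of the three outcomes described in this section; its name and return strings are made up for the example.

```python
def host_state(link_to_primary_up, link_to_secondary_up):
    """Classify a dual-homed host by how many of its vPC member links are up."""
    links_up = int(bool(link_to_primary_up)) + int(bool(link_to_secondary_up))
    if links_up == 2:
        return "normal: both links forwarding"
    if links_up == 1:
        return "degraded: traffic on the remaining link"
    return "outage: host unreachable"


print(host_state(True, False))   # degraded: traffic on the remaining link
print(host_state(False, False))  # outage: host unreachable
```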

vPC Peer switch failure

Impact:
If the primary switch in a vPC pair fails, the secondary switch is promoted to operational primary and forwards all the traffic. If the secondary switch fails, the primary keeps forwarding traffic as before.

Figure: vPC failure scenario 4, peer switch down

Solution:
Bring the peer switch up. Then make sure the keep-alive link is up and operational, move on to the peer-link, and lastly the member ports.
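
The role change after a peer switch failure follows a simple rule. Here is a small sketch of that rule as described in this section; the function name is hypothetical, and the role strings are modeled on the `vPC role` line of `show vpc` output rather than taken from any API.

```python
def operational_role(configured_role, peer_alive):
    """Role a surviving vPC switch takes, per the rule described in this section."""
    if configured_role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {configured_role!r}")
    if configured_role == "secondary" and not peer_alive:
        # The secondary is promoted when the primary switch is gone
        return "secondary, operational primary"
    # The primary keeps its role whether or not the secondary is alive
    return configured_role


print(operational_role("secondary", peer_alive=False))  # secondary, operational primary
print(operational_role("primary", peer_alive=False))    # primary
```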

Dual Failure Scenarios

The dual failure scenarios cover the following two cases.

1. Case 1: Peer-link failure, followed by Keep-alive link
2. Case 2: Keep-alive link failure, followed by Peer-link

Case 1: Peer-link failure, followed by Keep-alive link
Here, the member ports on the secondary node are suspended first because the peer-link is down, while the heartbeat continues over the keep-alive link. Traffic flows through the primary peer switch. If the keep-alive link then fails, the suspended ports remain suspended and all traffic keeps flowing through the primary node.

Figure: vPC double failure, Case 1

Solution:
Bring up the keep-alive link first, and then work on the peer-link. Keep to this order.
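
If you script or document your recovery steps, the recommended order can be checked mechanically. This is a minimal sketch: the step labels and the function name are invented for the example, and placing member ports last is taken from the peer switch failure section of this article.

```python
# Recommended restore order as described in this article (labels are hypothetical)
RECOMMENDED_ORDER = ["keep-alive", "peer-link", "member-ports"]


def follows_recommended_order(steps):
    """True if the known steps appear in the recommended relative order."""
    ranks = [RECOMMENDED_ORDER.index(step) for step in steps if step in RECOMMENDED_ORDER]
    return ranks == sorted(ranks)


print(follows_recommended_order(["keep-alive", "peer-link"]))  # True
print(follows_recommended_order(["peer-link", "keep-alive"]))  # False
```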

Case 2: Keep-alive link failure, followed by Peer-link
This failure is the most critical. If the keep-alive link fails first, nothing happens because the vPC peer roles are already decided. However, if the peer-link then dies, the secondary vPC node receives no heartbeat from the primary and concludes that the primary node is completely down. The secondary node therefore becomes operational primary, and both vPC nodes forward traffic as primary. This is called a split-brain scenario in vPC.

Figure: vPC double failure, Case 2 (split brain)

Solution:
Make sure all the member ports on the secondary switch are down. Then bring up the keep-alive link. After the heartbeat (keep-alive) is restored, bring the peer-link up. Once the vPC forms, bring the member ports back up.
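
The two dual-failure cases differ only in the order of events. The toy model below encodes just the rules stated in this article, to show why the order matters; it is not a simulation of NX-OS, and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VpcPair:
    """Toy model of a vPC pair, tracking only what this article describes."""
    peer_link_up: bool = True
    keepalive_up: bool = True
    secondary_ports_suspended: bool = False
    secondary_is_operational_primary: bool = False

    def fail_keepalive(self):
        # Losing only the heartbeat changes neither roles nor port states
        self.keepalive_up = False

    def fail_peer_link(self):
        self.peer_link_up = False
        if self.keepalive_up:
            # Secondary still hears the primary, so it suspends its vPC member ports
            self.secondary_ports_suspended = True
        else:
            # No heartbeat: secondary assumes the primary is dead and takes over
            self.secondary_is_operational_primary = True

    @property
    def split_brain(self):
        # Both nodes act as primary once the secondary has promoted itself
        return self.secondary_is_operational_primary


case1 = VpcPair()
case1.fail_peer_link()
case1.fail_keepalive()
print(case1.secondary_ports_suspended, case1.split_brain)  # True False

case2 = VpcPair()
case2.fail_keepalive()
case2.fail_peer_link()
print(case2.secondary_ports_suspended, case2.split_brain)  # False True
```

In Case 1 the secondary has already suspended its ports before the heartbeat is lost, so only the primary forwards. In Case 2 the secondary never suspends anything, which is why both nodes end up forwarding.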

Written by Rajib Kumer Das

I am Rajib Kumer Das, a network engineer with 8+ years of experience in multi-vendor environments. In my current position, I am responsible for critical projects and their support cases. I hold several vendor certifications and plan to go further.

This article has 17 comments

  1. Freddy

    Dude great stuff, you kill it.
    Can you give me a spanning-tree best practice? I’m very new when it comes to Data Center stuff but I’m catching up really fast.

  2. Ashutosh Malik

    Nicely explained. I read several docs but this is the best.
    Thanks

  3. uger

    Thanks for the article. I understand that the order of failures is quite important, but why is the order of bringing the ports back up important as well? Say the primary switch reloaded and came back with all ports down. What happens if I bring up the peer-link first and then the keep-alive?

      • uger

        I labbed it, and I think it doesn't make a big difference. I shut down the keep-alive, followed by the peer-link, and saw both switches as primary. When I enabled only the keep-alive, they were still in split-brain state; both stayed primary until I enabled the peer-link too.

        The reverse order seems to end up with the same result. I put them in split-brain state and enabled the peer-link first; their roles didn't change until I enabled the keep-alive. I think the role change doesn't happen until both the keep-alive and the peer-link come up.

        This is the output for the keep-alive up, peer-link down state:

        switch1:

        vPC domain id : 70
        Peer status : peer link is down
        vPC keep-alive status : peer is alive
        vPC role : primary

        switch2:

        vPC domain id : 70
        Peer status : peer link is down
        vPC keep-alive status : peer is alive
        vPC role : secondary, operational primary
        Number of vPCs configured : 2

  4. Arumugam

    Really very nice; real-life scenario-based issues explained clearly with diagrams.

  5. Farzana

    If the keep-alive link goes down, how does a peer know the live status of the other peer?
    Does the heartbeat flow through the peer-link?

  6. Aswath

    If the keep-alive went down and we didn't notice it, what would the impact be if the keep-alive link is not brought back up?
