vPC Failure Scenarios

vPC failure scenarios are sometimes destructive. However, if you have a good understanding of vPC and follow the Cisco-recommended vPC design, you can handle Virtual Port Channel (vPC) failure scenarios with confidence. In this lesson, I will discuss the different vPC failure scenarios, their impact on the network, and how to solve each problem the Cisco-recommended way.

vPC Design:

Depending on your requirements, vPC can be designed as one-sided (regular) vPC, double-sided vPC, or multilayer (DCI) vPC. Refer to the Cisco vPC design guide for details on each design.

If you are new to vPC configuration, the earlier vPC articles on this blog are recommended reading.

I will discuss the following failures:

vPC keep-alive link failure
vPC peer-link failure
Member port failure
vPC Peer switch failure
Dual Failure Scenarios
++ Case 1: Peer-link failure, followed by Keep-alive link
++ Case 2: Keep-alive link failure, followed by Peer-link

vPC keep-alive link failure:

Impact:
If only the keep-alive link fails, there is no impact on traffic. Only the heartbeat between the primary and secondary nodes is lost.

Solution:
Restore the link as early as possible. If the peer-link also fails while the keep-alive is down, you end up in the split-brain case described under Dual Failure Scenarios.
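Because a keep-alive failure has no visible impact, it is easy to miss. The following is a minimal Python sketch of a check that could catch it, assuming the status is available as `field : value` text lines like the ones quoted in the comments on this article. The function names and the failure wording in `unhealthy` are assumptions for illustration only; the real wording depends on the platform and release.

```python
def parse_vpc_status(text):
    """Parse 'field : value' lines into a dict keyed by the lowercased field name."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip().lower()] = value.strip()
    return fields


def keepalive_is_healthy(text):
    """Return True only when the keep-alive field reports 'peer is alive'."""
    return parse_vpc_status(text).get("vpc keep-alive status") == "peer is alive"


healthy = """vPC domain id : 70
vPC keep-alive status : peer is alive
vPC role : primary"""

# Hypothetical failure text, used only to exercise the check.
unhealthy = healthy.replace("peer is alive", "peer is not reachable")

print(keepalive_is_healthy(healthy))
print(keepalive_is_healthy(unhealthy))
```

Anything other than the exact healthy string is treated as a failure, so an unexpected or missing field raises an alert instead of passing silently.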

vPC peer-link failure

Impact:
If the peer-link fails, all the vPC member ports on the vPC secondary node are suspended. It is important to note that the keep-alive is still active in this scenario, which allows the nodes to keep exchanging heartbeats.
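The secondary node's reaction depends on whether it can still hear the primary over the keep-alive. The following is a small Python sketch of that decision as described in this article; the function name and the returned strings are illustrative, not NX-OS output.

```python
def secondary_reaction_to_peer_link_loss(keepalive_up):
    """Describe what the secondary node does when the peer-link goes down."""
    if keepalive_up:
        # The primary is still alive, so the secondary steps aside.
        return "suspend vPC member ports"
    # No heartbeat: the secondary assumes the primary is dead.
    return "become operational primary"


print(secondary_reaction_to_peer_link_loss(keepalive_up=True))
print(secondary_reaction_to_peer_link_loss(keepalive_up=False))
```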

Member port failure

Impact:
If a member port fails for a particular end host, only that host is affected. All other member ports remain operational. If one link fails, the host's traffic flows through the other interface. If both links fail, the result is a full outage for that end host.

Solution:
Restore the failed member ports and make sure they are up and running.
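The per-host impact can be written as a tiny Python sketch. The host names and the `host_state` function are hypothetical; each list holds one boolean per member link of that host's port channel.

```python
def host_state(links_up):
    """Classify a host from the up/down state of its vPC member links."""
    up = sum(links_up)
    if up == 0:
        return "outage"      # every member link is down
    if up < len(links_up):
        return "degraded"    # traffic uses the remaining link(s)
    return "redundant"       # all member links are up


# Hypothetical hosts, each dual-homed to the two vPC peers.
hosts = {
    "host-a": [True, True],
    "host-b": [True, False],
    "host-c": [False, False],
}

for name, links in hosts.items():
    print(name, host_state(links))
```

Only `host-c` loses connectivity here; a failure on one host's links does not change the state of any other host.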

vPC Peer switch failure

Impact:
If the primary switch in the vPC fails, the secondary switch is promoted to operational primary and forwards all the traffic. If the secondary switch fails, the primary keeps forwarding traffic as before.

Solution:
Bring the peer switch up. Next, bring up the keep-alive link and make sure it is operational. Then move on to the peer-link and, lastly, the member ports.
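The recovery order above can be captured as a simple checklist. This is a minimal Python sketch; the step names and helper functions are labels chosen for illustration, and the role string follows the form seen in the output quoted in the comments on this article.

```python
# Recovery order stated in this article for a peer switch failure.
RECOVERY_ORDER = ["peer switch", "keep-alive", "peer-link", "member ports"]


def next_recovery_step(restored):
    """Return the first step not yet restored, or None when all are done."""
    for step in RECOVERY_ORDER:
        if step not in restored:
            return step
    return None


def operational_role_after_peer_failure(surviving_role):
    """Role of the surviving switch once its peer has failed."""
    if surviving_role == "secondary":
        return "secondary, operational primary"
    return surviving_role  # a surviving primary simply stays primary


print(next_recovery_step({"peer switch"}))
print(operational_role_after_peer_failure("secondary"))
```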

Dual Failure Scenarios

For dual failure scenarios, we will discuss the following two cases.

1. Peer-link failure, followed by keep-alive link
2. Keep-alive link failure, followed by peer-link

Case 1: Peer-link failure, followed by Keep-alive link
Here, the member ports on the secondary node are suspended first because the peer-link is down, but the heartbeat is still exchanged over the keep-alive link. Traffic flows through the primary peer switch. If the keep-alive then fails, the suspended ports remain suspended and all the traffic keeps flowing through the primary node.

Solution:
Bring up the keep-alive link first, and then work on the peer-link. You should maintain this order.

Case 2: Keep-alive link failure, followed by Peer-link
This failure is the most critical. If the keep-alive link fails first, nothing happens because the vPC peer roles are already decided. However, if the peer-link dies after the keep-alive, the secondary vPC node assumes that the primary node is completely down because it receives no heartbeat from it. The secondary node therefore becomes operational primary. In this case, both vPC nodes forward traffic. This is called a split-brain scenario in vPC.
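A split brain can be spotted by comparing the role each switch reports. The following is a hedged Python sketch, assuming role strings of the form shown in the output quoted in the comments on this article ("primary" and "secondary, operational primary"). The function names are hypothetical.

```python
def is_operational_primary(role):
    """True when the role string means the switch is acting as primary."""
    role = role.strip().lower()
    return role == "primary" or role.endswith("operational primary")


def is_split_brain(role_a, role_b):
    """Both peers acting as primary at the same time indicates split brain."""
    return is_operational_primary(role_a) and is_operational_primary(role_b)


print(is_split_brain("primary", "secondary, operational primary"))
print(is_split_brain("primary", "secondary"))
```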

Solution:
Make sure all the member ports on the secondary switch are down. Then bring up the keep-alive link. After the heartbeat (keep-alive) is restored, bring the peer-link up. Once the vPC forms, bring up the member ports.
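To see why the order of the two failures matters, both cases can be replayed in a toy model. This Python sketch encodes only the rules stated in this article, not full NX-OS behaviour; the class and method names are made up for illustration, and the primary node is assumed to stay up throughout.

```python
class VpcPair:
    """Toy model of a vPC pair, seen from the secondary node."""

    def __init__(self):
        self.keepalive_up = True
        self.peer_link_up = True
        self.secondary_ports = "forwarding"
        self.secondary_role = "secondary"

    def fail_keepalive(self):
        # Losing only the heartbeat changes nothing: roles are already decided.
        self.keepalive_up = False

    def fail_peer_link(self):
        self.peer_link_up = False
        if self.keepalive_up:
            # The secondary still hears the primary, so it suspends its ports.
            self.secondary_ports = "suspended"
        else:
            # No heartbeat: the secondary assumes the primary is dead.
            self.secondary_role = "secondary, operational primary"

    @property
    def split_brain(self):
        # Primary is assumed alive, so a forwarding "primary" secondary is dual-active.
        return (self.secondary_role != "secondary"
                and self.secondary_ports == "forwarding")


def replay(events):
    """Apply the named failure events in order and return the final state."""
    pair = VpcPair()
    for event in events:
        getattr(pair, event)()
    return pair


case1 = replay(["fail_peer_link", "fail_keepalive"])
case2 = replay(["fail_keepalive", "fail_peer_link"])
print("Case 1 split brain:", case1.split_brain)
print("Case 2 split brain:", case2.split_brain)
```

The same two failures produce different end states: in Case 1 the secondary's ports are already suspended when the heartbeat is lost, while in Case 2 the secondary takes over with its ports still forwarding.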

Aswath

August 12, 2021 at 2:46 am

If the keep-alive went down and we did not notice it, what will the impact be if the keep-alive link is not brought back up?

Rajib Kumer Das
August 19, 2021 at 11:01 pm
I already discussed, what will happen if keep-alive goes down.

Farzana

July 31, 2021 at 12:16 am

If the keep-alive link goes down, how does a peer know the live status of the other peer?
Does the heartbeat flow through the peer-link?

Rajib Kumer Das
August 11, 2021 at 4:19 pm
Hi Farzana, the heartbeat only flows through the keep-alive link.

Arumugam

June 12, 2021 at 9:19 am

Really very nice and live scenario basis issues explained clearly with diagram.

Rajib Kumer Das
June 12, 2021 at 12:06 pm
Thanks Arumugam..

uger

January 12, 2021 at 7:51 pm

Thanks for the article. I understand that the order of the failures is quite important, but why is the order of bringing the ports back up important as well? Say the primary switch reloaded and came back with all ports down. What happens if I bring up the peer-link first and then the keep-alive?

Rajib Kumer Das
January 12, 2021 at 8:00 pm
If you bring up the peer-link first, how will the nodes decide which is primary and which is secondary?
1. uger
  January 13, 2021 at 12:00 am
  I labbed it, and I think it doesn't make a big difference. I shut down the keep-alive, followed by the peer-link, and saw both switches as primary. When I enabled only the keep-alive, they were still in split-brain state; both stayed primary until I enabled the peer-link too.
  The reverse order also seems to end up with the same result. I put them in split-brain state and enabled the peer-link first; their roles didn't change until I enabled the keep-alive. I think the role change doesn't happen until both the keep-alive and the peer-link come up.
  This is the output for the keep-alive up, peer-link down state:
  switch1:
  vPC domain id : 70
  Peer status : peer link is down
  vPC keep-alive status : peer is alive
  vPC role : primary
  switch2:
  vPC domain id : 70
  Peer status : peer link is down
  vPC keep-alive status : peer is alive
  vPC role : secondary, operational primary
  Number of vPCs configured : 2

thil@gmail.com

November 6, 2020 at 3:07 pm

great

Rajib Kumer Das
November 6, 2020 at 6:46 pm
thanks 🙂

Ashutosh Malik

June 25, 2020 at 2:01 pm

Nicely explained , I read several docs but this is best.
Thanks

Rajib Kumer Das
June 25, 2020 at 2:24 pm
Glad to know that. Thanks Malik.

Freddy

June 22, 2020 at 7:13 am

Dude great stuff, you kill it.
Can you give me a spanning-tree best practice? I’m very new when it comes to Data Center stuff but I’m catching up really fast.

Rajib Kumer Das
June 22, 2020 at 6:28 pm
Hi Freddy, i will publish articles on spanning-tree soon. Thanks for your comment.

vetri C

June 4, 2020 at 1:08 pm

Well write up buddy.. Kudos..

Rajib Kumer Das
June 4, 2020 at 1:24 pm
Thank you so much.
