Redundant links with Virtual PortChannels

Traditional spanning tree poses challenges for network designers because it blocks every redundant path between two Ethernet switches. The drawbacks of the Spanning Tree Protocol (STP) prove extremely expensive in data centers that use multiple redundant links for mission-critical applications: with two uplinks, one is blocked, essentially wasting 50% of your bandwidth.

You can use Etherchannels to scale bandwidth at Layer 2 because the bundled links appear as a single link to higher-level protocols, so all ports in the channel forward or block together for a given VLAN. You should aim to design every link in a data center as an Etherchannel, as this optimizes both bandwidth and reliability. Servers can attach to the access switches with Etherchannels, uplinks from the access layer can be Etherchannels, and core links can also be bundled (a channel can also be a Layer 3 link). Most switches support 8 ports in a bundle, and Nexus platforms support up to 16 ports. It is recommended that you build each Etherchannel from ports on different line cards in each switch, so that the failure of a single line card does not take down the entire channel.
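The line-card recommendation can be checked before deployment. The following Python sketch is a minimal illustration only. It assumes a hypothetical "slot/port" naming scheme (for example `2/1`) and uses the 8-member limit mentioned above as a default. It reports whether a planned bundle keeps at least one member after any single line card fails, and what fraction of members survives the worst single-card failure.

```python
from collections import Counter


def survives_single_card_failure(member_ports, max_members=8):
    """Return True if the bundle keeps a member after losing any one line card.

    member_ports: hypothetical "slot/port" strings such as "2/1".
    max_members: bundle size limit (8 on most switches, up to 16 on Nexus).
    """
    if not 1 <= len(member_ports) <= max_members:
        raise ValueError(f"a bundle needs between 1 and {max_members} members")
    # Group members by line card: the slot number before the "/".
    members_per_card = Counter(port.split("/")[0] for port in member_ports)
    # If every member sits on one card, that card's failure kills the channel.
    return len(members_per_card) > 1


def worst_case_remaining(member_ports):
    """Fraction of members left after the most heavily used line card fails."""
    members_per_card = Counter(port.split("/")[0] for port in member_ports)
    total = len(member_ports)
    return (total - max(members_per_card.values())) / total


print(survives_single_card_failure(["1/1", "1/2", "2/1", "2/2"]))  # True
print(survives_single_card_failure(["3/1", "3/2", "3/3", "3/4"]))  # False
print(worst_case_remaining(["1/1", "1/2", "1/3", "2/1"]))          # 0.25
```

Spreading members evenly across cards maximizes the worst-case remaining capacity, not just survival.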

Link aggregation (Etherchannel and IEEE 802.3ad) was developed to address that limitation when two Ethernet switches are connected through multiple uplinks. However, it did not address the challenge of deploying link aggregation on triangular topologies in the data center. Traditional link aggregation (LAG) is limited in that the standard only allows the aggregated links to terminate on a single switch. Technologies such as vPC (Virtual PortChannels) and VSS were introduced to overcome this limitation.

As part of the IEEE 802.3ad standard, the Link Aggregation Control Protocol (LACP) was created to negotiate the channel, and it is recommended to use LACP when building a bundle. An LACP port runs in either active or passive mode. Active mode means the port actively initiates LACP negotiation, whereas a passive port only responds and never initiates it. You can form channels between active and passive ports or between two active ports, but not between two passive ports.
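The active/passive rule fits in a few lines. This Python sketch models only the negotiation outcome described above, not the LACP state machine itself. The function name is hypothetical.

```python
VALID_MODES = {"active", "passive"}


def lacp_channel_forms(mode_a: str, mode_b: str) -> bool:
    """Return True if two LACP ports in these modes negotiate a channel."""
    if mode_a not in VALID_MODES or mode_b not in VALID_MODES:
        raise ValueError("LACP mode must be 'active' or 'passive'")
    # A passive port only answers LACPDUs, so at least one side must initiate.
    return "active" in (mode_a, mode_b)


for a, b in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
    print(f"{a:7} + {b:7} -> channel forms: {lacp_channel_forms(a, b)}")
```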

Link Aggregation

When a Layer 2 frame is forwarded to a PortChannel, a hash function determines which physical link carries the frame. The load-balancing methods available on Nexus switches are granular and include the following:

a) Destination IP address

b) Destination MAC address

c) Destination TCP and UDP port number

d) Source and Destination IP address

e) Source and Destination MAC address

f) Source and Destination TCP and UDP port numbers

g) Source IP address

h) Source MAC address

i) Source TCP and UDP port number
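The sketch below illustrates the principle behind these methods. Nexus hardware uses its own platform-specific hash, so `zlib.crc32` here is only a stand-in, and the method labels are hypothetical rather than NX-OS configuration keywords. The selected fields are hashed and reduced modulo the number of member links. Because the same fields always produce the same hash, every frame of a flow uses the same member link, which preserves frame ordering within the flow.

```python
import zlib

# Packet fields that feed the hash for a few of the methods listed above.
# The method labels are hypothetical, not NX-OS configuration keywords.
METHOD_FIELDS = {
    "dst-ip": ("dst_ip",),
    "src-ip": ("src_ip",),
    "src-dst-ip": ("src_ip", "dst_ip"),
    "src-dst-mac": ("src_mac", "dst_mac"),
    "src-dst-l4port": ("src_port", "dst_port"),
}


def select_member(frame: dict, method: str, num_links: int) -> int:
    """Return the index (0..num_links-1) of the member link that carries the frame."""
    key = "|".join(str(frame[field]) for field in METHOD_FIELDS[method])
    # zlib.crc32 stands in for the switch's platform-specific hardware hash.
    return zlib.crc32(key.encode()) % num_links


frame = {
    "src_ip": "10.0.0.1", "dst_ip": "10.0.1.9",
    "src_mac": "aa:aa:aa:00:00:01", "dst_mac": "bb:bb:bb:00:00:02",
    "src_port": 49152, "dst_port": 443,
}
for method in METHOD_FIELDS:
    print(method, "-> member link", select_member(frame, method, num_links=4))
```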

It’s important to monitor the traffic distribution across the physical links to detect polarized links. Polarization occurs when some links attract more traffic than others, resulting in heavy utilization of some links and low utilization of others. Before you choose a load-balancing method, analyse the traffic flows from source to destination and determine whether traffic is one-to-many or evenly spread. For example, I would not use the source IP address method to load balance traffic from a firewall performing Network Address Translation (NAT) to a single address: every translated flow carries the same source IP address, so all of it hashes onto the same member link.
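The NAT example can be reproduced with a small simulation. This is a minimal Python sketch under stated assumptions: `zlib.crc32` stands in for the hardware hash, the firewall address 203.0.113.10 and the destination range are hypothetical, and only two methods (source IP and source plus destination IP) are compared across a four-link bundle.

```python
import zlib
from collections import Counter


def member_for(key: str, num_links: int) -> int:
    # zlib.crc32 stands in for the platform's hardware hash.
    return zlib.crc32(key.encode()) % num_links


def link_load(flows, method: str, num_links: int = 4) -> list:
    """Return the number of flows hashed onto each member link."""
    counts = Counter()
    for src_ip, dst_ip in flows:
        key = src_ip if method == "src-ip" else f"{src_ip}|{dst_ip}"
        counts[member_for(key, num_links)] += 1
    return [counts[link] for link in range(num_links)]


# Hypothetical firewall that NATs every flow to the single address 203.0.113.10.
flows = [("203.0.113.10", f"10.1.{i // 256}.{i % 256}") for i in range(1000)]
print("src-ip     :", link_load(flows, "src-ip"))      # one link carries all 1000 flows
print("src-dst-ip :", link_load(flows, "src-dst-ip"))  # flows spread over several links
```

Adding a field that varies between flows (here the destination address) is what breaks the polarization.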

Note: Routing protocols see the channel as one link. If you have 8 x 10GE ports in one bundle with an OSPF cost of 10 and one member link fails, OSPF still advertises the bundle with the same cost even though it has lost one eighth (10 Gbps) of its 80 Gbps capacity. Routing protocols don’t dynamically change their metrics because of a member link failure.

As mentioned previously, the inability to build triangles with link aggregation can be overcome by deploying either the Nexus technology known as virtual PortChannels (vPCs) or the Catalyst technology known as Virtual Switching System (VSS). Both VSS and vPC allow a LAG to terminate on two separate switches, resulting in a triangular design. They group two physical switches into a single logical switch from the point of view of any downstream device (switch or server).

VSS and VPC

vPC and VSS offer the following benefits:

a) Improved convergence after link and device failures.

b) No STP-blocked links (reduced dependence on STP).

c) Independent control planes (vPC only; VSS uses a single control plane).

d) Increased bandwidth by combining all links into one from the perspective of STP.

vPC and VSS are similar technologies, but the Nexus vPC feature has dual control planes and supports In-Service Software Upgrade (ISSU), which allows you to upgrade one of the two switches without causing a service interruption. Because the control plane runs independently on each vPC peer, the failure of one peer does not take down the virtual switch. With VSS, only the active chassis runs the control plane, so a failure of the active peer forces a control-plane switchover to the standby chassis instead of being absorbed by an independent peer. It is worth noting that vPC still falls back to STP, and the reliance on STP can only be fully removed with Cisco FabricPath or TRILL. VSS is available on the Catalyst platforms, while vPC is solely a Nexus technology.

PortChannel

vPC Terminology:

vPC Peer – a vPC switch, one of a pair.

vPC member port – one of a set of ports that form a vPC.

vPC – the combined port channel between the vPC peers and the downstream device.

vPC peer-link – link used to synchronize state between vPC peer devices; it must be 10GbE. vPC control plane communication runs over this link, and Ethernet frames carried over it receive special treatment to avoid loops on the vPC member ports.

vPC peer keep-alive link – the keepalive link between the vPC peer devices. It is recommended to use the mgmt0 interface in the management VRF. If the mgmt0 interface is not available, use a routed interface in a dedicated (non-default) VRF.

vPC VLAN – one of the VLANs carried over the vPC peer link and used to communicate via the vPC with a peer device.

non-vPC peer VLAN – One of the STP VLANs not carried over the peer-link.

CFS – Cisco Fabric Services protocol, used for state synchronization and configuration validation between vPC peer devices.

Within a vPC domain, each peer is assigned a role, primary or secondary. By default, the switch with the lowest system MAC address becomes the primary peer. The vPC domain ID identifies the pair of switches, and both peers derive a shared system MAC address from it, which is used as the logical switch's bridge ID in STP communication.
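The election can be sketched as a comparison of two keys: role priority first, then system MAC address. This Python model is a minimal sketch that covers only the initial election (it ignores what happens when a peer reloads later). The switch names and MAC addresses are hypothetical. The default priority of 32667 is the NX-OS default as I understand it and should be verified on your release.

```python
from dataclasses import dataclass

DEFAULT_ROLE_PRIORITY = 32667  # assumed NX-OS default vPC role priority; verify on your release


@dataclass
class VpcPeer:
    name: str
    system_mac: str  # hypothetical, e.g. "00:22:55:aa:bb:01"
    role_priority: int = DEFAULT_ROLE_PRIORITY


def mac_to_int(mac: str) -> int:
    return int(mac.replace(":", ""), 16)


def elect_primary(a: VpcPeer, b: VpcPeer) -> VpcPeer:
    """Lower role priority wins; on a tie, the lower system MAC wins."""
    return min((a, b), key=lambda p: (p.role_priority, mac_to_int(p.system_mac)))


n1 = VpcPeer("nexus-1", "00:22:55:aa:bb:02")
n2 = VpcPeer("nexus-2", "00:22:55:aa:bb:01")
print(elect_primary(n1, n2).name)  # nexus-2: priorities tie, lower MAC wins
n1.role_priority = 100
print(elect_primary(n1, n2).name)  # nexus-1: lower priority value wins
```

Setting the priority explicitly, as recommended below, removes the dependence on whichever MAC address happens to be lower.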

Below are the best practices to consider for implementation:

a) Manually define which vPC switch is primary and which is secondary using the role priority. The lower the priority value, the more preferred the switch is as primary.

b) Build the vPC peer-link as a Layer 2 port channel from ports on different 10GE modules of the Nexus switch, with the ports in dedicated mode.

c) If the vPC peer keepalive link does not use mgmt0, build it as a port channel from ports on different 10GE modules of the Nexus switch, placed in a non-default VRF.

d) Enable Bridge Assurance (BA) on the vPC peer-link interface (enabled by default).

e) Enable UDLD aggressive mode on the vPC peer-link interface.

f) Configure the STP root bridge for a VLAN, the active HSRP router, and the PIM DR on the primary vPC switch. Likewise, configure the secondary STP root bridge and the standby HSRP router on the secondary vPC switch. The Layer 2 and Layer 3 topologies should match.

If you want even more redundancy, you can combine vPC with Fabric Extenders (FEX). A FEX acts as a remote line card of its parent switch and can be used with vPC in three forms. The first, known as host vPC, is a vPC southbound from the FEX to the server. The second, sometimes called fabric vPC, is a vPC northbound from the FEX to the parent switches. The third, known as Enhanced vPC, combines a southbound and a northbound vPC.
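The three forms follow from two independent design choices: whether the server is dual-homed to two FEXs, and whether each FEX is dual-homed to two parent switches. This Python sketch simply maps those two choices onto the names used above. The enum and function names are hypothetical.

```python
from enum import Enum


class FexVpcForm(Enum):
    NONE = "no vPC"
    HOST_VPC = "host vPC (southbound, FEX to server)"
    FABRIC_VPC = "fabric vPC (northbound, FEX to parent switches)"
    ENHANCED_VPC = "Enhanced vPC (southbound and northbound)"


def classify_fex_design(server_dual_homed_to_two_fex: bool,
                        fex_dual_homed_to_two_parents: bool) -> FexVpcForm:
    """Map the two dual-homing choices onto the vPC forms described above."""
    if server_dual_homed_to_two_fex and fex_dual_homed_to_two_parents:
        return FexVpcForm.ENHANCED_VPC
    if server_dual_homed_to_two_fex:
        return FexVpcForm.HOST_VPC
    if fex_dual_homed_to_two_parents:
        return FexVpcForm.FABRIC_VPC
    return FexVpcForm.NONE


for server_dual in (False, True):
    for fex_dual in (False, True):
        print(server_dual, fex_dual, classify_fex_design(server_dual, fex_dual).value)
```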

Because vPC appears as a single connection to higher-level protocols such as STP or OSPF, it can be used as a Layer 2 extension for a data center interconnect (DCI), but only over short distances and only over dark fiber or protected DWDM. vPC best practices still apply: use a different vPC domain at each site, and carry Layer 3 communication between vPC peers on dedicated routed links. If you want to connect more than two data centers in a full-mesh topology, you should opt for OTV or VPLS as the DCI mechanism. vPC can interconnect two or more data centers, but the topology must be hub and spoke, and any spoke-to-spoke communication must flow through the hub. Whether you connect two data centers back to back or more than two in a hub-and-spoke design, the Layer 2 boundary and STP isolation can be achieved with bridge protocol data unit (BPDU) filtering on the DCI links. BPDU filtering stops BPDUs from being transmitted on a link, essentially turning off STP on the DCI links.
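Because BPDU filtering turns STP off on the DCI links, nothing blocks a loop formed between sites, so the site-level topology must itself be loop free. The Python sketch below is a minimal design check under that assumption. It treats each DCI connection (one vPC bundle) as a single logical link, uses hypothetical site names, and also shows the hub transit for spoke-to-spoke traffic.

```python
def is_loop_free_hub_and_spoke(hub: str, sites: set, dci_links: list) -> bool:
    """True if every DCI link joins the hub to a spoke, exactly once per spoke.

    Each entry in dci_links is one logical DCI connection (one vPC bundle).
    With BPDU filtering on the DCI links, STP cannot block an inter-site loop,
    so any extra link (spoke to spoke, or a second hub link) is rejected.
    """
    expected = {frozenset((hub, site)) for site in sites if site != hub}
    actual = [frozenset(link) for link in dci_links]
    return len(actual) == len(expected) and set(actual) == expected


def site_path(hub: str, src: str, dst: str) -> list:
    """Site-level path between two data centers; spoke to spoke transits the hub."""
    if src == dst:
        return [src]
    if hub in (src, dst):
        return [src, dst]
    return [src, hub, dst]


sites = {"DC1", "DC2", "DC3"}
links = [("DC1", "DC2"), ("DC1", "DC3")]
print(is_loop_free_hub_and_spoke("DC1", sites, links))                     # True
print(is_loop_free_hub_and_spoke("DC1", sites, links + [("DC2", "DC3")]))  # False
print(site_path("DC1", "DC2", "DC3"))                                      # ['DC2', 'DC1', 'DC3']
```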

VPC as a Data Center Interconnect ( DCI )

vPC has a built-in loop prevention rule: a frame received over the peer link is never forwarded out of a vPC member port. Under normal operations, a vPC peer switch should never learn MAC addresses over the peer link, which mostly carries flooded, multicast, broadcast, and control plane traffic. Because the LAG terminates on two peer switches, you don’t want traffic received from a downstream device to be sent back down to that same device, which would create a loop. However, this rule does not apply to:

a) Non-vPC interfaces (orphan ports), and

b) vPC member ports that are active only on the receiving peer.

Note: An orphan port is a port to a downstream device that is connected to only one vPC peer.
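The rule and its two exceptions can be written as one forwarding decision. This Python function is a simplified model, not NX-OS code. The parameter names and the interface names in the usage loop are hypothetical, and it assumes the receiving peer knows whether each vPC still has active members on the peer that sent the frame.

```python
def may_forward(from_peer_link: bool, egress_vpc, vpc_active_on_other_peer: dict) -> bool:
    """Decide whether a vPC peer may send a frame out of an egress port.

    from_peer_link: True if the frame arrived over the vPC peer link.
    egress_vpc: vPC number of the egress port, or None for an orphan (non-vPC) port.
    vpc_active_on_other_peer: vPC number -> True if that vPC still has active
        member ports on the peer that sent the frame across the peer link.
    """
    if not from_peer_link:
        return True  # the rule only restricts frames received over the peer link
    if egress_vpc is None:
        return True  # exception (a): orphan ports
    # Exception (b): the vPC is active only on this peer, so the other peer
    # could not deliver the frame itself.
    return not vpc_active_on_other_peer[egress_vpc]


# A broadcast arrives over the peer link; vPC 20 has lost all members on the other peer.
state = {10: True, 20: False}
for port, vpc in [("Eth1/1", 10), ("Eth1/2", 20), ("Eth1/3", None)]:
    print(port, "forward" if may_forward(True, vpc, state) else "drop")
```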

VPC Broadcast Frame Forwarding

As mentioned previously, under normal operations the vPC peer link should not be used for end-host reachability. However, if all members of a vPC fail on one peer, the peer link forwards frames to the remaining member ports of that vPC on the other peer. This is why Cisco recommends 10GE ports for the peer link: it must have enough capacity to carry this traffic during a failure.

The peer keepalive link is also mandatory and acts as a heartbeat mechanism, carrying UDP datagrams between the peers. It prevents a dual-active (split-brain) scenario in which both peers act as primary at the same time. If the peer link fails while heartbeats are still received, the secondary peer suspends its vPC member ports. If the peer link fails and no heartbeat is received within a configurable timeout, the secondary vPC peer takes over the primary role and keeps its vPC member ports active. There is, however, undesirable behavior if you have an orphan port connected to only one peer: after a vPC peer link failure, the orphan ports on the secondary peer remain active even though they are now isolated from the rest of the network. In this case, it is recommended to configure a non-vPC trunk between the peer switches.
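The secondary peer's reaction depends on two inputs: peer-link state and whether heartbeats arrived within the timeout. This Python sketch is a simplified model of that decision, not the NX-OS implementation. The enum and function names are hypothetical, and timers and recovery are left out.

```python
from enum import Enum


class SecondaryAction(Enum):
    NORMAL = "keep forwarding normally"
    SUSPEND_VPC_PORTS = "suspend vPC member ports; orphan ports stay up"
    TAKE_OVER_PRIMARY = "become operational primary; keep vPC member ports up"


def secondary_reaction(peer_link_up: bool, heartbeat_within_timeout: bool) -> SecondaryAction:
    """Model how the secondary vPC peer reacts to peer-link and keepalive state."""
    if peer_link_up:
        # A keepalive failure alone does not change forwarding.
        return SecondaryAction.NORMAL
    if heartbeat_within_timeout:
        # Peer link down but the primary is alive: suspend vPC ports to avoid
        # dual-active. Orphan ports are not vPC member ports and become isolated.
        return SecondaryAction.SUSPEND_VPC_PORTS
    # Peer link down and no heartbeat: assume the primary has failed.
    return SecondaryAction.TAKE_OVER_PRIMARY


for link_up, hb in [(True, True), (False, True), (False, False)]:
    print(f"peer link up={link_up}, heartbeat={hb}: {secondary_reaction(link_up, hb).value}")
```

The middle case is the one that strands orphan ports, which is why the non-vPC trunk between the peers is recommended.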


About Matt Conran

Matt Conran has created 184 entries.