Cisco Switch Virtualization Nexus 1000v

Virtualization has become integral to modern data centers in today's digital landscape. With the increasing demand for agility, flexibility, and scalability, organizations are turning to virtual networking solutions to meet their evolving needs. One such solution is the Nexus 1000v, a virtual network switch offering comprehensive features and functionalities. In this blog post, we will delve into the world of the Nexus 1000v, exploring its key features, benefits, and use cases.

The Nexus 1000v is a distributed virtual switch that operates at the hypervisor level, providing advanced networking capabilities for virtual machines (VMs). It is designed to integrate seamlessly with VMware vSphere, offering enhanced network visibility, control, and security.

Cisco Switch Virtualization is a revolutionary concept that allows network administrators to create multiple virtual switches on a single physical switch. By abstracting the network functions from the hardware, it provides enhanced flexibility, scalability, and efficiency. With Cisco Switch Virtualization, businesses can maximize resource utilization and simplify network management.

At the forefront of Cisco's Switch Virtualization portfolio is the Nexus 1000v. This powerful platform brings the benefits of virtualization to the data center, enabling seamless integration between virtual and physical networks. By extending Cisco's renowned networking capabilities into the virtual environment, Nexus 1000v empowers organizations to achieve consistent policy enforcement, enhanced security, and simplified operations.

The Nexus 1000v boasts a wide range of features that make it a compelling choice for network administrators. From advanced network segmentation and traffic isolation to granular policy control and deep visibility, this platform has it all. By leveraging the power of Cisco's Virtual Network Services (VNS), organizations can optimize their network infrastructure, streamline operations, and deliver superior performance.

Deploying Cisco Switch Virtualization, specifically the Nexus 1000v, requires careful planning and consideration. Organizations must evaluate their network requirements, ensure compatibility with existing infrastructure, and adhere to best practices. From designing a scalable architecture to implementing proper security measures, attention to detail is crucial to achieve a successful deployment.

To truly understand the impact of Cisco Switch Virtualization, it's essential to explore real-world use cases and success stories. From large enterprises to service providers, organizations across various industries have leveraged the power of Nexus 1000v to transform their networks. This section will highlight a few compelling examples, showcasing the versatility and value that Cisco Switch Virtualization brings to the table.

Highlights: Cisco Switch Virtualization Nexus 1000v

Hypervisor and vSphere Introduction

A hypervisor, also known as a virtual machine manager, allows a single hardware host to run multiple operating systems. The hypervisor controls the host’s processor, memory, and other resources and allocates to each guest operating system what it needs. The guest operating systems, or virtual machines, run on top of the hypervisor.

Designed specifically for integration with VMware vSphere environments, the Cisco Nexus 1000V Series Switch runs Cisco NX-OS software. Enterprise-class performance and scalability are delivered by VMware vSphere 2.0 across multiple platforms. The Nexus 1000V runs within the VMware ESX hypervisor. With the Cisco Nexus 1000V Series, you can take advantage of Cisco VN-Link server virtualization technology to provide:

• Policy-based virtual machine (VM) connectivity

• Mobile VM security and network policy

• Nondisruptive operational model for your server virtualization and networking teams

Virtual servers can be configured with the same network configuration, security policy, diagnostic tools, and operational models as physical servers. The Cisco Nexus 1000V Series is also compatible with VMware vSphere, vCenter, ESX, and ESXi.

A brief overview of the Nexus 1000V system

There are two primary components of the Cisco Nexus 1000V Series switch:

  • VEM (Virtual Ethernet Module): Executes inside the hypervisor
  • VSM (Virtual Supervisor Module): An external module that manages the VEMs

The Nexus 1000v implements the generic concept of a Cisco Distributed Virtual Switch (DVS). The Cisco Nexus 1000V Virtual Ethernet Module (VEM) executes as part of VMware ESX or ESXi and uses the VMware vNetwork Distributed Switch (vDS) application programming interface (API).

Because this API integrates with VMware VMotion and Distributed Resource Scheduler (DRS), advanced networking capabilities can be provided to virtual machines. The VEM performs Layer 2 switching and advanced networking functions based on configuration information from the VSM.

Nexus Switch Virtualization

**Virtual routing and forwarding**

Virtual routing and forwarding form the basis of this stack. Firstly, network virtualization comes with two primary methods: 1) one to many and 2) many to one. The “one to many” method segments one physical network into multiple logical segments. Conversely, the “many to one” method consolidates numerous physical devices into one logical entity. By definition, they seem to be opposites, but both fall under the umbrella of network virtualization.

Before you proceed, you may find the following posts helpful:

  1. Container Based Virtualization
  2. Virtual Switch
  3. What is VXLAN
  4. Redundant Links
  5. WAN Virtualization
  6. What Is FabricPath

Network virtualization

Before we get stuck into Cisco virtualization, let us address some basics. Suppose multiple virtual endpoints share a physical network, but different virtual endpoints belong to different customers, so the communication between these endpoints needs to be isolated. In other words, the network is a resource, too, and network virtualization is the technology that enables the sharing of a common physical network infrastructure.

Virtualization uses software to simulate traditional hardware platforms and create virtual software-based systems. For example, virtualization allows specialists to construct a single virtual network or partition a physical network into multiple virtual networks.

Cisco Switch Virtualization: Logical segmentation: One to many

The Cisco switch virtualization design uses one-to-many network virtualization: a single physical network is logically segmented into multiple virtual networks. For example, each virtual network could correspond to a user group or a specific security function.

End-to-end path isolation requires the virtualization of networking devices and their interconnecting links. VLANs have been traditionally used, and hosts from one user group are mapped to a single VLAN. To extend the path across multiple switches at Layer 2, VLAN tagging (802.1Q) can carry VLAN information between switches. These VLAN trunks were created to transport multiple VLANs over a single Ethernet interface.
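As a rough illustration of how an 802.1Q trunk keeps VLANs apart on a shared link, the sketch below inserts and reads the 4-byte tag in a raw Ethernet frame. The frame contents and the helper names `add_vlan_tag` and `read_vlan_id` are made up for this example; the tag layout (TPID 0x8100 followed by a 3-bit priority, a 1-bit DEI, and a 12-bit VLAN ID) follows the 802.1Q standard.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    # Tag Control Information: 3-bit priority, 1-bit DEI (left at 0), 12-bit VLAN ID
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def read_vlan_id(frame: bytes):
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None


# Hypothetical untagged frame: destination MAC, source MAC, EtherType (IPv4), payload
untagged = bytes.fromhex("ffffffffffff" "0050569a0001" "0800") + b"payload"
tagged_101 = add_vlan_tag(untagged, 101)
tagged_201 = add_vlan_tag(untagged, 201)

print(read_vlan_id(tagged_101), read_vlan_id(tagged_201), read_vlan_id(untagged))
# prints: 101 201 None
```

Both tagged frames can travel over the same trunk link, and the receiving switch uses the VLAN ID to decide which logical segment each frame belongs to.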

The diagram below displays two independent VLANs, VLAN201 and VLAN101. These VLANs can share one physical wire to provide L2 reachability between hosts connected to Switch B and Switch A via Switch C, but they remain separate entities.

Nexus1000v: The operation

VLANs are sufficient for small Layer 2 segments. However, today’s networks will likely have a mix of Layer 2 and Layer 3 routed networks. In this case, Layer 2 VLANs alone are insufficient because you must extend the Layer 2 isolation over a Layer 3 device. This can be achieved by using Virtual Routing and Forwarding (VRF), the next step in Cisco switch virtualization. A VRF instance logically carves a Layer 3 device into several isolated, independent Layer 3 devices. VRF instances configured on the same device cannot communicate with each other directly.

The diagram below displays one physical Layer 3 router with three VRFs: VRF Yellow, VRF Red, and VRF Blue. These virtual routing and forwarding instances are completely separated; without explicit configuration, routes in one virtual routing and forwarding instance cannot be leaked to another.
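The separation between VRF instances can be pictured as one routing table per VRF on the same device. The sketch below is a simplified model, not how any router implements it: the `Router` class, the prefixes, and the next-hop addresses are all hypothetical, and only the VRF names come from the text.

```python
import ipaddress


class Router:
    """One physical Layer 3 device holding an independent routing table per VRF."""

    def __init__(self):
        self.vrfs = {}  # VRF name -> list of (network, next hop)

    def add_route(self, vrf, prefix, next_hop):
        self.vrfs.setdefault(vrf, []).append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, vrf, destination):
        """Longest-prefix match restricted to the routes of a single VRF."""
        address = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in self.vrfs.get(vrf, []) if address in net]
        if not matches:
            return None
        return max(matches, key=lambda match: match[0].prefixlen)[1]


router = Router()
# Overlapping prefixes are fine because each VRF has its own table
router.add_route("Yellow", "10.1.0.0/16", "192.0.2.1")
router.add_route("Red", "10.1.0.0/16", "198.51.100.1")
router.add_route("Blue", "172.16.0.0/12", "203.0.113.1")

print(router.lookup("Yellow", "10.1.5.9"))  # prints: 192.0.2.1
print(router.lookup("Red", "10.1.5.9"))     # prints: 198.51.100.1
print(router.lookup("Blue", "10.1.5.9"))    # prints: None
```

The same destination resolves differently, or not at all, depending on the VRF in which the lookup is made, which is the isolation the three-VRF example describes.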

Virtual Routing and Forwarding

The virtualization of the interconnecting links depends on how the virtual routers are connected. If they are physically (directly) connected, you could use a technology known as VRF-lite to separate traffic and 802.1Q to label the data plane. This is known as hop-by-hop virtualization.

However, it’s possible to run into scalability issues when the number of devices grows. This design is typically used when you connect VRF instances back to back, i.e., across no more than two devices.

When the virtual routers are connected over multiple hops through an IP cloud, you can use generic routing encapsulation ( GRE ) or Multiprotocol Label Switching ( MPLS ) virtual private networks.

GRE is probably the simpler of the Layer 3 methods, and it can work over any IP core. GRE can encapsulate the contents and transport them over a network with the network unaware of the packet contents. Instead, the core will see the GRE header, virtualizing the network path.

Cisco Switch Virtualization: The additional overhead

When designing Cisco switch virtualization, you need to consider the additional overhead. GRE adds a further 24 bytes of overhead (a 20-byte outer IP header plus a 4-byte GRE header), so an encapsulated packet can exceed the MTU of the outgoing interface, in which case the forwarding router may have to break the datagram into two fragments. To avoid fragmentation, configure the MTU, MSS, and Path MTU parameters correctly on the outgoing and intermediate routers.
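The 24 bytes of GRE overhead translate directly into a smaller usable MTU and TCP MSS inside the tunnel. The arithmetic below is a minimal sketch that assumes IPv4 without options, a basic GRE header without key or sequence fields, and a 1500-byte link MTU; the function names are illustrative only.

```python
OUTER_IP_HEADER = 20  # bytes, IPv4 header without options
GRE_HEADER = 4        # bytes, basic GRE header without key or sequence fields
INNER_IP_HEADER = 20  # bytes, IPv4 header of the passenger packet
TCP_HEADER = 20       # bytes, TCP header without options


def gre_tunnel_mtu(link_mtu: int) -> int:
    """Largest passenger packet that fits in one outer packet."""
    return link_mtu - OUTER_IP_HEADER - GRE_HEADER


def tcp_mss(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given size."""
    return ip_mtu - INNER_IP_HEADER - TCP_HEADER


def needs_fragmentation(passenger_size: int, link_mtu: int) -> bool:
    """True if the encapsulated packet would exceed the outgoing interface MTU."""
    return passenger_size + OUTER_IP_HEADER + GRE_HEADER > link_mtu


link_mtu = 1500
tunnel_mtu = gre_tunnel_mtu(link_mtu)
print(tunnel_mtu)                           # prints: 1476
print(tcp_mss(tunnel_mtu))                  # prints: 1436
print(needs_fragmentation(1500, link_mtu))  # prints: True
print(needs_fragmentation(1476, link_mtu))  # prints: False
```

Under these assumptions, a full-size 1500-byte packet entering the tunnel is fragmented, while limiting the tunnel MTU to 1476 bytes and the TCP MSS to 1436 bytes keeps every encapsulated packet within the link MTU.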

GRE tunnels are typically static. You only need to configure the tunnel endpoints, and the tunnel will be up as long as you can reach those endpoints. However, more recent designs can establish GRE tunnels dynamically.

GRE over IPsec

MPLS/VPN, on the other hand, is a different beast. It requires signaling to distribute labels and build an end-to-end Label Switched Path (LSP). The label distribution can be done with BGP+label, LDP, or RSVP. Unlike GRE tunnels, MPLS VPNs do not have to manage multiple point-to-point tunnels to provide a full mesh of connectivity. Instead, the LSPs provide connectivity, and the labels on the packets provide traffic separation.
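To see why managing point-to-point tunnels becomes a burden, it helps to count how many a full mesh needs. The short calculation below is plain combinatorics rather than anything specific to Cisco; the site counts are arbitrary examples.

```python
def full_mesh_tunnels(sites: int) -> int:
    """Number of point-to-point tunnels needed to connect every pair of sites."""
    return sites * (sites - 1) // 2


for sites in (3, 10, 50):
    print(sites, full_mesh_tunnels(sites))
# prints:
# 3 3
# 10 45
# 50 1225
```

The tunnel count grows roughly with the square of the number of sites, which is the scaling problem a label-based VPN avoids.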

Cisco switch virtualization: Many to one

Many-to-one network consolidation refers to grouping two or more physical devices into one logical device. Examples of this Cisco switch virtualization technology include the Virtual Switching System (VSS), stackable switches, and Nexus vPC. By default, STP blocks redundant paths. Combining many physical devices into one logical entity allows STP to view the group as a single device, so all ports can remain active.

Software-defined networking takes this concept further; it abstracts the entire network into a single virtual switch. On traditional routers, the control and data planes are on the same device, yet they are decoupled with SDN. The control plane is now on a policy-driven controller, and the data plane is local on the OpenFlow-enabled switch.

Network Virtualization

Server and network virtualization presented the challenge of multiple VMs sharing a single network physical port, such as a network interface controller ( NIC ). The question then arises: how do I link multiple VMs to the same uplink? How do I provide path separation? Today’s networks need to virtualize the physical port and allow the configuration of policies per port.

Nexus 1000

NIC-per-VM design

One way to do this is to have a NIC-per-VM design where each VM is assigned a single physical NIC, and the NIC is not shared with any other VM. The hypervisor, aka virtualization layer, would be bypassed, and the VM would access the I/O device directly. This is known as VMDirectPath.

This direct path or pass-through can improve performance for hosts that utilize high-speed I/O devices, such as 10 Gigabit Ethernet. However, the loss of flexibility and of the ability to move VMs offsets the performance benefits.

Virtual-NIC-per-VM in Cisco UCS (Adapter FEX)

Another way to do this is to create multiple logical NICs on the same physical NIC, such as Virtual-NIC-per-VM in Cisco UCS (Adapter FEX). These logical NICs are assigned directly to VMs, and traffic gets marked with a vNIC-specific tag in hardware (VN-Tag/802.1Qbh).

The actual VN-Tag tagging is implemented in the server NICs so that you can clone the physical NIC in the server to multiple virtual NICs. This technology provides faster switching and enables you to apply a rich set of management features to local and remote traffic.

Software Virtual Switch

The third option is to implement a virtual software switch in the hypervisor. For example, VMware introduced virtual switching capability with its vSphere (ESXi) hypervisor, called the vSphere Distributed Switch (VDS). Initially, VMware offered only a local L2 software switch, which was soon superseded because it lacked a distributed architecture.

Data physically moves between the servers through the external network, but the control plane abstracts this movement to look like one large distributed switch spanning multiple servers. This approach has a single management and configuration point, similar to stackable switches – one control plane with many physical data forwarding paths.

The data does not move through a parent partition but logically connects directly to the network interface through local vNICs associated with each VM.

Network virtualization and Nexus 1000v ( Nexus 1000 )

The VDS introduced by VMware lacked advanced networking features, which led Cisco to introduce the Nexus 1000V software-based switch. The Nexus 1000v is a multi-cloud, multi-hypervisor, and multi-services distributed virtual switch. Its function is to enable communication between VMs.

Nexus1000v: Virtual Distributed Switch.

**Nexus 1000 components: VEM and VSM**

The Nexus 1000v has two essential components:

  1. The Virtual Supervisor Module ( VSM )
  2. The Virtual Ethernet Module ( VEM ).

Compared to a physical switch, the VSM could be viewed as the supervisor, setting up the control plane functions for the data plane to forward efficiently, and the VEM as the physical line cards that do all the packet forwarding. The VEM is the software component that runs within the hypervisor kernel. It handles all VM traffic, including inter-VM frames and Ethernet traffic between a VM and external resources.

The VSM runs NX-OS and handles the control and management planes, which integrate with a cloud manager such as VMware vCenter. You can have two VSMs for redundancy. Both modules remain constantly synchronized through unicast VSM-to-VSM heartbeats to provide stateful failover in the event of an active VSM failure.

The two available communication options for VSM to VEM are:

  1. Layer 2 control mode: The VSM control interface shares the same VLAN with the VEM.
  2. Layer 3 control mode: The VEM and the VSM are in different IP subnets.

The VSM also uses heartbeat messages to detect a loss of connectivity between it and the VEM. However, the VEM does not depend on connectivity to the VSM to perform its data plane functions and will continue forwarding packets if the VSM fails.

With Layer 3 control mode, the heartbeat messages are encapsulated in UDP so that they can be routed between the IP subnets.

Nexus 1000 and VSM best practices

  • L2 control is recommended for new installations.
  • Use MAC pinning instead of LACP.
  • Packet, Control, and Management in the same VLAN.
  • Do not use VLAN 1 for Control and Packet.
  • Use 2 x VSM for redundancy. 

The max latency between VSM and VEM is ten milliseconds. Therefore, a VSM can be placed outside the data center if you have a high-quality DCI link, and the VEM can still be controlled.

Nexus 1000v InterCloud – Cisco switch virtualization

A vital element of the Nexus 1000v is its use case for hybrid cloud deployments and its ability to place workloads in private and public environments via a single pane of glass. In addition, the Nexus 1000v InterCloud addresses the main challenges with hybrid cloud deployments, such as security concerns and control/visibility challenges within the public cloud.

The Nexus 1000v InterCloud works with Cisco Prime Network Services Controller (PNSC) to create a secure L2 extension between the private data center and the public cloud.

This L2 extension is based on the Datagram Transport Layer Security (DTLS) protocol and allows you to securely transfer VMs and network services over a public IP backbone. DTLS is derived from the SSL/TLS protocol and provides communications privacy for datagram protocols, so all data in motion is cryptographically isolated and encrypted.

Nexus 1000 and Hybrid Cloud.

 

Nexus 1000v Hybrid Cloud Components 

| Component | Description |
|---|---|
| Cisco Prime Network Services Controller for InterCloud | A VM that provides a single pane of glass to manage all InterCloud functions |
| InterCloud VSM | Manages port profiles for VMs in the InterCloud infrastructure |
| InterCloud Extender | Provides secure connectivity to the InterCloud Switch in the provider cloud. Installed in the private data center. |
| InterCloud Switch | A virtual machine in the provider data center that has secure connectivity to the InterCloud Extender in the enterprise cloud and to the virtual machines in the provider cloud |
| Cloud Virtual Machines | VMs in the public cloud running workloads |

Prerequisites

| Port | Purpose |
|---|---|
| Port 80 | HTTP access from PNSC for AWS calls and communicating with InterCloud VMs in the provider cloud |
| Port 443 | HTTPS access from PNSC for AWS calls and communicating with InterCloud VMs in the provider cloud |
| Port 22 | SSH from PNSC to InterCloud VMs in the provider cloud |
| UDP 6644 | DTLS data tunnel |
| TCP 6644 | DTLS control tunnel |

VXLAN – Virtual Extensible LAN

The requirement for applications on demand has led to an increased number of required VLANs for cloud providers. The standard 12-bit VLAN identifier, which provides roughly 4,000 VLANs, proved to be a limiting factor in multi-tier, multi-tenant environments, and engineers started to run out of isolation options.

This led to VXLAN, which uses a 24-bit identifier and offers about 16 million logical networks that can cross Layer 3 boundaries. Because VXLAN is a MAC-in-UDP encapsulation, switches can hash on the UDP header to distribute packets efficiently across the links of a port channel.
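The jump from a 12-bit to a 24-bit identifier is easy to check numerically, and the 8-byte VXLAN header that carries the identifier is simple to build by hand. The sketch below follows the header layout in RFC 7348 (8 flag bits, 24 reserved bits, a 24-bit network identifier, 8 reserved bits); the helper names and the example identifier 5000 are illustrative.

```python
import struct

VLAN_ID_BITS = 12
VNI_BITS = 24
print(2 ** VLAN_ID_BITS, 2 ** VNI_BITS)  # prints: 4096 16777216

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the identifier field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given 24-bit network identifier."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError("identifier must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits); word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def read_vni(header: bytes) -> int:
    """Extract the network identifier from a VXLAN header."""
    _, second_word = struct.unpack("!II", header[:8])
    return second_word >> 8


header = build_vxlan_header(5000)
print(header.hex())      # prints: 0800000000138800
print(read_vni(header))  # prints: 5000
```

In a real packet this header sits inside a UDP datagram, between the outer UDP header and the original Ethernet frame.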

VXLAN operations

VXLAN works like a Layer 2 bridge (flood and learn); the VEM does all the heavy lifting: it learns all the VM source MACs and host VXLAN IPs and encapsulates the traffic according to the port profile to which the VM belongs. Broadcast, multicast, and unknown unicast traffic are sent as multicast.

At the same time, known unicast traffic is encapsulated and shipped directly to the destination host’s VXLAN IP, aka the destination VEM. Enhanced VXLAN offers VXLAN MAC distribution and ARP termination, making it more optimal.
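The flood-and-learn behavior can be sketched as a small lookup table kept by each VEM. This is a toy model under stated assumptions, not Cisco's implementation: the `Vem` class, the MAC addresses, the VXLAN IPs, and the multicast group are all hypothetical.

```python
def is_group_mac(mac: str) -> bool:
    """Broadcast and multicast MACs have the lowest bit of the first octet set."""
    return bool(int(mac.split(":")[0], 16) & 1)


class Vem:
    """Toy model of a VEM acting as a VXLAN tunnel endpoint."""

    def __init__(self, vxlan_ip: str, multicast_group: str):
        self.vxlan_ip = vxlan_ip
        self.multicast_group = multicast_group
        self.mac_table = {}  # remote VM MAC -> VXLAN IP of the VEM hosting it

    def learn(self, source_mac: str, source_vxlan_ip: str) -> None:
        """Record where a VM lives, based on a packet received from it."""
        self.mac_table[source_mac] = source_vxlan_ip

    def outer_destination(self, destination_mac: str) -> str:
        """Choose the outer IP destination for an encapsulated frame."""
        if is_group_mac(destination_mac):
            return self.multicast_group  # broadcast or multicast: flood
        # Known unicast goes straight to the remote VEM; unknown unicast is flooded
        return self.mac_table.get(destination_mac, self.multicast_group)


vem = Vem(vxlan_ip="10.0.0.1", multicast_group="239.1.1.1")
remote_vm = "00:50:56:aa:bb:02"

print(vem.outer_destination(remote_vm))            # prints: 239.1.1.1
vem.learn(remote_vm, "10.0.0.2")
print(vem.outer_destination(remote_vm))            # prints: 10.0.0.2
print(vem.outer_destination("ff:ff:ff:ff:ff:ff"))  # prints: 239.1.1.1
```

Before the remote MAC is learned, the frame is flooded to the multicast group; once learned, it is sent as unicast to the VEM that hosts the destination VM.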

VXLAN Mode Packet Functions

| Packet | VXLAN (multicast mode) | Enhanced VXLAN (unicast mode) | Enhanced VXLAN MAC Distribution | Enhanced VXLAN ARP Termination |
|---|---|---|---|---|
| Broadcast / Multicast | Multicast Encapsulation | Replication plus Unicast Encap | Replication plus Unicast Encap | Replication plus Unicast Encap |
| Unknown Unicast | Multicast Encapsulation | Replication plus Unicast Encap | Drop | Drop |
| Known Unicast | Unicast Encapsulation | Unicast Encap | Unicast Encap | Unicast Encap |
| ARP | Multicast Encapsulation | Replication plus Unicast Encap | Replication plus Unicast Encap | VEM ARP Reply |
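The packet-handling matrix above can also be written as a lookup table, which makes the differences between the modes easy to query. The dictionary keys and the function name below are invented for this sketch; the actions are copied from the matrix.

```python
# Mode -> packet type -> action, transcribed from the matrix above
VXLAN_ACTIONS = {
    "multicast": {
        "broadcast_multicast": "Multicast Encapsulation",
        "unknown_unicast": "Multicast Encapsulation",
        "known_unicast": "Unicast Encapsulation",
        "arp": "Multicast Encapsulation",
    },
    "unicast": {
        "broadcast_multicast": "Replication plus Unicast Encap",
        "unknown_unicast": "Replication plus Unicast Encap",
        "known_unicast": "Unicast Encap",
        "arp": "Replication plus Unicast Encap",
    },
    "mac_distribution": {
        "broadcast_multicast": "Replication plus Unicast Encap",
        "unknown_unicast": "Drop",
        "known_unicast": "Unicast Encap",
        "arp": "Replication plus Unicast Encap",
    },
    "arp_termination": {
        "broadcast_multicast": "Replication plus Unicast Encap",
        "unknown_unicast": "Drop",
        "known_unicast": "Unicast Encap",
        "arp": "VEM ARP Reply",
    },
}


def encapsulation_action(mode: str, packet_type: str) -> str:
    """Look up how a VEM treats a packet type in a given VXLAN mode."""
    return VXLAN_ACTIONS[mode][packet_type]


print(encapsulation_action("multicast", "unknown_unicast"))
# prints: Multicast Encapsulation
print(encapsulation_action("mac_distribution", "unknown_unicast"))
# prints: Drop
print(encapsulation_action("arp_termination", "arp"))
# prints: VEM ARP Reply
```

With MAC distribution, every MAC address is already known to the VEMs, so unknown unicast traffic can be dropped rather than flooded.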

vPath – Service chaining

Intelligent Policy-based traffic steering through multiple network services.

vPath allows you to intelligently steer VM traffic to virtualized service devices. It intercepts and redirects the initial traffic to the service node. Once the service node performs its policy function, the result is cached, and the local virtual switch treats the subsequent packets accordingly. In addition, it enables you to tie services together to push the VM traffic through each service as required. Previously, if you wanted to tie services together in a data center, you needed to stitch the VLANs together, which was limited by design and scale.
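The redirect-then-cache behavior described above can be sketched with a per-flow decision cache. This is a conceptual model only, not the vPath protocol: the `VirtualSwitch` class, the flow tuple, and the sample firewall policy (permit HTTPS only) are assumptions made for illustration.

```python
class VirtualSwitch:
    """Toy model: first packet of a flow goes to the service node, the rest use the cache."""

    def __init__(self, service_node):
        self.service_node = service_node  # callable taking a flow, returning a decision
        self.flow_cache = {}              # flow -> cached decision
        self.redirects = 0                # how many packets were sent to the service node

    def handle_packet(self, flow):
        if flow not in self.flow_cache:
            # Initial packet: redirect to the service node and cache its verdict
            self.redirects += 1
            self.flow_cache[flow] = self.service_node(flow)
        # Subsequent packets are handled locally from the cached decision
        return self.flow_cache[flow]


def sample_firewall(flow):
    """Hypothetical policy: permit HTTPS only."""
    _source_ip, _destination_ip, destination_port = flow
    return "permit" if destination_port == 443 else "deny"


switch = VirtualSwitch(sample_firewall)
web_flow = ("10.1.1.10", "10.2.2.20", 443)
telnet_flow = ("10.1.1.10", "10.2.2.20", 23)

for _ in range(5):
    switch.handle_packet(web_flow)

print(switch.handle_packet(web_flow))     # prints: permit
print(switch.handle_packet(telnet_flow))  # prints: deny
print(switch.redirects)                   # prints: 2
```

Seven packets were handled, but the service node was consulted only twice, once per flow, which is the fast-path offload the caching is meant to provide.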

Nexus and service chaining

vPath 3.0 has been submitted to the IETF for standardization, allowing service chaining with vPath and non-vPath network services. It enables you to use vPath service chaining between multiple physical devices and supports multiple hypervisors.

License Options 

| Nexus 1000 Essential Edition | Nexus 1000 Advanced Edition |
|---|---|
| Full Layer-2 Feature Set | All Features of Essential Edition |
| Security, QoS Policies | VSG firewall |
| VXLAN virtual overlays | VXLAN Gateway |
| vPath enabled Virtual Services | TrustSec SGA |
| Full monitoring and management capabilities | A platform for other Cisco DC Extensions in the Future |
| Free | $695 per CPU MSRP |

Nexus 1000 features and benefits

| Category | Features |
|---|---|
| Switching | L2 Switching, 802.1Q Tagging, VLAN, Rate Limiting (TX), VXLAN, IGMP Snooping, QoS Marking (COS & DSCP), Class-based WFQ |
| Security | Policy Mobility, Private VLANs w/ local PVLAN Enforcement, Access Control Lists, Port Security, Cisco TrustSec Support, Dynamic ARP inspection, IP Source Guard, DHCP Snooping |
| Network Services | Virtual Services Datapath (vPath) support for traffic steering & fast-path off-load [leveraged by Virtual Security Gateway (VSG), vWAAS, ASA1000V] |
| Provisioning | Port Profiles, Integration with vC, vCD, SCVMM*, BMC CLM, Optimized NIC Teaming with Virtual Port Channel – Host Mode |
| Visibility | VM Migration Tracking, VC Plugin, NetFlow v.9 w/ NDE, CDP v.2, VM-Level Interface Statistics, vTracker, SPAN & ERSPAN (policy-based) |
| Management | Virtual Centre VM Provisioning, vCenter Plugin, Cisco LMS, DCNM, Cisco CLI, RADIUS, TACACS+, Syslog, SNMP (v.1, 2, 3), Hitless upgrade, SW Installer |

Advantages and disadvantages of the Nexus 1000

| Advantages | Disadvantages |
|---|---|
| The Essential edition is free; you can upgrade to the Advanced edition when needed. | VEM and VSM internal communication is very sensitive to latency. Due to its chatty nature, it may not be good for inter-DC deployments. |
| Easy and quick to deploy | The VSM – VEM and VSM (active) – VSM (standby) heartbeat time of 6 seconds makes it sensitive to network failures and congestion. |
| It offers many rich network features unavailable on other distributed software switches. | VEM over-dependency on VSM reduces resiliency. |
| Hypervisor agnostic | VSM is required for vSphere HA, FT, and VMotion to work. |
| Hybrid cloud functionality | |

**Closing Points on Cisco Nexus 1000v**

Virtual Ethernet Module (VEM):

The Nexus 1000v employs the Virtual Ethernet Module (VEM), which runs as a module inside the hypervisor. This allows for efficient and direct communication between VMs, bypassing the traditional reliance on the hypervisor networking stack.

Virtual Supervisor Module (VSM):

The Virtual Supervisor Module (VSM) serves as the control plane for the Nexus 1000v, providing centralized management and configuration. It enables network administrators to define policies, manage virtual ports, and monitor network traffic.

Policy-Based Virtual Network Management:

With the Nexus 1000v, administrators can define policies to manage virtual networks. These policies ensure consistent network configurations across multiple hosts, simplifying network management and reducing the risk of misconfigurations.

Advanced Security and Monitoring Capabilities:

The Nexus 1000v offers granular security controls, including access control lists (ACLs), port security, and dynamic host configuration protocol (DHCP) snooping. Additionally, it provides comprehensive visibility into network traffic, enabling administrators to monitor and troubleshoot network issues effectively.

Benefits of the Nexus 1000v:

Enhanced Network Performance:

By offloading network processing to the VEM, the Nexus 1000v minimizes the impact on the hypervisor, resulting in improved network performance and reduced latency.

Increased Scalability:

The distributed architecture of the Nexus 1000v allows for seamless scalability, ensuring that organizations can meet the growing demands of their virtualized environments.

Simplified Network Management:

With its policy-based approach, the Nexus 1000v simplifies network management tasks, enabling administrators to provision and manage virtual networks more efficiently.

Use Cases:

Data Centers:

The Nexus 1000v is particularly beneficial in data center environments where virtualization is prevalent. It provides a robust and scalable networking solution, ensuring optimal performance and security for virtualized workloads.

Cloud Service Providers:

Cloud service providers can leverage the Nexus 1000v to enhance their network virtualization capabilities, offering customers more flexibility and control over their virtual networks.

The Nexus 1000v is a powerful virtual network switch that provides advanced networking capabilities for virtualized environments. Its rich features, policy-based management approach, and seamless integration with VMware vSphere allow organizations to achieve enhanced network performance, scalability, and management efficiency. As virtualization continues to shape the future of data centers, the Nexus 1000v remains a valuable tool for optimizing virtual network infrastructures.

Summary: Cisco Switch Virtualization Nexus 1000v

Welcome to our blog post, where we dive into the world of Cisco Switch Virtualization, focusing specifically on the Nexus 1000v. In this article, we will unravel the complexities surrounding switch virtualization, explore the key features of the Nexus 1000v, and understand its significance in modern networking environments.

Understanding Switch Virtualization

Switch virtualization is a technique that allows for creating multiple virtual switches on a single physical switch, enabling greater flexibility and efficiency in network management. Organizations can consolidate their infrastructure, reduce costs, and streamline network operations by virtualizing switches.

Introducing the Nexus 1000v

The Cisco Nexus 1000v is a powerful switch virtualization solution that extends the functionality of VMware environments. Unlike traditional virtual switches, it provides a more comprehensive set of features and advanced network control. It seamlessly integrates with VMware vSphere, offering enhanced visibility, security, and policy management.

Key Features of the Nexus 1000v

– Distributed Virtual Switch: The Nexus 1000v operates as a distributed virtual switch, distributing network intelligence across all hosts in the virtualized environment. This ensures consistent policies, simplified troubleshooting, and improved performance.

– Virtual Port Profiles: With virtual port profiles, administrators can define consistent network policies for virtual machines, irrespective of their physical location. This simplifies network provisioning and reduces the chances of misconfigurations.

– Network Analysis Module (NAM): The Nexus 1000v incorporates NAM, a robust monitoring and analysis tool that provides deep visibility into virtual network traffic. This enables administrators to identify and resolve network issues quickly, ensuring optimal performance.

Deployment Considerations

When planning to deploy the Nexus 1000v, it is essential to consider factors such as network architecture, compatibility with existing infrastructure, and scalability requirements. It is advisable to consult with Cisco experts or certified partners to ensure a smooth and successful implementation.

Conclusion:

In conclusion, the Cisco Nexus 1000v is a game-changer in switch virtualization. Its advanced features, seamless integration with VMware environments, and extensive network control make it an ideal choice for organizations seeking to optimize their network infrastructure. By understanding the fundamentals of switch virtualization and exploring Nexus 1000v’s capabilities, network administrators can unlock a world of possibilities in network management and performance.

Redundant Links with Virtual PortChannels

In the world of networking, efficiency and reliability are paramount. As data centers expand and organizations strive for seamless connectivity, Virtual PortChannels (vPCs) have emerged as a powerful solution. This blog post aims to demystify vPCs by providing a comprehensive understanding of their benefits, functionality, and implementation considerations.

Virtual PortChannels, also known as vPCs, are a technology designed to enhance network scalability and resiliency. By combining multiple physical links into a single logical interface, vPCs allow for increased bandwidth and redundancy, ensuring uninterrupted connectivity and load balancing across network switches.

Redundant links refer to the practice of having multiple physical connections between network devices. This approach mitigates the risks of single points of failure and ensures uninterrupted network connectivity. However, managing redundant links can be complex and resource-intensive.

Virtual PortChannel (vPC) is a technology developed by Cisco Systems that revolutionizes how redundant links are deployed and managed. It allows the creation of a logical link aggregation group (LAG) by bundling multiple physical links into a single logical interface. This logical interface acts as a single point of attachment for downstream devices, simplifying the network topology.

Enhanced Redundancy: By bundling multiple physical links into a vPC, network administrators can achieve higher levels of redundancy. In the event of a link failure, traffic seamlessly fails over to the remaining active links, ensuring uninterrupted connectivity.

Improved Bandwidth Utilization: vPC enables load balancing across multiple physical links, maximizing the available bandwidth. This intelligent distribution of traffic prevents link congestion and optimizes network performance.

Simplified Network Design: Traditional redundant link configurations often involve complex Spanning Tree Protocol (STP) configurations to avoid loops. With vPC, STP no longer needs to block the redundant links, although it typically remains enabled as a safeguard. This simplifies the network design and reduces potential points of failure.

Hardware and Software Requirements: Implementing vPC requires compatible hardware and software. Network administrators must ensure that their devices support vPC functionality and that the necessary licenses are in place.

Configuration Best Practices: Proper configuration is crucial for the successful deployment of vPC. Network administrators should follow best practices provided by the equipment manufacturer and ensure consistency across all devices in the vPC domain.

Real-World Use Cases

Data Centers: vPC is widely used in data center environments to provide high availability and optimal network performance. It allows for seamless migration of virtual machines (VMs) across physical hosts without losing network connectivity.

Campus Networks: Large campus networks can leverage vPC to enhance redundancy and simplify network management. By aggregating multiple uplinks from access switches, vPC provides a resilient and scalable network infrastructure.

Highlights: Redundant Links with Virtual PortChannels

Port Channels and vPCs

During the early days of Layer 2 Ethernet networks, Spanning Tree Protocol (STP) was used to limit the devastating effects of a topology loop. Even though there may be many connections in a network, STP has one suboptimal principle: only one active path is allowed between two devices.

There are two problems with a single active path: first, half (or more) of the system’s bandwidth is unavailable to data traffic; second, if the active link fails, the network experiences multiple seconds of systemwide data loss while STP recalculates the new “best” forwarding topology for the Layer 2 network.

One significant drawback of spanning tree is the concept of blocking ports. While they are essential to prevent loops, blocking ports lead to inefficient use of the network. The blocked ports carry no traffic, resulting in unused bandwidth and decreased overall network throughput.

Spanning Tree Root Switch

Load Balancing

Furthermore, in a robust network with STP loop management, there is no efficient dynamic way to utilize all the available bandwidth. Enhanced Layer 2 Ethernet networks have been developed through the use of port channels and virtual port channels (vPCs). Port Channel technology allows forwarding traffic between two participating devices using a load-balancing algorithm to balance traffic across multiple inter-switch links (ISLs).

By bundling the links together as one logical link, the loop problem is also managed. Multi-device port channels can be formed using vPC technology. Devices attached by a port channel see the pair of vPC peer switches as a single logical endpoint, even though the two peers remain separate devices. By combining hardware redundancy with port-channel loop management, the vPC environment provides multiple benefits.

Example: Cisco ACI

These technologies are used extensively in Cisco ACI. A virtual port channel (vPC) allows links physically connected to two different ACI leaf nodes to appear as a single port channel to a third device (i.e., a network switch, server, or any other networking device that supports link aggregation technology). First, let us start with the basics.

**Spanning Tree Challenges**

Traditional spanning tree designs challenge network designers because they block redundant links. The drawbacks of STP (Spanning Tree Protocol) prove extremely expensive in data centers when multiple redundant links are used for mission-critical applications, essentially wasting up to 50% of the capacity.

You can use a port channel to scale bandwidth, as the bundled links appear as one to higher-level protocols, resulting in all ports forwarding or blocking together for a particular VLAN. Aim to design all links in a data center as EtherChannels, as this optimizes both bandwidth and reliability.

STP Path distribution

EtherChannel Technology

Network administrators connect multiple physical Ethernet links between devices to achieve more bandwidth and redundancy. The Spanning Tree Protocol blocks these links, so we need EtherChannel Technology. EtherChannel technology combines several physical links between switches into one logical connection to provide high-speed links and redundancy without being blocked by the Spanning Tree Protocol.

Understanding Layer 2 EtherChannel

Layer 2 EtherChannel, also known as Link Aggregation or Port Channel, combines multiple physical links into a single logical link. This powerful technique enhances bandwidth, improves redundancy, and optimizes load balancing. By bundling multiple links, Layer 2 EtherChannel presents a unified interface for higher throughput and fault tolerance.

Configuring Layer 2 EtherChannel involves steps that vary depending on the networking equipment used. Generally, it starts with identifying the physical links that will be part of the EtherChannel bundle. Then, the appropriate channel mode and load balancing method must be configured. Lastly, the EtherChannel interface is created, and the physical links are assigned. Proper configuration ensures seamless data transmission and efficient utilization of network resources.

Understanding Layer 3 Etherchannel

Layer 3 EtherChannel, also known as a routed EtherChannel or routed port channel, bundles multiple physical interfaces into a single logical interface to increase bandwidth and provide redundancy at Layer 3. Unlike Layer 2 EtherChannel, which operates at the data link layer, the Layer 3 EtherChannel's logical interface carries an IP address and takes part in routing, while traffic is still distributed across the member links by the load-balancing hash.

Layer 3 Etherchannel offers several advantages over traditional single link configurations. Firstly, it allows for load balancing, where traffic is distributed across multiple links, maximizing bandwidth utilization and improving overall network performance. Additionally, Layer 3 Etherchannel provides redundancy, ensuring that traffic seamlessly switches to the remaining active links if one link fails, minimizing downtime, and enhancing network reliability.

Configuration of Layer 3 Etherchannel

Configuring Layer 3 EtherChannel involves a few essential steps. First, the physical interfaces that will be part of the EtherChannel bundle must be identified and prepared. Then, a logical interface, often called a port-channel interface, is created. This interface acts as the virtual representation of the bundled physical links. Next, an IP address is assigned to the port-channel interface and, where needed, a routing protocol is enabled on it. Finally, verification and testing are crucial to ensure proper configuration and functionality.

Layer 3 EtherChannel finds its applications in various scenarios. One common use case is in data centers, where high-speed connectivity and redundancy are critical. By bundling multiple links, Layer 3 EtherChannel provides the bandwidth and failover capabilities required for demanding data center environments. Another use case is in network edge deployments, where Layer 3 EtherChannel allows for efficient load balancing and redundancy in connecting access switches to distribution or core switches.

Port Channel and vPC 

Port Channel technology forwards traffic between two participating devices using a load-balancing algorithm. With a virtual port channel (vPC), the port channel can span multiple devices: two Cisco Nexus 7000 or 9000 Series devices appear to a third device as a single port channel. The third device can be a switch, a server, or any other networking device that supports port channels.

A virtual port channel can provide Layer 2 multipathing, creating redundancy and increasing bandwidth by enabling multiple parallel paths between nodes and load-balancing traffic across them. Only Layer 2 port channels can be used in a vPC. The port channels are configured either with LACP or statically, with no protocol.

vPC provides the following technical benefits:

  • A single device can share a port channel between two upstream devices
  • Spanning Tree Protocol (STP) blocked ports are removed
  • Makes sure there are no loops in the topology
  • Uplink bandwidth is utilized to the fullest extent possible
  • When either a device or a link fails, the system quickly converges
  • Resilience at the link level is ensured
  • Ensures a high level of availability

**Implementation of vPC topologies**

vPC supports the following topologies:

  1. Dual-uplink Layer 2 access: Using a Cisco Nexus 9000 Series switch, an access switch is dual-homed to a pair of distribution switches.
  2. Dual-homing: This topology connects a server to two switches.
  3. Topologies supported by FEX: FEX supports various vPC topologies using Cisco Nexus 7000 and 9000 Series switches.

Related: For background information, you may find the following posts helpful:

  1. Data Center Fabric
  2. Optimal Layer 3 Forwarding
  3. Data Center Failure
  4. Active Active Data Center Design
  5. Network Overlays
  6. Dead Peer Detection

Redundant Links with Virtual PortChannels

STP has one suboptimal principle: to break loops in a network, it allows only one active path from one device to another, regardless of how many connections might exist in the network. In addition, no efficient dynamic mechanism exists for using all the available bandwidth with STP loop management.

  • Port Channel Technology

So, to overcome these challenges, enhancements to Layer 2 Ethernet networks were made in the shape of port channel and virtual port channel (vPC) technologies. Port Channel technology permits multiple links between two participating devices to forward traffic using a load-balancing algorithm while managing the loop problem by bundling the links as one logical link.

  • vPC Technology

Then, we have vPC technology. This technology permits multiple devices to create a port channel. In vPC, a pair of switches acting as a vPC peer endpoint looks like a single logical entity to port channel–attached devices; the two devices that serve as the logical port-channel endpoint are still two different and separate devices.

High Availability: Link and Device.

You need to identify the level of high availability you want to achieve in enterprise branch offices. Then, you can meet your high availability requirements with the appropriate device level and link redundancy.

Link-level redundancy requires two links running as active/active or active/backup so that traffic forwarding recovers if one link fails. Therefore, any failure on an access link should not result in a loss of connectivity. To qualify, a branch office must have at least two upstream links, either to a private network or the Internet.

Device-level redundancy is another level of high availability, ensuring that a backup device can take over in the event of a failed device. Device and link redundancy are typically coupled, because a failed device also takes down the links attached to it. As a result of this strategy, there should be no loss of connectivity between branch offices and data centers due to a single device failure.

High Availability and Designs

High-availability designs combine link and device redundancy between branches and data centers to ensure business-critical connectivity. Each data center is dual-homed, so if one fails completely, traffic can be redirected to the backup data center.

It should be possible to reroute traffic within 30 seconds whenever a failure (link, device, or data center) occurs. Packets can be lost during this period. If user applications can withstand these failover times, sessions are maintained. In a branch office with redundant devices, established sessions should not be dropped when the device that was forwarding traffic fails.

Diagram: Redundant links with EtherChannel. Source is jmcritobal

**vPC vs Port Channel**

Servers can be attached to the access switches as port channels, uplinks that consist of redundant links formed from the access can be link aggregated, and the core links can also be bundled. Most switches can support 8 ports in a bundle, and Nexus platforms can support up to 16 – 32 ports.

Create each port channel with ports from different line cards in each redundant switch. This prevents the failure of a single line card from affecting the entire channel. With this approach, we get redundancy at both the logical and the physical layer.

Link Aggregation and Port Channels

Link aggregation (EtherChannel and IEEE 802.3ad) was developed to address the STP limitation where two redundant Ethernet switches are connected through multiple uplinks. However, it did not address the data center challenge of deploying link aggregation on triangular topologies, where the links need to terminate on different switches.

Traditional LAG ( link aggregation ) has limitations because its standard only allows aggregated links to terminate on a single switch. Technologies such as vPC Virtual Port Channel and Virtual Switching System (VSS) have been implemented to overcome this limitation.

Key Points: Port Channels

In summary, a port channel aggregates multiple physical interfaces that create a logical interface. On some platforms, you can bundle up to 32 individual redundant links. The Port channel will also load balance traffic across the redundant links. The port channel will remain operational as long as at least one physical interface within the port channel is operational. Finally, before we move to vPC vs port channel, you can create either Layer 2 or 3 port channels. However, as expected, you cannot combine Layer 2 and 3 interfaces in the same port channel.

Port-channel load balancing

Frames are distributed between the physical interfaces that make up the port channel using a hashing function. This hash will differ depending on the load balancing method. Based on the hash result, the physical port to be used for transmission is determined.

A hashing operation can be performed on MAC or IP addresses based on the source address, destination address, or both (some methods use the port number). Depending on the switch model and software version, default load-balancing methods can be layer 2, 3, or 4 and apply globally to all port channels. Here are a few methods for balancing etherchannels:

  • src-ip : Source IP address
  • dst-ip : Destination IP address
  • src-dst-ip : Source and destination IP address
  • src-mac : Source MAC address
  • dst-mac : Destination MAC address
  • src-dst-mac : Source and destination MAC address
  • src-port : Source port number
  • dst-port : Destination port number
  • src-dst-port : Source and destination port number
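To make the hashing idea concrete, here is a minimal Python sketch of how a load-balancing method could map a frame to one member link. It is an illustration only: real switches use platform-specific hardware hashes, so the CRC32 hash, the `select_member` function, and the frame field names are all assumptions made for this example.

```python
import zlib

# Frame fields hashed by each load-balancing method named in the list above.
# The field names (src_ip, dst_mac, ...) are hypothetical, chosen for this sketch.
METHOD_FIELDS = {
    "src-ip": ("src_ip",),
    "dst-ip": ("dst_ip",),
    "src-dst-ip": ("src_ip", "dst_ip"),
    "src-mac": ("src_mac",),
    "dst-mac": ("dst_mac",),
    "src-dst-mac": ("src_mac", "dst_mac"),
    "src-port": ("src_port",),
    "dst-port": ("dst_port",),
    "src-dst-port": ("src_port", "dst_port"),
}


def select_member(frame, method, members):
    """Return the member link chosen for this frame.

    CRC32 stands in for the switch's real (proprietary) hash function.
    """
    fields = METHOD_FIELDS[method]
    key = "|".join(str(frame[name]) for name in fields).encode()
    # The hash result, modulo the number of members, picks the physical port.
    return members[zlib.crc32(key) % len(members)]


members = ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"]
frame = {
    "src_ip": "10.0.0.1", "dst_ip": "10.0.1.20",
    "src_mac": "00:11:22:33:44:55", "dst_mac": "66:77:88:99:aa:bb",
    "src_port": 49152, "dst_port": 443,
}
chosen = select_member(frame, "src-dst-ip", members)
```

Because the hash is deterministic, every frame of a given flow takes the same member link, which keeps frames in order but also means a single large flow cannot be spread over several links.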

Starting the Debate: vPC vs Port Channel

vPC (Virtual Port-Channel), or multi-chassis ether channel (MEC), is a feature on the Cisco Nexus switches. You can configure port channels across multiple redundant switches. The virtual port channel (vPC) is configured using the interfaces of two redundant switches.

Now we have redundancy at both the link and the switch layer, forming a triangular design. With a standard port channel, by contrast, all links must terminate on the same switch, so the channel cannot span two redundant switches.

Virtual PortChannels (vPCs) allow links that are physically connected to two different Cisco switches to appear to a third downstream device as coming from one device and as part of a single PortChannel. The third device can be a switch, a server, or any other networking device that supports IEEE 802.3ad PortChannels. Both standard port channels and Virtual PortChannels (vPC) can use the Link Aggregation Control Protocol (LACP).

LACP negotiation and redundant switches

As part of the IEEE 802.3ad standards, a Link Aggregation Control Protocol ( LACP ) was created to negotiate the channel, and it is recommended that this feature be used when building a bundle. LACP modes can be either active or passive. Active mode means the switch actively negotiates the channel, whereas passive means the port does not initiate an LACP negotiation.

You can form channels between an active and a passive port or between two active ports, but not between two passive ports. If the correct modes are not configured on each side of the channel, the port channel will not negotiate and will remain down.
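The mode rules above fit in a few lines of Python. This is a sketch, not switch code: `channel_forms` is a hypothetical helper, and the handling of the static `on` mode (no negotiation, so it only bundles with another `on` side) is an assumption added for completeness.

```python
VALID_MODES = {"active", "passive", "on"}


def channel_forms(local_mode, remote_mode):
    """Return True if a port channel can come up with these two modes.

    active/passive are the LACP modes; "on" is static bundling without LACP
    (assumption for this sketch: it only pairs with another "on" side).
    """
    if local_mode not in VALID_MODES or remote_mode not in VALID_MODES:
        raise ValueError("mode must be one of: active, passive, on")
    if "on" in (local_mode, remote_mode):
        # A static side sends no LACP PDUs, so both sides must be static.
        return local_mode == remote_mode
    # With LACP, at least one side has to initiate the negotiation.
    return "active" in (local_mode, remote_mode)


for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
    print(pair, "->", "up" if channel_forms(*pair) else "down")
```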

The following diagram depicts the logical and physical aspects of a vPC virtual port channel. This is not specific to Cisco vPC; it applies to all aggregation technologies. We have several physical redundant links (four in our case) that appear as one large link from a logical perspective.

Diagram: vPC virtual port channels.

In either case, LACP can be used as the control plane to negotiate the channel. You may ask, “Is LACP mandatory for vPC?” No. We can use mode “on” and bring up the port channel without negotiation or checks, which turns off LACP and any other control protocol. However, is LACP recommended for vPC? Yes.

As with a normal port channel, it is always advised to use a control protocol for the vPC or port channel, because LACP checks that both ends agree before a link joins the bundle. The main difference between vPC and port channels is that a vPC can terminate on two separate switches, creating a triangular design.

Building triangles for better redundancy

The quandary of the inability to build triangles with link aggregation can be mitigated by deploying either the Nexus technology, known as virtual Port Channels (vPCs), or the Catalyst technology, known as Virtual Switching System ( VSS ). VSS and vPC virtual port channels allow the termination of an LAG on two separate switches, resulting in a triangular design. In addition, they will enable the grouping of two physical redundant switches to form a single logical switch to any downstream device ( switch or server ).

Load Balancing Functions

When a Layer 2 frame is forwarded to a PortChannel, a hash function is performed to determine which physical link carries the frame. The load balancing methods available on Nexus switches are granular and include the following:

Nexus switches: load balancing methods

  • Option 1: Destination IP address
  • Option 2: Destination MAC address
  • Option 3: Destination TCP and UDP port number
  • Option 4: Source and destination IP address
  • Option 5: Source and destination MAC address
  • Option 6: Source and destination TCP and UDP port numbers
  • Option 7: Source IP address
  • Option 8: Source MAC address
  • Option 9: Source TCP and UDP port number

Redundant Links: Detect polarized links

Monitoring the traffic distribution over each physical link is essential to detect polarized links. The polarization effect occurs if some links attract more traffic than others, resulting in heavy utilization of some redundant links and low utilization of others. Therefore, before choosing the load balancing method, analyze the traffic flows from source to destination and determine whether they are concentrated on a few addresses or evenly spread. For example, I would not use the source IP address method to load balance traffic from a firewall performing Network Address Translation (NAT) toward a single device, because every frame would carry the same source address and hash to the same link.
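As a rough illustration of that monitoring step, the Python sketch below flags a polarized bundle from per-link byte counters. The counter values, the `is_polarized` helper, and the 50% tolerance are hypothetical choices for this example; in practice you would feed in interface counters from your own monitoring system and pick a threshold that suits your traffic.

```python
def link_shares(byte_counts):
    """Return each member link's fraction of the total bytes sent."""
    total = sum(byte_counts.values())
    if total == 0:
        return {link: 0.0 for link in byte_counts}
    return {link: count / total for link, count in byte_counts.items()}


def is_polarized(byte_counts, tolerance=0.5):
    """Flag the bundle if any link deviates from an even share by more than
    `tolerance` times that even share (the tolerance is an arbitrary example value).
    """
    if not byte_counts or sum(byte_counts.values()) == 0:
        return False  # no links or no traffic: nothing to judge
    even_share = 1 / len(byte_counts)
    shares = link_shares(byte_counts)
    return any(abs(share - even_share) > tolerance * even_share
               for share in shares.values())


# Hypothetical counters: a NAT firewall hashed on source IP lands on one link.
nat_traffic = {"Eth1/1": 970_000, "Eth1/2": 10_000, "Eth1/3": 10_000, "Eth1/4": 10_000}
balanced = {"Eth1/1": 260_000, "Eth1/2": 240_000, "Eth1/3": 255_000, "Eth1/4": 245_000}
print(is_polarized(nat_traffic), is_polarized(balanced))
```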

Routing Protocols

Keep in mind for routing convergence that routing protocols see the channel as one link. If you have 8 x 10G ports in one bundle with an OSPF cost of 10 and a failure takes out one member of that channel, OSPF still advertises the link with the same metric. Routing protocols don’t dynamically change metrics due to a member link failure.

vPC and VSS offer the following benefits:

  • Improved convergence after a link or device failure.
  • No STP-blocked ports on the redundant links.
  • Independent control planes (vPC only, not VSS).
  • Increased bandwidth, because all redundant links are combined into one from the perspective of STP.

What is vPC?

vPC (virtual Port Channel), a form of Multi-Chassis EtherChannel (MEC), is a feature on Cisco Nexus switches that allows you to configure a port channel across two switches (the vPC peers). vPC is similar to the Virtual Switching System (VSS) on Catalyst 6500s. However, VSS merges the two chassis into one logical switch, so a single control plane handles both management and configuration.

vPC, on the other hand, keeps two separate switches, each managed and configured independently. Therefore, you must create and permit your VLANs on both Nexus switches.

Comparing vPC and VSS

vPC and VSS are similar technologies, but the Nexus vPC feature has dual control planes. It supports In-Service Software Upgrade (ISSU), which allows upgrading one of the two switches without causing a service interruption. Because the control plane runs independently on each of the vPC peers, the failure of one peer does not take down the other.

With VSS, both chassis share a single active control plane, so a failure of the active chassis affects the whole virtual switch until the standby takes over. It is worth noting that vPC still falls back to STP as a safeguard; the reliance on STP can only be removed entirely with Cisco FabricPath or TRILL. VSS is available on the Catalyst platforms, while vPC is solely a Nexus technology.

**vPC Terminology:**

  • vPC Peer – a vPC switch, one of a pair.
  • vPC member port – one of the ports that form a vPC.
  • vPC – the combined port channel between the vPC peers and the downstream device.
  • vPC peer-link – link used to synchronize state between vPC peer devices, must be 10GbE. The vPC-related control plane communications occur over this link, and any Ethernet frames transported receive special treatment to avoid loops in the vPC member ports.
  • vPC peer keepalive link – the keepalive link between the vPC peer devices. It is recommended to use the mgmt0 interface and a VRF instance. If the mgmt0 interface is unavailable, a routed interface in the management VRF can be used instead.
  • vPC VLAN – one of the VLANs carried over the vPC peer link and used to communicate via the vPC with a downstream device.
  • non-vPC VLAN – an STP VLAN that is not carried over the peer link.
  • CFS – Cisco Fabric Services protocol, used for state synchronization and configuration validation between vPC peer devices.

Within a vPC domain, each switch of the pair is assigned a primary or secondary role; by default, the switch with the lowest MAC address becomes the primary peer. The domain identifies the pair of redundant switches and generates a shared MAC address that is used as the logical switch bridge ID in STP communication.
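The election logic can be pictured with a short Python sketch. It assumes that a configurable role priority is compared first (lower value preferred) and that the lowest MAC address only breaks ties, which matches the default behavior described above when both priorities are equal. The `Peer` class, the names, and the priority values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    role_priority: int   # assumption: lower value is preferred as primary
    system_mac: str      # e.g. "00:1b:54:c2:38:01"


def mac_to_int(mac):
    """Convert a MAC address string to an integer for comparison."""
    return int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)


def elect_primary(a, b):
    """Return the peer that takes the primary role.

    Compare role priority first, then fall back to the lowest MAC address.
    """
    return min((a, b), key=lambda p: (p.role_priority, mac_to_int(p.system_mac)))


# Equal priorities: the lowest MAC address wins.
n1 = Peer("nexus-1", 100, "00:1b:54:c2:38:02")
n2 = Peer("nexus-2", 100, "00:1b:54:c2:38:01")
print(elect_primary(n1, n2).name)
```

Setting the priority explicitly on the intended primary removes the dependence on whichever chassis happens to have the lower MAC address.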

Virtual PortChannels Best Practices

Below are the best practices to consider for implementation:

  1. Manually define which vPC switch is primary and which is secondary. The switch with the lower role priority value is preferred as the primary.
  2. Form Layer 2 port channels using different 10GE modules on the Nexus switch for the vPC peer-link with ports in dedicated mode.
  3. Carry the vPC peer keepalive link over a separate path from the peer-link, in a non-default VRF.
  4. Enable Bridge Assurance (BA) on the vPC peer-link interface (default).
  5. Enable UDLD aggressive mode on the vPC peer-link interface.
  6. On the primary vPC switch, configure the STP root bridge for a VLAN, the active HSRP router, and the PIM DR router. Likewise, on the secondary vPC switch, configure the secondary STP root and the standby HSRP router. The Layer 2 and Layer 3 topologies should match.

Introducing Fabric Extenders

If you want to add even more redundancy to vPC, use it with a fabric extender. Fabric Extenders act as a remote line card to a parent switch and can be used with vPC in three forms. The first is known as host vPC and is a vPC southbound from the FEX to the server; the second is a vPC northbound from the FEX to the parent switch, sometimes called a Fabric vPC; and the third is both a southbound and northbound vPC from the FEX which is known as Enhanced vPC.

Diagram: Introducing Fabric Extenders.

Virtual PortChannels, the single connection

  • Datacenter interconnect

Because vPC presents a single connection from the perspective of higher-level protocols, e.g., STP or OSPF, it can be used as a Layer 2 extension for a DCI (data center interconnect), but only over short distances and over dark fiber or protected DWDM. vPC best practices still apply, and it is recommended that you use different vPC domains for each site and that Layer 3 communication between vPC peers is performed on dedicated routed redundant links.

  • OTV or VPLS

If you connect more than two data centers in a full-mesh topology, the best DCI mechanism would be Overlay Transport Virtualization (OTV) or VPLS (an Ethernet-based multipoint Layer 2 VPN). vPC can work with two or more data centers, but you must design the topology as a hub-and-spoke.

In other words, vPC suits two data centers connected back to back, or more than two in a hub-and-spoke design where any spoke-to-spoke communication flows through the hub. The Layer 2 boundary and STP isolation can be achieved with bridge protocol data unit (BPDU) filtering on the DCI links. BPDU filtering stops BPDUs from being sent on a link, essentially turning off STP on the DCI links. If you have Cisco ACI, you can extend the fabric with multi-pod or multi-site designs instead.

Diagram: vPC as a Data Center Interconnect.
  • A key point: Loop prevention

vPC has a built-in loop prevention mechanism: a frame received over the peer link is never forwarded out of a vPC member port. Under normal operations, a vPC peer switch should not learn MAC addresses over the peer link; the peer link is mainly used for flooding, multicast, broadcast, and control plane traffic.

This is because the LAG is terminated on two peer switches, and you don’t want to send traffic received from a single downstream device back down to the same downstream device, resulting in a loop. However, this rule does not apply to:

  1. Non-vPC interface ( orphan port ) and
  2. vPC member ports that are only active on the receiving peer.

Note: An orphan port is a port to a downstream device that is connected to only one peer.
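The rule and its two exceptions can be summarized in a small Python sketch. This is a simplified model for illustration, not how the switch hardware implements the check; `PortKind`, `may_forward`, and the `member_up_on_sending_peer` flag are hypothetical names.

```python
from enum import Enum


class PortKind(Enum):
    PEER_LINK = "peer-link"
    VPC_MEMBER = "vpc-member"
    ORPHAN = "orphan"


def may_forward(ingress, egress, member_up_on_sending_peer=True):
    """Decide whether a frame may leave through `egress`.

    Frames that arrived over the peer link are blocked from vPC member ports,
    unless that vPC is down on the sending peer and this peer holds the only
    active member ports. Orphan ports are never subject to the block.
    """
    if ingress is PortKind.PEER_LINK and egress is PortKind.VPC_MEMBER:
        return not member_up_on_sending_peer
    return True


# Normal operation: peer-link traffic must not reach a vPC member port.
print(may_forward(PortKind.PEER_LINK, PortKind.VPC_MEMBER))
# Orphan ports are exempt from the rule.
print(may_forward(PortKind.PEER_LINK, PortKind.ORPHAN))
```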

Redundant Switches: vPC peer link usage

As mentioned, the vPC peer link should not be used for end-host reachability under normal operations. However, if all members of a vPC fail on one peer, the peer link forwards frames to the remaining member ports of that vPC on the other peer. This explains why Cisco recommends 10G interfaces for the peer link.

The peer keepalive link is also mandatory and is used as a heartbeat mechanism that transports UDP datagrams between the peers. It helps avoid a dual-active (split-brain) scenario where both peers act as primary simultaneously. If no heartbeat is received after a configurable timeout, the secondary vPC peer takes over the primary role, and all its member ports remain active.

However, undesirable behavior occurs if an orphan port is connected to only one peer. For example, with a vPC peer link failure, the orphan ports remain active in the secondary peer, even though they are now isolated from the rest of the network. In this case, it is recommended that a non-vPC trunk be configured between peer switches.
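The failure cases in this section can be lined up in one small decision function. Treat it as a simplified sketch under stated assumptions: it models only the secondary peer, and it assumes that when the peer link fails while keepalives still arrive, the secondary suspends its vPC member ports and leaves orphan ports up. The function name and the returned dictionary are hypothetical.

```python
def secondary_peer_reaction(peer_link_up, keepalive_heard):
    """Return the secondary peer's state for a given failure combination."""
    if peer_link_up:
        # Normal operation (a keepalive-only failure changes nothing here).
        return {"role": "secondary", "vpc_member_ports": "active",
                "orphan_ports": "active"}
    if keepalive_heard:
        # Peer link down but primary still alive: avoid dual-active forwarding.
        # Orphan ports stay up even though they are now isolated.
        return {"role": "secondary", "vpc_member_ports": "suspended",
                "orphan_ports": "active"}
    # Peer link down and no heartbeat: the primary is presumed dead.
    return {"role": "primary", "vpc_member_ports": "active",
            "orphan_ports": "active"}


for link_up, heard in [(True, True), (False, True), (False, False)]:
    print(link_up, heard, secondary_peer_reaction(link_up, heard))
```

The middle case is the one that strands orphan ports, which is why a non-vPC trunk between the peers is recommended.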

Summary: Redundant Links with Virtual PortChannels

In today’s fast-paced digital world, network reliability and performance are paramount. With the increasing demand for seamless connectivity, businesses seek innovative solutions to enhance their network infrastructure. One such solution that has gained significant traction is the implementation of redundant links with Virtual PortChannel (vPC). In this blog post, we explored the concept of redundant links and delved into the benefits and considerations of utilizing vPC technology.

Understanding Redundant Links

As the name suggests, redundant links are duplicate connections that provide failover capabilities in case of network failures. By establishing multiple links between network devices, organizations can ensure uninterrupted connectivity and minimize the risk of downtime. By distributing traffic across multiple paths, redundant links not only enhance network reliability but also improve overall network performance.

Exploring Virtual PortChannel (vPC) Technology

Virtual PortChannel (vPC) is a technology that aggregates physical links terminating on two separate switches into a single logical link. By bundling these links, vPC provides increased bandwidth, load balancing, and redundancy. The two switches present a single port channel to the connected device. With vPC, organizations can achieve high availability and scalability while simplifying network configuration and management.

Benefits of Redundant Links with vPC

1. Enhanced Network Availability: Redundant links with vPC ensure network availability by providing alternate paths in case of link failures. This redundancy eliminates single points of failure and minimizes the impact of network disruptions.

2. Improved Load Balancing: vPC technology optimizes network performance and prevents bottlenecks by distributing traffic across multiple links. This load-balancing capability results in the efficient utilization of network resources and improved user experience.

3. Simplified Network Management: vPC technology simplifies network configuration and management. By logically consolidating multiple physical links, administrators can streamline their network setup, reducing complexity and potential human errors.

Considerations for Implementing Redundant Links with vPC

While the benefits of redundant links with vPC are significant, it’s essential to consider a few key factors before implementation. Factors such as network topology, hardware compatibility, and proper configuration must be thoroughly evaluated to ensure a successful deployment.

Conclusion:

In conclusion, redundant links with Virtual PortChannel (vPC) present a powerful solution for organizations aiming to enhance network reliability and performance. By combining the advantages of redundant links and virtualization, businesses can achieve high availability, improved load balancing, and simplified network management. With careful planning and consideration, implementing redundant links with vPC can pave the way for a robust and resilient network infrastructure.


Data Center Design with Active Active design

Active Active Data Center Design

In today's digital age, where businesses heavily rely on uninterrupted access to their applications and services, data center design plays a pivotal role in ensuring high availability. One such design approach is the active-active design, which offers redundancy and fault tolerance to mitigate the risk of downtime. This blog post will explore the active-active data center design concept and its benefits.

Active-active data center design refers to a configuration where two or more data centers operate simultaneously, sharing the load and providing redundancy for critical systems and applications. Unlike traditional active-passive setups, where one data center operates in standby mode, the active-active design ensures that both are fully active and capable of handling the entire workload.

Enhanced Reliability: Redundant data centers offer unparalleled reliability by minimizing the impact of hardware failures, power outages, or network disruptions. When a component or system fails, the redundant system takes over seamlessly, ensuring uninterrupted connectivity and preventing costly downtime.

Scalability and Flexibility: With redundant data centers, businesses have the flexibility to scale their operations effortlessly. Companies can expand their infrastructure without disrupting ongoing operations, as redundant systems allow for seamless integration and expansion.

Disaster Recovery: Redundant data centers play a crucial role in disaster recovery strategies. By having duplicate systems in geographically diverse locations, businesses can recover quickly in the event of natural disasters, power grid failures, or other unforeseen events. Redundancy ensures that critical data and services remain accessible, even during challenging circumstances.

Dual Power Sources: Redundant data centers rely on multiple power sources, such as grid power and backup generators. This ensures that even if one power source fails, the infrastructure continues to operate without disruption.

Network Redundancy: Network redundancy is achieved by setting up multiple network paths, routers, and switches. In case of a network failure, traffic is automatically redirected to alternative paths, maintaining seamless connectivity.

Data Replication: Redundant data centers employ data replication techniques to ensure that data is duplicated and synchronized across multiple systems. This safeguards against data loss and allows for quick recovery in case of a system failure.

Highlights: Active Active Data Center Design

The Role of Data Centers

An enterprise’s data center houses the computational power, storage, and applications needed to run its operations. All content is sourced or passed through the data center infrastructure in the IT architecture. Performance, resiliency, and scalability must be considered when designing the data center infrastructure.

Furthermore, the data center design should be flexible so that new services can be deployed and supported quickly. The many considerations required for such a design are port density, access layer uplink bandwidth, actual server capacity, and oversubscription.

A few short years ago, data centers were very different from what they are today. In a multi-cloud environment, virtualized infrastructure has replaced dedicated physical servers, and applications and workloads now run across pools of physical infrastructure. Nowadays, data exists across multiple data centers, the edge, and public and private clouds.

Communication must be possible between all of these locations, both on-premises and in the cloud. Public clouds are also collections of data centers. In the cloud, applications use the cloud provider’s data center resources.

**Redundant data centers**

Redundant data centers are essentially two or more data centers in different physical locations. This enables organizations to move their applications and data to another data center if they experience an outage. This also allows for load balancing and scalability, ensuring the organization’s services remain available.

Redundant data centers are generally located in geographically dispersed locations. This ensures that if one of the data centers experiences an issue, the other can take over, thus minimizing downtime. These data centers should also be connected via a high-speed network connection, such as a dedicated line or virtual private network, to allow seamless data transfers between the locations.

Implementing redundant data center BGP involves several crucial steps.

– Firstly, establishing a robust network architecture with multiple data centers interconnected via high-speed links is essential.

– Secondly, configuring BGP routers in each data center to exchange routing information and maintain consistent network topologies is crucial. Additionally, techniques such as Anycast IP addressing and route reflectors further enhance redundancy and fault tolerance.

**Benefits of Active-Active Data Center Design**

1. Enhanced Redundancy: With active-active design, organizations can achieve higher levels of redundancy by distributing the workload across multiple data centers. This redundancy ensures that even if one data center experiences a failure or maintenance downtime, the other data center seamlessly takes over, minimizing the impact on business operations.

2. Improved Performance and Scalability: Active-active design enables organizations to scale their infrastructure horizontally by distributing the load across multiple data centers. This approach ensures that the workload is evenly distributed, preventing any single data center from becoming a performance bottleneck. It also allows businesses to accommodate increasing demands without compromising performance or user experience.

3. Reduced Downtime: The active-active design significantly reduces the risk of downtime compared to traditional architectures. In the event of a failure, the workload can be immediately shifted to the remaining active data center, ensuring continuous availability of critical services. This approach minimizes the impact on end-users and helps organizations maintain their reputation for reliability.

4. Disaster Recovery Capabilities: Active-active data center design provides a robust disaster recovery solution. Organizations can ensure that their critical systems and applications remain operational despite a catastrophic failure at one location by having multiple geographically distributed data centers. This design approach minimizes the risk of data loss and provides a seamless failover mechanism.

**Implementation Considerations:**

Implementing an active-active data center design requires careful planning and consideration of various factors. Here are some key considerations:

1. Network Design: A robust and resilient network infrastructure is crucial for active-active data center design. Implementing load balancers, redundant network links, and dynamic routing protocols can help ensure seamless failover and optimal traffic distribution.

2. Data Synchronization: Organizations need to implement effective data synchronization mechanisms to maintain data consistency across multiple data centers. This may involve deploying real-time replication, distributed databases, or file synchronization protocols.

3. Application Design: Applications must be designed to be aware of the active-active architecture. They should be able to distribute the workload across multiple data centers and seamlessly switch between them in case of failure. Application-level load balancing and session management become critical in this context.

Active-active data center design offers organizations a robust solution for high availability and fault tolerance. Businesses can ensure uninterrupted access to critical systems and applications by distributing the workload across multiple data centers. The enhanced redundancy, improved performance, reduced downtime, and disaster recovery capabilities make active-active design an ideal choice for organizations striving to provide seamless and reliable services in today’s digital landscape.

Network Connectivity Center

### What is Google’s Network Connectivity Center?

Google Network Connectivity Center (NCC) is a centralized platform that enables enterprises to manage their global network connectivity. It integrates with Google Cloud’s global infrastructure, offering a unified interface to monitor, configure, and optimize network connections. Whether you are dealing with on-premises data centers, remote offices, or multi-cloud environments, NCC provides a streamlined approach to network management.

### Key Features of NCC

Google’s NCC is packed with features that make it an indispensable tool for network administrators. Here are some key highlights:

– **Centralized Management**: NCC offers a single pane of glass for monitoring and managing all network connections, reducing complexity and improving efficiency.

– **Scalability**: Built on Google Cloud’s robust infrastructure, NCC can scale effortlessly to accommodate growing network demands.

– **Automation and Intelligence**: With built-in automation and intelligent insights, NCC helps in proactive network management, minimizing downtime and optimizing performance.

– **Integration**: Seamlessly integrates with other Google Cloud services and third-party tools, providing a cohesive ecosystem for network operations.

Understanding Network Tiers

Network tiers refer to the different levels of performance and cost offered by cloud service providers. They allow businesses to choose the most suitable network option based on their specific needs. Google Cloud offers two network tiers: Standard and Premium.

The Standard Tier provides businesses with a cost-effective network solution that meets their basic requirements. It offers reliable performance and ensures connectivity within Google Cloud services. With its lower costs, the Standard Tier is an excellent choice for businesses with moderate network demands.

For businesses that demand higher levels of performance and reliability, the Premium Tier is the way to go. This tier offers optimized routes, reduced latency, and enhanced global connectivity. With its advanced features, the Premium Tier ensures optimal network performance for mission-critical applications and services.

Understanding VPC Networking

VPC networking is the backbone of a cloud infrastructure, providing a private and secure environment for your resources. In Google Cloud, a VPC network can be thought of as your own virtual data center in the cloud. It allows you to define IP ranges, subnets, and firewall rules, empowering you with complete control over your network architecture.

Google Cloud’s VPC networking offers a plethora of features that enhance network management and security. From custom IP address ranges to subnet creation and route configuration, you have the flexibility to design your network infrastructure according to your specific needs. Additionally, VPC peering and VPN connectivity options enable seamless communication with other networks, both within and outside of Google Cloud.

Understanding VPC Peering

VPC Peering enables you to connect VPC networks across projects or organizations. It allows for secure communication and seamless access to resources between peered networks. By leveraging VPC Peering, you can create a virtual network fabric across various environments.

VPC Peering offers several advantages. First, it simplifies network architecture by eliminating the need for complex VPN setups or public IP addresses. Second, it provides low-latency and high-bandwidth connections between VPC networks, ensuring fast and reliable data transfer. Third, it lets you share resources across peering networks, such as databases or storage, promoting collaboration and resource optimization.

Understanding HA VPN

HA VPN, short for High Availability Virtual Private Network, is a feature provided by Google Cloud that ensures continuous and reliable connectivity between your on-premises network and your Google Cloud Virtual Private Cloud (VPC) network. It is designed to minimize downtime and provide fault tolerance by establishing redundant VPN tunnels.

To set up HA VPN, follow a few simple steps. First, ensure that you have a supported on-premises VPN gateway. Then, configure the necessary settings to create a VPN gateway in your VPC network. Next, configure the on-premises VPN gateway to establish a connection with the HA VPN gateway. Finally, validate the connectivity and ensure all traffic is routed securely through the VPN tunnels.

Implementing HA VPN offers several benefits for your network infrastructure. First, it enhances reliability by providing automatic failover in case of VPN tunnel or gateway failures, ensuring uninterrupted connectivity for your critical workloads. Second, HA VPN reduces the risk of downtime by offering a highly available and redundant connection. Third, it simplifies network management by centralizing the configuration and monitoring of VPN connections.

On-premises Data Centers

Understanding Nexus 9000 Series VRRP

The Virtual Router Redundancy Protocol (VRRP) on the Nexus 9000 Series allows multiple routers to work together as a single virtual router, providing redundancy and seamless failover in the event of a failure. By sharing a virtual IP address, these routers ensure continuous network connectivity and improve network reliability.

With Nexus 9000 Series VRRP, organizations can achieve enhanced network availability and minimize downtime. Utilizing multiple routers can eliminate single points of failure and maintain uninterrupted connectivity. This is particularly crucial in data center environments, where downtime can lead to significant financial losses and reputational damage.

Configuring Nexus 9000 Series VRRP involves several steps. First, a virtual IP address must be defined and assigned to the VRRP group. Next, routers participating in VRRP must be configured with their respective priority levels and advertisement intervals. Additionally, tracking mechanisms can monitor the availability of specific network interfaces and adjust the VRRP priority dynamically.
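The election logic behind those steps can be sketched in a few lines of Python. This is a minimal illustration of VRRP behavior, not Nexus configuration: the router names, the priority values, and the tracking decrement of 20 are assumptions chosen for the example, and preemption is assumed to be enabled.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class VrrpRouter:
    name: str                # hypothetical device name
    interface_ip: str        # primary IP on the VRRP interface
    priority: int = 100      # 100 is the VRRP default priority
    tracked_up: bool = True  # state of a tracked uplink (assumed tracking object)
    decrement: int = 20      # assumed priority reduction when the tracked object is down

    def effective_priority(self) -> int:
        """Priority after object tracking has been applied."""
        if self.tracked_up:
            return self.priority
        return max(1, self.priority - self.decrement)


def elect_master(routers):
    """Highest effective priority wins; a tie goes to the higher interface IP."""
    return max(
        routers,
        key=lambda r: (r.effective_priority(), int(IPv4Address(r.interface_ip))),
    )


routers = [
    VrrpRouter("n9k-a", "10.0.0.2", priority=110),
    VrrpRouter("n9k-b", "10.0.0.3", priority=100),
]
print(elect_master(routers).name)  # n9k-a

routers[0].tracked_up = False      # the tracked uplink on n9k-a fails
print(elect_master(routers).name)  # n9k-b, because 110 - 20 = 90 is below 100
```

The second call shows why tracking matters: the router that lost its uplink still has a healthy VRRP interface, so without the decrement it would remain master and black-hole traffic.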

High Availability and BGP

High availability refers to the ability of a system or network to remain operational and accessible even during failures or disruptions. BGP is pivotal in achieving high availability by employing various mechanisms and techniques.

BGP Multipath is a feature that allows for the simultaneous use of multiple paths to reach a destination. BGP can use various paths to ensure redundancy, load balancing, and enhanced network availability.

BGP Route Reflectors are used in large-scale networks to alleviate the full-mesh requirement between BGP peers. By simplifying the BGP peering configuration, route reflectors enhance scalability and fault tolerance, contributing to high availability.

BGP Anycast is a technique that enables multiple servers or routers to share the same IP address. This method routes traffic to the nearest or least congested node, improving response times and fault tolerance.

BGP AS Prepend

Understanding BGP Route Reflection

BGP route reflection is used in large-scale networks to reduce the number of full-mesh peerings required in a BGP network. It allows a BGP speaker to reflect routes received from one set of peers to another set of peers, eliminating the need for every peer to establish a direct connection with every other peer. Using route reflection, network administrators can simplify their network topology and improve its scalability.

The network must be divided into two main components to implement BGP route reflection: route reflectors and clients. Route reflectors serve as the central point for route reflection, while clients are the BGP speakers who establish peering sessions with the route reflectors. It is essential to carefully plan the placement of route reflectors to ensure optimal routing and redundancy in the network.
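To see why route reflection helps, it is useful to count the iBGP sessions involved. The sketch below compares a full mesh with a design in which every client peers with each route reflector; the function names and the choice of two reflectors are illustrative assumptions.

```python
def full_mesh_sessions(speakers: int) -> int:
    """Every iBGP speaker peers with every other speaker: n(n-1)/2 sessions."""
    return speakers * (speakers - 1) // 2


def route_reflector_sessions(clients: int, reflectors: int = 1) -> int:
    """Each client peers with each reflector; reflectors are fully meshed."""
    return clients * reflectors + full_mesh_sessions(reflectors)


for total in (10, 50, 100):
    clients = total - 2  # assume two of the routers act as route reflectors
    print(
        f"{total} routers: full mesh = {full_mesh_sessions(total)}, "
        f"two route reflectors = {route_reflector_sessions(clients, reflectors=2)}"
    )
```

With 100 routers, the full mesh needs 4950 sessions, while the two-reflector design needs 197.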

Route Reflector Hierarchy and Scaling

In large-scale networks, a hierarchy of route reflectors can be implemented to enhance scalability further. This involves using multiple route reflectors, where higher-level route reflectors reflect routes received from lower-level route reflectors. This hierarchical approach distributes the route reflection load and reduces the number of peering sessions required for each BGP speaker, thus improving scalability even further.

Understanding BGP Multipath

BGP multipath enables the selection and utilization of multiple equal-cost paths for forwarding traffic. Traditionally, BGP would only utilize a single best path, resulting in suboptimal network utilization. With multipath, network administrators can maximize link utilization, reduce congestion, and achieve load balancing across multiple paths.

One of the primary advantages of BGP multipath is enhanced network resilience. By utilizing multiple paths, networks become more fault-tolerant, as traffic can be rerouted in the event of link failures or congestion. Additionally, multipath can improve overall network performance by distributing traffic evenly across available paths, preventing bottlenecks, and ensuring efficient resource utilization.
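A simplified sketch of the selection step is shown below. It is not the full BGP best-path algorithm: only local preference, AS-path length, and MED are compared, and real implementations apply further tie-breakers (origin, eBGP versus iBGP, IGP metric, router ID). The `Path` fields and the `maximum_paths` parameter name are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int
    med: int


def rank(path: Path):
    """Lower tuple is better: high local-pref, short AS path, low MED."""
    return (-path.local_pref, path.as_path_len, path.med)


def select_paths(paths, maximum_paths=1):
    """Return the best path plus any paths that tie with it, up to maximum_paths."""
    best = min(paths, key=rank)
    equal = [p for p in paths if rank(p) == rank(best)]
    return equal[:maximum_paths]


paths = [
    Path("10.1.1.1", local_pref=100, as_path_len=2, med=0),
    Path("10.1.2.1", local_pref=100, as_path_len=2, med=0),
    Path("10.1.3.1", local_pref=100, as_path_len=3, med=0),  # longer AS path
]

print([p.next_hop for p in select_paths(paths)])                   # single best path
print([p.next_hop for p in select_paths(paths, maximum_paths=4)])  # both equal-cost paths
```

The third path is never installed, even with a generous limit, because its longer AS path means it is not equal-cost.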

Expansion and scalability

Expanding capacity is straightforward if a link is oversubscribed (that is, more traffic is offered than the active links can carry at once). Adding a second spine switch gives every leaf switch an additional uplink, which adds interlayer bandwidth and reduces oversubscription. If device port capacity becomes a concern, new leaf switches can be added by connecting them to every spine switch and adding their configuration to the network. This ease of expansion makes the network simpler to scale. A nonblocking architecture is achieved when there is no oversubscription between the lower-tier switches and their uplinks.

Defining an active-active data center strategy isn’t easy when the network, server, and compute teams don’t usually collaborate when planning their infrastructure. An active-active data center design requires a cohesive technology stack from end to end, and establishing the idea usually requires an enterprise-level architecture drive. Once in place, it enables application availability and traffic load sharing across DCs for the following use cases:
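The effect of adding spine switches can be checked with simple arithmetic. The sketch below assumes a hypothetical leaf switch with 48 server-facing 10G ports and one 40G uplink to each spine; the port counts and speeds are illustrative only.

```python
def oversubscription_ratio(server_ports, server_port_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on one leaf."""
    downstream = server_ports * server_port_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream


# Hypothetical leaf: 48 x 10G server ports, one 40G uplink per spine switch.
for spines in (1, 2, 4, 12):
    ratio = oversubscription_ratio(48, 10, spines, 40)
    print(f"{spines} spine(s): {ratio:.1f}:1")
```

Under these assumptions, one spine gives 12:1, two spines give 6:1, and twelve spines reach 1:1, which is the nonblocking case.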

  • Business continuity
  • Mobility and load sharing
  • Consistent policy and fast provisioning capability across DCs

Understanding Spanning Tree Protocol (STP)

Spanning Tree Protocol (STP) is a fundamental mechanism to prevent loops in Ethernet networks. It ensures that only one active path exists between two network devices, preventing broadcast storms and MAC address table instability. STP achieves this by creating a loop-free logical topology known as the spanning tree. But what about MST? Let’s find out.

As networks grow and become more complex, a single spanning tree may not be sufficient to handle the increasing traffic demands. This is where Multiple Spanning Tree (MST) comes into play. MST allows us to divide the network into multiple logical instances, each with its own spanning tree. By doing so, we can distribute the traffic load more efficiently, achieving better performance and redundancy.

MST operates by mapping VLANs to multiple spanning tree instances. Switches that share the same MST configuration (region name, revision number, and VLAN-to-instance mapping) form a region. Each instance has its own spanning tree and its own Root Bridge, allowing for independent configuration and optimization. By assigning different VLANs to separate instances, we can control traffic flow and minimize the impact of network changes.
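The relationship between VLANs, instances, and regions can be modeled in a short sketch. Two switches belong to the same MST region only when their region name, revision number, and VLAN-to-instance mapping all match. Real switches compare a digest of the mapping table rather than the table itself, so the direct comparison below is a simplification, and the class and region names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MstConfig:
    region_name: str
    revision: int
    vlan_to_instance: dict = field(default_factory=dict)  # VLAN ID -> MST instance

    def instance_for(self, vlan: int) -> int:
        """VLANs without an explicit mapping fall into instance 0."""
        return self.vlan_to_instance.get(vlan, 0)


def same_region(a: MstConfig, b: MstConfig, vlans=range(1, 4095)) -> bool:
    """True when name, revision, and the full VLAN-to-instance mapping agree."""
    return (
        a.region_name == b.region_name
        and a.revision == b.revision
        and all(a.instance_for(v) == b.instance_for(v) for v in vlans)
    )


sw1 = MstConfig("DC1", 1, {10: 1, 20: 1, 30: 2})
sw2 = MstConfig("DC1", 1, {10: 1, 20: 1, 30: 2})
sw3 = MstConfig("DC1", 1, {10: 1, 20: 2, 30: 2})  # VLAN 20 mapped differently

print(same_region(sw1, sw2))  # True
print(same_region(sw1, sw3))  # False
```

A single mismatched VLAN mapping is enough to place `sw3` in a different region, which is a common source of unexpected MST behavior.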

Example: Understanding UDLD

UDLD is a layer 2 protocol designed to detect and mitigate unidirectional links in a network. It operates by exchanging protocol packets between neighboring devices to verify the bidirectional nature of a link. UDLD prevents one-way communication and potential network disruptions by ensuring traffic flows in both directions.

UDLD helps maintain network reliability by identifying and addressing unidirectional links promptly. It allows network administrators to proactively detect and resolve potential issues before they can impact network performance. This proactive approach minimizes downtime and improves overall network availability.

Attackers can exploit unidirectional links to gain unauthorized access or launch malicious activities. UDLD acts as a security measure by ensuring bidirectional communication, making it harder for adversaries to manipulate network traffic or inject harmful packets. By safeguarding against such threats, UDLD strengthens the network’s security posture.

Understanding Port Channel

Port Channel, also known as Link Aggregation, is a mechanism that allows multiple physical links to be combined into a single logical interface. This logical interface provides higher bandwidth, improved redundancy, and load-balancing capabilities. Cisco Nexus 9000 Port Channel takes this concept to the next level, offering enhanced performance and flexibility.

a. Increased Bandwidth: By aggregating multiple physical links, the Cisco Nexus 9000 Port Channel significantly increases the available bandwidth, allowing for higher data throughput and improved network performance.

b. Redundancy and High Availability: Port Channel provides built-in redundancy, ensuring network resilience during link failures. With Cisco Nexus 9000, link-level redundancy is seamlessly achieved, minimizing downtime and maximizing network availability.

c. Load Balancing: Cisco Nexus 9000 Port Channel employs intelligent load balancing algorithms that distribute traffic across the aggregated links, optimizing network utilization and preventing bottlenecks.

d. Simplified Network Management: Cisco Nexus 9000 Port Channel simplifies network management by treating multiple links as a logical interface. This streamlines configuration, monitoring, and troubleshooting processes, leading to increased operational efficiency.
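The load-balancing behavior can be illustrated with a flow hash. The sketch below is not the Nexus hardware algorithm: `zlib.crc32` stands in for the switch's hash function, and the interface names and 5-tuple are made up for the example. The point it shows is that all packets of one flow map to the same member link, which keeps them in order.

```python
import zlib


def pick_member(flow, members):
    """Map a flow tuple onto one member link of the port channel."""
    key = "|".join(str(part) for part in flow).encode()
    return members[zlib.crc32(key) % len(members)]


members = ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"]  # hypothetical member links

# (source IP, destination IP, protocol, source port, destination port)
flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)

first = pick_member(flow, members)
second = pick_member(flow, members)
print(first == second)  # True: the same flow always uses the same link

# If a member link fails, the flow is rehashed across the remaining links.
print(pick_member(flow, members[:3]) in members[:3])  # True
```

Because the hash works per flow, a single large flow cannot be spread across links, so traffic is only balanced well when many flows are present.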

Understanding Virtual Port Channel (VPC)

VPC is a link aggregation technique that allows links physically connected to two different Nexus switches to appear as a single port channel to a third device. This technology enables enhanced scalability, improved resiliency, and efficient utilization of network resources. By combining the bandwidth of multiple links, VPC provides higher throughput and creates a loop-free topology in which Spanning Tree Protocol (STP) no longer needs to block redundant links.

Implementing VPC brings several advantages to network administrators.

First, it enhances redundancy by providing seamless failover in case of link or switch failures.

Second, active-active multi-homing is achieved, ensuring traffic is evenly distributed across all available links.

Third, VPC simplifies network management by presenting two switches as a single logical entity to attached devices, enabling streamlined configuration and consistent policy enforcement.

Lastly, VPC allows for the creation of large Layer 2 domains, facilitating workload mobility and flexibility.

Understanding Nexus Switch Profiles

Nexus Switch Profiles are a feature of Cisco’s Nexus switches that enable administrators to define and manage a group of switch configurations as a single entity. This simplifies the management of complex networks by reducing manual configuration tasks and ensuring consistent settings across multiple switches. By encapsulating configurations into profiles, network administrators can achieve greater efficiency and operational agility.

Implementing Nexus Switch Profiles offers a plethora of benefits for network management. Firstly, it enables rapid deployment of new switches with pre-defined configurations, reducing time and effort. Secondly, profiles ensure consistency across the network, minimizing configuration errors and improving overall reliability. Additionally, profiles facilitate streamlined updates and changes, as modifications made to a profile are automatically applied to associated switches. This results in enhanced network security, reduced downtime, and simplified troubleshooting.

A. Active-active Transport Technologies

Transport technologies interconnect data centers. As part of the transport domain, redundancies and links are provided across the site to ensure HA and resiliency. Redundancy may be provided for multiplexers, GPONs, DCI network devices, dark fibers, diversity POPs for surviving POP failure, and 1+1 protection schemes for devices, cards, and links.

In addition, the following list contains the primary considerations when designing a data center interconnection solution.

  • Recovery from various types of failure scenarios: Link failures, module failures, node failures, etc.
  • Traffic round-trip requirements between DCs based on link latency and applications
  • Requirements for bandwidth and scalability
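The round-trip requirement can be estimated from distance alone. Light in optical fiber travels at roughly 200 km per millisecond (about two-thirds of its speed in a vacuum), so the sketch below uses that figure as a rule of thumb. The 5 ms budget and the `extra_ms` allowance for equipment delay are hypothetical values, not vendor requirements.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed of light in fiber


def fiber_rtt_ms(distance_km, extra_ms=0.0):
    """Round-trip propagation delay over a fiber path, plus any fixed overhead."""
    return 2 * distance_km / FIBER_KM_PER_MS + extra_ms


def max_distance_km(rtt_budget_ms, extra_ms=0.0):
    """Longest fiber path that still fits within a round-trip budget."""
    return max(0.0, (rtt_budget_ms - extra_ms) * FIBER_KM_PER_MS / 2)


print(fiber_rtt_ms(100))   # 1.0 ms round trip for a 100 km path
print(max_distance_km(5))  # 500.0 km for an assumed 5 ms budget
```

Note that fiber routes are usually longer than the straight-line distance between sites, so measured latency should always take precedence over this estimate.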

B. Active-Active Network Services

Network services connect all devices in data centers through traffic switching and routing functions. Applications should be able to forward traffic and share load without disruptions on the network. Network services also provide pervasive gateways, L2 extensions, and ingress and egress path optimization across the data centers. Most major network vendors’ SDN solutions also integrate VxLAN overlay solutions to achieve L2 extension, path optimization, and gateway mobility.

Designing active-active network services requires consideration of the following factors:

  • Recovery from various failure scenarios, such as links, modules, and network devices, is possible.
  • Availability of the gateway locally as well as across the DC infrastructure
  • Using a VLAN or VxLAN between two DCs to extend the L2 domain
  • Policies are consistent across on-premises and cloud infrastructure – including naming, segmentation rules for integrating various L4/L7 services, hypervisor integration, etc.
  • Optimizing ingress and egress paths.
  • Centralized management includes inventory management, troubleshooting, AAA capabilities, backup and restore, traffic flow analysis, and capacity dashboards.

C. Active-Active L4-L7 Services

ADC and security devices must be placed in both DCs before active-active L4-L7 services can be built. The major solutions in this space include global traffic managers, application policy controllers, load balancers, and firewalls. Furthermore, these must be deployed at different tiers for perimeter, extranet, WAN, core server farm, and UAT segments. Most of the leading L4-L7 service vendors currently offer clustering solutions for their products across the DCs. With clustering, cluster members can share L4/L7 policies and traffic loads, and fail over seamlessly in case of an issue.

Below are some significant considerations related to L4-L7 service design

  • Various failure scenarios can be recovered, including link, module, and L4-L7 device failure.
  • In addition to naming policies, L4-L7 rules for various traffic types must be consistent across the on-premises infrastructure and in the multiple clouds.
  • Centralized network management (e.g., inventory, troubleshooting, AAA capabilities, backups, traffic flow analysis, capacity dashboards, etc.)

D. Active-Active Storage Services 

Active-active data centers rely on both storage and networking solutions. Active-active storage means that the storage in both DCs serves applications, and the design should allow for uninterrupted read and write operations. Therefore, real-time data mirroring and seamless failover capabilities across DCs are also necessary. The following are some significant factors to consider when designing a storage system.

  • Recover from single-disk failures, storage array failures, and split-brain failures.
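  • Asynchronous vs. synchronous replication: With synchronous replication, data is written to the primary storage and the replica simultaneously, and the write is acknowledged only after both copies are complete. It typically requires dedicated FC links and consumes more bandwidth. With asynchronous replication, the write is acknowledged after the primary copy is written, and the replica is updated afterward.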
  • High availability and redundancy of storage: Storage replication factors and the number of disks available for redundancy
  • Failure scenarios of storage networks: Links, modules, and network devices
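The trade-off between the two replication modes can be shown with a toy model. The sketch below is not a storage API: the class, the 0.5 ms local write time, and the 2 ms inter-DC round trip are assumptions chosen to make the difference visible. It assumes local and remote writes start in parallel and that the remote array writes as fast as the local one.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a volume mirrored to a second data center."""

    def __init__(self, mode, local_write_ms=0.5, rtt_ms=2.0):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.local_write_ms = local_write_ms
        self.rtt_ms = rtt_ms
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes acknowledged but not yet replicated

    def write(self, key, value):
        """Apply a write and return the acknowledgement latency in ms."""
        self.primary[key] = value
        if self.mode == "sync":
            # Acknowledge only after the remote copy is also written.
            self.replica[key] = value
            return max(self.local_write_ms, self.rtt_ms + self.local_write_ms)
        # Async: acknowledge after the local write and replicate later.
        self.pending.append((key, value))
        return self.local_write_ms

    def drain(self):
        """Ship queued writes to the replica (async mode)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


sync_vol = ReplicatedVolume("sync")
async_vol = ReplicatedVolume("async")
print(sync_vol.write("block-1", b"data"))   # 2.5 ms: pays the inter-DC round trip
print(async_vol.write("block-1", b"data"))  # 0.5 ms: local write only
print(len(async_vol.pending))               # 1 write would be lost if the primary DC failed now
```

Synchronous mode buys zero data loss at the cost of write latency that grows with distance, while asynchronous mode keeps writes fast but leaves a window of unreplicated data.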

E. Active-Active Server Virtualization

Over the years, server virtualization has evolved. Microservices and containers are becoming increasingly popular among organizations. The primary consideration here is to extend hypervisor/container clusters across the DCs to achieve seamless virtual machine/container instance movement and failover. VMware and Microsoft are two dominant players in this market. Other examples include Docker, KVM, and Kubernetes (container management).

Here are some key considerations when it comes to virtualizing servers

  • Creating a cross-DC virtual host cluster using a virtualization platform
  • HA protects the VM in normal operational conditions and creates affinity rules that prefer local hosts.
  • When the same service is deployed on VMs in both DCs, the VMs in one DC can take over the load in real time when a host machine in the other is unavailable.
  • A symmetric configuration with failover resources is provided across the compute node devices and DCs.
  • Managing computing resources and hypervisors centrally

F. Active-Active Applications Deployment

The infrastructure needs to be in place for the application to function. Additionally, it is essential to ensure high application availability across DCs. Applications should be able to fail over between locations and be served from the location closest to the user. It is necessary to have Web, App, and DB tiers available at both data centers so that if the application fails in one, it can fail over and continue in the other.

Here are a few key points to consider

  • Deploy the Web services on virtual machines (VMs) or physical machines, using multiple servers to form independent clusters per DC.
  • App services can be deployed on VMs or physical machines. If the application supports distributed deployment, multiple servers within a DC can form a cluster, or servers across DCs can form a cluster (IP-based access is preferred).
  • The databases should be deployed on physical machines to form a cross-DC cluster (active-standby or active-active), for example, Oracle RAC, DB2, or SQL Server with Windows Server Failover Clustering (WSFC).

Knowledge Check: Default Gateway Redundancy

A first-hop redundancy protocol (FHRP) always provides an active default IP gateway. To transparently failover at the first-hop IP router, FHRPs use two or more routers or Layer 3 switches.

The default gateway facilitates network communication. Source hosts send data to their default gateways. Default gateways are IP addresses on routers (or Layer 3 switches) connected to the same subnet as the source hosts. End hosts are usually configured with a single default gateway IP address that does not change when the network topology changes. The local device cannot send packets off the local network segment if the default gateway cannot be reached. There is no dynamic method by which end hosts can determine the address of a new default gateway, even if there is a redundant router that may serve as the default gateway for that segment.

Advanced Topics:

Understanding VXLAN Flood and Learn

The flood-and-learn process is an essential component of VXLAN networks that run without a control plane. It involves flooding broadcast, unknown unicast, and multicast traffic within the VXLAN segment so that all relevant endpoints receive it, while each VTEP learns which remote VTEP sits in front of each source MAC address. By using multicast for this flooding, VXLAN optimizes network traffic and reduces unnecessary overhead.

Multicast plays a crucial role in enhancing the efficiency of VXLAN flood and learn. By utilizing multicast groups, the network can intelligently distribute traffic to only those endpoints that require the information. This approach minimizes unnecessary flooding, reduces network congestion, and improves overall performance.

Several components must be in place to enable VXLAN flood and learn with multicast. We will explore the necessary configurations on the VXLAN Tunnel Endpoints (VTEPs) and the underlying multicast infrastructure. Topics covered will include multicast group management, IGMP snooping, and PIM (Protocol Independent Multicast) configuration.
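The data-plane behavior of a single VTEP can be sketched as follows. This is a conceptual model, not switch configuration: the `Vtep` class, the VNI 10010, the multicast group 239.1.1.10, and the addresses are all hypothetical. It shows the two halves of the process: traffic to unknown or broadcast destinations is flooded to the multicast group mapped to the VNI, and source addresses seen on received frames are learned against the sending VTEP.

```python
def is_multicast_mac(mac: str) -> bool:
    """Broadcast and multicast MACs have the lowest bit of the first octet set."""
    return bool(int(mac.split(":")[0], 16) & 1)


class Vtep:
    """Toy VXLAN tunnel endpoint using flood and learn."""

    def __init__(self, ip, vni_to_group):
        self.ip = ip
        self.vni_to_group = vni_to_group  # VNI -> underlay multicast group
        self.mac_table = {}               # (VNI, MAC) -> remote VTEP IP

    def learn(self, vni, src_mac, src_vtep_ip):
        """Record which remote VTEP a source MAC was received from."""
        self.mac_table[(vni, src_mac)] = src_vtep_ip

    def forward(self, vni, dst_mac):
        """Return ('unicast', vtep_ip) for known MACs, else ('flood', group)."""
        if is_multicast_mac(dst_mac) or (vni, dst_mac) not in self.mac_table:
            return ("flood", self.vni_to_group[vni])
        return ("unicast", self.mac_table[(vni, dst_mac)])


vtep = Vtep("10.0.0.1", {10010: "239.1.1.10"})

print(vtep.forward(10010, "00:aa:bb:cc:dd:02"))  # ('flood', '239.1.1.10')

# A reply arrives from the remote host, so its MAC is learned.
vtep.learn(10010, "00:aa:bb:cc:dd:02", "10.0.0.2")
print(vtep.forward(10010, "00:aa:bb:cc:dd:02"))  # ('unicast', '10.0.0.2')
print(vtep.forward(10010, "ff:ff:ff:ff:ff:ff"))  # ('flood', '239.1.1.10')
```

Only VTEPs that have joined the group for a given VNI receive the flooded copy, which is how multicast limits flooding to the endpoints that need it.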

Related: Before you proceed, you may find the following useful:

  1. Data Center Topologies
  2. LISP Protocol
  3. Data Center Network Design
  4. ASA Failover
  5. LISP Hybrid Cloud
  6. LISP Control Plane

Active Active Data Center Design

At its core, an active-active data center is based on fault tolerance, redundancy, and scalability principles. This means that each data center should be designed to withstand any hardware or software failure, have multiple levels of data storage redundancy, and scale up or down as needed.

The data center also provides an additional layer of security. It is designed to protect data from unauthorized access and malicious attacks. It should also be able to detect and respond to any threats quickly and in a coordinated manner.

A comprehensive monitoring and management system is essential to ensure the data center functions correctly. This system should be designed to track the data center’s performance, detect problems, and provide the necessary alerting mechanisms. It should also provide insights into how the data center operates so that any necessary changes can be made.

Cisco Validated Design

Cisco has validated this design, which is freely available on the Cisco site. In summary, they have tested a variety of combinations, such as VSS-VSS, VSS-vPC, and vPC-vPC, and validated the design with 200 Layer 2 VLANs and 100 SVIs, or 1000 VLANs and 1000 SVIs with static routing.

At the time of writing, the M series for the Nexus 7000 supports native encryption of Ethernet frames through the IEEE 802.1AE (MACsec) standard. This implementation uses the Advanced Encryption Standard (AES) cipher and a 128-bit shared key.

Example: Cisco ACI

In the following lab guide, we demonstrate Cisco ACI. To extend Cisco ACI, we have different designs, such as multi-site and multi-pod. This type of design overcomes many challenges of extending a data center, which we will discuss in this post, such as extending Layer 2 networks.

One crucial component of Cisco ACI is the COOP database, which maps endpoints in the network. The following screenshots show the synchronized COOP database across spines, even in different data centers. Notice that the bridge domain VNID is mapped to the MAC address. The COOP database is unique to Cisco ACI.

Diagram: COOP database

**The Challenge: Layer 2 is Weak**

The challenge of data center design is “Layer 2 is weak & IP is not mobile.” In the past, best practices recommended that networks from distinct data centers be connected through Layer 3 (routing), isolating the known Layer 2 turmoil. However, the business is driving the application requirements, changing the connectivity requirements between data centers.

Layer 3 connections with path separation through Multi-VRF, P2P VLANs, or MPLS/VPN, along with a modular building block data center design, remain the general recommendation.

Yet some applications cannot function over a Layer 3 environment, and these have driven the need for active-active data centers. For example, most geo clusters require Layer 2 adjacency between their nodes, whether for heartbeat and connection state information (status and control synchronization) or for the requirement to share virtual IP and MAC addresses to facilitate traffic handling in case of failure. Some clustering products (Veritas, Oracle RAC) support communication over Layer 3, but they are a minority and don’t represent the general case.

Defining active data centers

The term active-active refers to using at least two data centers where both can service an application at any time, so each functions as an active application site. The demand for an active-active data center architecture comes from the need for seamless workload mobility, distributed applications, and the ability to pool and maximize resources.

For an active/active application setup, we must first have an active-active data center infrastructure. Remember that the network is just one key component of active/active data centers. From a pure network perspective, an active-active DC can be divided into two halves:

  1. Ingress Traffic – inbound traffic
  2. Egress Traffic – outbound traffic
Diagram: Active active data center. Scenario. Source is twoearsonemouth

Active Active Data Center and VM Migration

Migrating applications and data to virtual machines (VMs) is becoming increasingly popular as organizations seek to reduce their IT costs and increase the efficiency of their services. VM migration moves existing applications, data, and other components from a physical server to a virtualized environment. This process is becoming more cost-effective and efficient for organizations, eliminating the need for additional hardware, software, and maintenance costs.

Virtual machine migration between data centers increases application availability. However, Layer 2 network adjacency between ESX hosts is currently required, and a consistent LUN must be maintained for stateful migration. In other words, if the VM loses its IP address, it loses its state and its TCP sessions drop, resulting in a cold migration ( VM does a reboot ) instead of a hot migration ( VM does not reboot ).

Due to the stretched VLAN requirement, data center architects started to deploy traditional Layer 2 over the DCI and, unsurprisingly, were faced with exciting results. Although flooding and broadcasts are necessary for IP communication in Ethernet networks, they can become dangerous in a DCI environment.

Traffic Tromboning

Traffic tromboning can also occur between two stretched data centers when traffic is routed suboptimally within extended VLANs. Trombones, by their very nature, create a network traffic scalability problem. Addressing this through load balancing among multiple trombones is challenging since their services are often stateful.

Traffic tromboning can affect either ingress or egress traffic. On egress, you can have FHRP filtering to isolate the HSRP partnership and provide an active/active setup for HSRP. On ingress, you can have GSLB, Route Injection, and LISP.

Diagram: Traffic Tromboning. Source is Silvanogai

Cisco Active-active data center design and virtualization technologies

Virtualization technologies can overcome many of these problems by being used for Layer 2 extensions between data centers. These include vPC, VSS, Cisco FabricPath, VPLS, OTV, and LISP with its Internet locator design. In summary, different technologies can be used for LAN extensions, and the primary mediums in which they can be deployed are Ethernet, MPLS, and IP.

    1. Ethernet: VSS and vPC or FabricPath
    2. MPLS: EoMPLS, A-VPLS, and H-VPLS
    3. IP: OTV
    4. LISP

Ethernet Extensions and Multi-Chassis EtherChannel ( MEC )

An Ethernet extension requires protected DWDM or direct fibers and works only between two data centers. It cannot support a multi-datacenter topology, i.e., a full mesh of data centers, but it can support hub-and-spoke topologies.

Previously, a LAG could only terminate on one physical switch. VSS-MEC and vPC are port-channeling concepts that extend link aggregation to two physical switches. This allows you to create L2 topologies based on link aggregation, eliminating the dependency on STP and enabling you to scale the available Layer 2 bandwidth by bonding the physical links.

Because vPC and VSS create a single connection from an STP perspective, disjoint STP instances can be deployed in each data center. Such isolation can be achieved with BPDU Filtering on the DCI links or Multiple Spanning Tree ( MST ) regions on each site.

At the time of writing, vPC does not support Layer 3 peering. If you want an L3 link, create a separate one; unlike the extended Layer 2 links, it does not need to run on dark fiber or protected DWDM.

Ethernet Extension and FabricPath

FabricPath allows network operators to design and implement a scalable Layer 2 fabric in which VLANs can span the fabric, reducing the physical constraints on server location. It provides a high-availability design with up to 16 active paths at Layer 2, each of which can be a 16-member port channel, for unicast and multicast traffic.

This enables MSDC networks to have flat topologies, separating nodes by a single hop ( equidistant endpoints ). Cisco has not targeted FabricPath as a primary DCI solution as it does not have specific DCI functions compared to OTV and VPLS.

Its primary purpose is for Clos-based architectures. However, if you need to interconnect three or more sites, FabricPath is a valid solution when you have short distances between your DCs via high-quality point-to-point optical transmission links.
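As a back-of-the-envelope illustration of those figures, the sketch below multiplies the path and port-channel counts from the text by a link speed. The 10 Gbps member-link speed is an assumption made for the example, not something stated above, and the function name is purely illustrative.

```python
# Rough aggregate bandwidth between two FabricPath switches.
ACTIVE_PATHS = 16             # from the text: up to 16 active Layer 2 paths
LINKS_PER_PORT_CHANNEL = 16   # from the text: each path can be a 16-member port channel
LINK_SPEED_GBPS = 10          # assumption: 10 Gigabit Ethernet member links


def aggregate_bandwidth_gbps(paths: int, links_per_path: int, link_speed_gbps: int) -> int:
    """Total bandwidth if every member link of every path is in use."""
    return paths * links_per_path * link_speed_gbps


total_links = ACTIVE_PATHS * LINKS_PER_PORT_CHANNEL
total_gbps = aggregate_bandwidth_gbps(ACTIVE_PATHS, LINKS_PER_PORT_CHANNEL, LINK_SPEED_GBPS)

print(f"{total_links} links, {total_gbps} Gbps ({total_gbps / 1000} Tbps)")
```

Under that assumption, the theoretical ceiling is 256 links, or 2.56 Tbps, between two switches. Real throughput depends on how evenly flows hash across the paths.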

Your WAN links must support Remote Port Shutdown and microflapping protection. By default, OTV and VPLS should be the first solutions considered as they are Cisco-validated designs with specific DCI features. For example, OTV can flood unknown unicast for particular VLANs.

FabricPath
Diagram: FabricPath. Source is Cisco

IP Core with Overlay Transport Virtualization ( OTV )

OTV provides dynamic encapsulation with multipoint connectivity of up to 10 sites ( NX-OS 5.2 supports 6 sites, and NX-OS 6.2 supports 10 sites ). OTV is a specific DCI technology that enables Layer 2 extension across data center sites by employing MAC-in-IP encapsulation with built-in loop prevention and failure boundary preservation.

There is no data plane learning. Instead, the overlay control plane ( Layer 2 IS-IS ) on the provider’s network facilitates all unicast and multicast learning between sites. OTV has been supported on the Nexus 7000 since NX-OS Release 5.0 and on the ASR 1000 since IOS XE Release 3.5. As a DCI, OTV offers robust high availability: most failures converge in under a second, and only extreme and very unlikely failures, such as a device going down, take up to 5 seconds.
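To make the idea of control-plane learning concrete, here is a deliberately simplified toy model in Python. It is not OTV itself: the `EdgeDevice` class, the `advertise_all` helper, and the addresses are all hypothetical, and the real protocol exchanges this information through IS-IS rather than direct function calls. The point is only that remote MAC addresses are installed from advertisements, not learned from flooded frames.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeDevice:
    """Toy stand-in for an OTV edge device at one site (hypothetical model)."""
    site: str
    join_ip: str                                     # IP address used on the transport network
    local_macs: set = field(default_factory=set)     # MACs seen on the site's own LAN
    remote_macs: dict = field(default_factory=dict)  # MAC -> IP of the remote edge device

    def learn_local(self, mac: str) -> None:
        self.local_macs.add(mac)

    def receive_advertisement(self, macs, next_hop_ip: str) -> None:
        # Remote MACs arrive via the control plane, never via flooding.
        for mac in macs:
            self.remote_macs[mac] = next_hop_ip

    def forward(self, dst_mac: str) -> str:
        if dst_mac in self.local_macs:
            return "local"
        if dst_mac in self.remote_macs:
            return f"encapsulate to {self.remote_macs[dst_mac]}"
        # Modeling choice: unknown unicast is not sent across the overlay.
        return "drop"


def advertise_all(devices) -> None:
    """Each device advertises its local MACs to every other device."""
    for sender in devices:
        for receiver in devices:
            if receiver is not sender:
                receiver.receive_advertisement(sender.local_macs, sender.join_ip)


dc1 = EdgeDevice(site="DC1", join_ip="192.0.2.1")
dc2 = EdgeDevice(site="DC2", join_ip="198.51.100.1")
dc1.learn_local("00:00:5e:00:53:01")
dc2.learn_local("00:00:5e:00:53:02")
advertise_all([dc1, dc2])

print(dc1.forward("00:00:5e:00:53:02"))  # remote MAC known from the advertisement
print(dc1.forward("00:00:5e:00:53:99"))  # MAC nobody advertised
```

Because a frame only crosses the overlay when its destination has been advertised, a broadcast storm or unknown-unicast flood in one site stays inside that site in this model, which is the failure boundary the text describes.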

Locator/ID Separation Protocol ( LISP )

Locator/ID Separation Protocol ( LISP ) has many applications. As the name suggests, it separates the location and the identifier of network hosts, enabling VMs to move across subnet boundaries while retaining their IP address and enabling advanced triangular routing designs.
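The separation of identity and location can be sketched as a simple lookup table. In LISP terminology, the host's address is its endpoint identifier (EID) and the address of the site router in front of it is the routing locator (RLOC). The `MappingSystem` class and the addresses below are hypothetical and only illustrate the idea; they are not a LISP implementation.

```python
class MappingSystem:
    """Toy EID-to-RLOC mapping table (hypothetical, for illustration only)."""

    def __init__(self):
        self._eid_to_rloc = {}

    def register(self, eid: str, rloc: str) -> None:
        # A later registration for the same EID replaces the old location.
        self._eid_to_rloc[eid] = rloc

    def resolve(self, eid: str):
        return self._eid_to_rloc.get(eid)


VM_EID = "10.1.1.10"        # the VM's IP address: its identity, which never changes
RLOC_DC1 = "192.0.2.1"      # locator of the edge router in data center 1
RLOC_DC2 = "198.51.100.1"   # locator of the edge router in data center 2

mapping = MappingSystem()
mapping.register(VM_EID, RLOC_DC1)
before_move = mapping.resolve(VM_EID)

# The VM migrates to data center 2: only the locator is updated.
mapping.register(VM_EID, RLOC_DC2)
after_move = mapping.resolve(VM_EID)

print(f"{VM_EID}: {before_move} -> {after_move}")
```

The VM keeps the same IP address throughout, so its sessions are addressed the same way before and after the move; only the answer to "where is it now?" changes.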

LISP works well when you have to move workloads and distribute workloads across data centers, making it a perfect complementary technology for an active-active data center design. It provides you with the following:

  • a) Global IP mobility across subnets for disaster recovery and cloud bursting ( without LAN extension ) and optimized routing across extended subnet sites.
  • b) Routing with extended subnets for active/active data centers and distributed clusters ( with LAN extension).
Diagram: LISP Networking. Source is Cisco

LISP answers the problems with ingress and egress traffic tromboning. It has a location mapping table, so when a host move is detected, updates are automatically triggered, and ingress routers (ITRs or PITRs) send traffic to the new location. From the perspective of ingress path flow inbound on the WAN, LISP can also address the limitations of BGP in controlling ingress flows. Without LISP, we are limited to specific route filtering.

For example, suppose you have a PI prefix consisting of a /16. If you break this up and advertise it as 4 x /18, you may still get poor ingress load balancing on your DC WAN links; even if you were to break it up into 8 x /19, the results might still be unfavorable.
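The prefix arithmetic can be checked with Python's standard `ipaddress` module. The `10.0.0.0/16` block is only a placeholder for the example (a real PI prefix would be public address space), and alternating the more-specific prefixes between two WAN links is one assumed way to distribute them.

```python
import ipaddress

# Placeholder /16 standing in for the PI prefix (hypothetical value).
pi_prefix = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into more-specific prefixes for selective advertisement.
quarters = list(pi_prefix.subnets(new_prefix=18))   # 4 x /18
eighths = list(pi_prefix.subnets(new_prefix=19))    # 8 x /19

# Assumed distribution: alternate the /18s between two DC WAN links.
link_a = quarters[0::2]
link_b = quarters[1::2]

print("4 x /18:", [str(n) for n in quarters])
print("8 x /19:", len(eighths), "prefixes")
print("WAN link A advertises:", [str(n) for n in link_a])
print("WAN link B advertises:", [str(n) for n in link_b])
```

Each link ends up advertising half of the address space, but that says nothing about how much traffic each half attracts. If most of the busy hosts sit in one /18, the inbound load stays lopsided no matter how finely the prefix is split, which is why the results can remain unfavorable.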

LISP works differently than BGP because a LISP proxy provider would advertise this /16 for you ( you don’t advertise the /16 from your DC WAN links ) and send traffic at 50:50 to your DC WAN links. LISP can achieve a near-perfect 50:50 traffic split at the DC edge.

Summary: Active Active Data Center Design

In today’s digital age, businesses and organizations rely heavily on data centers to store, process, and manage critical information. However, any disruption or downtime can have severe consequences, leading to financial losses and damage to reputation. This is where redundant data centers come into play. In this blog post, we explored the concept of redundant data centers, their benefits, and how they ensure uninterrupted digital operations.

Understanding Redundancy in Data Centers

Redundancy in data centers refers to duplicating critical components and systems to minimize the risk of failure. It involves creating multiple backups of hardware, power sources, cooling systems, and network connections. With redundant systems, data centers can continue functioning even if one or more components fail.

Types of Redundancy

Data centers employ various types of redundancy to ensure uninterrupted operations. These include:

1. Hardware Redundancy: This involves duplicate servers, storage devices, and networking equipment. If one piece of hardware fails, the redundant backup takes over seamlessly, preventing disruption.

2. Power Redundancy: Power outages can harm data center operations. Redundant power systems, such as backup generators and uninterruptible power supplies (UPS), provide continuous power supply even during electrical failures.

3. Cooling Redundancy: Overheating can damage sensitive equipment in data centers. Redundant cooling systems, including multiple air conditioning units and cooling towers, help maintain optimal temperature levels and prevent downtime.

Network Redundancy

Network connectivity is crucial for data centers to communicate with the outside world. Redundant network connections ensure that alternative paths are available to maintain uninterrupted data flow if one connection fails. This can be achieved through diverse internet service providers (ISPs), multiple routers, and network switches.

Benefits of Redundant Data Centers

Implementing redundant data centers offers several benefits, including:

1. Increased Reliability: Redundancy minimizes the risk of single points of failure, making data centers highly reliable and resilient.

2. Improved Uptime: Data centers can achieve impressive uptime percentages with redundant systems, ensuring continuous access to critical data and services.

3. Disaster Recovery: Redundant data centers are crucial in disaster recovery strategies. If one data center becomes inaccessible due to natural disasters or other unforeseen events, the redundant facility takes over seamlessly, ensuring business continuity.
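The uptime benefit can be quantified with the standard formula for components in parallel: if each copy is available a fraction `a` of the time and failures are independent, the chance that all `n` copies are down at once is `(1 - a) ** n`. The sketch below applies it; the 99% figure is an assumed example value, and the independence assumption is optimistic when both sites share a failure cause.

```python
HOURS_PER_YEAR = 365 * 24


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming independent failures."""
    return 1 - (1 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    return (1 - availability) * HOURS_PER_YEAR


single_site = 0.99                                   # assumed availability of one data center
dual_site = parallel_availability(single_site, 2)    # two redundant data centers

print(f"one site:  {single_site:.4%} available, "
      f"{downtime_hours_per_year(single_site):.1f} h/year down")
print(f"two sites: {dual_site:.4%} available, "
      f"{downtime_hours_per_year(dual_site):.2f} h/year down")
```

With these example numbers, expected downtime falls from roughly 87.6 hours a year for a single site to under one hour a year for the redundant pair.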

Conclusion:

Redundant data centers are vital for organizations that cannot afford any interruption in their digital operations. By implementing hardware, power, cooling, and network redundancy, businesses can mitigate risks, ensure uninterrupted access to critical data, and safeguard their operations from potential disruptions. Investing in redundant data centers is a proactive measure to save businesses from significant financial losses and reputational damage in the long run.