Cisco Switch Virtualization Nexus 1000v

Virtualization has become integral to modern data centers in today's digital landscape. With the increasing demand for agility, flexibility, and scalability, organizations are turning to virtual networking solutions to meet their evolving needs. One such solution is the Nexus 1000v, a virtual network switch offering comprehensive features and functionalities. In this blog post, we will delve into the world of the Nexus 1000v, exploring its key features, benefits, and use cases.

The Nexus 1000v is a distributed virtual switch that operates at the hypervisor level, providing advanced networking capabilities for virtual machines (VMs). It is designed to integrate seamlessly with VMware vSphere, offering enhanced network visibility, control, and security.

Cisco Switch Virtualization is a revolutionary concept that allows network administrators to create multiple virtual switches on a single physical switch. By abstracting the network functions from the hardware, it provides enhanced flexibility, scalability, and efficiency. With Cisco Switch Virtualization, businesses can maximize resource utilization and simplify network management.

At the forefront of Cisco's Switch Virtualization portfolio is the Nexus 1000v. This powerful platform brings the benefits of virtualization to the data center, enabling seamless integration between virtual and physical networks. By extending Cisco's renowned networking capabilities into the virtual environment, Nexus 1000v empowers organizations to achieve consistent policy enforcement, enhanced security, and simplified operations.

The Nexus 1000v boasts a wide range of features that make it a compelling choice for network administrators. From advanced network segmentation and traffic isolation to granular policy control and deep visibility, this platform has it all. By leveraging the power of Cisco's Virtual Network Services (VNS), organizations can optimize their network infrastructure, streamline operations, and deliver superior performance.

Deploying Cisco Switch Virtualization, specifically the Nexus 1000v, requires careful planning and consideration. Organizations must evaluate their network requirements, ensure compatibility with existing infrastructure, and adhere to best practices. From designing a scalable architecture to implementing proper security measures, attention to detail is crucial to achieve a successful deployment.

To truly understand the impact of Cisco Switch Virtualization, it's essential to explore real-world use cases and success stories. From large enterprises to service providers, organizations across various industries have leveraged the power of Nexus 1000v to transform their networks. This section will highlight a few compelling examples, showcasing the versatility and value that Cisco Switch Virtualization brings to the table.

Highlights: Cisco Switch Virtualization Nexus 1000v

Hypervisor and vSphere Introduction

A hypervisor, also known as a virtual machine manager, allows a single hardware host to run multiple operating systems. Guest operating systems use the host's processor, memory, and other resources, while the hypervisor controls those resources and allocates to each operating system what it needs. The guest operating systems, or virtual machines, run on top of the hypervisor.

Designed specifically for integration with VMware vSphere environments, the Cisco Nexus 1000V Series Switch runs Cisco NX-OS software and delivers enterprise-class performance and scalability across multiple platforms. The Nexus 1000V runs within the VMware ESX hypervisor. With the Cisco Nexus 1000V Series, you can take advantage of Cisco VN-Link server virtualization technology to provide:

• Policy-based virtual machine (VM) connectivity

• Mobile VM security

• Network policy

• A nondisruptive operational model for your server virtualization and networking teams

Virtual servers can be configured with the same network configuration, security policy, diagnostic tools, and operational models as physical servers. The Cisco Nexus 1000V Series is also compatible with VMware vSphere, vCenter, ESX, and ESXi.

A brief overview of the Nexus 1000V system

There are two primary components of the Cisco Nexus 1000V Series switch:

VEM (Virtual Ethernet Module): executes inside the hypervisor

VSM (Virtual Supervisor Module): runs externally to the hypervisor and manages the VEMs

The Nexus 1000v implements the generic concept of a Cisco Distributed Virtual Switch (DVS). The Cisco Nexus 1000V Virtual Ethernet Module (VEM) executes within VMware ESX or ESXi and plugs into the VMware vNetwork Distributed Switch (vDS) application programming interface (API). Because the API integrates with VMware vMotion and the Distributed Resource Scheduler (DRS), advanced networking capabilities can be provided to virtual machines. The VEM performs Layer 2 switching and advanced networking functions based on configuration information received from the VSM:

Nexus Switch Virtualization
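To make this concrete, here is a minimal sketch of what a port profile defined on the VSM might look like; the profile name, VLAN ID, and the decision to publish it as a vCenter port group are illustrative assumptions rather than details from this post.

    ! Minimal VSM port-profile sketch (name and VLAN are assumptions)
    port-profile type vethernet WEB-SERVERS
      switchport mode access
      switchport access vlan 101       ! VM-facing VLAN
      no shutdown
      state enabled
      vmware port-group                ! publish to vCenter as a port group

Once the profile is enabled, the VSM pushes it to every VEM, and vCenter administrators simply attach VM vNICs to the resulting port group.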

Virtual routing and forwarding

Virtual routing and forwarding forms the basis of this stack. Network virtualization comes in two primary forms: 1) one-to-many and 2) many-to-one. The one-to-many method segments one physical network into multiple logical segments. Conversely, the many-to-one method consolidates numerous physical devices into one logical entity. By definition, they seem to be opposites, but both fall under the umbrella of network virtualization.

Before you proceed, you may find the following posts helpful:

  1. Container Based Virtualization
  2. Virtual Switch
  3. What is VXLAN
  4. Redundant Links
  5. WAN Virtualization
  6. What Is FabricPath

Cisco Switch Virtualization

Key Nexus 1000v Discussion Points:


  • Introduction to the Nexus 1000v and what is involved.

  • Highlighting the details of Cisco switch virtualization and logical separation.

  • Technical details on the additional overhead from virtualization.

  • Scenario: Network virtualization.

  • A final note on software virtual switch designs.

Back to basics with network virtualization

Before we get stuck into Cisco virtualization, let us address some basics. Consider multiple virtual endpoints that share a physical network, where different endpoints belong to different customers and the communication between them needs to be isolated. In other words, the network is a resource, too, and network virtualization is the technology that enables sharing of a common physical network infrastructure.

Virtualization uses software to simulate traditional hardware platforms and create virtual software-based systems. For example, virtualization allows specialists to construct a single virtual network or partition a physical network into multiple virtual networks.

Cisco Switch Virtualization: Logical segmentation: One-to-many

We have one-to-many network virtualization for the Cisco switch virtualization design; a single physical network is logically segmented into multiple virtual networks. For example, each virtual network could correspond to a user group or a specific security function.

End-to-end path isolation requires the virtualization of networking devices and their interconnecting links. VLANs have been traditionally used, and hosts from one user group are mapped to a single VLAN. To extend the path across multiple switches at Layer 2, VLAN tagging (802.1Q) can carry VLAN information between switches. These VLAN trunks were created to transport multiple VLANs over a single Ethernet interface.

The diagram below displays two independent VLANs, VLAN201 and VLAN101. These VLANs can share one physical wire to provide L2 reachability between hosts connected to Switch B and Switch A via Switch C, but they remain separate entities.
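As a rough illustration of the trunking described above (the interface numbering is an assumption), a single 802.1Q trunk carrying both VLANs from the diagram might look like this on a Cisco switch:

    ! Illustrative 802.1Q trunk carrying VLAN 101 and VLAN 201 on one wire
    interface Ethernet1/1
      switchport
      switchport mode trunk
      switchport trunk allowed vlan 101,201

Frames from each VLAN are tagged on the wire, so the two segments share the link while remaining isolated.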

Nexus1000v
Nexus1000v: The operation

VLANs are sufficient for small Layer 2 segments. However, today's networks will likely have a mix of Layer 2 and Layer 3 routed networks. In this case, Layer 2 VLANs alone are insufficient because you must extend the Layer 2 isolation over a Layer 3 device. This can be achieved with Virtual Routing and Forwarding ( VRF ), the next step in Cisco switch virtualization. A VRF instance logically carves a Layer 3 device into several isolated, independent Layer 3 devices. VRFs configured on the same device cannot communicate directly without explicit configuration.

The diagram below displays one physical Layer 3 router with three VRFs: VRF Yellow, VRF Red, and VRF Blue. These virtual routing and forwarding instances are completely separated; without explicit configuration, routes in one virtual routing and forwarding instance cannot be leaked to another.

Virtual Routing and Forwarding

virtual routing and forwarding

The virtualization of the interconnecting links depends on how the virtual routers are connected. If they are physically ( directly ) connected, you could use a technology known as VRF-lite to separate traffic and 802.1Q to label the data plane. This is known as hop-by-hop virtualization. However, it’s possible to run into scalability issues when the number of devices grows. This design is typically used when you connect virtual routing and forwarding back to back, i.e., no more than two devices.
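Here is a minimal VRF-lite sketch in the hop-by-hop style described above; the VRF names, subinterface numbers, 802.1Q tags, and addressing are all assumptions for illustration:

    ! Two VRFs separated over one physical link using 802.1Q tags (NX-OS style)
    vrf context RED
    vrf context BLUE
    !
    interface Ethernet1/1
      no switchport                    ! routed parent interface
    !
    interface Ethernet1/1.100
      encapsulation dot1q 100          ! tag labels the data plane for VRF RED
      vrf member RED
      ip address 10.0.0.1/30
    !
    interface Ethernet1/1.200
      encapsulation dot1q 200          ! tag labels the data plane for VRF BLUE
      vrf member BLUE
      ip address 10.0.1.1/30

The same pattern must be repeated on every hop, which is why this approach stops scaling as the device count grows.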

When the virtual routers are connected over multiple hops through an IP cloud, you can use generic routing encapsulation ( GRE ) or Multiprotocol Label Switching ( MPLS ) virtual private networks.

GRE is probably the simpler of the Layer 3 methods, and it can work over any IP core. GRE can encapsulate the contents and transport them over a network with the network unaware of the packet contents. Instead, the core will see the GRE header, virtualizing the network path.

Cisco Switch Virtualization: The additional overhead

When designing Cisco switch virtualization, you need to consider the additional overhead. The GRE header adds a further 24 bytes of overhead, so the forwarding router may have to break a datagram into two fragments to keep the packet from exceeding the outgoing interface MTU. To avoid fragmentation, correctly configure the MTU, MSS, and Path MTU parameters on the outgoing and intermediate routers.
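A hedged IOS-style sketch of that tuning (addresses and interface names are assumptions): with 24 bytes of GRE/outer-IP overhead on a 1500-byte interface, the tunnel IP MTU drops to 1476, and clamping the TCP MSS to 1436 (1476 minus 40 bytes of TCP/IP headers) avoids fragmentation for TCP flows:

    ! Illustrative GRE tunnel with MTU/MSS adjusted for the 24-byte overhead
    interface Tunnel0
      ip address 172.16.0.1 255.255.255.252
      ip mtu 1476                      ! 1500 - 24 bytes (outer IP + GRE)
      ip tcp adjust-mss 1436           ! 1476 - 40 bytes (TCP/IP headers)
      tunnel source Loopback0
      tunnel destination 192.0.2.2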

The GRE standard is typically static. You only need to configure tunnel endpoints, and the tunnel will be up as long as you can reach those endpoints. However, recent designs can establish a dynamic GRE tunnel.

GRE over IPsec

MPLS/VPN, on the other hand, is a different beast. It requires signaling to distribute labels and build an end-to-end Label Switched Path ( LSP ). Label distribution can be done with BGP+label, LDP, or RSVP. Unlike GRE, MPLS VPNs do not have to manage multiple point-to-point tunnels to provide a full mesh of connectivity; the labels on the packets provide the traffic separation.

Cisco switch virtualization: Many to one

Many-to-one network consolidation refers to grouping two or more physical devices into one logical device. Examples of this Cisco switch virtualization technology include the Virtual Switching System ( VSS ), stackable switches, and Nexus vPC. Combining many physical devices into one logical entity allows STP to view the group as a single switch, so all ports can remain active; by default, STP would block the redundant path.

Software-defined networking takes this concept further; it abstracts the entire network into a single virtual switch. On traditional routers, the control and data planes reside on the same device, but with SDN they are decoupled. The control plane now sits on a policy-driven controller, and the data plane remains local on the OpenFlow-enabled switch.

Network Virtualization

Server and network virtualization presented the challenge of multiple VMs sharing a single network physical port, such as a network interface controller ( NIC ). The question then arises: how do I link multiple VMs to the same uplink? How do I provide path separation? Today’s networks need to virtualize the physical port and allow the configuration of policies per port.

Nexus 1000

NIC-per-VM design

One way to do this is a NIC-per-VM design, where each VM is assigned a single physical NIC that is not shared with any other VM. The hypervisor, aka the virtualization layer, is bypassed, and the VM accesses the I/O device directly. This is known as VMDirectPath. This direct path, or pass-through, can improve performance for hosts that utilize high-speed I/O devices, such as 10 Gigabit Ethernet. However, the loss of flexibility, including the ability to move VMs, offsets the performance benefits.

Virtual-NIC-per-VM in Cisco UCS (Adapter FEX)

Another way is to create multiple logical NICs on the same physical NIC, such as Virtual-NIC-per-VM in Cisco UCS (Adapter FEX). These logical NICs are assigned directly to VMs, and traffic gets marked with a vNIC-specific VN-Tag in hardware (the technology behind IEEE 802.1BR). The VN-Tag tagging is implemented in the server NICs, so the physical NIC in the server can be cloned into multiple virtual NICs. This technology provides faster switching and enables you to apply a rich set of management features to local and remote traffic.

Software Virtual Switch

The third option is to implement a virtual software switch in the hypervisor. For example, VMware introduced virtual switching compatibility with its vSphere ( ESXi ) hypervisor, called the vSphere Distributed Switch ( VDS ). VMware initially offered a host-local L2 software switch, which was later superseded because it lacked a distributed architecture.

Data physically moves between the servers through the external network, but the control plane abstracts this movement to look like one large distributed switch spanning multiple servers. This approach has a single management and configuration point, similar to stackable switches – one control plane with many physical data forwarding paths. The data does not move through a parent partition but logically connects directly to the network interface through local vNICs associated with each VM.

Network virtualization and Nexus 1000v ( Nexus 1000 )

The VDS introduced by VMware lacked advanced networking features, which led Cisco to introduce the Nexus 1000V software-based switch. The Nexus 1000v is a multi-cloud, multi-hypervisor, and multi-services distributed virtual switch. Its function is to enable communication between VMs.

Nexus1000v
Nexus1000v: Virtual Distributed Switch.

Nexus 1000 components: VEM and VSM

The Nexus 1000v has two essential components:

  1. The Virtual Supervisor Module ( VSM )
  2. The Virtual Ethernet Module ( VEM ).

Compared to a physical switch, the VSM can be viewed as the supervisor, setting up the control plane functions so the data plane can forward efficiently, and the VEMs as the line cards that do all the packet forwarding. The VEM is the software component that runs within the hypervisor kernel. It handles all VM traffic, including inter-VM frames and Ethernet traffic between a VM and external resources.

The VSM runs its own NX-OS code and implements the control and management planes, which integrate with a cloud manager such as VMware vCenter. You can have two VSMs for redundancy. Both modules remain constantly synchronized, with unicast VSM-to-VSM heartbeats providing stateful failover in the event of an active VSM failure.

The two available communication options for VSM to VEM are:

  1. Layer 2 control mode: The VSM control interface shares the same VLAN with the VEM.
  2. Layer 3 control mode: The VEM and the VSM are in different IP subnets.

The VSM also uses heartbeat messages to detect a loss of connectivity between it and the VEM. However, the VEM does not depend on connectivity to the VSM to perform its data plane functions and will continue forwarding packets if the VSM fails.

 

With Layer 3 control mode, the heartbeat messages are encapsulated in a GRE envelope.
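As a minimal sketch of enabling Layer 3 control mode on the VSM (the domain ID and the choice of mgmt0 are assumptions):

    ! Illustrative VSM domain configuration for Layer 3 control mode
    svs-domain
      domain id 100                    ! must match across the VSM and its VEMs
      svs mode L3 interface mgmt0      ! VSM-to-VEM control now rides over IP

In Layer 2 mode, the same svs-domain would instead reference the control and packet VLANs shared with the VEMs.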

 

Nexus 1000 and VSM best practices

  • L3 control mode is recommended for new installations.
  • Use MAC pinning instead of LACP.
  • Packet, Control, and Management in the same VLAN.
  • Do not use VLAN 1 for Control and Packet.
  • Use 2 x VSM for redundancy. 

The max latency between VSM and VEM is ten milliseconds. Therefore, a VSM can be placed outside the data center if you have a high-quality DCI link, and the VEM can still be controlled.
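Following the MAC pinning recommendation above, an uplink profile might look roughly like this (the names, VLAN ranges, and system VLAN are assumptions):

    ! Illustrative Ethernet (uplink) port profile using MAC pinning, not LACP
    port-profile type ethernet SYSTEM-UPLINK
      switchport mode trunk
      switchport trunk allowed vlan 10-20
      channel-group auto mode on mac-pinning   ! pin each vEth MAC to one uplink
      no shutdown
      system vlan 10                   ! keeps control/packet VLAN forwarding up
      state enabled

MAC pinning avoids any channel negotiation with the upstream switches, which keeps the design independent of upstream LACP support.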

Nexus 1000v InterCloud – Cisco switch virtualization

A vital element of the Nexus 1000 is its use case for hybrid cloud deployments and its ability to place workloads in private and public environments via a single pane of glass. In addition, the Nexus 1000v interCloud addresses the main challenges with hybrid cloud deployments, such as security concerns and control/visibility challenges within the public cloud.

The Nexus 1000 interCloud works with Cisco Prime Service Controller to create a secure L2 extension between the private data center and the public cloud.

This L2 extension is based on the Datagram Transport Layer Security ( DTLS ) protocol and allows you to securely transfer VMs and network services over a public IP backbone. DTLS is derived from the TLS/SSL protocol and provides communications privacy for datagram protocols, so all data in motion is cryptographically isolated and encrypted.

Nexus 1000
Nexus 1000 and Hybrid Cloud.

 

Nexus 1000v Hybrid Cloud Components 

  • Cisco Prime Network Service Controller for InterCloud: a VM that provides a single pane of glass to manage all InterCloud functions.
  • InterCloud VSM: manages port profiles for VMs in the InterCloud infrastructure.
  • InterCloud Extender: provides secure connectivity to the InterCloud Switch in the provider cloud; installed in the private data center.
  • InterCloud Switch: a virtual machine in the provider data center with secure connectivity to the InterCloud Extender in the enterprise cloud and to the virtual machines in the provider cloud.
  • Cloud Virtual Machines: VMs in the public cloud running workloads.

Prerequisites

  • Port 80: HTTP access from PNSC for AWS calls and communication with InterCloud VMs in the provider cloud.
  • Port 443: HTTPS access from PNSC for AWS calls and communication with InterCloud VMs in the provider cloud.
  • Port 22: SSH from PNSC to InterCloud VMs in the provider cloud.
  • UDP 6644: DTLS data tunnel.
  • TCP 6644: DTLS control tunnel.

VXLAN – Virtual Extensible LAN

The requirement for applications on demand has led to an increased number of required VLANs for cloud providers. The standard 12-bit VLAN identifier, which provides roughly 4,000 VLANs, proved to be a limiting factor in multi-tier, multi-tenant environments, and engineers started to run out of isolation options.

VXLAN addresses this with a 24-bit identifier, offering about 16 million (2^24) logical networks, and it lets segments cross Layer 3 boundaries. Its MAC-in-UDP encapsulation also allows switches to hash the UDP header and efficiently distribute packets across a port channel.

nexus 1000
VXLAN operations

VXLAN works like a Layer 2 bridge ( flood and learn ); the VEM does all the heavy lifting, learning the VM source MACs and host VXLAN IPs and encapsulating traffic according to the port profile to which the VM belongs. Broadcast, multicast, and unknown unicast traffic are sent as multicast, while known unicast traffic is encapsulated and shipped directly to the destination host's VXLAN IP, aka the destination VEM. Enhanced VXLAN adds VXLAN MAC distribution and ARP termination, making it more optimal.
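A minimal sketch of how a VXLAN segment might be defined on the Nexus 1000v (the bridge-domain name, segment ID, multicast group, and profile name are assumptions):

    ! Illustrative Nexus 1000v VXLAN (flood-and-learn) configuration
    feature segmentation
    !
    bridge-domain TENANT-A
      segment id 5000                  ! 24-bit VXLAN identifier
      group 239.1.1.1                  ! multicast group for BUM traffic
    !
    port-profile type vethernet TENANT-A-VMS
      switchport mode access
      switchport access bridge-domain TENANT-A
      no shutdown
      state enabled

VMs attached to this profile share the VXLAN segment regardless of which host, or Layer 3 subnet, their VEM sits on.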

VXLAN Mode Packet Functions

Packet type | VXLAN ( multicast mode ) | Enhanced VXLAN ( unicast mode ) | Enhanced VXLAN MAC distribution | Enhanced VXLAN ARP termination
Broadcast / Multicast | Multicast encapsulation | Replication plus unicast encap | Replication plus unicast encap | Replication plus unicast encap
Unknown unicast | Multicast encapsulation | Replication plus unicast encap | Drop | Drop
Known unicast | Unicast encapsulation | Unicast encap | Unicast encap | Unicast encap
ARP | Multicast encapsulation | Replication plus unicast encap | Replication plus unicast encap | VEM ARP reply

vPath – Service chaining

Intelligent Policy-based traffic steering through multiple network services.

vPath allows you to intelligently steer VM traffic through virtualized service devices. It intercepts and redirects the initial traffic to the service node. Once the service node performs its policy function, the result is cached, and the local virtual switch treats subsequent packets accordingly. It also enables you to chain services together, pushing a VM's traffic through each service as required. Previously, if you wanted to tie services together in a data center, you needed to stitch VLANs together, which was limited in design and scale.

Cisco virtualization
Nexus and service chaining

vPath 3.0 has been submitted to the IETF for standardization, allowing service chaining with vPath and non-vPath network services. It enables vPath service chaining between multiple physical devices and supports multiple hypervisors.

License Options 

Nexus 1000 Essential Edition | Nexus 1000 Advanced Edition
Full Layer 2 feature set | All features of the Essential Edition
Security and QoS policies | VSG firewall
VXLAN virtual overlays | VXLAN Gateway
vPath-enabled virtual services | TrustSec SGA
Full monitoring and management capabilities | A platform for other Cisco DC extensions in the future
Free | $695 per CPU MSRP

Nexus 1000 features and benefits

Switching: L2 switching, 802.1Q tagging, VLANs, rate limiting (TX), VXLAN, IGMP snooping, QoS marking (CoS and DSCP), class-based WFQ

Security: policy mobility, private VLANs with local PVLAN enforcement, access control lists, port security, Cisco TrustSec support, dynamic ARP inspection, IP Source Guard, DHCP snooping

Network services: Virtual Services Datapath (vPath) support for traffic steering and fast-path offload, leveraged by the Virtual Security Gateway (VSG), vWAAS, and ASA1000V

Provisioning: port profiles, integration with vCenter, vCloud Director, SCVMM*, and BMC CLM; optimized NIC teaming with Virtual Port Channel – Host Mode

Visibility: VM migration tracking, vCenter plugin, NetFlow v9 with NDE, CDP v2, VM-level interface statistics, vTracker, policy-based SPAN and ERSPAN

Management: vCenter VM provisioning, vCenter plugin, Cisco LMS, DCNM, Cisco CLI, RADIUS, TACACS+, syslog, SNMP (v1, v2, v3), hitless upgrade, software installer

Advantages and disadvantages of the Nexus 1000

Advantages:

  • The Standard edition is free, and you can upgrade to the Advanced edition when needed.
  • Easy and quick to deploy.
  • Offers many rich network features unavailable on other distributed software switches.
  • Hypervisor agnostic.
  • Hybrid cloud functionality.

Disadvantages:

  • VEM-to-VSM internal communication is very sensitive to latency; due to its chatty nature, it may not suit inter-DC deployments.
  • The VSM-to-VEM and active-to-standby VSM heartbeat time of 6 seconds makes it sensitive to network failures and congestion.
  • VEM over-dependency on the VSM reduces resiliency.
  • The VSM is required for vSphere HA, FT, and vMotion to work.

Closing Points on Cisco Nexus 1000v

Key Features and Functionalities:

Virtual Ethernet Module (VEM):

The Nexus 1000v employs the Virtual Ethernet Module (VEM), which runs as a module inside the hypervisor. This allows efficient, direct communication between VMs without relying on the hypervisor's native virtual switching stack.

Virtual Supervisor Module (VSM):

The Virtual Supervisor Module (VSM) serves as the control plane for the Nexus 1000v, providing centralized management and configuration. It enables network administrators to define policies, manage virtual ports, and monitor network traffic.

Policy-Based Virtual Network Management:

With the Nexus 1000v, administrators can define policies to manage virtual networks. These policies ensure consistent network configurations across multiple hosts, simplifying network management and reducing the risk of misconfigurations.

Advanced Security and Monitoring Capabilities:

The Nexus 1000v offers granular security controls, including access control lists (ACLs), port security, and dynamic host configuration protocol (DHCP) snooping. Additionally, it provides comprehensive visibility into network traffic, enabling administrators to monitor and troubleshoot network issues effectively.

Benefits of the Nexus 1000v:

Enhanced Network Performance:

By offloading network processing to the VEM, the Nexus 1000v minimizes the impact on the hypervisor, resulting in improved network performance and reduced latency.

Increased Scalability:

The distributed architecture of the Nexus 1000v allows for seamless scalability, ensuring that organizations can meet the growing demands of their virtualized environments.

Simplified Network Management:

With its policy-based approach, the Nexus 1000v simplifies network management tasks, enabling administrators to provision and manage virtual networks more efficiently.

Use Cases:

Data Centers:

The Nexus 1000v is particularly beneficial in data center environments where virtualization is prevalent. It provides a robust and scalable networking solution, ensuring optimal performance and security for virtualized workloads.

Cloud Service Providers:

Cloud service providers can leverage the Nexus 1000v to enhance their network virtualization capabilities, offering customers more flexibility and control over their virtual networks.

The Nexus 1000v is a powerful virtual network switch that provides advanced networking capabilities for virtualized environments. Its rich features, policy-based management approach, and seamless integration with VMware vSphere allow organizations to achieve enhanced network performance, scalability, and management efficiency. As virtualization continues to shape the future of data centers, the Nexus 1000v remains a valuable tool for optimizing virtual network infrastructures.

 

Summary: Cisco Switch Virtualization Nexus 1000v

Welcome to our blog post, where we dive into the world of Cisco Switch Virtualization, explicitly focusing on the Nexus 1000v. In this article, we will unravel the complexities surrounding switch virtualization, explore the key features of the Nexus 1000v, and understand its significance in modern networking environments.

Understanding Switch Virtualization

Switch virtualization is a technique that allows for creating multiple virtual switches on a single physical switch, enabling greater flexibility and efficiency in network management. Organizations can consolidate their infrastructure, reduce costs, and streamline network operations by virtualizing switches.

Introducing the Nexus 1000v

The Cisco Nexus 1000v is a powerful switch virtualization solution that extends the functionality of VMware environments. Unlike traditional virtual switches, it provides a more comprehensive set of features and advanced network control. It seamlessly integrates with VMware vSphere, offering enhanced visibility, security, and policy management.

Key Features of the Nexus 1000v

– Distributed Virtual Switch: The Nexus 1000v operates as a distributed virtual switch, distributing network intelligence across all hosts in the virtualized environment. This ensures consistent policies, simplified troubleshooting, and improved performance.

– Virtual Port Profiles: With virtual port profiles, administrators can define consistent network policies for virtual machines, irrespective of their physical location. This simplifies network provisioning and reduces the chances of misconfigurations.

– Network Analysis Module (NAM): The Nexus 1000v incorporates NAM, a robust monitoring and analysis tool that provides deep visibility into virtual network traffic. This enables administrators to quickly identify and resolve network issues, ensuring optimal performance.

Deployment Considerations

When planning to deploy the Nexus 1000v, it is essential to consider factors such as network architecture, compatibility with existing infrastructure, and scalability requirements. It is advisable to consult with Cisco experts or certified partners to ensure a smooth and successful implementation.

Conclusion:

In conclusion, the Cisco Nexus 1000v is a game-changer in switch virtualization. Its advanced features, seamless integration with VMware environments, and extensive network control make it an ideal choice for organizations seeking to optimize their network infrastructure. By understanding the fundamentals of switch virtualization and exploring Nexus 1000v’s capabilities, network administrators can unlock a world of possibilities in network management and performance.

Spine Leaf Architecture

In today's interconnected world, where data traffic is growing exponentially, having a robust and scalable network architecture is crucial for businesses and organizations. One such architecture that has gained popularity in recent years is the Spine Leaf architecture. This blog post will explore Spine Leaf architecture, its benefits, and how it can revolutionize network design.

Spine Leaf architecture, also known as Clos architecture, is a network design approach that offers high bandwidth, low latency, and scalability. It is commonly used in data centers and large-scale enterprise networks. The architecture is based on a two-tier design consisting of spine switches and leaf switches.

The spine-leaf architecture is a data center design that provides a scalable and low-latency network fabric. Unlike traditional three-tiered designs, the spine-leaf architecture eliminates the need for complex hierarchical structures, allowing for faster and more efficient communication between devices. With a non-blocking fabric and equal-length paths, data can travel seamlessly from one leaf switch to another, enhancing overall network performance.

The spine-leaf data center offers a myriad of benefits for organizations. Firstly, it provides predictable and consistent low-latency connectivity, ensuring optimal performance for mission-critical applications. Additionally, the architecture allows for easy scalability, enabling seamless expansion as network demands grow. Furthermore, the simplified design reduces complexity, making deployment and management more efficient. Overall, the spine-leaf data center empowers organizations with a robust and agile network infrastructure.

One of the key advantages of the spine-leaf data center is its ability to enhance network flexibility and resilience. By utilizing equal-length paths, traffic can be distributed evenly, preventing bottlenecks and maximizing network capacity. Moreover, the architecture allows for the implementation of link aggregation techniques, increasing bandwidth and redundancy. These features not only improve network performance but also provide built-in fault tolerance, ensuring high availability for critical applications.

The spine-leaf architecture represents a significant shift in the evolution of data center networks. Traditional designs often faced challenges in adapting to the increasing demands of virtualization, cloud computing, and big data analytics. The spine-leaf data center addresses these challenges by providing a scalable, high-performance, and flexible network design that can meet the requirements of modern applications and workloads. It sets the stage for the future of data center networking.

Highlights: Spine Leaf Architecture

The spine-leaf architecture has only two layers of switches: spines and leaves. The spine layer performs routing and acts as the network's core. The leaf layer consists of access switches that connect servers, storage devices, and other endpoints. This structure gives the data center network a lower hop count and lower latency. Every leaf switch connects to every spine switch, so there is always exactly one interconnecting switch between any two leaves, and any server can communicate with any other server.

Why Use Spine-leaf Architecture?

The spine-leaf architecture has become a popular data center architecture, bringing many advantages, including scalability and network performance. Below, we summarize the benefits of spine-leaf architecture in modern networks in five points.

Enhanced redundancy: The spine-leaf architecture connects the servers with the core network, providing greater flexibility in hyperscale data centers. As a result, the leaf switch can serve as a bridge between the server and the core network. By connecting leaf switches to spine switches, a sizeable non-blocking fabric is formed, increasing redundancy and reducing traffic bottlenecks.

Enhanced bandwidth: Through protocols such as transparent interconnection of multiple links (TRILL) and shortest path bridging (SPB), the spine-leaf architecture can effectively avoid traffic congestion. Adding uplinks to the spine switch increases interlayer bandwidth and reduces oversubscription to secure network stability using the spine-leaf architecture.

Enhanced scalability: Multiple links carry traffic in the spine-leaf architecture. In addition to improving scalability, switches can help enterprises expand their businesses in the future.

Reduced expenses: Because spine-leaf architecture allows switches to handle more connections, data centers deploy fewer devices. A spine-leaf architecture minimizes costs in many data center networks.

Increased performance: With a maximum of two hops between any two servers ( one hop when source and destination share a leaf switch ), traffic takes a more direct path, enhancing overall performance and reducing bottlenecks.

leaf and spine

Spine and Leaf Popularity

Because of cloud computing and containerized infrastructure, east-west traffic increases in modern data centers. East-west traffic moves from server to server in a lateral fashion. Modern applications have components distributed across multiple servers or virtual machines, which explains this shift in part. When it comes to east-west traffic, low-latency, optimized flows are critical for applications that are time-sensitive or data-intensive. Spine-leaf architectures reduce latency by ensuring every hop between destinations is the same.

STP has also been removed, increasing capacity. Although STP can provide redundant paths between two switches, only one path can be active at a time, so paths end up oversubscribed. Spine-leaf architectures instead use protocols such as Equal-Cost Multipath (ECMP) to load-balance traffic across all available paths. Topologies with spines and leaves improve scalability and performance: capacity can be increased by adding spine switches and connecting them to each leaf, and new leaf switches can be seamlessly inserted if port density becomes an issue. "Scaling out" the infrastructure requires no redesign.

stp port states

Charles Clos – large scale switching fabrics

Building on Edson Erwin's concept of large-scale switching fabrics for telephone systems, Charles Clos (pronounced "Klo") developed the Clos network, published in the Bell System Technical Journal in 1953. The original paper, "A Study of Non-Blocking Switching Networks," has been cited in hundreds of subsequent documents. In telephony systems, a Clos network consists of three stages, each containing several crossbar switches. Stages were introduced instead of a single large crossbar to reduce the number of crosspoint interconnections needed to build large-scale crossbar-like functionality, cutting complexity and cost.

Crossbar switches are strictly non-blocking switches with n inputs, n outputs, and interconnecting lines between them. Non-blocking means that, for idle input and output lines, connections can be made without interrupting other connected lines; this is what a crossbar is fundamentally designed to accomplish. The complexity of a crossbar switch is O(n²).

Clos. Non Blocking

Data center topology

A spine-leaf architecture is a variation of data center topology that consists of two switching layers. The leaf layer consists of access switches that aggregate traffic from endpoints, which could be traditional servers or containers, and connect directly to the spine, the network core. There are usually at least two spine switches for redundancy, interconnecting all leaf switches in a full-mesh leaf-and-spine topology. In a spine-and-leaf data center design, the leaf switches do not connect directly to one another.

The underlay and the overlay

eBGP, in this case, is used to exchange routing information between the nodes of the fabric through the underlay, which provides point-to-point Layer 3 interfaces between leafs and spines. Using eBGP to advertise the loopback addresses of VTEPs in the fabric (typically leaves), the underlay provides connectivity between the loopbacks.

In the overlay layer, packets are encapsulated in an outer IP header and transported from one VTEP to another using the data plane encapsulation layer. Source IP addresses are the loopbacks of the originating VTEPs, and destination IP addresses are the loopbacks of the terminating VTEPs.
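As a rough sketch of that eBGP underlay on a single leaf (the ASNs, router ID, and neighbor addressing are assumptions; a common pattern gives all spines one ASN and each leaf its own):

    ! Illustrative NX-OS leaf underlay: eBGP to two spines, advertising the VTEP loopback
    feature bgp
    router bgp 65001
      router-id 10.0.0.1
      address-family ipv4 unicast
        network 10.0.0.1/32            ! VTEP loopback advertised into the underlay
        maximum-paths 4                ! ECMP across all spine uplinks
      neighbor 10.1.1.0 remote-as 65100    ! point-to-point link to spine 1
        address-family ipv4 unicast
      neighbor 10.1.2.0 remote-as 65100    ! point-to-point link to spine 2
        address-family ipv4 unicast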

Multicast VXLAN
Diagram: Multicast VXLAN

Example: Cisco ACI

Instead, all connectivity goes through the core, and the physical and logical layouts are generally the same, built on network overlay protocols, most likely VXLAN. An example of a data center that utilizes such a design is Cisco ACI, which consists of three main components: the Application Policy Infrastructure Controller (APIC), the spine switches, and the leaf switches.



Spine Leaf Architecture

Key Spine Leaf Architecture Discussion Points:


  • Introduction to the spine leaf architecture and what is involved.

  • Highlighting the details of this type of data center design.

  • Critical points on spine-leaf switch requirements.

  • Technical details on the origins of this design.

  • Technical solutions that can be used in the leaf and spine design.

Back to Basics: Data center design

At its most straightforward, a data center is a physical facility that houses applications and data. Such a design is based on a computing and storage resources network that enables the delivery of shared applications and data. The critical elements of a data center design include routers, switches, firewalls, storage systems, servers, and application-delivery controllers.

The data center should be flexible in quickly deploying and supporting new services. Such a design needs substantial initial planning and consideration of port density, access layer uplink bandwidth, actual server capacity, and oversubscription, to name a few.

Traditional Tree-Based Topologies

We have tree-based topologies on the opposite side of a spine-leaf switch design. Tree-based topologies have been the mainstay of data center networks. Traditionally, Cisco has recommended a multi-tier tree-based data center topology, as depicted in the diagram below.

These networks are characterized by aggregation pairs ( AGGs ) that aggregate through many network points. Hosts connect to access or edge switches, which connect to distribution, and distribution connects to the core.

The core should offer no services ( firewall, load balancing, or WAAS ), and its central role is to forward packets as quickly as possible. The aggregation switches define the boundary for the Layer 2 domain, and to contain broadcast traffic to individual domains, VLANs are used to subdivide traffic into segmented groups further. This style of design operates very differently from spine-leaf architecture.

1st Lab Guide: Leaf and Spine with Cisco ACI 

The following lab guide addresses the leaf and spine with Cisco ACI. The screenshot below shows a small topology that is fine for demonstration purposes. The leaf and spine are based on the Cisco Nexus 9000 series. The ACI has an automated fabric discovery process, and as you can see, we have all fabric members successfully registered.

SDN data center
Diagram: Cisco ACI fabric checking.

The traditional three-tier model was based on the following design principles:

  1. The access switch connects to endpoints, e.g., servers.
  2. The aggregation or distribution switches provide redundant connections to access switches.
  3. The core switches provide fast transport between aggregation switches, typically connected in a redundant pair for high availability.
  4. Networking and security services such as load balancing or firewalling were typically connected to the distribution layers.
spine leaf architecture
The traditional data center design. Non spine-leaf architecture.

The focus of the design

The design's focus was based on fault-avoidance principles, and the strategy for implementing them was to take each switch and its connected links and build redundancy into them. This led to the introduction of port channels and devices deployed in pairs. In addition, servers pointed to a First Hop Redundancy Protocol, such as HSRP or VRRP ( Hot Standby Router Protocol or Virtual Router Redundancy Protocol ). Unfortunately, this steady-state style of network design led to many inefficiencies:

  1. Inefficient use of bandwidth via a single-rooted core.
  2. Operational and configuration complexity.
  3. The cost of having redundant hardware.
  4. It is not optimized for small flows.

Recent changes to application and user requirements have changed the functions of data centers, which in turn has altered the topology and design of the data center to a spine-leaf switch topology. For example, the traditional aggregation point design style was inefficient, and recent changes in end-user requirements are driving architects to design around the following key elements.

Spine Leaf Architecture: Requirements

A spine-leaf architecture collapses one of these tiers at the most basic level, as depicted in the diagram below. Follow the following design principles:

  1. The removal of the Spanning Tree Protocol (STP)
  2. Increased use of fixed-port switches over modular models for the network backbone
  3. More cabling to purchase and manage, given the higher interconnection count
  4. A scale-out vs. scale-up of infrastructure.
what is spine and leaf architecture
Diagram: What is spine and leaf architecture? 2-Tier Spine Leaf Design

Leaf and Spine Main Points

With the introduction of the cloud and containerized infrastructure, there was an increase in east-west traffic. East-west traffic differs from north-south traffic; it moves laterally, from server to server. Generally, this type of traffic flow stays internal to the data center.

With the change in traffic patterns, we must design our data centers to have low latency and optimized traffic flows, especially for time-sensitive or data-intensive applications. A spine-leaf data center design aids this by ensuring traffic always has the same number of hops from its next destination, so latency is lower and predictable.

STP has always been problematic in the data center. Capacity improves with a leaf and spine design because STP is no longer required. In the past, STP blocked redundant paths between two switches, leaving only one active at any time, so paths were often oversubscribed.

Leaf and spine architectures instead rely on protocols such as Equal-Cost Multipath (ECMP) routing to load-balance traffic across all available paths while still preventing network loops. So, instead of running STP to the spine layer, we can run routing protocols.
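As a minimal sketch of what that looks like on a leaf uplink (the process tag, interface, and addressing are assumptions):

    ! Illustrative NX-OS leaf: routed uplink plus OSPF with ECMP toward the spines
    feature ospf
    router ospf UNDERLAY
      maximum-paths 4                  ! load-balance across all spine uplinks
    !
    interface Ethernet1/49
      no switchport                    ! pure routed point-to-point link
      ip address 10.1.1.1/30
      ip router ospf UNDERLAY area 0.0.0.0
      no shutdown

Because every leaf-to-spine link is a routed point-to-point interface, all of them stay in the forwarding topology, and ECMP spreads flows across them instead of STP blocking all but one.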

We also have better scalability. We can add additional spine switches, and leaf switches can be seamlessly inserted when port density becomes problematic. There is no need to take down the core layer for upgrades.

STP Blocking.
Diagram: STP Blocking. Source Cisco Press free chapter.

Data Center Requirements

  • 1) Equidistant endpoints with non-blocking network core.

Equidistant endpoints mean that every device is a maximum of one hop away from the other, resulting in consistent latency in the data center. The term “non-blocking” refers to the internal forwarding performance of the switch.

Non-blocking is the ability to forward at line rate Tx/Rx: sender X can send to receiver Y without being blocked by a simultaneous sender. A blocking architecture cannot deliver the total bandwidth even if the individual switching modules are not oversubscribed or not all ports are transmitting simultaneously.

  • 2) Unlimited workload placement and mobility.

The application team wants to place the application at any point in the network and communicate with existing services like storage. This usually means that VLANs need to sprawl for VMotion to work. The main question is, where do we need large layer 2 domains? Bridging doesn’t scale, and that’s not just because of spanning tree issues; it’s because the MAC addresses are not hierarchical and cannot be summarized. There is also a limit of 4000 VLANs.

  • 3) Lossless transport for storage and other elephant flows.

To support this type of traffic, data centers require not only conventional QoS tools but also Data Center Bridging ( DCB ) tools such as Priority flow control ( PFC ), Enhanced transmission selection ( ETS ), and Data Center Bridging Exchange ( DCBX ) to be applied throughout their designs. These standards are enhancements that allow lossless transport and congestion notification over full-duplex 10 Gigabit Ethernet networks.

Feature | Benefit
Priority-based Flow Control ( PFC ) | Manages a bursty single traffic source on a multiprotocol link
Enhanced Transmission Selection ( ETS ) | Enables bandwidth management between traffic types for multiprotocol links
Congestion Notification | Addresses sustained congestion by moving corrective action to the edge of the network
Data Center Bridging Exchange Protocol ( DCBX ) | Allows the exchange of enhanced Ethernet parameters

  • 4) Simplified provisioning and management.

Simplified provisioning and management are critical to operational efficiency. However, the ability to auto-provision and for the users to manage their networks is challenging for future networks.

  • 5) High server-to-access layer transmission rate at Gigabit and 10 Gigabit Ethernet.

Before the advent of virtualization, servers transitioned from 100Mbps to 1GbE as processor performance increased. With the introduction of high-performance multicore processors and each physical server hosting multiple VMs, the processor-to-network connection bandwidth requirements increased dramatically, making 10 Gigabit Ethernet the most common network access option for servers.

In addition, the popularization of 10 Gigabit Ethernet for server access has provided a straightforward approach to group/bundle multiple Gigabit Ethernet interfaces into a single connection, making Ethernet an extremely viable technology for future-proof I/O consolidation.

In addition, to reduce networking costs, data centers are now carrying data and storage traffic over Ethernet using protocols such as iSCSI ( Internet Small Computer System Interface ) and FCoE ( Fibre Channel over Ethernet ). FCoE allows the transport of Fibre channels over a lossless Ethernet network.

spine-leaf switch
FCoE Frame Format

Although there has been some talk of introducing 25 Gigabit Ethernet due to the excessive price of 40 Gigabit Ethernet, the two main speeds on the market are Gigabit and 10 Gigabit Ethernet. The following is a comparison table between Gigabit and 10 Gigabit Ethernet:

Gigabit Ethernet

  + Well known and field-tested
  + Standard, cheap copper cabling
  + NIC on the motherboard
  – Numerous NICs per hypervisor host, possibly up to 6 ( user data, VMotion, storage )
  – No storage/networking convergence; networking and storage cannot be combined onto one NIC
  – No lossless transport for storage and elephant flows

10 Gigabit Ethernet

  + Much faster vMotion
  + Converged storage and network ( FCoE or lossless iSCSI/NFS )
  + Reduces the number of NICs per server
  + Built-in QoS with ETS and PFC
  + Fiber cabling with lower energy consumption and error rates
  – More expensive NIC cards
  – Usually requires new cabling to be laid, which in turn could mean more structured panels
  – SFP optics, either single-mode or multimode, can list at up to $4,000 each

Spine-Leaf Switch Design

The critical difference between traditional aggregation layers/points and fabric networks is that a fabric doesn't aggregate. If we want every edge device to be able to send 10Gbps to every other edge device, we must add bandwidth between routers A and B; if we have three hosts sending at 10Gbps each, we need a core that supports 30Gbps.

We must add bandwidth at the core because if two routers want to send 2 x 10Gbps of data and the core only supports a maximum of 10Gbps ( the 10Gbps link between routers A and B ), both data streams must be interleaved onto the oversubscribed link so that both senders get equal bandwidth.

You get blocking and oversubscription when more bandwidth comes into the core than the core can accommodate. Blocking and oversubscription cause delay and jitter, which is bad for some applications, so we must find a way to provide full bandwidth between each pair of end hosts.

Oversubscription is expressed as the ratio of inputs to outputs (e.g., 3:1) or as a percentage: 1 – (outputs / inputs). For example, 1 – (1 output / 3 inputs) = 67% oversubscribed. There will always be some oversubscription on the network, and there is nothing we can do to get away from that, but as a general rule of thumb, an oversubscription value of 3:1 is best practice.
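As a quick worked example (the port counts are illustrative, not from this design): a leaf switch with 48 x 10G server-facing ports and 4 x 40G uplinks has 480 Gbps coming in and 160 Gbps going out, giving 480:160 = 3:1, or 1 – (160 / 480) ≈ 67% oversubscribed, which matches the rule-of-thumb ratio above.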

Some applications will operate fine when oversubscription occurs. It is up to the architect to thoroughly understand application traffic patterns, bursting needs, and baseline states to define the oversubscription limits a system can tolerate accurately.

The simplest solution to overcome the oversubscription and blocking problems would be to increase the bandwidth between Routers A and B, as shown in the diagram labeled "Traditional Aggregation Topology." This is feasible only up to a point: the Router A-B link must keep growing to 10Gbps, 30Gbps, and beyond as the number of edge hosts grows, and data center links and the optics used to connect them are expensive.

Spine-Leaf Switch Design

The solution is to divide the core into several spine devices, which exposes the internal fabric and enables a spine-leaf architecture similar to what you see in ACI networks. This is achieved by spreading the fabric across multiple devices ( leaf and spine ).

Spreading the fabric means every leaf edge switch connects to every spine core switch, giving every edge device the total bandwidth of the fabric. This places multiple traffic streams in parallel, unlike the traditional multitier design that stacks multiple streams onto a single link.

In addition, the higher degree of equal-cost multi-path routing ( ECMP ) found with leaf and spine architectures allows for greater cross-sectional bandwidth between layers, thus greater east-west bandwidth. There is also a reduction in the fault domain compared to traditional access, distribution, and core designs.

A failure of a single device only reduces the available bandwidth by a fraction, and only transit traffic will be lost with a link failure. ECMP reduces liability to a single fault and brings domain optimization.

Origination of the spine and leaf design

Charles Clos initially designed the Clos network in 1952 as a multi-stage, circuit-switched interconnection network to provide a scalable approach to building large-scale voice switches. It targeted high-speed switching fabrics that required low-latency, non-blocking switching elements.

There has been an increase in the deployment of Clos-based models in data centers. Usually, the Clos network is folded around the middle to form a "folded Clos" network, referred to as a spine-leaf architecture. The spine-leaf switch design consists of three tiers of switches:

  • Servers connect directly to ToR ( top of rack ) switches.
  • ToR connects to aggregation switches.
  • Intermediate switches connect to aggregation switches. 

The spine is responsible for interconnecting all leaves and allows hosts in one rack to talk to hosts in another. The leaves are responsible for physically connecting the servers and distributing traffic via ECMP across all spine nodes.

Leaf and Spine: Folded 3-Stage Clos fabric

Spine-leaf switch deployment considerations:

A. Spine-leaf switch: Fixed or modular switches

Fixed switches

  + Cheaper
  + Lower power consumption
  + Require less space
  + More ports per RU
  – Harder to manage as device counts grow
  – Difficult to expand
  – More cabling due to an increase in device numbers

Modular switches

  + Gradual growth
  + Larger fabrics with leaf/spine topologies
  + Built-in redundancy with redundant SUPs and SSO/NSF
  + In-service software redundancy
  + Easier to manage
  – More expensive

The leaf layer determines the size of the spine and the oversubscription ratios. It is responsible for advertising subnets into the network fabric. An example of a leaf device would be a Nexus 3064, which provides the following:

  1. Line rate for Layer 2 and Layer 3 on all ports.
  2. Shared memory buffer space.
  3. Throughput of 1.28 terabits per second ( Tbps ) and 950 million packets per second ( Mpps )
  4. 64-way ECMP

The spine layer is responsible for learning infrastructure routes and physically interconnecting all leaf nodes. The Nexus 7K is the platform for the Spine device layer. The F2 series line cards can provide 48x 10G line rate ports and fit the requirements for spine architecture very well.
The following are the types of implementations you could have with this topology:

  1. Layer 3 fabric with standard routing.
  2. Large-scale bridging ( FabricPath, TRILL, or SPB ).
  3. Multichassis MLAG ( Cisco VSS ).

This article will focus on Layer 3 fabrics with standard routing.

B. Spine-leaf switch: Non-redundant layer 3 design

Spine-leaf switch: Design Summary

  1. Layer 3 directly to the access layer. Layer 2 VLANs do not span the spine layer.
  2. Servers are connected to single switches. Servers are not dual connected to two switches, i.e., there is no server to switch redundancy or MLAG.
  3. All connections between the switches will be pure routed point-to-point layer 3 links.
  4. There are no inter-switch VLANs, so no VLAN will ever go beyond one switch.

If the spine switches advertise only a default route to the leaf switches, the leaf switches lose visibility of the entire network, and you would need additional intra-spine links to compensate. Intra-spine links, however, should not be used for data plane traffic in a leaf-spine architecture.

Spine-leaf switch: Design assumptions

The spine layer passes a default route to the Leaf. The link between the Leaf connecting to Host 1 and Spine Z fails. In the diagram, the link is marked with a red “X.” Host 4 sends traffic to the fabric destined for Host 1.

This traffic spreads ( ECMP ) across all links connecting Host 4's leaf to the spine layer. Some traffic hits Spine C, and because C no longer has a direct link to the leaf connecting to Host 1 ( it has failed ), that traffic is dropped or forwarded sub-optimally. To overcome this, you would have to add inter-switch links between the spine switches, which is not recommended.

Spine-leaf switch: Recommendations

  1. Buy Leaf switches that can support enough IP prefixes and don’t use summarization from Spine to Leaf.
  2. Always use 40G links instead of bundles of 4 x 10G links, because link aggregation bandwidth does not affect routing cost. If you lose a member link in a port channel, the cost of the port channel does not change, which could result in congestion on the remaining links. You could use Embedded Event Manager ( EEM ) scripting to change the OSPF cost after a member link fails, but this adds complexity, as you no longer have equal-cost routes. That, in turn, pushes you toward the Cisco proprietary protocol EIGRP, which supports unequal-cost routing, or toward MPLS TE between the ToR switches, in which case you must first check that the DC switches support MPLS label switching.
  3. Use QSFP optics as they are more robust than SFP optics. This will lower the likelihood of one of the parallel links failing.

 

C. Spine-leaf switch: Redundant layer 3 design

Spine-leaf switch: Design Summary

  1. The servers are dual home to two different switches.
  2. Servers have one IP address due to the restrictions of TCP applications. Ideally, use LACP ( Link Aggregation Control Protocol ) between the servers and the ToR switches.
  3. Layer 2 trunk links between the Leaf switches are needed to carry VLANs that span both switches. This keeps those VLANs from spanning the core, avoiding a sizeable STP-based L2 fabric.
  4. ToR switches must be in the same subnets ( share the server’s subnet) and advertise this subnet into the fabric. Again, the servers are dual-homed to 2 switches with one IP address.

Spine-leaf switch: The challenges

Both leaf switches advertise the same subnet to the spine switches, so the spine switches think they have two paths to reach each host. A spine switch will spread traffic destined for Host 1 across the leaf switches connecting Host 1 and Host 2. In specific scenarios, this results in traffic to the hosts traversing the inter-switch link between the leaf nodes. This may not be a problem if most traffic leaves the servers northbound ( traffic leaving the data center ). However, if there is a lot of inbound traffic, this link can become a bottleneck and congestion point. For a hosted web server farm, this is usually not an issue, because most traffic leaves the data center toward external users.

Spine-leaf switch: Recommendation

  1. If there is a lot of east-to-west traffic ( 80 % ), using LAG ( Link Aggregation Group ) between the servers and ToR Leaf switches is mandatory.
  2. The two Leaf switches must support MLAG ( Multichassis Link Aggregation ). With MLAG on the Leaf switches, a leaf that receives traffic destined for host X knows it can reach it directly through its connected link, resulting in optimal southbound traffic flow.
  3. Most LAG solutions place traffic generated from a single TCP session onto a single uplink, limiting the TCP session throughput to the bandwidth of a single uplink interface. However, Dynamic NIC teaming is available in Windows Server 2012 R2 which can split a single TCP session into multiple flows and distribute them across all uplinks.
  4. Use dynamic link aggregation ( LACP ), not static port channels. The LAGs between servers and switches should use LACP to prevent traffic blackholing; a configuration sketch follows this list.
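
A minimal NX-OS-style sketch of the MLAG (vPC) and LACP recommendations above, assuming a Nexus leaf pair; the domain ID, peer-keepalive addresses, VLAN, and interface numbers are illustrative:

  feature vpc
  feature lacp
  !
  vpc domain 10
    peer-keepalive destination 192.0.2.2 source 192.0.2.1
  !
  interface port-channel10
    switchport mode trunk
    vpc peer-link                   ! inter-leaf trunk carrying the shared VLANs
  !
  interface Ethernet1/1
    channel-group 20 mode active    ! LACP ("active"), not a static "on" channel
  !
  interface port-channel20
    switchport access vlan 10
    vpc 20                          ! MLAG member toward the dual-homed server

The same configuration, with a matching vpc 20, would be applied on the second leaf so the server sees one logical LACP partner.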




Key Spine Leaf Architecture Summary Points:

Main Checklist Points To Consider

  • The spine leaf architecture consists of a leaf layer and a spine layer. Endpoints connect to the leaf layer; the spine switches act as the core.

  • This layout of the leaf and spine gives you optimal load balancing and ECMP for any endpoint in any location.

  • The traditional tree-based topologies are not suited to virtualization, and you will always be limited by the core port count.

  • The spine and leaf can build massive data centers with, for example, a folded 3-stage Clos design.

  • Cisco ACI is an example of a leaf and spine design. VXLAN is the most common overlay protocol that works over what is known as the underlay.

Recap on Spine and Leaf Architecture

Spine Switches:

Spine switches form the backbone of the network in a Spine Leaf architecture. They are high-performance switches that connect to every leaf switch in the network. The spine switches provide a non-blocking, high-bandwidth fabric for data transfer between leaf switches. They ensure data traffic flows seamlessly across the network, avoiding bottlenecks and congestion.

Leaf Switches:

Leaf switches are connected to the spine switches and act as the access layer in a Spine Leaf architecture. They connect end-user devices, servers, or other network devices to the spine switches. Leaf switches are responsible for forwarding traffic between devices within the same leaf and between different leaf switches. They offer a high degree of network flexibility and redundancy.

Benefits of Spine Leaf Architecture:

1. Scalability: Spine Leaf architecture allows for easy scalability as new leaf switches can be added without affecting the existing network. This scalability makes it ideal for growing businesses and organizations with expanding network requirements.

2. High Bandwidth: The architecture provides high bandwidth capacity by leveraging multiple spine switches. This efficiently handles heavy data traffic and ensures optimal network performance even during peak usage.

3. Low Latency: Spine Leaf architecture minimizes latency by eliminating multiple layers of network hierarchy. With fewer hops and shorter paths, data packets can be transmitted quickly, improving application response times.

4. Redundancy and Resilience: The architecture offers built-in redundancy and resilience. If a link or a switch fails, traffic can be automatically rerouted through alternate paths, ensuring uninterrupted network connectivity and minimizing downtime.

5. Enhanced Performance: Spine Leaf architecture improves overall network performance by evenly distributing traffic across multiple paths. This load-balancing capability optimizes resource utilization and prevents any single point of failure.

Summary: Spine Leaf Architecture

In data centers, efficiency and scalability are critical to ensuring optimal performance. One architectural design that has gained significant attention is the leaf and spine architecture. This blog post delved into the intricacies of leaf and spine architecture, exploring its benefits, components, and the future it holds for data centers.

Section 1: Understanding Leaf and Spine Architecture

Leaf and spine architecture, also known as Clos architecture, is a network design approach that aims to eliminate bottlenecks and enhance scalability in data centers. The architecture consists of two main components: leaf switches and spine switches. Leaf switches act as the access layer, connecting directly to servers, while spine switches serve as the backbone, interconnecting the leaf switches.

Section 2: Benefits of Leaf and Spine Architecture

The leaf and spine architecture offers several advantages over traditional network designs. Firstly, it provides high bandwidth and low latency due to the non-blocking nature of the spine switches. This ensures smooth and efficient communication between servers. Additionally, the architecture allows for easy scalability, as new leaf switches can be added without impacting the existing network. This modular approach enables data centers to adapt to growing demands seamlessly.

Section 3: Components of Leaf and Spine Architecture

To grasp the essence of leaf and spine architecture, it’s essential to understand its main components. Leaf switches connect servers within a rack, offering multiple high-speed ports. Spine switches, conversely, provide the interconnectivity between leaf switches, forming a fabric network. Additionally, the architecture may incorporate top-of-rack (ToR) switches for enhanced flexibility and redundancy.

Section 4: Future Trends and Innovations

As technology continues to evolve, leaf and spine architecture is poised to witness further advancements. With the rise of software-defined networking (SDN), data centers can achieve greater control and programmability in managing their network infrastructure. Moreover, integrating artificial intelligence (AI) and machine learning (ML) algorithms can optimize traffic flow and improve overall network performance.

Conclusion:

In conclusion, leaf and spine architecture has revolutionized how data centers are designed and operated. Its scalable and efficient nature brings numerous benefits, including high bandwidth, low latency, and easy expansion. As technology progresses, we can expect further innovations in this architectural approach, ensuring that data centers can meet the ever-growing digital age demands.


What is FabricPath

What is Fabric Path

In today's digital era, businesses rely heavily on networking infrastructure to ensure seamless communication and efficient data transfer. Cisco FabricPath is a cutting-edge technology that provides a scalable and resilient solution for modern network architectures. In this blog post, we will delve into the intricacies of Cisco FabricPath, exploring its features, benefits, and use cases.

Cisco FabricPath is a comprehensive network virtualization technology designed to address the limitations of traditional Ethernet networks. It offers a flexible and scalable approach for building large-scale networks that can handle the increasing demands of modern data centers. By combining the benefits of Layer 2 simplicity with Layer 3 scalability, Cisco FabricPath provides a robust and efficient solution for building high-performance networks.

Cisco Fabric Path is a networking technology that provides a scalable and flexible solution for data center networks. It leverages the benefits of both Layer 2 and Layer 3 protocols, combining the best of both worlds to create a robust and efficient network infrastructure.

- Simplified Network Design: One of the key advantages of Cisco Fabric Path is its ability to simplify network design and reduce complexity. By utilizing a loop-free topology and eliminating the need for complex Spanning Tree Protocol (STP) configurations, Fabric Path streamlines the network architecture and improves overall efficiency.

- Increased Scalability: Scalability is a crucial aspect of any modern network infrastructure, and Cisco Fabric Path excels in this area. With its support for large Layer 2 domains and the ability to accommodate thousands of VLANs, Fabric Path provides organizations with the flexibility to scale their networks as per their evolving needs.

- Enhanced Traffic Load Balancing: Cisco Fabric Path incorporates Equal-Cost Multipath (ECMP) routing, which enables efficient distribution of traffic across multiple paths. This results in improved performance, reduced congestion, and enhanced load balancing capabilities within the network.

- Converged Traffic and Virtualization: Fabric Path allows for the convergence of both Ethernet and Fibre Channel traffic onto a single infrastructure, simplifying management and reducing costs. Additionally, it seamlessly integrates with virtualization technologies, enabling organizations to leverage the benefits of virtual environments without compromising on performance or security.

Highlights: What is Fabric Path

Nexus OS Software Release

Introduced by Cisco in NX-OS Software Release 5.1(3), FabricPath allows architects to design highly scalable Layer 2 fabrics. Like the spanning tree, it provides an almost plug-and-play deployment model, but with the benefits of Layer 3 routing, allowing FabricPath networks to scale at an unprecedented level.

In addition to its simplicity, FabricPath enables faster, simpler, and flatter data center networks. Cisco FabricPath uses routing principles to allow Layer 2 scaling, bringing the stability of Layer 3 routing to Layer 2. FabricPath traffic is no longer forwarded along a spanning tree design. As a result, we now have a more scalable design that is not limited by bisectional bandwidth.

The need for layer 2 

In the past, data centers were designed primarily to provide high availability, and Layer 2 is crucial for modern data centers. Today's networks must be agile and flexible, just like the organizations they serve. Since switching allows devices to be moved and infrastructure to be modified transparently, expanding the Layer 2 domain satisfies this additional requirement. On the other hand, existing switching technologies rely on inefficient forwarding schemes based on spanning trees that cannot be extended to the entire network, forcing a trade-off between the flexibility of Layer 2 and the scalability of Layer 3.

Spanning Tree Root Switch

 

Routing concepts at layer 2

Cisco FabricPath: Expanding Routing Concepts to Layer 2

Cisco® FabricPath extends routing stability and scalability to Layer 2 of Cisco NX-OS. Workloads can be moved across the entire data center by eliminating the need to segment the switched domain. As a result, the bisectional bandwidth of the network is no longer limited by a spanning tree, allowing for massive scalability.

An entirely new Layer 2 data plane is created with Cisco FabricPath, where frames enter the fabric with routable source and destination addresses. The source address of a frame is its receiving switch’s address, and its destination address is its destination switch’s address. As soon as the frame reaches the remote switch, it is de-encapsulated and delivered in its original Ethernet format.

The role of Fabric Path

In large data centers, virtualization of physical servers began a few years ago. Server virtualization and economies of scale gave rise to "mega data centers" containing tens of thousands of servers. As a result, distributed applications had to be supported on a large scale and provisioned in different data center zones. A scalable and resilient Layer 2 fabric was required to enable any-to-any communication, and Cisco developed FabricPath to meet these new demands. Providing scalability, resilience, and flexibility, FabricPath is a highly scalable Layer 2 fabric.


Fabric Path Requirements

Massive Scalable Data Centers (MSDCs) and virtualization technologies have led to the development of large Layer 2 domains in data centers with more than 1000 servers and a design for scalability. Due to the limitations of Spanning Tree Protocol (STP), Layer 2 switching has evolved into technologies such as TRILL and FabricPath. To understand why FabricPath exists, you need to consider the limitations of current Layer 2 networks based on STP:

By blocking redundant paths, STP creates loop-free topologies in Layer 2 networks. STP uses the root selection process to accomplish this: all the other switches build shortest paths to the root switch and block the remaining ports. The result is a loop-free Layer 2 topology, but one in which all redundant paths, and their bandwidth, go unused. PVST (per-VLAN Spanning Tree) adds per-VLAN load balancing, but even with this enhancement, multipathing support remains limited.

Paths are built toward the root bridge rather than along the shortest path between the switches themselves, resulting in inefficient path selection. Take two access switches connected to a distribution switch and to each other. If the distribution switch serves as the root bridge for STP, the link between the two access switches is blocked, and all traffic flows through the distribution switch.

Unavailability of Time-To-Live (TTL): The Layer 2 packet header doesn't have a TTL field, which can lead to network meltdowns in switched networks: a forwarding loop can cause a broadcast packet to duplicate, exponentially consuming network resources.

STP Path distribution

MAC address scalability: The nonhierarchical, flat addressing of Ethernet MAC addresses leads to limited scalability since MAC address summarization is impossible. Additionally, essentially all the MAC addresses are learned by every switch in the Layer 2 network, increasing the size of Layer 2 tables.

Layer 3 routing protocols, by contrast, provide multipathing and efficient shortest paths between all nodes in the network, resolving the shortcomings of Layer 2. But Layer 3 makes the network design static, which limits the Layer 2 domain size and constrains virtualization. By combining the two technologies, FabricPath lets a Layer 2 network stay flexible while scaling like a Layer 3 network.

stp port states

Fabric Path Benefits

With FabricPath, data center architects and administrators can design and implement scalable Layer 2 fabrics. Benefits of FabricPath include:

Maintains the plug-and-play features of classical Ethernet: Configuration effort is significantly reduced; the administrator only needs to identify the FabricPath core network interfaces. Unicast forwarding, multicast forwarding, and VLAN pruning are all controlled by a single protocol (IS-IS). FabricPath operations, administration, and management (OAM) supports ping and traceroute, allowing network administrators to troubleshoot Layer 2 FabricPath networks similarly to Layer 3 networks.
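
A minimal NX-OS-style sketch of this plug-and-play model, assuming a FabricPath-capable Nexus platform; the switch ID, VLAN, and interface numbers are illustrative:

  install feature-set fabricpath
  feature-set fabricpath
  !
  fabricpath switch-id 11           ! optional; assigned automatically if omitted
  !
  vlan 10
    mode fabricpath                 ! this VLAN is carried across the fabric
  !
  interface Ethernet1/1
    switchport mode fabricpath      ! core port facing another FabricPath switch

Edge ports keep their classic Ethernet configuration, which is what preserves the plug-and-play feel described above.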

Multipathing allows data center network architects to build large, scalable networks using N-way (more than one path) multipathing. A network administrator can also incrementally add new devices to the existing topology as needed. Using flat topologies, MSDC networks can connect nodes with only one hop between them, and a single node failure in N-way multipathing reduces fabric bandwidth by only 1/Nth.

By combining enhanced Layer 2 capabilities with Layer 3 capabilities, multiple paths can be created between endpoints instead of just one, replacing STP. This allows network administrators to add bandwidth incrementally as requirements grow.

The FabricPath protocol enables traffic to be forwarded over the shortest path to the destination, reducing network latency. This is more efficient than Layer 2 forwarding based on STP.

With FabricPath, MAC addresses are learned selectively based on active flows with conversational MAC learning. As a result, the need for large MAC tables is reduced.

Related: Before you proceed, you may find the following posts helpful:

  1. What is VXLAN
  2. Data Center Fabric
  3. Nexus 1000
  4. SDN Data Center
  5. Data Center Network Design
  6. Cisco ACI

Cisco FabricPath

Key What is FabricPath Discussion Points:


  • Introduction to What is FabricPath and what is involved.

  • Highlighting the details of FabricPath Nexus and the components involved.

  • Technical details on the issues with STP.

  • Scenario: Fabric Path use cases.

  • A final note on the FabricPath control plane and IS-IS.

Back to Basics: Cisco FabricPath.

We must support distributed applications at a considerable scale and have the flexibility to provision them in different zones of data center topologies. This necessitated creating a scalable and resilient Layer 2 fabric enabling any-to-any communication without workload placement restrictions—Cisco developed FabricPath to meet these new demands.

FabricPath is a powerful network technology from Cisco Systems that provides a unified, programmable fabric to connect, manage, and optimize data center networks. It is based on a distributed Layer 2 network protocol that enables the creation of multi-tenant, multi-domain, and multi-site networks with a single, unified control plane. FabricPath operates on a flat, non-hierarchical topology designed to simplify network virtualization and automation.

FabricPath delivers a highly scalable Layer 2 fabric. It uses a single control protocol (IS-IS) for unicast forwarding, multicast forwarding, and VLAN pruning. FabricPath also enables traffic to be forwarded across the shortest path to the destination, thus reducing latency in the Layer 2 network. This is more efficient than Layer 2 forwarding based on the STP.

FabricPath includes several features that make it ideal for large enterprise networks and data centers. It uses a distributed control plane to provide a unified view of the network and reduce network complexity. In addition, FabricPath supports virtualization, allowing the creation of multiple virtual networks within the same physical infrastructure. It also allows the creation of multiple forwarding instances and provides fast convergence times.

FabricPath
Diagram: FabricPath. Source is Cisco

The Challenges Of Inefficient Forwarding Schemes

The challenge is that existing switching technologies have inefficient forwarding schemes based on spanning trees that cannot be extended across the network. Therefore, current designs compromise between the flexibility of Layer 2 and the scaling offered by Layer 3. Fabric Path, on the other hand, introduces a new method of forwarding.

The physical design can stay the same as a leaf and spine. However, we now have a new Layer 2 data plane: FabricPath encapsulates the frames entering the fabric with a header consisting of routable source and destination addresses.

These addresses are the address of the switch on which the frame was received and the address of the destination switch to which the frame is heading. From there, the frame is routed until it reaches the remote switch, where it is de-encapsulated and delivered in its original Ethernet format. FabricPath Nexus also uses a Shortest Path First (SPF) routing protocol to determine reachability and path selection in the FabricPath domain.

With Fabric Path, we have a simple and flexible behavior of Layer 2 while using the routing mechanisms that make IP reliable and scalable. So you may ask, what about the Layer 2 and 3 boundaries? The Layer 2 and 3 boundary still exists in a data center based on Cisco FabricPath. However, there is little difference in how traffic is forwarded in those two distinct areas of the network. The following sections discuss the drivers for FabricPath and what you may opt for in its design.

Why Cisco Fabricpath?

1) No Multipathing support at Layer 2: Spanning Tree Protocol ( STP ) lacks any good Layer 2 multipathing features for large data centers. The protocol has been enhanced with PVST per-VLAN load balancing, but this feature can only load-balance at VLAN granularity.

2) MAC address scalability: Layer 2 end hosts are discovered by their MAC address, and this type of host addressing cannot be hierarchical and summarized. For example, one MAC address cannot represent a stub of networks. Traditional Layer 3 networks overcome this by introducing ABRs in OSPF or summarization/filtering in EIGRP. Also, in the Layer 2 network, all the MAC addresses are populated in ALL switches, leading to large requirements in the Layer 2 table sizes.

3) Instability of Layer 2 networks: Layer 3 networks have an eight-bit Time to Live ( TTL ) field that prevents datagrams from persisting (e.g., going in circles ) on the internet. However, compared to Layer 3 headers, the Layer 2 packet header does not have a TTL field. The lack of a TTL field will cause Layer 2 packets to loop infinitely, causing a network meltdown.

4) Inefficient path selection: The shortest path for a Layer 2 network depends on the placement of the root switch. Depending on costs and port priorities, you can influence the root port selection ( forwarding port ), but the root switch's placement determines how the forwarding path is built. For example, in the diagram below, the most optimal path for the server-to-server flows would be via the inter-switch link, but as you can see, spanning tree blocks this port, and traffic takes the sub-optimal path through the distribution switch.

Issues with Spanning Tree: Vendors’ responses.

A Spanning Tree allows only one path to be active between any two nodes and blocks the rest, making it unsuitable for low-latency data centers and cloud environments. Every vendor addressing the data center market proposes augmenting or replacing a Spanning Tree with a link-state protocol.

For example, Brocade uses TRILL in the data plane, while the control plane is based on Fabric Shortest Path First, an ANSI standard used by all Fibre Channel SAN fabrics as the link-state routing protocol.

On the other hand, Juniper implemented a tagging mechanism in the Broadcom silicon in its QFabric switches rather than a link-state protocol. Cisco FabricPath is considered a “superset” of TRILL, bringing scale to the data center and improving application performance.

Fabric Path typical use cases

Fabric Path can elegantly support any new function that IS-IS can express, by adding new extensions without modifying the base infrastructure. Each IS-IS intermediate router advertises one or more IS-IS Link State Protocol Data Units (LSPs) with routing information.

The LSP comprises a fixed header and several tuples, each consisting of a Type, a Length, and a Value. Such tuples are commonly known as TLVs and are a good way of encoding information in a flexible and extensible format. These make IS-IS a very extensible routing protocol, and FabricPath takes advantage of this extensibility.

This allows FabricPath to support the following prominent use cases.

  1. Large flat data centers that need Layer 2 multipathing and equidistant endpoints.
  2. DC requires a reduction of Layer 2 table sizes ( done via MAC conversational learning ).
Fabric Path
Diagram: Fabric Path Conversational learning. Source is Cisco

Cisco FabricPath control plane

FabricPath is a Layer 2 overlay network with an IS-IS control plane. Using FabricPath IS-IS, the switches build their forwarding tables, similar to building the forwarding table in Layer 3 networks. The extensions used in IS-IS to support Fabricpath allow this Layer 2 overlay to take advantage of all the scalable and load balancing ( ECMP, up to 16 routes ) benefits of a Layer 3 network while retaining the benefits of a plug-and-play Layer 2 network.

The FabricPath Header

The FabricPath header has a hop count in one of the fields, which mitigates temporary loops in FabricPath networks. This header uses locally assigned hierarchical MAC addresses for forwarding frames within the network. The original Layer 2 frames are encapsulated with a FabricPath header, and a new CRC is appended to the existing packet. One of the main elements of the FabricPath header is the SwitchID, and the core switches forward Fabricpath traffic by examining this field. The switch ID is the field used in the FabricPath domain to forward packets to the correct destination switch.
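
In rough outline, and hedging that exact bit layouts vary across documentation, the 16-byte encapsulation prepended to the original frame looks like this:

  Outer DA  (48 bits)   includes the 12-bit destination switch ID
  Outer SA  (48 bits)   includes the 12-bit source switch ID
  FP Tag    (32 bits)   Ethertype 0x8903, 10-bit FTag (forwarding tree), 6-bit TTL
  ...followed by the original Ethernet frame and a new CRC

The TTL field is the hop count mentioned above, decremented at each FabricPath hop to mitigate temporary loops.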

Why use IS-IS as the FabricPath Nexus control plane?

We touched on this just a moment ago. Its control protocol is built on top of the Intermediate System–to Intermediate System (IS-IS) routing protocol, which provides fast convergence and has been proven to scale up to the largest service provider environments.

  1. IS-IS is flexible and can be extended to support other functions with new type-length values (TLVs).
  2. TLV, also known as type-length-value, encodes optional information. IS-IS runs directly over the link layer, thereby preventing the need for any underlying Layer 3 protocol like IP to work.

Virtual PortChannel

FabricPath Nexus designs use Virtual PortChannel (vPC), giving us multiple active links and active-active forwarding paths. vPC allows a more granular design than standard port channeling, which only allows links to terminate on one switch; Cisco vPC enables a more flexible triangular design. Both aggregation technologies can use LACP as the control plane to negotiate the links.

Virtual Device Context

Fabricpath Nexus also uses Virtual Device Contexts (VDC), which allows each FabricPath control-plane protocol and functional block to run in its own protected memory space as individual processes for stability and fault isolation. A VDC design enables modular building blocks to improve security and performance.
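
A brief NX-OS-style sketch of carving out a VDC on a Nexus 7000; the VDC name, ID, and interface range are illustrative:

  vdc AGG1 id 2
    allocate interface Ethernet1/1-8   ! physical ports dedicated to this context
  !
  switchto vdc AGG1                    ! exec command to enter and configure the new context

Each VDC then runs its own protocol processes, which is what provides the fault isolation described above.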

FabricPath Nexus and conversational MAC learning

FabricPath Nexus performs conversational MAC learning, enabling a switch to learn only those MACs involved in active bidirectional communication. Similar to a three-way handshake, this new technique leads to the population of only the interested host’s MAC addresses rather than all MAC addresses in the domain. This dramatically reduces the need for large table sizes as each switch only learns the MAC addresses that the hosts under its interface are actively communicating with. As a result, edge nodes only know the MAC addresses of local nodes or nodes that want to speak with local nodes directly.

FabricPath Nexus benefits and drawbacks

Benefits:

  1. Plug-and-play features like Classical Ethernet.
  2. A single control plane for all types of traffic, with good troubleshooting features to debug problems at Layer 2.
  3. High performance and high availability using multipathing.**
  4. Easy to add new devices to an existing FabricPath domain.
  5. Small Layer 2 table sizes result in better performance.

Drawbacks:

  1. Cisco proprietary.
  2. Fabric interfaces carry only FabricPath-encapsulated traffic.
  3. Useful as a DCI solution only over short distances.

** This enables the MSDC networks to have flat topologies, separating the nodes by a single hop.

Although IS-IS forms the basis of Cisco FabricPath, you don’t need to be an IS-IS expert. You can enable FabricPath interfaces and begin forwarding FabricPath encapsulated frames in the same way they can activate Spanning Tree and interconnect switches.

The only necessary configuration is distinguishing the core ports, which link the switches, from the edge ports, where end devices are attached. No other parameters need to be tuned to achieve an optimal configuration, and the switch addresses are assigned automatically for you.
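
Under those assumptions, verification is equally simple. A few NX-OS show commands (as found on FabricPath-capable platforms) confirm the fabric is operating:

  show fabricpath isis adjacency    ! IS-IS neighbors discovered on core ports
  show fabricpath route             ! the routing table of switch IDs
  show mac address-table dynamic    ! MACs learned via conversational learning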

Summary: What is Fabric Path

Key Features of Cisco FabricPath:

1. MAC-in-MAC Encapsulation: Cisco FabricPath utilizes MAC-in-MAC encapsulation to overcome the traditional Spanning Tree Protocol (STP) limitations. By encapsulating Layer 2 frames within another Layer 2 frame, FabricPath enables efficient forwarding and eliminates the need for STP.

2. Loop-Free Topology: Unlike STP-based networks, Cisco FabricPath employs a loop-free topology, ensuring optimal forwarding paths and maximizing network utilization. This feature enhances network resilience and eliminates the risk of network outages caused by loops.

3. Scalability: Cisco FabricPath supports up to 16 million virtual ports, enabling organizations to scale their networks without compromising performance. This scalability makes it ideal for large data centers and enterprises with growing network demands.

4. Traffic Optimization: Cisco FabricPath optimizes traffic flows using Equal-Cost Multipath (ECMP) routing. ECMP distributes traffic across multiple paths, allowing for efficient load balancing and improved network performance.

Benefits of Cisco FabricPath:

1. Simplified Network Design: Cisco FabricPath simplifies network design by eliminating the need for complex STP configurations. With its loop-free architecture, FabricPath reduces network complexity and improves overall network stability.

2. Enhanced Network Resilience: Cisco FabricPath ensures high network availability and resilience by utilizing multiple paths and load balancing techniques. In the event of a link failure, traffic is automatically rerouted, minimizing downtime and enhancing network reliability.

3. Increased Performance: With its scalable design and traffic optimization capabilities, Cisco FabricPath delivers superior network performance. FabricPath minimizes bottlenecks and improves overall network throughput by distributing traffic across multiple paths.

Use Cases of Cisco FabricPath:

1. Data Center Networks: Cisco FabricPath is widely used in data center environments, providing a scalable and resilient networking solution. Its ability to handle high traffic volumes and optimize data flows makes it an ideal choice for modern data centers.

2. Virtualized Environments: Cisco FabricPath is particularly beneficial in virtualized environments, simplifying network provisioning and enhancing virtual machine mobility. Its scalability and flexibility enable seamless communication between virtualized resources.

Conclusion: Cisco FabricPath is a powerful networking solution that offers numerous benefits for organizations seeking scalable and resilient network architectures. With its loop-free topology, MAC-in-MAC encapsulation, and traffic optimization capabilities, Cisco FabricPath simplifies network design, enhances network resilience, and boosts overall performance. By implementing Cisco FabricPath, businesses can build robust and efficient networks that meet the demands of today’s digital landscape.

What is VXLAN

What is VXLAN

In the rapidly evolving networking world, virtualization has become critical for businesses seeking to optimize their IT infrastructure. One key technology that has emerged is VXLAN (Virtual Extensible LAN), which enables the creation of virtual networks independent of physical network infrastructure. In this blog post, we will delve into the concept of VXLAN, its benefits, and its role in network virtualization.

VXLAN is an encapsulation protocol designed to extend Layer 2 (Ethernet) networks over Layer 3 (IP) networks. It provides a scalable and flexible solution for creating virtualized networks, enabling seamless communication between virtual machines (VMs) and physical servers across different data centers or geographic regions.

VXLAN is a technology that creates virtual networks within an existing physical network: a Layer 2 overlay that runs on top of the existing Layer 3 network. VXLAN utilizes UDP as the transport protocol, providing an efficient and reliable way to create a virtual network.

VXLAN encapsulates the original Layer 2 Ethernet frames within UDP packets, using a 24-bit VXLAN Network Identifier (VNI) to distinguish between different virtual networks. The encapsulated packets are then transmitted over the underlying IP network, enabling the creation of virtualized Layer 2 networks across Layer 3 boundaries.

- Scalability: VXLAN solves the limitations of traditional VLANs by providing a much larger network identifier space, accommodating up to 16 million virtual networks. This scalability allows for the efficient isolation and segmentation of network traffic in highly virtualized environments.

- Flexibility: VXLAN enables the decoupling of virtual and physical networks, providing the flexibility to move virtual machines across different physical hosts or even data centers without the need for reconfiguration. This flexibility greatly simplifies workload mobility and enhances overall network agility.

- Multitenancy: With VXLAN, multiple tenants can securely share the same physical infrastructure while maintaining isolation between their virtual networks. This is achieved by assigning unique VNIs to each tenant, ensuring their traffic remains separate and secure.

- Underlay Network: VXLAN relies on an IP underlay network, which must provide sufficient bandwidth, low latency, and optimal routing. Careful planning and design of the underlay network are crucial to ensure optimal VXLAN performance.

- Network Virtualization Gateway: To enable communication between VXLAN-based virtual networks and traditional VLAN-based networks, a network virtualization gateway, such as a VXLAN Gateway or an overlay-to-underlay gateway, is required. These gateways bridge the gap between virtual and physical networks, facilitating seamless connectivity.

Highlights: What is VXLAN

Data centers evolution

In recent years, data centers have seen a significant evolution. This evolution has brought popular technologies such as virtualization, cloud computing (private, public, and hybrid), and software-defined networking (SDN). Mobile-first and cloud-native data centers must scale, be agile, secure, consolidate, and integrate with compute/storage orchestrators. As well as visibility, automation, ease of management, operability, troubleshooting, and advanced analytics, today’s data center solutions are expected to include many other features.

A more service-centric approach is replacing device-by-device management. Most requests for proposals (RFPs) specify open application programming interfaces (APIs) and standards-based protocols to prevent vendor lock-in. Cisco's answer to these requirements is a Virtual Extensible LAN (VXLAN)-based fabric built from Nexus switches running NX-OS.

what is spine and leaf architecture
Diagram: What is spine and leaf architecture. 2-Tier Spine Leaf Design

Issues with STP

When redundant paths exist, the spanning tree protocol must designate one of them as blocked to prevent loops. While this mechanism is necessary, it can lead to suboptimal network performance. Blocked ports limit bandwidth utilization, which can be particularly problematic in environments with heavy data traffic.

One significant concern with the spanning tree protocol is its slow convergence time. When a network topology changes, the protocol takes time to recompute the spanning tree and reestablish connectivity. During this convergence period, network downtime can occur, disrupting critical operations and causing frustration for users.

stp port states

What is VXLAN?

The Internet Engineering Task Force (IETF) developed VXLAN, or Virtual eXtensible Local-Area Network, as a network virtualization technology standard. Multi-tenant networks allow multiple organizations to share a physical network without accessing each other’s traffic.

The VXLAN can be compared to individual apartments in a building: each apartment is a separate, private dwelling within a shared physical structure, just as each VXLAN is a discrete, private network segment within a shared physical infrastructure.

With VXLANs, physical networks can be segmented into 16 million logical networks. To encapsulate Layer 2 Ethernet frames, User Datagram Protocol (UDP) packets with a VXLAN header are used. Combining VXLAN with Ethernet virtual private networks (EVPNs), which transport Ethernet traffic over WAN protocols, allows Layer 2 networks to be extended across Layer 3 IP or MPLS networks.

VXLAN vs. GRE

VXLAN, an overlay network technology, is designed to address the limitations of traditional VLANs. It enables the creation of virtual networks over an existing Layer 3 infrastructure, allowing for more flexible and scalable network deployments. VXLAN operates by encapsulating Layer 2 Ethernet frames within UDP packets, extending Layer 2 domains across Layer 3 boundaries.

GRE, on the other hand, is a simple IP packet encapsulation protocol. It provides a mechanism for encapsulating arbitrary protocols over an IP network and is widely used for creating point-to-point tunnels. GRE encapsulates the payload packets within IP packets, making it a versatile option for connecting remote networks securely.

GRE without IPsec

Key VXLAN advantages

Because VXLANs are encapsulated inside UDP packets, they can run on any network that can forward UDP packets. The encapsulating VTEP can be physically or geographically distant from the decapsulating VTEP, as long as the network between them can forward the UDP datagrams.

VXLAN and EVPN enable operators to create virtual networks from physical ports on any Layer 3 network switch supporting the standard. Connecting a port on switch A to two ports on switch B and another port on switch C creates a virtual network that appears to all connected devices as one physical network. Devices in this virtual network cannot see VXLANs or the underlying network fabric.

Problems that VXLAN solves

Just as server virtualization has increased agility and flexibility, decoupling virtual networks from the physical infrastructure has done the same. Network operators can scale their infrastructure rapidly and economically to meet growing demand while securely sharing a single physical network. For privacy and security reasons, networks are segmented to prevent one tenant from seeing or accessing the traffic of another.

While serving a similar purpose to traditional virtual LANs (VLANs), VXLANs enable operators to overcome the scaling limitations associated with VLANs.

  • Up to 16 million VXLANs can be created in an administrative domain, compared to 4094 traditional VLANs. Cloud and service providers can segment networks using VXLANs to support many tenants.
  • By using a VXLAN, you can create network segments that span data centers. In traditional VLAN networks, broadcast domains are created by segmenting traffic with VLAN tags, but the VLAN information is removed once a packet reaches a router, so a VLAN cannot extend beyond its Layer 2 network. Some use cases, such as virtual machine migration, generally cannot cross a Layer 3 boundary. Because VXLAN encapsulates the original packets inside UDP/IP, a virtual overlay segment can extend as far as the physical Layer 3 routed network can reach, provided the switches and routers in the path support VXLAN, without the applications running on the overlay having to cross any Layer 3 boundary. Servers connected to the overlay remain part of the same Layer 2 network, even though the UDP packets may have transited one or more routers.
  • Providing Layer 2 segmentation on top of an underlying Layer 3 network supports a large number of network segments while keeping each one small, even when its members are physically distant. Smaller Layer 2 segments prevent MAC table overflows on switches.
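
The scale difference quoted above falls directly out of the identifier widths: a 12-bit VLAN ID yields 2^12 = 4096 values (4094 usable once the two reserved values are excluded), while the 24-bit VXLAN VNI yields 2^24 = 16,777,216, the "16 million" figure used throughout this post.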

Primary VXLAN applications

A service provider or cloud provider deploys VXLAN for apparent reasons: they have many tenants or customers, and they must separate the traffic of one customer from another due to legal, privacy, and ethical considerations.

Users, departments, or other groups of network-segmented devices may be tenants in enterprise environments for security reasons. Isolating IoT network traffic from production network applications is a good security practice for Internet of Things (IoT) devices such as data center environmental sensors.

VXLAN has been widely adopted and is now used in many large enterprise networks for virtualization and cloud computing. It provides:

  • A secure and efficient way to create virtual networks.
  • Multi-tenant segmentation.
  • Efficient routing.
  • Hardware-agnostic capabilities.

With its widespread adoption, VXLAN has become an essential technology for network virtualization.

Related: Before you proceed, you may find the following posts helpful for pre-information:

  1. Data Center Topologies
  2. Segment Routing
  3. What is OpenFlow
  4. Overlay Virtual Networks
  5. Layer 3 Data Center

What is VXLAN

Key What is VXLAN Discussion Points:


  • Introduction to What is VXLAN and what is involved.

  • Highlighting the details of VXLAN vs VLAN.

  • Technical details on the VXLAN Spanning Tree.

  • Scenario: Why introduce VXLAN? VXLAN benefits.

  • A final note on the VXLAN enhancements.

Back to Basics: The Need For VXLAN

Traditional layer two networks have issues because of the following reasons:

  • Spanning tree: Restricts links.
  • Limited number of VLANs: Restricts scalability.
  • Large MAC address tables: Restrict scalability and mobility.

Spanning-tree avoids loops by blocking redundant links. By blocking connections, we create a loop-free topology and pay for links we can’t use. Although we could switch to a layer three network, some technologies require layer two networking.

VLAN IDs are 12 bits long, so we can create 4094 VLANs (0 and 4095 are reserved). Data centers can struggle with only 4094 available VLANs. Say we have a service provider with 500 customers: with 4094 available VLANs, each customer can only have eight.

STP Path distribution

The Role of Server Virtualization

Server virtualization has exponentially increased the number of MAC addresses our switches must learn. Before server virtualization, there was only one MAC address per switch port. With server virtualization, we can run many virtual machines (VMs) or containers on a single physical server. Each virtual machine is assigned a virtual NIC and a virtual MAC address, so one switch port must learn many MAC addresses.

A data center could have 24 or 48 physical servers connected to a top-of-rack (ToR) switch. Since there may be many racks in a data center, each switch must store the MAC addresses of all VMs that communicate. Networks with server virtualization therefore require much larger MAC address tables.

1st Lab Guide: VXLAN

In the following lab, I created a Layer 2 overlay with VXLAN over a Layer 3 core. A bridge domain VNI of 6001 must match both sides of the overlay tunnel. What Is a VNI? The VLAN ID field in an Ethernet frame has only 12 bits, so VLAN cannot meet isolation requirements on data center networks. The emergence of VNI specifically solves this problem.

Note: The VNI

A VNI is a user identifier similar to a VLAN ID. A VNI identifies a tenant. VMs with different VNIs cannot communicate at Layer 2. During VXLAN packet encapsulation, a 24-bit VNI is added to a VXLAN packet, enabling VXLAN to isolate many tenants.

In the screenshot below, you will notice that I can ping from desktop 0 to desktop 1 even though the IP addresses are not in the routing table of the core devices, simulating a Layer 2 overlay. Consider VXLAN to be the overlay and the routed Layer 3 core to be the underlay.

VXLAN overlay
Diagram: VXLAN Overlay
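
A minimal NX-OS-style sketch of the overlay built in this lab, assuming a platform with the NVE and VN-segment features; the VLAN, loopback, and peer address are illustrative. Note that the VNI appears in two places, which is why changing it requires two edits:

  feature nv overlay
  feature vn-segment-vlan-based
  !
  vlan 10
    vn-segment 6001                 ! first place the VNI appears
  !
  interface nve1
    no shutdown
    source-interface loopback0
    member vni 6001                 ! second place the VNI appears
      ingress-replication protocol static
        peer-ip 10.255.255.2        ! remote VTEP, unicast (ingress replication) mode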

In the following screenshot, notice that the VNI has been changed. The VNI needs to be changed in two places in the configuration, as illustrated below. Once changed, the peers are down; however, the NVE interface remains up. The VXLAN Layer 2 overlay is not operational.

Diagram: Changing the VNI

How does VXLAN work?

VXLAN uses tunneling to encapsulate Layer 2 Ethernet frames within IP packets. A unique 24-bit segment ID, the VXLAN Network Identifier (VNI), identifies each VXLAN network. The source VM encapsulates the original Ethernet frame with a VXLAN header, including the VNI. The encapsulated packet is then sent over the physical IP network to the destination VM and decapsulated to retrieve the original Ethernet frame.
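
It is worth keeping the encapsulation overhead in mind, since every frame grows by the size of the outer headers. For standard VXLAN over IPv4 with no outer 802.1Q tag, the arithmetic is:

  Outer Ethernet   14 bytes
  Outer IPv4       20 bytes
  Outer UDP         8 bytes
  VXLAN header      8 bytes
  Total            50 bytes

This 50-byte figure is the reason jumbo frames are recommended in the core, as noted in the drawbacks list later in this post.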

Analysis:

Notice below that I am running a ping from desktop 0 to desktop 1. The IP addresses assigned to these hosts are 10.0.0.1 and 10.0.0.2. First, notice that the ping succeeds. When I run a packet capture on link Gi1, connected to Leaf A, we see the encapsulation of the ICMP echo request and reply.

Everything is encapsulated into UDP port 1024. In my configurations of Leaf A and Leaf B, I explicitly set the VXLAN port to 1024.

VXLAN unicast mode

Benefits of VXLAN:

– Scalability: VXLAN allows the creation of up to 16 million logical networks, providing the scalability required for large-scale virtualized environments.

– Network Segmentation: By leveraging VXLAN, organizations can segment their networks into virtual segments, enhancing security and isolating traffic between applications or user groups.

– Flexibility and Mobility: VXLAN enables the movement of VMs across physical servers and data centers without the need to reconfigure network settings. This flexibility is crucial for workload mobility in dynamic environments.

– Interoperability: VXLAN is an industry-standard protocol supported by various networking vendors, ensuring compatibility across different network devices and platforms.

VXLAN Benefits:

  • Scalability

  • Network Segmentation

  • Flexibility and Mobility

  • Interoperability

VXLAN Use Cases:

  • Data Center Interconnect (DCI)

  • Multi-Tenant Environments

  • Network Virtualization

  • Hybrid Cloud Connectivity

Use Cases for VXLAN:

– Data Center Interconnect (DCI): VXLAN allows organizations to interconnect multiple data centers, enabling seamless workload migration, disaster recovery, and workload balancing across different locations.

– Multi-Tenant Environments: VXLAN enables service providers to offer virtualized network services to multiple tenants in a secure and isolated manner. This is particularly useful in cloud computing environments.

– Network Virtualization: VXLAN plays a crucial role in network virtualization, allowing organizations to create virtual networks independent of the underlying physical infrastructure. This enables greater flexibility and agility in managing network resources.

Back to Basics: VXLAN and Network Virtualization.

VXLAN and network virtualization

VXLAN is a form of network virtualization. Network virtualization cuts a single physical network into many virtual networks, often called network overlays. Virtualizing a resource allows it to be shared by multiple users while providing each user the illusion that the resource is theirs alone. In the case of virtual networks, each user operates under the illusion that there are no other users of the network. To preserve this illusion, virtual networks are separated from one another: packets cannot leak from one virtual network to another.

Network Virtualization
Diagram: Network Virtualization. Source Parallels

VXLAN Loop Detection and Prevention

So, before we dive into the benefits of VXLAN, let us address the basics of loop detection and prevention, which is a significant driver for using network overlays such as VLXAN. The challenge is that data frames can exist indefinitely when loops occur, disrupting network stability and degrading performance.

In addition, loops introduce broadcast radiation, increasing CPU and network bandwidth utilization, which degrades user application access experience. Finally, in multi-site networks, a loop can span multiple data centers, causing disruptions that are difficult to pinpoint. Overlay networking can solve much of this.

VXLAN vs VLAN

However, first-generation Layer-2 Ethernet networks could not natively detect or mitigate looped topologies, while modern Layer-2 overlays implicitly build loop-free topologies. Therefore, overlays do not need loop detection and mitigation as long as no first-gen Layer-2 network is attached. Essentially, there is no need for a VXLAN spanning tree.

So, one of the differences in the VXLAN vs VLAN comparison is that a VLAN has a 12-bit VID while VXLAN has a 24-bit network identifier (VNI), allowing you to create up to 16 million segments. VXLAN has tremendous scale and stable, loop-free networking and is a foundation technology in Cisco ACI.

Spanning tree VXLAN
Diagram: Loop prevention. Source is Cisco

VXLAN and Data Center Interconnect

VXLAN has revolutionized data center interconnect by providing a scalable, flexible, and efficient solution for extending Layer 2 networks. Its ability to enable network segmentation, multi-tenancy support, and seamless mobility makes it a valuable technology for modern businesses.

However, careful planning, consideration of network infrastructure, and security measures are essential for successful implementation. By harnessing the power of VXLAN, organizations can achieve a more agile, scalable, and interconnected data center environment.

Considerations for Implementing VXLAN:

1. Underlying Network Infrastructure: Before implementing VXLAN, it is essential to assess the underlying network infrastructure. Network devices must support VXLAN encapsulation and decapsulation and have sufficient bandwidth to handle the increased traffic.

2. Network Overhead: While VXLAN provides numerous benefits, it does introduce additional network overhead due to encapsulation and decapsulation processes. It is crucial to consider the impact on network performance and plan accordingly.

3. Security: As VXLAN extends Layer 2 networks over Layer 3 infrastructure, appropriate security measures must be implemented. These include encrypting VXLAN traffic, deploying access control policies, and monitoring network traffic for anomalies.

VXLAN vs VLAN: The VXLAN Benefits Drive Adoption

Introduced by Cisco and VMware and now heavily used in open networking, VXLAN stands for Virtual eXtensible Local Area Network and is perhaps the most popular overlay technology for IP-based SDN data centers; it is also used extensively with ACI networks.

VXLAN was explicitly designed for Layer 2 over Layer 3 tunneling. Its early competition from NVGRE and STT has faded away, and VXLAN is becoming the industry standard. VXLAN brings many advantages, especially in loop prevention, as there is no need for a VXLAN spanning tree.

VXLAN Benefits
VXLAN Benefits: Scale and loop-free networks.

Today, with overlays such as with VXLAN, the dependency on loop prevention protocols is almost eliminated. However, even though virtualized overlay networks such as VXLAN are loop-free, having a failsafe loop detection and mitigation method is still desirable because loops can be introduced by topologies connected to the overlay network.

Loop prevention traditionally started with Spanning Tree Protocols (STP) to counteract the loop problem in first-gen Layer-2 Ethernet networks. Over time, other approaches evolved by moving networks from "looped topologies" to "loop-free topologies."

While LAG and MLAG were used, other approaches for building loop-free topologies arose using ECMP at the MAC or IP layers. For example, FabricPath or TRILL is a MAC layer ECMP approach that emerged in the last decade. More recently, network virtualization overlays that build loop-free topologies on top of IP layer ECMP became state-of-the-art.

What is VXLAN
What is VXLAN and the components involved?

VXLAN vs VLAN: Why Introduce VXLAN?

  1. STP issues and scalability constraints: STP is undesirable on a large scale and lacks a proper load-balancing mechanism. A solution was needed to leverage the ECMP capabilities of an IP network while offering extended VLANs across an IP core, i.e., virtual segments across the network core. There is no VXLAN spanning tree.
  2. Multi-tenancy: Layer 2 networks are capped at 4000 VLANs, restricting multi-tenancy design—a big difference in the VXLAN vs VLAN debates.
  3. ToR table scalability: Every ToR switch may need to support several virtual servers, and each virtual server requires several NICs and MAC addresses. This pushes the limits on the table sizes for the ToR switch. In addition, after the ToR tables become full, Layer 2 traffic will be treated as unknown unicast traffic, which will be flooded across the network, causing instability to a previously stable core.
STP Blocking.
Diagram: STP Blocking. Source Cisco Press free chapter.

VXLAN use cases

  1. Multi-tenant IaaS clouds where you need a large number of segments.
  2. Linking virtual to physical servers, via a software or hardware VXLAN-to-VLAN gateway.
  3. HA clusters across failure domains/availability zones.
  4. Fabrics with equidistant endpoints, over which VXLAN works well.
  5. VXLAN-encapsulated VLAN traffic across availability zones; this traffic must be rate-limited to prevent broadcast storm propagation across multiple availability zones.

What is VXLAN? The operations

When discussing VXLAN vs VLAN, VXLAN employs a MAC over IP/UDP overlay scheme and extends the traditional VLAN boundary of 4000 VLANs. The 12-bit VLAN identifier in traditional VLANs capped scalability within the SDN data center and proved cumbersome if you wanted a VLAN per application segment model. VXLAN scales the 12-bit to a 24-bit identifier and allows for 16 million logical endpoints, with each endpoint potentially offering another 4,000 VLANs.

While tunneling does provide Layer 2 adjacency between these logical endpoints and allows VMs to move across boundaries, the main driver for its insertion was to overcome the challenge of having only 4000 VLANs.

Typically, an application stack has multiple segments; between each segment, you will have firewalling and load-balancing services, and each segment requires a different VLAN. The Layer 2 VLAN segment carries non-routable heartbeats or state information that can't cross an L3 boundary. If you are a cloud provider, you will soon reach the 4000-VLAN limit.

vxlan vs vlan
Multiple segments are required per application stack.

The control plane

The control plane is very similar to classic Ethernet flood-and-learn behavior. If a switch receives a packet destined for an unknown address, the switch forwards the packet to an IP address that floods the packet to all the other switches.

This IP address is, in turn, mapped to a multicast group across the network. VXLAN doesn’t explicitly have a control plane and requires an IP multicast running in the core for forwarding traffic and host discovery.

Best practices for enabling IP multicast in the core:

  1. Bidirectional PIM or PIM Sparse Mode.
  2. Redundant Rendezvous Points (RP).
  3. Shared trees (reduce the amount of IP multicast state).
  4. Always check the IP multicast table sizes on core and ToR switches.
  5. A single IP multicast address for multiple VXLAN segments is OK.
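
A minimal NX-OS-style sketch of the first two practices above, bidirectional PIM with a defined RP; the RP address and group range are illustrative:

  feature pim
  !
  ip pim rp-address 10.1.1.1 group-list 239.0.0.0/8 bidir
  !
  interface Ethernet1/1
    ip pim sparse-mode              ! enable on every core-facing Layer 3 interface

In practice, the RP would also be made redundant (for example, with a phantom RP design for bidirectional PIM) per the second recommendation.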

The requirement for IP multicast in the core made VXLAN undesirable from an operation point of view. For example, creating the tunnel endpoints is simple, but introducing a protocol like IP multicast to a core just for the tunnel control plane was considered undesirable. As a result, some of the more recent versions of VXLAN support IP unicast.

VXLAN uses a MAC-over-IP/UDP encapsulation to eliminate the need for spanning tree; there is no VXLAN spanning tree, so the core can be pure IP. Many people ask why VXLAN uses UDP. The reason is that the UDP port numbers let VXLAN inherit Layer 3 ECMP behavior: the entropy that enables load balancing across multiple paths is embedded in the UDP source port of the overlay header.
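To make the 24-bit VNI and the UDP entropy concrete, here is a minimal Python sketch (an illustration, not any vendor's implementation) that packs the 8-byte VXLAN header and derives the outer UDP source port from a hash of the inner headers; the ephemeral port range and hash choice are assumptions for the example.

import struct
import zlib

VXLAN_UDP_DST = 4789  # IANA-assigned VXLAN destination port

def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: I-flag set, 24-bit VNI, reserved bits zero."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit field (about 16 million segments)")
    return struct.pack("!II", 0x08 << 24, vni << 8)

def udp_source_port(inner_frame: bytes) -> int:
    """Hash the inner headers into the outer UDP source port.

    This per-flow entropy is what lets the IP core apply Layer 3 ECMP
    to VXLAN traffic without looking inside the tunnel.
    """
    flow_hash = zlib.crc32(inner_frame[:34]) & 0xFFFF  # inner MAC + IP headers
    return 49152 + (flow_hash % 16384)                 # stay in ephemeral range

print(vxlan_header(5000).hex())        # 08000000 00138800: I-flag, VNI 5000
print(udp_source_port(b"\x00" * 64))   # same flow always hashes to same port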

2nd Lab Guide: Multicast VXLAN

In this lab guide, we will look at VXLAN multicast mode. Multicast mode requires both unicast and multicast connectivity between sites. As in the previous guide, OSPF provides unicast connectivity, and we now add bidirectional Protocol Independent Multicast (PIM) to provide multicast connectivity.

This does not mean the core can be multicast-free; multicast must still be enabled on the core.

In other words, we are not tunneling multicast over an IPv4 core that lacks multicast support. Multicast is enabled on all Layer 3 interfaces, and the mroute table is populated on all Layer 3 routers, which can be verified with the command show ip mroute. The command show nve vni shows multicast group 239.0.0.10 with a state of UP, confirming that the multicast traffic is being tunneled.

Diagram: Multicast VXLAN
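For reference, the NX-OS configuration behind a multicast-mode VNI like the one in this lab might look like the sketch below; VLAN 100, VNI 10000, and the loopback source interface are assumed values, while the group matches the lab's 239.0.0.10.

feature nv overlay
feature vn-segment-vlan-based

vlan 100
 vn-segment 10000

interface nve1
 no shutdown
 source-interface loopback0
 member vni 10000 mcast-group 239.0.0.10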

VXLAN benefits and stability

The underlying control-plane network impacts the stability of VXLAN and the applications running within it. For example, if the underlying IP network cannot converge quickly enough, VXLAN packets may be dropped, and an application cache timeout may be triggered.

The rate of change in the underlying network has a significant impact on the stability of the tunnels, yet the rate of change of the tunnels does not affect the underlying control plane. This is similar to how the stability of an MPLS/VPN overlay depends on the core’s IGP.

VXLAN benefits and drawbacks

  1. Benefit: Runs over IP transport. Drawback: No control plane.
  2. Benefit: Offers a large number of logical endpoints. Drawback: Needs IP multicast.***
  3. Benefit: Reduced flooding scope. Drawback: No IGMP snooping (yet).
  4. Benefit: Eliminates STP. Drawback: No PVLAN support.
  5. Benefit: Easily integrated over an existing core. Drawback: Requires jumbo frames in the core (50 bytes of encapsulation overhead).
  6. Benefit: Minimal host-to-network integration. Drawback: No built-in security features.**
  7. Drawback: Not a DCI solution (no ARP reduction, no first-hop gateway localization, no inbound traffic steering, i.e., LISP).

** VXLAN has no built-in security features. Anyone who gains access to the core network can insert traffic into segments. The VXLAN transport network must be secured, as existing firewall and intrusion prevention system (IPS) equipment cannot see into VXLAN traffic.

*** Recent versions have Unicast VXLAN. Nexus 1000V release 4.2(1)SV2(2.1)

Updated: VXLAN enhancements

MAC distribution mode is an enhancement to VXLAN that prevents unknown unicast flooding and eliminates data plane MAC address learning. Traditionally, this was done by flooding to locate an unknown end host, but it has now been replaced with a control plane solution.

During VM startup, the VSM ( control plane ) collects the list of MAC addresses and distributes the MAC-to-VTEP mappings to all VEMs participating in a VXLAN segment. This technique makes VXLAN more optimal by unicasting more intelligently, similar to Nicira and VMware NVP.

ARP termination works by giving the VSM controller all the ARP and MAC information. This enables the VSM to proxy and respond locally to ARP requests without sending a broadcast. Because 90% of broadcast traffic is ARP requests ( ARP reply is unicast ), this significantly reduces broadcast traffic on the network.
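Conceptually, the VSM's job here reduces to maintaining two tables and answering ARP locally. The Python sketch below is a thought experiment to illustrate that control-plane logic, not Nexus 1000V code: all names, addresses, and helpers are hypothetical.

# Hypothetical control-plane tables, populated at VM startup by the VSM
mac_to_vtep = {
    "00:50:56:aa:01:01": "10.1.1.1",   # VM MAC -> VTEP (VEM) IP
    "00:50:56:aa:02:02": "10.1.1.2",
}
ip_to_mac = {
    "192.168.10.11": "00:50:56:aa:01:01",   # learned IP -> MAC bindings
    "192.168.10.12": "00:50:56:aa:02:02",
}

def forward_unicast(dst_mac: str):
    """MAC distribution mode: known MACs are unicast to one VTEP, not flooded."""
    vtep = mac_to_vtep.get(dst_mac)
    return ("unicast to VTEP", vtep) if vtep else ("flood (unknown MAC)", None)

def arp_terminate(target_ip: str):
    """ARP termination: answer the broadcast locally instead of forwarding it."""
    mac = ip_to_mac.get(target_ip)
    return ("proxy ARP reply", mac) if mac else ("fall back to flooding", None)

print(forward_unicast("00:50:56:aa:02:02"))   # ('unicast to VTEP', '10.1.1.2')
print(arp_terminate("192.168.10.11"))         # ('proxy ARP reply', '00:50:56:aa:01:01')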

Final Notes: VXLAN

In recent years, the rapid growth of cloud computing and the increasing demand for scalable and flexible networks have led to the development of various technologies to address these needs. One such technology is VXLAN (Virtual Extensible LAN), an overlay network protocol that has gained significant popularity in networking. In this blog post, we will delve into the intricacies of VXLAN, exploring its key features, benefits, and use cases.

What is VXLAN?

VXLAN is a network overlay technology that enables the creation of virtualized Layer 2 networks over existing Layer 3 infrastructure. It was developed to address the limitations of traditional VLANs, which could not scale beyond a few thousand networks due to the limited number of VLAN IDs available. VXLAN solves this problem using a 24-bit VXLAN Network Identifier (VNI), allowing for an impressive 16 million unique network segments.

Key Features of VXLAN:

1. Scalability: As mentioned earlier, VXLAN’s use of a 24-bit VNI allows for a significantly larger number of network segments than traditional VLANs. This scalability makes VXLAN an ideal solution for large-scale virtualized environments.

2. Network Segmentation: VXLAN enables the creation of logical network segments, allowing for network isolation and improved security. By encapsulating Layer 2 Ethernet frames within Layer 3 UDP packets, VXLAN provides a flexible and scalable approach to network segmentation.

3. Multicast Support: VXLAN leverages IP multicast to efficiently distribute broadcast, unknown unicast, and multicast (BUM) traffic across the network. This feature reduces network congestion and improves overall performance.

4. Mobility: VXLAN supports seamless virtual machine (VM) movement across physical hosts and data centers. By decoupling the VMs from the underlying physical network, VXLAN enables mobility without requiring any changes to the network infrastructure.

Benefits of VXLAN:

1. Enhanced Network Flexibility: VXLAN enables the creation of virtualized networks decoupled from the underlying physical infrastructure. This flexibility allows for easier network provisioning, scaling, and reconfiguration, making it an ideal choice for cloud environments.

2. Improved Scalability: With its larger network segment capacity, VXLAN offers improved scalability compared to traditional VLANs. This scalability is crucial in modern data centers and cloud environments where virtual machines and network segments are continuously growing.

3. Simplified Network Management: VXLAN simplifies network management tasks by abstracting the network infrastructure. Network administrators can define and manage virtual networks independently of the underlying physical infrastructure, streamlining network operations and reducing complexity.

Use Cases for VXLAN:

1. Data Center Interconnect: VXLAN is widely used for interconnecting geographically dispersed data centers. By extending Layer 2 network connectivity over Layer 3 infrastructure, VXLAN facilitates seamless VM mobility, disaster recovery, and workload balancing across data centers.

2. Multi-tenancy in Cloud Environments: VXLAN allows cloud service providers to create isolated network segments for different tenants, enhancing security and providing dedicated network resources. This feature is vital in multi-tenant cloud environments where data privacy and network isolation are critical.

3. Network Virtualization: VXLAN plays a crucial role in network virtualization, enabling the creation of virtual networks that are independent of the underlying physical infrastructure. This virtualization simplifies network management, enhances flexibility, and enables efficient resource utilization.

VXLAN has emerged as a powerful network virtualization technology with many use cases. VXLAN provides the flexibility, scalability, and efficiency required in modern networking environments, from data center virtualization to multi-tenancy, hybrid cloud connectivity, and disaster recovery. As organizations continue to embrace cloud computing and virtualization, VXLAN will undoubtedly play a pivotal role in shaping the future of networking.

Summary: What is VXLAN

VXLAN, short for Virtual Extensible LAN, is a network virtualization technology that has recently gained significant popularity. In this blog post, we will examine VXLAN’s definition, workings, and benefits. So, let’s dive into the world of VXLAN!

Understanding VXLAN Basics

VXLAN is an encapsulation protocol that enables the creation of virtual networks over existing Layer 3 infrastructures. It extends Layer 2 segments over Layer 3 networks, allowing for greater flexibility and scalability. By encapsulating Layer 2 frames within Layer 3 packets, VXLAN enables efficient communication between virtual machines (VMs) across physical hosts or data centers.

VXLAN Operation and Encapsulation

To understand how VXLAN works, we must look at its operation and encapsulation process. When a VM sends a Layer 2 frame, it is encapsulated into a VXLAN packet by adding a VXLAN header. This header includes information such as the VXLAN network identifier (VNI), which identifies the virtual network to which the packet belongs. The VXLAN packet is then transported over the underlying Layer 3 network to the destination physical host, decapsulated, and delivered to the appropriate VM.

Benefits and Use Cases of VXLAN

VXLAN offers several benefits that make it an attractive choice for network virtualization. Firstly, it enables the creation of large-scale virtual networks, allowing for seamless VM mobility and workload placement flexibility. VXLAN also helps overcome the limitations of traditional VLANs by providing a much larger address space, accommodating the ever-growing number of virtual machines in modern data centers. Additionally, VXLAN facilitates network virtualization across geographically dispersed data centers, making it ideal for multi-site deployments and disaster recovery scenarios.

VXLAN vs. Other Network Virtualization Technologies

While VXLAN is widely used, it is essential to understand its key differences and advantages compared to other network virtualization technologies. For instance, VXLAN offers better scalability and flexibility than traditional VLANs. It also provides better isolation and segmentation of virtual networks, making it an ideal choice for multi-tenant environments. Additionally, VXLAN is agnostic to the physical network infrastructure, allowing it to be easily deployed in existing networks without requiring significant changes.

Conclusion:

In conclusion, VXLAN is a powerful network virtualization technology that has revolutionized how virtual networks are created and managed. Its ability to extend Layer 2 networks over Layer 3 infrastructures, scalability, flexibility, and ease of deployment make VXLAN a go-to solution for modern data centers. Whether for workload mobility, multi-site implementations, or overcoming VLAN limitations, VXLAN offers a robust and efficient solution. Embracing VXLAN can unlock new possibilities in network virtualization, enabling organizations to build agile, scalable, and resilient virtual networks.


Routing Convergence

Routing convergence, a critical aspect of network performance, refers to the process of network routers exchanging information to update their routing tables in the event of network changes. It ensures efficient and reliable data transmission, minimizing disruptions and optimizing network performance. In this blog post, we will delve into the intricacies of routing convergence, exploring its importance, challenges, and best practices.

Routing convergence refers to the process by which a network's routing tables reach a consistent and stable state after making changes. It ensures that all routers within a network have up-to-date information about the available paths and can make efficient routing decisions.

When a change occurs in a network, such as a link failure or the addition of a new router, routing convergence is necessary to update the routing tables and ensure that packets are delivered correctly. The goal is to minimize the time it takes for all routers in the network to converge and resume normal routing operations.

Several mechanisms and protocols contribute to routing convergence. One of the critical components is the exchange of routing information between routers. This can be done through protocols such as Routing Information Protocol (RIP), Open Shortest Path First (OSPF), or Border Gateway Protocol (BGP).

Highlights: Routing Convergence

Understanding Convergence

What is router convergence?

Router convergence means routers have the same topological information about the network in which they operate. To converge, a set of routers must have collected all topology information from each other via the routing protocol in use, and that information must reflect the current state of the network without contradicting any other router’s topology information. All routers in a converged network agree upon its topology. For dynamic routing to work, a set of routers must be able to communicate with each other, and all Interior Gateway Protocols depend on convergence; an autonomous system in operation is usually converged. The Exterior Gateway Protocol, BGP, rarely fully converges due to the size of the Internet.

Convergence Process

Each router in a routing protocol attempts to exchange topology information about the network. The extent, method, and type of information exchanged between routing protocols, such as BGP4, OSPF, and RIP, differs. A routing protocol convergence occurs once all routing protocol-specific information has been distributed to all routers. In the event of a routing table change in the network, convergence will be temporarily broken until the change has been successfully communicated to all routers.

Example: The convergence process

During the convergence process, routers exchange information about the network’s topology. Based on this information, they update their routing tables and calculate the most efficient paths to reach destination networks. This process continues until all routers have consistent and accurate routing tables.

The convergence time can vary depending on the size and complexity of the network, as well as the routing protocols used. Convergence can happen relatively quickly in smaller networks, while more extensive networks may take longer to achieve convergence.

To optimize routing convergence, network administrators can employ various strategies. These include implementing fast convergence protocols, such as OSPF’s Fast Hello and Bidirectional Forwarding Detection (BFD), which minimize the time it takes to detect and respond to changes in the network.

Example Technology:  BFD

Bidirectional Forwarding Detection (BFD) is a lightweight protocol designed to detect failures in communication paths between routers or switches. It operates independently of the routing protocols and detects rapid failure by utilizing fast packet exchanges. Unlike traditional methods like hello packets, BFD offers sub-second detection, allowing for quicker convergence and network stability. BFD is pivotal in achieving fast routing convergence, providing real-time detection, and facilitating swift rerouting decisions.

-Enhanced Network Resilience: By swiftly detecting link failures, BFD enables routers to act immediately, rerouting traffic through alternate paths. This proactive approach ensures minimal disruption and enhances network resilience, especially in environments where redundancy is critical.

-Reduced Convergence Time: BFD’s ability to detect failures within milliseconds significantly reduces the time required for converging routing protocols. This translates into improved network responsiveness, reduced packet loss, and enhanced user experience.

-Scalability and Flexibility: BFD can be implemented across various network topologies and routing protocols, making it a versatile solution. Whether a small enterprise network or a large-scale service provider environment, BFD adapts seamlessly, providing consistent performance and stability.
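To ground this in configuration, classic Cisco IOS syntax for enabling BFD on an interface and tying a routing protocol to it looks roughly like the sketch below; the 50 ms timers, interface name, and OSPF process number are assumptions to validate against your platform and design.

interface GigabitEthernet0/0
 bfd interval 50 min_rx 50 multiplier 3
!
router ospf 1
 bfd all-interfaces

With this in place, OSPF tears down the adjacency as soon as BFD declares the path down, rather than waiting on its own dead timer.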

Convergence Time

Convergence time measures the speed at which a group of routers converges. Fast and reliable convergence is a significant performance indicator for a routing protocol. The size of the network also matters: a more extensive network will converge more slowly than a smaller one.

When a few routers running RIP, a routing protocol that converges slowly, are connected, it can take several minutes for the network to converge. A triggered update for a new route can speed up RIP’s convergence, but a hold-down timer will slow the flushing of an existing route. OSPF is an example of a fast-converging routing protocol: a network of a few OSPF routers can converge in seconds.

Under certain hardware and configuration conditions, a network may never converge. “Flapping” interfaces (ones that frequently change between “up” and “down”) propagate conflicting information throughout the network, preventing routers from agreeing on its current state. Route aggregation can deprive parts of a network of detailed routing information, resulting in faster convergence of topological information.

Topological information

A set of routers in a network share the same topological information during convergence or routing convergence. Routing protocols exchange topology information between routers in a network. Routers in a network receive routing information when convergence is reached. Therefore, in a converged network, all routers know the network topology and the optimal route to take. Any change in the network – for example, the failure of a device – affects convergence until all routers are informed of the change. The convergence time in a network is the time it takes for routers to achieve convergence after a topology change. In high-performance service provider networks, sensitive applications are run that require fast failover in case of failures. Several factors determine a network’s convergence rate:

  1. Failure detection: finding a new forwarding path begins with identifying the failed device. In virtual networks, device reachability must be established over time with probes, whereas in physical networks, events signal device availability. To achieve fast network convergence, the detection time – the time it takes to detect a failure – must be kept within acceptable limits.
  2. Local repair: in the event of a device failure on the primary route, traffic is diverted to the backup route before the failure or topology change has been propagated to all devices.
  3. Routing protocols are said to achieve global repair or network convergence when they propagate a change in topology to all network devices.

Enhancing Routing Convergence

To improve routing convergence, network administrators can implement various strategies. One approach is to utilize route summarization, which reduces the number of routes advertised and processed by routers. This helps to minimize the impact of changes in specific network segments on overall convergence time.

Furthermore, implementing fast link failure detection mechanisms, such as Bidirectional Forwarding Detection (BFD), can significantly reduce convergence time. BFD allows routers to quickly detect link failures and trigger immediate updates to routing tables, ensuring faster convergence.

Factors Influencing Routing Convergence

Several factors impact routing convergence in a network. Firstly, the efficiency of the routing protocols being used plays a crucial role. Protocols such as OSPF (Open Shortest Path First) and EIGRP (Enhanced Interior Gateway Routing Protocol) are designed to facilitate fast convergence by quickly adapting to network changes.

Additionally, network topology and scale can affect routing convergence. Large networks with complex topologies may require more time for routers to converge due to the increased number of routes and potential link failures. Network administrators must carefully design and optimize the network architecture to minimize convergence time.

Control and data plane

When considering routing convergence with forwarding routing protocols, we must first highlight that a networking device is tasked with two planes of operation—the control plane and the data plane. The job of the data plane is to switch traffic across the router’s interfaces as fast as possible, i.e., move packets. The control plane has the more complex operation of putting together and creating the controls so the data plane can operate efficiently. How these two planes interact will affect network convergence time.

The network’s control plane finds the best path for routing convergence from any source to any network destination. For quick convergence routing, it must react quickly and be dynamic to changes in the network, both of the LAN and for WAN.

Control and Data Plane

Monitoring and Troubleshooting Routing Convergence

Network administrators must monitor routing convergence to identify and promptly address potential issues. Network management tools, such as SNMP (Simple Network Management Protocol) and NetFlow analysis, can provide valuable insights into routing convergence performance, including convergence time, route flapping, and stability.

When troubleshooting routing convergence problems, administrators should carefully analyze routing table updates, link state information, and routing protocol logs. This information can help pinpoint the root cause of convergence delays or inconsistencies, allowing for targeted remediation.


Related: For pre-information, you may find the following posts helpful:

  1. Implementing Network Security
  2. Dead Peer Detection
  3. IPsec Fault Tolerance
  4. WAN Virtualization
  5. Port 179



Convergence Routing


Key Routing Convergence Discussion Points:


  • Convergence time definitions.

  • IP Forwarding paradigms.

  • Path Selection.

  • The effects of TCP congestion controls.

  • Adding resilience.

  • Routing protocol convergence steps.

Convergence Time Definition.

I found two similar definitions of convergence time:

“Convergence is the amount of time ( and thus packet loss ) after a failure in the network and before the network settles into a steady state.” Also, ” Convergence is the amount of time ( and thus packet loss) after a failure in the network and before the network responds to the failure.”

The difference between the two convergence time definitions is subtle but essential – steady-state vs. just responding. The control plane and its reaction to topology changes can be separated into four parts below. Each area must be addressed individually, as leaving one area out results in slow network convergence time and application time-out.

Knowledge Check: Routing and Convergence

Strategies for Achieving Optimal Routing Convergence

Enhanced Interior Gateway Routing Protocol (EIGRP)

EIGRP is a dynamic routing protocol that utilizes a Diffusing Update Algorithm (DUAL) to achieve fast convergence. By maintaining a backup route in case of link failures and employing triggered updates, EIGRP significantly reduces the time required for routing tables to converge.

Optimizing Routing Metrics

Carefully configuring routing metrics, such as bandwidth, delay, and reliability, can aid in achieving faster convergence. Assigning appropriate weights to these metrics ensures that routers select the most efficient paths quickly, leading to improved convergence times.

Implementing Route Summarization

Route summarization involves aggregating multiple network routes into a single summarized route. This technique reduces the size of routing tables and minimizes the complexity of route calculations, resulting in faster convergence.
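As a quick illustration, an OSPF inter-area summary and an EIGRP interface summary might look like the following IOS sketch; the prefixes, process numbers, and interface name are made-up values.

router ospf 1
 area 1 range 10.1.0.0 255.255.0.0
!
interface GigabitEthernet0/1
 ip summary-address eigrp 100 10.2.0.0 255.255.0.0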

Example: BGP Next Hop Tracking

The next hop for every route in the BGP table must exist and be reachable. If not, the route cannot be used. Every 60 seconds, BGP checks all routes in the BGP table. The BGP scanner calculates the best path, checks the next hop addresses, and determines if the next hops are reachable.

Sixty seconds is a long time. Problems with next hops that arise between two scans must wait for the next scan to be resolved, making black holes and routing loops possible.

A BGP next-hop tracking feature reduces the convergence time by monitoring changes in BGP next-hop addresses. It is based on events that are detected when changes are made to the routing table. Whenever a change is detected, it schedules a next hop scan to adjust the next hop.

When a change is detected, the next hop scan is delayed by 5 seconds by default. The next hop tracking system also supports dampening penalties. The next hop scan becomes delayed when the next hop address in the routing table is constantly changing.
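On classic IOS, next-hop tracking is enabled by default; the knobs below sketch how it can be tuned, with the AS number and the 1-second trigger delay being assumed example values rather than recommendations.

router bgp 65000
 address-family ipv4
  bgp nexthop trigger enable
  bgp nexthop trigger delay 1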

Back to Basics: IP routing

Moving IP Packets

A router’s primary role is moving an IP packet from one network to another. Routers select the best loop-free path in a network to forward a packet toward its destination IP address. A router learns about nonattached networks through static configuration or dynamic IP routing protocols; static and dynamic routing are the two ways of populating the routing table.

With dynamic IP routing protocols, we can handle network topology changes dynamically. Here, we can distribute network topology information between routers in the network. When there is a change in the network topology, the dynamic routing protocol provides updates without intervention when a topology change occurs.

On the other hand, we have IP routing to static routes, which do not accommodate topology changes very well and can be a burden depending on the network size. However, static routing is a viable solution for minimal networks with no modifications.

Diagram: Dynamic Routing Protocols. Source: Cisco Press.

Knowledge Check: Bidirectional Forwarding Detection

Understanding BFD

BFD is a lightweight protocol designed to detect faults in the forwarding path between network devices. It operates at a low level, constantly monitoring the connectivity and responsiveness of neighboring devices. BFD can quickly detect failures by exchanging control packets and taking appropriate action to maintain network stability.

The Benefits of BFD

The implementation of BFD brings numerous advantages to network administrators and operators. Firstly, it provides rapid fault detection, reducing downtime and minimizing the impact of network failures. Additionally, BFD offers scalable and efficient operation, as it consumes minimal network resources. This makes it an ideal choice for large-scale networks where resource optimization is crucial.

BFD runs independently from other (routing) protocols. Once it’s up and running, you can configure protocols like OSPF, EIGRP, BGP, HSRP, MPLS LDP, etc., to use BFD for link failure detection instead of their own mechanisms. When BFD no longer receives its control packets, it realizes there is a link failure and reports it to the protocol, for example, OSPF, which then tears down the neighbor adjacency.

Bidirectional Forwarding Detection (BFD)

Use Cases of BFD

BFD finds its applications in various networking scenarios. One prominent use case is link aggregation, where BFD helps detect link failures and ensures seamless failover to alternate links. BFD is also widely utilized in Virtual Private Networks (VPNs) to monitor the connectivity of tunnel endpoints, enabling quick detection of connectivity issues and swift rerouting.

Implementing BFD in Practice

Implementing BFD requires careful consideration and configuration. Network devices must be appropriately configured to enable BFD sessions and define appropriate parameters such as timers and thresholds. Additionally, network administrators must ensure proper integration with underlying routing protocols to maximize BFD’s efficiency.

Convergence Routing and Network Convergence Time

Network convergence connects multiple computer systems, networks, or components to establish communication and efficient data transfer. However, it can be a slow process, depending on the size and complexity of the network, the amount of data that needs to be transferred, and the speed of the underlying technologies.

For networks to converge, all of the components must interact and establish rules for data transfer. This process requires the various components to communicate with each other and usually involves exchanging configuration data to ensure that all components use the same protocols. Network convergence also depends on the speed of the underlying technologies.

To speed up convergence, administrators should use the latest technologies, minimize the amount of data that needs to be transferred, and ensure that all components are correctly configured to be compatible. By following these steps, network convergence can be made faster and more efficient.

1st Lab Guide: OSPF

The following lab guide demonstrates OSPF and its ability to perform ECMP. ECMP occurs when the total metric (i.e., OSPF cost) is the same end-to-end across two links. In the screenshot below, also notice the default broadcast network type and the DR election.

Diagram: Leaf and Spine Routed.

Example: OSPF

To put it simply, convergence or routing convergence is a state in which a set of routers in a network share the same topological information. For example, we have ten routers in one OSPF area. OSPF is an example of a fast-converging routing protocol. A network of a few OSPF routers can converge in seconds.

The routers within the OSPF area in the network collect the topology information from one another through the routing protocol. Depending on the routing protocol used to collect the data, the routers in the same network should have identical copies of routing information.

Different routing protocols will have additional convergence time. The time the routers take to reach convergence after a change in topology is termed convergence time. Fast network convergence and fast failover are critical factors in network performance. Before we get into the details of routing convergence, let us recap how networking works.

Diagram: Network convergence time.

Unlike IS-IS, OSPF has fewer “knobs” for optimizing convergence. This is probably because IS-IS is being developed and supported by a separate team geared towards ISPs, where fast convergence is a competitive advantage.

Diagram: Example Convergence Time with OSPF. Source: INE.

OSPF: Incremental SPF

OSPF calculates the SPT (Shortest Path Tree) using the SPF (Shortest Path First) algorithm. OSPF routers within the same area hold identical LSAs in their LSDBs and therefore build their SPTs from the same information. With standard SPF, routers rerun a full calculation even when there is just a single change in the network topology (a change to an LSA type 1 or type 2).

When a topology change occurs, a full SPT calculation finds the shortest paths to all destinations – including paths that have not changed since the last SPF run.

With incremental SPF, OSPF only recalculates the parts of the SPT that have changed.

Because you don’t run a full SPF all the time, the router’s CPU load decreases and convergence times improve. The trade-off is that the router stores a copy of the previous SPT, which requires more memory.

In three scenarios, incremental SPF is beneficial:

  • Adding (or removing) a leaf node to a branch

  • Link failure in non-SPT

  • Link failure in branch of SPT

When you have a lot of routers in a single area and your CPU load is high because of OSPF, incremental SPF can be enabled per router.
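On IOS platforms that support it, incremental SPF is a single knob under the OSPF process, as in this hedged sketch (the process number is illustrative):

router ospf 1
 ispf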

Forwarding Paradigms

We have bridging, routing, and switching, each with a data plane and a control plane. We need to get packets across a network, which is easy with a single cable: you find the node’s address, and small, non-IP protocols would simply broadcast. When devices in the middle break this path, we can use source routing, path-based forwarding, or hop-by-hop address-based forwarding based solely on the destination address.

When protocols like IP came into play, hop-by-hop destination-based forwarding became the most popular; this is how IP forwarding works. Everyone in the path makes independent forwarding decisions. Each device looks at the destination address, examines its lookup tables, and decides where to send the packet.

Finding paths across the network

How do we find a path across the network? We know there are three ways to get packets across the network – source routing, path-based forwarding, and hop-by-hop destination-based forwarding. So, we need some way to populate the forwarding tables. You need to know who your neighbors are and where your endpoints are. This can be done with static routing, but it is more likely to be a routing protocol. At a high level, routing protocols have to solve this discovery problem and carry out routing convergence across the network.

When we are up and running, events can happen to the topology that force or make the routing protocols react and perform a convergence routing state. For example, we have a link failure, and the topology has changed, impacting our forwarding information. So, we must propagate the information and adjust the path information after the topology change. We know these convergence routing states to be Detect, Describe, Switch, and Find.

Routing Convergence

The four convergence states: Detect, Describe, Switch, and Find.

To better understand routing convergence, I would like to share the network convergence time for each routing protocol before diving into each step. The times displayed below are from a Cisco Live session based on real-world case studies and field research. We are separating each of the convergence routing steps described above into the following fields: Detect, describe, find alternative, and total time.

Routing protocol convergence times

RIP
  • Detect: <1 second best, 105 seconds average
  • Describe: 15 seconds average, 30 seconds worst
  • Find alternative: 15 seconds average, 30 seconds worst
  • Total time: best average case 31 seconds; average case 135 seconds; worst case 179 seconds

OSPF
  • Detect: <1 second best, 20 seconds average
  • Describe: 1 second best, 5 seconds average
  • Find alternative: 1 second average
  • Total time: best average case 2 to 3 seconds; average case 25 seconds; worst case 45 seconds

EIGRP
  • Detect: <1 second best, 15 seconds average, 30 seconds worst
  • Describe: 2 seconds
  • Find alternative: <500 ms per query hop average; assume a 2-second average***
  • Total time: best average case <1 second; average case 20 seconds; worst case 35 seconds

*** With EIGRP, the alternate route is found before the describe phase, due to the feasible successor design of EIGRP path selection.

Convergence Routing

Convergence routing: EIGRP

EIGRP is the fastest, but only fractionally. EIGRP has a pre-built loop-free path known as a feasible successor (FS). The FS route has a higher metric than the successor, making it a backup to the successor route. The effect of a pre-computed backup route on convergence is that EIGRP can react locally to a change in the network topology; nowadays, this switchover is usually done in the FIB. Without a feasible successor, EIGRP would have to query for an alternative route, increasing convergence time.

However, OSPF can have a Loop-Free Alternate (LFA), which provides a pre-computed alternate path. Still, LFAs only work with specific topologies and don’t guarantee against micro-loops (EIGRP guarantees against micro-loops).
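The feasibility condition behind EIGRP’s pre-computed backups is simple arithmetic: a neighbor qualifies as a feasible successor only if its reported distance (RD) is lower than the current feasible distance (FD), which guarantees the backup path cannot loop back through this router. A minimal Python sketch of the check (illustrative, not vendor code):

def feasible_successors(fd, neighbors):
    """Return neighbors whose reported distance satisfies RD < FD (loop-free)."""
    return [name for name, rd in neighbors.items() if rd < fd]

# Successor path gives FD = 100. R3 reports 90 (90 < 100: loop-free backup);
# R4 reports 120 (might loop back, so EIGRP would have to query instead).
print(feasible_successors(100, {"R3": 90, "R4": 120}))   # ['R3']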

Lab Guide: EIGRP LFA FRR

With Loop-Free Alternate (LFA) Fast Reroute (FRR), EIGRP can switch to a backup path in less than 50 milliseconds. Fast rerouting means switching to another next hop, and a loop-free alternate refers to a loop-free alternative path.

Perhaps this sounds familiar to you. After all, EIGRP has feasible successors. The alternate paths calculated by EIGRP are loop-free. As soon as the successor fails, EIGRP can use a feasible successor.

It’s true, but there’s one big catch. In the routing table, EIGRP feasible successors are not immediately installed. There is only one route installed, the successor route. EIGRP installs the feasible successor when the successor fails, which takes time. By installing both successor routes and feasible successor routes in the routing table, fast rerouting makes convergence even faster.

EIGRP Configuration

These four routers run EIGRP; there’s a loopback on R4 with network 4.4.4.4/32. R1 can go through R2 or R3 to get there. The delay on R1’s GigabitEthernet3 interface has increased, so R2 is our successor, and R3 is our feasible successor. The output below is interesting. We still see the successor route, but at the bottom, you can see the repair path…that’s our feasible successor.
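The lab output itself is not reproduced here, but a hedged sketch of the named-mode EIGRP configuration that installs the repair path might look like this (the process name and autonomous system number are assumptions):

router eigrp LAB
 address-family ipv4 unicast autonomous-system 1
  topology base
   fast-reroute per-prefix all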

TCP Congestion control

Ask yourself, is < 1-second convergence fast enough for today’s applications? Indeed, the answer would be yes for some non-critical applications that work on TCP. TCP has built-in backoff algorithms that can deal with packet loss by re-transmitting to recover lost segments. However, non-bulk data applications like video and VOIP have stricter rules and require fast convergence and minimal packet loss.

For example, a 5-second delay in routing protocol convergence could mean several hundred dropped voice calls. A 50-second delay in a Gigabit Ethernet link implies about 6.25 GB of lost information.

Adding Resilience

To add resilience to a network, you can aim to make the network redundant. When you add redundancy, you are betting that outages of the original path and the backup path will not co-occur and that the primary path does not fate-share with the backup path (they do not share common underlying infrastructure, i.e., physical conduits or power).

There needs to be a limit on the number of links you add to make your network redundant, and adding 50 extra links does not make your network 50 times more redundant. It does the opposite! The control plane is tasked with finding the best path and must react to modifications in the network as quickly as possible.

However, every additional link you add slows down the convergence of the router’s control plane as there is additional information to compute, resulting in longer convergence times. The correct number of backup links is a trade-off between redundancy versus availability. The optimal level of redundancy between two points should be two or three links. The fourth link would make the network converge slower.

Diagram: Convergence routing and adding resilience.

Routing Convergence and Routing Protocol Algorithms

Routing protocol algorithms can be tweaked to back off exponentially and deal with bulk information. However, no matter how you tune the timers, the more data in the routing databases, the longer the convergence time. The primary way to reduce network convergence time is to reduce the size of your routing tables – by accepting just a default route, creating a flooding boundary, or some other configuration method.

For example, a common approach in OSPF to reduce the size of routing tables and flooding boundaries is to create OSPF stub areas. OSPF stub areas limit the amount of information in the area. For example, EIGRP limits the flooding query domain by creating EIGRP stub routers and intelligently designing aggregation points. Now let us revisit the components of routing convergence:

Routing convergence steps

  1. Failure detection
  2. Failure propagation (flooding, etc.) – IGP reaction
  3. Topology/routing calculation – IGP reaction
  4. Update the routing and forwarding tables (RIB & FIB)

Stage 1: Failure Detection

The first and foremost problem facing the control plane is quickly detecting topology changes. Detecting the failure is the most critical and challenging part of network convergence. Failures can occur at different layers of the OSI stack – the Physical Layer (Layer 1), Data Link Layer (Layer 2), Network Layer (Layer 3), and Application Layer (Layer 7). Many techniques are used to detect link failures, but they generally come down to two basic types:

  • Event-driven notification – loss of carrier or when one network element detects a failure and notifies the other network elements.
  • Polling-driven notification – generally HELLO protocols that test the path for reachability, such as Bidirectional Forwarding Detection (BFD). Event-driven notifications are always preferred over polling-driven ones, as the latter have to wait for three missed polls before declaring a path down. However, when there are multiple Layer 2 devices in the path, HELLO-based polling is the only method that can detect a failure.

Layer 1 failure detection

Layer 1: Ethernet mechanisms like auto-negotiation ( 1 GigE ) and link fault signaling ( 10 GigE 802.3ae/ 40 GigE 802.3ba ) can signal local failures to the remote end.

Diagram: Network convergence time and Layer 1.

However, the challenge is getting the signal across an optical cloud, as relaying the fault information to the other end is impossible. When there is a “bump” in the Layer 1 link, it is not always possible for the remote end to detect the failure. In this case, the link fault signaling from Ethernet would get lost in the service provider’s network.

The actual link-down / interface-down event detection is hardware-dependent. Older platforms, such as the 6704 line cards for the Catalyst 6500, used a per-port polling mechanism, resulting in a roughly 1-second link failure detection period. More recent Nexus switches and the latest Catalyst 6500 line cards have an interrupt-driven notification mechanism, resulting in fast and predictable link failure detection.

Layer 2 failure detection

Layer 2: The Layer 2 detection mechanisms kick in if the Layer 1 mechanism does not. Unidirectional Link Detection (UDLD) is a Cisco-proprietary lightweight Layer 2 failure detection protocol designed to detect one-way connections caused by physical or soft failures and miswirings.

  • A key point: UDLD is a slow protocol

UDLD is a reasonably slow protocol that uses an average of 15 seconds for its message interval and 21 seconds for detection. Its slowness has raised questions about its use in today’s data centers. However, the chances of miswirings are minimal; Layer 1 mechanisms always communicate unidirectional physical failures, and STP Bridge Assurance takes care of soft failures in either direction.

STP Bridge Assurance turns STP into a bidirectional protocol and ensures that the spanning tree fails closed rather than open. Failing open means that if a switch does not hear from its neighbor, it immediately starts forwarding on initially blocked ports, causing network havoc.

Layer 3 failure detection

Layer 3: In some cases, failure detection has to rely on HELLO protocols at Layer 3. This is needed when there are intermediate Layer 2 hops over Layer 3 links and when you have concerns about unidirectional failures on point-to-point physical links.

Diagram: Layer 3 failure detection

All Layer 3 protocols use HELLOs to maintain neighbor adjacency and a DEAD time to declare a neighbor dead. These timers can be tuned for faster convergence, but doing so is generally not recommended: the increased CPU utilization causes false positives and creates challenges for ISSU and SSO. Enabling Bidirectional Forwarding Detection (BFD) as the Layer 3 detection mechanism is strongly recommended over aggressive protocol timers, and BFD can be used for all protocols.

Bidirectional Forwarding Detection ( BFD ) is a lightweight hello protocol for sub-second Layer 3 failure detection. It can run over multiple transport protocols such as MPLS, THRILL, IPv6, and IPv4, making it the preferred Layer 3 failure detection method.

Stage 2: Routing convergence and failure propagation

When a change occurs in the network topology, it must be registered with the local router and transmitted throughout the rest of the network. The transmission of the change information will be carried out differently for Link-State and Distance Vector protocols. Link state protocols must flood information to every device in the network, and the distance vector must process the topology change at every hop through the network.

The processing of information at every hop may lead you to conclude that link-state protocols always converge more quickly than distance-vector protocols, but this is not the case. EIGRP, thanks to its pre-computed backup path, will converge more rapidly than any link-state protocol.

Diagram: Routing convergence and failure propagation.

To propagate topology changes as quickly as possible, OSPF (link state) can group changes into a few LSAs while slowing down the rate at which information is flooded, i.e., not flooding on every change. This is accomplished by tuning the link-state flood timers combined with exponential backoff, such as the link-state advertisement delay and the initial link-state advertisement throttle delay.
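In IOS terms, those backoff mechanisms are exposed as the LSA and SPF throttle timers; the millisecond values in this sketch are illustrative, not tuned recommendations, and the exact syntax varies by release.

router ospf 1
 timers throttle lsa all 10 100 5000
 timers throttle spf 10 100 5000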

Unfortunately, distance-vector protocols do not have such timers. Therefore, reducing the routing table size is the only option for EIGRP. This can be done by aggregating and filtering reachability information (summary routes or EIGRP stub routing).

Stage 3: Topology/Routing calculation

Similar to the second step, this is where link-state protocols use exponential backoff timers. These timers control how long OSPF and IS-IS wait after receiving new topology information before calculating the best path.

Stage 4: Update the routing and forwarding table ( RIB & FIB)

Finally, after the topology information has been flooded through the network and a new best path has been calculated, the new best path must be installed in the Forwarding Information Base (FIB). The FIB is a hardware copy of the RIB, structured so the forwarding process can read it much faster. Most vendors offer features that install a pre-computed backup path in the line card’s forwarding table so the failover from the primary path to the backup path can be done in milliseconds without interrupting the router CPU.

Closing Points: Routing Convergence

Routing convergence refers to the process by which network routers exchange routing information and adapt to changes in network topology or routing policies. It involves the timely update and synchronization of routing tables across the network, allowing routers to determine the best paths for forwarding data packets.

Routing convergence is vital for maintaining network stability and minimizing disruptions. Without proper convergence, network traffic may experience delays, bottlenecks, or even failures. Routing convergence enables efficient and reliable communication by ensuring all network routers have consistent routing information.

Mechanisms for Achieving Routing Convergence:

1. Routing Protocols:

– Link-State Protocols: OSPF (Open Shortest Path First) and IS-IS (Intermediate System to Intermediate System) are examples of link-state protocols. They use flooding techniques to exchange information about network topology, allowing routers to calculate the shortest path to each destination.

– Distance-Vector Protocols: RIP (Routing Information Protocol) and EIGRP (Enhanced Interior Gateway Routing Protocol) are distance-vector protocols that use iterative algorithms to determine the best path based on distance metrics.

2. Fast Convergence Techniques:

– Triggered Updates: When a change occurs in network topology, routers immediately send updates to inform other routers about the change, reducing the convergence time.

– Route Flapping Detection: Route flapping occurs when a network route repeatedly becomes available and unavailable. By detecting and suppressing flapping routes, convergence time can be significantly improved.

– Convergence Optimization: Techniques like unequal-cost load balancing and route summarization help optimize routing convergence by distributing traffic across multiple paths and reducing the size of routing tables.

3. Redundancy and Resilience:

– Redundant Links: Multiple physical connections between routers increase network reliability and provide alternate paths in case of link failures.

– Virtual Router Redundancy Protocol (VRRP): VRRP allows multiple routers to act as a single virtual router, ensuring seamless failover in case of a primary router failure.

– Multi-Protocol Label Switching (MPLS): MPLS technology offers fast rerouting capabilities, enabling quick convergence in case of link or node failures.

Benefits of Efficient Routing Convergence:

1. Improved Network Performance: Efficient routing convergence reduces network congestion, latency, and packet loss, improving overall network performance.

2. Enhanced Reliability: Routing convergence ensures uninterrupted communication and minimizes downtime by quickly adapting to changes in network conditions.

3. Scalability: Proper routing convergence techniques facilitate network expansion and accommodate increased traffic demands without sacrificing performance or reliability.

Summary: Routing Convergence

Routing convergence is crucial in network management, ensuring smooth and efficient communication between devices. In this blog post, we will explore the concept of routing convergence, its importance in network operations, common challenges faced, and strategies to achieve faster convergence times.

Section 1: Understanding Routing Convergence

Routing convergence refers to network protocols adapting to changes in network topology, such as link failures or changes in network configurations. It involves recalculating and updating routing tables to ensure the most optimal paths for data transmission. Network downtime can be minimized by achieving convergence quickly, and data can flow seamlessly.

Section 2: The Importance of Fast Convergence

Fast routing convergence is critical for maintaining network stability and minimizing disruptions. In today’s fast-paced digital landscape, where businesses rely heavily on uninterrupted connectivity, delays in convergence can result in significant financial losses, degraded user experience, and even security vulnerabilities. Therefore, network administrators must prioritize measures to enhance convergence speed.

Section 3: Challenges in Routing Convergence

While routing convergence is essential, it comes with its challenges. Network size, complex topologies, and diverse routing protocols can significantly impact convergence times. Additionally, suboptimal route selection, route flapping, and inefficient link failure detection can further hinder the convergence process. Understanding these challenges is crucial for devising practical solutions.

Section 4: Strategies for Achieving Faster Convergence

To optimize routing convergence, network administrators can implement various strategies. These include:

1. Implementing Fast Convergence Protocols: Utilizing protocols like Bidirectional Forwarding Detection (BFD) and Link State Tracking (LST) can expedite the detection of link failures and trigger faster convergence.

2. Load Balancing and Redundancy: Distributing traffic across multiple paths and employing redundancy mechanisms, such as Equal-Cost Multipath (ECMP) routing, can mitigate the impact of link failures and improve convergence times.

3. Optimizing Routing Protocol Parameters: Fine-tuning routing protocol timers, hello intervals, and dead intervals can contribute to faster convergence by reducing the time it takes to detect and react to network changes.

Section 5: Conclusion

In conclusion, routing convergence is fundamental to network management, ensuring efficient data transmission and minimizing disruptions. By understanding the concept, recognizing the importance of fast convergence, and implementing appropriate strategies, network administrators can enhance network stability, improve user experience, and safeguard against potential financial and security risks.


How-to: Fabric Extenders & VPC

Topology Diagram

The topology diagram depicts two Nexus 5K acting as parent switches with physical connections to two downstream Nexus 2k (FEX) acting as the 10G physical termination points for the connected server.

Diagram: Physical diagram – Fabric Extenders and parent switches.

Part 1. Connecting the FEX to the parent switch:

The FEX and the parent switch use Satellite Discovery Protocol (SDP) periodic messages to discover and register with one another.

When you initially log on to the Nexus 5K, you can see that the OS does not recognise the FEX, even though two FEXs are cabled correctly to the parent switch. As the FEX is treated as a remote line card, you would expect to see it with a “show module” command.

N5K3# sh module
Mod Ports Module-Type Model Status
— —– ——————————– ———————- ————
1 40 40x10GE/Supervisor N5K-C5020P-BF-SUP active *
2 8 8×1/2/4G FC Module N5K-M1008 ok
Mod Sw Hw World-Wide-Name(s) (WWN)
— ————– —— ————————————————–
1 5.1(3)N2(1c) 1.3 —

2 5.1(3)N2(1c) 1.0 93:59:41:08:5a:0c:08:08 to 00:00:00:00:00:00:00:00

Mod MAC-Address(es) Serial-Num

— ————————————– ———-

1 0005.9b1e.82c8 to 0005.9b1e.82ef JAF1419BLMA

2 0005.9b1e.82f0 to 0005.9b1e.82f7 JAF1411AQBJ

Once we issue the “feature fex” command, we observe the FEX sending SDP messages to the parent switch (i.e., RX), but we don’t see the parent switch sending SDP messages to the FEX (i.e., TX).

Notice that the output below contains only “fex: Sdp-Rx” messages.

N5K3# debug fex pkt-trace
N5K3# 2014 Aug 21 09:51:57.410701 fex: Sdp-Rx: Interface: Eth1/11, Fex Id: 0, Ctrl Vntag: -1, Ctrl Vlan: 1
2014 Aug 21 09:51:57.410729 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x4000ff2929f0, device: Fex, Remote link: 0x20000080
2014 Aug 21 09:51:57.410742 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHX
2014 Aug 21 09:51:57.821776 fex: Sdp-Rx: Interface: Eth1/10, Fex Id: 0, Ctrl Vntag: -1, Ctrl Vlan: 1
2014 Aug 21 09:51:57.821804 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x2ff2929f0, device: Fex, Remote link: 0x20000080
2014 Aug 21 09:51:57.821817 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHU

The FEX appears as “DISCOVERED” but no additional FEX host interfaces appear when you issue a “show interface brief“.

Command: show fex [chassis_id [detail]]: Displays information about a specific Fabric Extender chassis ID.

Command: show interface brief: Display interface information and connection status for each interface.

N5K3# sh fex
FEX FEX FEX FEX
Number Description State Model Serial
————————————————————————
— ——– Discovered N2K-C2232PP-10GE SSI16510AWF
— ——– Discovered N2K-C2232PP-10GE SSI165204YC
N5K3#
N5K3# show interface brief

——————————————————————————–

Ethernet VLAN Type Mode Status Reason Speed Port

Interface Ch #

——————————————————————————–

Eth1/1 1 eth access down SFP validation failed 10G(D) —

Eth1/2 1 eth access down SFP validation failed 10G(D) —

Eth1/3 1 eth access up none 10G(D) —

Eth1/4 1 eth access up none 10G(D) —

Eth1/5 1 eth access up none 10G(D) —

Eth1/6 1 eth access down Link not connected 10G(D) —

Eth1/7 1 eth access down Link not connected 10G(D) —

Eth1/8 1 eth access down Link not connected 10G(D) —

Eth1/9 1 eth access down Link not connected 10G(D) —

Eth1/10 1 eth fabric down FEX not configured 10G(D) —

Eth1/11 1 eth fabric down FEX not configured 10G(D) —

Eth1/12 1 eth access down Link not connected 10G(D) —

snippet removed

The fabric interface Ethernet1/10 shows as DOWN with a “FEX not configured” statement.

N5K3# sh int Ethernet1/10
Ethernet1/10 is down (FEX not configured)
Hardware: 1000/10000 Ethernet, address: 0005.9b1e.82d1 (bia 0005.9b1e.82d1)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is fex-fabric
auto-duplex, 10 Gb/s, media type is 10G

Beacon is turned off

Input flow-control is off, output flow-control is off

Rate mode is dedicated

Switchport monitor is off

EtherType is 0x8100

snippet removed

To enable the parent switch to fully discover the FEX, we need to issue “switchport mode fex-fabric” under the connected interface. At this point, we are still not sending any SDP messages, but we are discovering the FEX.

The next step is to enable the FEX logical numbering under the interface so we can start to configure the FEX host interfaces. Once this is complete, we run “debug fex pkt-trace” and see that we are now sending TX and receiving RX SDP messages.

Command:”fex associate chassis_id“: Associates a Fabric Extender (FEX) to a fabric interface. To disassociate the Fabric Extender, use the “no” form of this command.

From the “debug fex pkt-trace” output, you can see the parent switch is now sending TX SDP messages to the fully discovered FEX.

N5K3(config)# int Ethernet1/10
N5K3(config-if)# fex associate 101
N5K3# debug fex pkt-trace
N5K3# 2014 Aug 21 10:00:33.674605 fex: Sdp-Tx: Interface: Eth1/10, Fex Id: 101, Ctrl Vntag: 0, Ctrl Vlan: 4042
2014 Aug 21 10:00:33.674633 fex: Sdp-Tx: Refresh Intvl: 3000ms, Uid: 0xc0821e9b0500, device: Switch, Remote link: 0x1a009000
2014 Aug 21 10:00:33.674646 fex: Sdp-Tx: Vendor: Model: Serial: ———-

2014 Aug 21 10:00:33.674718 fex: Sdp-Rx: Interface: Eth1/10, Fex Id: 0, Ctrl Vntag: 0, Ctrl Vlan: 4042

2014 Aug 21 10:00:33.674733 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x2ff2929f0, device: Fex, Remote link: 0x20000080

2014 Aug 21 10:00:33.674746 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHU

2014 Aug 21 10:00:33.836774 fex: Sdp-Rx: Interface: Eth1/11, Fex Id: 0, Ctrl Vntag: -1, Ctrl Vlan: 1

2014 Aug 21 10:00:33.836803 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x4000ff2929f0, device: Fex, Remote link: 0x20000080

2014 Aug 21 10:00:33.836816 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHX

2014 Aug 21 10:00:36.678624 fex: Sdp-Tx: Interface: Eth1/10, Fex Id: 101, Ctrl Vntag: 0, Ctrl Vlan: 4042

2014 Aug 21 10:00:36.678664 fex: Sdp-Tx: Refresh Intvl: 3000ms, Uid: 0xc0821e9b0500, device: Switch, Remote snippet removed

Now the 101 FEX status changes from “DISCOVERED” to “ONLINE”. You may also see an additional FEX with serial number SSI165204YC shown as “Discovered” and not “Online”. This is because we have not yet explicitly configured it under the other fabric interface.

N5K3# sh fex
FEX FEX FEX FEX
Number Description State Model Serial
————————————————————————
101 FEX0101 Online N2K-C2232PP-10GE SSI16510AWF
— ——– Discovered N2K-C2232PP-10GE SSI165204YC
N5K3#
N5K3# show module fex 101

FEX Mod Ports Card Type Model Status.

— — —– ———————————- —————— ———–

101 1 32 Fabric Extender 32x10GE + 8x10G Module N2K-C2232PP-10GE present

FEX Mod Sw Hw World-Wide-Name(s) (WWN)

— — ————– —— ———————————————–

101 1 5.1(3)N2(1c) 4.4 —

FEX Mod MAC-Address(es) Serial-Num

— — ————————————– ———-

101 1 f029.29ff.0200 to f029.29ff.021f SSI16510AWF

Issuing the “show interface brief” command, we see new interfaces, specifically host interfaces for the FEX. The output below shows that only one interface is up, the interface labelled Eth101/1/1, because only one end host (server) is connected to the FEX.

N5K3# show interface brief
——————————————————————————–
Ethernet VLAN Type Mode Status Reason Speed Port
Interface Ch #
——————————————————————————–
Eth1/1 1 eth access down SFP validation failed 10G(D) —
Eth1/2 1 eth access down SFP validation failed 10G(D) —
snippet removed

——————————————————————————–

Port VRF Status IP Address Speed MTU

——————————————————————————–

mgmt0 — up 192.168.0.53 100 1500

——————————————————————————–

Ethernet VLAN Type Mode Status Reason Speed Port

Interface Ch #

——————————————————————————–

Eth101/1/1 1 eth access up none 10G(D) —

Eth101/1/2 1 eth access down SFP not inserted 10G(D) —

Eth101/1/3 1 eth access down SFP not inserted 10G(D) —

Eth101/1/4 1 eth access down SFP not inserted 10G(D) —

Eth101/1/5 1 eth access down SFP not inserted 10G(D) —

Eth101/1/6 1 eth access down SFP not inserted 10G(D) —

snippet removed

N5K3# sh run int eth1/10
interface Ethernet1/10
switchport mode fex-fabric
fex associate 101

The fabric interfaces do not run a spanning tree instance, while the host interfaces run BPDU Guard and BPDU Filter by default. The fabric interfaces do not run spanning tree because they are backplane point-to-point interfaces.

By default, the FEX host interfaces will send out a few BPDUs at start-up.

N5K3# sh spanning-tree interface Ethernet1/10
No spanning tree information available for Ethernet1/10
N5K3#
N5K3#
N5K3# sh spanning-tree interface Eth101/1/1
Vlan Role Sts Cost Prio.Nbr Type
—————- —- — ——— ——– ——————————–
VLAN0001 Desg FWD 2 128.1153 Edge P2p

N5K3#

N5K3# sh spanning-tree interface Eth101/1/1 detail

Port 1153 (Ethernet101/1/1) of VLAN0001 is designated forwarding

Port path cost 2, Port priority 128, Port Identifier 128.1153

Designated root has priority 32769, address 0005.9b1e.82fc

Designated bridge has priority 32769, address 0005.9b1e.82fc

Designated port id is 128.1153, designated path cost 0

Timers: message age 0, forward delay 0, hold 0

Number of transitions to forwarding state: 1

The port type is edge

Link type is point-to-point by default

Bpdu guard is enabled

Bpdu filter is enabled by default

BPDU: sent 11, received 0


Issue the commands below to determine the transceiver type for the fabric ports and to list the host ports associated with each fabric interface.

Command: “show interface fex-fabric“: displays all the Fabric Extender interfaces

Command: “show fex detail“: Shows detailed information about all FEXs, including recent log messages related to each FEX.

N5K3# show interface fex-fabric
Fabric Fabric Fex FEX
Fex Port Port State Uplink Model Serial
—————————————————————
101 Eth1/10 Active 3 N2K-C2232PP-10GE SSI16510AWF
— Eth1/11 Discovered 3 N2K-C2232PP-10GE SSI165204YC
N5K3#
N5K3#

N5K3# show interface Ethernet1/10 fex-intf

Fabric FEX

Interface Interfaces

—————————————————

Eth1/10 Eth101/1/1

N5K3#

N5K3# show interface Ethernet1/10 transceiver

Ethernet1/10

transceiver is present

type is SFP-H10GB-CU3M

name is CISCO-TYCO

part number is 1-2053783-2

revision is N

serial number is TED1530B11W

nominal bitrate is 10300 MBit/sec

Link length supported for copper is 3 m

cisco id is —

cisco extended id number is 4

N5K3# show fex detail

FEX: 101 Description: FEX0101 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI16510AWF

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:02:02, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/10

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/10 – Interface Up. State: Active

Fex Port State Fabric Port

Eth101/1/1 Up Eth1/10

Eth101/1/2 Down None

Eth101/1/3 Down None

Eth101/1/4 Down None

snippet removed

Logs:

08/21/2014 10:00:06.107783: Module register received

08/21/2014 10:00:06.109935: Registration response sent

08/21/2014 10:00:06.239466: Module Online Sequence

Now we quickly enable the second FEX connected to fabric interface E1/11.

N5K3(config)# int et1/11
N5K3(config-if)# switchport mode fex-fabric
N5K3(config-if)# fex associate 102
N5K3(config-if)# end
N5K3# sh fex
FEX FEX FEX FEX
Number Description State Model Serial

————————————————————————

101 FEX0101 Online N2K-C2232PP-10GE SSI16510AWF

102 FEX0102 Online N2K-C2232PP-10GE SSI165204YC

N5K3# show fex detail

FEX: 101 Description: FEX0101 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI16510AWF

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:02:02, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/10

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/10 – Interface Up. State: Active

Fex Port State Fabric Port

Eth101/1/1 Up Eth1/10

Eth101/1/2 Down None

Eth101/1/3 Down None

Eth101/1/4 Down None

Eth101/1/5 Down None

Eth101/1/6 Down None

snippet removed

Logs:

08/21/2014 10:00:06.107783: Module register received

08/21/2014 10:00:06.109935: Registration response sent

08/21/2014 10:00:06.239466: Module Online Sequence

08/21/2014 10:00:09.621772: Module Online

FEX: 102 Description: FEX0102 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI165204YC

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:00:42, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/11

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/11 – Interface Up. State: Active

Fex Port State Fabric Port

Eth102/1/1 Up Eth1/11

Eth102/1/2 Down None

Eth102/1/3 Down None

Eth102/1/4 Down None

Eth102/1/5 Down None

snippet removed

Logs:

08/21/2014 10:12:13.281018: Module register received

08/21/2014 10:12:13.283215: Registration response sent

08/21/2014 10:12:13.421037: Module Online Sequence

08/21/2014 10:12:16.665624: Module Online

Part 2. Fabric Interfaces redundancy

Static pinning is when you pin a number of host ports to a fabric port. If the fabric port goes down, so do the host ports pinned to it. This is useful when you want deterministic, predictable oversubscription in the network.

Once the host ports are shut down due to a fabric-port-down event, the server, if configured correctly, should fail over to its secondary NIC.

The “pinning max-links” setting divides the host interfaces by the number specified in the command; each group is pinned to one fabric interface, which determines how many host interfaces go down if a fabric interface fails.
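As a minimal sketch of the configuration (assuming FEX 101 from this lab; the max-links value is illustrative), the pinning is set under the FEX definition on the parent switch and verified with the show fex command introduced earlier:

N5K3# conf t
N5K3(config)# fex 101
N5K3(config-fex)# pinning max-links 2
N5K3(config-fex)# end
N5K3# show fex 101 detail | include pinning
pinning-mode: static Max-links: 2

With 32 host interfaces and two fabric links, max-links 2 would pin host ports 1-16 to the first fabric interface and 17-32 to the second. Note that changing max-links on a live FEX is disruptive to traffic, so treat this as a sketch rather than a casual production change.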

Now we shut down fabric interface E1/10, and you can see that Eth101/1/1 has changed its operational state to DOWN. The second FEX, which connects via E1/11, remains up.

Enter configuration commands, one per line. End with CNTL/Z.
N5K3(config)# int et1/10
N5K3(config-if)# shu
N5K3(config-if)#
N5K3(config-if)# end
N5K3# sh fex detail
FEX: 101 Description: FEX0101 state: Offline
FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI16510AWF

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:02:02, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic:

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/10 – Interface Down. State: Configured

Fex Port State Fabric Port

Eth101/1/1 Down Eth1/10

Eth101/1/2 Down None

Eth101/1/3 Down None

snippet removed

Logs:

08/21/2014 10:00:06.107783: Module register received

08/21/2014 10:00:06.109935: Registration response sent

08/21/2014 10:00:06.239466: Module Online Sequence

08/21/2014 10:00:09.621772: Module Online

08/21/2014 10:13:20.50921: Deleting route to FEX

08/21/2014 10:13:20.58158: Module disconnected

08/21/2014 10:13:20.61591: Offlining Module

08/21/2014 10:13:20.62686: Module Offline Sequence

08/21/2014 10:13:20.797908: Module Offline

FEX: 102 Description: FEX0102 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI165204YC

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:00:42, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/11

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/11 – Interface Up. State: Active

Fex Port State Fabric Port

Eth102/1/1 Up Eth1/11

Eth102/1/2 Down None

Eth102/1/3 Down None

Eth102/1/4 Down None

snippet removed

Logs:

08/21/2014 10:12:13.281018: Module register received

08/21/2014 10:12:13.283215: Registration response sent

08/21/2014 10:12:13.421037: Module Online Sequence

08/21/2014 10:12:16.665624: Module Online

Port channels can be used instead of static pinning between the parent switch and the FEX, so that in the event of a fabric interface failure all host ports remain active. However, the remaining bandwidth on the parent switch is then shared by all the host ports, resulting in increased oversubscription.
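A minimal sketch of the port-channel alternative (assuming both FEX uplinks terminate on the same parent switch; the channel number is illustrative): the fabric links are bundled, and the FEX is associated with the logical interface:

N5K3(config)# interface Ethernet1/10 - 11
N5K3(config-if-range)# channel-group 100
N5K3(config-if-range)# exit
N5K3(config)# interface port-channel100
N5K3(config-if)# switchport mode fex-fabric
N5K3(config-if)# fex associate 101

Since the FEX does not run LACP, the channel-group defaults to mode on, and traffic is hashed across both fabric links.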

Part 3. Fabric Extender Topologies

Straight-Through: The FEX is connected to a single parent switch. The servers connecting to the FEX can leverage an active-active data plane by using host vPC.

Shutting down the peer link results in ALL vPC member ports on the secondary peer becoming disabled. For this reason, it is better to use a dual-homed design.

Dual Homed: Connecting a single FEX to two parent switches.

In active-active mode, a single parent switch failure does not affect the host interfaces because both vPC peers have separate control planes and manage the FEX independently.
For the remainder of the post, we are going to look at dual-homed FEX connectivity with host vPC.

Full configuration:

N5K1:
feature lacp
feature vpc
feature fex
!
vlan 10
!

vpc domain 1

peer-keepalive destination 192.168.0.52

!

interface port-channel1

switchport mode trunk

spanning-tree port type network

vpc peer-link

!

interface port-channel10

switchport access vlan 10

vpc 10

!

interface Ethernet1/1

switchport access vlan 10

spanning-tree port type edge

speed 1000

!

interface Ethernet1/3 – 5

switchport mode trunk

spanning-tree port type network

channel-group 1 mode active

!

interface Ethernet1/10

switchport mode fex-fabric

fex associate 101

!

interface Ethernet101/1/1

switchport access vlan 10

channel-group 10 mode on

N5K2:

feature lacp

feature vpc

feature fex

!

vlan 10

!

vpc domain 1

peer-keepalive destination 192.168.0.51

!

interface port-channel1

switchport mode trunk

spanning-tree port type network

vpc peer-link

!

interface port-channel10

switchport access vlan 10

vpc 10

!

interface Ethernet1/2

switchport access vlan 10

spanning-tree port type edge

speed 1000

!

interface Ethernet1/3 – 5

switchport mode trunk

spanning-tree port type network

channel-group 1 mode active

!

interface Ethernet1/11

switchport mode fex-fabric

fex associate 102

!

interface Ethernet102/1/1

switchport access vlan 10

channel-group 10 mode on

The FEX does not support LACP, so configure the port-channel mode to “on”.

The first step is to check the VPC peer link and general VPC parameters.

Command: “show vpc brief“: Displays the vPC domain ID, the peer-link status, the keepalive message status, whether the configuration consistency checks succeeded, and whether the peer link formed or failed to form.

Command: “show vpc peer-keepalive”: Displays the destination IP of the peer keepalive messages for the vPC. The command also displays the send and receive status, as well as the time since the last update from the peer in seconds and milliseconds.

N5K3# sh vpc brief
Legend:
(*) – local vPC is down, forwarding via vPC peer-link
vPC domain id : 1
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive

Configuration consistency status: success

Per-vlan consistency status : success

Type-2 consistency status : success

vPC role : primary

Number of vPCs configured : 1

Peer Gateway : Disabled

Dual-active excluded VLANs : –

Graceful Consistency Check : Enabled

vPC Peer-link status

———————————————————————

id Port Status Active vlans

— —- —— ————————————————–

1 Po1 up 1,10

vPC status

—————————————————————————-

id Port Status Consistency Reason Active vlans

—— ———– —— ———– ————————– ———–

10 Po10 up success success 10

N5K3# show vpc peer-keepalive

vPC keep-alive status : peer is alive

–Peer is alive for : (1753) seconds, (536) msec

–Send status : Success

–Last send at : 2014.08.21 10:52:30 130 ms

–Sent on interface : mgmt0

–Receive status : Success

–Last receive at : 2014.08.21 10:52:29 925 ms

–Received on interface : mgmt0

–Last update from peer : (0) seconds, (485) msec

vPC Keep-alive parameters

–Destination : 192.168.0.54

–Keepalive interval : 1000 msec

–Keepalive timeout : 5 seconds

–Keepalive hold timeout : 3 seconds

–Keepalive vrf : management

–Keepalive udp port : 3200

–Keepalive tos : 192

The trunk interface should be forwarding on the peer link, and VLAN 10 must be active and forwarding on the trunk link. Take note of whether any VLANs are in err-disabled mode on the trunk.

N5K3# sh interface trunk
——————————————————————————-
Port Native Status Port
Vlan Channel
——————————————————————————–
Eth1/3 1 trnk-bndl Po1
Eth1/4 1 trnk-bndl Po1
Eth1/5 1 trnk-bndl Po1

Po1 1 trunking —

——————————————————————————–

Port Vlans Allowed on Trunk

——————————————————————————–

Eth1/3 1-3967,4048-4093

Eth1/4 1-3967,4048-4093

Eth1/5 1-3967,4048-4093

Po1 1-3967,4048-4093

——————————————————————————–

Port Vlans Err-disabled on Trunk

——————————————————————————–

Eth1/3 none

Eth1/4 none

Eth1/5 none

Po1 none

——————————————————————————–

Port STP Forwarding

——————————————————————————–

Eth1/3 none

Eth1/4 none

Eth1/5 none

Po1 1,10

——————————————————————————–

Port Vlans in spanning tree forwarding state and not pruned

——————————————————————————–

Eth1/3 —

Eth1/4 —

Eth1/5 —

Po1 —

——————————————————————————–

Port Vlans Forwarding on FabricPath

——————————————————————————–

N5K3# sh spanning-tree vlan 10

VLAN0010

Spanning tree enabled protocol rstp

Root ID Priority 32778

Address 0005.9b1e.82fc

This bridge is the root

Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Bridge ID Priority 32778 (priority 32768 sys-id-ext 10)

Address 0005.9b1e.82fc

Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Interface Role Sts Cost Prio.Nbr Type

—————- —- — ——— ——– ——————————–

Po1 Desg FWD 1 128.4096 (vPC peer-link) Network P2p

Po10 Desg FWD 1 128.4105 (vPC) Edge P2p

Eth1/1 Desg FWD 4 128.129 Edge P2p

Check the Port Channel database and determine the status of the port channel

N5K3# show port-channel database
port-channel1
Last membership update is successful
3 ports in total, 3 ports up
First operational port is Ethernet1/3
Age of the port-channel is 0d:00h:13m:22s
Time since last bundle is 0d:00h:13m:18s
Last bundled member is Ethernet1/5

Ports: Ethernet1/3 [active ] [up] *

Ethernet1/4 [active ] [up]

Ethernet1/5 [active ] [up]

port-channel10

Last membership update is successful

1 ports in total, 1 ports up

First operational port is Ethernet101/1/1

Age of the port-channel is 0d:00h:13m:20s

Time since last bundle is 0d:00h:02m:42s

Last bundled member is Ethernet101/1/1

Time since last unbundle is 0d:00h:02m:46s

Last unbundled member is Ethernet101/1/1

Ports: Ethernet101/1/1 [on] [up] *

To execute reachability tests, create an SVI on the first parent switch and run ping tests. You must first enable the “interface-vlan” feature. We create the SVI in VLAN 10 because we need an interface to source our pings from.

N5K3# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N5K3(config)# fea
feature feature-set
N5K3(config)# feature interface-vlan
N5K3(config)# int vlan 10
N5K3(config-if)# ip address 10.0.0.3 255.255.255.0

N5K3(config-if)# no shu

N5K3(config-if)#

N5K3(config-if)#

N5K3(config-if)# end

N5K3# ping 10.0.0.3

PING 10.0.0.3 (10.0.0.3): 56 data bytes

64 bytes from 10.0.0.3: icmp_seq=0 ttl=255 time=0.776 ms

64 bytes from 10.0.0.3: icmp_seq=1 ttl=255 time=0.504 ms

64 bytes from 10.0.0.3: icmp_seq=2 ttl=255 time=0.471 ms

64 bytes from 10.0.0.3: icmp_seq=3 ttl=255 time=0.473 ms

64 bytes from 10.0.0.3: icmp_seq=4 ttl=255 time=0.467 ms

— 10.0.0.3 ping statistics —

5 packets transmitted, 5 packets received, 0.00% packet loss

round-trip min/avg/max = 0.467/0.538/0.776 ms

N5K3# ping 10.0.0.10

PING 10.0.0.10 (10.0.0.10): 56 data bytes

Request 0 timed out

64 bytes from 10.0.0.10: icmp_seq=1 ttl=127 time=1.874 ms

64 bytes from 10.0.0.10: icmp_seq=2 ttl=127 time=0.896 ms

64 bytes from 10.0.0.10: icmp_seq=3 ttl=127 time=1.023 ms

64 bytes from 10.0.0.10: icmp_seq=4 ttl=127 time=0.786 ms

— 10.0.0.10 ping statistics —

5 packets transmitted, 4 packets received, 20.00% packet loss

round-trip min/avg/max = 0.786/1.144/1.874 ms

N5K3#

Do the same tests on the second Nexus 5K.

N5K4(config)# int vlan 10
N5K4(config-if)# ip address 10.0.0.4 255.255.255.0
N5K4(config-if)# no shu
N5K4(config-if)# end
N5K4# ping 10.0.0.10
PING 10.0.0.10 (10.0.0.10): 56 data bytes
Request 0 timed out

64 bytes from 10.0.0.10: icmp_seq=1 ttl=127 time=1.49 ms

64 bytes from 10.0.0.10: icmp_seq=2 ttl=127 time=1.036 ms

64 bytes from 10.0.0.10: icmp_seq=3 ttl=127 time=0.904 ms

64 bytes from 10.0.0.10: icmp_seq=4 ttl=127 time=0.889 ms

— 10.0.0.10 ping statistics —

5 packets transmitted, 4 packets received, 20.00% packet loss

round-trip min/avg/max = 0.889/1.079/1.49 ms

N5K4# ping 10.0.0.13

PING 10.0.0.13 (10.0.0.13): 56 data bytes

Request 0 timed out

Request 1 timed out

Request 2 timed out

Request 3 timed out

Request 4 timed out

— 10.0.0.13 ping statistics —

5 packets transmitted, 0 packets received, 100.00% packet loss

N5K4# ping 10.0.0.3

PING 10.0.0.3 (10.0.0.3): 56 data bytes

Request 0 timed out

64 bytes from 10.0.0.3: icmp_seq=1 ttl=254 time=1.647 ms

64 bytes from 10.0.0.3: icmp_seq=2 ttl=254 time=1.298 ms

64 bytes from 10.0.0.3: icmp_seq=3 ttl=254 time=1.332 ms

64 bytes from 10.0.0.3: icmp_seq=4 ttl=254 time=1.24 ms

— 10.0.0.3 ping statistics —

5 packets transmitted, 4 packets received, 20.00% packet loss

round-trip min/avg/max = 1.24/1.379/1.647 ms

Shut down one of the FEX links to the parent and you see that the FEX is still reachable via the other link that is in the port channel bundle.

N5K3# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N5K3(config)# int Eth101/1/1
N5K3(config-if)# shu
N5K3(config-if)# end
N5K3#
N5K3#

N5K3# ping 10.0.0.3

PING 10.0.0.3 (10.0.0.3): 56 data bytes

64 bytes from 10.0.0.3: icmp_seq=0 ttl=255 time=0.659 ms

64 bytes from 10.0.0.3: icmp_seq=1 ttl=255 time=0.515 ms

64 bytes from 10.0.0.3: icmp_seq=2 ttl=255 time=0.471 ms

64 bytes from 10.0.0.3: icmp_seq=3 ttl=255 time=0.466 ms

64 bytes from 10.0.0.3: icmp_seq=4 ttl=255 time=0.465 ms

— 10.0.0.3 ping statistics —

5 packets transmitted, 5 packets received, 0.00% packet loss

round-trip min/avg/max = 0.465/0.515/0.659 ms

If you would like to further your knowledge of vPC and how it relates to data center topologies and, more specifically, Cisco’s Application Centric Infrastructure (ACI), you can check out my training courses on Cisco ACI: Course 1: Design and Architect Cisco ACI, Course 2: Implement Cisco ACI, and Course 3: Troubleshooting Cisco ACI.

SDN Data Center

Redundant links with Virtual PortChannels

Redundant Links with Virtual PortChannels

In the world of networking, efficiency, and reliability are paramount. As data centers expand and organizations strive for seamless connectivity, Virtual PortChannels (vPCs) have emerged as a powerful solution. This blog post aims to demystify vPCs, comprehensively understanding their benefits, functionality, and implementation considerations.

Virtual PortChannels, also known as vPCs, are a technology designed to enhance network scalability and resiliency. By combining multiple physical links into a single logical interface, vPCs allow for increased bandwidth and redundancy, ensuring uninterrupted connectivity and load balancing across network switches.

Redundant links refer to the practice of having multiple physical connections between network devices. This approach mitigates the risks of single points of failure and ensures uninterrupted network connectivity. However, managing redundant links can be complex and resource-intensive.

Virtual PortChannel (vPC) is a technology developed by Cisco Systems that revolutionizes how redundant links are deployed and managed. It allows the creation of a logical link aggregation group (LAG) by bundling multiple physical links into a single logical interface. This logical interface acts as a single point of attachment for downstream devices, simplifying the network topology.

1. Enhanced Redundancy: By bundling multiple physical links into a vPC, network administrators can achieve higher levels of redundancy. In the event of a link failure, traffic seamlessly fails over to the remaining active links, ensuring uninterrupted connectivity.

2. Improved Bandwidth Utilization: vPC enables load balancing across multiple physical links, maximizing the available bandwidth. This intelligent distribution of traffic prevents link congestion and optimizes network performance.

3. Simplified Network Design: Traditional redundant link configurations often involve complex Spanning Tree Protocol (STP) configurations to avoid loops. With vPC, STP no longer blocks the redundant links, simplifying the network design and reducing potential points of failure.

4. Hardware and Software Requirements: Implementing vPC requires compatible hardware and software. Network administrators must ensure that their devices support vPC functionality and that the necessary licenses are in place.

5. Configuration Best Practices: Proper configuration is crucial for the successful deployment of vPC. Network administrators should follow best practices provided by the equipment manufacturer and ensure consistency across all devices in the vPC domain.

Real-World Use Cases

- Data Centers: vPC is widely used in data center environments to provide high availability and optimal network performance. It allows for seamless migration of virtual machines (VMs) across physical hosts without losing network connectivity.

- Campus Networks: Large campus networks can leverage vPC to enhance redundancy and simplify network management. By aggregating multiple uplinks from access switches, vPC provides a resilient and scalable network infrastructure.

Highlights: Redundant Links with Virtual PortChannels

Port Channels and vPCs

During the early days of Layer 2 Ethernet networks, Spanning Tree Protocol (STP) was used to limit the devastating effects of a topology loop. Even though there may be many connections in a network, STP has one suboptimal principle: only one active path is allowed between two devices.

There are two problems with a single logical link: the first is that half (or more) of the system’s bandwidth is unavailable to data traffic, and the second is that if the active link fails, the network will experience multiple seconds of systemwide data loss as it re-evaluates the new “best” solution to network forwarding on a Layer 2 network.

One of the significant drawbacks of the spanning tree is the concept of blocking ports. While they are essential to prevent loops, blocking ports leads to inefficient network performance. The blocked ports essentially go unused, resulting in unused bandwidth and decreased overall network throughput.

Diagram: Spanning tree root switch.

Load Balancing

Furthermore, in a robust network with STP loop management, there is no efficient dynamic way to utilize all the available bandwidth. Enhanced Layer 2 Ethernet networks have been developed through the use of port channels and virtual port channels (vPCs). Port Channel technology allows forwarding traffic between two participating devices using a load-balancing algorithm to balance traffic across multiple inter-switch links (ISLs).

By bundling the links together as one logical link, the loop problem is also managed. Multi-device port channels can be formed using vPC technology. Port channel-attached devices see a vPC peer pair as a single logical endpoint, even though the two switches remain separate devices. By combining hardware redundancy with port-channel loop management, the vPC environment provides multiple benefits.

Example: Cisco ACI

These technologies are extensively used in Cisco ACI. A virtual port channel (vPC) allows links physically connected to two different ACI leaf nodes to appear as a single port channel to a third device (i.e., a network switch, server, or any other networking device that supports link aggregation technology). Firstly, let us start with the basics.

Spanning Tree Challenges

Traditional spanning trees challenge network designers as they block redundant links. The drawbacks of STP ( spanning tree protocol ) prove extremely expensive in data centers when multiple redundant links are used for mission-critical applications, essentially wasting 50% of the capacity.

You can use the port channel to scale bandwidth, as the bundled links appear as one to higher-level protocols, resulting in all ports forwarding or blocking together for a particular VLAN. Aim to design the inter-switch links in a data center as EtherChannels, as this will optimize your bandwidth and reliability.

Diagram: STP path distribution.

EtherChannel Technology

Network administrators connect multiple physical Ethernet links between devices to achieve more bandwidth and redundancy. The Spanning Tree Protocol blocks these links, so we need EtherChannel Technology. EtherChannel technology combines several physical links between switches into one logical connection to provide high-speed links and redundancy without being blocked by the Spanning Tree Protocol.

Understanding Layer 2 EtherChannel

Layer 2 EtherChannel, also known as Link Aggregation or Port Channel, combines multiple physical links into a single logical link. This powerful technique enhances bandwidth, improves redundancy, and optimizes load balancing. By bundling multiple links, Layer 2 EtherChannel presents a unified interface for higher throughput and fault tolerance.

Configuring Layer 2 EtherChannel involves steps that vary depending on the networking equipment used. Generally, it starts with identifying the physical links that will be part of the EtherChannel bundle. Then, the appropriate channel mode and load balancing method must be configured. Lastly, the EtherChannel interface is created, and the physical links are assigned. Proper configuration ensures seamless data transmission and efficient utilization of network resources.
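As a rough NX-OS illustration of those steps (the interface numbers, trunk usage, and channel number are illustrative, not taken from this lab), two trunk links are bundled into an LACP port channel:

feature lacp
!
interface Ethernet1/3 - 4
  switchport mode trunk
  channel-group 20 mode active
!
interface port-channel20
  switchport mode trunk

The channel mode and the global load-balancing method (discussed later in this post) are the two knobs that most influence how the bundle behaves.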

Understanding Layer 3 Etherchannel

Layer 3 Etherchannel, also known as routed Etherchannel or port-channel, bundles multiple physical interfaces into a single logical interface to increase bandwidth and provide redundancy at Layer 3. Unlike Layer 2 Etherchannel, which operates at the data link layer, Layer 3 Etherchannel operates at the network layer, allowing traffic distribution across multiple physical links based on routing protocols.

Layer 3 Etherchannel offers several advantages over traditional single link configurations. Firstly, it allows for load balancing, where traffic is distributed across multiple links, maximizing bandwidth utilization and improving overall network performance. Additionally, Layer 3 Etherchannel provides redundancy, ensuring that traffic seamlessly switches to the remaining active links if one link fails, minimizing downtime, and enhancing network reliability.

Configuration of Layer 3 Etherchannel

Configuring Layer 3 Etherchannel involves a few essential steps. Firstly, the physical interfaces that will be part of the Etherchannel bundle must be identified and prepared. Then, a logical interface, often called a port-channel interface, is created. This interface acts as the virtual representation of the bundled physical links. Next, the routing protocol must be configured to distribute traffic across the links. Finally, verification and testing are crucial to ensure proper configuration and functionality.
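A comparable Layer 3 sketch in the same NX-OS style (addresses and numbers are illustrative): the member ports and the logical bundle are made routed with “no switchport”, after which the routing protocol treats port-channel 30 as a single interface:

feature lacp
!
interface Ethernet1/6 - 7
  no switchport
  channel-group 30 mode active
!
interface port-channel30
  no switchport
  ip address 10.1.1.1/30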

Layer 3 Etherchannel finds its applications in various scenarios. One everyday use case is in data centers, where high-speed connectivity and redundancy are critical. By bundling multiple links, Layer 3 Etherchannel provides the bandwidth and failover capabilities required for demanding data center environments. Another use case is in network edge deployments, where Layer 3 Etherchannel allows for efficient load balancing and redundancy in connecting access switches to distribution or core switches.

Port Channel and vPC 

Port Channel technology forwards traffic between two participating devices using a load-balancing algorithm. With a virtual port channel (vPC), multiple devices can form one port channel: a third device sees a pair of Cisco Nexus 7000 or 9000 Series switches as a single port channel endpoint. The third device can be a switch, server, or any other networking device that supports port channels.

A vPC can provide Layer 2 multipathing to create redundancy and increase bandwidth by enabling multiple parallel paths between nodes and load-balancing traffic. Only Layer 2 port channels can be used in a vPC, and the port channels are configured with LACP or a static (no protocol) configuration.

vPC provides the following technical benefits:

  • A single device can share a port channel between two upstream devices
  • Spanning Tree Protocol (STP) blocked ports are removed
  • Makes sure there are no loops in the topology
  • Uplink bandwidth is utilized to the fullest extent possible
  • When either a device or a link fails, the system quickly converges
  • Resilience at the link level is ensured
  • Ensures a high level of availability

Implementation of vPC topologies

VPC supports the following topologies:

  1. Dual-uplink Layer 2 access: Using a Cisco Nexus 9000 Series switch, an access switch is dual-homed to a pair of distribution switches.
  2. Dual-homing: This topology connects servers to two switches.
  3. Topologies supported by FEX: FEX supports various vPC topologies using Cisco Nexus 7000 and 9000 Series switches.

Related: For pre-information, you may find the following posts helpful:

  1. Data Center Fabric
  2. Optimal Layer 3 Forwarding
  3. Data Center Failure
  4. Active Active Data Center Design
  5. Network Overlays
  6. Dead Peer Detection

Virtual Portchannels

Key Redundant Links Discussion Points:


  • Introduction to redundant links and what is involved.

  • Highlighting the details of vPC vs port channel.

  • Technical details on Link aggregation.

  • Scenario: LACP negotiation.

  • Details on load balancing functions and virtual portchannels.

  • Final note on fabric extenders.

Back to Basics: Redundant links 

STP has one suboptimal principle for breaking loops in a network: only one active path is allowed from one device to another, regardless of how many connections might exist in the network. In addition, no efficient dynamic mechanism exists for using all the available bandwidth with STP loop management.

Port Channel Technology

So, to overcome these challenges, enhancements to Layer 2 Ethernet networks were made in the shape of port channel and virtual port channel (vPC) technologies. Port Channel technology permits multiple links between two participating devices to forward traffic using a load-balancing algorithm while managing the loop problem by bundling the links as one logical link.

vPC Technology

Then we have the vPC technology. The vPC technology permits multiple devices to create a port channel. In vPC, a pair of switches acting as a vPC peer endpoint looks like a single logical entity to port channel–attached devices; the two devices that serve as the logical port-channel endpoint are still two different and separate devices.

High Availability: Link and Device.

You need to identify the level of high availability you want to achieve in enterprise branch offices. Then, you can meet your high availability requirements with the appropriate device level and link redundancy.

Link-level redundancy requires two links running as active/active or active/backup so that traffic forwarding recovers if one link fails. Any failure on an access link should not result in a loss of connectivity. To qualify, a branch office must have at least two upstream links, either to a private network or the Internet.

Device-level redundancy is another level of high availability, ensuring that a backup device can take over in the event of a failed device. Device redundancy is typically coupled with link redundancy. As a result of this strategy, a single device failure should not cause any loss of connectivity between branch offices and data centers.

High Availability and Designs

High-availability designs combine link and device redundancy between branches and data centers to ensure business-critical connectivity. Each data center is dual-homed, so traffic can be redirected to the backup data center in the event of a complete failure.

Rerouting traffic within 30 seconds should be possible whenever a failure (link, device, or data center) occurs. Packets can be lost during this period; when the user applications can withstand these failover times, sessions are maintained. Established sessions should not be dropped in a branch office with redundant devices if the failed device was forwarding traffic.

Diagram: Redundant links with EtherChannel. Source: jmcritobal.

vPC vs Port Channel

Servers can be attached to the access switches with port channels, the redundant uplinks from the access layer can be link-aggregated, and the core links can also be bundled. Most switches support 8 ports in a bundle, while Nexus platforms can support up to 16 - 32 ports.

Create each port channel with ports from different line cards in each redundant switch. This prevents the failure of a single line card from affecting the entire channel. With this approach, we get redundancy at both the logical and physical layers.

Link Aggregation and Port Channels

Link aggregation ( EtherChannel and IEEE 802.3ad ) was developed to address the limitation where two redundant Ethernet switches were connected through multiple uplinks. However, it did not address the data center challenge of deploying link aggregation in triangular topologies, where you want the aggregated links to terminate on different switches.

Traditional LAG ( link aggregation ) has limitations because its standard only allows aggregated links to terminate on a single switch. Technologies such as vPC Virtual Port Channel and Virtual Switching System (VSS) have been implemented to overcome this limitation.

Key Points: Port Channels

In summary, a port channel aggregates multiple physical interfaces that create a logical interface. On some platforms, you can bundle up to 32 individual redundant links. The Port channel will also load balance traffic across the redundant links. The port channel will remain operational as long as at least one physical interface within the port channel is operational. Finally, before we move to vPC vs port channel, you can create either Layer 2 or 3 port channels. However, as expected, you cannot combine Layer 2 and 3 interfaces in the same port channel.

Port-channel load balancing

Using a hashing function, frames are distributed between the physical interfaces that make up the port channel. Depending on the method used for load balancing, this hash will differ. Based on the hash result, the physical port to be used for transmission is determined.

A hashing operation can be performed on MAC or IP addresses, based on the source address, destination address, or both (some methods also use the port number). Depending on the switch model and software version, the default load-balancing method can be Layer 2, 3, or 4 and applies globally to all port channels. Here are a few methods for balancing EtherChannels (a configuration sketch follows the list):

  • src-ip : Source IP address
  • dst-ip : Destination IP address
  • src-dst-ip : Source and destination IP address
  • src-mac : Source MAC address
  • dst-mac : Destination MAC address
  • src-dst-mac : Source and destination MAC address
  • src-port : Source port number
  • dst-port : Destination port number
  • src-dst-port : Source and destination port number
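On Nexus platforms, the method is set globally; a brief sketch using one of the methods above (the verification output is abbreviated and illustrative):

N5K3(config)# port-channel load-balance ethernet source-dest-ip
N5K3(config)# end
N5K3# show port-channel load-balance
Port Channel Load-Balancing Configuration:
System: source-dest-ip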

Starting the Debate: vPC vs Port Channel

vPC (Virtual Port-Channel), also called multi-chassis EtherChannel (MEC), is a feature on the Cisco Nexus switches that lets you configure a port channel across two redundant switches. The virtual port channel (vPC) is configured using the interfaces of both switches, giving us redundancy at the link layer and at the switch layer and forming a triangular design. With a standard port channel, we must terminate the links on the same switch; we cannot split the channel across two redundant switches.

Virtual PortChannels (vPCs), links between two Cisco switches, appear to a third downstream device as coming from one device and as part of a single PortChannel. A third device can be a switch, a server, or other networking devices that support IEEE 802.3ad PortChannels. Both standard port channels and Virtual PortChannels (vPC) can use the link Aggregation Control Protocol ( LACP ).

LACP negotiation and redundant switches

As part of the IEEE 802.3ad standard, the Link Aggregation Control Protocol ( LACP ) was created to negotiate the channel, and it is recommended to use this feature when building a bundle. LACP modes can be either active or passive. Active mode means the switch actively negotiates the channel, whereas passive means the port does not initiate an LACP negotiation.

You can form channels between an active and a passive port, or two active ports, but not between two passive ports. If the correct modes are not configured on each side of the channel, the port channel will not negotiate and will remain down.
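A short sketch of the mode pairing (interface and channel numbers are illustrative); at least one end must be active for the channel to come up:

! Switch A - actively initiates LACP negotiation
interface Ethernet1/3 - 5
  channel-group 1 mode active
!
! Switch B - passive responds to LACP but never initiates;
! active/passive and active/active work, passive/passive stays down
interface Ethernet1/3 - 5
  channel-group 1 mode passive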

The following diagram depicts the logical and physical aspects of a vPC virtual port channel. This is not specific to Cisco vPC but applies to all aggregation technologies. We have several redundant physical links; in our case, four appear as one logical link.

Diagram: vPC virtual port channels.

In either case, LACP can be used as the control plane to negotiate the channel. You may ask: is LACP mandatory for vPC? No. We can use mode “on” and bring up the port channel without negotiation or checks, turning off LACP and other control protocols. But is LACP recommended for vPC?

As with a normal port channel, it is always advised to use a control protocol for the vPC/port channel; LACP adds a lot of intelligence in the background. The main difference between vPC and a standard port channel is that a vPC can terminate on two separate switches, creating a triangular design.

Building triangles for better redundancy

The quandary of the inability to build triangles with link aggregation can be mitigated by deploying either the Nexus technology, known as virtual Port Channels (vPCs), or the Catalyst technology, known as Virtual Switching System ( VSS ). VSS and vPC virtual port channels allow the termination of an LAG on two separate switches, resulting in a triangular design. In addition, they will enable the grouping of two physical redundant switches to form a single logical switch to any downstream device ( switch or server ).

Load Balancing Functions

A hash function is performed when a Layer 2 frame is forwarded to a PortChannel to determine which physical link the frame is sent on. The load-balancing methods available on Nexus switches are granular and include the following:

  • Destination IP address
  • Destination MAC address
  • Destination TCP and UDP port number
  • Source and destination IP address
  • Source and destination MAC address
  • Source and destination TCP and UDP port numbers
  • Source IP address
  • Source MAC address
  • Source TCP and UDP port number

Redundant Links: Detect polarized links

Monitoring the traffic distribution over each physical link is essential to detect polarized links. The polarization effect occurs when some links attract more traffic than others, resulting in heavy utilization of some redundant links and low utilization of others. Therefore, before choosing the load-balancing method, analyze the traffic flows from source to destination and determine whether the flows are numerous or concentrated. For example, I would not use the source IP address load-balancing method for traffic arriving from a firewall performing Network Address Translation ( NAT ), since all flows would share the same source address and hash to the same link.
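On the Nexus platform, the per-member distribution can be checked with “show port-channel traffic”. The figures below are illustrative only; a transmit share heavily skewed toward one member, as with Eth1/3 here, would suggest a poorly chosen hash for the traffic pattern:

N5K3# show port-channel traffic interface port-channel 1
ChanId      Port Rx-Ucst Tx-Ucst Rx-Mcst Tx-Mcst Rx-Bcst Tx-Bcst
------ --------- ------- ------- ------- ------- ------- -------
     1    Eth1/3  34.02%  89.61%  33.17%  50.00%  33.33%  50.00%
     1    Eth1/4  33.21%   5.22%  33.41%  25.00%  33.33%  25.00%
     1    Eth1/5  32.77%   5.17%  33.42%  25.00%  33.34%  25.00%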

Routing Protocols

Keep in mind, for routing convergence, that routing protocols see the channel as one link. If you have 8 x 10G ports in one bundle with an OSPF cost of 10 and you lose a member of that channel, OSPF will still mark the link with the same metric; routing protocols do not dynamically change metrics due to a member link failure.

Virtual PortChannels Benefits

vPC and VSS offer the following benefits:

  • Improved convergence with a link or device failure.
  • Eliminates the need for STP to block redundant links.
  • Independent control planes ( not with VSS ).
  • Increased bandwidth by combining all redundant links into one from the perspective of STP.

What is vPC?

MEC (Multi-Chassis EtherChannel) is a feature on Cisco Nexus switches that allows you to configure a port channel across multiple switches (i.e., vPC peers). vPC is similar to the Virtual Switching System (VSS) on the Catalyst 6500. However, VSS creates one logical switch with a single control plane handling both management and configuration, whereas vPC keeps the two switches separate: each vPC peer is managed and configured independently. It is important to remember this; for example, you must create and permit your VLANs on both Nexus switches.

Comparing vPC and VSS

vPC and VSS are similar technologies, but the Nexus vPC feature has dual control planes. It offers In-Service Upgrade ( ISSU ), which allows upgrading one of the two switches without causing any service interruption. Because the control plane runs independently on each of the vPC peers, the failure of one peer does not affect the virtual switch.

With VSS, the active peer going down brings down the entire system because of the lack of dual control planes. It is worth noting that vPC falls back to STP, and the reliance on STP can only be entirely circumvented with technologies such as Cisco FabricPath or TRILL. VSS is available on the Catalyst platforms, while vPC is solely a Nexus technology.

vPC Terminology:

  • vPC Peer – a vPC switch, one of a pair.
  • vPC member port – one of the ports that form a vPC.
  • vPC – the combined port channel between the vPC peers and the downstream device.
  • vPC peer-link – the link used to synchronize state between the vPC peer devices; it must be 10GbE. The vPC-related control plane communication occurs over this link, and any Ethernet frames transported across it receive special treatment to avoid loops on the vPC member ports.
  • vPC peer keepalive link – the keepalive link between the vPC peer devices. It is recommended to use the mgmt0 interface in a VRF instance; if the mgmt interface is unavailable, use a routed interface in a dedicated VRF instead.
  • vPC VLAN – one of the VLANs carried over the vPC peer link and used to communicate via the vPC with a peer device.
  • non-vPC VLAN – an STP VLAN that is not carried over the peer link.
  • CFS – Cisco Fabric Service Protocol, used for state synchronization and configuration validation between vPC peer devices.

Within a vPC domain, each peer is assigned a primary or secondary role; by default, the switch with the lowest MAC address becomes the primary peer. The domain identifies the pair of redundant switches and generates a shared MAC address that can be used as a logical switch bridge ID in STP communication.

Virtual PortChannels Best Practices

Below are the best practices to consider for implementation:

  1. Manually define which vPC switch is primary and which is secondary; the lower the role priority, the more preferred the switch is to act as primary (a configuration sketch follows this list).
  2. Form Layer 2 port channels using different 10GE modules on the Nexus switch for the vPC peer-link with ports in dedicated mode.
  3. Form Layer 2 port channels using different 10GE modules on the Nexus switch for the vPC peer keepalive link ( non-default VRF ).
  4. Enable Bridge Assurance ( BA ) on the vPC peer-link interface ( default ).
  5. Enable UDLD aggression on the vPC peer-link interface.
  6. On the primary vPC switch, configure the STP root bridge for a VLAN, the active HSRP router, and the PIM DR router. Likewise, on the secondary vPC switch, configure the secondary STP and the standby HSRP router. The Layer 2 and Layer 3 topologies should match.
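A condensed sketch of these practices for the primary peer, reusing addresses from the lab configs where possible (the role priority value, UDLD placement, and VLAN are illustrative); the secondary peer would mirror this with a higher role priority and the standby HSRP/STP roles:

feature udld
!
vpc domain 1
  role priority 100
  peer-keepalive destination 192.168.0.52 vrf management
!
interface Ethernet1/3 - 5
  udld aggressive
  channel-group 1 mode active
!
interface port-channel1
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link
!
spanning-tree vlan 10 root primary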

Introducing Fabric Extenders

If you want to add even more redundancy with vPC, use it with a Fabric Extender. Fabric Extenders act as remote line cards to a parent switch and can be used with vPC in three forms. The first, known as host vPC, is a vPC southbound from the FEX to the server; the second is a vPC northbound from the FEX to the parent switches, sometimes called a fabric vPC; and the third is both a southbound and a northbound vPC from the FEX, which is known as Enhanced vPC.

Diagram: Introducing Fabric Extenders.

Virtual PortChannels, the single connection

  • Datacenter interconnect

Because vPC appears as a single connection from the perspective of higher-level protocols, e.g., STP or OSPF, it can be used as a Layer 2 extension for a DCI ( data center interconnect ) over short distances and dark fiber or protected DWDM only. vPC best practices still apply; it is recommended to use a different vPC domain for each site and to carry Layer 3 communication between vPC peers on dedicated routed redundant links.

  • OTV or VPLS

If you connect more than two data centers with a full mesh topology, the best DCI mechanism would be Overlay Transport Virtualization (OTV) or VPLS ( Ethernet-based point-to-multipoint Layer 2 VPN ). vPC can work with two or more data centers, but you must design the topology as a hub-and-spoke.

Any spoke-to-spoke communication must flow through the hub. Whether connecting two data centers back to back or two or more in a hub-and-spoke design, the Layer 2 boundary and STP isolation can be achieved with bridge protocol data unit ( BPDU ) filtering on the DCI links. BPDU filtering avoids transmitting BPDUs on a link, essentially turning off STP on the DCI links. If you have Cisco ACI, you can extend with the multi-pod or multi-site designs.

Diagram: vPC as a Data Center Interconnect.
  • A key point: Loop prevention

vPC has a built-in loop prevention mechanism: never forward a frame received through the peer link out of a vPC member port. Under normal operation, a vPC peer switch should never learn MAC addresses over the peer link, which is mainly used for flooding, multicast, broadcast, and control plane traffic.

This is because the LAG is terminated on two peer switches, and you don’t want traffic received from a single downstream device to be sent back down to that same device, which would result in a loop. However, this rule does not apply to:

  1. Non-vPC interface ( orphan port ) and
  2. vPC member ports that are only active on the receiving peer.

Note: An orphan port is a port to a downstream device connected to only one peer.

Redundant Switches: vPC peer link usage

As mentioned, the vPC peer link should not be used for end-host reachability under normal operation. However, if all members of a vPC fail on a single peer, the peer link will forward frames to the remaining member ports of the vPC on the other peer. This explains why Cisco recommends provisioning the peer link with the same 10G capacity as the member links.

The peer keepalive link is also mandatory and is used as a heartbeat mechanism, transporting UDP datagrams between peers. This avoids a dual-active / split-brain scenario where both peers are active simultaneously. If no heartbeat is received after a configurable timeout, the secondary vPC peer assumes the primary role, and all its member ports remain active.

However, if an orphan port is connected to only one peer, undesirable behavior can occur. For example, with a vPC peer-link failure, the orphan ports remain active on the secondary peer, even though they are now isolated from the rest of the network. In this case, it is recommended that a non-vPC trunk be configured between the peer switches.

Benefits of Virtual PortChannels:

1. Enhanced Network Performance: vPCs distribute traffic across multiple physical links, increasing overall bandwidth and reducing congestion. This improves network performance, especially in environments with high data transfer requirements.

2. Improved Redundancy and High Availability: By utilizing vPCs, organizations can eliminate single points of failure and achieve network resiliency. If a link or switch fails, traffic can seamlessly failover to the remaining active links, ensuring uninterrupted connectivity.

3. Simplified Network Management: vPCs simplify network management by treating multiple physical links as a single logical entity. This unified approach allows for more straightforward configuration, troubleshooting, and maintenance, reducing operational complexity and potential errors.

4. Scalability: As organizations grow and their network requirements evolve, vPCs offer a scalable solution. Additional switches and links can be seamlessly added to the vPC domain, expanding network capacity without disrupting ongoing operations.

Implementing Virtual PortChannels:

Implementing vPCs requires careful planning and consideration. Here are some key factors to keep in mind:

1. Compatible Network Equipment: Ensure the network switches and devices used support vPC technology. Consult the manufacturer’s documentation to verify compatibility and recommended configurations.

2. vPC Peer Link: Establishing a peer link between the vPC-enabled switches is crucial for synchronization and control plane communication. This link should have sufficient bandwidth and redundancy to support the vPC domain.

3. vPC Member Ports: Determine which physical links will be part of the vPC domain and configure them as vPC member ports. These ports should be connected to separate switches to ensure redundancy and minimize single points of failure.

4. vPC Keepalive Link: To monitor the health of vPC peers, a dedicated keepalive link is required. This link should be separate from the vPC peer link and have sufficient bandwidth to exchange keepalive messages.

Virtual PortChannels have revolutionized network connectivity, offering increased performance, redundancy, and scalability. By aggregating multiple physical links into a single logical interface, vPCs simplify network management and ensure uninterrupted connectivity.

While implementing vPCs requires careful planning and consideration, their benefits make them a valuable addition to any modern data center or network infrastructure. Remember, understanding the fundamentals and best practices of vPCs is essential for successfully implementing this technology and maximizing its benefits.

Summary: Redundant Links with Virtual PortChannels

In today’s fast-paced digital world, network reliability and performance are paramount. With the increasing demand for seamless connectivity, businesses seek innovative solutions to enhance their network infrastructure. One such solution that has gained significant traction is the implementation of redundant links with Virtual PortChannel (vPC). In this blog post, we explored the concept of redundant links and delved into the benefits and considerations of utilizing vPC technology.

Understanding Redundant Links

As the name suggests, redundant links are duplicate connections that provide failover capabilities in case of network failures. By establishing multiple links between network devices, organizations can ensure uninterrupted connectivity and minimize the risk of downtime. By distributing traffic across multiple paths, redundant links not only enhance network reliability but also improve overall network performance.

Exploring Virtual PortChannel (vPC) Technology

Virtual PortChannel (vPC) is a technology that aggregates multiple physical links into a single logical link. By bundling these links, vPC provides increased bandwidth, load balancing, and redundancy. This technology enables network devices to form a virtual port channel, presenting as a single port to connected devices. With vPC, organizations can achieve high availability and scalability while simplifying network configuration and management.

Benefits of Redundant Links with vPC

1. Enhanced Network Availability: Redundant links with vPC ensure network availability by providing alternate paths in case of link failures. This redundancy eliminates single points of failure and minimizes the impact of network disruptions.

2. Improved Load Balancing: vPC technology optimizes network performance and prevents bottlenecks by distributing traffic across multiple links. This load-balancing capability results in efficient utilization of network resources and an improved user experience.

3. Simplified Network Management: vPC technology simplifies network configuration and management. By logically consolidating multiple physical links, administrators can streamline their network setup, reducing complexity and potential human errors.

Considerations for Implementing Redundant Links with vPC

While the benefits of redundant links with vPC are significant, it’s essential to consider a few key factors before implementation. Factors such as network topology, hardware compatibility, and proper configuration must be thoroughly evaluated to ensure a successful deployment.

Conclusion:

In conclusion, redundant links with Virtual PortChannel (vPC) present a powerful solution for organizations aiming to enhance network reliability and performance. By combining the advantages of redundant links and virtualization, businesses can achieve high availability, improved load balancing, and simplified network management. With careful planning and consideration, implementing redundant links with vPC can pave the way for a robust and resilient network infrastructure.


Data Center Topologies

Data Center Topology

In the world of technology, data centers play a crucial role in storing, managing, and processing vast amounts of digital information. However, behind the scenes, a complex infrastructure known as data center topology enables seamless data flow and optimal performance. In this blog post, we will delve into the intricacies of data center topology, its different types, and how it impacts the efficiency and reliability of data centers.

Data center topology refers to a data center's physical and logical layout. It encompasses the arrangement and interconnection of various components like servers, storage devices, networking equipment, and power sources. A well-designed topology ensures high availability, scalability, and fault tolerance while minimizing latency and downtime. As technology advances, so does the landscape of data center topologies. Here are a few emerging trends worth exploring:

Leaf-Spine Architecture: This modern approach replaces the traditional three-tier architecture with a leaf-spine model. It offers high bandwidth, low latency, and improved scalability, making it ideal for cloud-based applications and data-intensive workloads.

Software-Defined Networking (SDN): SDN introduces a new level of flexibility and programmability to data center topologies. By separating the control plane from the data plane, it enables centralized management, automated provisioning, and dynamic traffic optimization.

The chosen data center topology has a significant impact on the overall performance and reliability of an organization's IT infrastructure. A well-designed topology can optimize data flow, minimize latency, and prevent bottlenecks. By considering factors such as fault tolerance, scalability, and network traffic patterns, organizations can tailor their topology to meet their specific needs.

Highlights: Data Center Topology

Choosing a topology

Data centers are the backbone of many businesses, providing the infrastructure needed to store and manage data and to access applications and services. As such, it is essential to understand the different types of available data center topologies. When choosing a topology for a data center, consider the organization's specific needs and requirements. Each topology has its advantages and disadvantages, so it is crucial to understand the pros and cons of each before making a decision.

A data center topology refers to the physical layout and interconnection of network devices within a data center. It determines how servers, switches, routers, and other networking equipment are connected, ensuring efficient and reliable data transmission. Topologies are based on scalability, fault tolerance, performance, and cost.

Scalability of the topology

Additionally, it is essential to consider the topology’s scalability, as a data center may need to accommodate future growth. By understanding the different topologies and their respective strengths and weaknesses, organizations can make the best decision for their data centers. For example, in a spine-and-leaf architecture, traffic traveling from one server to another always crosses the same number of devices (unless both servers are located on the same leaf). Payloads need only hop to a spine switch and another leaf switch to reach their destination, thus reducing latency.


Data Center Topology Types

Centralized Model

Smaller data centers (less than 5,000 square feet) may benefit from the centralized model. In this model, separate local area networks (LANs) and storage area networks (SANs) have home-run cables going to each server cabinet and zone. Each server is effectively connected back to the core switches in the main distribution area. As a result, switch ports can be utilized more efficiently, and components can be managed and added more quickly. The centralized topology works well for smaller data centers but does not scale up well, making expansion difficult. In larger data centers, the many cable runs congest cable pathways and cabinets and increase costs. Larger data centers may use zoned or top-of-rack topologies for LAN traffic but keep a centralized architecture for SAN traffic, where port utilization is essential because SAN switch ports are expensive.

Zoned

A zoned topology is made up of distributed switching resources. Typically, chassis-based switches support multiple server cabinets and can be distributed among end-of-row (EoR) and middle-of-row (MoR) locations. It is highly scalable, repeatable, and predictable and is recommended by the ANSI/TIA-942 Data Center Standard. A zoned architecture provides the highest level of switch and port utilization while minimizing cabling costs. Switching at the end of a row can be advantageous in certain situations. Two servers' local area network (LAN) ports can be connected to the same end-of-row switch for low-latency port-to-port switching. A potential disadvantage of end-of-row switching is having to run cable back to the end-of-row switch; if every server is connected to redundant switches, this cabling can exceed that required for a top-of-rack design.

Top-of-rack (ToR)

Switches are typically placed at the top of a server rack to provide top-of-rack (ToR) switching, as shown below. This topology is a good option for dense one-rack-unit (1RU) server environments. For redundancy, both switches are connected to all servers in the rack. There are uplinks to the next layer of switching from the top-of-rack switches. Managing cables at the top of the rack simplifies cable management and minimizes cable containment requirements. Using this approach, servers within the rack get fast port-to-port switching, and the uplink oversubscription is predictable. In top-of-rack designs, cabling is used more efficiently. In exchange, switch costs are usually higher, and under-utilized ports carry a high cost. There is also the possibility of overheating local area network (LAN) switch gear in server racks when top-of-rack switching is required.

Data Center Architecture Types

Mesh architecture

Mesh networks, known as “network fabrics” or leaf-spine, consist of meshed connections between leaf-and-spine switches.  They are well suited for supporting universal “cloud services” because the mesh of network links enables any-to-any connectivity with predictable capacity and lower latency. The mesh network has multiple switching resources scattered throughout the data center, making it inherently redundant. Compared to huge, centralized switching platforms, these distributed network designs can be more cost-effective to deploy and scale.

Multi-Tier

Multi-tier architectures are commonly used in enterprise data centers. In this design, blade servers, 1RU servers, and mainframes run the web, application, and database server tiers.

Mesh point of delivery

Mesh point of delivery (PoD) architectures have leaf switches interconnected within PoDs and spine switches aggregated in a central main distribution area (MDA). This architecture also enables multiple PoDs to connect efficiently to a super-spine tier. Three-tier topologies that support east-west data flows will be able to support new cloud applications with low latency. Mesh PoD networks can provide a pool of low-latency computing and storage for these applications that can be added without disrupting the existing environment.

Super spine architecture

Hyperscale organizations that deploy large-scale data center infrastructures or campus-style data centers often deploy super spine architecture. This type of architecture handles data passing east to west across data halls.

Related: For pre-information, you may find the following posts helpful:

  1. ACI Cisco
  2. Virtual Switch
  3. Ansible Architecture
  4. Overlay Virtual Networks



Data Center Network Topology

Key Data Center Topologies Discussion Points:


  • End of Row and Top of Rack designs.

  • The use of Fabric Extenders.

  • Layer 2 or Layer 3 to the Core.

  • The rise of Network Virtualization.

  • VXLAN transports.

  • The Cisco ACI and ACI Network.

Back to Basics: Data Center Network Topology

A data center is a physical facility that houses critical applications and data for an organization. It consists of a network of computing and storage resources that support shared applications and data delivery. The components of a data center are routers, switches, firewalls, storage systems, servers, and application delivery controllers.

Enterprise IT data centers support the following business applications and activities:

  • Email and file sharing
  • Productivity applications
  • Customer relationship management (CRM)
  • Enterprise resource planning (ERP) and databases
  • Big data, artificial intelligence, and machine learning
  • Virtual desktops, communications, and collaboration services

A data center consists of the following core infrastructure components:

  • Network infrastructure: Connects physical and virtual servers, data center services, storage, and external connections to end users.
  • Storage infrastructure: Data is the lifeblood of the modern data center, and storage systems hold this valuable commodity.
  • Computing infrastructure: Applications are the engines of a data center, and the computing infrastructure comprises servers that provide the processors, memory, local storage, and network connectivity the applications need. In the last 65 years, computing infrastructure has undergone three major waves:
    • In the first wave, proprietary mainframes were replaced by x86-based servers installed on-premises and managed by internal IT teams.
    • In the second wave, application infrastructure was widely virtualized, improving resource utilization and workload mobility across physical infrastructure pools.
    • The third wave finds us in the present, with the move to the cloud, hybrid cloud, and cloud-native applications (that is, applications born in the cloud).

Common Types of Data Center Topologies:

a) Bus Topology: In this traditional topology, all devices are connected linearly to a common backbone, resembling a bus. While it is simple and cost-effective, a single point of failure can disrupt the entire network.

b) Star Topology: Each device is connected directly to a central switch or hub in a star topology. This design offers centralized control and easy troubleshooting, but it can be expensive due to the requirement of additional cabling.

c) Mesh Topology: A mesh topology provides redundant connections between devices, forming a network where every device is connected to every other device. This design ensures high fault tolerance and scalability but can be complex and costly.

d) Hybrid Topology: As the name suggests, a hybrid topology combines elements of different topologies to meet specific requirements. It offers flexibility and allows organizations to optimize their infrastructure based on their unique needs.

Considerations in Data Center Topology Design:

a) Redundancy: Redundancy is essential to ensure continuous operation even during component failures. By implementing redundant paths, power sources, and network links, data centers can minimize the risk of downtime and data loss.

b) Scalability: As the data center’s requirements grow, the topology should be able to accommodate additional devices and increased data traffic. Scalability can be achieved through modular designs, virtualization, and flexible network architectures.

c) Performance and Latency: The distance between devices, the quality of network connections, and the efficiency of routing protocols significantly impact data center performance and latency. Optimal topology design considers these factors to minimize delays and ensure smooth data transmission.

Impact of Data Center Topology:

Efficient data center topology directly influences the entire infrastructure’s reliability, availability, and performance. A well-designed topology reduces single points of failure, enables load balancing, enhances fault tolerance, and optimizes data flow. It directly impacts the user experience, especially for cloud-based services, where data centers simultaneously cater to many users.

Main Data Center Topology Components

  • You need to understand the different topologies and their respective strengths and weaknesses.

  • Rich connectivity among the ToR switches so that all application and end-user requirements are satisfied.

  • A well-designed topology reduces single points of failure.

  • Example: Bus, star, mesh, and hybrid topologies.

Knowledge Check: Cisco ACI Building Blocks

Before Cisco ACI 4.1, the Cisco ACI fabric supported only a two-tier (leaf-and-spine switch) topology in which leaf switches are connected to spine switches without interconnecting them. Starting with Cisco ACI 4.1, the fabric allows multitier (three-tier) topologies with two tiers of leaf switches, which allows for vertical expansion. As a result, a traditional three-tier aggregation-access architecture, still required for many enterprise networks, can be migrated.

In some situations, building a full-mesh two-tier fabric is not ideal due to the high cost of fiber cables and the limitations of cable distances. A spine-leaf topology is more efficient in these cases, and Cisco ACI continues to automate and improve visibility.

Diagram: Cisco ACI fabric details.

The Role of Networks

A network lives to serve the connectivity requirements of applications. We build networks by designing and implementing data centers. A common trend is that the data center topology is much bigger than a decade ago, with application requirements considerably different from traditional client-server applications and with deployment speeds in seconds instead of days. This changes how networks and your chosen data center topology are designed and deployed.

The traditional network design was scaled to support more devices by deploying larger switches (and routers). This is the scale-up model of scaling. However, these large switches are expensive, and they are primarily designed to support only two-way redundancy.

Today, data center topologies are built to scale out. They must satisfy three main characteristics: increasing server-to-server traffic, scale (scale on-demand), and resilience. The following diagram shows a ToR design we discussed at the start of the blog.

Diagram: Data center network topology. Top of Rack (ToR).

The Role of The ToR

Top of rack (ToR) describes a data center architecture in which servers, switches, and other equipment are mounted in the same rack. This allows for efficient use of space, since the equipment is all within arm's reach.

ToR also simplifies power and cooling management, since the equipment is all in the same area, and the short distances between devices keep port-to-port paths fast. The same approach can be utilized in other areas, such as telecommunications, security, and surveillance.

ToR is a great way to maximize efficiency in any data center and is becoming increasingly popular. In contrast to the ToR data center design, the following diagram shows an EoR switch design.

Diagram: Data center network topology. End of Row (EoR).

The Role of The EoR

The term end-of-row (EoR) design is derived from a dedicated networking rack or cabinet placed at either end of a row of servers to provide network connectivity to the servers within that row. In EoR network design, each server in the rack has a direct connection with the end-of-row aggregation switch, eliminating the need to connect servers directly with the in-rack switch.

Racks are usually arranged to form a row; a cabinet or rack is positioned at the end of this row. This rack has a row aggregation switch, which provides network connectivity to servers mounted in individual racks. This switch, a modular chassis-based platform, sometimes supports hundreds of server connections. However, a large amount of cabling is required to support this architecture.

Diagram: ToR and EoR. Source: FS Community.

A ToR configuration requires one switch per rack, resulting in higher power consumption and operational costs. Moreover, there are often more unused ports in this scenario than with an EoR arrangement.

On the other hand, ToR’s cabling requirements are much lower than those of EoR, and faults are primarily isolated to a particular rack, thus improving the data center’s fault tolerance.

If fault tolerance is the ultimate goal, ToR is the better choice, but EoR configuration is better if an organization wants to save on operational costs. The following table lists the differences between a ToR and an EoR data center design.

Diagram: Data center network topology. The differences. Source: FS Community.

Data Center Topology Types:

Fabric extenders – FEX

Cisco has introduced the concept of Fabric Extenders, which are not Ethernet switches but remote line cards of a virtualized modular chassis (the parent switch). This allows scalable topologies previously impossible with traditional Ethernet switches in the access layer.

Think of an FEX device as a remote line card attached to a parent switch. All the configuration is done on the parent switch, yet physically, the fabric extender could be in a different location. The mapping between the parent switch and the FEX (fabric extender) is done via a special VN-Link.

The following diagram shows an example of a FEX in a standard data center network topology. More specifically, we are looking at the Nexus 2000 FEX Series. Cisco Nexus 2000 Series Fabric Extenders (FEX) are based on the standard IEEE 802.1BR. They deliver fabric extensibility with a single point of management.

Diagram: Cisco FEX design. Source: Cisco.
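
As an illustration of the remote line card model, a Nexus parent switch associates a FEX roughly as follows; the FEX number, VLAN, and interfaces are hypothetical. Note that all configuration, including the FEX host ports, lives on the parent.

install feature-set fex
feature-set fex
!
fex 101
  description rack-07-top
!
! Fabric uplink from the parent switch toward the FEX
interface Ethernet1/1
  switchport mode fex-fabric
  fex associate 101
!
! Host ports on the FEX then appear on the parent as Ethernet101/1/x
interface Ethernet101/1/1
  switchport access vlan 10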

Different types of FEX solutions

FEXs come with various connectivity solutions, including 100 Megabit Ethernet, 1 Gigabit Ethernet, 10 Gigabit Ethernet (copper and fiber), and 40 Gigabit Ethernet. They can be paired with the following parent switch models: Nexus 5000, Nexus 6000, Nexus 7000, Nexus 9000, and the Cisco UCS Fabric Interconnect.

In addition, because of their simplicity, FEXs have very low latency (as low as 500 nanoseconds) compared to traditional Ethernet switches.

Diagram: Data center fabric extenders.

Some network switches can be connected to others and operate as a single unit. These configurations are called “stacks” and are helpful for quickly increasing the capacity of a network. A stack is a network solution composed of two or more stackable switches. Switches that are part of a stack behave as one single device.

Traditional switches like the Catalyst 3750 still appear in the data center access layer and can be used with stacking technology, combining two or more physical switches into one logical switch.

This stacking technology allows you to build a highly resilient switching system, one switch at a time. If you are looking at a standard access layer switch like the 3750, consider the next-generation Catalyst 3850 series.

The 3850 supports BYOD/mobility and offers a variety of performance and security enhancements over previous models. The drawback of stacking is that you can only stack a limited number of switches. So, if you need additional throughput, you should aim for a different design type.
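
On a Catalyst 3850 stack, for example, member priorities decide which switch becomes the stack active. A brief sketch from privileged EXEC (member numbers and priorities are hypothetical):

! Higher priority wins the active role at the next election
switch 1 priority 15
switch 2 priority 10
!
! Verify stack members, roles, and state
show switch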

Data Center Design: Layer 2 and Layer 3 Solutions

Traditional views of data center design

Depending on the data center network topology deployed, packet forwarding at the access layer can be either Layer 2 or Layer 3. A Layer 3 approach would involve additional management and configuring IP addresses on hosts in a hierarchical fashion that matches the switch’s assigned IP address.

An alternative approach is to use Layer 2, which has less overhead since Layer 2 MAC addresses do not require specific configuration. However, it has drawbacks in scalability and performance.

Generally, access switches focus on communication between servers in the same IP subnet, allowing any type of traffic – unicast, multicast, or broadcast. You can, however, have filtering devices such as a Virtual Security Gateway (VSG) to permit traffic between servers, but that is generally reserved for inter-POD (Point of Delivery) traffic.

Leaf and Spine With Layer 3

We use a leaf and spine data center design with Layer 3 everywhere and overlay networking. This modern, robust architecture provides a high-performance, highly available network. With this architecture, data center networks are composed of leaf switches that connect to one or more spine switches.

The leaf switches are connected to end devices such as servers, storage devices, and other networking equipment. The spine switches, meanwhile, act as the network’s backbone, connecting the multiple leaf switches.

The leaf and spine architecture provides several advantages over traditional data center networks. It allows for greater scalability, as additional leaf switches can be easily added to the network. It also offers better fault tolerance, as the network can operate even if one of the spine switches fails.

Furthermore, it enables faster traffic flows, as the spine switches route traffic between the leaf switches faster than a traditional flat network would.


Data Center Traffic Flow

Data center topologies can have north-south or east-west traffic. North-south (up/down) traffic corresponds to traffic between the servers and the external world (outside the data center). East-west traffic corresponds to internal server communication, i.e., traffic that does not leave the data center.

Therefore, determining the type of traffic upfront is essential as it influences the type of topology used in the data center.

Diagram: Data center traffic flow.

For example, you may have a pair of iSCSI switches, and all traffic is internal between the servers. In this case, you would need high-bandwidth inter-switch links. Usually, an EtherChannel supports all the cross-server talk; the only north-south traffic would be management traffic.

In another part of the data center, you may have data server farm switches with only HSRP heartbeat traffic across the inter-switch links and large bundled uplinks for a high volume of north-south traffic. Whether an application is outward-facing or an internal computation will influence which type of traffic dominates.

Virtual Machines and Containers

This drive toward east-west traffic came from virtualization, virtual machine, and container technologies. Many organizations with a lot of east-west traffic move to a leaf and spine data center design for better performance.


Network Virtualization and VXLAN

Network virtualization and the ability of a physical server to host many VMs and move those VMs are also used extensively in data centers, either for workload distribution or business continuity. This will also affect the design you have at the access layer.

For example, in a Layer 3 fabric, migrating a VM across that boundary changes its IP address, resulting in a reset of the TCP sessions because, unlike SCTP, TCP does not support dynamic address configuration. In a Layer 2 fabric, migrating a VM incurs ARP overhead and requires forwarding on millions of flat MAC addresses, which leads to MAC scalability and poor performance problems.

1st Lab Guide: VXLAN

The following lab guide displays a VXLAN network. We are running VXLAN in unicast mode; VXLAN can also be configured to run in multicast mode. In the screenshot below, we have created a Layer 2 overlay across a routed Layer 3 core. The command show nve interface nve 1 displays an operational tunnel with the encapsulation set to VXLAN.

The screenshot shows a ping test from the desktops that connect to a Layer 3 port on the Leafs.

Diagram: VXLAN Overlay
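
For reference, here is a minimal NX-OS sketch of the kind of configuration behind this lab, using static ingress replication for unicast mode. The VLAN, VNI, and peer address are hypothetical, and loopback0 is assumed to already carry a routed IP address.

feature nv overlay
feature vn-segment-vlan-based
!
! Map a VLAN to a VXLAN network identifier (VNI)
vlan 100
  vn-segment 10100
!
! The NVE interface is the VXLAN tunnel endpoint (VTEP)
interface nve1
  no shutdown
  source-interface loopback0
  member vni 10100
    ! Unicast mode: flood BUM traffic to statically listed peers
    ingress-replication protocol static
      peer-ip 10.255.255.2

show nve interface nve 1 and show nve peers then confirm the tunnel and peer state, as in the screenshot above.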

VXLAN: stability over Layer 3 core

Network virtualization plays a vital role in the data center. Technologies like VXLAN attempt to move the control plane from the core to the edge and stabilize the core so that it only has a handful of addresses for each ToR switch. The following diagram shows the ACI networks with VXLAN as the overlay that operates over a spine leaf architecture.

Layer 2 and 3 traffic is mapped to VXLAN VNIs that run over a Layer 3 core. The Bridge Domain is for layer 2, and the VRF is for layer 3 traffic. Now, we have the separation of layer 2 and 3 traffic based on the VNI in the VXLAN header.  

One of the first notable differences between VXLAN and VLAN is scale. VLAN has a 12-bit identifier called the VID, while VXLAN has a 24-bit identifier called the VNI (VXLAN network identifier). This means that with VLAN you can create only 4094 segments over Ethernet (2^12 = 4096, minus two reserved values), while with VXLAN you can create up to 16 million (2^24 = 16,777,216).

Diagram: ACI network.

Whether you build Layer 2 or Layer 3 in the access and use VXLAN or some other overlay to stabilize the core, you should modularize the data center. The first step is to build each POD or rack as a complete unit. Each POD will be able to perform all its functions within that POD.

  • A key point: A POD data center design

POD: A design methodology that aims to simplify and speed deployment, optimize resource utilization, and drive interoperability of three or more data center components: servers, storage, and networks.

A POD example: Data center modularity

The first type of modularity is by application; for example, one POD might host a specific human resources system. The second is modularity based on the type of resources offered; for example, a storage POD or bare-metal compute may be housed in separate PODs.

These two modularization types allow designers to control inter-POD traffic with predefined policies easily. Operators can also upgrade PODs and a specific type of service at once without affecting other PODs.

However, this type of segmentation does not address the scale requirements of the data center. Even when we have adequately modularized the data center into specific portions, the MAC table sizes on each switch still increase exponentially as the data center grows.

Current and Future Design Factors

New technologies with scalable control planes must be introduced for a cloud-enabled data center, and these new control planes should offer the following:

  • The ability to scale MAC addresses
  • First-Hop Redundancy Protocol (FHRP) multipathing and Anycast HSRP
  • Equal-cost multipathing
  • MAC learning optimizations

Several design factors need to be taken into account when designing a data center. First, what is the growth rate for servers, switch ports, and data center customers? Planning for growth prevents part of the network topology from becoming a bottleneck or individual links from becoming congested.

Application bandwidth demand?

This demand is usually translated into oversubscription. In data center networking, oversubscription refers to how much bandwidth switches offer to downstream devices at each layer.

Oversubscription is expected in a data center design. By limiting oversubscription to the ToR and edge of the network, you offer a single place to start when performance problems occur.

A data center with no oversubscription will be costly, especially with a low-latency network design. So, it's best to determine what oversubscription ratio your applications tolerate and perform best at. Optimizing your switch buffers to improve performance is recommended before you decide on a 1:1 oversubscription rate.
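
As a hypothetical worked example: a leaf switch with 48 x 10 GbE server-facing ports (480 Gbps downstream) and 4 x 40 GbE uplinks (160 Gbps upstream) runs at 480:160, a 3:1 oversubscription ratio, which many general-purpose workloads tolerate comfortably.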

Ethernet 6-byte MAC addressing is flat.

Ethernet forms the basis of data center networking in tandem with IP. Since its inception 40 years ago, Ethernet frames have been transmitted over various physical media, even barbed wire. Ethernet 6-byte MAC addressing is flat; the manufacturer typically assigns the address without considering its location.

Ethernet-switched networks do not have explicit routing protocols to ensure reachability of the flat addresses of the servers' NICs. Instead, flooding and address learning are used to create forwarding table entries.

IP addressing is a hierarchy.

On the other hand, IP addressing is hierarchical, meaning that an address is assigned by the network operator based on its location in the network. An advantage of a hierarchical address space is that forwarding tables can be aggregated. If summarization or other routing techniques are employed, changes in one side of the network will not necessarily affect other areas.

This makes IP-routed networks more scalable than Ethernet-switched networks. IP-routed networks also offer ECMP techniques that enable networks to use parallel links between nodes without spanning tree disabling one of those links. The ECMP method hashes packet headers before selecting a bundled link to avoid out-of-sequence packets within individual flows. 

Equal Cost Load Balancing

Equal-cost load balancing is a method for distributing network traffic among multiple paths of equal cost. It provides redundancy and increases throughput. Sending traffic over multiple paths avoids congestion on any single link. In addition, the load is equally distributed across the paths, meaning that each path carries roughly the same total traffic.

Diagram: ECMP 5-tuple hash. Source: Keysight.

This allows for using multiple paths at a lower cost, providing an efficient way to increase throughput.

The idea behind equal-cost load balancing is to use multiple paths of equal cost to balance the load on each path. The algorithm considers the number of paths, each path's weight, and each path's capacity. It also considers the number of packets that must be sent and the delay allowed for each packet.

Considering these factors, it can calculate the best way to distribute the load among the paths.

Equal-cost load balancing can be implemented using various methods. One method is to use the Link Aggregation Control Protocol (LACP), which allows the network to bundle multiple links and distribute traffic among them in a balanced way.
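
A short NX-OS sketch of both mechanisms (interface and port-channel numbers are hypothetical): LACP bundles parallel links into one logical interface, while the ECMP hash input determines how routed flows are spread across equal-cost paths.

feature lacp
!
! Negotiate an LACP bundle across two parallel links
interface Ethernet1/1-2
  channel-group 10 mode active
!
! Hash on the full 5-tuple so each flow sticks to one path (no reordering)
ip load-sharing address source-destination port source-destination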

  • A keynote: Data center topologies. The move to VXLAN.

Given the above considerations, a solution is needed that combines the benefits of L2's plug-and-play flat addressing with the scalability of IP. The Locator/Identifier Separation Protocol (LISP) offers a set of solutions that use hierarchical addresses as locators in the core and flat addresses as identifiers in the edges. However, it has not seen much deployment these days.

Equivalent approaches such as TRILL and Cisco FabricPath create massively scalable L2 multipath networks with equidistant endpoints. Tunneling is also being used to extend down to the server and access layer to overcome the 4K limitation of traditional VLANs. Tunneling with VXLAN is now the standard design in most data center topologies with leaf-spine designs.

Data Center Network Topology

Leaf and spine data center topology types

This is commonly seen in a leaf and spine design. For example, in a leaf-spine fabric, we have a Layer 3 IP fabric that supports equal-cost multipath (ECMP) routing between any two endpoints in the network. On top of the Layer 3 fabric is an overlay protocol, commonly VXLAN.

A spine-leaf architecture consists of a data center network topology of two switching layers—a spine and a leaf. The leaf layer comprises access switches that aggregate traffic from endpoints such as the servers and connect directly to the spine or network core.

Spine switches interconnect all leaf switches in a full-mesh topology. The leaf switches do not directly connect. The Cisco ACI is a data center topology that utilizes the leaf and spine.

The ACI network's physical topology is a leaf and spine, while the logical topology is formed with VXLAN. From a protocol standpoint, VXLAN is the overlay network, and BGP and IS-IS provide the Layer 3 routing in the underlay network that allows the overlay network to function.

As a result, the nonblocking architecture performs much better than the traditional data center design based on access, distribution, and core designs.
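
As a minimal sketch of the underlay side on a leaf switch (NX-OS, with OSPF standing in for the IGP; addresses are hypothetical), the routed point-to-point uplinks and the VTEP loopback are what the overlay rides on:

feature ospf
!
router ospf UNDERLAY
  router-id 10.0.0.11
!
! Routed point-to-point uplink toward a spine
interface Ethernet1/49
  no switchport
  ip address 10.1.1.1/31
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown
!
! Loopback used as the VTEP source for the VXLAN overlay
interface loopback0
  ip address 10.0.0.11/32
  ip router ospf UNDERLAY area 0.0.0.0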

Diagram: Data center topology types and the leaf and spine with Cisco ACI

Closing Points: Data Center Topologies

To close, let's recap the most common data center topology choices and where each fits.

  • Hierarchical Data Center Topology:

The hierarchical or tree topology is one of the most commonly used data center topologies. This design consists of core, distribution, and access layers. The core layer connects all the distribution layers, while the distribution layer connects to the access layer. This structure enables better management, scalability, and fault tolerance by segregating traffic and minimizing network congestion.

  • Mesh Data Center Topology:

Every network device is interlinked in a mesh topology, forming a fully connected network with multiple paths for data transmission. This redundancy ensures high availability and fault tolerance. However, this topology can be cost-prohibitive and complex, especially in large-scale data centers.

  • Leaf-Spine Data Center Topology:

The leaf-spine topology is gaining popularity due to its scalability and simplicity. It consists of interconnected leaf switches at the access layer and spine switches at the core layer. This design allows for non-blocking, low-latency communication between any leaf switch and spine switch, making it suitable for modern data center requirements.

  • Full-Mesh Data Center Topology:

As the name suggests, the full-mesh topology connects every network device to every other device, creating an extensive web of connections. This topology offers maximum redundancy and fault tolerance. However, it can be expensive to implement and maintain, making it more suitable for critical applications with stringent uptime requirements.

Summary: Data Center Topology

Data centers are vital in supporting and enabling our digital infrastructure in today’s interconnected world. Behind the scenes, intricate network topologies ensure seamless data flow, allowing us to access information and services easily. In this blog post, we dived into the world of data center topologies, unraveling their complexities and understanding their significance.

Section 1: Understanding Data Center Topologies

Data center topologies refer to a data center's physical and logical layout of networking components. These topologies determine how data flows between servers, switches, routers, and other network devices. By carefully designing the topology, data center operators can optimize performance, scalability, redundancy, and fault tolerance.

Section 2: Common Data Center Topologies

There are several widely adopted data center topologies, each with its strengths and use cases. Let’s explore some of the most common ones:

2.1. Tree Topology:

Tree topology, or hierarchical topology, is widely used in data centers. It features a hierarchical structure with multiple layers of switches, forming a tree-like network. This topology offers scalability and ease of management, making it suitable for large-scale deployments.

2.2. Mesh Topology:

The mesh topology provides a high level of redundancy and fault tolerance. In this topology, every device is connected to every other device, forming a fully interconnected network. While it offers robustness, it can be complex and costly to implement.

2.3. Spine-Leaf Topology:

The spine-leaf topology, also known as a Clos network, has recently gained popularity. It consists of leaf switches connecting to multiple spine switches, forming a non-blocking fabric. This design allows for efficient east-west traffic flow and simplified scalability.

Section 3: Factors Influencing Topology Selection

Choosing the right data center topology depends on various factors, including:

3.1. Scalability:

It is crucial for a topology to accommodate a data center’s growth. Scalable topologies ensure that additional devices can be seamlessly added without causing bottlenecks or performance degradation.

3.2. Redundancy and Fault Tolerance:

Data centers require high availability to minimize downtime. Topologies that offer redundancy and fault tolerance mechanisms, such as link and device redundancy, are crucial in ensuring uninterrupted operations.

3.3. Traffic Patterns:

Understanding the traffic patterns within a data center is essential for selecting an appropriate topology. Some topologies excel in handling east-west traffic, while others are better suited for north-south traffic flow.

Conclusion:

Data center topologies form the backbone of our digital infrastructure, providing the connectivity and reliability needed for our ever-expanding digital needs. By understanding the intricacies of these topologies, we can better appreciate the complexity involved in keeping our data flowing seamlessly. Whether it's the hierarchical tree, the fully interconnected mesh, or the efficient spine-leaf, each topology has its place in the world of data centers.


Data Center Design with Active Active design

Active Active Data Center Design

In today's digital age, where businesses heavily rely on uninterrupted access to their applications and services, data center design plays a pivotal role in ensuring high availability. One such design approach is the active-active design, which offers redundancy and fault tolerance to mitigate the risk of downtime. This blog post will explore the active-active data center design concept and its benefits.

Active-active data center design refers to a configuration where two or more data centers operate simultaneously, sharing the load and providing redundancy for critical systems and applications. Unlike traditional active-passive setups, where one data center operates in standby mode, the active-active design ensures that both are fully active and capable of handling the entire workload.

Enhanced Reliability: Redundant data centers offer unparalleled reliability by minimizing the impact of hardware failures, power outages, or network disruptions. When a component or system fails, the redundant system takes over seamlessly, ensuring uninterrupted connectivity and preventing costly downtime.

Scalability and Flexibility: With redundant data centers, businesses have the flexibility to scale their operations effortlessly. Companies can expand their infrastructure without disrupting ongoing operations, as redundant systems allow for seamless integration and expansion.

Disaster Recovery: Redundant data centers play a crucial role in disaster recovery strategies. By having duplicate systems in geographically diverse locations, businesses can recover quickly in the event of natural disasters, power grid failures, or other unforeseen events. Redundancy ensures that critical data and services remain accessible, even during challenging circumstances.

Dual Power Sources: Redundant data centers rely on multiple power sources, such as grid power and backup generators. This ensures that even if one power source fails, the infrastructure continues to operate without disruption.

Network Redundancy: Network redundancy is achieved by setting up multiple network paths, routers, and switches. In case of a network failure, traffic is automatically redirected to alternative paths, maintaining seamless connectivity.

Data Replication: Redundant data centers employ data replication techniques to ensure that data is duplicated and synchronized across multiple systems. This safeguards against data loss and allows for quick recovery in case of a system failure.

Highlights: Active Active Data Center Design

The Role of Data Centers

An enterprise's data center houses the computational power, storage, and applications needed to run its operations. Within the IT architecture, all content is sourced from or passes through the data center infrastructure. When designing the data center infrastructure, performance, resiliency, and scalability must be considered. Furthermore, the data center design should be flexible so that new services can be deployed and supported quickly. The many considerations required for such a design include port density, access layer uplink bandwidth, true server capacity, and oversubscription.

Modern data centers

A few short years ago, data centers were very different from what they are today. In a multi-cloud environment, virtual networks have replaced the physical servers that support applications and workloads across pools of physical infrastructure. Nowadays, data exists across multiple data centers, the edge, and public and private clouds, and communication between these locations must be possible, both on-premises and in the cloud. Public clouds are also collections of data centers; in the cloud, applications use the cloud provider's data center resources.

Example: Spine-Leaf Network

A full-mesh topology is achieved by connecting every lower-tier switch (leaf layer) to each top-tier switch (spine layer). Devices such as servers are connected to the leaf layer by access switches. All leaf switches are interconnected through the spine layer, the network's backbone. Traffic is evenly distributed across the top-tier switches, with the path chosen at random, so data center performance is only slightly affected if one of the top-tier switches fails.


Redundant data centers

Redundant data centers are essentially two or more data centers in different physical locations. This enables organizations to move their applications and data to another data center if they experience an outage. It also allows for load balancing and scalability, ensuring the organization's services remain available.

Redundant data centers are generally located in geographically dispersed locations. This ensures that if one of the data centers experiences an issue, the other can take over, thus minimizing downtime. These data centers should also be connected via a high-speed network connection, such as a dedicated line or virtual private network, to allow seamless data transfers between the locations.

Redundant Data Centers

Implementing redundant data center BGP involves several crucial steps. Firstly, establishing a robust network architecture with multiple data centers interconnected via high-speed links is essential. Secondly, configuring BGP routers in each data center to exchange routing information and maintain consistent network topologies is crucial. Additionally, utilizing techniques such as Anycast IP addressing and route reflectors further enhances redundancy and fault tolerance.

High Availability and BGP

High availability refers to the ability of a system or network to remain operational and accessible even during failures or disruptions. BGP is pivotal in achieving high availability by employing various mechanisms and techniques.

BGP Multipath is a feature that allows for the simultaneous use of multiple paths to reach a destination. By utilizing various paths, BGP can ensure redundancy and load balancing and enhance network availability.

BGP Route Reflectors are used in large-scale networks to alleviate the full-mesh requirement between BGP peers. By simplifying the BGP peering configuration, route reflectors enhance scalability and fault tolerance, contributing to high availability.

BGP Anycast is a technique that enables multiple servers or routers to share the same IP address. This method allows traffic to be routed to the nearest or least congested node, improving response times and fault tolerance.
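
A brief NX-OS sketch combining the two ideas (AS number and prefixes are hypothetical): each site advertises the same /32 anycast address, and maximum-paths lets BGP install several equal-cost routes simultaneously.

feature bgp
!
! The same anycast address is configured and advertised at each site
interface loopback1
  ip address 192.0.2.10/32
!
router bgp 65001
  address-family ipv4 unicast
    network 192.0.2.10/32
    ! Install up to four equal-cost BGP paths
    maximum-paths 4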


Understanding BGP Route Reflection

BGP route reflection is used in large-scale networks to reduce the number of full-mesh peerings required in a BGP network. It allows a BGP speaker to reflect routes received from one set of peers to another set of peers, eliminating the need for every peer to establish a direct connection with every other peer. By using route reflection, network administrators can simplify their network topology and improve its scalability.

The network needs to be divided into two main components to implement BGP route reflection: route reflectors and clients. Route reflectors serve as the central point for route reflection, while clients are the BGP speakers who establish peering sessions with the route reflectors. It is important to carefully plan the placement of route reflectors to ensure optimal routing and redundancy in the network.
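
To illustrate, an NX-OS route reflector peers with its clients along these lines (addresses and AS number are hypothetical); each client needs only one session to the reflector instead of a full mesh:

feature bgp
!
router bgp 65000
  router-id 10.255.0.1
  neighbor 10.255.0.11 remote-as 65000
    address-family ipv4 unicast
      ! Reflect routes learned from this client to the other clients
      route-reflector-client
  neighbor 10.255.0.12 remote-as 65000
    address-family ipv4 unicast
      route-reflector-client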

Route Reflector Hierarchy and Scaling

In large-scale networks, a hierarchy of route reflectors can be implemented to enhance scalability further. This involves using multiple route reflectors, where higher-level route reflectors reflect routes received from lower-level route reflectors. This hierarchical approach distributes the route reflection load and reduces the number of peering sessions required for each BGP speaker, thus improving scalability even further.

Expansion and scalability

Expanding capacity is straightforward if a link becomes oversubscribed (carrying more traffic than can be aggregated on the active links at once). Adding a second spine switch expands every leaf switch's uplinks, adding interlayer bandwidth and reducing oversubscription. If device port capacity becomes a concern, new leaf switches can be added by connecting them to every spine switch. This ease of expansion makes scaling the network straightforward, and a nonblocking architecture can be achieved without oversubscription between the lower-tier switches and their uplinks.

Defining an active-active data center strategy isn't easy when you talk to network, server, and compute teams that don't usually collaborate when planning their infrastructure. An active-active data center design requires a cohesive technology stack from end to end, and establishing it usually requires an enterprise-level architecture effort. It enables the availability and traffic load sharing of applications across DCs, with the following use cases.

  • Business continuity
  • Mobility and load sharing
  • Consistent policy and fast provisioning capability across

Active-active Transport Technologies

Transport technologies interconnect data centers. As part of the transport domain, redundant links are provided across sites to ensure HA and resiliency. Redundancy may be provided for multiplexers, GPONs, DCI network devices, and dark fibers, along with diverse POPs for surviving a POP failure and 1+1 protection schemes for devices, cards, and links.

In addition, the following list contains the primary considerations to consider when designing a data center interconnection solution.

  • Recovery from various types of failure scenarios: Link failures, module failures, node failures, etc.
  • Traffic round-trip requirements between DCs based on link latency and applications
  • Requirements for bandwidth and scalability

Active-Active Network Services

Network services connect all devices in data centers through traffic switching and routing functions. Applications should be able to forward traffic and share load without disruptions on the network. Network services also provide pervasive gateways, L2 extensions, and ingress and egress path optimization across the data centers. Most of the major network vendors’ SDN solutions also integrate VxLAN overlay solutions to achieve L2 extension, path optimization, and gateway mobility.

Designing active-active network services requires consideration of the following factors:

  • Recovery from various failure scenarios, such as links, modules, and network devices, is possible.
  • Availability of the gateway locally as well as across the DC infrastructure
  • Using a VLAN or VxLAN between two DCs to extend the L2 domain
  • Policies are consistent across on-premises and cloud infrastructure – including naming, segmentation rules for integrating various L4/L7 services, hypervisor integration, etc.
  • Optimizing ingress and egress paths.
  • Centralized management includes inventory management, troubleshooting, AAA capabilities, backup and restore traffic flow analysis, and capacity dashboards.

Active-Active L4-L7 Services

ADC and security devices must be placed in both DCs before active-active L4-L7 services can be built. The major solutions in this space include global traffic managers, application policy controllers, load balancers, and firewalls. Furthermore, these must be deployed at different tiers for perimeter, extranet, WAN, core server farm, and UAT segments. Also, it should be noted that most of the leading L4-L7 service vendors currently offer clustering solutions for their products across the DCs. As a result of clustering, its members can share L4/L7 policies, traffic loads, and failover seamlessly in case of an issue.

Below are some significant considerations related to L4-L7 service design

  • Recovery from various failure scenarios, including link, module, and L4-L7 device failures.
  • In addition to naming policies, L4-L7 rules for various traffic types must be consistent across the on-premises infrastructure and in the multiple clouds.
  • Network management centralized (e.g., inventory, troubleshooting, AAA capabilities, backups, traffic flow analysis, capacity dashboards, etc.)

Active-Active Storage Services 

Active-active data centers rely on storage as well as networking solutions. Storage services refer to the storage in both DCs that serves the applications. The design should allow for uninterrupted read and write operations, so real-time data mirroring and seamless failover capabilities across DCs are also necessary. The following are some significant factors to consider when designing a storage system.

  • Recover from single-disk failures, storage array failures, and split-brain failures.
  • Asynchronous vs. synchronous replication: With synchronous replication, data is written simultaneously to primary storage and the replica. It typically requires dedicated FC links, which consume more bandwidth.
  • High availability and redundancy of storage: Storage replication factors and the number of disks available for redundancy
  • Failure scenarios of storage networks: Links, modules, and network devices

Active-Active Server Virtualization

Over the years, server virtualization has evolved, and microservices and containers are becoming increasingly popular among organizations. The primary consideration here is to extend hypervisor/container clusters across the DCs to achieve seamless virtual machine/container instance movement and failover. VMware, Docker, and Microsoft are dominant players in this market. Other examples include KVM and Kubernetes (container management).

Here are some key considerations when it comes to virtualizing servers

  • Creating a cross-DC virtual host cluster using a virtualization platform
  • HA protects the VM in normal operational conditions and creates affinity rules that prefer local hosts.
  • VMs in two DCs can take over the load in real time when the host machine is unavailable by deploying the same service.
  • A symmetric configuration with failover resources is provided across the compute node devices and DCs.
  • Managing computing resources and hypervisors centrally

Active-Active Applications Deployment

The infrastructure needs to be in place for the application to function, and it is essential to ensure high application availability across DCs, with the ability to fail over and to access the application from the nearest location. Web, App, and DB tiers should be available at both data centers; if the application fails in one, it should fail over and continue in the other.

Here are a few key points to consider

  • Use multiple servers to form independent clusters per DC to deploy the Web services on virtual or physical machines (VMs).
  • VM or physical machine can be used to deploy App services. If the application supports distributed deployment, multiple servers within the DC can form a cluster, or various servers across DCs can create a cluster (preferred IP-based access).
  • The databases should be deployed on physical machines to form a cross-DC cluster (active-standby or active-active), for example, Oracle RAC, DB2, or SQL Server with Windows Server Failover Clustering (WSFC).

Knowledge Check: Default Gateway Redundancy

A first-hop redundancy protocol (FHRP) always provides an active default IP gateway. To transparently failover at the first-hop IP router, FHRPs use two or more routers or Layer 3 switches.

The default gateway facilitates network communication. Source hosts send data to their default gateway. A default gateway is an IP address on a router (or Layer 3 switch) connected to the same subnet as the source hosts. End hosts are usually configured with a single default gateway IP address that does not change when the network topology changes. If the default gateway cannot be reached, the local device cannot send packets off the local network segment. There is no dynamic method by which end hosts can determine the address of a new default gateway, even if there is a redundant router that could serve as the default gateway for that segment.
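
A minimal NX-OS HSRP sketch of such a redundant first hop (addresses, group number, and priorities are hypothetical): hosts point at the virtual address 10.10.10.1, which survives the failure of either physical gateway.

feature hsrp
feature interface-vlan
!
interface Vlan10
  no shutdown
  ip address 10.10.10.2/24
  hsrp 10
    ! Higher priority makes this peer active; preempt reclaims the role
    preempt
    priority 110
    ! Virtual gateway IP shared by both peers
    ip 10.10.10.1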

Related: Before you proceed, you may find the following useful:

  1. Data Center Topologies
  2. LISP Protocol
  3. Data Center Network Design
  4. ASA Failover
  5. LISP Hybrid Cloud
  6. LISP Control Plane

The demand for an active-active data center has been driven by:

  • Increased dependence on east-west traffic
  • Clustered applications
  • Multi-tenancy
  • Business continuity
  • Workload mobility

Back to Basics: Active-Active Data Center Design

At its core, an active-active data center is based on the principles of fault tolerance, redundancy, and scalability. This means the data center should be designed to withstand any hardware or software failure, have multiple levels of data storage redundancy, and scale up or down as needed.

The data center also provides an additional layer of security. It is designed to protect data from unauthorized access and malicious attacks. It should also be able to detect and respond to any threats quickly and in a coordinated manner.

A comprehensive monitoring and management system is essential to ensure the data center functions correctly. This system should be designed to track the data center’s performance, detect problems, and provide the necessary alerting mechanisms. It should also provide insights into how the data center operates so that any necessary changes can be made.

Cisco Validated Design

Cisco has validated this design, which is freely available on the Cisco site. In summary, they have tested a variety of combinations, such as VSS-VSS, VSS-vPC, and vPC-vPC, and validated the design with 200 Layer 2 VLANs and 100 SVIs, or 1,000 VLANs and 1,000 SVIs with static routing.

At the time of writing, the M series for the Nexus 7000 supports native encryption of Ethernet frames through the IEEE 802.1AE standard. This implementation uses Advanced Encryption Standard ( AES ) cipher and a 128-bit shared key.

1st Lab Guide: Cisco ACI

In the following lab guide, we demonstrate Cisco ACI. To extend Cisco ACI, we have different designs, such as multi-site and multi-pod. This type of design overcomes many of the data center challenges discussed in this post, such as extending Layer 2 networks.

One crucial value of the Cisco ACI is the COOP database that maps endpoints in the network. The following screenshots show the synchronized COOP database across spines, even in different data centers. Notice that the bridge domain VNID is mapped to the MAC address. The COOP database is unique to the Cisco ACI.

Diagram: COOP database

The Challenge: Layer 2 is Weak.

The challenge of data center design is “Layer 2 is weak & IP is not mobile.” In the past, best practices recommended that networks from distinct data centers be connected through Layer 3 ( routing ), isolating the known Layer 2 turmoil. However, the business is driving the application requirements, changing the connectivity requirements between data centers. The need for an active data center has been driven by the following. It is generally recommended to have Layer 3 connections with path separation through Multi-VRF, P2P VLANs, or MPLS/VPN, along with a modular building block data center design.

Yet, some applications cannot function over a Layer 3 environment. For example, most geo-clusters require Layer 2 adjacency between their nodes, whether for heartbeat and connection ( status and control synchronization ) state information or for the requirement to share virtual IP and MAC addresses to facilitate traffic handling in case of failure. Some clustering products ( Veritas, Oracle RAC ) do support communication over Layer 3, but they are a minority and don’t represent the general case.

Defining active data centers

The term active-active refers to using at least two data centers where both can service an application at any time, so each functions as an active application site. The demand for active-active data center architecture is to accomplish seamless workload mobility and enable distributed applications along with the ability to pool and maximize resources.  

We must first have an active-active data center infrastructure for an active/active application setup ( remember that the network is just one key component of active/active data centers ). From a pure network perspective, an active-active DC can be divided into two halves:

  1. Ingress Traffic – inbound traffic
  2. Egress Traffic – outbound traffic
Diagram: Active-active data center scenario. Source is twoearsonemouth

Active Active Data Center and VM Migration

Migrating applications and data to virtual machines (VMs) is becoming increasingly popular as organizations seek to reduce their IT costs and increase the efficiency of their services. VM migration moves existing applications, data, and other components from a physical server to a virtualized environment. This process is becoming increasingly cost-effective and efficient for organizations, eliminating the need for additional hardware, software, and maintenance costs.

Virtual machine migration between data centers increases application availability. For stateful migration, Layer 2 network adjacency between ESX hosts is currently required, and a consistent LUN must be maintained. In other words, if the VM loses its IP address, it will lose its state, and the TCP sessions will drop, resulting in a cold migration ( the VM reboots ) instead of a hot migration ( the VM does not reboot ).

Due to the stretched VLAN requirement, data center architects started to deploy traditional Layer 2 over the DCI and, unsurprisingly, were faced with exciting results. Although flooding and broadcasts are necessary for IP communication in Ethernet networks, they can become dangerous in a DCI environment.

Traffic Tromboning

Traffic tromboning can also occur between two stretched data centers, where non-optimal internal routing happens within extended VLANs. Trombones, by their very nature, create a network traffic scalability problem. Addressing this through load balancing among multiple trombones is challenging since their services are often stateful.

Traffic tromboning can affect either ingress or egress traffic. On egress, you can have FHRP filtering to isolate the HSRP partnership and provide an active/active setup for HSRP. On ingress, you can have GSLB, Route Injection, and LISP.

Diagram: Traffic Tromboning. Source is Silvanogai

Cisco Active-active data center design and virtualization technologies

Virtualization technologies can overcome many of these problems by being used for Layer 2 extensions between data centers. These include vPC, VSS, Cisco FabricPath, VPLS, OTV, and LISP with its Internet locator design. In summary, different technologies can be used for LAN extensions, and the primary mediums in which they can be deployed are Ethernet, MPLS, and IP.

    1. Ethernet: VSS and vPC or FabricPath
    2. MPLS: EoMPLS, A-VPLS, and H-VPLS
    3. IP: OTV
    4. IP: LISP

Ethernet Extensions and Multi-Chassis EtherChannel ( MEC )

Multi-Chassis EtherChannel requires protected DWDM or direct fibers and works only between two data centers. It cannot support a multi-datacenter topology, i.e., a full mesh of data centers, but it can support hub-and-spoke topologies.

Previously, LAG could only terminate on one physical switch. VSS-MEC and vPC are port-channeling concepts that extend link aggregation to two separate physical switches. This allows for creating L2 topologies based on link aggregation, eliminating the dependency on STP and thus enabling you to scale available Layer 2 bandwidth by bonding the physical links.

Because vPC and VSS create a single connection from an STP perspective, disjoint STP instances can be deployed in each data center. Such isolation can be achieved with BPDU Filtering on the DCI links or Multiple Spanning Tree ( MST ) regions on each site.

At the time of writing, vPC does not support Layer 3 peering, but if you want an L3 link, create one, as this does not need to run on dark fiber or protected DWDM, unlike the extended Layer 2 links. 

Ethernet Extension and FabricPath

FabricPath allows network operators to design and implement a scalable Layer 2 fabric, allowing VLANs to help reduce the physical constraints on server location. It provides a high-availability design with up to 16 active paths at Layer 2, each path a 16-member port channel for unicast and multicast.

This enables MSDC networks to have flat topologies, separating nodes by a single hop ( equidistant endpoints ). Cisco has not targeted FabricPath as a primary DCI solution, as it lacks specific DCI functions compared to OTV and VPLS.

Its primary purpose is for Clos-based architectures. However, if you need to interconnect three or more sites, FabricPath is a valid solution when you have short distances between your DCs via high-quality point-to-point optical transmission links.

Your WAN links must support Remote Port Shutdown and microflapping protection. By default, OTV and VPLS should be the first solutions considered as they are Cisco-validated designs with specific DCI features, e.g., OTV can flood unknown unicast for particular VLANs.

Diagram: FabricPath. Source is Cisco

IP Core with Overlay Transport Virtualization ( OTV ).

OTV provides dynamic encapsulation with multipoint connectivity of up to 10 sites ( NX-OS 5.2 supports 6 sites, and NX-OS 6.2 supports 10 sites ). OTV ( Overlay Transport Virtualization ) is a DCI-specific technology that enables Layer 2 extension across data center sites by employing MAC-in-IP encapsulation with built-in loop prevention and failure boundary preservation.

There is no data plane learning. Instead, the overlay control plane ( Layer 2 IS-IS ) on the provider’s network facilitates all unicast and multicast learning between sites. OTV has been supported on the Nexus 7000 since the 5.0 NX-OS release and on the ASR 1000 since the 3.5 XE release. OTV as a DCI offers robust high availability: most failures converge in under a second, with only extreme and very unlikely failures, such as an entire device going down, taking less than 5 seconds.

Locator ID/Separator Protocol ( LISP)

Locator ID/Separator Protocol ( LISP) has many applications. As the name suggests, it separates the location and identifier of the network hosts, making it possible for VMs to move across subnet boundaries while retaining their IP address and enabling advanced triangular routing designs.
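
A rough Python sketch of that separation is shown below: a mapping system binds an endpoint identifier (EID) to a routing locator (RLOC), so a VM move updates only the mapping while the EID stays the same. All addresses are hypothetical.

```python
# Sketch of the LISP EID/RLOC split: a mapping system (here, a dict)
# resolves "who" (the EID) to "where" (the RLOC). Addresses illustrative.

mapping_system = {
    "10.10.0.5": "203.0.113.1",    # EID -> RLOC of data center 1
}

def forward(eid: str) -> None:
    rloc = mapping_system[eid]
    print(f"Encapsulate packet for EID {eid} toward locator {rloc}")

forward("10.10.0.5")                # delivered via DC1's locator

# The VM migrates to data center 2: only the locator changes.
mapping_system["10.10.0.5"] = "198.51.100.1"
forward("10.10.0.5")                # delivered via DC2 - same IP, no re-addressing
```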

LISP works well when you have to move workloads and distribute workloads across data centers, making it a perfect complementary technology for an active-active data center design. It provides you with the following:

  • Global IP mobility across subnets for disaster recovery and cloud bursting ( without LAN extension ) and optimized routing across extended subnet sites.
  • Routing with extended subnets for active/active data centers and distributed clusters ( with LAN extension ).
Diagram: LISP Networking. Source is Cisco

LISP answers the problems with ingress and egress traffic tromboning. It has a location mapping table, so when a host move is detected, updates are automatically triggered, and ingress routers (ITRs or PITRs) send traffic to the new location. From the perspective of ingress path flow inbound on the WAN, LISP can also address the limitations of BGP in controlling ingress flows. Without LISP, we are limited to specific route filtering. Suppose you have a PI prefix consisting of a /16: if you break it up and advertise 4 x /18s, you may still get poor ingress load balancing on your DC WAN links; even if you break it up into 8 x /19s, the results might still be unfavorable.

LISP works differently than BGP because a LISP proxy provider advertises the /16 on your behalf ( you don’t advertise the /16 from your DC WAN links ) and can send traffic 50:50 to your DC WAN links. LISP can achieve a near-perfect 50:50 distribution at the DC edge.
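
The prefix arithmetic in this example is easy to verify with Python’s ipaddress module; the /16 block below is illustrative.

```python
import ipaddress

# Splitting a provider-independent /16 into more-specific advertisements.
pi_block = ipaddress.ip_network("203.0.0.0/16")

for new_len in (18, 19):
    subnets = list(pi_block.subnets(new_prefix=new_len))
    print(f"/16 -> {len(subnets)} x /{new_len}, e.g., {subnets[0]} and {subnets[1]}")

# /16 -> 4 x /18 and /16 -> 8 x /19. Even with 8 more-specifics, BGP gives
# no 50:50 guarantee on ingress; a LISP mapping system can weight traffic
# across both DC WAN links (RLOCs) explicitly.
```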

Benefits of Active-Active Data Center Design:

1. Enhanced Redundancy: With active-active design, organizations can achieve higher levels of redundancy by distributing the workload across multiple data centers. This redundancy ensures that even if one data center experiences a failure or maintenance downtime, the other data center seamlessly takes over, minimizing the impact on business operations.

2. Improved Performance and Scalability: Active-active design enables organizations to scale their infrastructure horizontally by distributing the load across multiple data centers. This approach ensures that the workload is evenly distributed, preventing any single data center from becoming a performance bottleneck. It also allows businesses to accommodate increasing demands without compromising performance or user experience.

3. Reduced Downtime: The active-active design significantly reduces the risk of downtime compared to traditional architectures. In the event of a failure, the workload can be immediately shifted to the remaining active data center, ensuring continuous availability of critical services. This approach minimizes the impact on end-users and helps organizations maintain their reputation for reliability.

4. Disaster Recovery Capabilities: Active-active data center design provides a robust disaster recovery solution. Organizations can ensure that their critical systems and applications remain operational despite a catastrophic failure at one location by having multiple geographically distributed data centers. This design approach minimizes the risk of data loss and provides a seamless failover mechanism.

Implementation Considerations:

Implementing an active-active data center design requires careful planning and consideration of various factors. Here are some key considerations:

1. Network Design: A robust and resilient network infrastructure is crucial for active-active data center design. Implementing load balancers, redundant network links, and dynamic routing protocols can help ensure seamless failover and optimal traffic distribution.

2. Data Synchronization: Organizations need to implement effective data synchronization mechanisms to maintain data consistency across multiple data centers. This may involve deploying real-time replication, distributed databases, or file synchronization protocols.

3. Application Design: Applications must be designed to be aware of the active-active architecture. They should be able to distribute the workload across multiple data centers and seamlessly switch between them in case of failure. Application-level load balancing and session management become critical in this context.

Active-active data center design offers organizations a robust solution for high availability and fault tolerance. By distributing the workload across multiple data centers, businesses can ensure uninterrupted access to critical systems and applications. The enhanced redundancy, improved performance, reduced downtime, and disaster recovery capabilities make active-active design an ideal choice for organizations striving to provide seamless and reliable services in today’s digital landscape.

Summary: Active Active Data Center Design

In today’s digital age, businesses and organizations rely heavily on data centers to store, process, and manage critical information. However, any disruption or downtime can have severe consequences, leading to financial losses and damage to reputation. This is where redundant data centers come into play. In this blog post, we explored the concept of redundant data centers, their benefits, and how they ensure uninterrupted digital operations.

Understanding Redundancy in Data Centers

Redundancy in data centers refers to duplicating critical components and systems to minimize the risk of failure. It involves creating multiple backups of hardware, power sources, cooling systems, and network connections. With redundant systems, data centers can continue functioning even if one or more components fail.

Types of Redundancy

Data centers employ various types of redundancy to ensure uninterrupted operations. These include:

1. Hardware Redundancy involves duplicate servers, storage devices, and networking equipment. If one piece of hardware fails, the redundant backup takes over seamlessly, preventing disruption.

2. Power Redundancy: Power outages can harm data center operations. Redundant power systems, such as backup generators and uninterruptible power supplies (UPS), provide continuous power supply even during electrical failures.

3. Cooling Redundancy: Overheating can damage sensitive equipment in data centers. Redundant cooling systems, including multiple air conditioning units and cooling towers, help maintain optimal temperature levels and prevent downtime.

Network Redundancy

Network connectivity is crucial for data centers to communicate with the outside world. Redundant network connections ensure that alternative paths are available to maintain uninterrupted data flow if one connection fails. This can be achieved through diverse internet service providers (ISPs), multiple routers, and network switches.

Benefits of Redundant Data Centers

Implementing redundant data centers offers several benefits, including:

1. Increased Reliability: Redundancy minimizes the risk of single points of failure, making data centers highly reliable and resilient.

2. Improved Uptime: Data centers can achieve impressive uptime percentages with redundant systems, ensuring continuous access to critical data and services.

3. Disaster Recovery: Redundant data centers are crucial in disaster recovery strategies. If one data center becomes inaccessible due to natural disasters or other unforeseen events, the redundant facility takes over seamlessly, ensuring business continuity.

Conclusion:

Redundant data centers are vital for organizations that cannot afford any interruption in their digital operations. By implementing hardware, power, cooling, and network redundancy, businesses can mitigate risks, ensure uninterrupted access to critical data, and safeguard their operations from potential disruptions. Investing in redundant data centers is a proactive measure to save businesses from significant financial losses and reputational damage in the long run.

Data Center Network Design

Data Center Network Design

Data centers are crucial in today’s digital landscape, serving as the backbone of numerous businesses and organizations. A well-designed data center network ensures optimal performance, scalability, and reliability. This blog post will explore the critical aspects of data center network design and its significance in modern IT infrastructure.

Data center network design involves the architectural planning and implementation of networking infrastructure within a data center environment. It encompasses various components such as switches, routers, cables, and protocols. A well-designed network ensures seamless communication, high availability, and efficient data flow.

The traditional three-tier network architecture is being replaced by more streamlined and flexible designs. Two popular approaches gaining traction are the spine-leaf architecture and the fabric-based architecture. The spine-leaf design offers low latency, high bandwidth, and improved scalability, making it ideal for large-scale data centers. On the other hand, fabric-based architectures provide a unified and simplified network fabric, enabling efficient management and enhanced performance.

Network virtualization, powered by technologies like SDN, is transforming data center network design. By decoupling the network control plane from the underlying hardware, SDN enables centralized network management, automation, and programmability. This results in improved agility, better resource allocation, and faster deployment of applications and services.

With the rising number of cyber threats, ensuring robust security and resilience has become paramount. Data center network design should incorporate advanced security measures such as firewalls, intrusion detection systems, and encryption protocols. Additionally, implementing redundant links, load balancing, and disaster recovery mechanisms enhances network resilience and minimizes downtime.

Highlights: Data Center Network Design

Data Center Network Design

a. Understanding the Requirements

Before embarking on the design process, it’s crucial to understand the data center’s unique requirements. Factors such as power and cooling, network connectivity, scalability, and security are vital in determining the design approach. By thoroughly assessing these requirements, architects can create a blueprint that aligns with the organization’s current and future needs.

b. Optimizing Physical Layout

The physical layout of a data center significantly impacts its efficiency and performance. This section will delve into rack placement, aisle design, cable management, and airflow optimization. By adopting best practices in physical layout design, data center operators can minimize energy consumption, reduce maintenance costs, and enhance overall operational efficiency.

c. Redundancy and Resilience

Data centers demand high levels of redundancy and resilience to ensure uninterrupted operations. This section will explore the concept of redundancy in power and cooling systems, backup generators, redundant network connectivity, and failover mechanisms. Implementing robust redundancy measures helps mitigate the risk of downtime and ensures continuous availability of critical services.

d. Security and Compliance

Data centers store sensitive and valuable information, making security a top priority. This section will discuss the importance of physical security measures, access controls, surveillance systems, and fire suppression mechanisms. Additionally, we will explore compliance standards and regulations that govern data center operations, such as SOC 2, ISO 27001, and GDPR.

e. Embracing Green Initiatives

As environmental sustainability gains importance, data centers seek ways to minimize their carbon footprint. This section will focus on energy-efficient design practices, including using renewable energy sources, efficient cooling techniques, and server virtualization. Data centers can contribute to a more sustainable future by adopting green initiatives.

Composition of Data Center Architecture

Routing and Switching:

Routing is the backbone of a data center network, guiding data packets through the labyrinthine pathways. It involves determining the optimal path for data to travel from source to destination, considering network congestion, latency, and cost factors. Advanced routing protocols like Border Gateway Protocol (BGP) enable dynamic route selection, ensuring efficient and fault-tolerant data delivery.

Switching complements routing by facilitating efficient data transmission within a local network. At the heart of a data center, switches act as intelligent traffic controllers, directing data packets to their intended destinations. With features like VLANs (Virtual Local Area Networks) and Quality of Service (QoS), switches classify and prioritize traffic, optimizing network performance and ensuring seamless communication.

Example: Spanning Tree Uplink Fast

Spanning Tree Protocol (STP) prevents loops in Ethernet networks by creating a loop-free logical topology and blocking redundant paths. While STP ensures network stability, it can also introduce delays in network convergence. Network downtime caused by STP convergence can be a major concern for businesses. Even a few seconds of downtime can result in significant losses in critical environments. This is where Spanning Tree Uplink Fast comes into play. Uplink Fast is an enhancement to STP that provides faster convergence times, reducing network downtime and improving overall network efficiency.

How Uplink Fast Works

Uplink Fast allows a switch to detect a link failure on its designated root port and immediately activate an alternate port. This process eliminates the need for the traditional STP convergence process, resulting in faster network recovery times. Uplink Fast is instrumental when network redundancy is crucial, such as in data centers or enterprise networks.

Network, security, and computing

A data center architecture consists of three main components: the data center network, data center security, and data center computing architecture. In addition to these, there are also data center physical and information architectures. Network architecture: data center networks (DCNs) are arrangements of network devices that interconnect data center resources. They are a crucial research area for Internet companies and large cloud computing firms, and the design of a data center depends heavily on its network architecture.

It is common for routers and switches to be arranged in hierarchies of two or three levels. Common designs include three-tier DCNs, fat-tree DCNs, DCells, and others. There has always been a focus on scalability, robustness, and reliability in data center network architectures.

Data center security refers to physical practices and virtual technologies for protecting data centers from threats, attacks, and unauthorized access. It can be divided into two components: physical security and software security. A firewall between a data center’s external and internal networks can protect it from attack.

Understanding Layer 3 Etherchannel

Layer 3 Etherchannel is a link aggregation technique that combines multiple physical links between switches into a single logical channel. By bundling these links together, traffic can be distributed across them, increasing overall bandwidth capacity and providing load-balancing capabilities. Unlike Layer 2 Etherchannel, Layer 3 Etherchannel operates at the network layer, allowing traffic to be routed.

To configure Layer 3 Etherchannel, several steps need to be followed. First, the physical interfaces on the switches need to be identified and grouped into the Etherchannel bundle. Then, a logical interface, the Port-Channel interface, is created and assigned an IP address. Subsequently, routing protocols or static routes can be configured on the Port-Channel interface to enable communication between different networks.

Layer 3 Etherchannel supports various load-balancing algorithms, which determine how traffic is distributed across the bundled links. Standard algorithms include source IP, destination IP, and round-robin. Each algorithm has advantages and considerations depending on the network requirements and traffic patterns.
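
The sketch below illustrates why hash-based load balancing pins a flow to one member link, assuming a simple source/destination IP hash. Real switches use vendor-specific hardware hashes, and the interface names here are hypothetical.

```python
import zlib

# Members of a hypothetical Port-Channel (Po1) bundle.
links = ["Gi1/0/1", "Gi1/0/2", "Gi1/0/3", "Gi1/0/4"]

def pick_link(src_ip: str, dst_ip: str) -> str:
    # The same (src, dst) pair always hashes to the same member link,
    # which preserves packet ordering within a flow.
    h = zlib.crc32(f"{src_ip}->{dst_ip}".encode())
    return links[h % len(links)]

print(pick_link("10.0.0.1", "10.0.1.1"))   # this flow always uses one link
print(pick_link("10.0.0.2", "10.0.1.1"))   # a different flow may pick another
```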

Example: Data Center WAN Protocol

BGP, also known as the routing protocol of the Internet, is responsible for exchanging routing and reachability information among autonomous systems (AS). It enables routers to make intelligent decisions about the most optimal paths for data transmission. Unlike interior gateway protocols, BGP focuses on routing between different networks rather than within a single network.

BGP operates on a trust-based model, where routers form peer relationships to exchange routing information. These peers establish connections and exchange routing updates, allowing them to build a complete picture of network reachability. BGP uses a sophisticated algorithm that considers multiple factors, such as path length, quality of service, and policy-based decisions, to determine the best route for traffic.

Understanding BGP AS Prepend

AS Prepend involves adding additional Autonomous System (AS) numbers to the AS path attribute of BGP advertisements. By manipulating the AS path, network operators can influence inbound traffic routing decisions by neighboring autonomous systems. This technique makes a specific path appear less desirable, diverting traffic to alternative paths.

AS Prepend holds excellent potential for optimizing network routing in various scenarios. It can achieve load balancing across multiple links, redirect traffic to less congested paths, or prefer specific transit providers. By carefully implementing AS Prepend, network administrators can improve network performance, reduce latency, and enhance overall service quality.
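
A small sketch of why prepending works: all else being equal, BGP best-path selection prefers the shortest AS path, so padding one path makes it less attractive. The AS numbers are illustrative.

```python
# BGP prefers the shorter AS path when other attributes are equal.
paths = {
    "via ISP-A": [64500, 64510],                 # normal advertisement
    "via ISP-B": [64501, 64520, 64520, 64520],   # AS 64520 prepended twice
}

best = min(paths, key=lambda name: len(paths[name]))
print(f"Neighbors send inbound traffic {best}")  # via ISP-A
```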

Recap: Border Gateway Protocol (BGP) is the most commonly used routing protocol in data centers. It has been used to connect Internet systems worldwide for decades and can also be used outside a data center. BGP is a standards-based, open protocol with open-source implementations. It’s more common to find BGP peering between data centers over the WAN; however, we often see BGP used purely inside the data center as well.

Understanding BGP Route Reflection

BGP route reflection, at its core, is a method that allows a BGP speaker to reflect routing information to its peers, alleviating the need for full-mesh connectivity. The network structure becomes more streamlined and manageable by designating specific BGP routers as route reflectors.

The utilization of BGP route reflection offers several advantages. First, it reduces the number of required BGP peering sessions, resulting in a simplified and less resource-intensive network. Second, route reflection enhances scalability by eliminating the need for full-mesh connectivity, particularly in large-scale networks. Third, it improves convergence time and reduces BGP update processing overhead, enhancing overall network performance.
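
The session arithmetic makes the scalability benefit obvious: an iBGP full mesh needs n(n-1)/2 sessions, while a single route reflector needs only n-1 client sessions.

```python
# iBGP session count: full mesh vs. a single route reflector.
for n in (10, 50, 100):
    full_mesh = n * (n - 1) // 2    # every router peers with every other
    with_rr = n - 1                 # every client peers only with the reflector
    print(f"{n} routers: {full_mesh} full-mesh sessions vs {with_rr} with an RR")

# 100 routers: 4,950 sessions in a full mesh vs 99 with a route reflector.
```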

Critical Data Center Network Design Requirements

The third wave of application architectures

Google and Amazon, two of the world’s leading web-scale pioneers, developed the modern data center. These organizations’ search and cloud applications represent the third wave of application architectures. Towards the end of the 20th century, client-server architectures and monolithic single-machine applications dominated the landscape. This third wave of applications has three primary characteristics:

Unlike client-server architectures, modern data center applications involve a great deal of communication between servers. In client-server architectures, clients communicate with monolithic servers, which either handle the request entirely themselves or communicate with fewer than a handful of other servers, such as database servers. Search (or Hadoop, its more popular open-source variant) instead employs many mappers and reducers working in concert. In the cloud, virtual machines can reside on different nodes yet must communicate seamlessly; in some cases, VMs are deployed on the servers with the least load, scaled out, or load-balanced.

A microservices architecture also increases server-to-server communication. This architecture is based on separating a single function into smaller building blocks that interact with one another. Each block can be used in several applications and can be enhanced, modified, and fixed independently. Since diagrams usually show servers side by side, server-to-server traffic is often called East-West traffic, while traffic between local networks and external networks flows North-South.

Scale and resilience

The sheer size of modern data centers is characterized by rows and rows of dark, humming, blinking machines. As opposed to the few hundred or so servers of the past, a modern data center contains between a few hundred and a hundred thousand servers. To address the connectivity requirements at such scales, as well as the need for increased server-to-server connectivity, network design must be rethought. Unlike older architectures, modern data center applications treat failure as a given; failures should be contained to the smallest possible footprint, with a limited “blast radius.” By minimizing the impact of network or server failures on the end-user experience, we aim to provide a stable and reliable experience.

Data Center Goal: Interconnect networks

The goal of data center design and the interconnection network is to transport end-user traffic from A to B without any packet drops, yet the metrics we use to achieve this goal can vary widely. The data center is evolving through various topology and technology changes, resulting in multiple network designs. The new data center control planes we see today, such as FabricPath, LISP, TRILL, and VXLAN, are driven by a change in the end user’s requirements; the application has changed. These new technologies may address new challenges, yet the fundamental questions of where to create the Layer 2/Layer 3 boundaries and whether Layer 2 is needed in the access layer remain the same. The questions stay the same, yet the technologies available to address them have evolved.

Example Protocol: Understanding VXLAN

VXLAN, an encapsulation protocol, enables the creation of virtualized Layer 2 networks over an existing Layer 3 infrastructure. By extending the Layer 2 domain, VXLAN allows the seamless transfer of network traffic between geographically dispersed data centers. It achieves this by encapsulating Ethernet frames within IP packets, providing flexibility and scalability to network virtualization.

Scalability and Flexibility: VXLAN addresses the limitations of traditional VLANs by allowing for a significantly larger number of virtual networks – up to 16 million – compared to the 4,096 limit of VLANs. This scalability enables organizations to allocate virtual networks more efficiently while accommodating the growing demands of cloud-based applications and services.

Enhanced Network Segmentation and Isolation: VXLAN provides improved network segmentation by creating logical networks that are isolated from one another, even if they share the same physical infrastructure. This isolation enhances security and enables more granular control over network traffic, facilitating efficient multi-tenancy in cloud environments.
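
The 16 million figure comes directly from the 24-bit VNI field in the 8-byte VXLAN header defined in RFC 7348. A minimal sketch of packing that header:

```python
import struct

# VXLAN header (RFC 7348): flags with the I-bit set, reserved bits, and a
# 24-bit VNI. 2**24 = 16,777,216 segments vs 2**12 = 4,096 VLAN IDs.

def vxlan_header(vni: int) -> bytes:
    assert 0 <= vni < 2**24, "VNI is a 24-bit field"
    flags = 0x08 << 24                           # I-flag: VNI is valid
    return struct.pack("!II", flags, vni << 8)   # VNI in the upper 24 bits

print(vxlan_header(5010).hex())                  # '0800000000139200'
# On the wire: outer IP/UDP (destination port 4789) + this header + L2 frame.
```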

Diagram: VXLAN Overlay

Modern Data Centers

There is a vast difference between modern data centers and what they used to be just a few years ago. Physical servers have evolved into virtual networks that support applications and workloads across pools of physical infrastructure and into a multi-cloud environment. There are multiple data centers, the edge, and public and private clouds where data exists and is connected. Both on-premises and cloud-based data centers must be able to communicate. Data centers are even part of the public cloud. Cloud-hosted applications use the cloud provider’s data center resources.

Unified Fabric

Cisco’s fabric-based data center infrastructure eliminates the tiered silos and inefficiencies of multiple network domains, providing instead a unified, flat fabric that consolidates local area networks (LANs), storage area networks (SANs), and network-attached storage (NAS) into one high-performance, fault-tolerant network. Creating large pools of virtualized network resources that can be easily moved and rapidly reconfigured with Cisco Unified Fabric provides massive scalability and resiliency to the data center.

This approach automatically deploys virtual machines and applications, thereby reducing complexity. Thanks to deep integration between server and network architecture, secure IT services can be delivered from any device within the data center, between data centers, or beyond. In addition to Cisco Nexus switches, Cisco Unified Fabric uses Cisco NX-OS as its operating system.

The use of Open Networking

We also have the Open Networking Foundation ( ONF ), which provides open networking. Open networking describes a network that uses open standards and commodity hardware. So, consider open networking in terms of hardware and software. Unlike a vendor approach like Cisco, this gives you much more choice with what hardware and software you use to make up and design your network.

Related: Before you proceed, you may find the following useful:

  1. ACI Networks
  2. IPv6 Attacks
  3. SDN Data Center
  4. Active Active Data Center Design
  5. Virtual Switch

Data Center Control Plane

Key Data Center Network Design Discussion Points:

  • Introduction to data center network design and what is involved.
  • Highlighting the details of VLANs and virtualization.
  • Technical details on the issues of Layer 2 in data centers.
  • Scenario: Cisco FabricPath and DFA.
  • Details on overlay networking and Cisco OTV.

The Rise of Overlay Networking

What has the industry introduced to overcome these limitations and address the new challenges? – Network virtualization and overlay networking. In its simplest form, an overlay is a dynamic tunnel between two endpoints that enables Layer 2 frames to be transported between them. In addition, these overlay-based technologies provide a level of indirection that allows switching table sizes to not increase in the order of the number of supported end hosts.

Today’s overlays include Cisco FabricPath, TRILL, LISP, VXLAN, NVGRE, OTV, PBB, and Shortest Path Bridging. They are essentially virtual networks that sit on top of a physical network, and often, the physical network is unaware of the virtual layer above it.

1st Lab Guide: VXLAN

The following lab guide displays a VXLAN network running in multicast mode. Multicast VXLAN is a variant of VXLAN that uses IP multicast to transport overlay network traffic. VXLAN is an encapsulation protocol that extends Layer 2 Ethernet networks over Layer 3 IP networks.

Using multicast enables efficient and scalable communication within the overlay network. Notice the multicast group of 239.0.0.10 and the route for 239.0.0.10 forwarding out the tunnel interface. We have multicast enabled on all Layer 3 interfaces, including the core that consists of Spine A and Spine B.

Diagram: Multicast VXLAN

Traditional Data Center Network Design

How do routers create a broadcast domain boundary? Firstly, using the traditional core, distribution, and access model, the access layer is Layer 2, and servers attached to the access layer are in the same IP subnet and VLAN. The same access VLAN will span the access layer switches for east-to-west traffic, and any outbound traffic goes via a First Hop Redundancy Protocol ( FHRP ) like Hot Standby Router Protocol ( HSRP ).

Servers in different VLANs are isolated from each other and cannot communicate directly; inter-VLAN communication requires a Layer 3 device. Virtualization’s humble beginnings started with VLANs, which were used to segment traffic at Layer 2. It was common to find single VLANs spanning an entire data center fabric.

VLAN and Virtualization

The virtualization side of VLANs comes from two servers physically connected to different switches. Assuming the VLAN spans both switches, the two servers can communicate as if they were attached to the same switch. Each VLAN can be defined as a broadcast domain in a single Ethernet switch or shared among connected switches.

Whenever a switch interface belonging to a VLAN receives a broadcast frame ( destination MAC is ffff.ffff.ffff), the device must forward this frame to all other ports defined in the same VLAN.
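
A minimal sketch of that flood-and-learn behavior, with hypothetical port names:

```python
# Flood-and-learn forwarding in a single VLAN: broadcasts and unknown
# unicasts are flooded to every other port; known unicasts use one port.

BROADCAST = "ffff.ffff.ffff"

class VlanSwitch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}                       # learned MAC -> port

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port         # learn the source MAC
        if dst_mac == BROADCAST or dst_mac not in self.mac_table:
            return self.ports - {in_port}         # flood out all other ports
        return {self.mac_table[dst_mac]}          # known unicast: one port

sw = VlanSwitch(["Eth1", "Eth2", "Eth3"])
print(sw.receive("Eth1", "aaaa.aaaa.aaaa", BROADCAST))         # {'Eth2', 'Eth3'}
print(sw.receive("Eth2", "bbbb.bbbb.bbbb", "aaaa.aaaa.aaaa"))  # {'Eth1'}
```

Multiply that flooding by thousands of hosts and the scaling problem discussed next becomes clear.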

This approach is straightforward in design and is almost like a plug-and-play network. The first question is, why not connect everything in the data center into one large Layer 2 broadcast domain? Layer 2 is a plug-and-play network, so why not? STP also blocks links to prevent loops.

The issues of Layer 2

The reason is that there are many scaling issues in large layer 2 networks. Layer 2 networks don’t have controlled / efficient network discovery protocols. Address Resolution Protocol ( ARP ) is used to locate end hosts and uses Broadcasts and Unicast replies. A single host might not generate much traffic, but imagine what would happen if 10,000 hosts were connected to the same broadcast domain. VLANs span an entire data center fabric, which can bring a lot of instability due to loops and broadcast storms.

No hierarchy in MAC addresses

MAC addressing also lacks hierarchy. Unlike Layer 3 networks, which allow summarization and hierarchy addressing, MAC addresses are flat. Adding several thousand hosts to a single broadcast domain will create large forwarding information tables.
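
A worked example of that contrast, using Python’s ipaddress module: contiguous IP subnets summarize into a single routing entry, whereas a flat MAC table needs one entry per host.

```python
import ipaddress

# 256 contiguous /24 subnets collapse into a single /16 route...
subnets = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(256)]
print([str(n) for n in ipaddress.collapse_addresses(subnets)])  # ['10.1.0.0/16']

# ...while the equivalent flat Layer 2 domain needs one MAC entry per host.
hosts_per_subnet = 250                  # illustrative host count
print(f"MAC table entries: {256 * hosts_per_subnet}")           # 64000
```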

Because end hosts are potentially not static, they are likely to be attached and removed from the network at regular intervals, creating a high rate of change in the control plane. Of course, you can have a large Layer 2 data center with multiple tenants if they don’t need to communicate with each other.

The shared services requirements, such as WAAS or load balancing, can be solved by spinning up the service VM in the tenant’s Layer 2 broadcast domain. This design will hit scaling and management issues. There is a consensus to move from a Layer 2 design to a more robust and scalable Layer 3 design.

But why is Layer 2 still needed in data center topologies? One solution is Layer 2 VPN with EVPN. But first, let us look at Cisco DFA.

The Requirement for Layer 2 in Data Center Network Design

  • Servers that perform the same function might need to communicate with each other due to a clustering protocol or simply as part of the application’s inner functions. If the communication is clustering protocol heartbeats or some server-to-server application packets that are not routable, then you need this communication layer to be on the same VLAN, i.e., Layer 2 domain, as these types of packets are not routable and don’t understand the IP layer.

  • Stateful devices such as firewalls and load balancers need Layer 2 adjacency as they constantly exchange connection and session state information.

  • Dual-homed servers: A single server with two NICs, one to each switch, requires Layer 2 adjacency if the adapter has a standby interface that uses the same MAC and IP addresses after a failure. In this situation, the active and standby interfaces must be on the same VLAN and use the same default gateway.

  • Suppose your virtualization solutions cannot handle Layer 3 VM mobility. In that case, you may need to stretch VLANs between PODS / Virtual Resource Pools or even data centers so you can move VMs around the data center at Layer 2 ( without changing their IP address ).

Data Center Design and Cisco DFA

Cisco took a giant step and recently introduced a data center fabric with Dynamic Fabric Automation ( DFA ), similar to Juniper QFabric. This fabric offers Layer 2 switching and Layer 3 routing at the access layer / ToR. Firstly, it has FabricPath ( IS-IS for Layer 2 connectivity ) in the core, which gives optimal Layer 2 forwarding between all the edges.

Then they configure the same Layer 3 address on all the edges, which gives you optimal Layer 3 forwarding across the whole Fabric.

On the edge, you can have Layer 3 leaf switches, for example, the Nexus 6000 series, or integrate with Layer 2-only devices like the Nexus 5500 series or the Nexus 1000v. You can also connect external routers, UCS, or FEX to the fabric. In addition to running IS-IS as the data center control plane, DFA uses MP-iBGP, with some spine nodes acting as route reflectors to exchange IP forwarding information.

Cisco FabricPath

DFA also employs a Cisco FabricPath technique called “Conversational Learning.” The first packet triggers a full RIB lookup, and the subsequent packets are switched in the hardware-implemented switching cache.

This technology provides Layer 2 mobility throughout the data center while providing optimal traffic flow using Layer 3 routing. Cisco commented, “DFA provides a scale-out architecture without congestion points in the network while providing optimized forwarding for all applications.”

Terminating Layer 3 at the access / ToR has clear advantages and disadvantages. Benefits include reducing the size of the broadcast domain, but this comes at the cost of reducing the mobility domain across which VMs can be moved.

Terminating Layer 3 at the access layer can also result in sub-optimal routing, because across-subnet traffic will hairpin or trombone, taking multiple unnecessary hops across the data center fabric.

The role of Cisco FabricPath

Cisco FabricPath is a Layer 2 technology that provides Layer 3 benefits, such as multipathing, to classical Layer 2 networks by using IS-IS at Layer 2. This eliminates the need for Spanning Tree Protocol, avoiding the pitfalls of large Layer 2 networks. As a result, FabricPath enables a massive Layer 2 network that supports multipathing ( ECMP ). TRILL is an IETF standard that, like FabricPath, is a Layer 2 technology providing the same Layer 3 benefits to Layer 2 networks using IS-IS.

LISP is popular in active-active data centers for DCI route optimization and mobility. It separates the host’s location from its identifier ( EID ), allowing VMs to move across subnet boundaries while keeping their endpoint identification. LISP is often referred to as an Internet locator and can enable triangular routing designs. Popular encapsulation formats include VXLAN ( proposed by Cisco and VMware ) and STT ( created by Nicira but expected to be deprecated over time as VXLAN comes to dominate ).

The role of OTV

OTV is a data center interconnect ( DCI ) technology enabling Layer 2 extension across data center sites. While Fabric Path can be a DCI technology with dark fiber over short distances, OTV has been explicitly designed for DCI. In contrast, the Fabric Path data center control plane is primarily used for intra-DC communications.

Failure boundary and site independence are preserved in OTV networks because OTV uses a control plane protocol to synchronize MAC addresses between sites and prevent unknown unicast floods. In addition, recent IOS versions can allow unknown unicast floods for certain VLANs, a capability unavailable if you use FabricPath as the DCI technology.

The Role of Software-defined Networking (SDN)

Another potential trade-off between data center control plane scaling, Layer 2 VM mobility, and optimal ingress/egress traffic flow would be software-defined networking ( SDN ). At a basic level, SDN can create direct paths through the network fabric to isolate private networks effectively.

An SDN network allows you to choose the correct forwarding information per-flow basis. This per-flow optimization eliminates VLAN separation in the data center fabric. Instead of using VLANs to enforce traffic separation, the SDN controller has a set of policies allowing traffic to be forwarded from a particular source to a destination.

Cisco ACI brings SDN concepts to the data center. It operates over a leaf-and-spine design and traditional routing protocols such as BGP and IS-IS. However, it brings a new way to manage the data center with new constructs such as Endpoint Groups (EPGs). In addition, no more VLANs are needed in the data center, as everything is routed over a Layer 3 core with VXLAN as the overlay protocol.

Closing Points: Data Center Design

Data centers are the backbone of modern technology infrastructure, providing the foundation for storing, processing, and transmitting vast amounts of data. A critical aspect of data center design is the network architecture, which ensures efficient and reliable data transmission within and outside the facility.

Scalability and Flexibility

One of the primary goals of data center network design is to accommodate the ever-increasing demand for data processing and storage. Scalability ensures the network can grow seamlessly as the data center expands. This involves designing a network that supports many devices, servers, and users without compromising performance or reliability. Additionally, flexibility is essential to adapt to changing business requirements and technological advancements.

Redundancy and High Availability

Data centers must ensure uninterrupted access to data and services, making redundancy and high availability critical for network design. Redundancy involves duplicating essential components, such as switches, routers, and links, to eliminate single points of failure. This ensures that if one component fails, there are alternative paths for data transmission, minimizing downtime and maintaining uninterrupted operations. High availability further enhances reliability by providing automatic failover mechanisms and real-time monitoring to detect and address network issues promptly.

Traffic Optimization and Load Balancing

Efficient data flow within a data center is vital to prevent network congestion and bottlenecks. Traffic optimization techniques, such as Quality of Service (QoS) and traffic prioritization, can be implemented to ensure that critical applications and services receive the necessary bandwidth and resources. Load balancing is crucial in evenly distributing network traffic across multiple servers or paths, preventing overutilization of specific resources, and optimizing performance.

Security and Data Protection

Data centers house sensitive information and mission-critical applications, making security a top priority. The network design should incorporate robust security measures, including firewalls, intrusion detection systems, and encryption protocols, to safeguard data from unauthorized access and cyber threats. Data protection mechanisms, such as backups, replication, and disaster recovery plans, should also be integrated into the network design to ensure data integrity and availability.

Monitoring and Management

Proactive monitoring and effective management are essential for maintaining optimal network performance and addressing potential issues promptly. The network design should include comprehensive monitoring tools and centralized management systems that provide real-time visibility into network traffic, performance metrics, and security events. This enables administrators to promptly identify and resolve network bottlenecks, security breaches, and performance degradation.

Data center network design is critical in ensuring efficient, reliable, and secure data transmission within and outside the facility. Scalability, redundancy, traffic optimization, security, and monitoring are key considerations for designing a robust, high-performance network. By implementing best practices and staying abreast of emerging technologies, data centers can build networks that meet the growing demands of the digital age while maintaining the highest levels of performance, availability, and security.

Summary: Data Center Network Design

In today’s digital age, data centers are the backbone of countless industries, powering the storage, processing, and transmitting massive amounts of information. However, the efficiency and scalability of data center network design have become paramount concerns. In this blog post, we explored the challenges traditional data center network architectures face and delved into innovative solutions that are revolutionizing the field.

The Limitations of Traditional Designs

Traditional data center network designs, such as three-tier architectures, have long been the industry standard. However, these designs come with inherent limitations that hinder performance and flexibility. The oversubscription of network links, the complexity of managing multiple layers, and the lack of agility in scaling are just a few of the challenges that plague traditional designs.

Enter the Spine-and-Leaf Architecture

The spine-and-leaf architecture has emerged as a game-changer in data center network design. This approach replaces the hierarchical three-tier model with a more scalable and efficient structure. The spine-and-leaf design comprises spine switches, acting as the core, and leaf switches, connecting directly to the servers. This non-blocking, high-bandwidth architecture eliminates oversubscription and provides improved performance and scalability.

Embracing Software-Defined Networking (SDN)

Software-defined networking (SDN) is another revolutionary concept transforming data center network design. SDN abstracts the network control plane from the underlying infrastructure, allowing centralized network management and programmability. With SDN, data center administrators can dynamically allocate resources, optimize traffic flows, and respond rapidly to changing demands.

The Rise of Network Function Virtualization (NFV)

Network Function Virtualization (NFV) complements SDN by virtualizing network services traditionally implemented using dedicated hardware appliances. By decoupling network functions, such as firewalls, load balancers, and intrusion detection systems, from specialized hardware, NFV enables greater flexibility, scalability, and cost savings in data center network design.

Conclusion:

The landscape of data center network design is undergoing a significant transformation. Traditional architectures are being replaced by more scalable and efficient models like the spine-and-leaf architecture. Moreover, concepts like SDN and NFV empower administrators with unprecedented control and flexibility. As technology evolves, data center professionals must embrace these innovations and stay at the forefront of this paradigm shift.

IP Forwarding 

IP Forwarding

The following post discusses IP forwarding and includes an IP forwarding example. IP forwarding is a networking feature that allows a device, such as a router or a computer, to forward IP packets from one network to another. It plays a crucial role in ensuring the smooth flow of data between different networks.

When a device receives an IP packet, it examines the destination IP address to determine the next hop for forwarding it. The forwarding decision is based on the device's routing table, which contains information about network destinations and their associated next hops.

How does a router forward ( IP forwarding ) IP Datagrams? Firstly, let us clarify some terminology. The term routing describes the functionality performed by the control software of routers. This includes routing table maintenance, static route processing, dynamic routing protocols, etc.

The process of IP forwarding involves moving transit packets between interfaces. During the forwarding process, packets are examined in the forwarding table, a forwarding decision is made, and the packet is sent out of the interface.

To implement IP forwarding, various techniques and protocols come into play. One popular protocol is the Internet Control Message Protocol (ICMP), which helps in the exchange of error messages and operational information between network devices.

Routing protocols like OSPF (Open Shortest Path First) and BGP (Border Gateway Protocol) play a crucial role in establishing and maintaining routing tables for efficient IP forwarding. Understanding these protocols and configuring them appropriately is vital for successful implementation.

While IP forwarding is a powerful tool, it is not immune to challenges. Network administrators often encounter issues such as packet loss, routing loops, or suboptimal routing paths. Troubleshooting techniques, such as analyzing routing tables, monitoring network traffic, and performing packet captures, can help identify and resolve these issues.

Additionally, following best practices such as implementing proper security measures, regularly updating routing configurations, and conducting network performance audits can ensure smooth IP forwarding operations.

Highlights: IP Forwarding

IP Forwarding

What is IP forwarding?:

IP forwarding, or packet forwarding, is a fundamental function of network routers. It involves directing data packets from one network interface to another, ultimately reaching their intended destination. This intelligent routing mechanism plays a crucial role in ensuring efficient data transmission across networks, enabling seamless communication.

How Does IP Forwarding Work?

Underneath the surface, IP forwarding operates based on the principles of routing tables. These tables contain valuable network information, including IP addresses, subnet masks, and next-hop destinations. When a data packet arrives at a router, it examines the destination IP address and consults its routing table to determine the appropriate interface to forward the packet. This dynamic decision-making process allows routers to navigate through the vast realm of interconnected networks efficiently.
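
That decision process amounts to a longest-prefix-match lookup, which the short sketch below demonstrates against an illustrative routing table.

```python
import ipaddress

# Illustrative routing table: prefix -> forwarding instruction.
routing_table = {
    ipaddress.ip_network("0.0.0.0/0"):   "next-hop 203.0.113.1",  # default route
    ipaddress.ip_network("10.0.0.0/8"):  "next-hop 10.255.0.1",
    ipaddress.ip_network("10.1.1.0/24"): "interface eth1",        # most specific
}

def lookup(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routing_table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return routing_table[best]

print(lookup("10.1.1.5"))   # 'interface eth1' (the /24 beats the /8)
print(lookup("8.8.8.8"))    # 'next-hop 203.0.113.1' (falls back to default)
```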

IP Forwarding Algorithms and Protocols

Various algorithms and protocols are used to optimize the routing process. Some popular algorithms include shortest-path algorithms like Dijkstra’s and Bellman-Ford. Additionally, protocols such as Open Shortest Path First (OSPF) and Border Gateway Protocol (BGP) are widely used in IP forwarding to establish and maintain efficient routing paths.

IP forwarding is required to use the system as a router.

Imagine a server with two physical Ethernet ports connected to two different networks (your internal network and the outside world via a DSL modem). If you connect and configure both interfaces, the system can communicate on either network; however, packets from one network cannot reach the other because forwarding is disabled. Take ‘route add’ as an example: with two network interfaces, you add two routes, one for each interface, and when deciding where to send a packet, the kernel chooses the most suitable route. With forwarding disabled, though, the kernel will not relay a packet that arrives on one interface out through the other; such transit packets are simply discarded.

A router does not need two physical network interfaces. A VLAN-based server, for instance, can transfer IP packets between VLANs with only one physical network interface; this is known as a one-armed router, or router on a stick. In other words, IP forwarding must be enabled to transfer packets between any two network interfaces, real or virtual. With a single interface on a single network, forwarding is unnecessary: the hosts are already on the same network and can reach each other directly.

router on a stick

Understanding Routing

In the OSI model, the network layer is responsible for routing. Among all the available paths, the network layer chooses the optimal or shortest path from sender to receiver. The routing process uses routing algorithms to calculate the optimal paths.

We can classify routing into three categories: static, default, and dynamic. Static routing is a nonadaptive type in which the administrator adds and defines the route the data needs to follow to reach the destination from the source. In default routing, all packets take a predefined default path. It’s helpful in bulk data transmission and networks with a single exit point. Finally, dynamic routing, also known as adaptive routing, uses dynamic protocols to find new routes to reach the receiver.

Routing is like the GPS of a network, determining the best path for data packets to travel from one network to another. It involves exchanging information between routers, enabling them to make intelligent decisions about forwarding data. Routing protocols, such as OSPF or BGP, allow routers to communicate and build routing tables, ensuring efficient and reliable data transmission across networks. While OSPF and BGP are better for production networks, RIP can be used in a lab environment to demonstrate how dynamic routing protocols work. 

RIP Routing Protocol

Unraveling Switching

Switches are essential to a network’s functioning. They connect computers, wireless access points, printers, and servers within a building or campus, allowing devices to share information and communicate. Plug-and-play network switches do not require configuration and are designed to work immediately. Unmanaged switches provide basic connectivity and are often found in home networks or places where a few extra ports are needed, such as your desk, lab, or conference room.

A managed switch offers greater security and more features because it can be configured to suit your network. Through greater control, your network can be better protected, and the quality of service for users can be improved. Switching itself focuses on efficiently delivering data packets within a network: switches operate at the data link layer of the OSI model and use MAC addresses to direct packets to their intended destinations.

By building and maintaining MAC address tables, switches facilitate fast and accurate data forwarding within local networks. Spanning Tree Protocol (STP) identifies redundant links and shuts them down to prevent possible network loops. To determine the root bridge, all switches in the network exchange BPDU messages.

Spanning Tree Root Switch

Comparing Routing and Switching

While routing and switching contribute to a network’s functioning, they differ in scope and operation. Routing occurs at the network layer (Layer 3) and is responsible for interconnecting networks, enabling communication between different subnets or autonomous systems. Switching, on the other hand, operates at the data link layer (Layer 2) and focuses on efficient data delivery within a local network. The two most essential equipment for building a small office network are switches and routers. Both devices perform different functions within a network despite their similar appearances.

Switches: what are they?

In a small business network, switches facilitate the sharing of resources by connecting computers, printers, and servers. Whether these connected devices are in a building or on a campus, they can share information and talk to each other thanks to the switch. Because switches tie devices together, it is practically impossible to build a small business network without them.

What is a router?

Just as switches connect various devices, routers connect multiple switches and their respective networks to form an even more extensive network. One or more of these networks may be at a single location. When setting up a small business network, you will need one or more routers. Through the router, networked devices and multiple users can access the Internet and connect to various networks.

A router acts as a dispatcher, directing traffic and selecting the best route for data packets to travel across the network. With a router, your business can connect to the world, protect information from security threats, and prioritize devices.

In the context of routing, the next hop refers to the next network device or hop to which a packet should be forwarded to reach its destination. It represents the immediate next step in the data’s journey through the network and determines the network interface or IP address to which the packet should be sent. Regarding tunneling protocols, the next hop can be the tunnel interface.

Example Technology: GRE Tunneling

Before diving into the intricacies of GRE tunneling, it is crucial to grasp its fundamental concepts. GRE tunneling involves encapsulating one protocol within another, typically encapsulating IP packets within IP packets. This encapsulation process creates a virtual tunnel through which data can be securely transmitted between two networks. Routing the encapsulated packets through the tunnel allows network administrators to establish seamless connectivity between geographically separated networks, regardless of the underlying network infrastructure.

Implementation of GRE Tunnels

Implementing GRE tunnels requires a systematic approach to ensure their successful deployment. Firstly, it is essential to identify the source and destination networks between which the tunnel will be created. Once identified, the next step involves configuring the tunnel interfaces on the corresponding devices.

This includes specifying the tunnel source and destination IP addresses and selecting suitable tunneling protocols and parameters. Appropriate routing configurations must also be applied to ensure that traffic flows correctly through the tunnel. Properly implementing GRE tunnels lays the foundation for efficient and secure network communication.

GRE configuration

What are Routing Algorithms?

Routing algorithms are computational procedures routers use to determine the optimal path for data packets to travel from the source to the destination in a network. They are crucial in ensuring efficient and effective data transmission, minimizing delays and congestion, and improving network performance. Routing algorithms employ hop count, bandwidth, delay, and load metrics to determine the optimal path for data packets. These algorithms rely on routing tables, which store information about available routes and their associated metrics. By evaluating these metrics, routers make informed decisions to forward packets along the most suitable path to reach the destination.

Distance vector routing

Distance vector routing protocols allow routers to advertise routing information to neighbors from their own perspective, modified from the original route. As a result, distance vector protocols do not hold an exact map of the entire network; instead, their database reflects that a neighbor router can reach a destination network and how far away it is. Other routers may lie along the path toward those networks, but the router does not know how many. Because they require less CPU and memory, distance vector protocols can run on low-end routers.

For example, distance vector protocols are often compared to road signs at intersections that indicate the destination is 20 miles away; people blindly follow these signs without realizing if there is a shorter or better route to the destination or whether the sign is even accurate.

EIGRP routing

Link-State Algorithms

In link-state dynamic IP routing protocols, each router advertises the state and metric of its directly connected links. Enterprise and service provider networks commonly use the OSPF and IS-IS link-state routing protocols. IS-IS advertisements are called link-state packets (LSPs), while OSPF advertisements are called link-state advertisements (LSAs).

Whenever a router receives an advertisement from a neighbor, it stores the information in a local link-state database (LSDB) and advertises it to all its neighbors. Link-state information is flooded throughout the network unchanged, exactly as the originating router advertised it. The result is that every router in the network holds a synchronized, identical map of the network.

Link State Protocols

Routing neighbor relationship

Routing neighbor relationships refer to the connections between neighboring network routers. These relationships facilitate the exchange of routing information, allowing routers to communicate and make informed forwarding decisions efficiently. Routers can build a dynamic and adaptive network topology by forming these relationships.

Once established, routing neighbor relationships require ongoing maintenance to ensure stability and reliability. Routers periodically exchange keepalive messages to confirm the continued presence of their neighbors. Additionally, routers monitor the health and reachability of their neighbors through various mechanisms, such as dead interval timers and neighbor state tracking.

Example: EIGRP neighbor relationship

The Hello protocol is the cornerstone of EIGRP neighbor relationship formation. Routers using EIGRP send Hello packets to discover and establish neighbors. These packets contain information such as the router ID, hold time, and other parameters. By exchanging Hello packets, routers can identify potential neighbors and establish a neighbor relationship.

Once Hello packets are exchanged, routers engage in neighbor discovery and verification. During this phase, routers compare the received Hello packet information with their configured parameters. If the information matches, a neighbor relationship is formed. This verification step ensures that only compatible routers become neighbors, enhancing the stability and reliability of the network.

EIGRP Neighbor and DUAL

Understanding Policy-Based Routing

Policy-based routing, also known as PBR, is a technique that allows network administrators to make routing decisions based on specific policies or conditions. Traditional routing relies solely on destination IP addresses, whereas PBR considers additional factors such as source IP addresses, protocols, or packet attributes. By leveraging these conditions, network administrators can exert fine-grained control over traffic flow.

The flexibility offered by policy-based routing brings numerous advantages. First, it enables the implementation of customized routing policies tailored to specific requirements. Whether prioritizing real-time applications or load balancing across multiple links, PBR empowers administrators to optimize network performance based on their unique needs. Additionally, policy-based routing enhances network security by allowing traffic to be redirected through firewalls or VPNs, providing an added layer of protection.

Implementing Policy-Based Routing

Implementing policy-based routing involves several key steps. The first step is defining the routing policy, which involves determining the conditions and actions to take. This can be done through access control lists (ACLs) or route maps. Once the policy is defined, it must be applied to the desired interfaces or traffic flows. Testing and monitoring the policy’s effectiveness is crucial to ensure that desired outcomes are achieved.

PBR vs. IP Forwarding

Understanding IP Forwarding

IP forwarding is a fundamental concept in networking. It involves forwarding data packets between network interfaces based on their destination IP addresses. IP forwarding operates at the network layer of the OSI model and is essential for routing packets across multiple networks. It relies on routing tables to determine the best path for packet transmission.

Unveiling Policy-Based Routing (PBR)

Policy-based routing goes beyond traditional IP forwarding by allowing network administrators to control the path of packets based on specific policies or criteria. These policies can include source IP address, destination IP address, protocols, or port numbers. PBR enables granular control over network traffic, making it a powerful tool for shaping traffic flows within a network.

Traffic Engineering and Load Balancing: PBR excels in scenarios where traffic engineering and load balancing are crucial. By selectively routing packets based on specific policies, network administrators can distribute traffic across multiple paths, optimizing bandwidth utilization and avoiding congestion. On the other hand, IP forwarding lacks the flexibility to dynamically manage traffic flows, making it less suitable for such use cases.

Security and Access Control: PBR offers an advantage when it comes to enforcing security measures and access control. By defining policies based on source IP addresses or protocols, network administrators can steer traffic through firewalls, intrusion detection systems, or other security devices. IP forwarding cannot selectively route traffic based on such policies, limiting its effectiveness in security-focused environments.

MPLS Forwarding

MPLS, short for Multiprotocol Label Switching, is used in telecommunications networks to efficiently direct data packets. Unlike traditional IP routing, MPLS employs labels to guide packets along predetermined paths, known as Label Switched Paths (LSPs). These labels are attached to the packets, allowing routers to forward them based on this labeling, significantly improving network performance.

Within an MPLS network, routers use labels to determine the optimal path for data packets. Each router along the LSP has a forwarding table that maps incoming labels to outgoing interfaces. Routers swap labels as packets traverse the network, ensuring proper forwarding based on predetermined policies and traffic engineering considerations.

MPLS Forwarding vs IP Forwarding

MPLS (Multi-Protocol Label Switching) is a flexible and efficient forwarding mechanism used in modern networks. At its core, MPLS forwarding relies on label switching to expedite data transmission. It operates at the network layer (Layer 3) and involves the encapsulation of packets with labels. These labels carry crucial forwarding information, allowing faster and more deterministic forwarding decisions. MPLS forwarding is widely adopted in service provider networks to enhance traffic engineering, quality of service (QoS), and virtual private network (VPN) capabilities.

On the other hand, IP (Internet Protocol) forwarding is the traditional method of routing packets across networks. It is based on the destination IP address present in the packet header. IP forwarding operates at the network layer (Layer 3) and is the backbone of the internet, responsible for delivering packets from the source to the destination. Unlike MPLS forwarding, IP forwarding does not involve encapsulating packets with labels.

Key Differences

While MPLS and IP forwarding share the objective of routing packets, several key differences set them apart. Firstly, MPLS forwarding introduces an additional layer of encapsulation with labels, allowing for more efficient and deterministic routing decisions. In contrast, IP forwarding relies solely on the destination IP address. Secondly, MPLS forwarding provides enhanced traffic engineering capabilities, enabling service providers to optimize network resource utilization and prioritize certain types of traffic. IP forwarding, being more straightforward, does not possess these advanced traffic engineering features.

Benefits of MPLS Forwarding

MPLS forwarding offers several advantages in networking. First, it provides improved scalability, as the label-switching mechanism allows for more efficient handling of large volumes of traffic. Second, MPLS forwarding enables the implementation of QoS mechanisms, ensuring that critical applications receive the necessary bandwidth and prioritization. Additionally, MPLS forwarding facilitates the creation of secure and isolated VPNs, offering enhanced privacy and connectivity options.

Benefits of IP Forwarding

While MPLS forwarding has its own advantages, IP forwarding remains a reliable and widely used forwarding technique. It is simpler to implement and manage, making it suitable for smaller networks or scenarios where advanced traffic engineering features are not required. Moreover, IP forwarding is the foundation of the Internet, making it universally compatible and widely supported.

Datagram networks

Datagram networks are similar to postal networks. Sending a letter does not require setting up a connection beforehand; the sender simply writes the address on the letter and drops it off at the local post office. That post office sends the letter to another post office closer to the destination, and the letter traverses a chain of post offices until it reaches the one local to the recipient.

A packet in a datagram network includes the source and destination addresses. Packets are sent to the nearest network device, which passes them on to a device closer to the destination. When network devices receive a packet, they make a routing decision, also known as a forwarding decision, based on the packet’s destination address and the routing table. Each intermediate device makes this decision independently for every packet.

IP Forwarding and NAT

In addition to interconnecting networks, IP forwarding enables network address translation (NAT). NAT allows multiple devices within a private network to share a single public IP address. When a device from the private network sends an IP packet to the internet, the router performing NAT modifies the packet’s source IP address to the public IP address before forwarding it. This allows the device to communicate with the internet using a single public IP address, effectively hiding the internal network structure.
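
A stripped-down model of that rewrite, with hypothetical addresses and ports: the NAT device swaps the private source for its public address plus a unique port and records the mapping so replies can be translated back. This is port address translation (PAT) in miniature, not a full NAT implementation.

```python
PUBLIC_IP = "203.0.113.10"   # hypothetical shared public address
nat_table = {}               # public port -> (private ip, private port)
next_port = 40000

def translate_outbound(src_ip, src_port):
    """Rewrite a private source to the shared public address (PAT-style)."""
    global next_port
    public_port = next_port
    next_port += 1
    nat_table[public_port] = (src_ip, src_port)
    return PUBLIC_IP, public_port

def translate_inbound(public_port):
    """Map a reply arriving at the public address back to the private host."""
    return nat_table.get(public_port)

pub = translate_outbound("192.168.1.10", 51000)
print(pub)                        # ('203.0.113.10', 40000)
print(translate_inbound(pub[1]))  # ('192.168.1.10', 51000)
```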

network address translation

Advanced Topic

Understanding BGP Next Hop

BGP Next Hop refers to the IP address to which a router forwards packets destined for a specific network. Traditionally, the next hop address remains unchanged in the BGP routing table, irrespective of the underlying network conditions. However, with Next Hop Tracking, the next hop address dynamically adjusts based on real-time monitoring and evaluation of network paths and link availability.

Enhanced Network Resiliency: By continuously monitoring the status of network paths, BGP Next Hop Tracking enables routers to make informed decisions in the face of link failures or congestion. Routers can swiftly adapt and reroute traffic to alternate paths, thus minimizing downtime and improving network resiliency.

Enabling Next Hop Tracking: To enable BGP Next Hop Tracking, network operators must configure their routers to exchange BGP updates that include the Next Hop Tracking attribute. This attribute carries information about the status and availability of alternate paths, allowing routers to make informed decisions when selecting the next hop.

Adjusting Next Hop Metric: Network administrators can also customize the metric used for Next Hop Tracking. By assigning weights or preferences to specific paths, administrators can influence the decision-making process of routers, ensuring traffic is routed as desired.

Next Hop Tracking empowers network administrators to fine-tune traffic engineering policies. Administrators can intelligently distribute traffic across multiple paths by considering factors such as link utilization, latency, and available bandwidth, ensuring optimal resource utilization and improved network performance.

Understanding BGP and Its Challenges

BGP, the internet’s de facto interdomain routing protocol, is designed to exchange routing information between different autonomous systems (ASes). However, as networks grow in size and complexity, challenges arise regarding scalability, convergence, and the propagation of routing updates. These challenges necessitate the use of mechanisms like route reflection.

Route reflection is a technique employed in BGP to reduce the number of full-mesh connections required in a network by allowing route reflectors to act as intermediaries between BGP speakers. The route reflector receives BGP updates from its clients and reflects them to other clients, simplifying the topology and reducing the number of required BGP sessions. This enables efficient propagation of routing information within a network.

The implementation of route reflection brings several benefits to network operators. Firstly, it reduces the required BGP sessions, leading to lower resource consumption and improved scalability. Secondly, it enhances convergence time by eliminating the need for full-mesh connectivity. Additionally, route reflection aids in reducing the complexity of BGP configurations, making network management more manageable and less error-prone.

Understanding Next Hop Resolution Protocol

NHRP, the Next Hop Resolution Protocol, is designed for multipoint networks and plays a crucial role in providing redundancy and enhancing network reliability. By eliminating single points of failure, it enables seamless communication even in the face of link failures or network congestion. Let’s explore how NHRP achieves this.

The underlay functionality of NHRP focuses on establishing and maintaining direct communication paths between network devices. Much like the Address Resolution Protocol (ARP), it maps logical addresses to physical addresses, ensuring efficient routing within the underlay network. By dynamically updating and caching information about network devices, NHRP minimizes latency and optimizes data transmission.

In addition to its underlay capabilities, NHRP also provides overlay functionality. This enables communication between devices across different network domains, utilizing overlay tunnels. By encapsulating packets within these tunnels and dynamically discovering the most efficient paths, NHRP ensures seamless connectivity and load balancing across the overlay network.

Related: Before you proceed, you may find the following post helpful for pre-information:

  1. OpenFlow protocol
  2. What is OpenFlow
  3. Data Center Performance
  4. Dead Peer Detection
  5. Network Connectivity
  6. Computer Networking



IP Forwarding Example

Key IP Forwarding Discussion Points:


  • Discussion on packet forwarding.

  • IP forwarding example.

  • The CEF process.

  • TCP performance.

  • A final note on security concerns.

Basics: An IP Forwarding Example

IP routing is the process by which routers forward IP packets. Routers have to take different steps when forwarding IP packets from one interface to another, which has nothing to do with the “learning” of network routes via static or dynamic routing protocols.

Key Points on Default Gateway
Default gateways are used by hosts to reach destinations outside their networks. Default gateways are routers or multilayer switches (switches that can do routing).

When a host wants to send something to another host, it first checks whether the destination is inside or outside its own network. When the destination is on the same network, it uses the Address Resolution Protocol (ARP) to determine the destination’s MAC address and can then send IP packets. How does the host check whether the destination belongs to the same network? It uses its subnet mask.
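
A minimal sketch of that check using Python's ipaddress module, with hypothetical addresses: the host decides whether to ARP for the destination directly or for its default gateway.

```python
import ipaddress

# Hypothetical host configuration: address plus subnet mask
host_iface = ipaddress.ip_interface("192.168.1.1/24")

def next_hop(dst_ip, default_gateway="192.168.1.2"):
    """Decide whether to ARP for the destination itself or for the gateway."""
    dst = ipaddress.ip_address(dst_ip)
    if dst in host_iface.network:
        return dst_ip          # local: ARP for the destination directly
    return default_gateway     # remote: send the frame to the default gateway

print(next_hop("192.168.1.50"))  # local  -> 192.168.1.50
print(next_hop("192.168.2.1"))   # remote -> 192.168.1.2
```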

Analysis:

Netstat is a versatile utility used across different operating systems, including Windows, macOS, and Linux. It lets users view active network connections, monitor traffic, and troubleshoot network issues.

The netstat command generates output showing network status and protocol statistics. You can display the status of TCP and UDP endpoints in table format, routing table information, and interface information.

Netstat offers a wide range of commands that can be utilized to gather specific information. Let’s take a look at some commonly used netstat commands:

    • netstat -a: This command displays all active connections and listening ports on the system, providing valuable insights into network activities.
    • netstat -r: This command lets users view the system’s routing table, which shows the paths that network traffic takes to reach its destination. This information is crucial for network troubleshooting and optimization.
    • netstat -n: This command instructs netstat to display numerical IP addresses and port numbers instead of resolving them to hostnames. It can help identify network performance issues and potential security risks.

Advanced Netstat Techniques

Netstat goes beyond essential network monitoring. Here are a few advanced techniques that can further enhance its utility:

    • netstat -p: This command allows users to view the specific process or application associated with each network connection. Administrators can identify potential bottlenecks and optimize system performance by knowing which processes are utilizing network resources.
    • netstat -s: By using this command, users can access detailed statistics about various network protocols, including TCP, UDP, and ICMP. These statistics provide valuable insights into network utilization and can aid in troubleshooting network-related issues.

IP Forwarding

Before you get into the details, let me cover the basics. A router receives a packet on one of its interfaces and then forwards the packet out of another based on the contents found in the IP header. For example, if the packet were part of a video stream ( Multicast / multi-destination), it would be forwarded to multiple interfaces. If the packet were part of a typical banking transaction ( Unicast ), it would be forwarded to one of its interfaces.

As each routing device forwards the packet hop-by-hop, the packet’s IP header remains relatively unchanged, containing complete instructions for forwarding the packet. However, the data link headers ( the layer directly below ) may change radically at each hop to match the changing media types. 

For example, the router receives a packet on one of its attached Ethernet segments. The router first looks at the packet’s data-link header, which is Ethernet. If the EtherType is set to 0x0800, indicating an IPv4 packet (a unicast MPLS packet has an EtherType value of 0x8847), the Ethernet header is stripped from the packet, and the IP header is examined.

IP forward
Diagram: IP forward. The source is WordPress Site.

The “learning” of network routes

IP packet forwarding has nothing to do with the “learning” of network routes carried out through static or dynamic routing protocols. Instead, IP forwarding has everything to do with routers’ steps when they forward an IP packet from one interface to another.

Let us say we have two host computers and two routers. Host1, on the left, will send an IP packet to Host2, on the right. This IP packet has to be routed by R0 and R1. Host1 is on 192.168.1.0/24, and Host2 is on 192.168.2.0/24.

Let’s start with Host1, which creates an IP packet with its IP address (192.168.1.1) as the source and Host 2 (192.168.2.1) as the destination. So the first question that Host1 will need to determine:

  • Question: Is the destination I am trying to reach either local or remote?

To answer this question, the host compares its own IP address and subnet mask against the destination IP address.

1st Lab Guide: IP Forwarding

Our network has Host1, which is in network 192.168.1.0/24. As a result, all IP addresses in the range 192.168.1.1–192.168.1.254 are local. Our destination, 192.168.2.1 (the remote host), is outside the local subnet, so we must use the default gateway. In our case, the default gateway is the connected router.

Therefore, Host 1 will build an Ethernet frame and enter its source MAC address. Host 1 then has to ask itself another question. What is the destination MAC address of the default gateway, the connected router?

Analysis:

To determine this, the host checks its ARP table. In our case, Host1 has an ARP entry for 192.168.1.2, the default gateway; the entry exists because I ran a quick test ping before taking the screenshot. It is a dynamic entry, not a static one, since I did not manually enter this MAC address. If the host did not have an ARP entry, it would have sent an ARP request.

Note:

Ping uses the ICMP protocol, and IP operates at the network layer (Layer 3). Our IP packet will have a source IP address of 192.168.1.1 and a destination IP address of 192.168.1.2. The next step is to put our IP packet in an Ethernet frame, where we set the source MAC address of Host1 and the destination MAC address of the default gateway.

Now, wait a second. How does Host1 know the MAC address of the default gateway? We know the IP address because we typed it, but the MAC address is unknown at first. Another protocol solves this problem: ARP (Address Resolution Protocol).

Address Resolution Protocol

So, with ARP, I have the IP address, and I want to find out the MAC address as we move down the OSI model’s layers so bits can be forwarded on the wire. At this stage, the Ethernet frame carries an IP packet. Then, the frame will be on its way to the directly connected router.

In the network topology at the start of the lab guide, the Ethernet frame makes it to R0. So, as this is a Layer 3 router and not a Host or Layer 2 switch, it has more work. So the first thing it does is check if the FCS (Frame Check Sequence) of the Ethernet frame is correct or not:

If the FCS is incorrect, the frame is dropped right away. Unfortunately, Ethernet has no error recovery; this is done by protocols on upper layers, like TCP on the transport layer.

Frame Check Sequence

If the FCS is correct, we will process the frame if:

  • The destination MAC address is the address of the router interface.
  • The destination MAC address is the subnet’s broadcast address to which the router interface is connected.
  • The destination MAC address is a multicast address that the router listens to.

Note:

In our case, the destination MAC address matches the MAC address of R0’s GigabitEthernet 0/0 interface, so the router will process the frame. First, we de-encapsulate (extract) the IP packet from the Ethernet frame, which is then discarded:

The router, R0, will now look at the IP packet, and the first thing it does is check that the header checksum is correct. Remember that there is also no error recovery at Layer 3, the network layer; we rely on the upper layers. R0 then checks its routing table to see if there is a match:

Above, you can see that R0 knows how to reach the 192.168.2.0/24 network. The route carries a next-hop IP address, 192.168.10.2, which is the connecting router, R1.

It will now do a second routing table lookup to see if it can reach 192.168.10.2. This is called recursive routing. A recursive route lookup follows the same logic of dividing a task into subtasks of the same type: the device repeats its routing table lookup until it finds the outgoing interface used to reach a particular network.

As you can see above, there is an entry for 192.168.10.0/24 with GigabitEthernet 0/1 as the interface. Once this is done, R0 checks its local ARP table to determine if there is an entry for 192.168.10.2. Similar to the sending Host, if there were no ARP entries, R0 would have to send an ARP request to find the MAC address of 192.168.10.2. 
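
The recursive lookup just described can be sketched as a loop that repeats the routing table search until the result names an outgoing interface rather than another next-hop address. The table below is hypothetical but mirrors the lab topology, and it ignores longest-prefix selection for brevity.

```python
import ipaddress

# Hypothetical table mirroring the lab: value is either
# ("via", next_hop) or ("connected", interface).
TABLE = {
    "192.168.2.0/24": ("via", "192.168.10.2"),
    "192.168.10.0/24": ("connected", "GigabitEthernet0/1"),
}

def resolve(dst_ip):
    """Repeat the lookup until an outgoing interface is found (recursive routing)."""
    while True:
        for prefix, (kind, value) in TABLE.items():
            if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(prefix):
                if kind == "connected":
                    return value   # done: the outgoing interface
                dst_ip = value     # recurse: look up the next hop itself
                break
        else:
            return None            # no route

print(resolve("192.168.2.1"))  # -> GigabitEthernet0/1
```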

The next stage is that R0 builds a new Ethernet frame with its MAC address of the GigabitEthernet 0/1 interface and R1 as the destination. The IP packet is then encapsulated in this new Ethernet frame.

In the routing table, we find this:

Network 192.168.2.0/24 directly connects to R1 on its GigabitEthernet 0/1 interface. R1 will now reduce the TTL of the IP packet from 254 to 253, recalculate the IP header checksum, and check its ARP table to see if it knows how to reach 192.168.2.1; there is an ARP entry. A new Ethernet frame is created, and the IP packet is encapsulated. Host2 then looks at the protocol field to determine which transport layer protocol is in use; what happens next depends on that protocol.

Note: The TTL represents a field in IP packets that helps prevent infinite loops and ensures the proper functioning of routing protocols. As a packet traverses through routers, the TTL value gets decremented, and if it reaches zero, the packet is discarded. This mechanism prevents packets from endlessly circulating in a network, facilitating efficient and reliable data transmission.

Knowledge Check: ICMP ( Internet Control Message Protocol )

What is ICMP?

ICMP is a protocol that operates at the network layer of the Internet Protocol Suite. Its primary purpose is to report and diagnose errors that occur during the transmission of IP packets. ICMP messages are typically generated by network devices, such as routers or hosts, to communicate vital information to other devices in the network.

ICMP Functions and Types

ICMP serves several functions, but its most common use is error reporting. When a network device encounters an issue while transmitting an IP packet, it can use ICMP to send an error message back to the source device. This enables the source device to take appropriate action, such as retransmitting the packet or choosing an alternative route.

ICMP messages come in different types, each serving a specific purpose. Some commonly encountered ICMP message types include Destination Unreachable, Time Exceeded, Echo Request, and Echo Reply. Each type has its unique code and is crucial in maintaining network connectivity.

Internet control message protocol

ICMP and Network Troubleshooting

ICMP is an invaluable tool for network troubleshooting. By providing error reporting and diagnostic capabilities, it helps network administrators identify and resolve network issues more efficiently. ICMP’s ping utility, for example, allows administrators to test a remote host’s reachability and round-trip time. Traceroute, another ICMP-based tool, helps identify the path packets take from the source to the destination.

ICMP and Network Security

While ICMP serves critical networking functions, it is not without its security implications. Certain ICMP message types, such as ICMP Echo Request (ping), can be abused by malicious actors for reconnaissance or denial-of-service attacks. Network administrators often implement security measures, such as ICMP rate limiting or firewall rules, to mitigate potential risks associated with ICMP-based attacks.

Lab Guide: EIGRP and the Variance Command

EIGRP operates at the network layer and uses a distance-vector routing algorithm. It is designed to exchange routing information efficiently, calculate the best paths, and adapt to network changes. Cisco EIGRP offers a range of advanced features that can enhance network performance and scalability. We will explore features such as route redistribution, load balancing, and stub routing. In this guide, I will look at the EIGRP variance feature. Below, I have 4 routers. I have changed the bandwidth and delay. Once changed, the routing table only shows one path to the 4.4.4.4 loopback connected to R4.

EIGRP Configuration

Understanding EIGRP Variance

EIGRP, or Enhanced Interior Gateway Routing Protocol, is a dynamic routing protocol widely used in enterprise networks. By default, EIGRP only allows load balancing across equal-cost paths. However, in scenarios where multiple paths have varying costs, the EIGRP variance command comes to the rescue. It enables load balancing across unequal-cost paths, distributing traffic more efficiently and effectively.

Configuring EIGRP Variance

Configuring the EIGRP variance command is relatively straightforward. First, you need to access the router’s command-line interface. Once there, you can enter the global configuration mode and specify the variance value. The variance value represents the multiplier determining the feasibility condition for unequal-cost load balancing. For instance, a variance of 2 means that a path with a metric up to two times the best path metric is considered feasible.
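
As a rough model of the rule (not the IOS implementation itself), the sketch below applies the two conditions usually described for EIGRP unequal-cost load balancing: a candidate's reported distance must be lower than the best path's metric (the feasibility condition), and its own metric must be no more than variance times the best metric. All figures are hypothetical.

```python
# Hypothetical paths to a destination: (name, metric, reported_distance)
PATHS = [
    ("via R2", 100, 50),   # best path (successor)
    ("via R3", 180, 90),   # feasible backup
    ("via R4", 400, 250),
]

def load_balanced_paths(paths, variance):
    """Return the paths eligible for EIGRP-style unequal-cost load balancing."""
    best_metric = min(metric for _, metric, _ in paths)
    chosen = []
    for name, metric, reported in paths:
        feasible = reported < best_metric            # feasibility condition
        within_variance = metric <= variance * best_metric
        if feasible and within_variance:
            chosen.append(name)
    return chosen

print(load_balanced_paths(PATHS, variance=1))  # ['via R2'] - equal cost only
print(load_balanced_paths(PATHS, variance=2))  # ['via R2', 'via R3']
```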

Impact on Network Performance

The EIGRP variance command significantly impacts network performance. It effectively utilizes network resources and improves overall throughput by allowing load balancing across unequal-cost paths. Additionally, it enhances network resilience by providing alternative paths in case of link failures. However, it’s important to note that deploying the EIGRP variance command requires careful consideration of network topology and link costs to ensure optimal results. 

EIGRP Configuration

Inter-network packet transfer

IP forwarding is used for inter-network packet transfer and not for inter-interface transfers. Therefore, if two interfaces are on the same network, we don’t need to enable IP forwarding.

Let us look at an example of IP forwarding. Consider a server with two physical Ethernet ports, one connected to your internal network and the other to the outside world, such as a DSL modem provides. The system can communicate on either network if you connect and configure those two interfaces. However, packets from one network cannot travel to the other if forwarding is not enabled.

IP forwarding determines the path over which a packet or datagram is sent. The process uses routing information to make decisions and is designed to send a packet across multiple networks. Generally, networks are separated from each other by routers.

IP Forwarding Example
Diagram: IP Forwarding Example.

IP Forwarding Example: Verifying the IP Header

The router then validates the IP header by checking several of its fields. In addition, the router should confirm that the entire packet has been received by checking the IP length against the size of the received Ethernet payload. If these basic checks fail, the packet is deemed malformed and discarded.

Next, the router verifies the TTL field in the IP header and determines that it is greater than 1. The Time-To-Live field ( TTL ) specifies how long the packet should live, and its value is counted in terms of the number of routers the packet ( technically a datagram ) has traversed ( hop count ).

IP Forwarding Example: The TTL values

The source host selects the initial TTL value; 64 is the recommended default. In specific scenarios, other values are set to limit the time, in hops, that the packet should live. The TTL ensures the packet does not circulate forever when there are routing loops. Each router in the path decrements the TTL field by 1 when it forwards the packet out of its interface(s). When the TTL field is decremented to 0, the packet is discarded, and an ICMP (Internet Control Message Protocol) Time Exceeded message is sent back to the host.

 IP Forwarding Example: Checking the Destination

For IP forwarding, the router then looks at the destination IP address, which can be either a destination host ( Unicast ), a group of destination hosts ( multicast ), or all hosts on the segment ( broadcast ). As mentioned previously, the router has what is known as a routing table, which tells it how to forward a packet, and the destination IP address is a crucial component for the routing table lookup.

Forwarding is done on a destination basis: to get to destination X, go via Y (the source routing concept is outside this article’s scope). The contents of the router’s routing table are parsed, and the best-matching entry is returned, indicating whether to forward the packet and, if so, out of which interface and to which next-hop IP router (if any) in the packet’s path.

The CEF process

CEF (Cisco Express Forwarding) is a packet-switching technology in Cisco routers. It is designed to enhance forwarding performance by reducing the overhead associated with Layer 3 forwarding. CEF utilizes a Forwarding Information Base (FIB) and an adjacency table to forward packets quickly to their destinations.

The FIB Table

The FIB maintains a list of all known IP prefixes and their associated next-hop addresses. The adjacency table holds information about the Layer 2 addresses of directly connected neighbors. When a packet arrives at a router, the CEF algorithm consults the FIB to determine the appropriate next-hop address. Then, it uses the adjacency table to forward the packet to the correct interface.

On a Cisco device, the actual moving of the packet from the inbound interface to the outbound interface is carried out by a process known as CEF ( Cisco Express Forwarding ). CEF is a mirror image of the routing table, and any changes in the routing table are reflected in the CEF table. It has a structure different from the routing table, allowing fast lookups.

If you want to parse the routing table, you have to start at the top and work your way down; this can be time-consuming and resource-intensive, especially if your match is the last entry in the routing table. CEF structures let you search on the bit boundary and optimize the routing process with the adjacency table.
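
One simplified way to picture that bit-boundary search is a binary trie keyed on address bits: a lookup walks at most 32 nodes for IPv4 instead of scanning the whole table, remembering the last prefix boundary it passed. This is a conceptual sketch, not Cisco's actual data structure.

```python
import ipaddress

class TrieNode:
    def __init__(self):
        self.children = {}   # bit ("0"/"1") -> TrieNode
        self.route = None    # route info stored at prefix boundaries

def insert(root, prefix, route):
    """Store a route at the node reached by the prefix's bits."""
    net = ipaddress.ip_network(prefix)
    bits = format(int(net.network_address), "032b")[: net.prefixlen]
    node = root
    for b in bits:
        node = node.children.setdefault(b, TrieNode())
    node.route = route

def lookup(root, dst_ip):
    """Walk the trie bit by bit, remembering the last (longest) match."""
    bits = format(int(ipaddress.ip_address(dst_ip)), "032b")
    node, best = root, root.route
    for b in bits:
        node = node.children.get(b)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best

fib = TrieNode()
insert(fib, "0.0.0.0/0", "default via 203.0.113.1")
insert(fib, "192.168.2.0/24", "via 192.168.10.2 / Gi0/1")
print(lookup(fib, "192.168.2.1"))  # -> via 192.168.10.2 / Gi0/1
```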

IP forwarding in router
Diagram: IP forwarding in router. Source is IPwithease

2nd Lab Guide: CEF Operations

In this lab guide, I will address CEF operations. There are different switching methods for forwarding IP packets, listed below; remember that the default is CEF.

  • Process switching:
    • The CPU examines all packets, and all forwarding decisions are made in software…very slow!
  • Fast switching (also known as route caching):
    • The CPU examines the first packet in a flow; the forwarding decision is cached in hardware for the next packets in the same flow. This is a faster method.
  • (CEF) Cisco Express Forwarding (also known as topology-based switching):
    • Forwarding table created in hardware beforehand. All packets will be switched using hardware. This is the fastest method, but it has some limitations. Multilayer switches and routers use CEF.

For example, a multilayer switch will use the information from tables built by the (control plane) to build hardware tables. It will use the routing table to construct the FIB (Forwarding Information Base) and the ARP table to create the adjacency table. This is the fastest switching method because we now have all the layer two and three information required to forward IP packets in hardware.

Below are three routers; I am using static routing for complete reachability. On R1, let’s look at the routing and CEF tables, and take a detailed look at the entry for network 3.3.3.0/24:

CEF operations

Note:

How Cisco Express Forwarding Works

CEF builds and maintains two key data structures: the FIB and Adjacency Table. The FIB contains information about the best path to forward packets, while the Adjacency Table stores Layer 2 next-hop addresses. These data structures enable CEF to make forwarding decisions quickly and efficiently, resulting in faster packet processing.

Advanced Features and Customization Options

Cisco Express Forwarding offers various advanced features and customization options to meet specific networking requirements. These include NetFlow, which provides detailed traffic analysis, and load-balancing algorithms that allow intelligent traffic distribution. Administrators can tailor CEF to optimize network performance and adapt to changing traffic patterns.

Understanding the MTU

Suppose a router receives a unicast packet too large to be sent out in one piece, as its length exceeds the outgoing interface’s Maximum Transmission Unit ( MTU ). In that case, the router attempts to split the packet into several smaller pieces called fragments. The difference between IPv4 and IPv6 fragmentation is considerable. Slicing packets into smaller packets adversely affects performance and should be avoided.

One way to avoid fragmentation is to have the same MTU on all links and have the hosts send packets that fit within it. However, this may not be possible given the variety of media and administrative domains a packet may traverse from source to destination. Path MTU discovery (PMTUD) is the mechanism used to determine the maximum MTU on the path between two end nodes (source and destination).

IPv6 fragmentation example
Diagram: IPv6 fragmentation example

PMTUD

PMTUD dynamically determines the lowest MTU of each link between the end nodes. A host sends an initial packet ( datagram ) with the size of an MTU for that interface, with the DF ( don’t fragment ) bit set.

Any router in the path with a lower MTU discards the packet and returns an ICMP Destination Unreachable message (Type 3, Code 4: fragmentation needed and DF set) to the source. This message is also known as a “packet too big” message. The sender estimates a new size for the packet, and the process continues until the PMTU is found.
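
A toy model of that discovery loop, with hypothetical per-hop MTUs: the sender keeps shrinking its probe each time a simulated router reports that the packet is too big, until the probe fits every link. In real PMTUD, the new size comes from the MTU value carried in the ICMP message.

```python
# Hypothetical MTUs along the path, hop by hop.
PATH_MTUS = [1500, 1400, 1300, 1500]

def send_probe(size):
    """Simulate a DF-set probe: return the MTU of the first hop that is too small."""
    for mtu in PATH_MTUS:
        if size > mtu:
            return mtu  # the ICMP message advertises the next-hop MTU
    return None         # the probe fit the whole path

def discover_pmtu(initial=1500):
    size = initial
    while True:
        too_big = send_probe(size)
        if too_big is None:
            return size
        size = too_big   # retry with the advertised smaller MTU

print(discover_pmtu())  # -> 1300
```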

IP Forwarding Example: Modifying the IP Forwarding

The basics of IP forwarding can be modified in several ways, resulting in data packets taking different paths through the network, some of which are triggered by routing convergence. In previous examples, we discussed routers consulting their routing tables to determine the next hop and single exit interface to send a packet to its destination – “destination-based forwarding.” However, a router may have multiple paths (exit interfaces) to reach a destination.

These paths can then spread traffic to a destination prefix across alternative links, called multipath routing or load balancing, resulting in more bandwidth available for traffic to that destination.

In a Layer 3 environment, links with the same cost are considered for equal-cost multipathing, and traffic can be load balanced across those links. You can, however, use unequal-cost links (links with different costs) for multipathing, but this must be supported by the routing protocol, e.g., EIGRP.

Whether equal- or unequal-cost multipathing is used, when multiple paths to a destination prefix exist, the routing table lookup returns multiple next hops. With virtual overlay networks, the underlay and overlay concepts add a further level of abstraction; these are commonly seen in the Layer 3 data center.

IP Forwarding Example: TCP performance

Generally, routers want to guarantee that packets belonging to a given TCP connection always travel the same path. Reordering TCP packets reduces TCP performance and, if reassembly is handled in software, increases CPU cycles. For this reason, routers use a hash function over connection identifiers (such as the source and destination IP addresses) to choose among the multiple next hops. A TCP connection is identified by a 5-tuple, a set of five values that define a TCP/IP connection.

The OSI model

It includes the source IP address/port, destination IP address/port, and the protocol in use. A router can load balance on any of these. In addition, more recent technologies that enable Layer 2 multipathing (ECMP at Layer 2), such as TRILL and Cisco FabricPath, allow you to build massive data center topologies. These data centers often operate under a spine-leaf architecture.
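
A minimal sketch of flow-consistent path selection: hash the 5-tuple and use the result to pick one of the available next hops, so every packet of a given connection takes the same path. The hashing scheme here is illustrative, not any vendor's actual algorithm.

```python
import hashlib

NEXT_HOPS = ["192.168.10.2", "192.168.20.2"]  # hypothetical equal-cost paths

def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port):
    """Hash the 5-tuple so all packets of one flow use the same next hop."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return NEXT_HOPS[digest[0] % len(NEXT_HOPS)]

# Every packet of this connection maps to the same path:
print(pick_next_hop("192.168.1.1", "192.168.2.1", "tcp", 51000, 443))
```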

IP options

An application can also modify the handling of its packets by extending the IP header with one or more IP options. IP options are generally used to aid in statistics collection (route record and timestamp) rather than to influence path determination, as they incur a performance hit; Internet routers are optimized for packet forwarding without additional options.

Security devices or filters implemented on routers generally block strict-source and loose-source routes that can be used to control the path packets take. The router then prepends the appropriate data-link header for its outgoing interface. The ARP process then resolves the next-hop IP to the data-link address ( MAC address ), and the router sends the packet to the next hop, where the process is repeated.

A keynote: Ethernet frames have an L2 identifier known as a MAC address, which has 6 bytes for the destination address and 6 bytes for the source address.

The ARP Process

The ARP process is straightforward, translating the IP address ( L3 ) into the associated MAC addresses ( L2 ). Consider the communication between two hosts on an Ethernet segment: Host 1 has an IP address of 10.10.10.1, and Host 2 has an IP address of 10.10.10.2.

For these hosts to communicate, they must build frames at L2 with source and destination hardware MAC addresses. Host 1 opens a web browser and tries to connect to a service on host 2, which has a destination of 10.10.10.2. Host 1 uses ARP to map the IP address to the MAC address of the destination host.

  • Host 1 sends a broadcast ARP request on the Ethernet LAN segment, which contains the IP address of the destination host ( 10.10.10.2 )

  • As this message is a broadcast, all the hosts on the segment receive the ARP broadcast request and examine the IP field of the request

  • Host 2 identifies its IP in the request and sends an ARP response containing its MAC address. The ARP response is a unicast (a single network destination identified by a unique address) to the host that generated the request

  • Both hosts now cache the results of the ARP request and response in a table known as the ARP table
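
The exchange above can be modeled as a small cache in front of a resolver: consult the table first, fall back to a broadcast request only on a miss, and store the reply with a timestamp so entries age out. The addresses and the stand-in for the broadcast are hypothetical.

```python
import time

# Hypothetical segment: who answers which IP (stands in for the broadcast/reply).
SEGMENT = {"10.10.10.2": "aa:bb:cc:dd:ee:02"}

ARP_TABLE = {}           # ip -> (mac, learned_at)
ARP_TIMEOUT = 4 * 3600   # entries age out, e.g. after 4 hours

def resolve(ip):
    entry = ARP_TABLE.get(ip)
    if entry and time.time() - entry[1] < ARP_TIMEOUT:
        return entry[0]                     # cache hit: no broadcast needed
    mac = SEGMENT.get(ip)                   # stands in for ARP request/response
    if mac is not None:
        ARP_TABLE[ip] = (mac, time.time())  # cache the unicast reply
    return mac

print(resolve("10.10.10.2"))  # miss: "broadcast", then cached
print(resolve("10.10.10.2"))  # served from the ARP table
```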

The ARP Table

The ARP table optimizes communication between directly connected devices. Upon receiving an ARP response, devices keep the IP-to-MAC address mapping for some time, usually up to 4 hours. This means a router does not need to send an ARP request for any IP address previously learned. ARP tables may also be updated by what is known as a gratuitous ARP: an ARP request that a host broadcasts for its own IP address so that its neighbors update their ARP tables.

An example is when a VM is moved from one ESX host to another: so that the other devices know it has moved, it sends a gratuitous ARP, which updates the router’s ARP table. Because ARP operates so simply and without authentication, Layer 2 attacks can exploit it.

ARP Security Concerns

One of ARP’s leading security drawbacks is that it does not provide any control that proves that a particular MAC address corresponds to a given IP address. An attacker can exploit this by sending a forged ARP reply with its MAC address and the IP address of a default gateway.

When victims update their ARP table with this new entry, they send packets to the attacker’s host instead of the intended gateway. The attacker can then monitor all traffic destined for the default gateway. This is known as ARP spoofing.

While IP forwarding is a fundamental network connectivity component, it also introduces potential security risks. Unauthorized access to an IP forwarding-enabled device can result in traffic interception, redirection, or even denial of service attacks. Therefore, it is crucial to implement proper security measures, such as access control lists (ACLs) and firewalls, to protect against potential threats.




Key IP Forwarding Summary Points:

Main Checklist Points To Consider

  • Forwarding IP packets is based on the contents found in the IP header.

  • Packets will typically exit one interface. However, in the case of multicast traffic, packets can exit many interfaces.

  •  Routers perform several verifications, such as the TTL field in the IP header.

  • The CEF process moves the packets from one interface to the other.

  • Avoid MTU issues with PMTUD.

3rd Lab Guide: EIGRP Authentication

Routing Protocol Security 

EIGRP authentication is a feature that adds an extra layer of security to network communication. It allows routers to validate the authenticity of the neighboring routers before exchanging routing information. By implementing EIGRP authentication, you can prevent unauthorized devices from participating in the routing process and ensure the integrity of your network.

Note:

Two main components are required to implement EIGRP authentication: a key chain and an authentication algorithm. The key chain contains one or more keys, each with a unique key ID and a corresponding key string. The authentication algorithm, such as MD5 or SHA, generates a message digest based on the key string.
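
Conceptually, the digest works like the keyed-hash sketch below: the sender hashes the packet contents together with the shared key string, and the receiver recomputes and compares. This illustrates the idea only; EIGRP's actual packet format and digest construction differ.

```python
import hashlib
import hmac

SHARED_KEY = b"my-keychain-secret"  # hypothetical key string from the key chain

def sign(packet: bytes) -> str:
    """Keyed MD5 digest over the packet (HMAC-MD5 shown for illustration)."""
    return hmac.new(SHARED_KEY, packet, hashlib.md5).hexdigest()

def verify(packet: bytes, digest: str) -> bool:
    """Recompute the digest and compare in constant time."""
    return hmac.compare_digest(sign(packet), digest)

update = b"EIGRP update: 192.168.2.0/24 metric 3072"
tag = sign(update)
print(verify(update, tag))                 # True: the neighbor accepts it
print(verify(update + b" tampered", tag))  # False: the update is ignored
```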

EIGRP Authentication

How does authentication benefit us?

  • Every routing update packet you receive from your router will be authenticated.
  • False routing updates from unapproved sources can be prevented.
  • Malicious routing updates should be ignored.

A potential hacker could attempt the following things sitting on your network with a laptop:

  • Advertise junk routes in your neighbor adjacency.
  • Test whether you can drop the neighbor adjacency of one of your authorized routers by sending malicious packets.

EIGRP keychain

Analysis:

To ensure the success of EIGRP authentication, it is crucial to follow some best practices. These include regularly changing passwords, using complex and unique shared secrets, enabling encryption for password transmission, and monitoring authentication logs for suspicious activity.

Conclusion: EIGRP authentication is an effective means of securing network communication within an enterprise environment. By implementing authentication mechanisms such as simple password authentication or message digest authentication, network administrators can mitigate the risk of unauthorized access and malicious routing information. Remember to carefully configure and manage authentication settings to maintain a robust and secure network infrastructure.

Route Summarization

Route summarization involves creating one summary route that represents multiple networks/subnets. It is also known as route aggregation or supernetting.

There are several advantages to summarizing:

  • Reduces memory requirements by shrinking routing tables.
  • We save bandwidth by advertising fewer routes.
  • Processing fewer packets and maintaining smaller routing tables saves CPU cycles.
  • A flapping network behind a summary is hidden from the rest of the network, so it cannot destabilize other routers’ routing tables.

Summarizing has some disadvantages as well:

  • A router will drop traffic for networks without a matching destination in its routing table. A summary route, however, may cover networks that are not actually in use, so routers that receive the summary forward traffic for those unused networks to the advertising router, which then drops it.
  • The router prefers the path with the longest prefix match. When you use summaries, your router may choose another path where it has learned a more specific network. There is also a single metric for the summary route.
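
For a quick feel of what the summary computation does, Python's ipaddress module can collapse contiguous subnets into their covering supernet; the four /24 networks below are hypothetical.

```python
import ipaddress

# Hypothetical networks advertised by a router
subnets = [
    ipaddress.ip_network("172.16.0.0/24"),
    ipaddress.ip_network("172.16.1.0/24"),
    ipaddress.ip_network("172.16.2.0/24"),
    ipaddress.ip_network("172.16.3.0/24"),
]

# Collapse contiguous prefixes into the smallest covering set
summary = list(ipaddress.collapse_addresses(subnets))
print(summary)  # [IPv4Network('172.16.0.0/22')] - one route instead of four
```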

RIP Routing Protocol

R1 advertises four different networks, and R2 receives them. The more information we advertise, the more bandwidth we require and the more CPU cycles we need for processing. Of course, four networks on a Gigabit interface are no problem, but in more extensive networks, hundreds or thousands of networks are advertised.

RIP Configuration

When a summary address is configured, RIP creates a summary entry in its routing database. The summary entry remains in the database as long as at least one child route exists; when the last child route is removed, the summary entry is removed as well. Because each child route is not listed individually, this method reduces the number of entries in the database.

For RIP Version 2 route summarization, the lowest metric among the aggregated child routes must be advertised. The best metric for an aggregated summary route should be calculated at route initialization, or whenever metric modifications are made to specific routes, rather than at the time the aggregated route is advertised.

RIP configuration

Administrative Distance

Suppose a network runs two routing protocols at once, OSPF and EIGRP. R1 receives information from both routing protocols.

  • According to EIGRP, the router should send IP packets using the top path.
  • According to OSPF, the router should use the bottom path.

Which routing information are we going to use: OSPF or EIGRP?

When two routing protocols provide information about the same destination network, we must make a choice; you can’t go left and right at the same time. This is where AD, or administrative distance, comes in.

The lower the administrative distance, the more the source is trusted. Directly connected routes have an AD of 0, which makes sense: nothing beats a network attached directly to the router. Static routes, because they are configured manually, have a very low administrative distance of 1. You can sometimes override a routing protocol's decision with a static route.

Since EIGRP is a Cisco routing protocol, its administrative distance is 90. RIP has 120, while OSPF has 110. Because EIGRP’s AD of 90 is lower than OSPF’s 110, we will use the information EIGRP tells us in the routing table.
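A common way to see AD in action is a floating static route: its AD is deliberately set higher than the dynamic protocol's, so it enters the routing table only if the dynamic route disappears (addresses illustrative):

    ! EIGRP supplies 10.10.0.0/16 with AD 90; this static has AD 200 and
    ! stays out of the routing table until the EIGRP route is withdrawn.
    ip route 10.10.0.0 255.255.0.0 192.168.1.2 200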

Closing Points: IP Forwarding

IP forwarding is crucial in ensuring efficient and reliable data transmission in networking. As a fundamental component of network routing, IP forwarding enables packets of data to be forwarded from one network interface to another, ultimately reaching their intended destination. In this blog post, we delved deeper into IP forwarding, its significance, and its implementation in modern network infrastructures.

IP forwarding, or packet forwarding, refers to directing data packets from one network interface to another based on their destination IP addresses. It is the backbone of network routing, enabling seamless data flow across different networks. IP forwarding is typically performed by routers, specialized network devices designed to route packets between networks efficiently.

IP forwarding is essential in enabling effective communication between devices on different networks. By forwarding packets based on their destination IP addresses, IP forwarding allows data to traverse multiple networks, reaching its intended recipient. This capability is particularly critical in large-scale networks, such as the Internet, where data needs to travel through numerous routers before reaching its final destination.

Implementation of IP Forwarding:

To implement IP forwarding, routers use routing tables to determine the best path for each packet. These routing tables contain a list of network destinations and associated next-hop router addresses. When a router receives a packet, it examines the destination IP address and consults its routing table to determine the appropriate next-hop router for forwarding the packet. This process repeats at every hop until the packet reaches its final destination.
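A small illustration of the lookup, with hypothetical addresses; the most specific matching entry always wins:

    Routing table (simplified):
      10.0.0.0/8       next hop 192.168.1.2
      10.20.0.0/16     next hop 192.168.1.3
      0.0.0.0/0        next hop 192.168.1.1
    A packet to 10.20.5.7 matches all three entries; the longest
    prefix, 10.20.0.0/16, wins, so the packet is sent to 192.168.1.3.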

IP forwarding relies on routing protocols, such as Border Gateway Protocol (BGP) or Open Shortest Path First (OSPF), to exchange routing information between routers. These protocols enable routers to dynamically update their routing tables based on changing network conditions, ensuring packets are forwarded along the most optimal paths.

Summary: IP Forwarding

Section 1: Understanding IP Forwarding

IP forwarding is the fundamental process in which routers and network devices direct incoming packets to their destination. It acts as a traffic cop, intelligently routing data based on network IP addresses. IP forwarding ensures efficient and reliable data transmission by analyzing packet headers and leveraging routing tables.

Section 2: The Benefits of IP Forwarding

2.1 Enhanced Network Efficiency:

IP forwarding optimizes network efficiency by selecting the most efficient path for data transmission. It dynamically adapts to network changes, rerouting packets to avoid congestion and bottlenecks. This results in faster data transfers, reduced latency, and improved overall network performance.

2.2 Scalability and Flexibility:

With IP forwarding, networks can scale effortlessly. It allows for the creation of complex network topologies and the seamless integration of various devices and technologies. Whether connecting local networks or bridging geographically dispersed networks, IP forwarding provides the flexibility needed for modern network infrastructures.

2.3 Secure Communication:

IP forwarding plays a crucial role in securing communication. Incorporating advanced routing protocols and encryption mechanisms ensures the confidentiality and integrity of transmitted data. With IP forwarding, organizations can establish secure virtual private networks (VPNs) and safeguard sensitive information from prying eyes.

Section 3: IP Forwarding in Action

3.1 Enterprise Networks:

IP forwarding is the cornerstone of interconnecting multiple branches, data centers, and remote offices in large-scale enterprise networks. It enables seamless communication and data exchange, facilitating collaboration and enhancing productivity across the organization.

3.2 Internet Service Providers (ISPs):

For ISPs, IP forwarding is the lifeblood of internet connectivity. It allows them to efficiently route traffic between their networks and other ISPs, ensuring uninterrupted internet access for end-users. IP forwarding enables the global interconnectivity we rely on for browsing the web, streaming media, and accessing cloud services.

Conclusion:

In conclusion, IP forwarding is not merely a technical process but a catalyst for seamless connectivity. Its ability to efficiently route data, enhance network performance, and ensure secure communication makes it an indispensable component of modern networking. Whether you’re a network engineer, an IT professional, or a curious technology enthusiast, understanding IP forwarding empowers you to unlock the true potential of interconnected networks.

IP Forwarding Example

Forwarding Routing Protocols

Forwarding Routing Protocols

Forwarding routing protocols are crucial for computer networks, enabling efficient data transmission and device communication. This blog post will explore forwarding routing protocols, their significance, and some well-known examples.

Forwarding routing protocols, or routing algorithms, determine the paths data packets take in a network. These protocols are vital in delivering information from a source to a destination device. They ensure data packets are transmitted along the most efficient paths, minimizing delays and optimizing network performance.

Forwarding routing protocols are essential components of network communication. They determine the best path for data packets to travel from source to destination, taking into consideration factors such as network congestion, link reliability, and available bandwidth. By efficiently directing traffic, forwarding routing protocols enhance network performance and ensure reliable data transmission.

There are several types of forwarding routing protocols, each with its own characteristics and use cases. This section will explore some of the most common ones, including:

  • Distance Vector Routing Protocols
  • Link State Routing Protocols
  • Hybrid Routing Protocols

Choosing the right forwarding routing protocol for a specific network environment requires careful consideration of various factors.

What is IP routing? To answer this question, we must first understand the protocols routers use to forward messages. Forwarding routing protocols are networking protocols that facilitate communication between different network nodes.

They are responsible for finding the optimal path for data to travel from one node to another and managing and maintaining routing tables containing information about the available paths for various destinations.

Highlights: Forwarding Routing Protocols

Routing Protocols

Protocol Algorithms:

Forwarding routing protocols are algorithms and protocols used by routers to determine the best path for routing data packets from source to destination. These protocols use routing tables and various metrics to make intelligent decisions about packet forwarding. They facilitate efficient data transmission by dynamically updating and maintaining routing information.

Forwarding routing protocols are vital for maintaining a reliable and efficient network infrastructure. They enable routers to make intelligent routing decisions, adapt to network topology changes, and handle traffic load balancing. Without forwarding routing protocols, networks would be inefficient, prone to congestion, and lack fault tolerance.

Example: Understanding RIP

RIP, or Routing Information Protocol, is one of the oldest distance-vector routing protocols that is still in use today. Initially developed for smaller networks, RIP operates by sharing routing information between neighboring routers. It uses the hop count as its metric, representing the number of routers a packet must traverse to reach its destination.

RIP employs a simple approach to routing. Each router periodically broadcasts its routing table to its neighboring routers, which, in turn, update their tables accordingly. This process ensures that every router within the network has up-to-date knowledge of the available routes.
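A minimal IOS sketch of enabling RIP, assuming an illustrative 192.168.12.0/24 segment:

    router rip
     version 2               ! RIPv2 multicasts updates to 224.0.0.9
     network 192.168.12.0    ! enable RIP on interfaces in this network
     no auto-summary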

RIP Routing Protocol

Starting points: Networking Protocols

Networking protocols facilitate communication between computer systems. Three main protocols dominate today: Ethernet, TCP/IP, and Fibre Channel. Ethernet connects networking devices over cables, TCP/IP provides end-to-end communication across both wired and wireless networks, and Fibre Channel transfers large amounts of data between computers, typically in storage networks.

Routing, forwarding, and switching are network terms used when data is sent from one party to another, and each plays a crucial role in data delivery. Routing selects the path data takes between networks. Forwarding takes data arriving on one interface and sends it out of the correct interface toward the next device. Switching moves frames between ports based on their MAC addresses.

Moving data between devices

Moving data between devices is known as routing, and it is usually performed by networking devices called routers. Routers forward traffic toward other networks, help create and manage networks, and move data from one device to another within them; in some cases, they also transmit data across different networks. Routing is done at the network layer of the OSI model, where the optimal or shortest path from sender to receiver is chosen using routing algorithms.

OSI Model and testing

The forwarding process takes a packet that arrives on one interface and sends it out of the correct interface toward its destination. Unlike routing, forwarding does not decide the path; it applies decisions the routing process has already made and simply passes packets on to the next attached network. The network layer performs both routing and forwarding. Forwarding devices collect data and send it to another device; switches, routers, and hubs are standard forwarding devices.

Forwarding Methods

Let's discuss some popular forwarding methods in networking; the sketch after this paragraph shows each one as a static route. In the next-hop method, packets are sent from the router to the next gateway in the direction of the destination. In the network-specific method, the routing table holds one entry per destination network rather than per host. A routing table is a set of rules, often displayed as a table, that determines where data packets will be directed over an Internet Protocol (IP) network; routers and switches, as well as all IP-enabled devices, use routing tables. Lastly, in the host-specific method, the routing table contains an entry for each destination host in the destination network.
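These methods map naturally onto static route entries; a hedged IOS sketch with illustrative addresses:

    ! Default (next-hop) method: anything unknown goes to one gateway
    ip route 0.0.0.0 0.0.0.0 192.168.1.1
    ! Network-specific method: one entry covers an entire destination network
    ip route 10.20.0.0 255.255.0.0 192.168.1.2
    ! Host-specific method: a /32 entry for a single destination host
    ip route 10.20.5.7 255.255.255.255 192.168.1.3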

The Role of Switching

Switching collects data on one port and sends it out of another port toward the destination. There are two types of switching: connectionless and connection-oriented. Connectionless switching requires no handshake to establish a connection; a forwarding table determines how packets received on a port are sent. Connection-oriented switching, by contrast, uses a predefined circuit between the sender and receiver, identified by an intermediate node ID.

Switching techniques can be divided into circuit, message, and packet switching. Circuit switching requires establishing a circuit before sending data. The received data is treated as a message when message switching is used and sent to the intermediate switching device. Packet switching breaks the data into small chunks called packets. Each packet is transmitted independently.

router on a stick

IP Routing

Routing is the process of moving IP packets from one network to another. IP routing protocols or static configuration allow routers to learn about nonattached networks. When a network topology change occurs, dynamic IP routing protocols update the network topology without intervention. Depending on the size of the network, IP routing may be limited to static routes due to design or hardware limitations.

Static routes are not accommodating when the topology changes and can be burdensome for network engineers. An IP packet is forwarded to its destination IP address with the help of a router that selects a loop-free path through a network.

Autonomous systems are networks of interconnected routers and related systems managed by a common network administrator. A global network of autonomous systems makes up the Internet.

Rules and Algorithms

Forwarding routing protocols are rules and algorithms determining the best path for data packets to follow within a network. They facilitate the exchange of routing information between routers and ensure that information is forwarded most efficiently. These protocols direct data packets from the source device to the correct destination device, providing reliable and timely delivery.

Example: EIGRP DUAL

DUAL, an abbreviation for Diffusing Update Algorithm, is the decision-making process EIGRP routers use to calculate the best path to reach a destination. It ensures loop-free and efficient routing within a network. To comprehend DUAL, we must explore its key components: the feasible distance (FD), reported distance (RD), and successor and feasible successor routes.

Feasible Distance (FD) is the metric for the best-known path to a destination. It represents the cumulative cost of all the links on that path. Reported Distance (RD) is the metric for a neighbor’s path to the same destination. These two values play a vital role in DUAL’s decision-making process. Successor routes are the best paths chosen by DUAL to reach a destination. A router selects the path with the lowest FD as its successor route.

Feasible Successor routes, on the other hand, are backup paths that are provably loop-free even though their metric is higher: a neighbor qualifies only if its reported distance is lower than the current feasible distance (the feasibility condition). These routes are pre-calculated and provide fast convergence if the successor route fails. Network convergence refers to the time it takes routers to update their routing tables after a change occurs in the network topology, and DUAL minimizes that time by maintaining both successor and feasible successor routes.
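A small numeric illustration, with hypothetical metric values, of how DUAL applies these terms:

    Destination 10.0.0.0/24, learned via two neighbors:
      via R2: RD = 256000, total FD via R2 = 512000  -> successor (lowest FD)
      via R3: RD = 384000, total FD via R3 = 768000
    Feasibility check: RD via R3 (384000) < current FD (512000),
    so R3 is a loop-free feasible successor, ready if R2 fails.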

EIGRP Neighbor and DUAL

Common Forwarding Routing Protocols

Two of the most commonly used forwarding routing protocols are Open Shortest Path First (OSPF) and Border Gateway Protocol (BGP). OSPF is an interior gateway protocol (IGP) used within autonomous systems and networks managed by a single administrative entity. It uses a link-state algorithm to determine the best route for data to travel. Conversely, BGP is an exterior gateway protocol (EGP) used to connect autonomous systems. It uses a path vector algorithm to determine the best route for data to travel.

Both protocols are essential for routing data across networks, and each has its strengths. OSPF converges quickly and scales well within a single administrative domain, while BGP is built for policy control and stability between domains. Both are required to move data across networks efficiently.

Understanding MPLS Forwarding

MPLS (Multiprotocol Label Switching) forwarding is a technique routers use to efficiently direct data packets through a network. Unlike traditional IP routing, MPLS forwarding employs labels to identify and forward packets along pre-determined paths quickly. By using labels, MPLS forwarding eliminates the need for complex IP lookups, resulting in faster and more streamlined data transmission.

Enhanced Performance: MPLS forwarding improves network performance by reducing latency and packet loss. Labels enable routers to make forwarding decisions more swiftly, resulting in lower transmission delays and increased overall efficiency.

Traffic Engineering: MPLS forwarding allows network administrators to engineer traffic paths based on specific requirements. By defining explicit paths for different types of traffic, MPLS enables better control over bandwidth utilization, ensuring that critical applications receive the necessary resources.

Quality of Service (QoS): MPLS forwarding enables the implementation of QoS policies, prioritizing certain types of traffic over others. This ensures that higher-priority applications, such as voice or video, receive the necessary bandwidth and experience minimal latency.

Understanding the Basics of LDP

LDP, or Label Distribution Protocol, is a signaling protocol that operates at the network layer. Its primary function is establishing and maintaining Label Switched Paths (LSPs) in a Multiprotocol Label Switching (MPLS) network. LDP enables routers to forward traffic along predetermined paths by assigning labels to network packets.

LDP Operation and Label Distribution

To comprehend LDP fully, it’s essential to understand how it operates and distributes labels. LDP uses a discovery mechanism to identify neighboring routers and establish peer relationships. Once peers are established, the protocol exchanges label mapping information, allowing routers to build forwarding tables and determine the appropriate paths for incoming packets.
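On IOS, LDP is typically enabled per interface once a router ID is pinned to a loopback; a minimal sketch (interface name illustrative):

    mpls ldp router-id Loopback0 force
    !
    interface GigabitEthernet0/2
     mpls ip    ! enables MPLS forwarding and LDP label exchange on this link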

Advanced Topic 

Understanding BGP Next Hop Tracking:

BGP Next Hop Tracking is a mechanism that allows a router to track the reachability of the next hop IP address in the routing table. Essentially, it enables routers to dynamically adjust their routing decisions based on the availability of the next hop. By monitoring the reachability of the next hop, BGP Next Hop Tracking enhances network stability and enables faster convergence during link failures or network changes.

Implementing BGP Next Hop Tracking requires proper configuration on BGP-enabled routers. The specific steps may vary depending on the network equipment and software being used. Generally, it involves enabling the Next Hop Tracking feature and specifying the desired parameters, such as the interval for tracking next hop reachability and the action to be taken upon reachability changes.
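On IOS, next-hop tracking is enabled by default; a hedged sketch of tuning it (AS number and delay value illustrative):

    router bgp 65000
     address-family ipv4
      bgp nexthop trigger enable   ! event-driven next-hop tracking
      bgp nexthop trigger delay 5  ! dampen reactions to next-hop churn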

BGP Next Hop Tracking is useful in various network architectures. It is commonly used in multi-homed networks, where multiple ISPs are connected to a single network. By tracking the reachability of the next hops, routers can intelligently select the best path for sending traffic and avoid blackholing or suboptimal routing. Additionally, BGP Next Hop Tracking is beneficial when network policies require specific routing decisions based on next-hop reachability.

Understanding BGP Route Reflection

BGP route reflection is a technique to reduce the number of full-mesh peerings required in a BGP network. Network administrators can simplify their network topology and improve scalability by introducing route reflectors. Route reflectors act as centralized points for route distribution, allowing BGP speakers to establish fewer connections while still effectively exchanging routing information.

In a route reflection setup, route reflectors receive BGP updates from their clients and reflect those updates to other clients. The reflection process involves modifying the BGP attributes to preserve the path information while avoiding routing loops. This enables efficient propagation of routing information across the network while reducing the computational overhead associated with maintaining a full mesh of BGP peers.
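A minimal IOS sketch of a route reflector with two clients (addresses and AS number illustrative):

    router bgp 65000
     neighbor 10.0.0.2 remote-as 65000
     neighbor 10.0.0.2 route-reflector-client
     neighbor 10.0.0.3 remote-as 65000
     neighbor 10.0.0.3 route-reflector-client

Clients peer only with the reflector, so the iBGP full mesh shrinks from n(n-1)/2 sessions to one session per client.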

Forwarding Protocols

Key Forwarding Routing Protocols Design Discussion Points:


  • Introduction to forwarding routing protocols and what is involved.

  • Highlighting the details of the TCP/IP suite.

  • Technical details on the packet and the datagram. 

  • Scenario: Routing tables and forwarding.

  • Details on routing convergence and path selection.

Related: Before you proceed, you may find the following posts helpful:

  1. IP Forwarding
  2. Routing Convergence
  3. OpenFlow Protocol
  4. IPsec Fault Tolerance
  5. BGP SDN
  6. ICMPv6
  7. SDN Router
  8. Segment Routing
  9. Routing Control Platform
  10. Computer Networking

Back to Basics: Forwarding Routing Protocols

Switching and Routing

Before we get into the technical details of which protocols routers use to forward messages, let us address the basics. We know we have Layer 2 switches that create Ethernet LANs, so all endpoints physically connect to a Layer 2 switch. If you are on a single LAN with one large VLAN, this setup is all you need: switches work out of the box, making forwarding decisions based on Layer 2 MAC addresses. However, what if you want to send data from your network to another, across the Internet, or between VLANs in different IP subnets?

Routers and Switches

In this case, we need a Layer 3 router and an IP routing process with an IP forwarding algorithm. So, which protocols do routers use to forward messages? The Layer 3 router uses the information in the IP header to determine whether and where to forward each received packet and which network interface to send it out of.

Examples: Forwarding Routing Protocols

One of the most commonly used forwarding routing protocols is the Routing Information Protocol (RIP). RIP is a distance-vector protocol that uses a metric, typically hop count, to determine the best path for data packets. It exchanges routing information with neighboring routers and updates its routing table accordingly. RIP is suitable for small to medium-sized networks due to its simplicity and ease of configuration.

Another widely used forwarding routing protocol is the Open Shortest Path First (OSPF) protocol. OSPF is a link-state protocol that calculates the shortest path to a destination based on various factors, such as bandwidth, delay, reliability, and cost. It advertises link-state information to neighboring routers, allowing them to build a complete topology of the network. OSPF is commonly implemented in large-scale networks due to its scalability and advanced features.

Border Gateway Protocol (BGP) is a forwarding routing protocol commonly used in internet service provider (ISP) networks. BGP is an exterior gateway protocol that facilitates the exchange of routing information between different autonomous systems (ASes). It enables ISPs to select the best path for data packets based on various policies, such as path length, network congestion, and customer preferences. BGP is crucial for maintaining a stable and efficient internet routing infrastructure.

1st Lab Guide: OSPF

In the following lab guide, we address OSPF. OSPF, developed by the Internet Engineering Task Force (IETF), is an interior gateway protocol (IGP) used for routing within autonomous systems (AS). It is a link-state routing protocol that uses the Shortest Path First (SPF) algorithm to determine the best path for forwarding data packets, and it is widely adopted due to its scalability, fast convergence, and support for multiple network types.

Note:

Notice that we have two OSPF neighbors. We use the default broadcast network type and have an OSPF state of FULL/DR. I have changed the OSPF cost on link Gi1 so that we can perform traffic engineering. Now that both paths have the same OSPF cost, a total metric of 4, we can perform ECMP. You can also bond links, combining two links for additional bandwidth.
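A hedged sketch of the relevant tuning; the interface name and cost values illustrate the idea rather than the exact lab topology:

    interface GigabitEthernet1
     ip ospf cost 2      ! adjust the link cost so both paths total 4
    !
    router ospf 1
     maximum-paths 4     ! allow equal-cost paths to be installed together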

Forwarding Routing Protocols

Example: OSPF Routed Core

With a leaf and spine, we can have a routed core, so we gain the benefits of running a routing protocol, such as OSPF, all the way down to the access layer. This has many advantages, such as full use of links. The guide below has four routers: two leaves and two spines. OSPF is the routing protocol with Area 0; we are not running STP.

Therefore, we can have Layer 3 routing for both spines to reach the destinations on Leaf B. I have a loopback configured on Leaf B of 1.1.1.1. Each leaf has an OSPF neighbor relationship to each spine with an OSPF network type of Broadcast. Notice the command: Show IP route 1.1.1.1 on Leaf A.

Note:

We initially had only one path, via Spine B, i.e., the shortest path based on OSPF cost. Once I made the OSPF costs the same for the entire path (cost of 4, routing metric of 4), two paths were installed in the routing table, and we can now rely on the fast convergence of OSPF for link failure detection and recovery.

We will expand on this in one of the following lab guides in this blog with VXLAN and create a Layer 2 overlay. Remember that the ACI fabric underlay runs IS-IS rather than OSPF; it also has a particular configuration for VXLAN, and much of the CLI complexity is abstracted. However, the focus of these lab guides is on illustration and learning.

The process of routing and network stretch

Routing is selecting a path for traffic in a network or between or among multiple networks. Routing is performed for various networks, including the Internet, circuit-switched, and packet-switched networks. The routing process usually directs forwarding based on routing tables, which maintain a record of the routes to various network destinations. Thus, constructing routing tables in the router’s memory is crucial for efficient routing.

Routing is typically based on the shortest path algorithm, which finds the shortest path from source to destination in a network. The shortest path algorithm can be implemented using various techniques, such as Dijkstra’s and Bellman-Ford’s algorithms. In addition, routing can also be based on other criteria, such as least cost, lowest delay, or highest reliability.

Routing Tables

Routing protocols are used to maintain router routing tables. These protocols enable the routers to exchange information about the network topology, such as which nodes are connected, and then determine the best routes. The most common routing protocols are the Open Shortest Path First (OSPF) and the Routing Information Protocol (RIP).

Routing also ensures that data sent over the Internet reaches its destination. To do this, routers use the Internet Protocol (IP) to forward packets between networks. They examine the packet’s IP header and use this information to determine the best route for the packet.

The routing process
Diagram: The routing process. The source is Baeldung.

2nd Lab Guide: EIGRP Configuration

EIGRP stands for Enhanced Interior Gateway Routing Protocol and is a routing protocol created by Cisco. Initially it was available only on Cisco hardware, but it has since been published as an open standard (RFC 7868). EIGRP is called a hybrid or advanced distance vector protocol, and most of the rules that apply to RIP also apply here:

  • Split Horizon
  • Route Poisoning
  • Poison Reverse

Like OSPF, EIGRP routers send hello packets to other routers; if both sides send and receive them, the routers become neighbors. EIGRP neighbors exchange routing information, which is saved in the topology table.

Configuring EIGRP is similar to RIP, as the sketch below shows. The "1" is the AS number, which must be the same on all routers! We require the no auto-summary command because, by default, EIGRP behaves classfully, and we want it to be classless.
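A minimal sketch, with an illustrative network statement:

    router eigrp 1
     network 192.168.12.0   ! AS number 1 must match on all routers
     no auto-summary        ! disable classful summarization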

EIGRP Neighbors

Next, let’s have a look at the routing table below. The first thing you might notice is that you see a “D” for the EIGRP entries. You see a “D” and not an “E” because the last one has already been taken for EGP, an old routing protocol we no longer use. “D” stands for “dual,” which is the mechanism behind EIGRP. The loopback 4.4.4.0 is connected to R4, and R1 has two ways to reach this network. This is because all links are Gigabit Ethernet, and I have not changed any metrics.

EIGRP routing

EIGRP Changes

Routing vs Forwarding

Often, routing is confused with forwarding, but they are different processes. Routing decides the path data will take between devices, while forwarding is the per-hop act of passing data along that path. Let's take a closer look at the forwarding process.

The forwarding process takes data arriving on one interface and sends it out of another toward the next device. Forwarding does not determine the path; it applies decisions already made by routing and moves each packet one hop closer, handing it to the next attached network or intermediate router.

The network layer performs both routing and forwarding. A forwarding device collects data and sends it to another. Hubs, routers, and switches are some of the most popular forwarding devices.

3rd Lab Guide: IS-IS Routing Protocol

In the following sample, we have an IS-IS network.

The IS-IS routing protocol is a link-state protocol that runs directly over the data link layer (OSI Layer 2) rather than over IP. Originally developed for OSI networks, it is now widely used in large-scale networks, such as service provider backbones, where scalability, stability, and efficient routing are paramount.

Note:

Below, we have four routers. R1 and R2 are in area 12, and R3 and R4 are in area 34. R1 and R3 are intra-area routers, so they will be configured as level 1 routers. R2 and R4 form the backbone, so these routers will be configured as level 1-2.
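A minimal IOS sketch for R1; the NET value is illustrative (area 49.0012, system ID 0000.0000.0001). R2 and R4 would instead use is-type level-1-2, which is also the IOS default:

    router isis
     net 49.0012.0000.0000.0001.00
     is-type level-1        ! R1 and R3 are intra-area (level 1) routers
    !
    interface GigabitEthernet0/0
     ip router isis         ! enable IS-IS on the interface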

Routing Protocol
Diagram: Routing Protocol. ISIS.

♦ Key Features of ISIS Routing Protocol:

Hierarchical Design: IS-IS employs a hierarchical design, dividing the network into areas to simplify routing and improve scalability. In IS-IS terminology, every router is an Intermediate System (IS), and on broadcast segments a Designated IS (DIS) is elected to reduce the number of adjacencies and coordinate the exchange of routing information.

Link-State Database: ISIS maintains a link-state database that contains information about the network topology and the state of individual links. This database calculates the shortest path to a destination and ensures efficient routing.

Dynamic Updates: IS-IS uses a dynamic routing algorithm to exchange routing information between ISes. It continuously updates the link-state database based on network changes, ensuring the routing information is always current.

Coexistence with Other Protocols: IS-IS can run alongside protocols such as OSPF (Open Shortest Path First) and BGP (Border Gateway Protocol), and routes can be redistributed between them. This flexibility allows networks to integrate IS-IS with existing routing infrastructures seamlessly.

Packet-switching Networks

The Internet is a packet-switching network that enables its attached devices, such as your personal computer ( PC ), to exchange information with other devices. Information exchange could take many different forms. From a user level, it could be checking your bank balance with Internet banking, buying a book on an Amazon website, watching a movie online, or downloading your favorite song.

Hypertext Transfer Protocol ( HTTP ) accounts for most Internet traffic and is the protocol behind the World Wide Web ( WWW ). However, for these upper-layer protocols ( HTTP ) to work efficiently and offer a satisfactory user experience, elements lower in the Open Systems Interconnection ( OSI ) communication model must be fine-tuned and operational for data transfers.

Packet Switching Networks
Diagram: Packet Switching Networks. Source is GeeksforGeeks.

Forwarding Protocols

Which protocol is used by routers to forward messages?

  • The two transport protocols

The TCP/IP protocol suite supports two transport protocols ( Layer 4 ): Transmission Control Protocol (TCP ) and User Datagram Protocol ( UDP ). TCP reliably provides a host-to-host communication service, while UDP provides host-to-host communication in an unreliable fashion.

As a result, TCP offers many services better suited to applications requiring service guarantees and error detection and correction, such as Border Gateway Protocol, which operates on port 179. UDP, on the other hand, offers fewer services and suits traffic that can tolerate some packet loss but is sensitive to delay.

Port 179
Diagram: Port 179 with BGP peerings.

This information traverses the Internet backbone via the Network layer ( Layer 3 ) and Data Link layer ( Layer 2 ), encoded in long strings of bits called packets. A packet describes a chunk of data going from the IP ( Internet Protocol ) layer to the network interface ( Data Link Layer ).

4th Lab Guide: BGP Next Hop Tracking

BGP Next Hop Tracking is a feature within the BGP routing protocol that allows routers to track the reachability of next-hop IP addresses. By monitoring the availability of next-hop routers, network administrators can make informed decisions regarding traffic routing and ensure efficient packet transmission.

BGP Next Hop Tracking offers several advantages in terms of network resilience. Firstly, it enables the identification and avoidance of black holes or suboptimal routing paths by detecting unreachable next-hop IP addresses. This ensures traffic is efficiently routed along viable paths, minimizing latency and potential packet loss. Additionally, BGP Next Hop Tracking facilitates faster convergence during network failures by swiftly redirecting traffic to alternate paths, reducing the impact of network disruptions.

The Packet and a Datagram

A packet is not the same as a datagram and can be either an IP datagram or a fragment of an IP datagram. Note: The terminology “packet” refers to the Ethernet payload, which consists of the IP header and the user data. The terminology frame refers to the data link headers and the payload.

As these packets travel through the Internet from their source ( your personal computer ) to their destination ( Amazon website ), certain decisions are made by each device the packet traverses. These are known as routing decisions and determine if the packet should go this way or that way.

The devices making these decisions are called routers. Different routers act at different network points, such as over the WAN with SD-WAN routers: SD WAN tutorial.

IP Packet versus IP Datagram
The diagram shows the different definitions of an IP packet compared to an IP datagram. It also shows how an IP datagram is fragmented into two IP packets, the second carrying the remainder of the original datagram.

IP packet vs Datagram
Diagram: IP packet vs Datagram. Source is crnetpacket

Routing Tables and Routing Protocols

These devices have a routing table that tells them how and where to forward packets. The routing table is populated either dynamically, by a routing protocol, or statically. A static route is specific to that device, manually configured, and not automatically propagated to other routers.

A dynamic process runs distributed algorithms that the routers run among themselves to make the correct routing decision.

An example of a dynamic routing protocol is OSPF; the static alternative is a manually configured static route. A router's routing protocol may use Distance Vector or Link-State algorithms. Distance Vector algorithms are more straightforward and usually find paths with a simple metric, such as the number of router hops ( devices ) to the destination.

Then, on the WAN side of things, we have Border Gateway Protocol (BGP) and the use case of BGP SDN. We are enabling WAN virtualization and SDN traffic optimizations.

5th Lab Guide: EIGRP

In the following, we have an EIGRP network that consists of two routers.

Note:

Efficient Exchange of Routing Information

One of the strengths of EIGRP lies in its ability to exchange routing information with neighboring routers. Using Hello packets and Update packets, EIGRP establishes and maintains neighbor relationships. This dynamic exchange ensures that routers are constantly updated with the latest network topology information, facilitating efficient route computation and decision-making.

EIGRP

For neighbor discovery and recovery, EIGRP routers send hello packets. EIGRP forms a neighbor relationship with another router when hello packets are exchanged in both directions. As long as hello packets keep arriving from the other side, EIGRP assumes the other router is still present. When they stop arriving, the neighbor relationship, called an adjacency, is lost, and EIGRP may have to look for another route.

EIGRP uses RTP (Reliable Transport Protocol) to deliver packets between neighbors in a reliable and orderly manner. Packets are sent as either multicast or unicast, and to keep things efficient, not all packets are sent reliably; those that are require an acknowledgment from the other side.

EIGRP topology

Analysis:

Populating the Topology Table

EIGRP populates its topology table by exchanging Hello and Update packets with neighboring routers. These packets carry information about the network’s topology, such as feasible successors, advertised distances, and reported distances. As EIGRP receives these updates, its topology table will be updated accordingly.

Computing the Best Paths

Once the topology table is populated, EIGRP utilizes the DUAL algorithm to determine the best paths to reach destination networks. The algorithm considers bandwidth, delay, reliability, and load to calculate a composite metric for each route, and this metric value aids in selecting the optimal path for packet forwarding.

Maintaining and Updating the Topology Table

The EIGRP topology table is a dynamic entity that undergoes constant updates. EIGRP ensures that the topology table is kept current as changes occur in the network. When a link or router fails, EIGRP recalculates paths based on the remaining available routes and updates the topology table accordingly.

Routing convergence: Determine the Best Path

A router runs its algorithm and determines the best path to a particular destination; the router then notifies all of the neighboring routers of its current path; concurrently, the router’s neighbors also inform the router of their best paths. All of this occurs in a process known as routing convergence.

Routing Convergence

After seeing all the other best paths from its neighboring devices, the router may notice a better path through one of its neighbors. If so, the router updates its routing table with better paths. A link-state algorithm employs a replicated database approach compared to a Distance Vector Algorithm ( distributed calculation ).

Each router contributes its piece of the database; every device adds an element to create a complete network map. However, instead of advertising a list of distances to each known destination, the router advertises the states of its local links ( interfaces ).

routing convergence
The well-known steps in routing convergence.

Link state advertisements

These link-state advertisements are then advertised to the other routers; all these messages combine to complete a network database synchronized between each router at regular intervals.

Essentially, link-state protocols must flood information about the topology to every device in the network, and the distance ( path ) vector protocols must process the topology change information at every hop through the network.

A final note on forwarding routing protocols

Routing protocols continually reevaluate their contents, and the process of finding new information after a change in the network is called convergence. A network deemed to be highly available must have not only a redundant physical topology but also fast convergence so that service degradation or interruption is avoided. Convergence should be designed efficiently at Layer 2 and Layer 3 levels.

Fast convergence in Layer 2 environments is achieved with Spanning Tree Protocol ( STP ) enhancements, notably Rapid PVST+. In Layer 3 environments, we prefer routing protocols that can quickly find new information ( next hops ), i.e., protocols with short convergence times.

You might conclude from the descriptions of both link-state and distance-vector protocols that link-state algorithms will always converge more quickly than distance- or path-vector protocols. However, this isn't the case; both converge exceptionally promptly if the underlying network has been designed and tuned for fast convergence.

Closing Points: Forwarding Routing Protocols

Forwarding routing protocols play a crucial role in efficiently transmitting data across networks. These closing points revisit their significance, functionality, and types, rounding out the picture of how these protocols enable seamless communication between devices on a network.

Forwarding routing protocols have several key benefits that make them essential in network communication:

1. Scalability: Forwarding routing protocols enable networks to expand and accommodate a growing number of devices. These protocols dynamically adapt to changes in network topology, allowing for the seamless integration of new devices and routes.

2. Redundancy: Forwarding routing protocols continuously exchange routing information to ensure alternative paths are available in case of link failures. This redundancy enhances network reliability and minimizes downtime.

3. Load Balancing: Forwarding routing protocols distribute network traffic across multiple paths, optimizing network performance and preventing congestion. This feature allows for efficient utilization of network resources.

Types of Forwarding Routing Protocols:

Various forwarding routing protocols are designed to cater to specific network requirements. Let’s explore some of the most commonly used types:

1. Distance Vector Protocols:

Distance vector protocols, such as the Routing Information Protocol (RIP), use a simple approach to determining the best path. Routers exchange their routing tables, which contain information about the distance and direction of various network destinations. RIP, for example, evaluates paths using hop count as a metric.

2. Link State Protocols:

Link state protocols, such as Open Shortest Path First (OSPF), build a detailed database of the network’s topology. Routers share information about their directly connected links, allowing each router to construct a complete network view. This comprehensive knowledge enables OSPF to calculate the shortest path to each destination.

3. Hybrid Protocols:

Hybrid protocols, like Enhanced Interior Gateway Routing Protocol (EIGRP), combine elements of both distance vector and link state protocols. These protocols balance simplicity and efficiency, utilizing fast convergence and load-balancing features to optimize network performance.

Forwarding routing protocols are essential for ensuring reliable and efficient data transmission in computer networks. By determining the optimal paths for data packets, these protocols contribute to the overall performance and stability of the network. Understanding different forwarding routing protocols, such as RIP, OSPF, and BGP, is crucial for network administrators and engineers to design and manage robust networks.

Forwarding protocols are vital in modern networking, enabling efficient data routing and seamless network communication. Understanding these protocols’ different types, benefits, and challenges is crucial for network administrators and engineers. Organizations can confidently navigate the digital highway by implementing best practices and staying abreast of advancements in forwarding routing protocols.

Summary: Forwarding Routing Protocols

In the vast landscape of computer networks, efficient data transmission is critical. Forwarding routing protocols play a crucial role in ensuring that data packets are delivered accurately and swiftly. In this blog post, we explored the world of forwarding routing protocols, their types, and their significance in modern networking.

Understanding Forwarding Routing Protocols

Forwarding routing protocols are algorithms routers use to determine the best path for data packets to traverse through a network. They enable routers to make informed decisions based on various factors such as network topology, cost metrics, and congestion levels. These protocols optimize network performance and ensure reliable data transmission by efficiently forwarding packets.

Types of Forwarding Routing Protocols

There are several forwarding routing protocols, each with its characteristics and use cases. Let’s explore a few prominent ones:

Distance Vector Protocols

Distance Vector protocols, such as Routing Information Protocol (RIP), share routing information with neighboring routers. They exchange routing tables periodically, making routing decisions based on the number of hops to reach a destination. While simple to implement, distance vector protocols may suffer from slow convergence and limited scalability.

Link State Protocols

Link State protocols, like Open Shortest Path First (OSPF), take a different approach. Routers in a link state network maintain detailed information about the network’s topology. Routers build a comprehensive network view by flooding link state advertisements and calculating the shortest path to each destination. Link state protocols offer faster convergence and better scalability but require more computational resources.

Hybrid Protocols

Hybrid protocols, such as Enhanced Interior Gateway Routing Protocol (EIGRP), combine the advantages of both distance vector and link state protocols. They offer the simplicity of distance vector protocols while providing faster convergence and better scalability. Hybrid protocols are widely used in enterprise networks.

Significance of Forwarding Routing Protocols

Forwarding routing protocols are crucial for efficient network operations. They bring several benefits to the table:

Optimal Path Selection

By analyzing network metrics and topology, forwarding routing protocols enable routers to choose the most efficient path for packet forwarding. This results in reduced latency, improved network reliability, and better overall performance.

Load Balancing

Many forwarding routing protocols support load balancing, distributing traffic across multiple paths. This helps prevent congestion on certain links and ensures efficient resource utilization throughout the network.

Fault Tolerance

Forwarding routing protocols often incorporate mechanisms to handle link failures and reroute traffic dynamically. In case of link failures, routers can quickly adapt and find alternative paths, minimizing downtime and maintaining network connectivity.

Conclusion:

In conclusion, forwarding routing protocols are the backbone of modern computer networks. They provide the intelligence needed for routers to make informed decisions, ensuring efficient packet forwarding and optimal network performance. By understanding the different types and significance of forwarding routing protocols, network administrators can design robust and scalable networks that meet the demands of today’s digital world.