What is OpenFlow

What is OpenFlow

What is OpenFlow?

In today's rapidly evolving digital landscape, network management and data flow control have become critical for businesses of all sizes. OpenFlow is one technology that has gained significant attention and is transforming how networks are managed. In this blog post, we will delve into the concept of OpenFlow, its advantages, and its implications for network control.

OpenFlow is an open-standard communications protocol that separates the control and data planes in a network architecture. It allows network administrators to have direct control over the behavior of network devices, such as switches and routers, by utilizing a centralized controller.

Traditional network architectures follow a closed model, where network devices make independent decisions on forwarding packets. On the other hand, OpenFlow introduces a centralized control plane that provides a global view of the network and allows administrators to define network policies and rules from a centralized location.

OpenFlow operates by establishing a secure channel between the centralized controller and the network switches. The controller is responsible for managing the flow tables within the switches, defining how traffic should be forwarded based on predefined rules and policies. This separation of control and data planes allows for dynamic network management and facilitates the implementation of innovative network protocols.

One of the key advantages of OpenFlow is its ability to simplify network management. By centralizing control, administrators can easily configure and manage the entire network from a single point of control. This reduces complexity and enhances the scalability of network infrastructure. Additionally, OpenFlow enables network programmability, allowing for the development of custom networking applications and services tailored to specific requirements.

OpenFlow plays a crucial role in network virtualization, as it allows for the creation and management of virtual networks on top of physical infrastructure. By abstracting the underlying network, OpenFlow empowers organizations to optimize resource utilization, improve security, and enhance network performance. It opens doors to dynamic provisioning, isolation, and efficient utilization of network resources.

Highlights: What is OpenFlow?

How does OpenFlow work?

1: OpenFlow allows network controllers to determine the path of network packets in a network of switches. There is a difference between switches and controllers. With separate control and forwarding, traffic management can be more sophisticated than access control lists (ACLs) and routing protocols.

2: An OpenFlow protocol allows switches from different vendors, often with proprietary interfaces and scripting languages, to be managed remotely. Software-defined networking (SDN) is considered to be enabled by OpenFlow by its inventors.

3: With OpenFlow, Layer 3 switches can add, modify, and remove packet-matching rules and actions remotely. By doing so, routing decisions can be made periodically or ad hoc by the controller and translated into rules and actions with a configurable lifespan, which are then deployed to the switch’s flow table, where packets are forwarded at wire speed for the duration of the rule.

The Role of OpenFlow Controllers

If the switch cannot match packets, they can be sent to the controller. The controller can modify existing flow table rules or deploy new rules to prevent a structural traffic flow. It may even forward the traffic itself if the switch is instructed to forward packets rather than just their headers.

OpenFlow uses Transport Layer Security (TLS) over Transmission Control Protocol (TCP). Switches wishing to connect should listen on TCP port 6653. In earlier versions of OpenFlow, port 6633 was unofficially used. The protocol is mainly used between switches and controllers.

### Origins of OpenFlow

The inception of OpenFlow can be traced back to the early 2000s when researchers at Stanford University sought to create a more versatile and programmable network architecture. Traditional networking relied heavily on static and proprietary hardware configurations, which limited innovation and adaptability. OpenFlow emerged as a solution to these challenges, offering a standardized protocol that decouples the network control plane from the data plane. This separation allows for centralized control and dynamic adjustment of network traffic, fostering innovation and agility in network design.

### How OpenFlow Works

At its core, the OpenFlow protocol facilitates communication between network devices and a centralized SDN controller. It does this by using a series of flow tables within network switches, which are programmed by the controller. These flow tables dictate how packets should be handled, whether they are forwarded to a destination, dropped, or modified in some way. By leveraging OpenFlow, network administrators can deploy updates, optimize performance, and troubleshoot issues with unprecedented speed and precision, all from a single point of control.

Introducing SDN

Recent changes and requirements have driven networks and network services to become more flexible, virtualization-aware, and API-driven. One major trend affecting the future of networking is software-defined networking ( SDN ). The software-defined architecture aims to extract the entire network into a single switch.

Software-defined networking (SDN) is an evolving technology defined by the Open Networking Foundation ( ONF ). It involves the physical separation of the network control plane from the forwarding plane, where a control plane controls several devices. This differs significantly from traditional IP forwarding that you may have used in the past.

The Core Concepts of SDN

At its heart, Software-Defined Networking decouples the network control plane from the data plane, allowing network administrators to manage network services through abstraction of lower-level functionality. This separation enables centralized network control, which simplifies the management of complex networks. The control plane makes decisions about where traffic is sent, while the data plane forwards traffic to the selected destination. This approach allows for a more flexible, adaptable network infrastructure.

The activities around OpenFlow

Even though OpenFlow has received a lot of industry attention, programmable networks and decoupled control planes (control logic) from data planes have been around for many years. To enhance ATM, Internet, and mobile networks’ openness, extensibility, and programmability, the Open Signaling (OPENING) working group held workshops in 1995. A working group within the Internet Engineering Task Force (IETF) developed GSMP to control label switches based on these ideas. June 2002 marked the official end of this group, and GSMPv3 was published.

What is OpenFlow

Data and control plane

Therefore, SDN separates the data and control plane. The main driving body behind software-defined networking (SDN) is the Open Networking Foundation ( ONF ). Introduced in 2008, the ONF is a non-profit organization that wants to provide an alternative to proprietary solutions that limit flexibility and create vendor lock-in.

The insertion of the ONF allowed its members to run proof of concepts on heterogeneous networking devices without requiring vendors to expose their software’s internal code. This creates a path for an open-source approach to networking and policy-based controllers. 

Knowledge Check: Data & Control Plane

### Data Plane: The Highway for Your Data

The data plane, often referred to as the forwarding plane, is responsible for the actual movement of packets of data from source to destination. Imagine it as a network’s highway, where data travels at high speed. This component operates at the speed of light, handling massive amounts of data with minimal delay. It’s designed to process packets quickly, ensuring that the information arrives where it needs to be without interruption.

### Control Plane: The Brain Behind the Operation

While the data plane acts as the highway, the control plane is the brain that orchestrates the flow of traffic. It makes decisions about routing, managing the network topology, and controlling the data plane’s operations. The control plane uses protocols to determine the best paths for data and to update routing tables as needed. It’s responsible for ensuring that the network operates efficiently, adapting to changes, and maintaining optimal performance.

### Interplay Between Data and Control Planes

The synergy between the data and control planes is what enables modern networks to function effectively. The control plane provides the intelligence and decision-making necessary to guide the data plane. This interaction ensures that data packets take the best possible paths, reducing latency and maximizing throughput. As networks evolve, the lines between these planes may blur, but their distinct roles remain pivotal.

### Real-World Applications and Innovations

The concepts of data and control planes are not just theoretical—they have practical applications in technologies such as Software-Defined Networking (SDN) and Network Function Virtualization (NFV). These innovations allow for greater flexibility and scalability in managing network resources, offering businesses the agility needed to adapt to changing demands.

Building blocks: SDN Environment 

As a fundamental building block of an SDN deployment, the controller, the SDN switch (for example, an OpenFlow switch), and the interfaces are present in the controller to communicate with forwarding devices, generally the southbound interface (OpenFlow) and the northbound interface (the network application interface).

In an SDN, switches function as basic forwarding hardware, accessible via an open interface, with the control logic and algorithms offloaded to controllers. Hybrid (OpenFlow-enabled) and pure (OpenFlow-only) OpenFlow switches are available.

OpenFlow switches rely entirely on a controller for forwarding decisions, without legacy features or onboard control. Hybrid switches support OpenFlow as well, in addition to traditional operation and protocols. Today, hybrid switches are the most common type of commercial switch. A flow table performs packet lookup and forwarding in an OpenFlow switch.

### The Role of the OpenFlow Controller

The OpenFlow controller is the brain of the SDN, orchestrating the flow of data across the network. It communicates with the network devices using the OpenFlow protocol to dictate how packets should be handled. The controller’s primary function is to make decisions on the path data packets should take, ensuring optimal network performance and resource utilization. This centralization of control allows for dynamic network configuration, paving the way for innovative applications and services.

### OpenFlow Switches: The Workhorses of the Network

While the controller is the brain, OpenFlow switches are the workhorses, executing the instructions they receive. These switches operate at the data plane, where they forward packets based on the rules set by the controller. Each switch maintains a flow table that matches incoming packets to particular actions, such as forwarding or dropping the packet. This separation of control and data planes is what sets SDN apart from traditional networking, offering unparalleled flexibility and control.

### Flow Tables: The Heart of OpenFlow Switches

Flow tables are the core component of OpenFlow switches, dictating how packets are handled. Each entry in a flow table consists of match fields, counters, and a set of instructions. Match fields identify the packets that should be affected by the rule, counters track the number of packets and bytes that match the entry, and instructions define the actions to be taken. This modular approach allows for precise traffic management and is essential for implementing advanced network policies.

OpenFlow Table & Routing Tables 

### What is an OpenFlow Flow Table?

OpenFlow is a protocol that allows network controllers to interact with the forwarding plane of network devices like switches and routers. At the heart of OpenFlow is the flow table. This table contains a set of flow entries, each specifying actions to take on packets that match a particular pattern. Unlike traditional routing tables, which rely on predefined paths and protocols, OpenFlow flow tables provide flexibility and programmability, allowing dynamic changes in how packets are handled based on real-time network conditions.

### Understanding Routing Tables

Routing tables are a staple of traditional networking, used by routers to determine the best path for forwarding packets to their final destinations. These tables consist of a list of network destinations, with associated metrics that help in selecting the most efficient route. Routing protocols such as OSPF, BGP, and RIP are employed to maintain and update these tables, ensuring that data flows smoothly across the interconnected web of networks. While reliable, routing tables are less flexible compared to OpenFlow flow tables, as changes in network traffic patterns require updates to routing protocols and configurations.

Example of a Routing Table 

RIP Configuration RIP configuration

### Key Differences Between OpenFlow Flow Table and Routing Table

The primary distinction between OpenFlow flow tables and routing tables lies in their approach to network management. OpenFlow flow tables are dynamic and programmable, allowing for real-time adjustments and fine-grained control over traffic flows. This makes them ideal for environments where network agility and customization are paramount. Conversely, routing tables offer a more static and predictable method of packet forwarding, which can be beneficial in stable networks where consistency and reliability are prioritized.

### Use Cases: When to Use Each

OpenFlow flow tables are particularly advantageous in software-defined networking (SDN) environments, where network administrators need to quickly adapt to changing conditions and optimize traffic flows. They are well-suited for data centers, virtualized networks, and scenarios requiring high levels of automation and scalability. On the other hand, traditional routing tables are best used in established networks with predictable traffic patterns, such as those found in enterprise or service provider settings, where reliability and stability are key.

You may find the following useful for pre-information:

  1. OpenFlow Protocol
  2. Network Traffic Engineering
  3. What is VXLAN
  4. SDN Adoption Report
  5. Virtual Device Context

What is OpenFlow?

What is OpenFlow?

OpenFlow was the first protocol of the Software-Defined Networking (SDN) trend and is the only protocol that allows the decoupling of a network device’s control plane from the data plane. In most straightforward terms, the control plane can be thought of as the brains of a network device. On the other hand, the data plane can be considered hardware or application-specific integrated circuits (ASICs) that perform packet forwarding.

Numerous devices also support running OpenFlow in a hybrid mode, meaning OpenFlow can be deployed on a given port, virtual local area network (VLAN), or even within a regular packet-forwarding pipeline such that if there is not a match in the OpenFlow table, then the existing forwarding tables (MAC, Routing, etc.) are used, making it more analogous to Policy Based Routing (PBR).

What is OpenFlow
Diagram: What is OpenFlow? The source is cable solutions.

What is SDN?

Despite various modifications to the underlying architecture and devices (such as switches, routers, and firewalls), traditional network technologies have existed since the inception of networking. Using a similar approach, frames and packets have been forwarded and routed in a limited manner, resulting in low efficiency and high maintenance costs. Consequently, the architecture and operation of networks need to evolve, resulting in SDN.

By enabling network programmability, SDN promises to simplify network control and management and allow innovation in computer networking. Network engineers configure policies to respond to various network events and application scenarios. They can achieve the desired results by manually converting high-level policies into low-level configuration commands.

Often, minimal tools are available to accomplish these very complex tasks. Controlling network performance and tuning network management are challenging and error-prone tasks.

A modern network architecture consists of a control plane, a data plane, and a management plane; the control and data planes are merged into a machine called Inside the Box. To overcome these limitations, programmable networks have emerged.

How OpenFlow Works:

At the core of OpenFlow is the concept of a flow table, which resides in each OpenFlow-enabled switch. The flow table contains match-action rules defining how incoming packets should be processed and forwarded. The centralized controller determines these rules and communicates using the OpenFlow protocol with the switches.

When a packet arrives at an OpenFlow-enabled switch, it is first matched against the rules in the flow table. If a match is found, the corresponding action is executed, including forwarding the packet, dropping it, or sending it to the controller for further processing. This decoupling of the control and data planes allows for flexible and programmable network management.

What is OpenFlow SDN?

The main goal of SDN is to separate the control and data planes and transfer network intelligence and state to the control plane. These concepts have been exploited by technologies like Routing Control Platform (RCP), Secure Architecture for Network Enterprise (SANE), and, more recently, Ethane.

In addition, there is often a connection between SDN and OpenFlow. The Open Networking Foundation (ONF) is responsible for advancing SDN and standardizing OpenFlow, whose latest version is 1.5.0.

An SDN deployment starts with these building blocks.

For communication with forwarding devices, the controller has the SDN switch (for example, an OpenFlow switch), the SDN controller, and the interfaces. An SDN deployment is based on two basic building blocks: a southbound interface (OpenFlow) and a northbound interface (the network application interface).

As the control logic and algorithms are offloaded to a controller, switches in SDNs may be represented as basic forwarding hardware. Switches that support OpenFlow come in two varieties: pure (OpenFlow-only) and hybrid (OpenFlow-enabled).

Pure OpenFlow switches do not have legacy features or onboard control for forwarding decisions. A hybrid switch can operate with both traditional protocols and OpenFlow. Hybrid switches make up the majority of commercial switches available today. In an OpenFlow switch, a flow table performs packet lookup and forwarding.

OpenFlow reference switch

The OpenFlow protocol and interface allow OpenFlow switches to be accessed as essential forwarding elements. A flow-based SDN architecture like OpenFlow simplifies switching hardware. Still, it may require additional forwarding tables, buffer space, and statistical counters that are difficult to implement in traditional switches with integrated circuits tailored to specific applications.

There are two types of switches in an OpenFlow network: hybrids (which enable OpenFlow) and pores (which only support OpenFlow). OpenFlow is supported by hybrid switches and traditional protocols (L2/L3). OpenFlow switches rely entirely on a controller for forwarding decisions and do not have legacy features or onboard control.

Hybrid switches are the majority of the switches currently available on the market. This link must remain active and secure because OpenFlow switches are controlled over an open interface (through a TCP-based TLS session). OpenFlow is a messaging protocol that defines communication between OpenFlow switches and controllers, which can be viewed as an implementation of SDN-based controller-switch interactions.

Openflow switch
Diagram: OpenFlow switch. The source is cable solution.

Identify the Benefits of OpenFlow

Application-driven routing. Users can control the network paths.

The networks paths.A way to enhance link utilization.

An open solution for VM mobility. No VLAN reliability.

A means to traffic engineer without MPLS.

A solution to build very large Layer 2 networks.

A way to scale Firewalls and Load Balancers.

A way to configure an entire network as a whole as opposed to individual entities.

A way to build your own encryption solution. Off-the-box encryption.

A way to distribute policies from a central controller.

Customized flow forwarding. Based on a variety of bit patterns.

A solution to get a global view of the network and its state. End-to-end visibility.

A solution to use commodity switches in the network. Massive cost savings.

The following table lists the Software Networking ( SDN ) benefits and the problems encountered with existing control plane architecture:

Identify the benefits of OpenFlow and SDN

Problems with the existing approach

Faster software deployment.

Large scale provisioning and orchestration.

Programmable network elements.

Limited traffic engineering ( MPLS TE is cumbersome )

Faster provisioning.

Synchronized distribution policies.

Centralized intelligence with centralized controllers.

Routing of large elephant flows.

Decisions are based on end-to-end visibility.

Qos and load based forwarding models.

Granular control of flows.

Ability to scale with VLANs.

Decreases the dependence on network appliances like load balancers.

**A key point: The lack of a session layer in the TCP/IP stack**

Regardless of the hype and benefits of SDN, neither OpenFlow nor other SDN technologies address the real problems of the lack of a session layer in the TCP/IP protocol stack. The problem is that the client’s application ( Layer 7 ) connects to the server’s IP address ( Layer 3 ), and if you want to have persistent sessions, the server’s IP address must remain reachable. 

This session’s persistence and the ability to connect to multiple Layer 3 addresses to reach the same device is the job of the OSI session layer. The session layer provides the services for opening, closing, and managing a session between end-user applications. In addition, it allows information from different sources to be correctly combined and synchronized.

The problem is the TCP/IP reference module does not consider a session layer, and there is none in the TCP/IP protocol stack. SDN does not solve this; it gives you different tools to implement today’s kludges.

what is openflow
What is OpenFlow? Lack of a session layer

Control and data plane

When we identify the benefits of OpenFlow, let us first examine traditional networking operations. Traditional networking devices have a control and forwarding plane, depicted in the diagram below. The control plane is responsible for setting up the necessary protocols and controls so the data plane can forward packets, resulting in end-to-end connectivity. These roles are shared on a single device, and the fast packet forwarding ( data path ) and the high-level routing decisions ( control path ) occur on the same device.

What is OpenFlow | SDN separates the data and control plane?

**Control plane**

The control plane is part of the router architecture and is responsible for drawing the network map in routing. When we mention control planes, you usually think about routing protocols, such as OSPF or BGP. But in reality, the control plane protocols perform numerous other functions, including:

Connectivity management ( BFD, CFM )

Interface state management ( PPP, LACP )

Service provisioning ( RSVP for InServ or MPLS TE)

Topology and reachability information exchange ( IP routing protocols, IS-IS in TRILL/SPB )

Adjacent device discovery via HELLO mechanism

ICMP

Control plane protocols run over data plane interfaces to ensure “shared fate” – if the packet forwarding fails, the control plane protocol fails as well.

Most control plane protocols ( BGP, OSPF, BFD ) are not data-driven. A BGP or BFD packet is never sent as a direct response to a data packet. There is a question mark over the validity of ICMP as a control plane protocol. The debate is whether it should be classed in the control or data plane category.

Some ICMP packets are sent as replies to other ICMP packets, and others are triggered by data plane packets, i.e., data-driven. My view is that ICMP is a control plane protocol that is triggered by data plane activity. After all, the “C” is ICMP does stand for “Control.”

**Data plane**

The data path is part of the routing architecture that decides what to do when a packet is received on its inbound interface. It is primarily focused on forwarding packets but also includes the following functions:

ACL logging

 Netflow accounting

NAT session creation

NAT table maintenance

The data forwarding is usually performed in dedicated hardware, while the additional functions ( ACL logging, Netflow accounting ) typically happen on the device CPU, commonly known as “punting.” The data plane for an OpenFlow-enabled network can take a few forms.

However, the most common, even in the commercial offering, is the Open vSwitch, often called the OVS. The Open vSwitch is an open-source implementation of a distributed virtual multilayer switch. It enables a switching stack for virtualization environments while supporting multiple protocols and standards.

Software-defined networking changes the control and data plane architecture.

The concept of SDN separates these two planes, i.e., the control and forwarding planes are decoupled. This allows the networking devices in the forwarding path to focus solely on packet forwarding. An out-of-band network uses a separate controller ( orchestration system ) to set up the policies and controls. Hence, the forwarding plane has the correct information to forward packets efficiently.

In addition, it allows the network control plane to be moved to a centralized controller on a server instead of residing on the same box carrying out the forwarding. Moving the intelligence ( control plane ) of the data plane network devices to a controller enables companies to use low-cost, commodity hardware in the forwarding path. A significant benefit is that SDN separates the data and control plane, enabling new use cases.

A centralized computation and management plane makes more sense than a centralized control plane.

The controller maintains a view of the entire network and communicates with Openflow ( or, in some cases, BGP with BGP SDN ) with the different types of OpenFlow-enabled network boxes. The data path portion remains on the switch, such as the OVS bridge, while the high-level decisions are moved to a separate controller. The data path presents a clean flow table abstraction, and each flow table entry contains a set of packet fields to match, resulting in specific actions ( drop, redirect, send-out-port ).

When an OpenFlow switch receives a packet it has never seen before and doesn’t have a matching flow entry, it sends the packet to the controller for processing. The controller then decides what to do with the packet.

Applications could then be developed on top of this controller, performing security scrubbing, load balancing, traffic engineering, or customized packet forwarding. The centralized view of the network simplifies problems that were harder to overcome with traditional control plane protocols.

A single controller could potentially manage all OpenFlow-enabled switches. Instead of individually configuring each switch, the controller can push down policies to multiple switches simultaneously—a compelling example of many-to-one virtualization.

Now that SDN separates the data and control plane, the operator uses the centralized controller to choose the correct forwarding information per-flow basis. This allows better load balancing and traffic separation on the data plane. In addition, there is no need to enforce traffic separation based on VLANs, as the controller would have a set of policies and rules that would only allow traffic from one “VLAN” to be forwarded to other devices within that same “VLAN.”

The advent of VXLAN

With the advent of VXLAN, which allows up to 16 million logical entities, the benefits of SDN should not be purely associated with overcoming VLAN scaling issues. VXLAN already does an excellent job with this. It does make sense to deploy a centralized control plane in smaller independent islands; in my view, it should be at the edge of the network for security and policy enforcement roles. Using Openflow on one or more remote devices is easy to implement and scale.

It also decreases the impact of controller failure. If a controller fails and its sole job is implementing packet filters when a new user connects to the network, the only affecting element is that the new user cannot connect. If the controller is responsible for core changes, you may have interesting results with a failure. New users not being able to connect is bad, but losing your entire fabric is not as bad.

Spanning tree VXLAN
Diagram: Loop prevention. Source is Cisco

A traditional networking device runs all the control and data plane functions. The control plane, usually implemented in the central CPU or the supervisor module, downloads the forwarding instructions into the data plane structures. Every vendor needs communications protocols to bind the two planes together to download forward instructions. 

Therefore, all distributed architects need a protocol between control and data plane elements. The protocol binding this communication path for traditional vendor devices is not open-source, and every vendor uses its proprietary protocol (Cisco uses IPC—InterProcess Communication ).

Openflow tries to define a standard protocol between the control plane and the associated data plane. When you think of Openflow, you should relate it to the communication protocol between the traditional supervisors and the line cards. OpenFlow is just a low-level tool.

OpenFlow is a control plane ( controller ) to data plane ( OpenFlow enabled device ) protocol that allows the control plane to modify forwarding entries in the data plane. It enables SDN to separate the data and control planes.

identify the benefits of openflow

Proactive versus reactive flow setup

OpenFlow operations have two types of flow setups: Proactive and Reactive.

With Proactive, the controller can populate the flow tables ahead of time, similar to a typical routing. However, the packet-in event never occurs by pre-defining your flows and actions ahead of time in the switch’s flow tables. The result is all packets are forwarded at line rate. With Reactive, the network devices react to traffic, consult the OpenFlow controller, and create a rule in the flow table based on the instruction. The problem with this approach is that there can be many CPU hits.

OpenFlow protocol

The following table outlines the critical points for each type of flow setup:

Proactive flow setup

Reactive flow setup

Works well when the controller is emulating BGP or OSPF.

 Used when no one can predict when and where a new MAC address will appear.

The controller must first discover the entire topology.

 Punts unknown packets to the controller. Many CPU hits.

Discover endpoints ( MAC addresses, IP addresses, and IP subnets )

Compute forwarding paths on demand. Not off the box computation.

Compute off the box optimal forwarding.

 Install flow entries based on actual traffic.

Download flow entries to the data plane switches.

Has many scalability concerns such as packet punting rate.

No data plane controller involvement with the exceptions of ARP and MAC learning. Line-rate performance.

 Not a recommended setup.

Hop-by-hop versus path-based forwarding

The following table illustrates the key points for the two types of forwarding methods used by OpenFlow: hop-by-hop forwarding and path-based forwarding:

Hop-by-hop Forwarding

 Path-based Forwarding

Similar to traditional IP Forwarding.

Similar to MPLS.

Installs identical flows on each switch on the data path.

Map flows to paths on ingress switches and assigns user traffic to paths at the edge node

Scalability concerns relating to flow updates after a change in topology.

Compute paths across the network and installs end-to-end path-forwarding entries.

Significant overhead in large-scale networks.

Works better than hop-by-hop forwarding in large-scale networks.

FIB update challenges. Convergence time.

Core switches don’t have to support the same granular functionality as edge switches.

Obviously, with any controller, the controller is a lucrative target for attack. Anyone who knows you are using a controller-based network will try to attack the controller and its control plane. The attacker may attempt to intercept the controller-to-switch communication and replace it with its commands, essentially attacking the control plane with whatever means they like.

An attacker may also try to insert a malformed packet or some other type of unknown packet into the controller ( fuzzing attack ), exploiting bugs in the controller and causing the controller to crash. 

Fuzzing attacks can be carried out with application scanning software such as Burp Suite. It attempts to manipulate data in a particular way, breaking the application.

The best way to tighten security is to encrypt switch-to-controller communications with SSL and self-signed certificates to authenticate the switch and controller. It would also be best to minimize interaction with the data plane, except for ARP and MAC learning.

To prevent denial-of-service attacks on the controller, you can use Control Plane Policing ( CoPP ) on Ingress to avoid overloading the switch and the controller. Currently, NEC is the only vendor implementing CoPP.

sdn separates the data and control plane

The Hybrid deployment model is helpful from a security perspective. For example, you can group specific ports or VLANs to OpenFlow and other ports or VLANs to traditional forwarding, then use traditional forwarding to communicate with the OpenFlow controller.

Software-defined networking or traditional routing protocols?

The move to a Software-Defined Networking architecture has clear advantages. It’s agile and can react quickly to business needs, such as new product development. For businesses to succeed, they must have software that continues evolving.

Otherwise, your customers and staff may lose interest in your product and service. The following table displays the advantages and disadvantages of the existing routing protocol control architecture.

+Reliable and well known.

-Non-standard Forwarding models. Destination-only and not load-aware metrics**

+Proven with 20 plus years field experience.

 -Loosely coupled.

+Deterministic and predictable.

-Lacks end-to-end transactional consistency and visibility.

+Self-Healing. Traffic can reroute around a failed node or link.

-Limited Topology discovery and extraction. Basic neighbor and topology tables.

+Autonomous.

-Lacks the ability to change existing control plane protocol behavior.

+Scalable.

-Lacks the ability to introduce new control plane protocols.

+Plenty of learning and reading materials.

** Basic EIGRP IETF originally proposed an Energy-Aware Control Plane, but the IETF later removed this.

Software-Defined Networking: Use Cases

Edge Security policy enforcement at the network edge.

Authenticate users or VMs and deploy per-user ACL before connecting a user to the network.

Custom routing and online TE.

The ability to route on a variety of business metrics aka routing for dollars. Allowing you to override the default routing behavior.

Custom traffic processing.

For analytics and encryption.

Programmable SPAN ports

 Use Openflow entries to mirror selected traffic to the SPAN port.

DoS traffic blackholing & distributed DoS prevention.

Block DoS traffic as close to the source as possible with more selective traffic targeting than the original RTBH approach**. The traffic blocking is implemented in OpenFlow switches. Higher performance with significantly lower costs.

Traffic redirection and service insertion.

Redirect a subset of traffic to network appliances and install redirection flow entries wherever needed.

Network Monitoring.

 The controller is the authoritative source of information on network topology and Forwarding paths.

Scale-Out Load Balancing.

Punt new flows to the Openflow controller and install per-session entries throughout the network.

IPS Scale-Out.

OpenFlow is used to distribute the load to multiple IDS appliances.

**Remote-Triggered Black Hole: RTBH refers to installing a host route to a bogus IP address ( RTBH address ) pointing to NULL interfaces on all routers. BGP is used to advertise the host routes to other BGP peers of the attacked hosts, with the next hop pointing to the RTBH address, and it is mainly automated in ISP environments.

SDN deployment models

Guidelines:

  1. Start with small deployments away from the mission-critical production path, i.e., the Core. Ideally, start with device or service provisioning systems.
  2. Start at the Edge and slowly integrate with the Core. Minimize the risk and blast radius. Start with packet filters at the Edge and tasks that can be easily automated ( VLANs ).
  3. Integrate new technology with the existing network.
  4. Gradually increase scale and gain trust. Experience is key.
  5. Have the controller in a protected out-of-band network with SSL connectivity to the switches.

There are 4 different models for OpenFlow deployment, and the following sections list the key points of each model.

Native OpenFlow 

  • They are commonly used for Greenfield deployments.
  • The controller performs all the intelligent functions.
  • The forwarding plane switches have little intelligence and solely perform packet forwarding.
  • The white box switches need IP connectivity to the controller for the OpenFlow control sessions. If you are forced to use an in-band network for this communication path using an isolated VLAN with STP, you should use an out-of-band network.
  • Fast convergence techniques such as BFD may be challenging to use with a central controller.
  • Many people believe that this approach does not work for a regular company. Companies implementing native OpenFlow, such as Google, have the time and resources to reinvent the wheel when implementing a new control-plane protocol ( OpenFlow ).

Native OpenFlow with Extensions

  • Some control plane functions are handled from the centralized controller to the forwarding plane switches. For example, the OpenFlow-enabled switches could load balancing across multiple links without the controller’s previous decision. You could also run STP, LACP, or ARP locally on the switch without interaction with the controller. This approach is helpful if you lose connectivity to the controller. If the low-level switches perform certain controller functions, packet forwarding will continue in the event of failure.
  • The local switches should support the specific OpenFlow extensions that let them perform functions on the controller’s behalf.

Hybrid ( Ships in the night )

  • This approach is used where OpenFlow runs in parallel with the production network.
  • The same network box is controlled by existing on-box and off-box control planes ( OpenFlow).
  • Suitable for pilot deployment models as switches still run traditional control plane protocols.
  • The Openflow controller manages only specific VLANs or ports on the network.
  • The big challenge is determining and investigating the conflict-free sharing of forwarding plane resources across multiple control planes.

Integrated OpenFlow

  • OpenFlow classifiers and forwarding entries are integrated with the existing control plane. For example, Juniper’s OpenFlow model follows this mode of operation where OpenFlow static routes can be redistributed into the other routing protocols.
  • No need for a new control plane.
  • No need to replace all forwarding hardware
  • It is the most practical approach as long as the vendor supports it.

Closing Points on OpenFlow

OpenFlow is a communication protocol that provides access to the forwarding plane of a network switch or router over the network. It was initially developed at Stanford University and has since been embraced by the Open Networking Foundation (ONF) as a core component of SDN. OpenFlow allows network administrators to program the control plane, enabling them to direct how packets are forwarded through the network. This decoupling of the control and data planes is what empowers SDN to offer more dynamic and flexible network management.

The architecture of OpenFlow is quite straightforward yet powerful. It consists of three main components: the controller, the switch, and the protocol itself. The controller is the brains of the operation, managing network traffic by sending instructions to OpenFlow-enabled switches. These switches, in turn, execute the instructions received, altering the flow of network data accordingly. This setup allows for centralized network control, offering unprecedented levels of automation and agility.

**Advantages of OpenFlow**

OpenFlow brings several critical advantages to network management and control:

1. Flexibility and Programmability: With OpenFlow, network administrators can dynamically reconfigure the behavior of network devices, allowing for greater adaptability to changing network requirements.

2. Centralized Control: By centralizing control in a single controller, network administrators gain a holistic view of the network, simplifying management and troubleshooting processes.

3. Innovation and Experimentation: OpenFlow enables researchers and developers to experiment with new network protocols and applications, fostering innovation in the networking industry.

4. Scalability: OpenFlow’s centralized control architecture provides the scalability needed to manage large-scale networks efficiently.

**Implications for Network Control**

OpenFlow has significant implications for network control, paving the way for new possibilities in network management:

1. Software-Defined Networking (SDN): OpenFlow is a critical component of the broader concept of SDN, which aims to decouple network control from the underlying hardware, providing a more flexible and programmable infrastructure.

2. Network Virtualization: OpenFlow facilitates network virtualization, allowing multiple virtual networks to coexist on a single physical infrastructure.

3. Traffic Engineering: By controlling the flow of packets at a granular level, OpenFlow enables advanced traffic engineering techniques, optimizing network performance and resource utilization.

OpenFlow represents a paradigm shift in network control, offering a more flexible, scalable, and programmable approach to managing networks. By separating the control and data planes, OpenFlow empowers network administrators to have fine-grained control over network behavior, improving efficiency, innovation, and adaptability. As the networking industry continues to evolve, OpenFlow and its related technologies will undoubtedly play a crucial role in shaping the future of network management.

Summary: What is OpenFlow?

In the rapidly evolving world of networking, OpenFlow has emerged as a game-changer. This revolutionary technology has transformed the way networks are managed, offering unprecedented flexibility, control, and efficiency. In this blog post, we will delve into the depths of OpenFlow, exploring its definition, key features, and benefits.

What is OpenFlow?

OpenFlow can be best described as an open standard communications protocol that enables the separation of the control plane and the data plane in network devices. It allows centralized control over a network’s forwarding elements, making it possible to program and manage network traffic dynamically. By decoupling the intelligence of the network from the underlying hardware, OpenFlow provides a flexible and programmable infrastructure for network administrators.

Key Features of OpenFlow

a) Centralized Control: One of the core features of OpenFlow is its ability to centralize network control, allowing administrators to define and implement policies from a single point of control. This centralized control improves network visibility and simplifies management tasks.

b) Programmability: OpenFlow’s programmability empowers network administrators to define how network traffic should be handled based on their specific requirements. Through the use of flow tables and match-action rules, administrators can dynamically control the behavior of network switches and routers.

c) Software-Defined Networking (SDN) Integration: OpenFlow plays a crucial role in the broader concept of Software-Defined Networking. It provides a standardized interface for SDN controllers to communicate with network devices, enabling dynamic and automated network provisioning.

Benefits of OpenFlow

a) Enhanced Network Flexibility: With OpenFlow, network administrators can easily adapt and customize their networks to suit evolving business needs. The ability to modify network behavior on the fly allows for efficient resource allocation and improved network performance.

b) Simplified Network Management: By centralizing network control, OpenFlow simplifies the management of complex network architectures. Policies and configurations can be applied uniformly across the network, reducing administrative overhead and minimizing the chances of configuration errors.

c) Innovation and Experimentation: OpenFlow fosters innovation by providing a platform for the development and deployment of new network protocols and applications. Researchers and developers can experiment with novel networking concepts, paving the way for future advancements in the field.

Conclusion

OpenFlow has ushered in a new era of network management, offering unparalleled flexibility and control. Its ability to separate the control plane from the data plane, coupled with centralized control and programmability, has opened up endless possibilities in network architecture design. As organizations strive for more agile and efficient networks, embracing OpenFlow and its associated technologies will undoubtedly be a wise choice.

ip routing

Advances of IP routing and Cloud

 

ip routing

 

With the introduction and hype around Software Defined Networking ( SDN ) and Cloud Computing, one would assume that there has been little or no work with the advances in IP routing. You could say that the cloud has clouded the mind of the market. Regardless of the hype around this transformation, routing is still very much alive and makes up a vital part of the main internet statistics you can read. Packets still need to get to their destinations.

 

Advanced in IP Routing

The Internet Engineering Task Force (IETF) develops and promotes voluntary internet standards, particularly those that comprise the Internet Protocol Suite (TCP/IP). The IETF shapes what comes next, and this is where all the routing takes place. It focuses on anything between the physical layer and the application layer. It doesn’t focus on the application itself, but on the technologies used to transport it, for example, HTTP.

In the IETF, no one is in charge, anyone can contribute, and everyone can benefit. As you can see from the chart below, that routing ( RTG ) has over 250 active drafts and is the most popular working group within the IETF.

 

 

IP routinng
Diagram: IETF Work Distribution.

 

The routing area is responsible for ensuring the continuous operation of the Internet routing system by maintaining the scalability and stability characteristics of the existing routing protocols and developing new protocols, extensions, and bug fixes promptly

The following table illustrates the subgroups of the RTG working group:

Bidirectional Forwarding Detection (BFD) Open Shortest Path First IGP (OSPF)
Forwarding and Control Element Separation (forces) Routing Over Low power and Lossy networks (roll)
Interface to the Routing System (i2rs) Routing Area Working Group (RTGW)
Inter-Domain Routing (IDR) Secure Inter-Domain Routing (SCIDR)
IS-IS for IP Internets (isis) Source Packet Routing in Networking (spring)
Keying and Authentication for Routing Protocols (Karp)
Mobile Ad-hoc Networks (manet)

The chart below displays the number of drafts per subgroup of the routing area. There has been a big increase in the subgroup “roll,” which is second to BGP. “Roll” relates to “Routing Over Low power and Lossy networks” and is driven by the Internet of Everything and Machine-to-Machine communication.

 

IP ROUTING
Diagram: RTG Ongoing Work.

 

OSPF Enhancements

OSPF is a link-state protocol that uses a common database to determine the shortest path to any destination.

Two main areas of interest in the Open Shortest Path First IGP (OSPF) subgroups are OSPFv3 LSA Extendibility and Remote Loop-Free Alternatives ( LFAs ). One benefit IS-IS has over OSPF is its ability to easily introduce new features with the inclusion of Type Length Values ( TLVs ) and sub-TLVs. The IETF draft-IETF-OSPF-ospfv3-lsa-extend extends the LSA format by allowing the optional inclusion of TLVs, making OSPF more flexible and extensible. For example, OSPFv3 uses a new TLV to support intra-area Traffic Engineering ( TE ), while OSPFv2 uses an opaque LSA.

 

TLV for OSPFv3
Diagram: TLV for OSPFv3.

 

Another shortcoming of OSPF is that it does not install a backup route in the routing table by default. Having a pre-installed backup up path greatly improves convergence time. With pre-calculated backup routes already installed in the routing table, the router process does not need to go through the convergence process’s LSA propagation and SPF calculation steps.

 

Loop-free alternatives (LFA)

Loop-Free Alternatives ( LFA ), known as Feasible Successors in EIGRP, are local router decisions to pre-install a backup path.
In the diagram below:

-Router A has a primary ( A-C) and secondary ( A-B-C) path to 10.1.1.0/24
-Link State allows Router A to know the entire topology
-Router A should know that Router B is an alternative path. Router B is a Loop-Free Alternate for destination 10.1.1.0/24

OSPF LFA
Diagram: OSPF LFA.

 

This is not done with any tunneling, and the backup route needs to exist for it to be used by the RIB. If the second path doesn’t exist in the first place, the OSPF process cannot install a Loop-Free Alternative. The LFA process does not create backup routes if they don’t already exist. An LFA is simply an alternative loop-free route calculated at any network router.

A drawback of LFA is that it cannot work in all topologies. This is most notable in RING topologies. The answer is to tunnel and to get the traffic past the point where it will loop. This effectively makes the RING topology a MESH topology. For example, the diagram below recognizes that we must tunnel traffic from A to C. The tunnel type doesn’t matter – it could be a GRE tunnel, an MPLS tunnel, an IP-in-IP tunnel, or just about any other encapsulation.

 

In this network:

-Router A’s best path through E
-Routers C’s best path is through D
-Router A must forward traffic directly to C to prevent packets from looping back.

Remote LFA
Diagram: Remote LFA.

 

In the preceding example, we will look at “Remote LFA,” which leverages an MPLS network and Label Distribution Protocol ( LDP ) for label distribution. If you use Traffic Engineering ( TE ), it’s called “TE Fast ReRoute” and not “Remote LFA.” There is also a hybrid model combining Remote LFA and TE Fast ReRoute, and is used only when the above cannot work efficiently due to a complex topology or corner case scenario.

Remote LFAs extend the LFA space to “tunneled neighbors”.

– Router A runs a constrained SPF and finds C is a valid LFA

– Since C is not directly connected, Router A must tunnel to C

a) Router A uses LDP to configure an MPLS path to C

b) Installs this alternate path as an LFA in the CEF table

– If the A->E link fails.

a) Router A begins forwarding traffic along the LDP path

The total time for convergence usually takes 10ms.

Remote LFA has some topology constraints. For example, they cannot be calculated across a flooding domain boundary, i.e., an ABR in OSPF or L1/L2 boundary is IS-IS. However, they work in about 80% of all possible topologies and 90% of production topologies.

 

BGP Enhancements

BGP is a scalable distance vector protocol that runs on top of TCP. It uses a path algorithm to determine the best path to install in the IP routing table and for IP forwarding.

 

Recap BGP route advertisement:

  • RR client can send to a RR client.
  • RR client can send to a non RR client.
  • A non-RR client cannot send to a non-RR client.

One drawback to the default BGP behavior is that it only advertises the best route. When a BGP Route Reflector receives multiple paths to the same destination, it will advertise only one of those routes to its neighbors.

This can limit the visibility in the network and affect the best path selection used for hot potato routing when you want traffic to leave your AS as quickly as possible. In addition, all paths to exit an AS are not advertised to all peers, basically hiding ( not advertising ) some paths to exit the network.

The diagram below displays default BGP behavior; the RR receives two routes from PE2 and PE3 about destination Z; due to the BGP best path mechanism, it only advertises one of those paths to PE1. 

Route Reflector - Default
Diagram: Route Reflector – Default.

 

In certain designs, you could advertise the destination CE with different Route Distinguishers (RDs), creating two instances for the same destination prefix. This would allow the RR to send two paths to PE.

 

Diverse BGP path distribution

Another new feature is diverse BGP Path distribution, where you can create a shadow BGP session to the RR. It is easy to deploy, and the diverse iBGP session will announce the 2nd best path. Shadow BGP sessions are especially useful in virtualized deployments, where you can create another BGP session to a VM acting as a Route-Reflector. The VM can then be scaled out in a virtualized environment creating numerous BGP sessions. You are allowing the advertisements of multiple paths for each destination prefix.

Route Reflector - Shadow Sessions
Diagram: Route Reflector – Shadow Sessions.

 

BGP Add-path 

A special extension to BGP known as “Add Paths” allows BGP speakers to propagate and accept multiple paths for the same prefix. The BGP Add-Path feature will signal diverse paths, so you don’t need to create shadow BGP sessions. There is a requirement that all Add-Path receiver BGP routers must support the Add-Path capability.

There are two flavors of the Add-Path capability, Add-n-path, and Add-all-path. The “Add-n-path” will add 2 or 3 paths depending on the IOS version. With “Add-all-path,” the route reflector will do the primary best path computation (only on the first path) and then send all paths to the BR/PE. This is useful for large ECMP load balancing, where you need multiple existing paths for hot potato routing.

BGP Add Path
Diagram: BGP Add Path

 

Source packet routing

Another interesting draft the IETF is working on is Source Packet Routing ( spring ). Source Packet Routing is the ability of a node to specify a forwarding path. As the packet arrives in the network, the edge device looks at the application, determines what it needs, and predicts its path throughout the network. Segment routing leverages the MPLS data plane, i.e., push, swap, and pop controls, without needing LDP or RSVP-TE. This avoids millions of labels in the LDP database or TE LSPs in the networks.

 

Application Controls - Network DeliversDiagram: Application Controls – Network Delivers 

The complexity and state are now isolated to the network’s edges, and the middle nodes are only swapping labels. The source routing is based on the notion of a 32-bit segment that can represent any instructions, such as service, context, or IGP-based forwarding construct. This results in an ordered chain of topological and service instructions where the ingress node pushes the segment list on the packet.

 

Prefix Hijacking in BGP

BGP hijacking revolves around locating an ISP that is not filtering advertisements, or its misconfiguration makes it susceptible to a man-in-the-middle attack. Once located, an attacker can advertise any prefix they want, causing some or all traffic to be diverted from the real source towards the attacker.

In February 2008, a large portion of YouTube’s address space was redirected to Pakistan when the Pakistan Telecommunication Authority ( PTA ) decided to restrict access to YouTube.com inside the country but accidentally blackholed the route in the global BGP table.

These events and others have led the Secure-Inter Domain Routing Group ( SIDR ) to address the following two vulnerabilities in BGP:

-Is the AS authorized to originate an IP prefix?

-Is the AS-Path represented in the route the same as the path through which the NLRI traveled?

This lockdown of BGP has three solution components:

 

RPKI Infrastructure Offline repository of verifiable secure objects based on public-key cryptography
Follows resources (IPv4/v6 + ASN) allocation hierarchy to provide “right of use”
BGP Secure Origin AS You only validate the Origin AS of a BGP UPDATE
Solves most frequent incidents (*)
No changes to BGP nor the router’s hardware impact
Standardization is almost finished and running code
BGP PATH Validation BGPSEC proposal under development at IETF
Requires forward signing AS-PATH attribute
Changes in BGP and possible routers

The roll-out and implementation should be gradual and create islands of trust worldwide. These islands of trust will eventually interconnect together, making BGP more secure.

The table below displays the RPKI Deployment State;

RIR Total Valid Invalid Unknown Accuracy RPKI Adoption Rate
AFRINIC 100% .44% .42% 99.14% 51.49% .86%
APNIC 100% .22% .24% 99.5% 48.32% .46%
ARIN 100% .4% .14% 99.46% 74.65% .54%
LACNIC 100% 17.84% 2.01% 80.15% 89.87% 19.85%
RIPE NCC 100% 6.7% 0.59% 92.71% 91.92% 7.29%

Cloud Enhancements – The Intercloud

Today’s clouds have crossed well beyond the initial hype, and applications are now offered as on-demand services ( anything-as-a-service [XaaS] ). These services are making significant cost savings, and the cloud transition is shaping up to be as powerful as the previous one – the Internet. The Intercloud and the Internet of Things are the two new big clouds of the future.

Currently, the cloud market is driven by two spaces – the public cloud ( off-premise ) and the private cloud (on-premise). The intercloud takes the concept of cloud much further and attempts to connect multiple public clouds. A single application that could integrate services and workloads from ten or more clouds would create opportunities and potentially alter the cloud market landscape significantly. Hence, it is important to know and understand the need for cloud migration and its related problems.

We are already beginning to see signs of this in the current market. Various applications, such as Spotify and Google Maps, authenticate unregistered users with their Facebook credentials. Another use case is a cloud IaaS provider could divert incoming workload to another provider if it doesn’t have the resources to serve the incoming requests, essentially cloud bursting from provider to provider. It would also make economic sense to move workload and services between cloud providers based on cooling costs ( follow the moon ). Or maybe dynamically move workloads between providers, so they are closest to the active user base ( follow the sun )

The following diagram displays a Dynamic Workload Migration between two Cloud companies.

 

Intercloud
Diagram: Intercloud.

 

A: Cloud 1 finds Cloud 2 -Naming, Presence
B: Cloud 1 Trusts Cloud 2 -Certificates, Trustsec
C: Both Cloud 1 and 2 negotiate -Security, Policy
D: Cloud 1 sets up Cloud 2 -Placement, Deployment
E: Cloud 1 sends to Cloud 2 -VM runs in cloud-Addressing, configurations

The concept of Intercloud was difficult to achieve with the previous version of vSphere based on the restriction of latency for VMotion to operate efficiently. Now vSphere v6 can tolerate 100 msec of RTT.

InterCloud is still a conceptual framework, and the following questions must be addressed before it can be moved from concept to production.

1) Intercloud security

2) Intercloud SLA management

3) Interoperability across cloud providers.

 

Cisco’s One Platform Kit (onePK)

The One Platform Kit is Cisco’s answer to Software Defined Networking. It aims to provide simplicity and agility to a programmatic network. It’s a set of APIs driven by programming languages, such as C and Java, that are used to program the network. We currently have existing ways to program the network with EEM applets but lack an automation tool that can program multiple devices simultaneously. It’s the same with Performance Routing ( PfR ). PfR can program and traffic engineer the network by remotely changing metrics, but the decisions are still local and not controller-based.

 

Traffic engineering

One useful element of Cisco’s One Platform Kit is its ability to perform “Off box” traffic engineering, i.e., the computation is made outside the local routing device. It allows you to create route paths throughout the network without relying on default routing protocol behavior. For example, the cost is the default metric for route selection for equal-length routes in OSPF. This cannot be changed, which makes the routing decisions very static. In addition, Cisco’s One Platform Kit (onePK) allows you to calculate routes using different variables you set, giving you complete path control.

 

ip routing

container based virtualization

Cisco Switch Virtualization Nexus 1000v

Cisco Switch Virtualization Nexus 1000v

Virtualization has become integral to modern data centers in today's digital landscape. With the increasing demand for agility, flexibility, and scalability, organizations are turning to virtual networking solutions to meet their evolving needs. One such solution is the Nexus 1000v, a virtual network switch offering comprehensive features and functionalities. In this blog post, we will delve into the world of the Nexus 1000v, exploring its key features, benefits, and use cases.

The Nexus 1000v is a distributed virtual switch that operates at the hypervisor level, providing advanced networking capabilities for virtual machines (VMs). It is designed to integrate seamlessly with VMware vSphere, offering enhanced network visibility, control, and security.

Cisco Switch Virtualization is a revolutionary concept that allows network administrators to create multiple virtual switches on a single physical switch. By abstracting the network functions from the hardware, it provides enhanced flexibility, scalability, and efficiency. With Cisco Switch Virtualization, businesses can maximize resource utilization and simplify network management.

At the forefront of Cisco's Switch Virtualization portfolio is the Nexus 1000v. This powerful platform brings the benefits of virtualization to the data center, enabling seamless integration between virtual and physical networks. By extending Cisco's renowned networking capabilities into the virtual environment, Nexus 1000v empowers organizations to achieve consistent policy enforcement, enhanced security, and simplified operations.

The Nexus 1000v boasts a wide range of features that make it a compelling choice for network administrators. From advanced network segmentation and traffic isolation to granular policy control and deep visibility, this platform has it all. By leveraging the power of Cisco's Virtual Network Services (VNS), organizations can optimize their network infrastructure, streamline operations, and deliver superior performance.

Deploying Cisco Switch Virtualization, specifically the Nexus 1000v, requires careful planning and consideration. Organizations must evaluate their network requirements, ensure compatibility with existing infrastructure, and adhere to best practices. From designing a scalable architecture to implementing proper security measures, attention to detail is crucial to achieve a successful deployment.

To truly understand the impact of Cisco Switch Virtualization, it's essential to explore real-world use cases and success stories. From large enterprises to service providers, organizations across various industries have leveraged the power of Nexus 1000v to transform their networks. This section will highlight a few compelling examples, showcasing the versatility and value that Cisco Switch Virtualization brings to the table.

Highlights: Cisco Switch Virtualization Nexus 1000v

Hypervisor and vSphere Introduction

An operating system can run multiple operating systems on a single hardware host using a hypervisor, also known as a virtual machine manager. Operating systems use the host’s processor, memory, and other resources. Hypervisors control the host processor, memory, and other resources and allocate what each operating system needs. Hypervisors run guest operating systems or virtual machines on top of them.

Designed specifically for integration with VMware vSphere environments, the Cisco Nexus 1000V Series Switch runs Cisco NX-OS software. Enterprise-class performance, scalability, and scalability are delivered by VMware vSphere 2.0 across multiple platforms. Within the VMware ESX hypervisor, the Nexus 1000V runs. With the Cisco Nexus 1000V Series, you can take advantage of Cisco VN-Link server virtualization technology

• Policy-based virtual machine (VM) connectivity

• Mobile VM security

• Network policy

Nondisruptive operational model for your server virtualization and networking teams

As with physical servers, virtual servers can be configured with the same network configuration, security policy, diagnostic tools, and operational models as physical servers. The Cisco Nexus 1000V Series is also compatible with VMware vSphere, vCenter, ESX, and ESXi.

A brief overview of the Nexus 1000V system

There are two primary components of the Cisco Nexus 1000V Series switch:

  • VEM (Virtual Ethernet Module): Executes inside hypervisors
  • VSM (External Virtual Supervisor Module): Manages VEMs

Nexus 1000v implements a generic concept of Cisco Distributed Virtual Switch (DVS). VMware ESX or ESXi executes the Cisco Nexus 1000V Virtual Ethernet Module (VEM). The VEM’s application programming interface (API) is VMware vNetwork Distributed Switch (vDS).

By integrating the API with VMware VMotion and Distributed Resource Scheduler (DRS), advanced networking capabilities can be provided to virtual machines. In the VEM, Layer 2 switching and advanced networking functions are performed based on configuration information from the VSM:

Nexus Switch Virtualization

**Virtual routing and forwarding**

Virtual routing and forwarding form the basis of this stack. Firstly, network virtualization comes with two primary methods: 1) One too many and 2) Many to one.  The “one too many” network virtualization method means you segment one physical network into multiple logical segments. Conversely, the “many to one” network virtualization method consolidates numerous physical devices into one logical entity. By definition, they seem to be opposites, but they fall under the same umbrella in network virtualization.

Before you proceed, you may find the following posts helpful:

  1. Container Based Virtualization
  2. Virtual Switch
  3. What is VXLAN
  4. Redundant Links
  5. WAN Virtualization
  6. What Is FabricPath

Network virtualization

Before we get stuck in Cisco virtualization, let us address some basics. For example, if you have multiple virtual endpoints share a physical network. Still, different virtual endpoints belong to various customers, and the communication between these endpoints also needs to be isolated. In other words, the network is a resource, too, and network virtualization is the technology that enables the sharing of a standard physical network infrastructure.

Virtualization uses software to simulate traditional hardware platforms and create virtual software-based systems. For example, virtualization allows specialists to construct a single virtual network or partition a physical network into multiple virtual networks.

Cisco Switch Virtualization: Logical segmentation: One too many

We have one-to-many network virtualization for the Cisco switch virtualization design; a single physical network is logically segmented into multiple virtual networks. For example, each virtual network could correspond to a user group or a specific security function.

End-to-end path isolation requires the virtualization of networking devices and their interconnecting links. VLANs have been traditionally used, and hosts from one user group are mapped to a single VLAN. To extend the path across multiple switches at Layer 2, VLAN tagging (802.1Q) can carry VLAN information between switches. These VLAN trunks were created to transport multiple VLANs over a single Ethernet interface.

The diagram below displays two independent VLANs, VLAN201 and VLAN101. These VLANs can share one physical wire to provide L2 reachability between hosts connected to Switch B and Switch A via Switch C, but they remain separate entities.

Nexus1000v
Nexus1000v: The operation

VLANs are sufficient for small Layer 2 segments. However, today’s networks will likely have a mix of Layer 2 and 3 routed networks. In this case, Layer 2 VLANs alone are insufficient because you must extend the Layer 2 isolation over a Layer 3 device. This can be achieved by using Virtual Routing and Forwarding ( VRF ), the next step in the Cisco switch virtualization. A virtual routing and forwarding instance logically carves a Layer 3 device into several isolated independent L3 devices. The virtual routing and forwarding configured locally cannot communicate directly.

The diagram below displays one physical Layer 3 router with three VRFs: VRF Yellow, VRF Red, and VRF Blue. These virtual routing and forwarding instances are completely separated; without explicit configuration, routes in one virtual routing and forwarding instance cannot be leaked to another.

Virtual Routing and Forwarding

virtual routing and forwarding

The virtualization of the interconnecting links depends on how the virtual routers are connected. If they are physically ( directly ) connected, you could use a technology known as VRF-lite to separate traffic and 802.1Q to label the data plane. This is known as hop-by-hop virtualization.

However, it’s possible to run into scalability issues when the number of devices grows. This design is typically used when you connect virtual routing and forwarding back to back, i.e., no more than two devices.

When the virtual routers are connected over multiple hops through an IP cloud, you can use generic routing encapsulation ( GRE ) or Multiprotocol Label Switching ( MPLS ) virtual private networks.

GRE is probably the simpler of the Layer 3 methods, and it can work over any IP core. GRE can encapsulate the contents and transport them over a network with the network unaware of the packet contents. Instead, the core will see the GRE header, virtualizing the network path.

Cisco Switch Virtualization: The additional overhead

When designing Cisco switch virtualization, you need to consider the additional overhead. There are a further 24 bytes overhead for the GRE header, so it may be the case that the forwarding router may break the datagram into two fragments, so the packet may not be larger than the outgoing interface MTU. To resolve the fragmentation issue, you can correctly configure MTU, MSS, and Path MTU parameters on the outgoing and intermediate routers.

The GRE standard is typically static. You only need to configure tunnel endpoints, and the tunnel will be up as long as you can reach those endpoints. However, recent designs can establish a dynamic GRE tunnel.

GRE over IPsec

MPLS/VPN, on the other hand, is a different beast. It requires signaling to distribute labels and build an end-to-end Label Switched Path ( LSP ). The label distribution can be done with BGP+label, LDP, and RSVP. Unlike GRE tunnels, MPLS VPNs do not have to manage multiple point-to-point tunnels to provide a full mesh of connectivity. Instead, they are used for connectivity, and packets’ labels provide traffic separation.

Cisco switch virtualization: Many to one

Many-to-one network consolidation refers to grouping two or more physical devices into one. Examples of this Cisco switch virtualization technology include a Virtual Switching System ( VSS ), Stackable switches, and Nexus VPC. Combining many physicals into one logical entity allows STP to view the logical group as one, allowing all ports to be active. By default, STP will block the redundant path.

Software-defined networking takes this concept further; it completely abstracts the entire network into a single virtual switch. The control and data planes are on the same device on traditional routers, yet they are decoupled with SDN. The control plan is now on a policy-driven controller, and the data plane is local on the OpenFlow-enabled switch.

Network Virtualization

Server and network virtualization presented the challenge of multiple VMs sharing a single network physical port, such as a network interface controller ( NIC ). The question then arises: how do I link multiple VMs to the same uplink? How do I provide path separation? Today’s networks need to virtualize the physical port and allow the configuration of policies per port.

Nexus 1000

NIC-per-VM design

One way to do this is to have a NIC-per-VM design where each VM is assigned a single physical NIC, and the NIC is not shared with any other VM. The hypervisor, aka virtualization layer, would be bypassed, and the VM would access the I/O device directly. This is known as VMDirectPath.

This direct path or pass-through can improve performance for hosts that utilize high-speed I/O devices, such as 10 Gigabit Ethernet. However, the lack of flexibility and the ability to move VMs offset higher performance benefits.  

Virtual-NIC-per-VM in Cisco UCS (Adapter FEX)

Another way to do this is to create multiple logical NICs on the same physical NIC, such as Virtual-NIC-per-VM in Cisco UCS (Adapter FEX). These logical NICs are assigned directly to VMs, and traffic gets marked with a vNIC-specific tag on the hardware (VN-Tag/802.1ah).

The actual VN-Tag tagging is implemented in the server NICs so that you can clone the physical NIC in the server to multiple virtual NICs. This technology provides faster switching and enables you to apply a rich set of management features to local and remote traffic.

Software Virtual Switch

The third option is to implement a virtual software switch in the hypervisor. For example, VMware introduced virtual switching compatibility with its vSphere ( ESXi ) hypervisor, called vSphere Distributed Switch ( VDS ). Initially, they introduced a local L2 software switch, which was soon phased out due to a lack of distributed architecture.

Data physically moves between the servers through the external network, but the control plane abstracts this movement to look like one large distributed switch spanning multiple servers. This approach has a single management and configuration point, similar to stackable switches – one control plane with many physical data forwarding paths.

The data does not move through a parent partition but logically connects directly to the network interface through local vNICs associated with each VM.

Network virtualization and Nexus 1000v ( Nexus 1000 )

The VDS introduced by VMware lacked any good networking features, which led Cisco to introduce the Nexus 1000V software-based switch. The Nexus 1000v is a multi-cloud, multi-hypervisor, and multi-services distributed virtual switch. Its function is to enable communication between VMs.

Nexus1000v
Nexus1000v: Virtual Distributed Switch.

**Nexus 1000 components: VEM and VSM**

The Nexus 1000v has two essential components:

  1. The Virtual Supervisor Module ( VSM )
  2. The Virtual Ethernet Module ( VEM ).

Compared to a physical switch, the VSM could be viewed as the supervisor, setting up the control plane functions for the data plane to forward efficiently, and the VEM as the physical line cards that do all the packet forwarding. The VEM is the software component that runs within the hypervisor kernel. It handles all VM traffic, including inter-VM frames and Ethernet traffic between a VM and external resources.

The VSM runs its NX-OS code and controls the control and management planes, which integrate into a cloud manager, such as a VMware vCenter. You can have two VSMs for redundancy. Both modules remain constantly synchronized with unicast VSM-to-VSM heartbeats to provide stateful failover in the event of an active VSM failure.

The two available communication options for VSM to VEM are:

  1. Layer 2 control mode: The VSM control interface shares the same VLAN with the VEM.
  2. Layer 3 control mode: The VEM and the VSM are in different IP subnets.

The VSM also uses heartbeat messages to detect a loss of connectivity between it and the VEM. However, the VEM does not depend on connectivity to the VSM to perform its data plane functions and will continue forwarding packets if the VSM fails.

With Layer 3 control mode, the heartbeat messages are encapsulated in a GRE envelope.

Nexus 1000 and VSM best practices

  • L2 control is recommended for new installations.
  • Use MAC pinning instead of LACP.
  • Packet, Control, and Management in the same VLAN.
  • Do not use VLAN 1 for Control and Packet.
  • Use 2 x VSM for redundancy. 

The max latency between VSM and VEM is ten milliseconds. Therefore, a VSM can be placed outside the data center if you have a high-quality DCI link, and the VEM can still be controlled.

Nexus 1000v InterCloud – Cisco switch virtualization

A vital element of the Nexus 1000 is its use case for hybrid cloud deployments and its ability to place workloads in private and public environments via a single pane of glass. In addition, the Nexus 1000v interCloud addresses the main challenges with hybrid cloud deployments, such as security concerns and control/visibility challenges within the public cloud.

The Nexus 1000 interCloud works with Cisco Prime Service Controller to create a secure L2 extension between the private data center and the public cloud.

This L2 extension is based on Datagram Transport Layer Security ( DTLS ) protocol and allows you to securely transfer VMs and Network services over a public IP backbone. DTLS derives the SSL protocol and provides communications privacy for datagram protocols, so all data in motion is cryptographically isolated and encrypted.

Nexus 1000
Nexus 1000 and Hybrid Cloud.

 

Nexus 1000v Hybrid Cloud Components 

Cisco Prime Network Service Controller for InterCloud **A VM that provides a single pane of glass to manage all functions of the inter clouds
InterCloud VSMManage port profiles for VMs in the InterCloud infrastructure
InterCloud ExtenderProvides secure connectivity to the InterCloud Switch in the provider cloud. Install in the private data center.
InterCloud SwitchVirtual Machine in the provider data center has secure connectivity to the InterCloud Extender in the enterprise cloud and secure connectivity to the Virtual Machines in the provider cloud.
Cloud Virtual MachinesVMs in the public cloud running workloads.

Prerequisites

Port 80HTTP access from PNSC for AWS calls and communicating with InterCloud VMs in the provider cloud
Port 443HTTPS access from PNSC for AWS calls and communicating with InterCloud VMs in the provider cloud
Port 22SSH from PNSC to InterCloud VMs in the provider cloud
UDP 6644DTLS data tunnel
TCP 6644DTLS control tunnel

VXLAN – Virtual Extensible LAN

The requirement for applications on demand has led to an increased number of required VLANs for cloud providers. The standard 12-bit identifier, which provided 4000 VLANs, proved to be a limiting factor in multi-tier, multi-tenant environments, and engineers started to run out of isolation options.

This has introduced a 24-bit VXLAN identifier, offering 16 million logical networks. Now, we can cross Layer 3 boundaries. The MAC in UDP encapsulation uses switch hashing to analyze UDP packets and efficiently distribute all packets in a port channel.

nexus 1000
VXLAN operations

VXLAN works like a layer 2 bridge ( Flood and Learn ); the VEM learn does all the heavy lifting, learns all the VM source MAC and Host VXLAN IPs, and encapsulates the traffic according to the port profile to which the VM belongs. Broadcast, Multicast, and unknown unicast traffic are sent as Multicast.

At the same time, unicast traffic is encapsulated and shipped directly to the destination host’s VXLAN IP, aka destination VEM. Enhanced VXLAN offers VXLAN MAC distribution and ARP termination, making it more optional. 

VXLAN Mode Packet Functions

PacketVXLAN(multicast mode)Enhanced VXLAN(unicast mode)Enhanced VXLANMAC DistributionEnhanced VXLANARP Termination
Broadcast /MulticastMulticast EncapsulationReplication plus Unicast EncapReplication plus Unicast EncapReplication plus Unicast Encap
Unknown UnicastMulticast EncapsulationReplication plus Unicast EncapDropDrop
Known UnicastUnicast EncapsulationUnicast EncapUnicast EncapUnicast Encap
ARPMulticast EncapsulationReplication plus Unicast EncapReplication plus Unicast EncapVEM ARP Reply

vPath – Service chaining

Intelligent Policy-based traffic steering through multiple network services.

vPath allows you to intelligently traffic steer VM traffic to virtualized devices. It intercepts and redirects the initial traffic to the service node. Once the service node performs its policy function, the result is cached, and the local virtual switch treats the subsequent packets accordingly. In addition, it enables you to tie services together to push the VM through each service as required. Previously, if you wanted to tie services together in a data center, you needed to stitch the VLANs together, which was limited by design and scale.

Cisco virtualization
Nexus and service chaining

vPath 3.0 is now submitted to the IETF for standardization, allowing service chaining with vPath and non-vpath network services. It enables you to use vpath service chaining between multiple physical devices and supporting multiple hypervisors.

License Options 

Nexus 1000 Essential EditionNexus 1000 Advanced Edition
Full Layer-2 Feature SetAll Features of Essential Edition
Security, QoS PoliciesVSG firewall
VXLAN virtual overlaysVXLAN Gateway
vPath enabled Virtual ServicesTrustSec SGA
Full monitoring and management capabilitiesA platform for other Cisco DC Extensions in the Future
Free$695 per CPU MSRP

Nexus 1000 features and benefits

SwitchingL2 Switching, 802.1Q Tagging, VLAN, Rate Limiting (TX), VXLAN
IGMP Snooping, QoS Marking (COS & DSCP), Class-based WFQ
SecurityPolicy Mobility, Private VLANs w/ local PVLAN Enforcement
Access Control Lists, Port Security, Cisco TrustSec Support
Dynamic ARP inspection, IP Source Guard, DHCP Snooping
Network ServicesVirtual Services Datapath (vPath) support for traffic steering & fast-path off-load[leveraged by Virtual Security Gateway (VSG), vWAAS, ASA1000V]
ProvisioningPort Profiles, Integration with vC, vCD, SCVMM*, BMC CLM
Optimized NIC Teaming with Virtual Port Channel – Host Mode
VisibilityVM Migration Tracking, VC Plugin, NetFlow v.9 w/ NDE, CDP v.2
VM-Level Interface Statistics, vTrackerSPAN & ERSPAN (policy-based)
ManagementVirtual Centre VM Provisioning, vCenter Plugin, Cisco LMS, DCNM
Cisco CLI, Radius, TACACs, Syslog, SNMP (v.1, 2, 3)
Hitless upgrade, SW Installer

Advantages and disadvantages of the Nexus 1000

AdvantagesDisadvantages
The Standard edition is FREE; you can upgrade to an enhanced version when needed.VEM and VSM internal communication is very sensitive to latency. Due to their chatty nature, they may not be good for inter-DC deployments.
Easy and Quick to deployVSM – VEM, VSM (active) – VSM (standby) heartbeat time of 6 seconds makes it sensitive to network failures and congestion.
It offers you many rich network features unavailable on other distributed software switches.VEM over-dependency on VSM reduces resiliency.
Hypervisor agnosticVSM is required for vSphere HA, FT, and VMotion to work.
Hybrid Cloud functionality 

**Closing Points on Cisco Nexus 1000v**

Virtual Ethernet Module (VEM):

The Nexus 1000v employs the Virtual Ethernet Module (VEM), which runs as a module inside the hypervisor. This allows for efficient and direct communication between VMs, bypassing the traditional reliance on the hypervisor networking stack.

Virtual Supervisor Module (VSM):

The Virtual Supervisor Module (VSM) serves as the control plane for the Nexus 1000v, providing centralized management and configuration. It enables network administrators to define policies, manage virtual ports, and monitor network traffic.

Policy-Based Virtual Network Management:

With the Nexus 1000v, administrators can define policies to manage virtual networks. These policies ensure consistent network configurations across multiple hosts, simplifying network management and reducing the risk of misconfigurations.

Advanced Security and Monitoring Capabilities:

The Nexus 1000v offers granular security controls, including access control lists (ACLs), port security, and dynamic host configuration protocol (DHCP) snooping. Additionally, it provides comprehensive visibility into network traffic, enabling administrators to monitor and troubleshoot network issues effectively.

Benefits of the Nexus 1000v:

Enhanced Network Performance:

By offloading network processing to the VEM, the Nexus 1000v minimizes the impact on the hypervisor, resulting in improved network performance and reduced latency.

Increased Scalability:

The distributed architecture of the Nexus 1000v allows for seamless scalability, ensuring that organizations can meet the growing demands of their virtualized environments.

Simplified Network Management:

With its policy-based approach, the Nexus 1000v simplifies network management tasks, enabling administrators to provision and manage virtual networks more efficiently.

Use Cases:

Data Centers:

The Nexus 1000v is particularly beneficial in data center environments where virtualization is prevalent. It provides a robust and scalable networking solution, ensuring optimal performance and security for virtualized workloads.

Cloud Service Providers:

Cloud service providers can leverage the Nexus 1000v to enhance their network virtualization capabilities, offering customers more flexibility and control over their virtual networks.

The Nexus 1000v is a powerful virtual network switch that provides advanced networking capabilities for virtualized environments. Its rich features, policy-based management approach, and seamless integration with VMware vSphere allow organizations to achieve enhanced network performance, scalability, and management efficiency. As virtualization continues to shape the future of data centers, the Nexus 1000v remains a valuable tool for optimizing virtual network infrastructures.

Summary: Cisco Switch Virtualization Nexus 1000v

Welcome to our blog post, where we dive into the world of Cisco Switch Virtualization, explicitly focusing on the Nexus 1000v. In this article, we will unravel the complexities surrounding switch virtualization, explore the key features of the Nexus 1000v, and understand its significance in modern networking environments.

Understanding Switch Virtualization

Switch virtualization is a technique that allows for creating multiple virtual switches on a single physical switch, enabling greater flexibility and efficiency in network management. Organizations can consolidate their infrastructure, reduce costs, and streamline network operations by virtualizing switches.

Introducing the Nexus 1000v

The Cisco Nexus 1000v is a powerful switch virtualization solution that extends the functionality of VMware environments. Unlike traditional virtual switches, it provides a more comprehensive set of features and advanced network control. It seamlessly integrates with VMware vSphere, offering enhanced visibility, security, and policy management.

Key Features of the Nexus 1000v

– Distributed Virtual Switch: The Nexus 1000v operates as a distributed virtual switch, distributing network intelligence across all hosts in the virtualized environment. This ensures consistent policies, simplified troubleshooting, and improved performance.

– Virtual Port Profiles: With virtual port profiles, administrators can define consistent network policies for virtual machines, irrespective of their physical location. This simplifies network provisioning and reduces the chances of misconfigurations.

– Network Analysis Module (NAM): The Nexus 1000v incorporates NAM, a robust monitoring and analysis tool that provides deep visibility into virtual network traffic. This enables administrators to identify and resolve network issues, ensuring optimal performance quickly.

Deployment Considerations

When planning to deploy the Nexus 1000v, it is essential to consider factors such as network architecture, compatibility with existing infrastructure, and scalability requirements. It is advisable to consult with Cisco experts or certified partners to ensure a smooth and successful implementation.

Conclusion:

In conclusion, the Cisco Nexus 1000v is a game-changer in switch virtualization. Its advanced features, seamless integration with VMware environments, and extensive network control make it an ideal choice for organizations seeking to optimize their network infrastructure. By understanding the fundamentals of switch virtualization and exploring Nexus 1000v’s capabilities, network administrators can unlock a world of possibilities in network management and performance.

WAN Design Requirements

Spine Leaf Architecture

Spine Leaf Architecture

In today's interconnected world, where data traffic is growing exponentially, having a robust and scalable network architecture is crucial for businesses and organizations. One such architecture that has gained popularity in recent years is the Spine Leaf architecture. This blog post will explore Spine Leaf architecture, its benefits, and how it can revolutionize network design.

Spine Leaf architecture, also known as Clos architecture, is a network design approach that offers high bandwidth, low latency, and scalability. It is commonly used in data centers and large-scale enterprise networks. The architecture is based on a two-tier design consisting of spine switches and leaf switches.

The spine-leaf architecture is a data center design that provides a scalable and low-latency network fabric. Unlike traditional three-tiered designs, the spine-leaf architecture eliminates the need for complex hierarchical structures, allowing for faster and more efficient communication between devices. With a non-blocking fabric and equal-length paths, data can travel seamlessly from one leaf switch to another, enhancing overall network performance.

The spine-leaf data center offers a myriad of benefits for organizations. Firstly, it provides predictable and consistent low-latency connectivity, ensuring optimal performance for mission-critical applications. Additionally, the architecture allows for easy scalability, enabling seamless expansion as network demands grow. Furthermore, the simplified design reduces complexity, making deployment and management more efficient. Overall, the spine-leaf data center empowers organizations with a robust and agile network infrastructure.

One of the key advantages of the spine-leaf data center is its ability to enhance network flexibility and resilience. By utilizing equal-length paths, traffic can be distributed evenly, preventing bottlenecks and maximizing network capacity. Moreover, the architecture allows for the implementation of link aggregation techniques, increasing bandwidth and redundancy. These features not only improve network performance but also provide built-in fault tolerance, ensuring high availability for critical applications.

The spine-leaf architecture represents a significant shift in the evolution of data center networks. Traditional designs often faced challenges in adapting to the increasing demands of virtualization, cloud computing, and big data analytics. The spine-leaf data center addresses these challenges by providing a scalable, high-performance, and flexible network design that can meet the requirements of modern applications and workloads. It sets the stage for the future of data center networking.

Highlights: Spine Leaf Architecture

At its most straightforward, a data center is a physical facility that houses applications and data. Such a design is based on a computing and storage resources network that enables the delivery of shared applications and data. The critical elements of a data center design include routers, switches, firewalls, storage systems, servers, and application-delivery controllers.

The data center should be flexible in quickly deploying and supporting new services. Such a design needs substantial initial planning and consideration of port density, access layer uplink bandwidth, actual server capacity, and oversubscription, to name a few.

Key Design Considerations

**Network Architecture: The Foundation of Your Data Center**

The network architecture of a data center is the blueprint that determines how data flows through the system. It encompasses various topological designs, such as the traditional three-tier architecture and the more modern leaf-spine architecture. Each design comes with its unique set of advantages and trade-offs. When deciding on a network architecture, consider factors such as scalability, latency, and redundancy. A well-planned architecture can significantly enhance performance and future-proof your data center against the evolving demands of technology.

**Scalability: Planning for Growth**

A critical factor in data center topology design is scalability. As businesses grow, so do their data processing needs. Designing a topology that can easily accommodate additional resources without significant overhauls is crucial. Scalability considerations include modular designs that allow for the seamless addition of new servers and storage units, as well as network configurations that can handle increased traffic without bottlenecks. Investing in scalable solutions today can save significant time and resources in the future.

**Redundancy and Reliability: Ensuring Uptime**

In any data center, ensuring maximum uptime is paramount. Redundancy and reliability are key considerations in topology design to prevent downtime and data loss. This involves implementing redundant pathways for data, backup power supplies, and failover mechanisms to ensure continuous operation even in the event of hardware failures or power outages. By prioritizing reliability in your design considerations, you can safeguard your data center against unforeseen disruptions and maintain business continuity.

**Energy Efficiency: Balancing Performance and Sustainability**

With growing concerns about energy consumption and environmental impact, designing an energy-efficient data center is more important than ever. Considerations include the use of energy-efficient hardware, optimized cooling systems, and smart power management techniques. Efficient design not only reduces operational costs but also minimizes the environmental footprint of your data center. Striking the right balance between performance and sustainability is critical for long-term success.

Key Topology Considerations:

**Understanding Traditional Topologies**

Before diving into the leaf and spine architecture, it’s essential to understand the limitations of traditional data center topologies. Historically, data centers used a three-tier architecture consisting of core, aggregation, and access layers. While effective at the time, this model struggles to keep up with the demands of modern applications due to its limited scalability and potential for bottlenecks. As data center needs evolved, so too did the need for a more robust and scalable topology.

**The Rise of Leaf and Spine Architecture**

Leaf and spine architecture addresses the limitations of traditional topologies by providing a more scalable and efficient solution. In this model, every leaf switch is connected to every spine switch, creating a non-blocking network fabric. This design ensures that data traffic can be routed with minimal latency and congestion, regardless of the source or destination. The uniformity and predictability of this architecture make it an attractive option for organizations looking to future-proof their data centers.

**Advantages of Leaf and Spine Topologies**

One of the primary benefits of leaf and spine architecture is its scalability. As data center demands grow, additional spine switches can be added without disrupting the existing network, allowing for seamless expansion. Additionally, this topology provides consistent performance by evenly distributing data loads across all available paths. The redundancy inherent in leaf and spine designs also enhances network reliability, minimizing the risk of downtime and ensuring uninterrupted service delivery.

**Implementing Leaf and Spine in Modern Data Centers**

Adopting a leaf and spine topology requires careful planning and execution. Organizations must assess their current infrastructure, consider future growth projections, and select the appropriate switches and cabling to support the architecture. While the initial investment may be higher than that of traditional models, the long-term benefits of scalability, performance, and reliability make it a worthwhile endeavor for businesses aiming to stay competitive in the digital landscape.

Traditional Tree-Based Topologies

We have tree-based topologies on the opposite side of a spine-leaf switch design. Tree-based topologies have been the mainstay of data center networks. Traditionally, Cisco has recommended a multi-tier tree-based data center topology, as depicted in the diagram below.

These networks are characterized by aggregation pairs ( AGGs ) that aggregate through many network points. Hosts connect to access or edge switches, which connect to distribution, and distribution connects to the core.

The core should offer no services ( firewall, load balancing, or WAAS ), and its central role is to forward packets as quickly as possible. The aggregation switches define the boundary for the Layer 2 domain, and to contain broadcast traffic to individual domains, VLANs are used to further subdivide traffic into segmented groups. This style of design operates very differently from spine-leaf architecture.

Guide: Leaf and Spine with Cisco ACI 

The following lab guide addresses the leaf and spine with Cisco ACI. The screenshot below shows a small topology that is fine for demonstration purposes. The leaf and spine are based on the Cisco Nexus 9000 series. The ACI has an automated fabric discovery process, and as you can see, we have successfully registered all fabric members.

SDN data center
Diagram: Cisco ACI fabric checking.

The traditional three-tier model was based on the following design principles:

  1. The access switch connects to endpoints, e.g., servers.
  2. The aggregation or distribution switches provide redundant connections to access switches.
  3. The core switches provide fast transport between aggregation switches, typically connected in a redundant pair for high availability.
  4. Networking and security services such as load balancing or firewalling were typically connected to the distribution layers.
spine leaf architecture
The traditional data center design. Non-spine-leaf architecture.

The focus of the design

Their design’s focus was based on fault avoidance principles, and the strategy for implementing this principle is to take each switch and its connected links and build redundancy into it. This led to the introduction of port channels and devices deployed in pairs. In addition, servers pointed to a First Hop Redundancy Protocol, like HSRP or VRRP ( Hot Standby Router Protocol or Virtual Router Redundancy Protocol ). Unfortunately, the steady-state type of network design led to many inefficiencies:

  1. Inefficient use of bandwidth via a single-rooted core.
  2. Operational and configuration complexity.
  3. The cost of having redundant hardware.
  4. It is not optimized for small flows.

Recent changes to application and user requirements have changed the functions of data centers, which in turn has altered the topology and design of the data center to a spine-leaf switch topology. For example, the traditional aggregation point design style was inefficient, and recent changes in end-user requirements are driving architects to design around the following key elements.

Spine Leaf Architecture: Requirements

A spine-leaf architecture collapses one of these tiers at the most basic level, as depicted in the diagram below. Follow the following design principles:

  1. The removal of the Spanning Tree Protocol (STP)
  2. Increased use of fixed-port switches over modular models for the network backbone
  3. More cabling to purchase and manage, given the higher interconnection count
  4. A scale-out vs. scale-up of infrastructure.
what is spine and leaf architecture
Diagram: What is spine and leaf architecture? 2-Tier Spine Leaf Design

Leaf and Spine Main Points

With the introduction of the cloud and containerized infrastructure, there was an increase in east-west traffic. East-west traffic differs from north to south traffic and moves laterally from server to server. Generally, this type of traffic flow stays internal to the data center.

With the change in traffic patterns, we must design our data centers to have low latency and optimized traffic flows, especially for time-sensitive or data-intensive applications. A spine-leaf data center design aids this by ensuring traffic always has the same number of hops from its next destination, so latency is lower and predictable.

STP has always been problematic in the data center. Now, the capacity improves with a leaf and spine because STP is no longer required. In the past, STP blocked redundant paths between two switches, where only one could be active at any time.

As a result, paths often need to be more subscribed. With a leaf, spine-leaf architectures rely on protocols such as Equal-Cost Multipath (ECPM) routing to load balance traffic across all available paths while still preventing network loops. So, instead of running STP to the spine layer, we can run routing protocols.

We also have better scalability. We can add additional spine switches, and leaf switches can be seamlessly inserted when port density becomes problematic. There is no need to take down the core layer for upgrades.

STP Blocking.
Diagram: STP Blocking. Source Cisco Press free chapter.

Data Center Requirements

  • 1) Equidistant endpoints with non-blocking network core.

Equidistant endpoints mean that every device is a maximum of one hop away from the other, resulting in consistent latency in the data center. The term “non-blocking” refers to the internal forwarding performance of the switch.

Non-blocking is the ability to forward at line rate tx/Rx – sender X can send to receiver Y and not be blocked by a simultaneous sender. A blocking architecture cannot deliver the total bandwidth even if the individually switching modules are not oversubscribed or if all ports are not transmitting simultaneously.

  • 2) Unlimited workload placement and mobility.

The application team wants to place the application at any point in the network and communicate with existing services like storage. This usually means that VLANs need to sprawl for VMotion to work. The main question is, where do we need large layer 2 domains? Bridging doesn’t scale, and that’s not just because of spanning tree issues; it’s because the MAC addresses are not hierarchical and cannot be summarized. There is also a limit of 4000 VLANs.

  • 3) Lossless transport for storage and other elephant flows.

To support this type of traffic, data centers require not only conventional QoS tools but also Data Center Bridging ( DCB ) tools such as Priority flow control ( PFC ), Enhanced transmission selection ( ETS ), and Data Center Bridging Exchange ( DCBX ) to be applied throughout their designs. These standards are enhancements that allow lossless transport and congestion notification over full-duplex 10 Gigabit Ethernet networks.

Feature

 Benefit

Priority-based Flow Control ( PFC )

Manages bursty single traffic source on a multiprotocol link

Enhanced transmission selection ( ETS )

Enables bandwidth management between traffic types for multiprotocol links

Congestion notification

Addresses the problems of sustained congestion by moving corrective action to the edge of the network

Data Center Bridging Exchange Protocol 

Allows the exchange of enhanced Ethernet parameters

  • 4) Simplified provisioning and management.

Simplified provisioning and management are critical to operational efficiency. However, the ability to auto-provision and for the users to manage their networks is challenging for future networks.

  • 5) High server-to-access layer transmission rate at Gigabit and 10 Gigabit Ethernet.

Before the advent of virtualization, servers transitioned from 100Mbps to 1GbE as processor performance increased. With the introduction of high-performance multicore processors and each physical server hosting multiple VMs, the processor-to-network connection bandwidth requirements increased dramatically, making 10 Gigabit Ethernet the most common network access option for servers.

In addition, the popularization of 10 Gigabit Ethernet for server access has provided a straightforward approach to group/bundle multiple Gigabit Ethernet interfaces into a single connection, making Ethernet an extremely viable technology for future-proof I/O consolidation.

In addition, to reduce networking costs, data centers are now carrying data and storage traffic over Ethernet using protocols such as iSCSI ( Internet Small Computer System Interface ) and FCoE ( Fibre Channel over Ethernet ). FCoE allows the transport of Fibre channels over a lossless Ethernet network.

spine-leaf switch
FCoE Frame Format

Although there has been some talk of introducing 25 Gigabit Ethernet due to the excessive price of 40 Gigabit Ethernet, the two main speeds on the market are Gigabit and 10 Gigabit Ethernet. The following is a comparison table between Gigabit and 10 Gigabit Ethernet:

Gigabit Ethernet

 10 Gigabit Ethernet

+ Well know and field-tested

+ Much faster vMotion

+ Standard and cheap Copper cabling

+ Converged storage & network ( FCoE or lossless iSCSI/NFS)

+ NIC on the motherboard

+ Reduce the number of NICs per server

+ Cedric Kelly

+ Built-in Qos with ETS and PFC

+ Uses fiber cabling which has lower energy consumption and error rate

– Numerous NICs per hypervisor host. Maybe up to 6 NICs ( user data, VMotion, storage )

– More expensive NIC cards

– No storage/networking convergence. Unable to combine networking and storage onto one NIC

– Usually requires new cabling to be laid which intern could mean more structured panels

– No lossless transport for storage and elephant flows

– SFP used either for single-mode or multimode fiber can be up to $4000 list per each

Spine-Leaf Switch Design

The critical difference between traditional aggregation layers/points and fabric networks is that fabric doesn’t aggregate. If we want to provide 10GB for every edge router to send 10GB to every other edge router, we must add bandwidth between routers A and B, i.e., if we have three hosts sending at 10GB each, we need a core that supports 30 GB.

We must add bandwidth at the core because what if two routers wanted to send 2 x 10GB of data, and the core only supports a maximum of 10GB ( 10GB link between routers A and B)? Both data streams must be interleaved onto the oversubscribed link so that both senders get equal bandwidth. 

You get blocked and oversubscribed when more bandwidth comes into the core than the core can accommodate. Blocking and oversubscription cause delay and jitter, which is bad for some applications, so we must find a way to provide total bandwidth between each end host.

Oversubscription is expressed as the ratio of inputs to outputs (ex. 3:1) or as a percent that is calculated (1 – (# outputs / # inputs)). For example, (1 – (1 output / 3 inputs)) = 67% oversubscribed). There will always be some oversubscription on the network, and there is nothing we can do to get away from that, but as a general rule of thumb, an oversubscription value of 3:1 is best practice.

Some applications will operate fine when oversubscription occurs. It is up to the architect to thoroughly understand application traffic patterns, bursting needs, and baseline states to define the oversubscription limits a system can tolerate accurately.

The simplest solution to overcome the oversubscription and blocking problems would be to increase the bandwidth between Router A and B, as shown in the diagram labeled “Traditional Aggregation Topology.” This is feasible up to a certain point. Router A and B links must also grow to 10GB and 30 GB when the number of edge hosts grows. Datacenter links and the optics used to connect them are expensive.

Spine-Leaf Switch Design

The solution is to divide the core devices into several spine devices, which expose the internal fabric, enabling a spine-leaf architecture similar to what you see with ACI networks. This is achieved by spreading the fabric across multiple devices ( leaf and spine ).

The spreading of the fabric results in every leaf edge switch connecting to every spine core switch, resulting in every edge device having the total bandwidth of the fabric. This places multiple traffic streams parallel, unlike the traditional multitier design that stacks multiple streams onto a single link.

In addition, the higher degree of equal-cost multi-path routing ( ECMP ) found with leaf and spine architectures allows for greater cross-sectional bandwidth between layers, thus greater east-west bandwidth. There is also a reduction in the fault domain compared to traditional access, distribution, and core designs.

A failure of a single device only reduces the available bandwidth by a fraction, and only transit traffic will be lost with a link failure. ECMP reduces liability to a single fault and brings domain optimization.

**Origination of the spine and leaf design**

Charles Clos initially designed a Clos network in 1952 as a multi-stage circuit-switched interconnection network to provide a scalable approach to building large-scale voice switches. It constrained high-speed switching fabrics and required low-latency, non-blocking switching elements.

There has been an increase in the deployment of Clos-based models in data center deployments. Usually, the Clos network is folded around the middle to form a “folded-Clos” network, referred to as a spine-leaf architecture. The spine-leaf switch design consists of three switches:

  • Servers connect directly to ToR ( top of rack ) switches.
  • ToR connects to aggregation switches.
  • Intermediate switches connect to aggregation switches. 

The spine is responsible for interconnecting all Leafs and allows hosts in one rack to talk to hosts in another. The leaves are responsible for physically connecting the servers and distributing traffic via ECMP across all spine nodes.

Leaf and Spine: Folded 3-Stage Clos fabric

Spine-leaf switch deployment considerations:

A. Spine-leaf switch: Fixed or modular switches

Fixed Switches

Modular switches

+ Cheaper

+ Gradual Growth

+ Lower Power Consumption

 + Larger fabrics with leaf/spine topologies

+ Require less space

 + Build-in redundancy with redundant SUPs and SSO/NSF

+ More ports per RU

+ In-Service software redundancy

+ Easier to manage

– Hard to manage

– More expensive

– Difficult to expand

– More cabling due to an increase in device numbers

The leaf layer determines the size of the spine and the oversubscription ratios. It is responsible for advertising subnets into the network fabric. An example of a leaf device would be a Nexus 3064, which provides the following:

  1. Line rate for Layer 2 and Layer 3 on all ports.
  2. Shared memory buffer space.
  3. Throughput of 1/2 terabits per second ( Tbps ) and 950 million packets per second ( Mpps )
  4. 64-way ECMP

The spine layer is responsible for learning infrastructure routes and physically interconnecting all leaf nodes. The Nexus 7K is the platform for the Spine device layer. The F2 series line cards can provide 48x 10G line rate ports and fit the requirements for spine architecture very well.
The following are the types of implementations you could have with this topology:

  1. Layer 3 fabric with standard routing.
  2. Large-scale bridging ( FabricPath, THRILL, or SPB ).
  3. Many-chassis MLAG ( Cisco VSS ).

This article will focus on Layer 3 fabrics with standard routing.

B. Spine-leaf switch: Non-redundant layer 3 design

**Spine-leaf switch: Design Summary**

  1. Layer 3 directly to the access layer. Layer 2 VLANs do not span the spine layer.
  2. Servers are connected to single switches. Servers are not dual connected to two switches, i.e., there is no server to switch redundancy or MLAG.
  3. All connections between the switches will be pure routed point-to-point layer 3 links.
  4. There are no inter-switch VLANs, so no VLAN will ever go beyond one switch.

When the spine switches only advertise the default to the leaf switches, the leaf switches lose visibility of the entire network, and you will need additional intra-spine links. Therefore, intra-spine links should not be used for data plane traffic in a leaf-spine architecture.

**Spine-leaf switch: Design assumptions**

The spine layer passes a default route to the Leaf. The link between the Leaf connecting to Host 1 and Spine Z fails. In the diagram, the link is marked with a red “X.” Host 4 sends traffic to the fabric destined for Host 1.

This traffic spreads ( ECMP ) across all links connecting the connected Leaf to the Spine layers. The traffic hits Spine C, and as C does not have a direct link ( it has failed ) to the Leaf connecting to Host 1, some traffic may be dropped while others will be sub-optimal. To overcome this, you must add inter-switch links between the Spine layers, which is not recommended.

Spine-leaf switch: Recommendations

  1. Buy Leaf switches that can support enough IP prefixes and don’t use summarization from Spine to Leaf.
  2. Always use 40G links instead of channels of 4 x 10G links because link aggregation bandwidth does not affect routing costs. If you lose a link in the port channel, the cost of the port channel does not change, which could result in congestion on the link.
  3. You could use Embedded Event Manager ( EEM ) scripting to change the OSPF cost after one of the port channels fails. This would add complexity to the network as you now don’t have equal-cost routes. This would lead you to use the Cisco proprietary protocol EIGRP, which supports unequal cost routing. If you didn’t want to support a Cisco proprietary protocol, you could implement MPLS TE between the ToR switches. First, you need to check that the DC switches support the MPLS switching of labels.
  4. Use QSFP optics as they are more robust than SFP optics. This will lower the likelihood of one of the parallel links failing.

C. Spine-leaf switch: Redundant layer 3 design

Spine-leaf switch: Design Summary

  1. The servers are dual home to two different switches.
  2. Servers have one IP address due to the restriction of TCP applications. Ideally, use LACP ( Link Aggregation Control Protocol ) between the host and servers.
  3. Layer 2 trunk links between the Leaf switches are needed to carry VLANs that span both switches. This will restrict VLANs from spanning the core, thus creating a sizeable L2 fabric based on STP.
  4. ToR switches must be in the same subnets ( share the server’s subnet) and advertise this subnet into the fabric. Again, the servers are dual-homed to 2 switches with one IP address.

Spine-leaf switch: The challenges

The leaf switches both advertise the same subnet to the spine switches. The spine switches and thinks they have two paths to reach the host. The Spine switch will spread its traffic from Host 1 to Leaf switches connecting Host 1 and Host 2. In specific scenarios, this could result in traffic to the hosts traversing the Interswitch link between the leaf nodes.

This may not be a problem if most traffic leaves the servers northbound ( traffic leaving the data center ). However, if there is a lot of inbound traffic, this link could become a bottleneck and congestion point. This may not be an issue if this is a hosting web server farm because most traffic will leave the data center to external users.

Spine-leaf switch: Recommendation

  1. If there is a lot of east-to-west traffic ( 80 % ), using LAG ( Link Aggregation Group ) between the servers and ToR Leaf switches is mandatory.
  2. The two Leaf switches must support MLAG ( Multichassis Link Aggregation ). The result of using MLAG on the Leaf switches is that when connecting Leaf receives traffic destined for host X, it knows it can reach it directly through its connected link—resulting in optimal southbound traffic flow.
  3. Most LAG solutions place traffic generated from a single TCP session onto a single uplink, limiting the TCP session throughput to the bandwidth of a single uplink interface. However, Dynamic NIC teaming is available in Windows Server 2012 R2 which can split a single TCP session into multiple flows and distribute them across all uplinks.
  4. Use dynamic link aggregation – LACP and not static port channels. The LAGs between servers and switches should use LACP to prevent traffic blackholing.




Key Spine Leaf Architecture Summary Points:

Main Checklist Points To Consider

  • The spine leaf architecture consists of a leaf layer and a spine layer. Endpoints connect to the leaf layer—the spine switch act as the core.

  • This layout of the leaf and spine gives you optimal load balancing and ECMP for any endpoint in any location.

  • The traditional tree-based topologies are not suited for virtualization and you will always be hit with the core port count.

  • The spine and leaf can build massive data centers with, for example, folder 3-stage design.

  • Cisco ACI is an example of a leaf and spine design. VXLAN is the most common overlay protocol that works over what is known as the underlay.

Recap on Spine and Leaf Architecture

Spine Switches:

Spine switches form the backbone of the network in a Spine Leaf architecture. They are high-performance switches that connect to every leaf switch in the network. The spine switches provide a non-blocking, high-bandwidth fabric for data transfer between leaf switches. They ensure data traffic flows seamlessly across the network, avoiding bottlenecks and congestion.

Leaf Switches:

Leaf switches are connected to the spine switches and act as the access layer in a Spine Leaf architecture. They connect end-user devices, servers, or other network devices to the spine switches. Leaf switches are responsible for forwarding traffic between devices within the same leaf and between different leaf switches. They offer a high degree of network flexibility and redundancy.

Benefits of Spine Leaf Architecture:

1. Scalability: Spine Leaf architecture allows for easy scalability as new leaf switches can be added without affecting the existing network. This scalability makes it ideal for growing businesses and organizations with expanding network requirements.

2. High Bandwidth: The architecture provides high bandwidth capacity by leveraging multiple spine switches. This efficiently handles heavy data traffic and ensures optimal network performance even during peak usage.

3. Low Latency: Spine Leaf architecture minimizes latency by eliminating multiple layers of network hierarchy. With fewer hops and shorter paths, data packets can be transmitted quickly, improving application response times.

4. Redundancy and Resilience: The architecture offers built-in redundancy and resilience. If a link or a switch fails, traffic can be automatically rerouted through alternate paths, ensuring uninterrupted network connectivity and minimizing downtime.

5. Enhanced Performance: Spine Leaf architecture improves overall network performance by evenly distributing traffic across multiple paths. This load-balancing capability optimizes resource utilization and prevents any single point of failure.

Spine Leaf Architecture

The spine-leaf architecture has only two layers of switches: spines and leaves. Switches form the spine layer, which performs routing and works as the network’s core. Access switches connect servers, storage devices, and other end users to the leaf layer. A data center network with this structure has a lower hop count and a lower network latency. Leaf switches are connected to spine switches in the spine-leaf architecture. In this design, there is only one interconnected switch path between any leaf switches so that any server can communicate with any other server.

Why Use Spine-leaf Architecture?

The spine-leaf architecture has become a famous data center architecture, bringing many advantages, including scalability and network performance. In five points, we summarize the benefits of spine-leaf architecture in modern networks.

– Enhanced redundancy: The spine-leaf architecture connects the servers with the core network, providing greater flexibility in hyperscale data centers. As a result, the leaf switch can serve as a bridge between the server and the core network. A sizeable non-blocking fabric is formed by connecting leaf switches to spine switches, increasing redundancy and reducing traffic bottlenecks.

– Enhanced bandwidth: The spine-leaf architecture can effectively avoid traffic congestion through protocols such as transparent interconnection of multiple links (TRILL) and shortest path bridging (SPB). Adding uplinks to the spine switch increases interlayer bandwidth and reduces oversubscription to secure network stability using the spine-leaf architecture.

– Enhanced scalability: Multiple links carry traffic in the spine-leaf architecture. In addition to improving scalability, switches can help enterprises expand their businesses in the future.

– Reduced expenses: Because spine-leaf architecture allows switches to handle more connections, data centers deploy fewer devices. A spine-leaf architecture minimizes costs in many data center networks.

– Increased Performance: With a maximum number of hops of only two, we facilitate a more direct traffic path, enhancing overall performance and reducing bottlenecks. This applies only when the destination is on the same leaf switch.

leaf and spine

Spine and Leaf Popularity

Because of cloud computing and containerized infrastructure, east-west traffic increases in modern data centers. East-west traffic moves from server to server in a lateral fashion. Modern applications have components distributed across multiple servers or virtual machines, which partly explains this shift. When it comes to east-west traffic, low-latency, optimized flows are critical for applications that are time-sensitive or data-intensive. Spine-leaf architectures reduce latency by ensuring every hop between destinations is the same.

STP has also been removed, increasing capacity. Only one switch can be active simultaneously, even though STP can provide redundant paths between two switches. Consequently, paths are oversubscribed. Spine-leaf architectures use protocols such as Equal-Cost Multipath (ECMP) to accomplish load-balancing traffic across all available paths across all available paths. Topologies with spines and leaves improve scalability and performance. Capacity can be increased by adding additional spine switches and connecting them to each leaf. Likewise, new leaf switches can be seamlessly inserted if port density becomes an issue. “Scaling out” infrastructure doesn’t change anything.

stp port states

Charles Clos – large-scale switching fabrics

Using Edson Erwin’s concept of building large-scale switching fabrics for telephone systems, Charles Clos (pronounced Klo) developed the Clos network, published in the Bell System Technical Journal in 1953. The original paper, “A Study of Non-Blocking Switching Networks,” is cited in hundreds of subsequent documents. In telephony systems, a Clos network consists of three stages, each with several crossbar switches. To reduce complexity and cost, stages were introduced instead of a single prominent crossbar to reduce the number of crosspoint interconnections needed to build large-scale crossbar-like functionality.

Crossbar switches are strictly non-blocking switches with n inputs and n outputs and interconnecting lines connecting inputs and outputs. For idle input and output lines, non-blocking means that connections can be made without interrupting other connected lines. A crossbar is fundamentally designed to accomplish this. In this case, the complexity of the crossbar switch is O(n2).

Clos. Non Blocking

Data center topology

A spine-leaf architecture is a variation of data center topologies that consists of two switching layers. We have a spine-leaf switch design consisting of two layers. The leaf layer consists of access switches that aggregate traffic from endpoints that could be traditional servers or containers and connect directly to the spine, which is the network core. The Spine switch will often have two for redundancy to interconnect all leaf switches in a full-mesh leaf and spine topology. With a spine and leaf data center network design, the leaf switches do not directly connect.

The underlay and the overlay

eBGP, in this case, is used to exchange routing information between the nodes of the fabric through the underlay, which provides point-to-point Layer 3 interfaces between leafs and spines. Using eBGP to advertise the loopback addresses of VTEPs in the fabric (typically leaves), the underlay offers connectivity between the loopbacks.

In the overlay layer, packets are encapsulated in an outer IP header and transported from one VTEP to another using the data plane encapsulation layer. Source IP addresses are the loopbacks of the originating VTEPs, and destination IP addresses are the loopbacks of the terminating VTEPs.

Multicast VXLAN
Diagram: Multicast VXLAN

Example: Cisco ACI

Instead, all connectivity goes through the core, and the physical and logical layout is generally the same based on network overlay protocols, more than likely VXLAN. An example of a data center that utilizes such a design is the Cisco ACI. The ACI Cisco consists of three main components: the Application Policy Infrastructure Controller (APIC), the spine switches, and the leaf switches.

 

Summary: Spine Leaf Architecture

In data centers, efficiency and scalability are critical to ensuring optimal performance. One architectural design that has gained significant attention is the leaf and spine architecture. This blog post delved into the intricacies of leaf and spine architecture, exploring its benefits, components, and the future it holds for data centers.

Understanding Leaf and Spine Architecture

Leaf and spine architecture, also known as Clos architecture, is a network design approach that eliminates bottlenecks and enhances data center scalability. The architecture consists of two main components: leaf switches and spine switches. Leaf switches act as the access layer, connecting directly to servers, while spine switches serve as the backbone, interconnecting the leaf switches.

Benefits of Leaf and Spine Architecture

The leaf and spine architecture offers several advantages over traditional network designs. Firstly, it provides high bandwidth and low latency due to the non-blocking nature of the spine switches. This ensures smooth and efficient communication between servers. Additionally, the architecture allows for easy scalability, as new leaf switches can be added without impacting the existing network. This modular approach enables data centers to adapt to growing demands seamlessly.

Components of Leaf and Spine Architecture

To grasp the essence of leaf and spine architecture, it’s essential to understand its main components. Leaf switches connect servers within a rack, offering multiple high-speed ports. Spine switches, conversely, provide the interconnectivity between leaf switches, forming a fabric network. Additionally, the architecture may incorporate top-of-rack (ToR) switches for enhanced flexibility and redundancy.

Future Trends and Innovations

As technology continues to evolve, leaf and spine architecture is poised to witness further advancements. With the rise of software-defined networking (SDN), data centers can achieve greater control and programmability in managing their network infrastructure. Moreover, integrating artificial intelligence (AI) and machine learning (ML) algorithms can optimize traffic flow and improve overall network performance.

Conclusion

In conclusion, leaf and spine architecture has revolutionized how data centers are designed and operated. Its scalable and efficient nature brings numerous benefits, including high bandwidth, low latency, and easy expansion. As technology progresses, we can expect further innovations in this architectural approach, ensuring that data centers can meet the ever-growing digital age demands.

network overlays

What is FabricPath

What is Fabric Path

In today's digital era, businesses rely heavily on networking infrastructure to ensure seamless communication and efficient data transfer. Cisco FabricPath is a cutting-edge technology that provides a scalable and resilient solution for modern network architectures. In this blog post, we will delve into the intricacies of Cisco FabricPath, exploring its features, benefits, and use cases.

Cisco FabricPath is a comprehensive network virtualization technology designed to address the limitations of traditional Ethernet networks. It offers a flexible and scalable approach for building large-scale networks that can handle the increasing demands of modern data centers. By combining the benefits of Layer 2 simplicity with Layer 3 scalability, Cisco FabricPath provides a robust and efficient solution for building high-performance networks.

Cisco Fabric Path is a networking technology that provides a scalable and flexible solution for data center networks. It leverages the benefits of both Layer 2 and Layer 3 protocols, combining the best of both worlds to create a robust and efficient network infrastructure.

Simplified Network Design: One of the key advantages of Cisco Fabric Path is its ability to simplify network design and reduce complexity. By utilizing a loop-free topology and eliminating the need for complex Spanning Tree Protocol (STP) configurations, Fabric Path streamlines the network architecture and improves overall efficiency.

Increased Scalability: Scalability is a crucial aspect of any modern network infrastructure, and Cisco Fabric Path excels in this area. With its support for large Layer 2 domains and the ability to accommodate thousands of VLANs, Fabric Path provides organizations with the flexibility to scale their networks as per their evolving needs.

Enhanced Traffic Load Balancing: Cisco Fabric Path incorporates Equal-Cost Multipath (ECMP) routing, which enables efficient distribution of traffic across multiple paths. This results in improved performance, reduced congestion, and enhanced load balancing capabilities within the network.

Converged Traffic and Virtualization: Fabric Path allows for the convergence of both Ethernet and Fibre Channel traffic onto a single infrastructure, simplifying management and reducing costs. Additionally, it seamlessly integrates with virtualization technologies, enabling organizations to leverage the benefits of virtual environments without compromising on performance or security.

Highlights: What is Fabric Path

Nexus OS Software Release

Introduced by Cisco in Nexus OS software Release 5.1(3), FabricPath Nexus allows architects to design highly scalable actual Layer 2 fabrics. Similar to the spanning tree, it provides an almost plug-and-play deployment model with the benefits of Layer 3 routing, allowing FabricPath networks to scale at an unprecedented level.

In addition to its simplicity, Fabric Path enables faster, simpler, and flatter data center networks. Cisco FabricPath uses routing principles to allow Layer 2 scaling, bringing the stability of Layer 3 routing to Layer 2Fabric path traffic is no longer forwarded along a spanning tree design. As a result, we now have a more scalable design that is not limited by bisectional bandwidth.

**Understanding the Core Concepts of FabricPath**

At its core, FabricPath combines the best aspects of Layer 2 and Layer 3 networking, creating a unified fabric that simplifies data center operations. Unlike traditional Spanning Tree Protocol (STP), FabricPath employs a routing protocol that ensures optimal path selection and loop prevention. This results in a more efficient and robust network design, capable of handling the demands of modern applications. Key to this is the use of Intermediate System to Intermediate System (IS-IS) protocol, which facilitates dynamic path calculation and enhances network resilience.

**Advantages of Implementing FabricPath**

Cisco FabricPath offers a multitude of benefits that address common network challenges. One of its primary advantages is the elimination of STP-related issues, such as blocked links and inefficient bandwidth utilization. With FabricPath, all available paths are active, leading to increased throughput and reduced latency. Additionally, the protocol supports seamless scalability, allowing networks to grow without significant reconfiguration. This makes it an ideal choice for businesses looking to future-proof their infrastructure.

**Deploying FabricPath in Your Network**

Deploying Cisco FabricPath in your network involves several considerations to ensure optimal performance. First, it’s crucial to assess your current network architecture and identify areas that would benefit from FabricPath integration. Once identified, the implementation process involves configuring FabricPath on compatible Cisco Nexus switches and ensuring proper interoperability with existing network protocols. Proper planning and execution are essential to reap the full benefits of this innovative technology.

Key FabricPath Notes:

– Understanding the Basics: Fabric Path is a layer 2 network technology that provides the advantages of traditional Ethernet and routing protocols. By leveraging multiprotocol label switching (MPLS) techniques, it enables the creation of scalable, efficient, and resilient networks. Fabric Path ensures optimal traffic flow and minimizes congestion by utilizing a loop-free topology and distributing forwarding information across the network.

– Key Features and Benefits: Fabric Path offers several compelling features, making it a highly desirable solution for modern network architectures. Firstly, its ability to support large Layer 2 domains without the limitations of Spanning Tree Protocol (STP) enables efficient utilization of network resources.

Additionally, Fabric Path provides increased bandwidth and redundancy, leading to enhanced performance and reliability. With its support for Equal-Cost Multipath (ECMP) load balancing, the technology allows for efficient distribution of traffic across multiple paths, further optimizing network utilization.

– Seamless Migration and Interoperability: One of the significant advantages of Fabric Path is its seamless integration with existing network infrastructures. It enables organizations to migrate from conventional Ethernet-based networks to Fabric Path gradually, without disrupting their ongoing operations. This interoperability ensures a smooth transition and allows businesses to leverage the benefits of Fabric Path without compromising their existing investments.

– Real-World Applications: Fabric Path has found extensive applications in various industries and network environments. In data centers, it provides a highly scalable and flexible solution for building large Layer 2 domains, enabling seamless virtual machine mobility and workload distribution.

Similarly, Fabric Path offers simplified network design, reduced complexity, and improved performance in campus networks. Furthermore, it has proven to be an effective solution for service providers, facilitating the delivery of bandwidth-intensive services with high availability and resiliency.

**The need for layer 2** 

In the past, data centers were designed primarily to provide high availability. Layer 2 is crucial for modern data centers. Today’s networks must be agile and flexible, just like the organizations they serve since switching allows devices to be moved and infrastructure to be modified transparently, expanding the Layer 2 domain would satisfy this additional requirement. On the other hand, existing switching technologies rely on inefficient forwarding schemes based on spanning trees that cannot be extended to the entire network. The flexibility of Layer 2 compromises the scalability of Layer 3.

Spanning Tree Root Switch

**Routing concepts at layer 2**

Cisco FabricPath: Expanding Routing Concepts to Layer 2 Cisco® FabricPath extends routing stability and scalability to Layer 2 of Cisco NX-OS. Workloads can be moved across the entire data center by eliminating the need to segment the switched domain. As a result, the bisectional bandwidth of the network is no longer limited by a spanning tree, allowing for massive scalability.

An entirely new Layer 2 data plane is created with Cisco FabricPath, where frames enter the fabric with routable source and destination addresses. The source address of a frame is its receiving switch’s address, and its destination address is its destination switch’s address. As soon as the frame reaches the remote switch, it is de-encapsulated and delivered in its original Ethernet format.

The role of Fabric Path

In large data centers, virtualization of physical servers began a few years ago. Due to server virtualization and economies of scale, “mega data centers” containing tens of thousands of servers emerged due to server virtualization.

As a result, distributed applications had to be supported on a large scale and provisioned in different data center zones. A scalable and resilient Layer 2 fabric was required to enable any-to-any communication. FabricPath was developed by Cisco to meet these new demands. Providing scalability, resilience, and flexibility, FabricPath is a highly scalable Layer 2 fabric.

FabricPath

Fabric Path Requirements

– Massive Scalable Data Centers (MSDCs) and virtualization technologies have led to the development of large Layer 2 domains in data centers with more than 1000 servers and a design for scalability. Due to the limitations of Spanning Tree Protocol (STP), Layer 2 switching has evolved into technologies such as TRILL and FabricPath. To understand FabricPath’s limitations, you need to consider the limitations of current Layer 2 networks based on STP:

– By blocking redundant paths, STP creates loop-free topologies in Layer 2 networks. STP uses the root selection process to accomplish this. To build shortest paths to the root switch, all the other switches block the other ports while building shortest paths to the root switch. The result is a Layer 2 network topology that is loop-free.

– This blocks layer 2 networks because all redundant paths are blocked. PVST, which enables per-VLAN load balancing, also has limitations in multipathing support, although some enhancements were made using the per-VLAN Spanning Tree Protocol (PVSTP).

– The root bridge is selected based on the shortest path, which results in inefficient path selection between switches. So, selecting a path between switches doesn’t necessarily mean choosing the shortest path. Take two access switches as an example connected to distribution and each other. If the distribution switch serves as the root bridge for STP, the link between the two access switches is blocked. All traffic flows through the distribution switch.

– Unavailability of Time-To-Live (TTL): The Layer 2 packet header doesn’t have a TTL field. This can lead to network meltdowns in switched networks. This is because a forwarding loop can cause a broadcast packet to duplicate, consuming excessive network resources exponentially.

STP Path distribution

MAC address scalability: Nonhierarchical flat addressing of Ethernet MAC addresses leads to limited scalability since MAC address summarization is impossible. Additionally, all the MAC addresses are essentially assigned to every switch in the Layer 2 network, increasing the size of Layer 2 tables.

As a result, Layer 3 routing protocols provide multipathing and efficient shortest paths between all nodes in the network, which resolve the shortcomings of Layer 2 networks. Layer 3 solves these issues, but the network design becomes static. Static network design limits Layer 2 domain size, so virtualization cannot be used. Thanks to FabricPath’s combination of the two technologies, a Layer 2 network can be flexible and scaled with Layer 3 networks.

stp port states

Fabric Path Benefits

With FabricPath, data center architects and administrators can design and implement scalable Layer 2 fabrics. Benefits of FabricPath include:

Maintains the plug-and-play features of classical Ethernet: Due to the minimal configuration requirements and the fact that the administrator must include the FabricPath core network interfaces, configuration effort is significantly reduced.

Unicast forwarding, multicast forwarding, and VLAN pruning are also controlled by a single protocol (IS-IS). FabricPath operations, administration, and management (OAM) now support ping and trace routes, allowing network administrators to troubleshoot Layer 2 FabricPath networks similarly to Layer 3 networks.

**Multipathing**

Multipathing allows data center network architects to build large, scalable networks using N-way (more than one path) multipathing. A network administrator can also incrementally add new devices as needed to the existing topology. Using flat topologies, MSDC networks can be connected by only one hop between nodes. A single node failure in N-way multipathing results in a reduction in fabric bandwidth of 1/Nth.

With the enhanced Layer 2 network capabilities combined with Layer 3 capabilities, multiple paths can be created between endpoints instead of just one, replacing STP. It allows network administrators to increase bandwidth as bandwidth requirements increase incrementally.The FabricPath protocol enables traffic to be forwarded over the shortest path to the destination, reducing network latency. This is more efficient than Layer 2 forwarding based on STP.

With FabricPath, MAC addresses are learned selectively based on active flows with conversational MAC learning. As a result, the need for large MAC tables is reduced.

Related: Before you proceed, you may find the following posts helpful:

  1. What is VXLAN
  2. Data Center Fabric
  3. Nexus 1000
  4. SDN Data Center
  5. Data Center Network Design
  6. Cisco ACI

What is Fabric Path

Scalable Layer 2

We must support distributed applications at a considerable scale and have the flexibility to provision them in different zones of data center topologies. This necessitated creating a scalable and resilient Layer 2 fabric enabling any-to-any communication without workload placement restrictions—Cisco developed FabricPath to meet these new demands.

FabricPath is a powerful network technology from Cisco Systems that provides a unified, programmable fabric to connect, manage, and optimize data center networks. It is based on a distributed Layer 2 network protocol that enables the creation of multi-tenant, multi-domain, and multi-site networks with a single, unified control plane. FabricPath operates on a flat, non-hierarchical topology designed to simplify network virtualization and automation.

FabricPath delivers a highly scalable Layer 2 fabric

FabricPath delivers a highly scalable Layer 2 fabric. It uses a single control protocol (IS-IS) for unicast forwarding, multicast forwarding, and VLAN pruning. FabricPath also enables traffic to be forwarded across the shortest path to the destination, thus reducing latency in the Layer 2 network. This is more efficient than Layer 2 forwarding based on the STP.

FabricPath includes several features that make it ideal for large enterprise networks and data centers. It uses a distributed control plane to provide a unified view of the network and reduce network complexity. In addition, FabricPath supports virtualization, allowing the creation of multiple virtual networks within the same physical infrastructure. It also allows the creation of multiple forwarding instances and provides fast convergence times.

FabricPath
Diagram: FabricPath. Source is Cisco

The Challenges Of Inefficient Forwarding Schemes

1: The challenge is that existing switching technologies have inefficient IP forwarding schemes based on spanning trees and cannot be extended to the network. Therefore, current designs compromise the flexibility of Layer 2 and the scaling offered by Layer 3. On the other hand, Fabric Path introduces a new method of forwarding. 

2: The data design can stay the same as a leaf and spine. Still, we have a new Layer 2 data plane with fabric paths that encapsulate the frames entering the fabric with a header consisting of routable source and destination addresses.

3: These addresses are the address of the switch on which the frame was received and the address of the destination switch to which the frame is heading. From there, the frame is routed until it reaches the remote switch, where it is de-encapsulated and delivered in its original Ethernet format. FabricPath Nexus also uses a Shortest Path First (SPF) routing protocol to determine reachability and path selection in the FabricPath domain.

4: With Fabric Path, we have a simple and flexible behavior of Layer 2 while using the routing mechanisms that make IP reliable and scalable. So you may ask, what about the Layer 2 and 3 boundaries? The Layer 2 and 3 boundary still exists in a data center based on Cisco FabricPath. However, there is little difference in how traffic is forwarded in those two distinct areas of the network. The following sections discuss the drivers for FabricPath and what you may opt for in its design.

**Why Cisco Fabricpath?**

1) No Multipathing support at Layer 2: Spanning Tree Protocol ( STP ) lacks any good Layer 2 multipathing features for large data centers. The protocol has been enhanced with PVST per VLAN load balancing, but this feature can only load balance on VLANs.

2) MAC address scalability: Layer 2 end hosts are discovered by their MAC address, and this type of host addressing cannot be hierarchical and summarized. For example, one MAC address cannot represent a stub of networks. Traditional Layer 3 networks overcome this by introducing ABRs in OSPF or summarization/filtering in EIGRP. Also, in the Layer 2 network, all the MAC addresses are populated in ALL switches, leading to large requirements in the Layer 2 table sizes.

3) Instability of Layer 2 networks: Layer 3 networks have an eight-bit Time to Live ( TTL ) field that prevents datagrams from persisting (e.g., going in circles ) on the internet. However, compared to Layer 3 headers, the Layer 2 packet header does not have a TTL field. The lack of a TTL field will cause Layer 2 packets to loop infinitely, causing a network meltdown.

4) Incompetent path selection: The shortest path for a Layer 2 network depends on the placement of the Root switch. Depending on costs and port priorities, you can influence the root port selection ( forwarding port ), but the root switch’s placement is how the forwarding path is built.

For example, in the diagram below, the most optimum traffic for the server-to-server flows would be via the inter-switch link, but as you can see, spanning tree blocks, this port, and traffic takes the sub-optimal path through the distribution switch.

Issues with Spanning Tree: Vendors’ responses.

A Spanning Tree allows only one path to be active between any two nodes and blocks the rest, making it unsuitable for low-latency data centers and cloud environments. Every vendor addressing the data center market proposes augmenting or replacing a Spanning Tree with a link-state protocol.

For example, Brocade uses TRILL in the data plane, while the control plane is based on Fabric Shortest Path First, an ANSI standard used by all Fibre Channel SAN fabrics as the link-state routing protocol.

On the other hand, Juniper implemented a tagging mechanism in the Broadcom silicon in its QFabric switches rather than a link-state protocol. Cisco FabricPath is considered a “superset” of TRILL, bringing scale to the data center and improving application performance.

Fabric Path typical use cases

Fabric Path can support any new protocol that can be done elegantly in IS-IS by adding new extensions without modifying the base infrastructure. Each IS-IS Intermediate router advertises one or more IS-IS Link State Protocol Data Units (LSPs) with routing information.

The LSP comprises a fixed header and several tuples, each consisting of a Type, a Length, and a Value. Such tuples are commonly known as TLVs and are a good way of encoding information in a flexible and extensible format. These make IS-IS a very extensible routing protocol, and FabricPath takes advantage of this extensibility.

This allows FarbicPath to support the following prominent use cases.

  1. Large flat data centers that need Layer 2 multipathing and equidistant endpoints.
  2. DC requires a reduction of Layer 2 table sizes ( done via MAC conversational learning ).
Fabric Path
Diagram: Fabric Path Conversational learning. Source is Cisco

Cisco FabricPath control plane

FabricPath is a Layer 2 overlay network with an IS-IS control plane. Using FabricPath IS-IS, the switches build their forwarding tables, similar to building the forwarding table in Layer 3 networks. The extensions used in IS-IS to support Fabricpath allow this Layer 2 overlay to take advantage of all the scalable and load balancing ( ECMP, up to 16 routes ) benefits of a Layer 3 network while retaining the benefits of a plug-and-play Layer 2 network.

The FabricPath Header

The FabricPath header has a hop count in one of the fields, which mitigates temporary loops in FabricPath networks. This header uses locally assigned hierarchical MAC addresses for forwarding frames within the network. The original Layer 2 frames are encapsulated with a FabricPath header, and a new CRC is appended to the existing packet. One of the main elements of the FabricPath header is the SwitchID, and the core switches forward Fabricpath traffic by examining this field. The switch ID is the field used in the FabricPath domain to forward packets to the correct destination switch.

Why use IS-IS as the FabricPath Nexus control plane?

We touched on this just a moment ago. Its control protocol is built on top of the Intermediate System–to Intermediate System (IS-IS) routing protocol, which provides fast convergence and has been proven to scale up to the largest service provider environments.

  1. IS-IS is flexible and can be extended to support other functions with new type-length values (TLVs).
  2. TLV is also known as tag-length value and encodes optional information.IS-IS runs directly over the link layer, thereby preventing the need for any underlying Layer 3 protocol like IP to work.

Virtual PortChannel

Fabricpath Nexus uses Virtual PortChannel. Now, we have multiple active link capabilities, resulting in active-active forwarding paths. The vPC allows a more granular design over standard port channeling, which only allows you to terminate on one switch. In addition, Cisco vPC enables a more flexible triangular design. Both aggregation technologies can use LACP for the control plane to negotiate the links.

Virtual Device Context

Fabricpath Nexus also uses Virtual Device Contexts (VDC), which allows each FabricPath control-plane protocol and functional block to run in its own protected memory space as individual processes for stability and fault isolation. A VDC design enables modular building blocks to improve security and performance.

**FabricPath Nexus and conversational MAC learning**

FabricPath Nexus performs conversational MAC learning, enabling a switch to learn only those MACs involved in active bidirectional communication. Similar to a three-way handshake, this new technique leads to the population of only the interested host’s MAC addresses rather than all MAC addresses in the domain. This dramatically reduces the need for large table sizes as each switch only learns the MAC addresses that the hosts under its interface are actively communicating with. As a result, edge nodes only know the MAC addresses of local nodes or nodes that want to speak with local nodes directly.

FabricPath Nexus benefits and drawbacks

Benefits  

Drawbacks

Plug-and-play features like Classical Ethernet

 Cisco proprietary

The single control plane for ALL types of traffic and good troubleshooting features to debug problems at Layer 2 

Fabric interfaces carry only FabricPath encapsulated traffic

High performance and high availability using multipathing**

Useful as a DCI solution only over short distances

Easy to add new devices to an existing FabricPath domain

NA

Small Layer 2 table sizes result in better performance

NA

** This enables the MSDC networks to have flat topologies, separating the nodes by a single hop.

Although IS-IS forms the basis of Cisco FabricPath, you don’t need to be an IS-IS expert. You can enable FabricPath interfaces and begin forwarding FabricPath encapsulated frames in the same way they can activate Spanning Tree and interconnect switches.

The only necessary configuration is distinguishing the core ports, which link the switches, from the edge ports, where end devices are attached. No other parameters need to be tuned to achieve an optimal configuration, and the switch addresses are assigned automatically for you.

Closing Points on FabricPath

Cisco FabricPath is an innovative technology that combines the best of both Layer 2 and Layer 3 architectures. By leveraging the Intermediate System to Intermediate System (IS-IS) protocol, it simplifies network topology and enhances scalability. Unlike traditional spanning tree protocols, FabricPath eliminates loop issues and provides multipath capabilities, ensuring optimal data flow across the network. This section will explore how FabricPath works and why it’s a game-changer for modern networks.

One of the standout features of Cisco FabricPath is its ability to create a flat, scalable network fabric that supports large-scale data center environments. It offers seamless mobility, allowing for the efficient movement of virtual machines across the network without impacting performance. Additionally, FabricPath supports equal-cost multipathing (ECMP), which optimizes bandwidth usage and enhances network resiliency. This section will break down these features and explain how they contribute to a more efficient and reliable networking environment.

Successfully deploying Cisco FabricPath requires a strategic approach. This section will provide best practices for implementation, including considerations for network design, hardware compatibility, and performance optimization. We’ll also discuss common challenges and how to overcome them, ensuring a smooth transition to a FabricPath-enabled network. By following these guidelines, organizations can maximize the benefits of FabricPath and future-proof their network infrastructure.

Summary: What is Fabric Path

Key Features of Cisco FabricPath:

1. MAC-in-MAC Encapsulation: Cisco FabricPath utilizes MAC-in-MAC encapsulation to overcome the traditional Spanning Tree Protocol (STP) limitations. By encapsulating Layer 2 frames within another Layer 2 frame, FabricPath enables efficient forwarding and eliminates the need for STP.

2. Loop-Free Topology: Unlike STP-based networks, Cisco FabricPath employs a loop-free topology, ensuring optimal forwarding paths and maximizing network utilization. This feature enhances network resilience and eliminates the risk of network outages caused by loops.

3. Scalability: Cisco FabricPath supports up to 16 million virtual ports, enabling organizations to scale their networks without compromising performance. This scalability makes it ideal for large data centers and enterprises with growing network demands.

4. Traffic Optimization: Cisco FabricPath optimizes traffic flows using Equal-Cost Multipath (ECMP) routing. ECMP distributes traffic across multiple paths, allowing for efficient load balancing and improved network performance.

Benefits of Cisco FabricPath:

1. Simplified Network Design: Cisco FabricPath simplifies network design by eliminating the need for complex STP configurations. With its loop-free architecture, FabricPath reduces network complexity and improves overall network stability.

2. Enhanced Network Resilience: Cisco FabricPath ensures high availability and resilience by utilizing multiple paths and load-balancing techniques. In the event of a link failure, traffic is automatically rerouted, minimizing downtime and enhancing network reliability.

3. Increased Performance: With its scalable design and traffic optimization capabilities, Cisco FabricPath delivers superior network performance. By distributing traffic across multiple paths, FabricPath minimizes bottlenecks and improves overall network throughput.

Use Cases of Cisco FabricPath:

1. Data Center Networks: Cisco FabricPath is widely used in data center environments, providing a scalable and resilient networking solution. Its ability to handle high traffic volumes and optimize data flows makes it an ideal choice for modern data centers.

2. Virtualized Environments: Cisco FabricPath is particularly beneficial in virtualized environments. It simplifies network provisioning and enhances virtual machine mobility. Its scalability and flexibility enable seamless communication between virtualized resources.

Conclusion: Cisco FabricPath is a powerful networking solution that offers numerous benefits for organizations seeking scalable and resilient network architectures. With its loop-free topology, MAC-in-MAC encapsulation, and traffic optimization capabilities, Cisco FabricPath simplifies network design, enhances network resilience, and boosts overall performance. By implementing Cisco FabricPath, businesses can build robust and efficient networks that meet the demands of today’s digital landscape

What is VXLAN

What is VXLAN

What is VXLAN

In the rapidly evolving networking world, virtualization has become critical for businesses seeking to optimize their IT infrastructure. One key technology that has emerged is VXLAN (Virtual Extensible LAN), which enables the creation of virtual networks independent of physical network infrastructure. In this blog post, we will delve into the concept of VXLAN, its benefits, and its role in network virtualization.

VXLAN is an encapsulation protocol designed to extend Layer 2 (Ethernet) networks over Layer 3 (IP) networks. It provides a scalable and flexible solution for creating virtualized networks, enabling seamless communication between virtual machines (VMs) and physical servers across different data centers or geographic regions.

VXLAN is a technology that creates virtual networks within an existing physical network. A Layer 2 overlay network runs on top of the current Layer 2 network. VXLAN utilizes UDP as the transport protocol, providing a secure, efficient, and reliable way to create a virtual network.

VXLAN encapsulates the original Layer 2 Ethernet frames within UDP packets, using a 24-bit VXLAN Network Identifier (VNI) to distinguish between different virtual networks. The encapsulated packets are then transmitted over the underlying IP network, enabling the creation of virtualized Layer 2 networks across Layer 3 boundaries.

Scalability: VXLAN solves the limitations of traditional VLANs by providing a much larger network identifier space, accommodating up to 16 million virtual networks. This scalability allows for the efficient isolation and segmentation of network traffic in highly virtualized environments.

VXLAN enables the decoupling of virtual and physical networks, providing the flexibility to move virtual machines across different physical hosts or even data centers without the need for reconfiguration. This flexibility greatly simplifies workload mobility and enhances overall network agility.

Multitenancy: With VXLAN, multiple tenants can securely share the same physical infrastructure while maintaining isolation between their virtual networks. This is achieved by assigning unique VNIs to each tenant, ensuring their traffic remains separate and secure.

Underlay Network: VXLAN relies on an IP underlay network, which must provide sufficient bandwidth, low latency, and optimal routing. Careful planning and design of the underlay network are crucial to ensure optimal VXLAN performance.

Network Virtualization Gateway: To enable communication between VXLAN-based virtual networks and traditional VLAN-based networks, a network virtualization gateway, such as a VXLAN Gateway or an overlay-to-underlay gateway, is required. These gateways bridge the gap between virtual and physical networks, facilitating seamless connectivity.

Highlights: What is VXLAN

Understanding VXLAN Basics

It is essential to grasp VXLAN’s fundamental concepts to comprehend it. VXLAN enables the creation of virtualized Layer 2 networks over an existing Layer 3 infrastructure. It uses encapsulation techniques to extend Layer 2 segments over long distances, enabling flexible deployment of virtual machines across physical hosts and data centers.

VXLAN Encapsulation: One of the key components of VXLAN is encapsulation. When a virtual machine sends data across the network, VXLAN encapsulates the original Ethernet frame within a new UDP/IP packet. This encapsulated packet is then transmitted over the underlying Layer 3 network, allowing for seamless communication between virtual machines regardless of their physical location.

VXLAN Tunneling: VXLAN employs tunneling to transport the encapsulated packets between VXLAN-enabled devices. These devices, known as VXLAN Tunnel Endpoints (VTEPs), establish tunnels to carry VXLAN traffic. By leveraging tunneling protocols like Generic Routing Encapsulation (GRE) or Virtual Extensible LAN (VXLAN-GPE), VTEPs ensure the delivery of encapsulated packets across the network.

**Benefits of VXLAN**

VXLAN brings numerous benefits to modern network architectures. It enables network virtualization and multi-tenancy, allowing for the efficient and secure isolation of network segments. VXLAN also provides scalability, as it can support a significantly higher number of virtual networks than traditional VLAN-based networks. Additionally, VXLAN facilitates workload mobility and disaster recovery, making it an ideal choice for cloud environments.

**Implementing VXLAN**

VXLAN Implementation Considerations: While VXLAN offers immense advantages, there are a few considerations to consider when implementing it. VXLAN requires network devices that support the technology, including VTEPs and VXLAN-aware switches. It is also crucial to properly configure and manage the VXLAN overlay network to ensure optimal performance and security.

Data centers evolution

In recent years, data centers have seen a significant evolution. This evolution has brought popular technologies such as virtualization, cloud computing (private, public, and hybrid), and software-defined networking (SDN). Mobile-first and cloud-native data centers must scale, be agile, secure, consolidate, and integrate with compute/storage orchestrators. As well as visibility, automation, ease of management, operability, troubleshooting, and advanced analytics, today’s data center solutions are expected to include many other features.

A more service-centric approach is replacing device-by-device management. Most requests for proposals (RFPs) specify open application programming interfaces (APIs) and standards-based protocols to prevent vendor lock-in. A Cisco Virtual Extensible LAN (VXLAN)-based fabric using Nexus switches2 and NX-OS controllers form Cisco Virtual Extensible LAN (VXLAN).

what is spine and leaf architecture
Diagram: What is spine and leaf architecture. 2-Tier Spine Leaf Design

Issues with STP

When a switch receives redundant paths, the spanning tree protocol must designate one of those paths as blocked to prevent loops. While this mechanism is necessary, it can lead to suboptimal network performance. Blocked ports limit bandwidth utilization, which can be particularly problematic in environments with heavy data traffic.

One significant concern with the spanning tree protocol is its slow convergence time. When a network topology changes, the protocol takes time to recompute the spanning tree and reestablish connectivity. During this convergence period, network downtime can occur, disrupting critical operations and causing frustration for users.

stp port states

What is VXLAN?

The Internet Engineering Task Force (IETF) developed VXLAN, or Virtual eXtensible Local-Area Network, as a network virtualization technology standard. Multi-tenant networks allow multiple organizations to share a physical network without accessing each other’s traffic.

The VXLAN can be compared to individual apartment apartments: each apartment is a separate, private dwelling within a shared physical structure, just as each VXLAN is a discrete, private network segment within a shared physical infrastructure.

With VXLANs, physical networks can be segmented into 16 million logical networks. To encapsulate Layer 2 Ethernet frames, User Datagram Protocol (UDP) packets with a VXLAN header are used. Combining VXLAN with Ethernet virtual private networks (EVPNs), which transport Ethernet traffic over WAN protocols, allows Layer 2 networks to be extended across Layer 3 IP or MPLS networks.

**Benefits of VXLAN:**

– Scalability: VXLAN allows creating up to 16 million logical networks, providing the scalability required for large-scale virtualized environments.

– Network Segmentation: By leveraging VXLAN, organizations can segment their networks into virtual segments, enhancing security and isolating traffic between applications or user groups.

– Flexibility and Mobility: VXLAN enables the movement of VMs across physical servers and data centers without the need to reconfigure network settings. This flexibility is crucial for workload mobility in dynamic environments.

– Interoperability: VXLAN is an industry-standard protocol supported by various networking vendors, ensuring compatibility across different network devices and platforms.

**Use Cases for VXLAN**

– Data Center Interconnect (DCI): VXLAN allows organizations to interconnect multiple data centers, enabling seamless workload migration, disaster recovery, and workload balancing across different locations.

– Multi-Tenant Environments: VXLAN enables service providers to offer virtualized network services to multiple tenants securely and isolatedly. This is particularly useful in cloud computing environments.

– Network Virtualization: VXLAN plays a crucial role in network virtualization, allowing organizations to create virtual networks independent of the underlying physical infrastructure. This enables greater flexibility and agility in managing network resources.

**VXLAN vs. GRE**

VXLAN, an overlay network technology, is designed to address the limitations of traditional VLANs. It enables the creation of virtual networks over an existing Layer 3 infrastructure, allowing for more flexible and scalable network deployments. VXLAN operates by encapsulating Layer 2 Ethernet frames within UDP packets, extending Layer 2 domains across Layer 3 boundaries.

GRE, on the other hand, is a simple IP packet encapsulation protocol. It provides a mechanism for encapsulating arbitrary protocols over an IP network and is widely used for creating point-to-point tunnels. GRE encapsulates the payload packets within IP packets, making it a versatile option for connecting remote networks securely.

GRE without IPsec

Point-to-point GRE networks serve as a foundational element in modern networking. They allow for encapsulation and efficient transmission of various protocols over an IP network. Point-to-point GRE networks enable seamless communication and data transfer by establishing a direct virtual link between two endpoints.

Understanding mGRE

mGRE serves as the foundation for building DMVPN networks. It allows multiple sites to communicate with each other over a shared public network infrastructure while maintaining security and scalability. By utilizing a single mGRE tunnel interface on a central hub router, multiple spoke routers can dynamically establish and tear down tunnels, enabling seamless communication across the network.

The utilization of mGRE within DMVPN offers several key advantages. First, it simplifies network configuration by eliminating the need for point-to-point tunnels between each spoke router. Second, mGRE provides scalability, allowing for the dynamic addition or removal of spoke routers without impacting the overall network infrastructure. Third, mGRE enhances network resiliency by supporting multiple paths and providing load-balancing mechanisms.

Key VXLAN advantages

Because VXLANs are encapsulated inside UDP packets, they can run on any network that can send UDP packets. No matter how physically or geographically far a VTEP is from the decapsulating VTEP, it must forward UDP datagrams. 

VXLAN and EVPN enable operators to create virtual networks from physical ports on any Layer 3 network switch supporting the standard. Connecting a port on switch A to two ports on switch B and another port on switch C creates a virtual network that appears to all connected devices as one physical network. Devices in this virtual network cannot see VXLANs or the underlying network fabric.

**Problems that VXLAN solves**

In the same way, as server virtualization has increased agility and flexibility, decoupling virtual networks from physical infrastructure has done the same. Therefore, network operators can scale their infrastructure rapidly and economically to meet growing demand while securely sharing a single physical network. For privacy and security reasons, networks are segmented to prevent one tenant from seeing or accessing the traffic of another.

In a similar way to traditional virtual LANs (VLANs), VXLANs enable operators to overcome the scaling limitations associated with VLANs.

  • Up to 16 million VXLANs can be created in an administrative domain, compared to 4094 traditional VLANs. Cloud and service providers can segment networks using VXLANs to support many tenants.
  • By using a VXLAN, you can create network segments between different data centers. In traditional VLAN networks, broadcast domains are created by segmenting traffic by VLAN tags, but once a packet containing VLAN tags reaches a router, the VLAN information is removed. There is no limit to the distance VLANs can travel within a Layer 2 network. Layer 3 boundaries, such as virtual machine migration, are generally avoided in certain use cases. Segmenting networks based on VXLAN encapsulates packets as UDP packets, while segmenting networks based on VXLAN encapsulates packets as IP packets. A virtual overlay network can extend as far as the physical Layer 3 routed network can reach when all switches and routers in the path support VXLAN without the applications running on the overlay network having to cross any Layer 3 boundaries. Servers connected to the network are still part of the Layer 2 network, even though UDP packets may have transited one or more routers.
  • Using Layer 2 segmentation on top of an underlying Layer 3 network allows one to segment a Layer 2 network over an underlying Layer 3 network and support many network segments. By providing Layer 2 segmentation on top of an underlying Layer 3 network, Layer 2 networks can remain small even if they are distant. Smaller Layer 2 networks can prevent MAC table overflows on switches.

Primary VXLAN applications

A service provider or cloud provider deploys VXLAN for apparent reasons: they have many tenants or customers, and they must separate the traffic of one customer from another due to legal, privacy, and ethical considerations.

Users, departments, or other groups of network-segmented devices may be tenants in enterprise environments for security reasons. Isolating IoT network traffic from production network applications is a good security practice for Internet of Things (IoT) devices such as data center environmental sensors.

VXLAN has been widely adopted and is now used in many large enterprise networks for virtualization and cloud computing. It provides:

  • A secure and efficient way to create virtual networks.
  • Allowing for the creation of multi-tenant segmentation.
  • Efficient routing.
  • Hardware-agnostic capabilities.

With its widespread adoption, VXLAN has become an essential technology for network virtualization.

Example: VXLAN Flood and Learn

Understanding VXLAN Flood and Learn

VXLAN flood and learn handles unknown unicast, multicast, and broadcast traffic in VXLAN networks. It allows the network to learn and forward traffic to the appropriate destination without relying on traditional flooding techniques. By leveraging multicast, VXLAN flood and learn improves efficiency and reduces the network’s reliance on flooding every unknown packet.

Proper multicast group management is essential to implementing VXLAN flood and learning with multicast. VXLAN uses multicast groups to distribute unknown traffic efficiently within the network. 

VXLAN flood and learn with multicast offers several benefits for data center networks. Firstly, it reduces the flooding of unknown traffic, which helps minimize network congestion and improves overall performance. Additionally, it allows for better scalability by avoiding the need to flood every unknown packet to all VTEPs (VXLAN Tunnel Endpoint). This results in more efficient network utilization and reduced processing overhead.

Related: Before you proceed, you may find the following posts helpful for pre-information:

  1. Data Center Topologies
  2. Segment Routing
  3. What is OpenFlow
  4. Overlay Virtual Networks
  5. Layer 3 Data Center

What is VXLAN

Traditional layer two networks have issues because of the following reasons:

  • Spanning tree: Restricts links.
  • Limited amount of VLANs: Restricts scalability;
  • Large MAC address tables: Restricts scalability and mobility

Spanning-tree avoids loops by blocking redundant links. By blocking connections, we create a loop-free topology and pay for links we can’t use. Although we could switch to a layer three network, some technologies require layer two networking.

VLAN IDs are 12 bits long, so we can create 4094 VLANs (0 and 4095 are reserved). Data centers may need help with only 4094 available VLANs. Let’s say we have a service provider with 500 customers. There are 4094 available VLANs, so each customer can only have eight.

STP Path distribution

The Role of Server Virtualization

Server virtualization has exponentially increased the number of addresses in our switches’ MAC addresses. Before server virtualization, there was only one MAC address per switch port. With server virtualization, we can run many virtual machines (VMs) or containers on a single physical server. Virtual NICs and virtual MAC addresses are assigned to each virtual machine. One switch port must learn many MAC addresses.

A data center could connect 24 or 48 physical servers to a top-of-rack (ToR) switch. Since there may be many racks in a data center, each switch must store the MAC addresses of all VMs that communicate. Networks without server virtualization require much larger MAC address tables.

Guide: VXLAN

In the following lab, I created a Layer 2 overlay with VXLAN over a Layer 3 core. A bridge domain VNI of 6001 must match both sides of the overlay tunnel. What Is a VNI? The VLAN ID field in an Ethernet frame has only 12 bits, so VLAN cannot meet isolation requirements on data center networks. The emergence of VNI specifically solves this problem.

Note: The VNI

A VNI is a user identifier similar to a VLAN ID. A VNI identifies a tenant. VMs with different VNIs cannot communicate at Layer 2. During VXLAN packet encapsulation, a 24-bit VNI is added to a VXLAN packet, enabling VXLAN to isolate many tenants.

In the screenshot below, you will notice that I can ping from desktop 0 to desktop one even though the IP addresses are not in the routing table of the core devices, simulating a Layer 2 overlay. Consider VXLAN to be the overlay and the routing Layer 3 core to be the underlay.

VXLAN overlay
Diagram: VXLAN Overlay

In the following screenshot, notice that the VNI has been changed. The VNI needs to be changed in two places in the configuration, as illustrated below. Once changed, the Peers are down; however, the NVE  interface remains up. The VXLAN layer two overlay is not operational.

Diagram: Changing the VNI

How does VXLAN work?

VXLAN uses tunneling to encapsulate Layer 2 Ethernet frames within IP packets. Each VXLAN network is identified by a unique 24-bit segment ID, the VXLAN Network Identifier (VNI). The source VM encapsulates the original Ethernet frame with a VXLAN header, including the VNI. The encapsulated packet is then sent over the physical IP network to the destination VM and decapsulated to retrieve the original Ethernet frame.

Analysis:

Notice below that it is running a ping from desktop 0 to desktop 1. The IP addresses assigned to this host are 10.0.0.1 and 10.0.0.2. First, notice that the ping is booming. When I do a packet capture on the links Gi1 connected to Leaf A, we see the encapsulation of the ICMP echo request and reply.

Everything is encapsulated into UDP port 1024. In my configurations of Leaf A and Leaf B, I explicitly set the VXLAN port to 1024.

VXLAN unicast mode

XLAN and Network Virtualization.

VXLAN and network virtualization

VXLAN is a form of network virtualization. Network virtualization cuts a single physical network into many virtual networks, often called network overlays. Virtualizing a resource allows it to be shared by multiple users. Virtualization provides the illusion that each user is on his or her resources.

In the case of virtual networks, each user is under the misconception that there are no other users of the network. To preserve the illusion, virtual networks are separated from one another. Packets cannot leak from one virtual network to another.

Network Virtualization
Diagram: Network Virtualization. Source Parallels

VXLAN Loop Detection and Prevention

So, before we dive into the benefits of VXLAN, let us address the basics of loop detection and prevention, which is a significant driver for using network overlays such as VLXAN. The challenge is that data frames can exist indefinitely when loops occur, disrupting network stability and degrading performance.

In addition, loops introduce broadcast radiation, increasing CPU and network bandwidth utilization, which degrades user application access experience. Finally, in multi-site networks, a loop can span multiple data centers, causing disruptions that are difficult to pinpoint. Overlay networking can solve much of this.

VXLAN vs VLAN

However, first-generation Layer-2 Ethernet networks could not natively detect or mitigate looped topologies, while modern Layer-2 overlays implicitly build loop-free topologies. Therefore, overlays do not need loop detection and mitigation as long as no first-gen Layer-2 network is attached. Essentially, there is no need for a VXLAN spanning tree.

So, one of the differences between VXLAN vs VLAN is that the VLAN has a 12-bit VID while VXLAN has a 24-bit VID network identifier, allowing you to create up to 16 million segments. VXLAN has tremendous scale and stable loop-free networking and is a foundation technology in the ACI Cisco.

Spanning tree VXLAN
Diagram: Loop prevention. Source is Cisco

VXLAN and Data Center Interconnect

VXLAN has revolutionized data center interconnect by providing a scalable, flexible, and efficient solution for extending Layer 2 networks. Its ability to enable network segmentation, multi-tenancy support, and seamless mobility makes it a valuable technology for modern businesses.

However, careful planning, consideration of network infrastructure, and security measures are essential for successful implementation. By harnessing the power of VXLAN, organizations can achieve a more agile, scalable, and interconnected data center environment.

VXLAN vs VLAN: The VXLAN Benefits Drive Adoption

Introduced by Cisco and VMware and now heavily used in open networking, VXLAN stands for Virtual eXtensible Local Area Network. It is perhaps the most popular overlay technology for IP-based SDN data centers and is used extensively with ACI networks.

VXLAN was explicitly designed for Layer 2 over Layer 3 tunneling. Its early competition from NVGRE and STT is fading away, and VXLAN is becoming the industry standard. VLXAN brings many advantages, especially in loop prevention, as there is no need for a VXLAN spanning tree.

VXLAN Benefits
VXLAN Benefits: Scale and loop-free networks.

Today, overlays such as VXLAN almost eliminate the dependency on loop prevention protocols. However, even though virtualized overlay networks such as VXLAN are loop-free, having a failsafe loop detection and mitigation method is still desirable because loops can be introduced by topologies connected to the overlay network.

Loop prevention traditionally started with Spanning Tree Protocols (STP) to counteract the loop problem in first-gen Layer-2 Ethernet networks. Over time, other approaches evolved by moving networks from “looped topologies” to “loop-free topologies.

While LAG and MLAG were used, other approaches for building loop-free topologies arose using ECMP at the MAC or IP layers. For example, FabricPath or TRILL is a MAC layer ECMP approach that emerged in the last decade. More recently, network virtualization overlays that build loop-free topologies on top of IP layer ECMP became state-of-the-art.

What is VXLAN
What is VXLAN and the components involved?

VXLAN vs VLAN: Why Introduce VXLAN?

  1. STP issues and scalability constraints: STP is undesirable on a large scale and lacks a proper load-balancing mechanism. A solution was needed to leverage the ECMP capabilities of an IP network while offering extended VLANs across an IP core, i.e., virtual segments across the network core. There is no VXLAN spanning tree.
  2. Multi-tenancy: Layer 2 networks are capped at 4000 VLANs, restricting multi-tenancy design—a big difference in the VXLAN vs VLAN debates.
  3. ToR table scalability: Every ToR switch may need to support several virtual servers, and each virtual server requires several NICs and MAC addresses. This pushes the limits on the ToR switch’s table sizes. In addition, after the ToR tables become full, Layer 2 traffic will be treated as unknown unicast traffic, which will be flooded across the network, causing instability to a previously stable core.
STP Blocking.
Diagram: STP Blocking. Source Cisco Press free chapter.

VXLAN use cases

Use Case 

VXLAN Details

Use Case 1

Multi-tenant IaaS Clouds where you need a large number of segments

Use Case 2

Link Virtual to Physical Servers. This is done via software or hardware VXLAN to VLAN gateway

Use Case 3

HA Clusters across failure domains/availability zones

Use Case 4

VXLAN works well over fabrics that have equidistant endpoints

Use Case 5

VXLAN-encapsulated VLAN traffic across availability zones must be rate-limited to prevent broadcast storm propagation across multiple availability zones

What is VXLAN? The operations

When discussing VXLAN vs VLAN, VXLAN employs a MAC over IP/UDP overlay scheme and extends the traditional VLAN boundary of 4000 VLANs. The 12-bit VLAN identifier in traditional VLANs capped scalability within the SDN data center and proved cumbersome if you wanted a VLAN per application segment model. VXLAN scales the 12-bit to a 24-bit identifier and allows for 16 million logical endpoints, with each endpoint potentially offering another 4,000 VLANs.

While tunneling does provide Layer 2 adjacency between these logical endpoints and allows VMs to move across boundaries, the main driver for its insertion was to overcome the challenge of having only 4000 VLAN.

Typically, an application segment has multiple segments; between each segment, you will have firewalling and load-balancing services, and each segment requires a different VLAN. The Layer 2 VLAN segment transfers non-routable heartbeats or state information that can’t cross an L3 boundary. You will soon reach the 4000k VLAN limit if you are a cloud provider.

vxlan vs vlan
Multiple segments are required per application stack.

The control plane

The control plane is very similar to the spanning tree control plane. If a switch receives a packet destined for an unknown address, the switch will forward the packet to an IP address that floods the packet to all the other switches.

This IP address is, in turn, mapped to a multicast group across the network. VXLAN doesn’t explicitly have a control plane and requires an IP multicast running in the core for forwarding traffic and host discovery.

**Best practices for enabling IP Multicast in the core**

IP Multicast

In the Core

  1. Bidirectional PIM or PIM Sparse Mode
  1. Redundant Rendezvous Points (RP)
  1. Shared trees (reduce the amount of IP multicast state)
  1. Always check the IP multicast table sizes on core and ToR switches
  1. Single IP multicast address for multiple VXLAN segments is OK

The requirement for IP multicast in the core made VXLAN undesirable from an operation point of view. For example, creating the tunnel endpoints is simple, but introducing a protocol like IP multicast to a core just for the tunnel control plane was considered undesirable. As a result, some of the more recent versions of VXLAN support IP unicast.

VXLAN uses a MAC over IP/UDP solution to eliminate the need for a spanning tree. There is no VXLAN spanning tree. This enables the core to be IP and not run a spanning tree. Many people ask why VXLAN uses UDP. The reason is that the UDP port numbers cause VXLAN to inherit Layer 3 ECMP features. The entropy that enables load balancing across multiple paths is embedded into the UDP source port of the overlay header.

2nd Lab Guide: Multicast VLXAN

In this lab guide, we will look at a VXLAN multicast mode. The multicat mode requires both unicast and multicast connectivity between sites. Similar to the previous one, this configuration guide uses OSPF to provide unicast connectivity, and now we have an additional bidirectional Protocol Independent Multicast (PIM) to provide multicast connectivity.

This does not mean that you don’t have a multicast-enabled core. It would be best if you still had multicast enabled on the core. 

So we are not tunneling multicast over an IPv4 core without having multicast enabled on the core. I have multicast on all Layer 3 interfaces, and the mroute table is populated on all Layer 3 routers. With the command: Show ip mroute, we are tunneling the multicast traffic, and with the command: Show nve vni, we have multicast group 239.0.0.10 and a state of UP.

Multicast VXLAN
Diagram: Multicast VXLAN

VXLAN benefits and stability

The underlying control plan network impacts the stability of VXLAN and the applications running within it. For example, if the underlying IP network cannot converge quickly enough, VLXAN packets may be dropped, and an application cache timeout may be triggered.

The rate of change in the underlying network significantly impacts the stability of the tunnels, yet the rate and change of the tunnels do not affect the underlying control plane. This is similar to how the strength of an MPLS / VPN overlay is affected by the core’s IGP.

VXLAN Points

VXLAN benefits

VXLAN drawbacks

Point 1

Runs over IP Transport

 No control plane

Point 2

Offers a large number of logical endpoints 

Needs IP Multicast***

Point 3

Reduced flooding scope

No IGMP snooping ( yet )

Point 4

Eliminates STP

No Pvlan support

Point 5

Easily integrated over existing Core

Requires Jumbo frames in the core ( 50 bytes)

Point 6

Minimal host-to-network integration

No built-in security features **

Point 7

Not a DCI solution ( no arp reduction, first-hop gateway localization, no inbound traffic steering i.e, LISP )

** VXLAN has no built-in security features. Anyone who gains access to the core network can insert traffic into segments. The VXLAN transport network must be secure, as no existing firewall or intrusion prevention system (IPS) equipment can be seen in the VXLAN traffic.

*** Recent versions have Unicast VXLAN. Nexus 1000V release 4.2(1)SV2(2.1)

Updated: VXLAN enhancements

MAC distribution mode is an enhancement to VXLAN that prevents unknown unicast flooding and eliminates data plane MAC address learning. Traditionally, this was done by flooding to locate an unknown end host, but it has now been replaced with a control plane solution.

During VM startup, the VSM ( control plane ) collects the list of MAC addresses and distributes the MAC-to-VTEP mappings to all VEMs participating in a VXLAN segment. This technique makes VXLAN more optimal by unicasting more intelligently, similar to Nicira and VMware NVP.

ARP termination works by giving the VSM controller all the ARP and MAC information. This enables the VSM to proxy and respond locally to ARP requests without sending a broadcast. Because 90% of broadcast traffic is ARP requests ( ARP reply is unicast ), this significantly reduces broadcast traffic on the network.

Closing Points on VXLAN

VXLAN is a network virtualization technology that encapsulates Ethernet frames within UDP packets, allowing the creation of a virtual network overlay. This overlay extends beyond traditional VLANs, providing more address space and enabling communication across different physical networks. With a 24-bit segment ID, VXLAN supports up to 16 million unique network segments, compared to the 4096 segments possible with VLANs.

At its core, VXLAN uses tunneling to encapsulate Layer 2 frames within Layer 3 packets. This process involves two key components: the VXLAN Tunnel Endpoint (VTEP) and the VXLAN header. VTEPs are responsible for encapsulating and decapsulating the Ethernet frames. They also handle the communication between different network segments, ensuring seamless data transmission across the network.

The adoption of VXLAN brings several advantages to network management:

1. **Scalability**: VXLAN’s ability to support millions of network segments allows for unprecedented scalability, making it ideal for large-scale data centers.

2. **Flexibility**: By decoupling the network overlay from the physical infrastructure, VXLAN enables flexible network designs that can adapt to changing business needs.

3. **Improved Resource Utilization**: VXLAN optimizes the use of network resources, enhancing performance and reducing congestion by spreading traffic across multiple paths.

VXLAN is particularly beneficial in scenarios where network scalability and flexibility are paramount. Common use cases include:

– **Data Center Interconnect**: Connecting multiple data centers over a single, unified network fabric.

– **Multi-Tenant Environments**: Providing isolated network segments for different tenants in cloud environments.

– **Disaster Recovery**: Facilitating seamless failover and recovery by extending network segments across geographically dispersed locations.

Summary: What is VXLAN

VXLAN, short for Virtual Extensible LAN, is a network virtualization technology that has recently gained significant popularity. In this blog post, we will examine VXLAN’s definition, workings, and benefits. So, let’s dive into the world of VXLAN!

Understanding VXLAN Basics

VXLAN is an encapsulation protocol that enables the creation of virtual networks over existing Layer 3 infrastructures. It extends Layer 2 segments over Layer 3 networks, allowing for greater flexibility and scalability. By encapsulating Layer 2 frames within Layer 3 packets, VXLAN enables efficient communication between virtual machines (VMs) across physical hosts or data centers.

VXLAN Operation and Encapsulation

To understand how VXLAN works, we must look at its operation and encapsulation process. When a VM sends a Layer 2 frame, it is encapsulated into a VXLAN packet by adding a VXLAN header. This header includes information such as the VXLAN network identifier (VNI), which helps identify the virtual network to which the packet belongs. The VXLAN packet is then transported over the underlying Layer 3 network to the destination physical host, encapsulated, and delivered to the appropriate VM.

Benefits and Use Cases of VXLAN

VXLAN offers several benefits that make it an attractive choice for network virtualization. Firstly, it enables the creation of large-scale virtual networks, allowing for seamless VM mobility and workload placement flexibility. VXLAN also helps overcome the limitations of traditional VLANs by providing a much larger address space, accommodating the ever-growing number of virtual machines in modern data centers. Additionally, VXLAN facilitates network virtualization across geographically dispersed data centers, making it ideal for multi-site deployments and disaster recovery scenarios.

VXLAN vs. Other Network Virtualization Technologies

While VXLAN is widely used, it is essential to understand its key differences and advantages compared to other network virtualization technologies. For instance, VXLAN offers better scalability and flexibility than traditional VLANs. It also provides better isolation and segmentation of virtual networks, making it an ideal choice for multi-tenant environments. Additionally, VXLAN is agnostic to the physical network infrastructure, allowing it to be easily deployed in existing networks without requiring significant changes.

Conclusion:

In conclusion, VXLAN is a powerful network virtualization technology that has revolutionized how virtual networks are created and managed. Its ability to extend Layer 2 networks over Layer 3 infrastructures, scalability, flexibility, and ease of deployment make VXLAN a go-to solution for modern data centers. Whether for workload mobility, multi-site implementations, or overcoming VLAN limitations, VXLAN offers a robust and efficient solution. Embracing VXLAN can unlock new possibilities in network virtualization, enabling organizations to build agile, scalable, and resilient virtual networks.

Concept set of mobile network, wireless Internet connection technology. Wifi illustration. People use device to connect Internet network Modern colorful flat vector illustration for poster, banner.

Routing Convergence

Routing Convergence

Routing convergence, a critical aspect of network performance, refers to the process of network routers exchanging information to update their routing tables in the event of network changes. It ensures efficient and reliable data transmission, minimizing disruptions and optimizing network performance. In this blog post, we will delve into the intricacies of routing convergence, exploring its importance, challenges, and best practices.

Routing convergence refers to the process by which a network's routing tables reach a consistent and stable state after making changes. It ensures that all routers within a network have up-to-date information about the available paths and can make efficient routing decisions.

When a change occurs in a network, such as a link failure or the addition of a new router, routing convergence is necessary to update the routing tables and ensure that packets are delivered correctly. The goal is to minimize the time it takes for all routers in the network to converge and resume normal routing operations.

Several mechanisms and protocols contribute to routing convergence. One of the critical components is the exchange of routing information between routers. This can be done through protocols such as Routing Information Protocol (RIP), Open Shortest Path First (OSPF), or Border Gateway Protocol (BGP).

Highlights: Routing Convergence

Understanding Convergence

Router convergence means they have the same topological information about the network they operate. To converge, a set of routers must have collected all topology information from each other using the routing protocol implemented. For this information to be accurate, it must reflect the current state of the network and not contradict other routers’ topology information.

All routers agree upon the topology of a converged network. For dynamic routing to work, a set of routers must be able to communicate with each other. All Interior Gateway Protocols depend on convergence. An autonomous system in operation is usually converged or convergent. Exterior Gateway Routing Protocol BGP rarely converges due to the size of the Internet.

Convergence Process

Each router in a routing protocol attempts to exchange topology information about the network. The extent, method, and type of information exchanged between routing protocols, such as BGP4, OSPF, and RIP, differs. A routing protocol convergence occurs once all routing protocol-specific information has been distributed to all routers. In the event of a routing table change in the network, convergence will be temporarily broken until the change has been successfully communicated to all routers.

Example: The convergence process

During the convergence process, routers exchange information about the network’s topology. Based on this information, they update their routing tables and calculate the most efficient paths to reach destination networks. This process continues until all routers have consistent and accurate routing tables.

The convergence time can vary depending on the size and complexity of the network and the routing protocols used. Convergence can happen relatively quickly in smaller networks, while more extensive networks may take longer to achieve convergence.

Network administrators can employ various strategies to optimize routing convergence. These include implementing fast convergence protocols, such as OSPF’s Fast Hello and Bidirectional Forwarding Detection (BFD), which minimize the time it takes to detect and respond to network changes.

Mechanisms for Achieving Routing Convergence:

1. Routing Protocols:

– Link-State Protocols: OSPF (Open Shortest Path First) and IS-IS (Intermediate System to Intermediate System) are examples of link-state protocols. They use flooding techniques to exchange information about network topology, allowing routers to calculate the shortest path to each destination.

– Distance-Vector Protocols: RIP (Routing Information Protocol) and EIGRP (Enhanced Interior Gateway Routing Protocol) are distance-vector protocols that use iterative algorithms to determine the best path based on distance metrics.

2. Fast Convergence Techniques:

– Triggered Updates: When a change occurs in network topology, routers immediately send updates to inform other routers about the change, reducing the convergence time.

– Route Flapping Detection: Route flapping occurs when a network route repeatedly becomes available and unavailable. By detecting and suppressing flapping routes, convergence time can be significantly improved.

– Convergence Optimization: Techniques like unequal-cost load balancing and route summarization help optimize routing convergence by distributing traffic across multiple paths and reducing the size of routing tables.

3. Redundancy and Resilience:

– Redundant Links: Multiple physical connections between routers increase network reliability and provide alternate paths in case of link failures.

– Virtual Router Redundancy Protocol (VRRP): VRRP allows multiple routers to act as a single virtual router, ensuring seamless failover in case of a primary router failure.

– Multi-Protocol Label Switching (MPLS): MPLS technology offers fast rerouting capabilities, enabling quick convergence in case of link or node failures.

Strategies for Achieving Optimal Routing Convergence

a. Enhanced Link-State Routing Protocol (EIGRP): EIGRP is a dynamic routing protocol that utilizes a Diffusing Update Algorithm (DUAL) to achieve fast convergence. By maintaining a backup route in case of link failures and employing triggered updates, EIGRP significantly reduces the time required for routing tables to converge.

b. Optimizing Routing Metrics: Carefully configuring routing metrics, such as bandwidth, delay, and reliability, can help achieve faster convergence. Assigning appropriate weights to these metrics ensures that routers quickly select the most efficient paths, leading to improved convergence times.

c. Implementing Route Summarization: Route summarization involves aggregating multiple network routes into a single summarized route. This technique reduces the size of routing tables and minimizes the complexity of route calculations, resulting in faster converge

BGP Next Hop Tracking

BGP next hop refers to the IP address of the next router in the path towards a destination network. It serves as crucial information for routers to make forwarding decisions. Typically, BGP relies on the reachability of the next hop to determine the best path. However, various factors can affect this reachability, including link failures, network congestion, or misconfigurations. This is where BGP next hop tracking comes into play.

By incorporating next hop tracking into BGP, network administrators gain valuable insights into the reachability status of next hops. This information enables more informed decision-making regarding routing policies and traffic engineering. With real-time tracking, administrators can dynamically adjust routing paths based on the availability and quality of next hops, leading to improved network performance and reliability.

next hop tracking

**Benefits of Efficient Routing Convergence:**

1. Improved Network Performance: Efficient routing convergence reduces network congestion, latency, and packet loss, improving overall network performance.

2. Enhanced Reliability: Routing convergence ensures uninterrupted communication and minimizes downtime by quickly adapting to changes in network conditions.

3. Scalability: Proper routing convergence techniques facilitate network expansion and accommodate increased traffic demands without sacrificing performance or reliability.

Example Technology:  BFD

Bidirectional Forwarding Detection (BFD) is a lightweight protocol designed to detect failures in communication paths between routers or switches. It operates independently of the routing protocols and detects rapid failure by utilizing fast packet exchanges. Unlike traditional methods like hello packets, BFD offers sub-second detection, allowing quicker convergence and network stability. BFD is pivotal in achieving fast routing convergence, providing real-time detection, and facilitating swift rerouting decisions.

-Enhanced Network Resilience: By swiftly detecting link failures, BFD enables routers to act immediately, rerouting traffic through alternate paths. This proactive approach ensures minimal disruption and enhances network resilience, especially in environments where redundancy is critical.

-Reduced Convergence Time: BFD’s ability to detect failures within milliseconds significantly reduces the time required for converging routing protocols. This translates into improved network responsiveness, reduced packet loss, and enhanced user experience.

-Scalability and Flexibility: BFD can be implemented across various network topologies and routing protocols, making it a versatile solution. Whether a small enterprise network or a large-scale service provider environment, BFD adapts seamlessly, providing consistent performance and stability.

Convergence Time

Convergence time measures the speed at which a group of routers converges. Fast and reliable convergent routers are a significant performance indicator for routing protocols. The size of the network is also essential. A more extensive network will converge more slowly than a smaller one.

When a few routers are connected to RIP, a routing protocol that converges slowly, it can take several minutes for the network to converge. A triggered update for a new route can speed up RIP’s convergence, but a hold-down timer will slow flushing an existing route. OSPF is an example of a fast-convergence routing protocol. It is impossible to limit the speed at which OSPF routers can converge.

Unless specific hardware and configuration conditions are met, networks can never converge. “Flapping” interfaces (ones that frequently change between “up” and “down”) propagate conflicting information throughout the network, so routers cannot agree on the current state. Route aggregation can deprive certain parts of a network of detailed routing information, resulting in faster convergence of topological information.

Topological information

A set of routers in a network share the same topological information during convergence or routing convergence. Routing protocols exchange topology information between routers in a network. Routers in a network receive routing information when convergence is reached. Therefore, all routers know the network topology and optimal route in a converged network.

Any change in the network – for example, the failure of a device – affects convergence until all routers are informed of the change. The convergence time in a network is the time it takes for routers to achieve convergence after a topology change. In high-performance service provider networks, sensitive applications are run that require fast failover in case of failures. Several factors determine a network’s convergence rate:

  1. Devices detect route failures. Finding a new forwarding path begins with identifying the failed device. The existence of virtual networks establishes device reachability through their longevity, as opposed to physical networks, in which events determine device availability. To achieve fast network convergence, the detection time – the time it takes to detect a failure – must be kept within acceptable limits.
  2. In the event of a device failure on the primary route, traffic is diverted to the backup route. The failure or topology change has not yet affected all devices.
  3. Routing protocols are said to achieve global repair or network convergence when they propagate a change in topology to all network devices.

Understanding UDLD

UDLD, at its core, is a layer 2 protocol designed to detect and mitigate unidirectional links in Ethernet connections. It actively monitors the link status, allowing network devices to promptly detect and address potential issues. By verifying the bidirectional communication between neighboring devices, UDLD acts as a guardian, preventing one-way communication breakdowns.

Implementing UDLD brings forth numerous advantages for network administrators and organizations alike.

Firstly, it enhances network reliability by identifying and resolving unidirectional link failures that could otherwise lead to data loss and network disruptions.

Secondly, UDLD helps troubleshoot by providing valuable insights into link quality and integrity. This proactive approach aids in reducing downtime and improving overall network performance.

Enhancing Routing Convergence

Network administrators can implement various strategies to improve routing convergence. One approach is to utilize route summarization, which reduces the number of routes advertised and processed by routers. This helps minimize the impact of changes in specific network segments on overall convergence time.

Furthermore, implementing fast link failure detection mechanisms, such as Bidirectional Forwarding Detection (BFD), can significantly reduce convergence time. BFD allows routers to quickly detect link failures and trigger immediate updates to routing tables, ensuring faster convergence.

**Factors Influencing Routing Convergence**

Several factors impact routing convergence in a network. Firstly, the efficiency of the routing protocols being used plays a crucial role. Protocols such as OSPF (Open Shortest Path First) and EIGRP (Enhanced Interior Gateway Routing Protocol) are designed to facilitate fast convergence by quickly adapting to network changes.

Additionally, network topology and scale can affect routing convergence. Large networks with complex topologies may require more time for routers to converge due to the increased number of routes and potential link failures. Network administrators must carefully design and optimize the network architecture to minimize convergence time.

Control and data plane

When considering routing convergence with forwarding routing protocols, we must first highlight that a networking device is tasked with two planes of operation—the control plane and the data plane. The job of the data plane is to switch traffic across the router’s interfaces as fast as possible, i.e., move packets. The control plane has the more complex operation of putting together and creating the controls so the data plane can operate efficiently. How these two planes interact will affect network convergence time.

The network’s control plane finds the best path for routing convergence from any source to any network destination. For quick convergence routing, it must react quickly and dynamically to changes in the network, both on the LAN and on the WAN.

Control and Data Plane

Monitoring and Troubleshooting Routing Convergence

Network administrators must monitor routing convergence to identify and promptly address potential issues. Network management tools, such as SNMP (Simple Network Management Protocol) and NetFlow analysis, can provide valuable insights into routing convergence performance, including convergence time, route flapping, and stability.

When troubleshooting routing convergence problems, administrators should carefully analyze routing table updates, link state information, and routing protocol logs. This information can help pinpoint the root cause of convergence delays or inconsistencies, allowing for targeted remediation.

netflow

Related: For pre-information, you may find the following posts helpful:

  1. Implementing Network Security
  2. Dead Peer Detection
  3. IPsec Fault Tolerance
  4. WAN Virtualization
  5. Port 179

Routing Convergence

Convergence Time Definition.

I found two similar definitions of convergence time:

“Convergence is the amount of time ( and thus packet loss ) after a failure in the network and before the network settles into a steady state.” Also, ” Convergence is the amount of time ( and thus packet loss) after a failure in the network and before the network responds to the failure.”

The difference between the two convergence time definitions is subtle but essential – steady-state vs. just responding. The control plane and its reaction to topology changes can be separated into four parts below. Each area must be addressed individually, as leaving one area out results in slow network convergence time and application time-out.

**IP routing**

Moving IP Packets

A router’s primary role is moving an IP packet from one network to another. Routers select the best loop-free path in a network to forward a packet to its destination IP address. A router learns about nonattached networks through static configuration or dynamic IP routing protocols. Both static and dynamic are examples of routing protocols.

With dynamic IP routing protocols, we can handle network topology changes dynamically. Here, we can distribute network topology information between routers in the network. When there is a change in the network topology, the dynamic routing protocol provides updates without intervention when a topology change occurs.

On the other hand, we have IP routing to static routes, which do not accommodate topology changes very well and can be a burden depending on the network size. However, static routing is a viable solution for minimal networks with no modifications.

Dynamic Routing Protocols
Diagram: Dynamic Routing Protocols. Source Cisco Press.

Knowledge Check: Bidirectional Forwarding Detection

Understanding BFD

BFD is a lightweight protocol designed to detect faults in the forwarding path between network devices. It operates at a low level, constantly monitoring the connectivity and responsiveness of neighboring devices. BFD can quickly detect failures by exchanging control packets and taking appropriate action to maintain network stability.

The Benefits of BFD

The implementation of BFD brings numerous advantages to network administrators and operators. Firstly, it provides rapid fault detection, reducing downtime and minimizing the impact of network failures. Additionally, BFD offers scalable and efficient operation, as it consumes minimal network resources. This makes it an ideal choice for large-scale networks where resource optimization is crucial.

BFD runs independently from other (routing) protocols. Once it’s up and running, you can configure protocols like OSPF, EIGRP, BGP, HSRP, MPLS LDP, etc., to use BFD for link failure detection instead of their mechanisms. When the link fails, BFD informs the protocol. When BFD no longer receives its control packets, it realizes we have a link failure and reports this to OSPF. OSPF will then tear down the neighbor adjacency.

Bidirectional Forwarding Detection (BFD)

Use Cases of BFD

BFD finds its applications in various networking scenarios. One prominent use case is link aggregation, where BFD helps detect link failures and ensures seamless failover to alternate links. BFD is also widely utilized in Virtual Private Networks (VPNs) to monitor the connectivity of tunnel endpoints, enabling quick detection of connectivity issues and swift rerouting.

Implementing BFD in Practice

Implementing BFD requires careful consideration and configuration. Network devices must be appropriately configured to enable BFD sessions and define appropriate parameters such as timers and thresholds. Additionally, network administrators must ensure proper integration with underlying routing protocols to maximize BFD’s efficiency.

Convergence Routing and Network Convergence Time

Network convergence connects multiple computer systems, networks, or components to establish communication and efficient data transfer. However, it can be a slow process, depending on the size and complexity of the network, the amount of data that needs to be transferred, and the speed of the underlying technologies.

For networks to converge, all of the components must interact with each other and establish rules for data transfer. This process requires that the various components communicate with each other and usually involves exchanging configuration data to ensure that all components use the same protocols.
Network convergence is also dependent on the speed of the underlying technologies.

To speed up convergence, administrators should use the latest technologies, minimize the amount of data that needs to be transferred, and ensure that all components are correctly configured to be compatible. By following these steps, network convergence can be made faster and more efficient.

Example: OSPF

To put it simply, convergence or routing convergence is a state in which a set of routers in a network share the same topological information. For example, we have ten routers in one OSFP area. OSPF is an example of a fast-converging routing protocol. A network of a few OSPF routers can converge in seconds.

The routers within the OSPF area in the network collect the topology information from one another through the routing protocol. Depending on the routing protocol used to collect the data, the routers in the same network should have identical copies of routing information.

Different routing protocols will have additional convergence time. The time the routers take to reach convergence after a change in topology is termed convergence time. Fast network convergence and fast failover are critical factors in network performance. Before we get into the details of routing convergence, let us recap how networking works.

network convergence time
Diagram: Network convergence time.

Unlike IS-IS, OSPF has fewer “knobs” for optimizing convergence. This is probably because IS-IS is being developed and supported by a separate team geared towards ISPs, where fast convergence is a competitive advantage.

Example Convergence Time with OSPF
Diagram: Example Convergence Time with OSPF. Source INE.

OSPF: Incremental SPF

OSPF calculates the SPT (Shortest Path Tree) using the SPF (Shortest Path First) algorithm. SPTs are built by OSPF routers within the same area with the same LSAs, LSDBs, and LSAs. OSPF routers will rerun a full SPF calculation even when there is just a single change in the network topology (change to an LSA type 1 and LSA type 2).

If a topology change occurs, we should run a full SPT calculation to find the shortest paths to all destinations. Unfortunately, we also calculate paths that have not changed since the last SPF.

In incremental SPF, OSPF only recalculates the parts of the SPT that have changed.

Because you don’t run a full SPF all the time, the router’s CPU load decreases,, and convergence times improve—additionally, your router stores the previous SPT copy, which requires more memory.

In three scenarios, incremental SPF is beneficial:

  • Adding (or removing) a leaf node to a branch

  • Link failure in non-SPT

  • Link failure in a branch of SPT

When many routers are in a single area, and the CPU load is high because of OSPF, incremental SPF can be enabled per router.

Forwarding Paradigms

We have bridging routing and switching with data and the control plane. So, we need to get packets across a network, which is easy if we have a single cable. You need to find the node’s address, and small and non-IP protocols would use a broadcast. When devices in the middle break this path, we can use source routing, path-based forwarding, and hop-by-hop address-based forwarding based solely on the destination address.

When protocols like IP came into play, hop-by-hop destination-based forwarding became the most popular; this is how IP forwarding works. Everyone in the path makes independent forwarding decisions. Each device looks at the destination address, examines its lookup tables, and decides where to send the packet.

**Finding paths across the network**

How do we find a path across the network? We know there are three ways to get packets across the network – source routing, path-based forwarding, and hop-by-hop destination-based forwarding. So, we need some way to populate the forwarding tables. You need to know how your neighbors are and who your endpoints are. This can be static routing, but it is more likely to be a routing protocol. Routing protocols have to solve and describe the routing convergence on the network at a high level.

When we are up and running, events can happen to the topology that force or make the routing protocols react and perform a convergence routing state. For example, we have a link failure, and the topology has changed, impacting our forwarding information. So, we must propagate the information and adjust the path information after the topology change. We know these convergence routing states are to detect, describe, switch, and find.

Rouitng Convergence

Convergence


Detect


Describe


Switch 


Find

To better understand routing convergence, I would like to share the network convergence time for each routing protocol before diving into each step. The times displayed below are from a Cisco Live session based on real-world case studies and field research. We are separating each of the convergence routing steps described above into the following fields: Detect, describe, find alternative, and total time.

Routing Protocol

RIP

OSPF 

EIGRP

Detect

<1 second-best, 105 seconds average

<1 second-best, 20 seconds average 

<1 second-best, 15 seconds average.30 seconds worst

Describe

15 seconds average, 30 seconds worst

1 second-best, 5 seconds average.

2 seconds

Find Alternative

15 seconds average, 30 seconds worst

 1-second average.

*** <500ms per query hop average Assume a 2-second average

Total Time

Best Average Case: 31 seconds Average Case: 135 seconds Worse Case: 179 seconds

Best Average Case: 2 to 3 seconds

Average Case: 25 seconds

Worse Case: 45 seconds

Best Average Case: <1 second

Average Case: 20 seconds

Worse Case: 35 seconds

*** The alternate route is found before the described phase due to the feasible successor design with EIGRP path selection.

Convergence Routing

Convergence routing: EIGPR

EIGRP is the fastest but only fractional. EIGRP has a pre-built loop-free path known as a feasible successor. The FS route has a higher metric than the successor, making it a backup route to the successor route. The effect of a pre-computed backup route on convergence is that EIGRP can react locally to a change in the network topology; nowadays, this is usually done in the FIB. EIGRP would have to query for the alternative route without a feasible successor, increasing convergence time.

However, you can have a Loop Free Alternative ( LFA ) for OSPF, which can have a pre-computed alternate path. Still, LFAs can only work with specific typologies and don’t guarantee against micro-loops ( EIGRP guarantees against micro-loops).

Lab Guide: EIGRP LFA FRR

With Loop-Free Alternate (LFA) Fast Reroute (FRR), EIGRP can switch to a backup path in less than 50 milliseconds. Fast rerouting means switching to another next hop, and a loop-free alternate refers to a loop-free alternative path.

Perhaps this sounds familiar to you. After all, EIGRP has feasible successors. The alternate paths calculated by EIGRP are loop-free. As soon as the successor fails, EIGRP can use a feasible successor.

It’s true, but there’s one big catch. In the routing table, EIGRP feasible successors are not immediately installed. Only one route is installed, the successor route. EIGRP installs the feasible successor when the successor fails, which takes time. By installing both successor routes and feasible successor routes in the routing table, fast rerouting makes convergence even faster.

EIGRP Configuration

These four routers run EIGRP; there’s a loopback on R4 with network 4.4.4.4/32. R1 can go through R2 or R3 to get there. The delay on R1’s GigabitEthernet3 interface has increased, so R2 is our successor, and R3 is our feasible successor. The output below is interesting. We still see the successor route, but at the bottom, you can see the repair path…that’s our feasible successor.

TCP Congestion control

Ask yourself, is < 1-second convergence fast enough for today’s applications? Indeed, the answer would be yes for some non-critical applications that work on TCP. TCP has built-in backoff algorithms that can deal with packet loss by re-transmitting to recover lost segments. However, non-bulk data applications like video and VOIP have stricter rules and require fast convergence and minimal packet loss.

For example, a 5-second delay in routing protocol convergence could mean several hundred dropped voice calls. A 50-second delay in a Gigabit Ethernet link implies about 6.25 GB of lost information.

Adding Resilience

To add resilience to a network, you can aim to make the network redundant. When you add redundancy, you are betting that outages of the original path and the backup path will not co-occur and that the primary path does not fate share with the backup path ( they do not share common underlying infrastructure, i.e., physical conducts or power ).

There needs to be a limit on the number of links you add to make your network redundant, and adding 50 extra links does not make your network 50 times more redundant. It does the opposite! The control plane is tasked with finding the best path and must react to modifications in the network as quickly as possible.

However, every additional link you add slows down the convergence of the router’s control plane as there is additional information to compute, resulting in longer convergence times. The correct number of backup links is a trade-off between redundancy versus availability. The optimal level of redundancy between two points should be two or three links. The fourth link would make the network converge slower.

Convergence Routing
Diagram: Convergence routing and adding resilience.

Routing Convergence and Routing Protocol Algorithms

Routing protocol algorithms can be tweaked to exponentially back off and deal with bulk information. However, no matter how many timers you use, the more data in the routing databases, the longer the convergence times. The primary way to reduce network convergence is to reduce the size of your routing tables by accepting just a default route, creating a flooding boundary domain, or using some other configuration method.

For example, a common approach in OSPF to reduce the size of routing tables and flooding boundaries is to create OSPF stub areas. OSPF stub areas limit the amount of information in the area. For example, EIGRP limits the flooding query domain by creating EIGRP stub routers and intelligently designing aggregation points. Now let us revisit the components of routing convergence:

Routing Convergence Step

Routing Convergence Details

Step 1

Failure detection

Step 2

Failure propagation ( flooding, etc.) IGP Reaction

Step 3

Topology/Routing calculation. IGP Reaction.

Step 4

Update the routing and forwarding table ( RIB & FIB)

**Stage 1: Failure Detection**

The first and foremost problem facing the control plane is quickly detecting topology changes. Detecting the failure is the most critical and challenging part of network convergence. It can occur at different layers of the OSI stack – Physical Layers ( Layer 1), Data Link Layer ( Layer 2 ), Network Layer ( Layer 3 ), and Application layer ( Layer 7 ).  There are many types of techniques used to detect link failures, but they all generally come down to two basic types:

  • Event-driven notification occurs when a carrier is lost or when one network element detects a failure and notifies the other network elements.
  • Polling-driven notification – generally HELLO protocols that test the path for reachability, such as Bidirectional Forwarding Detection ( BFD ). Event-driven notifications are always preferred over polling-driven ones as the latter have to wait for three polls before declaring a path down. However, there are some cases when you have multiple Layer devices in the path, and HELLO polling systems are the only method that can be used to detect a failure.

Layer 1 failure detection

Layer 1: Ethernet mechanisms like auto-negotiation ( 1 GigE ) and link fault signaling ( 10 GigE 802.3ae/ 40 GigE 802.3ba ) can signal local failures to the remote end.

network convergence time
Diagram: Network convergence time and Layer 1.

However, the challenge is getting the signal across an optical cloud, as relaying the fault information to the other end is impossible. When there is a “bump” in the Layer 1 link, it is not always possible for the remote end to detect the failure. In this case, the link fault signaling from Ethernet would get lost in the service provider’s network.

The actual link-down / interface-down event detection is hardware-dependent. Older platforms, such as the 6704 line cards for the Catalyst 6500, used a per-port polling mechanism, resulting in a 1 sec detect link failure period. More recent Nexus switches and the latest Catalyst 5600 line cards have an interrupt-driven notification mechanism resulting in high-speed and predictable link failure detection.

Layer 2 failure detection

Layer 2: The layer 2 detection mechanism will kick in if the Layer 1 mechanism does not. Unidirectional Link Detection ( UDLD ) is a Cisco proprietary lightweight Layer 2 failure detection protocol designed for detecting one-way connections due to physical or soft failure and miss-wirings.

  • A key point: UDLD is a slow protocol

UDLD is a reasonably slow protocol that uses an average of 15 seconds for message interval and 21 seconds for detection. Its period has raised questions about its use in today’s data centers. However, the chances of miswirings are minimal; Layer 1 mechanisms always communicate unidirectional physical failure, and STP Bridge Assurance takes care of soft failures in either direction.

STP Bridge assurance turns STP into a bidirectional protocol and ensures that the spanning tree never fails to open and only fails to close. Failing open means that if a switch does not hear from its neighbor, it immediately starts forwarding on initially blocked ports, causing network havoc.

Layer 3 failure detection

Layer 3: In some cases, failure detection has to reply to HELLO protocols at Layer 3. This is needed when there are intermediate Layer 2 hops over Layer links and concerns over uni-direction failures on point-to-point physical links arise.

Diagram: Layer 3 failure detection

All Layer 3 protocols use HELLOs to maintain neighbor adjacency and a DEAD time to declare a neighbor dead. These timers can be tuned for faster convergence. However, it is generally not recommended due to the increase in CPU utilization causing false positives and the challenges ISSU and SSO face. They enable bidirectional forwarding detection ( BFD ) as the layer 3 detection mechanism is strongly recommended over aggressive protocol times, and they use BFD for all protocols.

Bidirectional Forwarding Detection ( BFD ) is a lightweight hello protocol for sub-second Layer 3 failure detection. It can run over multiple transport protocols, such as MPLS, THRILL, IPv6, and IPv4, making it the preferred method.

**Stage 2: Routing convergence and failure propagation**

When a change occurs in the network topology, it must be registered with the local router and transmitted throughout the rest of the network. The transmission of the change information will be carried out differently for Link-State and Distance Vector protocols. Link state protocols must flood information to every device in the network, and the distance vector must process the topology change at every hop through the network.

The processing of information at every hop may lead you to conclude that link-state protocols always converge more quickly than path-vector protocols, but this is not the case. EIGRP, due to its pre-computed backup path, will converge more rapidly than any link-state protocol.

Routing convergence
Diagram: Routing convergence and failure propagation

To propagate topology changes as quickly as possible, OSPF ( Link state ) can group changes into a few LSA while slowing down the rate at which information is flooded, i.e., do not flood on every change. This is accomplished with link-state flood timer tuning combined with exponential backoff systems, such as link-state advertisement delay / initial link-state advertisement throttle delay.

Unfortunately, Distance Vector Protocols do not have such timers. Therefore, reducing the routing table size is the only option for EIGRP. This can be done by aggregating and filtering reachability information ( summary route or Stub areas ).

**Stage 3: Topology/Routing calculation**

Similar to the second step, link-state protocols use exponential back-off timers in this step. These timers adjust the waiting time for OSPF and ISIS to wait after receiving new topology information before calculating the best path. 

**Stage 4: Update the routing and forwarding table ( RIB & FIB)**

Finally, after the topology information has been flooding through the network and a new best path has been calculated, the new best path must be installed in the Forwarding Information Base ( FIB ). The FIB is a copy of the RIB in hardware, and the forwarding process finds it much easier to read than the RIB. However, again, this is usually done in hardware. Most vendors offer features that will install a pre-computed backup path on the line cards forwarding table so the fail-over from the primary path to the backup path can be done milliseconds without interrupting the router CPU.

Closing Points: Routing Convergence

Routing convergence refers to the process by which network routers exchange routing information and adapt to changes in network topology or routing policies. It involves the timely update and synchronization of routing tables across the network, allowing routers to determine the best paths for forwarding data packets.

Routing convergence is vital for maintaining network stability and minimizing disruptions. Network traffic may experience delays, bottlenecks, or even failures without proper convergence. Routing convergence enables efficient and reliable communication by ensuring all network routers have consistent routing information.

Summary: Routing Convergence

Routing convergence is crucial in network management, ensuring smooth and efficient communication between devices. In this blog post, we will explore the concept of routing convergence, its importance in network operations, common challenges faced, and strategies to achieve faster convergence times.

Section 1: Understanding Routing Convergence

Routing convergence refers to network protocols adapting to changes in network topology, such as link failures or changes in network configurations. It involves recalculating and updating routing tables to ensure the most optimal paths for data transmission. Network downtime can be minimized by achieving convergence quickly, and data can flow seamlessly.

Section 2: The Importance of Fast Convergence

Fast routing convergence is critical for maintaining network stability and minimizing disruptions. In today’s fast-paced digital landscape, where businesses rely heavily on uninterrupted connectivity, delays in convergence can result in significant financial losses, degraded user experience, and even security vulnerabilities. Therefore, network administrators must prioritize measures to enhance convergence speed.

Section 3: Challenges in Routing Convergence

While routing convergence is essential, it comes with its challenges. Network size, complex topologies, and diverse routing protocols can significantly impact convergence times. Additionally, suboptimal route selection, route flapping, and inefficient link failure detection can further hinder the convergence process. Understanding these challenges is crucial for devising practical solutions.

Section 4: Strategies for Achieving Faster Convergence

To optimize routing convergence, network administrators can implement various strategies. These include:

1. Implementing Fast Convergence Protocols: Utilizing protocols like Bidirectional Forwarding Detection (BFD) and Link State Tracking (LST) can expedite the detection of link failures and trigger faster convergence.

2. Load Balancing and Redundancy: Distributing traffic across multiple paths and employing redundancy mechanisms, such as Equal-Cost Multipath (ECMP) routing, can mitigate the impact of link failures and improve convergence times.

3. Optimizing Routing Protocol Parameters: Fine-tuning routing protocol timers, hello intervals, and dead intervals can contribute to faster convergence by reducing the time it takes to detect and react to network changes.

Section 5: Conclusion

In conclusion, routing convergence is fundamental to network management, ensuring efficient data transmission and minimizing disruptions. By understanding the concept, recognizing the importance of fast convergence, and implementing appropriate strategies, network administrators can enhance network stability, improve user experience, and safeguard against potential financial and security risks.

Physical diagram - Fabric Extenders and Parent switches

How-to: Fabric Extenders & VPC

Topology Diagram

The topology diagram depicts two Nexus 5K acting as parent switches with physical connections to two downstream Nexus 2k (FEX) acting as the 10G physical termination points for the connected server.

Physical diagram - Fabric Extenders and Parent switches
Physical diagram – Fabric Extenders and Parent switches

Part1. Connecting the FEX to the Parent switch:

The FEX and the parent switch use Satellite Discovery Protocol (SDP) periodic messages to discovery and register with one another.

When you initially log on to the Nexus 5K you can see that the OS does not recognise the FEX even though there are two FEXs that are cabled correctly to parent switch. As the FEX is recognised as a remote line card you would expect to see it with a “show module” command.

N5K3# sh module
Mod Ports Module-Type Model Status
— —– ——————————– ———————- ————
1 40 40x10GE/Supervisor N5K-C5020P-BF-SUP active *
2 8 8×1/2/4G FC Module N5K-M1008 ok
Mod Sw Hw World-Wide-Name(s) (WWN)
— ————– —— ————————————————–
1 5.1(3)N2(1c) 1.3 —

2 5.1(3)N2(1c) 1.0 93:59:41:08:5a:0c:08:08 to 00:00:00:00:00:00:00:00

Mod MAC-Address(es) Serial-Num

— ————————————– ———-

1 0005.9b1e.82c8 to 0005.9b1e.82ef JAF1419BLMA

2 0005.9b1e.82f0 to 0005.9b1e.82f7 JAF1411AQBJ

We issue the “feature fex” command we observe the FEX sending SDP messages to the parent switch i.e. RX but we don’t see the parent switch sending SDP messages to the FEX i.e. TX.

Notice in the output below there is only “fex:Sdp-Rx” messages.

N5K3# debug fex pkt-trace
N5K3# 2014 Aug 21 09:51:57.410701 fex: Sdp-Rx: Interface: Eth1/11, Fex Id: 0, Ctrl Vntag: -1, Ctrl Vlan: 1
2014 Aug 21 09:51:57.410729 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x4000ff2929f0, device: Fex, Remote link: 0x20000080
2014 Aug 21 09:51:57.410742 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHX
2014 Aug 21 09:51:57.821776 fex: Sdp-Rx: Interface: Eth1/10, Fex Id: 0, Ctrl Vntag: -1, Ctrl Vlan: 1
2014 Aug 21 09:51:57.821804 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x2ff2929f0, device: Fex, Remote link: 0x20000080
2014 Aug 21 09:51:57.821817 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHU

The FEX appears as “DISCOVERED” but no additional FEX host interfaces appear when you issue a “show interface brief“.

Command: show fex [chassid_id [detail]]: Displays information about a specific Fabric Extender Chassis ID

Command: show interface brief: Display interface information and connection status for each interface.

N5K3# sh fex
FEX FEX FEX FEX
Number Description State Model Serial
————————————————————————
— ——– Discovered N2K-C2232PP-10GE SSI16510AWF
— ——– Discovered N2K-C2232PP-10GE SSI165204YC
N5K3#
N5K3# show interface brief

——————————————————————————–

Ethernet VLAN Type Mode Status Reason Speed Port

Interface Ch #

——————————————————————————–

Eth1/1 1 eth access down SFP validation failed 10G(D) —

Eth1/2 1 eth access down SFP validation failed 10G(D) —

Eth1/3 1 eth access up none 10G(D) —

Eth1/4 1 eth access up none 10G(D) —

Eth1/5 1 eth access up none 10G(D) —

Eth1/6 1 eth access down Link not connected 10G(D) —

Eth1/7 1 eth access down Link not connected 10G(D) —

Eth1/8 1 eth access down Link not connected 10G(D) —

Eth1/9 1 eth access down Link not connected 10G(D) —

Eth1/10 1 eth fabric down FEX not configured 10G(D) —

Eth1/11 1 eth fabric down FEX not configured 10G(D) —

Eth1/12 1 eth access down Link not connected 10G(D) —

snippet removed

The Fabric interface Ethernet1/10 show as DOWN with a “FEX not configured” statement.

N5K3# sh int Ethernet1/10
Ethernet1/10 is down (FEX not configured)
Hardware: 1000/10000 Ethernet, address: 0005.9b1e.82d1 (bia 0005.9b1e.82d1)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is fex-fabric
auto-duplex, 10 Gb/s, media type is 10G

Beacon is turned off

Input flow-control is off, output flow-control is off

Rate mode is dedicated

Switchport monitor is off

EtherType is 0x8100

snippet removed

To enable the parent switch to fully discover the FEX we need to issue the “switchport mode fex-fabric” under the connected interface. As you can see we are still not sending any SDP messages but we are discovering the FEX.

The next step is to enable the FEX logical numbering under the interface so we can start to configure the FEX host interfaces. Once this is complete we run the “debug fex pkt-trace” and we are not sending TX and receiving RX SDP messages.

Command:”fex associate chassis_id“: Associates a Fabric Extender (FEX) to a fabric interface. To disassociate the Fabric Extender, use the “no” form of this command.

From the “debug fexpkt-race” you can see the parent switch is now sending TX SDP messages to the fully discovered FEX.

N5K3(config)# int Ethernet1/10
N5K3(config-if)# fex associate 101
N5K3# debug fex pkt-trace
N5K3# 2014 Aug 21 10:00:33.674605 fex: Sdp-Tx: Interface: Eth1/10, Fex Id: 101, Ctrl Vntag: 0, Ctrl Vlan: 4042
2014 Aug 21 10:00:33.674633 fex: Sdp-Tx: Refresh Intvl: 3000ms, Uid: 0xc0821e9b0500, device: Switch, Remote link: 0x1a009000
2014 Aug 21 10:00:33.674646 fex: Sdp-Tx: Vendor: Model: Serial: ———-

2014 Aug 21 10:00:33.674718 fex: Sdp-Rx: Interface: Eth1/10, Fex Id: 0, Ctrl Vntag: 0, Ctrl Vlan: 4042

2014 Aug 21 10:00:33.674733 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x2ff2929f0, device: Fex, Remote link: 0x20000080

2014 Aug 21 10:00:33.674746 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHU

2014 Aug 21 10:00:33.836774 fex: Sdp-Rx: Interface: Eth1/11, Fex Id: 0, Ctrl Vntag: -1, Ctrl Vlan: 1

2014 Aug 21 10:00:33.836803 fex: Sdp-Rx: Refresh Intvl: 3000ms, Uid: 0x4000ff2929f0, device: Fex, Remote link: 0x20000080

2014 Aug 21 10:00:33.836816 fex: Sdp-Rx: Vendor: Cisco Systems Model: N2K-C2232PP-10GE Serial: FOC17100NHX

2014 Aug 21 10:00:36.678624 fex: Sdp-Tx: Interface: Eth1/10, Fex Id: 101, Ctrl Vntag: 0, Ctrl Vlan: 4042

2014 Aug 21 10:00:36.678664 fex: Sdp-Tx: Refresh Intvl: 3000ms, Uid: 0xc0821e9b0500, device: Switch, Remote snippet removed

Now the 101 FEX status changes from “DISCOVERED” to “ONLINE”. You may also see an additional FEX with serial number SSI165204YC as “DISCOVERED” and not “ONLINE”. This is due to the fact that we have not explicitly configured it under the other Fabric interface.

N5K3# sh fex
FEX FEX FEX FEX
Number Description State Model Serial
————————————————————————
101 FEX0101 Online N2K-C2232PP-10GE SSI16510AWF
— ——– Discovered N2K-C2232PP-10GE SSI165204YC
N5K3#
N5K3# show module fex 101

FEX Mod Ports Card Type Model Status.

— — —– ———————————- —————— ———–

101 1 32 Fabric Extender 32x10GE + 8x10G Module N2K-C2232PP-10GE present

FEX Mod Sw Hw World-Wide-Name(s) (WWN)

— — ————– —— ———————————————–

101 1 5.1(3)N2(1c) 4.4 —

FEX Mod MAC-Address(es) Serial-Num

— — ————————————– ———-

101 1 f029.29ff.0200 to f029.29ff.021f SSI16510AWF

Issuing the “show interface brief” we see new interfaces, specifically host interfaces for the FEX. The syntax below shows that only one interface is up; interface labelled Eth101/1/1. Reason for this is that only one end host (server) is connected to the FEX

N5K3# show interface brief
——————————————————————————–
Ethernet VLAN Type Mode Status Reason Speed Port
Interface Ch #
——————————————————————————–
Eth1/1 1 eth access down SFP validation failed 10G(D) —
Eth1/2 1 eth access down SFP validation failed 10G(D) —
snipped removed

——————————————————————————–

Port VRF Status IP Address Speed MTU

——————————————————————————–

mgmt0 — up 192.168.0.53 100 1500

——————————————————————————–

Ethernet VLAN Type Mode Status Reason Speed Port

Interface Ch #

——————————————————————————–

Eth101/1/1 1 eth access up none 10G(D) —

Eth101/1/2 1 eth access down SFP not inserted 10G(D) —

Eth101/1/3 1 eth access down SFP not inserted 10G(D) —

Eth101/1/4 1 eth access down SFP not inserted 10G(D) —

Eth101/1/5 1 eth access down SFP not inserted 10G(D) —

Eth101/1/6 1 eth access down SFP not inserted 10G(D) —

snipped removed

N5K3# sh run int eth1/10
interface Ethernet1/10
switchport mode fex-fabric
fex associate 101

The Fabric Interfaces do not run a Spanning tree instance while the host interfaces do run BPDU guard and BPDU filter by default. The reason why the fabric interfaces do not run spanning tree is because they are backplane point to point interfaces.

By default, the FEX interfaces will send out a couple of BPDU’s on start-up.

N5K3# sh spanning-tree interface Ethernet1/10
No spanning tree information available for Ethernet1/10
N5K3#
N5K3#
N5K3# sh spanning-tree interface Eth101/1/1Vlan Role Sts Cost Prio.Nbr Type
—————- —- — ——— ——– ——————————–
VLAN0001 Desg FWD 2 128.1153 Edge P2p

N5K3#

N5K3# sh spanning-tree interface Eth101/1/1 detail

Port 1153 (Ethernet101/1/1) of VLAN0001 is designated forwarding

Port path cost 2, Port priority 128, Port Identifier 128.1153

Designated root has priority 32769, address 0005.9b1e.82fc

Designated bridge has priority 32769, address 0005.9b1e.82fc

Designated port id is 128.1153, designated path cost 0

Timers: message age 0, forward delay 0, hold 0

Number of transitions to forwarding state: 1

The port type is edge

Link type is point-to-point by default

Bpdu guard is enabled

Bpdu filter is enabled by default

BPDU: sent 11, received 0

N5K3# sh spanning-tree interface Ethernet1/10
No spanning tree information available for Ethernet1/10
N5K3#
N5K3#
N5K3# sh spanning-tree interface Eth101/1/1Vlan Role Sts Cost Prio.Nbr Type—————- —- — ——— ——– ——————————–
VLAN0001 Desg FWD 2 128.1153 Edge P2p

N5K3#

N5K3# sh spanning-tree interface Eth101/1/1 detail

Port 1153 (Ethernet101/1/1) of VLAN0001 is designated forwarding

Port path cost 2, Port priority 128, Port Identifier 128.1153

Designated root has priority 32769, address 0005.9b1e.82fc

Designated bridge has priority 32769, address 0005.9b1e.82fc

Designated port id is 128.1153, designated path cost 0

Timers: message age 0, forward delay 0, hold 0

Number of transitions to forwarding state: 1

The port type is edge

Link type is point-to-point by default

Bpdu guard is enabled

Bpdu filter is enabled by default

BPDU: sent 11, received 0

Issue the commands below to determine the transceiver type for the fabric ports and also the hosts ports for each fabric interface.

Command: “show interface fex-fabric“: displays all the Fabric Extender interfaces

Command: “show fex detail“: Shows detailed information about all FEXs. Including more recent log messages related to the FEX.

N5K3# show interface fex-fabric
Fabric Fabric Fex FEX
Fex Port Port State Uplink Model Serial
—————————————————————
101 Eth1/10 Active 3 N2K-C2232PP-10GE SSI16510AWF
— Eth1/11 Discovered 3 N2K-C2232PP-10GE SSI165204YC
N5K3#
N5K3#

N5K3# show interface Ethernet1/10 fex-intf

Fabric FEX

Interface Interfaces

—————————————————

Eth1/10 Eth101/1/1

N5K3#

N5K3# show interface Ethernet1/10 transceiver

Ethernet1/10

transceiver is present

type is SFP-H10GB-CU3M

name is CISCO-TYCO

part number is 1-2053783-2

revision is N

serial number is TED1530B11W

nominal bitrate is 10300 MBit/sec

Link length supported for copper is 3 m

cisco id is —

cisco extended id number is 4

N5K3# show fex detail

FEX: 101 Description: FEX0101 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI16510AWF

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:02:02, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/10

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/10 – Interface Up. State: Active

Fex Port State Fabric Port

Eth101/1/1 Up Eth1/10

Eth101/1/2 Down None

Eth101/1/3 Down None

Eth101/1/4 Down None

snippet removed

Logs:

08/21/2014 10:00:06.107783: Module register received

08/21/2014 10:00:06.109935: Registration response sent

08/21/2014 10:00:06.239466: Module Online S

Now we quickly enable the second FEX connected to fabric interface E1/11.

N5K3(config)# int et1/11
N5K3(config-if)# switchport mode fex-fabric
N5K3(config-if)# fex associate 102
N5K3(config-if)# end
N5K3# sh fex
FEX FEX FEX FEX
Number Description State Model Serial

————————————————————————

101 FEX0101 Online N2K-C2232PP-10GE SSI16510AWF

102 FEX0102 Online N2K-C2232PP-10GE SSI165204YC

N5K3# show fex detail

FEX: 101 Description: FEX0101 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI16510AWF

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:02:02, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/10

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/10 – Interface Up. State: Active

Fex Port State Fabric Port

Eth101/1/1 Up Eth1/10

Eth101/1/2 Down None

Eth101/1/3 Down None

Eth101/1/4 Down None

Eth101/1/5 Down None

Eth101/1/6 Down None

snippet removed

Logs:

08/21/2014 10:00:06.107783: Module register received

08/21/2014 10:00:06.109935: Registration response sent

08/21/2014 10:00:06.239466: Module Online Sequence

08/21/2014 10:00:09.621772: Module Online

FEX: 102 Description: FEX0102 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI165204YC

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:00:42, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/11

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/11 – Interface Up. State: Active

Fex Port State Fabric Port

Eth102/1/1 Up Eth1/11

Eth102/1/2 Down None

Eth102/1/3 Down None

Eth102/1/4 Down None

Eth102/1/5 Down None

snippet removed

Logs:

08/21/2014 10:12:13.281018: Module register received

08/21/2014 10:12:13.283215: Registration response sent

08/21/2014 10:12:13.421037: Module Online Sequence

08/21/2014 10:12:16.665624: Module Online

Part 2. Fabric Interfaces redundancy

Static Pinning is when you pin a number of host ports to a fabric port. If the fabric port goes down so do the host ports that are pinned to it. This is useful when you want no oversubscription in the network.

Once the host port shut down due to a fabric port down event, the server if configured correctly should revert to the secondary NIC.

The “pinning max-link” divides the number specified in the command by the number of host interfaces to determine how many host interfaces go down if there is a fabric interface failure.

Now we shut down fabric interface E1/10, you can see that Eth101/1/1 has changed its operation mode to DOWN. The FEX has additional connectivity with E1/11 which remains up.

Enter configuration commands, one per line. End with CNTL/Z.
N5K3(config)# int et1/10
N5K3(config-if)# shu
N5K3(config-if)#
N5K3(config-if)# end
N5K3# sh fex detail
FEX: 101 Description: FEX0101 state: Offline
FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI16510AWF

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:02:02, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic:

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/10 – Interface Down. State: Configured

Fex Port State Fabric Port

Eth101/1/1 Down Eth1/10

Eth101/1/2 Down None

Eth101/1/3 Down None

snippet removed

Logs:

08/21/2014 10:00:06.107783: Module register received

08/21/2014 10:00:06.109935: Registration response sent

08/21/2014 10:00:06.239466: Module Online Sequence

08/21/2014 10:00:09.621772: Module Online

08/21/2014 10:13:20.50921: Deleting route to FEX

08/21/2014 10:13:20.58158: Module disconnected

08/21/2014 10:13:20.61591: Offlining Module

08/21/2014 10:13:20.62686: Module Offline Sequence

08/21/2014 10:13:20.797908: Module Offline

FEX: 102 Description: FEX0102 state: Online

FEX version: 5.1(3)N2(1c) [Switch version: 5.1(3)N2(1c)]

FEX Interim version: 5.1(3)N2(1c)

Switch Interim version: 5.1(3)N2(1c)

Extender Serial: SSI165204YC

Extender Model: N2K-C2232PP-10GE, Part No: 73-12533-05

Card Id: 82, Mac Addr: f0:29:29:ff:00:42, Num Macs: 64

Module Sw Gen: 12594 [Switch Sw Gen: 21]

post level: complete

pinning-mode: static Max-links: 1

Fabric port for control traffic: Eth1/11

FCoE Admin: false

FCoE Oper: true

FCoE FEX AA Configured: false

Fabric interface state:

Eth1/11 – Interface Up. State: Active

Fex Port State Fabric Port

Eth102/1/1 Up Eth1/11

Eth102/1/2 Down None

Eth102/1/3 Down None

Eth102/1/4 Down None

snippet removed

Logs:

08/21/2014 10:12:13.281018: Module register received

08/21/2014 10:12:13.283215: Registration response sent

08/21/2014 10:12:13.421037: Module Online Sequence

08/21/2014 10:12:16.665624: Module Online

Port Channels can be used instead of static pinning between parent switch and FEX so in the event of a fabric interface failure all hosts ports remain active.  However, the remaining bandwidth on the parent switch will be shared by all the host ports resulting in an increase in oversubscription.

Part 3. Fabric Extender Topologies

Straight-Through: The FEX is connected to a single parent switch. The servers connecting to the FEX can leverage active-active data plane by using host vPC.

Shutting down the Peer link results in ALL vPC member ports in the secondary peer become disabled. For this reason it is better to use a Dual Homed.

Dual Homed: Connecting a single FEX to two parent switches.

In active – active, a single parent switch failure does not affect the host’s interfaces because both vpc peers have separate control planes and manage the FEX separately.
For the remainder of the post we are going to look at Dual Homed FEX connectivity with host Vpc.

Full configuration:

N5K1:
feature lacp
feature vpc
feature fex
!
vlan 10
!

vpc domain 1

peer-keepalive destination 192.168.0.52

!

interface port-channel1

switchport mode trunk

spanning-tree port type network

vpc peer-link

!

interface port-channel10

switchport access vlan 10

vpc 10

!

interface Ethernet1/1

switchport access vlan 10

spanning-tree port type edge

speed 1000

!

interface Ethernet1/3 – 5

switchport mode trunk

spanning-tree port type network

channel-group 1 mode active

!

interface Ethernet1/10

switchport mode fex-fabric

fex associate 101

!

interface Ethernet101/1/1

switchport access vlan 10

channel-group 10 mode on

N5K2:

feature lacp

feature vpc

feature fex

!

vlan 10

!

vpc domain 1

peer-keepalive destination 192.168.0.51

!

interface port-channel1

switchport mode trunk

spanning-tree port type network

vpc peer-link

!

interface port-channel10

switchport access vlan 10

vpc 10

!

interface Ethernet1/2

switchport access vlan 10

spanning-tree port type edge

speed 1000

!

interface Ethernet1/3 – 5

switchport mode trunk

spanning-tree port type network

channel-group 1 mode active

!

interface Ethernet1/11

switchport mode fex-fabric

fex associate 102

!

interface Ethernet102/1/1

switchport access vlan 10

channel-group 10 mode on

The FEX do not support LACP so configure the port-channel mode to “ON”

The first step is to check the VPC peer link and general VPC parameters.

Command: “show vpc brief“. Displays the vPC domain ID, the peer-link status, the keepalive message status, whether the configuration consistency is successful, and whether peer-link formed or the failure to form

Command: “show vpc peer-keepalive” Displays the destination IP of the peer keepalive message for the vPC. The command also displays the send and receives status as well as the last update from the peer in seconds and milliseconds

N5K3# sh vpc brief
Legend:
(*) – local vPC is down, forwarding via vPC peer-link
vPC domain id : 1
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive

Configuration consistency status: success

Per-vlan consistency status : success

Type-2 consistency status : success

vPC role : primary

Number of vPCs configured : 1

Peer Gateway : Disabled

Dual-active excluded VLANs : –

Graceful Consistency Check : Enabled

vPC Peer-link status

———————————————————————

id Port Status Active vlans

— —- —— ————————————————–

1 Po1 up 1,10

vPC status

—————————————————————————-

id Port Status Consistency Reason Active vlans

—— ———– —— ———– ————————– ———–

10 Po10 up success success 10

N5K3# show vpc peer-keepalive

vPC keep-alive status : peer is alive

–Peer is alive for : (1753) seconds, (536) msec

–Send status : Success

–Last send at : 2014.08.21 10:52:30 130 ms

–Sent on interface : mgmt0

–Receive status : Success

–Last receive at : 2014.08.21 10:52:29 925 ms

–Received on interface : mgmt0

–Last update from peer : (0) seconds, (485) msec

vPC Keep-alive parameters

–Destination : 192.168.0.54

–Keepalive interval : 1000 msec

–Keepalive timeout : 5 seconds

–Keepalive hold timeout : 3 seconds

–Keepalive vrf : management

–Keepalive udp port : 3200

–Keepalive tos : 192

The trunk interface should be forwarding on the peer link and VLAN 10 must be forwarding and active on the trunk link. Take note if any vlans are on err-disable mode on the trunk.

N5K3# sh interface trunk
——————————————————————————-
Port Native Status Port
Vlan Channel
——————————————————————————–
Eth1/3 1 trnk-bndl Po1
Eth1/4 1 trnk-bndl Po1
Eth1/5 1 trnk-bndl Po1

Po1 1 trunking —

——————————————————————————–

Port Vlans Allowed on Trunk

——————————————————————————–

Eth1/3 1-3967,4048-4093

Eth1/4 1-3967,4048-4093

Eth1/5 1-3967,4048-4093

Po1 1-3967,4048-4093

——————————————————————————–

Port Vlans Err-disabled on Trunk

——————————————————————————–

Eth1/3 none

Eth1/4 none

Eth1/5 none

Po1 none

——————————————————————————–

Port STP Forwarding

——————————————————————————–

Eth1/3 none

Eth1/4 none

Eth1/5 none

Po1 1,10

——————————————————————————–

Port Vlans in spanning tree forwarding state and not pruned

——————————————————————————–

Eth1/3 —

Eth1/4 —

Eth1/5 —

Po1 —

——————————————————————————–

Port Vlans Forwarding on FabricPath

——————————————————————————–

N5K3# sh spanning-tree vlan 10

VLAN0010

Spanning tree enabled protocol rstp

Root ID Priority 32778

Address 0005.9b1e.82fc

This bridge is the root

Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Bridge ID Priority 32778 (priority 32768 sys-id-ext 10)

Address 0005.9b1e.82fc

Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Interface Role Sts Cost Prio.Nbr Type

—————- —- — ——— ——– ——————————–

Po1 Desg FWD 1 128.4096 (vPC peer-link) Network P2p

Po10 Desg FWD 1 128.4105 (vPC) Edge P2p

Eth1/1 Desg FWD 4 128.129 Edge P2p

Check the Port Channel database and determine the status of the port channel

N5K3# show port-channel database
port-channel1
Last membership update is successful
3 ports in total, 3 ports up
First operational port is Ethernet1/3
Age of the port-channel is 0d:00h:13m:22s
Time since last bundle is 0d:00h:13m:18s
Last bundled member is Ethernet1/5

Ports: Ethernet1/3 [active ] [up] *

Ethernet1/4 [active ] [up]

Ethernet1/5 [active ] [up]

port-channel10

Last membership update is successful

1 ports in total, 1 ports up

First operational port is Ethernet101/1/1

Age of the port-channel is 0d:00h:13m:20s

Time since last bundle is 0d:00h:02m:42s

Last bundled member is Ethernet101/1/1

Time since last unbundle is 0d:00h:02m:46s

Last unbundled member is Ethernet101/1/1

Ports: Ethernet101/1/1 [on] [up] *

To execute reachability tests, create an SVI on the first parent switch and run ping tests. You must first enable the feature set “feature interface-vlan”. The reason we create an SVI in VLAN 10 is because we need an interfaces to source our pings.

N5K3# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N5K3(config)# fea
feature feature-set
N5K3(config)# feature interface-vlan
N5K3(config)# int vlan 10
N5K3(config-if)# ip address 10.0.0.3 255.255.255.0

N5K3(config-if)# no shu

N5K3(config-if)#

N5K3(config-if)#

N5K3(config-if)# end

N5K3# ping 10.0.0.3

PING 10.0.0.3 (10.0.0.3): 56 data bytes

64 bytes from 10.0.0.3: icmp_seq=0 ttl=255 time=0.776 ms

64 bytes from 10.0.0.3: icmp_seq=1 ttl=255 time=0.504 ms

64 bytes from 10.0.0.3: icmp_seq=2 ttl=255 time=0.471 ms

64 bytes from 10.0.0.3: icmp_seq=3 ttl=255 time=0.473 ms

64 bytes from 10.0.0.3: icmp_seq=4 ttl=255 time=0.467 ms

— 10.0.0.3 ping statistics —

5 packets transmitted, 5 packets received, 0.00% packet loss

round-trip min/avg/max = 0.467/0.538/0.776 ms

N5K3# ping 10.0.0.10

PING 10.0.0.10 (10.0.0.10): 56 data bytes

Request 0 timed out

64 bytes from 10.0.0.10: icmp_seq=1 ttl=127 time=1.874 ms

64 bytes from 10.0.0.10: icmp_seq=2 ttl=127 time=0.896 ms

64 bytes from 10.0.0.10: icmp_seq=3 ttl=127 time=1.023 ms

64 bytes from 10.0.0.10: icmp_seq=4 ttl=127 time=0.786 ms

— 10.0.0.10 ping statistics —

5 packets transmitted, 4 packets received, 20.00% packet loss

round-trip min/avg/max = 0.786/1.144/1.874 ms

N5K3#

Do the same tests on the second Nexus 5K.

N5K4(config)# int vlan 10
N5K4(config-if)# ip address 10.0.0.4 255.255.255.0
N5K4(config-if)# no shu
N5K4(config-if)# end
N5K4# ping 10.0.0.10
PING 10.0.0.10 (10.0.0.10): 56 data bytes
Request 0 timed out

64 bytes from 10.0.0.10: icmp_seq=1 ttl=127 time=1.49 ms

64 bytes from 10.0.0.10: icmp_seq=2 ttl=127 time=1.036 ms

64 bytes from 10.0.0.10: icmp_seq=3 ttl=127 time=0.904 ms

64 bytes from 10.0.0.10: icmp_seq=4 ttl=127 time=0.889 ms

— 10.0.0.10 ping statistics —

5 packets transmitted, 4 packets received, 20.00% packet loss

round-trip min/avg/max = 0.889/1.079/1.49 ms

N5K4# ping 10.0.0.13

PING 10.0.0.13 (10.0.0.13): 56 data bytes

Request 0 timed out

Request 1 timed out

Request 2 timed out

Request 3 timed out

Request 4 timed out

— 10.0.0.13 ping statistics —

5 packets transmitted, 0 packets received, 100.00% packet loss

N5K4# ping 10.0.0.3

PING 10.0.0.3 (10.0.0.3): 56 data bytes

Request 0 timed out

64 bytes from 10.0.0.3: icmp_seq=1 ttl=254 time=1.647 ms

64 bytes from 10.0.0.3: icmp_seq=2 ttl=254 time=1.298 ms

64 bytes from 10.0.0.3: icmp_seq=3 ttl=254 time=1.332 ms

64 bytes from 10.0.0.3: icmp_seq=4 ttl=254 time=1.24 ms

— 10.0.0.3 ping statistics —

5 packets transmitted, 4 packets received, 20.00% packet loss

round-trip min/avg/max = 1.24/1.379/1.647 ms

Shut down one of the FEX links to the parent and you see that the FEX is still reachable via the other link that is in the port channel bundle.

N5K3# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N5K3(config)# int Eth101/1/1
N5K3(config-if)# shu
N5K3(config-if)# end
N5K3#
N5K3#

N5K3# ping 10.0.0.3

PING 10.0.0.3 (10.0.0.3): 56 data bytes

64 bytes from 10.0.0.3: icmp_seq=0 ttl=255 time=0.659 ms

64 bytes from 10.0.0.3: icmp_seq=1 ttl=255 time=0.515 ms

64 bytes from 10.0.0.3: icmp_seq=2 ttl=255 time=0.471 ms

64 bytes from 10.0.0.3: icmp_seq=3 ttl=255 time=0.466 ms

64 bytes from 10.0.0.3: icmp_seq=4 ttl=255 time=0.465 ms

— 10.0.0.3 ping statistics —

5 packets transmitted, 5 packets received, 0.00% packet loss

round-trip min/avg/max = 0.465/0.515/0.659 ms

If you would like to futher your knowlege on VPC and how it relates to Data Center toplogies and more specifically, Cisco’s Application Centric Infrastructure (ACI), you can check out my following training courses on Cisco ACI. Course 1: Design and Architect Cisco ACI,  Course 2: Implement Cisco ACI, and Course 3: Troubleshooting Cisco ACI,

SDN Data Center

Redundant links with Virtual PortChannels

Redundant Links with Virtual PortChannels

In the world of networking, efficiency, and reliability are paramount. As data centers expand and organizations strive for seamless connectivity, Virtual PortChannels (vPCs) have emerged as a powerful solution. This blog post aims to demystify vPCs, comprehensively understanding their benefits, functionality, and implementation considerations.

Virtual PortChannels, also known as vPCs, are a technology designed to enhance network scalability and resiliency. By combining multiple physical links into a single logical interface, vPCs allow for increased bandwidth and redundancy, ensuring uninterrupted connectivity and load balancing across network switches.

Redundant links refer to the practice of having multiple physical connections between network devices. This approach mitigates the risks of single points of failure and ensures uninterrupted network connectivity. However, managing redundant links can be complex and resource-intensive.

Virtual PortChannel (vPC) is a technology developed by Cisco Systems that revolutionizes how redundant links are deployed and managed. It allows the creation of a logical link aggregation group (LAG) by bundling multiple physical links into a single logical interface. This logical interface acts as a single point of attachment for downstream devices, simplifying the network topology.

Enhanced Redundancy: By bundling multiple physical links into a vPC, network administrators can achieve higher levels of redundancy. In the event of a link failure, traffic seamlessly fails over to the remaining active links, ensuring uninterrupted connectivity.

Improved Bandwidth Utilization: vPC enables load balancing across multiple physical links, maximizing the available bandwidth. This intelligent distribution of traffic prevents link congestion and optimizes network performance.

Simplified Network Design: Traditional redundant link configurations often involve complex Spanning Tree Protocol (STP) configurations to avoid loops. With vPC, STP is not required, simplifying the network design and reducing potential points of failure.

Hardware and Software Requirements: Implementing vPC requires compatible hardware and software. Network administrators must ensure that their devices support vPC functionality and that the necessary licenses are in place.

Configuration Best Practices: Proper configuration is crucial for the successful deployment of vPC. Network administrators should follow best practices provided by the equipment manufacturer and ensure consistency across all devices in the vPC domain.

Real-World Use Cases

Data Centers: vPC is widely used in data center environments to provide high availability and optimal network performance. It allows for seamless migration of virtual machines (VMs) across physical hosts without losing network connectivity.

Campus Networks: Large campus networks can leverage vPC to enhance redundancy and simplify network management. By aggregating multiple uplinks from access switches, vPC provides a resilient and scalable network infrastructure.

Highlights: Redundant Links with Virtual PortChannels

Port Channels and vPCs

During the early days of Layer 2 Ethernet networks, Spanning Tree Protocol (STP) was used to limit the devastating effects of a topology loop. Even though there may be many connections in a network, STP has one suboptimal principle: only one active path is allowed between two devices.

There are two problems with a single logical link: the first is that half (or more) of the system’s bandwidth is unavailable to data traffic, and the second is that if the active link fails, the network will experience multiple seconds of systemwide data loss as it re-evaluates the new “best” solution to network forwarding on a Layer 2 network.

One of the significant drawbacks of the spanning tree is the concept of blocking ports. While they are essential to prevent loops, blocking ports leads to inefficient network performance. The blocked ports essentially go unused, resulting in unused bandwidth and decreased overall network throughput.

Spanning Tree Root Switch

Load Balancing

Furthermore, in a robust network with STP loop management, there is no efficient dynamic way to utilize all the available bandwidth. Enhanced Layer 2 Ethernet networks have been developed through the use of port channels and virtual port channels (vPCs). Port Channel technology allows forwarding traffic between two participating devices using a load-balancing algorithm to balance traffic across multiple inter-switch links (ISLs).

By bundling the links together as one logical link, the loop problem is also managed. Multi-device port channels can be formed using vPC technology. Port channel-attached devices see a pair of switches acting as a single logical endpoint when attached to a vPC peer; the devices act as separate endpoints. By combining hardware redundancy with port-channel loop management, the vPC environment provides multiple benefits.

Example: Cisco ACI

These technologies are extensively used in the ACI Cisco. A virtual port channel (vPC) allows links physically connected to two different ACI leaf nodes to appear as a single port channel to a third device (i.e., network switch, server, or any other networking device that supports link aggregation technology). Firstly, let us start with the basics.

**Spanning Tree Challenges**

Traditional spanning trees challenge network designers as they block redundant links. The drawbacks of STP ( spanning tree protocol ) prove extremely expensive in data centers when multiple redundant links are used for mission-critical applications, essentially wasting 50% of the capacity.

You can use the port channel to scale bandwidth, as the bundled links appear as one to higher-level protocols, resulting in all ports forwarding or blocking for a particular VLAN. It would help if you aimed to design all links in a data center as an EtherChannel, as this will optimize your bandwidth and reliability.

STP Path distribution

EtherChannel Technology

Network administrators connect multiple physical Ethernet links between devices to achieve more bandwidth and redundancy. The Spanning Tree Protocol blocks these links, so we need EtherChannel Technology. EtherChannel technology combines several physical links between switches into one logical connection to provide high-speed links and redundancy without being blocked by the Spanning Tree Protocol.

Understanding Layer 2 EtherChannel

Layer 2 EtherChannel, also known as Link Aggregation or Port Channel, combines multiple physical links into a single logical link. This powerful technique enhances bandwidth, improves redundancy, and optimizes load balancing. By bundling multiple links, Layer 2 EtherChannel presents a unified interface for higher throughput and fault tolerance.

Configuring Layer 2 EtherChannel involves steps that vary depending on the networking equipment used. Generally, it starts with identifying the physical links that will be part of the EtherChannel bundle. Then, the appropriate channel mode and load balancing method must be configured. Lastly, the EtherChannel interface is created, and the physical links are assigned. Proper configuration ensures seamless data transmission and efficient utilization of network resources.

Understanding Layer 3 Etherchannel

Layer 3 Etherchannel, also known as routed Etherchannel or port-channel, bundles multiple physical interfaces into a single logical interface to increase bandwidth and provide redundancy at Layer 3. Unlike Layer 2 Etherchannel, which operates at the data link layer, Layer 3 Etherchannel operates at the network layer, allowing traffic distribution across multiple physical links based on routing protocols.

Layer 3 Etherchannel offers several advantages over traditional single link configurations. Firstly, it allows for load balancing, where traffic is distributed across multiple links, maximizing bandwidth utilization and improving overall network performance. Additionally, Layer 3 Etherchannel provides redundancy, ensuring that traffic seamlessly switches to the remaining active links if one link fails, minimizing downtime, and enhancing network reliability.

Configuration of Layer 3 Etherchannel

Configuring Layer 3 Etherchannel involves a few essential steps. Firstly, the physical interfaces that will be part of the Etherchannel bundle must be identified and prepared. Then, a logical interface, often called a port-channel interface, is created. This interface acts as the virtual representation of the bundled physical links. Next, the routing protocol must be configured to distribute traffic across the links. Finally, verification and testing are crucial to ensure proper configuration and functionality.

Layer 3 Etherchannel finds its applications in various scenarios. One everyday use case is in data centers, where high-speed connectivity and redundancy are critical. By bundling multiple links, Layer 3 Etherchannel provides the bandwidth and failover capabilities required for demanding data center environments. Another use case is in network edge deployments, where Layer 3 Etherchannel allows for efficient load balancing and redundancy in connecting access switches to distribution or core switches.

Port Channel and vPC 

Port Channel technology forwards traffic between two participating devices using a load-balancing algorithm. Multiple devices can form a virtual port channel (vPC). A third device can see two Cisco Nexus 7000 or 9000 Series devices as a single port channel using a virtual port channel (vPC). Third devices can be switches, servers, or other networking devices that support port channels.

A virtual private cloud can provide Layer 2 multipathing to create redundancy and increase bandwidth by enabling multiple parallel paths between nodes and load-balancing traffic. The only ones you can use in the vPC are Layer 2 port channels. LACP or a static no protocol configuration is used to configure the port channels.

vPC provides the following technical benefits:

  • A single device can share a port channel between two upstream devices
  • Spanning Tree Protocol (STP) blocked ports are removed
  • Makes sure there are no loops in the topology
  • Uplink bandwidth is utilized to the fullest extent possible
  • When either a device or a link fails, the system quickly converges
  • Resilience at the link level is ensured
  • Ensures a high level of availability

**Implementation of vPC topologies**

VPC supports the following topologies:

  1. Dual-uplink Layer 2 access: Using a Cisco Nexus 9000 Series switch, an access switch is dual-homed to a pair of distribution switches.
  2. Dual-homing: This topology connects two servers to two switches,
  3. Topologies supported by FEX: FEX supports various vPC topologies using Cisco Nexus 7000 and 9000 Series switches.

Related: For pre-information, you may find the following posts helpful:

  1. Data Center Fabric
  2. Optimal Layer 3 Forwarding
  3. Data Center Failure
  4. Active Active Data Center Design
  5. Network Overlays
  6. Dead Peer Detection

Redundant Links with Virtual PortChannels

STP has one suboptimal principle: to break loops in a network. This issue with having a single practice is that only one active path is allowed from one device to another, regardless of how many connections might exist in the network. In addition, no efficient dynamic mechanism exists for using all the available bandwidth with STP loop management.

  • Port Channel Technology

So, to overcome these challenges, enhancements to Layer 2 Ethernet networks were made in the shape of port channel and virtual port channel (vPC) technologies. Port Channel technology permits multiple links between two participating devices to forward traffic using a load-balancing algorithm while managing the loop problem by bundling the links as one logical link.

  • vPC Technology

Then, we have vPC technology. This technology permits multiple devices to create a port channel. In vPC, a pair of switches acting as a vPC peer endpoint looks like a single logical entity to port channel–attached devices; the two devices that serve as the logical port-channel endpoint are still two different and separate devices.

High Availability: Link and Device.

You need to identify the level of high availability you want to achieve in enterprise branch offices. Then, you can meet your high availability requirements with the appropriate device level and link redundancy.

Link-level redundancy requires two links to run as active/active or active/backup links to recover traffic forwarding lost if one link fails. Therefore, any failure on an access link should not result in a loss of connectivity. To qualify, a branch office must have at least two upstream links, either to a private network or the Internet.

Device-level redundancy is another level of high availability, ensuring that the backup device can take over in the event of a failed device. Device and link redundancy is typically coupled, which means that if one fails, the other will too. As a result of this strategy, there should be no loss of connectivity between branch offices and data centers due to a single device failure.

High Availability and Designs

High-availability designs combine link and device redundancy between branches and data centers to ensure business-critical connectivity. Each data center is dual-homed, so if one fails, traffic can be redirected to the backup data center in the event of a complete failure.

Reroute traffic within 30 seconds should be possible whenever a failure (link, device, or data center) occurs. Packets can be lost during this period. When the user applications can withstand these failover times, sessions are maintained. Established sessions should not be dropped in a branch office with redundant devices if the failed device was forwarding traffic.

redundant links
Diagram: Redundant links with EtherChannel. Source is jmcritobal

**vPC vs Port Channel**

Servers can be attached to the access switches as port channels, uplinks that consist of redundant links formed from the access can be link aggregated, and the core links can also be bundled. Most switches can support 8 ports in a bundle, and Nexus platforms can support up to 16 – 32 ports.

It would help to create a port channel with ports from different line cards in each redundant switch. This will prevent the failure of a single line card from affecting the entire channel. With this approach, we get redundancy at a logical and physical layer.

Link Aggregation and Port Channels

Link aggregation (EtherChannel and IEEE 802.3ad ) was developed to address that limitation where two Ethernet redundant switches were connected through multiple up-links. However, this did not address the challenges in the data center environment for deploying link aggregation on triangular topologies or if you want to terminate on different switches.

Traditional LAG ( link aggregation ) has limitations because its standard only allows aggregated links to terminate on a single switch. Technologies such as vPC Virtual Port Channel and Virtual Switching System (VSS) have been implemented to overcome this limitation.

Key Points: Port Channels

In summary, a port channel aggregates multiple physical interfaces that create a logical interface. On some platforms, you can bundle up to 32 individual redundant links. The Port channel will also load balance traffic across the redundant links. The port channel will remain operational as long as at least one physical interface within the port channel is operational. Finally, before we move to vPC vs port channel, you can create either Layer 2 or 3 port channels. However, as expected, you cannot combine Layer 2 and 3 interfaces in the same port channel.

Port-channel load balancing

Frames are distributed between the physical interfaces that make up the port channel using a hashing function. This hash will differ depending on the load balancing method. Based on the hash result, the physical port to be used for transmission is determined.

A hashing operation can be performed on MAC or IP addresses based on the source address, destination address, or both (some methods use the port number). Depending on the switch model and software version, default load-balancing methods can be layer 2, 3, or 4 and apply globally to all port channels. Here are a few methods for balancing etherchannels:

  • src-ip : Source IP address
  • dst-ip : Destination IP address
  • src-dst-ip : Source and destination IP address
  • src-mac : Source MAC address
  • dst-mac : Destination MAC address
  • src-dst-mac : Source and destination MAC address
  • src-port : Source port number
  • dst-port : Destination port number
  • src-dst-port : Destination source port number

Starting the Debate: vPC vs Port Channel

vPC (Virtual Port-Channel), or multi-chassis ether channel (MEC), is a feature on the Cisco Nexus switches. You can configure port channels across multiple redundant switches. The virtual port channel (vPC) is configured using the interfaces of two redundant switches.

Now we have redundancy at a link and a switch layer forming a triangular design. We must terminate the links with a standard port channel on the same switch. We don’t have a channel between two redundant switches in this case.

Virtual PortChannels (vPCs), links between two Cisco switches, appear to a third downstream device as coming from one device and as part of a single PortChannel. A third device can be a switch, a server, or other networking devices that support IEEE 802.3ad PortChannels. Both standard port channels and Virtual PortChannels (vPC) can use the link Aggregation Control Protocol ( LACP ).

LACP negotiation and redundant switches

As part of the IEEE 802.3ad standards, a Link Aggregation Control Protocol ( LACP ) was created to negotiate the channel, and it is recommended that this feature be used when building a bundle. LACP modes can be either active or passive. Active mode means the switch actively negotiates the channel, whereas passive means the port does not initiate an LACP negotiation.

You can form channels between active and passive or two active ports but not passive and passive ports. The port channel will not negotiate and remain down if the correct modes are not configured on each side of the challenge.

The following diagram depicts the logic and physical aspects of a vPC virtual port channel. This is not specific to a Cisco vPC but all aggregation technologies. We have several physical redundant links; in our case, four appear to be one prominent link from a logical perspective.

vpc virtual port channel
Diagram: vPC virtual port channels.

In either case, LACP can be used as the control plane to negotiate the channel. You may ask, is LACP mandatory for vPC?” – No. We can use mode “On” & bring UP the port channel without negotiation/checks. So, we are turning off the LACP or other control protocols. However, is LACP recommended for vPC?” –

Like a normal port channel, it is always advised to use a control protocol for the vPC/port channel. LACP adds a lot of intelligence to the background. The main difference between vPC and port channels is that vPC can terminate on secondary switches, creating a triangular design.

Building triangles for better redundancy

The quandary of the inability to build triangles with link aggregation can be mitigated by deploying either the Nexus technology, known as virtual Port Channels (vPCs), or the Catalyst technology, known as Virtual Switching System ( VSS ). VSS and vPC virtual port channels allow the termination of an LAG on two separate switches, resulting in a triangular design. In addition, they will enable the grouping of two physical redundant switches to form a single logical switch to any downstream device ( switch or server ).

Load Balancing Functions

A hash function is performed when a layer 2 frame is forwarding to a PortChannel to determine which physical links to forward the frame. The load balancing method used for Nexus switches is granular and includes the following:

Nexus Switches

Load Balancing Method

Option 1

Destination IP address

Option 2

Destination MAC address

Option 3

Destination TCP and UDP port number

Option 4

Source and Destination IP address

Option 5

Source and Destination MAC address

Option 6

Source and Destination TCP and UDP port numbers

Option 7

Source IP address

Option 8

Source MAC address

Option 9

Source TCP and UDP port number

Redundant Links: Detect polarized links

Monitoring the traffic distribution over each physical link is essential to detect polarized links. The polarization effect occurs if some links attract more traffic than others, resulting in heavy utilization of some redundant links and low utilization of others. Therefore, before choosing the load balancing method, analyze the traffic flows from source to destination and determine if the flow is too many or evenly spread. For example, I would not use the source IP address load balancing method to load balance traffic from a firewall deploying Network Address Translation ( NAT ) to a single device.

Routing Protocols

Keep in mind for routing convergence, that routing protocols see the channel as one link, so if you have 8 x 10 ports in one bundle and that bundle has an OSPF cost of 10, a failure occurs, and you lose a member of that channel, the OSPF will still mark that link with the same metric. Routing protocols don’t dynamically change metrics due to a member link failure.

vPC and VSS offer the following benefits:

Redundant links with Virtual PortChannels

Improved convergence with a link and device failure.

Eliminate the need for STP.

Independent control planes ** Not with the VSS.

Increased bandwidth but combining all redundant links to one from the perspective of STP.

What is vPC?

MEC (Multi-Chassis EtherChannel) is a feature on Cisco Nexus switches that allows you to configure a Port-Channel across multiple switches (i.e., vPC peers). Virtual PC is similar to Virtual Switch System (VSS) on Catalyst 6500s. However, VSS creates just one logical switch instead of vPC’s multiple ones. In this way, a single control plane handles both management and configuration.

vPC, on the other hand, allows each switch to be managed and configured independently. vPC manages both switches independently, so it’s important to remember that. Therefore, you must create and permit your VLANs on both Nexus switches.

Comparing vPC and VSS

vPC and VSS are similar technologies, but the Nexus vPC feature has dual control planes. It offers In-Service Upgrade ( ISSU ), which allows upgrading one of the two switches without causing any service interruption. Because the control plane runs independently on each of the vPC peers, the failure of one peer does not affect the virtual switch.

With the VSS, the active peer going down brings down the entire system because of the lack of dual control planes. It should be worth noting that vPC falls back to STP, and the reliance on STP can only be entirely circumvented if you use Cisco’s Fabric Path or THRILL. The VSS is available on the Catalyst platforms, while vPC is solely a Nexus technology.

**vPC Terminology:**

  • vPC Peer – a vPC switch, one of a pair.
  • vPC member port – one of the ports that form a vPC.
  • vPC – the combined port channel between the vPC peers and the downstream device.
  • vPC peer-link – link used to synchronize state between vPC peer devices, must be 10GbE. The vPC-related control plane communications occur over this link, and any Ethernet frames transported receive special treatment to avoid loops in the vPC member ports.
  • vPC peer keepalive link – the keepalive link between the vPC peer devices. It recommended using the MGMT0 interface and a VRF instance. If the mgmt interface is unavailable, then a routed interface in the mgmt VRF is.
  • vPC VLAN – one of the VLANs that carry over the vPC peer link and communicate via the vPC with a peer device.
  • non-vPC peer VLAN – One STP VLAN is not carried over the peer link.
  • CFS – Cisco Fabric Service Protocol, used for state synchronization and configuration validation between vPC peer devices.

Within a vPC domain, each pair is assigned to a primary or secondary role; by default, the switch with the lowest MAC address becomes the primary peer. The domain identifies the pair of redundant switches, generating a shared MAC address that can be used as a logical switch bridge ID in STP communication.

Virtual PortChannels Best Practices

Below are the best practices to consider for implementation:

  1. Manually define which vPC switch is primary and secondary. Lower the priority, and the more preferred switch will act as the primary.
  2. Form Layer 2 port channels using different 10GE modules on the Nexus switch for the vPC peer-link with ports in dedicated mode.
  3. Form Layer 2 port channels using different 10GE modules on the Nexus switch for the vPC peer keepalive link ( non-default VRF ).
  4. Enable Bridge Assurance ( BA ) on the vPC peer-link interface ( default ).
  5. Enable UDLD aggression on the vPC peer-link interface.
  6. On the primary vPC switch, configure the STP root bridge for a VLAN, the active HSRP router, and the PIM DR router. Likewise, on the secondary vPC switch, configure the secondary STP and the standby HSRP router. The Layer 2 and Layer 3 topologies should match.

Introducing Fabric Extenders

If you want to add even more redundancy to vPC, use it with a fabric extender. Fabric Extenders act as a remote line card to a parent switch and can be used with vPC in three forms. The first is known as host vPC and is a vPC southbound from the FEX to the server; the second is a vPC northbound from the FEX to the parent switch, sometimes called a Fabric vPC; and the third is both a southbound and northbound vPC from the FEX which is known as Enhanced vPC.

Data Center Topology types
Diagram: Introducing Fabric Extenders.

Virtual PortChannels, the single connection

  • Datacenter interconnect

Because vPC characterizes a single connection from the perspective of higher-level protocols, e.g., STP or OSPF, it can be used as a Layer 2 extension for a DCI ( data center interconnect ) over short distances and dark fiber or protected DWDM only. vPC best practices still apply, and it is recommended that you use different vPC domains for each site and that Layer 3 communication between vPC peers is performed on dedicated routed redundant links.

  • OTV or VPLS

If you connect more than two data centers with an entire mesh topology, the best DCI mechanism would be Overlay Transport Virtualization (OTV) or VPLS ( Ethernet-based point-to-multipoint Layer 2 VPN ). vPC can work with two or more data centers, but you must design the topology as a hub-and-spoke.

Any spoke-to-spoke communication must flow through the hub, connecting two data centers back to back or two or more in a hub-and-spoke design. The layer 2 boundary and STP isolation can be achieved with bridge protocol unit ( BPDU ) filtering on the DCI links. BPDU filtering avoids transmitting BPDUs on a link, essentially turning off STP on the DCI links. You can extend with the multi-pod or multi-site designs if you have Cisco ACI.

redundant links
Diagram: vPC as a Data Center Interconnect.
  • A key point: Loop prevention

vPC has a built-in loop prevention mechanism; never forward a frame received through a peer link to a vPC member port. Under normal operations, a vPC peer switch should never learn MAC addresses over the peer link and is mainly used for flooding, multicast, broadcast, and control plane traffic.

This is because the LAG is terminated on two peer switches, and you don’t want to send traffic received from a single downstream device back down to the same downstream device, resulting in a loop. However, this rule does not apply to:

  1. Non-vPC interface ( orphan port ) and
  2. vPC member ports that are only active in the receiving pair.

Note: An orphan port in a port to a downstream device connected to only one peer.

Redundant Switches: vPC peer link usage

As mentioned, the vPC peer link should not be used for end-host reachability under normal operations. However, if there is a failure on all members of a vPC in a single peer, the peer link will forward frames to the remaining member ports of the vPC. This explains why Cisco has recommended using the same 10G for the peer link.

The peer keepalive link is also mandatory and is used as a heartbeat mechanism to transport UDP datagrams between peers. This avoids a dual-active / split-brain scenario where both peers are active simultaneously. If no heartbeat is received after a configurable timeout, the secondary vPC peer is the primary peer, and all its member ports remain active.

However, undesirable behavior occurs if an orphan port is connected to only one peer. For example, with a vPC peer link failure, the orphan ports remain active in the secondary peer, even though they are now isolated from the rest of the network. In this case, it is recommended that a non-vPC trunk be configured between peer switches.

Closing Points on Redundant Links with Virtual Portchannels

vPCs allow multiple links to function as a single logical channel, offering improved bandwidth and fault tolerance. This technology not only enhances performance but also simplifies network design, making it a vital tool for modern enterprises.

Redundant links are the backbone of a resilient network architecture. They ensure uninterrupted data flow even when a primary link fails, thus maintaining the network’s integrity. However, traditional methods of setting up redundancy often lead to complexities such as network loops and inefficient bandwidth utilization. Virtual Portchannels address these issues by aggregating multiple physical links into one logical entity. This aggregation helps in balancing loads more effectively and reduces the potential for network congestion.

Adopting vPCs in your network infrastructure offers several benefits. First and foremost, they provide increased bandwidth by combining the throughput of multiple links. Additionally, vPCs enhance redundancy without the need for complex configurations, resulting in simpler network management. Another significant advantage is the elimination of Spanning Tree Protocol (STP) issues. By bypassing STP, vPCs decrease the risk of network loops and provide a more stable networking environment.

While vPCs present numerous advantages, they are not without challenges. Implementing vPCs requires careful planning and a thorough understanding of both hardware and software configurations. Network administrators must ensure compatibility between devices and be prepared to troubleshoot issues related to synchronization and consistency. Furthermore, as with any technology, ongoing maintenance and monitoring are essential to ensure optimal performance and reliability.

Summary: Redundant Links with Virtual PortChannels

In today’s fast-paced digital world, network reliability and performance are paramount. With the increasing demand for seamless connectivity, businesses seek innovative solutions to enhance their network infrastructure. One such solution that has gained significant traction is the implementation of redundant links with Virtual PortChannel (vPC). In this blog post, we explored the concept of redundant links and delved into the benefits and considerations of utilizing vPC technology.

Understanding Redundant Links

As the name suggests, redundant links are duplicate connections that provide failover capabilities in case of network failures. By establishing multiple links between network devices, organizations can ensure uninterrupted connectivity and minimize the risk of downtime. By distributing traffic across multiple paths, redundant links not only enhance network reliability but also improve overall network performance.

Exploring Virtual PortChannel (vPC) Technology

Virtual PortChannel (vPC) is a technology that aggregates multiple physical links into a single logical link. By bundling these links, vPC provides increased bandwidth, load balancing, and redundancy. This technology enables network devices to form a virtual port channel, presenting as a single port to connected devices. With vPC, organizations can achieve high availability and scalability while simplifying network configuration and management.

Benefits of Redundant Links with vPC

1. Enhanced Network Availability: Redundant links with vPC ensure network availability by providing alternate paths in case of link failures. This redundancy eliminates single points of failure and minimizes the impact of network disruptions.

2. Improved Load Balancing: VPC technology optimizes network performance and prevents bottlenecks by distributing traffic across multiple links. This load-balancing capability results in the efficient utilization of network resources and improved user experience.

3. Simplified Network Management: vPC technology simplifies network configuration and management. By logically consolidating multiple physical links, administrators can streamline their network setup, reducing complexity and potential human errors.

Considerations for Implementing Redundant Links with vPC

While the benefits of redundant links with vPC are significant, it’s essential to consider a few key factors before implementation. Factors such as network topology, hardware compatibility, and proper configuration must be thoroughly evaluated to ensure a successful deployment.

Conclusion:

In conclusion, redundant links with Virtual PortChannel (vPC) present a powerful solution for organizations aiming to enhance network reliability and performance. By combining the advantages of redundant links and virtualization, businesses can achieve high availability, improved load balancing, and simplified network management. With careful planning and consideration, implementing redundant links with vPC can pave the way for a robust and resilient network infrastructure.

multipath tcp

Data Center Topologies

Data Center Topology

In the world of technology, data centers play a crucial role in storing, managing, and processing vast amounts of digital information. However, behind the scenes, a complex infrastructure known as data center topology enables seamless data flow and optimal performance. In this blog post, we will delve into the intricacies of data center topology, its different types, and how it impacts the efficiency and reliability of data centers.

Data center topology refers to a data center's physical and logical layout. It encompasses the arrangement and interconnection of various components like servers, storage devices, networking equipment, and power sources. A well-designed topology ensures high availability, scalability, and fault tolerance while minimizing latency and downtime. As technology advances, so does the landscape of data center topologies. Here are a few emerging trends worth exploring:

Leaf-Spine Architecture: This modern approach replaces the traditional three-tier architecture with a leaf-spine model. It offers high bandwidth, low latency, and improved scalability, making it ideal for cloud-based applications and data-intensive workloads.

Software-Defined Networking (SDN): SDN introduces a new level of flexibility and programmability to data center topologies. By separating the control plane from the data plane, it enables centralized management, automated provisioning, and dynamic traffic optimization.

The chosen data center topology has a significant impact on the overall performance and reliability of an organization's IT infrastructure. A well-designed topology can optimize data flow, minimize latency, and prevent bottlenecks. By considering factors such as fault tolerance, scalability, and network traffic patterns, organizations can tailor their topology to meet their specific needs.

Highlights: Data Center Topology

A data center consists of the following core infrastructure components:

  • Network infrastructure: Connects physical and virtual servers, data center services, storage, and external connections to end users.
  • Storage Infrastructure: Modern data centers use storage infrastructure to power their operations. Storage systems hold this valuable commodity.
  • A data center’s computing infrastructure is its applications. The computing infrastructure comprises servers that provide processors, memory, local storage, and application network connectivity. In the last 65 years, computing infrastructure has undergone three major waves:
    • In the first wave of replacements of proprietary mainframes, x86-based servers were installed on-premises and managed by internal IT teams.
    • In the second wave, application infrastructure was widely virtualized, improving resource utilization and workload mobility across physical infrastructure pools.
    • The third wave finds us in the present, where we see the move to the cloud, hybrid cloud, and cloud-native (that is, applications born in the cloud).

Common Types of Data Center Topologies:

a) Bus Topology: In this traditional topology, all devices are connected linearly to a common backbone, resembling a bus. While it is simple and cost-effective, a single point of failure can disrupt the entire network.

b) Star Topology: Each device is connected directly to a central switch or hub in a star topology. This design offers centralized control and easy troubleshooting, but it can be expensive due to the requirement of additional cabling.

c) Mesh Topology: A mesh topology provides redundant connections between devices, forming a network where every device is connected to every other device. This design ensures high fault tolerance and scalability but can be complex and costly.

d) Hybrid Topology: As the name suggests, a hybrid topology combines elements of different topologies to meet specific requirements. It offers flexibility and allows organizations to optimize their infrastructure based on their unique needs.

**Considerations in Data Center Topology Design**

a) Redundancy: Redundancy is essential to ensure continuous operation even during component failures. By implementing redundant paths, power sources, and network links, data centers can minimize the risk of downtime and data loss.

b) Scalability: As the data center’s requirements grow, the topology should be able to accommodate additional devices and increased data traffic. Scalability can be achieved through modular designs, virtualization, and flexible network architectures.

c) Performance and Latency: The distance between devices, the quality of network connections, and the efficiency of routing protocols significantly impact data center performance and latency. Optimal topology design considers these factors to minimize delays and ensure smooth data transmission.

Google Cloud NCC

### What is Google Network Connectivity Center?

Google NCC is a centralized platform that provides a holistic view of your network infrastructure. It integrates with Google Cloud, enabling businesses to manage their global networks with ease. The platform is built to support hybrid and multi-cloud environments, ensuring that your data center operations are streamlined and efficient.

### Key Features and Benefits

#### Unified Network Management

One of the standout features of Google NCC is its ability to consolidate various network management tasks into a single interface. This means less time spent juggling multiple tools and more time focusing on core business activities.

#### Enhanced Security

Security is a critical concern for any organization. Google NCC incorporates robust security measures, including end-to-end encryption and advanced threat detection, to safeguard your network against potential risks.

#### Scalability and Flexibility

As your business grows, so does your need for a scalable network solution. Google NCC offers unparalleled scalability, allowing you to expand your network infrastructure effortlessly. Its flexibility ensures that it can adapt to the ever-changing demands of your business.

### Integrating with Data Centers

Google NCC is designed to seamlessly integrate with your existing data centers. This integration ensures that you can manage your on-premises and cloud-based resources from a single platform. The result is a more cohesive and efficient network management experience.

### Real-World Applications

#### Enterprise Connectivity

For large enterprises, managing a sprawling network can be a daunting task. Google NCC simplifies this by providing a unified platform that can handle complex network topologies. This makes it easier to connect multiple branch offices, remote workers, and cloud services.

#### Optimized Performance

Google NCC leverages advanced algorithms to optimize network performance. This ensures that your applications run smoothly and that data is transmitted efficiently. Whether you’re running a global e-commerce site or a high-demand application, NCC has you covered.

Impact of Data Center Topology:

Efficient data center topology directly influences the entire infrastructure’s reliability, availability, and performance. A well-designed topology reduces single points of failure, enables load balancing, enhances fault tolerance, and optimizes data flow. It directly impacts the user experience, especially for cloud-based services, where data centers simultaneously cater to many users.

Knowledge Check: Cisco ACI Building Blocks

Before Cisco ACI 4.1, the Cisco ACI fabric supported only a two-tier (leaf-and-spine switch) topology in which leaf switches are connected to spine switches without interconnecting them. Starting with Cisco ACI 4.1, the Cisco ACI fabric allows multitier (three-tier) fabrics and two tiers of leaf switches, allowing vertical expansion. As a result, a traditional three-tier aggregation access architecture can be migrated, which is still required for many enterprise networks.

In some situations, building a full-mesh two-tier fabric is not ideal due to the high cost of fiber cables and the limitations of cable distances. A spine-leaf topology is more efficient in these cases, and Cisco ACI continues to automate and improve visibility.

ACI fabric Details
Diagram: Cisco ACI fabric Details

Choosing a topology

Data centers are the backbone of many businesses, providing the necessary infrastructure to store and manage data and access applications and services. As such, it is essential to understand the different types of available data center topologies.

When choosing a topology for a data center, the organization’s specific needs and requirements must be considered. Each topology offers its advantages and disadvantages, so it is crucial to understand the pros and cons of each before making a decision.

A data center topology refers to the physical layout and interconnection of network devices within a data center. It determines how servers, switches, routers, and other networking equipment are connected, ensuring efficient and reliable data transmission. Topologies are based on scalability, fault tolerance, performance, and cost.

Scalability of the topology

Additionally, it is essential to consider the topology’s scalability, as a data center may need to accommodate future growth. By understanding the different topologies and their respective strengths and weaknesses, organizations can make the best decision for their data centers. For example, in a spine-and-leaf architecture, traffic traveling from one server to another always crosses the same number of devices (unless both servers are located on the same leaf). Payloads need only hop to a spine switch and another leaf switch to reach their destination, thus reducing latency.

what is spine and leaf architecture

Data Center Topology Types

Centralized Model

Smaller data centers (less than 5,000 square feet) may benefit from the centralized model. It is shown that there are separate local area networks (LANs) and storage area networks (SANs), with home-run cables going to each server cabinet and zone. Each server is effectively connected back to the core switches in the main distribution area.

As a result, port switches can be utilized more efficiently, and components can be managed and added more quickly. The centralized topology works well for smaller data centers but does not scale up well, making expansion difficult. Many cable runs in larger data centers cause cable pathways and cabinets congestion and increase costs.

Zoned or top-of-rack topologies may be used in large data centers for LAN traffic, but centralized architectures may be used for SAN traffic. In particular, port utilization is essential when SAN switch ports are expensive.

Zoned Topology

Distributed switching resources make up a zoned topology. Typically, chassis-based switches support multiple server cabinets and can be distributed among end-of-row (EoR) and middle-of-row (MoR) locations. It is highly scalable, repeatable, and predictable and is recommended by the ANS/TIA-942 Data Center Standards.

A zoned architecture provides the highest switch and port utilization level while minimizing cabling costs. Switching at the end of a row can be advantageous in certain situations. Two servers’ local area network (LAN) ports can be connected to the same end-of-row switch for low-latency port-to-port switching.

Having to run cable back to the end-of-row switch is a potential disadvantage of end-of-row switching. It is possible for this cabling to exceed that required for a top-of-rack system if every server is connected to redundant switches.

Top-of-rack (ToR)

Switches are typically placed at the top of a server rack to provide top-of-rack (ToR) switching, as shown below. Using this topology is a good option for dense one-rack-unit (1RU) server environments. For redundancy, both switches are connected to all servers in the rack. There are uplinks to the next layer of switching from the top-of-rack switches.

It simplifies cable management and minimizes cable containment requirements when cables are managed at the top of the rack. Using this approach, servers within the rack can quickly switch from port to port, and the uplink oversubscription is predictable.

In top-of-rack designs, cabling is more efficiently utilized. In exchange, there is usually an increase in the cost of switches and a high cost for under-utilization of ports. There is also the possibility of overheating local area network (LAN) switch gear in server racks when top-of-rack switching is required.

Data Center Architecture Types

Mesh architecture

Mesh networks, known as “network fabrics” or leaf-spine, consist of meshed connections between leaf-and-spine switches.  They are well suited for supporting universal “cloud services” because the mesh of network links enables any-to-any connectivity with predictable capacity and lower latency. The mesh network has multiple switching resources scattered throughout the data center, making it inherently redundant. Compared to huge, centralized switching platforms, these distributed network designs can be more cost-effective to deploy and scale.

Multi-Tier

Multi-tier architectures are commonly used in enterprise data centers. In this design, mainframes, blade servers, 1RU servers, and mainframes run the web, application, and database server tiers.

Mesh point of delivery

Mesh point of delivery (PoD) architectures have leaf switches interconnected within PoDs, and spine switches aggregated in a central main distribution area (MDA). This architecture also enables multiple PoDs to connect efficiently to a super-spine tier. Three-tier topologies that support east-west data flows will be able to support new cloud applications with low latency. Mesh PoD networks can provide a pool of low-latency computing and storage for these applications that can be added without disrupting the existing environment.

Super Spine architectecutre

Hyperscale organizations that deploy large-scale data center infrastructures or campus-style data centers often deploy super spine architecture. This type of architecture handles data passing east to west across data halls.

Cloud Data Centers

Understanding Network Tiers

Network tiers refer to the different levels of service quality and performance that a network can offer. They allow businesses to prioritize and allocate resources based on their specific needs. In the case of Google Cloud, there are two primary network tiers: Premium Tier and Standard Tier.

The Premium Tier in Google Cloud offers businesses a top-of-the-line network experience. It leverages Google’s private global network, which is interconnected with major internet service providers (ISPs) worldwide.

This interconnectivity ensures low latency, high bandwidth, and enhanced reliability for mission-critical workloads. By utilizing the Premium Tier, businesses can deliver an exceptional user experience, reduce downtime, and ensure optimal performance for latency-sensitive applications.

While the Premium Tier provides unparalleled performance, the Standard Tier offers a more cost-effective alternative for businesses with less latency-sensitive workloads. The Standard Tier leverages public internet transit, providing reliable and secure connectivity at a lower price point.

This tier is ideal for applications that can tolerate slightly higher latency, such as batch processing, non-real-time analytics, or backup and recovery tasks. By utilizing the Standard Tier, businesses can achieve significant cost savings without sacrificing overall network reliability.

Understanding VPC Networking

VPC networking forms the foundation of your cloud infrastructure, allowing you to create and manage virtual networks with ease. In Google Cloud, VPC networks provide isolation and connectivity for your resources, ensuring secure communication and data transfer.

Google Cloud’s VPC networking offers a plethora of powerful features. These include custom IP ranges, subnets, firewall rules, routes, and VPN connectivity. Custom IP ranges enable you to define IP addresses for your virtual network, while subnets allow you to divide your network into smaller segments for better organization and control.

Understanding VPC Peering

VPC Peering is a networking arrangement that enables communication between two virtual private clouds (VPCs) in the same or different projects within Google Cloud. It establishes a direct, private connection between VPC networks, allowing them to communicate as if they were part of the same network.

VPC Peering offers numerous benefits to organizations leveraging Google Cloud. First, it enables seamless and secure communication between VPC networks, eliminating the need for complex VPN setups or publicly exposing resources. Second, it allows for low-latency data transfer, ensuring optimal performance for applications and services. Third, it simplifies network management, enabling centralized administration of connected VPCs.

Related: For pre-information, you may find the following post helpful

  1. ACI Cisco
  2. Virtual Switch
  3. Ansible Architecture
  4. Overlay Virtual Networks

Data Center Topology

The Role of Networks

A network lives to serve the connectivity requirements of applications and applications. We build networks by designing and implementing data centers. A common trend is that the data center topology is much bigger than a decade ago, with application requirements considerably different from the traditional client-server applications and with deployment speeds in seconds instead of days. This changes how networks and your chosen data center topology are designed and deployed.

The traditional network design was scaled to support more devices by deploying larger switches (and routers). This is the scale-in model of scaling. However, these large switches are expensive and primarily designed to support only a two-way redundancy.

Today, data center topologies are built to scale out. They must satisfy the three main characteristics of increasing server-to-server traffic, scale ( scale on-demand ), and resilience. The following diagram shows a ToR design we discussed at the start of the blog.

Top of Rack (ToR)
Diagram: Data center network topology. Top of Rack (ToR).

The Role of The ToR

Top of rack (ToR) is a term used to describe the architecture of a data center. It is a server architecture in which servers, switches, and other equipment are mounted on the same rack. This allows for the most efficient use of space since the equipment is all within arm’s reach.

ToR is also the most efficient way to manage power and cooling since the equipment is all in the same area. Since all the equipment is close together, ToR also allows faster access times. This architecture can also be utilized in other areas, such as telecommunications, security, and surveillance.

ToR is a great way to maximize efficiency in any data center and is becoming increasingly popular. In contrast to the ToR data center design, the following diagram shows an EoR switch design.

End of Row (EoR)
Diagram: Data center network topology. End of Row (EoR).

The Role of The EoR

The term end-of-row (EoR) design is derived from a dedicated networking rack or cabinet placed at either end of a row of servers to provide network connectivity to the servers within that row. In EoR network design, each server in the rack has a direct connection with the end-of-row aggregation switch, eliminating the need to connect servers directly with the in-rack switch.

Racks are usually arranged to form a row; a cabinet or rack is positioned at the end of this row. This rack has a row aggregation switch, which provides network connectivity to servers mounted in individual racks. This switch, a modular chassis-based platform, sometimes supports hundreds of server connections. However, a large amount of cabling is required to support this architecture.

Data center topology types
Diagram: ToR and EoR. Source. FS Community.

A ToR configuration requires one switch per rack, resulting in higher power consumption and operational costs. Moreover, unused ports are often more significant in this scenario than with an EoR arrangement.

On the other hand, ToR’s cabling requirements are much lower than those of EoR, and faults are primarily isolated to a particular rack, thus improving the data center’s fault tolerance.

If fault tolerance is the ultimate goal, ToR is the better choice, but EoR configuration is better if an organization wants to save on operational costs. The following table lists the differences between a ToR and an EoR data center design.

data center network topology
Diagram: Data center network topology. The differences. Source FS Community

Data Center Topology Types:

Fabric extenders – FEX

Cisco has introduced the concept of Fabric Extenders, which are not Ethernet switches but remote line cards of a virtualized modular chassis ( parent switch ). This allows scalable topologies previously impossible with traditional Ethernet switches in the access layer.

You should relate an FEX device like a remote line card attached to a parent switch. All the configuration is done on the parent switch, yet physically, the fabric extender could be in a different location. The mapping between the parent switch and the FEX ( fabric extender ) is done via a special VN-Link.

The following diagram shows an example of a FEX in a standard data center network topology. More specifically, we are looking at the Nexus 2000 FEX Series. Cisco Nexus 2000 Series Fabric Extenders (FEX) are based on the standard IEEE 802.1BR. They deliver fabric extensibility with a single point of management.

Cisco FEX
Diagram: Cisco FEX design. Source Cisco.

Different types of Fex solution

FEXs come with various connectivity solutions, including 100 Megabit Ethernet, 1 Gigabit Ethernet, 10 Gigabit Ethernet ( copper and fiber ), and 40 Gigabit Ethernet. They can be synchronized with the following parent switch models: Nexus 5000, Nexus 6000, Nexus 7000, Nexus 9000, and Cisco UCS Fabric Interconnect.

In addition, because of FEX’s simplicity, they have very low latency ( as low as 500 nanoseconds ) compared to traditional Ethernet switches.

Data Center design
Diagram: Data center fabric extenders.

Some network switches can be connected to others and operate as a single unit. These configurations are called “stacks” and are helpful for quickly increasing the capacity of a network. A stack is a network solution composed of two or more stackable switches. Switches that are part of a stack behave as one single device.

Traditional switches like the 3750s still stand in the data center network topology access layer and can be used with stacking technology, combining two physical switches into one logical switch.

This stacking technology allows you to build a highly resilient switching system, one switch at a time. If you are looking at a standard access layer switch like the 3750s, consider the next-generation Catalyst 3850 series.

The 3850 supports BYOD/mobility and offers various performance and security enhancements compared to previous models. However, stacking has a drawback: You can only stack several switches. So, if you want more throughout, you should aim for a different design type.

Data Center Design: Layer 2 and Layer 3 Solutions

Traditional views of data center design

Depending on the data center network topology deployed, packet forwarding at the access layer can be either Layer 2 or Layer 3. A Layer 3 approach would involve additional management and configuring IP addresses on hosts in a hierarchical fashion that matches the switch’s assigned IP address.

An alternative approach is to use Layer 2, which has less overhead as Layer 2 MAC addresses do not need specific configuration. However, it has drawbacks with scalability and poor performance.

Generally, access switches focus on communicating servers in the same IP subnet, allowing any type of traffic – unicast, multicast, or broadcast. You can, however, have filtering devices such as a Virtual Security Gateway ( VSG ) to permit traffic between servers, but that is generally reserved for inter-POD ( Platform Optimized Design ) traffic.

Leaf and Spine With Layer 3

We use a leaf and spine data center design with Layer 3 everywhere and overlay networking. This modern, robust architecture provides a high-performance, highly available network. With this architecture, data center networks are composed of leaf switches that connect to one or more spine switches.

The leaf switches are connected to end devices such as servers, storage devices, and other networking equipment. The spine switches, meanwhile, act as the network’s backbone, connecting the multiple leaf switches.

The leaf and spine architecture provides several advantages over traditional data center networks. It allows for greater scalability, as additional leaf switches can be easily added to the network. It also offers better fault tolerance, as the network can operate even if one of the spine switches fails.

Furthermore, it enables faster traffic flows, as the spine switches to route traffic between the leaf switches faster than a traditional flat network.

Data Center Traffic Flow

Datacenter topologies can have North-South or East-to-West traffic. North-south ( up / down ) corresponds to traffic between the servers and the external world ( outside the data center ). East-to-west corresponds to internal server communication, i.e., traffic does not leave the data center.

Therefore, determining the type of traffic upfront is essential as it influences the type of topology used in the data center.

data center traffic flow
Diagram: Data center traffic flow.

For example, you may have a pair of ISCSI switches, and all traffic is internal between the servers. In this case, you would need high-bandwidth inter-switch links. Usually, an ether channel supports all the cross-server talk; the only north-to-south traffic would be management traffic.

In another part of the data center, you may have data server farm switches with only HSRP heartbeat traffic across the inter-switch links and large bundled uplinks for a high volume of north-to-south traffic. Depending on the type of application, which can be either outward-facing or internal, computation will influence the type of traffic that will be dominant. 

Virtual Machine and Containers.

This drive was from virtualization, virtual machines, and container technologies regarding east-west traffic. Many are moving to a leaf and spine data center design if they have a lot of east-to-west traffic and want better performance.

Network Virtualization and VXLAN

Network virtualization and the ability of a physical server to host many VMs and move those VMs are also used extensively in data centers, either for workload distribution or business continuity. This will also affect the design you have at the access layer.

For example, in a Layer 3 fabric, migrating a VM across that boundary changes its IP address, resulting in a reset of the TCP sessions because, unlike SCTP, TCP does not support dynamic address configuration. In a Layer 2 fabric, migrating a VM incurs ARP overhead and requires forwarding on millions of flat MAC addresses, which leads to MAC scalability and poor performance problems.

VXLAN: stability over Layer 3 core

Network virtualization plays a vital role in the data center. Technologies like VXLAN attempt to move the control plane from the core to the edge and stabilize the core so that it only has a handful of addresses for each ToR switch. The following diagram shows the ACI networks with VXLAN as the overlay that operates over a spine leaf architecture.

Layer 2 and 3 traffic is mapped to VXLAN VNIs that run over a Layer 3 core. The Bridge Domain is for layer 2, and the VRF is for layer 3 traffic. Now, we have the separation of layer 2 and 3 traffic based on the VNI in the VXLAN header.  

One of the first notable differences between VXLAN and VLAN was scale. VLAN has a 12-bit identifier called VID, while VXLAN has a 24-bit identifier called a VID network identifier. This means that with VLAN, you can create only 4094 networks over ethernet, while with VXLAN, you can create up to 16 million.

Whether you can build layer 2 or layer 3 in the access and use VXLAN or some other overlay to stabilize the core, it would help if you modularized the data center. The first step is to build each POD or rack as a complete unit. Each POD will be able to perform all its functions within that POD.

  • A key point: A POD data center design

POD is a design methodology that aims to simplify, speed deployment, optimize resource utilization, and drive the interoperability of three or more data center components: server, storage, and networks.

  • A POD example: Data center modularity

For example, one POD might be a specific human resources system. The second is modularity based on the type of resources offered. For example, a storage pod or bare metal compute may be housed in separate pods.

These two modularization types allow designers to easily control inter-POD traffic with predefined policies. Operators can also upgrade PODs and a specific type of service at once without affecting other PODs.

However, this type of segmentation does not address the data center’s scale requirements. Even when we have adequately modularized the data center into specific portions, the MAC table sizes on each switch still increase exponentially as the data center grows.

Current and Future Design Factors

New technologies with scalable control planes must be introduced for a cloud-enabled data center, and these new control planes should offer the following:

Option

Data Center Feature

Data center feature 1

The ability to scale MAC addresses

Data center feature 2

First-Hop Redundancy Protocol ( FHRP ) multipathing and Anycast HSRP

Data center feature 3

Equal-Cost multipathing

Data center feature 4

MAC learning optimizations

Several design factors need to be considered when designing a data center. First, what is the growth rate for servers, switch ports, and data center customers? This prevents part of the network topology from becoming a bottleneck or linking congested.

**Application bandwidth demand?**

This demand is usually translated into oversubscription. In data center networking, oversubscription refers to how much bandwidth switches are offered to downstream devices at each layer.

Oversubscription is expected in a data center design. Limiting oversubscription to the ToR and edge of the network offers a single place to start when performance problems occur.

A data center with no oversubscription ratio will be costly, especially with a low latency network design. So, it’s best to determine what oversubscription ratio your applications support and work best. Optimizing your switch buffers to improve performance is recommended before you decide on a 1:1 oversubscription rate.

**Ethernet 6-byte MAC addressing is flat**

Ethernet forms the basis of data center networking in tandem with IP. Since its inception 40 years ago, Ethernet frames have been transmitted over various physical media, even barbed wire. Ethernet 6-byte MAC addressing is flat; the manufacturer typically assigns the address without considering its location.

Ethernet-switched networks do not have explicit routing protocols to ensure readability about the flat addresses of the server’s NICs. Instead, flooding and address learning are used to create forwarding table entries.

**IP addressing is a hierarchy**

On the other hand, IP addressing is a hierarchy, meaning that its address is assigned by the network operator based on its location in the network. A hierarchy address space advantage is that forwarding tables can be aggregated. If summarization or other routing techniques are employed, changes in one side of the network will not necessarily affect other areas.

This makes IP-routed networks more scalable than Ethernet-switched networks. IP-routed networks also offer ECMP techniques that enable networks to use parallel links between nodes without spanning tree disabling one of those links. The ECMP method hashes packet headers before selecting a bundled link to avoid out-of-sequence packets within individual flows. 

Equal Cost Load Balancing

Equal-cost load balancing is a method for distributing network traffic among multiple paths of equal cost. It provides redundancy and increases throughput. Sending traffic over numerous paths avoids congestion on any single link. In addition, the load is equally distributed across the paths, meaning that each path carries roughly the same total traffic.

ecmp
Diagam: ECMP 5 Tuple hash. Source: Keysight

This allows for using multiple paths at a lower cost, providing an efficient way to increase throughput.

The idea behind equal-cost load balancing is to use multiple paths of equal cost to balance the load on each path. The algorithm considers the number of paths, each path’s weight, and each path’s capacity. It also considers the number of packets that must be sent and the delay allowed for each packet.

Considering these factors, it can calculate the best way to distribute the load among the paths.

Equal-cost load balancing can be implemented using various methods. One method is to use a Link Aggregation Protocol (LACP), which allows the network to use multiple links and distribute the traffic among the links in a balanced way.

ecmp
Diagam: ECMP 5 Tuple hash. Source: Keysight
  • A keynote: Data center topologies. The move to VXLAN.

Given the above considerations, a solution encompassing the benefits of L2’s plug-and-play flat addressing and IP scalability is needed. Location-Identifier Split Protocol ( LISP ) has a set of solutions that use hierarchical addresses as locators in the core and flat addresses as identifiers in the edges. However, not much is seen in its deployment these days.

Equivalent approaches such as THRILL and Cisco FabricPath create massive scalable L2 multipath networks with equidistant endpoints. Tunneling is also being used to extend down to the server and access layer to overcome the 4K limitation with traditional VLANs. What is VXLAN? Tunneling with VXLAN is now the standard design in most data center topologies with leaf-spine designs. The following video provides VXLAN guidance.

Data Center Network Topology

Leaf and spine data center topology types

This is commonly seen in a leaf and spine design. For example, in a leaf-spine fabric, We have a Layer 3 IP fabric that supports equal-cost multi-path (ECMP) routing between any two endpoints in the network. Then, on top of the Layer 3 fabric is an overlay protocol, commonly VXLAN.

A spine-leaf architecture consists of a data center network topology with two switching layers: a spine and a leaf. The leaf layer comprises access switches that aggregate traffic from endpoints such as servers and connect directly to the spine or network core.

Spine switches interconnect all leaf switches in a full-mesh topology. The leaf switches do not directly connect. The Cisco ACI is a data center topology that utilizes the leaf and spine.

The ACI network’s physical topology is a leaf and spine, while the logical topology is formed with VXLAN. From a protocol side point, VXLAN is the overlay network, and the BGP and IS-IS provide the Layer 3 routing, the underlay network that allows the overlay network to function.

As a result, the nonblocking architecture performs much better than the traditional data center design based on access, distribution, and core designs.

**Closing Points: Data Center Topologies**

A data center topology refers to the physical layout and interconnection of network devices within a data center. It determines how servers, switches, routers, and other networking equipment are connected, ensuring efficient and reliable data transmission. Topologies are based on scalability, fault tolerance, performance, and cost.

  • Hierarchical Data Center Topology:

The hierarchical or tree topology is one of the most commonly used data center topologies. This design consists of multiple core, distribution, and access layers. The core layer connects all the distribution layers, while the distribution layer connects to the access layer. This structure enables better management, scalability, and fault tolerance by segregating traffic and minimizing network congestion.

  • Mesh Data Center Topology:

Every network device is interlinked in a mesh topology, forming a fully connected network with multiple paths for data transmission. This redundancy ensures high availability and fault tolerance. However, this topology can be cost-prohibitive and complex, especially in large-scale data centers.

  • Leaf-Spine Data Center Topology:

The leaf-spine topology is gaining popularity due to its scalability and simplicity. It consists of interconnected leaf switches at the access layer and spine switches at the core layer. This design allows for non-blocking, low-latency communication between any leaf switch and spine switch, making it suitable for modern data center requirements.

  • Full-Mesh Data Center Topology:

As the name suggests, the full-mesh topology connects every network device to every other device, creating an extensive web of connections. This topology offers maximum redundancy and fault tolerance. However, it can be expensive to implement and maintain, making it more suitable for critical applications with stringent uptime requirements.

Summary: Data Center Topology

Data centers are vital in supporting and enabling our digital infrastructure in today’s interconnected world. Behind the scenes, intricate network topologies ensure seamless data flow, allowing us to access information and services easily. In this blog post, we dived into the world of data center topologies, unraveling their complexities and understanding their significance.

Understanding Data Center Topologies

Datacenter topologies refer to a data center’s physical and logical layout of networking components. These topologies determine how data flows between servers, switches, routers, and other network devices. By carefully designing the topology, data center operators can optimize performance, scalability, redundancy, and fault tolerance.

Common Data Center Topologies

There are several widely adopted data center topologies, each with its strengths and use cases. Let’s explore some of the most common ones:

Tree Topology:

Tree topology, or hierarchical topology, is widely used in data centers. It features a hierarchical structure with multiple layers of switches, forming a tree-like network. This topology offers scalability and ease of management, making it suitable for large-scale deployments.

Mesh Topology:

The mesh topology provides a high level of redundancy and fault tolerance. In this topology, every device is connected to every other device, forming a fully interconnected network. While it offers robustness, it can be complex and costly to implement.

Spine-Leaf Topology:

The spine-leaf topology, known as a Clos network, has recently gained popularity. It consists of leaf switches connecting to multiple spine switches, forming a non-blocking fabric. This design allows for efficient east-west traffic flow and simplified scalability.

Factors Influencing Topology Selection

Choosing the right data center topology depends on various factors, including:

Scalability:

A topology must accommodate a data center’s growth. Scalable topologies ensure that additional devices can be seamlessly added without causing bottlenecks or performance degradation.

Redundancy and Fault Tolerance:

Data centers require high availability to minimize downtime. Topologies that offer redundancy and fault tolerance mechanisms, such as link and device redundancy, are crucial in ensuring uninterrupted operations.

Traffic Patterns:

Understanding the traffic patterns within a data center is essential for selecting an appropriate topology. Some topologies excel in handling east-west traffic, while others are better suited for north-south traffic flow.

Conclusion

Datacenter topologies form the backbone of our digital infrastructure, providing the connectivity and reliability needed for our ever-expanding digital needs. By understanding the intricacies of these topologies, we can better appreciate the complexity involved in keeping our data flowing seamlessly. Whether it’s the hierarchical tree, the fully interconnected mesh, or the efficient spine-leaf, each topology has its place in the world of data centers.

Green data center with eco friendly electricity usage tiny person concept. Database server technology for file storage hosting with ecological and carbon neutral power source vector illustration.

Data Center Design with Active Active design

Active Active Data Center Design

In today's digital age, where businesses heavily rely on uninterrupted access to their applications and services, data center design plays a pivotal role in ensuring high availability. One such design approach is the active-active design, which offers redundancy and fault tolerance to mitigate the risk of downtime. This blog post will explore the active-active data center design concept and its benefits.

Active-active data center design refers to a configuration where two or more data centers operate simultaneously, sharing the load and providing redundancy for critical systems and applications. Unlike traditional active-passive setups, where one data center operates in standby mode, the active-active design ensures that both are fully active and capable of handling the entire workload.

Enhanced Reliability: Redundant data centers offer unparalleled reliability by minimizing the impact of hardware failures, power outages, or network disruptions. When a component or system fails, the redundant system takes over seamlessly, ensuring uninterrupted connectivity and preventing costly downtime.

Scalability and Flexibility: With redundant data centers, businesses have the flexibility to scale their operations effortlessly. Companies can expand their infrastructure without disrupting ongoing operations, as redundant systems allow for seamless integration and expansion.

Disaster Recovery: Redundant data centers play a crucial role in disaster recovery strategies. By having duplicate systems in geographically diverse locations, businesses can recover quickly in the event of natural disasters, power grid failures, or other unforeseen events. Redundancy ensures that critical data and services remain accessible, even during challenging circumstances.

Dual Power Sources: Redundant data centers rely on multiple power sources, such as grid power and backup generators. This ensures that even if one power source fails, the infrastructure continues to operate without disruption.

Network Redundancy: Network redundancy is achieved by setting up multiple network paths, routers, and switches. In case of a network failure, traffic is automatically redirected to alternative paths, maintaining seamless connectivity.

Data Replication: Redundant data centers employ data replication techniques to ensure that data is duplicated and synchronized across multiple systems. This safeguards against data loss and allows for quick recovery in case of a system failure.

Highlights: Active Active Data Center Design

The Role of Data Centers

An enterprise’s data center houses the computational power, storage, and applications needed to run its operations. All content is sourced or passed through the data center infrastructure in the IT architecture. Performance, resiliency, and scalability must be considered when designing the data center infrastructure.

Furthermore, the data center design should be flexible so that new services can be deployed and supported quickly. The many considerations required for such a design are port density, access layer uplink bandwidth, actual server capacity, and oversubscription.

A few short years ago, data centers were very different from what they are today. In a multi-cloud environment, virtual networks have replaced physical servers that support applications and workloads across pools of physical infrastructure. Nowadays, data exists across multiple data centers, the edge, and public and private clouds.

Communication between these locations must be possible in the on-premises and cloud data centers. Public clouds are also collections of data centers. In the cloud, applications use the cloud provider’s data center resources.

Redundant data centers

Redundant data centers are essentially two or more in different physical locations. This enables organizations to move their applications and data to another data center if they experience an outage. This also allows for load balancing and scalability, ensuring the organization’s services remain available.

Redundant data centers are generally located in geographically dispersed locations. This ensures that if one of the data centers experiences an issue, the other can take over, thus minimizing downtime. These data centers should also be connected via a high-speed network connection, such as a dedicated line or virtual private network, to allow seamless data transfers between the locations.

Implementing redundant data center BGP involves several crucial steps.

– Firstly, establishing a robust network architecture with multiple data centers interconnected via high-speed links is essential.

– Secondly, configuring BGP routers in each data center to exchange routing information and maintain consistent network topologies is crucial. Additionally, techniques such as Anycast IP addressing and route reflectors further enhance redundancy and fault tolerance.

**Benefits of Active-Active Data Center Design**

1. Enhanced Redundancy: With active-active design, organizations can achieve higher levels of redundancy by distributing the workload across multiple data centers. This redundancy ensures that even if one data center experiences a failure or maintenance downtime, the other data center seamlessly takes over, minimizing the impact on business operations.

2. Improved Performance and Scalability: Active-active design enables organizations to scale their infrastructure horizontally by distributing the load across multiple data centers. This approach ensures that the workload is evenly distributed, preventing any single data center from becoming a performance bottleneck. It also allows businesses to accommodate increasing demands without compromising performance or user experience.

3. Reduced Downtime: The active-active design significantly reduces the risk of downtime compared to traditional architectures. In the event of a failure, the workload can be immediately shifted to the remaining active data center, ensuring continuous availability of critical services. This approach minimizes the impact on end-users and helps organizations maintain their reputation for reliability.

4. Disaster Recovery Capabilities: Active-active data center design provides a robust disaster recovery solution. Organizations can ensure that their critical systems and applications remain operational despite a catastrophic failure at one location by having multiple geographically distributed data centers. This design approach minimizes the risk of data loss and provides a seamless failover mechanism.

**Implementation Considerations:**

Implementing an active-active data center design requires careful planning and consideration of various factors. Here are some key considerations:

1. Network Design: A robust and resilient network infrastructure is crucial for active-active data center design. Implementing load balancers, redundant network links, and dynamic routing protocols can help ensure seamless failover and optimal traffic distribution.

2. Data Synchronization: Organizations need to implement effective data synchronization mechanisms to maintain data consistency across multiple data centers. This may involve deploying real-time replication, distributed databases, or file synchronization protocols.

3. Application Design: Applications must be designed to be aware of the active-active architecture. They should be able to distribute the workload across multiple data centers and seamlessly switch between them in case of failure. Application-level load balancing and session management become critical in this context.

Active-active data center design offers organizations a robust solution for high availability and fault tolerance. Businesses can ensure uninterrupted access to critical systems and applications by distributing the workload across multiple data centers. The enhanced redundancy, improved performance, reduced downtime, and disaster recovery capabilities make active-active design an ideal choice for organizations striving to provide seamless and reliable services in today’s digital landscape.

Network Connectivity Center

### What is Google’s Network Connectivity Center?

Google Network Connectivity Center (NCC) is a centralized platform that enables enterprises to manage their global network connectivity. It integrates with Google Cloud’s global infrastructure, offering a unified interface to monitor, configure, and optimize network connections. Whether you are dealing with on-premises data centers, remote offices, or multi-cloud environments, NCC provides a streamlined approach to network management.

### Key Features of NCC

Google’s NCC is packed with features that make it an indispensable tool for network administrators. Here are some key highlights:

– **Centralized Management**: NCC offers a single pane of glass for monitoring and managing all network connections, reducing complexity and improving efficiency.

– **Scalability**: Built on Google Cloud’s robust infrastructure, NCC can scale effortlessly to accommodate growing network demands.

– **Automation and Intelligence**: With built-in automation and intelligent insights, NCC helps in proactive network management, minimizing downtime and optimizing performance.

– **Integration**: Seamlessly integrates with other Google Cloud services and third-party tools, providing a cohesive ecosystem for network operations.

Understanding Network Tiers

Network tiers refer to the different levels of performance and cost offered by cloud service providers. They allow businesses to choose the most suitable network option based on their specific needs. Google Cloud offers two network tiers: Standard and Premium.

The Standard Tier provides businesses with a cost-effective network solution that meets their basic requirements. It offers reliable performance and ensures connectivity within Google Cloud services. With its lower costs, the Standard Tier is an excellent choice for businesses with moderate network demands.

For businesses that demand higher levels of performance and reliability, the Premium Tier is the way to go. This tier offers optimized routes, reduced latency, and enhanced global connectivity. With its advanced features, the Premium Tier ensures optimal network performance for mission-critical applications and services.

Understanding VPC Networking

VPC networking is the backbone of a cloud infrastructure, providing a private and secure environment for your resources. In Google Cloud, a VPC network can be thought of as your own virtual data center in the cloud. It allows you to define IP ranges, subnets, and firewall rules, empowering you with complete control over your network architecture.

Google Cloud’s VPC networking offers a plethora of features that enhance network management and security. From custom IP address ranges to subnet creation and route configuration, you have the flexibility to design your network infrastructure according to your specific needs. Additionally, VPC peering and VPN connectivity options enable seamless communication with other networks, both within and outside of Google Cloud.

Understanding VPC Peering

VPC Peering enables you to connect VPC networks across projects or organizations. It allows for secure communication and seamless access to resources between peered networks. By leveraging VPC Peering, you can create a virtual network fabric across various environments.

VPC Peering offers several advantages. First, it simplifies network architecture by eliminating the need for complex VPN setups or public IP addresses. Second, it provides low-latency and high-bandwidth connections between VPC networks, ensuring fast and reliable data transfer. Third, it lets you share resources across peering networks, such as databases or storage, promoting collaboration and resource optimization.

Understanding HA VPN

HA VPN, short for High Availability Virtual Private Network, is a feature provided by Google Cloud that ensures continuous and reliable connectivity between your on-premises network and your Google Cloud Virtual Private Cloud (VPC) network. It is designed to minimize downtime and provide fault tolerance by establishing redundant VPN tunnels.

To set up HA VPN, follow a few simple steps. First, ensure that you have a supported on-premises VPN gateway. Then, configure the necessary settings to create a VPN gateway in your VPC network. Next, configure the on-premises VPN gateway to establish a connection with the HA VPN gateway. Finally, validate the connectivity and ensure all traffic is routed securely through the VPN tunnels.

Implementing HA VPN offers several benefits for your network infrastructure. First, it enhances reliability by providing automatic failover in case of VPN tunnel or gateway failures, ensuring uninterrupted connectivity for your critical workloads. Second, HA VPN reduces the risk of downtime by offering a highly available and redundant connection. Third, it simplifies network management by centralizing the configuration and monitoring of VPN connections.

On-premises Data Centers

Understanding Nexus 9000 Series VRRP

Nexus 9000 Series VRRP is a protocol that allows multiple routers to work together as a virtual router, providing redundancy and seamless failover in the event of a failure. These routers ensure continuous network connectivity by sharing a virtual IP address, improving network reliability.

With Nexus 9000 Series VRRP, organizations can achieve enhanced network availability and minimize downtime. Utilizing multiple routers can eliminate single points of failure and maintain uninterrupted connectivity. This is particularly crucial in data center environments, where downtime can lead to significant financial losses and reputational damage.

Configuring Nexus 9000 Series VRRP involves several steps. First, a virtual IP address must be defined and assigned to the VRRP group. Next, routers participating in VRRP must be configured with their respective priority levels and advertisement intervals. Additionally, tracking mechanisms can monitor the availability of specific network interfaces and adjust the VRRP priority dynamically.

High Availability and BGP

High availability refers to the ability of a system or network to remain operational and accessible even during failures or disruptions. BGP is pivotal in achieving high availability by employing various mechanisms and techniques.

BGP Multipath is a feature that allows for the simultaneous use of multiple paths to reach a destination. BGP can use various paths to ensure redundancy, load balancing, and enhanced network availability.

BGP Route Reflectors are used in large-scale networks to alleviate the full-mesh requirement between BGP peers. By simplifying the BGP peering configuration, route reflectors enhance scalability and fault tolerance, contributing to high availability.

BGP Anycast is a technique that enables multiple servers or routers to share the same IP address. This method routes traffic to the nearest or least congested node, improving response times and fault tolerance.

BGP AS Prepend

Understanding BGP Route Reflection

BGP route reflection is used in large-scale networks to reduce the number of full-mesh peerings required in a BGP network. It allows a BGP speaker to reflect routes received from one set of peers to another set of peers, eliminating the need for every peer to establish a direct connection with every other peer. Using route reflection, network administrators can simplify their network topology and improve its scalability.

The network must be divided into two main components to implement BGP route reflection: route reflectors and clients. Route reflectors serve as the central point for route reflection, while clients are the BGP speakers who establish peering sessions with the route reflectors. It is essential to carefully plan the placement of route reflectors to ensure optimal routing and redundancy in the network.

Route Reflector Hierarchy and Scaling

In large-scale networks, a hierarchy of route reflectors can be implemented to enhance scalability further. This involves using multiple route reflectors, where higher-level route reflectors reflect routes received from lower-level route reflectors. This hierarchical approach distributes the route reflection load and reduces the number of peering sessions required for each BGP speaker, thus improving scalability even further.

Understanding BGP Multipath

BGP multipath enables the selection and utilization of multiple equal-cost paths for forwarding traffic. Traditionally, BGP would only utilize a single best path, resulting in suboptimal network utilization. With multipath, network administrators can maximize link utilization, reduce congestion, and achieve load balancing across multiple paths.

One of the primary advantages of BGP multipath is enhanced network resilience. By utilizing multiple paths, networks become more fault-tolerant, as traffic can be rerouted in the event of link failures or congestion. Additionally, multipath can improve overall network performance by distributing traffic evenly across available paths, preventing bottlenecks, and ensuring efficient resource utilization.

Expansion and scalability

Expanding capacity is straightforward if a link is oversubscribed (more traffic than can be aggregated on the active link simultaneously). Expanding every leaf switch’s uplinks is possible, adding interlayer bandwidth and reducing oversubscription by adding a second spine switch. New leaf switches can be added by connecting them to every spine switch and configuring them as network switches if device port capacity becomes a concern. Scaling the network is made more accessible through ease of expansion. A nonblocking architecture can be achieved without oversubscription between the lower-tier switches and their uplinks.

Defining an active-active data center strategy isn’t easy when you talk to network, server, and compute teams that don’t usually collaborate when planning their infrastructure. An active-active Data center design requires a cohesive technology stack from end to end. Establishing the idea usually requires an enterprise-level architecture drive. In addition, it enables the availability and traffic load sharing of applications across DCs with the following use cases.

  • Business continuity
  • Mobility and load sharing
  • Consistent policy and fast provisioning capability across

Understanding Spanning Tree Protocol (STP)

Spanning Tree Protocol (STP) is a fundamental mechanism to prevent loops in Ethernet networks. It ensures that only one active path exists between two network devices, preventing broadcast storms and data collisions. STP achieves this by creating a loop-free logical topology known as the spanning tree. But what about MST? Let’s find out.

As networks grow and become more complex, a single spanning tree may not be sufficient to handle the increasing traffic demands. This is where Spanning Tree MST comes into play. MST allows us to divide the network into multiple logical instances, each with its spanning tree. By doing so, we can distribute the traffic load more efficiently, achieving better performance and redundancy.

MST operates by grouping VLANs into multiple instances, known as regions. Each region has its spanning tree, allowing for independent configuration and optimization. MST relies on the concept of a Root Bridge, which acts as the central point for each instance. By assigning different VLANs to separate cases, we can control traffic flow and minimize the impact of network changes.

Example: Understanding UDLD

UDLD is a layer 2 protocol designed to detect and mitigate unidirectional links in a network. It operates by exchanging protocol packets between neighboring devices to verify the bidirectional nature of a link. UDLD prevents one-way communication and potential network disruptions by ensuring traffic flows in both directions.

UDLD helps maintain network reliability by identifying and addressing unidirectional links promptly. It allows network administrators to proactively detect and resolve potential issues before they can impact network performance. This proactive approach minimizes downtime and improves overall network availability.

Attackers can exploit unidirectional links to gain unauthorized access or launch malicious activities. UDLD acts as a security measure by ensuring bidirectional communication, making it harder for adversaries to manipulate network traffic or inject harmful packets. By safeguarding against such threats, UDLD strengthens the network’s security posture.

Understanding Port Channel

Port Channel, also known as Link Aggregation, is a mechanism that allows multiple physical links to be combined into a single logical interface. This logical interface provides higher bandwidth, improved redundancy, and load-balancing capabilities. Cisco Nexus 9000 Port Channel takes this concept to the next level, offering enhanced performance and flexibility.

a. Increased Bandwidth: By aggregating multiple physical links, the Cisco Nexus 9000 Port Channel significantly increases the available bandwidth, allowing for higher data throughput and improved network performance.

b. Redundancy and High Availability: Port Channel provides built-in redundancy, ensuring network resilience during link failures. With Cisco Nexus 9000, link-level redundancy is seamlessly achieved, minimizing downtime and maximizing network availability.

c. Load Balancing: Cisco Nexus 9000 Port Channel employs intelligent load balancing algorithms that distribute traffic across the aggregated links, optimizing network utilization and preventing bottlenecks.

d. Simplified Network Management: Cisco Nexus 9000 Port Channel simplifies network management by treating multiple links as a logical interface. This streamlines configuration, monitoring, and troubleshooting processes, leading to increased operational efficiency.

Understanding Virtual Port Channel (VPC)

VPC is a link aggregation technique that treats multiple physical links between two switches as a single logical link. This technology enables enhanced scalability, improved resiliency, and efficient utilization of network resources. By combining the bandwidth of multiple links, VPC provides higher throughput and creates a loop-free topology that eliminates the need for Spanning Tree Protocol (STP).

Implementing VPC brings several advantages to network administrators.

First, it enhances redundancy by providing seamless failover in case of link or switch failures.

Second, active-active multi-homing is achieved, ensuring traffic is evenly distributed across all available links.

Third, VPC simplifies network management by treating two switches as single entities, enabling streamlined configuration and consistent policy enforcement.

Lastly, VPC allows for the creation of large Layer 2 domains, facilitating workload mobility and flexibility.

Understanding Nexus Switch Profiles

Nexus Switch Profiles are a feature of Cisco’s Nexus switches that enable administrators to define and manage a group of switch configurations as a single entity. This simplifies the management of complex networks by reducing manual configuration tasks and ensuring consistent settings across multiple switches. By encapsulating configurations into profiles, network administrators can achieve greater efficiency and operational agility.

Implementing Nexus Switch Profiles offers a plethora of benefits for network management. Firstly, it enables rapid deployment of new switches with pre-defined configurations, reducing time and effort. Secondly, profiles ensure consistency across the network, minimizing configuration errors and improving overall reliability. Additionally, profiles facilitate streamlined updates and changes, as modifications made to a profile are automatically applied to associated switches. This results in enhanced network security, reduced downtime, and simplified troubleshooting.

A. Active-active Transport Technologies

Transport technologies interconnect data centers. As part of the transport domain, redundancies and links are provided across the site to ensure HA and resiliency. Redundancy may be provided for multiplexers, GPONs, DCI network devices, dark fibers, diversity POPs for surviving POP failure, and 1+1 protection schemes for devices, cards, and links.

In addition, the following list contains the primary considerations to consider when designing a data center interconnection solution.

  • Recovery from various types of failure scenarios: Link failures, module failures, node failures, etc.
  • Traffic round-trip requirements between DCs based on link latency and applications
  • Requirements for bandwidth and scalability

B. Active-Active Network Services

Network services connect all devices in data centers through traffic switching and routing functions. Applications should be able to forward traffic and share load without disruptions on the network. Network services also provide pervasive gateways, L2 extensions, and ingress and egress path optimization across the data centers. Most major network vendors’ SDN solutions also integrate VxLAN overlay solutions to achieve L2 extension, path optimization, and gateway mobility.

Designing active-active network services requires consideration of the following factors:

  • Recovery from various failure scenarios, such as links, modules, and network devices, is possible.
  • Availability of the gateway locally as well as across the DC infrastructure
  • Using a VLAN or VxLAN between two DCs to extend the L2 domain
  • Policies are consistent across on-premises and cloud infrastructure – including naming, segmentation rules for integrating various L4/L7 services, hypervisor integration, etc.
  • Optimizing path ingresses and regresses.
  • Centralized management includes inventory management, troubleshooting, AAA capabilities, backup and restore traffic flow analysis, and capacity dashboards.

C. Active-Active L4-L7 Services

ADC and security devices must be placed in both DCs before active-active L4-L7 services can be built. The major solutions in this space include global traffic managers, application policy controllers, load balancers, and firewalls. Furthermore, these must be deployed at different tiers for perimeter, extranet, WAN, core server farm, and UAT segments. Also, it should be noted that most of the leading L4-L7 service vendors currently offer clustering solutions for their products across the DCs. As a result of clustering, its members can share L4/L7 policies, traffic loads, and failover seamlessly in case of an issue.

Below are some significant considerations related to L4-L7 service design

  • Various failure scenarios can be recovered, including link, module, and L4-L7 device failure.
  • In addition to naming policies, L4-L7 rules for various traffic types must be consistent across the on-premises infrastructure and in the multiple clouds.
  • Network management centralized (e.g., inventory, troubleshooting, AAA capabilities, backups, traffic flow analysis, capacity dashboards, etc.)

D. Active-Active Storage Services 

Active-active data centers rely on storage and networking solutions. They refer to the storage in both DCs that serve applications. Similarly, the design should allow for uninterrupted read and write operations. Therefore, real-time data mirroring and seamless failover capabilities across DCs are also necessary. The following are some significant factors to consider when designing a storage system.

  • Recover from single-disk failures, storage array failures, and split-brain failures.
  • Asynchronous vs. synchronous replication: With synchronized replication, data is simultaneously written for primary storage and replica. It typically requires dedicated FC links, which consume more bandwidth.
  • High availability and redundancy of storage: Storage replication factors and the number of disks available for redundancy
  • Failure scenarios of storage networks: Links, modules, and network devices

E. Active-Active Server Virtualization

Over the years, server virtualization has evolved. Microservices and containers are becoming increasingly popular among organizations.  The primary consideration here is to extend hypervisor/container clusters across the DCs to achieve seamless virtual machine/ container instance movement and fail-over. VMware Docker and Microsoft are the two dominant players in this market. Other examples include KVM, Kubernetes (container management), etc.

Here are some key considerations when it comes to virtualizing servers

  • Creating a cross-DC virtual host cluster using a virtualization platform
  • HA protects the VM in normal operational conditions and creates affinity rules that prefer local hosts.
  • Deploying the same service, VMs in two DCs can take over the load in real time when the host machine is unavailable.
  • A symmetric configuration with failover resources is provided across the compute node devices and DCs.
  • Managing computing resources and hypervisors centrally

F. Active-Active Applications Deployment

The infrastructure needs to be in place for the application to function. Additionally, it is essential to ensure high application availability across DCs. Applications can also fail over and get proximity access to locations. It is necessary to have Web, App, and DB tiers available at both data centers, and if the application fails in one, it should allow fail-over and continuity.

Here are a few key points to consider

  • Deploy the Web services on virtual or physical machines (VMs) by using multiple servers to form independent clusters per DC.
  • VM or physical machine can be used to deploy App services. If the application supports distributed deployment, multiple servers within the DC can form a cluster or various servers across DCs can create a cluster (preferred IP-based access).
  • The databases should be deployed on physical machines to form a cross-DC cluster (active-standby or active-active). For example, Oracle RAC, DB2, SQL with Windows server failover cluster (WSFC)

Knowledge Check: Default Gateway Redundancy

A first-hop redundancy protocol (FHRP) always provides an active default IP gateway. To transparently failover at the first-hop IP router, FHRPs use two or more routers or Layer 3 switches.

The default gateway facilitates network communication. Source hosts send data to their default gateways. Default gateways are IP addresses on routers (or Layer 3 switches) connected to the same subnet as the source hosts. End hosts are usually configured with a single default gateway IP address when the network topology changes. The local device cannot send packets off the local network segment if the default gateway is not reached. There is no dynamic method by which end hosts can determine the address of a new default gateway, even if there is a redundant router that may serve as the default gateway for that segment.

Advanced Topics:

Understanding VXLAN Flood and Learn

The flood and learning process is an essential component of VXLAN networks. It involves flooding broadcast, unknown unicast, and multicast traffic within the VXLAN segment to ensure that all relevant endpoints receive the necessary information. By using multicast, VXLAN optimizes network traffic and reduces unnecessary overhead.

Multicast plays a crucial role in enhancing the efficiency of VXLAN flood and learn. By utilizing multicast groups, the network can intelligently distribute traffic to only those endpoints that require the information. This approach minimizes unnecessary flooding, reduces network congestion, and improves overall performance.

Several components must be in place to enable VXLAN flood and learn with multicast. We will explore the necessary configurations on the VXLAN Tunnel Endpoints (VTEPs) and the underlying multicast infrastructure. Topics covered will include multicast group management, IGMP snooping, and PIM (Protocol Independent Multicast) configuration.

Related: Before you proceed, you may find the following useful:

  1. Data Center Topologies
  2. LISP Protocol
  3. Data Center Network Design
  4. ASA Failover
  5. LISP Hybrid Cloud
  6. LISP Control Plane

Active Active Data Center Design

At its core, an active active data center is based on fault tolerance, redundancy, and scalability principles. This means that the active data center should be designed to withstand any hardware or software failure, have multiple levels of data storage redundancy, and scale up or down as needed.

The data center also provides an additional layer of security. It is designed to protect data from unauthorized access and malicious attacks. It should also be able to detect and respond to any threats quickly and in a coordinated manner.

A comprehensive monitoring and management system is essential to ensure the data center functions correctly. This system should be designed to track the data center’s performance, detect problems, and provide the necessary alerting mechanisms. It should also provide insights into how the data center operates so that any necessary changes can be made.

Cisco Validated Design

Cisco has validated this design, which is freely available on the Cisco site. In summary, they have tested a variety of combinations, such as VSS-VSS, VSS-vPV, and vPC-vPC, and validated the design with 200 Layer 2 VLANs and 100 SVIs or 1000 VLANs and 1000 SVI with static routing.

At the time of writing, the M series for the Nexus 7000 supports native encryption of Ethernet frames through the IEEE 802.1AE standard. This implementation uses Advanced Encryption Standard ( AES ) cipher and a 128-bit shared key.

Example: Cisco ACI

In the following lab guide, we demonstrate Cisco ACI. To extend Cisco ACI, we have different designs, such as multi-site and multi-pod. This type of design overcomes many challenges of raising a data center, which we will discuss in this post, such as extending layer 2 networks.

One crucial value of the Cisco ACI is the COOP database that maps endpoints in the network. The following screenshots show the synchronized COOP database across spines, even in different data centers. Notice that the bridge domain VNID is mapped to the MAC address. The COOP database is unique to the Cisco ACI.

COOP database
Diagram: COOP database

**The Challenge: Layer 2 is Weak**

The challenge of data center design is “Layer 2 is weak & IP is not mobile.” In the past, best practices recommended that networks from distinct data centers be connected through Layer 3 ( routing ), isolating the known Layer 2 turmoil. However, the business is driving the application requirements, changing the connectivity requirements between data centers.

The need for an active data center has been driven by the following. It is generally recommended to have Layer 3 connections with path separation through Multi-VRF, P2P VLANs, or MPLS/VPN, along with a modular building block data center design.

Yet, some applications cannot function over a Layer 3 environment. For example, most geo clusters require Layer 2 adjacency between their nodes, whether for heartbeat and connection ( status and control synchronization ) state information or the requirement to share virtual IP.

MAC addresses to facilitate traffic handling in case of failure. However, some clustering products ( Veritas, Oracle RAC ) support communication over Layer 3 but are a minority and don’t represent the general case.

Defining active data centers

The term active-active refers to using at least two data centers where both can service an application at any time, so each functions as an active application site. The demand for active-active data center architecture is to accomplish seamless workload mobility and enable distributed applications along with the ability to pool and maximize resources.  

We must first have active-active data center infrastructure for an active/active application setup. Remember that the network is just one key component of active/active data centers). An active-active DC can be divided into two halves from a pure network perspective:-

  1. Ingress Traffic – inbound traffic
  2. Egress Traffic – outbound traffic
active active data center
Diagram: Active active data center. Scenario. Source is twoearsonemouth

Active Active Data Center and VM Migration

Migrating applications and data to virtual machines (VMs) are becoming increasingly popular as organizations seek to reduce their IT costs and increase the efficiency of their services. VM migration moves existing applications, data, and other components from a physical server to a virtualized environment. This process is becoming increasingly more cost-effective and efficient for organizations, eliminating the need for additional hardware, software, and maintenance costs.

Virtual Machine migration between data centers increases application availability, Layer 2 network adjacency between ESX hosts is currently required, and a consistent LUN must be maintained for stateful migration. In other words, if the VM loses its IP address, it will lose its state, and the TCP sessions will drop, resulting in a cold migration ( VM does a reboot ) instead of a hot migration ( VM does not reboot ).

Due to the stretched VLAN requirement, data center architects started to deploy traditional Layer 2 over the DCI and, unsurprisingly, were faced with exciting results. Although flooding and broadcasts are necessary for IP communication in Ethernet networks, they can become dangerous in a DCI environment.

Traffic Tramboning

Traffic tromboning can also be formed between two stretched data centers, so nonoptimal internal routing happens within extended VLANs. Trombones, by their very nature, create a network traffic scalability problem. Addressing this through load balancing among multiple trombones is challenging since their services are often stateful.

Traffic tromboning can affect either ingress or egress traffic. On egress, you can have FHRP filtering to isolate the HSRP partnership and provide an active/active setup for HSRP. On ingress, you can have GSLB, Route Injection, and LISP.

Traffic Tramboning
Diagram: Traffic Tramboning. Source is Silvanogai

Cisco Active-active data center design and virtualization technologies

Virtualization technologies can overcome many of these problems by being used for Layer 2 extensions between data centers. These include vPC, VSS, Cisco FabricPath, VPLS, OTV, and LISP with its Internet locator design. In summary, different technologies can be used for LAN extensions, and the primary mediums in which they can be deployed are Ethernet, MPLS, and IP.

    1. Ethernet: VSS and vPC or Fabric Path
    2. MLS: EoMPLS and A-VPLS and H-VPLS
    3. IP: OTV
    4. LISP

Ethernet Extensions and Multi-Chassis EtherChannel ( MEC )

It requires protected DWDM or direct fibers and works only between two data centers. It cannot support multi-datacenter topology, i.e., a full mesh of data centers, but can help hub and spoke topologies.

Previously, LAG could only terminate on one physical switch. VSS-MEC and vPC are port-channeling concepts extending link aggregation to two physical switches. This allows for creating L2 typologies based on link aggregation, eliminating the dependency on STP, thus enabling you to scale available Layer 2 bandwidth by bonding the physical links.

Because vPC and VSS create a single connection from an STP perspective, disjoint STP instances can be deployed in each data center. Such isolation can be achieved with BPDU Filtering on the DCI links or Multiple Spanning Tree ( MST ) regions on each site.

At the time of writing, vPC does not support Layer 3 peering, but if you want an L3 link, create one, as this does not need to run on dark fiber or protected DWDM, unlike the extended Layer 2 links. 

Ethernet Extension and Fabric path

The fabric path allows network operators to design and implement a scalable Layer 2 fabric, allowing VLANs to help reduce the physical constraints on server location. It provides a high-availability design with up to 16 active paths at layer 2, with each path a 16-member port channel for Unicast and Multicast.

This enables the MSDC networks to have flat typologies, separating nodes by a single hop ( equidistant endpoints ). Cisco has not targeted Fabric Path as a primary DCI solution as it does not have specific DCI functions compared to OTV and VPLS.

Its primary purpose is for Clos-based architectures. However, if you need to interconnect three or more sites, the Fabric path is a valid solution when you have short distances between your DCs via high-quality point-to-point optical transmission links.

Your WAN links must support Remote Port Shutdown and microflapping protection. By default, OTV and VPLS should be the first solutions considered as they are Cisco-validated designs with specific DCI features. For example, OTV can flood unknown unicast for particular VLANs.

FabricPath
Diagram: FabricPath. Source is Cisco

IP Core with Overlay Transport Virtualization ( OTV ).

OTV provides dynamic encapsulation with multipoint connectivity of up to 10 sites ( NX-OS 5.2 supports 6 sites, and NX-OS 6.2 supports 10 sites ). OTV, also known as Over-The-Top virtualization, is a specific DCI technology that enables Layer 2 extension across data center sites by employing a MAC in IP encapsulation with built-in loop prevention and failure boundary preservation.

There is no data plane learning. Instead, the overlay control plane ( Layer 2 IS-IS ) on the provider’s network facilitates all unicast and multicast learning between sites. OTV has been supported on the Nexus 7000 since the 5.0 NXOS Release and ASR 1000 since the 3.5 XE Release. OTV as a DCI has robust high availability, and most failures can be sub-sec convergence with only extreme and very unlikely failures such as device down resulting in <5 seconds.

Locator ID/Separator Protocol ( LISP)

Locator ID/Separator Protocol ( LISP) has many applications. As the name suggests, it separates the location and identifier of the network hosts, enabling VMs to move across subnet boundaries while retaining their IP address and enabling advanced triangular routing designs.

LISP works well when you have to move workloads and distribute workloads across data centers, making it a perfect complementary technology for an active-active data center design. It provides you with the following:

  • a) Global IP mobility across subnets for disaster recovery and cloud bursting ( without LAN extension ) and optimized routing across extended subnet sites.
  • b) Routing with extended subnets for active/active data centers and distributed clusters ( with LAN extension).
LISP networking
Diagram: LISP Networking. Source is Cisco

LISP answers the problems with ingress and egress traffic tromboning. It has a location mapping table, so when a host move is detected, updates are automatically triggered, and ingress routers (ITRs or PITRs) send traffic to the new location. From an ingress path flow inbound on the WAN perspective, LISP can answer our little problems with BGP in controlling ingress flows. Without LISP, we are limited to specific route filtering, meaning if you have a PI Prefix consisting of a /16.

If you break this up and advertise into 4 x /18, you may still get poor ingress load balancing on your DC WAN links; even if you were to break this up to 8 x /19, the results might still be unfavorable.

LISP works differently than BGP because a LISP proxy provider would advertise this /16 for you ( you don’t advertise the /16 from your DC WAN links ) and send traffic at 50:50 to our DC WAN links. LISP can get a near-perfect 50:50 conversion rate at the DC edge.

Summary: Active Active Data Center Design

In today’s digital age, businesses and organizations rely heavily on data centers to store, process, and manage critical information. However, any disruption or downtime can have severe consequences, leading to financial losses and damage to reputation. This is where redundant data centers come into play. In this blog post, we explored the concept of redundant data centers, their benefits, and how they ensure uninterrupted digital operations.

Understanding Redundancy in Data Centers

Redundancy in data centers refers to duplicating critical components and systems to minimize the risk of failure. It involves creating multiple backups of hardware, power sources, cooling systems, and network connections. With redundant systems, data centers can continue functioning even if one or more components fail.

Types of Redundancy

Data centers employ various types of redundancy to ensure uninterrupted operations. These include:

1. Hardware Redundancy involves duplicate servers, storage devices, and networking equipment. If one piece of hardware fails, the redundant backup takes over seamlessly, preventing disruption.

2. Power Redundancy: Power outages can harm data center operations. Redundant power systems, such as backup generators and uninterruptible power supplies (UPS), provide continuous power supply even during electrical failures.

3. Cooling Redundancy: Overheating can damage sensitive equipment in data centers. Redundant cooling systems, including multiple air conditioning units and cooling towers, help maintain optimal temperature levels and prevent downtime.

Network Redundancy

Network connectivity is crucial for data centers to communicate with the outside world. Redundant network connections ensure that alternative paths are available to maintain uninterrupted data flow if one connection fails. This can be achieved through diverse internet service providers (ISPs), multiple routers, and network switches.

Benefits of Redundant Data Centers

Implementing redundant data centers offers several benefits, including:

1. Increased Reliability: Redundancy minimizes the risk of single points of failure, making data centers highly reliable and resilient.

2. Improved Uptime: Data centers can achieve impressive uptime percentages with redundant systems, ensuring continuous access to critical data and services.

3. Disaster Recovery: Redundant data centers are crucial in disaster recovery strategies. If one data center becomes inaccessible due to natural disasters or other unforeseen events, the redundant facility takes over seamlessly, ensuring business continuity.

Conclusion:

Redundant data centers are vital for organizations that cannot afford any interruption in their digital operations. By implementing hardware, power, cooling, and network redundancy, businesses can mitigate risks, ensure uninterrupted access to critical data, and safeguard their operations from potential disruptions. Investing in redundant data centers is a proactive measure to save businesses from significant financial losses and reputational damage in the long run.

Data Center Network Design

Data Center Network Design

Data centers are crucial in today’s digital landscape, serving as the backbone of numerous businesses and organizations. A well-designed data center network ensures optimal performance, scalability, and reliability. This blog post will explore the critical aspects of data center network design and its significance in modern IT infrastructure.

Data center network design involves the architectural planning and implementation of networking infrastructure within a data center environment. It encompasses various components such as switches, routers, cables, and protocols. A well-designed network ensures seamless communication, high availability, and efficient data flow.

The traditional three-tier network architecture is being replaced by more streamlined and flexible designs. Two popular approaches gaining traction are the spine-leaf architecture and the fabric-based architecture. The spine-leaf design offers low latency, high bandwidth, and improved scalability, making it ideal for large-scale data centers. On the other hand, fabric-based architectures provide a unified and simplified network fabric, enabling efficient management and enhanced performance.

Network virtualization, powered by technologies like SDN, is transforming data center network design. By decoupling the network control plane from the underlying hardware, SDN enables centralized network management, automation, and programmability. This results in improved agility, better resource allocation, and faster deployment of applications and services.

With the rising number of cyber threats, ensuring robust security and resilience has become paramount. Data center network design should incorporate advanced security measures such as firewalls, intrusion detection systems, and encryption protocols. Additionally, implementing redundant links, load balancing, and disaster recovery mechanisms enhances network resilience and minimizes downtime.

Highlights: Data Center Network Design

Data Center Network Design

To embark on a successful network design journey, it is essential first to understand the data center’s specific requirements. Factors such as scalability, bandwidth, latency, and reliability need to be carefully assessed. By comprehending the data center’s unique needs, network architects can lay a solid foundation for an optimized design.

Efficiency and resilience are at the core of any well-designed data center network. Building on the requirements identified in the previous section, architects must consider redundancy, load balancing, and fault tolerance principles. The design should minimize single points of failure while maximizing resource utilization and network performance.

Various network topologies and architectures can be employed in data center network design. Each option offers unique advantages and trade-offs, from traditional hierarchical designs to modern approaches like leaf-spine architectures. This section will explore different topologies, highlighting their strengths and considerations.

Virtualization and SDN have revolutionized data center network design, offering increased flexibility and agility. By abstracting network functions from physical infrastructure, virtualization allows for dynamic resource allocation and improved scalability. SDN further enhances network programmability, enabling centralized management and automation. This section will delve into the benefits and implementation considerations of these technologies.

Network, security, and computing

– A data center architecture consists of three main components: the data center network, the data center security, and the data center computing architecture. In addition to these three types of architecture, there are also data center physical architectures and data center information architectures. The following are three typical compositions.

– Network architecture for data centers: Data center networks (DCNs) are arrangements of network devices interconnecting data center resources. They are a crucial research area for Internet companies and large cloud computing firms. The design of a data center depends on its network architecture.

– It is common for routers and switches to be arranged in hierarchies of two or three levels. There are three-tier DCNs: fat tree DCNs, DCells, and others. There has always been a focus on scalability, robustness, and reliability regarding data center network architectures.

– Data center security refers to physical practices and virtual technologies for protecting data centers from threats, attacks, and unauthorized access. It can be divided into two components: physical security and software security. A firewall between a data center’s external and internal networks can protect it from attack.

Data Center Network Design Considerations

a. Understanding the Requirements

Before embarking on the design process, it’s crucial to understand the data center’s unique requirements. Factors such as power and cooling, network connectivity, scalability, and security are vital in determining the design approach. By thoroughly assessing these requirements, architects can create a blueprint that aligns with the organization’s current and future needs.

b. Optimizing Physical Layout

The physical layout of a data center significantly impacts its efficiency and performance. This section will delve into rack placement, aisle design, cable management, and airflow optimization. By adopting best practices in physical layout design, data center operators can minimize energy consumption, reduce maintenance costs, and enhance overall operational efficiency.

c. Redundancy and Resilience

Data centers demand high levels of redundancy and resilience to ensure uninterrupted operations. This section will explore the concept of redundancy in power and cooling systems, backup generators, redundant network connectivity, and failover mechanisms. Implementing robust redundancy measures helps mitigate the risk of downtime and ensures continuous availability of critical services.

4. Security and Compliance

Data centers store sensitive and valuable information, making security a top priority. This section will discuss the importance of physical security measures, access controls, surveillance systems, and fire suppression mechanisms. Additionally, we will explore compliance standards and regulations that govern data center operations, such as SOC 2, ISO 27001, and GDPR.

5. Embracing Green Initiatives

As environmental sustainability gains importance, data centers seek ways to minimize their carbon footprint. This section will focus on energy-efficient design practices, including using renewable energy sources, efficient cooling techniques, and server virtualization. Data centers can contribute to a more sustainable future by adopting green initiatives.

Data Center Network Security 

### What is Cloud Armor?

Cloud Armor is a security service offered by Google Cloud that provides protection against distributed denial-of-service (DDoS) attacks and other web-based threats. It leverages Google’s global infrastructure to offer scalable and reliable protection, ensuring that your applications and services remain available and secure even in the face of large-scale attacks.

### Key Features of Cloud Armor

Cloud Armor comes packed with several features that make it an indispensable tool for modern enterprises. Some of its key features include:

– **DDoS Protection:** Automatically detects and mitigates DDoS attacks, ensuring minimal disruption to your services.

– **Web Application Firewall (WAF):** Provides customizable rules to block malicious traffic and protect against common web vulnerabilities.

– **Edge Security Policies:** Allows you to define security policies at the edge of your network, ensuring threats are mitigated before they reach your core infrastructure.

– **Adaptive Protection:** Uses machine learning to identify and respond to evolving threats in real-time.

### Understanding Edge Security Policies

One of the standout features of Cloud Armor is its ability to implement edge security policies. These policies enable organizations to enforce security measures at the periphery of their network, providing an additional layer of defense. By stopping threats at the edge, you can prevent them from penetrating deeper into your network, thereby reducing the risk of data breaches and other security incidents.

Edge security policies can be tailored to your specific needs, allowing you to block traffic based on various criteria such as IP address, geographic location, and request patterns. This granular control helps you enforce stringent security measures while maintaining the performance and availability of your services.

### Benefits of Using Cloud Armor

Deploying Cloud Armor offers several benefits that can significantly enhance your security posture. These include:

– **Scalability:** Designed to handle traffic spikes and large-scale attacks, ensuring your services remain available even under heavy load.

– **Customization:** Flexible rules and policies allow you to tailor security measures to your unique requirements.

– **Proactive Defense:** Real-time threat detection and mitigation keep your applications protected against the latest cyber threats.

– **Cost-Effective:** By leveraging Google’s global infrastructure, you can achieve enterprise-level security without the need for significant upfront investment.

### What is Google Network Connectivity Center?

Google Network Connectivity Center is a unified platform designed to manage and monitor network connections across a variety of environments. Whether you’re dealing with on-premises data centers, cloud environments, or hybrid setups, NCC provides a centralized control point. It simplifies the complexities involved in network management, allowing IT teams to focus on optimizing performance rather than troubleshooting issues.

### Key Features of Google NCC

#### Unified Management

NCC offers a single pane of glass for managing network connections, making it easier to oversee and control your entire network infrastructure. This unified management approach reduces the need for multiple tools and interfaces, streamlining operations and increasing efficiency.

#### Flexible Connectivity Options

Google NCC supports a range of connectivity options, including VPNs, interconnects, and peering. This flexibility ensures that you can choose the best connectivity method for your specific needs, whether it’s connecting remote offices or integrating with third-party cloud services.

#### Real-Time Monitoring and Analytics

One of the standout features of NCC is its real-time monitoring and analytics capabilities. With detailed insights into network performance and traffic patterns, you can quickly identify and resolve issues, optimize resource allocation, and ensure consistent network performance.

Understanding Network Tiers

Network tiers are a concept that categorizes network traffic based on its importance and priority. By classifying traffic into different tiers, businesses can allocate resources accordingly and optimize their network usage. In the case of Google Cloud, there are two main network tiers: Premium Tier and Standard Tier.

The Premium Tier is designed to deliver exceptional performance and reliability. It leverages Google’s global network infrastructure, ensuring low latency and high throughput for critical applications. By utilizing the Premium Tier, businesses can enhance user experience, reduce latency-related issues, and improve overall network performance.

While the Premium Tier offers top-tier performance, the Standard Tier provides a cost-effective solution for non-critical workloads. It offers reliable network connectivity at a lower price point, making it an excellent choice for applications that do not require ultra-low latency or high bandwidth. By strategically utilizing the Standard Tier, businesses can optimize their network spend without compromising on reliability.

Understanding VPC Networking

VPC, or Virtual Private Cloud, is a virtual network dedicated to a specific Google Cloud project. It allows users to define and manage their network resources, including subnets, IP addresses, and firewall rules. With VPC networking, businesses can create isolated environments and control the flow of traffic within their cloud infrastructure.

Google Cloud’s VPC networking offers a range of powerful features. Firstly, it provides global connectivity, allowing businesses to connect resources across regions seamlessly. Additionally, VPC peering enables secure communication between different VPC networks, facilitating collaboration and data sharing. Moreover, VPC networking offers granular control through firewall rules, ensuring robust security for applications and services.

What is Google Cloud CDN?

Google Cloud CDN, short for Content Delivery Network, is a globally distributed network of servers designed to deliver content to users at blazing-fast speed. Cloud CDN minimizes latency and ensures a seamless user experience by caching your content in strategic locations worldwide. Whether it’s static assets, dynamic content, or even streaming media, Cloud CDN optimizes the delivery process, reducing the load on your origin servers and improving overall performance.

Cloud CDN operates by leveraging Google’s extensive network infrastructure. When a user requests content from your website or application, Cloud CDN intelligently routes the request to the nearest edge location. If the content is already cached at that edge location, it is immediately delivered to the user, eliminating the need for a round trip to the origin server. This not only reduces latency but also saves bandwidth and server resources.

Understanding VPC Network Peering

VPC network peering connects VPC networks from different projects or within the same project within Google Cloud. It enables direct communication between these networks, eliminating the need for complex VPN setups or public IP addresses. This seamless connectivity can significantly enhance collaboration, data sharing, and network management.

Enhanced Security: VPC network peering ensures that communication between peered networks remains isolated from the public internet. This adds an extra layer of security by reducing the exposure to potential cyber threats.

Improved Performance: By leveraging VPC network peering, data can be transferred at incredibly high speeds between peered networks. This enables faster resource access, reduces latency, and enhances overall application performance.

Simplified Network Architecture: VPC network peering allows for a more streamlined and simplified network architecture. Instead of relying on complex gateways or routers, communication between VPCs can be established directly, making network management and troubleshooting more straightforward.

Data Center Network Types

a. The Three-Tier Data Center Network

The three-tier DCN architecture has been a traditional approach in data center networking. It consists of three layers: the access layer, the aggregation layer, and the core layer. Each layer serves a specific purpose, from connecting end devices to aggregating traffic and providing high-speed connectivity. This hierarchical design allows for scalability and redundancy, making it a popular choice for many data centers.

b. Unleashing the Power of Fat Tree Data Center Networks

The fat tree DCN, also known as the Clos network, has gained prominence recently due to its ability to handle large-scale data center deployments. Unlike the three-tier DCN, a fat tree network provides multiple paths between devices, enabling better load balancing and higher bandwidth capacity. Fat tree networks offer low-latency communication and enhanced fault tolerance by utilizing a non-blocking switching fabric, making them ideal for mission-critical applications.

c. Exploring the Revolutionary DCell Approach

The DCell architecture takes a novel approach to data center networking and offers a unique perspective on scalability and fault tolerance. DCell networks are based on a hierarchical structure of cells, where each cell consists of a group of servers connected together. This decentralized design eliminates the need for traditional core switches and enables direct server-to-server communication. With its self-organizing capabilities, DCell networks provide excellent scalability, fault tolerance, and efficient resource utilization.

Composition of Data Center Architecture

Routing and Switching:

Routing is the backbone of a data center network, guiding data packets through the labyrinthine pathways. It involves determining the optimal path for data to travel from source to destination, considering network congestion, latency, and cost factors. Advanced routing protocols like Border Gateway Protocol (BGP) enable dynamic route selection, ensuring efficient and fault-tolerant data delivery.

Switching complements routing by facilitating efficient data transmission within a local network. At the heart of a data center, switches act as intelligent traffic controllers, directing data packets to their intended destinations. With features like VLANs (Virtual Local Area Networks) and Quality of Service (QoS), switches prioritize and prioritize traffic, optimizing network performance and ensuring seamless communication.

stp port states

Example: Spanning Tree Uplink Fast

Spanning Tree Protocol (STP) prevents loops in Ethernet networks by creating a loop-free logical topology and blocking redundant paths. While STP ensures network stability, it can also introduce delays in network convergence. Network downtime caused by STP convergence can be a primary concern for businesses. Even a few seconds of downtime can result in significant losses in critical environments. This is where Spanning Tree Uplink Fast comes into play. Uplink Fast is an enhancement to STP that provides faster convergence times, reducing network downtime and improving overall network efficiency.

How Uplink Fast Works

Uplink Fast allows a switch to detect a link failure on its designated root port and immediately activate an alternate port. This process eliminates the need for the traditional STP convergence process, resulting in faster network recovery times. Uplink Fast is instrumental when network redundancy is crucial, such as in data centers or enterprise networks.

Introducing Spanning Tree MST

Spanning Tree MST enhances the traditional STP, providing a more efficient and flexible solution. MST allows network administrators to divide the network into multiple regions, each with its own Spanning Tree instance. By doing so, MST optimizes network resources and enables load balancing across multiple paths, leading to increased performance and redundancy.

To implement Spanning Tree MST, network switches need to be properly configured. This involves defining regions, assigning VLANs to instances, and configuring parameters such as root bridges and priorities. MST configuration can be complex, but with careful planning and understanding, it offers significant benefits.

Spanning Tree MST offers several key advantages. First, it enables efficient utilization of network resources by load-balancing traffic across multiple paths. Second, it provides enhanced redundancy, ensuring that if one path fails, traffic can automatically reroute through an alternate path. Third, MST simplifies network management by allowing administrators to control traffic flow and prioritize specific VLANs within each instance.

Data Center Security Technologies

Understanding the MAC Move Policy

The MAC Move Policy is a crucial feature in Cisco NX-OS devices that governs the movement of MAC addresses within a network. By defining specific rules and criteria, administrators can control how MAC addresses are learned, aged, and moved across different interfaces and VLANs.

Configuring the MAC Move Policy

Proper configuration is essential to effectively utilizing the MAC Move Policy. This section will guide you through the step-by-step process of configuring the policy on Cisco NX-OS devices. From defining the MAC move parameters to implementing the policy on specific interfaces or VLANs, we will cover all the necessary commands and considerations to ensure a seamless configuration experience.

Understanding MAC ACLs

MAC ACLs, also known as Ethernet ACLs or Layer 2 ACLs, operate at the data link layer of the OSI model. Unlike traditional IP-based ACLs, which focus on network layer addresses, MAC ACLs allow administrators to filter traffic based on MAC addresses. This enables granular control over network access, providing an additional layer of defense against unauthorized devices.

By implementing MAC ACLs on the Nexus 9000 series, network administrators can exercise enhanced control over their network environment. MAC ACLs prevent MAC address spoofing, mitigating the risk of unauthorized devices gaining access. Furthermore, they enable the isolation of specific devices or groups of devices, ensuring that only designated entities can communicate within a given VLAN or network segment.

Understanding VLANs and ACLs

Before we embark on our journey to explore VLAN ACLs’ potential, let’s establish a solid foundation by understanding VLANs and ACLs individually. VLANs (Virtual Local Area Networks) allow us to logically segment networks, improving performance, scalability, and network management. On the other hand, ACLs (Access Control Lists) act as gatekeepers, controlling traffic flow and enforcing security policies.

VLAN ACLs serve as a crucial layer of defense in protecting our networks from unauthorized access, malicious activities, and potential breaches. By implementing VLAN ACLs, we can define granular rules that filter and restrict traffic between VLANs, ensuring that only desired communication occurs. This level of control empowers network administrators to mitigate risks, maintain data integrity, and enforce compliance.

Understanding Nexus Switch Profiles

Nexus switch profiles are a feature of Cisco’s Nexus series switches that allow administrators to define and manage a group of switches as a single entity. By creating a profile, administrators can easily configure and monitor all switches within the group, eliminating the need for repetitive manual configurations. This centralization of management simplifies network administration and saves valuable time and resources.

One of the primary advantages of using Nexus switch profiles is the ability to streamline network operations. With a profile in place, administrators can make changes or updates to configurations across multiple switches simultaneously. This significantly reduces the risk of configuration errors and ensures consistent settings throughout the network. Furthermore, the centralized management approach simplifies troubleshooting and enables faster resolution of network issues.

Data Center Technologies

Understanding Layer 3 Etherchannel

Layer 3 Etherchannel is a link aggregation technique that combines multiple physical links between switches into a single logical channel. By bundling these links together, traffic can be distributed across them, increasing overall bandwidth capacity and providing load-balancing capabilities. Unlike Layer 2 Etherchannel, Layer 3 Etherchannel operates at the network layer, allowing traffic to be routed.

To configure Layer 3 Etherchannel, several steps need to be followed. First, the physical interfaces on the switches need to be identified and grouped into the Etherchannel bundle. Then, a logical interface, the Port-Channel interface, is created and assigned an IP address. Subsequently, routing protocols or static routes can be configured on the Port-Channel interface to enable communication between different networks.

Layer 3 Etherchannel supports various load-balancing algorithms, determining how traffic is distributed across the bundled links. Standard algorithms include source IP, destination IP, and round-robin. Each algorithm has advantages and considerations depending on the network requirements and traffic patterns.

Cisco Nexus 9000 Port Channel

Implementing Port Channels on Cisco Nexus 9000 switches offers several advantages. Firstly, it provides increased link bandwidth, allowing for efficient data transfer and reducing bottlenecks. Secondly, Port Channels enhance network resilience by providing link redundancy. In a link failure, traffic seamlessly switches to the remaining active links. Lastly, Port Channels enable load balancing, distributing network traffic evenly across the aggregated links for optimal utilization.

Setting up a Port Channel on Cisco Nexus 9000 switches is straightforward. Administrators can configure Port Channels using the Link Aggregation Control Protocol (LACP) or the Port Aggregation Protocol (PAgP). Administrators can maximize the benefits of this feature by adequately configuring interfaces and assigning them to the Port Channel.

Understanding Unidirectional Link Detection (UDLD)

UDLD is a layer 2 protocol that helps identify and mitigate the presence of unidirectional links in a network. It works by exchanging periodic messages between neighboring switches to verify bidirectional connectivity. By detecting unidirectional links, UDLD helps prevent potential network issues such as black holes, spanning-tree loops, and data loss.

Cisco Nexus 9000 switches offer seamless integration and support for UDLD. To enable UDLD on a Nexus 9000 switch, administrators can utilize simple commands within the switch configuration. By configuring UDLD timers, administrators can customize the frequency of UDLD messages exchanged between switches. Additionally, UDLD can be configured to operate in either standard or aggressive mode, depending on the specific needs of the network environment.

Understanding VRRP

VRRP, an essential networking protocol, provides automatic failover and load-balancing capabilities. It allows multiple routers to work as a virtual group, presenting a single IP address. By intelligently distributing network traffic, VRRP ensures seamless connectivity even in the face of router failures.

The Nexus 9000 Series, Cisco’s flagship product line, offers a range of cutting-edge features, including VRRP. Designed to meet the demands of modern networks, these switches deliver exceptional performance, scalability, and flexibility. With the Nexus 9000 Series, network administrators can harness the power of VRRP to build a robust and highly available network infrastructure.

Example: Data Center WAN Protocol

BGP, also known as the routing protocol of the Internet, is responsible for exchanging routing and reachability information among autonomous systems (AS). It enables routers to make intelligent decisions about the most optimal paths for data transmission. Unlike interior gateway protocols, BGP focuses on routing between different networks rather than within a single network.

BGP operates on a trust-based model, where routers form peer relationships to exchange routing information. These peers establish connections and exchange routing updates, allowing them to build a complete picture of network reachability. BGP uses a sophisticated algorithm that considers multiple factors, such as path length, quality of service, and policy-based decisions, to determine the best route for traffic.

Understanding BGP AS Prepend

AS Prepend involves adding additional Autonomous System (AS) numbers to the AS path attribute of BGP advertisements. By manipulating the AS path, network operators can influence inbound traffic routing decisions by neighboring autonomous systems. This technique makes a specific path appear less desirable, diverting traffic to alternative paths.

AS Prepend holds excellent potential for optimizing network routing in various scenarios. It can achieve load balancing across multiple links, redirect traffic to less congested paths, or prefer specific transit providers. By carefully implementing AS Prepend, network administrators can improve network performance, reduce latency, and enhance overall service quality.

BGP AS Prepend

Recap: Border Gateway Protocol (BGP) is data centers’ most commonly used routing protocol. It has been used to connect Internet systems worldwide for decades and can also be used outside a data center. The BGP protocol is a standard-based open-source software package. It’s more common to find BGP peering between data centers over the WAN. However, we see BGP often used purely inside the data center.

 Understanding Leaf and Spine Networks

Leaf and spine networks, also known as Clos networks, are a modern approach to data center architecture. The design revolves around a hierarchical structure consisting of two key components: leaf switches and spine switches. Leaf switches connect directly to endpoints, while spine switches interconnect the leaf switches, forming a non-blocking fabric. This architecture eliminates bottlenecks and enables seamless scalability.

BGP (Border Gateway Protocol) is a crucial routing protocol in leaf and spine networks. It ensures efficient data forwarding between leaf switches using a set of rules known as BGP route advertisements. By default, BGP requires every router to have a full mesh of connections with all other routers in the network, which can be resource-intensive. This is where BGP route reflection comes into play.

Understanding BGP Route Reflection

BGP route reflection, at its core, is a method that allows a BGP speaker to reflect routing information to its peers, alleviating the need for full-mesh connectivity. Designating specific BGP routers as route reflectors streamlines and manages the network structure.

The utilization of BGP route reflection offers several advantages. First, it reduces the number of required BGP peering sessions, resulting in a simplified and less resource-intensive network. Second, route reflection enhances scalability by eliminating the need for full-mesh connectivity, particularly in large-scale networks. Third, it improves convergence time and reduces BGP update processing overhead, enhancing overall network performance.

**The third wave of application architectures**

Google and Amazon, two of the world’s leading web-scale pioneers, developed a modern data center. The third wave of application architectures represents these organizations’ search and cloud applications. Towards the end of the 20th century, client-server architectures and monolithic single-machine applications dominated the landscape. This third wave of applications has three primary characteristics:

Unlike client-server architectures, modern data center applications involve a lot of communication between servers. In client-server architectures, clients communicate with monolithic servers, which either handle the request entirely themselves or communicate with fewer than a handful of other servers, such as database servers. Search (or Hadoop, its more popular variant) employs many mappers and reducers instead of search. In the cloud, virtual machines can reside on different nodes but must communicate seamlessly. In some cases, VMs are deployed on servers with the least load, scaled out, or balanced loads.

A microservices architecture also increases server-to-server communication. This architecture is based on separating a single function into smaller building blocks and interacting with them. Each block can be used in several applications and enhanced, modified, and fixed independently in such an architecture. Since diagrams usually show servers next to each other, East-West traffic is often called server communication. Traffic flows north-south between local networks and external networks.

**Scale and resilience**

The sheer size of modern data centers is characterized by rows and rows of dark, humming, blinking machines. As opposed to the few hundred or so servers of the past, a modern data center contains between a few hundred and a hundred thousand servers. To address the connectivity requirements at such scales, as well as the need for increased server-to-server connectivity, network design must be rethought. Unlike older architectures, modern data center applications assume failures as a given. Failures should be limited to the smallest possible footprint. Failures must have a limited “blast radius.” By minimizing the impact of network or server failures on the end-user experience, we aim to provide a stable and reliable experience.

**Data Center Goal: Interconnect networks**

The goal of data center design and interconnection network is to transport end-user traffic from A to B without any packet drops, yet the metrics we use to achieve this goal can be very different. The data center is evolving and progressing through various topology and technology changes, resulting in multiple network designs.  The new data center control planes we see today, such as Fabric Path, LISP, THRILL, and VXLAN, are driven by a change in the end user’s requirements; the application has changed. These new technologies may address new challenges, yet the fundamental question of where to create the Layer 2/Layer three boundaries and the need for Layer 2 in the access layer remains the same. The question stays the same, yet the technologies available to address this challenge have evolved.

Example Protocol: Understanding VXLAN

VXLAN, an encapsulation protocol, enables the creation of virtualized Layer 2 networks over an existing Layer 3 infrastructure. By extending the Layer 2 domain, VXLAN allows the seamless transfer of network traffic between geographically dispersed data centers. It achieves this by encapsulating Ethernet frames within IP packets, providing flexibility and scalability to network virtualization.

Scalability and Flexibility: VXLAN addresses the limitations of traditional VLANs by allowing for a significantly more significant number of virtual networks—up to 16 million—compared to the 4,096 limit of VLANs. This scalability enables organizations to allocate virtual networks more efficiently while accommodating the growing demands of cloud-based applications and services.

Enhanced Network Segmentation and Isolation: VXLAN provides improved network segmentation by creating logical networks that are isolated from one another, even if they share the same physical infrastructure. This isolation enhances security and enables more granular control over network traffic, facilitating efficient multi-tenancy in cloud environments.

VXLAN unicast mode

Modern Data Centers

There is a vast difference between modern data centers and what they used to be just a few years ago. Physical servers have evolved into virtual networks that support applications and workloads across pools of physical infrastructure and into a multi-cloud environment. There are multiple data centers, the edge, and public and private clouds where data exists and is connected. Both on-premises and cloud-based data centers must be able to communicate. Data centers are even part of the public cloud. Cloud-hosted applications use the cloud provider’s data center resources.

Unified Fabric

Through Cisco’s fabric-based data center infrastructure, tiered silos and inefficiencies of multiple network domains are eliminated, and a unified, flat fabric is provided instead, which allows local area networks (LANs), storage area networks (SANs), and network-attached storage (NASs) to be consolidated into one high-performance, fault-tolerant network. Creating large pools of virtualized network resources that can be easily moved and rapidly reconfigured with Cisco Unified Fabric provides massive scalability and resiliency to the data center.

This approach automatically deploys virtual machines and applications, thereby reducing complexity. Thanks to deep integration between server and network architecture, secure IT services can be delivered from any device within the data center, between data centers, or beyond. In addition to Cisco Nexus switches, Cisco Unified Fabric uses Cisco NX-OS as its operating system.

The use of Open Networking

We also have the Open Networking Foundation ( ONF ), which provides open networking. Open networking describes a network that uses open standards and commodity hardware. So, consider open networking in terms of hardware and software. Unlike a vendor approach like Cisco, this gives you much more choice with what hardware and software you use to make up and design your network.

Data Center Performance Parameters

TCP Performance Parameters

TCP (Transmission Control Protocol) is the backbone of modern Internet communication, ensuring reliable data transmission across networks. However, various parameters that determine TCP’s behavior can influence its performance. 

Understanding TCP Window Size: One crucial parameter that affects TCP performance is the window size. The TCP window size refers to the amount of data sent before an acknowledgment is required. A larger window size allows more data to be transmitted without waiting for acknowledgments, thus optimizing throughput. However, substantial window sizes can result in congestion and increased retransmissions.

Congestion Control Mechanisms: Congestion control mechanisms are vital in maintaining network stability and preventing congestion collapse. TCP utilizes algorithms such as Slow Start, Congestion Avoidance, and Fast Recovery to regulate data flow based on network conditions. These mechanisms ensure fairness and efficiency, improving TCP performance and avoiding network congestion.

Timeouts and Retransmission: TCP implements a reliable data transfer mechanism using acknowledgments and timeouts. When a packet is not acknowledged within a specific timeframe, it is considered lost, and TCP initiates retransmission. The selection of appropriate timeout values is crucial to balance reliability and responsiveness. Setting shorter timeouts may lead to unnecessary retransmissions, whereas longer ones can increase latency.

 Selective Acknowledgments and SACK Options: Selective acknowledgments (SACK) enhance TCP performance and recovery from packet loss. SACK lets the receiver inform the sender about specific out-of-order packets received successfully. This enables the sender to retransmit only the necessary packets, reducing unnecessary retransmissions and improving overall efficiency.

Maximum Segment Size (MSS): The Maximum Segment Size (MSS) is another crucial TCP performance parameter defining the maximum amount of data encapsulated within a single TCP segment. Optimizing the MSS can significantly impact performance, especially when network links have different MTU (Maximum Transmission Unit) sizes.

Understanding TCP MSS

TCP MSS refers to the maximum amount of data encapsulated within a single TCP segment. It represents the size of the payload, excluding headers and other overhead. The MSS value is negotiated during the TCP handshake process and remains constant throughout the connection.

The TCP MSS value has a direct impact on network performance and efficiency. Setting an appropriate MSS value ensures optimal network resource utilization and avoids unnecessary data packet fragmentation. Properly configuring TCP MSS becomes crucial when networks have different MTU (Maximum Transmission Unit) sizes.

Fragmentation occurs when the MSS value exceeds the MTU of a network path. This fragmentation can lead to performance degradation, increased latency, and potential packet loss. By carefully managing the TCP MSS value, network administrators can prevent or minimize fragmentation issues and enhance overall network performance.

Configuring TCP MSS requires a thorough understanding of the network infrastructure and the devices involved. It involves adjusting the MSS value at various points within the network, such as routers, firewalls, and load balancers. Aligning the TCP MSS value with the MTU of the underlying network ensures efficient data transmission and avoids unnecessary fragmentation.

Advanced Topics

VXLAN Flood and Learn Mechanism

The flood-and-learn mechanism in VXLAN plays a crucial role in facilitating communication between virtual machines within the overlay network. When a virtual machine sends a broadcast or unknown unicast frame, the frame is encapsulated in a VXLAN packet and flooded throughout the network. Each VXLAN tunnel endpoint (VTEP) learns the source MAC address and VTEP association, enabling subsequent unicast traffic to be directly delivered.

Multicast is a fundamental component of VXLAN flood and learn, offering several benefits. First, using multicast VXLAN reduces bandwidth consumption compared to traditional flooding techniques. Second, multicast enables efficient replicating broadcast, multicast, and unknown unicast traffic across the overlay network. Third, it enhances network scalability by eliminating the need to maintain a multicast group per tenant.

BGP Multipath

Understanding BGP Multipath

BGP multipath is a feature that enables the installation and usage of multiple paths for a single prefix in the routing table. Traditionally, BGP selects a single best path based on factors such as AS path length, origin type, and path attributes. However, with multipath enabled, BGP can utilize multiple paths simultaneously, distributing traffic across them for load balancing and redundancy purposes.

The utilization of BGP multipath brings several advantages to network operators. First, it enhances network resilience by providing redundant paths. In the event of a link failure or congestion, traffic can be automatically rerouted through available alternate paths, ensuring continuous connectivity. Additionally, BGP multipath facilitates load balancing, enabling more efficient utilization of network resources and better traffic distribution across multiple links.

Understanding BGP Next Hop Tracking

BGP next-hop tracking monitors the reachability of the next-hop IP address associated with a particular route. It allows routers to dynamically adjust their routing tables based on changes in the network topology. Routers can make informed decisions about forwarding traffic by continuously tracking the next hop, ensuring optimal path selection.

Enhanced Network Resiliency: BGP next-hop tracking enables routers to detect and respond to network changes quickly. If a next hop becomes unreachable, routers can automatically reroute traffic to an alternative path, minimizing downtime and improving network resiliency.

Load Balancing and Traffic Engineering: Network administrators gain granular control over traffic distribution with BGP next-hop tracking. By monitoring the reachability of multiple next hops, routers can intelligently distribute traffic across different paths, optimizing resource utilization and improving overall network performance.

Improved Network Convergence: Rapid convergence is crucial in dynamic networks. BGP next hop tracking facilitates faster convergence by promptly updating routing tables when next hops become unreachable. This ensures routing decisions are based on current information, reducing packet loss and minimizing network disruptions.

next hop tracking

Related: Before you proceed, you may find the following useful:

  1. ACI Networks
  2. IPv6 Attacks
  3. SDN Data Center
  4. Active Active Data Center Design
  5. Virtual Switch

Data Center Network Design

The Rise of Overlay Networking

What has the industry introduced to overcome these limitations and address the new challenges? – Network virtualization and overlay networking. In its simplest form, an overlay is a dynamic tunnel between two endpoints that enables Layer 2 frames to be transported between them. In addition, these overlay-based technologies provide a level of indirection that allows switching table sizes to not increase in the order of the number of supported end hosts.

Today’s overlays are Cisco FabricPath, THRILL, LISP, VXLAN, NVGRE, OTV, PBB, and Shorted Path Bridging. They are essentially virtual networks that sit on top of a physical network, and often, the physical network is unaware of the virtual layer above it.

Traditional Data Center Network Design

How do routers create a broadcast domain boundary? Firstly, using the traditional core, distribution, and access model, the access layer is layer 2, and servers served to each other in the access layer are in the same IP subnet and VLAN. The same access VLAN will span the access layer switches for east-to-west traffic, and any outbound traffic is via a First Hop Redundancy Protocol ( FHRP ) like Hot Standby Router Protocol ( HSRP ).

Servers in different VLANs are isolated from each other and cannot communicate directly; inter-VLAN communications require a Layer 3 device. Virtualization’s humble beginnings started with VLANs, which were used to segment traffic at Layer 2. It was expected to find single VLANs spanning an entire data center fabric.

VLAN and Virtualization

The virtualization side of VLANs comes from two servers physically connected to different switches. Assuming the VLAN spans both switches, the same VLAN can communicate with each server. Each VLAN can be defined as a broadcast domain in a single Ethernet switch or shared among connected switches.

Whenever a switch interface belonging to a VLAN receives a broadcast frame (the destination MAC is ffff.ffff.ffff), the device must forward it to all other ports defined in the same VLAN.

This approach is straightforward in design and is almost like a plug-and-play network. The first question is, why not connect everything in the data center into one large Layer 2 broadcast domain? Layer 2 is a plug-and-play network, so why not? STP also blocks links to prevent loops.

The issues of Layer 2

The reason is that there are many scaling issues in large layer 2 networks. Layer 2 networks don’t have controlled / efficient network discovery protocols. Address Resolution Protocol ( ARP ) is used to locate end hosts and uses Broadcasts and Unicast replies. A single host might not generate much traffic, but imagine what would happen if 10,000 hosts were connected to the same broadcast domain. VLANs span an entire data center fabric, which can bring a lot of instability due to loops and broadcast storms.

**No hierarchy in MAC addresses**

MAC addressing also lacks hierarchy. Unlike Layer 3 networks, which allow summarization and hierarchy addressing, MAC addresses are flat. Adding several thousand hosts to a single broadcast domain will create large forwarding information tables.

Because end hosts are potentially not static, they are likely to be attached and removed from the network at regular intervals, creating a high rate of change in the control plane. Of course, you can have a large Layer 2 data center with multiple tenants if they don’t need to communicate with each other.

The shared services requirements, such as WAAS or load balancing, can be solved by spinning up the service VM in the tenant’s Layer 2 broadcast domain. This design will hit scaling and management issues. There is a consensus to move from a Layer 2 design to a more robust and scalable Layer 3 design.

But why is Layer 2 still needed in data center topologies? One solution is Layer 2 VPN with EVPN. But first, let us look at Cisco DFA.

The Requirement for Layer 2 in Data Center Network Design

  • Servers that perform the same function might need to communicate with each other due to a clustering protocol or simply as part of the application’s inner functions. If the communication is clustering protocol heartbeats or some server-to-server application packets that are not routable, then you need this communication layer to be on the same VLAN, i.e., Layer 2 domain, as these types of packets are not routable and don’t understand the IP layer.

  • Stateful devices such as firewalls and load balancers need Layer 2 adjacency as they constantly exchange connection and session state information.

  • Dual-homed servers: Single server with two server NICs and one NIC to each switch will require a layer 2 adjacency if the adapter has a standby interface that uses the same MAC and IP addresses after a failure. In this situation, the active and standby interfaces must be on the same VLAN and use the same default gateway.

  • Suppose your virtualization solutions cannot handle Layer 3 VM mobility. In that case, you may need to stretch VLANs between PODS / Virtual Resource Pools or even data centers so you can move VMs around the data center at Layer 2 ( without changing their IP address ).

Data Center Design and Cisco DFA

Cisco took a giant step and recently introduced a data center fabric with Dynamic Fabric Automaton ( DFA ), similar to Juniper QFabric. This fabric offers Layer 2 switching and Layer 3 routing at the access layer / ToR. Firstly, it has a Fabric Path ( IS-IS for Layer 2 connectivity ) in the core, which gives optimal Layer 2 forwarding between all the edges.

Then they configure the same Layer 3 address on all the edges, which gives you optimal Layer 3 forwarding across the whole Fabric.

On the edge, you can have Layer 3 Leaf switches, such as the Nexus 6000 series, or integrate with Layer 2-only devices, like the Nexus 5500 series or the Nexus 1000v. You can connect external routers, USC, or FEX to the Fabric. In addition to running IS-IS as the data center control plane, DFA uses MP-iBGP, with some Spine nodes being the Route Reflector to exchange IP forwarding information.

Cisco FabricPath

DFA also employs a Cisco FabricPath technique called “Conversational Learning.” The first packet triggers a full RIB lookup, and the subsequent packets are switched in the hardware-implemented switching cache.

This technology provides Layer 2 mobility throughout the data center while providing optimal traffic flow using Layer 3 routing. Cisco commented, “DFA provides a scale-out architecture without congestion points in the network while providing optimized forwarding for all applications.”

Terminating Layer 3 at the access / ToR has clear advantages and disadvantages. Other benefits include reducing the size of the broadcast domain, which comes at the cost of reducing the mobility domain across which VMs can be moved.

Terminating Layer 3 at the accesses can also result in sub-optimal routing because there will be hair pinning or traffic tromboning of across-subnet traffic, taking multiple and unnecessary hops across the data center fabric.

The role of the Cisco Fabricpath

Cisco FabricPath is a Layer 2 technology that provides Layer 3 benefits, such as multipathing the classical Layer 2 networks using IS-IS at Layer 2. This eliminates the need for spanning tree protocol, avoiding the pitfalls of having large Layer 2 networks. As a result, Fabric Path enables a massive Layer 2 network that supports multipath ( ECMP ). THRILL is an IEEE standard that, like Fabric Path, is a Layer 2 technology that provides the same Layer 3 benefits as Cisco FabricPath to the Layer 2 networks using IS-IS.

LISP is popular in Active data centers for DCI route optimization/mobility. It separates the host’s location from the identifier ( EID ), allowing VMs to move across subnet boundaries while keeping the endpoint identification. LISP is often referred to as an Internet locator. 

That can enable some triangular routing designs. Popular encapsulation formats include VXLAN ( proposed by Cisco and VMware ) and STT (created by Nicira but will be deprecated over time as VXLAN comes to dominate ).

The role of OTV

OTV is a data center interconnect ( DCI ) technology enabling Layer 2 extension across data center sites. While Fabric Path can be a DCI technology with dark fiber over short distances, OTV has been explicitly designed for DCI. In contrast, the Fabric Path data center control plane is primarily used for intra-DC communications.

Failure boundary and site independence are preserved in OTV networks because OTV uses a data center control plane protocol to sync MAC addresses between sites and prevent unknown unicast floods. In addition, recent IOS versions can allow unknown unicast floods for certain VLANs, which are unavailable if you use Fabric Path as the DCI technology.

The Role of Software-defined Networking (SDN)

Another potential trade-off between data center control plane scaling, Layer 2 VM mobility, and optimal ingress/egress traffic flow would be software-defined networking ( SDN ). At a basic level, SDN can create direct paths through the network fabric to isolate private networks effectively.

An SDN network allows you to choose the correct forwarding information on a per-flow basis. This per-flow optimization eliminates VLAN separation in the data center fabric. Instead of using VLANs to enforce traffic separation, the SDN controller has a set of policies allowing traffic to be forwarded from a particular source to a destination.

The ACI Cisco borrows concepts of SDN to the data center. It operates over a leaf and spine design and traditional routing protocols such as BGP and IS-IS. However, it brings a new way to manage the data center with new constructs such as Endpoint Groups (EPGs). In addition, no more VLANs are needed in the data center as everything is routed over a Layer 3 core, with VXLAN as the overlay protocol.

**Closing Points: Data Center Design**

Data centers are the backbone of modern technology infrastructure, providing the foundation for storing, processing, and transmitting vast amounts of data. A critical aspect of data center design is the network architecture, which ensures efficient and reliable data transmission within and outside the facility.  1. Scalability and Flexibility

One of the primary goals of data center network design is to accommodate the ever-increasing demand for data processing and storage. Scalability ensures the network can grow seamlessly as the data center expands. This involves designing a network that supports many devices, servers, and users without compromising performance or reliability. Additionally, flexibility is essential to adapt to changing business requirements and technological advancements.

Redundancy and High Availability

Data centers must ensure uninterrupted access to data and services, making redundancy and high availability critical for network design. Redundancy involves duplicating essential components, such as switches, routers, and links, to eliminate single points of failure. This ensures that if one component fails, there are alternative paths for data transmission, minimizing downtime and maintaining uninterrupted operations. High availability further enhances reliability by providing automatic failover mechanisms and real-time monitoring to promptly detect and address network issues.

Traffic Optimization and Load Balancing

Efficient data flow within a data center is vital to prevent network congestion and bottlenecks. Traffic optimization techniques, such as Quality of Service (QoS) and traffic prioritization, can be implemented to ensure that critical applications and services receive the necessary bandwidth and resources. Load balancing is crucial in evenly distributing network traffic across multiple servers or paths, preventing overutilizing specific resources, and optimizing performance.

Security and Data Protection

Data centers house sensitive information and mission-critical applications, making security a top priority. The network design should incorporate robust security measures, including firewalls, intrusion detection systems, and encryption protocols, to safeguard data from unauthorized access and cyber threats. Data protection mechanisms, such as backups, replication, and disaster recovery plans, should also be integrated into the network design to ensure data integrity and availability.

Monitoring and Management

Proactive monitoring and effective management are essential for maintaining optimal network performance and addressing potential issues promptly. The network design should include comprehensive monitoring tools and centralized management systems that provide real-time visibility into network traffic, performance metrics, and security events. This enables administrators to promptly identify and resolve network bottlenecks, security breaches, and performance degradation.

Data center network design is critical in ensuring efficient, reliable, and secure data transmission within and outside the facility. Scalability, redundancy, traffic optimization, security, and monitoring are essential considerations for designing a robust, high-performance network. By implementing best practices and staying abreast of emerging technologies, data centers can build networks that meet the growing demands of the digital age while maintaining the highest levels of performance, availability, and security.

Example Product: Data Center Monitoring

#### Understanding Cisco ThousandEyes

Cisco ThousandEyes is a comprehensive network intelligence platform that offers deep insights into the performance and health of your data center. By leveraging cloud-based agents and on-premises appliances, ThousandEyes provides end-to-end visibility across your entire network, from your data center to the cloud and beyond. This holistic approach allows IT teams to quickly identify and resolve issues, ensuring that your data center operates at peak efficiency.

#### Key Features of Cisco ThousandEyes

One of the standout features of Cisco ThousandEyes is its ability to deliver real-time insights into network performance. With its advanced monitoring capabilities, ThousandEyes can detect anomalies, pinpoint bottlenecks, and provide actionable data to help you optimize your data center operations. Here are some of the key features that make ThousandEyes a valuable asset:

– **End-to-End Visibility:** Monitor the entire network path, from the user to the application, ensuring no blind spots.

– **Cloud and On-Premises Integration:** Seamlessly integrate with both cloud-based and on-premises infrastructure for comprehensive coverage.

– **Real-Time Alerts:** Receive instant notifications of any performance issues, allowing for swift resolution.

– **Detailed Reporting:** Generate in-depth reports that provide insights into network performance trends and potential areas for improvement.

#### Benefits of Using Cisco ThousandEyes for Data Center Performance

Implementing Cisco ThousandEyes in your data center can deliver a range of benefits that contribute to enhanced performance and reliability. Some of the key advantages include:

– **Proactive Issue Resolution:** By identifying potential problems before they escalate, ThousandEyes helps prevent downtime and ensures continuous service delivery.

– **Improved User Experience:** With optimized network performance, users enjoy faster, more reliable access to applications and services.

– **Cost Efficiency:** By reducing downtime and improving operational efficiency, ThousandEyes can help lower overall IT costs.

– **Scalability:** As your business grows, ThousandEyes can scale with you, providing consistent performance monitoring across expanding networks.

#### Real-World Applications

Many organizations have successfully leveraged Cisco ThousandEyes to boost their data center performance. For example, a global financial services company used ThousandEyes to monitor their network and quickly identify a latency issue affecting their trading platform. By resolving the issue promptly, they were able to maintain their competitive edge and deliver a seamless experience to their clients. Similarly, an e-commerce giant utilized ThousandEyes to ensure their website remained responsive during peak shopping seasons, resulting in increased customer satisfaction and sales.

 

Summary: Data Center Network Design

In today’s digital age, data centers are the backbone of countless industries, powering the storage, processing, and transmitting massive amounts of information. However, the efficiency and scalability of data center network design have become paramount concerns. In this blog post, we explored the challenges traditional data center network architectures face and delved into innovative solutions that are revolutionizing the field.

The Limitations of Traditional Designs

Traditional data center network designs, such as three-tier architectures, have long been the industry standard. However, these designs come with inherent limitations that hinder performance and flexibility. The oversubscription of network links, the complexity of managing multiple layers, and the lack of agility in scaling are just a few of the challenges that plague traditional designs.

Enter the Spine-and-Leaf Architecture

The spine-and-leaf architecture has emerged as a game-changer in data center network design. This approach replaces the hierarchical three-tier model with a more scalable and efficient structure. The spine-and-leaf design comprises spine switches, acting as the core, and leaf switches, connecting directly to the servers. This non-blocking, high-bandwidth architecture eliminates oversubscription and provides improved performance and scalability.

Embracing Software-Defined Networking (SDN)

Software-defined networking (SDN) is another revolutionary concept transforming data center network design. SDN abstracts the network control plane from the underlying infrastructure, allowing centralized network management and programmability. With SDN, data center administrators can dynamically allocate resources, optimize traffic flows, and respond rapidly to changing demands.

The Rise of Network Function Virtualization (NFV)

Network Function Virtualization (NFV) complements SDN by virtualizing network services traditionally implemented using dedicated hardware appliances. By decoupling network functions, such as firewalls, load balancers, and intrusion detection systems, from specialized hardware, NFV enables greater flexibility, scalability, and cost savings in data center network design.

Conclusion:

The landscape of data center network design is undergoing a significant transformation. Traditional architectures are being replaced by more scalable and efficient models like the spine-and-leaf architecture. Moreover, concepts like SDN and NFV empower administrators with unprecedented control and flexibility. As technology evolves, data center professionals must embrace these innovations and stay at the forefront of this paradigm shift.