Removing State from Network Functions

 

 

Network Functions

In the world of modern communication, network functions play a crucial role in enabling seamless connectivity and efficient data flow. From the moment we send an email or stream a video, to the secure transmission of sensitive information, network functions silently work behind the scenes, ensuring our digital experiences are smooth and uninterrupted. In this blog post, we will explore the concept of network functions, their significance, and how they contribute to the functioning of today’s interconnected world.

Network functions refer to the tasks performed by various networking devices and software components to manage, control, and secure data traffic within a network. These functions encompass a wide range of operations, including routing, switching, firewalls, load balancing, network address translation (NAT), and intrusion detection systems (IDS). Each of these functions serves a specific purpose in optimizing network performance, enhancing security, and facilitating efficient communication between devices.

 

Highlights: Network Functions

  • The Role of Non-Proprietary Hardware

We have seen a significant technological evolution where network functions can run in software on non-proprietary commodity hardware, whether a grey box or white box deployment model. Taking network functions from a physical appliance and putting them into a virtual appliance is only half the battle.

The move to software provides on-demand elasticity and scale for network security components, along with quick recovery from failures. However, we are still hindered by one major factor: the state that each network function needs to process.

  • The Tight Coupling of State

We still have the challenges created by the tight coupling of state and processing for each network function, be it virtual firewalls, load balancer scaling, intrusion prevention systems (IPS), or even distributed firewalls placed closer to the workloads for dynamic workload scaling use cases. Having the state tightly coupled to the network functions limits their agility, scalability, and failure recovery.

Compounding this, we have seen an increase in network complexity. The rise of the public cloud and the emergence of hybrid and multi-cloud have made data center connectivity more complicated and critical than ever.

 



Network Functions

Key Network Functions Discussion points:


  • Discussion on Network Functions.

  • What state is and a description of its types.

  • Example: Stateless Network Functions.

  • The issues with having state: scaling.

  • The issues with having state: failure.

 

For pre-information, you may find the following helpful:

  1. Event Stream Processing
  2. NFV Use Cases
  3. ICMPv6

 


Back to basics with Network Functions

Virtualization

Virtualization (which generally indicates server virtualization when used as a standalone phrase) refers to the abstraction of the application and operating system from the hardware. Similarly, network virtualization is the abstraction of the network endpoints from the physical arrangement of the network. In other words, network virtualization permits you to group or arrange endpoints on a network independent from their physical location.

Network Virtualization refers to forming logical groupings of endpoints on a network. In this case, the endpoints are abstracted from their physical locations so that VMs (and other assets) can look, behave, and be managed as if they are all on the same physical segment of the network.

 

Importance of Network Functions:

Network functions are the backbone of modern communication systems, making them essential for businesses, organizations, and individuals alike. They provide the necessary infrastructure to connect devices, transmit data, and facilitate the exchange of information in a reliable and secure manner. Without network functions, our digital interactions, such as accessing websites, making online payments, or conducting video conferences, would be nearly impossible.

Types of Network Functions:

1. Routing: Routing functions enable the forwarding of data packets between different networks, ensuring that information reaches its intended destination. This process involves selecting the most efficient path for data transmission based on factors like network congestion, bandwidth availability, and network topology.

2. Switching: Switching functions allow data packets to be forwarded within a local network, connecting devices within the same network segment. Switches efficiently direct packets to their intended destination, minimizing latency and optimizing network performance.

3. Firewalls: Firewalls act as a barrier between internal networks and external networks, protecting against unauthorized access and potential security threats. They monitor incoming and outgoing traffic, filtering and blocking any suspicious or malicious data packets.

4. Load Balancing: Load balancing distributes network traffic across multiple servers to prevent overloading and ensure optimal resource utilization. By evenly distributing workloads, load balancing enhances network performance, scalability, and reliability.

5. Network Address Translation (NAT): NAT allows multiple devices within a private network to share a single public IP address. It translates private IP addresses into public ones, enabling communication with external networks while maintaining the security and privacy of internal devices (see the sketch after this list).

6. Intrusion Detection Systems (IDS): IDS monitor network traffic for any signs of intrusion or malicious activity. They analyze data packets, identify potential threats, and generate alerts or take preventive actions to safeguard the network from unauthorized access or attacks.
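
To make the NAT function above concrete, here is a minimal Python sketch of a source-NAT translation table. The addresses, port range, and function names are illustrative assumptions, not any vendor's implementation.

```python
# Minimal sketch of source NAT: inside flows share one public IP and are
# distinguished by the translated source port (all values illustrative).
import itertools

PUBLIC_IP = "203.0.113.10"           # example public address (RFC 5737 range)
_port_pool = itertools.count(40000)  # next free public-side port

nat_table = {}  # (private_ip, private_port) -> public_port

def translate_outbound(private_ip, private_port):
    """Map an inside flow to the shared public IP and a unique port."""
    key = (private_ip, private_port)
    if key not in nat_table:
        nat_table[key] = next(_port_pool)  # dynamic state created here
    return PUBLIC_IP, nat_table[key]

def translate_inbound(public_port):
    """Reverse lookup: find the inside host for a returning packet."""
    for key, public in nat_table.items():
        if public == public_port:
            return key
    return None  # no mapping: failed lookup, the packet is dropped
```

Note that `nat_table` is exactly the kind of dynamic state this post is about: lose it, and every established flow through the device breaks.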

 

What is State?

Before we delve into the potential ways to solve this problem, mainly by introducing stateless network functions, let us first describe the different types of state. There are two: dynamic and static. The network function's processes continuously update the dynamic state, which could be anything from a firewall's connection information to a load balancer's server mappings.

The static state, on the other hand, could include something like pre-configured firewall rules or the IPS signature database. The dynamic state must persist across instance failures and be available to the network functions when scaling in or out. The static state is easier to handle: it can simply be replicated to a network function instance at boot time.
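
To make the distinction concrete, the sketch below (with invented rule and table contents) contrasts the two kinds of state for a firewall: the static rule set can simply be copied to any new instance, while the connection table changes with every packet and must survive failures.

```python
# Sketch: static vs. dynamic state in a firewall (contents invented).

# Static state: pre-configured rules, identical on every instance,
# safe to replicate to a new instance at boot time.
STATIC_RULES = [
    {"action": "allow", "dst_port": 443},
    {"action": "deny",  "dst_port": 23},
]

# Dynamic state: the connection table, updated by every packet processed.
# This is what must persist across failures and scale-in/out events.
connection_table = {}  # (src, dst, sport, dport) -> "SYN_SENT" | "ESTABLISHED"
```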

 

Stateless network functions

Stateless network functions are a new and disruptive technology that decouples the design of network functions into a stateless processing component and a data store layer. There also needs to be an orchestration layer that can monitor the network function instances for load and failure and adjust the number of instances accordingly.

Decoupling the state from a network function enables a more elastic and resilient infrastructure. So how does this work? From a bird's-eye view, the network functions become stateless. The statefulness of the application, such as a stateful firewall, is maintained by storing the state in a separate data store, which provides the resilience of the state. No state is stored on the individual network functions themselves.

 

  • Datastore example

The data store can be, for example, RAMCloud. RAMCloud is a distributed key-value storage system offering high-speed storage for large-scale applications. It is purpose-built for cases where many servers need low-latency access to a durable data store. Because RAMCloud keeps all data in DRAM, the network functions can read RAMCloud objects remotely over the network in as little as 5μs.
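
The sketch below models this stateless pattern: a firewall whose connection table lives in a remote key-value store rather than in local memory, so any instance can handle any packet. The `Datastore` class is a hypothetical stand-in for a RAMCloud (or Redis, etcd, etc.) client; its `get`/`put` methods are assumptions for illustration, not RAMCloud's actual API.

```python
# Sketch: a stateless firewall keeping its connection table in a remote
# key-value store. `Datastore` is a hypothetical client; substitute the
# API of whichever store (RAMCloud, Redis, ...) you actually deploy.

class Datastore:
    """Stand-in for a remote, low-latency, durable key-value store."""
    def __init__(self):
        self._kv = {}
    def get(self, key):
        return self._kv.get(key)   # a ~5us remote read in RAMCloud's case
    def put(self, key, value):
        self._kv[key] = value

store = Datastore()

def process_packet(pkt):
    """Any instance can process any packet: no local state is consulted."""
    flow = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])
    if pkt.get("syn"):
        store.put(flow, "ESTABLISHED")  # write the new flow to the store
        return "forward"
    # Return traffic looks the flow up in the shared store, so failovers
    # and asymmetric paths still find the state.
    return "forward" if store.get(flow) else "drop"
```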

 

Advantages of stateless network functions

Stateless network functions may not be helpful in every case, but they are valid for standard network functions that can be redesigned statelessly, such as a stateful firewall, an intrusion prevention system, a network address translator, or a load balancer. Removing the state and placing it in a data store brings many advantages to network management.

First, elasticity: as the state is accessed via a data store, a new instance can be launched and traffic immediately directed to it. Second, resilience: a new instance can be spawned instantaneously upon failure. Finally, as any instance can handle any individual packet, packets traversing different paths do not suffer from asymmetric and multi-path routing issues.

 

Problems with having state: Failure

The majority of network designs have redundancy built in. It sounds easy: when one data center fails, the secondary takes over. When the data center interconnect (DCI) is configured correctly, everything should work upon failover, correct?

Let’s not forget about one little thing called state in a design with a firewall in each data center. The network address translation (NAT) function in the primary data center stores the mappings for two flows; let’s call them F1 and F2. Upon failure, the second firewall in the other data center takes over, and traffic is directed to the new firewall. However, packets from flows F1 and F2 will find no state on the second firewall.

This results in a failed lookup; existing connections will time out, causing application failure. Asymmetric routing causes the same problem: if a firewall has established state for a client-to-server connection (from the SYN packet) and the return SYN-ACK passes through a different firewall, the packet will result in a failed lookup and get dropped.

Some have tried to design distributed active-active firewalls to solve the Layer 3 issues and asymmetrical traffic flow over stateful firewalls. The solution looks perfect: simply configure both wide area network (WAN) routers to advertise the same IP prefix to the outside world.

This attracts inbound traffic and passes it through the nearest firewall. Nice and easy. The active-active firewalls would exchange flow information, solving the asymmetrical flow problems. In practice, though, distributing active-active firewall state across data centers works better in PowerPoint than in real life.

 

Problems with having the state: Scaling

The tight coupling of state can also cause problems with the scaling of network functions. Scaling out NAT functions has the same effect as a NAT box failure: packets from flows that originated on a different instance, once directed to the new instance, will result in failed lookups.
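
As a toy illustration of that failed lookup (all values invented): two NAT instances each keep a local table, and a flow redirected from instance A to a freshly scaled-out instance B is simply not found.

```python
# Toy illustration: with per-instance state, redirecting an existing
# flow to a newly added instance causes a failed lookup.

nat_a = {("10.0.0.5", 5000): 40001}  # instance A owns flow F1
nat_b = {}                           # freshly scaled-out instance B

def lookup(instance_table, flow):
    return instance_table.get(flow)  # None == failed lookup -> drop

flow_f1 = ("10.0.0.5", 5000)
assert lookup(nat_a, flow_f1) == 40001  # fine while traffic stays on A
assert lookup(nat_b, flow_f1) is None   # redirected to B: connection breaks
```

With the stateless design described above, both instances would consult the same data store, and the second lookup would succeed.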

 

Conclusion:

Network functions form the foundation of modern communication systems, enabling us to connect, share, and collaborate in a digitized world. By performing vital tasks such as routing, switching, firewalls, load balancing, NAT, and IDS, network functions ensure the smooth and secure flow of data across networks. Understanding the significance of these functions is crucial for businesses and individuals to harness the full potential of the interconnected world we live in today.

 

Virtual Data Center Design

In today’s rapidly evolving digital landscape, data centers support the growing demand for storage, computing power, and connectivity. However, traditional data centers face challenges like limited scalability, high operating costs, and physical space constraints. Many organizations are turning to virtual data centers as a more efficient and flexible solution to address these issues. In this blog post, we will delve into the concept of virtual data centers, exploring their benefits, key components, and significance in shaping the future of data management.

A virtual data center (VDC) is a software-defined infrastructure that leverages virtualization technologies to pool and abstract computing resources, storage, and networking capabilities. Unlike traditional data centers, which rely on physical servers and hardware, VDCs enable organizations to create, manage, and provision virtual machines (VMs) and applications on demand.

 

  • Gaining Efficiency

Deploying multiple tenants on a shared infrastructure is far more efficient than having single tenants per physical device. With a virtualized infrastructure, each tenant requires isolation from all other tenants sharing the same physical infrastructure.

For a data center network design, each network container requires path isolation, for example, 802.1Q on a shared Ethernet link between two switches, and device virtualization at the different network layers, for example, a Cisco Application Control Engine ( ACE ) or Cisco Firewall Services Module ( FWSM ) virtual context. To implement independent paths with this type of data center design, you can create a Virtual Routing and Forwarding ( VRF ) instance per tenant and map the VRF to Layer 2 segments.

  • Example: Virtual Data Center Design with Cisco

More recently, the Cisco ACI network has enabled segmentation based on logical security zones known as endpoint groups, where security constructs known as contracts govern communication between endpoint groups. Cisco ACI still uses VRFs, but they are used differently. Then we have the Ansible architecture, which can be used with Ansible variables to automate the deployment of the network and security constructs for the virtual data center. This brings consistency and eliminates human error.

 

Before you proceed, you may find the following posts helpful for pre-information:

  1. Context Firewall
  2. Virtual Device Context
  3. Dynamic Workload Scaling
  4. ASA Failover
  5. Data Center Design Guide

 

Data Center Network Design

Key Virtual Data Center Design Discussion Points:


  • Introduction to Virtual Data Center Design and what is involved.

  • Highlighting the details of VRF-lite and how it works.

  • Critical points on the use of virtual contexts and how to implement them.

  • A final note on load distribution and application tier separation.

 

Back to basics with data center types.

Numerous kinds of data centers and service models are available. Their categorization relies on several critical criteria, such as whether one or many organizations own them, how they fit into the topology of other data centers, and what technologies they use for computing and storage. The main types of data centers include:

  • Enterprise data centers.
  • Managed services data centers.
  • Colocation data centers.
  • Cloud data centers.

You may build and maintain your own hybrid cloud data centers, lease space within colocation facilities, also known as colos, consume shared compute and storage services, or even use public cloud-based services.

 

Benefits of Virtual Data Centers:

1. Scalability: Virtual data centers offer unparalleled scalability, allowing businesses to quickly expand or contract their infrastructure based on their evolving needs. With the ability to provision additional resources in real time, organizations can quickly adapt to changing workloads, ensuring optimal performance and reducing downtime.

2. Cost Efficiency: Virtual data centers significantly reduce operating costs by eliminating the need for physical servers and reducing power consumption. Consolidating multiple VMs onto a single physical server optimizes resource utilization, improving cost efficiency and lowering hardware requirements.

3. Flexibility: Virtual data centers allow organizations to deploy and manage applications across multiple cloud platforms or on-premises infrastructure. This hybrid cloud approach enables seamless workload migration, disaster recovery, and improved business continuity.

Critical Components of Virtual Data Centers:

1. Hypervisor: At the core of a virtual data center lies the hypervisor, a software layer that partitions physical servers into multiple VMs, each running its operating system and applications. Hypervisors enable the efficient utilization of hardware resources and facilitate VM management.

2. Software-Defined Networking (SDN): SDN allows organizations to define and manage their network infrastructure through software, decoupling network control from physical devices. This technology enhances flexibility, simplifies network management, and enables greater security and agility within virtual data centers.

3. Virtual Storage: Virtual storage technologies, such as software-defined storage (SDS), enable the pooling and abstraction of storage resources. This approach allows for centralized management, improved data protection, and simplified storage provisioning in virtual data centers.

 

Data center network design: VRF-lite

VRF information from a static or dynamic routing protocol is carried hop-by-hop across the Layer 3 domain. Multiple VLANs in the Layer 2 domain are mapped to the corresponding VRFs. VRF-lite is therefore known as a hop-by-hop virtualization technique. The VRF instance logically separates tenants on the same physical device from a control plane perspective.

From a data plane perspective, the VLAN tags provide path isolation on each point-to-point Ethernet link that connects to the Layer 3 network. VRFs provide per-tenant routing and forwarding tables and ensure no server-server traffic is permitted unless explicitly allowed.
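
Conceptually, VRF-lite amounts to one independent routing table per tenant, selected by the 802.1Q tag on the incoming link. A rough Python model, with made-up tenant names and routes, might look like the following; note how both tenants can use the same prefix without conflict.

```python
# Rough model of VRF-lite: one routing table per tenant, selected by the
# 802.1Q VLAN tag on the incoming link (tenants and routes are made up).

vlan_to_vrf = {101: "tenant-red", 102: "tenant-blue"}

vrf_tables = {
    "tenant-red":  {"10.1.0.0/16": "eth1.101"},  # per-tenant routes
    "tenant-blue": {"10.1.0.0/16": "eth1.102"},  # same prefix, no conflict
}

def route(vlan_tag, prefix):
    """Control-plane separation: a lookup never crosses tenant tables."""
    vrf = vlan_to_vrf[vlan_tag]
    return vrf_tables[vrf].get(prefix)

assert route(101, "10.1.0.0/16") == "eth1.101"
assert route(102, "10.1.0.0/16") == "eth1.102"
```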


 

Service Modules in Active/Active Mode

Multiple virtual contexts

The service layer must also be virtualized for tenant separation. The network services layer can be designed with a dedicated Data Center Services Node ( DSN ) or external physical appliances connected to the core/aggregation. The Cisco DSN data center design cases use virtual device contexts (VDC), virtual PortChannel (vPC), virtual switching system (VSS), VRF, and Cisco FWSM and Cisco ACE virtualization. 

This post will look at a DSN as a self-contained Catalyst 6500 series with ACE and firewall service modules. Virtualization at the services layer can be accomplished by creating separate contexts representing separate virtual devices. Multiple contexts are similar to having multiple standalone devices.

The Cisco Firewall Services Module ( FWSM ) provides a stateful inspection firewall service within a Catalyst 6500. In addition, it offers separation through a virtual security context that can be transparently implemented as Layer 2 or as a router “hop” at Layer 3. The Cisco Application Control Engine ( ACE ) module also provides a range of load-balancing capabilities within a Catalyst 6500.

 

FWSM features:

  • Route health injection (RHI)
  • Virtualization (context and resource allocation)
  • Application inspection
  • Redundancy (active-active context failover)
  • Security and inspection
  • Network Address Translation (NAT) and Port Address Translation (PAT)
  • URL filtering
  • Layer 2 and 3 firewalling

ACE features:

  • Route health injection (RHI)
  • Virtualization (context and resource allocation)
  • Probes and server farm (service health checks and load-balancing predictor)
  • Stickiness (source IP and cookie insert)
  • Load balancing (protocols, stickiness, FTP inspection, and SSL termination)
  • NAT
  • Redundancy (active-active context failover)
  • Protocol inspection

 

You can offer high availability and efficient load distribution with a context design. The first FWSM and ACE are primary for the first context and standby for the second context. The second FWSM and ACE are primary for the second context and standby for the first context. Traffic is not automatically load-balanced equally across the contexts. Additional configuration steps are needed to configure different subnets in specific contexts.

 

Diagram: Virtual Firewall and Load Balancing

 

Compute separation

Traditional security architecture placed the security device in a central position, either in “transparent” or “routed” mode. All inter-host traffic had to be routed and filtered by the firewall device located at the aggregation layer before communication could occur. This works well in lightly virtualized environments with few VMs. Still, a high-density model ( a heavily virtualized environment ) forces us to reconsider firewall scale requirements at the aggregation layer.

It is recommended to deploy virtual firewalls at the access layer to address the challenge of VM density and the ability to move VMs while keeping their security policies. This creates intra and inter-tenant zones and enables finer security granularity within single or multiple VLANs.

 

Application tier separation

The network-centric model relies on VLAN separation for three-tier application deployments: each tier should have its own VLAN within one VRF instance. If VLAN-to-VLAN communication needs to occur, traffic must be routed via a default gateway, where security policies can enforce traffic inspection or redirection.

The vShield ( vApp ) virtual appliance can inspect inter-VM traffic among ESX hosts, and Layer 2, 3, 4, and 7 filters are supported. A drawback of this approach is that the firewall can become a choke point. In contrast, the server-centric model uses separate VM vNICs to daisy-chain tiers together.

 

Data center network design with Security Groups

The concept of security groups replaces subnet-level firewalls with per-VM firewalls/ACLs. With this approach, there is no traffic tromboning and there are no single choke points. It can be implemented with CloudStack, OpenStack ( Neutron plugin extension ), and VMware vShield Edge. Security groups are elementary; you assign VMs to groups and specify filters between groups.

Security groups are suitable for policy-based filtering but don't cover functionality where data plane state is required, such as protection against replay attacks. Security groups give you echo-based functionality, which should be good enough for current TCP stacks that have been hardened over the last 30 years. But if you require full stateful inspection and do not regularly patch your servers, you should implement a complete stateful firewall.
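
A simplified sketch of the security-group idea, with invented group names and ports: membership maps VMs to groups, filters are defined between groups, and enforcement happens per VM, so there is no central choke point to trombone through.

```python
# Simplified security-group evaluation (group names and ports invented):
# VMs are assigned to groups, and filters are specified between groups.

vm_groups = {"vm-web-1": "web", "vm-db-1": "db"}

# (source group, destination group) -> allowed destination ports
group_filters = {("web", "db"): {3306}}

def permitted(src_vm, dst_vm, dst_port):
    """Per-VM policy check: consult the group filter, keep no flow state."""
    pair = (vm_groups[src_vm], vm_groups[dst_vm])
    return dst_port in group_filters.get(pair, set())

assert permitted("vm-web-1", "vm-db-1", 3306)    # web tier may reach the DB
assert not permitted("vm-db-1", "vm-web-1", 80)  # reverse path is filtered
```

Note the absence of a connection table: this is exactly why, as noted above, security groups give you policy-based filtering rather than full stateful inspection.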

 


Active Active Data Center Design

In today's digital age, where businesses heavily rely on uninterrupted access to their applications and services, data center design plays a pivotal role in ensuring high availability. One such design approach is the active-active design, which offers redundancy and fault tolerance to mitigate the risk of downtime. This blog post will explore the active-active data center design concept and its benefits.

Active-active data center design refers to a configuration where two or more data centers operate simultaneously, sharing the load and providing redundancy for critical systems and applications. Unlike traditional active-passive setups, where one data center operates in standby mode, the active-active design ensures that both are fully active and capable of handling the entire workload.


Highlights: Active Active Data Center

Redundant data centers

Redundant data centers are essentially two or more data centers in different physical locations. This enables organizations to move their applications and data to another data center if they experience an outage. It also allows for load balancing and scalability, ensuring the organization’s services remain available.

Redundant data centers are generally located in geographically dispersed locations. This ensures that if one of the data centers experiences an issue, the other can take over, thus minimizing downtime. These data centers should also be connected via a high-speed network connection, such as a dedicated line or virtual private network, to allow seamless data transfers between the locations.

Related: Before you proceed, you may find the following useful:

  1. Data Center Topologies
  2. LISP Protocol
  3. Data Center Network Design
  4. ASA Failover
  5. LISP Hybrid Cloud
  6. LISP Control Plane

Active active data center

Drivers for this design include:

  • Increased dependence on East-West traffic
  • Clustered applications
  • Multi-tenancy
  • Business continuity
  • Workload mobility

Back to Basics: Active-active Data Center Design Cisco.

At its core, an active active data center is based on fault tolerance, redundancy, and scalability principles. This means that the active data center should be designed to withstand any hardware or software failure, have multiple levels of data storage redundancy, and scale up or down as needed.

The data center also provides an additional layer of security. It is designed to protect data from unauthorized access and malicious attacks. It should also be able to detect and respond to any threats quickly and in a coordinated manner.

To ensure that the data center is functioning correctly, it is essential to have a comprehensive monitoring and management system in place. This system should be designed to track the performance of the data center, detect any problems, and provide the necessary alerting mechanisms. It should also provide insights into how the data center operates so that any necessary changes can be made.

Cisco Validated Design

Cisco has validated this design, which is freely available on the Cisco site. In summary, they have tested a variety of combinations, such as VSS-VSS, VSS-vPC, and vPC-vPC, and validated the design with 200 Layer 2 VLANs and 100 SVIs, or 1,000 VLANs and 1,000 SVIs with static routing.

At the time of writing, the M series for the Nexus 7000 supports native encryption of Ethernet frames through the IEEE 802.1AE standard. This implementation uses Advanced Encryption Standard ( AES ) cipher and a 128-bit shared key.

Lab Guide: Cisco ACI

In the following lab guide, we demonstrate Cisco ACI. To extend Cisco ACI, we have different designs, for example, multi-site and multi-pod. This type of design overcomes many data center challenges that we will discuss in this post, such as extending Layer 2 networks.

One crucial value of the Cisco ACI is the COOP database that maps endpoints in the network. The following screenshots show the synchronized COOP database across spines, even in different data centers. Notice that the bridge domain VNID is mapped to the MAC address. The COOP database is unique to the Cisco ACI.

Diagram: COOP database

The Challenge: Layer 2 is Weak.

The challenge of data center design is that “Layer 2 is weak and IP is not mobile.” In the past, best practices recommended that networks from distinct data centers be connected through Layer 3 ( routing ), isolating the known Layer 2 turmoil: it is generally recommended to have Layer 3 connections with path separation through Multi-VRF, P2P VLANs, or MPLS/VPN, along with a modular building-block data center design. However, the business is driving the application requirements, changing the connectivity requirements between data centers. The need for an active-active data center has been driven by the following.

Yet, some applications cannot function over a Layer 3 environment. For example, most geo-clusters require Layer 2 adjacency between their nodes, whether for heartbeat and connection ( status and control synchronization ) state information or for the requirement to share virtual IP and MAC addresses to facilitate traffic handling in case of failure. Some clustering products ( Veritas, Oracle RAC ) do support communication over Layer 3, but they are a minority and don’t represent the general case.

Defining active data centers

The term active-active refers to using at least two data centers where both can service an application at any time, so each functions as an active application site. The demand for active-active data center architecture is to accomplish seamless workload mobility and enable distributed applications along with the ability to pool and maximize resources.  

We must first have an active-active data center infrastructure for an active/active application setup. Remember that the network is just one key component of active/active data centers. From a pure network perspective, an active-active DC can be divided into two halves:

  1. Ingress Traffic – inbound traffic
  2. Egress Traffic – outbound traffic
Diagram: Active-active data center scenario. Source is twoearsonemouth

Active Active Data Center and VM Migration

Migrating applications and data to virtual machines (VMs) is becoming increasingly popular as organizations seek to reduce their IT costs and increase the efficiency of their services. VM migration moves existing applications, data, and other components from a physical server to a virtualized environment. This process is becoming increasingly cost-effective and efficient for organizations, eliminating the need for additional hardware, software, and maintenance costs.

For virtual machine migration between data centers to increase application availability, Layer 2 network adjacency between ESX hosts is currently required, and a consistent LUN must be maintained for stateful migration. In other words, if the VM loses its IP address, it will lose its state, and the TCP sessions will drop, resulting in a cold migration ( the VM reboots ) instead of a hot migration ( the VM does not reboot ).

Due to the stretched VLAN requirement, data center architects started to deploy traditional Layer 2 over the DCI and, unsurprisingly, were faced with exciting results. Although flooding and broadcasts are necessary for IP communication in Ethernet networks, they can become dangerous in a DCI environment.

Traffic Tromboning

Traffic tromboning can also form between two stretched data centers, so non-optimal internal routing happens within extended VLANs. Trombones, by their very nature, create a network traffic scalability problem. Addressing this through load balancing among multiple trombones is challenging since their services are often stateful.

Traffic tromboning can affect either ingress or egress traffic. On egress, you can have FHRP filtering to isolate the HSRP partnership and provide an active/active setup for HSRP. On ingress, you can have GSLB, Route Injection, and LISP.

Diagram: Traffic Tromboning. Source is Silvanogai

Cisco Active-active data center design and virtualization technologies

To overcome many of these problems, virtualization technologies can be used for Layer 2 extensions between data centers. They include vPC, VSS, Cisco FabricPath, VPLS, OTV, and LISP with its Internet locator design. In summary, we have different technologies that can be used for LAN extensions, and the primary mediums in which they can be deployed are Ethernet, MPLS, and IP.

    1. Ethernet: VSS and vPC or FabricPath
    2. MPLS: EoMPLS, A-VPLS, and H-VPLS
    3. IP: OTV
    4. LISP

 

Ethernet Extensions and Multi-Chassis EtherChannel ( MEC )

MEC requires protected DWDM or direct fibers and works between two data centers only. It cannot support a multi-datacenter topology, i.e., a full mesh of data centers, but it can support hub-and-spoke topologies.

Previously, a LAG could only terminate on one physical switch. Both VSS-MEC and vPC are port-channeling concepts extending link aggregation to two separate physical switches. This allows for creating Layer 2 topologies based on link aggregation, eliminating the dependency on STP and thus enabling you to scale available Layer 2 bandwidth by bonding the physical links.

Because vPC and VSS create a single connection from an STP perspective, disjoint STP instances can be deployed in each data center. Such isolation can be achieved with BPDU Filtering on the DCI links or Multiple Spanning Tree ( MST ) regions on each site.

At the time of writing, vPC does not support Layer 3 peering, but if you want an L3 link, create one, as this does not need to run on dark fiber or protected DWDM, unlike the extended Layer 2 links.

 

Ethernet Extension and Fabric path

The fabric path allows network operators to design and implement a scalable Layer 2 fabric, allowing VLANs to help reduce the physical constraints on server location. It provides a high availability design with up to 16 active paths at layer 2, each path a 16-member port channel for Unicast and Multicast.

This enables MSDC networks to have flat topologies, separating nodes by a single hop ( equidistant endpoints ). Cisco has not targeted FabricPath as a primary DCI solution as it does not have specific DCI functions compared to OTV and VPLS.

Its primary purpose is for Clos-based architectures. But if you have the requirement to interconnect 3 or more sites, the Fabric path is a valid solution when you have short distances between your DCs via high-quality point-to-point optical transmission links.

Your WAN links must support Remote Port Shutdown and microflapping protection. By default, OTV and VPLS should be the first solutions considered as they are Cisco-validated designs with specific DCI features, e.g., OTV can flood unknown unicast for particular VLANs.

Diagram: FabricPath. Source is Cisco

 

IP Core with Overlay Transport Virtualization ( OTV ).

OTV provides a dynamic encapsulation with multipoint connectivity of up to 10 sites ( NX-OS 5.2 supports 6 sites, and NX-OS 6.2 supports 10 sites ). OTV, also known as Over-The-Top virtualization, is a specific DCI technology enabling Layer 2 extension across data center sites by employing a MAC in IP encapsulation with built-in loop prevention and failure boundary preservation.

There is no data plane learning. Instead, the overlay control plane ( Layer 2 IS-IS ) on the provider’s network facilitates all unicast and multicast learning between sites. OTV has been supported on the Nexus 7000 since the 5.0 NX-OS release and on the ASR 1000 since the 3.5 XE release. OTV as a DCI has robust high availability: most failures converge in under a second, with only extreme and very unlikely failures, such as a device going down, resulting in less than 5 seconds.

 

Locator/ID Separation Protocol ( LISP )

The Locator/ID Separation Protocol ( LISP ) has many applications and, as the name suggests, separates the location and the identifier of the network hosts, making it possible for VMs to move across subnet boundaries while still retaining their IP address and enabling advanced triangular routing designs.

LISP works well when you have to move workloads and also when you have to distribute workloads across data centers, making it a perfect complementary technology for an active-active data center design. It provides you with the following:

  • a) Global IP mobility across subnets for disaster recovery and cloud bursting ( without LAN extension ) and optimized routing across extended subnet sites.
  • b) Routing with extended subnets for active/active data centers and distributed clusters ( with LAN extension).
Diagram: LISP Networking. Source is Cisco

LISP answers the problems with ingress and egress traffic tromboning. It has a location mapping table, so when a host move is detected, updates are automatically triggered, and ingress routers (ITRs or PITRs) send traffic to the new location. From the perspective of ingress path flow inbound on the WAN, LISP can also answer our problems with BGP in controlling ingress flows. Without LISP, we are limited to specific route filtering: if you have a PI prefix consisting of a /16 and break it up to advertise 4 x /18s, you may still get poor ingress load balancing on your DC WAN links; even if you break it up into 8 x /19s, the results might still be unfavorable.

LISP works differently than BGP because a LISP proxy provider advertises this /16 for you ( you don’t advertise the /16 from your DC WAN links ) and sends traffic 50:50 to your DC WAN links. LISP can get a near-perfect 50:50 split at the DC edge.
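
That 50:50 behavior can be pictured as a map-cache entry whose EID prefix resolves to two equally weighted RLOCs (the two DC WAN edges). The sketch below is a conceptual model with example addresses, not a real LISP implementation.

```python
# Conceptual model of a LISP map-cache entry: the EID prefix maps to two
# RLOCs (the DC WAN edges) with equal weights, giving the 50:50 split.
import random

map_cache = {
    "10.16.0.0/16": [                            # example EID prefix
        {"rloc": "198.51.100.1", "weight": 50},  # DC1 WAN edge
        {"rloc": "198.51.100.2", "weight": 50},  # DC2 WAN edge
    ]
}

def pick_rloc(eid_prefix):
    """An ITR picks an egress locator in proportion to the weights."""
    rlocs = map_cache[eid_prefix]
    return random.choices(
        [r["rloc"] for r in rlocs],
        weights=[r["weight"] for r in rlocs],
    )[0]

# Over many packets, the two RLOCs are chosen roughly equally often.
```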

Benefits of Active-Active Data Center Design:

1. Enhanced Redundancy: With active-active design, organizations can achieve higher levels of redundancy by distributing the workload across multiple data centers. This redundancy ensures that even if one data center experiences a failure or maintenance downtime, the other data center seamlessly takes over, minimizing the impact on business operations.

2. Improved Performance and Scalability: Active-active design enables organizations to scale their infrastructure horizontally by distributing the load across multiple data centers. This approach ensures that the workload is evenly distributed, preventing any single data center from becoming a performance bottleneck. It also allows businesses to accommodate increasing demands without compromising performance or user experience.

3. Reduced Downtime: The active-active design significantly reduces the risk of downtime compared to traditional architectures. In the event of a failure, the workload can be immediately shifted to the remaining active data center, ensuring continuous availability of critical services. This approach minimizes the impact on end-users and helps organizations maintain their reputation for reliability.

4. Disaster Recovery Capabilities: Active-active data center design provides a robust disaster recovery solution. Organizations can ensure that their critical systems and applications remain operational despite a catastrophic failure at one location by having multiple, geographically distributed data centers. This design approach minimizes the risk of data loss and provides a seamless failover mechanism.

Implementation Considerations:

Implementing an active-active data center design requires careful planning and consideration of various factors. Here are some key considerations:

1. Network Design: A robust and resilient network infrastructure is crucial for active-active data center design. Implementing load balancers, redundant network links, and dynamic routing protocols can help ensure seamless failover and optimal traffic distribution.

2. Data Synchronization: Organizations need to implement effective data synchronization mechanisms to maintain data consistency across multiple data centers. This may involve deploying real-time replication, distributed databases, or file synchronization protocols.

3. Application Design: Applications must be designed to be aware of the active-active architecture. They should be able to distribute the workload across multiple data centers and seamlessly switch between them in case of failure. Application-level load balancing and session management become critical in this context.

Active-active data center design offers organizations a robust solution for high availability and fault tolerance. By distributing the workload across multiple data centers, businesses can ensure uninterrupted access to critical systems and applications. The enhanced redundancy, improved performance, reduced downtime, and disaster recovery capabilities make active-active design an ideal choice for organizations striving to provide seamless and reliable services in today’s digital landscape.

Data Center Network Design

Data centers are crucial in today’s digital landscape, serving as the backbone of numerous businesses and organizations. A well-designed data center network ensures optimal performance, scalability, and reliability. This blog post will explore the critical aspects of data center network design and its significance in modern IT infrastructure.

Efficient data center network design is critical for meeting the growing demands of complex applications, high data traffic, and rapid data processing. It enables seamless connectivity, improves application performance, and enhances user experience. A well-designed network also ensures data security, disaster recovery, and efficient resource utilization.


Highlights: Data Center Network Design

The goal of data center design and interconnection networks is to transport end-user traffic from A to B without any packet drops, yet the metrics we use to achieve this goal can be very different. The data center is evolving and progressing through various topology and technology changes, resulting in various data center network designs.

The new data center control planes we are seeing today, such as FabricPath, LISP, TRILL, and VXLAN, are being driven by a change in the end user’s requirements; the application has changed.

These new technologies may address new challenges, yet the fundamental question of where to create the Layer 2/Layer 3 boundary and the need for Layer 2 in the access layer remains the same. The question stays the same, yet the technologies available to address this challenge have evolved.

The use of Open Networking

We also have the Open Networking Foundation ( ONF ) with open networking. Open networking describes a network that uses open standards and commodity hardware. So, consider open networking in terms of hardware and software. Unlike a vendor approach like Cisco, this gives you much more choice with what hardware and software you use to make up and design your network.

Related: Before you proceed, you may find the following useful:

  1. ACI Networks
  2. IPv6 Attacks
  3. SDN Data Center
  4. Active Active Data Center Design
  5. Virtual Switch

Data Center Control Plane

Key Data Center Network Design Discussion Points:


  • Introduction to data center network design and what is involved.

  • Highlighting the details of VLANs and virtualization.

  • Technical details on the issues of Layer 2 in data centers. 

  • Scenario: Cisco FabricPath and DFA.

  • Details on overlay networking and Cisco OTV.

The Rise of Overlay Networking

What has the industry introduced to overcome these limitations and address the new challenges? – Network virtualization and overlay networking. In its simplest form, an overlay is a dynamic tunnel between two endpoints that enables Layer 2 frames to be transported between them. In addition, these overlay-based technologies provide a level of indirection that enables switch table sizes to not increase in the order of the number of supported end-hosts.

Today’s overlays are Cisco FabricPath, TRILL, LISP, VXLAN, NVGRE, OTV, PBB, and Shortest Path Bridging. They are essentially virtual networks that sit on top of a physical network, with the physical network often unaware of the virtual layer above it.

Lab Guide: VXLAN

The following lab guide displays a VXLAN network. We are running VXLAN in multicast mode. Multicast VXLAN is a variant of VXLAN that utilizes multicast-based IP multicast for transmitting overlay network traffic. VXLAN is an encapsulation protocol that extends Layer 2 Ethernet networks over Layer 3 IP networks.

Using multicast enables efficient and scalable delivery of broadcast, unknown unicast, and multicast (BUM) traffic within the overlay network. Notice the multicast group of 239.0.0.10 and the route for 239.0.0.10 forwarding out the tunnel interface. We have multicast enabled on all Layer 3 interfaces, including the core, which consists of Spine A and Spine B.
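
For reference, the VXLAN encapsulation itself is simple: an 8-byte header carrying a 24-bit VNID, transported over UDP (destination port 4789), with BUM traffic sent to the multicast group instead of a unicast VTEP. A small sketch of the header layout per RFC 7348 follows; the VNID value is arbitrary.

```python
# Sketch: building the 8-byte VXLAN header (RFC 7348). The VNI occupies
# 24 bits; flags bit 0x08 marks the VNI field as valid.
import struct

def vxlan_header(vni):
    flags = 0x08 << 24                          # I-flag set, reserved bits zero
    return struct.pack("!II", flags, vni << 8)  # VNI sits in the upper 24 bits

hdr = vxlan_header(10000)
assert len(hdr) == 8
# On the wire: outer IP/UDP (dst port 4789, or the multicast group for
# BUM traffic, e.g., 239.0.0.10 above) + this header + the original frame.
```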

Diagram: Multicast VXLAN

Traditional Data Center Network Design

How do routers create a broadcast domain boundary? In the traditional core, distribution, and access model, the access layer is Layer 2, and servers attached to the same access layer are in the same IP subnet and VLAN. The same access VLAN will span the access layer switches for east-to-west traffic, and any outbound traffic goes via a First Hop Redundancy Protocol ( FHRP ) like the Hot Standby Router Protocol ( HSRP ).

Servers in different VLANs are isolated from each other and cannot communicate directly; inter-VLAN communication requires a Layer 3 device. Virtualization’s humble beginnings started with VLANs, which were used to segment traffic at Layer 2. It was common to find single VLANs spanning an entire data center fabric.

 

VLAN and Virtualization

The virtualization side of VLANs comes from two servers physically connected to different switches. Assuming the VLAN spans both switches, the two servers can communicate with each other as if on the same segment. Each VLAN can be defined as a broadcast domain in a single Ethernet switch or shared among connected switches.

Whenever a switch interface belonging to a VLAN receives a broadcast frame ( destination MAC is ffff.ffff.ffff), the device must forward this frame to all other ports defined in the same VLAN.
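
That flooding rule can be captured in a few lines. This toy model, with made-up port-to-VLAN assignments, floods a broadcast frame to every other port in the same VLAN; it hints at why the behavior gets dangerous as the VLAN grows.

```python
# Toy model of the broadcast rule above: a frame to ff:ff:ff:ff:ff:ff is
# flooded to every other port in the same VLAN (assignments made up).

port_vlan = {1: 10, 2: 10, 3: 20, 4: 10}  # port -> access VLAN

def forward(ingress_port, dst_mac):
    vlan = port_vlan[ingress_port]
    if dst_mac == "ff:ff:ff:ff:ff:ff":
        return [p for p, v in port_vlan.items()
                if v == vlan and p != ingress_port]
    return []  # unicast MAC-table lookup omitted for brevity

assert forward(1, "ff:ff:ff:ff:ff:ff") == [2, 4]  # VLAN 20 is untouched
```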

This approach is straightforward in design and is almost like a plug-and-play network. The first question is, why not connect everything in the data center into one large Layer 2 broadcast domain? Layer 2 is a plug-and-play network, so why not?

 

The issues of Layer 2

The reason is that there are many scaling issues in large Layer 2 networks. Layer 2 networks don’t have controlled or efficient network discovery protocols. The Address Resolution Protocol ( ARP ) is used to locate end hosts and uses broadcasts and unicast replies. A single host might not generate much traffic, but imagine what would happen if 10,000 hosts were connected to the same broadcast domain. VLANs spanning an entire data center fabric can bring a lot of instability due to loops and broadcast storms.

 

No hierarchy in MAC addresses

There is also no hierarchy in MAC addressing. Unlike Layer 3 networks, where you can have summarization and hierarchical addressing, MAC addresses are flat. Attaching several thousand hosts to a single broadcast domain will create large forwarding information tables.

Because end hosts are potentially not static, they are likely to be attached to and removed from the network at regular intervals, creating a high rate of change in the control plane. You can, of course, have a large Layer 2 data center with multiple tenants if they don’t need to communicate with each other.

The shared services requirements, such as WAAS or load balancing, can be solved by spinning up the service VM in the tenant’s Layer 2 broadcast domain. This design will hit scaling and management issues. There is a consensus to move from a Layer 2 design to a more robust and scalable Layer 3 design.

But why is there still a need for Layer 2 in the data center topologies? One solution is Layer 2 VPN with EVPN. But first, let us have a look at Cisco DFA.

The Requirement for Layer 2 in Data Center Network Design

  • Servers that perform the same function might need to communicate with each other due to a clustering protocol or simply as part of the application’s inner functions. If the communication is clustering protocol heartbeats or some server-to-server application packets that are not routable, then you need this communication layer to be on the same VLAN, i.e., Layer 2 domain, as these types of packets are not routable and don’t understand the IP layer.

  • Stateful devices such as firewalls and load balancers need Layer 2 adjacency as they constantly exchange connection and session state information.

  • Dual-homed servers: A single server with two NICs, one to each switch, will require Layer 2 adjacency if the adapter has a standby interface that uses the same MAC and IP addresses after a failure. In this situation, the active and standby interfaces must be on the same VLAN and use the same default gateway.

  • Suppose your virtualization solutions cannot handle Layer 3 VM mobility. In that case, you may need to stretch VLANs between PODS / Virtual Resource Pools or even data centers so you can move VMs around the data center at Layer 2 ( without changing their IP address ).

 

Data Center Design and Cisco DFA

Cisco went one step further and introduced a data center fabric with Dynamic Fabric Automation ( DFA ), similar to Juniper QFabric, which offers you both Layer 2 switching and Layer 3 routing at the access layer / ToR. Firstly, it runs FabricPath ( IS-IS for Layer 2 connectivity ) in the core, which gives you optimal Layer 2 forwarding between all the edges.

Then they configure the same Layer 3 address on all the edges, which gives you optimal Layer 3 forwarding across the whole Fabric.

On the edge, you can have Layer 3 leaf switches, for example, the Nexus 6000 series, or integrate with Layer 2-only devices like the Nexus 5500 series or the Nexus 1000v. You can also connect external routers, UCS, or FEX to the fabric. In addition to running IS-IS as the data center control plane, DFA uses MP-iBGP, with some spine nodes acting as route reflectors to exchange IP forwarding information.

Cisco FabricPath

DFA also employs a Cisco FabricPath technique called “Conversational Learning.” The first packet triggers a full RIB lookup, and the subsequent packets are switched in the hardware-implemented switching cache.

This technology provides Layer 2 mobility throughout the data center while providing optimal traffic flow using Layer 3 routing. Cisco commented, “DFA provides a scale-out architecture without congestion points in the network while providing optimized forwarding for all applications.”
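
Conversational learning can be pictured as a flow cache sitting in front of a full routing-table lookup. The rough sketch below, with an invented RIB entry, shows the first packet of a flow taking the slow path and subsequent packets hitting the cache.

```python
# Rough sketch of conversational learning: the first packet of a flow
# triggers a full RIB lookup; later packets hit the switching cache.

rib = {"10.2.0.0/16": "leaf-2"}  # the full routing table (invented entry)
flow_cache = {}                  # flow -> next hop (the "hardware" cache)

def forward(flow, dst_prefix):
    if flow in flow_cache:
        return flow_cache[flow]  # fast path: cached conversation
    next_hop = rib[dst_prefix]   # slow path: full lookup on first packet
    flow_cache[flow] = next_hop
    return next_hop
```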

Terminating Layer 3 at the access / ToR has clear advantages and disadvantages. One benefit is reducing the size of the broadcast domain, but this comes at the cost of reducing the mobility domain across which VMs can be moved.

Terminating Layer 3 at the access layer can also result in sub-optimal routing, because there will be hairpinning or traffic tromboning of cross-subnet traffic, taking multiple and unnecessary hops across the data center fabric.

 

The role of the Cisco Fabricpath

Cisco FabricPath is a Layer 2 technology that provides Layer 3 benefits, such as multipathing, to classical Layer 2 networks by using IS-IS at Layer 2. This eliminates the need for the spanning tree protocol, avoiding the pitfalls of large Layer 2 networks. As a result, FabricPath enables a massive Layer 2 network that supports multipathing ( ECMP ). TRILL is an IETF standard that, like FabricPath, is a Layer 2 technology providing the same Layer 3 benefits to Layer 2 networks using IS-IS.

LISP is popular in active/active data centers for DCI route optimization and mobility. It separates the host’s location from its identifier ( EID ), allowing VMs to move across subnet boundaries while keeping their endpoint identification. LISP is often referred to as an Internet locator and can enable triangular routing designs. Popular encapsulation formats include VXLAN ( proposed by Cisco and VMware ) and STT ( created by Nicira but likely to be deprecated over time as VXLAN comes to dominate ).

 

Video: LISP networking

In the following video, we will demonstrate the use of LISP in networking. It’s a hands-on demonstration that goes through the various components of a LISP network and how each component operates.

Hands on Video Series – Enterprise Networking | LISP Configuration Intro

The role of OTV

OTV is a data center interconnect ( DCI ) technology enabling Layer 2 extension across data center sites. While Fabric Path can be a DCI technology over short distances with dark fiber, OTV has been explicitly designed for DCI. In contrast, the Fabric Path data center control plane is primarily used for intra-DC communications.

Failure boundary and site independence are preserved in OTV networks because OTV uses a data center control plane protocol to sync MAC addresses between sites and prevent unknown unicast floods. In addition, recent IOS versions can allow unknown unicast floods for certain VLANs, which are unavailable if you use Fabric Path as the DCI technology.

 

The Role of Software-defined Networking (SDN)

Another potential trade-off between data center control plane scaling, Layer 2 VM mobility, and optimal ingress/egress traffic flow would be software-defined networking ( SDN ). At a basic level, SDN can create direct paths through the network fabric to isolate private networks effectively.

An SDN network allows you to choose the correct forwarding information on a per-flow basis. This per-flow optimization eliminates VLAN separation in the data center fabric. Instead of using VLANs to enforce traffic separation, the SDN controller has a set of policies allowing traffic to be forwarded from a particular source to a destination.

Cisco ACI brings SDN concepts to the data center. It operates over a leaf-and-spine design and traditional routing protocols such as BGP and IS-IS. However, it brings a new way to manage the data center with new constructs such as Endpoint Groups (EPGs). In addition, no more VLANs are needed in the data center, as everything is routed over a Layer 3 core with VXLAN as the overlay protocol.
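
To illustrate the per-flow idea, here is a minimal sketch of a controller that consults a policy set and installs flow entries into a switch table. It is not tied to any real controller API; the policy names, groups, and actions are invented.

```python
# Minimal sketch of per-flow SDN forwarding (not a real controller API):
# the controller checks policy and installs an entry in the switch table.

policies = {("web-tier", "db-tier"): "allow"}  # invented policy set
switch_flow_table = {}                         # flow match -> action

def packet_in(src_group, dst_group, flow):
    """Called when a switch punts the first packet of an unknown flow."""
    if policies.get((src_group, dst_group)) == "allow":
        switch_flow_table[flow] = "forward:port-7"  # install forwarding entry
    else:
        switch_flow_table[flow] = "drop"            # enforce separation
    return switch_flow_table[flow]
```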

Summary: Recap on Data Center Design

Data centers are the backbone of modern technology infrastructure, providing the foundation for storing, processing, and transmitting vast amounts of data. A critical aspect of data center design is the network architecture, which ensures efficient and reliable data transmission within and outside the facility.

  • Scalability and Flexibility

One of the primary goals of data center network design is to accommodate the ever-increasing demand for data processing and storage. Scalability ensures the network can grow seamlessly as the data center expands. This involves designing a network supporting many devices, servers, and users without compromising performance or reliability. Additionally, flexibility is essential to adapt to changing business requirements and technological advancements.

  • Redundancy and High Availability

Data centers must ensure uninterrupted access to data and services, making redundancy and high availability critical for network design. Redundancy involves duplicating essential components, such as switches, routers, and links, to eliminate single points of failure. This ensures that if one component fails, there are alternative paths for data transmission, minimizing downtime and maintaining uninterrupted operations. High availability further enhances reliability by providing automatic failover mechanisms and real-time monitoring to detect and address network issues promptly.

  • Traffic Optimization and Load Balancing

Efficient data flow within a data center is vital to prevent network congestion and bottlenecks. Traffic optimization techniques, such as Quality of Service (QoS) and traffic prioritization, can be implemented to ensure that critical applications and services receive the necessary bandwidth and resources. Load balancing is crucial in distributing network traffic evenly across multiple servers or paths, preventing overutilization of specific resources and optimizing performance.

  • Security and Data Protection

Data centers house sensitive information and mission-critical applications, making security a top priority. The network design should incorporate robust security measures, including firewalls, intrusion detection systems, and encryption protocols, to safeguard data from unauthorized access and cyber threats. Data protection mechanisms, such as backups, replication, and disaster recovery plans, should also be integrated into the network design to ensure data integrity and availability.

  • Monitoring and Management

Proactive monitoring and effective management are essential for maintaining optimal network performance and addressing potential issues promptly. The network design should include comprehensive monitoring tools and centralized management systems that provide real-time visibility into network traffic, performance metrics, and security events. This enables administrators to promptly identify and resolve network bottlenecks, security breaches, and performance degradation.

Data center network design is critical in ensuring efficient, reliable, and secure data transmission within and outside the facility. Scalability, redundancy, traffic optimization, security, and monitoring are key considerations for designing a robust, high-performance network. By implementing best practices and staying abreast of emerging technologies, data centers can build networks that meet the growing demands of the digital age while maintaining the highest levels of performance, availability, and security.

SDN Data Center

The world of technology consists of data centers that play a crucial role in storing and managing vast amounts of information. Traditional data centers, however, have faced challenges in terms of scalability, flexibility, and efficiency. Enter Software-Defined Networking (SDN), a groundbreaking approach reshaping the landscape of data centers. In this blog post, we will explore the concept of SDN, its benefits, and its potential to revolutionize data centers as we know them.

In SDN, the functions of network nodes (switches, routers, bare-metal servers, etc.) are abstracted so they can be managed globally and coherently. A single controller, the SDN controller, manages the whole entity by detaching the network devices' decision-making part (control plane) from their operational part (data plane).

The name "Software-Defined" comes from this controller, which enables network programmability. In 2009, Stanford University researchers published the first OpenFlow specification, one of the protocols used by SDN controllers, and the Open Networking Foundation (ONF) was founded in March 2011 to promote the concept and the development of OpenFlow.


Highlights: SDN Data Center

What problems do we have, and what are we doing about them? Ask yourself: Are data centers ready and available for today’s applications and tomorrow’s emerging data center applications? Businesses and applications are putting pressure on networks to change, ushering in a new era of data center design. From 1960 to 1985, we started with mainframes and supported a customer base of about one million users.

Example: ACI Cisco

ACI Cisco, short for Application Centric Infrastructure, is a software-defined networking (SDN) solution developed by Cisco Systems. It provides a holistic approach to managing and automating network infrastructure, allowing organizations to achieve agility, scalability, and security all in one framework.

Example: Open Networking Foundation

The Open Networking Foundation (ONF) also leverages SDN principles, employing open-source platforms and defined standards to build and operate open networking. The ONF's portfolio covers several areas, such as mobile, broadband, and data center, running on white box hardware.

Related: Before you proceed, you may find the following post helpful:

  1. DNS Structure
  2. Data Center Network Design
  3. Software Defined Perimeter
  4. ACI Networks
  5. Layer 3 Data Center

Data Center Applications

Key SDN Data Center Design Discussion Points:


  • Introduction to the SDN Data Center and what is involved.

  • Highlighting the details of the different types of traffic patterns.

  • Technical details on the issues with spanning tree protocol. 

  • Scenario: Building a scalable data center.

  • Details on VXLAN and the use of overlay networking. 

The Future of Data Centers 

Exploring Software-Defined Networking (SDN)

In recent years, the rapid advancement of technology has given rise to various innovative solutions transforming how data centers operate. One such revolutionary technology is Software-Defined Networking (SDN), which has garnered significant attention and is set to reshape the landscape of data centers as we know them. In this blog post, we will delve into the fundamentals of SDN and explore its potential to revolutionize data center architecture.

SDN is a networking paradigm that separates the control plane from the data plane, enabling centralized control and programmability of network infrastructure. Unlike traditional network architectures, where network devices make independent decisions, SDN offers a centralized management approach, providing administrators with a holistic view and control over the entire network.
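
To make the control/data plane split concrete, here is a minimal sketch in Python. The Switch and Controller classes are purely illustrative, not a real SDN API: the point is that forwarding decisions live in one central place and are pushed down to devices that simply match and forward.

class Switch:
    """Data plane: forwards packets by matching a flow table."""
    def __init__(self, name):
        self.name = name
        self.flow_table = {}        # destination -> egress port

    def install_flow(self, dst, out_port):
        self.flow_table[dst] = out_port

    def forward(self, dst):
        # No local decision-making: unknown traffic is punted upward.
        return self.flow_table.get(dst, "send-to-controller")

class Controller:
    """Control plane: holds the global view and programs every switch."""
    def __init__(self):
        self.switches = []

    def register(self, switch):
        self.switches.append(switch)

    def apply_policy(self, dst, out_port):
        # One decision, pushed consistently across the whole fabric.
        for sw in self.switches:
            sw.install_flow(dst, out_port)

ctrl = Controller()
leaf_a, leaf_b = Switch("leaf-a"), Switch("leaf-b")
ctrl.register(leaf_a)
ctrl.register(leaf_b)
ctrl.apply_policy("10.0.0.5", out_port=2)
print(leaf_a.forward("10.0.0.5"))   # 2
print(leaf_b.forward("10.0.0.9"))   # send-to-controller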

Lab Guide: Cisco ACI

The following screenshots show the topology of the Cisco ACI. The design follows the layout of a leaf and spine architecture. The leaf switches connect to the spines and not to each other. All workloads and even WAN networks connect to the leaf layer.

The ACI goes through what is known as Fabric Discovery, much of which is automated for you, borrowing a core SDN data center principle: automation. As you can see below, the fabric has been successfully discovered. There are three registered nodes: Spine, Leaf-a, and Leaf-b. The ACI is based on the Cisco Nexus 9000 Series.

Diagram: Cisco ACI fabric checking.

The Benefits of SDN in Data Centers

Enhanced Network Flexibility and Scalability:

SDN allows data center administrators to allocate network resources dynamically based on real-time demands. With SDN, scaling up or down becomes seamless, resulting in improved flexibility and agility. This capability is crucial in today’s data-driven environment, where rapid scalability is essential to meet growing business demands.

Simplified Network Management:

SDN abstracts the complexity of network management by centralizing control and offering a unified view of the network. This simplification enables more efficient troubleshooting, faster provisioning of services, and streamlined network management, ultimately reducing operational costs and increasing overall efficiency.

Increased Network Security:

By offering a centralized control plane, SDN enables administrators to implement stringent security policies consistently across the entire data center network. SDN’s programmability allows for dynamic security measures, such as traffic isolation and malware detection, making it easier to respond to emerging threats.

SDN and Network Virtualization:

SDN and network virtualization are closely intertwined, as SDN provides the foundation for implementing network virtualization in data centers. By decoupling network services from physical infrastructure, virtualization enables the creation of virtual networks that can be customized and provisioned on demand. SDN’s programmability further enhances network virtualization by allowing the rapid deployment and management of virtual networks.

Back to Basics: SDN Data Center

From 1985 to 2009, we moved to the personal computer, the client/server model, and the LAN/Internet model, supporting a customer base of hundreds of millions. From 2009 to 2020+, the industry changed completely. We have various platforms (mobile, social, big data, and cloud) with billions of users, and it is estimated that the new IT industry will be worth $4.8 trillion. All of this forces us to re-examine the existing data center topology.

SDN data center architecture is an architectural model that adds a level of abstraction to the functions of network nodes (switches, routers, bare metal servers, etc.) so they can be managed globally and coherently. So, with an SDN topology, we have a central place from which to operate a disparate network of various devices and device types.

We will discuss the SDN topology in more detail shortly. At its core, SDN enables the entire network to be centrally controlled, or ‘programmed,’ using a software SDN application layer. The significant advantage of SDN is that it allows operators to manage the whole network consistently, regardless of the underlying network technology.


Statistics don’t lie.

The customer has changed and is making us change our data center topology. Content is expected to double over the next two years, and emerging markets may overtake mature markets, with an estimated 5,200 GB of data created per person in 2020. These demands put enormous pressure on the volume of content being created, and how we serve and control that content poses new challenges for data networks.

Knowledge check: the software-defined data center market

The software-defined data center market is considerable. In terms of revenue, it was estimated at $43.178 billion in 2020 and is projected to grow to $120.3 billion by 2025, representing a CAGR of 22.4%.

Knowledge check: SDN data center architecture and SDN topology

Software Defined Networking (SDN) simplifies computer network management and operation. It is an approach to network management and architecture that enables administrators to manage network services centrally using software-defined policies. In addition, the SDN data center architecture enables greater visibility and control over the network by separating the control plane from the data plane. Administrators can control routing, traffic management, and security by managing the network centrally. With global visibility, they can create network policies once and apply them quickly and consistently to every device.

The Value: SDN Topology

An SDN topology separates the control plane from the data plane that runs on the physical network devices. Separating the two allows far greater flexibility in network management and configuration: configuring at the control plane can produce a more efficient and scalable network.

There are three layers in the SDN topology: the control plane, the data plane, and the physical network. The control plane is responsible for controlling the data plane, which is the layer that carries the data packets. The control plane is also responsible for setting up the virtual networks, configuring the network devices, and managing the overall SDN topology.

A personal network impact assessment report

I recently approved a network impact assessment for various data center network topologies. One of my customers was rate-limiting data transfer over the WAN (Wide Area Network) at 9.5 Mbps, moving 34 GB over a 10-hour off-peak window, and this particular customer plans to triple that volume over the next 12 months due to application and service changes.

The changes result in a WAN upgrade and a DR (Disaster Recovery) scope change. Big data, applications, social media, and mobility are forcing architects to rethink how we engineer networks, concentrating more on scale, agility, analytics, and management.
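
A quick back-of-the-envelope check, using the figures above, shows why the upgrade is unavoidable (decimal units assumed for simplicity):

def required_mbps(gigabytes, hours):
    # Convert GB to megabits, then divide by the window in seconds.
    megabits = gigabytes * 8 * 1000
    return megabits / (hours * 3600)

print(round(required_mbps(34, 10), 1))      # ~7.6 Mbps: fits under the 9.5 Mbps cap
print(round(required_mbps(34 * 3, 10), 1))  # ~22.7 Mbps: triple the volume blows through it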

SDN Data Center Architecture: The 80/20 Traffic Rule

The data center design was based on the 80/20 traffic pattern rule with Spanning Tree Protocol (802.1D), where we have a root bridge and every other bridge builds a loop-free path to that root. The result is half the ports forwarding and half blocking, wasting bandwidth, even though we can load-balance by letting one set of VLANs forward on one uplink and another set forward on the secondary uplink.

We still face the problems and scalability limits of having large Layer 2 domains in the data center design. Spanning tree is not a routing protocol; it is a loop prevention protocol, and because its failure modes have disastrous consequences, it should be limited to small data center segments.
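
The bandwidth waste is easy to demonstrate. The following minimal sketch (with a hypothetical three-switch triangle topology) computes a loop-free tree from the root the way STP conceptually does, then counts the links left idle in the blocking state:

from collections import deque

# Hypothetical topology: a triangle of three switches, three links.
links = {("root", "sw1"), ("root", "sw2"), ("sw1", "sw2")}
adjacency = {}
for a, b in links:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

# Breadth-first search from the root bridge builds the loop-free tree.
tree, seen, queue = set(), {"root"}, deque(["root"])
while queue:
    node = queue.popleft()
    for neighbor in adjacency[node]:
        if neighbor not in seen:
            seen.add(neighbor)
            tree.add(frozenset((node, neighbor)))
            queue.append(neighbor)

blocking = {frozenset(link) for link in links} - tree
print(f"forwarding links: {len(tree)}, blocked links: {len(blocking)}")
# One of the three links sits idle purely for loop prevention.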

Data Center Stability

  • Layer 2 extended to the core layer
  • STP blocks redundant links
  • Manual pruning of VLANs for the redundancy design
  • Reliance on STP convergence for topology changes

The goal, as ever, is an efficient and stable design.

Data Center Topology: The Shifting Traffic Patterns

The traffic patterns have shifted, and the architecture needs to adapt. Before, roughly 80% of traffic left the data center; now, much of it flows east to west and stays within the DC. The original traffic pattern drove the classic design with access, distribution, and core layers, based on Layer 2 feeding into Layer 3 transport. The routed Layer 3 approach was adopted because it adds stability to Layer 2 by containing broadcast and flooding domains.

The most popular data center architecture deployed today is based on very different requirements, and the business is looking for large Layer 2 domains to support functions such as VMotion. We need to meet the challenge of future data center applications, and as new apps arrive with unique requirements, it isn't easy to make adequate changes to the network due to the protocol stack in use. One way to overcome this is with overlay networking and VXLAN.

The Issues with Spanning Tree

The problem is that we rely on spanning tree, which was useful in its day but is past its sell-by date. Radia Perlman, the original author of spanning tree, went on to author TRILL (a replacement for STP). STP (Spanning Tree Protocol) was never a routing protocol that determines the best path; it only provides a loop-free path. STP is also a fail-open protocol (as opposed to a Layer 3 protocol, which fails closed).

One of spanning tree's most significant weaknesses is that it fails open: if a port stops receiving BPDUs (Bridge Protocol Data Units), the switch assumes it is not connected to another switch and starts forwarding on that port. Combining a fail-open paradigm with a flooding paradigm can be disastrous.
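
The following toy sketch illustrates the fail-open behavior. It collapses STP's listening and learning states into a single transition, and the Port class is hypothetical; only the 20-second max-age timer is a real 802.1D default.

MAX_AGE_SECONDS = 20  # classic 802.1D default for aging out BPDU information

class Port:
    def __init__(self):
        self.last_bpdu_seen = 0.0
        self.state = "blocking"

    def tick(self, now):
        # Fail open: silence is interpreted as "no switch attached", so
        # the port moves toward forwarding. A Layer 3 routing protocol
        # does the opposite and withdraws the path (fail closed).
        if now - self.last_bpdu_seen > MAX_AGE_SECONDS:
            self.state = "forwarding"

port = Port()          # BPDUs stop arriving, e.g., a unidirectional link
port.tick(now=25.0)
print(port.state)      # "forwarding" -- potentially straight into a loop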

Design a Scalable Data Center Topology

To overcome these limitations, some now route (Layer 3) the entire way to the access layer. That has its own problems, as some applications, such as clustering and stateful devices, require Layer 2 to function. Still, people like Layer 3 because routing brings stability: an actual path-based routing protocol manages the network rather than a loop prevention protocol like STP, routing does not fail open, and loops are prevented by the TTL (Time to Live) field in the header.

Routing convergence around a failure is quick and improves stability. We also have ECMP (Equal Cost Multi-Path) to help with scaling, which translates into scale-out topologies. This allows the network to grow at a lower cost; scale-out beats scale-up.
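
A sketch of how ECMP typically spreads traffic: hash a flow's five-tuple and use it to pick one of the equal-cost next hops. Real routers do this in hardware; Python's zlib.crc32 stands in here purely for illustration, and the spine names are hypothetical.

import zlib

paths = ["spine-1", "spine-2", "spine-3", "spine-4"]

def pick_path(src_ip, dst_ip, proto, src_port, dst_port):
    # Hashing per flow (not per packet) keeps a TCP session on one path,
    # preserving packet order while spreading flows across the fabric.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return paths[zlib.crc32(key) % len(paths)]

print(pick_path("10.0.0.5", "10.0.1.7", "tcp", 49152, 443))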

Whether you run a small or large network, a routed network has clear advantages over a Layer 2 network. How we interface with the network is also cumbersome; an estimated 70% of network failures are caused by human error. The risk of changes to the production network leads to overly cautious change control, slowing processes to a crawl.

In summary, the problems we have faced so far:

STP-based Layer 2 has stability challenges; it fails open. Traditional bridging relies on controlled flooding rather than forwarding, so it should not be considered as stable as a routing protocol. Some applications require Layer 2, yet people still prefer Layer 3. The network infrastructure must be flexible enough to adapt to new applications and services, legacy applications and services, and changing organizational structures.

There is never enough bandwidth, and we cannot predict future application-driven requirements, so the better answer is a flexible network infrastructure. The consequences of inflexibility are slow deployment of new services and applications and restricted innovation.

The infrastructure needs to flex around the data center applications, not the other way around, and it must be agile enough not to become a bottleneck or barrier to deployment and innovation.

What are the new options moving forward?

Layer 2 fabrics (the open standard TRILL) change how the network works and enable a large, routed Layer 2 network. A Layer 2 fabric such as Cisco FabricPath is still Layer 2, but it behaves more like Layer 3 because a routing protocol manages the topology. As a result, stability improves and convergence is faster. It also supports massive scale-out capability, with up to 32 load-balanced forwarding paths versus a single forwarding path with Spanning Tree.

Lab Guide: VXLAN Basics

In this lab guide, we have a VXLAN overlay network. The core of the VXLAN configuration is the VNI, which must match on both sides. Below, a VNI of 6002 is tied to the bridge domain. We are creating a Layer 2 network so the two desktops can communicate; this Layer 2 network traverses the core, which consists of the spine layer. It is the VNI that allows VXLAN to scale.

Diagram: Changing the VNI.

 

VXLAN overlay networking

What is VXLAN?

Suppose you already have a Layer 3 core and must support Layer 2 end to end. In that case, you could go for an encapsulated overlay (VXLAN, NVGRE, STT, or a design with generic routing encapsulation). You keep the stability of a Layer 3 core yet can still deliver Layer 2 end to end, using UDP source port numbers as network entropy. Depending on the design option, it builds a Layer 2 tunnel over the Layer 3 core.


The VLAN tag field defined in IEEE 802.1Q has 12 bits for segment identification, supporting a maximum of only 4094 VLANs. It is common these days to have multi-tiered application deployments where every tier requires its own segment. With literally thousands of multi-tier application segments, that space runs out.

Then along came the Virtual Extensible LAN (VXLAN). VXLAN uses a 24-bit segment ID, called a VXLAN network identifier (VNI), for identification. This is much larger than the 12 bits used for traditional VLAN identification. The VNI is just a fancy name for a VLAN ID, but it now supports up to 16 million VXLAN segments.
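
To make the header arithmetic concrete, here is a minimal sketch that packs the 8-byte VXLAN header from RFC 7348: an 8-bit flags field (0x08 marks the VNI as valid), 24 reserved bits, the 24-bit VNI, and a final reserved byte. The VNI of 6002 matches the lab guide above; in a real packet, this header rides inside UDP (destination port 4789), with the source port derived from a hash of the inner flow for entropy.

import struct

def vxlan_header(vni):
    # 24 bits of VNI -> 2**24 = 16,777,216 possible segments.
    assert 0 <= vni < 2**24
    flags = 0x08 << 24          # "I" flag set; reserved bits zero
    return struct.pack("!II", flags, vni << 8)

header = vxlan_header(6002)
print(header.hex())             # 0800000000177200
print(2**24, "VNIs vs", 4094, "usable VLANs")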


A use case for this is when two devices need to exchange state at Layer 2 or require VMotion. VMs cannot migrate across a Layer 3 boundary, as they need to stay in the same VLAN to keep their TCP sessions intact. Software-defined networking is changing the way we interact with the network.

It gives us faster deployment and improved control, and it allows more direct application and service integration. With a centralized controller, you can view this as a policy-focused network.

Many prominent vendors push converged infrastructure (server, storage, networking, and centralized management) from a single vendor, closely linking hardware and software (HP, Dell, Oracle). Other vendors offer a software-defined data center in which the physical hardware is virtualized, centrally managed, and treated as abstracted resource pools that can be dynamically provisioned and configured (Microsoft).

Summary: SDN Data Center

Software-defined networking (SDN) has emerged as a game-changer for data centers in the ever-evolving technology landscape. This innovative approach to networking takes flexibility, scalability, and efficiency to a whole new level. In this blog post, we explored the concept of SDN data centers and how they transform how we manage and operate our networks.

Section 1: Understanding SDN

SDN, at its core, separates the control plane from the data plane, enabling centralized management and programmability of the network. By abstracting the underlying infrastructure, SDN allows administrators to dynamically allocate resources, optimize traffic flow, and respond quickly to changing business needs. It provides a holistic network view, simplifying complex configurations and enhancing network agility.

Section 2: Benefits of SDN Data Centers

2.1 Enhanced Scalability: Traditional data centers often struggle with scalability due to rigid network architectures. SDN data centers overcome this challenge by decoupling the control plane, enabling seamless scalability and on-demand resource allocation.

2.2 Increased Flexibility: SDN empowers network administrators to define policies and rules through software, eliminating the need to configure individual network devices manually. This flexible approach allows for rapid provisioning of services, quick adaptation to changing requirements, and efficient troubleshooting.

2.3 Improved Performance: With SDN, network traffic can be intelligently monitored, analyzed, and optimized in real time. This dynamic traffic engineering capability ensures optimal utilization of network resources, minimizing latency and maximizing throughput.

Section 3: SDN Security Considerations

While SDN offers numerous advantages, it also introduces unique security considerations that must be addressed. By centralizing control, SDN exposes a potential single point of failure and becomes an attractive target for malicious activities. Robust authentication, access control, and encryption mechanisms are crucial to ensure the security and integrity of SDN data centers.

Conclusion:

SDN data centers represent a paradigm shift in the world of networking. Their ability to provide scalability, flexibility, and performance optimization is revolutionizing how organizations design and operate their networks. As technology continues to evolve, SDN will undoubtedly play a pivotal role in shaping the future of data centers.