WAN SDN

May 11, 2015

by Matt Conran Blog


In today's fast-paced digital world, organizations constantly seek ways to optimize their network infrastructure for improved performance, scalability, and cost efficiency. One emerging technology that has gained significant traction is WAN Software-Defined Networking (SDN). By decoupling the control and data planes, WAN SDN gives organizations unprecedented flexibility, agility, and control over their wide area networks (WANs). In this blog post, we will delve into the world of WAN SDN, exploring its key benefits, implementation considerations, and real-world use cases.

WAN SDN is a network architecture that allows organizations to manage and control their wide area networks centrally using software. Traditionally, WANs have been complex and time-consuming to configure, often requiring manual provisioning and hands-on management of each device. With WAN SDN, network administrators can automate these tasks through a centralized controller, simplifying network operations and reducing human error.

Enhanced Agility: WAN SDN empowers network administrators with the ability to quickly adapt to changing business needs. With programmable policies and dynamic control, organizations can easily adjust network configurations, prioritize traffic, and implement changes without the need for manual reconfiguration of individual devices.

Improved Scalability: Traditional wide area networks often face scalability challenges due to the complexity of managing numerous remote sites. WAN SDN addresses this issue by providing centralized control, allowing for streamlined network expansion and efficient resource allocation.

Optimal Resource Utilization: WAN SDN enables organizations to maximize their network resources by intelligently routing traffic and dynamically allocating bandwidth based on real-time demands. This ensures that critical applications receive the necessary resources while minimizing wastage.

Multi-site Enterprises: WAN SDN is particularly beneficial for organizations with multiple branch locations. It allows for simplified network management across geographically dispersed sites, enabling efficient resource allocation, centralized security policies, and rapid deployment of new services.

Cloud Connectivity: WAN SDN plays a crucial role in connecting enterprise networks with cloud service providers. It offers seamless integration, secure connections, and dynamic bandwidth allocation, ensuring optimal performance and reliability for cloud-based applications.

Service Providers: WAN SDN can revolutionize how service providers deliver network services to their customers. It enables the creation of virtual private networks (VPNs) on-demand, facilitates network slicing for different tenants, and provides granular control and visibility for service-level agreements (SLAs).

WAN SDN represents a paradigm shift in wide area network management. Its ability to centralize control, enhance agility, and optimize resource utilization makes it a game-changer for modern networking infrastructures. As organizations continue to embrace digital transformation and demand more from their networks, WAN SDN will undoubtedly play a pivotal role in shaping the future of networking.

Matt Conran

Highlights: WAN SDN

Discussing WAN SDN

1. Traditional WANs have long been plagued by limitations such as complexity, lack of agility, and high operational costs. These legacy networks typically rely on manual configurations and proprietary hardware, making them inflexible and time-consuming to change. SDN brings a paradigm shift to WANs by decoupling the network control plane from the underlying infrastructure. With centralized control and programmability, SDN enables network administrators to manage and orchestrate their WANs through a single interface, simplifying network operations and promoting agility.

2. At its core, WAN SDN separates the control plane from the data plane, allowing network administrators to manage network traffic dynamically and programmatically. This separation leads to more efficient network management, reducing the complexity associated with traditional network infrastructures. With WAN SDN, businesses can optimize traffic flow, enhance security, and reduce operational costs by leveraging centralized control and automation.

3. One of the key advantages of SDN in WANs is its inherent flexibility and scalability. With SDN, network administrators can dynamically allocate bandwidth, reroute traffic, and prioritize applications based on real-time needs. This level of granular control allows organizations to optimize their network resources efficiently and adapt to changing demands.

4. SDN brings enhanced security features to WANs through centralized policy enforcement and monitoring. By abstracting network control, SDN allows for consistent security policies across the entire network, minimizing vulnerabilities and ensuring better threat detection and mitigation. Additionally, SDN enables rapid network recovery and failover mechanisms, enhancing overall resilience.

**Key Benefits of WAN SDN**

1. **Scalability and Flexibility**: WAN SDN enables networks to adapt quickly to changing demands without the need for significant hardware investments. This flexibility is crucial for organizations looking to scale their operations efficiently.

2. **Improved Network Performance**: By optimizing traffic routing and prioritizing critical applications, WAN SDN ensures that networks operate at peak performance levels. This capability is particularly beneficial for businesses with high bandwidth demands.

3. **Enhanced Security**: WAN SDN allows for the implementation of robust security measures, including automated threat detection and response. This proactive approach to security helps protect sensitive data and maintain compliance with industry regulations.

Application Challenges

Compared to a network-centric model, business intent-based WANs have great potential: applications can be deployed and managed more efficiently. However, this requires application service topologies to replace network topologies. Supporting new and existing applications on the WAN is a common challenge for network operations staff. Many of these applications consume large amounts of bandwidth and are extremely sensitive to variations in link quality, so improving the WAN environment to control jitter, loss, and delay is becoming more critical.

WAN SLA

In addition, cloud-based applications such as Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) are increasing bandwidth demands on the WAN. As these demands grow, provisioning new applications and services is becoming increasingly complex and expensive. In today's business environment, WAN routing and network SLAs are typically controlled by MPLS L3VPN service providers. As a result, enterprises are less able to adapt to new delivery methods, such as cloud-based and SaaS-based applications.

Changes to support these applications could take months to implement in service providers' environments. They can also be expensive, and some providers may not make them at all. Because service providers control the WAN core, there is no way to instantiate VPNs independently of the underlying transport, so implementing differentiated service levels for different applications becomes challenging, if not impossible.

WAN SDN Technology: DMVPN

DMVPN is a Cisco-developed solution for building virtual private networks over public or private networks. Unlike traditional VPNs, which require a statically configured point-to-point tunnel for every pair of sites, DMVPN uses a hub-and-spoke design in which, depending on the DMVPN phase, spokes can dynamically build direct spoke-to-spoke tunnels when needed. By leveraging multipoint GRE tunnels, DMVPN simplifies network management and reduces administrative overhead.

– Multipoint GRE Tunnels: At the core of DMVPN lies the concept of multipoint GRE tunnels. These tunnels create a virtual network, connecting multiple sites while encapsulating packets in GRE headers. This enables efficient traffic routing between sites, reducing the complexity and overhead associated with traditional point-to-point VPNs.

– Next-Hop Resolution Protocol (NHRP): NHRP plays a crucial role in DMVPN by dynamically mapping tunnel IP addresses to physical (NBMA) addresses. Spokes register their mappings with the hub, which acts as the Next Hop Server, and can query it to resolve other spokes' addresses, eliminating the need to configure every tunnel endpoint statically. NHRP also enables on-demand tunnel establishment, improving scalability and reducing administrative overhead.

– IPsec Encryption: DMVPN uses IPsec to secure communication over the VPN. IPsec provides confidentiality, integrity, and authentication of data, making it well suited to protecting sensitive information transmitted over the network. With DMVPN, IPsec protection is applied dynamically to each tunnel as it is built, enhancing flexibility and scalability.
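The NHRP registration and resolution idea described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single hub acting as the Next Hop Server; the class and method names are hypothetical and do not reflect Cisco IOS internals. Real NHRP also handles holding times, authentication, and redirects.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NextHopServer:
    """Hypothetical hub-side NHRP cache: tunnel (overlay) IP -> NBMA (underlay) IP."""

    cache: Dict[str, str] = field(default_factory=dict)

    def register(self, tunnel_ip: str, nbma_ip: str) -> None:
        # A spoke registers its own mapping when it comes up, so the hub
        # needs no static per-spoke configuration.
        self.cache[tunnel_ip] = nbma_ip

    def resolve(self, tunnel_ip: str) -> Optional[str]:
        # Answer a resolution request; None means that peer has not registered.
        return self.cache.get(tunnel_ip)


hub = NextHopServer()
hub.register("10.0.0.11", "203.0.113.11")   # spoke A
hub.register("10.0.0.12", "198.51.100.12")  # spoke B

# Spoke A learns spoke B's underlay address from the hub and can then build
# an on-demand spoke-to-spoke tunnel directly to it.
print(hub.resolve("10.0.0.12"))  # 198.51.100.12
print(hub.resolve("10.0.0.99"))  # None
```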

DMVPN over IPSec

Understanding DMVPN & IPSec

IPsec, a widely adopted security protocol, is integral to DMVPN deployments. It provides the cryptographic framework necessary for securing data transmitted over the network. By leveraging IPsec, DMVPN ensures the transmitted information’s confidentiality, integrity, and authenticity, protecting sensitive data from unauthorized access and tampering.

DMVPN over IPsec offers several benefits. First, dynamically built tunnels eliminate the need to configure and maintain a static tunnel between every pair of sites, simplifying network management and reducing administrative overhead. Additionally, DMVPN's scalability enables seamless integration of new sites and facilitates rapid expansion without compromising performance. Furthermore, its flexibility supports efficient routing, load balancing, and bandwidth utilization.

Example WAN Techniques:

Understanding Virtual Routing and Forwarding

VRF is a technology that enables the creation of multiple virtual routing tables within a single physical router. Each VRF instance acts as an independent router with its own routing table, interfaces, and forwarding decisions. This separation allows different networks or customers to coexist on the same physical infrastructure while maintaining complete isolation.

One critical advantage of VRF is its ability to provide network segmentation. By dividing a physical router into multiple VRF instances, organizations can isolate their networks, ensuring that traffic from one VRF does not leak into another. This enhances security and provides a robust framework for multi-tenancy scenarios.
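To make the isolation described above concrete, here is a minimal Python sketch of per-VRF routing tables with longest-prefix matching. The VRF names, prefixes, and next hops are hypothetical, and a real router keeps separate forwarding tables in hardware rather than Python lists; the point is only that the same prefix can exist in two VRFs and a lookup never crosses into another VRF's table.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical per-VRF tables: VRF name -> list of (prefix, next hop).
# 10.1.0.0/16 appears in both VRFs with different next hops, without conflict.
VRF_TABLES: Dict[str, List[Tuple[ipaddress.IPv4Network, str]]] = {
    "CUSTOMER_A": [
        (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.1"),
        (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.254"),
    ],
    "CUSTOMER_B": [
        (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.2"),
    ],
}


def lookup(vrf: str, destination: str) -> Optional[str]:
    """Longest-prefix match restricted to a single VRF's table."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, nh) for net, nh in VRF_TABLES[vrf] if addr in net]
    if not matches:
        return None  # no route in this VRF; traffic never leaks into another VRF
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return next_hop


print(lookup("CUSTOMER_A", "10.1.5.5"))  # 192.0.2.1
print(lookup("CUSTOMER_B", "10.1.5.5"))  # 192.0.2.2
print(lookup("CUSTOMER_B", "8.8.8.8"))   # None
```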

Use Cases for VRF

VRF finds application in various scenarios, including:

1. Service Providers: VRF allows providers to offer their customers virtual private network (VPN) services. Each customer can have their own VRF, ensuring their traffic remains separate and secure.

2. Enterprise Networks: VRF can segregate different organizational departments, creating independent virtual networks.

3. Internet of Things (IoT): With the proliferation of IoT devices, VRF can create separate routing domains for different IoT deployments, improving scalability and security.

Understanding Policy-Based Routing

Policy-Based Routing (PBR), at its core, involves manipulating routing decisions based on predefined policies. Unlike traditional routing, which relies solely on destination addresses, PBR can consider additional factors such as source IP address, ports, protocols, and even time of day. By implementing PBR, network administrators gain flexibility in directing traffic flows to specific paths based on specified conditions.

Policy-Based Routing brings several benefits. First, it enables efficient use of network resources by allowing administrators to prioritize or allocate bandwidth for specific applications or user groups. Additionally, PBR enhances security by allowing traffic to be redirected to dedicated firewalls or intrusion detection systems. Furthermore, PBR facilitates load balancing and traffic engineering, helping to maintain performance across the network.

Implementing Policy-Based Routing

To implement PBR, network administrators typically follow three steps. First, define the traffic classification criteria by specifying match conditions, often with access control lists. Second, create route maps that define the actions for matched traffic. These actions may include changing the next-hop address, setting specific Quality of Service (QoS) parameters, or sending traffic out a different interface. Finally, apply the route maps to the appropriate ingress interfaces, where they are evaluated against arriving traffic.
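The classify, route-map, and apply steps above can be modeled as a first-match evaluation. This is a hedged Python sketch, not router configuration: the `RouteMapEntry` fields, the sequence numbers, and the addresses are hypothetical, and only a "set next hop" action is modeled. Traffic that matches no entry falls back to the next hop the normal routing table would choose.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    src: str
    dst: str
    protocol: str  # "tcp" or "udp"
    dst_port: int


@dataclass
class RouteMapEntry:
    """One hypothetical route-map sequence: match conditions plus a set next-hop action."""

    seq: int
    src_prefix: Optional[str]  # None matches any source
    protocol: Optional[str]    # None matches any protocol
    dst_port: Optional[int]    # None matches any port
    set_next_hop: str

    def matches(self, pkt: Packet) -> bool:
        # Every configured condition must hold (logical AND).
        if self.src_prefix is not None and (
            ipaddress.ip_address(pkt.src) not in ipaddress.ip_network(self.src_prefix)
        ):
            return False
        if self.protocol is not None and pkt.protocol != self.protocol:
            return False
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        return True


def policy_route(pkt: Packet, route_map: List[RouteMapEntry], table_next_hop: str) -> str:
    # Sequences are evaluated in ascending order and the first match wins.
    for entry in sorted(route_map, key=lambda e: e.seq):
        if entry.matches(pkt):
            return entry.set_next_hop
    # No match: fall back to normal destination-based routing.
    return table_next_hop


route_map = [
    RouteMapEntry(10, "10.10.0.0/16", "tcp", 443, "192.0.2.1"),  # HTTPS from one subnet
    RouteMapEntry(20, None, "udp", 5060, "192.0.2.2"),           # SIP from anywhere
]
routed = "198.51.100.1"  # next hop the routing table would have chosen
print(policy_route(Packet("10.10.1.5", "203.0.113.9", "tcp", 443), route_map, routed))   # 192.0.2.1
print(policy_route(Packet("10.20.1.5", "203.0.113.9", "udp", 5060), route_map, routed))  # 192.0.2.2
print(policy_route(Packet("10.20.1.5", "203.0.113.9", "tcp", 80), route_map, routed))    # 198.51.100.1
```

Evaluating entries by sequence number and stopping at the first match mirrors how route maps are generally described; the fallback keeps unmatched traffic on its normal path.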

Example SD-WAN Product: Cisco Meraki

**Seamless Cloud Management**

One of the standout features of Cisco Meraki is its seamless cloud management. Unlike traditional network systems, Meraki’s cloud-based platform allows IT administrators to manage their entire network from a single, intuitive dashboard. This centralization not only simplifies network management but also provides real-time visibility and control over all connected devices. With automatic updates and zero-touch provisioning, businesses can ensure their network is always up-to-date and secure without the need for extensive manual intervention.

**Cutting-Edge Security Features**

Security is at the core of Cisco Meraki’s suite of products. With cyber threats becoming more sophisticated, Meraki offers a multi-layered security approach to protect sensitive data. Features such as Advanced Malware Protection (AMP), Intrusion Prevention System (IPS), and secure VPNs ensure that the network is safeguarded against intrusions and malware. Additionally, Meraki’s security appliances are designed to detect and mitigate threats in real-time, providing businesses with peace of mind knowing their data is secure.

**Scalability and Flexibility**

As businesses grow, so do their networking needs. Cisco Meraki’s scalable solutions are designed to grow with your organization. Whether you are expanding your office space, adding new branches, or integrating more IoT devices, Meraki’s flexible infrastructure can easily adapt to these changes. The platform supports a wide range of devices, from access points and switches to security cameras and mobile device management, making it a comprehensive solution for various networking requirements.

**Enhanced User Experience**

Beyond security and management, Cisco Meraki enhances the user experience by ensuring reliable and high-performance network connectivity. Features such as intelligent traffic shaping, load balancing, and seamless roaming between access points ensure that users enjoy consistent and fast internet access. Furthermore, Meraki’s analytics tools provide insights into network usage and performance, allowing businesses to optimize their network for better efficiency and user satisfaction.

Performance at the WAN Edge

Understanding Performance-Based Routing

Performance-based routing is a dynamic approach to network traffic management that selects routes based on real-time performance metrics. Instead of relying only on static metrics such as configured costs or hop counts, performance-based routing algorithms assess the current conditions of network paths, such as latency, packet loss, and available bandwidth, to make informed routing decisions. By dynamically adapting to changing network conditions, performance-based routing aims to optimize traffic flow and enhance overall network performance.

The adoption of performance-based routing brings forth a multitude of benefits for businesses.

1- Firstly, it enhances network reliability by automatically rerouting traffic away from congested or underperforming paths, minimizing the chances of bottlenecks and service disruptions.

2- Secondly, it optimizes application performance by intelligently selecting the best path based on real-time network conditions, thus reducing latency and improving end-user experience.

3- Additionally, performance-based routing allows for efficient utilization of available network resources, maximizing bandwidth utilization and cost-effectiveness.

Implementation Details:

Implementing performance-based routing requires a thoughtful approach. Firstly, businesses must invest in monitoring tools that provide real-time insights into network performance metrics. These tools can range from simple latency monitoring to more advanced solutions that analyze packet loss and bandwidth availability.

Once the necessary monitoring infrastructure is in place, configuring performance-based routing algorithms within network devices becomes the next step. This involves setting up rules and policies that dictate how traffic should be routed based on specific performance metrics.

Lastly, regular monitoring and fine-tuning performance-based routing configurations are essential to ensure optimal network performance.
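Once monitoring supplies per-path metrics, the routing rule itself can be simple. The Python sketch below assumes a hypothetical policy: discard any path that violates a latency, loss, or bandwidth threshold, then prefer the lowest-latency survivor. The path names, measurements, and thresholds are invented for illustration; real products weigh metrics in their own ways.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathMetrics:
    name: str
    latency_ms: float
    loss_pct: float
    available_mbps: float


def select_path(paths: List[PathMetrics], max_latency_ms: float,
                max_loss_pct: float, min_mbps: float) -> Optional[str]:
    """Return the lowest-latency path meeting every threshold, or None if none qualifies."""
    eligible = [
        p for p in paths
        if p.latency_ms <= max_latency_ms
        and p.loss_pct <= max_loss_pct
        and p.available_mbps >= min_mbps
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p.latency_ms).name


# Hypothetical measurements, e.g. from periodic probes on each WAN link.
paths = [
    PathMetrics("mpls", latency_ms=35, loss_pct=0.1, available_mbps=20),
    PathMetrics("internet", latency_ms=28, loss_pct=2.0, available_mbps=200),
    PathMetrics("lte", latency_ms=60, loss_pct=0.5, available_mbps=15),
]
voice_policy = dict(max_latency_ms=150, max_loss_pct=1.0, min_mbps=1)
print(select_path(paths, **voice_policy))  # mpls ("internet" is faster but too lossy)

paths[0].latency_ms = 180  # the MPLS path degrades
print(select_path(paths, **voice_policy))  # lte
```

Re-running the selection on every measurement cycle is what lets traffic move off a degrading path automatically.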

WAN Performance Parameters

TCP Performance Parameters

TCP (Transmission Control Protocol) is the backbone of modern Internet communication, ensuring reliable data transmission across networks. Behind the scenes, TCP performance is influenced by several key parameters that can significantly impact network efficiency.

TCP performance parameters govern how TCP behaves in various network conditions. These parameters can be fine-tuned to adapt TCP’s behavior to specific network characteristics, such as latency, bandwidth, and congestion. By adjusting these parameters, network administrators and system engineers can optimize TCP performance for better throughput, reduced latency, and improved overall network efficiency.

Congestion Control Algorithms: Congestion control algorithms are crucial to TCP performance. They monitor network conditions, detect congestion, and adjust TCP's sending rate accordingly. Popular algorithms like Reno, CUBIC, and BBR implement different strategies to handle congestion, balancing fairness and efficiency. Understanding these algorithms and their impact on TCP behavior is essential for maintaining a stable and responsive network.

Window Size and Bandwidth Delay Product: The amount of unacknowledged data TCP can have in flight is limited by the smaller of the congestion window (cwnd), which the sender adjusts in response to congestion, and the receive window advertised by the receiver. To keep a path fully utilized, this window should be at least the bandwidth-delay product (BDP), calculated by multiplying the available bandwidth by the round-trip time (RTT). A window smaller than the BDP leaves the link underutilized, while sending far more than the BDP mainly builds queues and increases delay.
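A short worked example helps with the bandwidth-delay product. The link speed and RTT below are assumed values for a long-haul WAN path, not measurements. The second function shows the flip side: with a fixed window, throughput cannot exceed one window per round trip, and 65,535 bytes is the largest window TCP can advertise without the window scale option.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bits/s times seconds gives bits in flight; /8 gives bytes."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """At most one window of data can be delivered per round trip."""
    return window_bytes * 8 / rtt_s


# Assumed example: a 100 Mbit/s WAN path with an 80 ms round-trip time.
print(bdp_bytes(100e6, 0.080))  # about 1,000,000 bytes must be in flight

# A 65,535-byte window (the maximum without TCP window scaling) on the same path:
print(window_limited_throughput_bps(65_535, 0.080) / 1e6)  # about 6.55 Mbit/s
```

So on this assumed path, a sender limited to a 64 KB window would use well under a tenth of the available bandwidth.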

Maximum Segment Size (MSS): The Maximum Segment Size is another TCP performance parameter defining the maximum amount of data encapsulated within a single TCP segment. By carefully configuring the MSS, it is possible to reduce packet fragmentation, enhance data transmission efficiency, and mitigate issues related to network overhead.

Selective Acknowledgment (SACK): Selective Acknowledgment is a TCP extension that allows the receiver to acknowledge out-of-order segments and provide more precise information about the received data. Enabling SACK can improve TCP performance by reducing retransmissions and enhancing the overall reliability of data transmission.

Understanding TCP MSS

TCP MSS refers to the maximum amount of data carried in a single TCP segment, not counting the TCP and IP headers. It is the largest payload a segment can carry without requiring fragmentation. By limiting the segment size, TCP aims to avoid IP fragmentation and keep data transmission across networks efficient.

Several factors influence the determination of TCP MSS. One crucial aspect is the underlying network infrastructure's Maximum Transmission Unit (MTU). The MTU represents the maximum packet size that can be transmitted over the network without fragmentation. To avoid fragmentation and the resulting performance degradation, the TCP MSS must be no larger than the MTU minus the IP and TCP headers (40 bytes for IPv4 without options). Tunnel encapsulation such as GRE or IPsec reduces the usable MTU further.
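The MTU-to-MSS arithmetic can be written out directly. This sketch assumes IPv4 and TCP headers without options; the 24-byte GRE figure is the 20-byte outer IPv4 header plus the 4-byte base GRE header. IPsec overhead is deliberately not modeled because it varies with the cipher, mode, and padding in use.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_for_mtu(mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one IPv4 packet on a link with this MTU.

    tunnel_overhead is any encapsulation added in front of the packet
    (for example, GRE over IPv4 adds a 20-byte outer header plus a 4-byte GRE header).
    """
    return mtu - tunnel_overhead - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


print(mss_for_mtu(1500))                      # 1460 on a plain Ethernet path
print(mss_for_mtu(1500, tunnel_overhead=24))  # 1436 inside a basic GRE tunnel
```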

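Path MTU Discovery (PMTUD) is a mechanism that lets a sender dynamically determine the smallest MTU along a network path, from which TCP derives an appropriate segment size. The sender transmits packets with the Don't Fragment (DF) bit set; a router that cannot forward a packet because it exceeds the next link's MTU drops it and returns an ICMP message (Fragmentation Needed for IPv4, Packet Too Big for IPv6) reporting that MTU. The sender then reduces its packet size accordingly. PMTUD helps prevent packet fragmentation and ensures efficient data transmission across network segments.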
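The probe-and-shrink loop of classic PMTUD can be simulated in a few lines. This is a simplified Python model under stated assumptions: the path is a hypothetical list of link MTUs, every router returns its ICMP report, and nothing filters ICMP. In practice, filtered ICMP is a common cause of PMTUD failures, which is why tunnels are often configured with a fixed, lower MSS instead.

```python
from typing import List, Optional, Tuple


def discover_path_mtu(link_mtus: List[int], initial_size: int) -> Tuple[int, int]:
    """Simulate classic PMTUD along a path of links (in order).

    Each probe is sent with DF set. The first link whose MTU is smaller than the
    probe causes a drop plus an ICMP report carrying that link's MTU, and the
    sender retries at the reported size. Returns (path MTU, probes sent).
    """
    size = initial_size
    probes = 0
    while True:
        probes += 1
        reported: Optional[int] = next((mtu for mtu in link_mtus if mtu < size), None)
        if reported is None:
            return size, probes  # the probe reached the destination unfragmented
        size = reported


path_mtu, probes = discover_path_mtu([1500, 1476, 1400, 1500], initial_size=1500)
print(path_mtu, probes)  # 1400 3
print(path_mtu - 40)     # 1360: the MSS for this path (IPv4 + TCP headers, no options)
```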

The TCP MSS value directly affects network performance. A smaller MSS can increase overhead due to more segments and headers, potentially reducing overall throughput. On the other hand, a larger MSS can increase the risk of fragmentation and subsequent retransmissions, impacting latency and overall network efficiency. Striking the right balance is crucial for optimal performance.

Example WAN Technology: DMVPN Phase 3

Understanding DMVPN Phase 3

DMVPN Phase 3 builds on the foundation of Phases 1 and 2. Its main enhancements are improved scalability and simpler configuration: the hub sends NHRP redirect messages when it sees spoke-to-spoke traffic pass through it, and the spokes respond by installing NHRP shortcut routes, so spoke-to-spoke tunnels still form on demand even when the hub advertises only summarized routes.

One of the standout features of DMVPN Phase 3 is its scalability. Organizations can add new sites to the network without complex manual configuration on the hub or on existing spokes. By combining multipoint GRE tunnels with NHRP, DMVPN Phase 3 offers a dynamic and flexible solution that can accommodate growing networks.

Example WAN Technology: FlexVPN Site-to-Site Smart Defaults

Understanding FlexVPN Site-to-Site Smart Defaults

FlexVPN Site-to-Site Smart Defaults is a powerful feature that simplifies the site-to-site VPN configuration and deployment process. By providing predefined IKEv2 proposals, policies, IPsec transform sets, and profiles, it reduces the amount of manual configuration required, lowering the chances of misconfigurations and human error. This helps organizations establish secure and reliable VPN connections between sites.

FlexVPN Site-to-Site Smart Defaults offers several key features and benefits that contribute to improved network security. Firstly, it provides secure cryptographic algorithms that protect data transmission, ensuring the confidentiality and integrity of sensitive information. Additionally, it supports various authentication methods, such as digital certificates and pre-shared keys, further enhancing the overall security of the VPN connection. The feature also allows for easy scalability, enabling organizations to expand their network infrastructure without compromising security.

Example WAN Technology: FlexVPN IKEv2 Routing

Understanding FlexVPN

FlexVPN, short for Flexible VPN, is Cisco's IKEv2-based framework for building a range of VPN solutions, such as site-to-site and remote access VPNs. It provides a secure and scalable approach to establishing Virtual Private Networks (VPNs) over many types of network infrastructure, and its flexibility allows for seamless integration and interoperability across different platforms and devices.

IKEv2, or Internet Key Exchange version 2, is a secure and efficient protocol for establishing and managing VPN connections. Its advantages include robust security features, resilience to network disruptions, and support for rapid reconnection. FlexVPN is built on IKEv2 and inherits its ability to maintain stable, uninterrupted VPN connections.

a. Enhanced Security: FlexVPN IKEv2 Routing offers advanced encryption algorithms and authentication methods, ensuring the confidentiality and integrity of data transmitted over the VPN.

b. Scalability: With its flexible architecture, FlexVPN IKEv2 Routing scales to accommodate growing network demands, making it suitable for small- to large-scale deployments.

c. Dynamic Routing: One of FlexVPN IKEv2 Routing's standout features is its flexible route distribution: routes can be exchanged through IKEv2 itself, and dynamic routing protocols such as OSPF and EIGRP can also run over the tunnels. This enables efficient and dynamic routing of traffic within the VPN network.

d. Seamless Failover: FlexVPN IKEv2 Routing provides automatic failover capabilities, ensuring uninterrupted connectivity even during network disruptions or hardware failures.

Understanding MPLS (Multi-Protocol Label Switching)

MPLS serves as the foundation for MPLS VPNs. It is a versatile and efficient forwarding technique that uses labels to move data packets through a network. Because packets carry labels, MPLS routers can make fast forwarding decisions with a simple label lookup, reducing the need for more complex longest-prefix lookups in routing tables. This results in improved network performance and scalability.

Understanding MPLS LDP

MPLS LDP (Label Distribution Protocol) is a crucial component in establishing label-switched paths within MPLS networks. LDP distributes labels and binds them to forwarding equivalence classes (FECs), enabling efficient packet forwarding. Let's take a closer look at how MPLS LDP operates.

One of the fundamental aspects of MPLS LDP is label distribution. Using LDP sessions between neighboring label-switching routers, each router assigns local labels to FECs and advertises those label bindings to its peers. This enables routers to make forwarding decisions based on labels, resulting in streamlined and efficient data transmission.

In MPLS LDP, labels serve as the building blocks of label-switched paths. These paths allow routers to forward packets based on labels rather than traditional IP routing lookups. A FEC groups packets that are forwarded the same way; with LDP, a FEC is typically an IGP prefix, so all packets toward that prefix follow the same label-switched path.
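Once labels have been distributed, forwarding at each label-switching router reduces to an exact-match lookup on the top label. The Python sketch below models one router's label forwarding table (LFIB) with swap and pop actions. The label values and next-hop names are hypothetical, and real LFIBs support more operations (such as pushing labels at the ingress PE) than are shown here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical LFIB for one label-switching router:
# incoming label -> (action, outgoing label, next hop)
LFIB: Dict[int, Tuple[str, Optional[int], str]] = {
    100: ("swap", 200, "P2"),
    101: ("pop", None, "PE2"),  # penultimate-hop popping toward the egress PE
}


def forward(label_stack: List[int]) -> Tuple[List[int], str]:
    """Forward on an exact match of the top label; the IP header below is never read."""
    top, rest = label_stack[0], label_stack[1:]
    action, out_label, next_hop = LFIB[top]
    if action == "swap":
        return [out_label] + rest, next_hop
    if action == "pop":
        return rest, next_hop
    raise ValueError(f"unsupported action: {action}")


# Top label 100 is a transport label; 30 underneath stands in for a VPN label.
print(forward([100, 30]))  # ([200, 30], 'P2')
print(forward([101, 30]))  # ([30], 'PE2')
```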

MPLS Virtual Private Networks (VPNs) Explained

VPNs provide private communication over shared networks by keeping one customer's traffic separate from others. Many VPNs, such as IPsec VPNs, use encryption and tunneling protocols to protect data from eavesdropping and unauthorized access. MPLS VPNs apply the VPN concept differently: instead of encrypting traffic, they keep each customer's traffic separate using VRFs on the provider edge and MPLS labels in the core, connecting geographically dispersed sites over the provider's network.

MPLS VPN Components

Customer Edge (CE) Router: The CE router acts as the entry and exit point for the customer network. It connects to the provider network and exchanges routing information with the PE router. It forwards ordinary IP packets to the PE router and does not need to run MPLS itself.

Provider Edge (PE) Router: The PE router sits at the edge of the service provider’s network and connects to the CE routers. It acts as a bridge between the customer and provider networks and handles the MPLS label switching. The PE router assigns labels to incoming packets and forwards them based on the labels’ instructions.

Provider (P) Router: P routers form the backbone of the service provider’s network. They forward MPLS packets based on the labels without inspecting the packet’s content, ensuring efficient data transmission within the provider’s network.

Virtual Routing and Forwarding (VRF) Tables: VRF tables maintain separate routing instances within a single PE router. Each VRF table represents a unique VPN and keeps the customer’s routing information isolated from other VPNs. VRF tables enable the PE router to handle multiple VPNs concurrently, providing secure and independent communication channels.

Use Case – DMVPN Single Hub, Dual Cloud

Single Hub, Dual Cloud is a specific configuration within the DMVPN architecture. In this setup, a central hub acts as the primary connection point for branch offices while participating in two separate DMVPN clouds, each typically built over a different transport provider, for redundancy and load balancing. This configuration offers several advantages, including improved availability, increased bandwidth, and enhanced failover capabilities.

1. Enhanced Redundancy: By using two DMVPN clouds, organizations can achieve high availability and minimize downtime. If one cloud or its underlying transport provider experiences an issue or outage, traffic can be redirected to the alternate cloud, maintaining connectivity.

2. Load Balancing: Distributing network traffic across both clouds allows for better resource utilization and improved performance. Organizations can optimize their bandwidth usage and mitigate potential bottlenecks.

3. Scalability: Single Hub, Dual Cloud DMVPN allows organizations to scale their network by adding more branch offices as needed. This flexibility ensures that the network can adapt to changing business requirements.

4. Cost Efficiency: Using two transport providers can lead to cost savings through competitive pricing and the ability to negotiate better service level agreements (SLAs). Organizations can choose the most cost-effective options while maintaining the desired level of performance and reliability.
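The load-balancing and failover behavior described in this list can be illustrated with per-flow hashing across the two tunnels of a dual-cloud design. This is a conceptual Python sketch only: the tunnel names and flow format are hypothetical, and in a real DMVPN deployment this behavior comes from the routing protocol and the router's equal-cost forwarding, not from application code.

```python
import zlib
from typing import Dict, Optional


def pick_tunnel(flow_id: str, tunnels: Dict[str, bool]) -> Optional[str]:
    """Choose a DMVPN cloud for a flow: hash across healthy tunnels, skip failed ones."""
    healthy = sorted(name for name, is_up in tunnels.items() if is_up)
    if not healthy:
        return None  # both clouds are down
    # Hashing the flow ID keeps every packet of a flow on the same cloud (no reordering).
    return healthy[zlib.crc32(flow_id.encode()) % len(healthy)]


tunnels = {"Tunnel1": True, "Tunnel2": True}  # hypothetical names, one per DMVPN cloud
flow = "10.1.1.10:51514->10.2.2.20:443/tcp"
print(pick_tunnel(flow, tunnels))  # Tunnel1 or Tunnel2, stable for this flow

tunnels["Tunnel1"] = False         # cloud 1 (or its transport provider) fails
print(pick_tunnel(flow, tunnels))  # Tunnel2
```

Hashing per flow rather than per packet trades perfectly even link usage for in-order delivery within each flow.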

The role of SDN

With software-defined networking (SDN), network configurations can be dynamic and programmatically optimized, making network performance management and monitoring more like cloud computing than traditional network management. By decoupling packet forwarding (the data plane) from routing decisions (the control plane), SDN centralizes network intelligence in a single network component and improves on the static architecture of traditional networks.

Controllers make up the control plane of an SDN network and contain all of the network's intelligence; they are considered the brains of the network. However, centralization has drawbacks, including concerns about security, scalability, and elasticity.

Since OpenFlow's emergence in 2011, SDN has commonly been associated with OpenFlow, a protocol for remotely communicating with forwarding plane elements to determine the path of network packets across network switches. Proprietary network virtualization platforms, such as Cisco Systems' Open Network Environment and Nicira's network virtualization platform, also use the term.

SD-WAN Technology in Wide Area Networks (WANs)

SD-WAN, short for Software-Defined Wide Area Networking, is a transformative approach to network connectivity. Unlike traditional WAN, which relies on hardware-based infrastructure, SD-WAN utilizes software and cloud-based technologies to connect networks over large geographic areas securely. By separating the control plane from the data plane, SD-WAN provides centralized management and enhanced flexibility, enabling businesses to optimize their network performance.

Transport Independence: Hybrid WAN

The hybrid WAN concept was born out of the need to reduce dependence on provider-controlled MPLS L3VPN links. Hybrid WAN gives applications an alternative path across the WAN: businesses acquire non-MPLS circuits and add them to their WANs. Enterprises can control these circuits, including routing and application performance. VPN tunnels are typically created over the top of these circuits to provide secure transport over any link. Examples of these links include 4G/LTE, L2VPN, and commodity broadband Internet.

As a result, transport independence is achieved: any transport type can be used under the VPN while still achieving deterministic routing and application performance. Some applications can then be carried over these commodity links instead of the traditional provider-controlled MPLS L3VPN links.

SDN and APIs

WAN SDN is a modern approach to network management that uses a centralized control model to manage, configure, and monitor large and complex networks. It allows network administrators to use software to configure, monitor, and manage network elements from a single, centralized system. This enables the network to be managed more efficiently and cost-effectively than traditional networks.

SDN uses an application programming interface (API) to abstract the underlying physical network infrastructure, allowing for more agile network control and easier management. It also enables network administrators to rapidly configure and deploy services from a centralized location. This enables network administrators to respond quickly to changes in traffic patterns or network conditions, allowing for more efficient use of resources.

Scalability and Automation

SDN also improves scalability and automation. Using automated scripts, network administrators can quickly scale the network up or down to match its current needs. Automation also allows the network to be maintained more quickly and efficiently, saving time and resources.

A Deterministic Solution

Technology typically starts as a highly engineered, expensive, deterministic solution. As the marketplace evolves and competition rises, demand grows for a non-deterministic, inexpensive alternative. We see this pattern throughout history. Mainframes were (and still are) expensive; with the arrival of the microprocessor-based personal computer, the client/server model was born. Static RAM (SRAM) was replaced with cheaper Dynamic RAM (DRAM). These patterns consistently apply across technology.

Eventually, deterministic and costly technology is replaced with intelligent technology that relies on redundancy and optimization techniques. This process is now appearing in Wide Area Networks (WANs), where we are witnessing changes in the routing space with the incorporation of Software-Defined Networking (SDN) and BGP (Border Gateway Protocol). By combining these two technologies, companies can perform intelligent routing, also known as SD-WAN path selection, with an SD-WAN overlay.

SD-WAN Path Selection

SD-WAN path selection is an essential part of a Software-Defined Wide Area Network (SD-WAN) architecture. It selects the optimal network path for a given application or user. The process is automated and based on user-defined criteria such as latency, jitter, cost, availability, and security. By making intelligent decisions about which path to use, SD-WAN can ensure that applications and users get the best possible performance.

When selecting the best path for a given application or user, SD-WAN looks at the quality of the connection and the available bandwidth. It then looks at the cost associated with each path. Cost can be a significant factor when selecting a path, especially for large enterprises or organizations with multiple sites.
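The selection logic described above can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's algorithm: it assumes each path already has fresh latency, jitter, and loss measurements plus a relative cost, keeps only the paths that meet the application's SLA, and picks the cheapest. If no path meets the SLA, it falls back to the least lossy, lowest-latency path that is still up.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PathStats:
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    cost: float      # relative cost; the units are an assumption
    up: bool = True

@dataclass
class SlaPolicy:
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float

def meets_sla(p: PathStats, sla: SlaPolicy) -> bool:
    return (p.latency_ms <= sla.max_latency_ms
            and p.jitter_ms <= sla.max_jitter_ms
            and p.loss_pct <= sla.max_loss_pct)

def select_path(paths: List[PathStats], sla: SlaPolicy) -> Optional[PathStats]:
    live = [p for p in paths if p.up]
    if not live:
        return None
    compliant = [p for p in live if meets_sla(p, sla)]
    if compliant:
        # Cheapest compliant path wins; lower latency breaks ties.
        return min(compliant, key=lambda p: (p.cost, p.latency_ms))
    # Best-effort fallback: least loss, then lowest latency.
    return min(live, key=lambda p: (p.loss_pct, p.latency_ms))

paths = [
    PathStats("mpls", latency_ms=40, jitter_ms=2, loss_pct=0.0, cost=10),
    PathStats("broadband", latency_ms=35, jitter_ms=5, loss_pct=0.2, cost=1),
    PathStats("lte", latency_ms=80, jitter_ms=15, loss_pct=1.0, cost=5),
]
voice = SlaPolicy(max_latency_ms=150, max_jitter_ms=10, max_loss_pct=1.0)
print(select_path(paths, voice).name)  # broadband
paths[1].up = False                    # the broadband circuit fails
print(select_path(paths, voice).name)  # mpls
```

The last two lines also show the automatic rerouting idea: when a circuit goes down, the next evaluation simply picks another compliant path.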

SD-WAN can also prioritize certain types of traffic over others. This is done by assigning different weights or priorities for various kinds of traffic. For example, an organization may prioritize voice traffic over other types of traffic. This ensures that voice traffic has the best possible chance of completing its journey without interruption.
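One common way to express those priorities is a strict-priority queue. The Python sketch below assumes four hypothetical traffic classes and always serves the highest-priority class first, keeping FIFO order within each class. Strict priority can starve lower classes under sustained load, which is why real deployments often combine it with weighted queuing for the remaining classes.

```python
import heapq
import itertools

# Hypothetical class-to-priority map: a lower number is served first.
PRIORITY = {"voice": 0, "video": 1, "business": 2, "default": 3}

class PriorityScheduler:
    """Strict-priority egress queue with FIFO order inside each class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, traffic_class, packet):
        prio = PRIORITY.get(traffic_class, PRIORITY["default"])
        heapq.heappush(self._heap, (prio, next(self._seq), packet))

    def dequeue(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

sched = PriorityScheduler()
sched.enqueue("default", "backup-1")
sched.enqueue("voice", "rtp-1")
sched.enqueue("video", "stream-1")
sched.enqueue("voice", "rtp-2")
order = [sched.dequeue() for _ in range(4)]
print(order)  # ['rtp-1', 'rtp-2', 'stream-1', 'backup-1']
```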

Diagram: SD-WAN traffic steering. Source: Cisco.

Critical Considerations for Implementation:

Network Security:

When adopting WAN SDN, organizations must consider the potential security risks associated with software-defined networks. Robust security measures, including authentication, encryption, and access controls, should be implemented to protect against unauthorized access and potential vulnerabilities.

Staff Training and Expertise:

Implementing WAN SDN requires skilled network administrators proficient in configuring and managing the software-defined network infrastructure. Organizations must train and upskill their IT teams to ensure successful implementation and ongoing management.

Real-World Use Cases:

Multi-Site Connectivity:

WAN SDN enables organizations with multiple geographically dispersed locations to connect their sites seamlessly. Administrators can prioritize traffic, optimize bandwidth utilization, and ensure consistent network performance across all locations by centrally controlling the network.

Cloud Connectivity:

With the increasing adoption of cloud services, WAN SDN allows organizations to connect their data centers to public and private clouds securely and efficiently. This facilitates smooth data transfers, supports workload mobility, and enhances cloud performance.

Disaster Recovery:

WAN SDN simplifies disaster recovery planning by allowing organizations to reroute network traffic dynamically during a network failure. This ensures business continuity and minimizes downtime, as the network can automatically adapt to changing conditions and reroute traffic through alternative paths.

The Rise of WAN SDN

Business and cloud services are crucial elements of business operations, yet the transport network they rely on is best effort and offers no guarantee of acceptable delay. More and more services are moving to the Internet, even though the Internet is managed inefficiently and cheaply.

Every Autonomous System (AS) acts independently, and there is a price war between transit providers, leading to poor quality of transit services. Operating over this flawed network, customers must find ways to guarantee applications receive the expected level of quality.

Border Gateway Protocol (BGP), the Internet’s glue, has several flaws in its path selection. The main drawback is the routing paradigm behind the path-selection process. By default, among otherwise equal routes, BGP prefers the path with the shortest AS_PATH. This process ignores the shape of the network: BGP does not consider propagation delay, packet loss, or link congestion. As a result, it can select long paths, or paths that are experiencing packet loss.
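The following Python sketch contrasts the two behaviors on a pair of made-up routes. It models only the AS_PATH-length step of BGP best-path selection (real BGP evaluates attributes such as weight and LOCAL_PREF first) and compares it with a performance-aware choice based on measured loss and RTT. The AS numbers come from the range reserved for documentation, and the measurements are invented.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Route:
    next_hop: str
    as_path: Tuple[int, ...]
    measured_rtt_ms: float
    loss_pct: float

def bgp_shortest_as_path(routes):
    # Simplified: only the AS_PATH-length tie-breaker of BGP best-path selection.
    return min(routes, key=lambda r: len(r.as_path))

def performance_aware(routes):
    # Prefer the lowest measured loss, then the lowest RTT; AS_PATH length is ignored.
    return min(routes, key=lambda r: (r.loss_pct, r.measured_rtt_ms))

routes = [
    Route("isp-a", (64496, 64497), measured_rtt_ms=180.0, loss_pct=2.0),
    Route("isp-b", (64498, 64499, 64500), measured_rtt_ms=45.0, loss_pct=0.0),
]
print(bgp_shortest_as_path(routes).next_hop)  # isp-a
print(performance_aware(routes).next_hop)     # isp-b
```

Plain BGP picks the shorter but lossy, slow path; adding measurements flips the decision, which is the gap tools that influence BGP aim to close.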

Example: WAN SDN with Border6

Border6 is a French company that started in 2012. It offers non-stop internet and an integrated WAN SDN solution, influencing BGP to perform optimum routing. It’s not a replacement for BGP but a complementary tool to enhance routing decisions. For example, it automates changes in routing in cases of link congestion/blackouts.

“The agile way of improving BGP paths by the Border 6 tool improves network stability” Brandon Wade, iCastCenter Owner.

As the Internet became more popular, customers wanted to add more intelligence to routing. Businesses also require SDN traffic optimizations, as many run their entire service offerings on top of the Internet.

What is non-stop internet?

Border6 offers an integrated WAN SDN solution with BGP that adds intelligence to outbound routing. A common approach when designing SDN for real-world networks is to build on existing, field-tested mechanisms such as BGP rather than reinvent every wheel. Border6’s approach of influencing BGP with SDN is therefore welcome, and less risky than a greenfield implementation. Microsoft and Viptela have also used SDN to control BGP behavior.

Border6 uses BGP to learn what might be reachable and then measures how well paths perform against various performance metrics. In other words, they use BGP to learn the structure of the Internet and then run their own algorithms to determine what matters to each customer. Every customer needs to reach different subnets, and priorities differ: some prefer low cost, others performance.

They shortlist a set of interesting “best” performing prefixes and select the most critical ones. Next, they choose probing locations and run automatic probes from them to determine the best path. Combined, these tools enhance BGP’s behavior. The mechanism can detect whether an ISP has hardware or software problems, is dropping packets, or is rerouting packets across the world.

Thousands of tests per minute

The solution finds the best path by executing thousands of tests per minute. Live probing of path delay and packet loss tells BGP which path to route traffic over. The “best path” differs for each customer because it depends on the routing policy the customer wants to apply. Some customers prefer paths without packet loss; others want low cost or paths under 100 ms. It comes down to customer requirements and the applications they serve.
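The per-customer “best path” idea can be sketched as follows. This is not Border6’s algorithm: it assumes a hypothetical list of probe results per upstream (a lost probe is recorded as None), summarizes loss and average RTT, and then ranks upstreams under three example policies that mirror the ones above.

```python
from statistics import mean

# Hypothetical probe results: (upstream, RTT in ms, or None if the probe was lost).
probes = [
    ("transit-a", 42.0), ("transit-a", 44.0), ("transit-a", None), ("transit-a", 43.0),
    ("transit-b", 95.0), ("transit-b", 97.0), ("transit-b", 96.0), ("transit-b", 94.0),
]
# Assumed relative cost per upstream.
COST = {"transit-a": 1.0, "transit-b": 3.0}

def summarize(samples):
    """Group probes by upstream and compute loss percentage and average RTT."""
    grouped = {}
    for upstream, rtt in samples:
        grouped.setdefault(upstream, []).append(rtt)
    summary = {}
    for upstream, rtts in grouped.items():
        answered = [r for r in rtts if r is not None]
        summary[upstream] = {
            "loss_pct": 100.0 * (len(rtts) - len(answered)) / len(rtts),
            "avg_rtt_ms": mean(answered) if answered else float("inf"),
            "cost": COST[upstream],
        }
    return summary

# Each policy maps an upstream's summary to a sort key; lower is better.
POLICIES = {
    "no_loss": lambda s: (s["loss_pct"], s["avg_rtt_ms"]),
    "cheapest_under_100ms": lambda s: (s["avg_rtt_ms"] >= 100, s["cost"]),
    "fastest": lambda s: s["avg_rtt_ms"],
}

def best_upstream(summary, policy):
    key = POLICIES[policy]
    return min(summary, key=lambda upstream: key(summary[upstream]))

summary = summarize(probes)
for policy in POLICIES:
    print(policy, "->", best_upstream(summary, policy))
```

The same measurements produce different answers: the loss-averse policy picks `transit-b`, while the cost- and speed-driven policies pick `transit-a`.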

BGP – Unrelated to Performance

Traditionally, BGP makes its decisions based on data unrelated to performance. Border6 tries to steer your packets’ path across the Internet by choosing the fastest or cheapest link, depending on your requirements.

They first look at NetFlow or sFlow data, using their tool to collect and aggregate it, to determine which destinations are critical to the customer. They take the BGP data sent by service providers as a baseline. On top of that broad connectivity picture, they layer their own measurements (lowest latency, packets lost, and so on) and adjust the BGP data to account for them. The result is optimized packet forwarding.
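Finding the critical destinations in flow data boils down to aggregating traffic per destination prefix. Here is a minimal Python sketch, assuming the NetFlow or sFlow records have already been exported and parsed into (destination, bytes) pairs and that /24 is a sensible aggregation level; both are assumptions for illustration, and the addresses come from documentation ranges.

```python
import ipaddress
from collections import Counter

# Hypothetical, pre-parsed flow records: (destination IP, bytes).
flows = [
    ("203.0.113.10", 120_000), ("203.0.113.77", 80_000),
    ("198.51.100.5", 40_000), ("192.0.2.9", 5_000),
    ("198.51.100.200", 30_000),
]

def top_prefixes(records, prefix_len=24, n=2):
    """Sum bytes per destination prefix and return the n heaviest prefixes."""
    totals = Counter()
    for dst, nbytes in records:
        net = ipaddress.ip_network(f"{dst}/{prefix_len}", strict=False)
        totals[net] += nbytes
    return totals.most_common(n)

for net, nbytes in top_prefixes(flows):
    print(net, nbytes)
# 203.0.113.0/24 200000
# 198.51.100.0/24 70000
```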

BGP for outbound | Locator/ID Separation Protocol (LISP) for inbound

Border6 products focus on outbound traffic optimization. Inbound traffic is much harder to influence with BGP, because most ASes behave selfishly and optimize traffic in their own interest. Border6 is working on tools that help an AS optimize inbound flows by integrating its product set with the Locator/ID Separation Protocol (LISP). The diagram below displays generic LISP components; it is not necessarily related to Border6’s LISP design.

LISP decouples the address space so you can optimize inbound traffic flows. Many LISP use cases involve active-active data centers and VM mobility. It separates the “who” from the “where,” so end-host addressing no longer has to correlate with the host’s actual location. The drawback is that LISP requires endpoints that can build LISP tunnels.
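The “who” versus “where” split can be illustrated with a toy map-cache. In this Python sketch, endpoint identifiers (EIDs) are matched by longest prefix to routing locators (RLOCs), each with a priority, where a lower priority value is preferred, as in LISP. The prefixes and locators are invented, load-sharing weights are left out, and a real tunnel router would query the mapping system on a cache miss rather than return nothing. Moving a VM only changes its RLOC entry; its EID stays the same.

```python
import ipaddress

# Hypothetical map-cache: EID prefix -> list of (RLOC, priority).
MAP_CACHE = {
    ipaddress.ip_network("10.1.0.0/16"): [("192.0.2.1", 1), ("198.51.100.1", 1)],
    ipaddress.ip_network("10.1.5.0/24"): [("203.0.113.1", 1), ("192.0.2.1", 2)],
}

def lookup_rlocs(eid):
    """Longest-prefix match an EID, then keep only the best-priority RLOCs."""
    addr = ipaddress.ip_address(eid)
    matches = [net for net in MAP_CACHE if addr in net]
    if not matches:
        return None  # a real xTR would send a Map-Request instead
    best = max(matches, key=lambda net: net.prefixlen)
    entries = MAP_CACHE[best]
    top = min(prio for _, prio in entries)
    return [rloc for rloc, prio in entries if prio == top]

print(lookup_rlocs("10.1.5.20"))   # ['203.0.113.1']
print(lookup_rlocs("10.1.9.20"))   # ['192.0.2.1', '198.51.100.1']
print(lookup_rlocs("172.16.0.1"))  # None
```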

Currently, they are working on a solution that uses LISP as a signaling protocol between Border6 devices. They are also working on statistical analysis of received data to mitigate potential distributed denial-of-service (DDoS) events. More DDoS algorithms are coming in future releases.
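Border6 does not describe its DDoS algorithms here, so the Python sketch below is only a generic illustration of statistical detection: it flags a sample whose packet rate sits more than three standard deviations above the mean of the preceding window. The window size, threshold, and traffic numbers are assumptions.

```python
from statistics import mean, stdev

def detect_spikes(pps_series, window=10, threshold=3.0):
    """Return indices whose packets-per-second value is more than `threshold`
    standard deviations above the mean of the previous `window` samples."""
    alerts = []
    for i in range(window, len(pps_series)):
        history = pps_series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (pps_series[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts

baseline = [1000, 1020, 980, 1010, 990, 1005, 995, 1015, 985, 1000]
traffic = baseline + [1002, 25000, 998]
print(detect_spikes(traffic))  # [11]
```

A trailing window is simple but gets skewed once an attack sample enters it; production detectors typically use more robust baselines.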

Closing Points: On WAN SDN

At its core, WAN SDN separates the control plane from the data plane, facilitating centralized network management. This separation allows for dynamic adjustments to network configurations, providing businesses with the agility to respond to changing conditions and demands. By leveraging software to control network resources, organizations can achieve significant improvements in performance and cost-effectiveness.

One of the primary advantages of WAN SDN is its ability to optimize network traffic and improve bandwidth utilization. By intelligently routing data, WAN SDN minimizes latency and enhances the overall user experience. Additionally, it simplifies network management by providing a single, centralized platform to control and configure network policies, reducing the complexity and time required for network maintenance.

Summary: WAN SDN

In today’s digital age, where connectivity and speed are paramount, traditional Wide Area Networks (WANs) often fall short of meeting the demands of modern businesses. However, a revolutionary solution that promises to transform how we think about and utilize WANs has emerged. Enter Software-Defined Networking (SDN), a paradigm-shifting approach that brings unprecedented flexibility, efficiency, and control to WAN infrastructure.

Understanding SDN

At its core, SDN is a network architecture that separates the control plane from the data plane. By decoupling network control and forwarding functions, SDN enables centralized management and programmability of the entire network, regardless of its geographical spread. Traditional WANs relied on complex and static configurations, but SDN introduced a level of agility and simplicity that was previously unimaginable.

Benefits of SDN for WANs

Enhanced Flexibility

SDN empowers network administrators to dynamically configure and customize WANs based on specific requirements. With a software-based control plane, they can quickly implement changes, allocate bandwidth, and optimize traffic routing, all in real time. This flexibility allows businesses to adapt swiftly to evolving needs and drive innovation.

Improved Efficiency

By leveraging SDN, WANs can achieve higher levels of efficiency through centralized management and automation. Network policies can be defined and enforced holistically, reducing manual configuration efforts and minimizing human errors. Additionally, SDN enables the intelligent allocation of network resources, optimizing bandwidth utilization and enhancing overall network performance.

Enhanced Security

Security threats are a constant concern in any network infrastructure. SDN brings a new layer of security to WANs by providing granular control over traffic flows and implementing sophisticated security policies. With SDN, network administrators can easily monitor, detect, and mitigate potential threats, ensuring data integrity and protecting against unauthorized access.

Use Cases and Implementation Examples

Dynamic Multi-site Connectivity

SDN enables seamless connectivity between multiple sites, allowing businesses to establish secure and scalable networks. With SDN, organizations can dynamically create and manage virtual private networks (VPNs) across geographically dispersed locations, simplifying network expansion and enabling agile resource allocation.

Cloud Integration and Hybrid WANs

Integrating SDN with cloud services unlocks a whole new level of scalability and flexibility for WANs. By combining SDN with cloud-based infrastructure, organizations can easily extend their networks to the cloud, access resources on demand, and leverage the benefits of hybrid WAN architectures.

Conclusion:

With its ability to enhance flexibility, improve efficiency, and bolster security, SDN is ushering in a new era for Wide-Area Networks (WANs). By embracing the power of software-defined networking, businesses can overcome the limitations of traditional WANs and build robust, agile, and future-proof network infrastructures. It’s time to embrace the SDN revolution and unlock the full potential of your WAN.

Low Latency Network Design

March 22, 2015

by Matt Conran Blog

Low Latency Network Design

In today's fast-paced digital world, where milliseconds can make a significant difference, achieving low latency in network design has become paramount. Whether it's for financial transactions, online gaming, or real-time communication, minimizing latency can enhance user experience and improve overall network performance. In this blog post, we will explore the key principles and strategies behind low latency network design.

Latency, often referred to as network delay, is the time it takes for a data packet to travel from its source to its destination. It encompasses various factors such as propagation delay, transmission delay, and processing delay. By comprehending the different components of latency, we can better grasp the importance of low latency network design.
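A quick back-of-the-envelope calculation shows how these components add up. The Python sketch below assumes light travels through fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), a store-and-forward path where every hop serializes the full packet onto an identical link, and a fixed per-hop processing time; queuing delay is left out. All input values are illustrative.

```python
# Light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0

def one_way_latency_ms(distance_km, packet_bytes, link_bps, hops, per_hop_processing_ms):
    """Return (propagation, transmission, processing) delay in milliseconds.
    Simplified model: each hop serializes the whole packet and adds a fixed
    processing delay; queuing delay is ignored."""
    propagation = distance_km / FIBER_KM_PER_MS
    transmission = hops * (packet_bytes * 8) / link_bps * 1000.0
    processing = hops * per_hop_processing_ms
    return propagation, transmission, processing

# 4,000 km path, 1,500-byte packet, 1 Gbps links, 10 hops, 20 microseconds per hop.
prop, trans, proc = one_way_latency_ms(4000, 1500, 1e9, hops=10, per_hop_processing_ms=0.02)
print(f"propagation {prop:.2f} ms, transmission {trans:.3f} ms, processing {proc:.2f} ms")
print(f"total {prop + trans + proc:.2f} ms")
```

Over long distances, propagation dominates even on fast links, which is why reducing physical distance matters so much in the strategies that follow.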

One of the foundational elements of achieving low latency is optimizing the hardware and infrastructure components of the network. This involves using high-performance routers and switches, reducing the number of network hops, and employing efficient cabling and interconnectivity solutions. By eliminating bottlenecks and implementing cutting-edge technology, organizations can significantly reduce latency.

Efficiently managing network traffic is crucial for minimizing latency. Implementing Quality of Service (QoS) mechanisms enables prioritization of critical data packets, ensuring they receive preferential treatment and are delivered promptly. Additionally, traffic shaping and load balancing techniques can help distribute network load evenly, preventing congestion and reducing latency.
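Traffic shaping and policing are commonly built on a token bucket. The Python sketch below is a conformance check only: tokens refill at a fixed rate up to a burst capacity, and a packet conforms if enough tokens are available. A policer would drop non-conforming packets, while a shaper would queue them until tokens accumulate. The rate, capacity, and timestamps are illustrative.

```python
class TokenBucket:
    """Token bucket: bursts up to `capacity_bytes`, long-term rate capped
    at `rate_bytes_per_s`."""

    def __init__(self, rate_bytes_per_s, capacity_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = 0.0               # time of the previous update, in seconds

    def conforms(self, packet_bytes, now):
        """Refill tokens for the elapsed time, then spend them if the packet fits."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False

bucket = TokenBucket(rate_bytes_per_s=1000, capacity_bytes=1500)
# Explicit timestamps (seconds) keep the example deterministic.
results = [bucket.conforms(1000, t) for t in (0.0, 0.125, 0.5)]
print(results)  # [True, False, True]
```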

Content Delivery Networks play a vital role in low latency network design, particularly for websites and applications that require global reach. By strategically distributing content across various geographically dispersed servers, CDNs minimize the distance between users and data sources, resulting in faster response times and reduced latency.

The emergence of edge computing has revolutionized low latency network design. By moving computational resources closer to end-users or data sources, edge computing reduces the round-trip time for data transmission, resulting in ultra-low latency. With the proliferation of Internet of Things (IoT) devices and real-time applications, edge computing is becoming increasingly essential for delivering seamless user experiences.

Low latency network design is a critical aspect of modern networking. By understanding the different components of latency and implementing strategies such as optimizing hardware and infrastructure, network traffic management, leveraging CDNs, and adopting edge computing, organizations can unlock the power of low latency. Embracing these principles not only enhances user experience but also provides a competitive advantage in an increasingly interconnected world.

Matt Conran

Highlights: Low Latency Network Design

Understanding Low Latency

Low latency, in simple terms, refers to the minimal delay or lag in data transmission between systems. It measures how quickly information can travel from its source to its destination. In network design, achieving low latency involves optimizing various factors such as network architecture, hardware, and protocols. By minimizing latency, businesses can gain a competitive edge, enhance user experiences, and unlock new realms of possibilities.

Low latency is critical in various applications and industries. In online gaming, it ensures that actions occur in real-time, preventing lag that can ruin the gaming experience. In financial trading, low latency is essential for executing trades at the exact right moment, where milliseconds can mean the difference between profit and loss. For streaming services, low latency allows for a seamless viewing experience without buffering interruptions.

The Benefits of Low Latency

1. Low-latency network design offers a plethora of benefits across different industries. In the financial sector, it enables lightning-fast trades, providing traders with a significant advantage in highly volatile markets.

2. Low latency ensures seamless gameplay for online gaming enthusiasts, reducing frustrating lags and enhancing immersion.

3. Beyond finance and gaming, low latency networks improve real-time collaboration, enable telemedicine applications, and enhance the performance of emerging technologies like autonomous vehicles and Internet of Things (IoT) devices.

Achieving Low Latency

Achieving low latency involves optimizing network infrastructure and using advanced technology. This can include using fiber optic connections, which offer faster data transmission speeds, and deploying edge computing, which processes data closer to its source to reduce delay. Moreover, Content Delivery Networks (CDNs) distribute content across multiple locations, bringing it closer to the end-user and, thus, reducing latency.

1. Network Infrastructure: To achieve low latency, network designers must optimize the infrastructure by reducing bottlenecks, eliminating single points of failure, and ensuring sufficient bandwidth capacity.

2. Proximity: Locating servers and data centers closer to end-users can significantly reduce latency. By minimizing the physical distance, data can travel faster, resulting in lower latency.

3. Traffic Prioritization: Prioritizing latency-sensitive traffic within the network can help ensure that critical data packets are given higher priority, reducing the overall latency.

4. Quality of Service (QoS): Implementing QoS mechanisms allows network administrators to allocate resources based on application requirements. By prioritizing latency-sensitive applications, low latency can be maintained.

5. Optimization Techniques: Various optimization techniques, such as caching, compression, and load balancing, can further reduce latency by minimizing the volume of data transmitted and efficiently distributing the workload.

Traceroute – Testing for Latency and Performance

**How Traceroute Works**

At its core, traceroute works by sending packets with increasing time-to-live (TTL) values. Each router along the path decrements the TTL by one before forwarding the packet. When the TTL reaches zero, the router discards the packet and sends an ICMP Time Exceeded message back to the sender. Traceroute uses these responses to identify each hop, gradually mapping the entire route from source to destination. By measuring how long each response takes, traceroute also highlights latency issues at specific hops.
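The TTL mechanism can be simulated without sending real packets (which would need raw sockets and elevated privileges). In the Python sketch below, the path, addresses, and per-link delays are hypothetical, and each RTT is simply assumed to be twice the one-way delay to that hop.

```python
from dataclasses import dataclass

@dataclass
class Hop:
    address: str
    delay_ms: float  # one-way delay of the link leading into this node

# Hypothetical path; the last entry is the destination host.
PATH = [Hop("10.0.0.1", 1.0), Hop("172.16.0.1", 8.0),
        Hop("198.51.100.1", 20.0), Hop("203.0.113.50", 2.0)]

def send_probe(ttl):
    """Each router decrements the TTL; the one that drops it to zero answers
    with ICMP Time Exceeded. The destination answers with a reply instead."""
    elapsed = 0.0
    for i, hop in enumerate(PATH):
        elapsed += hop.delay_ms
        if i == len(PATH) - 1:
            return hop.address, 2 * elapsed, "reply"
        ttl -= 1
        if ttl == 0:
            return hop.address, 2 * elapsed, "time-exceeded"

def traceroute(max_ttl=30):
    results = []
    for ttl in range(1, max_ttl + 1):
        address, rtt_ms, kind = send_probe(ttl)
        results.append((ttl, address, rtt_ms))
        if kind == "reply":  # the destination answered; the path is complete
            break
    return results

for ttl, address, rtt_ms in traceroute():
    print(f"{ttl:2d}  {address:<15} {rtt_ms:.1f} ms")
```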

**Using Traceroute Effectively**

Running traceroute is simple, yet understanding its output requires some insight. The command displays a list of routers (or hops) with their respective IP addresses and the round-trip time (RTT) for packets to reach each router and return. This information can be used to diagnose network issues, such as identifying a slow or problematic hop. Network engineers often rely on traceroute to determine whether a bottleneck lies within their control or further along the internet’s infrastructure.
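Reading the output usually means looking for where RTT jumps and stays high. The Python sketch below parses a made-up sample in the common Unix-style traceroute layout, keeps the minimum of the three RTTs per hop to reduce noise, skips hops that did not answer (`* * *`), and reports the largest increase between consecutive responding hops. It assumes one address per line; real output can list several when probes take different paths. A jump that does not persist at later hops often reflects a router deprioritizing ICMP rather than a real bottleneck.

```python
import re

SAMPLE = """\
 1  192.168.1.1  1.102 ms  0.987 ms  1.010 ms
 2  10.20.0.1  8.450 ms  8.390 ms  8.512 ms
 3  * * *
 4  198.51.100.9  62.300 ms  61.950 ms  63.100 ms
 5  203.0.113.80  63.400 ms  62.800 ms  63.050 ms
"""

LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(.*)$")

def parse(output):
    """Return (ttl, address, min_rtt_ms) per hop; silent hops get None values."""
    hops = []
    for line in output.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        ttl, addr, rest = int(m.group(1)), m.group(2), m.group(3)
        rtts = [float(x) for x in re.findall(r"([\d.]+) ms", rest)]
        if addr == "*" or not rtts:
            hops.append((ttl, None, None))
        else:
            hops.append((ttl, addr, min(rtts)))
    return hops

def biggest_jump(hops):
    """Return (ttl, address, rtt_increase_ms) for the largest hop-to-hop increase."""
    answered = [h for h in hops if h[2] is not None]
    best = None
    for (_, _, prev_rtt), (ttl, addr, rtt) in zip(answered, answered[1:]):
        delta = rtt - prev_rtt
        if best is None or delta > best[2]:
            best = (ttl, addr, delta)
    return best

print(biggest_jump(parse(SAMPLE)))  # hop 4 (198.51.100.9), about 53.56 ms
```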

**Common Challenges and Solutions**

While traceroute is a powerful tool, it comes with its own set of challenges. Some routers may be configured to deprioritize or block traceroute packets, resulting in missing information. Additionally, asymmetric routing paths, where outbound and return paths differ, can complicate the analysis. However, understanding these limitations allows users to interpret traceroute results more accurately, using supplementary tools or methods to gain a comprehensive view of network health.

Key Challenges in Reducing Latency

Achieving low latency is a complex undertaking that involves several challenges. One of the primary hurdles is network distance. The physical distance between servers and users can significantly affect data transmission speed. Additionally, network congestion can lead to delays, making it difficult to maintain low latency consistently. Another challenge is the processing time required by servers to handle requests, which can introduce unwanted delays. This section delves into these challenges, examining how they hinder efforts to reduce latency.

**Technological Solutions and Innovations**

Despite the challenges, technological advancements offer promising solutions for reducing latency. Edge computing is one such innovation, bringing data processing closer to the user to minimize transmission time. Content delivery networks (CDNs) also play a crucial role by caching content in multiple locations worldwide, thereby reducing latency for end-users. Moreover, advancements in hardware and software optimization techniques contribute significantly to lowering processing times. In this section, we’ll explore these solutions and their potential to overcome latency challenges.

Google Cloud Machine Types

**Understanding Google Cloud’s Machine Type Offerings**

Google Cloud’s machine type families are categorized primarily based on workload requirements. These categories are designed to cater to various use cases, from general-purpose computing to specialized machine learning tasks. The three main families include:

1. **General Purpose**: This category is ideal for balanced workloads. It offers a mix of compute, memory, and networking resources. The e2 and n2 series are popular choices for those seeking cost-effective options with reasonable performance.

2. **Compute Optimized**: These machines are designed for high-performance computing tasks that require more computational power. The c2 series, for instance, provides excellent performance per dollar, making it ideal for CPU-intensive workloads.

3. **Memory Optimized**: For applications requiring substantial memory, such as large databases or in-memory analytics, the m1 and m2 series offer high memory-to-vCPU ratios, ensuring that memory-hungry applications run smoothly.

—

**The Importance of Low Latency Networks**

One of the critical factors in the performance of cloud-based applications is network latency. Google Cloud’s low latency network infrastructure is engineered to minimize delays, ensuring rapid data transfer and real-time processing capabilities. By leveraging a global network of data centers and high-speed fiber connections, Google Cloud provides a robust environment for latency-sensitive applications such as gaming, video streaming, and financial services.

—

**Choosing the Right Machine Type for Your Needs**

Selecting the appropriate machine type family is crucial for optimizing both performance and cost. Factors to consider include the nature of the workload, budget constraints, and the importance of scalability. For instance, a startup with a limited budget may prioritize cost-effective general-purpose machines, while a media company focusing on video rendering might opt for compute-optimized instances.

Additionally, Google Cloud’s flexible pricing models, including sustained use discounts and committed use discounts, offer further opportunities to save while scaling resources as needed.

Optimizing Cloud Performance with Google Cloud

### Understanding Managed Instance Groups

In the ever-evolving world of cloud computing, Managed Instance Groups (MIGs) have emerged as a critical component for maintaining and optimizing infrastructure. Google Cloud, in particular, offers robust MIG services that allow businesses to efficiently manage a fleet of virtual machines (VMs) while ensuring high availability and low latency. By automating the process of scaling and maintaining VM instances, MIGs help streamline operations and reduce manual intervention.

### Benefits of Using Managed Instance Groups

One of the primary benefits of utilizing Managed Instance Groups is the automatic scaling feature. This enables your application to handle increased loads by dynamically adding or removing VM instances based on demand. This elasticity ensures that your applications remain responsive and maintain low latency, which is crucial for providing a seamless user experience.

Moreover, Google Cloud’s MIGs facilitate seamless updates and patches to your VMs. With rolling updates, you can deploy changes gradually across instances, minimizing downtime and ensuring continuous availability. This process allows for a safer and more controlled update environment, reducing the risk of disruption to your operations.
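The core of the automatic scaling described above can be sketched as a proportional rule: size the group so that average utilization lands near a target. This Python sketch is a simplification, not Google Cloud’s actual autoscaler algorithm, which also accounts for things such as instance initialization time and scale-in controls. The target and group limits are illustrative.

```python
import math

def recommended_size(current_size, avg_cpu_util, target_util, min_size=1, max_size=10):
    """Proportional scaling: choose enough instances that the same total load
    would run at roughly `target_util`, clamped to the group's limits."""
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    desired = math.ceil(current_size * avg_cpu_util / target_util)
    return max(min_size, min(max_size, desired))

print(recommended_size(4, 0.80, 0.60))  # 6: scale out under load
print(recommended_size(4, 0.20, 0.60))  # 2: scale in when idle
print(recommended_size(8, 0.95, 0.60))  # 10: capped at max_size
```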

### Achieving Low Latency with Google Cloud

Low latency is a critical factor in delivering high-performance applications, especially for real-time processing and user interactions. Google Cloud’s global network infrastructure, coupled with Managed Instance Groups, plays a vital role in achieving this goal. By distributing workloads across multiple instances and regions, you can minimize latency and ensure that users worldwide have access to fast and reliable services.

Additionally, Google Cloud’s load balancing services work in tandem with MIGs to evenly distribute traffic, preventing any single instance from becoming a bottleneck. This distribution ensures that your application can handle high volumes of traffic without degradation in performance, further contributing to low latency operations.

—

### Best Practices for Implementing Managed Instance Groups

When implementing Managed Instance Groups, it’s essential to follow best practices to maximize their effectiveness. Start by clearly defining your scaling policies based on your application’s needs. Consider factors such as CPU utilization, request count, and response times to determine when new instances should be added or removed.

It’s also crucial to monitor the performance of your MIGs continuously. Utilize Google Cloud’s monitoring and logging tools to gain insights into the health and performance of your instances. By analyzing this data, you can make informed decisions on scaling policies and infrastructure optimizations.

### The Role of Health Checks in Load Balancing

Health checks are the sentinels of your load balancing strategy. They monitor the status of your server instances, ensuring that traffic is only directed to healthy ones. In Google Cloud, health checks can be configured to check the status of backend services via various protocols like HTTP, HTTPS, TCP, and SSL. By setting parameters such as check intervals and timeout periods, you can fine-tune how Google Cloud determines the health of your instances. This process helps in avoiding downtime and maintaining a seamless user experience.

—

### Configuring Health Checks for Low Latency

Latency is a critical factor when it comes to user satisfaction. High latency can lead to slow-loading applications, frustrating users, and potentially driving them away. By configuring health checks appropriately, you can keep latency to a minimum. Google Cloud allows you to set up health checks that are frequent and precise, enabling the load balancer to quickly detect any issues and reroute traffic to healthy instances. Fine-tuning these settings helps in maintaining low latency, thus ensuring that your application remains responsive and efficient.
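Load balancers, including Google Cloud’s, let you set a check interval, a timeout, and healthy/unhealthy thresholds (the number of consecutive results needed to change a backend’s state). The Python sketch below models that threshold logic for a single backend and gives a rough worst-case time to detect a failure. It is an approximation under stated assumptions, not Google Cloud’s implementation, and the numbers in the example are illustrative.

```python
class HealthState:
    """Threshold logic for one backend: flip to unhealthy after
    `unhealthy_threshold` consecutive failed probes, and back to healthy
    after `healthy_threshold` consecutive successful ones."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=2):
        self.healthy = True
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self._streak = 0  # consecutive probes that disagree with the current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= limit:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy

def worst_case_detection_s(interval_s, timeout_s, unhealthy_threshold):
    """Rough upper bound: the backend fails just after a successful probe,
    then `unhealthy_threshold` probes, one per interval, each time out."""
    return interval_s * unhealthy_threshold + timeout_s

backend = HealthState()
states = [backend.record(ok) for ok in (True, False, False, False, True, True)]
print(states)  # [True, True, False, False, False, True]
print(worst_case_detection_s(5, 5, 2))  # 15
print(worst_case_detection_s(1, 1, 2))  # 3
```

Shorter intervals and lower thresholds cut detection time, but they add probe traffic and can flap backends that are only briefly slow. That is the trade-off between detection speed and probe overhead.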

—

### Best Practices for Effective Health Checks

Implementing effective health checks involves more than just setting up default parameters. Here are some best practices to consider:

1. **Customize Check Frequency and Timeout**: Depending on your application’s needs, customize the frequency and timeout settings. More frequent checks allow for quicker detection of issues but may increase resource consumption.

2. **Diverse Protocols**: Utilize different protocols for different services. For example, use HTTP checks for web applications and TCP checks for database services.

3. **Monitor and Adjust**: Regularly monitor the performance of your health checks and adjust settings as necessary. This ensures that your system adapts to changing demands and maintains optimal performance.

4. **Failover Strategies**: Incorporate failover strategies to handle instances where the primary server pool is unhealthy, ensuring uninterrupted service.

Google Cloud Data Centers

#### What is a Cloud Service Mesh?

A cloud service mesh is a configurable infrastructure layer for microservices applications that makes communication between service instances flexible, reliable, and fast. It provides a way to control how different parts of an application share data with one another. A service mesh does this by introducing a proxy for each service instance, which handles all incoming and outgoing network traffic. This ensures that developers can focus on writing business logic without worrying about the complexities of communication and networking.

#### Importance of Low Latency

In today’s digital landscape, low latency is crucial for providing a seamless user experience. Whether it’s streaming video, online gaming, or real-time financial transactions, users expect instantaneous responses. A cloud service mesh optimizes the communication paths between microservices, ensuring that data is transferred quickly and efficiently. This reduction in latency can significantly improve the performance and responsiveness of applications.

#### Key Features of a Cloud Service Mesh

1. **Traffic Management**: One of the fundamental features of a service mesh is its ability to manage traffic between services. This includes load balancing and traffic splitting, which help maintain low latency and high availability, as well as fault injection, which lets teams test how services behave when dependencies fail or slow down.

2. **Security**: Security is another critical aspect. A service mesh can enforce policies for mutual TLS (mTLS) authentication and encryption, ensuring secure communication between services without adding significant overhead that could affect latency.

3. **Observability**: With built-in observability features, a service mesh provides detailed insights into the performance and health of services. This includes metrics, logging, and tracing, which are essential for diagnosing latency issues and optimizing performance.
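Traffic splitting is easy to picture as weighted random selection. In the Python sketch below, a hypothetical service sends roughly 90% of requests to `reviews-v1` and 10% to a canary `reviews-v2`; the service names and weights are invented. Meshes such as Istio express the same idea declaratively as route weights, and the sidecar proxies apply it per request. A fixed seed makes the run reproducible.

```python
import random

def make_splitter(weights, seed=None):
    """Return a function that picks a backend version per request,
    in proportion to its weight."""
    rng = random.Random(seed)
    versions = list(weights)
    w = [weights[v] for v in versions]
    return lambda: rng.choices(versions, weights=w, k=1)[0]

pick = make_splitter({"reviews-v1": 90, "reviews-v2": 10}, seed=42)
counts = {"reviews-v1": 0, "reviews-v2": 0}
for _ in range(10_000):
    counts[pick()] += 1
print(counts)  # roughly 9,000 vs 1,000
```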

#### Implementing a Cloud Service Mesh

Implementing a service mesh involves deploying a set of proxies alongside your microservices. Popular service mesh solutions like Istio, Linkerd, and Consul provide robust frameworks for managing this implementation. These tools offer extensive documentation and community support, making it easier for organizations to adopt service meshes and achieve low latency performance.

Example Product: Cisco ThousandEyes

### What is Cisco ThousandEyes?

Cisco ThousandEyes is a network intelligence platform designed to monitor, diagnose, and optimize the performance of the networks and services your data center depends on. It provides end-to-end visibility into network paths, application performance, and user experience, giving you the insights you need to maintain optimal operations. By combining Cloud Agents (hosted by ThousandEyes) with Enterprise Agents (deployed in your own environment), ThousandEyes offers a holistic view of your network, enabling you to identify and resolve performance bottlenecks quickly.

### Key Features and Benefits

#### Comprehensive Visibility

One of the standout features of Cisco ThousandEyes is its ability to provide comprehensive visibility across your entire network. Whether it’s on-premises, in the cloud, or a hybrid environment, ThousandEyes ensures you have a clear view of your network’s health and performance. This visibility extends to both internal and external networks, allowing you to monitor the entire data flow from start to finish.

#### Proactive Monitoring and Alerts

ThousandEyes excels in proactive monitoring, continuously analyzing your network for potential issues. The platform uses advanced algorithms to detect anomalies and performance degradation, sending real-time alerts to your IT team. This proactive approach enables you to address problems before they escalate, minimizing downtime and ensuring a seamless user experience.

#### Detailed Performance Metrics

With Cisco ThousandEyes, you gain access to a wealth of detailed performance metrics. From latency and packet loss to application response times and page load speeds, ThousandEyes provides granular data that helps you pinpoint the root cause of performance issues. This level of detail is crucial for effective troubleshooting and optimization, empowering you to make data-driven decisions.

### Use Cases: How ThousandEyes Transforms Data Center Performance

#### Optimizing Application Performance

For organizations that rely heavily on web applications, ensuring optimal performance is critical. ThousandEyes allows you to monitor application performance from the end-user perspective, identifying slowdowns and bottlenecks that could impact user satisfaction. By leveraging these insights, you can optimize your applications for better performance and reliability.

#### Enhancing Cloud Service Delivery

As more businesses move to the cloud, maintaining high performance across cloud services becomes increasingly important. ThousandEyes provides visibility into the performance of your cloud services, helping you ensure they meet your performance standards. Whether you’re using AWS, Azure, or Google Cloud, ThousandEyes can help you monitor and optimize your cloud infrastructure.

#### Improving Network Resilience

Network outages can have devastating effects on your business operations. ThousandEyes helps you build a more resilient network by identifying weak points and potential failure points. With its detailed network path analysis, you can proactively address vulnerabilities and enhance your network’s overall resilience.

Achieving Low Latency

A: Understanding Latency: Latency, simply put, is the time it takes for data to travel from its source to its destination. The lower the latency, the faster the response time. To comprehend the importance of low latency network design, it is essential to understand the factors that contribute to latency, such as distance, network congestion, and processing delays.
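These factors can be put into numbers. The sketch below estimates one-way latency as the sum of propagation, serialization, queuing, and processing delay. It assumes light travels through fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum); the queuing and processing figures are inputs you would measure, not values from this article.

```python
# Assumed propagation speed of light in optical fiber (about two-thirds of c).
FIBER_KM_PER_SECOND = 200_000


def propagation_delay_ms(distance_km: float) -> float:
    """Time for the signal to travel the given distance of fiber."""
    return distance_km / FIBER_KM_PER_SECOND * 1000


def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock every bit of the packet onto the link."""
    return packet_bytes * 8 / link_bps * 1000


def one_way_latency_ms(distance_km: float, packet_bytes: int, link_bps: float,
                       queuing_ms: float = 0.0, processing_ms: float = 0.0) -> float:
    """Sum of propagation, serialization, queuing and processing delay."""
    return (propagation_delay_ms(distance_km)
            + serialization_delay_ms(packet_bytes, link_bps)
            + queuing_ms + processing_ms)


if __name__ == "__main__":
    # 1,000 km of fiber, a 1,500-byte packet on a 100 Mbps link,
    # and an assumed 2 ms of queuing at a congested hop.
    print(f"{one_way_latency_ms(1000, 1500, 100e6, queuing_ms=2.0):.2f} ms")  # 7.12 ms
```

Over long distances, propagation delay dominates and can only be reduced by shortening the path, which is the idea behind the CDN and edge computing techniques described in this section.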

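B: Bandwidth Optimization: Bandwidth plays a significant role in network performance. Bandwidth and latency are different measures, but they interact: when a link is saturated, packets wait in queues, and that queuing delay adds directly to latency. By implementing techniques such as traffic prioritization, Quality of Service (QoS), and efficient data compression, network administrators can ensure that critical data flows smoothly. Prioritization shortens the queue that critical packets wait in, and compression reduces the number of bits to serialize onto the wire; both reduce latency and improve overall performance.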

C: Minimizing Network Congestion: Network congestion is a common culprit behind high latency. To address this issue, congestion control mechanisms such as traffic shaping, packet prioritization, and load balancing can be highly effective. Traffic shaping smooths bursts down to a configured rate, prioritization lets critical packets bypass queued bulk traffic, and load balancing spreads traffic across multiple paths. Together, these techniques prevent bottlenecks and reduce latency spikes.
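Traffic shaping is commonly built on a token bucket. The following Python sketch shows the core idea under simplifying assumptions: tokens (bytes) refill at the configured rate up to a burst size, and a packet conforms only if enough tokens are available. The class and parameter names are hypothetical.

```python
class TokenBucket:
    """Minimal token bucket: rate in bits per second, burst in bytes."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.fill_rate = rate_bps / 8  # tokens (bytes) added per second
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last_time = 0.0

    def conforms(self, packet_bytes: int, now: float) -> bool:
        """Refill for the elapsed time, then spend tokens if the packet fits."""
        elapsed = now - self.last_time
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.last_time = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_bps=8_000, burst_bytes=1_500)  # 1,000 bytes/s
    print(bucket.conforms(1_500, now=0.0))  # True: the initial burst fits
    print(bucket.conforms(1_000, now=0.5))  # False: only 500 bytes refilled
    print(bucket.conforms(1_000, now=1.0))  # True: 1,000 bytes available
```

This sketch only decides conformance. A shaper would hold non-conforming packets in a queue until enough tokens accumulate, while a policer would drop or re-mark them.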

D: Proximity Matters: Content Delivery Networks (CDNs) are a game-changer when it comes to optimizing latency. By distributing content across multiple geographically dispersed servers, CDNs bring data closer to end-users, reducing the time it takes for information to travel. Leveraging CDNs can significantly improve latency, particularly for websites and applications that serve a global audience.

E: Network Infrastructure Optimization: The underlying network infrastructure plays a crucial role in achieving low latency. Employing technologies like fiber optics, reducing signal noise, and utilizing efficient routing protocols can contribute to faster data transmission. Additionally, deploying edge computing capabilities can bring computation closer to the source, further reducing latency.

Google Cloud Network Tiers

Understanding Network Tiers

When it comes to network tiers, it is essential to comprehend their fundamental principles. Network tiers refer to the different levels of service quality and performance offered by a cloud provider. In the case of Google Cloud, there are two primary network tiers: Premium Tier and Standard Tier. Each tier comes with its own set of capabilities, pricing structures, and performance characteristics.

The Premium Tier is designed for the highest performance and reliability. Traffic enters and leaves Google’s network at the edge point of presence closest to the user and travels over Google’s private global backbone for most of its journey, which typically gives lower latency and more consistent throughput than the public internet. Premium Tier also supports global load balancing. This tier is particularly suitable for applications that demand real-time data processing, high-speed transactions, and global reach. While the Premium Tier costs more than the Standard Tier, its benefits make it a worthwhile investment for organizations with critical workloads.

The Standard Tier, on the other hand, offers a cost-effective option for businesses with less demanding network requirements. Traffic is exchanged with the public internet close to the Google Cloud region rather than near the user, and resources such as load balancers are regional rather than global. It provides reliable connectivity and reasonable performance for applications that do not heavily rely on real-time data processing or global scalability. By choosing the Standard Tier, organizations can reduce their network costs while accepting somewhat higher and more variable latency for distant users.

Understanding VPC Peering

VPC peering is a method of connecting VPC networks using private IP addresses. It enables secure and direct communication between VPCs, whether they are in the same project, different projects, or even different organizations within Google Cloud. This eliminates the need for complex and less efficient workarounds, such as external IP addresses or VPN tunnels.

VPC peering offers several advantages for organizations using Google Cloud. Firstly, it simplifies network architecture by providing a seamless connection between VPC networks. It allows resources in one VPC to directly access resources in another VPC, enabling efficient collaboration and resource sharing. Secondly, VPC peering reduces network latency by bypassing the public internet, resulting in faster and more reliable data transfers. Lastly, it enhances security by keeping the communication within the private network and avoiding exposure to potential threats.
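One practical prerequisite: Google Cloud does not allow VPC Network Peering between networks whose subnet IP ranges overlap. The Python sketch below checks two lists of ranges before you attempt a peering. The function name and example CIDR blocks are hypothetical, and it only checks the ranges you pass in; it does not query your projects.

```python
import ipaddress
from itertools import product


def overlapping_ranges(vpc_a: list[str], vpc_b: list[str]) -> list[tuple[str, str]]:
    """Return every (range_a, range_b) pair of CIDR ranges that overlap."""
    nets_a = [ipaddress.ip_network(cidr) for cidr in vpc_a]
    nets_b = [ipaddress.ip_network(cidr) for cidr in vpc_b]
    return [(str(a), str(b)) for a, b in product(nets_a, nets_b) if a.overlaps(b)]


if __name__ == "__main__":
    prod_vpc = ["10.0.0.0/16", "10.10.0.0/20"]
    shared_vpc = ["10.0.128.0/20", "192.168.0.0/24"]
    conflicts = overlapping_ranges(prod_vpc, shared_vpc)
    print(conflicts or "No overlaps: ranges are compatible for peering")
```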

Understanding Google Cloud CDN

Google Cloud CDN is a content delivery network service offered by Google Cloud Platform. It leverages Google’s extensive network infrastructure to cache and serve content from worldwide locations. Bringing content closer to users significantly reduces the time it takes to load web pages, resulting in faster and more efficient content delivery.

Implementing Cloud CDN is straightforward. Cloud CDN is enabled on the backend service or backend bucket of an external Application Load Balancer; from there, you define the origin (the backend), set cache policies, and enable HTTPS on the load balancer’s frontend. Once enabled, Cloud CDN works with your existing load-balanced infrastructure, and performance improves as popular content is cached at Google’s edge locations.

– Cache-Control: Leveraging cache control headers lets you specify how long content should be cached, reducing origin server requests and improving response times.

– Content Purging: Cloud CDN provides easy mechanisms to purge cached content, ensuring users receive the most up-to-date information when necessary.

– Monitoring and Analytics: Utilize Google Cloud Monitoring and Cloud Logging to gain insights into CDN performance, identify bottlenecks, and optimize content delivery further.
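To illustrate how cache control headers translate into caching time, here is a simplified Python sketch of how a shared cache such as a CDN might read a `Cache-Control` header. It follows the HTTP caching rules in which `s-maxage` takes precedence over `max-age` for shared caches, and `private` or `no-store` stop a shared cache from storing the response. It ignores other directives (such as `no-cache`, which allows storage but requires revalidation), and the `default_ttl` fallback is an assumed value, not Cloud CDN’s documented default.

```python
def shared_cache_ttl(cache_control: str, default_ttl: int = 3600) -> int:
    """Seconds a shared cache may treat a response as fresh.

    Returns 0 when the response must not be stored by a shared cache.
    default_ttl is a hypothetical fallback for responses with no freshness info.
    """
    directives = {}
    for part in cache_control.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')

    if "no-store" in directives or "private" in directives:
        return 0
    # s-maxage applies only to shared caches and overrides max-age there.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            try:
                return max(0, int(directives[key]))
            except ValueError:
                return 0  # malformed value: treat the response as stale
    return default_ttl


if __name__ == "__main__":
    print(shared_cache_ttl("public, max-age=600, s-maxage=86400"))  # 86400
    print(shared_cache_ttl("private, max-age=600"))  # 0
```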

Use Case: Understanding Performance-Based Routing

Performance-based routing is a dynamic routing technique that selects the best path for data transmission based on real-time network performance metrics. Unlike traditional static routing, which relies on predetermined paths, performance-based routing considers factors such as latency, packet loss, and available bandwidth. Continuously evaluating network conditions ensures that data is routed through the most efficient path, improving overall network performance.

Enhanced Reliability: Performance-based routing improves reliability by dynamically adapting to network conditions and automatically rerouting traffic in case of network congestion or failures. This proactive approach minimizes downtime and ensures uninterrupted connectivity.

Optimized Performance: Performance-based routing facilitates load balancing by distributing traffic across multiple paths based on their performance metrics. This optimization reduces latency, enhances throughput, and improves overall user experience.

Cost Optimization: Through intelligent routing decisions, performance-based routing can optimize costs by leveraging lower-cost paths or utilizing network resources more efficiently. This cost optimization can be particularly advantageous for organizations with high bandwidth requirements or regions with varying network costs.
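The selection logic can be sketched in a few lines of Python. This is a hypothetical policy, not any vendor’s algorithm: among paths whose measured latency and loss are within the thresholds, pick the cheapest; if none qualify, fall back to the lowest-latency path. The path names, metrics, and costs are made-up inputs that a real system would obtain from active probes or flow measurements.

```python
from dataclasses import dataclass


@dataclass
class PathMetrics:
    name: str
    latency_ms: float  # measured round-trip latency
    loss_pct: float    # measured packet loss, in percent
    cost: float        # relative cost per unit of traffic (hypothetical)


def select_path(paths: list[PathMetrics], max_latency_ms: float,
                max_loss_pct: float) -> PathMetrics:
    """Cheapest path that meets the policy, else the lowest-latency path."""
    compliant = [p for p in paths
                 if p.latency_ms <= max_latency_ms and p.loss_pct <= max_loss_pct]
    if compliant:
        # Among policy-compliant paths, prefer lower cost, then lower latency.
        return min(compliant, key=lambda p: (p.cost, p.latency_ms))
    # Nothing meets the policy: degrade gracefully to the best-performing path.
    return min(paths, key=lambda p: (p.latency_ms, p.loss_pct))


if __name__ == "__main__":
    paths = [
        PathMetrics("mpls", latency_ms=40, loss_pct=0.0, cost=10),
        PathMetrics("internet", latency_ms=55, loss_pct=0.5, cost=1),
        PathMetrics("lte", latency_ms=80, loss_pct=1.0, cost=5),
    ]
    print(select_path(paths, max_latency_ms=100, max_loss_pct=1.0).name)  # internet
    print(select_path(paths, max_latency_ms=50, max_loss_pct=1.0).name)   # mpls
```

Re-running the selection as new measurements arrive is what makes the routing dynamic; in practice, some hysteresis is usually added so traffic does not flap between paths with similar metrics.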

Routing Protocols:

Routing protocols are algorithms that determine the best path for data to travel from the source to the destination. They ensure that packets are directed efficiently through network devices such as routers, switches, and gateways. Different routing protocols, such as OSPF, EIGRP, and BGP, each have their own strengths and suit specific network environments.

Routing protocols should be optimized.

Routing protocols determine how data packets are forwarded between network nodes. Different routing protocols use different criteria for choosing the best path, including hop count, bandwidth, delay, cost, or load. Static routes are fixed: they do not change unless an administrator updates them. Dynamic routing protocols, by contrast, adapt automatically to changing network conditions. You can minimize latency and maximize efficiency by choosing routing protocols that suit your network topology, traffic characteristics, and reliability requirements.

Optimizing routing protocols can significantly improve network performance and efficiency. By minimizing unnecessary hops, reducing congestion, and balancing network traffic, optimized routing protocols help enhance overall network reliability, reduce latency, and increase bandwidth utilization.
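Link-state protocols such as OSPF compute best paths with Dijkstra’s shortest path first (SPF) algorithm over per-link costs. The minimal Python sketch below runs SPF over a small, made-up topology; the router names and costs are hypothetical. It also shows why cost can beat hop count: the direct R1 to R2 link is a single hop but expensive, so SPF prefers a three-hop path.

```python
import heapq


def shortest_paths(graph: dict[str, dict[str, int]], source: str):
    """Dijkstra's SPF: return (cost, previous-hop) maps from `source`.

    graph maps each router to {neighbor: link_cost}.
    """
    dist = {source: 0}
    prev: dict[str, str] = {}
    queue = [(0, source)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry; a cheaper path was already settled
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return dist, prev


def path_to(prev: dict[str, str], source: str, target: str) -> list[str]:
    """Walk the previous-hop map back from target to source."""
    hops = [target]
    while hops[-1] != source:
        hops.append(prev[hops[-1]])
    return list(reversed(hops))


if __name__ == "__main__":
    topology = {
        "R1": {"R2": 10, "R3": 1},
        "R2": {"R1": 10, "R4": 1},
        "R3": {"R1": 1, "R4": 1},
        "R4": {"R2": 1, "R3": 1},
    }
    dist, prev = shortest_paths(topology, "R1")
    print(dist["R2"], path_to(prev, "R1", "R2"))  # 3 ['R1', 'R3', 'R4', 'R2']
```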

Strategies for Routing Protocol Optimization

a. Implementing Route Summarization:

Route summarization, also known as route aggregation, is a process that enables the representation of multiple network addresses with a single summarized route. Instead of advertising individual subnets, a summarized route encompasses a range of subnets under one address. This technique contributes to reducing the size of routing tables and optimizing network performance.

The implementation of route summarization offers several advantages. First, it minimizes routers’ memory requirements by reducing the number of entries in their routing tables. This reduction in memory consumption leads to improved router performance and scalability.

Second, route summarization enhances network stability and convergence speed by reducing the number of route updates exchanged between routers: a subnet that flaps inside the summarized range does not trigger updates beyond the summarization boundary. Lastly, it improves security by hiding internal network structure, making it harder for potential attackers to gain insights into the network topology.
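Python’s standard `ipaddress` module can compute exact summaries, which is a convenient way to check a summarization plan. In the sketch below (the prefixes are hypothetical), four contiguous /24 subnets collapse into a single /22, while a gap in the range prevents a single summary.

```python
import ipaddress


def summarize(prefixes: list[str]) -> list[str]:
    """Collapse contiguous prefixes into the fewest networks that cover exactly them."""
    networks = [ipaddress.ip_network(prefix) for prefix in prefixes]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    print(summarize(["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]))
    # ['10.1.0.0/22']
    print(summarize(["10.1.0.0/24", "10.1.1.0/24", "10.1.3.0/24"]))
    # ['10.1.0.0/23', '10.1.3.0/24']
```

Note that `collapse_addresses` only merges prefixes that exactly fill a block. Routers are often configured to advertise a broader summary than the subnets that actually exist, which keeps tables smaller but can attract traffic for unused addresses inside the summary.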

b. Load Balancing:

Load balancing distributes network traffic across multiple paths, preventing bottlenecks and optimizing resource utilization. Equal-cost multipath (ECMP) routing, which installs several equal-cost routes to the same destination and spreads flows across them, can improve network performance and avoid congestion. More generally, load balancing distributes the workload across multiple computing resources, such as servers or virtual machines, to ensure optimal utilization and prevent any single resource from being overwhelmed. By spreading incoming requests, load balancing improves performance, enhances reliability, and minimizes downtime.
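ECMP implementations typically keep all packets of a flow on one path by hashing header fields, so packets are not reordered. The Python sketch below hashes the 5-tuple to choose a next hop. SHA-256 is only a stand-in: real routers and switches use their own hardware hash functions and configurable field sets, and the addresses here are hypothetical.

```python
import hashlib


def ecmp_next_hop(src_ip: str, dst_ip: str, protocol: int,
                  src_port: int, dst_port: int, next_hops: list[str]) -> str:
    """Map a flow's 5-tuple onto one of several equal-cost next hops."""
    flow_key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(flow_key).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[bucket]


if __name__ == "__main__":
    hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    # Different flows (here, different source ports) can take different paths,
    # but repeating the same flow always returns the same next hop.
    for port in (40000, 40001, 40002):
        print(port, ecmp_next_hop("192.0.2.10", "198.51.100.20", 6, port, 443, hops))
```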

There are various load-balancing methods employed in different scenarios. Let’s explore a few popular ones:

-Round Robin: This method distributes requests equally among available resources cyclically. Each resource takes turns serving incoming requests, ensuring a fair workload allocation.

-Least Connections: The least connections method directs new requests to the resource with the fewest active connections. This approach prevents any resource from becoming overloaded and ensures efficient utilization of available resources.

-IP Hashing: With IP hashing, requests are distributed based on the client’s IP address. This method ensures that requests from the same client are consistently directed to the same resource, enabling session persistence and maintaining data integrity.
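The three methods can be sketched in Python as small selector classes. These are minimal illustrations with hypothetical class and server names, not the behavior of any particular load balancer; production systems add health checks, weights, and connection tracking. `zlib.crc32` is used for IP hashing because, unlike Python’s built-in `hash()` for strings, it gives the same result in every process.

```python
import itertools
import zlib


class RoundRobin:
    """Hand out servers in a fixed cycle."""

    def __init__(self, servers: list[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each new request to the server with the fewest active connections."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)  # ties go to the first server
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a connection to `server` closes."""
        self.active[server] -= 1


class IPHash:
    """Map each client IP to a fixed server for session persistence."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)

    def pick(self, client_ip: str) -> str:
        # crc32 is stable across processes, unlike Python's salted str hash().
        return self.servers[zlib.crc32(client_ip.encode()) % len(self.servers)]


if __name__ == "__main__":
    rr = RoundRobin(["web1", "web2", "web3"])
    print([rr.pick() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
```

One trade-off of simple IP hashing: adding or removing a server changes the modulus, so most clients are remapped to a different server. Consistent hashing is a common way to limit that churn.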

c. Convergence Optimization:

Convergence refers to the process by which routers learn and update routing information. Optimizing convergence time is crucial for minimizing network downtime and ensuring fast rerouting in case of failures. Techniques like Bidirectional Forwarding Detection (BFD) and optimized hello timers can expedite convergence. BFD, in simple terms, is a protocol used to detect faults in the forwarding path between network devices. It provides a mechanism for quickly detecting failures, ensuring quick convergence, and minimizing service disruption. BFD enables real-time connectivity monitoring between devices by exchanging control packets at a high rate.

The implementation of BFD brings several notable benefits to network operators. Firstly, it offers rapid failure detection, reducing the time taken for network convergence. This is particularly crucial in mission-critical environments where downtime can have severe consequences. Additionally, BFD is lightweight and has low overhead, making it suitable for deployment in resource-constrained networks.
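BFD’s detection time follows directly from its timers. RFC 5880 defines it, in asynchronous mode, as the Detect Mult value advertised by the remote system multiplied by the agreed interval, which is the larger of the local Required Min RX interval and the remote Desired Min TX interval. The Python sketch below encodes that rule; the timer values in the example are illustrative, not recommended settings.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: float,
                          remote_desired_min_tx_ms: float,
                          remote_detect_mult: int) -> float:
    """Asynchronous-mode BFD detection time, following RFC 5880."""
    # The agreed interval is the slower of what we accept and what the peer sends.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


if __name__ == "__main__":
    print(bfd_detection_time_ms(50, 50, 3))   # 150
    print(bfd_detection_time_ms(300, 50, 3))  # 900: the slower side sets the pace
```

With 50 ms timers and a multiplier of 3, a failure is detected in about 150 ms, compared with the 40-second default OSPF dead interval on broadcast links.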

Understanding Layer 3 Etherchannel

Layer 3 Etherchannel, or routed Etherchannel, is a network technology that bundles multiple physical links into a single logical interface. Unlike Layer 2 Etherchannel, which operates at the Data Link Layer, a Layer 3 Etherchannel is a routed interface with its own IP address, so it takes part directly in the Network Layer. This provides load balancing, redundancy, and increased bandwidth between two routers or Layer 3 switches.

Configuring Layer 3 Etherchannel involves several steps. First, identify the physical interfaces that will be part of the Etherchannel. Second, choose how the bundle is negotiated: the IEEE standard Link Aggregation Control Protocol (LACP), Cisco’s Port Aggregation Protocol (PAgP), or static bundling with no negotiation. Next, configure the port-channel interface as a routed port, assign its IP address, and set parameters such as the load-balancing algorithm and LACP port priorities. Finally, enable the chosen routing protocol (for example, OSPF) on the port-channel interface, so it takes part in dynamic routing and path selection like any other routed link.

Choose the correct topology:

Nodes and links in your network are arranged and connected according to a topology. Topologies have different advantages and disadvantages regarding latency, scalability, redundancy, and cost. A star topology, for example, keeps every node within two hops of any other and simplifies management, but the central hub carries all the traffic and is a single point of failure. In a mesh topology, multiple paths connect the nodes, which increases redundancy and resilience at the cost of more links and more complexity. Choosing the proper topology depends on your network’s size, traffic patterns, and performance goals.

Understanding BGP Route Reflection

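BGP Route Reflection allows network administrators to simplify the management of BGP routes within their autonomous systems (AS). Normally, a router does not re-advertise routes learned from one iBGP peer to another iBGP peer, so every iBGP router must peer with every other one in a full mesh. A route reflector relaxes that rule: it reflects routes learned from its clients to all other clients and non-clients, and routes learned from non-clients to its clients. This introduces a hierarchical structure in which the AS is divided into clusters, with route reflectors as the focal points for route propagation. As a result, BGP Route Reflection reduces the number of required BGP peering sessions and optimizes route distribution.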

The implementation of BGP Route Reflection offers several advantages. Firstly, it reduces the overall complexity of BGP configurations by eliminating the need for full-mesh connectivity among routers within an AS. This simplification leads to improved scalability and easier management of BGP routes. Additionally, because each router maintains far fewer sessions, there are fewer BGP updates to generate and process within the AS.
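The reduction in sessions is easy to quantify. A full iBGP mesh of n routers needs n(n-1)/2 sessions. With route reflectors, each client peers only with the reflectors, and the reflectors peer with each other. The Python sketch below compares the two; it assumes every client peers with every reflector, that the reflectors are fully meshed, and it ignores multi-level hierarchies.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(clients: int, reflectors: int) -> int:
    """Sessions when each client peers with every reflector and reflectors form a full mesh."""
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    print(full_mesh_sessions(100))  # 4950
    print(route_reflector_sessions(clients=98, reflectors=2))  # 197
```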

Route Reflector Hierarchy

Route reflectors play a vital role within the BGP route reflection architecture. They are responsible for reflecting BGP route information to their clients and other peers. Establishing a well-designed hierarchy of route reflectors is essential to ensure optimal route propagation and minimize potential issues such as routing loops (which route reflection guards against with the ORIGINATOR_ID and CLUSTER_LIST attributes) or suboptimal path selection. We will explore different hierarchy designs and their implications.

Use quality of service techniques.

Quality of service (QoS) techniques prioritize and manage network traffic based on its class or category, such as voice, video, or data. For time-sensitive or critical applications, QoS can reduce latency and jitter by giving those classes more bandwidth or a priority queue, and by dropping less important packets first when links are congested.

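Differentiated services (DiffServ) and integrated services (IntServ) are the two main QoS architectures at the network layer. DiffServ marks each packet with a Differentiated Services Code Point (DSCP) value in the IP header, and every router applies the per-hop behavior associated with that marking. IntServ instead reserves resources per flow, using the resource reservation protocol (RSVP) to signal those reservations along the path. Multiprotocol label switching (MPLS), which sits between Layer 2 and Layer 3, carries QoS markings in the Traffic Class field of its label header and can be combined with RSVP-TE to reserve bandwidth on traffic-engineered paths. Use these QoS techniques to deliver the quality and level of service your applications need.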
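A strict priority scheduler is one common way to give time-sensitive classes lower latency. The Python sketch below classifies packets by DSCP (EF, value 46, is commonly used for voice, and AF41, value 34, for video) and always serves the highest-priority non-empty queue. The queue layout and class name are hypothetical. Real implementations, such as low-latency queuing, also police the priority queue so it cannot starve the other classes.

```python
from collections import deque
from typing import Optional

# Hypothetical DSCP-to-queue mapping; queue 0 is served first.
DSCP_TO_QUEUE = {46: 0, 34: 1}  # EF (voice) -> 0, AF41 (video) -> 1
DEFAULT_QUEUE = 2               # everything else (best effort)


class StrictPriorityScheduler:
    def __init__(self, num_queues: int = 3) -> None:
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, packet: str, dscp: int) -> None:
        """Classify by DSCP and place the packet in its class queue."""
        self.queues[DSCP_TO_QUEUE.get(dscp, DEFAULT_QUEUE)].append(packet)

    def dequeue(self) -> Optional[str]:
        """Always transmit from the highest-priority non-empty queue."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    scheduler = StrictPriorityScheduler()
    scheduler.enqueue("bulk-transfer", dscp=0)
    scheduler.enqueue("voice-frame", dscp=46)
    scheduler.enqueue("video-frame", dscp=34)
    print([scheduler.dequeue() for _ in range(3)])
    # ['voice-frame', 'video-frame', 'bulk-transfer']
```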

TCP Performance Optimizations

Understanding TCP Performance Parameters

TCP, or Transmission Control Protocol, is a fundamental component of Internet communication. It ensures reliable and ordered delivery of data packets, but did you know that TCP performance can be optimized by adjusting various parameters?

TCP performance parameters are configurable settings that govern the behavior of the TCP protocol. These parameters control congestion control, window size, and timeout values. By fine-tuning these parameters, network administrators can optimize TCP performance to meet specific requirements and overcome challenges.

Congestion Control and Window Size: Congestion control is critical to TCP performance. It regulates the rate at which data is transmitted to avoid network congestion. TCP utilizes a window size mechanism to manage unacknowledged data in flight. Administrators can balance throughput and latency by adjusting the window size to optimize network performance.

Timeout Values and Retransmission: Timeout values are crucial in TCP performance. When a packet is not acknowledged within a specific time frame, it is considered lost, and TCP initiates retransmission. Administrators can optimize the trade-off between responsiveness and reliability by adjusting timeout values. Fine-tuning these values can significantly impact TCP performance in scenarios with varying network conditions.
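The retransmission timeout is not a fixed number; TCP derives it from measured round-trip times. The Python sketch below follows the estimator in RFC 6298 (smoothed RTT, RTT variance, alpha = 1/8, beta = 1/4, K = 4, and exponential backoff on timeout). The class name is hypothetical, and the `min_rto_s` parameter illustrates the tuning point: RFC 6298 recommends a 1-second minimum, while some operating systems, such as Linux, use a lower floor of 200 ms.

```python
class RtoEstimator:
    """Retransmission timeout estimator following RFC 6298."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4

    def __init__(self, clock_granularity_s: float = 0.001,
                 min_rto_s: float = 1.0, max_rto_s: float = 60.0) -> None:
        self.granularity = clock_granularity_s
        self.min_rto = min_rto_s
        self.max_rto = max_rto_s
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any RTT sample (RFC 6298)

    def on_rtt_sample(self, rtt_s: float) -> float:
        if self.srtt is None:
            # The first measurement initializes the estimators.
            self.srtt = rtt_s
            self.rttvar = rtt_s / 2
        else:
            # RTTVAR must be updated with the old SRTT, so it comes first.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt_s)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_s
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self) -> float:
        # Exponential backoff: double the RTO after each retransmission timeout.
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


if __name__ == "__main__":
    estimator = RtoEstimator(min_rto_s=0.2)  # a lower floor, as some stacks use
    for sample in (0.100, 0.100, 0.180):
        print(round(estimator.on_rtt_sample(sample), 4))
```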

Bandwidth-Delay Product and Buffer Sizes: The bandwidth-delay product (BDP) represents the amount of data that can be in transit between two endpoints at once. It is calculated by multiplying the available bandwidth by the round-trip time (RTT). Sizing TCP send and receive buffers to at least the BDP lets a single connection keep the path full; with a smaller window, throughput is capped at roughly the window size divided by the RTT, no matter how fast the link is.
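A quick calculation shows why buffer size matters. The Python sketch below computes the BDP for a path and the throughput ceiling imposed by a given window; the link speed, RTT, and window in the example are illustrative. 65,535 bytes is the largest window TCP can advertise without the window scale option.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that can be in flight on the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Maximum throughput when at most `window_bytes` can be unacknowledged per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # A 1 Gbps path with a 50 ms round-trip time.
    print(f"BDP: {bdp_bytes(1e9, 0.050) / 1e6:.2f} MB")  # BDP: 6.25 MB
    # The same path with a 65,535-byte window (no window scaling).
    print(f"Ceiling: {window_limited_throughput_bps(65_535, 0.050) / 1e6:.1f} Mbps")  # Ceiling: 10.5 Mbps
```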

Understanding TCP MSS

TCP MSS refers to the maximum amount of payload data, excluding the TCP and IP headers, that can be carried in a single TCP segment. It plays a vital role in maintaining efficient and reliable communication between hosts in a network. By limiting the segment size, TCP MSS helps ensure that segments fit within the path’s MTU and avoids fragmentation issues.

The significance of TCP MSS lies in its ability to optimize network performance. By setting an appropriate MSS value, network administrators can balance between efficient data transfer and minimizing overhead caused by fragmentation and reassembly. This enhances the overall throughput and reduces the likelihood of congestion.

Several factors influence the determination of TCP MSS. The most important is the MTU of the links along the path, including tunnels such as GRE or IPsec that add encapsulation overhead and therefore lower the effective MTU. Path MTU Discovery (PMTUD) techniques also help identify the optimal MSS value based on the path characteristics between source and destination.

Configuring TCP MSS requires a comprehensive understanding of the network environment and its specific requirements. It involves adjusting the MSS value on both communication ends to ensure seamless data transmission. Network administrators can employ various methods, such as adjusting router settings or utilizing specific software tools, to optimize TCP MSS settings.

What is TCP MSS?

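TCP MSS refers to the maximum amount of data sent in a single TCP segment without fragmentation. It is primarily determined by the underlying network’s Maximum Transmission Unit (MTU): for IPv4 without options, the MSS is the MTU minus 40 bytes (20 bytes of IP header plus 20 bytes of TCP header), so a 1500-byte Ethernet MTU gives an MSS of 1460 bytes. Each endpoint announces its MSS in the SYN segment of the TCP handshake, and each side sends segments no larger than the value its peer announced. Path MTU Discovery can lower the effective segment size later in the connection if a smaller MTU is found along the path.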

Optimizing TCP MSS is crucial for achieving optimal network performance. When the MSS is set too high, it can lead to fragmentation, increased overhead, and reduced throughput. On the other hand, setting the MSS too low can result in inefficiency due to smaller segment sizes. Finding the right balance can enhance network efficiency and minimize potential issues.

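1. Path MTU Discovery (PMTUD): With PMTUD, the sender sets the Don’t Fragment (DF) bit on its packets. A router that cannot forward a packet without fragmenting it drops the packet and returns an ICMP "Fragmentation Needed" message (ICMPv6 "Packet Too Big" for IPv6) that includes the next-hop MTU. The sender then reduces its segment size to fit, avoiding fragmentation. PMTUD fails if firewalls filter these ICMP messages.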

2. MSS Clamping: Where PMTUD may not work reliably, for example because ICMP is filtered, MSS clamping can be employed. A router or firewall on the path rewrites the MSS option in TCP SYN packets to a conservative value that fits the smallest MTU on the path. Although this results in smaller segments, it prevents fragmentation without relying on ICMP.

3. Jumbo Frames: Jumbo Frames are Ethernet frames that exceed the standard 1500-byte MTU, commonly using an MTU of around 9000 bytes. By using Jumbo Frames, the MSS can be increased, allowing for larger segments and less per-packet overhead. However, every device on the path, including switches, routers, and end hosts, must support the larger MTU.
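The MSS arithmetic is simple enough to script. The Python sketch below subtracts the IP and TCP headers, plus any tunnel overhead, from the MTU, assuming headers without options. The 24-byte figure for GRE (a 20-byte outer IPv4 header plus a 4-byte basic GRE header) is why 1436 bytes is a common clamping value on GRE tunnels; other encapsulations, such as IPsec, add different and variable overhead.

```python
IPV4_HEADER_BYTES = 20
IPV6_HEADER_BYTES = 40
TCP_HEADER_BYTES = 20


def mss_for_mtu(mtu: int, ipv6: bool = False, tunnel_overhead: int = 0) -> int:
    """MSS that fits in one packet, assuming IP and TCP headers carry no options."""
    ip_header = IPV6_HEADER_BYTES if ipv6 else IPV4_HEADER_BYTES
    return mtu - tunnel_overhead - ip_header - TCP_HEADER_BYTES


if __name__ == "__main__":
    print(mss_for_mtu(1500))                      # 1460: standard Ethernet, IPv4
    print(mss_for_mtu(1500, ipv6=True))           # 1440
    print(mss_for_mtu(1500, tunnel_overhead=24))  # 1436: IPv4 inside GRE over IPv4
    print(mss_for_mtu(9000))                      # 8960: jumbo frames
```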

Understanding Switching

Layer 2 switching, also known as data link layer switching, operates at the second layer of the OSI model. It uses MAC addresses to forward frames within a local area network (LAN). Because a switch only needs to look up the destination MAC address, a lookup performed in hardware, it can forward frames at wire speed with minimal latency. Modern Layer 3 switches also route in hardware, so the latency gap between switching and routing is much smaller than it once was.

One of the primary advantages of layer 2 switching is fast communication between devices within a LAN. Layer 2 switches make forwarding decisions from the destination device’s MAC address with a simple table lookup, which keeps packet processing time low. This low latency makes layer 2 switching well-suited to real-time applications such as online gaming, high-frequency trading, and video conferencing.

Implementing layer 2 switching requires the deployment of layer 2 switches, specialized networking devices capable of efficiently forwarding data packets based on MAC addresses. These switches are typically equipped with multiple ports to connect various devices within a network. By strategically placing layer 2 switches throughout the network infrastructure, organizations can create low-latency pathways for data transmission, ensuring seamless connectivity and optimal performance.
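The forwarding logic of a learning switch fits in a few lines. The Python sketch below learns source MAC addresses, forwards to a known port, and floods broadcast or unknown-unicast frames. The class name and example addresses are hypothetical, and real switches also age out entries and do all of this in hardware.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    def __init__(self, ports: list[int]) -> None:
        self.ports = ports
        self.mac_table: dict[str, int] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Learn the source MAC, then return the ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn, or relearn after a move
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown unicast: flood to every port except the ingress.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress segment: filter the frame
        return [out_port]


if __name__ == "__main__":
    switch = LearningSwitch(ports=[1, 2, 3])
    print(switch.receive(1, "00:00:5e:00:53:01", "00:00:5e:00:53:02"))  # [2, 3] (flood)
    print(switch.receive(2, "00:00:5e:00:53:02", "00:00:5e:00:53:01"))  # [1]
```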

Spanning-Tree Protocol

STP, a layer 2 protocol, provides loop-free paths in Ethernet networks. It accomplishes this by electing a root bridge and building a logical tree that spans all switches in the network, placing redundant links in a blocking state. Blocking those links prevents the loops that would otherwise cause broadcast storms and network congestion, while keeping the links available as backups.
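Building the tree starts with electing a root bridge: the switch with the lowest bridge ID, which compares bridge priority first (32768 by default) and then the MAC address as a tie-breaker. The Python sketch below shows that comparison only; the example bridges are hypothetical, and it leaves out the later steps of choosing root and designated ports.

```python
def bridge_id(priority: int, mac: str) -> tuple[int, int]:
    """Bridge ID ordering: priority first, then the MAC address as a number."""
    return priority, int(mac.replace(":", ""), 16)


def elect_root(bridges: list[tuple[int, str]]) -> tuple[int, str]:
    """Return the (priority, MAC) pair of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda b: bridge_id(*b))


if __name__ == "__main__":
    switches = [
        (32768, "00:1a:2b:3c:4d:5e"),  # default priority
        (32768, "00:0a:2b:3c:4d:5e"),  # default priority, lower MAC
        (4096, "00:ff:ff:ff:ff:ff"),   # lowered priority wins regardless of MAC
    ]
    print(elect_root(switches))      # (4096, '00:ff:ff:ff:ff:ff')
    print(elect_root(switches[:2]))  # (32768, '00:0a:2b:3c:4d:5e')
```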

While STP is essential for network stability, it can introduce delays during convergence. Convergence refers to the process where the network adapts to changes, such as link failures or topology modifications. During convergence, STP recalculates the spanning tree, and ports pass through the listening and learning states before they forward traffic. With the default 802.1D timers, this can take 30 to 50 seconds, which is problematic in time-sensitive environments.

Introducing Spanning-Tree Uplink Fast

Spanning-Tree Uplink Fast is a Cisco proprietary feature designed to reduce STP convergence time. When a switch detects that its root port has failed, UplinkFast immediately moves a blocked alternate uplink to the forwarding state, skipping the listening and learning states, so recovery takes a few seconds rather than 30 or more. This feature is typically used on access layer switches that connect to distribution or core switches.

Understanding Spanning Tree Protocol (STP)

STP, a protocol defined by the IEEE 802.1D standard, is designed to prevent loops in Ethernet networks. STP ensures a loop-free network topology by dynamically calculating the best path and blocking redundant links. We will explore the inner workings of STP and its role in maintaining network stability.

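Building upon STP, the Multiple Spanning Tree Protocol (MSTP, originally IEEE 802.1s and now part of IEEE 802.1Q) maps groups of VLANs to a small number of spanning-tree instances. Each instance can have its own root bridge and forwarding topology, so links that are blocked for one VLAN group can carry traffic for another, improving bandwidth utilization. Switches that share the same MST configuration form an MST region, which helps MST scale in large networks. We will delve into the configuration and advantages of MST in modern network environments.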

Understanding Layer 2 Etherchannel

Layer 2 Etherchannel, or link aggregation, combines physical links into a single logical link. This provides increased bandwidth and redundancy, enhancing network performance and resilience. Unlike Layer 3 Etherchannel, which operates at the IP layer, Layer 2 Etherchannel operates at the data-link layer, making it suitable for various network topologies and protocols.

Implementing Layer 2 Etherchannel offers several key benefits. Firstly, it load-balances traffic by hashing each flow onto one of the member links, which helps prevent bottlenecks, although a few large flows can still land on the same link. Secondly, it provides link redundancy: if a member link fails, traffic moves to the remaining links and connectivity is maintained. Moreover, because STP sees the bundle as a single logical interface, none of the member links are blocked, and management is simpler since the physical links are configured and monitored as one interface.

**Keep an eye on your network and troubleshoot any issues.**

Monitoring and troubleshooting are essential to identifying and resolving latency issues in your network. Tools such as ping, traceroute, and network analyzers measure and analyze your network’s latency and performance. They also help you pinpoint problems like packet loss, congestion, misconfiguration, or faulty hardware so you can fix them. Regular monitoring and troubleshooting are essential for keeping your network running smoothly.

**Critical Considerations in Low Latency Design**

Designing a low-latency network requires a thorough understanding of various factors. Bandwidth, network topology, latency measurement tools, and quality of service (QoS) policies all play pivotal roles. Choosing the right networking equipment, leveraging advanced routing algorithms, and optimizing data transmission paths are crucial to achieving optimal latency. Moreover, it is essential to consider scalability, security, and cost implications when designing and implementing low-latency networks.

What is a MAC Move Policy?

In the context of Cisco NX-OS devices, a MAC move policy defines how the device reacts when a MAC address that was learned on one port is suddenly learned on another, for example when a virtual machine migrates to a different host. The policy can be customized to suit specific network requirements, ensuring efficient resource utilization and minimizing disruptions caused by MAC address moves.

By implementing a MAC move policy, network administrators can achieve several benefits. First, it enhances network stability by preventing unnecessary MAC address flapping and associated network disruptions. Second, it improves network performance by optimizing MAC address table entries and reducing unnecessary broadcasts. Third, it provides better control and visibility over MAC address movements, facilitating troubleshooting and network management tasks.

Proper management of MAC move policy significantly impacts network performance. When MAC addresses move frequently or without restrictions, it can lead to excessive flooding, where switches forward frames to all ports, causing unnecessary network congestion. By implementing an appropriate MAC move policy, administrators can reduce flooding, prevent unnecessary MAC address learning, and enhance overall network efficiency.
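A common building block of such a policy is counting how often a MAC address moves within a time window. The Python sketch below flags a MAC address that moves more than a threshold number of times in a sliding window. The class name and thresholds are hypothetical, not NX-OS defaults, and what the switch does once a MAC is flagged (logging, pausing learning, or disabling a port) depends on the platform and its configuration.

```python
from collections import defaultdict, deque


class MacMoveDetector:
    """Flag MAC addresses that move between ports too often.

    max_moves and window_s are hypothetical thresholds, not platform defaults.
    """

    def __init__(self, max_moves: int = 3, window_s: float = 10.0) -> None:
        self.max_moves = max_moves
        self.window_s = window_s
        self.last_port: dict[str, str] = {}
        self.move_times = defaultdict(deque)  # MAC -> timestamps of recent moves

    def learn(self, mac: str, port: str, now: float) -> bool:
        """Record where `mac` was seen; return True if it is moving too often."""
        previous = self.last_port.get(mac)
        self.last_port[mac] = port
        if previous is None or previous == port:
            return False  # first sighting or same port: not a move
        moves = self.move_times[mac]
        moves.append(now)
        # Forget moves that fall outside the sliding window.
        while moves and now - moves[0] > self.window_s:
            moves.popleft()
        return len(moves) > self.max_moves


if __name__ == "__main__":
    detector = MacMoveDetector(max_moves=3, window_s=10.0)
    # A MAC bouncing between two ports once per second, as a loop might cause.
    for t, port in enumerate(["Eth1/1", "Eth1/2"] * 3):
        print(t, port, detector.learn("00:00:5e:00:53:aa", port, now=float(t)))
```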

Understanding sFlow

– sFlow is a standards-based technology that enables real-time network traffic monitoring by sampling packets. It provides valuable information such as packet headers, traffic volumes, and application-level details. By implementing sFlow on Cisco NX-OS, administrators can gain deep visibility into network behavior and identify potential bottlenecks or security threats.

– Configuring sFlow on Cisco NX-OS is straightforward. By accessing the device’s command-line interface, administrators can enable sFlow globally or on specific interfaces. They can also define sampling rates, counter polling intervals, and the collectors where sFlow data will be sent for analysis.

– Once sFlow is up and running on Cisco NX-OS, network administrators can use it to optimize performance. By analyzing sFlow data, they can identify bandwidth-hungry applications, pinpoint traffic patterns, and detect anomalies. Typical use cases include load balancing, capacity planning, and troubleshooting.

– Integration with network monitoring tools is essential to unleash sFlow’s full potential on Cisco NX-OS. sFlow data can seamlessly integrate with popular monitoring platforms like PRTG, SolarWinds, or Nagios.
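Because sFlow samples 1 in N packets, traffic totals are estimated by scaling each sample up by the sampling rate. The Python sketch below does that per flow key; the flow keys, frame sizes, and rate are made up. Estimates become more accurate as the number of samples grows, and small flows may not be sampled at all.

```python
from collections import defaultdict


def estimate_totals(samples: list[tuple[str, int]],
                    sampling_rate: int) -> tuple[dict[str, int], dict[str, int]]:
    """Scale 1-in-N packet samples up to estimated packet and byte totals per flow.

    samples holds (flow_key, frame_length_bytes) for each sampled packet.
    """
    packets: dict[str, int] = defaultdict(int)
    octets: dict[str, int] = defaultdict(int)
    for flow_key, frame_length in samples:
        # Each sample stands in for `sampling_rate` packets of similar size.
        packets[flow_key] += sampling_rate
        octets[flow_key] += frame_length * sampling_rate
    return dict(packets), dict(octets)


if __name__ == "__main__":
    samples = [("web", 1500), ("web", 1500), ("dns", 100)]
    print(estimate_totals(samples, sampling_rate=1000))
```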

Use Case: Performance Routing

Understanding Performance Routing (PfR)

Performance Routing, or PfR, is a Cisco routing technology that dynamically adapts to network conditions, traffic patterns, and application requirements. Unlike traditional routing protocols, which select paths from relatively static metrics, PfR measures delay, loss, jitter, and reachability in real time and moves traffic to the path that best meets its policy, optimizing performance and ensuring efficient utilization of network resources.

Enhanced Application Performance: PfR improves application performance by dynamically selecting the best path based on current network conditions. It reduces latency and packet loss and helps maintain a consistent end-user experience during network congestion or link failures.

Efficient Utilization of Network Resources: PfR intelligently distributes traffic across multiple paths, leveraging available bandwidth and optimizing resource utilization. This improves overall network efficiency and reduces costs by avoiding unnecessary bandwidth upgrades.

Simplified Network Management: With PfR, network administrators gain granular visibility into network performance, traffic patterns, and application behavior. This enables proactive troubleshooting, capacity planning, and streamlined network management, saving time and effort.

Advanced Topics

BGP Next Hop Tracking:

The BGP next hop is the IP address of the router to which traffic for an advertised prefix should be forwarded. It is carried in the NEXT_HOP attribute of each route. Before a BGP router can use a route, it must resolve that next hop, typically through its IGP routing table. This information is crucial for proper routing and efficient packet delivery.

Next-hop tracking (NHT) makes this resolution event-driven. Rather than waiting for a periodic scan, BGP reacts as soon as the route to a next hop changes or disappears, which provides several benefits for network operators. First, it enables proactive monitoring of the next-hop IP address, ensuring its reachability and availability. By tracking the next hop continuously, network administrators can detect and resolve issues promptly, reducing downtime and improving network performance. Additionally, next-hop tracking facilitates efficient load balancing and traffic engineering, allowing for optimal resource utilization.
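The dependency between BGP routes and IGP reachability can be modeled in a few lines. The Python sketch below resolves each BGP next hop by longest-prefix match against an IGP routing table and keeps only routes whose next hop resolves. It then shows the effect of an IGP withdrawal. This is a simplified model, not a router implementation, and the prefixes and addresses are hypothetical documentation ranges.

```python
import ipaddress

def resolve(next_hop, igp_rib):
    """Longest-prefix match of a BGP next hop against IGP routes; None if unreachable."""
    addr = ipaddress.ip_address(next_hop)
    matches = [net for net in igp_rib if addr in net]
    return max(matches, key=lambda n: n.prefixlen) if matches else None

def usable_bgp_routes(bgp_routes, igp_rib):
    """Return the BGP prefixes whose next hop currently resolves."""
    return {prefix for prefix, nh in bgp_routes.items() if resolve(nh, igp_rib) is not None}

# Hypothetical IGP routes to two BGP peers' loopbacks, and the BGP routes learned from them.
igp_rib = {ipaddress.ip_network("10.0.0.1/32"), ipaddress.ip_network("10.0.0.2/32")}
bgp_routes = {"203.0.113.0/24": "10.0.0.1", "198.51.100.0/24": "10.0.0.2"}
print(sorted(usable_bgp_routes(bgp_routes, igp_rib)))

# The IGP withdraws the route to 10.0.0.1 (for example, after a link failure).
# With next-hop tracking, this event triggers re-evaluation right away.
igp_rib.discard(ipaddress.ip_network("10.0.0.1/32"))
print(sorted(usable_bgp_routes(bgp_routes, igp_rib)))
```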

Cutting-Edge Technologies

Low-latency network design is constantly evolving, driven by technological advancements. Innovative solutions are emerging to address latency challenges, from software-defined networking (SDN) to edge computing and content delivery networks (CDNs). SDN, for instance, offers programmable network control, enabling dynamic traffic management and reducing latency. Edge computing brings compute resources closer to end-users, minimizing round-trip times. CDNs optimize content delivery by strategically caching data, reducing latency for global audiences.

A New Operational Model

We are now all moving in the direction of the cloud. The requirement is for large data centers that are elastic and scalable. The result of these changes, influenced by innovations and methodology in the server/application world, is that the network industry is experiencing a new operational model. Provisioning must be quick, and designers look to automate network configuration more systematically and in a less error-prone programmatic way. It is challenging to meet these new requirements with traditional data center designs.

Changing Traffic Flow

Traffic flows have changed, and we now have a lot of east-to-west (server-to-server) traffic. Existing data center designs focus on north-to-south flows. East-to-west traffic requires changing the architecture from an aggregation-based model to a massive multipathing model. Leaf-and-spine designs, also referred to as Clos networks, allow building massive networks with reasonably sized equipment, enabling low-latency network design.
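The scaling properties of a two-tier leaf-and-spine fabric follow from simple arithmetic. Assume every leaf has one uplink to every spine. Then the number of equal-cost paths between any two leaves equals the number of spines, and the spine port count caps the number of leaves. The port counts used below are hypothetical.

```python
def leaf_spine_fabric(spines, spine_ports, leaf_downlinks):
    """Size a two-tier leaf-spine (folded Clos) fabric.

    Assumes every leaf has exactly one uplink to every spine, so each leaf
    consumes one port on each spine.
    """
    max_leaves = spine_ports
    return {
        "max_leaves": max_leaves,
        "ecmp_paths_leaf_to_leaf": spines,    # leaf -> any spine -> leaf
        "max_server_ports": max_leaves * leaf_downlinks,
        "switch_hops_leaf_to_leaf": 3,        # leaf, spine, leaf: constant for any pair
    }

# Hypothetical fabric: 4 spines with 32 ports each, leaves with 48 server-facing ports.
print(leaf_spine_fabric(spines=4, spine_ports=32, leaf_downlinks=48))
```

The constant hop count between any two leaves is what makes latency predictable for east-to-west traffic. Adding spines increases the number of paths without adding hops.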

Vendor Example: High-Performance Switch: Cisco Nexus 3000 Series

Featuring switch-on-a-chip (SoC) architecture, the Cisco Nexus 3000 Series switches offer 1 gigabit, 10 gigabit, 40 gigabit, 100 gigabit and 400 gigabit Ethernet capabilities. This series of switches provides line-rate Layer 2 and 3 performance and is suitable for ToR architectures. Combining high performance and low latency with innovations in performance visibility, automation, and time synchronization, this series of switches has established itself as a leader in high-frequency trading (HFT), high-performance computing (HPC), and big data environments. Providing high performance, flexible connectivity, and extensive features, the Cisco Nexus 3000 Series offers 24 to 256 ports.


Network Testing

A stable network results from careful design and testing. Many vendors perform exhaustive systems testing and publish the results in third-party testing reports, but they cannot reproduce every customer's environment. To validate your data center design, you must therefore conduct your own tests.

Effective testing is the best indicator of production readiness. Ineffective testing, on the other hand, may lead to a false sense of confidence and, eventually, downtime. A structured approach to testing is the best way to discover and fix defects in the least amount of time and at the lowest possible cost.

What is low latency?

Low latency is the ability of a computing system or network to respond with minimal delay. Actual low-latency metrics vary according to the use case. So, what is a low-latency network? It is a network that has been designed and optimized to reduce latency as much as possible. However, a low-latency network can only reduce latency caused by the network itself. It cannot fix delays introduced outside the network, such as slow server or application processing.

We also have to consider jitter: latency that deviates unpredictably from its average, low at one moment and high at the next. For some applications, this unpredictability is more problematic than high latency. We also distinguish ultra-low latency, measured in nanoseconds, from low latency, measured in milliseconds. Ultra-low latency therefore delivers a response much faster, with fewer delays, than low latency.
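Two paths can have the same average latency and very different jitter. The Python sketch below measures jitter as the mean absolute difference between consecutive latency samples. This is one simple definition, chosen for illustration; RTP, for example, uses a smoothed estimator. The sample values are hypothetical.

```python
from statistics import mean

def jitter(samples_ms):
    """Mean absolute difference between consecutive latency samples (one simple jitter measure)."""
    diffs = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return mean(diffs)

# Hypothetical latency samples (ms) for two paths with the same average.
steady = [10, 10, 11, 10, 10, 11]
bursty = [2, 20, 3, 19, 2, 16]

for name, s in (("steady", steady), ("bursty", bursty)):
    print(f"{name}: mean {mean(s):.2f} ms, jitter {jitter(s):.2f} ms")
```

Both paths average about 10.3 ms. The bursty path swings by well over 10 ms between consecutive samples, which is what hurts jitter-sensitive applications.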

Data Center Latency Requirements


Latency matters more for intra-data center (server-to-server) traffic flows than for outbound traffic flows. High latency between servers degrades performance and reduces the amount of traffic that can be sent between two endpoints. Low latency allows you to use as much of the available bandwidth as possible.
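The link between latency and usable bandwidth comes from how TCP works. A sender can have at most one window of unacknowledged data in flight per round trip, so single-flow throughput is bounded by window size divided by RTT. The Python sketch below applies that bound. The 64 KiB window (no window scaling) and the RTT values are assumptions for illustration.

```python
def max_tcp_throughput_bps(window_bytes, rtt_s):
    """Upper bound on single-flow TCP throughput: one window of data per round trip."""
    return window_bytes * 8 / rtt_s

WINDOW = 64 * 1024  # assumed 64 KiB window, no window scaling

for rtt_ms in (0.05, 0.5, 5):
    gbps = max_tcp_throughput_bps(WINDOW, rtt_ms / 1000) / 1e9
    print(f"RTT {rtt_ms} ms -> at most {gbps:.2f} Gb/s")
```

With the same window, every tenfold increase in RTT cuts the achievable throughput of the flow by ten. This is why server-to-server latency directly limits how much traffic two endpoints can exchange.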

Ultra-low latency (ULL) data center design is the race to zero: the goal is a design with the lowest possible end-to-end latency. Latency on an IP/Ethernet switched network can be as low as 50 ns.

High-frequency trading (HFT) environments push this trend, because delivering stock market information with minimal delay is imperative. HFT environments differ from most data center designs and typically do not use virtualization. Port counts are low, and servers are designed in small domains.

This is conceptually similar to designing Layer 2 domains as small Layer 2 network pockets. Applications are grouped to match optimum traffic patterns so that many-to-one conversations are reduced. This reduces the need for buffering and increases network performance. CX-1 copper cables are preferred over the more popular optical fiber.

Oversubscription

The optimum low-latency network design should consider and predict the possibility of congestion at critical network points. One example of unacceptable oversubscription is a ToR switch that receives 20 Gbps of traffic from servers but has only a 10 Gbps uplink, a 2:1 ratio. This results in packet drops and poor application performance.
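Oversubscription is the ratio of server-facing capacity to uplink capacity. The Python sketch below reproduces the 2:1 example from the text. It then applies the same formula to a hypothetical ToR with 48 x 10G server ports and one 40G uplink to each of four spines, showing how the ratio grows as spines fail.

```python
def oversubscription(server_ports, server_gbps, uplinks, uplink_gbps):
    """Ratio of downstream (server-facing) capacity to upstream (uplink) capacity."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)

# Example from the text: 20 Gbps of server traffic (assumed 2 x 10G ports) behind one 10G uplink.
print(oversubscription(2, 10, 1, 10))  # 2.0, i.e. 2:1

# Hypothetical ToR: 48 x 10G down, one 40G uplink to each of 4 spines.
for failed in range(3):
    ratio = oversubscription(48, 10, 4 - failed, 40)
    print(f"{failed} spine(s) down: {ratio:.1f}:1")
```

Each spine failure removes the same 40G slice of uplink capacity. In a multipath design, capacity therefore shrinks in proportion to the failures rather than collapsing.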

Diagram: Data center network design and oversubscription

Previous data center designs were based on the 3-tier aggregation model (developed by Cisco). Now, we are moving to 2-tier models. The main design point for this model is the number of ports on the core: more ports on the core result in larger networks. Related design questions include a) how much routing and b) how much bridging will I implement, and c) where do I insert my network services modules?

We are now designing multi-stage networks: Clos networks. The concept comes from voice networks; Charles Clos described it in 1953 for telephone switches, which had previously been built with a crossbar design. Clos designs give optimum any-to-any connectivity. They require low-latency, non-blocking components, and every element should be non-blocking. Multipath technologies lose capacity linearly, a proportional share with each device failure. This is better than architectures that degrade sharply during failures.

Lossless transport

Data Center Bridging (DCB) offers standards for flow control and queuing. Even if your data center does not use iSCSI (the Internet Small Computer System Interface), TCP elephant flows benefit from lossless transport, improving data center performance. Research has shown that the majority of TCP flows are below 100 Mbps.

The remaining small percentage are elephant flows, which consume 80% of all traffic inside the data center. Because of their size and how TCP operates, an elephant flow that experiences packet drops slows down, affecting network performance.
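TCP's reaction to loss explains why drops hurt elephant flows so much. A Reno-style sender grows its congestion window by about one segment per RTT. A single loss, detected by duplicate ACKs, halves the window, and many round trips are then needed to recover. The Python sketch below traces that behavior in segments per RTT. It is a simplified model that ignores fast-recovery details and retransmission timeouts, and the starting values are assumptions.

```python
def aimd(rounds, loss_rounds, cwnd=1.0, ssthresh=64.0):
    """Simplified Reno-style congestion window trace, one value (in segments) per RTT.

    Slow start doubles cwnd up to ssthresh, congestion avoidance adds one segment
    per RTT, and a loss in any round listed in loss_rounds halves the window.
    """
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = ssthresh                    # multiplicative decrease
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow start
        else:
            cwnd += 1                          # additive increase
        trace.append(cwnd)
    return trace

trace = aimd(12, loss_rounds={8})
print(trace)
```

Before the drop, the window has reached 66 segments. One loss cuts it to 33, and at one segment per RTT it would take 33 more round trips to climb back.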

Distributed resource scheduling

VMware's Distributed Resource Scheduler (DRS) uses vMotion to balance workloads: load from busy hypervisor hosts is automatically spread to underutilized hosts. Other cloud use cases also require dynamic workload placement, where you don't know in advance where a VM will be.
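The rebalancing idea can be illustrated with a greedy loop. The Python sketch below repeatedly migrates one VM from the busiest host to the least-busy host, as long as the move narrows the gap between them. It is a hypothetical illustration, not the DRS algorithm, which weighs many more metrics and constraints. The host names, VM names, and load units are assumptions.

```python
def rebalance(hosts, max_moves=10):
    """Greedy sketch: migrate VMs from the busiest to the least-busy host while that helps.

    hosts maps host name -> {vm name: load}; it is modified in place.
    Returns the list of (vm, source host, destination host) migrations.
    """
    moves = []
    for _ in range(max_moves):
        load = {h: sum(vms.values()) for h, vms in hosts.items()}
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        gap = load[src] - load[dst]
        # Pick the VM whose migration leaves the smallest gap between src and dst.
        best = None
        for vm, vm_load in hosts[src].items():
            new_gap = abs(gap - 2 * vm_load)
            if new_gap < gap and (best is None or new_gap < best[1]):
                best = (vm, new_gap)
        if best is None:
            break  # no single migration improves the balance
        vm = best[0]
        hosts[dst][vm] = hosts[src].pop(vm)
        moves.append((vm, src, dst))
    return moves

hosts = {"esx1": {"vm1": 40, "vm2": 30, "vm3": 20},
         "esx2": {"vm4": 10},
         "esx3": {"vm5": 15}}
print(rebalance(hosts))
print({h: sum(v.values()) for h, v in hosts.items()})
```

Every migration in this sketch is a live move. That is why the surrounding text stresses keeping the VM's subnet, and therefore its sessions, intact.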

If you want to retain sessions, keep VMs in the same subnet. Layer 3 vMotion is too slow, because routing protocol convergence always takes a few seconds. In theory, you could tune timers for fast convergence, but in practice, Interior Gateway Protocols (IGPs) give you eventual consistency.

VM Mobility

Data centers require Layer 2 bridging so that VMs retain their IP addresses during VM mobility. The TCP/IP stack currently has no separation between "who" and "where" you are; the IP address represents both functions. A future implementation with the Locator/ID Separation Protocol (LISP) divides these two roles, but bridging for VM mobility is required until LISP is fully implemented.

Spanning Tree Protocol ( STP )

In a typical dual-homed design, Spanning Tree blocks one of the two uplinks, reducing usable bandwidth by 50%. Massive multipathing technologies allow you to scale without losing 50% of the link bandwidth. Data centers want to move VMs without disrupting traffic flow. VMware has vMotion, and Microsoft Hyper-V has Live Migration.
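The 50% figure comes from the shape of a dual-homed topology. The Python sketch below builds a simplified spanning tree. The lowest bridge ID becomes root, each bridge keeps one shortest-hop path found by a breadth-first search from the root (equal link costs assumed), and every other link blocks. This is a simplified model of STP's outcome, not the protocol itself, and it ignores port priorities and differing path costs. The bridge IDs are hypothetical.

```python
from collections import deque

def spanning_tree(links):
    """Simplified STP outcome on equal-cost links.

    links is a set of frozenset({a, b}) pairs of bridge IDs. The lowest bridge ID
    is root; a BFS from the root (neighbors visited in ID order) keeps one
    shortest-hop path per bridge; all remaining links are blocked.
    """
    neighbors = {}
    for link in links:
        a, b = sorted(link)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    root = min(neighbors)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in sorted(neighbors[node]):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    forwarding = {frozenset((n, p)) for n, p in parent.items() if p is not None}
    return root, forwarding, set(links) - forwarding

# 1 = aggregation switch A (lowest ID, becomes root), 2 = aggregation switch B,
# 3 = access switch dual-homed to both aggregation switches.
links = {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})}
root, forwarding, blocked = spanning_tree(links)
access_uplinks = [l for l in links if 3 in l]
print("blocked:", [sorted(l) for l in blocked])
print(f"access uplinks forwarding: {sum(l in forwarding for l in access_uplinks)}/{len(access_uplinks)}")
```

The access switch has two uplinks, but only one forwards. Multipathing technologies keep both active.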

Network convergence

A Layer 3 network requires many events to complete before it reaches a fully converged state. In Layer 2, as soon as the moved host sends its first broadcast, every switch knows precisely where that host has moved. Layer 3 has no similar mechanism. The trade-off is that Layer 2 networks result in a large broadcast domain.
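Layer 2 relocates a host quickly because of MAC learning. A switch records the ingress port of every source MAC it sees and overwrites older entries. A single flooded frame from the moved host therefore updates every switch it reaches. The Python sketch below models only that learning step. The MAC address, switch names, and port labels are hypothetical, and one port label is reused for all switches to keep the model short.

```python
class LearningSwitch:
    """Minimal transparent-bridge MAC learning: remember the ingress port of each source MAC."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}

    def receive(self, src_mac, in_port):
        # Learning overwrites any older entry, so a moved host is re-learned
        # on its new port as soon as it sends a single frame.
        self.mac_table[src_mac] = in_port

vm_mac = "00:50:56:aa:bb:cc"  # hypothetical VM MAC address
switches = [LearningSwitch("sw1"), LearningSwitch("sw2")]

for sw in switches:
    sw.receive(vm_mac, "port-to-rack-A")   # VM initially lives in rack A

# After the move, one broadcast from the VM floods to every switch.
for sw in switches:
    sw.receive(vm_mac, "port-to-rack-B")

print({sw.name: sw.mac_table[vm_mac] for sw in switches})
```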

You may also experience large suboptimal flows, because the Layer 3 next hop (the VM's default gateway) stays the same when you move the VM. Juniper addresses this with optimal Layer 3 forwarding in QFabric. Every Layer 3 switch has the same IP address, so any of them can serve as the next hop, resulting in optimum traffic flow.

Diagram: The well-known steps in routing convergence.

Deep packet buffers

We have more data center traffic and more elephant flows from distributed databases, and traffic is becoming very bursty. There is also a lot of microburst traffic. These bursts are so short that they don't register as high link utilization, but they are big enough to overflow shallow packet buffers and cause drops. Deep packet buffers absorb these bursts. Drops make TCP back off, and after a retransmission timeout it restarts in slow start, which is problematic for application performance.
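A microburst can cause drops on a port whose average utilization looks low. The Python sketch below simulates one egress queue in discrete time slots with a fixed drain rate, and compares a shallow buffer with a deep one. The traffic pattern is an assumption: one packet every 10 slots plus a single 40-packet incast burst. Slot sizes and buffer depths are hypothetical.

```python
def simulate_port(arrivals, drain_per_slot, buffer_pkts):
    """Simulate one egress queue in discrete time slots.

    arrivals[i] packets arrive in slot i; the port transmits up to drain_per_slot
    packets per slot; packets that do not fit in the buffer are dropped.
    Returns (drops, average utilization over the whole interval).
    """
    queue = drops = sent = 0
    for arriving in arrivals:
        queue += arriving
        if queue > buffer_pkts:
            drops += queue - buffer_pkts
            queue = buffer_pkts
        tx = min(queue, drain_per_slot)
        queue -= tx
        sent += tx
    return drops, sent / (drain_per_slot * len(arrivals))

# Light background traffic plus one short 40-packet burst (e.g. incast).
arrivals = [1 if i % 10 == 0 else 0 for i in range(1000)]
arrivals[505] += 40

for buf in (16, 64):
    drops, util = simulate_port(arrivals, drain_per_slot=1, buffer_pkts=buf)
    print(f"buffer {buf}: drops={drops}, average utilization={util:.1%}")
```

Average utilization stays low in both cases. Only the shallow buffer drops packets, and each drop is what pushes a TCP sender into backing off.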

Final Points – Low Latency Network Design

Several strategies can be employed to minimize latency in network design. Firstly, utilizing edge computing can bring computational resources closer to users, reducing the distance data must travel. Secondly, implementing Quality of Service (QoS) policies can prioritize critical data traffic, ensuring it reaches its destination promptly. Lastly, optimizing hardware and software configurations, such as using high-performance routers and switches, can also contribute to reducing latency.

Low latency networks are essential in various industries. In finance, milliseconds can make the difference between profit and loss in high-frequency trading. Online gaming relies on low latency to ensure smooth gameplay and prevent lag. In healthcare, low latency networks enable real-time telemedicine consultations and remote surgeries. These examples underscore the importance of designing networks that prioritize low latency.

While the benefits are clear, designing low latency networks comes with its own set of challenges. Balancing cost and performance can be tricky, as achieving low latency often requires significant investment in infrastructure. Additionally, maintaining low latency across geographically dispersed networks can be challenging due to varying internet conditions and infrastructure limitations.

Designing a low latency network is a complex but rewarding endeavor. By understanding the fundamentals, employing effective strategies, and acknowledging the challenges, network designers can create systems that offer lightning-fast connectivity. As technology continues to evolve, the demand for low latency networks will only grow, making it an exciting field with endless possibilities for innovation.

Summary: Low Latency Network Design

In today’s fast-paced digital world, where every millisecond counts, the importance of low-latency network design cannot be overstated. Whether it’s online gaming, high-frequency trading, or real-time video streaming, minimizing latency has become crucial in delivering seamless user experiences. This blog post explored the fundamentals of low-latency network design and its impact on various industries.

Understanding Latency

In the context of networking, latency refers to the time it takes for data to travel from its source to its destination. It is often measured in milliseconds (ms) and can be influenced by various factors such as distance, network congestion, and processing delays. By reducing latency, businesses can improve the responsiveness of their applications, enhance user satisfaction, and gain a competitive edge.
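Distance and processing show up in a rough latency budget. Light in fiber travels at roughly two-thirds of its speed in a vacuum, about 200 km per millisecond. Each store-and-forward hop adds serialization time (frame bits divided by link rate) plus device processing. The Python sketch below adds these terms. The per-hop processing time and hop count are hypothetical, and cut-through switches avoid full-frame serialization at each hop.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber (~2/3 of c)

def propagation_ms(distance_km):
    """Time for the signal to cover the fiber distance."""
    return distance_km / FIBER_KM_PER_MS

def serialization_ms(frame_bytes, link_gbps):
    """Time to clock one frame onto a link."""
    return frame_bytes * 8 / (link_gbps * 1e9) * 1000

def one_way_latency_ms(distance_km, frame_bytes, link_gbps, hops, per_hop_processing_ms):
    """Rough budget: propagation plus per-hop serialization and processing (store-and-forward)."""
    per_hop = serialization_ms(frame_bytes, link_gbps) + per_hop_processing_ms
    return propagation_ms(distance_km) + hops * per_hop

# Hypothetical path: 1000 km of fiber, 5 hops at 10 Gbps, 10 microseconds processing per hop.
print(f"{one_way_latency_ms(1000, 1500, 10, 5, 0.01):.4f} ms one way")
```

Over long distances, propagation dominates: 1000 km alone costs about 5 ms each way. Inside a data center, per-hop serialization and processing become the terms worth optimizing.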

The Benefits of Low Latency

Low latency networks offer numerous advantages across different sectors. In the financial industry, where split-second decisions can make or break fortunes, low latency enables high-frequency trading firms to execute trades with minimal delays, maximizing their profitability.

Similarly, in online gaming, low latency ensures smooth gameplay and minimizes the dreaded lag that can frustrate gamers. Additionally, industries like telecommunication and live video streaming heavily rely on low-latency networks to deliver real-time communication and immersive experiences.

Strategies for Low Latency Network Design

Designing a low-latency network requires careful planning and implementation. Here are some key strategies that can help achieve optimal latency:

Subsection: Network Optimization

By optimizing network infrastructure, including routers, switches, and cables, organizations can minimize data transmission delays. This involves utilizing high-speed, low-latency equipment and implementing efficient routing protocols to ensure data takes the most direct and fastest path.

Subsection: Data Compression and Caching

Reducing the size of data packets through compression techniques can reduce transmission time, although compression and decompression add some processing delay of their own. Additionally, implementing caching mechanisms allows frequently accessed data to be stored closer to the end-users, reducing the round-trip time and improving overall latency.

Subsection: Content Delivery Networks (CDNs)

Leveraging CDNs can greatly enhance latency, especially for global businesses. By distributing content across geographically dispersed servers, CDNs bring data closer to end-users, reducing the distance and time it takes to retrieve information.

Conclusion:

Low-latency network design has become a vital aspect of modern technology in a world driven by real-time interactions and instant gratification. By understanding the impact of latency, harnessing the benefits of low latency, and implementing effective strategies, businesses can unlock opportunities and deliver exceptional user experiences. Embracing low latency is not just a trend but a necessity for staying ahead in the digital age.

LISP Protocol and VM Mobility

November 15, 2014

by Matt Conran Blog

LISP Protocol and VM Mobility

The networking world is constantly evolving, with new technologies emerging to meet the demands of an increasingly connected world. One such technology that has gained significant attention is the LISP protocol. In this blog post, we will delve into the intricacies of the LISP protocol, exploring its purpose, benefits, and how it bridges the gap in modern networking and its use case with VM mobility.

LISP, which stands for Locator/ID Separation Protocol, is a network protocol that separates the identity of a device from its location. Traditional IP addressing schemes tightly couple the IP address to the device's location in the network topology. LISP separates these two aspects, allowing for more flexibility and scalability in network design.

In simple terms, LISP separates the location of an IP address (Locator) from its identity (Identifier), which provides enhanced flexibility, scalability, and security in managing network traffic. LISP accomplishes this with two key building blocks: the mapping system (Map-Servers and Map-Resolvers) and the tunnel routers (Ingress and Egress Tunnel Routers). The mapping system maintains a database of mappings between Locators and Identifiers, while the tunnel routers encapsulate packets using these mappings for efficient routing.

VM mobility refers to the seamless movement of virtual machines across physical hosts or data centers. The LISP protocol plays a crucial role in enabling this mobility by decoupling the VM's IP address from its location. When a VM moves to a new host or data center, LISP dynamically updates the mappings in the mapping system, ensuring uninterrupted connectivity. By leveraging LISP, organizations can support live migration of VMs, load balancing, and disaster recovery with minimal disruption.

The combination of LISP Protocol and VM mobility brings forth a plethora of advantages. Firstly, it enhances network scalability by reducing the impact of IP address renumbering. Secondly, it enables efficient load balancing by distributing VMs across different hosts. Thirdly, it simplifies disaster recovery strategies by facilitating VM migration to remote data centers. Lastly, LISP empowers organizations with the flexibility to seamlessly scale their networks to meet growing demands.

While LISP Protocol and VM mobility offer significant benefits, there are a few challenges to consider. These include the need for proper configuration, compatibility with existing network infrastructure, and potential security concerns. However, the networking industry is consistently working towards addressing these challenges and further improving the LISP Protocol for broader adoption and seamless integration.

The combination of LISP Protocol and VM mobility opens up new horizons in network virtualization and mobility. By decoupling the IP address from its physical location, LISP enables organizations to achieve greater flexibility, scalability, and efficiency in managing network traffic. As the networking landscape continues to evolve, embracing LISP Protocol and VM mobility will undoubtedly pave the way for a more dynamic and agile networking infrastructure.

Matt Conran

Highlights: LISP Protocol and VM Mobility

Understanding LISP Protocol

– The LISP protocol, short for Locator/Identifier Separation Protocol, is a network architecture that separates the identity of a device (identifier) from its location (locator). It provides a scalable solution for routing and mobility while simplifying network design and reducing overhead. By decoupling the identifier and locator roles, LISP enables seamless communication and mobility across networks.

– Virtual machine mobility revolutionized the way we manage and deploy applications. With VM mobility, we can move virtual machines between physical hosts without interrupting services or requiring manual reconfiguration. This flexibility allows for dynamic resource allocation, load balancing, and disaster recovery. However, VM mobility also presents challenges in maintaining consistent network connectivity during migrations.

LISP & VM Mobility

The integration of LISP protocol and VM mobility brings forth a powerful combination. LISP provides a scalable and efficient routing infrastructure, while VM mobility enables dynamic movement of virtual machines. By leveraging LISP’s locator/identifier separation, VMs can maintain their identity while seamlessly moving across different networks or physical hosts. This synergy enhances network agility, simplifies management, and optimizes resource utilization.

The benefits of combining LISP and VM mobility are evident in various use cases. Data centers can achieve seamless workload mobility and improved load balancing. Service providers can enhance their network scalability and simplify multi-tenancy. Enterprises can optimize their network infrastructure for cloud computing and enable efficient disaster recovery strategies. The possibilities are vast, and the benefits are substantial.

How Does LISP Work

The Locator/ID Separation Protocol (LISP) provides a set of functions that allow Endpoint Identifiers (EIDs) to be mapped to Routing Locators (RLOCs). The mapping between these two namespaces separates IP addresses into two numbering schemes (similar to the "who" and the "where" analogy). This offers many traffic engineering and IP mobility benefits for geographically dispersed data centers, which is beneficial for VM mobility.

LISP Components

The LISP protocol operates by creating a mapping system that separates the device's Endpoint Identifier (EID) from its location, the Routing Locator (RLOC). This separation is achieved using a distributed database called the LISP Mapping System (LMS), which maintains the mapping between EIDs and RLOCs. When a packet is sent to a destination EID, it is encapsulated and routed based on the RLOC, allowing for efficient and scalable communication.
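An EID-to-RLOC lookup is essentially a longest-prefix match over EID-prefixes. The Python sketch below uses a local dictionary to stand in for the distributed mapping system. It includes a /32 host entry for a moved EID, similar in spirit to how LISP host mobility registers a moved host. All prefixes and RLOCs are hypothetical, taken from private and documentation address ranges.

```python
import ipaddress

# Hypothetical EID-prefix -> RLOC set database (a stand-in for the distributed mapping system).
MAPPINGS = {
    ipaddress.ip_network("10.1.0.0/16"): ["192.0.2.1"],                     # site A ETR
    ipaddress.ip_network("10.2.0.0/16"): ["198.51.100.1", "198.51.100.2"],  # site B, two ETRs
    ipaddress.ip_network("10.2.5.10/32"): ["192.0.2.1"],                    # host EID moved to site A
}

def lookup_rlocs(eid):
    """Return the RLOCs of the most specific EID-prefix covering eid, or None if not a LISP EID."""
    addr = ipaddress.ip_address(eid)
    covering = [p for p in MAPPINGS if addr in p]
    if not covering:
        return None
    return MAPPINGS[max(covering, key=lambda p: p.prefixlen)]

print(lookup_rlocs("10.2.5.10"))   # moved host: the /32 wins over the site B /16
print(lookup_rlocs("10.2.7.7"))    # other site B hosts still map to site B's ETRs
print(lookup_rlocs("172.16.0.1"))  # no mapping: a non-LISP destination
```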


Virtualization

1- Virtualization can be applied to subsystems such as disks or to a whole machine. A virtual machine (VM) is implemented by adding a software layer to a real machine to support the desired virtual machine's architecture. In general, a virtual machine can circumvent real compatibility and hardware resource limitations to enable a higher degree of software portability and flexibility.

2- In the dynamic world of modern computing, the ability to seamlessly move virtual machines (VMs) between different physical hosts has become a critical aspect of managing resources and ensuring optimal performance. This blog post explores VM mobility and its significance in today’s rapidly evolving computing landscape.

3- VM mobility refers to transferring a virtual machine from one physical host to another without disrupting operation. Virtualization technologies such as hypervisors make this capability possible, enabling the abstraction of hardware resources and allowing multiple VMs to coexist on a single physical machine.

LISP and VM Mobility

The Locator/Identifier Separation Protocol (LISP) is an innovative networking architecture that decouples the identity (Identifier) of a device or VM from its location (Locator). By separating the two, LISP provides a scalable and flexible solution for VM mobility.

**How LISP Enhances VM Mobility**

1. Improved Scalability:

LISP introduces a level of indirection by assigning Endpoint Identifiers (EIDs) to VMs. These EIDs act as unique identifiers, allowing VMs to retain their identity even when moved to different locations. This enables enterprises to scale their VM deployments without worrying about the limitations imposed by the underlying network infrastructure.

2. Seamless VM Mobility:

LISP simplifies moving VMs by abstracting the location information using Routing Locators (RLOCs). When a VM is migrated, LISP updates the mapping between the EID and RLOC, allowing the VM to maintain uninterrupted connectivity. This eliminates the need for complex network reconfigurations, reducing downtime and improving overall agility.

3. Load Balancing and Disaster Recovery:

LISP enables efficient load balancing and disaster recovery strategies by providing the ability to distribute VMs across multiple physical hosts or data centers. With LISP, VMs can be dynamically moved to optimize resource utilization or to ensure business continuity in the event of a failure. This improves application performance and enhances the overall resilience of the IT infrastructure.

4. Interoperability and Flexibility:

LISP is designed to be interoperable with existing network infrastructure, allowing organizations to gradually adopt the protocol without disrupting their current operations. It integrates seamlessly with IPv4 and IPv6 networks, making it a future-proof solution for VM mobility.

Basic LISP Traffic flow

A device ( S1 ) initiates a connection and wants to communicate with another external device ( D1 ). D1 is located in a remote network. S1 will create a packet with the EID of S1 as the source IP address and the EID of D1 as the destination IP address. As the packets flow to the network’s edge on their way to D1, they are met by an Ingress Tunnel Router ( ITR ).

The ITR maps the destination EID to a destination RLOC and then encapsulates the original packet with an additional header with the source IP address of the ITR RLOC and the destination IP address of the RLOC of an Egress Tunnel Router ( ETR ). The ETR is located on the remote site next to the destination device D1.

The key is how these mappings are defined, especially for VM mobility. There is no routing convergence, and changes to the mapping system are invisible to the source and destination hosts. The result is complete transparency.
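The S1-to-D1 flow above can be modeled as data. The ITR wraps the untouched inner packet (EID to EID) in an outer header (ITR RLOC to ETR RLOC), carried over UDP destination port 4341 in the LISP data plane. The ETR accepts only packets addressed to one of its own RLOCs. The Python sketch below also shows that when D1 moves, only the mapping changes, while S1's packet is identical. Addresses are hypothetical, and the mapping is an exact-match dictionary rather than a prefix lookup for brevity.

```python
from dataclasses import dataclass

LISP_DATA_PORT = 4341  # UDP destination port for LISP-encapsulated data packets

@dataclass(frozen=True)
class IPPacket:
    src: str      # source EID
    dst: str      # destination EID
    payload: str

@dataclass(frozen=True)
class LispPacket:
    outer_src: str     # ITR RLOC
    outer_dst: str     # ETR RLOC
    udp_dst_port: int
    inner: IPPacket    # original packet, unchanged

def itr_encapsulate(pkt, itr_rloc, eid_to_rloc):
    """ITR: map the destination EID to an RLOC and add the outer header."""
    return LispPacket(itr_rloc, eid_to_rloc[pkt.dst], LISP_DATA_PORT, pkt)

def etr_decapsulate(lisp_pkt, my_rlocs):
    """ETR: accept only packets whose outer destination is one of its own RLOCs."""
    if lisp_pkt.outer_dst not in my_rlocs:
        raise ValueError("outer destination is not a local RLOC")
    return lisp_pkt.inner

mapping = {"10.2.5.10": "198.51.100.1"}  # hypothetical: D1's EID -> ETR at site B
pkt = IPPacket(src="10.1.1.5", dst="10.2.5.10", payload="hello")  # S1 -> D1

before = itr_encapsulate(pkt, "192.0.2.1", mapping)
mapping["10.2.5.10"] = "203.0.113.1"     # D1 moves: only the mapping changes
after = itr_encapsulate(pkt, "192.0.2.1", mapping)

assert before.inner == after.inner       # S1's packet is identical before and after the move
print(before.outer_dst, "->", after.outer_dst)
print(etr_decapsulate(after, {"203.0.113.1"}))
```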

LISP Terminology

LISP namespaces:

LISP Name Component	LISP Protocol Description
End-point Identifiers ( EID ) Addresses	The EID is allocated to an end host from an EID-prefix block. The EID identifies the endpoint, not where the host is located. A host obtains a destination EID the same way it obtains a normal destination address today, for example through DNS or SIP. The procedure a host uses to send IP packets does not change. EIDs are not routable in the RLOC (core) network.
Routing Locator ( RLOC ) Addresses	The RLOC is the address of an Egress Tunnel Router ( ETR ); each EID-prefix maps to one or more RLOCs. Reachability within the RLOC space is achieved by traditional routing methods. The RLOC address must be routable.

LISP site devices:

LISP Component	LISP Protocol Description
Ingress Tunnel Router ( ITR )	An ITR is a LISP Site device that sits in a LISP site and receives packets from internal hosts. It in turn encapsulates them to remote LISP sites. To determine where to send the packet the ITR performs an EID-to-RLOC mapping lookup. The ITR should be the first-hop or default router within a site for the source hosts.
Egress Tunnel Router ( ETR )	An ETR is a LISP Site device that receives LISP-encapsulated IP packets from the Internet, decapsulates them, and forwards them to local EIDs at the site. An ETR only accepts an IP packet whose destination address in the "outer" IP header is one of its own configured RLOCs. The ETR should be the last-hop router directly connected to the destination.

LISP infrastructure devices:

LISP Component Name	LISP Protocol Description
Map-Server ( MS )	The map server contains the EID-to-RLOC mappings and the ETRs register their EIDs to the map server. The map-server advertises these, usually as an aggregate into the LISP mapping system.
Map-Resolver ( MR )	When resolving EID-to-RLOC mappings the ITRs send LISP Map-Requests to Map-Resolvers. The Map-Resolver is typically an Anycast address. This improves the mapping lookup performance by choosing the map-resolver that is topologically closest to the requesting ITR.
Proxy ITR ( PITR )	Provides connectivity to non-LISP sites. It acts like an ITR but does so on behalf of non-LISP sites.
Proxy ETR ( PETR )	Acts like an ETR but does so on behalf of LISP sites that want to communicate to destinations at non-LISP sites.
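The control-plane roles in the table fit together as follows. ETRs register EID-prefixes with a Map-Server. On a map-cache miss, an ITR sends a Map-Request via a Map-Resolver and caches the answer, so later packets to the same prefix need no control-plane exchange. The Python sketch below collapses this into direct method calls. In the real protocol, the Map-Server typically forwards the request to the registering ETR, which replies. Class and method names are hypothetical, and no authentication or TTLs are modeled.

```python
import ipaddress

class MapServer:
    """Holds EID-prefix -> RLOC registrations sent by ETRs (Map-Register)."""

    def __init__(self):
        self.registrations = {}

    def map_register(self, eid_prefix, rlocs):
        self.registrations[ipaddress.ip_network(eid_prefix)] = list(rlocs)

    def lookup(self, eid):
        addr = ipaddress.ip_address(eid)
        matches = [p for p in self.registrations if addr in p]
        if not matches:
            return None
        best = max(matches, key=lambda p: p.prefixlen)
        return best, self.registrations[best]

class MapResolver:
    """Receives Map-Requests from ITRs and finds the answer in the mapping system."""

    def __init__(self, map_servers):
        self.map_servers = map_servers

    def map_request(self, eid):
        for ms in self.map_servers:
            answer = ms.lookup(eid)
            if answer is not None:
                return answer
        return None  # negative answer: treat as a non-LISP destination (e.g. use a PETR)

class ITR:
    """Keeps a map-cache and only queries the Map-Resolver on a miss."""

    def __init__(self, resolver):
        self.resolver = resolver
        self.map_cache = {}
        self.requests_sent = 0

    def rlocs_for(self, eid):
        addr = ipaddress.ip_address(eid)
        hits = [p for p in self.map_cache if addr in p]
        if hits:  # cache hit: no control-plane traffic
            return self.map_cache[max(hits, key=lambda p: p.prefixlen)]
        self.requests_sent += 1  # cache miss: send a Map-Request
        answer = self.resolver.map_request(eid)
        if answer is None:
            return None
        prefix, rlocs = answer
        self.map_cache[prefix] = rlocs
        return rlocs

ms = MapServer()
ms.map_register("10.2.0.0/16", ["198.51.100.1"])  # site B's ETR registers its EID-prefix
itr = ITR(MapResolver([ms]))
itr.rlocs_for("10.2.5.10")   # miss: Map-Request through the Map-Resolver
itr.rlocs_for("10.2.9.9")    # hit: answered from the map-cache
print(itr.requests_sent)
```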

VM Mobility

LISP Host Mobility

LISP VM Mobility ( LISP Host Mobility ) functionality allows any IP address ( End host ) to move from its subnet to either a) a completely different subnet, known as “across subnet,” or b) an extension of its subnet in a different location, known as “extended subnet,” while keeping its original IP address.

When the end host carries its own Layer 3 address to a remote site that uses the same prefix, the move is known as an "extended subnet." Extended subnet mode requires a Layer 2 LAN extension. On the other hand, when the end host moves to a remote site that uses a different network prefix, the move is known as "across subnets." In this case, a Layer 2 extension is not needed between sites.
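The distinction between the two modes comes down to one prefix check. Either the destination site serves the host's own prefix, which requires a LAN extension, or it does not. The Python sketch below is a hypothetical helper that makes that check explicit. In real deployments the mode is a configuration choice on the xTRs rather than something computed, and the addresses used here are assumptions.

```python
import ipaddress

def mobility_mode(host_ip, destination_site_prefixes):
    """Classify a LISP host move.

    'extended subnet' if the destination site serves the host's own prefix
    (a Layer 2 LAN extension is required), otherwise 'across subnets'.
    """
    addr = ipaddress.ip_address(host_ip)
    if any(addr in ipaddress.ip_network(p) for p in destination_site_prefixes):
        return "extended subnet"
    return "across subnets"

print(mobility_mode("10.10.1.25", ["10.10.1.0/24"]))  # same prefix stretched to the remote DC
print(mobility_mode("10.10.1.25", ["10.20.1.0/24"]))  # remote DC uses a different prefix
```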

LAN extension considerations

LISP does not remove the need for a LAN extension if a VM wants to perform a “hot” migration between two dispersed sites. The LAN extension is deployed to stretch a VLAN/IP subnet between separate locations. LISP complements LAN extensions with efficient move detection methods and ingress traffic engineering.

LISP works with all LAN extensions, whether back-to-back vPC and VSS over dark fiber, VPLS, Overlay Transport Virtualization (OTV), or Ethernet over MPLS/IP. LAN extension best practices should still be applied to the data center edges. These include, but are not limited to, end-to-end loop prevention and STP isolation.

A LISP site with a LAN extension extends a single site across two physical data center sites. This is because the extended subnet functionality of LISP makes two DC sites a single LISP site. On the other hand, when LISP is deployed without a LAN extension, a single LISP site is not extended between two data centers, and we end up having separate LISP sites.

LISP extended subnet

Diagram: VM mobility, LISP protocol and extended subnets

To avoid asymmetric traffic handling, the LAN extension technology must filter Hot Standby Router Protocol ( HSRP ) HELLO messages across the two data centers. This creates an active-active HSRP setup. HSRP localization optimizes egress traffic flows. LISP optimizes ingress traffic flows.

The default gateway and virtual MAC address must remain consistent in both data centers. This is because the moved VM will continue to send to the same gateway MAC address. This is accomplished by configuring the same HSRP gateway IP address and group in both data centers. When an active-active HSRP domain is used, re-ARP is not needed during mobility events.

The LAN extension technology must have multicast enabled to support the proper operation of LISP. Once a dynamic EID is detected, the xTR (a device acting as both ITR and ETR) sends a Map-Notify message to a multicast group address so that all other xTRs learn about it. These multicast messages are delivered over the LAN extension.

LISP across subnet

LISP across subnets requires that a mobile VM can still reach the same gateway, even after it moves to a different subnet. This prevents egress traffic from triangulating back to the original data center. It can be achieved by manually setting the vMAC address associated with the HSRP group to be consistent across sites.

Proxy ARP must be configured under the local and remote SVIs to correctly handle new ARP requests generated by the migrated workload. With this deployment, there is no need for a LAN extension to stretch the VLAN/IP subnet between sites. This is why across-subnet mode is considered suited to "cold" migration scenarios, such as Disaster Recovery (DR), cloud bursting, and on-demand workload mobility.

Benefits of LISP

1. Scalability: By separating the identifier from the location, LISP provides a scalable solution for network design. It allows for hierarchical addressing, reducing the size of the global routing table and enabling efficient routing across large networks.

2. Mobility: LISP's separation of identity and location particularly benefits mobile devices. As devices move between networks, their EIDs remain constant while the RLOCs are updated dynamically. This enables seamless mobility without disrupting ongoing connections.

3. Multihoming: LISP allows a device to have multiple RLOCs, enabling multihoming capabilities without complex network configurations. This ensures redundancy, load balancing, and improved network reliability.

4. Security: LISP provides enhanced security features, such as cryptographic authentication and integrity checks, to ensure the integrity and authenticity of the mapping information. This helps mitigate potential attacks, such as IP spoofing.

Applications of LISP

1. Data Center Interconnection: LISP can interconnect geographically dispersed data centers, providing efficient and scalable communication between locations.

2. Internet of Things (IoT): With the exponential growth of IoT devices, LISP offers an efficient solution for managing these devices’ addressing and communication needs, ensuring seamless connectivity in large-scale deployments.

3. Content Delivery Networks (CDNs): LISP can optimize content delivery by allowing CDNs to cache content closer to end-users, reducing latency and improving overall performance.

Closing Points: LISP and VM Mobility

LISP is a network architecture and protocol that separates the two functions of IP addresses: identifying endpoints and routing traffic. By doing so, it allows for more efficient routing and a reduction in the complexity of network management. This separation is fundamental to enabling VM mobility, as it allows VMs to maintain consistent identities even as their physical locations change.

One of the primary benefits of LISP VM Mobility is the enhanced flexibility it provides. Businesses can move VMs across different data centers or cloud environments without having to reconfigure their network settings. This capability is particularly beneficial for disaster recovery scenarios, load balancing, and maintenance operations. Additionally, LISP VM Mobility can lead to cost savings by optimizing resource utilization and reducing the need for redundant infrastructure.

To implement LISP VM Mobility, organizations need to ensure that their network infrastructure supports the LISP protocol. This may involve updating network equipment and software to be compatible with LISP. Additionally, IT teams should be trained to manage and troubleshoot LISP-enabled environments effectively. By taking these steps, businesses can harness the full potential of LISP VM Mobility to drive innovation and efficiency.

Despite its advantages, LISP VM Mobility is not without challenges. Organizations must carefully plan the transition to ensure compatibility and minimize disruptions. Security is another critical consideration, as the dynamic nature of VM mobility can introduce new vulnerabilities. Implementing robust security measures, such as encryption and access controls, is essential to safeguarding data as it moves across networks.

Summary: LISP Protocol and VM Mobility

LISP (Locator/ID Separation Protocol) and VM (Virtual Machine) Mobility are two powerful technologies that have revolutionized the world of networking and virtualization. In this blog post, we delved into the intricacies of LISP and VM Mobility, exploring their benefits, use cases, and seamless integration.

Understanding LISP

LISP, a groundbreaking protocol, separates the role of a device’s identity (ID) from its location (Locator). By decoupling these two aspects, LISP enables efficient routing and scalable network architectures. It provides a solution to overcome the limitations of traditional IP-based routing, enabling enhanced mobility and flexibility in network design.

Unraveling VM Mobility

VM Mobility, on the other hand, refers to the ability to seamlessly move virtual machines across different physical hosts or data centers without disrupting their operations. This technology empowers businesses with the flexibility to optimize resource allocation, enhance resilience, and improve disaster recovery capabilities.

The Synergy between LISP and VM Mobility

When LISP and VM Mobility join forces, they create a powerful combination that amplifies the benefits of both technologies. By leveraging LISP’s efficient routing and location independence, VM Mobility becomes even more agile and robust. With LISP, virtual machines can be effortlessly moved between hosts or data centers, maintaining seamless connectivity and preserving the user experience.

Real-World Applications

Integrating LISP and VM Mobility opens up possibilities across many industries. In the healthcare sector, for instance, virtual machines hosting critical patient data can be migrated between locations without compromising accessibility or security. Similarly, in cloud computing, LISP and VM Mobility enable dynamic resource allocation, load balancing, and efficient disaster recovery strategies.

Conclusion:

In conclusion, combining LISP and VM Mobility ushers in a new era of network agility and virtual machine management. Decoupling identity and location through LISP empowers organizations to seamlessly move virtual machines across different hosts or data centers, enhancing flexibility, scalability, and resilience. As technology continues to evolve, LISP and VM Mobility will undoubtedly play a crucial role in shaping the future of networking and virtualization.


Internet Locator

October 27, 2014

by Matt Conran Blog

Internet Locator

In today's digitally connected world, the ability to locate and navigate through various online platforms has become an essential skill. With the advent of Internet Locator, individuals and businesses can now effortlessly explore the vast online landscape. In this blog post, we will delve into the concept of Internet Locator, its significance, and how it has changed the way we navigate the digital realm.

Routing table growth: There has been exponential growth in Internet usage, and the scalability of today's Internet routing system is now a concern. With more people surfing the web than ever, the underlying technology must be able to cope with demand.

In the past, getting an internet connection through an internet locator service could be expensive; nowadays, thanks to bundles that include telephone connections and streaming services, connecting to the web has never been more affordable. It is also important to note that routing table growth is a significant driver behind the need to reexamine how the Internet is connected.

Limitation in technologies: This growth has run up against the limitations and constraints of router technology and current Internet addressing architectures. The core protocols that make up the Internet have not changed significantly in over a decade.

The physical-layer mechanisms that underlie the Internet have changed radically, but only a small number of tweaks have been made to BGP and its transport protocol, TCP. Mechanisms such as MPLS were introduced to work around IP limitations within ISP networks. Still, Layers 3 and 4 have seen no substantial change for over a decade.


Highlights: Internet Locator

Understanding the Basics of Routing

– At its core, routing refers to the process of selecting paths in a network along which to send data packets. Imagine it as the GPS for the internet, making split-second decisions to ensure that your data takes the most efficient and reliable route.

– Routers, the devices responsible for this task, constantly analyze the network’s topology, updating their routing tables to reflect the best paths available. This dynamic process allows the internet to function smoothly, even as network conditions change.

– Path selection is the heart of routing, involving complex algorithms that determine the best possible path for data to travel. Factors such as path length, bandwidth, congestion, and network policies all influence the decision-making process.

– Protocols like OSPF (Open Shortest Path First) and BGP (Border Gateway Protocol) are employed to ensure that data flows through the most optimal routes, minimizing delays and maximizing efficiency. Understanding these protocols is essential for networking professionals aiming to optimize network performance and reliability.

Note – Routing Tables

### What Are Routing Tables?

Routing tables are essentially databases stored on routers that contain information about the paths to various network destinations. Each entry in a routing table details a specific route and consists of several components, including the destination IP address, the subnet mask, the next hop, and the metric. These components work together to determine the best path for data to travel across the network. By constantly updating and maintaining these tables, routers ensure that data packets reach their endpoints efficiently.

Example of VPC Networking & Routes:

### The Process of Path Selection

Path selection is a critical function of routing tables. It involves determining the most optimal route for data packets to travel from their source to their destination. This decision-making process is influenced by various factors, such as network topology, link costs, and congestion levels. Routers evaluate these factors using algorithms like Distance Vector, Link State, and Path Vector to select the best available path. By doing so, they help maintain high network performance and reliability.
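As one concrete case of the link-state approach, routers flood topology information and each then runs Dijkstra's shortest path first (SPF) algorithm over it, as OSPF and IS-IS do. The Python sketch below is a simplified illustration on a hypothetical four-router topology; it ignores equal-cost multipath and the many details of real SPF implementations.

```python
import heapq
from typing import Dict, Optional, Tuple

Graph = Dict[str, Dict[str, int]]


def spf(graph: Graph, source: str) -> Dict[str, Tuple[int, Optional[str]]]:
    """Dijkstra shortest path first over a link-state database.

    graph maps each router to {neighbor: link_cost}. The result maps each
    reachable router to (total_cost, first_hop_from_source).
    """
    dist: Dict[str, int] = {source: 0}
    first_hop: Dict[str, Optional[str]] = {source: None}
    queue = [(0, source)]
    done = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in done:
            continue  # stale queue entry; a cheaper path was already settled
        done.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # Neighbors of the source are their own first hop.
                first_hop[neighbor] = neighbor if node == source else first_hop[node]
                heapq.heappush(queue, (new_cost, neighbor))
    return {node: (dist[node], first_hop[node]) for node in dist}


# Hypothetical four-router topology with symmetric link costs.
links = [("R1", "R2", 10), ("R1", "R3", 1), ("R3", "R2", 1), ("R2", "R4", 1)]
topology: Graph = {}
for a, b, link in links:
    topology.setdefault(a, {})[b] = link
    topology.setdefault(b, {})[a] = link

# R2 is reached via R3 at cost 2 rather than over the direct cost-10 link.
print(spf(topology, "R1"))
```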

### Dynamic vs. Static Routing

Routing tables can be classified into two types: dynamic and static. Static routing involves manually configuring routers with fixed paths, which can be inefficient in complex or changing network environments. On the other hand, dynamic routing uses protocols such as OSPF, EIGRP, and BGP to automatically update routing tables based on real-time network conditions. Dynamic routing offers greater flexibility and adaptability, making it suitable for larger and more complex networks.

### Challenges and Considerations

While routing tables and path selection are essential for network efficiency, they also present certain challenges. Network administrators must consider factors such as scalability, security, and redundancy when configuring routing tables. Additionally, the risk of routing loops, incorrect configurations, and outdated tables can impact network performance. To mitigate these risks, regular monitoring and maintenance of routing tables are necessary.

Example Routing with IPv6 OSPFv3

Path Selection

In the Forwarding Information Base (FIB), prefix length determines the path a packet should take. Routing information bases (RIBs), or routing tables, program the FIB. Routing protocol processes present routes to the RIB. Three components are involved in path selection:

Prefix length: the number of leading bits set to 1 in the subnet mask. A longer prefix is a more specific match.
Administrative distance (AD): a rating of how trustworthy a routing information source is. If a router learns a route to the same prefix from multiple routing protocols, it compares the AD and installs the route with the lowest value.
Metric: the value a routing protocol uses to calculate the best path among its own routes. Metrics vary from routing protocol to routing protocol.
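The interplay of administrative distance and metric can be sketched in Python. The AD values below are the common Cisco IOS defaults (other vendors use different values), the candidate routes and metrics are hypothetical, and the sketch covers only the RIB decision for a single prefix, not the longest-prefix lookup in the FIB.

```python
from dataclasses import dataclass
from typing import List

# Common Cisco IOS default administrative distances (lower is more trusted).
ADMIN_DISTANCE = {
    "connected": 0, "static": 1, "ebgp": 20, "eigrp": 90,  # EIGRP internal
    "ospf": 110, "isis": 115, "rip": 120, "ibgp": 200,
}


@dataclass(frozen=True)
class Candidate:
    prefix: str   # identical prefix and length for every candidate compared
    source: str   # routing protocol offering the route
    metric: int   # only meaningful when comparing routes from the same protocol


def select_for_rib(candidates: List[Candidate]) -> Candidate:
    """Pick the route to install for one prefix: lowest AD, then lowest metric."""
    if len({c.prefix for c in candidates}) != 1:
        raise ValueError("different prefix lengths are different RIB entries")
    return min(candidates, key=lambda c: (ADMIN_DISTANCE[c.source], c.metric))


offers = [
    Candidate("10.0.3.0/24", "rip", 2),
    Candidate("10.0.3.0/24", "ospf", 20),
    Candidate("10.0.3.0/24", "eigrp", 30720),
]
print(select_for_rib(offers).source)  # eigrp
```

The RIP route has the smallest metric, but metrics from different protocols are not comparable, so the lower AD decides.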

Prefix Length

Here’s an example of how a router selects a route when the packet destination falls within the range of multiple routes. Consider a router with the following routes, each with a different prefix length:

10.0.3.0/28
10.0.3.0/26
10.0.3.0/24

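Because they have different prefix lengths (subnet masks), these routes, also known as prefix routes, are treated as different destinations, and the RIB (routing table) holds all three. For each route that is not directly connected, the routing table records the outgoing interface and the next-hop IP address. When a packet arrives, the router uses the most specific (longest) matching prefix: a packet to 10.0.3.14 matches all three routes and follows the /28, a packet to 10.0.3.50 follows the /26, and a packet to 10.0.3.100 matches only the /24.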
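Longest-prefix matching against the three example routes can be sketched with Python's standard ipaddress module. Real routers use specialized data structures (such as tries) rather than a linear scan; this sketch only shows the selection rule, and the next-hop labels are hypothetical.

```python
import ipaddress
from typing import Dict, Optional

# The three routes from the example; the next-hop labels are hypothetical.
ROUTES: Dict[ipaddress.IPv4Network, str] = {
    ipaddress.ip_network("10.0.3.0/28"): "next-hop-A",
    ipaddress.ip_network("10.0.3.0/26"): "next-hop-B",
    ipaddress.ip_network("10.0.3.0/24"): "next-hop-C",
}


def longest_prefix_match(destination: str, table) -> Optional[ipaddress.IPv4Network]:
    """Return the most specific route containing destination, or None."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in table if address in network]
    if not matches:
        return None  # no matching route and no default route: drop the packet
    return max(matches, key=lambda network: network.prefixlen)


for destination in ["10.0.3.14", "10.0.3.50", "10.0.3.100", "10.0.4.1"]:
    route = longest_prefix_match(destination, ROUTES)
    print(destination, "->", route, ROUTES.get(route))
```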


Internet Locator

The Internet is often drawn as a cloud, but that picture needs clarifying: few endpoints are directly connected across the Internet. Networks are commonly classified as centralized, decentralized, or distributed, and the Internet is a decentralized, partially distributed network with many centers, or nodes, joined by direct and indirect links.

The Internet is a conglomeration of autonomous systems, each representing an organization's administrative authority and routing policies. Autonomous systems are made up of Layer 3 routers that run Interior Gateway Protocols (IGPs) such as Open Shortest Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS) within their borders and interconnect via an Exterior Gateway Protocol (EGP). The current de facto standard EGP on the Internet is Border Gateway Protocol Version 4 (BGP-4), originally defined in RFC 1771 and now specified in RFC 4271.

Guide on BGP Connectivity

In the following, we see a simple BGP design. BGP operates over TCP, specifically TCP port 179. BGP peerings can be iBGP (within an autonomous system) or eBGP (between autonomous systems); in the screenshots below, we have an iBGP design. Remember that BGP is a path vector protocol, which considers various factors when making routing decisions. These factors include network policies and path attributes such as the AS path (including its length, the number of AS hops), next hop, and origin.
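The path-vector behaviour can be illustrated with a small Python sketch. It is deliberately partial: it models only AS_PATH loop prevention and the preference for a shorter AS_PATH, whereas real BGP best-path selection evaluates several attributes (for example, local preference) before AS_PATH length. The AS numbers and prefixes come from documentation ranges and are hypothetical.

```python
from typing import Dict, Iterable, Tuple

AsPath = Tuple[int, ...]


def path_vector_select(local_as: int,
                       advertisements: Iterable[Tuple[str, AsPath]]) -> Dict[str, AsPath]:
    """Keep, per prefix, the shortest loop-free AS_PATH.

    Only two path-vector behaviours are modelled: AS_PATH loop prevention
    and AS_PATH length. Real BGP evaluates other attributes first.
    """
    best: Dict[str, AsPath] = {}
    for prefix, as_path in advertisements:
        if local_as in as_path:
            continue  # our own AS is already in the path: accepting it would loop
        current = best.get(prefix)
        if current is None or len(as_path) < len(current):
            best[prefix] = as_path
    return best


# Hypothetical advertisements received by AS 64500.
received = [
    ("203.0.113.0/24", (64501, 64502)),
    ("203.0.113.0/24", (64503,)),
    ("198.51.100.0/24", (64504, 64500, 64505)),
]
print(path_vector_select(64500, received))
```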


Internet Locator: Default Free Zone ( DFZ )

The first large-scale packet-switching network was ARPAnet- the modern Internet’s predecessor. It used a simplex protocol called Network Control Program ( NCP ). NCP combined addressing and transport into a single protocol. Many applications were built on top of NCP, which was very successful. However, it lacked flexibility. As a result, reliability was separated from addressing and packet transfer in the design of the Internet Protocol Suite, with IP being separated from TCP.

On the 1st of January 1983, ARPAnet officially retired NCP and moved to a more flexible and powerful protocol suite, TCP/IP. The transition from NCP to TCP/IP was known as “flag day”, and it was completed quickly, with only 400 nodes to convert.

Today, a similar flag day is impossible due to the sheer size and scale of the Internet backbone. Changes to the Internet are driven by necessity, and such a vast network is usually slow to change. For example, inserting an additional header into the protocol would affect IP fragmentation processing and congestion mechanisms. Changing the semantics of IP addressing is problematic because higher-level protocols use the IP address as an identifier, and it is encoded in applications.

**Understanding Default-Free Zones**

In the rapidly evolving landscape of network architecture, the concept of a Default-Free Zone (DFZ) stands out as a crucial element for ensuring seamless connectivity and resilience. A DFZ is the part of the Internet routing infrastructure where routers operate without a default route. This means that every packet must match a specific route in the routing table, enhancing the precision and efficiency of data transmission. Understanding DFZs is vital for network engineers and IT professionals who strive to maintain robust and efficient networks.

**The Role of DFZ in Modern Networks**

Default-Free Zones play a pivotal role in modern networks by eliminating the dependency on a default route. This leads to a more streamlined routing process, reducing the risk of data bottlenecks and enhancing overall network performance. In a DFZ, routers must rely on complete and accurate routing information, which makes it essential for organizations to maintain up-to-date routing tables and configurations. This meticulous approach not only improves network reliability but also makes troubleshooting more straightforward, as each route is explicitly defined.

The driving forces of the DFZ

Many factors are driving the growth of the Default Free Zone ( DFZ ). These mainly include multi-homing, traffic engineering, and policy routing. The Internet Architecture Board ( IAB ) met on October 18-19th, 2006, and their key finding was that they needed to devise a scalable routing and addressing system. Such an addressing system must meet the current challenges of multi-homing and traffic engineering requirements.

**Challenges and Considerations**

While the benefits of adopting a DFZ are manifold, there are challenges that organizations must address. Maintaining a DFZ requires a high level of expertise and constant monitoring to ensure that routing tables are comprehensive and accurate. The lack of a default route means that any missing information could lead to data transmission failures. As such, organizations must invest in skilled personnel and advanced routing technologies to manage their DFZ effectively. Additionally, the complexity of setting up and maintaining a DFZ can be prohibitive for smaller organizations with limited resources.

**The Future of Networking with DFZ**

As network demands continue to grow, the importance of Default-Free Zones is expected to increase. The rise of cloud computing and the ever-expanding Internet of Things (IoT) ecosystem necessitates a more resilient and efficient network infrastructure. DFZs are poised to play a critical role in meeting these demands by providing a more reliable and efficient routing framework. Organizations that adopt DFZs are likely to be better equipped to handle future network challenges and innovations.

Internet Locator: Locator/ID Separation Protocol ( LISP )

There has been some progress in the development of the Locator/ID Separation Protocol ( LISP ). LISP is a routing architecture that redesigns the current addressing architecture. Traditional addressing architecture uses a single name, the IP address, to express two functions of a device.

The first function is its identity, i.e., who, and the second function is its location, i.e., where. LISP separates IP addresses into two namespaces: Endpoint Identifiers ( EIDs ), addresses assigned to hosts that are not routed in the global routing system, and Routing Locators ( RLOCs ), routable addresses assigned to routers that make up the global routing system.

Internet locator with LISP

Separating these functions offers numerous benefits within a single protocol, one of which attempts to address the scalability of the Default Free Zone. In addition, LISP is a network-based implementation with most of the deployment at the network edges. As a result, LISP integrates well into the current network infrastructure and requires no changes to the end host stack.
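How an ingress tunnel router (ITR) uses the mapping system at the network edge can be sketched as follows. This Python model is hypothetical (ToyITR, mapping_system, and the addresses are illustrative); a real ITR sends Map-Requests to the mapping system and encapsulates packets in UDP (destination port 4341) with an outer IP header addressed from RLOC to RLOC. The point it shows is that hosts keep using EIDs while only the edge router deals with locators, and that one cached EID prefix serves later packets without another lookup.

```python
import ipaddress
from typing import Callable, Dict, Tuple

# Hypothetical mapping database: EID prefix -> RLOC of the site's ETR.
MAPPING_DATABASE = {"10.2.0.0/16": "198.51.100.2"}


def mapping_system(eid: str) -> Tuple[str, str]:
    """Stand-in for a Map-Request/Map-Reply exchange (not a real LISP API)."""
    address = ipaddress.ip_address(eid)
    for prefix, rloc in MAPPING_DATABASE.items():
        if address in ipaddress.ip_network(prefix):
            return prefix, rloc
    raise LookupError(f"no mapping for EID {eid}")


class ToyITR:
    """Hypothetical ingress tunnel router with a map-cache."""

    def __init__(self, own_rloc: str, resolver: Callable[[str], Tuple[str, str]]):
        self.own_rloc = own_rloc
        self.resolver = resolver
        self.map_cache: Dict[ipaddress.IPv4Network, str] = {}
        self.map_requests = 0

    def lookup(self, eid: str) -> str:
        address = ipaddress.ip_address(eid)
        hits = [prefix for prefix in self.map_cache if address in prefix]
        if hits:
            return self.map_cache[max(hits, key=lambda p: p.prefixlen)]
        # Cache miss: ask the mapping system and cache the whole EID prefix.
        self.map_requests += 1
        prefix, rloc = self.resolver(eid)
        self.map_cache[ipaddress.ip_network(prefix)] = rloc
        return rloc

    def encapsulate(self, src_eid: str, dst_eid: str, payload: str) -> dict:
        # The inner header keeps the EIDs; the outer header carries the RLOCs.
        inner = {"src": src_eid, "dst": dst_eid, "payload": payload}
        return {"outer_src": self.own_rloc, "outer_dst": self.lookup(dst_eid),
                "inner": inner}


itr = ToyITR("192.0.2.1", mapping_system)
first = itr.encapsulate("10.1.0.5", "10.2.0.7", "hello")
second = itr.encapsulate("10.1.0.5", "10.2.3.4", "again")
print(first["outer_dst"], second["outer_dst"], itr.map_requests)
```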

Recap on LISP Protocol and Path Selection

Path selection in LISP is a crucial component that determines how data packets traverse the network. Rather than relying only on the path metrics or shortest-path algorithms of the underlying routing protocols, LISP adds a mapping system. For each EID prefix, the mapping system returns a set of RLOCs, each with a priority and a weight set by the destination site, so that site can express ingress preferences based on policy, such as link capacity, cost, or backup links. The ITR then encapsulates traffic toward the chosen RLOC, which helps keep data flows efficient and reliable even in complex network environments.
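The priority and weight mechanism can be sketched in Python. Per the LISP specification, a lower priority is preferred, priority 255 marks an RLOC that must not be used for unicast, and weights split traffic among RLOCs of equal priority. Hashing each flow onto the weights (so a flow stays on one RLOC) and sharing equally when all weights are zero are assumptions of this sketch; the addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rloc:
    address: str
    priority: int  # lower is preferred; 255 means "do not use for unicast"
    weight: int    # relative share of traffic among RLOCs with equal priority


def pick_rloc(rlocs: List[Rloc], flow_hash: int) -> str:
    """Choose an RLOC for a flow from a Map-Reply locator set."""
    usable = [r for r in rlocs if r.priority != 255]
    if not usable:
        raise LookupError("no usable RLOC")
    best = min(r.priority for r in usable)
    pool = [r for r in usable if r.priority == best]
    total = sum(r.weight for r in pool)
    if total == 0:
        return pool[flow_hash % len(pool)].address  # assumption: share equally
    point = flow_hash % total
    for r in pool:
        if point < r.weight:
            return r.address
        point -= r.weight
    raise AssertionError("unreachable: point is always below the total weight")


# Hypothetical locator set: two primary links sharing 75/25, plus a backup.
locators = [
    Rloc("192.0.2.1", priority=1, weight=75),
    Rloc("198.51.100.1", priority=1, weight=25),
    Rloc("203.0.113.1", priority=2, weight=100),
]
shares = {}
for flow in range(100):
    chosen = pick_rloc(locators, flow)
    shares[chosen] = shares.get(chosen, 0) + 1
print(shares)  # the priority-2 backup receives no traffic while primaries exist
```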

### How LISP Enhances Network Scalability

One of the standout features of the LISP protocol is its ability to address the growing demands of network scalability. By decoupling identity from location, LISP minimizes the size of routing tables, thereby reducing memory and processing requirements on routers. This is particularly advantageous in large-scale networks, where maintaining a table of all possible routes can be cumbersome and inefficient. LISP’s path selection mechanism dynamically adapts to changes in the network, ensuring scalability without compromising performance.

Guide on LISP.

In the following guide, we will look at a LISP network. These LISP protocol components include the following:

Map Registration and Map Notify.
Map Request and Map-Reply.
LISP Protocol Data Path.
Proxy ETR.
Proxy ITR.

LISP implements the use of two namespaces instead of a single IP address:

Endpoint identifiers (EIDs) are assigned to end hosts.
Routing locators (RLOCs) are assigned to devices (primarily routers) that comprise the global routing system.

Splitting EID and RLOC functions yields several advantages, including improved routing system scalability, multihoming efficiency, and ingress traffic engineering. With the show lisp site summary command, we see that site 1 consists of R1 and site 2 consists of R2, and each site advertises its own EID prefix. On R1, the tunnel router, we see the routing locator address 10.0.1.2. The RLOCs (routing locators) are interfaces on the tunnel routers.

Border Gateway Protocol (BGP) role in the DFZ

Border Gateway Protocol, or BGP, is an exterior gateway protocol that allows different autonomous systems (AS) to exchange routing information. It is designed to enable efficient communication between different networks and facilitate data exchange and traffic across the Internet.

Exchanging NLRI

BGP is the protocol used to exchange Network Layer Reachability Information (NLRI) between devices on the Internet and is the most critical piece of Internet architecture. It interconnects the autonomous systems on the Internet and holds the entire network together. Routes are exchanged between BGP speakers in UPDATE messages. The BGP routing table ( RIB ) now stands at over 520,000 routes.

Although some of this growth is organic, a large proportion is driven by prefix de-aggregation. Prefix de-aggregation leads to increased BGP UPDATE messages injected into the DFZ. UPDATE messages require protocol activity between routing nodes, which requires additional processing to maintain the state for the longer prefixes.
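The arithmetic behind de-aggregation is easy to check with Python's ipaddress module: splitting one /16 into /24s multiplies the number of entries every DFZ router must carry for that space by 2^(24-16) = 256, and a change affecting that space can then generate UPDATEs for each more-specific prefix rather than for one aggregate. The prefix used here is hypothetical.

```python
import ipaddress

# One hypothetical aggregate and the /24 more-specifics it could be split into.
aggregate = ipaddress.ip_network("10.0.0.0/16")
more_specifics = list(aggregate.subnets(new_prefix=24))
print(len(more_specifics))  # 256 prefixes instead of 1

# Re-aggregating the same space collapses it back to a single route.
collapsed = list(ipaddress.collapse_addresses(more_specifics))
print(collapsed)
```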

Excess churn exposes the network’s core to the dynamic nature of its edges. This is detrimental to routing convergence, since each UPDATE requires routes to be recomputed and the results downloaded from the RIB to the FIB. As a result, it is commonly said that the Internet is never fully converged.

Example BGP Technology: Prefer EBGP over iBGP

**Section 1: EBGP vs. iBGP – The Core Differences**

EBGP operates between different autonomous systems (AS), facilitating communication across diverse networks. In contrast, iBGP works within a single AS, managing internal routing. This fundamental difference is pivotal. EBGP’s capability to interact with different AS is crucial for network scalability and maintaining global connectivity, while iBGP focuses on internal efficiency and stability.
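The "prefer eBGP over iBGP" rule is one tie-breaking step in BGP best-path selection. This Python sketch models only a simplified subset of that process (it omits, among others, the Cisco weight, MED, and the IGP-metric and router-ID tie-breakers), and the paths are hypothetical. It shows that when two paths tie on the earlier attributes, the one learned from an eBGP neighbor wins.

```python
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is preferred


@dataclass(frozen=True)
class BgpPath:
    next_hop: str
    local_pref: int
    as_path: Tuple[int, ...]
    origin: str
    ebgp: bool  # True if learned from an eBGP neighbor


def best_path(paths: List[BgpPath]) -> BgpPath:
    """Simplified subset of BGP best-path selection (not the full algorithm).

    Order: highest local preference, shortest AS_PATH, lowest origin,
    then prefer eBGP-learned over iBGP-learned paths.
    """
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path),
                                     ORIGIN_RANK[p.origin], 0 if p.ebgp else 1))


# Two hypothetical paths that tie on every earlier attribute.
via_ibgp = BgpPath("10.0.0.2", 100, (64501,), "igp", ebgp=False)
via_ebgp = BgpPath("192.0.2.1", 100, (64501,), "igp", ebgp=True)
print(best_path([via_ibgp, via_ebgp]).next_hop)  # 192.0.2.1
```

Raising the local preference on the iBGP path would make it win, because that attribute is compared earlier.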

**Section 2: The Role of EBGP in Network Scalability**

One of EBGP’s standout features is its ability to support network scalability. It simplifies routing policies between AS and enables organizations to connect with multiple external networks seamlessly. By using EBGP, networks can efficiently manage route advertisements and prevent routing loops, ensuring stable data flows across vast geographical areas. This scalability is less achievable with iBGP, which is limited to internal network boundaries.

**Section 3: EBGP’s Influence on Network Security**

Security is paramount in network management, and EBGP offers robust solutions. By operating between distinct AS, EBGP provides clear demarcations that help isolate and manage security threats. Network administrators can implement stringent policies and filters, ensuring only legitimate routes are advertised. This level of security management is more challenging with iBGP, where internal threats can propagate more easily across the network.

Security in the DFZ

Security is probably the most significant Internet problem; no magic bullet exists. Instead, an arms race is underway as techniques used by attackers and defenders co-evolve. This is because the Internet was designed to move packets from A to B as fast as possible, irrespective of whether B wants any of those packets.

In 1997, a misconfigured AS7007 router flooded the entire Internet with /24 BGP routes. As a result, routing was globally disrupted for more than 1 hour as the more specific prefixes took precedence over the aggregated routes. In addition, more specific routes advertised from AS7007 to AS1239 attracted traffic from all over the Internet into AS1239, saturating its links and causing router crashes.

There are automatic measures to combat prefix hijacking, but they are not widely used or compulsory. The essence of BGP design allows you to advertise whatever NLRI you want, and it’s up to the connecting service provider to have the appropriate filtering in place.
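The filtering that is left to the connecting provider can be sketched in Python as a per-customer prefix filter, similar in spirit to a prefix list with a maximum length. The customer blocks are hypothetical. Here the /25 is rejected because it is more specific than allowed, like the leaked more-specifics in the AS7007 incident, and 192.0.2.0/24 is rejected because it is not the customer's block.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical customer filter: allocated block and the longest length accepted.
CUSTOMER_FILTER: List[Tuple[str, int]] = [
    ("203.0.113.0/24", 24),
    ("198.51.100.0/24", 24),
]


def accept_from_customer(prefix: str) -> bool:
    """Accept an announcement only if it falls inside an allocated block
    and is no more specific than that block's maximum length."""
    route = ipaddress.ip_network(prefix)
    for block, max_len in CUSTOMER_FILTER:
        allowed = ipaddress.ip_network(block)
        if route.subnet_of(allowed) and route.prefixlen <= max_len:
            return True
    return False


for announcement in ["203.0.113.0/24", "203.0.113.128/25", "192.0.2.0/24"]:
    print(announcement, accept_from_customer(announcement))
```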

Drawbacks to BGP

BGP’s main security drawbacks are that it does not hide policy information and, by default, does not validate the source of the routes it receives. However, because BGP-4 runs over TCP, it is not as insecure as many think. A remote intrusion into a BGP session would require guessing the correct TCP sequence numbers to insert data, and most TCP/IP stacks generate hard-to-predict sequence numbers. A more common way to compromise BGP routing is to insert a rogue router, but that router must be explicitly configured as a neighbor in the target’s BGP configuration.

### Complexity in Configuration

One of the primary drawbacks of EBGP is its complexity in configuration. Unlike its internal counterpart, IBGP, EBGP requires careful planning and meticulous setup. Network administrators must configure policies, route maps, and filters to ensure optimal routing paths and prevent routing loops. This complexity can lead to misconfigurations, resulting in network inefficiencies or even outages.

### Limited Scalability

EBGP can also present scalability issues. As networks grow and the number of autonomous systems increases, maintaining numerous EBGP sessions becomes challenging. Each EBGP session consumes memory and processing power, potentially overwhelming routers if not managed properly. Inside the AS, the externally learned routes must also be distributed over iBGP, and an iBGP full mesh of n routers needs n(n-1)/2 sessions. This necessitates careful network design and the use of route reflectors or confederations to keep iBGP scalable.
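The iBGP full-mesh scaling problem that route reflectors address can be quantified with a couple of lines of Python. The single-route-reflector design is an assumption for simplicity; production designs normally use at least two reflectors for redundancy, which roughly doubles the client sessions.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


def single_rr_sessions(routers: int) -> int:
    """iBGP sessions when one route reflector peers with all other routers."""
    return routers - 1


for n in (10, 50, 100):
    print(n, full_mesh_sessions(n), single_rr_sessions(n))
```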

### Security Concerns

Security is another significant concern with EBGP. The protocol itself does not include built-in security features, making it vulnerable to various attacks, such as route hijacking and prefix spoofing. Network operators must implement additional security measures like prefix filtering, route validation, and the use of Resource Public Key Infrastructure (RPKI) to safeguard their networks against such threats.
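RPKI-based route origin validation, one of the measures listed here, can be sketched in Python following the three outcomes defined in RFC 6811. This is a simplified illustration with a hypothetical ROA; it ignores details such as AS_SET origins and AS 0 ROAs, and in practice validators fetch and verify ROAs from the RPKI and feed routers the validated data.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Roa:
    prefix: str
    max_length: int
    origin_as: int


def validate_origin(route_prefix: str, origin_as: int, roas: List[Roa]) -> str:
    """Route origin validation in the style of RFC 6811.

    not-found: no ROA covers the route.
    valid:     a covering ROA matches the origin AS and the length is within maxLength.
    invalid:   covered by at least one ROA, but none of them match.
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


# Hypothetical ROA published by the holder of 203.0.113.0/24.
roas = [Roa("203.0.113.0/24", 24, 64500)]
print(validate_origin("203.0.113.0/24", 64500, roas))    # valid
print(validate_origin("203.0.113.0/24", 64511, roas))    # invalid: wrong origin AS
print(validate_origin("203.0.113.128/25", 64500, roas))  # invalid: exceeds maxLength
print(validate_origin("192.0.2.0/24", 64500, roas))      # not-found
```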

Significance of BGP:

1. Inter-Domain Routing: BGP is primarily used for inter-domain routing, enabling different networks to communicate and exchange traffic across the internet. It ensures that data packets reach their intended destinations efficiently, regardless of the AS they belong to.

2. Internet Service Provider (ISP) Connectivity: BGP is crucial for ISPs as it allows them to connect their networks with other ISPs. This connectivity enables end-users to access various online services, websites, and content hosted on different networks, regardless of geographical location.

3. Redundancy and Load Balancing: BGP’s dynamic routing capabilities enable network administrators to create redundant paths and distribute traffic across multiple links. This redundancy enhances network resilience and ensures uninterrupted connectivity even during link failures.

4. Internet Traffic Engineering: BGP plays a vital role in Internet traffic engineering, allowing organizations to optimize network traffic flow. By manipulating BGP attributes and policies, network administrators can influence the path selection process and direct traffic through preferred routes.

Example BGP Traffic Engineering – AS Prepend

### Understanding BGP AS Prepend

BGP AS Prepend is a method by which an autonomous system (AS) can influence how other networks route traffic toward it, that is, its incoming traffic, by artificially inflating the AS path length. This is done by adding (or “prepending”) multiple instances of its own AS number to the AS path attribute of BGP routes. This makes the path appear longer than it actually is, persuading other networks to prefer alternative, shorter paths.

### Why Use BGP AS Prepend?

The primary reason for using AS Prepend is to control the routing of incoming traffic for multi-homed networks—those connected to two or more ISPs. By prepending AS numbers, network administrators can manipulate the perceived path cost across different routes, directing traffic through more preferred paths. This can enhance load balancing, improve latency, and avoid congestion on certain links.
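The effect on inbound traffic can be sketched in Python from the point of view of a remote network. The AS numbers are hypothetical documentation values, and the comparison assumes the remote network's earlier best-path attributes (such as local preference) tie, so AS_PATH length decides; a remote operator's local policy can still override the prepend.

```python
from typing import Tuple

AsPath = Tuple[int, ...]


def originate(own_as: int, extra_prepends: int = 0) -> AsPath:
    """AS_PATH sent to an eBGP peer: own AS once, plus any prepended copies."""
    return (own_as,) * (1 + extra_prepends)


def propagate(transit_as: int, path: AsPath) -> AsPath:
    """A transit AS adds its own number when passing the route on over eBGP."""
    return (transit_as,) + path


# Hypothetical multihomed customer AS 64500 with two upstream ISPs.
via_isp_a = propagate(64501, originate(64500))                    # normal advertisement
via_isp_b = propagate(64502, originate(64500, extra_prepends=2))  # prepended twice

# A remote AS that ties on earlier attributes picks the shorter AS_PATH.
choice = min([("ISP A", via_isp_a), ("ISP B", via_isp_b)], key=lambda item: len(item[1]))
print(via_isp_a, via_isp_b, choice[0])
```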


