Load Balancing and Scale-Out Architectures

February 26, 2015

by Matt Conran Blog

Load Balancing and Scale-Out Architectures

In the rapidly evolving world of technology, where businesses rely heavily on digital infrastructure, load balancing has become critical to ensuring optimal performance and reliability. Load balancing is a technique used to distribute incoming network traffic across multiple servers, preventing any single server from becoming overwhelmed. In this blog post, we will explore the significance of load balancing in modern computing and its role in enhancing scalability, availability, and efficiency.

One of the primary reasons why load balancing is crucial is its ability to scale resources effectively. As businesses grow and experience increased website traffic or application usage, load balancers distribute the workload evenly across multiple servers. By doing so, they ensure that each server operates within its capacity, preventing bottlenecks and enabling seamless scalability. This scalability allows businesses to handle increased traffic without compromising performance or experiencing downtime, ultimately improving the overall user experience.

Load balancing is the practice of distributing incoming network traffic across multiple servers to optimize resource utilization and prevent overload. By evenly distributing the workload, load balancers ensure that no single server is overwhelmed, thereby enhancing performance and responsiveness. Load balancing algorithms, such as round-robin, least connection, or IP hash, intelligently distribute requests based on predefined rules, ensuring efficient resource allocation.

Scale out architectures, also known as horizontal scaling, involve adding more servers to a system to handle increasing workload. Unlike scale up architectures where a single server is upgraded with more resources, scale out approaches allow for seamless expansion by adding additional servers. This approach not only increases the system's capacity but also enhances fault tolerance and reliability. By distributing the workload across multiple servers, scale out architectures enable systems to handle surges in traffic without compromising performance.

Load balancing and scale out architectures offer numerous benefits. Firstly, they improve reliability by distributing traffic and preventing single points of failure. Secondly, these architectures enhance scalability, allowing systems to handle increasing demands without degradation in performance. Moreover, load balancing and scale out architectures facilitate better resource utilization, as workloads are efficiently distributed among servers. However, implementing and managing load balancing and scale out architectures can be complex, requiring careful planning, monitoring, and maintenance.

Load balancing and scale out architectures find extensive applications across various industries. From e-commerce websites experiencing high traffic during sales events to cloud computing platforms handling millions of requests per second, these architectures ensure smooth operations and optimal user experiences. Content delivery networks (CDNs), online gaming platforms, and media streaming services are just a few examples where load balancing and scale out architectures are essential components.

Load balancing and scale out architectures have transformed the way systems handle traffic and ensure high availability. By evenly distributing workloads and seamlessly expanding resources, these architectures optimize performance, enhance reliability, and improve scalability. While they come with their own set of challenges, the benefits they bring to modern computing environments make them indispensable. Whether it's a small-scale website or a massive cloud infrastructure, load balancing and scale out architectures are vital for delivering seamless and efficient user experiences.

Matt Conran

Understanding Load Balancing

– Load balancing is a technique for distributing incoming network traffic across multiple servers. By evenly distributing the workload, load balancing enhances the performance, scalability, and reliability of web applications. Whether it’s a high-traffic e-commerce website or a complex cloud-based system, load balancing plays a vital role in ensuring a seamless user experience.

– Load balancing is not only about distributing traffic; it also enhances application availability and scalability. By implementing load balancing, organizations can achieve high availability by eliminating single points of failure. In a server failure, load balancers can seamlessly redirect traffic to healthy servers, ensuring uninterrupted service.

– Additionally, load balancing facilitates scalability by allowing organizations to add or remove servers quickly based on demand. This elasticity ensures that applications can handle sudden spikes in traffic without compromising performance.

Several techniques are employed for load balancing, each with its advantages and use cases. Let’s explore a few popular ones:

1. Round Robin: The Round Robin algorithm evenly distributes incoming requests among servers in a cyclical manner. This technique is simple and effective, ensuring all servers get an equal share of the traffic.

2. Weighted Round Robin: Unlike the traditional Round Robin approach, Weighted Round Robin assigns different server weights based on their capabilities. This allows administrators to allocate more traffic to high-performance servers, optimizing resource utilization.

3. Least Connection: The algorithm directs incoming requests to the server with the fewest active connections. This technique ensures that heavily loaded servers are not overwhelmed and distributes traffic intelligently.

4. IP Hash Load Balancing: Here, the client’s IP address is used to determine which server receives the request. This technique is beneficial for applications requiring session persistence, like shopping carts or user profiles.

5. Weighted Load Balancing: Servers are assigned a weight based on their capacity. More robust servers handle a larger proportion of the load, ensuring an efficient distribution of traffic.

Example: What is Squid Proxy?

Squid Proxy is a widely-used caching proxy server that acts as an intermediary between clients and web servers. It caches commonly requested web content, allowing subsequent requests to be served faster, reducing bandwidth usage, and improving overall performance. Its flexibility and robustness make it a preferred choice for individuals and organizations alike.

Squid Proxy offers a plethora of features that enhance browsing efficiency and security. From content caching and access control to SSL decryption and transparent proxying, Squid Proxy can be customized to suit diverse requirements. Its comprehensive logging and monitoring capabilities provide valuable insights into network traffic, aiding in troubleshooting and performance optimization.

Implementing Squid Proxy brings several benefits to the table. Firstly, it significantly reduces bandwidth consumption by caching frequently accessed content, resulting in faster response times and reduced network costs. Additionally, Squid Proxy allows for granular control over web access, enabling administrators to define access policies, block malicious websites, and enforce content filtering. This improves security and ensures a safe browsing experience.

Example: Understanding HA Proxy

HA Proxy, short for High Availability Proxy, is an open-source load balancer and proxy server software. It operates at the application layer of the TCP/IP stack, making it a powerful tool for managing traffic between clients and servers. Its primary function is to distribute incoming requests across multiple backend servers based on various algorithms, such as round-robin, least connections, or source IP affinity.

HA Proxy offers a plethora of features that make it an indispensable tool for businesses seeking high performance and scalability. Firstly, its ability to perform health checks on backend servers ensures that only healthy servers receive traffic, ensuring optimal performance. Additionally, it supports SSL/TLS termination, allowing for secure connections between clients and servers. HA Proxy also provides session persistence, enabling sticky sessions for specific clients, which is crucial for applications that require stateful communication.

The Mechanics of Scale-Out Architectures

Scale-out architecture, unlike scale-up, involves adding more servers to an existing pool rather than upgrading a single server’s hardware. This horizontal scaling approach is preferred by many enterprises because it offers flexibility, cost-effectiveness, and the ability to seamlessly increase capacity as demand grows. By distributing the load across multiple machines, businesses can enhance performance, reduce downtime, and ensure a better user experience.

**The Benefits of Scale-Out Architectures**

One of the primary advantages of scale-out architectures is their inherent scalability. Businesses can easily accommodate growth by simply adding more servers to their network, thus avoiding the costly and complex upgrades associated with scale-up approaches. This flexibility allows companies to respond swiftly to changing demands, ensuring that their IT infrastructure can handle increased traffic without a hitch. Moreover, scale-out architectures often lead to cost savings, as organizations can opt for commodity hardware and open-source software to build their systems.

**Challenges and Considerations**

While scale-out architectures offer numerous benefits, they are not without challenges. Managing a distributed system can be complex, requiring robust monitoring and management tools to ensure optimal performance. Additionally, as the number of servers increases, so does the potential for network latency and data consistency issues. It’s crucial for businesses to carefully plan and design their scale-out strategies, taking into account factors such as data replication, network bandwidth, and fault tolerance to mitigate these challenges effectively.

Google Cloud Load Balancing

Load balancing plays a vital role in distributing incoming network traffic across multiple servers, ensuring optimal performance and preventing server overload. Google Cloud’s Network and HTTP Load Balancers are powerful tools that enable efficient traffic distribution, enhanced scalability, and improved reliability.

Network Load Balancer: Google Cloud’s Network Load Balancer operates at the transport layer (Layer 4) of the OSI model, making it ideal for TCP/UDP-based traffic. It offers regional load balancing, allowing you to distribute traffic across multiple instances within a region. With features like connection draining, session affinity, and health checks, Network Load Balancer provides robust and customizable load balancing capabilities.

HTTP Load Balancer: For web applications that rely on HTTP/HTTPS traffic, Google Cloud’s HTTP Load Balancer is the go-to solution. Operating at the application layer (Layer 7), it offers advanced features like URL mapping, SSL termination, and content-based routing. HTTP Load Balancer also integrates seamlessly with other Google Cloud services like Cloud CDN and Cloud Armor, further enhancing security and performance.

Setting Up Network and HTTP Load Balancers: Configuring Network and HTTP Load Balancers in Google Cloud is a straightforward process. From the Cloud Console, you can create a new load balancer instance, define backend services, set up health checks, and configure routing rules. Google Cloud’s intuitive interface and documentation provide step-by-step guidance, making the setup process hassle-free.

Google Cloud NEGs

### The Role of Network Endpoint Groups in Load Balancing

Load balancing is crucial for ensuring high availability and reliability of applications. NEGs play a significant role in this by enabling precise traffic distribution. By grouping network endpoints, you can ensure that your load balancer directs traffic to the most appropriate instances, thereby optimizing performance and reducing latency. This granular control is particularly beneficial for applications with complex network requirements.

### Types of Network Endpoint Groups

Google Cloud offers different types of NEGs to cater to various use cases. Zonal NEGs are used for VM instances within the same zone, ideal for scenarios where low latency is required. Internet NEGs, on the other hand, are perfect for external endpoints, such as Google Cloud Storage buckets or third-party services. Understanding these types allows you to choose the best option based on your specific needs and infrastructure setup.

### Configuring Network Endpoint Groups

Configuring NEGs in Google Cloud is a straightforward process. Start by identifying your endpoints and the type of NEG you need. Then, create the NEG through the Google Cloud Console or using cloud commands. Assign the NEG to a load balancer, and configure the routing rules. This flexibility in configuration ensures that you can tailor your network setup to match your application’s demands.

### Best Practices for Using Network Endpoint Groups

To maximize the benefits of NEGs, adhere to best practices such as regularly monitoring traffic patterns and adjusting configurations as needed. This proactive approach helps in anticipating changes in demand and ensures optimal resource utilization. Additionally, leveraging Google Cloud’s monitoring tools can provide insights into the performance of your network endpoints, aiding in making informed decisions.

Load Balancing with MIGs

Google Cloud’s Managed Instance Groups (MIGs)

Google Cloud’s Managed Instance Groups (MIGs) offer a seamless way to manage large numbers of identical virtual machine instances, enabling businesses to efficiently scale their applications while maintaining high availability. Whether you’re running a web application, a backend service, or handling batch processing, MIGs provide a robust framework to meet your needs.

**Understanding the Benefits of Managed Instance Groups**

Managed Instance Groups automate the process of managing VM instances by ensuring that your applications are always running the desired number of instances. This automation not only reduces the operational overhead but also ensures your applications can handle varying loads with ease. With features like automatic healing, load balancing, and integrated monitoring, MIGs provide a comprehensive solution to manage your cloud resources efficiently. Moreover, they support rolling updates, allowing you to deploy new application versions with minimal downtime.

**Scaling with Confidence**

One of the standout features of Managed Instance Groups is their ability to scale your applications automatically. By setting up autoscaling policies based on CPU usage, HTTP load, or custom metrics, MIGs can dynamically adjust the number of running instances to match the current demand. This elasticity ensures that your applications remain responsive and cost-effective, as you only pay for the resources you actually need. Additionally, by distributing instances across multiple zones, MIGs enhance the resilience of your applications against potential failures.

**Best Practices for Using Managed Instance Groups**

To get the most out of Managed Instance Groups, it’s essential to follow best practices. Start by defining clear scaling policies that align with your application’s performance and cost objectives. Regularly monitor the performance of your MIGs using Google Cloud’s integrated monitoring tools to gain insights into resource utilization and potential bottlenecks. Additionally, consider leveraging instance templates to standardize configurations and simplify the deployment of new instances.

**What Are Health Checks and Why Do They Matter?**

Health checks are periodic tests run by load balancers to monitor the status of backend servers. They determine whether servers are available to handle requests and ensure that traffic is only directed to those that are healthy. Without health checks, a load balancer might continue to send traffic to an unresponsive or overloaded server, leading to potential downtime and poor user experiences. Health checks help maintain system resilience by redirecting traffic away from failing servers and restoring it once they are back online.

—

**Google Cloud’s Approach to Load Balancing Health Checks**

Google Cloud offers a comprehensive suite of load balancing options, each with customizable health check configurations. These health checks can be set up to monitor different aspects of server health, such as HTTP/HTTPS responses, TCP connections, and SSL handshakes. Google Cloud’s platform allows users to configure parameters like the frequency of health checks, timeout durations, and the criteria for considering a server healthy or unhealthy. By leveraging these features, businesses can tailor their health checks to meet their specific needs and ensure reliable application performance.

—

**Best Practices for Configuring Health Checks**

To make the most out of cloud load balancing health checks, consider implementing the following best practices:

1. **Set Appropriate Intervals and Timeouts:** Balance the frequency of health checks with network overhead. Frequent checks might catch issues faster but can increase load on your servers.

2. **Define Clear Success and Failure Criteria:** Establish what constitutes a successful health check and at what point a server is considered unhealthy. This might include response codes or specific message content.

3. **Monitor and Adjust:** Regularly review health check logs and performance metrics to identify patterns or recurring issues. Adjust configurations as necessary to address these findings.

Understanding Cross-Region HTTP Load Balancing

Cross-region HTTP load balancing is a technique used to distribute incoming HTTP traffic across multiple servers located in different regions. This approach not only enhances the availability and reliability of your applications but also significantly reduces latency by directing users to the nearest server. On Google Cloud, this is achieved through the Global HTTP(S) Load Balancer, which intelligently routes traffic to optimize user experience based on various factors such as proximity, server health, and current load.

### Benefits of Cross-Region Load Balancing on Google Cloud

One of the primary benefits of using Google Cloud’s cross-region HTTP load balancing is its global reach. With data centers spread across the globe, you can ensure that your users always connect to the nearest available server, resulting in faster load times and improved performance. Additionally, Google Cloud’s load balancing solution comes with built-in security features, such as SSL offloading, DDoS protection, and IPv6 support, providing a robust shield against potential threats.

Another advantage is the seamless scalability. As your user base grows, Google Cloud’s load balancer can effortlessly accommodate increased traffic without manual intervention. This scalability ensures that your services remain available and responsive, even during peak times.

### Setting Up Cross-Region Load Balancing on Google Cloud

To set up cross-region HTTP load balancing on Google Cloud, you need to follow a series of steps. First, create backend services and associate them with your virtual machine instances located in different regions. Next, configure the load balancer by defining the URL map, which dictates how traffic is distributed across these backends. Finally, set up health checks to monitor the status of your instances and ensure that traffic is only directed to healthy servers. Google Cloud’s intuitive interface and comprehensive documentation make this process straightforward, even for those new to cloud infrastructure.

Distributing Load with Service Mesh

The Importance of Load Balancing

One of the primary functions of a cloud service mesh is load balancing. Load balancing is essential for distributing network traffic evenly across multiple servers, ensuring no single server becomes overwhelmed. This not only enhances the performance and reliability of applications but also contributes to the overall efficiency of the cloud infrastructure. With a well-implemented service mesh, load balancing becomes dynamic and intelligent, automatically adjusting to traffic patterns and server health.

### Enhancing Security with a Service Mesh

Security is a paramount concern in cloud computing. A cloud service mesh enhances security by providing built-in features such as mutual TLS (mTLS) for service-to-service encryption, authorization, and authentication policies. This ensures that all communications between services are secure and that only authorized services can communicate with each other. By centralizing security management within the service mesh, organizations can simplify their security protocols and reduce the risk of vulnerabilities.

### Observability and Monitoring

Another significant advantage of using a cloud service mesh is the enhanced observability and monitoring it provides. With a service mesh, organizations gain insights into the behavior of their microservices, including traffic patterns, error rates, and latencies. This granular visibility allows for proactive troubleshooting and performance optimization. Tools integrated within the service mesh can visualize complex service interactions, making it easier to identify and address issues before they impact end-users.

### Simplifying Operations and DevOps

Managing microservices in a cloud environment can be complex and challenging. A cloud service mesh simplifies these operations by offering a consistent and unified approach to service management. It abstracts the complexities of service-to-service communication, allowing developers and operations teams to focus on building and deploying features rather than managing infrastructure. This leads to faster development cycles and more robust, resilient applications.

Google Cloud CDN

What is Cloud CDN?

Cloud CDN, short for Cloud Content Delivery Network, is a globally distributed network of servers that deliver web content to users with increased speed and reliability. By storing cached copies of content at strategically located edge servers, Cloud CDN significantly reduces latency and minimizes the distance data needs to travel, resulting in faster page load times and improved user experience.

When a user requests content from a website, Cloud CDN intelligently routes the request to the nearest edge server to the user. If the requested content is already cached at that edge server, it is delivered instantly, eliminating the need to fetch it from the origin server. However, if the content is not cached, Cloud CDN retrieves it from the origin server and stores a cached copy for future requests, making subsequent delivery lightning-fast.

Load Balancer Scaling

How to scale the load balancer? When considering load balancer scaling and scalability, we need to recap the basics of scaling load balancers. A load balancer is a device that distributes network traffic across multiple servers. It provides an even distribution of traffic across multiple servers, so no single server is overloaded with requests. This helps to improve overall system performance and reliability. Load balancers can balance traffic between multiple web servers, application servers, and databases.

Geographic Locations:

They can also be used to balance traffic between different geographic locations. Load balancers are typically configured to use round-robin, least connection, or source-IP affinity algorithms to determine how to distribute traffic. They can also be configured to use health checks to ensure that only healthy servers receive traffic. By distributing the load across multiple servers, the load balancer helps reduce the risk of server failure and improve overall system performance.

Load Balancers and the OSI Model:

Load balancers operate at different Open Systems Interconnection ( OSI ) Layers from one data center to another; joint operation is between Layer 4 and Layer 7. The load balance function becomes the virtual representation of the application. Internal applications are represented by a virtual IP address ( VIP ). VIP acts as a front-end service to external clients’ requests. Data centers host unique applications with different requirements. Therefore, load balancing and scalability will vary depending on the housed applications.

Understanding the Application:

For example, every application is unique regarding the number of sockets, TCP connections ( short-lived or long-lived ), idle time-out, and activities in each session regarding packets per second. Therefore, understanding the application structure and protocols is one of the most critical elements in determining how to scale the load balancer and design an effective load-balancing solution.

Techniques for Scaling Load Balancers

There are several strategies for scaling load balancers, each with its own benefits and ideal use cases:

1. **Vertical Scaling**: This involves increasing the capacity of your existing load balancer by upgrading its resources. While it is a straightforward approach, it has limitations in terms of cost and scalability.

2. **Horizontal Scaling**: This technique involves adding more load balancers to distribute the traffic effectively across a broader range of resources. It offers better redundancy and can accommodate larger loads but requires careful coordination between load balancers.

3. **Auto-scaling**: Implementing auto-scaling allows your infrastructure to dynamically adjust capacity based on real-time demand. This ensures that you only use the resources you need, thereby optimizing costs while maintaining performance.

Scaling Up

Scaling up is quite common for applications that need more power. Perhaps the database has grown so large that it no longer fits in memory, the disks may be full, or the database may be handling more requests and requiring more processing power than it used to.

Databases have traditionally been difficult to run on multiple machines, making them an excellent example of scaling up. When you try to make something work on two or more machines, many things that work on a single machine don’t. Do you know how to share tables efficiently across machines, for example? MongoDB and CouchDB are two new databases designed to work entirely differently since this is a challenging problem to solve.

Scaling Out

It’s here that things start to get interesting, which is why you picked up this book in the first place. In scaling out, you have multiple machines rather than a single one. The problem with scaling up is that you eventually reach a point where you can’t go any further. The capability of a single machine limits memory and processing power. If you need more than that, what should you do?

A single machine can’t handle so many visitors that people will tell you you’re in an envious position. You wouldn’t believe how nice it is to have such a problem! One of the great things about scaling out is that you can keep adding machines. Scaling out will certainly result in more compute power than scaling up, but you will run into space and power issues at some point.

Best Practices for Load Balancer Scaling

To successfully scale your load balancers, consider these best practices:

– **Monitor Traffic Patterns**: Analyze traffic trends to anticipate spikes and prepare your infrastructure accordingly.

– **Implement Robust Failover Strategies**: Ensure that your load balancers can handle failures gracefully without impacting the user experience.

– **Optimize Load Balancer Configurations**: Regularly review and optimize configurations to ensure that they align with current traffic demands.

Before you proceed, you may find the following post helpful:

Highlights: Load Balancing and Scale-Out Architectures

Availability:

Load balancing plays a significant role in maintaining high availability for websites and applications. By distributing traffic across multiple servers, load balancers ensure that even if one server fails, others can continue handling incoming requests. This redundancy helps to minimize downtime and ensures uninterrupted service for users. In addition, load balancers can also perform health checks on servers, automatically detecting and redirecting traffic away from malfunctioning or overloaded servers, further enhancing availability.

Efficiency:

Load balancers optimize the utilization of computing resources by intelligently distributing incoming requests based on predefined algorithms. This even distribution prevents any single server from being overwhelmed, improving overall system performance. By utilizing load balancing, businesses can ensure that their servers operate optimally, using available resources and minimizing the risk of performance degradation or system failures.

Scale up and scale out:

How is this like load balancing in the computing world? It all comes down to having finite resources and attempting to make the best potential use of them. For example, you may have the goal of making your websites fast; to do that, you must route your requests to the machines best capable of handling them. To get around this, you need more resources.

For example, you can buy a giant machine to replace your current server, known as scale-up and pricey, or another small device that works alongside your existing server, known as scale-out. As noted, the biggest challenge in load balancing is trying to make many resources appear as one. So we can have load balancing with DNS, content delivery networks, and HTTP load balancing. We also need to balance our database and network connections.

Guide on Gateway Load Balancing Protocol (GLBP)

GLBP is running between R1 and R2. The switch is not running GLPB and is used as an interconnection point. GLBP is often used internally between access layer switches and outside the data center. It is similar in operation to HSRP and VRRP. Notice that when we changed the priority of R2, its role changed to Active instead of Standby.

Diagram: Gateway Load Balancer Protocol (GLBP)

What is Load Balancer Scaling?

Load balancer scaling refers to the process of dynamically adjusting the resources allocated to a load balancer to meet the changing demands of an application. As the number of users accessing an application increases, load balancer scaling ensures that the incoming traffic is distributed evenly across multiple servers, preventing any single server from becoming overwhelmed.

The Benefits of Load Balancer Scaling

1. Enhanced Performance: Load balancers distribute incoming traffic among multiple servers, improving resource utilization and response times. By preventing any single server from overloading, load balancer scaling ensures a smooth user experience even during peak traffic.

2. High Availability: Load balancers play a crucial role in maintaining high availability by intelligently distributing traffic to healthy servers. If one server fails, the load balancer automatically redirects the traffic to the remaining servers, preventing service disruption.

3. Scalability: Load balancer scaling allows applications to quickly accommodate increased traffic without manual intervention. As the server load increases, additional resources are automatically allocated to handle the extra load, ensuring that the application can scale seamlessly as per the demands.

Load Balancer Scaling Strategies

1. Vertical Scaling: This strategy involves increasing individual servers’ resources (CPU, RAM, etc.) to handle higher traffic. While vertical scaling can provide immediate relief, it has limitations in terms of scalability and cost-effectiveness.

2. Horizontal Scaling: Horizontal scaling involves adding more servers to the application infrastructure to distribute the incoming traffic. Load balancers are critical in effectively distributing the load across multiple servers, ensuring optimal resource utilization and scalability.

3. Auto Scaling: Auto-scaling automatically adjusts the number of application instances based on predefined conditions. By monitoring various metrics like CPU utilization, network traffic, and response times, auto-scaling ensures that the application can handle increased traffic loads without manual intervention.

Best Practices for Load Balancer Scaling

1. Monitor and Analyze: Regularly monitor your application’s and load balancer’s performance metrics to identify any bottlenecks or areas of improvement. Analyzing the data will help you make informed decisions regarding load balancer scaling.

2. Implement Redundancy: To ensure high availability, deploy multiple load balancers in different availability zones. This redundancy ensures that even if one load balancer fails, the application remains accessible through the remaining ones.

3. Regularly Test and Optimize: Conduct load testing to simulate heavy traffic scenarios and verify the performance of your load balancer scaling setup. Optimize the configuration based on the test results to ensure optimal performance.

Example: Direct Server Return.

Direct server return (DSR) is an advanced networking technology that allows servers to send data directly to a client computer without going through an intermediary. This provides a more efficient and secure data transmission between the two, leading to faster speeds and better security.
DSR is also known as loopback, direct routing, or reverse path forwarding. It is essential in various applications, such as online gaming, streaming video, voice-over-IP (VoIP) services, and virtual private networks (VPNs).

For example, the Real-Time Streaming Protocol ( RTSP ) is an application-level network protocol for multimedia transport streams. It is used in entertainment and communications systems to control streaming media servers. With this application requirement case, the initial client connects with TCP; however, return traffic from the server can be UDP, bypassing the load balancer. For this scenario, the load-balancing method of Direct Server Return is a viable option.

DSR is an excellent choice for high-speed, secure data transmission applications. It can also help reduce latency and improve reliability. For example, DSR can help reduce lag and improve online gaming performance.

Diagram: Direct Server Return (DRS). Source Cisco.

How to scale load balancer

This post will first address the different load balancer scalability options: scale-up and scale-out. Scale-out is generally the path of scaling load balancers we see today, mainly as the traffic load, control, and data plane are spread across VMs or containers that are easy to spin up and down, commonly seen for absorbing DDoS attacks.

We will then discuss how to scale load balancer and the scalability options in the application and at a network load balancing level. We will finally address the different design options for load balancing, such as user session persistence, destination-only NAT, and persistent HTTP sessions.

Scaling a load balancer lets you adjust its performance to its workload by changing the number of nodes it contains. You can scale the load balancer up or down at any time to meet your traffic needs. So, when considering how to scale a load balancer, you must first look at the application requirements and work it out from there. What load do you expect?

In the diagram below, we see the following.

Virtual IP address: A virtual IP address is an IP address that is used to virtualize a computer’s identity on a local area network (LAN). The network address translation (NAT) form allows multiple devices to share a public IP address.
Load Balancer Function: The load balancer is configured to receive client requests and route them to the most appropriate server based on a defined algorithm.

Diagram: How to scale load balancer and load balancer functions.

The primary benefit of load balancer scaling is that it provides scalability. Scalability is the ability of a networking device or application to handle organic and planned network growth. Scalability is the main advantage of load balancing, and in terms of application capacity, it increases the number of concurrent requests data centers can support. So, in summary, load balancing is the ability to distribute incoming workloads to multiple end stations based on an algorithm.

Load balancers also provide several additional features. For example, they can be configured to detect and remove unhealthy servers from the pool of available servers. They also offer SSL encryption, which can help to protect sensitive data being passed between clients and servers. Finally, they can perform other tasks like URL rewriting and content caching.

Load Balancing	Load Balancing Algorithm
Load Balancing Method 1	Round Robin Load Balancing
Load Balancing Method 2	Weighted Round Robin Load Balancing
Load Balancing Method 3	URL Hash Load Balancing
Load Balancing Method 4	Least Connection Method
Load Balancing Method 5	Weighted Least Connection Method
Load Balancing Method 6	Least Response Time Method

Load Balancing with Routers

Load Balancing is not limited to load balancer devices. Routers also perform load balancing with routing. Across all Cisco IOS® router platforms, load balancing is a standard feature. The router automatically activates this feature when multiple routes to a destination are in the routing table.

Routing Information Protocol (RIP), RIPv2, Enhanced Interior Gateway Routing Protocol (EIGRP), Open Shortest Path First (OSPF), and Interior Gateway Routing Protocol (IGRP) are standard routing protocols or derived from static routing and packet forwarding protocols. When forwarding packets, RIP allows a router to use multiple paths.

For process-switching — load balancing is on a per-packet basis, and the asterisk (*) points to the interface over which the next packet is sent.
For fast-switching — load balancing is on a per-destination basis, and the asterisk (*) points to the interface over which the next destination-based flow is sent.

Diagram: IOS Load Balancing. Source Cisco.

Load Balancer Scalability

Scaling load balancers with Scale-Up or Scale-Out

a) Scale-up—Expand linearly by buying more servers, adding CPU and memory, etc. Scale-up is usually done on transaction database servers as these servers are difficult to scale out. Scaling up is a simple approach but the most expensive and nonlinear. Old applications were upgraded by scaling up ( vertical scaling )—a rigid approach that is not elastic. In a virtualized environment, applications are scaled linearly in a scale-out fashion.

b) Scale-out—Add more parallel servers, i.e., scale linearly. Scaling out is more accessible on web servers; add additional web servers as needed. Netflix is an example of a company that designs by scale-out. It spins up Virtual Machines ( VM ) on-demand due to daily changes in network load. Scaling out is elastic and requires a load-balancing component. It is an agile approach to load balancing.

Shared states limit the scalability of scale-out architectures, so try to share and lock as few states as possible. An example of server locking is Amazon’s eventual consistency approach, which limits the amount of transaction locking—shopping cards are not checked until you click “buy.”

Scale up load balancing

A load balancer scale-up is the process of increasing the capacity of a load balancer by adding more computing resources. This can increase the system’s scalability or provide redundancy in case of system failure. The primary goal of scaling up a load balancer is to ensure the system can handle the increased workload without compromising performance.

Scaling up a load balancer involves adding more hardware and software resources, such as CPUs, RAM, and hard disks. These resources will enable the system to process requests more quickly and efficiently. When scaling up a load balancer, consider its architecture and the types of requests it will handle.

Different types of requests require different computing resources. For example, if the load balancer handles high-volume requests, it is essential to ensure that the system has enough CPUs and RAM to handle them.

Considering the network topology when scaling up a load balancer is also essential. The network topology defines how the load balancer will communicate with other systems, such as web servers and databases. If the network topology is not configured correctly, the system may be unable to handle the increased load. Finally, monitoring the system after scaling up a load balancer is essential. This will ensure that the system performs as expected and that the increased capacity is used effectively. Monitoring the system can also help detect potential issues or performance bottlenecks.

By scaling up a load balancer, organizations can increase the scalability and redundancy of their system. However, it is essential to consider the architecture, types of requests, network topology, and monitoring when scaling up a load balancer. This will ensure the system can handle the increased workload without compromising performance.

Additional information: Scale-out load balancing

Scaling out a load balancer adds additional load balancers to distribute incoming requests evenly across multiple nodes. The process of scaling out a load balancer can be achieved in various ways. Organizations can use virtualization or cloud-based solutions to add additional load balancers to their existing systems. Some organizations prefer to deploy their servers or use their existing hardware to scale the load balancer.

Regardless of the chosen method, the primary goal should be to create a reliable and efficient system that can handle increasing requests. This can be done by evenly distributing the load across multiple nodes, ensuring that every node is manageable and manageable. Additionally, organizations should consider deploying additional load balancer resources, such as memory, disk space, or CPU cores.

Finally, organizations should constantly monitor the load balancer’s performance to ensure the system runs optimally. This can be done by tracking the load-balancing performance, analyzing the response time of requests, and providing that the system can handle unexpected spikes in traffic.

Load Balancer Scalability: The Operations

The virtual IP address and load balancing control plane

Outside is a VIP, and inside is a pool of servers. A load balancer scaling device is configured for rules associating outside IP and port numbers with an inside pool of servers. Clients only know the outside IP address through, for example, DNS replies. The load-balancing control plane monitors the servers’ health and determines which can accept requests.

The client sends a TCP SYN packet, which the load balancer device intercepts. The load balancer performs a load-balancing algorithm and sends it to the best server destination. To get the request to the server, you can use Tunnelling, NAT, or two TCP sessions. In some cases, the load balancer will have to rewrite the content. Whatever the case, the load balancer has to create a session to know that this client is associated with a particular inside server.

Local and global load balancing

Local server selection occurs within the data center based on server load and application response times. Any application that uses TCP or UDP protocols can be load-balanced. Whereas local load balancing determines the best device within a data center, global load balancing chooses the best data center to service client requests.

Global load balancing is supported through redirection based on DNS and HTTP. HTTP mechanism provides better control, while DNS is fast and scalable. Both local and global appliances work hand-in-hand; the local device feeds information to the global device, enabling it to make better load-balancing decisions.

Load Balancer Scaling Types

Application-Level Load Balancer Scalability: Load balancing is implemented between tiers in the applications stack and carried out within the application. It is used in scenarios where applications are coded correctly, making it possible to configure load balancing in the application. Designers can use open-source tools with DNS or another method to track flows between tiers of the application stack.

Network-Level Load Balancer Scalability: Network-level load balancing includes DNS round-robin, Anycast, and Layer 4 – Layer 7 load balancers. Web browser clients do not usually have built-in application layer redundancy, which pushes designers to look at the network layer for load-balancing services. If applications were designed correctly, load balancing would not be a network-layer function.

Application-level load balancing

Application-level load balancer scaling concerns what we can do inside the application to provide load-balancing services. The first thing you can do is scale up—add a more worker process. Clients issue requests that block some significant worker processes and that resource is tied to TCP sessions. If your application requires session persistence ( long-lived TCP sessions ), you block worker processes even if the client is not sending data. The solution is FastCGI or changing the webserver to Nginx.

A key point: Nginx

Nginx is event-based. On Apache ( not event-based), every TCP connection consumes a worker process, but with Nginx, a client connection takes no processes unless you are processing an actual request. Generally, Linux is poor at processing many simultaneous requests.

Nginx does not use threads and can easily have 100,000 connections. With Apache, you lose 50% of the performance, and adding CPU doesn’t help. With around 80,000 connections, you will experience severe performance problems no matter how many CPUs you add. Nginx is by far a better solution if you expect a lot of simultaneous connections.

Example: Load Balancing with Auto Scaling groups on AWS.

The following looks at an example of load balancing in AWS. Registering your Auto Scaling group with an Elastic Load Balancing load balancer helps you set up a load-balanced application. Elastic Load Balancing works with Amazon EC2 Auto Scaling to distribute incoming traffic across your healthy Amazon EC2 instances.

This increases your application’s scalability and availability. In addition, you can enable Elastic Load Balancing within multiple Availability Zones to increase your application’s fault tolerance. Elastic Load Balancing supports different types of load balancers. A recommended load balancer is the Application Load Balancer.

Diagram: Elastic Load Balancing in the cloud. Source Amazon.

Network-based load balancing

First, try to solve the load balancer scaling in the application. When you cannot load balance solely using applications, turn to the network for load-balancing services.

DNS round-robin load balancing

The most accessible type of network-level load balancing is DNS round robin. DNS server that keeps track of application server availability. The DNS control plane distributes user traffic over multiple servers round-robin. However, it does come with caveats:

DNS does not know server health.
DNS caching problems.
No measures are available to prevent DoS attacks against servers.

Clients ask for the IP of the web server, and the DNS server replies with an IP address in random order. This works well if the application uses DNS. However, some applications use hard-coded IP addresses; you can’t rely on DNS-based load balancing in these scenarios.

DNS load balancing also requires low TTL times, so the client will often ask the servers. Generally, DNS-based load balancing works well, but not with web browsers. Why? DNS pinning.

DNS pinning

This is because there have been so many attacks on web browsers, and browsers now implement a security feature called DNS pinning. DNS pinning is a method whereby you get the server’s IP address, and even though the TTL has expired, you ignore the DNS TTL and continue to use the URL.

It prevents people from spoofing DNS records and is usually built-in to browsers. DNS load balancing is perfect if the application uses DNS and listens to DNS TTL times. But unfortunately, web browsers are not in that category.

IP Anycast load balancing

IP Anycast provides geographic server load balancing. The idea is to use the same IP address on multiple POPs. Routing in the core will choose the closest POP, routing the client to the nearest POP. All servers have the same IP address configured on loopback.

Address Resolution Protocol (ARP) replies would clash if the same IP address were configured on the LAN interface. Use any routing mechanism to generate an Equal Cost Multi-Path (ECMP) for loopback addresses. For example, static routes are based on IP SLA, or you can use OSPF between the server and router.

Best for UDP traffic

The router will load balance based on a 5-tuple as requests come in. Do not load the balance on destination addresses /ports, as they are always the same. It is usually done using the source client’s IP address/port number. The process takes the 5-tuple and creates a hash value, which makes independent paths based on that value. This works well for UDP traffic and how root servers work. It is also good for DNS server load balancing.

It works well for UDP as every request from the client is independent. TCP does not work like this, as TCP has sessions. It recommended not to use Anycast load balancing for TCP traffic. You need an actual load balancer if you want to load-balance TCP traffic. This could be a software package, Open Source ( HAproxy ), or a dedicated appliance.

Scaling load balancers at Layer 2

Layer 2 designs refer to the load balancer in bridged mode. As a result, all load-balanced and non-load-balanced traffic to and from the servers goes through the load-balancing device. The device bridges two VLANs together in the same IP subnet. Essentially, the load balancer acts as a crossover cable, merging two VLANs.

The critical point is that the client and server sides are in the same subnet. As a result, layer 2 implementations are much more accessible than layer 3 implementations, as there are no changes to IP addresses, netmasks, and default gateway settings on servers. However, with a bridged design, be careful about introducing loops and implementing spanning tree protocol ( STP ).

Scaling load balancers at Layer 3

With layer 3 designs, the load-balancing device acts in routed mode. Therefore, all load-balanced and non-load-balanced traffic to and from the server goes through the load-balancing device. The device routes between two different VLANs that are in two different subnets.

The critical point and significant difference between layer 3 and layer 2 designs are client-side VLANs and server-side VLANs in different subnets. Therefore, the VLANs are not merged, and the load-balancing device routes between VLANs. Layer 3 designs may be more complex to implement but will eventually be more scalable in the long run.

Scaling load balancers with One-ARM mode

One-armed mode refers to a load-balancing device, not in the forwarding path. The critical point is that the load balancer resides on its subnet and has no direct connectivity with server-side VLAN. A vital advantage of this model is that only load-balanced traffic goes through the device.

Server-initiated traffic bypasses the load balancer and changes both source and destination IP addresses. The load balancer terminates outside TCP sessions and initiates new inside TCP sessions. When the client connection comes in, you take the source IP and port number, put them in connection tables, and associate them with the load balancer’s TCP port number and IP.

As everything comes from the load balance IP address, the servers can no longer see the original client. On the right-hand side of the diagram below, the source and destination traffic flow on the server side is the load balancer. The VIP addresses 10.0.0.1, and that is what the client connects to.

The use of X-forwarder-for HTTP header

We use the X-forwarder-for HTTP header to indicate to the server which the original client is. The client’s IP address is replaced with the load balancer’s IP address. The load balancer can insert the X-Forwarders-for HTTP header, where they copy the client’s original IP address into the extra HTTP header—“X-forward-for header.” Apache has a standard that copies the value of this header into the standard CGI variable so all the scripts can pretend no load balancer exists.

The load balancer inserts data into the TCP session; in other words, it has to take ownership of the TCP sessions, so it needs to take control of TCP activities, including buffering, fragmentation, and reassembling. Modifying HTTP requests is hard. F5 has an accelerated mode of TCP load balancing.

Scaling load balancers with Direct Server Return

Direct Server Return is when the same IP address is configured on all hosts. The same IP is configured on the loopback interface, not the LAN interface. The LAN IP address is only used for ARP, so the load balancer would send ARP requests only for the LAN IP address, rewrite the MAC header ( not TCP or HTTP alterations ), and send the unmodified IP packet to the selected server.

The server sends the reply to the client and does not involve the load balancer. As load balancing is done on the MAC address, it requires layer 2 connectivity between the load balancer and servers ( example: Linux Virtual Server ). Also, a tunneling method that uses Layer 3 between the load balancer and servers is available.

A key point: MTU issues

If you do not have layer 2 connectivity, you can use tunnels, but be aware of MTU issues. Make sure the Maximum Segment Size ( MSS ) on the server is reduced so you do not have a PMTU issue between the client and server.

With direct server return, how do you ensure the reply is from the loopback, not the LAN address? If you are using TCP, the TCP session’s IP address is dictated by the original TCP SYN packet, so this is automatic.

However, UDP is different as UDP leaving is different from UDP coming in. So, in UDP cases, you need to set the IP address manually with the application or with iptables. But for TCP, the source in the reply is always copied from the destination IP address in the original TCP SYN request.

Scaling load balancers with Microsoft network load balancing

Microsoft load balancing is the ability to implement load balancing without load balancers. Instead, create a cluster IP address for the server and then use the flooding behavior to send it to all servers.

Clients send a packet to the shared cluster IP address associated with a client’s MAC address. This cluster MAC does not exist anywhere. When the request arrives at the last Layer 3 switch, it sends an ARP request: “Who has this IP address?”?.

ARP requests arrive at all the servers. So, when the client packet arrives, it is sent to the cluster’s bogus MAC address. Because the MAC address has never been associated with any source, all the traffic is flooded from the Layer 2 switch to the servers. The performance of the Layer 2 switch falls massively as unicast flooding is done in software.

The use of Multicast

Microsoft then changed this to use Multicast. This does not work, and packets are dropped as an illegal source of MAC when using a multicast MAC address. Cisco routers drop ARP packets with the source MAC address as multicast. To overcome this, configure static ARP entries. Microsoft also implements IGMP to reduce flooding.

Load Balancing Options

User session persistence ( Stickiness )

The load balancer must keep all session states, even for inactive sessions. Session persistence creates much more state than just the connection table. Some web applications store client session data on the servers, so sessions from the same client must go to the same server. This is particularly important when SSL is deployed for encryption or where shopping carts are used.

The client establishes an HTTP session with the webserver and logs in. After login, the HTTPS session from the same client should land on the same web server to which the client logged in using the initial HTTP request. The following are ways load balancers can determine who the source client is.

session persistance — Diagram: Scaling load balancers and session persistence.

Source IP address – > Problem may arise with large-scale NAT designs.
Extra HTTP cookies – > May require the load balancer to take ownership of the TCP session.
SSL session ID -> The session Will remain persistent even if the client is roaming and the client’s IP address changes.

Data path programming

F5 uses scripts that act on packets, triggering the load-balancing mechanism. You can select the server, manipulate HTTP headers, or even manipulate content. For example, the load balancer can add caching headers in MediaWiki (which does not change content / caching headers ). The load balancer adds the headers that allow the content to be cached.

Persistent HTTP sessions

The client has a long-lived HTTP session to eliminate one RTT and congestion window problem; then, we have a short-lived session from the load balancer to the server. SPDY is a next-generation HTTP with multiple HTTP sessions over one TCP session. This is useful in high-latency environments such as mobile devices. F5 has a SPDY-to-HTTP gateway.

Destination-only NAT

The server rewrites the destination IP address to the actual server’s destination IP and then forwards the packet. The reply packet has to hit the load balancer, as the load balancer has to replace the server’s source IP with the load balancer’s source IP. The client IP does not change, so the server talks directly with the client. This allows the server to do address-based access control or GEO location based on the source address.

Understanding Browser Caching

Browser caching is the process of storing static files locally on a user’s device to reduce load times when revisiting a website. By leveraging browser caching, web developers can instruct browsers to store certain resource files, such as images, CSS, and JavaScript, for a specified period. This way, subsequent visits to the website become faster as the browser doesn’t need to fetch those files again.

Nginx, a popular web server and reverse proxy server, offers a powerful module called “header” that enables fine-grained control over HTTP response headers. With this module, web developers can easily configure caching directives and optimize how browsers cache static resources. By setting appropriate cache-control headers and expiration times, you can dictate how long a browser should cache specific files.

To leverage the browser caching capabilities of Nginx’s header module, you need to configure your server block or virtual host file. First, ensure that the module is installed and enabled. Then, within the server block, you can use the “add_header” directive to set the cache-control headers for different file types. For example, you can instruct the browser to cache images for a month, CSS files for a week, and JavaScript files for a day.

After configuring the caching directives, it’s crucial to verify if the changes are properly applied. There are various tools available, such as browser developer tools and online caching checkers, that can help you inspect the response headers and check if the caching settings are working as intended. By ensuring the correct headers are present, you can confirm that browsers will cache the specified resources.

Final Point: Scaling Load Balancing

As your user base grows, so does the demand on your servers. Without proper scaling, you risk overloading your systems, leading to slowdowns or even crashes. Load balancer scaling helps manage this growth seamlessly. By dynamically adjusting to traffic demands, scaling ensures that resources are used efficiently, providing users with a smooth experience regardless of traffic spikes.

There are primarily two types of load balancer scaling: vertical and horizontal. Vertical scaling involves adding more power to an existing server, such as increasing CPU or RAM. While effective, there’s a limit to how much you can scale vertically. Horizontal scaling, on the other hand, involves adding more servers to distribute the load. This approach is more flexible and can handle larger traffic volumes more effectively.

Implementing load balancer scaling requires careful planning and consideration of your infrastructure needs. It’s important to choose the right tools and technologies that align with your application requirements. Solutions like AWS Elastic Load Balancing or Google Cloud Load Balancing offer robust scaling options that can be tailored to your specific needs. Monitoring and analytics tools are also essential to predict traffic patterns and scale resources proactively.

To get the most out of load balancer scaling, consider these best practices:

1. **Monitor Performance Metrics:** Continuously track key performance indicators to identify when scaling is necessary.

2. **Automate Scaling Processes:** Implement automation to respond quickly to traffic changes, reducing the risk of manual errors.

3. **Test Scaling Strategies:** Regularly test your scaling strategies in a controlled environment to ensure they work as expected.

4. **Optimize Resource Allocation:** Use analytics to allocate resources efficiently, minimizing costs while maximizing performance.

Summary: Load Balancing and Scale-Out Architectures

In today’s digital landscape, where websites and applications are expected to handle millions of users simultaneously, achieving scalability is crucial. Load balancer scaling is vital in ensuring traffic is efficiently distributed across multiple servers. This blog post explored the key concepts and strategies behind load balancer scaling.

Understanding Load Balancers

Load balancers act as network traffic managers, evenly distributing incoming requests across multiple servers. They serve as a gateway, optimizing performance, enhancing reliability, and preventing any single server from becoming overwhelmed. By intelligently routing traffic, load balancers ensure a seamless user experience.

Horizontal Scaling

Horizontal scaling, or scaling out, involves adding more servers to a system to handle increasing traffic. Load balancers play a crucial role in horizontal scaling by dynamically distributing the workload across these additional servers. This allows for improved performance and handling higher user loads without sacrificing speed or reliability.

Vertical Scaling

In contrast to horizontal scaling, vertical scaling, or scaling up, involves increasing the resources of existing servers to handle increased traffic. Load balancers can still play a role in vertical scaling by ensuring that the increased resources are used efficiently. By intelligently allocating requests, load balancers can prevent any server from being overwhelmed, even with the added capacity.

Load Balancer Algorithms

Load balancers utilize various algorithms to determine how requests are distributed across servers. Commonly used algorithms include round-robin, least connections, and IP hash. Each algorithm has its advantages and considerations, and choosing the right one depends on the specific requirements of the application and infrastructure.

Scaling Strategies

Several strategies can be employed when it comes to load balancer scaling. One popular approach is auto-scaling, which automatically adjusts server capacity based on predefined thresholds. Another strategy is session persistence, which ensures that subsequent requests from a user are routed to the same server. The right combination of strategies can lead to an optimized and highly scalable infrastructure.

Conclusion:

Load balancer scaling is critical to achieving scalability for modern websites and applications. By intelligently distributing traffic across multiple servers, load balancers ensure optimal performance, enhanced reliability, and the ability to handle growing user loads. Understanding the key concepts and strategies behind load balancer scaling empowers businesses to build robust and scalable infrastructures that can adapt to the ever-increasing digital world demands.

Green data center with eco friendly electricity usage tiny person concept. Database server technology for file storage hosting with ecological and carbon neutral power source vector illustration.

Data Center – Site Selection | Content routing

November 6, 2014

by Matt Conran Blog

Data Center Site Selection

In today's interconnected world, data centers play a crucial role in ensuring the smooth functioning of the internet. Behind the scenes, intricate routing mechanisms are in place to efficiently transfer data between different locations. In this blog post, we will delve into the fascinating world of data center routing locations and discover how they contribute to the seamless browsing experience we enjoy daily.

Data centers are the backbone of our digital infrastructure, housing vast amounts of data and serving as hubs for internet traffic. One crucial aspect of data center operations is routing, which determines the path that data takes from its source to the intended destination. Understanding the fundamentals of data center routing is essential to grasp the significance of routing locations.

When it comes to selecting routing locations for data centers, several factors come into play. Proximity to major internet exchange points, network latency considerations, and redundancy requirements all influence the decision-making process. We will explore these factors in detail and shed light on the complex considerations involved in determining optimal routing locations.

Data center routing locations are strategically distributed across the globe to ensure efficient data transfer and minimize latency. We will take a virtual trip around the world, uncovering key regions where routing locations are concentrated. From the bustling connectivity hubs of North America and Europe to emerging markets in Asia and South America, we'll explore the diverse geography of data center routing.

Content Delivery Networks (CDNs) play a vital role in optimizing the delivery of web content by caching and distributing it across multiple data centers. CDNs strategically position their servers in various routing locations to minimize latency and ensure rapid content delivery to end-users. We will examine the symbiotic relationship between data center routing and CDNs, highlighting their collaborative efforts to enhance web browsing experiences.

Matt Conran

Highlights: Data Center Site Selection

Understanding Geographic Routing in Data Centers

In today’s hyper-connected world, data centers play a crucial role in ensuring seamless digital experiences. Geographic routing within data centers refers to the strategic distribution of data across various global locations to optimize performance, enhance reliability, and reduce latency. This process involves directing user requests to the nearest or most efficient data center based on their geographical location. By understanding and implementing geographic routing, companies can significantly improve the speed and quality of their services.

**The Importance of Geographic Proximity**

One of the primary reasons for geographic routing is to minimize latency—the delay between a user’s request and the response from the server. When data centers are geographically closer to the end-users, the time taken for data to travel back and forth is reduced. This proximity not only accelerates the delivery of content and services but also enhances user satisfaction by providing a smoother and faster experience. In a world where milliseconds matter, especially in sectors like finance and gaming, geographic routing becomes a game-changer.

**Challenges and Considerations**

While geographic routing offers numerous benefits, it also presents several challenges. One major consideration is the complexity of managing multiple data centers spread across the globe. Companies must ensure consistent data synchronization, security, and compliance with local regulations. Additionally, unforeseen events such as natural disasters or political instability can impact data center operations. Therefore, businesses need to adopt robust disaster recovery plans and flexible routing algorithms to adapt to changing circumstances.

**Technological Innovations Driving Geographic Routing**

Recent advancements in technology have significantly enhanced geographic routing capabilities. Machine learning algorithms can now predict traffic patterns and dynamically adjust routing paths to optimize performance. Moreover, edge computing—bringing computation and data storage closer to the location of need—further complements geographic routing by reducing latency and bandwidth usage. As these technologies continue to evolve, they promise to make geographic routing even more efficient and reliable.

Google Cloud CDN

A CDN is a globally distributed network of servers that stores and delivers website content to users based on their geographic location. By caching and serving content from servers closest to the end users, CDNs significantly reduce latency and enhance the overall user experience.

Google Cloud CDN is a robust and scalable CDN solution offered by Google Cloud Platform. Built on Google’s global network infrastructure, it seamlessly integrates with other Google Cloud services, providing high-performance content delivery worldwide. With its vast network of edge locations, Google Cloud CDN ensures low-latency access to content, regardless of the user’s location.

– Global Edge Caching: Google Cloud CDN caches content at edge locations worldwide, ensuring faster retrieval and reduced latency for end-users.

– Security and Scalability: With built-in DDoS protection and automatic scaling, Google Cloud CDN guarantees the availability and security of your content, even during traffic spikes.

– Intelligent Caching: Leveraging machine learning algorithms, Google Cloud CDN intelligently caches frequently accessed content, further optimizing delivery and reducing origin server load.

– Real-time Analytics: Google Cloud CDN provides comprehensive analytics and monitoring tools to help you gain insights into your content delivery performance.

Routing IP addresses: The Process

In IP routing, routers must make packet-forwarding decisions independently of each other. Therefore, IP routers are only concerned with finding the next hop to a packet’s final destination. The IP routing protocol is myopic in this sense. IP’s myopia allows it to route around failures easily, but it is also a weakness. In most cases, the packet will be routed to its destination via another router unless the router is on the same subnet (more on this later).

In the routing table, a router looks up a packet’s destination IP address to determine the next hop. A packet is then forwarded to the network interface returned by this lookup by the router.

RIB and the FIB

All the different pieces of information learned from all the other methods (connected, static, and routing protocols) are stored in the RIB. A software component called the RIB manager selects all these different methods. Every routing protocol has a unique number called the distance2. If more than one protocol has the same prefix, the RIB manager picks the protocol with the lowest distance. The shortest distance is found on connected routes. Routes obtained via a routing protocol have a greater distance than static routes.

Routing to a data center

Let us address how users are routed to a data center. Well, there are several data center site selection criteria or even checklists that you can follow to ensure your users follow the most optimal path and limit sub-optimal routing. Distributed workloads with multi-site architecture open up several questions regarding the methods for site selection, path optimization for ingress/egress flows, and data replication (synchronous/asynchronous) for storage.

### Understanding Routing Protocols

Routing protocols are the rules that dictate how data is transferred from one point to another within a network. They are the unsung heroes of the digital world, enabling seamless communication between servers, devices, and users. In data centers, common routing protocols include BGP (Border Gateway Protocol), OSPF (Open Shortest Path First), and EIGRP (Enhanced Interior Gateway Routing Protocol). Each protocol has its unique strengths, making them suitable for different network configurations and requirements.

Border Gateway Protocol (BGP) is a cornerstone of internet routing, and its importance extends to data centers. BGP is designed to manage how packets are routed across the internet by exchanging routing information between different networks. In a data center environment, BGP helps in optimizing paths, ensuring redundancy, and providing failover capabilities. This makes it indispensable for maintaining the robustness and resilience of network infrastructure.

### Understanding Border Gateway Protocol (BGP)

At the heart of data center routing lies the Border Gateway Protocol, or BGP. BGP is the protocol used to exchange routing information across the internet, making it a cornerstone of global internet infrastructure. It enables data centers to communicate with each other and decide the best routes for data packets. What makes BGP unique is its ability to determine paths based on various attributes, which helps in managing network policies and ensuring data is routed through the most efficient paths available.

### How BGP Enhances Data Center Efficiency

BGP doesn’t just facilitate communication between data centers; it enhances their efficiency. By allowing data centers to dynamically adjust routing paths, BGP helps in managing traffic loads, avoiding congestion, and preventing outages. For example, if a particular route becomes congested, BGP can reroute traffic through alternative paths, ensuring that data continues to flow smoothly. This adaptability is essential for maintaining the performance and reliability of data centers.

BGP AS Prepending

AS Path prepending is a simple yet powerful technique for manipulating BGP route selection. By adding additional AS numbers to the AS Path attribute, network administrators can influence the inbound traffic flow to their network. Essentially, the longer the AS Path, the less attractive the route appears to neighboring ASes, leading to traffic routed through alternate paths.

AS Path prepending offers several benefits for network administrators. Firstly, it provides a cost-effective way to balance inbound traffic across multiple links, thereby preventing congestion on a single path. Secondly, it enhances network resilience by providing redundancy and alternate paths in case of link failures. Lastly, AS Path prepending can be used strategically to optimize outbound traffic flow and improve network performance.

In my example, AS 1 wants to ensure traffic enters the autonomous system through R2. We can add our autonomous system number multiple times, so the as-path becomes longer. Since BGP prefers a shorter AS path, we can influence our routing. This is called AS path pretending. Below, the default behavior is shown without pretending to be configured.

First, create a route map and use set as-path prepend to add your own AS number multiple times. Don’t forget to add the route map to your BGP neighbor configuration. It should be outbound since you are sending this to your remote neighbor! Let’s check the BGP table! Now we see that 192.168.23.3 is our next-hop IP address. The AS Path for the second entry has also become longer. That’s it!

Distributing the load

Furthermore, once the content is distributed to multiple data centers, you need to manage the request for the distributed content and the load by routing users’ requests to the appropriate data center. Routing in the data center is known as content routing. Content routing takes a user’s request and sends it to the relevant data center.

Note on Content Routing

Content routing is a critical subset of data center routing, focusing on directing user requests to the most appropriate server based on various factors such as location, server load, and network conditions. This approach not only enhances user experience by reducing latency but also optimizes resource utilization within the data center. Content routing relies on advanced algorithms and technologies like load balancing and Anycast networking to make real-time decisions about the best path for data to travel.

Example: Distributing Load with Load Balancing

**The Role of Load Balancing**

Load balancing is a critical aspect of data center routing. It involves distributing incoming network traffic across multiple servers to ensure no single server becomes overwhelmed. This distribution improves the availability and reliability of applications, enhances user experience, and reduces downtime. Load balancers also monitor server health and reroute traffic if a server becomes unavailable, maintaining seamless connectivity.

**Types of Load Balancing**

There are several types of load balancing methods, each with its own advantages:

1. **Round Robin:** This method distributes requests sequentially across servers, ensuring an even distribution.

2. **Least Connections:** Directs traffic to the server with the fewest connections, optimizing resource use.

3. **IP Hash:** Routes requests based on a unique hash of the client’s IP address, ensuring consistent connections to the same server.

Example Data Center Peering: VPC Peering

**The Importance of Efficient Routing**

Efficient routing to a data center is essential for maintaining the speed and reliability of network services. As businesses increasingly rely on cloud-based applications and services, the demand for seamless connectivity continues to grow. Poor routing can lead to latency issues, bottlenecks, and even outages, which can be detrimental to business operations. By optimizing routing paths, organizations can improve performance and ensure a smooth flow of data.

**VPC Peering: A Key Component**

Virtual Private Cloud (VPC) peering is a vital aspect of data center routing. It allows the interconnection of two VPCs, enabling them to communicate as if they were on the same network. This setup enhances the flexibility and scalability of network architectures, making it easier for businesses to manage workloads across multiple environments. VPC peering eliminates the need for complex gateways, reducing latency and potential points of failure.

Before you proceed, you may find the following post helpful:

Data Center Site Selection

Data Center Interconnect (DCI)

Before we get started on your journey with a data center site selection checklist, it may be helpful to know how data centers interconnect. Data Center Interconnect (DCI) solutions have been known for quite some time; they are mainly used to help geographically separated data centers.

Layer 2 extensions might be required at different layers in the data center to enable the resiliency and clustering mechanisms offered by the other applications. For example, Cisco’s OTV can be used as a DCI solution.

OTV provides Layer 2 extension between remote data centers using MAC address routing. A control plane protocol exchanges MAC address reachability information between network devices, providing the LAN extension functionality. This has a tremendous advantage over traditional data center interconnect solutions, which generally depend on data plane learning and flooding across the transport to learn reachability information.

Data Center Site Selection Criteria

Proximity-based site selection

Different data center site selection criteria can route users to the most optimum data centers. For example, proximity-based site selection involves selecting a geographically closer data center, which generally improves response time. Additionally, you can route requests based on the data center’s load or application availability.

Things become interesting when workloads want to move across geographically dispersed data centers while maintaining active connections to front-end users and backed systems. All these elements put increasing pressure on the data center interconnect ( DCI ) and the technology used to support workload mobility.

Multi-site load distribution & site-to-site recovery

Data center site selection can be used for site-to-site recovery and multi-site load distribution. Multi-site load distribution requires a mechanism that enables the same application to be accessed by both data centers, i.e., an active/active setup.

For site-to-site load balancing, you must use an active/active scenario ( also known as hot standby ) in which both data centers host the same active application. Logically active / standby means that some applications will be active on one site while others will be on standby at the other sites.

Data center site selection is vital, and determining which data center to target your request can be based on several factors, such as proximity and load. Different applications will prefer different site selection mechanisms. For example, video streaming will choose the closest data center ( proximity selection ). Other types of applications would prefer data centers that are least loaded, and others work efficiently with the standard round-robin metric. The three traditional methods for data center site selection criteria are Ingress site selection DNS-based, HTTP redirection, and Route Health Injection.

Data Center Site Selection Checklist

Hypertext Transfer Protocol ( HTTP ) redirection

Applications can have built-in HTTP redirection in their browsers. This enables them to communicate with a secondary server if the primary server is not available. When redirection is required, the server will send an HTTP Redirect ( 307 ) to the client and send the client to the correct site with the required content. One advantage of this mechanism is that you have visibility into the requested content, but as you have probably already guessed, it only works with HTTP traffic.

DNS-based request routing

DNS-based request routing, or DNS load balancing, distributes incoming network traffic across multiple servers or locations based on the DNS responses. Traditionally, DNS has been primarily used to translate human-readable domain names into IP addresses. However, DNS-based request routing can now be vital in optimizing network traffic flow.

How does it work?

When a user initiates a request to access a website or application, their device sends a DNS query to a DNS resolver. Instead of providing a single IP address in response, the DNS resolver returns a list of IP addresses associated with the requested domain. Each IP address corresponds to a different server or location that can handle the request.

The control point for geographic load distribution in DNS-based request routing resides within DNS. DNS-based request routing uses DNS for both site-to-site recovery and multi-site load distribution. A DNS request, either recursive or iterative, is accepted by the client and directed to a data center based on configurable parameters. This provides the ability to distribute the load among multiple data centers with an active/active design based on criteria such as least loaded, proximity, round-robin, and round-trip time ( RTT ).

The support for legacy applications

DNS-based request routing becomes challenging if you have to support legacy applications without DNS name resolution. These applications have hard-coded IP addresses used to communicate with other servers. When there is a combination of legacy and non-legacy applications, the solution might be to use DNS-based request routing and IGP/BGP.

Another caveat for this approach is that the refresh rate for the DNS cache may impact the convergence time. Once a VM moves to the secondary site, there will also be increased traffic flow on the data center interconnect link—previously established connections are hairpinned.

Route Health Injection ( RHI )

Route Health Injection (RHI) is a method for improving network resilience by dynamically injecting alternative routes. It involves monitoring network devices and routing protocols to identify potential failures or performance degradation. By preemptively injecting alternative routes, RHI enables networks to reroute traffic and maintain optimal connectivity quickly.

How does Route Health Injection work?

Route Health Injection operates by continuously monitoring the health of network devices and analyzing routing protocol information. It leverages various metrics such as latency, packet loss, and link utilization to assess the overall health of network paths. When a potential issue is detected, RHI dynamically injects alternative routes to bypass the affected network segment, allowing traffic to flow seamlessly.

RHI is implemented in front of the application and, depending on its implementation, allows the same address or a different address to be advertised. It’s a route injected by a local load balancer that influences the ingress traffic path. RHI injects a static route when the VIP ( Virtual IP address ) becomes available and withdraws the static route when the VIP is no longer active. The VIP is used to represent an application.

A key point: Data center active-active scenario

Route Health Injection can be used for an active/active scenario as both data centers can use the same VIP to represent the server cluster for each application. RHI can create a lot of churns as routes are constantly being added and removed. If the number of supported applications grows, the network’s number of network host routes grows linearly. The decision to use RHI should come down to the scale and size of the data center’s application footprint.

RHI is commonly used on Intranets as the propagation of more specifics is not permitted on the Default Free Zone ( DFZ ). Specific requirements require RHI to be used with BGP/IGP for external-facing clients. Due to the drawbacks of DNS caching, RHI is often preferred over DNS solutions for Internet-facing applications.

A quick point: Ansible Automation

Ansible could be a good automation tool for bringing automation into the data center. Ansible can come from Ansible CLI, with Ansible Core, or a platform approach with Ansible Tower. Can these automation tools assist in our data center operations? Ansible variables can be used to remove site-specific information to make your playbooks more flexible.

For data center configuration or simply checking routing tables, you can have a single playbook that uses Ansible variables to perform operations on both data centers. I use this to check the routing tables of each data center. Once playbook using Ansible variables against one inventory for all my data centers. This can quickly help you when troubleshooting data center site selection.

BGP AS prepending

This can be used for active / standby site selection, not a multi-load distribution method. BGP uses the best path algorithm to determine the best Path to a specific destination. One of those steps that all router manufacturers widely use is AS Path—the lower the number of ASs in the path list, the better the route.

Specific routes are advertised from both data centers, with additional AS Paths added to the secondary site’s routes. When BGP goes through its site selection processes, it will choose the Path with the least AS Paths, i.e., the primary site without AS Prepending.

BGP conditional advertisements

BGP Conditional Advertisements are helpful when you are concerned that some manufacturers may have AS Path explicitly removed. A condition must be met with conditional route advertisement before an advertisement occurs. The routers on the secondary site monitor a set of prefixes located on the first site, and when those prefixes are not reachable at the first site, the secondary sites begin to advertise.

Its configuration is based on community”no-export” and iBGP between the sites. If routes were redistributed between BGP > IGP and advertised to the IBGP peer, the secondary site would advertise those routes, defeating the purpose of a conditional advertisement.

The RHI method used internally or externally with BGP is proper when using IP as the site selection method. For example, this may be the case when you have hard-coded IP addresses in the application used primarily with legacy applications or are concerned about DNS caching issues. Site selection based on RHI and BGP requires no changes to DNS.

However, its main drawback is that it cannot be used for active/active data centers and is primarily positioned as an active / standby method. This is because there is only ever one routing table entry in the routing table.

Additionally, for the final data center site selection checklist. There are designs where you can use IP Anycast in conjunction with BGP, IGP, and RHI to achieve an active/active scenario, and I will discuss this later. With this setup, there is no need for BGP conditional route advertisement or AS Path prepending.

Closing Points: Data Center Selection

Strategic routing is essential for optimizing network performance and ensuring that data reaches its destination quickly and efficiently. With the ever-increasing demand for faster internet speeds and lower latency, data centers need to be strategically located and correctly interconnected. Routing decisions are based on various factors, including geographical proximity, load balancing, and redundancy. By intelligently directing traffic, companies can ensure optimal performance and user satisfaction.

One of the primary considerations in data center routing is load balancing. This technique involves distributing incoming network traffic across multiple servers or data centers to ensure no single server becomes overwhelmed. Load balancing not only enhances the speed and efficiency of data processing but also provides redundancy, ensuring that if one server goes down, others can take over. This seamless transfer of data minimizes downtime and maintains the continuity of services.

Redundancy is a critical factor in ensuring the reliability of data center operations. By having multiple routes to reach a data center, companies can avoid potential disruptions caused by network failures. Redundant pathways ensure that even if one connection is lost, data can still be rerouted through an alternative path. This built-in resilience is vital for maintaining the stability and reliability of data services that businesses and consumers depend on.

Technological advancements have revolutionized data center routing. Techniques such as Anycast routing allow the same IP address to be used by multiple data centers, directing the data to the nearest or most optimal location. Additionally, software-defined networking (SDN) provides dynamic management of routing policies, enabling rapid responses to changing network conditions. These innovations enhance the flexibility and efficiency of data routing, ensuring that the data highway remains smooth and fast.

Load Balancing and Scale-Out Architectures

Understanding Load Balancing

Example: What is Squid Proxy?

Example: Understanding HA Proxy

The Mechanics of Scale-Out Architectures

Google Cloud Load Balancing

Google Cloud NEGs

Load Balancing with MIGs

Understanding Cross-Region HTTP Load Balancing

Distributing Load with Service Mesh

Google Cloud CDN

Load Balancer Scaling

Techniques for Scaling Load Balancers

**Scaling Up**

**Scaling Out**

Best Practices for Load Balancer Scaling

Highlights: Load Balancing and Scale-Out Architectures

Availability:

Efficiency:

Scale up and scale out:

Guide on Gateway Load Balancing Protocol (GLBP)

What is Load Balancer Scaling?

**The Benefits of Load Balancer Scaling**

**Load Balancer Scaling Strategies**

**Best Practices for Load Balancer Scaling**

Example: Direct Server Return.

How to scale load balancer

Load Balancing with Routers

Load Balancer Scalability

Scaling load balancers with Scale-Up or Scale-Out

Scale up load balancing

Additional information: Scale-out load balancing

Load Balancer Scalability: The Operations

The virtual IP address and load balancing control plane

Local and global load balancing

Load Balancer Scaling Types

Application-level load balancing

Network-based load balancing

DNS pinning

IP Anycast load balancing

Best for UDP traffic

**Scaling load balancers at Layer 2**

**Scaling load balancers at Layer 3**

Scaling load balancers with One-ARM mode

The use of X-forwarder-for HTTP header

Scaling load balancers with Direct Server Return

Scaling load balancers with Microsoft network load balancing

The use of Multicast

Load Balancing Options

User session persistence ( Stickiness )

Data path programming

Persistent HTTP sessions

Destination-only NAT

Understanding Browser Caching

Final Point: Scaling Load Balancing

Summary: Load Balancing and Scale-Out Architectures

Data Center Site Selection

Highlights: Data Center Site Selection

Understanding Geographic Routing in Data Centers

Google Cloud CDN

Routing IP addresses: The Process

RIB and the FIB

Routing to a data center

BGP AS Prepending

Distributing the load

Example: Distributing Load with Load Balancing

Example Data Center Peering: VPC Peering

Data Center Site Selection

Data Center Interconnect (DCI)

Data Center Site Selection Criteria

Data Center Site Selection Checklist

DNS-based request routing

**How does it work?**

Route Health Injection ( RHI )

How does Route Health Injection work?

A key point: Data center active-active scenario

A quick point: Ansible Automation

BGP AS prepending

Closing Points: Data Center Selection

Quick Links

Scaling Up

Scaling Out

The Benefits of Load Balancer Scaling

Load Balancer Scaling Strategies

Best Practices for Load Balancer Scaling

Scaling load balancers at Layer 2

Scaling load balancers at Layer 3

How does it work?