Load Balancing and Scale-Out Architectures

Load balancers operate at different Open Systems Interconnection (OSI) layers from one data center to another; they most commonly operate between Layer 4 and Layer 7. This is because each data center hosts unique applications with different requirements. Every application is unique with respect to the number of sockets, TCP connections (short-lived or long-lived), idle time-out, and activity in each session in terms of packets per second. One of the most important elements of designing a load-balancing solution is to fully understand the application structure and protocols. It is also important to check application load times and performance, which is why some developers turn to software testing companies such as Parasoft to test their applications.

Take the Real Time Streaming Protocol (RTSP) as an example. The client initiates the session over TCP, but return traffic from the server can be UDP, bypassing the load balancer (LB). For this scenario, Direct Server Return is a viable load-balancing method.

The load-balancing function becomes the virtual representation of the application. Internal applications are represented by a Virtual IP address (VIP), which acts as the front end serving external client requests.


[Figure: Load Balancing]


The major benefit of load balancing is scalability: the ability of a networking device or application to handle organic and planned network growth. In terms of application capacity, load balancing increases the number of concurrent requests a data center can support.

Essentially, load balancing is the ability to distribute incoming workloads across multiple end stations based on some kind of algorithm.
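The text leaves "some kind of algorithm" open. Two of the most common choices are round robin and least connections; the sketch below shows both in plain Python. The server addresses are made up for illustration, and a real load balancer would also skip servers that fail health checks.

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotating order, ignoring their current load."""

    def __init__(self, servers):
        self._ring = cycle(servers)

    def pick(self):
        return next(self._ring)


class LeastConnections:
    """Pick the server with the fewest active connections (ties go to the first listed)."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1   # a new connection is now open on this server
        return server

    def release(self, server):
        self.active[server] -= 1   # the connection has closed


rr = RoundRobin(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
print([rr.pick() for _ in range(4)])
# ['10.0.0.11', '10.0.0.12', '10.0.0.13', '10.0.0.11']
```

Round robin needs no state beyond its position in the list. Least connections adapts better to long-lived sessions, at the cost of tracking every open and closed connection.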

Scale Up or Scale Out

a) Scale up – Expand vertically by buying bigger servers, adding more CPU, memory, etc. Scaling up is usually done on transactional database servers, as these types of servers are difficult to scale out. Scaling up is a simple approach, but it is the most expensive and does not scale linearly. Older applications were upgraded by scaling up (vertical scaling), a rigid approach that is not elastic. In a virtualized environment, applications are now scaled linearly, in a scale-out fashion.

b) Scale out – Add more parallel servers, i.e., scale linearly. Scaling out is easier on web servers: just add additional web servers as needed. Netflix is an example of a company that designs for scale out; it spins up virtual machines (VMs) on demand to handle daily changes in network load. Scaling out is elastic and requires a load-balancing component.


The scalability of scale-out architectures is limited by shared state, so try to share and lock as little state as possible. An example of minimizing locking is the eventual consistency approach used by Amazon, which limits the amount of transaction locking: shopping carts are not checked until you click “buy”.


General Load Balancing Operation

On the outside there is a VIP, and on the inside there is a pool of servers. The load-balancing device is configured with rules that associate outside IP addresses and port numbers with an inside pool of servers. Clients only learn the outside IP address, through DNS replies. The load-balancing control plane monitors the health of the servers and determines which ones are able to accept requests.

The client sends a TCP SYN packet, which is intercepted by the LB device. The load balancer runs its algorithm and sends the request to the best server. To get the request to the server, the load balancer can use tunneling, NAT, or two separate TCP sessions. In some cases, the load balancer also has to rewrite the content. Whatever the method, the LB has to create a session entry so it knows that this client is associated with a particular inside server.
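The general operation above (VIP rule, health-checked pool, and a session entry created on the first SYN) can be modeled in a few lines. This is a minimal sketch under stated assumptions: the addresses are hypothetical, health-check results are simply written into `healthy`, and the session key is only the client IP and port rather than a full 5-tuple.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    """A VIP:port rule mapped to an inside server pool (all values hypothetical)."""
    vip: str
    port: int
    pool: list
    healthy: set = field(default_factory=set)     # updated by the health-check control plane
    sessions: dict = field(default_factory=dict)  # (client_ip, client_port) -> inside server
    _next: int = 0

    def on_syn(self, client_ip, client_port):
        """New TCP SYN: pick a healthy server and record the session."""
        key = (client_ip, client_port)
        if key in self.sessions:  # retransmitted SYN: keep the existing mapping
            return self.sessions[key]
        candidates = [s for s in self.pool if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers in pool")
        server = candidates[self._next % len(candidates)]  # simple round robin
        self._next += 1
        self.sessions[key] = server
        return server

    def on_packet(self, client_ip, client_port):
        """Later packets of the same connection follow the stored session entry."""
        return self.sessions[(client_ip, client_port)]


vs = VirtualServer("203.0.113.10", 80, ["10.0.0.11", "10.0.0.12"])
vs.healthy = {"10.0.0.12"}                 # 10.0.0.11 failed its health check
print(vs.on_syn("198.51.100.7", 51000))    # 10.0.0.12
```

The session table is what makes the load balancer stateful: every packet after the SYN must find its entry, which is why table size and idle time-outs matter for long-lived connections.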


Local and Global Load Balancing

Local server selection takes place within the data center and is based on criteria such as server load and application response times. Any application that uses TCP or UDP can be load balanced. Whereas local load balancing determines the best device within a data center, global load balancing chooses the best data center to service client requests. Global load balancing is supported through redirection based on the Domain Name System (DNS) and HTTP. HTTP redirection provides better control, while DNS is fast and scalable. Local and global appliances work hand in hand: the local device feeds information to the global device, enabling it to make better load-balancing decisions.
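To make the local-to-global feedback concrete, here is a minimal sketch of a DNS-based global selector. The report fields and the policy (skip sites with no healthy servers, then prefer the lowest response time) are assumptions for illustration; commercial global load balancers offer many more methods.

```python
from dataclasses import dataclass


@dataclass
class SiteReport:
    """Status a local load balancer might feed to the global device (hypothetical fields)."""
    name: str
    vip: str
    healthy_servers: int
    avg_response_ms: float


def choose_site(reports):
    """Return the VIP to put in the DNS answer, or None if no site can serve.

    Assumed policy: ignore sites without healthy servers, then pick the
    lowest reported application response time.
    """
    live = [r for r in reports if r.healthy_servers > 0]
    if not live:
        return None
    return min(live, key=lambda r: r.avg_response_ms).vip


reports = [
    SiteReport("dc-east", "198.51.100.10", healthy_servers=0, avg_response_ms=12.0),
    SiteReport("dc-west", "203.0.113.10", healthy_servers=4, avg_response_ms=35.0),
    SiteReport("dc-eu", "192.0.2.10", healthy_servers=6, avg_response_ms=80.0),
]
print(choose_site(reports))  # 203.0.113.10
```

The fastest site is skipped because its local device reports no healthy servers; without that feed, a DNS-only global selector would keep sending clients there.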


Types of Load Balancing

Application-Level Load Balancing: Load balancing is implemented between tiers in the application stack and is carried out within the application itself. It is used where applications are coded in a way that makes it possible to configure load balancing in the application. Designers can use open source tools with DNS or some other method to track flows between tiers of the application stack.

Network-Level Load Balancing: Network-level load balancing includes DNS round robin, Anycast, and L4–L7 load balancers. Web browser clients do not usually have built-in application-layer redundancy, which pushes designers to look to the network layer for load-balancing services. If applications were designed correctly, load balancing would not need to be a network-layer function.


Application-Level load balancing

Application-level load balancing is about what you can do inside the application itself. The first thing you can do is scale up by adding more worker processes. When a client issues a request, that request blocks a (potentially large) worker process, and that resource stays tied to the TCP session. If your application requires session persistence (long-lived TCP sessions), you block worker processes even when the client is not sending any data.

The solution is to use FastCGI or to switch the web server to Nginx.





Nginx is event based. On Apache (which is not event based), every TCP connection consumes a worker process, but with Nginx a client connection consumes no process unless an actual request is being processed. Generally, Linux handles very large numbers of simultaneous worker processes poorly. Nginx does not use a thread per connection and can easily handle 100,000 connections. With Apache, you can lose 50% of the performance, and adding CPUs doesn't help. At around 80,000 connections, you will experience serious performance problems no matter how many CPUs you add.

Nginx is by far a better solution if you expect a lot of simultaneous connections.
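The event-based model can be illustrated with Python's `asyncio` rather than Nginx itself: a single thread runs an event loop, and each connection is a coroutine that is simply parked while it waits for data. This is a sketch of the concept only (a line-echo service on an assumed port 8080), not of how Nginx is implemented.

```python
import asyncio


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # An idle client is just a suspended coroutine waiting in readline();
    # it holds no worker process or thread while it sends nothing.
    while line := await reader.readline():
        writer.write(line)  # echo the request line back
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(host: str = "127.0.0.1", port: int = 8080) -> None:
    # One event loop in one thread serves every connection.
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()

# To run the service: asyncio.run(main())
```

Contrast this with a process-per-connection server, where each of those idle clients would pin a whole worker process for the lifetime of its TCP session.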


Network-Based Load Balancing

First, try to solve the problem in the application. When you cannot load balance solely within the application, turn to the network for load-balancing services.


DNS Round Robin Load Balancing

The easiest type of network-level load balancing is DNS round robin. The DNS server keeps a list of the application servers, and the DNS control plane distributes user traffic over them in round-robin fashion. It does come with caveats: a) DNS has no knowledge of server health, b) DNS caching causes problems, and c) no measures are available to prevent DoS attacks against the servers.

Clients ask for the IP address of the web server, and the DNS server replies with the IP addresses in some varying order. This works well if the application uses DNS. Some applications use hard-coded IP addresses, and in these scenarios you can't rely on DNS-based load balancing. DNS load balancing also requires low TTL values so that clients query the DNS server often enough. Generally, DNS-based load balancing works well, but not with web browsers. There have been so many attacks on web browsers that browsers now implement a security feature called DNS pinning. With DNS pinning, the browser resolves the IP address of the server and keeps using that address for the URL even after the DNS TTL has expired. This prevents people from spoofing DNS records, and it is usually built into browsers.

DNS load balancing is perfect if the application uses DNS and honors DNS TTLs. Web browsers are not in that category.
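The difference between a client that honors the TTL and one that pins its answer can be simulated without a real DNS server. In this sketch the server rotates its address list on every query, and the client is assumed to use the first address in the answer; the addresses and the 30-second TTL are hypothetical, and time is passed in explicitly as seconds.

```python
from collections import deque


class RoundRobinDNS:
    """Authoritative-server sketch: rotate the A-record list on every query."""

    def __init__(self, addresses, ttl):
        self.records = deque(addresses)
        self.ttl = ttl

    def query(self):
        answer = list(self.records)
        self.records.rotate(-1)  # the next query sees a different first address
        return answer, self.ttl


class Resolver:
    """Client-side cache. With pin=True the cached address is kept after the TTL expires."""

    def __init__(self, dns, pin=False):
        self.dns, self.pin = dns, pin
        self.cached, self.expires = None, 0.0

    def resolve(self, now):
        if self.cached is None or (now >= self.expires and not self.pin):
            answer, ttl = self.dns.query()
            # Assumption: the client connects to the first address in the answer.
            self.cached, self.expires = answer[0], now + ttl
        return self.cached


honoring = Resolver(RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"], ttl=30))
pinned = Resolver(RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"], ttl=30), pin=True)
print(honoring.resolve(0), honoring.resolve(31))  # 192.0.2.1 192.0.2.2
print(pinned.resolve(0), pinned.resolve(31))      # 192.0.2.1 192.0.2.1
```

The pinned client never re-queries, so lowering the TTL has no effect on it: it stays on the first server it resolved, which is exactly why DNS round robin spreads browser traffic poorly.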


IP Anycast Load Balancing

IP Anycast provides geographic server load balancing. The idea is to use the same IP address at multiple POPs. Routing in the core chooses the closest POP, so each client is routed to its nearest POP. All servers have the same IP address configured on a loopback interface; if the same IP address were configured on the LAN interface, ARP replies would clash. Use any routing mechanism to generate ECMP routes to the loopback addresses, for example static routes tracked by IP SLA, or OSPF between the servers and the router.

As requests come in, the router load balances based on the 5-tuple. Hashing on the destination address and port alone would be pointless, because they are always the same; the variation comes from the client's source IP address and port number. The router hashes the 5-tuple and uses the hash value to select one of several independent equal-cost paths. This works well for UDP traffic and is how the DNS root servers work, which makes it a good fit for DNS server load balancing.
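The per-flow hashing step can be sketched as follows. Real routers use vendor-specific hardware hash functions; SHA-256 from `hashlib` is a stand-in chosen only because it gives the same result on every run (Python's built-in `hash()` is randomized per process for strings). The anycast address 192.0.2.53 is hypothetical.

```python
import hashlib


def ecmp_path(src_ip, src_port, dst_ip, dst_port, proto, n_paths):
    """Map a flow's 5-tuple to one of n equal-cost paths (illustrative hash only)."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# Same anycast destination for every query; only the client source port varies.
paths = [ecmp_path("198.51.100.7", sp, "192.0.2.53", 53, "udp", 4)
         for sp in range(50000, 50010)]
print(paths)
```

Every packet of a given flow hashes to the same path, while different flows spread across all paths. That is sufficient for independent UDP queries; for TCP, a routing change can silently move an established flow to a different POP.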

This works well for UDP because every request from the client is independent. TCP does not work like this because TCP has sessions, so Anycast load balancing is not recommended for TCP traffic. If you want to load balance TCP traffic, you need an actual load balancer. This could be a software package, an open source tool (HAProxy), or a dedicated appliance.


Layer 2 Designs

A Layer 2 design refers to the load balancer in bridged mode. All load-balanced and non-load-balanced traffic to and from the servers goes through the load-balancing device.

The device bridges two VLANs together, with both VLANs in the same IP subnet. Essentially, the load balancer acts like a crossover cable, merging the two VLANs. The key point here is that the client side and server side are in the same subnet. Layer 2 implementations are much easier than Layer 3 implementations because no changes are needed to the servers' IP addresses, netmasks, or default gateway settings. With a bridged design, however, be careful about introducing loops, and implement Spanning Tree Protocol (STP).


Layer 3 Design

The load-balancing device acts in routed mode. All load-balanced and non-load-balanced traffic to and from the servers goes through the load-balancing device.

The device routes between two different VLANs that are in two different subnets. The key point, and the major difference between Layer 3 and Layer 2 designs, is that the client-side and server-side VLANs are in different subnets. The VLANs are not merged, and the load-balancing device routes between them. Layer 3 designs may be more complex to implement, but they are more scalable in the long run.


One-ARM Mode

One-armed mode refers to a load-balancing device that is not in the forwarding path. The key point here is that the load balancer resides on its own subnet and has no direct connectivity with the server-side VLAN.

[Figure: One-arm mode]


A key advantage of this mode is that only load-balanced traffic goes through the device. Server-initiated traffic bypasses the load balancer.

In one-armed mode, the load balancer changes both the source and destination IP addresses. It terminates the outside TCP sessions and initiates new inside TCP sessions. When a client connection comes in, the load balancer takes the source IP address and port number, puts them in its connection table, and associates them with a TCP port number and IP address of the load balancer. Because everything now comes from the load balancer's IP address, the servers can no longer see the original client. To tell the server who the original client is, we use the X-Forwarded-For HTTP header.

Because the client's IP address is replaced with the load balancer's IP address, the load balancer inserts an extra X-Forwarded-For HTTP header into the request and copies the client's original IP address into it.

Apache can copy the value of this header into the standard CGI client-address variable (REMOTE_ADDR), so all scripts can behave as if no load balancer exists. Because the load balancer inserts data into the TCP session, it has to take ownership of the TCP sessions, which means taking control of TCP activities including buffering, fragmentation, and reassembly. Modifying HTTP requests is hard. F5 has an accelerated mode of TCP load balancing.
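Header insertion on the load balancer, and recovery of the client address on the server, can be sketched as two small functions. Assumptions: headers are a plain dict with exact-case names (real HTTP header names are case-insensitive), and an upstream proxy may already have added the header, in which case the new address is appended to the comma-separated list.

```python
def add_x_forwarded_for(headers: dict, client_ip: str) -> dict:
    """Load balancer side: return a copy of the request headers with client_ip appended."""
    out = dict(headers)
    existing = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip
    return out


def original_client(headers: dict, peer_ip: str) -> str:
    """Server side: recover the client address, falling back to the TCP peer (the LB).

    Takes the leftmost entry. Clients can forge this header, so in production
    only trust entries added by load balancers you control.
    """
    xff = headers.get("X-Forwarded-For")
    return xff.split(",")[0].strip() if xff else peer_ip


req = add_x_forwarded_for({"Host": "www.example.com"}, "198.51.100.7")
print(req["X-Forwarded-For"])            # 198.51.100.7
print(original_client(req, "10.0.0.5"))  # 198.51.100.7
```

This is the same job the Apache setting described above performs: substituting the header value for the load balancer's address so that application code sees the real client.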


Direct Server Return

With Direct Server Return, the same IP address (the VIP) is configured on all hosts, on the loopback interface rather than the LAN interface. The LAN IP address is used only for ARP: the load balancer sends ARP requests only for the servers' LAN IP addresses, rewrites only the MAC header (no TCP or HTTP alterations), and sends the unmodified IP packet to the selected server. The server sends the reply straight to the client without involving the load balancer. Because load balancing is done on the MAC address, this requires Layer 2 connectivity between the load balancer and the servers (example: Linux Virtual Server). A tunneling method that uses Layer 3 between the load balancer and the servers is also available.
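The key property of Layer 2 DSR (only the Ethernet header changes) can be shown with a simplified frame model. This is a sketch with made-up MAC and IP addresses; the server MAC is assumed to have been learned by ARP for the server's LAN address.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_mac: str
    dst_ip: str     # the VIP, also configured on every server's loopback
    src_ip: str     # the client
    payload: bytes  # the TCP segment, never touched by the load balancer


def dsr_forward(frame: Frame, lb_mac: str, server_mac: str) -> Frame:
    """Layer 2 DSR: rewrite only the Ethernet addresses; the IP packet is unchanged."""
    return replace(frame, dst_mac=server_mac, src_mac=lb_mac)


incoming = Frame("02:00:00:00:00:01", "02:00:00:00:00:99",
                 "192.0.2.10", "198.51.100.7", b"GET / HTTP/1.1")
to_server = dsr_forward(incoming, lb_mac="02:00:00:00:00:01",
                        server_mac="02:00:00:00:00:0b")
print(to_server.dst_mac, to_server.dst_ip)  # 02:00:00:00:00:0b 192.0.2.10
```

Because the destination IP is still the VIP, the server accepts the packet on its loopback and replies from the VIP directly to the client.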


[Figure: Direct Server Return]


If you do not have Layer 2 connectivity, you can use tunnels, but be aware of MTU issues. Make sure the Maximum Segment Size (MSS) on the server is reduced so you do not have Path MTU (PMTU) issues between the client and the server.
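How far to reduce the MSS is simple arithmetic: subtract the encapsulation overhead and the inner IP and TCP headers from the path MTU. The sketch below assumes IPv4, TCP without options, and a 1500-byte path; 20 bytes is the overhead of IP-in-IP encapsulation and 24 bytes that of basic GRE.

```python
def tunneled_mss(path_mtu=1500, tunnel_overhead=20, ip_header=20, tcp_header=20):
    """Largest TCP MSS that still fits after the load balancer encapsulates the packet."""
    return path_mtu - tunnel_overhead - ip_header - tcp_header


print(tunneled_mss())                    # 1440 (IP-in-IP)
print(tunneled_mss(tunnel_overhead=24))  # 1436 (basic GRE)
```

The server advertises this smaller MSS in its SYN-ACK, so the client never sends segments that would exceed the MTU once the load balancer adds the tunnel header.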

With Direct Server Return, how do you make sure the reply comes from the loopback address and not the LAN address?

If you are using TCP, this is automatic: the source IP address of the reply is dictated by the original TCP SYN packet, because the source address in the reply is always copied from the destination IP address of the original SYN. UDP is different, because the source address of outgoing UDP datagrams is not tied to the address on which incoming datagrams arrived. For UDP, you need to set the source IP address manually, either in the application or with iptables.
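One application-level way to fix the UDP source address is to bind the service socket to the VIP explicitly instead of the wildcard address, so every reply sent from that socket carries the VIP as its source. This is a minimal sketch; in a real DSR setup `vip` would be the address on the loopback (for example, a hypothetical 192.0.2.10), and 127.0.0.1 is used below only so the example runs anywhere.

```python
import socket


def udp_service(vip: str, port: int) -> socket.socket:
    """Bind a UDP socket to the VIP rather than 0.0.0.0.

    Replies sent from this socket use the VIP as their source address,
    so the client sees the answer coming from the address it queried.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((vip, port))
    return sock


def serve_once(sock: socket.socket) -> None:
    """Receive one datagram and echo it back to its sender."""
    data, client = sock.recvfrom(2048)
    sock.sendto(data, client)  # source address = the bound VIP


server = udp_service("127.0.0.1", 0)  # port 0: let the OS choose a free port
print(server.getsockname())
```

With a wildcard bind, the kernel would choose the source address from the routing table, which on a DSR server is typically the LAN address; the client would then discard the reply as coming from an unknown host.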


Microsoft Network Load Balancing

Microsoft Network Load Balancing (NLB) implements load balancing without a dedicated load balancer. You create a cluster IP address for the servers and then use Layer 2 flooding behavior to deliver traffic to all of them.

The client sends a packet to the shared cluster IP address, which is associated with a cluster MAC address. This cluster MAC address does not exist anywhere as a real interface. When the request arrives at the last Layer 3 switch, the switch sends out an ARP request: “Who has this IP address?”

The ARP request reaches all the servers, which answer with the cluster MAC address. When the client packet arrives, it is sent to this bogus cluster MAC address, and because that MAC address has never been seen as a source address, the Layer 2 switch floods all the traffic to the servers. The performance of the Layer 2 switch falls massively, as unicast flooding is done in software.

Microsoft then changed this to use multicast. This does not work out of the box: using a multicast MAC address as a source MAC is illegal, and Cisco routers drop ARP packets whose source MAC address is a multicast address. You can overcome this by configuring static ARP entries. Microsoft also implemented IGMP to reduce flooding.
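Why the router objects becomes clearer from the cluster MAC itself. As commonly documented for NLB, the unicast-mode cluster MAC is 02-BF followed by the four octets of the cluster IP, and the multicast-mode MAC is 03-BF followed by the same octets. Treat that derivation as background to verify against Microsoft's documentation rather than as something stated in this post; the cluster IP below is hypothetical. A MAC address is a group (multicast) address when the least significant bit of its first octet is set.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "unicast") -> str:
    """Cluster MAC derived from the cluster IP (per the commonly documented NLB scheme)."""
    ip_octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 bytes: W, X, Y, Z
    if mode == "unicast":
        prefix = bytes([0x02, 0xBF])
    elif mode == "multicast":
        prefix = bytes([0x03, 0xBF])
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return "-".join(f"{b:02X}" for b in prefix + ip_octets)


def is_multicast_mac(mac: str) -> bool:
    """The I/G bit (least significant bit of the first octet) marks group addresses."""
    return (int(mac.split("-")[0], 16) & 1) == 1


for mode in ("unicast", "multicast"):
    mac = nlb_cluster_mac("192.0.2.10", mode)
    print(mode, mac, is_multicast_mac(mac))
# unicast 02-BF-C0-00-02-0A False
# multicast 03-BF-C0-00-02-0A True
```

In multicast mode, the servers' ARP replies map a unicast cluster IP to a MAC address with the group bit set, which is the combination the routers reject unless a static ARP entry is configured.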


User Session Persistence ( Stickiness )

The load balancer has to keep state for all sessions, even inactive ones. Session persistence creates much more state than just the connection table. Some web applications store client session data on the servers, so sessions from the same client have to go to the same server. This is particularly important when SSL is deployed for encryption or when shopping carts are used.

The client establishes an HTTP session with the web server and logs in. After login, the HTTPS session from the same client should land on the same web server the client first logged in to with the initial HTTP request. The following are ways the load balancer can identify the source client:


[Figure: Session Persistence]


Source IP address -> Problems may arise with designs that deploy large-scale NAT.

Extra HTTP cookies -> May require the load balancer to take ownership of the TCP session.

SSL session ID -> Retains session persistence even if the client roams and its IP address changes.
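The three identification methods differ only in which key the load balancer stores in its persistence table. The sketch below assumes a hypothetical cookie name `LBSERVER` and a pluggable server-selection function; when no key is available yet (for example, no cookie on the first request), it simply picks a server without recording persistence.

```python
class StickyTable:
    """Persistence table: maps a client identifier to the server chosen first."""

    def __init__(self, pick_server):
        self.pick_server = pick_server  # any algorithm, e.g. round robin
        self.table = {}

    def route(self, key):
        if key is None:                 # nothing to be sticky on yet
            return self.pick_server()
        if key not in self.table:
            self.table[key] = self.pick_server()
        return self.table[key]


def persistence_key(src_ip, cookies, ssl_session_id, method):
    """Return the identifier used for stickiness under the chosen method."""
    if method == "source_ip":
        return ("ip", src_ip)           # many users behind one NAT share a key
    if method == "cookie":
        value = cookies.get("LBSERVER")  # hypothetical cookie name
        return ("cookie", value) if value else None
    if method == "ssl_session":
        return ("ssl", ssl_session_id)  # unaffected by client IP changes
    raise ValueError(f"unknown method: {method}")


servers = iter(["s1", "s2", "s3"])
sticky = StickyTable(lambda: next(servers))
before = persistence_key("198.51.100.7", {}, "sess-a", "ssl_session")
after_roam = persistence_key("203.0.113.9", {}, "sess-a", "ssl_session")
print(sticky.route(before), sticky.route(after_roam))  # s1 s1
```

The roaming client keeps landing on s1 because its SSL session ID did not change; with the `source_ip` method, the new address would have produced a new key and a different server.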

Data Path Programming

F5 uses scripts that act on packets and trigger the load-balancing mechanism. You can select the server, manipulate HTTP headers, or even manipulate content. For example, MediaWiki does not set caching headers on its content, so the load balancer can add headers that allow the content to be cached.
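F5's own scripting language is not reproduced here; instead, this is a Python sketch of the same data-path idea applied to responses. The set of cacheable content types and the one-hour `max-age` are assumed policy values, and headers are modeled as a plain dict with exact-case names.

```python
CACHEABLE_TYPES = ("text/css", "application/javascript", "image/")  # assumed policy


def add_cache_headers(status: int, headers: dict, max_age: int = 3600) -> dict:
    """Response rewrite rule: mark static content cacheable when the application did not."""
    out = dict(headers)
    ctype = out.get("Content-Type", "")
    if status == 200 and "Cache-Control" not in out and ctype.startswith(CACHEABLE_TYPES):
        out["Cache-Control"] = f"public, max-age={max_age}"
    return out


print(add_cache_headers(200, {"Content-Type": "image/png"}))
# {'Content-Type': 'image/png', 'Cache-Control': 'public, max-age=3600'}
```

The rule only adds a header when the application sent none, so pages the application deliberately marks as uncacheable keep their original behavior.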


Persistent HTTP Sessions

Clients keep long-lived HTTP sessions to the load balancer, which eliminates an extra round trip (RTT) for the TCP handshake as well as TCP congestion-window problems, while the load balancer uses short-lived sessions to the servers. SPDY is a next-generation HTTP protocol that carries multiple HTTP sessions over one TCP session. This is useful in high-latency environments such as mobile devices. F5 has a SPDY-to-HTTP gateway.


Destination-only NAT

With destination-only NAT, the load balancer rewrites the destination IP address to the IP address of the actual server and then forwards the packet. The reply packet has to pass through the load balancer, because the load balancer has to replace the server's source IP address with its own (the VIP). The client IP address is not changed, so it looks as though the server is talking directly with the client. This allows the server to perform address-based access control or geolocation based on the source address.
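Both halves of destination-only NAT, the inbound destination rewrite and the outbound source rewrite, can be sketched with a two-field packet model. This is a simplification for illustration: flows are keyed by client IP only, whereas a real implementation would track the full 5-tuple. All addresses are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


class DestinationNAT:
    def __init__(self, vip: str):
        self.vip = vip
        self.flows = {}  # client_ip -> chosen server (simplified flow key)

    def inbound(self, pkt: Packet, server_ip: str) -> Packet:
        """Client -> VIP: rewrite only the destination; the client source is preserved."""
        self.flows[pkt.src_ip] = server_ip
        return replace(pkt, dst_ip=server_ip)

    def outbound(self, pkt: Packet) -> Packet:
        """Server -> client: restore the VIP as the source so the client accepts the reply."""
        if self.flows.get(pkt.dst_ip) != pkt.src_ip:
            raise KeyError("reply does not match a known flow")
        return replace(pkt, src_ip=self.vip)


nat = DestinationNAT("203.0.113.10")
print(nat.inbound(Packet("198.51.100.7", "203.0.113.10"), "10.0.0.11"))
# Packet(src_ip='198.51.100.7', dst_ip='10.0.0.11')
print(nat.outbound(Packet("10.0.0.11", "198.51.100.7")))
# Packet(src_ip='203.0.113.10', dst_ip='198.51.100.7')
```

The server sees 198.51.100.7 as the source, which is what enables address-based access control, but this only works if the server's return path goes through the load balancer (typically by making it the server's default gateway).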



About Matt Conran

Matt Conran has created 184 entries.
