What is VXLAN?
Introduced by Cisco and VMware, VXLAN (Virtual eXtensible Local Area Network) is perhaps the most popular overlay technology for IP-based data center fabrics. VXLAN was designed specifically for Layer 2 over Layer 3 tunneling; its early competitors, NVGRE and STT, are fading away, and VXLAN is becoming the industry standard.
Why introduce VXLAN?
1) STP issues and scalability constraints: STP is undesirable on a large scale and lacks a proper load balancing mechanism. A solution was needed that could leverage the ECMP capabilities of an IP network while also offering extended VLANs across an IP core, i.e., virtual segments across the network core.
2) Multi-tenancy: Layer 2 networks are capped at roughly 4,000 VLANs (a 12-bit identifier), restricting multi-tenant designs.
3) ToR table scalability: Every ToR switch may need to support many virtual servers, each requiring multiple virtual NICs and MAC addresses. This pushes the limits of the ToR switch's table sizes. Once the ToR tables are full, Layer 2 traffic is treated as unknown unicast and flooded across the network, destabilizing a previously stable core.
Typical Use Cases
1) Multi-tenant IaaS clouds where you need a large number of segments.
2) Linking virtual to physical servers, done via a software or hardware VXLAN-to-VLAN gateway.
3) HA Clusters across failure domains/availability zones.
4) VXLAN works well over fabrics that have equidistant endpoints.
5) VXLAN-encapsulated VLAN traffic across availability zones MUST be rate-limited to prevent broadcast storm propagation across multiple availability zones.
VXLAN employs a MAC-over-IP/UDP overlay scheme and extends the traditional boundary of roughly 4,000 VLANs. The 12-bit VLAN identifier capped scalability within the data center and proved cumbersome if you wanted a VLAN-per-application-segment model. VXLAN expands the 12-bit identifier to 24 bits, allowing for 16 million logical segments, with each segment potentially offering another ~4,000 VLANs. While VXLAN does provide Layer 2 adjacency between these logical endpoints, with the ability to move VMs across boundaries, the main driver for its introduction was overcoming the 4,000-VLAN ceiling. A typical application consists of multiple segments, with firewalling and load balancing services between them, and each segment requires its own VLAN. The Layer 2 VLAN segment is used to transfer non-routable heartbeats or state information that can't cross a Layer 3 boundary. A cloud provider will quickly exhaust the ~4,000-VLAN limit.
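The 24-bit identifier lives in the 8-byte VXLAN header that sits between the outer UDP header and the inner Ethernet frame. A minimal sketch of packing and unpacking that header (per RFC 7348; the function names are illustrative, not from any particular library):

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags (1 B), reserved (3 B),
    VNI (3 B), reserved (1 B)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Shift the VNI into the top 24 bits of the last 32-bit word;
    # the low byte of that word is reserved.
    return struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, vni << 8)

def parse_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from a VXLAN header."""
    (word,) = struct.unpack("!I", header[4:8])
    return word >> 8

hdr = build_vxlan_header(5000)
assert len(hdr) == 8
assert parse_vni(hdr) == 5000
```

The 24-bit field is exactly why the ceiling jumps from 2^12 = 4,096 VLANs to 2^24 = ~16 million segments.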
VXLAN Control Plane
The original VXLAN forwarding model is flood-and-learn, much like classic transparent bridging. If a vSwitch receives a packet destined to an unknown address, the vSwitch forwards the packet to an IP address that floods it to all the other vSwitches. This IP address is, in turn, mapped to a multicast group across the network. VXLAN doesn't explicitly have a control plane; it requires IP multicast running in the core for forwarding traffic and host discovery.
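A minimal sketch of that flood-and-learn behavior (the `Vtep` class and its method names are hypothetical, for illustration only): remote MAC-to-VTEP mappings are learned from received traffic, and frames toward unknown MACs are sent to the segment's multicast group.

```python
class Vtep:
    """Toy model of a VXLAN tunnel endpoint doing flood-and-learn."""

    def __init__(self, multicast_group: str):
        self.multicast_group = multicast_group
        self.mac_table: dict[str, str] = {}  # inner MAC -> remote VTEP IP

    def learn(self, src_mac: str, remote_vtep_ip: str) -> None:
        # Data-plane learning: source MAC seen in a decapsulated frame.
        self.mac_table[src_mac] = remote_vtep_ip

    def outer_destination(self, dst_mac: str) -> str:
        # Known MAC -> unicast to its VTEP; unknown -> flood via multicast.
        return self.mac_table.get(dst_mac, self.multicast_group)

vtep = Vtep("239.1.1.1")
vtep.learn("aa:bb:cc:dd:ee:01", "10.0.0.2")
assert vtep.outer_destination("aa:bb:cc:dd:ee:01") == "10.0.0.2"
assert vtep.outer_destination("aa:bb:cc:dd:ee:99") == "239.1.1.1"
```

The multicast group is what makes the "flood" part work without the core having to know any inner MAC addresses, which is exactly why IP multicast in the core becomes a hard requirement.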
Best Practices for enabling IP Multicast in the core
1) Bidirectional PIM or PIM Sparse Mode
2) Redundant Rendezvous Points (RP)
3) Shared trees (reduce the amount of IP multicast state)
4) Always check the IP multicast table sizes on core and ToR switches
5) Single IP multicast address for multiple VXLAN segments is OK
The requirement for IP multicast in the core made VXLAN undesirable from an operational point of view. Creating the tunnel endpoints is the simple part, but introducing a protocol like IP multicast into the core just for the tunnel control plane was seen as undesirable. As a result, some of the more recent versions of VXLAN support IP unicast.
VXLAN completely eliminates the need for spanning tree because it is a MAC-over-IP/UDP solution. This allows the core to be pure IP and not run spanning tree. A common question is why VXLAN uses UDP. The reason is that UDP port numbers make VXLAN naturally inherit Layer 3 ECMP behavior: the entropy that enables load balancing across multiple paths is embedded in the UDP source port of the overlay header.
The stability of VXLAN and the applications running within it depends on the underlying transport network. If the underlying IP network cannot converge quickly enough, VXLAN packets may be dropped and application cache timeouts may be triggered. The rate of change in the underlying network has a major impact on the stability of the VXLAN tunnels, yet the rate of change of the VXLAN tunnels has no effect on the underlying control plane. This is similar to how the stability of an MPLS/VPN overlay is affected by the core's IGP.
VXLAN Benefits and Drawbacks
| Benefits | Drawbacks |
|---|---|
| Runs over IP transport | No control plane |
| Offers a large number of logical endpoints | Needs IP multicast*** |
| Reduced flooding scope | No IGMP snooping (yet) |
| Eliminates STP | No PVLAN support |
| Easily integrated over existing core | Requires jumbo frames in the core (~50 bytes of overhead) |
| Minimal host-to-network integration | No built-in security features** |
| | Not a DCI solution (no ARP reduction, first-hop gateway localization, or inbound traffic steering, i.e., LISP) |
** VXLAN has no built-in security features. Anyone who gains access to the core network can insert traffic into VXLAN segments. The VXLAN transport network must be absolutely secure, as no existing firewall or IPS equipment has visibility into VXLAN traffic.
*** Recent versions have Unicast VXLAN. Nexus 1000V release 4.2(1)SV2(2.1)
Updated: VXLAN Enhancements
MAC Distribution Mode is an enhancement to VXLAN that prevents unknown unicast flooding by eliminating data-plane MAC address learning. Traditionally, unknown end hosts were located by flooding; this has been replaced with a VXLAN control plane solution. During VM startup, the VSM (control plane) collects the list of MAC addresses and distributes the MAC-to-VTEP mappings to all VEMs participating in a VXLAN segment. This technique makes VXLAN more optimal by unicasting more intelligently, similar to Nicira and VMware NVP.
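A hypothetical sketch of that push model (the `Controller` class stands in for the VSM; names are illustrative): instead of each VEM learning in the data plane, the controller distributes every new MAC-to-VTEP mapping to all participating VEMs at VM startup.

```python
class Controller:
    """Toy model of a central control plane (VSM-style) pushing
    MAC-to-VTEP mappings to all VEMs in a segment."""

    def __init__(self):
        self.vems: list[dict[str, str]] = []  # each VEM's MAC table

    def register_vem(self) -> dict[str, str]:
        table: dict[str, str] = {}
        self.vems.append(table)
        return table

    def vm_started(self, mac: str, vtep_ip: str) -> None:
        # Distribute the mapping to every participating VEM, so no
        # VEM ever needs to flood to locate this host.
        for table in self.vems:
            table[mac] = vtep_ip

ctrl = Controller()
vem1, vem2 = ctrl.register_vem(), ctrl.register_vem()
ctrl.vm_started("aa:bb:cc:dd:ee:01", "10.0.0.5")
assert vem1["aa:bb:cc:dd:ee:01"] == "10.0.0.5"
assert vem2["aa:bb:cc:dd:ee:01"] == "10.0.0.5"
```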
VXLAN ARP termination works by giving the VSM controller all the ARP and MAC information. This enables the VEM to proxy and respond locally to ARP requests without sending out a broadcast. Since roughly 90% of broadcast traffic consists of ARP requests (the ARP reply is unicast), this significantly reduces broadcast traffic on the network.
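A minimal sketch of the proxy behavior (hypothetical names and table): the local switch consults the controller-supplied IP-to-MAC table and answers ARP requests itself, falling back to flooding only for unknown bindings.

```python
# IP-to-MAC bindings as pushed down by the control plane (illustrative).
arp_table = {"192.168.1.10": "aa:bb:cc:dd:ee:01"}

def handle_arp_request(target_ip: str):
    """Return a locally proxied ARP reply if the binding is known;
    otherwise signal that the request must be flooded."""
    mac = arp_table.get(target_ip)
    if mac is not None:
        return ("reply", target_ip, mac)   # answered locally, no broadcast
    return ("flood", target_ip, None)      # unknown binding: fall back to flooding

assert handle_arp_request("192.168.1.10") == (
    "reply", "192.168.1.10", "aa:bb:cc:dd:ee:01")
assert handle_arp_request("192.168.1.99")[0] == "flood"
```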