Both SD-WAN and SASE add a level of abstraction to the WAN in the form of virtual WANs. Imagine each of these virtual WANs carrying a single application, but instead of being confined to one location, such as a server, it stretches end to end. Each virtual WAN runs to the cloud or an enterprise location over secure, isolated paths with its own policies and topology. The standard SD-WAN architecture consists of several components, such as a central SD-WAN controller that acts as the brain, holds a global view of the network, and instructs the local SD-WAN Edge devices on how to steer traffic. Most SD-WAN vendors have both an underlay and an overlay. The underlay is the infrastructure, be it physical or virtual, and the overlay is the set of SD-WAN virtual WANs to which the applications are mapped.
- A key point: Performance per overlay
As each application is in an isolated WAN overlay, we can assign mechanisms to each overlay independently of the others. Different performance metrics and topologies can be assigned to each overlay. More importantly, all of these can be assigned regardless of the underlying transport. The key point is that each of these virtual WANs is completely independent.
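To make the idea of per-overlay policy concrete, here is a minimal sketch. The overlay names, thresholds, and the policy_for helper are all hypothetical and not taken from any vendor's product; the point is only that each overlay carries its own topology and performance targets, independent of the others and of the underlay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlayPolicy:
    """Policy for one virtual WAN (overlay); every field is illustrative."""
    name: str
    topology: str            # e.g. "hub-and-spoke" or "full-mesh"
    max_latency_ms: float
    max_loss_pct: float
    encrypted: bool = True


# Hypothetical application-to-overlay mapping.
OVERLAYS = {
    "voice": OverlayPolicy("voice", "full-mesh", 150, 1.0),
    "erp": OverlayPolicy("erp", "hub-and-spoke", 300, 2.0),
    "guest-wifi": OverlayPolicy("guest-wifi", "hub-and-spoke", 1000, 5.0),
}


def policy_for(app: str) -> OverlayPolicy:
    """Look up the overlay policy for an application, defaulting to best effort."""
    return OVERLAYS.get(app, OverlayPolicy("default", "hub-and-spoke", 1000, 5.0))
```

Changing the voice overlay's topology or thresholds here touches nothing in the other overlays, which is the independence the paragraph above describes.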
SD-WAN Is Not New
The concepts of SD-WAN are not new. We have had encryption, path control, and overlay networking for some time. However, the main benefit of SD-WAN is that it acts as an enabler to wrap these technologies together and present them to enterprises as a new integrated offering. We have WAN edge devices that forward traffic to other edge devices across a WAN via centralized control. This enables you to configure application-based policy forwarding and security rules across performance-graded WAN paths.
The SD-WAN Control and Data Plane
SD-WAN separates the control plane from the data plane and uses central control plane components to make intelligent decisions and forward these decisions to the data plane SD-WAN Edge routers. The control plane components instruct the data plane devices, the SD-WAN Edge routers, where to steer traffic. The brains of the SD-WAN network are the control plane components, which have a holistic, end-to-end view. Compare this with the traditional network, where the control plane functions are resident in each device. The data plane is where the simple forwarding occurs, and the control plane, which is separate from the data plane, sets up everything the data plane needs to forward.
Removing intensive algorithms
SDN is about taking intensive network algorithms out of WAN edge router hardware and placing them into a central controller. Previously, in traditional networks, these algorithms ran on individual hardware devices, with control plane points sitting in the data path. BGP-based networks applied the same concept with Route-Reflector (RR) designs. They moved route reflectors off the data plane, and these RRs were then used to run the best-path computation. Route reflectors can be positioned anywhere in the network and do not have to sit on the data path. With the controller-based approach that SD-WAN has, you are not embedding the control plane in the network. This allows you to provision centrally and push policy and instructions down to the data plane from a central location. This simplifies management and increases scale. Now with SD-WAN, we can centralize control plane security and routing, resulting in data path fluidity. The data plane can flow based on the policy set by a control plane controller that is not in the data plane. The SD-WAN control plane takes care of routing and security decisions and passes the relevant information between the SD-WAN edge routers.
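The split between a central control plane and simple forwarding devices can be sketched in a few lines. This is a toy model under stated assumptions: the Controller and EdgeRouter classes, their methods, and the transport names are invented for illustration and do not correspond to any real SD-WAN API.

```python
class EdgeRouter:
    """Data plane device: stores the policy it is given and computes none itself."""

    def __init__(self, name):
        self.name = name
        self.policy = {}          # application -> preferred transport

    def install(self, policy):
        self.policy = dict(policy)

    def forward(self, app):
        # Anything without an explicit policy goes out the default transport.
        return self.policy.get(app, "internet")


class Controller:
    """Central control plane: holds the global policy and pushes it to every edge."""

    def __init__(self):
        self.edges = []
        self.policy = {}

    def register(self, edge):
        self.edges.append(edge)
        edge.install(self.policy)

    def set_policy(self, app, transport):
        self.policy[app] = transport
        for edge in self.edges:
            edge.install(self.policy)


controller = Controller()
branch_a, branch_b = EdgeRouter("branch-a"), EdgeRouter("branch-b")
controller.register(branch_a)
controller.register(branch_b)
controller.set_policy("voice", "mpls")
print(branch_a.forward("voice"), branch_b.forward("email"))  # mpls internet
```

One policy change on the controller reaches every edge, and the controller never sits in the forwarding path.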
Diagram: SD-WAN: Meaning of VPN.
Challenges With the WAN
The traditional WAN comes with a lot of challenges. It creates a siloed management effect where different WAN links try to connect everything. Traditional WANs require extensive planning for the logistics of calling. In addition, adding a branch or remote location can be costly because additional hardware purchases are required for each site.
Diagram: Wide Area Network (WAN): WAN network and the challenges.
Visibility plays an important role in day-to-day monitoring, and alerting is crucial to understanding the ongoing operational impact of the WAN. In addition, visibility enables critical performance levels to be monitored as deployments are scaled out. This helps with proactive alerting, troubleshooting, and policy optimization. The traditional WAN is known for its lack of visibility.
Challenge: Service Level Agreement (SLA)
A service level agreement (SLA) is defined as a legally binding contract between the service provider and one or more clients that lays down the specific terms and agreements governing the duration of the service engagement. A traditional WAN architecture may consist of private MPLS links with an Internet or LTE link as backup. The SLAs within the MPLS service provider environment are usually broken down into three main categories: bronze, silver, and gold. However, these types of SLA do not fit all geographies and should be fine-tuned per location and customer requirements. Therefore, I would consider these SLAs to be very rigid.
Challenge: Static and lacking agility
The WAN’s capacity, reliability, analytics, and security should be available on demand. Yet the WAN infrastructure is very static. New sites and bandwidth upgrades require considerable processing time, and this static nature prohibits agility. For today’s applications and the agility the business requires, the WAN is not agile enough, and nothing can be performed on the fly to meet business requirements. Network topologies can be depicted either physically or logically. Common topologies you may have seen include star, full mesh, and ring.
In a physical world, these topologies are fixed and cannot be automatically changed. And the logical topologies can also be hindered by physical footprints. The traditional model of operation forces applications to fit into a specific network topology that is already built and designed. We see this a lot with MPLS/VPNs. The application needs to fit into a predefined topology. This can be changed with configurations such as adding and removing Route Targets, but this requires administrator intervention.
Challenge: Old methods of routing protocols
Routing protocols make forwarding decisions based on destination addresses, and these decisions are made on a hop-by-hop basis. As a result, the application can take paths limited to routing loop restrictions, meaning that the routing protocols will not take a path that could potentially result in a forwarding loop. Although this overcomes the routing loop problems, this limits the number of paths the application traffic can take. The traditional WAN has a hard time enabling micro-segmentation. Micro-segmentation enhances network security by restricting hackers’ lateral movement in the event of a breach. As a result, it’s become increasingly widely deployed by enterprises over the last few years. It provides firms with improved control over east-west traffic and helps to keep applications running in the cloud or data center-type environments more secure.
Routing support is often inconsistent. For example, some traditional WAN vendors support dynamic routing and virtual routing and forwarding (VRF) on both the LAN and WAN sides, while some support them only on the WAN side. Others support only static routing, and some vendors have no routing support at all.
Challenges with BGP
The issue with BGP: Border Gateway Protocol (BGP) attributes
Border Gateway Protocol (BGP) refers to a gateway protocol that enables the internet to exchange routing information between autonomous systems (AS). As networks interact with each other, they need a way to communicate. This is accomplished through peering. BGP makes peering possible. Without it, networks would not be able to send and receive information from each other. However, it comes with some challenges.
A redundant WAN design requires routing, either dynamic or static, for effective traffic engineering and failover. This can be done in several ways. For example, with the Border Gateway Protocol (BGP), we can set BGP attributes such as the MED and Local Preference, or we can set the administrative distance on static routes. Routing protocols require complex tuning to load balance between border edge devices. Although these attributes allow granular policy control, they do not cover aspects relating to path performance, such as Round Trip Time (RTT), delay, and jitter. There has always been a problem with complex routing for the WAN. As a result, it’s tricky to configure Quality of Service (QoS) policies on a per-link basis and to design WAN solutions that incorporate multiple failure scenarios.
Issues with BGP: Lack of performance awareness
Due to the lack of performance awareness, BGP may not choose the best-performing path. Therefore, the question we need to ask ourselves is will BGP be able to route on the best path versus the shortest path?
Diagram: BGP protocol. BGP protocol example.
Issues with BGP: The shortest path is not always the best path
The shortest path is not necessarily the best path. Originally, we didn’t have real-time voice and video traffic which is highly sensitive to latency and jitter. We also assumed that all links were equal. This is not the case today, where we have a mix and match of links such as slow LTE and fast MPLS. The shortest path is no longer effective. There are solutions on the market to enhance BGP, offering performance-based solutions for BGP-based networks. These could, for example, send out ICMP requests to monitor the network, then, based on the response, modify the BGP attributes such as AS prepending to influence the traffic flow. All this is done in an attempt to make BGP more performance-based.
BGP is not performance aware
However, we still can’t get away from the fact that BGP is not capacity or performance aware. The common BGP attributes used for path selection are AS-Path length and multi-exit discriminators (MED). Unfortunately, these attributes do not correlate with the network or application’s performance.
Issues with BGP: AS-Path that misses key performance metrics
With default configurations, when BGP receives multiple paths to the same destination, it runs the best path algorithm to decide the best path to install in the IP routing table. Generally, this path selection is based on AS-Path, the number of ASs. AS-Path is not an efficient measure of end-to-end transit. It misses the entire shape of the network, which can result in long path selection or paths experiencing packet loss. Also, BGP changes paths only in reaction to changes in the policy or the set of available routes.
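The gap between the shortest path and the best path can be shown with a small sketch. It is a deliberate simplification: the real BGP best-path algorithm checks several attributes before and after AS-Path length, and here all of them are assumed equal. The AS numbers, RTT, and loss figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    as_path: list        # AS numbers; the values below are illustrative
    rtt_ms: float        # measured out of band; BGP itself never sees this
    loss_pct: float


paths = [
    Path("isp-a", [64501, 64510], rtt_ms=180.0, loss_pct=2.5),
    Path("isp-b", [64502, 64520, 64530], rtt_ms=40.0, loss_pct=0.0),
]


def bgp_like_choice(candidates):
    """Simplified tie-break: the shortest AS-Path wins (other attributes assumed equal)."""
    return min(candidates, key=lambda p: len(p.as_path))


def performance_choice(candidates):
    """Prefer the lowest loss, then the lowest round-trip time."""
    return min(candidates, key=lambda p: (p.loss_pct, p.rtt_ms))


print(bgp_like_choice(paths).next_hop)      # isp-a
print(performance_choice(paths).next_hop)   # isp-b
```

The AS-Path comparison picks isp-a because it crosses fewer autonomous systems, even though that path is slower and lossy.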
Diagram: BGP protocol explained. The issues.
Issues with BGP: BGP and Active-Active deployments
Configuring BGP at the WAN edge requires the applications to fit into a predefined network topology. This is not what we require for applications. BGP is hard to configure and manage when you want active-active or bandwidth aggregation. What options do you have when you want to steer sessions over multiple links dynamically?
Blackout detection only
BGP was not designed to address WAN transport brownouts caused by packet loss. Even with a blackout, where an entire link fails, application recovery could take tens of seconds or even minutes. Nowadays, we have more brownouts than blackouts, but BGP was originally designed to detect blackouts only. Brownouts can last anywhere from 10ms to 10 seconds, so it’s crucial to detect the failure in under a second and re-route to a better path. To provide resiliency, WAN edge protocols must be combined with additional mechanisms, such as IP SLA and even enhanced object tracking. Unfortunately, these add to configuration complexity.
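The difference between detecting a blackout and detecting a brownout can be sketched with a sliding window over probe results. This is not how IP SLA or any particular product works internally; the BrownoutDetector class, the window size, and the thresholds are assumptions chosen for illustration.

```python
from collections import deque


class BrownoutDetector:
    """Flags a path as degraded when recent probe loss or latency crosses a threshold."""

    def __init__(self, window=10, max_loss_ratio=0.2, max_rtt_ms=150.0):
        self.samples = deque(maxlen=window)   # each sample: RTT in ms, or None if lost
        self.max_loss_ratio = max_loss_ratio
        self.max_rtt_ms = max_rtt_ms

    def record(self, rtt_ms):
        self.samples.append(rtt_ms)

    def state(self):
        if not self.samples:
            return "unknown"
        received = [s for s in self.samples if s is not None]
        if not received:
            return "blackout"                 # every probe in the window was lost
        loss_ratio = 1 - len(received) / len(self.samples)
        avg_rtt = sum(received) / len(received)
        if loss_ratio > self.max_loss_ratio or avg_rtt > self.max_rtt_ms:
            return "brownout"                 # link is up but underperforming
        return "healthy"
```

A simple link-up/link-down check would only ever report the blackout case. With probes sent every 100 ms or so, a window like this one can flag a degraded path in about a second, and shorter probe intervals or windows bring that down further.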
Major Environmental Changes
The hybrid WAN, typically consisting of Internet and MPLS, was introduced to save costs and improve resilience. However, three emerging factors have put traditional designs under pressure: new application requirements, increased Internet use, and the adoption of public cloud services. We also have a lot of complexity at the branch. Many branch sites now include various appliances such as firewalls, intrusion prevention, Internet Protocol (IP) VPN concentrators, WAN path controllers, and WAN optimization controllers. All these point solutions must be maintained and operated, and they must also provide the right visibility in a form that can be easily digested. Visibility is key for the WAN. How do you obtain visibility into application performance across a hybrid WAN and ensure that applications receive appropriate prioritization and are forwarded over an appropriate path?
The era of client-server
The design for the WAN and branch sites was conceived in the client-server era. At that time, the WAN design satisfied the applications’ needs, as applications and data resided behind the central firewall in the on-premises data center. Today, of course, we are in an entirely different space with hybrid IT and multi-cloud designs, which leave applications and data distributed. Data is now omnipresent. The type of WAN and branch originating in the client-server era was not designed with cloud applications in mind.
Hub and spoke designs
The “hub and spoke” model was designed for client/server environments where almost all of an organization’s data and applications resided in the data center (i.e., the hub location) and were accessed by workers in branch locations (i.e., the spokes). Internet traffic would enter the enterprise through a single ingress/egress point, typically into the data center, where it would then pass through the hub and to the users in branch offices. The birth of the cloud resulted in a major shift in how we consume applications, traffic types, and network topology. There was a big push to the cloud, and almost everything was offered as a SaaS. The cloud era changed the traffic patterns as the traffic goes directly to the cloud from the branch site and doesn’t need to be backhauled to the on-premise data center.
Diagram: Hub and Spoke: Network design.
Challenges with hub and spoke design
The hub and spoke model is outdated. Because the model is centralized, day-to-day operations may be relatively inflexible, and changes at the hub, even in a single route, may have unexpected consequences throughout the network. It may be difficult or even impossible to handle occasional periods of high demand between two spokes. The result of the cloud acceleration meant that the best point of access is not always in the central location. Why would branch sites direct all internet-bound traffic to the central HQ causing traffic tromboning and adding to latency when it can go directly to the cloud? The hub and spoke design is not an efficient topology for cloud-based applications.
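A quick worked example shows what tromboning costs in latency. The one-way latency figures are hypothetical; only the arithmetic matters.

```python
def round_trip_ms(*one_way_legs_ms):
    """Round-trip time for a path made of sequential one-way legs."""
    return 2 * sum(one_way_legs_ms)


# Hypothetical one-way latencies in milliseconds.
branch_to_hub = 30
hub_to_cloud = 25
branch_to_cloud = 20

backhauled = round_trip_ms(branch_to_hub, hub_to_cloud)   # via the central HQ
direct = round_trip_ms(branch_to_cloud)                   # local internet breakout
print(backhauled, direct, backhauled - direct)            # 110 40 70
```

With these numbers, every request and response pair pays an extra 70 ms simply because the traffic detours through the hub.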
Active/Active and Active/Passive
Historically, WANs were built on “active-passive,” where a branch can be connected using two or more links, but only the primary link is active and passing traffic. In this scenario, the backup connection only becomes active if the primary connection fails. While this might seem sensible, it’s highly inefficient. The interest in active-active has always been there, but it was difficult to configure and expensive to implement. Active/active designs with traditional routing protocols are hard to design, inflexible, and a nightmare to troubleshoot.
Convergence and application performance problems can arise from active-active WAN edge designs. For example, packets sent over both links can arrive at the other end out of order because each link delivers them at a different speed. The remote end then has to resequence them, resulting in additional jitter and delay. Both high jitter and delay are bad for network performance. The issues arising from active-active are often known as spray and pray. It increases bandwidth but decreases goodput. Spraying packets down both links can result in 20% drops or packet reordering. There will also be issues with firewalls as they may see asymmetric routes coming in.
- A key point: SD-WAN and active-active paths.
For an active-active design, one needs application session awareness and a design that eliminates asymmetric routing. In addition, you need a way to slice up the WAN so application flows can work efficiently over either link. SD-WAN does this. WAN designs can also be active-standby, which requires routing protocol convergence in the event of primary link failure. Unfortunately, routing protocols are known to converge slowly. The emergence of SD-WAN technologies with multi-path capabilities, combined with the ubiquity of broadband, has made active-active highly attractive and something any business can deploy and manage quickly and easily.
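Why session awareness matters can be seen in a small simulation that compares per-packet spraying with per-flow pinning. The link delays, the flow identifier, and the use of a CRC32 hash to pin flows are assumptions for illustration; real products use their own flow classification and load-sharing logic.

```python
import zlib

LINK_DELAY_MS = {"mpls": 10, "broadband": 35}   # hypothetical one-way delays
LINKS = list(LINK_DELAY_MS)


def spray(flow_id, seq):
    """Per-packet round robin: consecutive packets alternate between links."""
    return LINKS[seq % len(LINKS)]


def pin(flow_id, seq):
    """Per-flow hashing: every packet of a flow uses the same link."""
    return LINKS[zlib.crc32(flow_id.encode()) % len(LINKS)]


def reordered_packets(chooser, flow_id="10.0.0.1:5060->10.0.1.1:5060", count=100):
    """Count packets that arrive after a packet that was sent later."""
    arrivals = []
    for seq in range(count):
        send_time = seq * 1.0                   # one packet per millisecond
        arrivals.append((send_time + LINK_DELAY_MS[chooser(flow_id, seq)], seq))
    arrivals.sort()
    highest_seen, late = -1, 0
    for _, seq in arrivals:
        if seq < highest_seen:
            late += 1
        highest_seen = max(highest_seen, seq)
    return late


print(reordered_packets(spray), reordered_packets(pin))
```

Spraying a single flow across links with different delays produces out-of-order arrivals, while pinning the flow to one link produces none. Different flows can still hash to different links, so both links stay in use.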
An SD-WAN solution enables the creation of virtual overlays that bond multiple underlay links. Virtual overlays enable enterprises to classify and categorize applications based on their unique service level requirements and provide fast failover should an underlay link experience congestion, a brownout, or an outage.
Regardless of the mechanism used to speed up convergence and failure detection, traditional routing has several convergence steps that need to be carried out: a) detecting the topology change, b) notifying the rest of the network about the change, c) calculating the new best path, and d) switching to the new best path. In any case, traditional WAN protocols route down one path and, by default, have no awareness of what’s happening at the application level. For this reason, there have been many attempts to enhance the WAN’s behavior.
Issues with MPLS
Diagram: Multiprotocol label switching (MPLS).
MPLS has some great features, but it doesn’t suit all application profiles. It can introduce more points of failure than traditional internet transport. Its architecture is predefined and, in some cases, inflexible. For example, some Service Providers (SP) might only offer hub and spoke topologies, and others only offer a full mesh. Any changes to these predefined architectures will require manual intervention unless you have a very flexible MPLS service provider that allows you to do cool stuff with Route Targets.
Scenario: Old and rigid MPLS
I designed a headquarters site for a large enterprise during a recent consultancy. I felt that MPLS topologies, once provisioned, are difficult to change. MPLS topologies are similar to the brick foundation of a house. Once the foundation is laid, it’s hard to make changes to the original structure without starting over. In its very simplest form, an MPLS network has both Provider Edge (PE) and Provider (P) routers. The P router configuration does not change based on customer requirements, but the PE router configuration does.
We have several technologies, such as Route Targets, to control routes in and out of PE routers. A PE router with matching route targets and configurable variables allows the routes to pass. This creates customer topologies such as hub and spoke or full mesh. The Wide Area Network (WAN) I was working on was fully outsourced. As a result, any request would require service provider intervention with additional design and provisioning activities.
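How import and export route targets shape a customer topology can be modeled in a few lines. This sketch only models the matching rule (a site learns another site's routes when one of its import targets matches one of the other site's export targets). The site names and route-target values are invented.

```python
def reachability(vrfs):
    """Return {site: set of sites whose routes it imports} from route-target sets."""
    result = {}
    for site, cfg in vrfs.items():
        result[site] = {
            other
            for other, other_cfg in vrfs.items()
            if other != site and cfg["import"] & other_cfg["export"]
        }
    return result


# Route-target values are illustrative, not taken from a real deployment.
hub_and_spoke = {
    "hub":     {"export": {"65000:1"}, "import": {"65000:2"}},
    "spoke-1": {"export": {"65000:2"}, "import": {"65000:1"}},
    "spoke-2": {"export": {"65000:2"}, "import": {"65000:1"}},
}
full_mesh = {
    site: {"export": {"65000:10"}, "import": {"65000:10"}}
    for site in ("site-1", "site-2", "site-3")
}

print(reachability(hub_and_spoke)["spoke-1"])   # {'hub'}
```

Turning the hub-and-spoke design into a full mesh means editing the import and export sets on every PE-facing VRF, which is exactly the kind of change that had to go through the provider.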
For example, mapping application subnets to new or existing RT may involve new high-level design approval with additional configuration templates, which would have to be applied by provisioning teams. It was a lot of work for such a small task. But, unfortunately, it puts the brakes on agility and pushes lead times through the roof.
BGP community tagging
While there are ways to get over this with BGP community tagging and matching, which does provide some degree of flexibility, we can’t get away from the fact that it remains a fixed, predefined configuration. As a result, all subsequent design changes may still require service provider intervention.
Drivers for SD-WAN
Diagram: SD-WAN: The drivers for SD-WAN.
Driver for SD-WAN: Flexible topologies
For example, using DPI, we can have Voice over IP traffic go over MPLS. Here the SD-WAN will look at the Real-time Transport Protocol (RTP) and the Session Initiation Protocol (SIP). We can also have less critical applications go over the Internet, so MPLS is used only for certain apps. As a result, best-effort traffic is pinned to the Internet, and only critical apps get an SLA and go on the MPLS path. Now we have better utilization of the transports, and circuits never need to be dormant. With SD-WAN, we use the bandwidth that is available and ensure an optimized experience. The value SD-WAN brings is that the solution tracks network and path conditions in real time, revealing performance issues as they happen, and then dynamically redirects data traffic to the next available path. When the network recovers to its normal state, the SD-WAN solution can redirect the traffic to its original path. Therefore the effects of network degradation, which come in the form of brownouts and soft failures, can be minimized.
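The steering behavior described above, including the move back to the original path after recovery, can be sketched as a function that is re-evaluated whenever fresh path measurements arrive. The application names, SLA thresholds, and the select_path helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    latency_ms: float
    loss_pct: float


# Hypothetical per-application SLA thresholds and preferred transport order.
APP_POLICY = {
    "voip":   {"max_latency_ms": 150, "max_loss_pct": 1.0, "prefer": ["mpls", "internet"]},
    "backup": {"max_latency_ms": 500, "max_loss_pct": 5.0, "prefer": ["internet", "mpls"]},
}


def select_path(app, stats):
    """Pick the first preferred transport that currently meets the app's SLA."""
    policy = APP_POLICY[app]
    for name in policy["prefer"]:
        s = stats[name]
        if s.latency_ms <= policy["max_latency_ms"] and s.loss_pct <= policy["max_loss_pct"]:
            return name
    # Nothing meets the SLA: fall back to the lowest-loss path as a last resort.
    return min(stats, key=lambda n: stats[n].loss_pct)


healthy = {"mpls": PathStats(20, 0.0), "internet": PathStats(60, 0.5)}
degraded = {"mpls": PathStats(20, 4.0), "internet": PathStats(60, 0.5)}

print(select_path("voip", healthy))    # mpls
print(select_path("voip", degraded))   # internet
```

Because the decision is recomputed from current measurements, voice traffic returns to MPLS as soon as the MPLS statistics are back within the SLA. A production system would normally add a hold-down timer to avoid flapping between paths.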
Driver for SD-WAN: Encryption key rotation
Data security has never been a more important consideration than it is today. Therefore, businesses and other organizations must take robust measures to keep data and information safely under lock and key. Encryption keys must get rotated regularly (every 90 days is the standard interval) to reduce the risk of your data security being compromised. However, standard VPN-based encryption key rotation can be complicated and disruptive, often even requiring downtime. SD-WAN can offer automated key rotation, allowing network administrators to pre-program rotations without needing manual intervention or system downtime.
Driver for SD-WAN: Push to the cloud
Another key feature of SD-WAN technology is cloud breakout. In a nutshell, this allows you to connect branch office users to cloud-hosted applications directly and securely, eliminating the inefficiencies of backhauling cloud-destined traffic through the data center. Given the ever-growing importance of SaaS and IaaS services, efficient and reliable access to the cloud is crucial for many businesses and other organizations. By simplifying how branch traffic is routed, SD-WAN also makes it quicker and easier to set up breakouts.
The changing perimeter location
Users are no longer positioned in one location with corporate-owned static devices. They are dispersed; additional latency degrades application performance when they connect back to central locations. Optimizations can be made to applications and network devices, but the only solution is to shorten the link by moving to cloud-based applications. There is a huge push and a rapid flux for cloud-based applications. Most are now moving away from the on-premise in-house hosting to cloud-based management. The ready-made global footprint enables the usage of SaaS-based platforms that negate the drawbacks of dispersed users tromboning to a central data center to access applications. Logically positioned cloud platforms are closer to the mobile user. In addition, cloud hosting these applications is far more efficient than making them available over the public Internet.
Driver for SD-WAN: Decentralization of traffic
A lot of traffic is now decentralized from the central data center to remote branch sites. Many branches do not run high bandwidth-intensive applications. These types of branch sites are known as light edges. Even with the traffic change, the traditional branch sites rely on hub sites for the majority of security and network services. The branch sites should connect to the cloud applications directly over the Internet, without tromboning traffic to data centers for either internet access or security services. An option should exist to extend the security perimeter into the branch sites without requiring expensive onsite firewalls and IPS/IDS. SD-WAN essentially builds a dynamic security fabric without the appliance sprawl of multiple security devices and vendors.
The ability to service chain traffic
Service chaining is another driver. Service chaining through SD-WAN allows organizations to route their data traffic through one or multiple services, including intrusion detection and prevention devices or cloud-based security services. It thereby enables firms to declutter their branch office networks. They can, after all, automate how particular types of traffic flows are handled and assemble connected network services into a single chain.
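A service chain is essentially an ordered list of functions that traffic passes through. The sketch below models that idea with plain Python functions; the traffic classes, the firewall rule, and the packet fields are all made up for illustration.

```python
def firewall(pkt):
    """Drop telnet (an illustrative rule); pass everything else unchanged."""
    return None if pkt["dst_port"] == 23 else pkt


def ids(pkt):
    """Mark the packet as inspected; a real IDS would do far more."""
    pkt = dict(pkt)
    pkt["inspected"] = True
    return pkt


# Hypothetical chains per traffic class.
SERVICE_CHAINS = {
    "guest": [firewall, ids],
    "trusted": [firewall],
}


def apply_chain(traffic_class, pkt):
    """Run a packet through each service in order; None means it was dropped."""
    for service in SERVICE_CHAINS[traffic_class]:
        pkt = service(pkt)
        if pkt is None:
            return None
    return pkt
```

Adding a cloud-based security service for guest traffic would mean appending one entry to the guest chain rather than installing another appliance at each branch.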
Driver for SD-WAN: Bandwidth-intensive applications
Exponential growth in demand for high-bandwidth applications such as multimedia in cellular networks has triggered the need to develop new technologies capable of providing the required high-bandwidth, reliable links in wireless environments. The biggest user of internet bandwidth is video streaming, which accounts for more than half of total global traffic. The Cartesian study confirms historical trends: consumer usage remains highly asymmetric, with video streaming still the most popular use.
Richer and hungry applications
Richer applications, multimedia traffic, and growth in the cloud application consumption model drive the need for additional bandwidth. Unfortunately, the resulting congestion leads to packet drops, ultimately degrading application performance and user experience. SD-WAN offers flexible bandwidth allocation so that you don’t have to go through the hassle of manually allocating bandwidth for specific applications. Instead, SD-WAN allows you to classify applications into particular groups and then specify a particular service level requirement. This way, you can ensure your set-up is better equipped to run smoothly, minimizing the risk of glitchy and delayed performance on an audio conference call.
Driver for SD-WAN: Organic growth
We also have organic business growth, a big driver for additional bandwidth requirements. The challenge is that existing network infrastructures are static, unable to adequately respond to this type of growth in a reasonable period. The last mile of MPLS puts a lock on you, destroying agility. Circuit lead times impede the organization’s productivity and create an overall lag.
Driver for SD-WAN: Costs
A WAN solution should be simple. To serve the new era of applications, we need to increase the link capacity by buying more bandwidth. However, life is not that simple. The WAN is an expensive part of the network, and employing link oversubscription to reduce congestion is too expensive. Bandwidth comes at a high cost to cater to new application requirements which are not met by the existing TDM-based MPLS architectures. At the same time, feature-rich MPLS comes at a high cost for relatively low bandwidth. You are never going to beat latency by adding more bandwidth. On the more traditional side, MPLS and Ethernet private lines (EPLs) can range in cost from $700 to $10,000 per month, depending on bandwidth and the distance of the link itself. Some enterprises must also account for redundancy at each site, as uptime for higher-priority sites comes into play. Costs multiply when you have a large number of sites to deploy.
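The remark that bandwidth cannot beat latency is easy to check with arithmetic. The model below is a deliberate simplification (one round trip plus serialization time, ignoring TCP behavior and queuing), and the object size, link speeds, and RTT are hypothetical.

```python
def response_time_ms(size_bytes, bandwidth_mbps, rtt_ms):
    """Simplified model: one round trip plus the time to put the bits on the wire."""
    transfer_ms = size_bytes * 8 * 1000 / (bandwidth_mbps * 1_000_000)
    return rtt_ms + transfer_ms


# A 100 KB response over a path with a 200 ms round-trip time.
print(response_time_ms(100_000, 10, 200))    # 280.0
print(response_time_ms(100_000, 100, 200))   # 208.0
```

Buying ten times the bandwidth takes the response from 280 ms to 208 ms, because the 200 ms of round-trip latency is untouched. Shortening the path is the only way to reduce that part.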
Driver for SD-WAN: Limitations of protocols
We already mentioned some problems with routing protocols, but IPsec left to its defaults also raises challenges. The IPsec architecture is point-to-point, not site-to-site. Therefore, it does not natively support redundant uplinks. Complex configurations and potentially additional protocols are required when sites have multiple uplinks to multiple providers. Left to its defaults, IPsec is not abstracted, and one session cannot be sent over multiple uplinks. This causes challenges with transport failover and path selection. Secure tunnels should be set up and torn down immediately, and new sites should be incorporated into a secure overlay without much delay or manual intervention.
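The configuration burden of point-to-point tunnels grows quickly with site count and uplinks. As a rough worked example, assuming a full mesh in which every uplink at one site builds a tunnel to every uplink at each other site (an assumption for illustration, not a statement about any specific design):

```python
def full_mesh_tunnels(sites, uplinks_per_site=1):
    """Point-to-point tunnels needed for a full mesh across all uplink pairs."""
    site_pairs = sites * (sites - 1) // 2
    return site_pairs * uplinks_per_site ** 2


print(full_mesh_tunnels(4))        # 6
print(full_mesh_tunnels(50, 2))    # 4900
```

Fifty dual-homed sites would need 4,900 manually defined tunnels under this assumption, which is why automated overlay setup matters.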
Driver for SD-WAN: Internet of Things (IoT)
As millions of IoT devices come online, how do we segment and secure this traffic without complicating the network design further? There will be many dumb IoT devices that require communication with an IoT platform in a remote location. Will there therefore be increased signaling traffic over the WAN? Security and bandwidth consumption are key issues concerning the introduction of IP-enabled objects. Although encryption is a great way to prevent hackers from accessing data, it is also one of the leading IoT security challenges. These devices lack the storage and processing capabilities that would be found on a traditional computer. The result is an increase in attacks where hackers can easily manipulate the algorithms that were designed for protection. Also, weak credentials and login details leave nearly all IoT devices vulnerable to password hacking and brute force. Any company that uses factory default credentials on its devices is placing its business, its assets, its customers, and their valuable information at risk of a brute force attack.
Driver for SD-WAN: Visibility
Service provider challenges include a lack of visibility into customer traffic. The lack of granular detail in traffic profiles leads to expensive over-provisioning of bandwidth and link resilience. In addition, upgrades at both the packet and optical layers often occur without full traffic visibility and justification. Many networks are left at half capacity just in case there is an unexpected spike in traffic. As a result, a lot of money that should be spent on innovation is wasted on link underutilization. Both this underutilization and the over-provisioning behind it are due to a lack of visibility.