Big Switch SDN – Hyperscale
Over the last five years, data centre innovation has come from companies such as Google, Facebook, Amazon, and Microsoft. These companies are referred to as the hyperscale players. The vision of Big Switch is to take the hyperscale concepts developed by these companies and bring them to smaller data centres around the world. Hyperscale practices and benefits should not be limited to large-scale networks.
Hyperscale networking consists of three elements. The first is bare metal and open switch hardware. Bare metal switches are switches sold without software, and they now make up 10% of all ports shipped. The second is Software Defined Networking (SDN). The SDN vision is one device, acting as a controller, that manages the physical and virtual infrastructure. The third is the actual data centre architecture. Big Switch leverages what is known as the Core-and-Pod design. Core-and-Pod differs from the traditional core, aggregation, edge model and allows far greater scale and automation when deploying applications.
Standard Chassis Design vs SDN Design
Standard chassis-based switches have supervisors, line cards, and a fabric backplane. A proprietary protocol runs between the blades for control. Big Switch has all of these components, but they are named differently. Under the covers, the supervisor module in a chassis acts like an SDN controller, programming the line cards and the fabric backplane.
Instead of supervisors, Big Switch has a controller, and the proprietary protocol internal to the chassis is now OpenFlow. The leaf switches are treated like line cards and the spine switches are treated like the fabric backplane. The result is an OpenFlow-integrated architecture.
Traditional data centre designs operate on a hierarchical tree architecture. Big Switch follows a newer networking architecture, called leaf and spine, which overcomes the shortcomings of traditional tree architectures. To map leaf and spine to traditional data centre terminology, the leaf is the access switch and the spine is the core switch. Leaf and spine operates on the concept that endpoints are equidistant: any leaf reaches any other leaf through a single spine hop. Designs with equidistant endpoints make POD placement and service insertion easier than with a hierarchical tree architecture.
The Big Switch fabric architecture has multiple connection points, similar to an Equal Cost Multipath (ECMP) fabric and Multi-Chassis Link Aggregation (MLAG), enabling layer 2 and layer 3 multipathing. With this type of connectivity, a network partition problem does not have a global effect. If you lose a spine switch, you lose the capacity of that spine but you do not lose connectivity. The controller controls all of this and has a central view.
Losing a leaf switch in a leaf and spine architecture is not a big deal as long as you have configured multiple paths.
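The equidistance and multipathing properties described above can be illustrated with a small model. This is a minimal sketch, not Big Switch code: it assumes a full mesh between leaves and spines, and the switch names and helper functions (`build_fabric`, `equal_cost_paths`, `fail_spine`) are hypothetical.

```python
from itertools import combinations

def build_fabric(num_leaves, num_spines):
    """Map each leaf to the set of spines it uplinks to (assumed full mesh)."""
    spines = {f"spine{s}" for s in range(1, num_spines + 1)}
    return {f"leaf{n}": set(spines) for n in range(1, num_leaves + 1)}

def equal_cost_paths(fabric, src_leaf, dst_leaf):
    """Every leaf-to-leaf path is leaf -> spine -> leaf, one per shared spine."""
    shared = fabric[src_leaf] & fabric[dst_leaf]
    return sorted((src_leaf, spine, dst_leaf) for spine in shared)

def fail_spine(fabric, spine):
    """Return a copy of the fabric with one spine removed from every leaf."""
    return {leaf: uplinks - {spine} for leaf, uplinks in fabric.items()}

def path_counts(fabric):
    """Count equal-cost paths for every pair of leaves."""
    return {
        pair: len(equal_cost_paths(fabric, *pair))
        for pair in combinations(sorted(fabric), 2)
    }

fabric = build_fabric(num_leaves=4, num_spines=2)
before = path_counts(fabric)                       # paths with both spines up
after = path_counts(fail_spine(fabric, "spine1"))  # paths after losing a spine

print("paths per leaf pair before failure:", set(before.values()))
print("paths per leaf pair after failure: ", set(after.values()))
```

In this model every leaf pair has the same number of paths of the same length, and removing one spine halves the path count without disconnecting any pair, which matches the capacity-versus-connectivity point above.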
Bare Metal Switches
The first hyperscale design principle is to use bare metal switches. Bare metal switches are Ethernet switches sold without software. Disaggregating the hardware from the switch software allows you to build your own switch software stack on top of it. This is cheaper in terms of capex and lets you tune the operating system to what you actually need. It gives you the ability to tailor operations to specific requirements.
Core and Pod Design
Traditional core-agg-edge is a monolithic design that cannot evolve. Hyperscale companies now design to a core-and-pod model, allowing operations to improve faster. Data centres are usually made up of two core components. One is the core, which provides the Layer 3 routes for ingress and egress routing. The other is the POD, a self-contained unit connected to the core. Communication between PODs is done via the core.
A POD is a certified design of servers, storage, and network, all grouped into a common service. Each POD contains an atomic unit of networking, compute, and storage, attached directly to the core via Layer 2 or Layer 3. Because a POD has a fixed configuration, automation is simple and stable.
Big Switch Products
Big Tap and Big Cloud Fabric are the two product streams from Big Switch. Both use a fabric architecture built on white box switches with a centralized controller and a POD design.
Big Cloud Fabric is designed to be the network for a POD. Each Big Cloud Fabric instance is a pair of redundant SDN controllers plus a leaf/spine topology that forms the network for your POD. Switches are zero touch and stateless: turn one on, and it boots and downloads its switch image and configuration. The fabric auto-discovers all of the links and troubleshoots any physical problems itself.
SDN Controller Architecture
There are generic architectural challenges with SDN controller-based networks. The first crucial question is: where is the split between the controller and the network devices? In OpenFlow, the split is clearly between the control plane and the data plane. Where the split sits affects the outcome of events such as a controller bug, a controller failure, or a network partition, as well as the size of the failure domain.
You might have an SDN controller cluster, but the controller is still a single point of failure. A controller cluster protects you from hardware failures but not from software failures. If someone misconfigures or corrupts the controller database, you lose the controller regardless of how many controllers are in the cluster. Every controller is a single fat-finger domain. Because clusters and clustering protocols are complex, a bad design can itself introduce failures. Every distributed system is complex, and one that has to work with real-time data is harder still.
SDN Controller – Availability Zones
The optimum design is to build one controller per availability zone. If one controller fails you lose that side of the fabric, but you still have another fabric. To use this concept, you must have applications that can run in multiple availability zones. Availability zones are a great idea, but applications must be properly designed to use them. Each availability zone usually corresponds to a single failure domain.
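The difference between a controller cluster and per-zone controllers can be sketched as a toy model of failure domains. This is an illustration under stated assumptions, not how any real controller stores state: the `Controller`, `ReplicatedCluster`, and `ZonedControllers` classes and the `corrupt` flag are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Controller:
    name: str
    config: dict = field(default_factory=dict)

    def healthy(self):
        # Hypothetical rule: a controller is unusable once its config is corrupt.
        return not self.config.get("corrupt", False)

class ReplicatedCluster:
    """Members share one replicated database, so a write reaches every member."""
    def __init__(self, names):
        self.members = [Controller(n) for n in names]

    def write(self, key, value):
        for member in self.members:
            member.config[key] = value

    def healthy_members(self):
        return [m.name for m in self.members if m.healthy()]

class ZonedControllers:
    """One independent controller per availability zone; a write hits one zone."""
    def __init__(self, zones):
        self.by_zone = {z: Controller(f"ctrl-{z}") for z in zones}

    def write(self, zone, key, value):
        self.by_zone[zone].config[key] = value

    def healthy_zones(self):
        return [z for z, c in self.by_zone.items() if c.healthy()]

# A bad write is replicated to all three cluster members.
cluster = ReplicatedCluster(["c1", "c2", "c3"])
cluster.write("corrupt", True)
print("healthy cluster members:", cluster.healthy_members())

# The same bad write against one zone leaves the other zone untouched.
zoned = ZonedControllers(["az1", "az2"])
zoned.write("az1", "corrupt", True)
print("healthy zones:", zoned.healthy_zones())
```

In the model, adding more members to the cluster does not help against a bad write because replication copies it everywhere, while the per-zone layout confines the damage to one fabric.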
How do you deal with failures, and what failure rate is acceptable? The failure rate you can accept drives the level of redundancy in your network. Full redundancy is a great design goal as it reduces the probability of an entire network failure. But full redundancy will never give you 100% availability. Network partitions still happen in fully redundant networks.
Be careful of split-brain scenarios, where one controller looks after one partition and another controller looks after the other partition. The way Big Switch overcomes this is with a distributed control plane. The forwarding elements are aligned so that a network partition can be tolerated.
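The claim that redundancy approaches but never reaches 100% availability follows from the standard formula for components in parallel, A = 1 - (1 - a)^n. The sketch below assumes independent failures, which is optimistic given the shared software and configuration risks discussed earlier; the 99% per-component figure is an arbitrary example, not a measured value.

```python
def parallel_availability(component_availability, copies):
    """Availability of `copies` redundant components, assuming independent failures."""
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies

for copies in (1, 2, 3):
    availability = parallel_availability(0.99, copies)
    print(f"{copies} redundant copies: {availability:.6f}")
```

Each extra copy shrinks the remaining downtime, but the result stays below 1 for any component that can fail at all, so the acceptable failure rate has to be chosen rather than assumed to be zero.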
Big Switch Distributed Routing – Tenant Router
For routing, Big Switch has a concept known as a tenant router. With the tenant router you can say that two broadcast domains can talk to each other via policy points. The tenant router is a logical router, physically distributed throughout the entire network. Every switch holds a local copy of the tenant router's routing table, so the routing state is distributed everywhere. There is no specific layer 3 point that traffic needs to cross to get from one layer 2 segment to another. As all the leaf switches have a distributed copy of the database, all routing takes the most optimal path. Traffic does not have to hairpin up to a physical layer 3 point when two broadcast domains are on the same leaf switch.
You can map an application directly to a tenant router, which acts like a VRF with VRF packet forwarding in hardware. This is known as micro-segmentation. With this, you can put a set of applications or VMs in a tenant, demarcate the network by tenant, and apply per-tenant policy.
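The distributed tenant router can be sketched as a policy table copied to every leaf, so the ingress leaf makes the routing decision locally. This is a simplified model, not the Big Switch implementation: the `Endpoint` and `TenantRouter` classes, the segment names, and the single placeholder `"spine"` hop are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    name: str
    segment: str  # layer 2 segment (broadcast domain)
    leaf: str     # leaf switch the endpoint attaches to

class TenantRouter:
    """Logical router whose routing state is copied to every leaf switch."""
    def __init__(self, name, leaves):
        self.name = name
        self.leaf_tables = {leaf: set() for leaf in leaves}

    def permit(self, seg_a, seg_b):
        # Push the policy to every leaf so any ingress leaf can route locally.
        pair = frozenset((seg_a, seg_b))
        for table in self.leaf_tables.values():
            table.add(pair)

    def path(self, src, dst):
        """Return the switches traversed, or None if policy denies the flow."""
        table = self.leaf_tables[src.leaf]
        crosses_segments = src.segment != dst.segment
        if crosses_segments and frozenset((src.segment, dst.segment)) not in table:
            return None
        if src.leaf == dst.leaf:
            return [src.leaf]  # routed on the ingress leaf, no hairpin
        return [src.leaf, "spine", dst.leaf]

router = TenantRouter("tenant-a", ["leaf1", "leaf2"])
router.permit("seg-web", "seg-db")

web = Endpoint("web", "seg-web", "leaf1")
db_local = Endpoint("db1", "seg-db", "leaf1")
db_remote = Endpoint("db2", "seg-db", "leaf2")
cache = Endpoint("cache", "seg-cache", "leaf2")

print(router.path(web, db_local))   # same leaf: stays on leaf1
print(router.path(web, db_remote))  # different leaf: crosses one spine
print(router.path(web, cache))      # no policy between the segments
```

Because the lookup uses the table on the source endpoint's own leaf, traffic between two segments on the same leaf never leaves that switch in this model.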