Stateless Network Functions

There is a need for new technology, and it is time to break the tight coupling of state and processing. This involves decoupling the existing design of network functions into a stateless processing component and a separate data store layer. Breaking this tight coupling enables a more elastic and resilient network function infrastructure.

A more realistic reality?

Let’s face it. Networks need to be both scalable and sophisticated. To be successful, you need to completely redesign network functions such as routing and firewalling, along with the underlying platforms that manage and orchestrate these functions.

However, accomplishing this means creating an entirely new architecture and adapting existing technology to it. Technologies that have proven themselves for cloud storage have never been applied to networks. Why is this? Mainly because of the performance requirements, such as throughput and latency, that distributed systems must meet on the network path.

One can understand that disruptive technology of this kind will meet plenty of pushback from an industry saying it is just not possible. But we need to give the world something new. It deserves it: the ability to customize networks on demand. You need a logical place to start, and that is with a new architecture.

Changing the environment

Decentralized workloads, the decline of on-premises deployments, and the rise of multi-cloud have created one of the biggest connectivity challenges for data centers. A key finding is that colocation providers, which have traditionally supplied space, power, and physical network connectivity, are now becoming the hub for traffic as workloads decentralize.

The problem is that these colocation providers have not focused on connectivity that requires multi-tenancy and routing; they usually offer only physical cloud connects. This has introduced growing management and operational challenges, which will only increase in large-scale deployments.

The cloud connect is where multiple enterprises connect in, and where each of those enterprises needs to reach multiple cloud providers. All of these tenants need BGP routing, firewall functions, and NAT, but doing this at scale with a solution that couples the state cannot be both scalable and reliable.

New technologies come in waves – some appear, others disappear.

The market needs a new type of technology: a software-defined interconnect as the Internet exchange. The idea originally came to light in 2014, when Laurent Vanbever and colleagues proposed SDX, a software-defined internet exchange based on OpenFlow.

SDX, a software-defined internet exchange, is an SDN solution originating from the combined efforts of Princeton and UC Berkeley. It aims to address IXP pain points by deploying an SDN controller and OpenFlow-enabled switches. It doesn’t try to replace the entire classical IXP architecture with something new; rather, it augments existing designs with a controller.

Software-defined interconnect (SDIX)

A software-defined interconnect (SDIX), by contrast, is a new category of offering that gives colocation providers the ability to manage their cloud connects in software and extend their connectivity control. It should cover not only the cloud connect but also multiple data center interconnects.

In the past, the colocation providers’ focus was on space and power. In today’s world, however, they have new responsibilities, which now extend to providing new types of connectivity for customers.

Customers now have new requirements. They need to move their data from one colocation facility to another, whether to reduce latency or for backup purposes. For these use cases, colocation providers need a software-based platform that can direct all of their different tenants’ tasks and requirements.

Why is this different? The underlying technology, for one. Network functions such as firewalls, routers, and load balancers are, regardless of the application architecture and requirements, basically physical boxes. The challenge is that the traffic flowing through these boxes is tightly coupled with the box itself: the physical box, virtual machine, or container performing a network function is coupled with its state.

When you launch a new network function or redirect traffic to a backup device, what happens to the state? It will certainly affect the application. This might be acceptable for a single application, but not for a large-scale deployment with millions of connections and applications running on top of the network functions.

Network function virtualization (NFV) didn’t really help here, sorry to say. All it did was turn the physical boxes into virtual ones. It’s like replacing a physical appliance in Dublin with a cloud-based one in Dublin. Is this the future? NFV inherits the same design and features as the physical box. What needs to be recognized is that the problem is the state. You need to decouple the dynamic state from each of the network functions and put it in a high-performance data store running on a cluster of commodity hardware and switches: a hardware-agnostic solution whose code is not open source.

Making network functions stateless

Then you can make a network function stateless, so it is physically just a thread. If it fails, application performance is unaffected because the state is retrieved from the data store. This is what is needed as an underlying design, but does it seem possible? There will be overheads from decoupling the state.

The state can be put into a cluster of servers: some servers hold the state, while others act as the network functions. The state is not physically in another data center or location. Every type of dynamic state, such as the counters, timers, and handshaking you see in a TCP flow, is a challenge to decouple without breaking application performance. However, this can be done by adapting distributed systems technology. What is needed is a data store designed for high-performance computing, where a read of state takes around 5 microseconds. That is a very low read latency.
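To make the split concrete, here is a minimal Python sketch of the idea, assuming a hypothetical in-memory store standing in for the low-latency data store; the class and field names are invented for illustration and are not taken from any real product. The firewall worker holds no flow state of its own, so any instance, including a replacement for a failed one, picks the flow up from the store.

```python
import time

class StateStore:
    """Stands in for a low-latency remote data store (the ~5 us reads
    described above). All dynamic state lives here, not in the function."""
    def __init__(self):
        self._table = {}

    def read(self, key):
        return self._table.get(key)

    def write(self, key, value):
        self._table[key] = value

def firewall_worker(store, pkt):
    """A stateless firewall 'thread': per-flow state (packet counter,
    last-seen timestamp) is read from and written back to the shared
    store on every packet, so the worker itself is disposable."""
    key = (pkt["src"], pkt["dst"], pkt["dport"])
    state = store.read(key) or {"pkts": 0, "last_seen": 0.0}
    state["pkts"] += 1
    state["last_seen"] = time.time()
    store.write(key, state)
    return "allow" if pkt["dport"] in (80, 443) else "drop"

store = StateStore()
pkt = {"src": "10.0.0.1", "dst": "10.0.0.2", "dport": 443}
firewall_worker(store, pkt)            # one worker instance handles the flow
verdict = firewall_worker(store, pkt)  # that instance "fails"; a fresh call continues the flow
flow = store.read(("10.0.0.1", "10.0.0.2", 443))
print(verdict, flow["pkts"])  # prints: allow 2
```

Because the counter survives in the store, the second invocation continues the flow at 2 packets rather than restarting from zero, which is exactly what lets a failed function be replaced without disturbing the application.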

An algorithm is needed that can read and write the state for multiple packets at the same time. This lets you mask the read latency and achieve better performance than traditional appliances that keep the state coupled.
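One way to realize this, sketched below under the assumption that a single store read dominates the per-packet cost, is to issue the reads for a whole batch of packets concurrently so the round-trips overlap. The store, its simulated latency, and the function names are all illustrative, not from any real system.

```python
import time
from concurrent.futures import ThreadPoolExecutor

READ_LATENCY = 0.005  # simulated per-read store latency (5 ms here, for visibility)

def read_state(flow_id):
    """Simulate one blocking read from the remote state store."""
    time.sleep(READ_LATENCY)
    return {"flow": flow_id, "pkts": 0}

def serial_reads(flows):
    # One round-trip per packet: latency adds up linearly.
    return [read_state(f) for f in flows]

def batched_reads(flows, workers=8):
    # Issue all reads at once; total wall time approaches one
    # round-trip instead of len(flows) round-trips.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_state, flows))

flows = list(range(8))
t0 = time.time(); serial_reads(flows); serial = time.time() - t0
t0 = time.time(); states = batched_reads(flows); batched = time.time() - t0
print(f"serial {serial:.3f}s, batched {batched:.3f}s")
assert [s["flow"] for s in states] == flows  # results come back in packet order
```

With eight flows, the serial path pays eight read latencies while the batched path pays roughly one, which is the kind of overlap that lets a decoupled design compete with an appliance that keeps state local.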

About Matt Conran
