Open Networking

To undertake an effective SDN data center transformation strategy, we need to accept that demands on data center networks come from internal end-users, external customers, and considerable changes in application architecture, all of which put pressure on traditional data center designs. Dealing effectively with these demands requires the network domain to become more dynamic, potentially by introducing Open Networking and Open Networking solutions. For this to occur, we must embrace digital transformation and the changes it will bring to our infrastructure. Unfortunately, clinging to current methods is holding back this transition.

 



Open Networking Solutions

Key Open Networking Discussion points:


  • The popularity of spine-leaf architecture.

  • Lack of fabric-wide automation.

  • Automation and configuration management.

  • Open networking vs open protocols.

  • Challenges with integrated vendors.

 

In modern network infrastructures, as has long been the case on the server side, customers demand supply chain diversification across hardware and silicon vendors. This diversification drives down the total cost of ownership because businesses can negotiate better cost savings. In addition, replacing the hardware underneath can be seamless when the software above is common across vendors. Further, as architectures streamline and spine-leaf designs extend from the data center to the backbone and the edge, a common software architecture across all these environments brings operational simplicity. This is perfectly in line with the broader trend of IT/OT convergence.

Diagram: Open Networking for a data center topology.

 


Open networking solutions: Data center topology

Now, let's look at the evolution of the data center to see how we can achieve this type of modern infrastructure. To evolve with the times, you should use technology and your infrastructure as effective tools, so that you can drive the entire organization to become digital. The network components will play a key role, but the digital transformation process is an enterprise-wide initiative focusing on fabric-wide automation and software-defined networking.

 

Open networking solutions: Lacking fabric-wide automation

One major pain point I have seen throughout networking is the manual work that comes from lacking fabric-wide automation. In addition, it's common to deploy applications by combining multiple services that run on a distributed set of resources. As a result, configuration and maintenance are far more complex than in the past. You have two options to implement all of this.

First, you can connect these services manually: spinning up the servers, installing the necessary packages, and SSHing to each one. Or you can go down the path of open networking solutions with automation, in particular, Ansible automation with Ansible Engine or Ansible Tower with automation mesh. As an automation best practice, use Ansible variables for flexible playbook creation, so playbooks can easily be shared and reused across different environments.
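As a minimal sketch of that best practice (the host group, package, and variable names here are hypothetical), a playbook can pull its environment-specific values out of vars so the same tasks run anywhere:

```yaml
---
# Hypothetical playbook: the package and service names are variables,
# so the same tasks can be shared across different environments.
- name: Deploy a simple service across an environment
  hosts: webservers          # assumed inventory group
  become: true
  vars:
    app_package: nginx       # override per environment
    app_service: nginx
  tasks:
    - name: Ensure the package is installed
      ansible.builtin.package:
        name: "{{ app_package }}"
        state: present

    - name: Ensure the service is running and enabled
      ansible.builtin.service:
        name: "{{ app_service }}"
        state: started
        enabled: true
```

Overriding app_package and app_service per environment, for example with group_vars, is what lets one playbook serve many environments.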

 

Agility and the service provider

Consider, for example, a service provider with thousands of customers that needs to deploy segmentation to separate them. Traditionally, the technology of choice would be VRFs or even full-blown MPLS, which requires administrative touchpoints on every box. Having been part of a full-blown MPLS design and deployment for a large service provider, I can say the costs and time were extreme. Even when it was finally done, the design lacked agility compared to what could have been achieved with Open Networking.

Such a design includes Provider Edge (PE) routers at the edge, to which the customer CPE connects, and, in the middle of the network, Provider (P) routers that switch traffic based on a label. Label switching made it easy to implement IPv6 with 6PE (a technique that provides global IPv6 reachability over IPv4 MPLS), overcoming many IPv6 transition issues, but we could not get away from manual, box-by-box provisioning without investing heavily again.

 

  • Fabric-wide automation and SDN

In a software-defined environment, however, deploying a VRF, or any technology such as an anycast gateway, is a dynamic, fabric-wide command. We now have fabric-wide automation and can deploy with one touch instead of numerous box-by-box configurations. Essentially, we are moving from box-by-box configuration to atomic programming of a distributed fabric as a single entity. The beauty is that we can carry out deployments from one configuration point quickly and without human error.
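A hedged sketch of what that single touch can look like with Ansible (the group name, VRF name, and CLI syntax below are placeholders; production fabrics typically use vendor-specific modules or an SDN controller API): one play pushes the same VRF definition to every leaf in a single run.

```yaml
---
# Placeholder sketch: deploy one VRF fabric-wide in a single run,
# rather than configuring each box individually.
- name: Fabric-wide VRF deployment
  hosts: leaves                     # assumed group of all leaf switches
  gather_facts: false
  connection: ansible.netcommon.network_cli
  tasks:
    - name: Ensure the customer VRF exists on every leaf
      ansible.netcommon.cli_config:
        config: |
          vrf definition CUSTOMER-A
           rd 65000:100
```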

 

Diagram: Fabric-wide automation.

 

Open networking solutions: Configuration management

Manipulating configuration files by hand is a tedious, error-prone, and time-consuming task. Equally, performing pattern matching to make changes to existing files is risky. The manual approach results in configuration drift, where some servers drift from the desired state. Configuration drift is caused by inconsistent configuration items across devices, usually due to manual changes and updates that do not follow the path of automation. Here, you can use the Ansible architecture to maintain the desired state across a variety of managed assets.

The managed assets, which can range from distributed firewalls to Linux hosts, are stored in what's known as an inventory file, which can be static or dynamic. Dynamic inventories are best suited for cloud environments where you want to gather host information at run time. Ansible is all about maintaining the desired state of whatever environment you have.
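A minimal static inventory in YAML form might look like the following (hostnames and groups are made up); a dynamic inventory would instead be generated by a plugin querying the cloud provider at run time:

```yaml
# Hypothetical static inventory: managed assets grouped by role.
all:
  children:
    firewalls:
      hosts:
        fw1.example.com:
        fw2.example.com:
    linux_hosts:
      hosts:
        web1.example.com:
        web2.example.com:
      vars:
        ansible_user: admin   # assumed login user for the group
```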

 

Diagram: Ansible automation.

 

 

The issue of Silos

To date, the networking industry has been dominated by a few vendors. We have dealt with proprietary silos in the data center, campus/enterprise, and service provider environments. The major vendors continue to provide vertically integrated, locked-in solutions for most customers and will not allow independent, third-party network operating system software to run on their silicon.

Typically, these silos were able to solve the problems of their time, but modern infrastructure needs to be modular, open, and simple. Vendors need to allow independent, third-party network operating systems to run on their silicon to break away from vertically integrated lock-in. Cisco has started this for the broader industry, with regard to open networking solutions, with the announcement of Cisco Silicon One.

 

Diagram: The issue of vendor lock-in.

 

 

The Rise of Open Networking Solutions

New data center requirements have emerged; therefore, the network infrastructure needs to break the silos and undergo a transformation to meet them. One can view this transformation as moving from a static and conservative mindset, which results in cost overruns and inefficiencies, to a dynamic routed environment that is simple, scalable, and secure, and that can reach the far edge. Effective network transformation requires several stages.

Firstly, transition to a routed data center design with a streamlined leaf-spine architecture, along with a common operating system across cloud, edge, and 5G networks. A viable approach is to do all of this with open standards, without proprietary mechanisms. Then, we need good visibility.

 

The need for visibility

As part of the transformation, the network is no longer considered a black box that merely needs to be available and provide connectivity to services. Instead, the network is a source of deep visibility that can aid a large set of use cases: network performance, monitoring, security, and capacity planning, to name a few. However, visibility is often overlooked, with an over-focus on connectivity rather than treating the network as a valuable source of information.

 

Diagram: The requirement for deep visibility.

 

 

Monitoring a network: Flow level

For efficient network management, we need deep visibility into applications at the flow level, on any port and any device type. Today, to get anything comparable, you would deploy a separate monitoring network consisting of probes, packet brokers, and tools to process packets for metadata.

Traditional network monitoring tools, such as packet brokers, require their own life cycle management. A more viable solution integrates network visibility into the fabric itself, without the need for many extra components. This lets us do more with the data and aids agility in ongoing network operations. There will always be an application optimization requirement or a security breach where visibility helps you resolve the issue quickly.

Monitoring detects known problems and is only effective with pre-defined dashboards for problems you have seen before, such as capacity reaching its limit. The practice of observability, on the other hand, can detect unknown situations and is used to get to the root cause of any problem, known or unknown. See Observability vs Monitoring.

 

Evolution of the Data Center

We are in the middle of this transition, and the data center has already undergone several design phases. Initially, we started with layer 2 silos, suitable for north-to-south traffic flows. However, layer 2 designs hindered the east-west traffic flows of modern applications and restricted agility, which led to a push to break network boundaries. Hence the move to routing at the Top of the Rack (ToR), with overlays between ToRs to drive inter-application communication. This is the most efficient approach, and it can be accomplished in several ways.

 

The leaf-spine "Clos" popularity

The demand for leaf-spine "Clos" designs started in the data center and spread to other environments. A Clos network is a type of non-blocking, multistage switching architecture. This design now extends from the central/backend data center to the micro data centers at the edge. Various parts of the edge network, PoPs, central offices, and the packet core, have all transformed into leaf-spine "Clos" designs.

Diagram: Leaf-spine.

 

The network overlay

Building a network overlay is common to all software-defined technologies aimed at increasing agility. An overlay is a solution abstracted from the underlying physical infrastructure: it separates and disaggregates the customer applications or services from the network infrastructure. Think of it as a sandbox or private network for each application, running on top of an existing network.

Most often, the network overlay is created with VXLAN. Cisco ACI, for example, uses VXLAN for the overlay, while the underlay is built on a combination of BGP and IS-IS. The overlay abstracts a lot of complexity, and Layer 2 and Layer 3 traffic separation is done with a VXLAN network identifier (VNI).

 

The VXLAN overlay

VXLAN uses a 24-bit network segment ID, called a VXLAN network identifier (VNI), for identification. This is much larger than the 12 bits used for traditional VLAN identification. The VNI plays the same role as a VLAN ID, but it supports up to 16 million VXLAN segments (2^24 = 16,777,216), considerably more than the 4094 usable segments with traditional VLANs. Not only does this provide more segments, but it also enables better network isolation, with many small VXLAN segments instead of one large VLAN domain.

VXLAN has become the de facto overlay protocol and brings many advantages to network architecture in terms of flexibility, isolation, and scalability. VXLAN effectively implements an Ethernet segment over IP, virtualizing a thick Ethernet cable.
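To make the VNI concrete, here is a hedged sketch that creates a VXLAN segment on a plain Linux host with iproute2, wrapped in an Ansible play (the interface names, VNI, and multicast group are placeholders; a real fabric would normally program VTEPs through EVPN or a controller):

```yaml
---
# Placeholder sketch: create VXLAN segment 100 on a Linux VTEP.
- name: Create a VXLAN segment on a Linux host
  hosts: linux_hosts
  become: true
  tasks:
    - name: Add a VXLAN interface with VNI 100 on UDP port 4789
      ansible.builtin.command: >
        ip link add vxlan100 type vxlan id 100
        dev eth0 group 239.1.1.1 dstport 4789
      args:
        creates: /sys/class/net/vxlan100   # skip if it already exists

    - name: Bring the VXLAN interface up
      ansible.builtin.command: ip link set vxlan100 up
```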

 

Traditional policy deployment

Traditionally, deploying an application to the network involves propagating policy throughout the entire infrastructure. Why? Because the network acts as an underlay, and segmentation rules configured on the underlay are needed to separate applications and services. This creates a very rigid architecture, unable to react quickly and adapt to changes, and therefore lacking agility. The applications and the physical network are tightly coupled. With an overlay, we can instead place policy in the overlay network, with proper segmentation per customer.

 

How VXLAN works: ToR

What is VXLAN? Virtual networks built with VXLAN originate from servers or ToR switches. Either way, the underlying network merely transports the traffic and doesn't need to be configured to accommodate the customer application. That's all done in the overlay, including the policy. Everything happens in the overlay network, which is most efficient when done in a fully distributed manner.

Application and service deployment now occurs without touching the physical infrastructure. For example, if you need a Layer 2 or Layer 3 path across the data center network, you don't need to tweak a VLAN or make changes to routing protocols. Instead, you add a VXLAN overlay network. This approach removes the tight coupling between the application and the network, increasing agility and simplifying the deployment of applications and services.

 

Diagram: The VXLAN overlay network.

 

Extending from the data center

Edge computing fundamentally disrupts how business infrastructure teams are organized. We no longer have the framework where IT looks only at backend software, such as Office 365, while OT looks at the product-centric routing and switching elements. There is convergence, and therefore you need a lot of open APIs. The edge computing paradigm brings processing closer to the end devices, reducing latency and improving the end-user experience. To support this model, you need a network that can work with it; different siloed solutions do not.

 

Common software architecture

So the data center design went from layer 2 silos to a leaf-spine architecture with routing to the ToR. However, there is another missing piece: a common operating software architecture across all domains and location types for switching and routing, to reduce operating costs. The problem remains that even on a single site, there can be several different operating systems.

Through recent consultancy engagements, I have experienced the operational challenge of running several Cisco operating systems on one site: IOS XR for the service provider product lines, IOS XE for the enterprise, and NX-OS for the data center, all on a single site.

 

Open networking solutions and partially open source 

Some major players, such as Juniper, started with one operating system and then fragmented significantly. It's not that these are not great operating systems. Rather, you have to partition into different teams, often one team per operating system. With a common operating system, you have a seamless experience across the entire environment: your operational costs go down, your ability to apply software to the specific use cases you want goes up, and you reduce the total cost of ownership. In addition, this brings in Open Networking and, partially, open source.

 


 

 

 

What Is Open Networking

The traditional integrated vendor

Traditionally, networking products were a combination of hardware and software that had to be purchased as an integrated solution. Open networking, on the other hand, disaggregates the hardware from the software, allowing IT to mix and match at will.

With Open Networking, we are not reinventing how packets are forwarded or how routers communicate with each other. And with Open Networking solutions, you are never alone with a single vendor. The value of software-defined networking and Open Networking is doing as much as possible in software, so you don't depend on a new generation of hardware to deliver new features. If you want a new feature, it can be implemented quickly in software, without swapping the hardware or upgrading line cards.

 

Move intelligence to software

You want to move as much intelligence as possible into software, removing it from the physical layer. You don't want to build features into hardware; you want software to provide them. This is an important philosophy and the essence of Open Networking: software, not hardware, becomes the central point of intelligence, and that intelligence is delivered fabric-wide.

We have seen this with the rise of SASE. From the customer's point of view, they gain agility: they can move from generation to generation of services without hardware dependency and without the operational cost of constantly swapping out hardware.


 

Open Networking Solutions and Open Networking Protocols

Some vendors build their differentiator into the hardware. For example, with specialized hardware, you can accelerate services. In this design, the hardware is manipulated to make improvements, but without using standard Open Networking protocols. When you rely on your hardware to accelerate your services, the result is that you are 100% locked in and unable to move, because the cost of moving is too high. You could have numerous generations of, for example, line cards, each with different capabilities, resulting in a complex feature matrix. It is not that I'm against this, and I'm a big fan of the big vendors, but this is the world of closed networking, which was accepted as the norm until recently. Instead of adapting to fit, we need to use open protocols.

 

Open networking is a must; Open-source is not

The proprietary silo deployments led to proprietary alternatives to the big vendors. The startups and alternatives that appeared around ten years ago were playing the game on the same pitch as the incumbents. Others built their software and architecture by, for example, declaring the Linux network subsystem and the OVS bridge good enough to solve all data center problems. With this design, you could build small PoPs with layer 2, but the ground shifts as the design requirements change to routing. So let's glue together the Linux kernel and Quagga/FRRouting (FRR) and devise a routing solution. Unfortunately, many didn't consider the control plane architecture or the need to support multiple data center use cases.

 

  • A key point: Limited scale

Gluing together an operating system and elements of open-source routing provides limited scale and performance and results in operationally intensive, expensive solutions. The software ends up being built around the demands of the hardware and the architecture. We now see many open-source networking vendors tackling this problem from the wrong architectural point of view, at least relative to where the market is moving: not composable, not microservices-based, and not scalable from an operational viewpoint.

There is a difference between open source and Open Networking. The open-source offerings (especially the control plane) have not scaled because of sub-optimal architectures. Open Networking, on the other hand, involves building software from first principles using modern best practices, with open APIs (e.g., OpenConfig/NETCONF) for programmatic access, without compromising on the massive scale-up and scale-out requirements of modern infrastructure.

 

SDN Network Design Options

We have both controller-based and controllerless options. With a controllerless solution, setup is faster, which increases agility, and there is more robustness with respect to single points of failure, particularly around the out-of-band management network needed to interconnect controllers. A controllerless architecture is more self-healing; anything in the overlay network is also part of the control plane's resilience. An SDN controller or controller cluster may add complexity and impede resiliency: since the network depends on it for operation, it becomes a single point of failure and can impact network performance, and the intelligence concentrated in a controller is a point of attack.

There are workarounds where the data plane can continue to forward traffic without the SDN controller, but it is best to avoid single points of failure, and complex quorum schemes, in a controller-based architecture altogether.

 

Diagram: Software-defined architecture.

 

Software Defined Architecture & Automation

There are two main types of automation to consider: day 0, and days 1-2. Day 0 automation is about simplifying the initial build of the infrastructure and reducing human error while doing so. Day 1-2 automation touches the customer more; it may include installing services quickly on the fabric, e.g., VRF configuration, and building automation into the fabric.

 

Day 0 automation

As mentioned, day 0 automation builds the basic infrastructure, such as routing protocols and connection information, stages that must be carried out before installing VLANs or services. Typical tools used in software-defined networking are Ansible or internal applications that orchestrate the build of the network. These are known as fabric automation tools. Once the tools discover the switches, the devices are connected in a particular way, and the fabric network is built without human intervention. This simplifies traditional automation and is particularly useful in day 0 environments.
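As an illustration only (the template and group names are assumptions, and real fabric automation tools handle discovery for you), a day 0 play could render a base configuration, covering hostname, loopbacks, and routing protocol stanzas, from a Jinja2 template and push it to every fabric switch:

```yaml
---
# Hypothetical day 0 play: push a templated base configuration
# (routing protocols, connection information) to all fabric switches.
- name: Day 0 fabric bring-up
  hosts: fabric                     # assumed group of all fabric switches
  gather_facts: false
  connection: ansible.netcommon.network_cli
  tasks:
    - name: Render and push the base configuration
      ansible.netcommon.cli_config:
        config: "{{ lookup('ansible.builtin.template', 'base_config.j2') }}"
```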

 

  • Configuration Management

Ansible is a configuration management tool that can alleviate these manual challenges. It removes the need for an operator to hand-tune configuration files and does a good job of application deployment and orchestrating multi-deployment scenarios.

 

  • Pre-deployed infrastructure

Ansible does not deploy the infrastructure itself; for that, you could use other solutions, such as Terraform, that are better suited to the task. Terraform is an infrastructure-as-code tool. Ansible is often described as a configuration management tool and is typically mentioned in the same breath as Puppet, Chef, and Salt. However, there is a considerable difference in how they operate, most notably the installation of agents: Ansible is relatively easy to install because it is agentless. The Ansible architecture can also be used in large environments with Ansible Tower, with execution environments and automation mesh. I have recently come across automation mesh, a powerful overlay feature that enables automation closer to the edge of the network.

 

  • Current and desired stage [ YAML playbooks, variables ]

Ansible ensures that a managed asset's current state matches its desired state; Ansible is all about state management. It does this with playbooks, more specifically, YAML playbooks. A playbook is Ansible's term for a configuration management script that ensures the desired state is met.
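As a small illustration of desired state (the file path and setting are chosen arbitrarily), the play below declares an end state rather than a procedure; Ansible changes the file only if the current state has drifted from it:

```yaml
---
# Minimal desired-state sketch: declare the end state; Ansible acts
# only when the managed host has drifted from it.
- name: Enforce desired state on managed hosts
  hosts: linux_hosts
  become: true
  tasks:
    - name: Ensure IPv4 forwarding is enabled
      ansible.builtin.lineinfile:
        path: /etc/sysctl.conf
        regexp: '^net\.ipv4\.ip_forward'
        line: net.ipv4.ip_forward = 1
```

Re-running the playbook against hosts already in the desired state reports the task as ok rather than changed, which is how drift is detected and corrected in practice.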

 

Diagram: Configuration management.

 

Day 1-2 automation

With day 1-2 automation, there are two things that SDN does.

The first is the ability to install or provision services automatically across the fabric. With one command, human error is eliminated. The fabric synchronizes policies across the entire network, automating and dispersing the provisioning operations across all devices. This level of automation is not classical; the strategy is built into the SDN infrastructure.

The second is integrating network operations and services with virtualization infrastructure managers such as OpenStack, vCenter, OpenDaylight, or, at a more advanced level, OpenShift networking SDN. How does the network adapt to the instantiation of new workloads by these systems? The network admin should not even be in the loop if, for example, a new virtual machine (VM) is created.

Instead, a signal that a VM with a certain configuration should be created is propagated to all fabric elements. You shouldn't need to touch the network when a new service arrives from the virtualization infrastructure managers. This represents the ultimate in agility, as the network is removed as a manual touchpoint.

 

First steps of creating a software-defined data center

It is agreed that agility is a necessity, so what is the first step? One key step is creating a software-defined data center that allows the rapid deployment of compute and storage for workloads. In addition to software-defined compute and storage, the network must be automated and must not be an impediment.

 

The five key layers of technology

So, to achieve software-defined agility for the network, we need an affordable solution that delivers on five key layers of technology:

  1. Comprehensive telemetry: granular visibility into endpoints and the traffic traversing the network fabric, for performance monitoring and rapid troubleshooting.
  2. Network virtualization overlay: like compute virtualization, it abstracts the network from the physical hardware for increased agility and segmentation.
  3. Software control and automation of the physical underlay, eliminating mundane and error-prone box-by-box configuration: Software Defined Networking (SDN).
  4. Open network underlay: a cost-effective physical infrastructure, with no proprietary hardware lock-in, that can leverage open source.
  5. Open Networking solutions are a must, as is understanding the implications of open source in large, complex data center environments.

 
