Exploding Applications and Data Center Design
We have exploding applications! What problems do we face, and what are we doing about them?
Ask yourself: are data centers ready and available for the applications of today and the emerging applications of tomorrow? Business and applications are putting pressure on networks to change.
From 1960 to 1985 we started with mainframes and supported a customer base of about one million users. From 1985 to 2009 we moved to the personal computer, the client/server model and the LAN/Internet model, supporting a customer base of hundreds of millions. From 2009 to 2020 and beyond, the industry has changed completely: we have a variety of platforms (mobile, social, big data and cloud) with billions of users, and it is estimated that the new IT industry will be worth £4.8T.
The customer has changed and is making us change our networks. Content is doubling every two years, emerging markets may overtake mature markets, and 5,200 GB of data per person is expected to be created in 2020. These trends put enormous pressure on the volume of content to be created, and how we serve and control that content poses new challenges for data networks. I recently approved a Network Impact Assessment for one of my customers in which we looked at rate limiting data transfers over the WAN (Wide Area Network) to 9.5 Mbps, moving 34 GB over a 10-hour off-peak window. That customer plans to triple the volume over the next 12 months due to application and service changes, resulting in a WAN upgrade and a change of DR (Disaster Recovery) scope.
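A quick back-of-the-envelope check of those figures (assuming decimal gigabytes and a fully utilised link) shows why tripling the volume forces the upgrade:

```python
# Rough transfer-window arithmetic (assumes decimal GB and a fully
# utilised 9.5 Mbps rate limit -- real-world throughput is lower).
def transfer_hours(gigabytes: float, mbps: float) -> float:
    """Hours needed to move `gigabytes` of data at `mbps` megabits/s."""
    bits = gigabytes * 1e9 * 8          # GB -> bits
    return bits / (mbps * 1e6) / 3600   # bits / (bits per second) -> hours

print(round(transfer_hours(34, 9.5), 1))      # ~8.0 h: fits the 10 h window
print(round(transfer_hours(34 * 3, 9.5), 1))  # ~23.9 h: far exceeds it
```

At today's 34 GB the window holds with headroom; at three times the volume the same rate limit needs roughly a full day, so either the window or the WAN has to change.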
Big data, applications, social media and mobility are forcing architects to rethink the way we engineer a network. We should now concentrate more on scale, agility, analytics and management.
Data center design was based on the 80/20 traffic-pattern rule with Spanning Tree Protocol (802.1D), where all bridges build a loop-free path to an elected root, typically leaving half the ports forwarding and half in the blocking state, completely wasting that bandwidth. Even though we can load balance by forwarding one set of VLANs on one uplink and another set on the secondary uplink, we still face the scalability problems of large Layer 2 domains in a data center design. Spanning tree is not a routing protocol; it is a loop-prevention protocol, and because its failures can have disastrous consequences it should be limited to small segments of the data center.
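As a minimal sketch of that behaviour, using a hypothetical four-switch topology and made-up bridge IDs: the lowest bridge ID becomes root, every bridge keeps only its best path toward it, and the remaining redundant links are blocked.

```python
# Hypothetical topology: 4 switches, 5 links. STP keeps a tree (N-1
# forwarding links) and blocks the rest, idling their bandwidth.
from collections import deque

links = {("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")}
bridge_ids = {"A": 4096, "B": 8192, "C": 8192, "D": 32768}

root = min(bridge_ids, key=bridge_ids.get)   # lowest bridge ID wins: "A"

# BFS from the root: the tree edges are the forwarding links.
tree, seen, queue = set(), {root}, deque([root])
while queue:
    node = queue.popleft()
    for a, b in links:
        if node in (a, b):
            peer = b if node == a else a
            if peer not in seen:
                seen.add(peer)
                tree.add((a, b))
                queue.append(peer)

blocked = links - tree
print(f"root={root}, forwarding={len(tree)}, blocked={len(blocked)}")
# 5 links, 4 bridges: only 3 links forward; 2 sit idle in blocking.
```

The ratio only gets worse as you add redundancy: every extra link you pay for is another link spanning tree turns off.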
Traffic patterns have shifted and the architecture needs to adapt. Before, we focused on the 80% of traffic leaving the DC; now much of the traffic is east-west and stays within the DC. The original traffic pattern led us to design the typical access/distribution/core data center, with Layer 2 at the edge leading to Layer 3 transport. The "route when you can" approach was adopted because Layer 3, i.e. routing, adds stability to Layer 2 by containing broadcast and flooding domains. Yet the most popular data center architectures deployed today are based on very different requirements: the business wants large Layer 2 domains to support functions such as vMotion.
We need to meet the challenge of future applications, and as new applications arrive with new requirements, it is not easy to make adequate changes to the network because of the protocol stack in use.
The problem is that we rely on spanning tree; it was useful once, but it is past its prime. The original author of spanning tree, Radia Perlman, is now an author of TRILL, its replacement.
STP (Spanning Tree Protocol) was never a routing protocol that determines a best path; it was used to provide a loop-free path. STP is also a fail-open protocol (as opposed to a Layer 3 protocol, which fails closed), and one of spanning tree's biggest weaknesses is precisely that it fails open: if I don't receive a BPDU (Bridge Protocol Data Unit), I assume I am not connected to a switch and I start forwarding on that port. Combining a fail-open paradigm with a flooding paradigm can be disastrous.
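A simplified sketch of that fail-open timer logic (not a real STP implementation; the state machine is condensed, and only the 802.1D default max-age timer is modelled):

```python
# Fail-open in miniature: if no BPDU is heard for max_age seconds, the
# bridge assumes no switch is attached and the port ends up forwarding
# -- even if the silence was really a unidirectional link or lost BPDUs.
import time

MAX_AGE = 20.0  # 802.1D default max-age timer, in seconds

class Port:
    def __init__(self):
        self.last_bpdu = time.monotonic()
        self.state = "blocking"

    def on_bpdu(self):
        self.last_bpdu = time.monotonic()   # peer bridge still alive

    def tick(self):
        if time.monotonic() - self.last_bpdu > MAX_AGE:
            # Fail OPEN: silence is treated as "no switch here", so the
            # port moves toward forwarding. If a loop actually exists,
            # unchecked flooding makes the failure catastrophic.
            self.state = "forwarding"
```

Contrast this with a routing protocol, where losing hellos from a neighbor tears the adjacency *down*: silence closes the path instead of opening it.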
To overcome these limitations, some are now routing (Layer 3) all the way to the access layer, which has its own problems, because some applications require Layer 2 to function, e.g. clustering and stateful devices. People still like Layer 3, however, because of the stability that comes with routing. A true path-based routing protocol manages the network rather than a loop-avoidance protocol like STP; routing doesn't fail open, and it prevents loops with the TTL (Time to Live) field in the header. Convergence around a failure is quick, with improved stability. We also have ECMP (Equal Cost Multi-Path) to help with scaling, which translates into scale-out topologies and allows the network to grow at lower cost. Scale-out beats scale-up.
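The ECMP idea can be sketched in a few lines (an illustrative hash and hypothetical next-hop names, not any vendor's actual algorithm): a hash of the flow's 5-tuple picks one of N equal-cost next hops, so each flow stays on one path (no reordering) while different flows spread across all links.

```python
# Illustrative ECMP path selection over hypothetical spine switches.
import zlib

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Deterministically map a flow's 5-tuple onto one of the paths."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

paths = ["spine-1", "spine-2", "spine-3", "spine-4"]
a = ecmp_next_hop("10.0.0.1", "10.0.1.9", 6, 49152, 443, paths)
b = ecmp_next_hop("10.0.0.1", "10.0.1.9", 6, 49152, 443, paths)
assert a == b  # same flow, same path: per-flow consistency
```

Because every one of the N paths forwards, adding a spine adds usable capacity, which is exactly the scale-out property spanning tree cannot offer.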
Whether you run a small or a large network, a routed network has clear advantages over a Layer 2 network.
The way we interface with the network is also cumbersome, and it is estimated that 70% of network failures are due to human error. The risk involved in changing a production network leads to cautious change processes that can slow everything to a crawl.
In summary, the problems we face so far:
- STP-based Layer 2 has stability challenges; it fails open. Traditional bridging is controlled flooding, not controlled forwarding, so it should not be considered as stable as a routing protocol.
- Some applications require Layer 2, but people still prefer to use Layer 3.
- The network infrastructure needs to be flexible enough to adapt to new applications and services, legacy applications and services, and organisational structures.
- There is NEVER enough bandwidth, and we cannot predict future application-driven requirements, so the better solution is a flexible network infrastructure.
- Inflexibility slows down the deployment of new services and applications and restricts innovation. The infrastructure needs to flex to the applications, not the other way around, and it must be agile enough that it is no longer a bottleneck or barrier to deployment and innovation.
What are the new options moving forward?
We have Layer 2 fabrics (the open standard being TRILL) that change the way the network works and enable a large, routed Layer 2 network. A Layer 2 fabric such as FabricPath is Layer 2, but it behaves more like Layer 3 because its topology is managed by a routing protocol. The result is improved stability and faster convergence, plus massive scale-out capability: up to 32 load-balanced forwarding paths versus a single forwarding path with spanning tree.
If you already have a Layer 3 core and you need to support Layer 2 end to end, you could opt for an encapsulated overlay (VXLAN, NVGRE or STT). You keep the stability and familiarity of a Layer 3 core but can deliver Layer 2 end to end, using (in VXLAN's case) UDP source port numbers as network entropy. Depending on the design option used, it essentially builds a Layer 2 tunnel over a Layer 3 core. A use case would be two devices that need to exchange state at Layer 2, or a requirement for vMotion: VMs cannot live-migrate across Layer 3 boundaries, as they need to stay in the same VLAN to keep their TCP sessions intact.
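A sketch of how an overlay harvests that "network entropy" (illustrative only; real VTEPs do this in hardware, and the flow-key encoding here is made up): the tunnel endpoint hashes the inner frame's flow fields into the outer UDP source port, so the Layer 3 core's ECMP, which only sees the outer header, still spreads different inner flows across paths.

```python
# VXLAN-style source-port entropy. RFC 7348 recommends choosing the
# outer UDP source port from the dynamic range (49152-65535) based on
# a hash of the inner headers; the destination port is fixed.
import zlib

VXLAN_PORT = 4789          # IANA-assigned VXLAN destination port

def outer_udp_src_port(inner_flow: bytes) -> int:
    """Map an inner flow key onto a dynamic-range UDP source port."""
    return 49152 + (zlib.crc32(inner_flow) % 16384)

# Hypothetical inner flow keys, just for illustration.
flow1 = b"vm-a->vm-b tcp 5432"
flow2 = b"vm-c->vm-d tcp 8080"
print(outer_udp_src_port(flow1), outer_udp_src_port(flow2))
# Different inner flows -> (usually) different outer source ports ->
# different ECMP paths through the core, all toward UDP/4789.
```

This is what lets a tunneled Layer 2 segment ride the same scale-out Layer 3 fabric as everything else instead of collapsing onto a single path.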
Software Defined Networking is changing the way we interact with the network, giving us faster deployment, improved control, and more direct application and service integration. With a centralized controller, you can view this as a policy-focused network.
Many big vendors will push the framework of converged infrastructure (server, storage, networking and centralized management), all from one vendor and closely linking hardware and software (HP, Dell, Oracle). Other vendors will offer a software-defined data center, in which physical hardware is virtualized, centrally managed, and treated as abstracted resource pools that can be dynamically provisioned and configured (Microsoft).