Data Center Design Requirements Part 1

We are all now moving in the cloud direction. The requirement is for large data centers that are elastic and scalable. As a result of these changes, driven by innovations and methodologies in the server and application world, the network industry is experiencing the emergence of a new operational model. Provisioning must be quick, and designers are looking to automate network configuration in a more systematic, less error-prone programmatic way. It is difficult to meet these new requirements with traditional data center designs. Traffic flow has changed, and we now have a lot of east-to-west traffic, while existing data centers were designed with a focus on north-to-south flows. East-to-west traffic requires changing the architecture from an aggregation-based model to a massive multipathing model. Referred to as Clos networks, leaf and spine designs allow the building of huge networks with reasonably sized equipment.

 

Important Fabric Features

Latency Requirements

Intra-data center traffic flows make us more concerned with latency than we were when traffic was mostly outbound. High latency between servers degrades performance and reduces the amount of traffic that can be sent between two endpoints, while low latency allows you to use as much of the available bandwidth as possible. Ultra-low latency (ULL) data center design is the race to zero: the goal is to design the fabric with the lowest possible end-to-end latency. Latency on an IP/Ethernet switched network can be as low as 50 ns.
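
To see why latency matters so much for east-west traffic, consider how the round-trip time caps what a single TCP flow can push. Below is a minimal Python sketch of the window-limited throughput calculation (throughput = window / RTT); the window size and RTT values are hypothetical, not measurements from any particular fabric.

```python
def max_throughput_gbps(window_bytes: int, rtt_seconds: float) -> float:
    """Window-limited throughput in Gbps for a single TCP flow: window / RTT."""
    return (window_bytes * 8) / rtt_seconds / 1e9

# A 64 KB window across two hypothetical intra-DC round-trip times.
for rtt_us in (10, 500):
    gbps = max_throughput_gbps(64 * 1024, rtt_us * 1e-6)
    print(f"RTT {rtt_us:>3} us -> at most {gbps:.1f} Gbps per flow")
```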

 


High-frequency trading (HFT) environments are pushing this trend; there it is imperative to deliver information from stock markets with minimal delay. HFT environments differ from most data center designs and do not support virtualization. Port counts are low, and servers are grouped into small domains, conceptually similar to how Layer 2 domains should be designed as small network pockets. Applications are grouped to match optimum traffic patterns so that many-to-one conversations are reduced. This in turn reduces the need for buffering, increasing network performance.

CX-1 cables are preferred over the more popular optical fiber.

 

Oversubscription

An optimum network design should consider and predict the possibility of congestion at critical network points. An example of unacceptable oversubscription would be a ToR switch that receives 20 Gbps of traffic from its servers but has only a 10-Gbps uplink. This results in packet drops and poor application performance.
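
The arithmetic behind that example is simple, but it is worth making explicit. The short Python sketch below computes the oversubscription ratio of a ToR switch from its server-facing and uplink bandwidth, using the 20 Gbps / 10 Gbps figures from the example above.

```python
def oversubscription_ratio(server_gbps: float, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to uplink bandwidth on a ToR switch."""
    return server_gbps / uplink_gbps

# The example from the text: 20 Gbps of server traffic into a 10-Gbps uplink.
ratio = oversubscription_ratio(server_gbps=20, uplink_gbps=10)
print(f"Oversubscription {ratio:.0f}:1")  # 2:1 -> sustained congestion and drops
```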

 

Previous data center designs were based on the 3-tier aggregation model (developed by Cisco). Now we are moving to 2-tier models. The main design point for this model is the number of ports on the core: more ports on the core result in larger networks. Similar design questions would be a) how much routing and b) how much bridging will I implement, and c) where do I insert my network services modules. We are now designing networks with many tiers – the Clos network. The concept comes from voice networks from around 1953, when voice switches were built with a crossbar design. Clos designs give optimum any-to-any connectivity. They require low-latency, non-blocking components; every element should be non-blocking. Multipath technologies deliver only a linear increase in oversubscription with each device failure and are better than architectures that degrade badly during failures.
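
To make the leaf-and-spine trade-offs concrete, here is a minimal sketch, assuming a 2-tier fabric where every leaf has one equal-speed uplink to every spine. The port counts are hypothetical; the point is how fabric size follows from spine port count and how capacity degrades linearly as spines fail.

```python
def fabric_summary(spine_ports: int, leaf_ports: int, num_spines: int):
    """Return (max leaves, total server ports, per-leaf oversubscription)."""
    max_leaves = spine_ports                      # each leaf consumes one port per spine
    uplinks_per_leaf = num_spines                 # one equal-speed link to every spine
    server_ports_per_leaf = leaf_ports - uplinks_per_leaf
    oversub = server_ports_per_leaf / uplinks_per_leaf
    return max_leaves, max_leaves * server_ports_per_leaf, oversub

leaves, servers, oversub = fabric_summary(spine_ports=32, leaf_ports=48, num_spines=4)
print(f"{leaves} leaves, {servers} server ports, {oversub:.0f}:1 oversubscription per leaf")

# Losing a spine removes 1/num_spines of the fabric's uplink capacity:
# degradation is linear rather than catastrophic.
for failed in range(3):
    print(f"{failed} spine(s) down -> {(4 - failed) / 4:.0%} of uplink capacity remains")
```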


Lossless Transport

Data Center Bridging (DCB) offers standards for flow control and queuing. Even if your data center does not use iSCSI (Internet Small Computer System Interface), TCP elephant flows benefit from lossless transport, improving data center performance. Research has shown that a large percentage of TCP flows are below 100 Mbps; the remaining small percentage are elephant flows, yet they consume a staggering 80% of all traffic inside the data center. Due to their size and the way TCP operates, when elephant flows experience packet drops they slow down, affecting network performance.
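
The slowdown is easy to quantify roughly. The sketch below uses the well-known Mathis approximation for steady-state TCP throughput under random loss (throughput ≈ MSS/RTT × 1.22/√p); the MSS, RTT, and loss rates are illustrative values only.

```python
import math

def mathis_throughput_gbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput (Gbps) under random packet loss."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss_rate)) / 1e9

# Hypothetical 1460-byte MSS and 0.5 ms RTT; only the loss rate changes.
for loss in (1e-6, 1e-4, 1e-2):
    gbps = mathis_throughput_gbps(mss_bytes=1460, rtt_s=0.0005, loss_rate=loss)
    print(f"loss rate {loss:.0e} -> roughly {gbps:.2f} Gbps")
```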

 

VM Mobility

Data centers require bridging at Layer 2 to retain the IP address during VM mobility. The TCP/IP stack currently has no separation between “who” you are and “where” you are, i.e. the IP address represents both of these functions. Future implementations with the Locator/ID Separation Protocol (LISP) divide these two roles, but until that is fully implemented, bridging is required for VM mobility.
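
As a conceptual illustration only (not the LISP wire protocol), the sketch below separates an endpoint identifier from its routing locator: a migration updates the locator while the identity, and therefore any sessions keyed on it, stays the same. The addresses are made up.

```python
# Identity (EID) -> location (RLOC); addresses below are made up for illustration.
eid_to_rloc = {"10.1.1.10": "192.0.2.1"}

def move_vm(eid: str, new_rloc: str) -> None:
    """A migration only updates the locator; the endpoint identity is untouched."""
    eid_to_rloc[eid] = new_rloc

move_vm("10.1.1.10", "198.51.100.7")   # VM lands behind a different routing locator
print(eid_to_rloc["10.1.1.10"])        # sessions keyed on the EID keep working
```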

An optimum design implements bridging without the Spanning Tree Protocol (STP). Spanning Tree reduces usable bandwidth by 50% by blocking redundant links, whereas massive multipathing technologies allow you to scale without losing half of the link bandwidth. Data centers want to move VMs without disrupting traffic flows. VMware has vMotion; Microsoft Hyper-V has Live Migration.

vMotion is also what VMware’s Distributed Resource Scheduler (DRS) relies on: load from busy hypervisors is automatically spread to underutilized hosts. Other use cases appear in cloud environments, where the data center requires dynamic workload placement and you don’t know in advance where a VM will end up. If you want to retain sessions, you have to keep the VM in the same subnet. Moving a VM across a Layer 3 boundary is too slow, as it always takes a few seconds for routing protocol convergence. In theory, you could tune timers for fast routing protocol convergence, but in practice Interior Gateway Protocols (IGPs) give you only eventual consistency.

A Layer 3 network requires many events to complete before it reaches a fully converged state. With Layer 2, as soon as the first broadcast is sent, every switch knows exactly where the endpoint has moved; there is no comparable mechanism at Layer 3. Layer 2 networks, however, result in a large broadcast domain. You may also experience suboptimal flows, because when you move a VM its Layer 3 next hop stays the same. Optimum Layer 3 forwarding is what Juniper is doing with QFabric: every Layer 3 switch has the same IP address, so any of them can serve as the next hop, which results in optimum traffic flow.

 

Deep Packet Buffers

 

Changes in Traffic


More traffic now stays inside the data center, and distributed databases generate elephant flows. Traffic is also becoming very bursty, with a lot of microbursts: bursts so short in duration that they don’t register as high link utilization, yet big enough to overflow packet buffers and cause drops. With TCP, those drops push flows back into slow start, and slow start on elephant flows is problematic for network performance.
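
A back-of-the-envelope calculation shows why a microburst can overflow a shallow buffer even though average utilization looks low. All figures in the sketch below are hypothetical.

```python
def burst_overflows(buffer_kb: float, burst_mb: float,
                    ingress_gbps: float, egress_gbps: float) -> bool:
    """True if a burst arriving faster than the egress port can drain it
    leaves a backlog larger than the switch buffer."""
    burst_seconds = (burst_mb * 8e6) / (ingress_gbps * 1e9)   # duration of the burst
    drained_mb = (egress_gbps * 1e9 * burst_seconds) / 8e6    # data sent while it arrives
    backlog_kb = max(0.0, burst_mb - drained_mb) * 1000
    return backlog_kb > buffer_kb

# A 2 MB burst fanning in at 40 Gbps towards a 10 Gbps port with 512 KB of buffer:
print(burst_overflows(buffer_kb=512, burst_mb=2, ingress_gbps=40, egress_gbps=10))  # True
```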

 

If you have traffic bursts, make sure the switch has sufficiently deep buffers.

 
