Dynamic Workload Scaling

Dynamic Workload Scaling ( DWS )

 

 

Dynamic Workload Scaling ( DWS ) 

In today’s fast-paced digital landscape, businesses strive to deliver high-quality services while minimizing costs and maximizing efficiency. To achieve this, organizations are increasingly adopting dynamic workload scaling techniques. This blog post will explore the concept of dynamic workload scaling, its benefits, and how it can help businesses optimize their operations.

  • Adjustment of resources

Dynamic workload scaling refers to the automated adjustment of computing resources to match the changing demands of a workload. This technique allows organizations to scale their infrastructure up or down in real time based on the workload requirements. By dynamically allocating resources, businesses can ensure that their systems operate optimally, regardless of varying workloads.

  • Defined Thresholds

Dynamic workload scaling is all about monitoring and distributing traffic at user-defined thresholds. Data centers are under pressure to support the ability to burst new transactions to available Virtual Machines ( VM ). In some cases, the VMs used to handle the additional load will be geographically dispersed, with both data centers connected by a Data Center Interconnect ( DCI ) link. The ability to migrate workloads within an enterprise hybrid cloud or in a hybrid cloud solution between enterprise and service provider is critical for business continuity for planned and unplanned outages.

 

Before you proceed, you may find the following post helpful:

  1. Network Security Components
  2. Virtual Data Center Design
  3. How To Scale Load Balancer
  4. Distributed Systems Observability
  5. Active Active Data Center Design
  6. Cisco Secure Firewall

 

Dynamic Workloads

Key Dynamic Workload Scaling Discussion Points:


  • Introduction to Dynamic Workload Scaling and what is involved.

  • Highlighting the details of dynamic workloads and how they can be implemented.

  • Critical points on how Cisco approaches Dynamic Workload Scaling.

  • A final note on design considerations.

 

Back to basics with OTV.

Overlay Transport Virtualization (OTV) is an IP-based technology to provide a Layer 2 extension between data centers. OTV is transport agnostic, indicating that the transport infrastructure between data centers can be dark fiber, MPLS, IP routed WAN, ATM, Frame Relay, etc.

The sole prerequisite is that the data centers must have IP reachability between them. OTV permits multipoint services for Layer 2 extension and separated Layer 2 domains between data centers, maintaining an IP-based interconnection’s fault-isolation, resiliency, and load-balancing benefits.

Unlike traditional Layer 2 extension technologies, OTV introduces the Layer 2 MAC routing concept. The MAC-routing concept enables a control-plane protocol to advertise the reachability of Layer 2 MAC addresses. As a result, the MAC-routing idea has enormous advantages over traditional Layer 2 extension technologies that traditionally leveraged data plane learning, flooding Layer 2 traffic across the transport infrastructure.

 

Cisco and Dynamic Workloads

A new technology introduced by Cisco, called Dynamic Workload Scaling ( DWS ), satisfies the requirement of dynamically bursting workloads based on user-defined thresholds to available resource pools ( VMs ). It is tightly integrated with Cisco Application Control Engine ( ACE ) and Cisco’s Dynamic MAC-in-IP encapsulation technology known as Overlay Transport Virtualization ( OTV ), enabling resource distribution across Data Center sites. OTV provides the LAN extension method that keeps the virtual machine’s state as it passes locations, and ACE delivers the load-balancing functionality.

 

dynamic workloads
Dynamic workload and dynamic workload scaling.

 

Dynamic workload scaling: How does it work?  

  • DWS monitors the VM capacity for an application and expands that application to another resource pool during periods of peak usage. We provide a perfect solution for distributed applications among geographically dispersed data centers.
  • DWS uses the ACE and OTV technologies to build a MAC table. It monitors the local MAC entries and those located via the OTV link to determine if a MAC entry is considered “Local” or “Remote.”
  • The ACE monitors the utilization of the “local” VM. From these values, the ACE can compute the average load of the local Data Center.
  • DWS uses two APIs. One is to monitor the server load information polled from VMware’s VCenter, and another API is to poll OTV information from the Nexus 7000.
  • During normal load conditions, when the data center is experiencing low utilization, the ACE can load incoming balance traffic to the local VMs.
  • However, when the data center experiences high utilization and crosses the predefined thresholds, the ACE will add the “remote” VM to its load-balancing mechanism.
workload scaling
Workload scaling and its operations.

 

Dynamic workload scaling: Design considerations

During congestion, the ACE adds the “remote” VM to its load-balancing algorithm. The remote VM placed in the secondary data center could add additional load on the DCI. Essentially hair-pining traffic for some time as ingress traffic for the “remote” VM continues to flow via the primary data center. DWS should be used with Locator Identity Separation Protocol ( LISP ) to enable automatic move detection and optimal ingress path selection.

 

Benefits of Dynamic Workload Scaling:

1. Improved Efficiency:

Dynamic workload scaling enables businesses to allocate resources precisely as needed, eliminating the inefficiencies associated with over-provisioning or under-utilization. Organizations can optimize resource utilization and reduce operational costs by automatically scaling resources up during periods of high demand and scaling them down during periods of low demand.

2. Enhanced Performance:

With dynamic workload scaling, businesses can effectively handle sudden spikes in workload without compromising performance. Organizations can maintain consistent service levels and ensure smooth operations during peak times by automatically provisioning additional resources when required. This leads to improved customer satisfaction and retention.

3. Cost Optimization:

Traditional static infrastructure requires businesses to provision resources based on anticipated peak workloads, often leading to over-provisioning and unnecessary costs. Dynamic workload scaling allows organizations to provision resources on demand, resulting in cost savings by paying only for the resources utilized. Additionally, by scaling down resources during periods of low demand, businesses can further reduce operational expenses.

4. Scalability and Flexibility:

Dynamic workload scaling allows businesses to scale their operations as needed. Whether expanding to accommodate business growth or handling seasonal fluctuations, organizations can easily adjust their resources to match the workload demands. This scalability and flexibility enable businesses to respond quickly to changing market conditions and stay competitive.

Dynamic workload scaling has emerged as a crucial technique for optimizing efficiency and performance in today’s digital landscape. By dynamically allocating computing resources based on workload requirements, businesses can improve efficiency, enhance performance, optimize costs, and achieve scalability. Implementing robust monitoring systems, automation, and leveraging cloud computing services are critical steps toward successful dynamic workload scaling. Organizations can stay agile and competitive and deliver exceptional customer service by adopting this approach.

Key Features of Cisco Dynamic Workload Scaling:

Intelligent Automation:

Cisco’s dynamic workload scaling solutions leverage intelligent automation capabilities to monitor real-time workload demands. By analyzing historical data and utilizing machine learning algorithms, Cisco’s automation tools can accurately predict future workload requirements and proactively scale resources accordingly.

Application-Aware Scaling:

Cisco’s dynamic workload scaling solutions are designed to understand the unique requirements of different applications. By utilizing application-aware scaling, Cisco can allocate resources based on the specific needs of each workload, ensuring optimal performance and minimizing resource wastage.

Seamless Integration:

Cisco’s dynamic workload scaling solutions seamlessly integrate with existing IT infrastructures, allowing businesses to leverage their current investments. This ensures a smooth transition to dynamic workload scaling without extensive infrastructure overhauls.

Conclusion:

In today’s dynamic business environment, efficiently managing and scaling workloads is critical for organizational success. Cisco’s dynamic workload scaling solutions provide businesses with the flexibility, performance optimization, and cost savings necessary to thrive in an ever-changing landscape. By leveraging intelligent automation, application-aware scaling, and seamless integration, Cisco empowers organizations to adapt and scale their workloads effortlessly. Embrace Cisco’s dynamic workload scaling and unlock the full potential of your business operations.

 

WAN Design Requirements

LISP Protocol and VM Mobility

LISP Protocol and VM Mobility

The networking world is constantly evolving, with new technologies emerging to meet the demands of an increasingly connected world. One such technology that has gained significant attention is the LISP protocol. In this blog post, we will delve into the intricacies of the LISP protocol, exploring its purpose, benefits, and how it bridges the gap in modern networking and its use case with VM mobility.

LISP, which stands for Locator/ID Separation Protocol, is a network protocol that separates the identity of a device from its location. Unlike traditional IP addressing schemes, which rely on a tightly coupled relationship between the IP address and the device's physical location, LISP separates these two aspects, allowing for more flexibility and scalability in network design.

LISP, in simple terms, is a network protocol that separates the location of an IP address (Locator) from its identity (Identifier). By doing so, it provides enhanced flexibility, scalability, and security in managing network traffic. LISP accomplishes this by introducing two key components: the Mapping System (MS) and the Tunnel Router (TR). The MS maintains a database of mappings between Locators and Identifiers, while the TR encapsulates packets using these mappings for efficient routing.

VM mobility refers to the seamless movement of virtual machines across physical hosts or data centers. LISP Protocol plays a crucial role in enabling this mobility by decoupling the VM's IP address from its location. When a VM moves to a new host or data center, LISP dynamically updates the mappings in the MS, ensuring uninterrupted connectivity. By leveraging LISP, organizations can achieve live migration of VMs, load balancing, and disaster recovery with minimal disruption.

The combination of LISP Protocol and VM mobility brings forth a plethora of advantages. Firstly, it enhances network scalability by reducing the impact of IP address renumbering. Secondly, it enables efficient load balancing by distributing VMs across different hosts. Thirdly, it simplifies disaster recovery strategies by facilitating VM migration to remote data centers. Lastly, LISP empowers organizations with the flexibility to seamlessly scale their networks to meet growing demands.

While LISP Protocol and VM mobility offer significant benefits, there are a few challenges to consider. These include the need for proper configuration, compatibility with existing network infrastructure, and potential security concerns. However, the networking industry is consistently working towards addressing these challenges and further improving the LISP Protocol for broader adoption and seamless integration.

The combination of LISP Protocol and VM mobility opens up new horizons in network virtualization and mobility. By decoupling the IP address from its physical location, LISP enables organizations to achieve greater flexibility, scalability, and efficiency in managing network traffic. As the networking landscape continues to evolve, embracing LISP Protocol and VM mobility will undoubtedly pave the way for a more dynamic and agile networking infrastructure.

Highlights: LISP Protocol and VM Mobility

Understanding LISP Protocol

– The LISP protocol, short for Locator/Identifier Separation Protocol, is a network architecture that separates the identity of a device (identifier) from its location (locator). It provides a scalable solution for routing and mobility while simplifying network design and reducing overhead. By decoupling the identifier and locator roles, LISP enables seamless communication and mobility across networks.

– Virtual machine mobility revolutionized the way we manage and deploy applications. With VM mobility, we can move virtual machines between physical hosts without interrupting services or requiring manual reconfiguration. This flexibility allows for dynamic resource allocation, load balancing, and disaster recovery. However, VM mobility also presents challenges in maintaining consistent network connectivity during migrations.

**LISP & VM Mobility**

The integration of LISP protocol and VM mobility brings forth a powerful combination. LISP provides a scalable and efficient routing infrastructure, while VM mobility enables dynamic movement of virtual machines. By leveraging LISP’s locator/identifier separation, VMs can maintain their identity while seamlessly moving across different networks or physical hosts. This synergy enhances network agility, simplifies management, and optimizes resource utilization.

The benefits of combining LISP and VM mobility are evident in various use cases. Data centers can achieve seamless workload mobility and improved load balancing. Service providers can enhance their network scalability and simplify multi-tenancy. Enterprises can optimize their network infrastructure for cloud computing and enable efficient disaster recovery strategies. The possibilities are vast, and the benefits are substantial.

How Does LISP Work

Locator Identity Separation Protocol ( LISP ) provides a set of functions that allow Endpoint identifiers ( EID ) to be mapped to an RLOC address space. The mapping between these two endpoints offers the separation of IP addresses into two numbering schemes ( similar to the “who” and the “where” analogy ), offering many traffic engineering and IP mobility benefits for the geographic dispersion of data centers beneficial for VM mobility.

LISP Components

The LISP protocol operates by creating a mapping system that separates the device’s Endpoint Identifier (EID), from its location, the Routing Locator (RLOC). This separation is achieved using a distributed database called the LISP Mapping System (LMS), which maintains the mapping between EIDs and RLOCs. When a packet is sent to a destination EID, it is encapsulated and routed based on the RLOC, allowing for efficient and scalable communication.

Before you proceed, you may find the following posts helpful:

  1. LISP Hybrid Cloud 
  2. LISP Control Plane
  3. Triangular Routing
  4. Active Active Data Center Design
  5. Application Aware Networking

 

LISP Protocol and VM Mobility

Virtualization

1- Virtualization can be applied to subsystems such as disks and a whole machine. A virtual machine (VM) is implemented by adding a software layer to an actual device to sustain the desired virtual machine’s architecture. In general, a virtual machine can circumvent real compatibility and hardware resource limitations to enable a more elevated degree of software portability and flexibility.

2- In the dynamic world of modern computing, the ability to seamlessly move virtual machines (VMs) between different physical hosts has become a critical aspect of managing resources and ensuring optimal performance. This blog post explores VM mobility and its significance in today’s rapidly evolving computing landscape.

3- VM mobility refers to transferring a virtual machine from one physical host to another without disrupting operation. Virtualization technologies such as hypervisors make this capability possible, enabling the abstraction of hardware resources and allowing multiple VMs to coexist on a single physical machine.

LISP and VM Mobility

The Locator/Identifier Separation Protocol (LISP) is an innovative networking architecture that decouples the identity (Identifier) of a device or VM from its location (Locator). By separating the two, LISP provides a scalable and flexible solution for VM mobility.

**How LISP Enhances VM Mobility**

1. Improved Scalability:

LISP introduces a level of indirection by assigning Endpoint Identifiers (EIDs) to VMs. These EIDs act as unique identifiers, allowing VMs to retain their identity even when moved to different locations. This enables enterprises to scale their VM deployments without worrying about the limitations imposed by the underlying network infrastructure.

2. Seamless VM Mobility:

LISP simplifies moving VMs by abstracting the location information using Routing Locators (RLOCs). When a VM is migrated, LISP updates the mapping between the EID and RLOC, allowing the VM to maintain uninterrupted connectivity. This eliminates the need for complex network reconfigurations, reducing downtime and improving overall agility.

3. Load Balancing and Disaster Recovery:

LISP enables efficient load balancing and disaster recovery strategies by providing the ability to distribute VMs across multiple physical hosts or data centers. With LISP, VMs can be dynamically moved to optimize resource utilization or to ensure business continuity in the event of a failure. This improves application performance and enhances the overall resilience of the IT infrastructure.

4. Interoperability and Flexibility:

LISP is designed to be interoperable with existing network infrastructure, allowing organizations to gradually adopt the protocol without disrupting their current operations. It integrates seamlessly with IPv4 and IPv6 networks, making it a future-proof solution for VM mobility.

Basic LISP Traffic flow

A device ( S1 ) initiates a connection and wants to communicate with another external device ( D1 ). D1 is located in a remote network. S1 will create a packet with the EID of S1 as the source IP address and the EID of D1 as the destination IP address. As the packets flow to the network’s edge on their way to D1, they are met by an Ingress Tunnel Router ( ITR ).

The ITR maps the destination EID to a destination RLOC and then encapsulates the original packet with an additional header with the source IP address of the ITR RLOC and the destination IP address of the RLOC of an Egress Tunnel Router ( ETR ). The ETR is located on the remote site next to the destination device D1.

LISP protocol

The magic is how these mappings are defined, especially regarding VM mobility. There is no routing convergence, and any changes to the mapping systems are unknown to the source and destination hosts. We are offering complete transparency.

LISP Terminology

LISP namespaces:

LSP Name Component

LISP Protocol Description 

End-point Identifiers  ( EID ) Addresses

The EID is allocated to an end host from an EID-prefix block. The EID associates where a host is located and identifies endpoints. The remote host obtains a destination the same way it obtains a normal destination address today, for example through DNS or SIP. The procedure a host uses to send IP packets does not change. EIDs are not routable.

Route Locator ( RLOC ) Addresses

The RLOC is an address or group of prefixes that map to an Egress Tunnel Router ( ETR ). Reachability within the RLOC space is achieved by traditional routing methods. The RLOC address must be routable.

LISP site devices:

LISP Component

LISP Protocol Description 

Ingress Tunnel Router ( ITR )

An ITR is a LISP Site device that sits in a LISP site and receives packets from internal hosts. It in turn encapsulates them to remote LISP sites. To determine where to send the packet the ITR performs an EID-to-RLOC mapping lookup. The ITR should be the first-hop or default router within a site for the source hosts.

Egress Tunnel Router ( ETR )

An ETR is a LISP Site device that receives LISP-encapsulated IP packets from the Internet, decapsulates them, and forwards them to local EIDs at the site. An ETR only accepts an IP packet where the destination address is the “outer” IP header and is one of its own configured RLOCs. The ETR should be the last hop router directly connected to the destination.

LISP infrastructure devices:

LISP Component Name

LISP Protocol Description

Map-Server ( MS )

The map server contains the EID-to-RLOC mappings and the ETRs register their EIDs to the map server. The map-server advertises these, usually as an aggregate into the LISP mapping system.

Map-Resolver ( MR )

When resolving EID-to-RLOC mappings the ITRs send LISP Map-Requests to Map-Resolvers. The Map-Resolver is typically an Anycast address. This improves the mapping lookup performance by choosing the map-resolver that is topologically closest to the requesting ITR.

Proxy ITR ( PITR )

Provides connectivity to non-LISP sites. It acts like an ITR but does so on behalf of non-LISP sites.

Proxy ETR ( PETR )

Acts like an ETR but does so on behalf of LISP sites that want to communicate to destinations at non-LISP sites.

VM Mobility

LISP Host Mobility

LISP VM Mobility ( LISP Host Mobility ) functionality allows any IP address ( End host ) to move from its subnet to either a) a completely different subnet, known as “across subnet,” or b) an extension of its subnet in a different location, known as “extended subnet,” while keeping its original IP address.

When the end host carries its own Layer 3 address to the remote site, and the prefix is the same as the remote site, it is known as an “extended subnet.” Extended subnet mode requires a Layer 2 LAN extension. On the other hand, when the end hosts carry a different network prefix to the remote site, it is known as “across subnets.” When this is the case, a Layer 2 extension is not needed between sites.

LAN extension considerations

LISP does not remove the need for a LAN extension if a VM wants to perform a “hot” migration between two dispersed sites. The LAN extension is deployed to stretch a VLAN/IP subnet between separate locations. LISP complements LAN extensions with efficient move detection methods and ingress traffic engineering.

LISP works with all LAN extensions – whether back-to-back vPC and VSS over dark fiber, VPLS, Overlay Transport Virtualization ( OTV ), or Ethernet over MPLS/IP. LAN extension best practices should still be applied to the data center edges. These include but are not limited to – End-to-end Loop Prevention and STP isolation.

A LISP site with a LAN extension extends a single site across two physical data center sites. This is because the extended subnet functionality of LISP makes two DC sites a single LISP site. On the other hand, when LISP is deployed without a LAN extension, a single LISP site is not extended between two data centers, and we end up having separate LISP sites.

LISP extended subnet

VM mobility
VM mobility: LISP protocol and extended subnets

To avoid asymmetric traffic handling, the LAN extension technology must filter Hot Standby Router Protocol ( HSRP ) HELLO messages across the two data centers. This creates an active-active HSRP setup. HSRP localization optimizes egress traffic flows. LISP optimizes ingress traffic flows.

The default gateway and virtual MAC address must remain consistent in both data centers. This is because the moved VM will continue to send to the same gateway MAC address. This is accomplished by configuring the same HSRP gateway IP address and group in both data centers. When an active-active HSRP domain is used, re-ARP is not needed during mobility events.

The LAN extension technology must have multicast enabled to support the proper operation of LISP. Once a dynamic EID is detected, the multicast group IP addresses send a map-notify message by the xTR to all other xTRs. The multicast messages are delivered leveraging the LAN extension.

LISP across subnet 

VM mobility
VM mobility: LISP protocol across Subnets

LISP across subnets requires the mobile VM to access the same gateway IP address, even if they move across subnets. This will prevent egress traffic triangulation back to the original data center. This can be achieved by manually setting the vMAC address associated with the HSRP group to be consistent across sites.

Proxy ARP must be configured under local and remote SVIs to correctly handle new ARP requests generated by the migrated workload. With this deployment, there is no need to deploy a LAN extension to stretch VLAN/IP between sites. This is why it is considered to address “cold” migration scenarios, such as Disaster Recovery ( DR ) or cloud bursting and workload mobility according to demands.

**Benefits of LISP**

1. Scalability: By separating the identifier from the location, LISP provides a scalable solution for network design. It allows for hierarchical addressing, reducing the size of the global routing table and enabling efficient routing across large networks.

2. Mobility: LISP’s separation of identity and location mainly benefits mobile devices. As devices move between networks, their EIDs remain constant while the RLOCs are updated dynamically. This enables seamless mobility without disrupting ongoing connections.

3. Multihoming: LISP allows a device to have multiple RLOCs, enabling multihoming capabilities without complex network configurations. This ensures redundancy, load balancing, and improved network reliability.

4. Security: LISP provides enhanced security features, such as cryptographic authentication and integrity checks, to ensure the integrity and authenticity of the mapping information. This helps mitigate potential attacks, such as IP spoofing.

**Applications of LISP**

1. Data Center Interconnection: LISP can interconnect geographically dispersed data centers, providing efficient and scalable communication between locations.

2. Internet of Things (IoT): With the exponential growth of IoT devices, LISP offers an efficient solution for managing these devices’ addressing and communication needs, ensuring seamless connectivity in large-scale deployments.

3. Content Delivery Networks (CDNs): LISP can optimize content delivery by allowing CDNs to cache content closer to end-users, reducing latency and improving overall performance.

Closing Points: LISP and VM Mobility

LISP is a network architecture and protocol that separates the two functions of IP addresses: identifying endpoints and routing traffic. By doing so, it allows for more efficient routing and a reduction in the complexity of network management. This separation is fundamental to enabling VM mobility, as it allows VMs to maintain consistent identities even as their physical locations change.

One of the primary benefits of LISP VM Mobility is the enhanced flexibility it provides. Businesses can move VMs across different data centers or cloud environments without having to reconfigure their network settings. This capability is particularly beneficial for disaster recovery scenarios, load balancing, and maintenance operations. Additionally, LISP VM Mobility can lead to cost savings by optimizing resource utilization and reducing the need for redundant infrastructure.

To implement LISP VM Mobility, organizations need to ensure that their network infrastructure supports the LISP protocol. This may involve updating network equipment and software to be compatible with LISP. Additionally, IT teams should be trained to manage and troubleshoot LISP-enabled environments effectively. By taking these steps, businesses can harness the full potential of LISP VM Mobility to drive innovation and efficiency.

Despite its advantages, LISP VM Mobility is not without challenges. Organizations must carefully plan the transition to ensure compatibility and minimize disruptions. Security is another critical consideration, as the dynamic nature of VM mobility can introduce new vulnerabilities. Implementing robust security measures, such as encryption and access controls, is essential to safeguarding data as it moves across networks.

 

 

Summary: LISP Protocol and VM Mobility

LISP (Locator/ID Separation Protocol) and VM (Virtual Machine) Mobility are two powerful technologies that have revolutionized the world of networking and virtualization. In this blog post, we delved into the intricacies of LISP and VM Mobility, exploring their benefits, use cases, and seamless integration.

Understanding LISP

LISP, a groundbreaking protocol, separates the role of a device’s identity (ID) from its location (Locator). By decoupling these two aspects, LISP enables efficient routing and scalable network architectures. It provides a solution to overcome the limitations of traditional IP-based routing, enabling enhanced mobility and flexibility in network design.

Unraveling VM Mobility

VM Mobility, on the other hand, refers to the ability to seamlessly move virtual machines across different physical hosts or data centers without disrupting their operations. This technology empowers businesses with the flexibility to optimize resource allocation, enhance resilience, and improve disaster recovery capabilities.

The Synergy between LISP and VM Mobility

When LISP and VM Mobility join forces, they create a powerful combination that amplifies the benefits of both technologies. By leveraging LISP’s efficient routing and location independence, VM Mobility becomes even more agile and robust. With LISP, virtual machines can be effortlessly moved between hosts or data centers, maintaining seamless connectivity and preserving the user experience.

Real-World Applications

Integrating LISP and VM Mobility opens up various possibilities across various industries. In the healthcare sector, for instance, virtual machines hosting critical patient data can be migrated between locations without compromising accessibility or security. Similarly, in cloud computing, LISP and VM Mobility enable dynamic resource allocation, load balancing, and efficient disaster recovery strategies.

Conclusion:

In conclusion, combining LISP and VM Mobility ushers a new era of network agility and virtual machine management. Decoupling identity and location through LISP empowers organizations to seamlessly move virtual machines across different hosts or data centers, enhancing flexibility, scalability, and resilience. As technology continues to evolve, LISP and VM Mobility will undoubtedly play a crucial role in shaping the future of networking and virtualization.

Green data center with eco friendly electricity usage tiny person concept. Database server technology for file storage hosting with ecological and carbon neutral power source vector illustration.

Data Center Design with Active Active design

Active Active Data Center Design

In today's digital age, where businesses heavily rely on uninterrupted access to their applications and services, data center design plays a pivotal role in ensuring high availability. One such design approach is the active-active design, which offers redundancy and fault tolerance to mitigate the risk of downtime. This blog post will explore the active-active data center design concept and its benefits.

Active-active data center design refers to a configuration where two or more data centers operate simultaneously, sharing the load and providing redundancy for critical systems and applications. Unlike traditional active-passive setups, where one data center operates in standby mode, the active-active design ensures that both are fully active and capable of handling the entire workload.

Enhanced Reliability: Redundant data centers offer unparalleled reliability by minimizing the impact of hardware failures, power outages, or network disruptions. When a component or system fails, the redundant system takes over seamlessly, ensuring uninterrupted connectivity and preventing costly downtime.

Scalability and Flexibility: With redundant data centers, businesses have the flexibility to scale their operations effortlessly. Companies can expand their infrastructure without disrupting ongoing operations, as redundant systems allow for seamless integration and expansion.

Disaster Recovery: Redundant data centers play a crucial role in disaster recovery strategies. By having duplicate systems in geographically diverse locations, businesses can recover quickly in the event of natural disasters, power grid failures, or other unforeseen events. Redundancy ensures that critical data and services remain accessible, even during challenging circumstances.

Dual Power Sources: Redundant data centers rely on multiple power sources, such as grid power and backup generators. This ensures that even if one power source fails, the infrastructure continues to operate without disruption.

Network Redundancy: Network redundancy is achieved by setting up multiple network paths, routers, and switches. In case of a network failure, traffic is automatically redirected to alternative paths, maintaining seamless connectivity.

Data Replication: Redundant data centers employ data replication techniques to ensure that data is duplicated and synchronized across multiple systems. This safeguards against data loss and allows for quick recovery in case of a system failure.

Highlights: Active Active Data Center Design

The Role of Data Centers

An enterprise’s data center houses the computational power, storage, and applications needed to run its operations. All content is sourced or passed through the data center infrastructure in the IT architecture. Performance, resiliency, and scalability must be considered when designing the data center infrastructure.

Furthermore, the data center design should be flexible so that new services can be deployed and supported quickly. The many considerations required for such a design are port density, access layer uplink bandwidth, actual server capacity, and oversubscription.

A few short years ago, data centers were very different from what they are today. In a multi-cloud environment, virtual networks have replaced physical servers that support applications and workloads across pools of physical infrastructure. Nowadays, data exists across multiple data centers, the edge, and public and private clouds.

Communication between these locations must be possible in the on-premises and cloud data centers. Public clouds are also collections of data centers. In the cloud, applications use the cloud provider’s data center resources.

Redundant data centers

Redundant data centers are essentially two or more in different physical locations. This enables organizations to move their applications and data to another data center if they experience an outage. This also allows for load balancing and scalability, ensuring the organization’s services remain available.

Redundant data centers are generally located in geographically dispersed locations. This ensures that if one of the data centers experiences an issue, the other can take over, thus minimizing downtime. These data centers should also be connected via a high-speed network connection, such as a dedicated line or virtual private network, to allow seamless data transfers between the locations.

Implementing redundant data center BGP involves several crucial steps.

– Firstly, establishing a robust network architecture with multiple data centers interconnected via high-speed links is essential.

– Secondly, configuring BGP routers in each data center to exchange routing information and maintain consistent network topologies is crucial. Additionally, techniques such as Anycast IP addressing and route reflectors further enhance redundancy and fault tolerance.

**Benefits of Active-Active Data Center Design**

1. Enhanced Redundancy: With active-active design, organizations can achieve higher levels of redundancy by distributing the workload across multiple data centers. This redundancy ensures that even if one data center experiences a failure or maintenance downtime, the other data center seamlessly takes over, minimizing the impact on business operations.

2. Improved Performance and Scalability: Active-active design enables organizations to scale their infrastructure horizontally by distributing the load across multiple data centers. This approach ensures that the workload is evenly distributed, preventing any single data center from becoming a performance bottleneck. It also allows businesses to accommodate increasing demands without compromising performance or user experience.

3. Reduced Downtime: The active-active design significantly reduces the risk of downtime compared to traditional architectures. In the event of a failure, the workload can be immediately shifted to the remaining active data center, ensuring continuous availability of critical services. This approach minimizes the impact on end-users and helps organizations maintain their reputation for reliability.

4. Disaster Recovery Capabilities: Active-active data center design provides a robust disaster recovery solution. Organizations can ensure that their critical systems and applications remain operational despite a catastrophic failure at one location by having multiple geographically distributed data centers. This design approach minimizes the risk of data loss and provides a seamless failover mechanism.

**Implementation Considerations:**

Implementing an active-active data center design requires careful planning and consideration of various factors. Here are some key considerations:

1. Network Design: A robust and resilient network infrastructure is crucial for active-active data center design. Implementing load balancers, redundant network links, and dynamic routing protocols can help ensure seamless failover and optimal traffic distribution.

2. Data Synchronization: Organizations need to implement effective data synchronization mechanisms to maintain data consistency across multiple data centers. This may involve deploying real-time replication, distributed databases, or file synchronization protocols.

3. Application Design: Applications must be designed to be aware of the active-active architecture. They should be able to distribute the workload across multiple data centers and seamlessly switch between them in case of failure. Application-level load balancing and session management become critical in this context.

Active-active data center design offers organizations a robust solution for high availability and fault tolerance. Businesses can ensure uninterrupted access to critical systems and applications by distributing the workload across multiple data centers. The enhanced redundancy, improved performance, reduced downtime, and disaster recovery capabilities make active-active design an ideal choice for organizations striving to provide seamless and reliable services in today’s digital landscape.

Network Connectivity Center

### What is Google’s Network Connectivity Center?

Google Network Connectivity Center (NCC) is a centralized platform that enables enterprises to manage their global network connectivity. It integrates with Google Cloud’s global infrastructure, offering a unified interface to monitor, configure, and optimize network connections. Whether you are dealing with on-premises data centers, remote offices, or multi-cloud environments, NCC provides a streamlined approach to network management.

### Key Features of NCC

Google’s NCC is packed with features that make it an indispensable tool for network administrators. Here are some key highlights:

– **Centralized Management**: NCC offers a single pane of glass for monitoring and managing all network connections, reducing complexity and improving efficiency.

– **Scalability**: Built on Google Cloud’s robust infrastructure, NCC can scale effortlessly to accommodate growing network demands.

– **Automation and Intelligence**: With built-in automation and intelligent insights, NCC helps in proactive network management, minimizing downtime and optimizing performance.

– **Integration**: Seamlessly integrates with other Google Cloud services and third-party tools, providing a cohesive ecosystem for network operations.

Understanding Network Tiers

Network tiers refer to the different levels of performance and cost offered by cloud service providers. They allow businesses to choose the most suitable network option based on their specific needs. Google Cloud offers two network tiers: Standard and Premium.

The Standard Tier provides businesses with a cost-effective network solution that meets their basic requirements. It offers reliable performance and ensures connectivity within Google Cloud services. With its lower costs, the Standard Tier is an excellent choice for businesses with moderate network demands.

For businesses that demand higher levels of performance and reliability, the Premium Tier is the way to go. This tier offers optimized routes, reduced latency, and enhanced global connectivity. With its advanced features, the Premium Tier ensures optimal network performance for mission-critical applications and services.

Understanding VPC Networking

VPC networking is the backbone of a cloud infrastructure, providing a private and secure environment for your resources. In Google Cloud, a VPC network can be thought of as your own virtual data center in the cloud. It allows you to define IP ranges, subnets, and firewall rules, empowering you with complete control over your network architecture.

Google Cloud’s VPC networking offers a plethora of features that enhance network management and security. From custom IP address ranges to subnet creation and route configuration, you have the flexibility to design your network infrastructure according to your specific needs. Additionally, VPC peering and VPN connectivity options enable seamless communication with other networks, both within and outside of Google Cloud.

Understanding VPC Peering

VPC Peering enables you to connect VPC networks across projects or organizations. It allows for secure communication and seamless access to resources between peered networks. By leveraging VPC Peering, you can create a virtual network fabric across various environments.

VPC Peering offers several advantages. First, it simplifies network architecture by eliminating the need for complex VPN setups or public IP addresses. Second, it provides low-latency and high-bandwidth connections between VPC networks, ensuring fast and reliable data transfer. Third, it lets you share resources across peering networks, such as databases or storage, promoting collaboration and resource optimization.

Understanding HA VPN

HA VPN, short for High Availability Virtual Private Network, is a feature provided by Google Cloud that ensures continuous and reliable connectivity between your on-premises network and your Google Cloud Virtual Private Cloud (VPC) network. It is designed to minimize downtime and provide fault tolerance by establishing redundant VPN tunnels.

To set up HA VPN, follow a few simple steps. First, ensure that you have a supported on-premises VPN gateway. Then, configure the necessary settings to create a VPN gateway in your VPC network. Next, configure the on-premises VPN gateway to establish a connection with the HA VPN gateway. Finally, validate the connectivity and ensure all traffic is routed securely through the VPN tunnels.

Implementing HA VPN offers several benefits for your network infrastructure. First, it enhances reliability by providing automatic failover in case of VPN tunnel or gateway failures, ensuring uninterrupted connectivity for your critical workloads. Second, HA VPN reduces the risk of downtime by offering a highly available and redundant connection. Third, it simplifies network management by centralizing the configuration and monitoring of VPN connections.

On-premises Data Centers

Understanding Nexus 9000 Series VRRP

Nexus 9000 Series VRRP is a protocol that allows multiple routers to work together as a virtual router, providing redundancy and seamless failover in the event of a failure. These routers ensure continuous network connectivity by sharing a virtual IP address, improving network reliability.

With Nexus 9000 Series VRRP, organizations can achieve enhanced network availability and minimize downtime. Utilizing multiple routers can eliminate single points of failure and maintain uninterrupted connectivity. This is particularly crucial in data center environments, where downtime can lead to significant financial losses and reputational damage.

Configuring Nexus 9000 Series VRRP involves several steps. First, a virtual IP address must be defined and assigned to the VRRP group. Next, routers participating in VRRP must be configured with their respective priority levels and advertisement intervals. Additionally, tracking mechanisms can monitor the availability of specific network interfaces and adjust the VRRP priority dynamically.

High Availability and BGP

High availability refers to the ability of a system or network to remain operational and accessible even during failures or disruptions. BGP is pivotal in achieving high availability by employing various mechanisms and techniques.

BGP Multipath is a feature that allows for the simultaneous use of multiple paths to reach a destination. BGP can use various paths to ensure redundancy, load balancing, and enhanced network availability.

BGP Route Reflectors are used in large-scale networks to alleviate the full-mesh requirement between BGP peers. By simplifying the BGP peering configuration, route reflectors enhance scalability and fault tolerance, contributing to high availability.

BGP Anycast is a technique that enables multiple servers or routers to share the same IP address. This method routes traffic to the nearest or least congested node, improving response times and fault tolerance.

BGP AS Prepend

Understanding BGP Route Reflection

BGP route reflection is used in large-scale networks to reduce the number of full-mesh peerings required in a BGP network. It allows a BGP speaker to reflect routes received from one set of peers to another set of peers, eliminating the need for every peer to establish a direct connection with every other peer. Using route reflection, network administrators can simplify their network topology and improve its scalability.

The network must be divided into two main components to implement BGP route reflection: route reflectors and clients. Route reflectors serve as the central point for route reflection, while clients are the BGP speakers who establish peering sessions with the route reflectors. It is essential to carefully plan the placement of route reflectors to ensure optimal routing and redundancy in the network.

Route Reflector Hierarchy and Scaling

In large-scale networks, a hierarchy of route reflectors can be implemented to enhance scalability further. This involves using multiple route reflectors, where higher-level route reflectors reflect routes received from lower-level route reflectors. This hierarchical approach distributes the route reflection load and reduces the number of peering sessions required for each BGP speaker, thus improving scalability even further.

Understanding BGP Multipath

BGP multipath enables the selection and utilization of multiple equal-cost paths for forwarding traffic. Traditionally, BGP would only utilize a single best path, resulting in suboptimal network utilization. With multipath, network administrators can maximize link utilization, reduce congestion, and achieve load balancing across multiple paths.

One of the primary advantages of BGP multipath is enhanced network resilience. By utilizing multiple paths, networks become more fault-tolerant, as traffic can be rerouted in the event of link failures or congestion. Additionally, multipath can improve overall network performance by distributing traffic evenly across available paths, preventing bottlenecks, and ensuring efficient resource utilization.

Expansion and scalability

Expanding capacity is straightforward if a link is oversubscribed (more traffic than can be aggregated on the active link simultaneously). Expanding every leaf switch’s uplinks is possible, adding interlayer bandwidth and reducing oversubscription by adding a second spine switch. New leaf switches can be added by connecting them to every spine switch and configuring them as network switches if device port capacity becomes a concern. Scaling the network is made more accessible through ease of expansion. A nonblocking architecture can be achieved without oversubscription between the lower-tier switches and their uplinks.

Defining an active-active data center strategy isn’t easy when you talk to network, server, and compute teams that don’t usually collaborate when planning their infrastructure. An active-active Data center design requires a cohesive technology stack from end to end. Establishing the idea usually requires an enterprise-level architecture drive. In addition, it enables the availability and traffic load sharing of applications across DCs with the following use cases.

  • Business continuity
  • Mobility and load sharing
  • Consistent policy and fast provisioning capability across

Understanding Spanning Tree Protocol (STP)

Spanning Tree Protocol (STP) is a fundamental mechanism to prevent loops in Ethernet networks. It ensures that only one active path exists between two network devices, preventing broadcast storms and data collisions. STP achieves this by creating a loop-free logical topology known as the spanning tree. But what about MST? Let’s find out.

As networks grow and become more complex, a single spanning tree may not be sufficient to handle the increasing traffic demands. This is where Spanning Tree MST comes into play. MST allows us to divide the network into multiple logical instances, each with its spanning tree. By doing so, we can distribute the traffic load more efficiently, achieving better performance and redundancy.

MST operates by grouping VLANs into multiple instances, known as regions. Each region has its spanning tree, allowing for independent configuration and optimization. MST relies on the concept of a Root Bridge, which acts as the central point for each instance. By assigning different VLANs to separate cases, we can control traffic flow and minimize the impact of network changes.

Example: Understanding UDLD

UDLD is a layer 2 protocol designed to detect and mitigate unidirectional links in a network. It operates by exchanging protocol packets between neighboring devices to verify the bidirectional nature of a link. UDLD prevents one-way communication and potential network disruptions by ensuring traffic flows in both directions.

UDLD helps maintain network reliability by identifying and addressing unidirectional links promptly. It allows network administrators to proactively detect and resolve potential issues before they can impact network performance. This proactive approach minimizes downtime and improves overall network availability.

Attackers can exploit unidirectional links to gain unauthorized access or launch malicious activities. UDLD acts as a security measure by ensuring bidirectional communication, making it harder for adversaries to manipulate network traffic or inject harmful packets. By safeguarding against such threats, UDLD strengthens the network’s security posture.

Understanding Port Channel

Port Channel, also known as Link Aggregation, is a mechanism that allows multiple physical links to be combined into a single logical interface. This logical interface provides higher bandwidth, improved redundancy, and load-balancing capabilities. Cisco Nexus 9000 Port Channel takes this concept to the next level, offering enhanced performance and flexibility.

a. Increased Bandwidth: By aggregating multiple physical links, the Cisco Nexus 9000 Port Channel significantly increases the available bandwidth, allowing for higher data throughput and improved network performance.

b. Redundancy and High Availability: Port Channel provides built-in redundancy, ensuring network resilience during link failures. With Cisco Nexus 9000, link-level redundancy is seamlessly achieved, minimizing downtime and maximizing network availability.

c. Load Balancing: Cisco Nexus 9000 Port Channel employs intelligent load balancing algorithms that distribute traffic across the aggregated links, optimizing network utilization and preventing bottlenecks.

d. Simplified Network Management: Cisco Nexus 9000 Port Channel simplifies network management by treating multiple links as a logical interface. This streamlines configuration, monitoring, and troubleshooting processes, leading to increased operational efficiency.

Understanding Virtual Port Channel (VPC)

VPC is a link aggregation technique that treats multiple physical links between two switches as a single logical link. This technology enables enhanced scalability, improved resiliency, and efficient utilization of network resources. By combining the bandwidth of multiple links, VPC provides higher throughput and creates a loop-free topology that eliminates the need for Spanning Tree Protocol (STP).

Implementing VPC brings several advantages to network administrators.

First, it enhances redundancy by providing seamless failover in case of link or switch failures.

Second, active-active multi-homing is achieved, ensuring traffic is evenly distributed across all available links.

Third, VPC simplifies network management by treating two switches as single entities, enabling streamlined configuration and consistent policy enforcement.

Lastly, VPC allows for the creation of large Layer 2 domains, facilitating workload mobility and flexibility.

Understanding Nexus Switch Profiles

Nexus Switch Profiles are a feature of Cisco’s Nexus switches that enable administrators to define and manage a group of switch configurations as a single entity. This simplifies the management of complex networks by reducing manual configuration tasks and ensuring consistent settings across multiple switches. By encapsulating configurations into profiles, network administrators can achieve greater efficiency and operational agility.

Implementing Nexus Switch Profiles offers a plethora of benefits for network management. Firstly, it enables rapid deployment of new switches with pre-defined configurations, reducing time and effort. Secondly, profiles ensure consistency across the network, minimizing configuration errors and improving overall reliability. Additionally, profiles facilitate streamlined updates and changes, as modifications made to a profile are automatically applied to associated switches. This results in enhanced network security, reduced downtime, and simplified troubleshooting.

A. Active-active Transport Technologies

Transport technologies interconnect data centers. As part of the transport domain, redundancies and links are provided across the site to ensure HA and resiliency. Redundancy may be provided for multiplexers, GPONs, DCI network devices, dark fibers, diversity POPs for surviving POP failure, and 1+1 protection schemes for devices, cards, and links.

In addition, the following list contains the primary considerations to consider when designing a data center interconnection solution.

  • Recovery from various types of failure scenarios: Link failures, module failures, node failures, etc.
  • Traffic round-trip requirements between DCs based on link latency and applications
  • Requirements for bandwidth and scalability

B. Active-Active Network Services

Network services connect all devices in data centers through traffic switching and routing functions. Applications should be able to forward traffic and share load without disruptions on the network. Network services also provide pervasive gateways, L2 extensions, and ingress and egress path optimization across the data centers. Most major network vendors’ SDN solutions also integrate VxLAN overlay solutions to achieve L2 extension, path optimization, and gateway mobility.

Designing active-active network services requires consideration of the following factors:

  • Recovery from various failure scenarios, such as links, modules, and network devices, is possible.
  • Availability of the gateway locally as well as across the DC infrastructure
  • Using a VLAN or VxLAN between two DCs to extend the L2 domain
  • Policies are consistent across on-premises and cloud infrastructure – including naming, segmentation rules for integrating various L4/L7 services, hypervisor integration, etc.
  • Optimizing path ingresses and regresses.
  • Centralized management includes inventory management, troubleshooting, AAA capabilities, backup and restore traffic flow analysis, and capacity dashboards.

C. Active-Active L4-L7 Services

ADC and security devices must be placed in both DCs before active-active L4-L7 services can be built. The major solutions in this space include global traffic managers, application policy controllers, load balancers, and firewalls. Furthermore, these must be deployed at different tiers for perimeter, extranet, WAN, core server farm, and UAT segments. Also, it should be noted that most of the leading L4-L7 service vendors currently offer clustering solutions for their products across the DCs. As a result of clustering, its members can share L4/L7 policies, traffic loads, and failover seamlessly in case of an issue.

Below are some significant considerations related to L4-L7 service design

  • Various failure scenarios can be recovered, including link, module, and L4-L7 device failure.
  • In addition to naming policies, L4-L7 rules for various traffic types must be consistent across the on-premises infrastructure and in the multiple clouds.
  • Network management centralized (e.g., inventory, troubleshooting, AAA capabilities, backups, traffic flow analysis, capacity dashboards, etc.)

D. Active-Active Storage Services 

Active-active data centers rely on storage and networking solutions. They refer to the storage in both DCs that serve applications. Similarly, the design should allow for uninterrupted read and write operations. Therefore, real-time data mirroring and seamless failover capabilities across DCs are also necessary. The following are some significant factors to consider when designing a storage system.

  • Recover from single-disk failures, storage array failures, and split-brain failures.
  • Asynchronous vs. synchronous replication: With synchronized replication, data is simultaneously written for primary storage and replica. It typically requires dedicated FC links, which consume more bandwidth.
  • High availability and redundancy of storage: Storage replication factors and the number of disks available for redundancy
  • Failure scenarios of storage networks: Links, modules, and network devices

E. Active-Active Server Virtualization

Over the years, server virtualization has evolved. Microservices and containers are becoming increasingly popular among organizations.  The primary consideration here is to extend hypervisor/container clusters across the DCs to achieve seamless virtual machine/ container instance movement and fail-over. VMware Docker and Microsoft are the two dominant players in this market. Other examples include KVM, Kubernetes (container management), etc.

Here are some key considerations when it comes to virtualizing servers

  • Creating a cross-DC virtual host cluster using a virtualization platform
  • HA protects the VM in normal operational conditions and creates affinity rules that prefer local hosts.
  • Deploying the same service, VMs in two DCs can take over the load in real time when the host machine is unavailable.
  • A symmetric configuration with failover resources is provided across the compute node devices and DCs.
  • Managing computing resources and hypervisors centrally

F. Active-Active Applications Deployment

The infrastructure needs to be in place for the application to function. Additionally, it is essential to ensure high application availability across DCs. Applications can also fail over and get proximity access to locations. It is necessary to have Web, App, and DB tiers available at both data centers, and if the application fails in one, it should allow fail-over and continuity.

Here are a few key points to consider

  • Deploy the Web services on virtual or physical machines (VMs) by using multiple servers to form independent clusters per DC.
  • VM or physical machine can be used to deploy App services. If the application supports distributed deployment, multiple servers within the DC can form a cluster or various servers across DCs can create a cluster (preferred IP-based access).
  • The databases should be deployed on physical machines to form a cross-DC cluster (active-standby or active-active). For example, Oracle RAC, DB2, SQL with Windows server failover cluster (WSFC)

Knowledge Check: Default Gateway Redundancy

A first-hop redundancy protocol (FHRP) always provides an active default IP gateway. To transparently failover at the first-hop IP router, FHRPs use two or more routers or Layer 3 switches.

The default gateway facilitates network communication. Source hosts send data to their default gateways. Default gateways are IP addresses on routers (or Layer 3 switches) connected to the same subnet as the source hosts. End hosts are usually configured with a single default gateway IP address when the network topology changes. The local device cannot send packets off the local network segment if the default gateway is not reached. There is no dynamic method by which end hosts can determine the address of a new default gateway, even if there is a redundant router that may serve as the default gateway for that segment.

Advanced Topics:

Understanding VXLAN Flood and Learn

The flood and learning process is an essential component of VXLAN networks. It involves flooding broadcast, unknown unicast, and multicast traffic within the VXLAN segment to ensure that all relevant endpoints receive the necessary information. By using multicast, VXLAN optimizes network traffic and reduces unnecessary overhead.

Multicast plays a crucial role in enhancing the efficiency of VXLAN flood and learn. By utilizing multicast groups, the network can intelligently distribute traffic to only those endpoints that require the information. This approach minimizes unnecessary flooding, reduces network congestion, and improves overall performance.

Several components must be in place to enable VXLAN flood and learn with multicast. We will explore the necessary configurations on the VXLAN Tunnel Endpoints (VTEPs) and the underlying multicast infrastructure. Topics covered will include multicast group management, IGMP snooping, and PIM (Protocol Independent Multicast) configuration.

Related: Before you proceed, you may find the following useful:

  1. Data Center Topologies
  2. LISP Protocol
  3. Data Center Network Design
  4. ASA Failover
  5. LISP Hybrid Cloud
  6. LISP Control Plane

Active Active Data Center Design

At its core, an active active data center is based on fault tolerance, redundancy, and scalability principles. This means that the active data center should be designed to withstand any hardware or software failure, have multiple levels of data storage redundancy, and scale up or down as needed.

The data center also provides an additional layer of security. It is designed to protect data from unauthorized access and malicious attacks. It should also be able to detect and respond to any threats quickly and in a coordinated manner.

A comprehensive monitoring and management system is essential to ensure the data center functions correctly. This system should be designed to track the data center’s performance, detect problems, and provide the necessary alerting mechanisms. It should also provide insights into how the data center operates so that any necessary changes can be made.

Cisco Validated Design

Cisco has validated this design, which is freely available on the Cisco site. In summary, they have tested a variety of combinations, such as VSS-VSS, VSS-vPV, and vPC-vPC, and validated the design with 200 Layer 2 VLANs and 100 SVIs or 1000 VLANs and 1000 SVI with static routing.

At the time of writing, the M series for the Nexus 7000 supports native encryption of Ethernet frames through the IEEE 802.1AE standard. This implementation uses Advanced Encryption Standard ( AES ) cipher and a 128-bit shared key.

Example: Cisco ACI

In the following lab guide, we demonstrate Cisco ACI. To extend Cisco ACI, we have different designs, such as multi-site and multi-pod. This type of design overcomes many challenges of raising a data center, which we will discuss in this post, such as extending layer 2 networks.

One crucial value of the Cisco ACI is the COOP database that maps endpoints in the network. The following screenshots show the synchronized COOP database across spines, even in different data centers. Notice that the bridge domain VNID is mapped to the MAC address. The COOP database is unique to the Cisco ACI.

COOP database
Diagram: COOP database

**The Challenge: Layer 2 is Weak**

The challenge of data center design is “Layer 2 is weak & IP is not mobile.” In the past, best practices recommended that networks from distinct data centers be connected through Layer 3 ( routing ), isolating the known Layer 2 turmoil. However, the business is driving the application requirements, changing the connectivity requirements between data centers.

The need for an active data center has been driven by the following. It is generally recommended to have Layer 3 connections with path separation through Multi-VRF, P2P VLANs, or MPLS/VPN, along with a modular building block data center design.

Yet, some applications cannot function over a Layer 3 environment. For example, most geo clusters require Layer 2 adjacency between their nodes, whether for heartbeat and connection ( status and control synchronization ) state information or the requirement to share virtual IP.

MAC addresses to facilitate traffic handling in case of failure. However, some clustering products ( Veritas, Oracle RAC ) support communication over Layer 3 but are a minority and don’t represent the general case.

Defining active data centers

The term active-active refers to using at least two data centers where both can service an application at any time, so each functions as an active application site. The demand for active-active data center architecture is to accomplish seamless workload mobility and enable distributed applications along with the ability to pool and maximize resources.  

We must first have active-active data center infrastructure for an active/active application setup. Remember that the network is just one key component of active/active data centers). An active-active DC can be divided into two halves from a pure network perspective:-

  1. Ingress Traffic – inbound traffic
  2. Egress Traffic – outbound traffic
active active data center
Diagram: Active active data center. Scenario. Source is twoearsonemouth

Active Active Data Center and VM Migration

Migrating applications and data to virtual machines (VMs) are becoming increasingly popular as organizations seek to reduce their IT costs and increase the efficiency of their services. VM migration moves existing applications, data, and other components from a physical server to a virtualized environment. This process is becoming increasingly more cost-effective and efficient for organizations, eliminating the need for additional hardware, software, and maintenance costs.

Virtual Machine migration between data centers increases application availability, Layer 2 network adjacency between ESX hosts is currently required, and a consistent LUN must be maintained for stateful migration. In other words, if the VM loses its IP address, it will lose its state, and the TCP sessions will drop, resulting in a cold migration ( VM does a reboot ) instead of a hot migration ( VM does not reboot ).

Due to the stretched VLAN requirement, data center architects started to deploy traditional Layer 2 over the DCI and, unsurprisingly, were faced with exciting results. Although flooding and broadcasts are necessary for IP communication in Ethernet networks, they can become dangerous in a DCI environment.

Traffic Tramboning

Traffic tromboning can also be formed between two stretched data centers, so nonoptimal internal routing happens within extended VLANs. Trombones, by their very nature, create a network traffic scalability problem. Addressing this through load balancing among multiple trombones is challenging since their services are often stateful.

Traffic tromboning can affect either ingress or egress traffic. On egress, you can have FHRP filtering to isolate the HSRP partnership and provide an active/active setup for HSRP. On ingress, you can have GSLB, Route Injection, and LISP.

Traffic Tramboning
Diagram: Traffic Tramboning. Source is Silvanogai

Cisco Active-active data center design and virtualization technologies

Virtualization technologies can overcome many of these problems by being used for Layer 2 extensions between data centers. These include vPC, VSS, Cisco FabricPath, VPLS, OTV, and LISP with its Internet locator design. In summary, different technologies can be used for LAN extensions, and the primary mediums in which they can be deployed are Ethernet, MPLS, and IP.

    1. Ethernet: VSS and vPC or Fabric Path
    2. MLS: EoMPLS and A-VPLS and H-VPLS
    3. IP: OTV
    4. LISP

Ethernet Extensions and Multi-Chassis EtherChannel ( MEC )

It requires protected DWDM or direct fibers and works only between two data centers. It cannot support multi-datacenter topology, i.e., a full mesh of data centers, but can help hub and spoke topologies.

Previously, LAG could only terminate on one physical switch. VSS-MEC and vPC are port-channeling concepts extending link aggregation to two physical switches. This allows for creating L2 typologies based on link aggregation, eliminating the dependency on STP, thus enabling you to scale available Layer 2 bandwidth by bonding the physical links.

Because vPC and VSS create a single connection from an STP perspective, disjoint STP instances can be deployed in each data center. Such isolation can be achieved with BPDU Filtering on the DCI links or Multiple Spanning Tree ( MST ) regions on each site.

At the time of writing, vPC does not support Layer 3 peering, but if you want an L3 link, create one, as this does not need to run on dark fiber or protected DWDM, unlike the extended Layer 2 links. 

Ethernet Extension and Fabric path

The fabric path allows network operators to design and implement a scalable Layer 2 fabric, allowing VLANs to help reduce the physical constraints on server location. It provides a high-availability design with up to 16 active paths at layer 2, with each path a 16-member port channel for Unicast and Multicast.

This enables the MSDC networks to have flat typologies, separating nodes by a single hop ( equidistant endpoints ). Cisco has not targeted Fabric Path as a primary DCI solution as it does not have specific DCI functions compared to OTV and VPLS.

Its primary purpose is for Clos-based architectures. However, if you need to interconnect three or more sites, the Fabric path is a valid solution when you have short distances between your DCs via high-quality point-to-point optical transmission links.

Your WAN links must support Remote Port Shutdown and microflapping protection. By default, OTV and VPLS should be the first solutions considered as they are Cisco-validated designs with specific DCI features. For example, OTV can flood unknown unicast for particular VLANs.

FabricPath
Diagram: FabricPath. Source is Cisco

IP Core with Overlay Transport Virtualization ( OTV ).

OTV provides dynamic encapsulation with multipoint connectivity of up to 10 sites ( NX-OS 5.2 supports 6 sites, and NX-OS 6.2 supports 10 sites ). OTV, also known as Over-The-Top virtualization, is a specific DCI technology that enables Layer 2 extension across data center sites by employing a MAC in IP encapsulation with built-in loop prevention and failure boundary preservation.

There is no data plane learning. Instead, the overlay control plane ( Layer 2 IS-IS ) on the provider’s network facilitates all unicast and multicast learning between sites. OTV has been supported on the Nexus 7000 since the 5.0 NXOS Release and ASR 1000 since the 3.5 XE Release. OTV as a DCI has robust high availability, and most failures can be sub-sec convergence with only extreme and very unlikely failures such as device down resulting in <5 seconds.

Locator ID/Separator Protocol ( LISP)

Locator ID/Separator Protocol ( LISP) has many applications. As the name suggests, it separates the location and identifier of the network hosts, enabling VMs to move across subnet boundaries while retaining their IP address and enabling advanced triangular routing designs.

LISP works well when you have to move workloads and distribute workloads across data centers, making it a perfect complementary technology for an active-active data center design. It provides you with the following:

  • a) Global IP mobility across subnets for disaster recovery and cloud bursting ( without LAN extension ) and optimized routing across extended subnet sites.
  • b) Routing with extended subnets for active/active data centers and distributed clusters ( with LAN extension).
LISP networking
Diagram: LISP Networking. Source is Cisco

LISP answers the problems with ingress and egress traffic tromboning. It has a location mapping table, so when a host move is detected, updates are automatically triggered, and ingress routers (ITRs or PITRs) send traffic to the new location. From an ingress path flow inbound on the WAN perspective, LISP can answer our little problems with BGP in controlling ingress flows. Without LISP, we are limited to specific route filtering, meaning if you have a PI Prefix consisting of a /16.

If you break this up and advertise into 4 x /18, you may still get poor ingress load balancing on your DC WAN links; even if you were to break this up to 8 x /19, the results might still be unfavorable.

LISP works differently than BGP because a LISP proxy provider would advertise this /16 for you ( you don’t advertise the /16 from your DC WAN links ) and send traffic at 50:50 to our DC WAN links. LISP can get a near-perfect 50:50 conversion rate at the DC edge.

Summary: Active Active Data Center Design

In today’s digital age, businesses and organizations rely heavily on data centers to store, process, and manage critical information. However, any disruption or downtime can have severe consequences, leading to financial losses and damage to reputation. This is where redundant data centers come into play. In this blog post, we explored the concept of redundant data centers, their benefits, and how they ensure uninterrupted digital operations.

Understanding Redundancy in Data Centers

Redundancy in data centers refers to duplicating critical components and systems to minimize the risk of failure. It involves creating multiple backups of hardware, power sources, cooling systems, and network connections. With redundant systems, data centers can continue functioning even if one or more components fail.

Types of Redundancy

Data centers employ various types of redundancy to ensure uninterrupted operations. These include:

1. Hardware Redundancy involves duplicate servers, storage devices, and networking equipment. If one piece of hardware fails, the redundant backup takes over seamlessly, preventing disruption.

2. Power Redundancy: Power outages can harm data center operations. Redundant power systems, such as backup generators and uninterruptible power supplies (UPS), provide continuous power supply even during electrical failures.

3. Cooling Redundancy: Overheating can damage sensitive equipment in data centers. Redundant cooling systems, including multiple air conditioning units and cooling towers, help maintain optimal temperature levels and prevent downtime.

Network Redundancy

Network connectivity is crucial for data centers to communicate with the outside world. Redundant network connections ensure that alternative paths are available to maintain uninterrupted data flow if one connection fails. This can be achieved through diverse internet service providers (ISPs), multiple routers, and network switches.

Benefits of Redundant Data Centers

Implementing redundant data centers offers several benefits, including:

1. Increased Reliability: Redundancy minimizes the risk of single points of failure, making data centers highly reliable and resilient.

2. Improved Uptime: Data centers can achieve impressive uptime percentages with redundant systems, ensuring continuous access to critical data and services.

3. Disaster Recovery: Redundant data centers are crucial in disaster recovery strategies. If one data center becomes inaccessible due to natural disasters or other unforeseen events, the redundant facility takes over seamlessly, ensuring business continuity.

Conclusion:

Redundant data centers are vital for organizations that cannot afford any interruption in their digital operations. By implementing hardware, power, cooling, and network redundancy, businesses can mitigate risks, ensure uninterrupted access to critical data, and safeguard their operations from potential disruptions. Investing in redundant data centers is a proactive measure to save businesses from significant financial losses and reputational damage in the long run.