This blog is the third in a series discussing active-active data centers and data center failover. The first post focuses on GTM DNS-based load balancing and introduces the failover challenges. The second discusses databases and data center failover. This post addresses storage challenges, and the fourth and final post will focus on ingress and egress traffic flows. Ivan also has a good webinar on active-active data centers. There are many factors to take into consideration, such as the type of storage solution, synchronous or asynchronous replication, and the latency and bandwidth restrictions between data centers.
Every solution has different requirements. Latency drastically affects synchronous replication because a round-trip time (RTT) is added to every write, whereas for asynchronous replication this is less of an issue. Design errors may also surface under certain failure scenarios, such as a data center interconnect failure, potentially resulting in split-brain. So be careful not to over-automate and over-engineer things that should be kept simple in the first place. Split-brain occurs when both sides are active at the same time: the copies fall out of sync, which may force a full restore from tape.
History of Storage
Small Computer System Interface (SCSI) was one of the first open standards for storage. It was developed by the American National Standards Institute (ANSI) for attaching peripherals, such as storage, to computers. Initially, it wasn't very flexible: it could connect only 15 devices over a flat copper ribbon cable of up to 20 meters. Fibre Channel replaced the flat cable with fiber, giving us a fiber infrastructure that overcomes the 20-meter restriction. However, it still carries the same SCSI protocol, commonly known as SCSI over fiber: Fibre Channel transports SCSI information units over optical fiber.
We then started to put disks into enclosures, known as storage arrays. Storage arrays increase resilience and availability by eliminating single points of failure (SPOFs). Applications no longer write to, or own, a physical disk; instead, they write to a LUN (logical unit number), a unit that logically supports read/write operations. LUNs allow multi-access support by permitting numerous hosts to access the same storage array.
Eventually, vendors designed storage area networks (SANs). A SAN provides access to block-level data storage; block-level storage is used for SAN access, while file-level storage is used for network-attached storage (NAS) access. SCSI still runs on top, but the fabric needed its own routing protocol, known as FSPF. FSPF was invented by Brocade and is conceptually similar to OSPF for IP networks. Vendors also implemented VSANs, which are similar to VLANs on Layer 2 networks but are used for storage. A VSAN is a collection of ports that represent a virtual fabric.
Remote Disk Access
Traditionally, servers would have a Host Bus Adapter (HBA) and run FC/FCoE/iSCSI protocols to communicate with the remote storage array. Another method is to send individual file system calls to a remote file system, known as a NAS. The protocols used for this are CIFS and NFS. CIFS was developed by Microsoft and is an open variation of the Server Message Block (SMB) protocol. NFS, developed by Sun Microsystems, runs over TCP and gives you access to shared files, as opposed to SCSI, which gives you access to remote disks. The speed of file access depends on your application; slow performance is generally not caused by NFS or CIFS themselves. If your application is well written and can read huge chunks of data, it will be fine over NFS. If, on the other hand, your application is badly written, it is better to use iSCSI, as the host will then do most of the buffering.
Why not use the LAN instead of Fibre Channel?
Initially, there was a huge variety of operating systems, and most of them already used SCSI and the device drivers that implemented connectivity to local SCSI host adapters. The storage industry decided to offer the same SCSI protocol to the same device drivers, but over a Fibre Channel physical infrastructure. Everything above Fibre Channel was left unchanged, which allowed backward compatibility with old adapters; this is why the old SCSI protocols continued to be used. Fibre Channel has its own set of requirements and terminology. The host still thinks it is writing to a disk 20 m away, which requires tight timings: latency must be low, distances are limited to around 100 km, nothing can be lost, and in-order delivery is critical. The result is that FC requires a lossless network, which usually means a very expensive dedicated network. With this approach, you have one network for the LAN and another network for storage.
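A back-of-the-envelope calculation shows where the ~100 km limit comes from. This is a sketch assuming light travels through fiber at roughly 200,000 km/s (about two-thirds of c), i.e., about 5 microseconds of one-way delay per kilometer; the constant is an approximation, not a vendor specification.

```python
FIBER_DELAY_US_PER_KM = 5  # assumed one-way propagation delay, microseconds/km

def fc_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds for a given fiber run."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000

# At the often-quoted ~100 km FC limit, every SCSI exchange already pays
# about 1 ms of round-trip time before any disk latency is added.
print(fc_rtt_ms(100))  # → 1.0
print(fc_rtt_ms(20))   # a 20 km DC interconnect → 0.2 ms
```

The point is that propagation delay alone, before queuing or disk service time, eats into the tight SCSI timeouts as distance grows.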
Fibre Channel over Ethernet (FCoE) was introduced to get rid of fiber-only networks by offering I/O consolidation between the servers and the switches. It takes the entire Fibre Channel frame and puts it into an Ethernet frame. FCoE requires lossless Ethernet (DCB) between the servers and the first switch, i.e., the VN and VF ports. It is mainly used to reduce the amount of cabling between servers and switches; it is an access-tier solution. On the Ethernet side, we must have lossless Ethernet, and the IEEE formed a number of standards for this. The first, 802.3x, limits the sending device by issuing a PAUSE frame, which stops the server from sending data. As a result of a PAUSE frame, the server stops ALL transmission, but we need a way to stop only the lossless part of the traffic, i.e., the FCoE traffic. This is 802.1Qbb (Priority Flow Control), which allows you to stop a single class of service. There is also QCN (Congestion Notification, 802.1Qau), an end-to-end mechanism that can tell the sending device to slow down. All the servers, switches, and storage arrays negotiate the class parameters, deciding which traffic will be lossless.
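The difference between 802.3x PAUSE and 802.1Qbb PFC can be sketched as a toy model. The class numbers below (3 for FCoE, 0 for best effort) are illustrative assumptions, not from any real switch configuration.

```python
class Port:
    """Toy model of a sending port with two traffic classes."""

    def __init__(self):
        # priority -> paused flag; 3 = FCoE (lossless), 0 = best effort
        self.paused = {3: False, 0: False}

    def receive_pause(self):
        """802.3x PAUSE: the whole port stops transmitting."""
        for prio in self.paused:
            self.paused[prio] = True

    def receive_pfc(self, priority):
        """802.1Qbb PFC: only the named class stops."""
        self.paused[priority] = True

    def can_send(self, priority):
        return not self.paused[priority]

port = Port()
port.receive_pfc(3)       # congestion reported on the lossless FCoE class
print(port.can_send(3))   # → False (FCoE traffic is held)
print(port.can_send(0))   # → True  (ordinary LAN traffic keeps flowing)
```

With plain 802.3x, `receive_pause()` would have stopped both classes, which is exactly the behavior the per-priority mechanism was designed to avoid.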
Storage Replication for Disaster Recovery
The primary reason for storage replication is disaster recovery and fulfilling service-level agreements (SLAs). When data center services fail over from one DC to another, how accurate will your data be? The level of data consistency depends on the solution in place and how you choose to replicate your data. There are two types of storage replication: synchronous and asynchronous.
Synchronous replication has a number of steps. The host writes to the local disk, the local disk writes to the disk in the remote location, and only when the remote disk acknowledges the write is an acknowledgment sent back to the host. Synchronous replication guarantees that the data is perfectly in sync. However, it requires tight timeouts, severely limiting the distance between the two data centers. If there are long distances between data centers, you need to implement asynchronous replication: the host writes to the local disk, and the local disk immediately acknowledges without writing to, or hearing from, the remote disk; it then sends the write to the remote disk in the background. If you are using traditional LUN-based replication between two data centers, most solutions make one of the disks read-only and the other read-write. Latency problems occur when a VM is spun up in the data center that only has the read-only copy, as its writes must travel back to the writable copy. One major design factor is how much bandwidth is consumed by storage replication between the data centers.
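The two write paths above can be sketched by accounting for latency explicitly instead of doing real I/O. All three latency figures are assumptions chosen for illustration.

```python
LOCAL_DISK_MS = 0.5   # assumed local array write latency
REMOTE_DISK_MS = 0.5  # assumed remote array write latency
RTT_MS = 5.0          # assumed inter-DC round-trip time

def sync_write_ack_ms():
    # Host -> local disk -> remote disk (pays the inter-DC RTT) -> ack to host.
    return LOCAL_DISK_MS + RTT_MS + REMOTE_DISK_MS

def async_write_ack_ms():
    # Host is acknowledged as soon as the local disk commits; replication
    # to the remote site happens later in the background.
    return LOCAL_DISK_MS

print(sync_write_ack_ms())   # → 6.0 ms per write
print(async_write_ack_ms())  # → 0.5 ms per write
```

The gap between the two numbers is dominated by the RTT term, which is exactly why distance matters so much more for synchronous replication.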
A better storage architecture is to use a distributed file system, where both ends are writable and replication is done not at the disk level but at the file level. The type of replication you use comes down to the recovery point objective (RPO), the business-continuity term for how much data loss you can tolerate. If you require an RPO of zero, you must use synchronous replication, which, as discussed, requires a number of steps before a write is acknowledged to the application. Synchronous replication also has distance and latency restrictions, which vary depending on the chosen storage solution. For example, VMware vSAN supports an RTT of 5 ms. It is a distributed file system, so replication is not done on a traditional LUN level but on a file level, and it employs synchronous replication between data centers, adding the RTT to every single write.
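Adding the RTT to every write puts a hard ceiling on the synchronous write rate. A minimal sketch, assuming a single thread issuing one outstanding write at a time and ignoring disk service time:

```python
def max_sync_writes_per_sec(rtt_ms: float) -> float:
    """Upper bound on serialized synchronous writes imposed by RTT alone."""
    return 1000.0 / rtt_ms

# At the 5 ms vSAN stretched-cluster RTT limit mentioned above:
print(max_sync_writes_per_sec(5.0))  # → 200.0 writes/s per thread
print(max_sync_writes_per_sec(1.0))  # → 1000.0
```

Real arrays pipeline many outstanding writes, so aggregate throughput is higher, but any serialized write sequence in an application still hits this per-thread limit.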
Most storage solutions are eventually consistent. You write to a file, the file locks, and the file is eventually copied to the other end. This offers much better performance, but, obviously, the RPO is non-zero.
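A non-zero RPO can be quantified: anything written since the last replicated point is at risk in a failover. The write rate and replication lag below are illustrative assumptions.

```python
def data_at_risk_mb(write_rate_mb_s: float, replication_lag_s: float) -> float:
    """Worst-case unreplicated data lost if the primary site fails now."""
    return write_rate_mb_s * replication_lag_s

# e.g. 50 MB/s of writes with a 30-second replication lag
print(data_at_risk_mb(50, 30))  # → 1500 MB potentially lost on failover
```

Sizing the inter-DC bandwidth so that the replication lag stays within the agreed RPO is what turns this from an abstract objective into a concrete design input.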