DMVPN

Cisco DMVPN is built on the virtual private network (VPN) concept, which provides private connectivity over a public network such as the Internet. DMVPN takes the VPN concept further by allowing multiple VPN tunnels to be deployed over a shared infrastructure in a manageable and scalable way. This shared infrastructure, or “DMVPN network,” enables each VPN site to connect to the others without needing expensive dedicated connections or complex configurations.

DMVPN Explained: DMVPN creates a virtual network built on top of the existing infrastructure. This virtual network consists of “tunnels” between various endpoints, such as corporate networks, branch offices, or remote users, and it allows these endpoints to communicate securely regardless of their geographic location. Because the tunnels run over an underlying transport network (the underlay), DMVPN is an overlay solution.

 

  • VPN-based security solutions

VPN-based security solutions are increasingly popular and have proven to be an effective, secure technology for protecting sensitive data traversing insecure transport networks, such as the Internet.

Traditional IPsec-based site-to-site, hub-to-spoke VPN deployment models do not scale well and are adequate only for small- and medium-sized networks. However, as demand for IPsec-based VPNs grows, organizations with large-scale enterprise networks require scalable and dynamic IPsec solutions that interconnect sites across the Internet with reduced latency while optimizing network performance and bandwidth utilization.

Dynamic Multipoint VPN (DMVPN) technology is used for scaling IPsec VPN networks by offering a large-scale IPsec VPN deployment model that allows the network to expand and realize its full potential. In addition, DMVPN offers scalability that enables zero-touch deployment models.

 

For pre-information, you may find the following posts helpful.

  1. VPNOverview
  2. Dynamic Workload Scaling
  3. DMVPN Phases

 

Cisco DMVPN

Key DMVPN Discussion Points:


  • Introduction to DMVPN and what is involved.

  • Highlighting the details of the challenging landscape with standard VPN security technologies.

  • Technical details on how to approach implementing a DMVPN network.

  • Different types of DMVPN phases.

  • Details on DMVPN components and DMVPN features.

 

Cisco DMVPN

So, for sites to learn about each other and create dynamic VPNs, the DMVPN solution consists of a combination of existing technologies. Therefore, efficiently designing and implementing a Cisco DMVPN network requires thoroughly understanding these components, their interactions, and how they all come together to create a DMVPN network.

These technologies may seem complex, and this post aims to break them down into simple terms. First, we mentioned that DMVPN has different components, which are the building blocks of a DMVPN network. These include Generic Routing Encapsulation (GRE), the Next Hop Resolution Protocol (NHRP), and IPsec.

The Dynamic Multipoint VPN (DMVPN) feature allows users to better scale large and small IP Security (IPsec) Virtual Private Networks (VPNs) by combining generic routing encapsulation (GRE) tunnels, IPsec encryption, and Next Hop Resolution Protocol (NHRP).

Each of these components needs a base configuration for DMVPN to work. Once the base configuration is in place, we have a variety of show and debug commands to troubleshoot a DMVPN network to ensure smooth operations.

 

Key DMVPN components include:

●   Multipoint GRE (mGRE) tunnel interface: Allows a single GRE interface to support multiple IPsec tunnels, simplifying the size and complexity of the configuration.

●   Dynamic discovery of IPsec tunnel endpoints and crypto profiles: Eliminates the need to configure static crypto maps defining every pair of IPsec peers, further simplifying the configuration.

●   NHRP: Allows spokes to be deployed with dynamically assigned public IP addresses (i.e., behind an ISP’s router). The hub maintains an NHRP database of the public interface addresses of each spoke. Each spoke registers its real address when it boots; when it needs to build direct tunnels with other spokes, it queries the NHRP database for real addresses of the destination spokes

Diagram: DMVPN explained. Source: TechTarget.

 

Back to basics: DMVPN Explained.

Overlay Networking

A Cisco DMVPN network consists of many virtual networks. Such a virtual network is called an overlay network because it depends on an underlying transport called the underlay network. The underlay network forwards the traffic flowing through the overlay network. With a protocol analyzer, you can see that an overlay exists on top of the underlay; left to its defaults, however, the underlay network has no real visibility into the overlay network.

Routers at the company’s sites are the endpoints of the tunnels that form the overlay network; for example, a WAN edge router or Cisco ASA could be configured for DMVPN. The underlay, which is likely out of your control, consists of an array of service provider equipment such as routers, switches, firewalls, and load balancers.

Diagram: Virtual overlay solutions.

 

Creating an overlay network

The overlay network does not magically appear. To create an overlay network, we need a tunneling technique, and many tunneling technologies can form the overlay. Generic Routing Encapsulation (GRE) is the most widely used for external connectivity, while VXLAN is common inside the data center; GRE is the technique that DMVPN adopts. A GRE tunnel can carry various protocols over an IP-based network. It works by inserting an IP and GRE header on top of the original protocol packet, creating a new GRE/IP packet.

The resulting GRE/IP packet uses a source/destination pair routable over the underlying infrastructure. The GRE/IP header is the outer header, and the original protocol header is the inner header.
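
To make the encapsulation concrete, here is a minimal sketch of a point-to-point GRE tunnel on a Cisco IOS router; the interface name and all addresses are illustrative assumptions, not taken from any particular design.

    ! Hypothetical point-to-point GRE tunnel: inner (overlay) address 172.16.0.1,
    ! underlay-facing source interface Gi0/0, remote peer's public address 198.51.100.2
    interface Tunnel0
     ip address 172.16.0.1 255.255.255.252
     tunnel source GigabitEthernet0/0
     tunnel destination 198.51.100.2

Traffic routed into Tunnel0 is wrapped in a new GRE/IP header whose source and destination are the underlay addresses, exactly as described above.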

 

  • A key point: Is GRE over IPsec a tunneling protocol? 

GRE is a tunneling protocol that can transport multicast, broadcast, and non-IP packets such as IPX. IPsec is an encryption protocol that can only transport unicast packets, not multicast and broadcast. Hence we wrap the traffic in GRE first and then encrypt it with IPsec, which is called GRE over IPsec.

 

Routing protocols

Once the virtual tunnel is fully functional, the routers need a way to direct traffic through their tunnels. Dynamic routing protocols such as EIGRP and OSPF are excellent choices for this. By simply enabling a dynamic routing protocol on the tunnel and LAN interfaces on the router, the routing table is populated appropriately:

GRE tunnels are, by nature, point-to-point tunnels. Therefore, configuring a GRE tunnel with multiple destinations by default is impossible. A point-to-point GRE tunnel is an excellent solution if the goal is only connecting two sites and you don’t plan to scale. However, as sites are added, additional point-to-point tunnels are required to connect them, which is complex to manage.

 

Multipoint GRE

An alternative to configuring multiple point-to-point GRE tunnels is to use multipoint GRE tunnels to provide the desired connectivity. Multipoint GRE (mGRE) tunnels are similar in construction to point-to-point GRE tunnels except for the tunnel destination command. No static destination is declared; instead, the tunnel mode gre multipoint command is issued.
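
As a rough sketch, the only structural change from the point-to-point example shown earlier is that no tunnel destination is configured and the tunnel mode becomes multipoint (addressing again purely illustrative):

    ! Multipoint GRE: one overlay subnet shared by the hub and all spokes,
    ! and no static tunnel destination command
    interface Tunnel0
     ip address 172.16.0.1 255.255.255.0
     tunnel source GigabitEthernet0/0
     tunnel mode gre multipoint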

How does one remote site know what destination to set for the GRE/IP packet created by the tunnel interface? The easy answer is that, on its own, it can’t. The site can only glean the destination address with the help of an additional protocol. This brings us to the next component used to create a DMVPN network: the Next Hop Resolution Protocol (NHRP).

Essentially, mGRE features a single GRE interface on each router with the possibility of multiple destinations. This interface secures multiple IPsec tunnels and reduces the overall scope of the DMVPN configuration. However, if two branch routers need to tunnel traffic, mGRE and point-to-point GRE may not know which IP addresses to use. The Next Hop Resolution Protocol (NHRP) is used to solve this issue.

 

Next Hop Resolution Protocol (NHRP)

The Next Hop Resolution Protocol (NHRP) is a networking protocol designed to facilitate efficient and reliable communication between nodes on a network. It provides a way for one node to discover the real address of another node that shares the same underlying transport.

The primary role of NHRP is to allow a node to resolve the address of another node for which it has no direct mapping. This is done by querying an NHRP server, which holds a mapping of the nodes on the network. When a node queries the NHRP server, the server returns the address of the destination node.

NHRP was originally designed to allow routers connected to non-broadcast multiple-access (NBMA) networks to discover the proper next-hop mappings to communicate. It is specified in RFC 2332. NBMA networks faced a similar issue as mGRE tunnels. 

Diagram: Cisco DMVPN and NHRP. Source: Network Direction.

 

NHRP allows spokes to be deployed with dynamically assigned IP addresses and reached through the central DMVPN hub. One branch router uses this protocol to find the public IP address of another branch router. NHRP follows a server-client model, where one router functions as the NHRP server while the other routers are NHRP clients. In the multipoint GRE/DMVPN topology, the hub router is the NHRP server, and all other routers are the spokes.

Each client registers with the server and reports its public IP address, which the server tracks in its cache. Then, through a process that involves registration and resolution requests from the client routers and resolution replies from the server router, traffic is enabled between various routers in the DMVPN.
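
The following is a hedged sketch of that server-client relationship on Cisco IOS; the hub's NBMA address 198.51.100.1, the overlay subnet, and the network-id are assumptions for illustration only.

    ! Hub (NHRP server): learns spoke NBMA addresses as they register
    interface Tunnel0
     ip address 172.16.0.1 255.255.255.0
     ip nhrp network-id 1
     ip nhrp map multicast dynamic
     tunnel source GigabitEthernet0/0
     tunnel mode gre multipoint
    !
    ! Spoke (NHRP client): registers with the hub and statically maps the
    ! hub's overlay address 172.16.0.1 to its NBMA (public) address 198.51.100.1
    interface Tunnel0
     ip address 172.16.0.2 255.255.255.0
     ip nhrp network-id 1
     ip nhrp nhs 172.16.0.1
     ip nhrp map 172.16.0.1 198.51.100.1
     ip nhrp map multicast 198.51.100.1
     tunnel source GigabitEthernet0/0
     tunnel mode gre multipoint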

 

IPsec Tunnels

An IPsec tunnel is a secure connection between two or more devices over an untrusted network using a set of cryptographic security protocols. The most common type of IPsec tunnel is the site-to-site tunnel, which connects two sites or networks. It allows two remote sites to communicate securely and exchange traffic between them. Another type of IPsec tunnel is the remote-access tunnel, which allows a remote user to connect to the corporate network securely.

Several parameters, such as authentication method, encryption algorithm, and tunnel mode, must be configured when setting up an IPsec tunnel. Additional security protocols such as Internet Key Exchange (IKE) can also be used for further authentication and encryption, depending on the organization’s needs.

Diagram: IPsec VPN. Source: Wikimedia.
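
On Cisco IOS, the IKE and IPsec parameters discussed above are typically tied to the DMVPN tunnel with an IPsec profile rather than a static crypto map. The sketch below uses assumed algorithm choices and a placeholder pre-shared key; treat it as an illustration, not a hardening guide.

    ! IKE phase 1 policy and a placeholder pre-shared key valid for any peer
    crypto isakmp policy 10
     encryption aes 256
     hash sha
     authentication pre-share
     group 14
    crypto isakmp key EXAMPLE-PSK address 0.0.0.0 0.0.0.0
    !
    ! IPsec transform set; transport mode is common for GRE over IPsec
    crypto ipsec transform-set DMVPN-TS esp-aes 256 esp-sha-hmac
     mode transport
    !
    crypto ipsec profile DMVPN-PROFILE
     set transform-set DMVPN-TS
    !
    ! Encrypt the mGRE tunnel with the profile instead of a static crypto map
    interface Tunnel0
     tunnel protection ipsec profile DMVPN-PROFILE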

 

IPsec Tunnel Endpoint Discovery 

Tunnel Endpoint Discovery (TED) allows routers to discover IPsec endpoints automatically, so static crypto maps do not have to be configured between individual IPsec tunnel endpoints. In addition, TED allows endpoints or peers to dynamically and proactively initiate the negotiation of IPsec tunnels to discover unknown peers. These remote peers do not need TED configured to be discovered by inbound TED probes: VPN devices that receive TED probes on interfaces not configured for TED can still negotiate a dynamically initiated tunnel using TED.

 

Routing protocols 

Routing protocols enable the DMVPN to find routes between different endpoints efficiently and effectively. Therefore, to build a scalable and stable DMVPN, choosing the right routing protocol is essential. One option is to use Open Shortest Path First (OSPF) as the interior routing protocol. However, OSPF is best suited for small-scale DMVPN deployments. 

The Enhanced Interior Gateway Routing Protocol (EIGRP) or Border Gateway Protocol (BGP) is more suitable for large-scale implementations. EIGRP is not restricted by the topology limitations of a link-state protocol and is easier to deploy and scale in a DMVPN topology. BGP can scale to many peers and routes, and it puts less strain on the routers compared to other routing protocols.
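
As a sketch, enabling EIGRP over the tunnel could look like the following; the autonomous system number and subnets are assumptions, and the two interface commands shown are commonly applied on the hub so that spoke routes are re-advertised out the same tunnel with their original next hops when spokes build direct spoke-to-spoke tunnels.

    ! Advertise the tunnel subnet and a LAN subnet into EIGRP AS 100
    router eigrp 100
     network 172.16.0.0 0.0.0.255
     network 10.1.1.0 0.0.0.255
    !
    ! Hub tunnel only: re-advertise spoke routes back out the tunnel and
    ! preserve the originating spoke's next hop
    interface Tunnel0
     no ip split-horizon eigrp 100
     no ip next-hop-self eigrp 100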

 

DMVPN Deployment Scenarios: 

Cisco DMVPN can be deployed in two ways:

  1. Hub-and-spoke deployment model
  2. Spoke-to-spoke deployment model

Hub-and-spoke deployment model: In this traditional topology, remote sites, which are the spokes, are aggregated into a headend VPN device. The headend VPN location would be at the corporate headquarters, known as the hub. 

Traffic from any remote site to other remote sites would need to pass through the headend device. Cisco DMVPN supports dynamic routing, QoS, and IP Multicast while significantly reducing the configuration effort. 

Spoke-to-spoke deployment model: Cisco DMVPN allows the creation of a full-mesh VPN, in which traditional hub-and-spoke connectivity is supplemented by dynamically created IPsec tunnels directly between the spokes. 

With direct spoke-to-spoke tunnels, traffic between remote sites does not need to traverse the hub; this eliminates additional delays and conserves WAN bandwidth while improving performance. 

Spoke-to-spoke capability is supported in a single-hub or multi-hub environment. Multihub deployments provide increased spoke-to-spoke resiliency and redundancy.  

 

DMVPN Designs

The word phase is almost always connected to DMVPN design discussions. DMVPN phase refers to the version of DMVPN implemented in a DMVPN design. As mentioned above, we can have two deployment models, each of which can be mapped to a DMVPN Phase.

Cisco DMVPN was rolled out in stages: as the solution became more widely adopted, new phases were introduced to address performance issues and add improved features. There are three main phases for DMVPN:

  • Phase 1 – Hub-and-spoke
  • Phase 2 – Spoke-initiated spoke-to-spoke tunnels
  • Phase 3 – Hub-initiated spoke-to-spoke tunnels

The differences between the DMVPN phases relate to routing efficiency and the ability to create spoke-to-spoke tunnels. We started with DMVPN Phase 1, where we only had hub-to-spoke tunnels. This was not very scalable, as we could not have direct spoke-to-spoke communication: the spokes could communicate with one another, but the traffic was required to traverse the hub.

Then we went to DMVPN Phase 2 to support spoke-to-spoke traffic with dynamic tunnels. These tunnels were brought up by initially passing traffic via the hub. Later, Cisco developed DMVPN Phase 3, which optimized how spoke-to-spoke communication happens and how the tunnels are built.

Dynamic multipoint virtual private networks began simply as what is best described as hub-and-spoke topologies. The primary tool to create these VPNs combines Multipoint Generic Routing Encapsulation (mGRE) connections employed on the hub and traditional Point-to-Point (P2P) GRE tunnels on the spoke devices.

In this initial deployment methodology, known as a Phase 1 DMVPN, the spokes can only join the hub and communicate with one another through the hub. This phase does not use spoke-to-spoke tunnels. Instead, the spokes are configured for point-to-point GRE to the hub and register their logical IP with the non-broadcast multi-access (NBMA) address on the next hop server (NHS) hub.

It is essential to keep in mind that there is a total of three phases, and each one can influence the following:

  1. Spoke-to-spoke traffic patterns
  2. Routing protocol design
  3. Scalability

 

Summary of DMVPN Phases

 Phase 1—Hub-to-Spoke Designs: Phase 1 was the first design introduced for hub-to-spoke implementation, where spoke-to-spoke traffic would traverse via the hub. Phase 1 also introduced daisy chaining of identical hubs for scaling the network, thereby providing Server Load Balancing (SLB) capability to increase the CPU power.

Phase 2—Spoke-to-Spoke Designs: Phase 2 design introduced the ability for dynamic spoke-to-spoke tunnels without traffic going through the hub, intersite communication bypassing the hub, thereby providing greater scalability and better traffic control. In Phase 2 network design, each DMVPN network is independent of other DMVPN networks, causing spoke-to-spoke traffic from different regions to traverse the regional hubs without going through the central hub.

 Phase 3—Hierarchical (Tree-Based) Designs: Phase 3 extended Phase 2 design with the capability to establish dynamic and direct spoke-to-spoke tunnels from different DMVPN networks across multiple regions. In Phase 3, all regional DMVPN networks are bound to form a single hierarchical (tree-based) DMVPN network, including the central hubs. As a result, spoke-to-spoke traffic from different regions can establish direct tunnels with each other, thereby bypassing both the regional and main hubs.

Diagram: DMVPN network and phases explained.
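
On Cisco IOS, the Phase 3 behavior is typically enabled with two NHRP commands on the tunnel interfaces sketched earlier; this is a hedged fragment, not a complete configuration.

    ! Hub tunnel: signal spokes that a more direct path exists
    interface Tunnel0
     ip nhrp redirect
    !
    ! Spoke tunnels: install the shortcut path learned via NHRP
    interface Tunnel0
     ip nhrp shortcut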

 

Design recommendation

Which deployment model can you use? The 80:20 traffic rule can be used to determine which model to use:

  1. If 80 percent or more of the traffic from the spokes is directed into the hub network itself, deploy the hub-and-spoke model.
  2. Consider the spoke-to-spoke model if more than 20 percent of the traffic is meant for other spokes.

The hub-and-spoke model is usually preferred for networks with a high volume of IP Multicast traffic.

 

Architecture

Medium-sized and large-scale site-to-site VPN deployments require support for advanced IP network services such as:

● IP Multicast: Required for efficient and scalable one-to-many (i.e., Internet broadcast) and many-to-many (i.e., conferencing) communications, and commonly needed by voice, video, and specific data applications

● Dynamic routing protocols: Typically required in all but the smallest deployments or wherever static routing is not manageable or optimal

● QoS: Mandatory to ensure performance and quality of voice, video, and real-time data applications

 

  • A key point: DMVPN Explained

Traditionally, supporting these services required tunneling IPsec inside protocols such as Generic Routing Encapsulation (GRE), which introduced an overlay network that was complex to set up and manage and limited the solution’s scalability. Indeed, traditional IPsec only supports IP unicast, making it inefficient to deploy applications that involve one-to-many and many-to-many communications.

Cisco DMVPN combines GRE tunneling and IPsec encryption with Next-Hop Resolution Protocol (NHRP) routing to meet these requirements while reducing the administrative burden. 

 

How DMVPN Works

DMVPN builds a dynamic tunnel overlay network.

• Initially, each spoke establishes a permanent IPsec tunnel to the hub. (At this stage, spokes do not establish tunnels with other spokes within the network.) The hub address should be static and known by all of the spokes.

• Each spoke registers its actual address as a client to the NHRP server on the hub. The NHRP server maintains an NHRP database of the public interface addresses for each spoke.

• When a spoke requires that packets be sent to a destination (private) subnet on another spoke, it queries the NHRP server for the real (outside) addresses of the other spoke’s destination to build direct tunnels.

• The NHRP server looks up the NHRP database for the corresponding destination spoke and replies with the real address of the target router. This removes the need for a dynamic routing protocol to discover the route to the specific spoke. (Dynamic routing adjacencies are established only from spoke to hub.)

• After the originating spoke learns the peer address of the target spoke, it initiates a dynamic IPsec tunnel to the target spoke.

• Integrating the multipoint GRE (mGRE) interface, NHRP, and IPsec establishes a direct dynamic spoke-to-spoke tunnel over the DMVPN network.

The spoke-to-spoke tunnels are established on demand whenever traffic is sent between the spokes. After that, packets can bypass the hub and use the spoke-to-spoke tunnel directly. 
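
A few standard IOS show commands are typically used to verify this behavior once the base configuration is in place (output omitted here):

    show dmvpn                 (tunnel state for each NBMA/overlay peer)
    show ip nhrp               (NHRP cache: registrations and resolved mappings)
    show crypto isakmp sa      (IKE security associations)
    show crypto ipsec sa       (IPsec security associations and packet counters)
    show ip route              (routes learned across the tunnel)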

 

Feature Design of Dynamic Multipoint VPN 

The Dynamic Multipoint VPN (DMVPN) feature combines GRE tunnels, IPsec encryption, and NHRP to provide ease of configuration via crypto profiles, which remove the requirement for defining static crypto maps, and dynamic discovery of tunnel endpoints.

This feature relies on the following two Cisco-enhanced standard technologies: 

  • NHRP–A client and server protocol where the hub is the server, and the spokes are the clients. The hub maintains an NHRP database of the public interface addresses of each spoke. Each spoke registers its real address when it boots and queries the NHRP database for real addresses of the destination spokes to build direct tunnels. 
  • mGRE Tunnel Interface –Allows a single GRE interface to support multiple IPsec tunnels and simplifies the size and complexity of the configuration.
  • Each spoke has a permanent IPsec tunnel to the hub, not to the other spokes within the network. Each spoke registers as a client of the NHRP server. 
  • When a spoke needs to send a packet to a destination (private) subnet on another spoke, it queries the NHRP server for the real (outside) address of the destination (target) spoke. 
  • After the originating spoke “learns” the peer address of the target spoke, it can initiate a dynamic IPsec tunnel to the target spoke. 
  • The spoke-to-spoke tunnel is built over the multipoint GRE interface. 
  • The spoke-to-spoke links are established on demand whenever there is traffic between the spokes. After that, packets can bypass the hub and use the spoke-to-spoke tunnel.
Diagram: Cisco DMVPN features. Source: Cisco.

 

Cisco DMVPN Solution Architecture

DMVPN allows IPsec VPN networks to better scale hub-to-spoke and spoke-to-spoke designs, optimizing performance and reducing communication latency between sites.

DMVPN offers a wide range of benefits, including the following:

• The capability to build dynamic hub-to-spoke and spoke-to-spoke IPsec tunnels

• Optimized network performance

• Reduced latency for real-time applications

• Reduced router configuration on the hub that provides the capability to dynamically add multiple spoke tunnels without touching the hub configuration

• Automatic triggering of IPsec encryption by GRE tunnel source and destination, assuring zero packet loss

• Support for spoke routers with dynamic physical interface IP addresses (for example, DSL and cable connections)

• The capability to establish dynamic and direct spoke-to-spoke IPsec tunnels for communication between sites without having the traffic go through the hub; that is, intersite communication bypassing the hub

• Support for dynamic routing protocols running over the DMVPN tunnels

• Support for multicast traffic from hub to spokes

• Support for VPN Routing and Forwarding (VRF) integration extended in multiprotocol label switching (MPLS) networks

• Self-healing capability maximizing VPN tunnel uptime by rerouting around network link failures

• Load-balancing capability offering increased performance by transparently terminating VPN connections to multiple headend VPN devices

As networks become more geographically distributed, network availability over a secure channel is critical when designing scalable IPsec VPN solutions. The DMVPN solution architecture is by far the most effective and scalable solution available.


eBOOK – SASE Capabilities

In the following ebook, we will address the key points:

  1. Challenging Landscape
  2. The rise of SASE based on new requirements
  3. SASE definition
  4. Core SASE capabilities
  5. Final recommendations

 

 


 

Secure Access Service Edge (SASE) is a service designed to provide secure access to cloud applications, data, and infrastructure from anywhere. It allows organizations to securely deploy applications and services from the cloud while managing users and devices from a single platform. As a result, SASE simplifies the IT landscape and reduces the cost and complexity of managing security for cloud applications and services.

SASE provides a unified security platform allowing organizations to connect users and applications securely without managing multiple security solutions. It offers secure access to cloud applications, data, and infrastructure with a single set of policies, regardless of the user’s physical location. It also enables organizations to monitor and control all user activities with the same set of security policies.

SASE also helps organizations reduce the risk of data breaches and malicious actors by providing visibility into user activity and access. In addition, it offers end-to-end encryption, secure authentication, and secure access control. It also includes threat detection, advanced analytics, and data loss prevention.

SASE allows organizations to scale their security infrastructure quickly and easily, providing them with a unified security platform that can be used to connect users and applications securely from anywhere.

 

 

 

 

OpenShift Networking

OpenShift relies heavily on a networking stack at two layers:

  1. In the case of OpenShift itself deployed in the virtual environment, the physical network equipment directly determines the underlying network topology. OpenShift does not control this level, which provides connectivity to OpenShift masters and nodes.
  2. OpenShift SDN plugin determines the virtual network topology. At this level, applications are connected, and external access is provided.

 

The network topology in OpenShift

OpenShift uses an overlay network based on VXLAN to enable containers to communicate with each other. Layer 2 (Ethernet) frames can be transferred across Layer 3 (IP) networks using the Virtual eXtensible Local Area Network (VXLAN) protocol. Whether the communication is limited to pods within the same project or completely unrestricted depends on the SDN plugin being used. The network topology remains the same regardless of which plugin is used.

 

SDN plugins

OpenShift makes its internal SDN plugins available out-of-the-box, as well as plugins for integration with third-party SDN frameworks. The following are three built-in plugins that are available in OpenShift:

  1. ovs-subnet
  2. ovs-multitenant
  3. ovs-networkpolicy

 

Before you proceed, you may find the following posts useful for some pre-information

  1. Kubernetes Networking 101 
  2. Kubernetes Security Best Practice 
  3. Internet of Things Theory

 



OpenShift Networking

Key OpenShift Networking Discussion points:


  • Challenges with containers and endpoint reachability.

  • Challenges to Docker networking.

  • The Kubernetes networking model.

  • Basics of Pod networking.

  • OpenShift networking SDN modes.

 

Back to basics: OpenShift Networking

Key Challenges

Traditional data center networks have several challenges that make them unable to support today’s applications, such as microservices and containers. Therefore, we need a new set of networking technologies, built into OpenShift SDN, that can more adequately deal with today’s landscape. One of the main issues is the tight coupling between the networking and infrastructure components: with traditional data center networking, Layer 4 is coupled with the network topology at fixed network points and lacks the flexibility to support today’s containerized applications, which are far more agile than the traditional monolithic application.

Another main issue is that containers are short-lived and constantly spun up and down. Assets that support the application, such as IP addresses, firewalls, policies, and the overlay networks that glue the connectivity together, are constantly recycled. These changes bring a lot of agility and business benefits, but they stand in stark contrast to a traditional network, which is relatively static and where changes happen every few months.

 

POD network

As a general rule, pod-to-pod communication holds for all Kubernetes clusters: each Pod in Kubernetes is assigned an IP address. While Pods can communicate directly with each other by addressing their IP addresses, it is recommended that they use Services instead. A Service fronts a set of Pods and can be accessed through a single, fixed DNS name or IP address. The majority of Kubernetes applications use Services to communicate. Since Pods can be restarted frequently, addressing them directly by name or IP is highly brittle; a Service should be used to reach the Pods behind it instead.

 

  • Simple pod-to-pod communication

The first thing to understand is how Pods communicate within Kubernetes. Kubernetes provides IP addresses for each Pod. IP addresses are used to communicate between pods at a very primitive level. Therefore, you can directly address another Pod using its IP address whenever needed. A Pod has the same characteristics as a virtual machine (VM), which has an IP address, exposes ports, and interacts with other VMs on the network via IP address and port.

What is the communication mechanism between the frontend pod and the backend pod? It is common to have a frontend application that talks to a backend in a web application architecture. The backend could be an API or a database. The frontend and the backend would be separated into two Pods in Kubernetes.

The front end could be configured to communicate directly with the back end via its IP address. However, a front end would still need to know the IP address of the backend, which can be tricky when the Pod is restarted or moved to another node. Using a Service, we can make our solution less brittle. Because the app still communicates with the API pods via the Service, which has a stable IP address, if the Pods die or need to be restarted, this won’t affect the app.

Diagram: Pod networking. Source: tutorialworks.
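
As a minimal sketch, the backend described above could sit behind a Service like the one below; the names, labels, and port are illustrative assumptions.

    apiVersion: v1
    kind: Service
    metadata:
      name: backend                 # stable DNS name inside the cluster
    spec:
      selector:
        app: backend                # matches the labels on the backend Pods
      ports:
        - port: 8080                # port the Service exposes
          targetPort: 8080          # port the backend containers listen on

The frontend simply addresses backend:8080; the Service keeps a stable virtual IP and DNS name even as the underlying Pods are restarted or rescheduled.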

 

  • How do containers in the same Pod communicate?

Sometimes, you may need to run multiple containers in the same Pod. Containers in the same Pod share the Pod’s IP address, so localhost can be used to communicate between them. For example, a container in a Pod can use the address localhost:8080 to communicate with another container in the Pod on port 8080. Because the IP address is shared and communication occurs on localhost, two containers in the same Pod cannot use the same port: for instance, you wouldn’t be able to have two containers in the same Pod that both expose port 8080, so you need to ensure that the containers use different ports.
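
A hedged sketch of such a Pod is shown below; the Pod name and images are placeholders, with the app container assumed to listen on port 8080 and the sidecar reaching it over localhost.

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-with-sidecar                      # hypothetical Pod name
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0     # placeholder image listening on 8080
          ports:
            - containerPort: 8080
        - name: sidecar
          image: registry.example.com/sidecar:1.0 # placeholder image
          # Both containers share the Pod's network namespace,
          # so the sidecar can reach the app at localhost:8080.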

In Kubernetes, pods can communicate with each other in a few different ways:

  1. Containers in the same Pod can connect using localhost, and then the other container exposes the port number.
  2. A container in a Pod can connect to another Pod using its IP address. To find a Pod’s IP address, you can use oc get pods -o wide.
  3. A container can connect to another Pod through a Service. A Service has an IP address and also usually has a DNS name, like my-service.

 

  • OpenShift and Pod Networking

When you initially deploy OpenShift, a private pod network is created. Each pod in your OpenShift cluster is assigned an IP address on the pod network when deployed, and this IP address is used to communicate with the other pods across the cluster. The pod network spans all nodes in your cluster and is extended to new application nodes as they are added. The pod network’s IP range cannot overlap with any network that OpenShift might need to communicate with: OpenShift’s internal network routing follows the same rules as any network, and multiple destinations for the same IP address lead to confusion.

 

Endpoint Reachability

Endpoint reachability has also changed: not only have the endpoints themselves changed, but so have the ways we reach them. The application stack previously had very few components, maybe just a cache, a web server, and a database. The most common network service allowed a source to reach an application endpoint, or to load-balance across several endpoints using a simple round-robin algorithm or a load balancer that measured load. Essentially, the sole purpose of the network was to provide endpoint reachability. However, changes inside the data center are driving networks and network services toward becoming more integrated with the application.

Nowadays, the network function exists no longer solely to satisfy endpoint reachability; it is fully integrated. In the case of Red Hat’s Openshift, the network is represented as a Software-Defined Networking (SDN) layer. SDN means different things to different vendors. So, let me clarify in terms of OpenShift.

 

Highlighting software-defined network (SDN)

When you examine traditional networking devices, the control and forwarding planes are shared on a single device. SDN separates these two planes: the control and forwarding planes are decoupled and can now reside on different devices, bringing many performance and management benefits. This decoupling and the closer integration of the network make it much easier for applications to be divided into several microservice components, driving the microservices culture of application architecture. You could say that SDN was a requirement for microservices.

Diagram: Software Defined Networking (SDN). Source: Opennetworking.

 

Challenges to Docker Networking 

      • Port mapping and NAT

Docker containers have been around for a while, but networking had significant drawbacks when they first came out. If you examine container networking, for example, Docker containers, there are other challenges when they connect to a bridge on the node where the docker daemon is running. To allow network connectivity between those containers and any endpoint external to the node, we need to do some port mapping and Network Address Translation (NAT). This, by itself, adds complexity.

Port Mapping and NAT have been around for ages. Introducing these networking functions will complicate container networking when running at scale. Perfectly fine for 3 or 4 containers, but the production network will have many more endpoints to deal with. The origins of container networking are based on a simple architecture and primarily a single-host solution.

 

      • Docker at scale: The need for an orchestration layer

The core building blocks of containers, such as namespaces and control groups, are battle-tested. And although the Docker engine manages containers by leveraging Linux kernel resources, it is limited to a single host operating system. Once you get past three hosts, the networking becomes hard to manage: everything needs to be spun up in a particular order, and maintaining consistent network connectivity and security, regardless of the mobility of the workloads, becomes a challenge.

This led to an orchestration layer. Just as a container is an abstraction over the physical machine, the container orchestration framework is an abstraction over the network. This brings us to the Kubernetes networking model, which Openshift takes advantage of and enhances; for example, the Openshift Route Construct exposes applications for external access. We will be discussing Openshift Routes and Kubernetes Services in just a moment.

 

Introduction to OpenShift

OpenShift Container Platform (formerly known as OpenShift Enterprise) or OCP is Red Hat’s offering for the on-premises private platform as a service (PaaS). OpenShift is based on the Origin open-source project and is a Kubernetes distribution. The foundation of the OpenShift Container Platform is based on Kubernetes and therefore shares some of the same networking technology along with some enhancements.

Kubernetes is the leading container orchestrator, and OpenShift builds on containers with Kubernetes as the orchestration layer, all of which sits on an SDN layer that glues everything together. It is the role of the SDN to create the cluster-wide network, and the glue that connects all the dots is the overlay network that operates over an underlay network. But first, let us address the Kubernetes networking model of operation.

 

The Kubernetes model: Pod networking

The Kubernetes networking model was developed to simplify Docker container networking, which had the drawbacks we discussed. It introduced the concepts of the Pod and Pod networking, allowing multiple containers inside a Pod to share an IP namespace so they can communicate with each other over IPC or localhost. Nowadays, we typically place a single container into a Pod, which acts as a boundary layer for any cluster parameters that directly affect the container. So we run deployments against Pods, not containers. In OpenShift, we can assign networking and security parameters to Pods that will affect the container inside. When an app is deployed on the cluster, each Pod gets an IP assigned, and each Pod could host a different application.

For example, Pod 1 could host a web front end and Pod 2 a database, so the Pods need to communicate. For this, we need a network and IP addresses. By default, Kubernetes allocates each Pod an internal IP address for the applications running within the Pod. Pods and their containers can network with each other, but clients outside the cluster cannot access internal cluster resources by default. With Pod networking, every Pod must be able to communicate with every other Pod in the cluster without Network Address Translation (NAT).

Diagram: OpenShift Network Policy.

 

A typical service type: ClusterIP

The most common service type is “ClusterIP.” The ClusterIP is a persistent virtual IP address used for load-balancing traffic internal to the cluster. Services of this type cannot be accessed directly from outside the cluster; there are other service types for that requirement. ClusterIP services are considered East-West traffic, since the traffic originates from Pods running in the cluster and is destined for a service IP backed by Pods that also run in the cluster.

Then, to enable external access to the cluster, we need to expose the services that the Pod or Pods represent; this is done with an OpenShift Route that provides a URL. So we have a Service running in front of the Pod or group of Pods, and the default is internal access only. A URL-based Route then gives the internal Service external access.

Diagram: Openshift networking and ClusterIP. Source: Red Hat.
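
A hedged example of that workflow with the oc CLI the post already references; the deployment, service, and hostname values are placeholders.

    oc expose deployment backend --port=8080 --name=backend        # create a ClusterIP Service
    oc expose service backend --hostname=app.apps.example.com      # create a Route for external access
    oc get route backend                                           # show the URL assigned to the Route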

 

Using an OpenShift Load Balancer to Get Traffic into the Cluster

OpenShift Container Platform clusters can be accessed externally through an OpenShift load balancer service if you do not need a specific external IP address. The OpenShift load balancer allocates unique IP addresses from configured pools. Load balancers have a single edge router IP (which can be a virtual IP (VIP), but it is still a single machine for initial load balancing). How many OpenShift load balancers are there in OpenShift?

  • Two load balancers

The solution supports several load balancer configuration options: use the playbooks to configure two load balancers for highly available production deployments, use the playbooks to configure a single load balancer for proof-of-concept deployments, or deploy the solution using your own load balancers.

This process involves the following:

  1. The administrator performs the prerequisites;

  2. The developer creates a project and service, if the service to be exposed does not exist;

  3. The developer exposes the service to create a route.

  4. The developer creates the Load Balancer Service.

  5. The network administrator configures networking to the service.

 

Different Openshift SDN networking modes

OpenShift security best practices  

Depending on your OpenShift SDN configuration, you can tailor the network topology differently. You can have free-for-all Pod connectivity, similar to a flat network, or something stricter with different security boundaries and restrictions. Free-for-all Pod connectivity between all projects might be fine for a lab environment, but for production networks with multiple projects, you may need to tailor the network with segmentation. This can be done with one of the OpenShift SDN plugins, which we will get to in just a moment.

OpenShift networking does this with an SDN layer that enhances Kubernetes networking to provide a virtual network across all the nodes. This Pod network is established and maintained by the OpenShift SDN, which configures an overlay network using Open vSwitch (OVS).

 

The OpenShift SDN plugin

We mentioned that you can tailor the virtual network topology to suit your networking requirements; this is determined by the OpenShift SDN plugin and the SDN mode you select. With the default OpenShift SDN, several modes are available. The SDN mode you choose is concerned with managing connectivity between applications and providing external access to them.

Some modes are more fine-grained than others. How are all these plugins enabled? OpenShift Container Platform (OCP) networking relies on the Kubernetes CNI model while supporting several plugins by default, along with several commercial SDN implementations, including Cisco ACI. The native plugins rely on the Open vSwitch virtual switch and offer different approaches to segmentation over VXLAN, using either VNIDs or Kubernetes NetworkPolicy objects:

We have, for example:

        • ovs-subnet
        • ovs-multitenant
        • ovs-networkpolicy

 

Choosing the right plugin depends on your security and control goals. As SDNs take over networking, third-party vendors develop programmable network solutions. OpenShift is tightly integrated with products from such providers by Red Hat. According to Red Hat, the following solutions are production-ready:

  1. Nokia Nuage
  2. Cisco Contiv
  3. Juniper Contrail
  4. Tigera Calico
  5. VMWare NSX-T

 

ovs-subnet plugin

After OpenShift is installed, this plugin is enabled by default. As a result, pods can be connected across the entire cluster without limitations so that traffic can flow freely between them. This may be undesirable if security is a top priority in large multitenant environments.

 

ovs-multitenant plugin

Security is usually unimportant in PoCs and sandboxes but becomes paramount when large enterprises have diverse teams and project portfolios, especially when third parties develop specific applications. A multitenant plugin like ovs-multitenant is an excellent choice if simply separating projects is all you need. This plugin sets up flow rules on the br0 bridge to ensure that only traffic between pods with the same VNID is permitted, unlike the ovs-subnet plugin, which passes all traffic across all pods. It also assigns the same VNID to all pods for each project, keeping them unique across projects.

 

ovs-networkpolicy plugin

While the ovs-multitenant plugin provides a simple and largely adequate means for managing access between projects, it does not allow granular control over access. In this case, the ovs-networkpolicy plugin can be used to create custom NetworkPolicy objects that, for example, apply restrictions to traffic egressing or entering the network.
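
For illustration, a NetworkPolicy of the kind this plugin enforces might look like the sketch below, which only admits traffic to the backend Pods from Pods labelled app=frontend on port 8080; the names and labels are assumptions.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-backend
    spec:
      podSelector:
        matchLabels:
          app: backend              # the policy applies to the backend Pods
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  app: frontend     # only frontend Pods may connect
          ports:
            - protocol: TCP
              port: 8080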

 

Egress routers

In OpenShift, routers direct ingress traffic from external clients to services, which then forward it to pods. OpenShift also offers a reverse type of router, the egress router, which forwards egress traffic from pods to external networks. Egress routers are implemented using Squid instead of HAProxy. Routers with egress capabilities can be helpful in the following situations:

  • Masking different external resources used by several applications behind a single global resource. For example, applications may be built pulling dependencies from different mirrors, and collaboration between their development teams may be rather loose. So, instead of getting them to use the same mirror, an operations team can set up an egress router to intercept all traffic directed to those mirrors and redirect it to the same site.
  • To redirect all suspicious requests for specific sites to the audit system for further analysis.

OpenShift supports the following types of egress routers:

  • redirect for redirecting traffic to a specific destination IP
  • http-proxy for proxying HTTP, HTTPS, and DNS traffic

 

 

Auto Scaling Observability

What is a metric, and why is it only good for the known? When it comes to auto scaling observability and auto scaling metrics, one needs to understand the downfalls of the metric. A metric is a single number, with tags optionally appended for grouping and searching those numbers. Metrics are disposable and cheap and have a predictable storage footprint. A metric is a numerical representation of a system state over a recorded time interval and can tell you whether a particular resource is over- or under-utilized at a particular moment. For example, CPU utilization might be at 75% right now.

There are many tools to gather metrics, such as Prometheus, and several techniques for gathering them, such as the PUSH and PULL approaches. There are pros and cons to each method, and Prometheus metric types with its PULL approach are prevalent in the market. However, if you want full observability and controllability, remember that you will not find it solely in metrics-based monitoring solutions. For additional information on Monitoring and Observability and their differences, visit this post on observability vs monitoring.
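
As a small illustration of the PULL approach, a Prometheus scrape configuration simply lists targets that the server polls on an interval; the job name and target address below are placeholders.

    # prometheus.yml (fragment)
    global:
      scrape_interval: 15s                      # how often Prometheus pulls metrics
    scrape_configs:
      - job_name: "node"
        static_configs:
          - targets: ["node-exporter.example.com:9100"]   # endpoint exposing /metrics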

 



Auto Scaling Metrics

Key Auto Scaling Observability Discussion points:


  • Metrics are good for "known" issues. 

  • Challenges and issues around metrics for monitoring.

  • Observability considerations.

  • No need to predict.

  • Used for unknown / unknown failure modes.

 

A Key Point: Knowledge Check 

  • A key point: Back to basics with channels of observability

The first component of observability is the channels that convey observations to the observer. There are three channels: logs, traces, and metrics. These channels are common to all areas of observability, including data observability.

Logs
Logs are the most typical channel and take several forms (e.g., a line of free text or JSON). Logs are intended to encapsulate information about an event.

Traces
Traces allow you to do what logs don’t—reconnect the dots of a process. Because traces represent the link between all events of the same process, they allow the whole context to be derived from logs efficiently. Each pair of events, an operation, is a span that can be distributed across multiple servers.

Metrics
Finally, we have metrics. Every system state has some component that can be represented with numbers, and these numbers change as the state changes. Metrics provide a basis of information that allows an observer not only to understand the system using factual information but also to leverage mathematical methods to derive insight from even a large number of metrics (e.g., the CPU load, the number of open files, the average number of rows, the minimum date).

 

Auto scaling observability: Metric Overload

 

Auto Scaling Observability

Metrics: Resource Utilization Only

So metrics help tell us about resource utilization. Within a Kubernetes environment, these metrics are used for auto-healing and auto-scheduling purposes. When it comes to metrics, monitoring performs several functions. First, it can collect, aggregate, and analyze metrics to sift through known patterns that indicate troubling trends. The critical point here is that it sifts through known patterns. Then, based on a known event, metrics trigger alerts that notify us when further investigation is needed. Finally, on top of all of this, we have dashboards that display the metric data trends adapted for visual consumption.

These monitoring systems work well for identifying previously encountered, known failures, but they don’t help as much with the unknown. Unknown failures are the norm today, with distributed systems and complex system interactions. Metrics are suitable for dashboards, but there won’t be a predefined dashboard for unknowns, as you can’t track something you don’t know about. Using metrics and dashboards like this is a very reactive approach, yet it’s an approach widely accepted as the norm. Monitoring is a reactive approach best suited for detecting known problems and previously identified patterns.

 

Metrics and intermittent problems?

So within a microservices environment, the metrics can help you when the microservice is healthy or unhealthy. Still, a metric will have difficulty telling you if a microservices function takes a long time to complete or if there is an intermittent problem with an upstream or downstream dependency. So we need different tools to gather this type of information.

We have an issue with auto scaling metrics because they only look at individual microservices with a given set of attributes. So they don’t give you a holistic view of the entire problem. For example, the application stack now exists in numerous locations and location types; we need a holistic viewpoint. And a metric does not give this. For example, metrics are used to track simplistic system states that might indicate a service may be running poorly or may be a leading indicator or an early warning signal. However, while those measures are easy to collect, they don’t turn out to be proper measures for triggering alerts.

 

Auto scaling metrics: Issues with dashboards: Useful only for a few metrics

These metrics are gathered and stored in time-series databases, and we have several dashboards to display them. When dashboards were first built, there weren’t many system metrics to worry about; you could have gotten away with 20 or so dashboards, and it was easy to see the critical data anyone should know about for any given service. Moreover, those systems were simple and did not have many moving parts. This contrasts with modern services, which typically collect so many metrics that fitting them all into the same dashboard is impossible.

 

Auto scaling metrics: Issues with aggregate metrics

So we must find ways to fit all the metrics into a few dashboards. Here, the metrics are often pre-aggregated and averaged. The issue is that the aggregate values no longer provide meaningful visibility, even when we have filters and drill-downs. Therefore, we need to predeclare the conditions we expect to see in the future. This is where we fall back on instinct, past experience, and gut feeling; remember the network and software hero? It helps to avoid aggregation and averaging within the metrics store. Percentiles, on the other hand, offer a richer view; keep in mind, however, that they require raw data.

 

 

Auto Scaling Observability: Ask Any Question

For auto scaling observability, we take an entirely different, exploratory approach to finding problems. Essentially, those operating observability systems don’t sit back and wait for an alert or for something to happen. Instead, they are always actively looking, asking ad-hoc questions of the observability system. Observability tools should gather rich telemetry for every possible event, capturing the full content of every request, and then be able to store and query it. In addition, these new auto scaling observability tools are specifically designed to query against high-cardinality data. High cardinality allows you to interrogate your event data in any arbitrary way you see fit: you can ask any question about your system and inspect its corresponding state.

 

Key Auto Scaling Observability Considerations

No predictions in advance.

Due to the nature of modern software systems, you want to understand any inner state and services without anticipating or predicting them in advance. For this, we need to gain valuable telemetry and use some new tools and technological capabilities to gather and interrogate this data once it has been collected. Telemetry needs to be constantly gathered in flexible ways to debug issues without predicting how failures may occur. 

The conditions affecting infrastructure health change infrequently and are relatively easy to monitor. In addition, we have several well-established practices to predict them, such as capacity planning, and the ability to remediate automatically, e.g., auto-scaling in a Kubernetes environment. All of these can be used to tackle known issues.
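
A minimal sketch of that kind of metric-driven remediation is a Kubernetes HorizontalPodAutoscaler; the deployment name and thresholds below are assumptions for illustration.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web                   # hypothetical deployment to scale
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 75   # scale out when average CPU crosses 75%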

Diagram: Auto Scaling Observability and observability tools.

 

Due to its relatively predictable and slowly changing nature, the aggregated metrics approach monitors and alerts perfectly for infrastructure problems. So here, a metric-based system works well. Metrics-based systems and their associated alerts help you see when capacity limits or known error conditions of underlying systems are being reached. So, metrics-based systems work well for infrastructure problems that don’t change too much but fall dramatically short in complex distributed systems. You should opt for an observability and controllability platform for these types of systems. 

 

ACI Networks

The following post provides a Cisco ACI overview focusing on the network components that form the ACI data center. Traditionally, Cisco data center designs built networks on hierarchical topologies. This is often referred to as the traditional data center: a three-tier design with an access layer, an aggregation layer, and a core layer. Historically, this design enabled substantial predictability because aggregation switch blocks simplified the spanning-tree topology. In addition, the need for scalability often pushed this design into modularity, which increased predictability further.

However, although this increased predictability, the main challenge inherent in the three-tier model is that it is difficult to scale. As the number of endpoints increases and workloads need to move between segments, we need to span Layer 2 across the data center. This is a significant difference between the traditional and the ACI data center.

 

Preliminary Information: Useful Links to Relevant Content

For pre-information, you may find the following post helpful:

  1. Data Center Security 

 



ACI Networks


Key ACI Networks Discussion Points:


  • Design around issues with Spanning Tree Protocol.

  • Layer 2 all the way to the Core.

  • Routing at the Access layer.

  • The changes from ECMP.

  • ACI networks and normalization.

  • Leaf and Spine designs.

 

A Key Point: Knowledge Check 

  • A key point: Back to basics with traditional data center

Our journey towards ACI starts in the early 1990s, looking at the most traditional and well-known two- or three-layer network architecture. This Core/Aggregation/Access design was generally used and recommended for campus enterprise networks. At that time and in that environment, it delivered sufficient quality for typical client-server types of applications. The traditional design taken from campus networks was based on Layer 2 connectivity between all network parts, segmentation was implemented using VLANs, and the loop-free topology relied on the Spanning Tree Protocol (STP).

Scaling such an architecture implies the growth of broadcast and failure domains, which harms the resulting performance and stability. For instance, picture each STP Topology Change Notification (TCN) message causing MAC table aging across the whole data center for a particular VLAN, followed by excessive BUM (Broadcast, Unknown Unicast, Multicast) traffic flooding until all MACs are relearned.

 

  • A key point: Designing around STP

Before we delve into the Cisco ACI overview, let us first address some basics around STP design. The traditional Cisco data center design often leads to poor network design and human error. You don’t want a Layer 2 segment stretched across the data center unless you have the proper controls. Although modularization is still desired in networks today, the general trend has been to move away from this design type, which revolves around spanning tree, to a more flexible and scalable solution with VXLAN and other similar Layer 3 overlay technologies. In addition, the Layer 3 overlay technologies bring a lot of network agility, which is vital to business success.

Agility refers to making changes, deploying services, and supporting the business at its desired speed. This means different things to different organizations. In some, a network team is considered agile if it can deploy network services in a matter of weeks. In others, it could mean that business units can get applications to production or scale core services on demand through automation with the Ansible CLI or Ansible Tower.

Regardless of how you define agility, there is little disagreement that network agility is vital to business success. The problem is that network agility has traditionally been hard to achieve, until now with the ACI data center. Let's recap the main Cisco data center design transitions to understand why.

 

Cisco data center design
Diagram: Cisco data center design transformation.

 

Cisco ACI Overview: The Need for ACI Networks

Layer 2 to the Core

The traditional data center has gone through several transitions on its way to SDN. Firstly, we had Layer 2 to the core: from the access layer to the core, the design was Layer 2, not Layer 3. A design like this would, for example, trunk all VLANs to the core, and for redundancy you would manually prune VLANs from the different trunk links. The challenge with Layer 2 to the core is its reliance on Spanning Tree Protocol: redundant links are blocked, so we never get the full bandwidth, which degrades performance and wastes resources. Another challenge is relying on STP convergence to repair the topology after a change.

 

Summary, Layer 2 to the core design and stability: STP blocks redundant links, VLANs are pruned manually, and STP is relied on for topology changes; this is not an efficient design.

Spanning Tree Protocol does have timers that limit convergence, and these can be tuned for better performance. Still, we rely on STP convergence to repair the topology, yet STP was never meant to be a routing protocol. Protocols operating higher up in the stack are designed and optimized to react to topology changes; STP is not an optimized control-plane protocol, and this significantly hinders the traditional data center. You could compare this to how VLANs have come to be used as a security feature even though they were originally introduced for performance reasons.

 

Routing to Access Layer

To overcome these challenges and build stable data center networks, the Layer 3 boundary is pushed further towards the network's edge. Layer 3 networks can use advances in routing protocols to handle failures and link redundancy far more efficiently than Spanning Tree Protocol, which should never have carried that burden in the first place. This gave us routing at the access layer. With this design, we can eliminate Spanning Tree Protocol towards the core and run Equal Cost MultiPath (ECMP) from the access to the core.

We can run ECMP because we are now routing at Layer 3 from the access to the core layer instead of running STP, which blocks redundant links. Equal-cost multipath (ECMP) offers a simple way to share the network load by distributing traffic across multiple paths. ECMP is typically applied to entire flows, or sets of flows, where a flow is characterized by destination address, source address, transport-layer ports, and payload protocol.
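To make the flow-based behaviour concrete, here is a minimal Python sketch of how an ECMP decision can work: the 5-tuple of a flow is hashed and the result selects one of the equal-cost uplinks, so all packets of a flow stay on one path while different flows spread across the links. The function name and the four-path fabric are illustrative assumptions, not any specific vendor's implementation.

```python
import hashlib

def ecmp_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              protocol: str, path_count: int) -> int:
    """Pick an equal-cost path index for a flow.

    All packets of the same flow hash to the same path, so per-flow packet
    ordering is preserved while different flows spread across the links.
    """
    flow = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}"
    digest = hashlib.sha256(flow.encode()).digest()
    return int.from_bytes(digest[:4], "big") % path_count

# Different flows may land on different uplinks; the same flow always
# lands on the same one.
print(ecmp_path("10.0.1.10", "10.0.2.20", 49152, 443, "tcp", path_count=4))
print(ecmp_path("10.0.1.11", "10.0.2.20", 49153, 443, "tcp", path_count=4))
```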

 

Summary, Layer 3 to the core design and stability: routing protocol stability, automatic routing convergence, STP confined to the remaining Layer 2 edge; a far more efficient design.

 

  • A Key Point: Equal Cost MultiPath (ECMP)

Equal Cost MultiPath (ECMP) brings many advantages. Firstly, ECMP gives us the full bandwidth of our equal-cost links: because we are routing, we no longer have to block redundant links to prevent loops at Layer 2. However, we still have Layer 2 at the access layer, so parts of the network continue to rely on Spanning Tree Protocol, which must converge whenever the topology changes.

So we may have Layer 3 from the access to the core, but we still have Layer 2 connections at the edge and rely on STP there to block redundant links and prevent loops. Another potential drawback is that smaller Layer 2 domains limit where an application can reside in the data center network, which drives the need to move away from the traditional data center design.

 

data center network design
Diagram: Data center network design: Equal cost multipath.

 

The Layer 2 domain an application can use might be limited to a single server rack connected to one ToR switch, or two ToRs for redundancy with a Layer 2 interlink between them to pass the Layer 2 traffic. These designs are not optimal, as you must dictate where your applications are placed, putting the brakes on agility. As a result, there was another critical Cisco data center design transition: the introduction of overlay data center designs.

 

Cisco ACI Overview

Cisco data center design: The rise of virtualization

Virtualization is the creation of a virtual, rather than physical, version of something such as an operating system (OS), a server, a storage device, or network resources. It uses software that simulates hardware functionality to create a virtual system, and it was initially developed during the mainframe era. With virtualization, a virtual machine can exist on any host, and as a result Layer 2 had to be extended to every switch.

This was problematic for larger networks because the core switch had to learn every MAC address for every flow that traversed it. To overcome this, and to take advantage of the convergence and stability of Layer 3 networks, overlay networks became the choice for data center networking, along with control-plane technologies such as EVPN and MPLS.

 

Cisco ACI Overview with overlay networking with VXLAN

VXLAN is an encapsulation protocol that provides data center connectivity using tunneling to stretch Layer 2 connections over an underlying Layer 3 network. VXLAN is the most commonly used protocol in data centers to create a virtual overlay solution that sits on top of the physical network, enabling virtual networks. The VXLAN protocol supports the virtualization of the data center network while addressing the needs of multi-tenant data centers by providing the necessary segmentation on a large scale.

Here we encapsulate traffic into a VXLAN header and forward it between VXLAN tunnel endpoints, known as VTEPs. With overlay networking, we have the concepts of overlay and underlay. By encapsulating traffic into the VXLAN overlay, we rely on the underlay, which in ACI is provided by IS-IS, for Layer 3 stability, redundant paths via Equal Cost Multipathing (ECMP), and the fast convergence of routing protocols.
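To show what the encapsulation adds on the wire, the following Python sketch builds the 8-byte VXLAN header that sits between the outer UDP header (destination port 4789) and the inner Ethernet frame. The VNI value is an arbitrary example; this is a minimal illustration of the header format rather than a full VTEP implementation.

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that precedes the inner Ethernet frame.

    Flags byte 0x08 sets the I bit, meaning the VNI field is valid.
    The 24-bit VNI identifies the overlay segment; the rest is reserved.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags_and_reserved = 0x08 << 24   # I flag set, 24 reserved bits zero
    vni_and_reserved = vni << 8       # 24-bit VNI, 8 reserved bits zero
    return struct.pack("!II", flags_and_reserved, vni_and_reserved)

# The original frame is wrapped as: outer IP/UDP (dst port 4789) + this
# header + inner Ethernet frame, then routed between VTEPs over the underlay.
print(vxlan_header(10010).hex())  # -> 0800000000271a00
```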

 


 

The Cisco Data Center Design Transition

The Cisco data center design has gone through several stages when you think about it. We started with Spanning Tree, moved to Spanning Tree with vPCs, and then replaced Spanning Tree with FabricPath, which is a MAC-in-MAC encapsulation. Then we replaced Spanning Tree with VXLAN (VXLAN vs VLAN), a MAC-in-IP encapsulation. Today, VXLAN is the de facto overlay protocol for data center networking, and Cisco ACI uses an enhanced version of VXLAN to implement both Layer 2 and Layer 3 forwarding with a unified control plane. Replacing Spanning Tree with VXLAN's MAC-in-IP encapsulation was a welcome milestone for data center networking.

 

Cisco ACI Overview: Introduction to the ACI Networks

The base of the ACI network is the Cisco Application Centric Infrastructure (ACI) fabric, the Cisco SDN solution for the data center. Cisco has taken a different approach from the centralized control plane SDN model of other vendors and has created a scalable data center solution that can be extended to multiple on-premises, public, and private cloud locations. ACI networks have many components, including Cisco Nexus 9000 Series switches running in ACI fabric mode in a leaf-and-spine architecture, together with the APIC controller. These components form the building blocks of ACI, supporting a dynamic, integrated physical and virtual infrastructure.

 

The Cisco ACI version

Before Cisco ACI 4.1, the Cisco ACI fabric allowed only a two-tier (spine-and-leaf) topology, in which each leaf switch connects to every spine switch with no interconnection between leaf switches or between spine switches. Starting from Cisco ACI 4.1, the fabric also allows a multitier (three-tier) topology with two tiers of leaf switches, which provides the capability for vertical expansion of the Cisco ACI fabric. This is useful for migrating from a traditional three-tier core/aggregation/access architecture, which has been a standard design model for many enterprise networks and is still required today.

 

The APIC Controller

From the management perspective, ACI networks are driven by the Cisco Application Policy Infrastructure Controller (APIC), whose database runs as a cluster. The APIC is the centralized point of control: everything you want to configure, you configure in the APIC. Consider the APIC the brains of the ACI fabric, serving as the single source of truth for configuration within the fabric. The APIC is a policy engine that holds the defined policy, which tells the other elements in the ACI fabric what to do. This database allows you to manage the network as a single entity.

In summary, the APIC is the infrastructure controller and is the main architectural component of the Cisco ACI solution. It is the unified point of automation and management for the Cisco ACI fabric, policy enforcement, and health monitoring. The APIC is not involved in data plane forwarding.

data center layout
Diagram: Data center layout: The Cisco APIC controller

 

The APIC represents the management plane, allowing the system to maintain the control and data planes in the network. The APIC is not a control plane device, nor does it sit in the data traffic path. Remember that the APIC controller can crash and the fabric will still forward traffic. The ACI solution is not a centralized control plane SDN approach; it is a distributed fabric with independent control planes on all fabric switches.
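Because the APIC is the single point of management, most automation talks to its REST API. The sketch below, assuming a hypothetical APIC at apic.example.com and lab credentials, logs in via the aaaLogin endpoint and lists the fabric nodes; treat it as an illustration of the API style rather than production-ready code.

```python
# pip install requests
import requests

APIC = "https://apic.example.com"          # hypothetical APIC address
CREDS = {"aaaUser": {"attributes": {"name": "admin", "pwd": "secret"}}}

session = requests.Session()
session.verify = False                     # lab only; use proper certificates in production

# Authenticate; the APIC returns a token cookie that the session reuses afterwards.
resp = session.post(f"{APIC}/api/aaaLogin.json", json=CREDS, timeout=10)
resp.raise_for_status()

# Query the fabric inventory (leaf, spine, and controller nodes).
nodes = session.get(f"{APIC}/api/class/fabricNode.json", timeout=10).json()
for item in nodes.get("imdata", []):
    attrs = item["fabricNode"]["attributes"]
    print(attrs["name"], attrs["role"])
```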

 

Cisco Data Center Design: The Leaf and Spine 

Leaf-spine is a two-layer data center network topology for data centers that experience more east-west than north-south traffic. The topology comprises leaf switches (to which servers and storage connect) and spine switches (to which leaf switches connect). In this two-tier Clos architecture, every lower-tier (leaf) switch is connected to each top-tier (spine) switch in a full-mesh topology. The leaf layer consists of access switches connecting to devices like servers.

The Spine layer is the network’s backbone and interconnects all Leaf switches. Every Leaf switch connects to every spine switch in the fabric. The path is randomly chosen, so the traffic load is evenly distributed among the top-tier switches. Therefore, if one of the top-tier switches fails, it would only slightly degrade performance throughout the data center.

Cisco ACI Overview
Diagram: Cisco ACI Overview with Leaf and Spine.

 

Unlike the traditional Cisco data center design, the ACI data center operates with a leaf-and-spine architecture. Traffic from an end host enters the fabric through a leaf device. We also have the spine devices, which are Layer 3 routers with no unique hardware dependencies. In a basic leaf-and-spine fabric, every leaf is connected to every spine, so any endpoint in the fabric is always the same distance, in hops and latency, from every other internal endpoint.

The ACI Spine switches are Clos intermediary switches with many vital functions. Firstly, they exchange routing updates with leaf switches via Intermediate System-to-Intermediate System (IS-IS) and rapidly forward packets between leaf switches. They provide endpoint lookup services to leaf switches through the Council of Oracle Protocol (COOP). They also handle route reflection to the leaf switches using Multiprotocol BGP (MP-BGP).

Cisco ACI Overview
Diagram: Cisco ACI Overview.

The Leaf switches are the ingress/egress points for traffic into and out of the ACI fabric. In addition, they are the connectivity points for the various endpoints that the Cisco ACI supports. The leaf switches provide end-host connectivity. The spines act as a fast, non-blocking Layer 3 forwarding plane that supports Equal Cost Multipathing (ECMP) between any two endpoints in the fabric and uses overlay protocols such as VXLAN under the hood. VXLAN enables any workload to exist anywhere in the fabric. Using VXLAN, we can now have workloads anywhere in the fabric without introducing too much complexity.

 

      • A Key Point: ACI data center and ACI networks

This is a significant improvement to data center networking as we can now have physical or virtual workloads in the same logical layer 2 domain, even when we are running Layer 3 down to each ToR switch. The ACI data center is a scalable solution as the underlay is specifically built to be scalable as more links are added to the topology and resilient when links in the fabric are brought down due to, for example, maintenance or failure. 

 

ACI Networks: The Normalization event

VXLAN is an industry-standard protocol that extends Layer 2 segments over Layer 3 infrastructure to build Layer 2 overlay logical networks. The ACI infrastructure Layer 2 domains reside in the overlay, with isolated broadcast and failure bridge domains. This approach allows the data center network to grow without risking creating too large a failure domain. All traffic in the ACI fabric is normalized as VXLAN packets.

ACI encapsulates external VLAN, VXLAN, and NVGRE packets in a VXLAN packet at the ingress. This is known as ACI encapsulation normalization. As a result, the forwarding in the ACI data center fabric is not limited to or constrained by the encapsulation type or overlay network. If necessary, the ACI bridge domain forwarding policy can be defined to provide standard VLAN behavior where required.
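The following Python sketch illustrates the idea of encapsulation normalization with a purely hypothetical lookup table: whatever the ingress encapsulation (VLAN, VXLAN, or NVGRE), the frame is mapped to a fabric-internal VNID for forwarding. In a real fabric, the mapping is derived from APIC policy, not from a static table.

```python
# Hypothetical mapping of ingress encapsulation to a fabric-internal VXLAN VNID.
# A real ACI fabric derives these from bridge domain / EPG policy on the APIC.
NORMALIZATION_TABLE = {
    ("vlan", 10): 15990734,
    ("vxlan", 5010): 15990734,   # different wire encapsulations, same bridge domain
    ("nvgre", 7001): 16089026,
}

def normalize(encap_type: str, encap_id: int) -> int:
    """Return the fabric VNID used to forward the frame inside the ACI fabric."""
    try:
        return NORMALIZATION_TABLE[(encap_type, encap_id)]
    except KeyError:
        raise ValueError(f"no policy for {encap_type} {encap_id}") from None

print(normalize("vlan", 10))   # frames from VLAN 10 ride the fabric as VNID 15990734
```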

 

A final note: Cisco ACI overview with making traffic ACI-compatible

As a final note in this Cisco ACI overview, let us address the normalization process. When traffic hits the Leaf, there is a normalization event. The normalization takes traffic from the servers to the ACI, making it ACI-compatible. Essentially, we are giving traffic sent from the servers a VXLAN ID to be sent across the ACI fabric. Traffic is normalized, encapsulated with a VXLAN header, and routed across the ACI fabric to the destination Leaf, where the destination endpoint is. This is, in a nutshell, how the ACI Leaf and Spine work. We have a set of leaf switches that connect to the workloads and the spines that connect to the Leaf.

VXLAN is the overlay protocol that carries data traffic across the ACI data center fabric. A key point of this architecture is that the Layer 3 boundary moves to the leaf, which brings a lot of value to data center design. The boundary makes more sense here because routing and encapsulation happen at the leaf without having to traverse a core layer.

 


Service Level Objectives (SLOs): Customer-centric view

 

 

Service Level Objectives (SLOs)

Site Reliability Engineering (SRE) teams have tools such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets that can guide them on the road to building a reliable system with the customer viewpoint as the metric. These tools form the basis for reliability in distributed systems and are the core building blocks of a reliable stack that assists with baseline engineering. The first thing you need to understand is what is expected of the service. This introduces the area of service-level management and its components.

The core concepts of service-level management are the Service Level Agreement (SLA), Service Level Objective (SLO), and Service Level Indicator (SLI). The common indicators used are availability, latency, duration, and efficiency. Monitoring these indicators to catch problems before your SLO is violated is critical. These are the cornerstones of developing a good SRE practice.

    • SLI, Service Level Indicator: a well-defined measure of "successful enough." It is a quantifiable measurement of whether a given user interaction was good enough: did it meet the user's expectation? Did the web page load within a specific time? This lets you categorize a given interaction as good or bad (see the sketch after this list).
    • SLO, Service Level Objective: a top-line target for the fraction of successful interactions.
    • SLA, Service Level Agreement: the consequences; it is more of a legal construct.
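As a concrete illustration of the SLI/SLO distinction, the Python sketch below classifies each interaction as good or bad (the SLI) and compares the resulting ratio against an objective (the SLO). The 300 ms latency bound and the 99.9% target are assumed example values, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Request:
    status_code: int
    latency_ms: float

def sli_good_ratio(requests: list[Request], max_latency_ms: float = 300.0) -> float:
    """SLI: fraction of interactions that were 'good enough'
    (served successfully and within the latency expectation)."""
    good = sum(1 for r in requests
               if r.status_code < 500 and r.latency_ms <= max_latency_ms)
    return good / len(requests) if requests else 1.0

SLO_TARGET = 0.999   # hypothetical objective: 99.9% of interactions are good

window = [Request(200, 120), Request(200, 480), Request(503, 90), Request(200, 250)]
sli = sli_good_ratio(window)
print(f"SLI={sli:.3f}, SLO met: {sli >= SLO_TARGET}")
```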

 

Preliminary Information: Useful Links to Relevant Content

For pre-information, you may find the following helpful:

  1. Starting Observability
  2. Distributed Firewalls

 



Service Level Objectives

Key Service Level Objectives (SLOs) Discussion points:


  • Required for baseline engineering. 

  • Components of a Reliable system.

  • Chaos Engineering.

  • The issue with static thresholds.

  • How to approach Reliability.

 

  • A key point: Video on System Reliability and SLOs

The following video will discuss the importance of distributed systems observability and the need to fully comprehend them with practices like Chaos Engineering and Site Reliability Engineering (SRE). In addition, we will again discuss the problems with monitoring and static thresholds.

 

 

A Key Point: Knowledge Check 

  • A key point: Back to basics with Site Reliability Engineering (SRE)

Pioneered by Google to make more scalable and reliable large-scale systems, SRE has become one of today’s most valuable software innovation opportunities. SRE is a concrete opinionated implementation of the DevOps philosophy. The main goals are to create scalable and highly reliable software systems. According to Benjamin Treynor Sloss, the founder of Google’s Site Reliability Team, “SRE is what happens when a software engineer is tasked with what used to be called operations.”

System Reliability Meaning
Diagram: System reliability meaning.

 

 

Reliability is not so much a feature as a practice that must be prioritized and considered from the very beginning, not something added later, for example once a system or service is already in production. Reliability is the most essential attribute of any system, and it is not a feature a vendor can sell you. So if someone tries to sell you an add-on solution called Reliability, don't buy it, especially if they offer 100% reliability; nothing can be 100% reliable all the time. If you strive for 100% reliability, you will miss out on opportunities to innovate, experiment, and take the risks that help you build better products and services.

 

Components of a Reliable System

Distributed system

To build reliable systems that can tolerate various failures, the system needs to be distributed so that a problem in one location doesn't stop your entire service. You need to build a system that can handle, for example, a node dying, or that performs adequately under a certain load. To create a reliable system, you need to understand it fully and know what happens when the different components that make up the system reach certain thresholds. This is where practices such as Chaos Engineering, including Chaos Engineering on Kubernetes, can help you.

 

 

Chaos Engineering 

Practices like Chaos Engineering can confirm your expectations, give you confidence in your system at different levels, and prove that you can tolerate certain failures while remaining reliable. Chaos Engineering allows you to find weaknesses and vulnerabilities in complex systems, and it can be automated into your CI/CD pipelines so that various verifications run before you reach production. These Chaos Engineering tests, such as load and latency tests, can all be automated with little or no human interaction. Site Reliability Engineering (SRE) teams often use Chaos Engineering to improve resilience, and it should be part of your software development and deployment process.
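As a small illustration of the kind of automated experiment involved, the Python sketch below wraps a service call with random latency injection, the sort of fault you might introduce in a pre-production pipeline to confirm the system tolerates a slow dependency. The decorator, the delay values, and the lookup_user function are all hypothetical.

```python
import random
import time
from functools import wraps

def inject_latency(max_delay_s: float = 0.5, probability: float = 0.2):
    """Chaos-style wrapper: randomly delay calls to mimic a slow dependency."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                time.sleep(random.uniform(0, max_delay_s))
            return func(*args, **kwargs)
        return wrapper
    return decorator

@inject_latency(max_delay_s=0.3, probability=0.5)
def lookup_user(user_id: str) -> dict:
    # Stand-in for a real downstream service call.
    return {"id": user_id, "tier": "gold"}

start = time.perf_counter()
lookup_user("42")
print(f"call took {time.perf_counter() - start:.3f}s")
```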

 

It’s All About Perception: Customer-Centric View

Reliability is all about perception. Suppose the user considers your service unreliable. In that case, you will lose consumer trust as service perception is poor, so it’s important to provide consistency with your services as much as possible. For example, it’s OK to have some outages. Outages are expected, but you can’t have them all the time and for long durations.

Users expect outages at some point, but not for long. Perception is everything: if the user thinks you are unreliable, you are. Therefore you need a customer-centric view, and customer satisfaction is a critical metric to measure. This is where the key components of service-level management, Service Level Objectives (SLOs) and Service Level Indicators (SLIs), come into play. There is a balance to find between velocity and stability; you can't stop innovation, but you can't take too many risks either. An Error Budget, along with Site Reliability Engineering (SRE) principles, will help you here.

 

User experience and static thresholds

User experience means different things to different sets of users. We now have a model where different users of a service may be routed through the system in different ways, using different components, so the experiences they get can vary widely. We also know that services no longer tend to break in the same few predictable ways over and over.

With complex microservices and many software interactions, we have many unpredictable failures that we have never seen before; these are often referred to as black holes. We should have few alerts, triggered only by symptoms that directly impact user experience, not simply because a threshold was reached. If your pod network reaches a certain threshold, that tells you nothing about user experience. You can't rely on static thresholds anymore, as they have no relationship to customer satisfaction.

If you use static thresholds, they can’t reliably indicate any issues with user experience. Alerts should be set up to detect failures that impact user experience. Traditional monitoring falls short in trying this as it usually has predefined dashboards looking for something that has happened before.

This brings us back to the challenges with traditional metrics-based monitoring; we rely on static thresholds to define optimal system conditions, which have nothing to do with user experience. However, modern systems change shape dynamically under different workloads. Static thresholds for monitoring can’t reflect impacts on user experience. They lack context and are too coarse.

 

How to Approach Reliability 

New tools and technologies

We have new tools, such as distributed tracing. If the system becomes slow, what is the best way to find the bottleneck? Here you can use distributed tracing and OpenTelemetry. Tracing helps us instrument our system so we can figure out where the time is being spent, and it can be used across a distributed microservice architecture to troubleshoot problems. OpenTelemetry provides a standardized way of instrumenting our systems and producing those traces.
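As a minimal sketch of what OpenTelemetry instrumentation looks like in Python, the example below creates nested spans around a hypothetical checkout operation and prints the finished spans to the console. It assumes the opentelemetry-sdk package; the service and span names are illustrative, and a real deployment would export spans to a collector rather than stdout.

```python
# pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Send finished spans to stdout; production setups export to a collector instead.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")   # hypothetical service name

def place_order(order_id: str) -> None:
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("charge_card"):
            pass  # the slow call you are hunting for would show up here

place_order("A-1001")
```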

We have already touched on Service Level Objectives, Service Level Indicators, and Error Budget. You want to know why and how something has happened. So we don’t just want to know when something has happened and then react to an event that is not looking from the customer’s perspective. We need to understand if we are meeting Service Level Agreement (SLA) by gathering the number and frequency of the outages and any performance issues. Service Level Objectives (SLO) and Service Level Indicators (SLI) can assist you with measurements. 

Service Level Objectives (SLOs) and Service Level Indicators (SLIs) do more than assist with measurement. They offer a tool for better system reliability and form the base of the Reliability Stack. SLIs and SLOs help us interact with reliability differently and offer us a path to building a reliable system. So now we have the tools and a discipline within which to use them. Can you recall what that discipline is? It is Site Reliability Engineering (SRE).

System Reliability Formula
Diagram: System Reliability Formula.

 

  • SLO-Based approach to reliability

If you’re too reliable all the time, you’re also missing out on some of the fundamental features that SLO-based approaches give you. The main area you will miss is the freedom to do what you want, test, and innovate. If you’re too reliable, you’re missing out on opportunities to experiment, perform chaos engineering, ship features quicker than before, or even introduce structured downtime to see how your dependencies react.

To learn a system, you need to break it. So if you are 100% reliable, you can’t touch your system, so you will never truly learn and understand your system. You want to give your users a good experience, but you’ll run out of resources in various ways if you try to ensure this good experience happens 100% of the time. SLOs let you pick a target that lives between those two worlds.
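One way to see what picking that target means in practice is to translate an SLO into an error budget. The short Python sketch below computes the minutes of unreliability a given target allows in a rolling 30-day window; the targets shown are example values only.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of 'bad' time the SLO allows within the rolling window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)

for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} SLO -> {error_budget_minutes(target):.1f} min of budget per 30 days")
```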

 

  • Balance velocity and stability

So you can’t just have Reliability by itself; you must also have new features and innovation. Therefore, you need to find a balance between velocity and stability. So we need to balance Reliability with other features you have and are proposing to offer. Suppose you have access to a system with a fantastic feature that doesn’t work. The users that have the choice will leave.

So Site Reliability Engineering is the framework for finding the balance between velocity and stability. So how do you know what level of Reliability you need to provide your customer? This all goes back to the business needs that reflect the customer’s expectations. So with SRE, we have a customer-centric approach.

The primary source of outages is change, even planned change. Changes come in many forms, such as pushing new features, applying security patches, deploying new hardware, and scaling up to meet customer demand, and all of them put pressure on a 100% reliability target. If nothing ever changed in the physical or logical infrastructure or any other component, we would introduce no new bugs, and if we froze our user base, we would never have to scale the system. In reality, that will not happen; there will always be changes, so you need to find a balance.

 

 

Observability vs Monitoring

To understand the difference between observability and monitoring, we first need to discuss the role of monitoring. Monitoring is an evaluation that helps identify the most practical and efficient use of resources. So the big question I put to you is: what should you monitor? That is the first step in preparing a monitoring strategy. There are a couple of questions you can ask yourself to understand whether monitoring is enough or whether you need to move to an observability platform. Firstly, consider what you should be monitoring, why you should be monitoring it, and how you should monitor it.

Knowing this lets you move into the different tools and platforms available. Some of these tools will be open source, and others commercial. When evaluating these tools, one word of caution: does each tool work in a silo, or can it be used across technical domains? Silos are breaking agility in every form of technology.

 

Preliminary Information: Useful Links to Relevant Content

For pre-information, you may find the following posts helpful:

  1. Microservices Observability
  2. Auto Scaling Observability
  3. Network Visibility
  4. WAN Monitoring
  5. Distributed Systems Observability

 



Monitoring vs Observability

Key Observability vs Monitoring Discussion points:


  • The difference between Monitoring vs Observability. 

  • Google's four Golden signals.

  • The role of metrics, logs and alerts.

  • The need for Observability.

  • Observability and Monitoring working together.

 

  • A key point: Video on Observability vs. Monitoring

In the following video, we will start by discussing how our approach to monitoring needs to adapt to current megatrends, such as the rise of microservices. Failures are unknown and unpredictable, so a pre-defined monitoring dashboard will have difficulty keeping up with the rate of change and the unknown failure modes. For this, we should adopt the practice of observability for software and monitoring for infrastructure.

 

 

A Key Point: Knowledge Check 

  • A key point: Back to basics with monitoring and distributed systems

By utilizing distributed architectures, the cloud native ecosystem allows organizations to build scalable, resilient, and novel software architectures. But the ever-changing nature of distributed systems means that previous approaches to monitoring can no longer keep up. The introduction of containers made the cloud flexible and empowered distributed systems. Nevertheless, the ever-changing nature of these systems can cause them to fail in many ways. Distributed systems are inherently complex, and, as systems theorist Richard Cook notes, “Complex systems are intrinsically hazardous systems.”

Cloud-native systems require a new approach to monitoring, one that is open-source compatible, scalable, reliable, and able to control massive data growth. But cloud-native monitoring can’t exist in a vacuum: it needs to be part of a broader observability strategy.

 

Observability vs Monitoring
Diagram: Observability vs monitoring.

 

The Starting Point: Observability vs Monitoring

You need to measure and gather the correct event information in your environment, which will be done with several tools. This will let you know what is affecting your application performance and infrastructure. As a good starting point, there are four golden signals: latency, saturation, traffic, and errors. These are Google's Four Golden Signals. The four most important metrics to keep track of are:

      1. Latency: How long it takes to serve a request
      2. Traffic: The number of requests being made.
      3. Errors: The rate of failing requests. 
      4. Saturation: How utilized the service is.

Now that we have some guidance on what to monitor, let us apply it to Kubernetes. For a frontend web service that is part of a tiered application, we would be looking at the following (a small sketch follows the list):

      1. How many requests is the front end processing at a particular point in time?
      2. How many 500 errors are users of the service receiving?
      3. Are the requests overutilizing (saturating) the service?
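Here is a minimal Python sketch of deriving the four golden signals from a window of request samples for such a frontend service. The sample data, the 60-second window, and the CPU figure standing in for saturation are all assumptions made purely for illustration.

```python
from statistics import quantiles

# Hypothetical request records scraped for the frontend service:
# (latency in ms, HTTP status code)
samples = [(110, 200), (95, 200), (300, 200), (1200, 500), (80, 200), (150, 503)]
window_seconds = 60
cpu_utilisation = 0.72          # stand-in saturation signal

latencies = [ms for ms, _ in samples]
traffic = len(samples) / window_seconds                  # requests per second
errors = sum(1 for _, code in samples if code >= 500) / len(samples)
p95_latency = quantiles(latencies, n=20)[18]             # 95th percentile

print(f"traffic={traffic:.2f} rps, errors={errors:.1%}, "
      f"p95 latency={p95_latency:.0f} ms, saturation={cpu_utilisation:.0%}")
```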

So we already know that monitoring is a form of evaluation to help identify the most practical and efficient use of resources. With monitoring, we observe and check the progress or quality of something over time. So within this, we have metrics, logs, and alerts. Each has a different role and purpose.

 

Monitoring: The role of metrics

Metrics are related to some entity and allow you to view how many resources you consume. The metric data consists of numeric values instead of unstructured text, such as documents and web pages. Metric data is typically also time series, where values or measures are recorded over some time.  An example of such metrics would be available bandwidth and latency. It is essential to understand baseline values. Without a baseline, you will not know if something is happening outside the norm.

What are the average baseline values for bandwidth and latency metrics? Are there any fluctuations in these metrics? How do these values rise and fall during normal operations and peak usage? And this may change over the different days in the week and months.

If you notice a rise in these values during normal operations, that should be deemed abnormal and act as a trigger that something could be wrong and needs to be investigated. Remember that these values should not be gathered as a one-off; gather them over time to understand your application and its underlying infrastructure better.
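A simple way to express that idea in code is to learn a baseline from historical samples and flag anything that drifts well outside it. The Python sketch below uses a mean-plus-three-sigma rule on hypothetical latency samples; real systems would use longer histories and often seasonal baselines.

```python
from statistics import mean, stdev

def is_abnormal(current: float, history: list[float], sigma: float = 3.0) -> bool:
    """Flag a metric sample that drifts well outside its learned baseline."""
    baseline, spread = mean(history), stdev(history)
    return abs(current - baseline) > sigma * spread

latency_history_ms = [42, 45, 40, 44, 43, 41, 46, 44]   # gathered over time
print(is_abnormal(44, latency_history_ms))    # False: within the normal range
print(is_abnormal(95, latency_history_ms))    # True: worth investigating
```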

 

Monitoring: The role of logs

Logging is an essential part of troubleshooting application and infrastructure performance. Logs give you additional information about the events. This is important for troubleshooting or discovering the root cause of the events. Logs will have a lot more detail than metrics. So you will need some way to parse the logs or use a log shipper. A typical log shipper will take these logs from the standard out in a Docker container and ship them to a backend for processing.

Fluentd and Logstash each have their pros and cons, and either can be used here to group logs and send them to a backend database, which could be the ELK stack (Elasticsearch). Using this approach, you can add different things to the logs before shipping them to the backend. For example, you can add GeoIP information, which adds richer context that helps you troubleshoot.
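The enrichment step can be as simple as parsing each JSON log line and attaching extra fields before shipping. The Python sketch below fakes a GeoIP lookup with an in-memory table; a real pipeline would use a Fluentd or Logstash plugin backed by a GeoIP database.

```python
import json

# Hypothetical GeoIP lookup; a real shipper would query a GeoIP database or plugin.
GEO_DB = {"203.0.113.7": {"country": "DE", "city": "Berlin"}}

def enrich(raw_line: str) -> str:
    """Parse a JSON log line from stdout and attach GeoIP context before shipping."""
    event = json.loads(raw_line)
    event["geoip"] = GEO_DB.get(event.get("client_ip"), {})
    return json.dumps(event)

line = '{"level": "error", "msg": "checkout failed", "client_ip": "203.0.113.7"}'
print(enrich(line))   # shipped to the backend (e.g. Elasticsearch) with richer context
```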

 

Monitoring: The role of alerting

Then we have alerting, and you need to balance how you monitor and what you alert on. Alerting is never perfect, and getting the right alerting strategy in place takes time; it is not a simple day-one installation and requires much effort and cross-team collaboration. Alerting on too much will cause alert fatigue, and we are all too familiar with the problems and inter-team tension alert fatigue can bring.

To minimize this, consider Service Level Objectives (SLOs) for alerts. SLOs are measurable characteristics such as availability, throughput, frequency, and response times, and they are the foundation of a reliability stack. You should also consider alert thresholds and evaluation windows: if they are too tight or too short, you will get a lot of false positives.
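One SLO-centred alternative to static thresholds is alerting on error-budget burn rate: page only when the budget is being consumed far faster than the SLO allows. The Python sketch below shows the calculation; the event counts, the 99.9% target, and the 2x paging threshold are example assumptions.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """How fast the error budget is being consumed (1.0 = exactly on budget)."""
    if total_events == 0:
        return 0.0
    error_ratio = bad_events / total_events
    return error_ratio / (1 - slo_target)

# Alert only when the budget burns several times faster than is sustainable,
# instead of paging on a raw static threshold.
rate = burn_rate(bad_events=18, total_events=4000, slo_target=0.999)
if rate > 2.0:
    print(f"page on-call: burn rate {rate:.1f}x")
```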

 

Monitoring is not enough.

Even with all of this in place, monitoring is not enough. Due to the sheer complexity of today's landscape, you need to think differently about the tools you use and how you use the intelligence and data they give you to resolve issues before they become incidents. Monitoring by itself is not enough.

The tool used to monitor is just a tool that probably does not cross technical domains, and different groups of users will administer each tool without a holistic view. The tools alone can take you only half the way through the journey.  Also, what needs to be addressed is the culture and the traditional way of working in silos. A siloed environment can affect the monitoring strategy you want to implement. Here you can look into an Observability platform.

 

Observability vs Monitoring

When it comes to observability vs monitoring, we know that monitoring can detect problems and tell you when a system is down; when your system is up, monitoring doesn't care. Monitoring only cares when there is a problem, and the problem has to happen before monitoring takes action, so it is very reactive. If everything is working, monitoring doesn't care.

On the other hand, an observability platform is a more proactive practice. It is about what your system and services are doing and how they are doing it. Observability improves your insight into how complex systems work and lets you quickly get to the root cause of any problem, known or unknown. Observability is best suited to interrogating systems to discover the source of any problem, along any dimension or combination of dimensions, without having to predict it first. This is a proactive approach.

 

The pillars of observability

This is achieved by combining logs, metrics, and traces, so we need data collection, storage, and analysis across these domains, while also being able to alert on what matters most. Let's say you want to draw correlations between units like TCP/IP packets and HTTP errors experienced by your app. The observability platform pulls the context from different sources of information, such as logs, metrics, events, and traces, into one central context. Distributed tracing adds a lot of value here.

Also, when everything is placed into one context, you can quickly switch between the necessary views to troubleshoot the root cause. An excellent key component of any observability system is the ability to view these telemetry sources with one single pane of glass. 
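A small illustration of that shared context: the Python sketch below stamps every log line of a request with the same correlation ID that a trace span would carry, so a backend can stitch logs, metric labels, and traces into one view. The logger name and the events are hypothetical.

```python
import json
import logging
import uuid

logger = logging.getLogger("orders")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def handle_request(payload: dict) -> None:
    # One correlation ID shared by logs, metric labels, and the trace span
    # lets the observability backend stitch the telemetry into one context.
    trace_id = uuid.uuid4().hex
    logger.info(json.dumps({"trace_id": trace_id, "event": "request.received"}))
    logger.info(json.dumps({"trace_id": trace_id, "event": "db.query", "duration_ms": 12}))

handle_request({"sku": "ABC-1"})
```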

Distributed Tracing in Microservices
Diagram: Distributed tracing in microservices.

 

Known and Unknown / Observability Unknown and Unknown

Monitoring automatically reports whether known failure conditions are occurring or are about to occur. In other words, they are optimized for reporting on unknown conditions about known failure modes. This is referred to as known unknowns. In contrast, Observability is centered around discovering if and why previously unknown failure modes may be occurring: in other words, to discover unknown unknowns.

The monitoring-based approach of metrics and dashboards is an investigative practice that leads with the experience and intuition of humans to detect and make sense of system issues. This is okay for a simple legacy system that fails in predictable ways, but the instinctual technique falls short for modern systems that fail in unpredictable ways.

With modern applications, the complexity and scale of their underlying systems quickly make that approach unattainable, and we can’t rely on hunches. Observability tools differ from traditional monitoring tools because they enable engineers to investigate any system, no matter how complex. You don’t need to react to a hunch or have intimate system knowledge to generate a hunch.

 

Monitoring vs Observability: Working together?

Monitoring best helps engineers understand infrastructure concerns, while observability best helps engineers understand software concerns. So observability and monitoring can work together. Infrastructure does not change too often, and when it fails, it fails more predictably, so we can use monitoring there.

This contrasts with software system states that change daily and unpredictably; observability fits that purpose. The conditions that affect infrastructure health change infrequently and are relatively easy to predict. We have several well-established practices for this, such as capacity planning and automatic remediation (for example, auto-scaling in a Kubernetes environment), all of which can be used to tackle these types of known issues.

 

Monitoring and infrastructure problems

Due to its relatively predictable and slowly changing nature, the aggregated-metrics approach monitors and alerts well for infrastructure problems; a metric-based system works well here. Metrics-based systems and their associated alerts help you see when capacity limits or known error conditions of the underlying systems are being reached. For the software itself, however, we need access to high-cardinality fields, which may include a user ID or a shopping cart ID. Code that is well-instrumented for observability allows you to answer complex questions that are easy to miss when examining only aggregate performance.

 

OpenShift Security Context Constraints

OpenShift Security Best Practices

In this post, we will discuss OpenShift security best practices. We will learn about the various identity providers that implement authentication in OpenShift, look at service accounts, and gain an understanding of the connection between users and identities. We will also discuss the authorization process, granting user privileges, admission controllers, and security context constraints.

OpenShift has stricter security policies than Kubernetes. For instance, it is forbidden to run a container as root, and it offers a secure-by-default posture. Plain Kubernetes does not ship with a user identity provider, so administrators must integrate external authentication mechanisms such as certificates, bearer tokens, or OIDC, whereas OpenShift includes an integrated OAuth server.

OpenShift provides a range of security features, including role-based access control (RBAC), image scanning, and container isolation, that help ensure the safety of containerized applications.

 

Preliminary Information: Useful Links to Relevant Content

For useful pre-information on OpenShift basics, you may find the following posts helpful:

  1. OpenShift Networking
  2. Kubernetes Security Best Practice
  3. Container Networking
  4. Identity Security

 



OpenShift Security

Key OpenShift Security Best Practices points:


  • The traditional fixed stack security architecture. 

  • Microservices: Many different entry points.

  • Docker container attack vectors.

  • Security Context Constraints.

  • OpenShift network security.

 

  • A key point: Video on OpenShift security and OpenShift best practices.

In the following video, we will start by discussing OpenShift network security and how containers in non-OpenShift environments have some pretty big security concerns when left at their defaults. The video will discuss host security issues and how OpenShift overcomes them with Security Context Constraints. By default, all pods, except those for builds and deployments, use a default service account assigned by the restricted SCC, which doesn't allow privileged containers, that is, those running under the root user or listening on privileged ports (below 1024).

 

 

A Key Point: Knowledge Check 

 

Back to Basics: Starting OpenShift Security with OpenShift Best Practices

Generic: Securing containerized environments

Securing containerized environments is considerably different from securing the traditional monolithic application because of the inherent nature of the microservices architecture. We went from one to many, and there is a clear difference in attack surface and entry points. So there is much to consider for OpenShift network security and OpenShift security best practices, including many Docker security options.

The application stack previously had very few components, maybe just a cache, web server, and database separated and protected by a context firewall. The most common network service allows a source to reach an application, and the sole purpose of the network is to provide endpoint reachability.

As a result, the monolithic application has few entry points, for example, ports 80 and 443. Not every monolithic component is exposed to external access and must accept requests directly. And we designed our networks around these facts. The following diagram provides information on the threats you must consider for container security.

container security
Diagram: Container security. Source Neuvector

 

  • A key point: Back to basics. OpenShift Security

OpenShift delivers all the tools you need to run software on top of it with SRE paradigms, from a monitoring platform to an integrated CI/CD system that you can use to monitor and run both the software deployed to the OpenShift cluster, as well as the cluster itself. So the cluster needs to be secured along with the workload that runs in the cluster. 

From a security standpoint, OpenShift provides robust encryption controls to protect sensitive data, including platform secrets and application configuration data. In addition, OpenShift optionally utilizes FIPS 140-2 Level 1 compliant encryption modules to meet security standards for U.S. federal departments.

This post highlights OpenShift security and provides best practices and considerations for planning and operating your OpenShift cluster. These give you a starting point; however, since clusters, as well as bad actors, are ever-evolving, it is important to revisit the steps you have taken.

 

OpenShift Security

Central security architecture

Therefore, we often see security enforcement in a fixed, central place in the network infrastructure. This could be, for example, a central security stack consisting of several security appliances, often referred to as a kludge of devices. As a result, the individual components within the application need not worry about carrying out security checks, as this occurs centrally on their behalf.

On the other hand, with the common microservices architecture, those internal components are specifically designed to operate independently and accept requests independently, which brings considerable benefits to scaling and deploying pipelines. However, each component may now have entry points and accept external connections. Therefore, they need to be concerned with security individually and not rely on a central security stack to do this for them.

OpenShift Security Guide
Diagram: OpenShift security best practices.

 

The different container attack vectors 

These changes have considerable consequences for security and how you approach your OpenShift security best practices. The security principles still apply, and we still are concerned with reducing the blast radius, least privileges, etc. Still, they must be applied from a different perspective and to multiple new components in a layered approach. Security is never done in isolation.

So as the number of entry points to the system increases, the attack surface broadens, leading us to several docker container security attack vectors not seen with the monolithic. We have, for example, attacks on the Host, images, supply chain, and container runtime. There is also a considerable increase in the rate of change for these types of environments; an old joke says that a secure application is an application stack with no changes.

So when you change, you can open the door to a bad actor. Today, an agile application stack changes several times a day. We have unit tests, security tests, and other safety checks that can reduce mistakes, but no matter how much preparation you do, there is a chance of a breach whenever there is a change. So we have environmental changes that affect security, along with some alarming technical defaults in how containers run, such as running as root with an excessive set of capabilities and privileges. The following image displays attack vectors that are specifically linked to containers.

container attack vectors
Diagram: Container attack vectors. Source Adriancitu

 

Challenges with Securing Containers

  • Containers running as root

So, as you know, containers run as root by default and share the kernel of the host OS, and the container process is visible from the host. This in itself is a considerable security risk if a container compromise occurs. If a security vulnerability in the container runtime arises and a container escape is performed, an application running as root can become root on the underlying host. And if a bad actor gains access to the host with the right privileges, they can compromise all of the host's containers.

 

  • Risky Configuration

Containers often run with excessive privileges and capabilities—much more than they need to do their job efficiently. As a result, we need to consider what privileges the container has and whether it runs with any unnecessary capabilities it does not need. Some of the capabilities a container may have are defaults that fall under risky configurations and should be avoided. You want to keep an eye on the CAP_SYS_ADMIN. This flag grants access to an extensive range of privileged activities.

The container has isolation boundaries by default with namespace and control groups ( when configured correctly). However, granting the excessive container capabilities will weaken the isolation between the container and this Host and other containers on the same Host. Essentially, removing or dissolving the container’s ring-fence capabilities.

 

Starting OpenShift Security Best Practices

Security with OpenShift overcomes many of the default risks of running containers, and OpenShift does much of this out of the box. If you want further information on securing an OpenShift cluster, kindly check out my Pluralsight course on OpenShift Security and OpenShift Network Security. OpenShift Container Platform (formerly known as OpenShift Enterprise), or OCP, is Red Hat's on-premises private platform as a service (PaaS). OpenShift is based on the Origin open-source project and is a Kubernetes distribution.

The foundation of the OpenShift Container Platform and OpenShift network security is Kubernetes, so it shares much of the same networking technology along with some enhancements. However, Kubernetes is a complex beast and can fall short on its own when it comes to securing clusters. OpenShift does an excellent job of wrapping Kubernetes in a layer of security, such as Security Context Constraints (SCCs), that gives your cluster a good security baseline.

 

Security Context Constraints

By default, OpenShift prevents the cluster container from accessing protected functions. These functions — Linux features such as shared file systems, root access, and some core capabilities such as the KILL command — can affect other containers running in the same Linux kernel, so the cluster limits access. Most cloud-native applications work fine with these limitations, but some (especially stateful workloads) need greater access. Applications that need these functions can still use them but need the cluster’s permission.

The application’s security context specifies the permissions that the application needs, while the cluster’s security context constraints specify the permissions that the cluster allows. An SC with an SCC enables an application to request access while limiting the access that the cluster will grant.

 

What are security contexts and security context constraints?

A pod configures a container’s access with permissions requested in the pod’s security context and approved by the cluster’s security context constraints:

  • Security context (SC): defined in a pod, it enables a deployer to specify a container's permissions to access protected functions. When the pod creates the container, it configures it to allow these permissions and block all others. The cluster will only deploy the pod if the permissions it requests are allowed by a corresponding SCC.

  • Security context constraint (SCC): defined in a cluster, it enables an administrator to control permissions for pods, the permissions that manage containers' access to protected Linux functions. Similarly to how role-based access control (RBAC) manages users' access to a cluster's resources, an SCC manages pods' access to Linux functions. By default, a pod is assigned an SCC named restricted, which blocks access to protected functions, in OpenShift v4.10 or earlier; in OpenShift v4.11 and later, the restricted-v2 SCC is used by default. For an application to access protected functions, the cluster must make an SCC that allows it available to the pod.

While an SCC grants access to protected functions, each pod needing access must request it. To request access to the functions its application needs, a pod specifies those permissions in the security context field of the pod manifest. The manifest also specifies the service account that should be able to grant this access. When the manifest is deployed, the cluster associates the pod with the service account associated with the SCC. For the cluster to deploy the pod, the SCC must grant the permissions that the pod requests.

One way to envision this relationship is to think of the SCC as a lock that protects Linux functions, while the manifest is the key. The pod is allowed to deploy only if the key fits.
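To make the lock-and-key picture concrete, here is a hypothetical pod manifest, written as a Python dictionary for readability rather than YAML, whose security context requests only unprivileged access: run as non-root, no privilege escalation, a read-only root filesystem, and all capabilities dropped. The names, image, and service account are illustrative; whether the pod deploys still depends on the SCC bound to that service account.

```python
import json

# Hypothetical pod manifest expressed as a Python dict (normally written in YAML).
# The security context requests only what the app needs; the SCC decides whether
# the cluster grants it.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "payments-api"},
    "spec": {
        "serviceAccountName": "payments-sa",        # account bound to the SCC
        "containers": [{
            "name": "app",
            "image": "registry.example.com/payments:1.4",
            "ports": [{"containerPort": 8080}],     # unprivileged port
            "securityContext": {
                "runAsNonRoot": True,
                "allowPrivilegeEscalation": False,
                "readOnlyRootFilesystem": True,
                "capabilities": {"drop": ["ALL"]},
            },
        }],
    },
}

print(json.dumps(pod_manifest, indent=2))
```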

Security Context Constraints
Diagram: Security Context Constraints. Source is IBM

 

 

A final note: Security context constraint

When your application is deployed to OpenShift, the default security model enforces that it runs using a Unix user ID unique to the project you are deploying it to, which prevents images from running as the Unix root user. When hosting an application on OpenShift, the user ID that a container runs as is assigned based on the project it runs in.

Containers cannot run as the root user by default—a big win for security. SCC also allows you to set different restrictions and security configurations for PODs. So, instead of allowing your image to run as the root, which is a considerable security risk, you should run as an arbitrary user by specifying an unprivileged USER, setting the appropriate permissions on files and directories, and configuring your application to listen on unprivileged ports.

 

OpenShift Security Context Constraints
Diagram: OpenShift security context constraints.

 

OpenShift Network Security: SCC defaults access

Security context constraints let you drop privileges by default, which is essential and still the best practice. Red Hat OpenShift security context constraints (SCCs) ensure that, by default, no privileged containers run on OpenShift worker nodes—another big win for security. Access to the host network and host process IDs are denied by default. Users with the required permissions can adjust the default SCC policies to be more permissive.

When considering SCCs, think of SCC admission control as restricting pod access, similar to how RBAC restricts user access. To control the behavior of pods, we have security context constraints (SCCs): cluster-level resources that define what resources pods can access and provide additional control.

 

Restricted security context constraints (SCCs)

A few SCCs are available by default, and you may have heard of the restricted SCC. By default, all pods, except those for builds and deployments, use a default service account assigned by the restricted SCC, which doesn't allow privileged containers, that is, containers running under the root user or listening on privileged ports (ports below 1024). SCCs can be used to manage the following:

 

    1. Privileged mode: this setting allows or denies a container running in privileged mode. As you know, privileged mode bypasses restrictions such as control groups, Linux capabilities, and secure computing profiles.
    2. Privilege escalation: this setting enables or disables privilege escalation inside the container (all privilege-escalation flags).
    3. Linux capabilities: this setting allows the addition or removal of specific Linux capabilities.
    4. Seccomp profile: this setting controls which secure computing profiles are used in a pod.
    5. Read-only root filesystem: this makes the root filesystem read-only.

 

The goal is to assign the fewest possible capabilities for a pod to function fully. This least-privileged model ensures that pods can’t perform tasks on the system that aren’t related to their application’s proper function. The default value for the privileged option is False; setting the privileged option to True is the same as giving the pod the capabilities of the root user on the system. Although doing so shouldn’t be common practice, privileged pods can be helpful under certain circumstances. 

 

OpenShift Network Security: Authentication

The term authentication refers to the process of validating one’s identity. Usually, users aren’t created in OpenShift but are provided by an external entity, such as the LDAP server or GitHub. The only part where OpenShift steps in is authorization—determining roles and permissions for a user. OpenShift supports integration with various identity management solutions in corporate environments, such as FreeIPA/Identity Management, Active Directory, GitHub, Gitlab, OpenStack Keystone, and OpenID.

 

OpenShift Network Security: Users and identities

A user is any human actor who can request the OpenShift API to access resources and perform actions. Users are typically created in an external identity provider, usually a corporate identity management solution such as Lightweight Directory Access Protocol (LDAP) or Active Directory.

To support multiple identity providers, OpenShift relies on the concept of identities as a bridge between users and identity providers. By default, a new user and identity are created upon the first login. There are four ways to map users to identities, set per identity provider via the mappingMethod option: claim, lookup, generate, and add.

 

Service accounts

Service accounts allow us to control API access without sharing users’ credentials. Pods and other non-human actors use them to perform various actions, and they are the central vehicle by which their access to resources is managed. By default, three service accounts are created in each project: builder, deployer, and default.

 

Authorization and role-based access control

Authorization in OpenShift is built around the following concepts:

Rules: Sets of actions allowed to be performed on specific resources.
Roles: Collections of rules that can be applied to a user according to a specific user profile. Roles can be applied at either the cluster or the project level.
Role bindings: Associations between users/groups and roles. A given user or group can be associated with multiple roles.

If pre-defined roles aren’t sufficient, you can always create custom roles with just the specific rules you need.
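As a rough illustration of how rules, roles, and role bindings fit together, the following sketch builds a namespaced Role that can only read pods and a RoleBinding that grants it to a user; the project, role, and user names are hypothetical, and the manifests would be applied with oc or the Kubernetes API.

# A sketch of RBAC objects as Python dicts: a Role with one rule (read-only
# access to pods) and a RoleBinding that associates the role with a user.
# Project, role, and user names are hypothetical.
import json

pod_reader_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "pod-reader", "namespace": "my-project"},
    "rules": [
        {
            "apiGroups": [""],                  # core API group
            "resources": ["pods"],
            "verbs": ["get", "list", "watch"],  # the allowed actions
        }
    ],
}

pod_reader_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "pod-reader-binding", "namespace": "my-project"},
    "subjects": [
        {"kind": "User", "name": "alice", "apiGroup": "rbac.authorization.k8s.io"}
    ],
    "roleRef": {"kind": "Role", "name": "pod-reader", "apiGroup": "rbac.authorization.k8s.io"},
}

print(json.dumps([pod_reader_role, pod_reader_binding], indent=2))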

 

 

System Observability

Distributed Systems Observability

 

 

Distributed Systems Observability

A considerable drive in innovation has spawned several megatrends that affect how we manage and view our network infrastructure and that create the need for distributed systems observability. We have seen the decomposition of everything from one to many. Many services and dependencies in multiple locations, aka microservices observability, must be managed and operated, instead of the monolith, where everything is generally housed internally. These megatrends have resulted in a dynamic infrastructure with new failure modes not seen in the monolith, forcing us to look at different systems observability tools and network visibility practices.

There has also been a shift in the point of control. As we move towards new technologies, many of these loosely coupled services, and the infrastructure your services run on, are not under your control. The edge of control has been pushed out, creating different network and security perimeters. These perimeters are now closer to the workload than to a central security stack. Therefore, the workloads themselves must be concerned with security.

 

Preliminary Information: Useful Links to Relevant Content

For pre-information, you may find the following posts helpful:

  1. Observability vs Monitoring 

 



Distributed Systems Observability.

Key Distributed Systems Observability points:


  • We no longer have predictable failures.

  • The different demands on networks.

  • The issues with the metric-based approach.

  • Static thresholds and alerting.

  • The 3 pillars of Observability.

 

A Key Point: Knowledge Check 

  • A key point: Back to basics with distributed systems.

Today’s world of always-on applications and APIs has availability and reliability requirements that, only a few decades ago, would have been expected of just a handful of mission-critical services around the globe. Likewise, the potential for rapid, viral service growth means that every application has to be built to scale nearly instantly in response to user demand. These constraints and requirements mean that almost every application made, whether a consumer mobile app or a backend payments application, needs to be a distributed system. A distributed system is an environment where different components are spread across multiple computers on a network. These machines split up the work, coordinating their efforts to complete the job more efficiently than if a single machine had been responsible.

 

Diagram: Systems Observability design.

 

How This Affects Failures

The primary issue I have seen with my clients is that application failures are no longer predictable, and dynamic systems can fail in creative ways, challenging existing monitoring solutions and, more importantly, the practices that support them. We have a lot of partial failures that are not just unexpected but previously unknown or never seen before. Consider, for example, the network hero.

 

The network hero

The network hero is someone who knows every part of the network and has seen every failure at least once. That model is no longer enough in today’s world; we need proper Observability. When I was working as an engineer, we would have plenty of failures, but more than likely we had seen them before, and there was a system in place to fix the error. Today’s environment is much different.

We can no longer rely on simply seeing an UP or DOWN status, setting static thresholds, and then alerting based on those thresholds. A key point to note at this stage is that none of these thresholds considers the customer’s perspective. If your pod runs at 80% CPU, does that mean the customer is unhappy? When monitoring, you should look from your customers’ perspective at what matters to them. Content delivery networks (CDNs) were among the first to realize this and measure what matters most to the customer.

 

 

Distributed Systems Observability

The different demands

So the new modern and complex distributed systems place very different demands on your infrastructure and on the people who manage it. For example, in a microservices architecture, several problems can affect a particular microservice:

    • The microservice could be running under high resource utilization and therefore be slow to respond, causing a timeout.
    • The microservice could have crashed or been stopped and is therefore unavailable.
    • The microservice could be fine, but there could be slow-running database queries.

In short, we have a lot of partial failures to deal with.

 

Therefore: We can no longer predict

The big shift we see with software platforms is that they evolve much quicker than the products and paradigms we use to monitor them. As a result, we need to consider new practices and technologies with dedicated platform teams and good system observability. We can’t predict anything anymore, which puts the brakes on some traditional monitoring approaches, especially the metrics-based approach to monitoring.

I’m not saying that these monitoring tools are not doing what you want them to do. But, they work in a siloed environment, and there is a lack of connectivity. So we have monitoring tools working in silos in different parts of the organization and more than likely managed by different people trying to monitor a very dispersed application with multiple components and services in various places. 

 

Relying On Known Failures

Metric-Based Approach

A metrics-based monitoring approach relies on having previously encountered known failure modes: it depends on known failures and predictable failure modes. So we have predictable thresholds beyond which the system is considered to be behaving abnormally. Monitoring can detect when systems go over or under the thresholds that were previously set, and then we can set alerts and hope those alerts are actionable. This is only useful for variants of predictable failure modes.

Traditional metrics and monitoring tools can tell you about performance spikes or that a problem has occurred, but they don’t let you dig into the source of the problem, slice and dice the data, or see correlations between errors. In a complex system, this approach makes it harder to reach the root cause in a reasonable timeframe.

 

Traditional style metrics systems

With traditional-style metrics systems, you had to define custom metrics, and they were always defined upfront. With this approach, we can’t start to ask new questions about problems; you had to define the questions to ask upfront. We would then set performance thresholds, pronounce them “good” or “bad,” and check and re-check those thresholds. We would tweak the thresholds over time, but that was about it. This monitoring style has been the de facto approach, but we no longer want to have to predict how a system can fail. Always observe, instead of waiting for a problem, such as a threshold being reached, before acting.
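To illustrate how limited this style is, here is a minimal sketch of static-threshold alerting; the metric names and threshold values are purely illustrative. It can only answer the questions baked in up front and says nothing about user experience or novel failure modes.

# A minimal sketch of static-threshold alerting, the style described above.
# Metric names, thresholds, and the sample values are illustrative only.

THRESHOLDS = {
    "cpu_percent": 80.0,   # alert when CPU goes above 80%
    "error_rate": 0.05,    # alert when more than 5% of requests fail
}

def check_thresholds(sample: dict) -> list[str]:
    """Return alerts for any metric that crosses its pre-defined threshold."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeded threshold {limit}")
    return alerts

# The check can only answer the questions baked into THRESHOLDS up front;
# it says nothing about user experience or failure modes never anticipated.
print(check_thresholds({"cpu_percent": 91.2, "error_rate": 0.01}))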

Diagram: System Observability analysis.

 

Metrics: Lack of connective event

Metrics did not retain the connective event. As a result, you cannot ask new questions of the existing dataset. These traditional system metrics could miss unexpected failure modes in complex distributed systems. Also, the condition detected via system metrics might be unrelated to what is actually happening. An example could be an odd number of running threads on one component, which might indicate that garbage collection is in progress or that slow response times are imminent in an upstream service.

 

User experience and static thresholds

User experience means different things to different sets of users. We now have a model where different users of a service may be routed through the system in different ways, using different components, and experiencing things that can vary widely. We also know that services no longer tend to break in the same few predictable ways over and over. We should have only a few alerts, triggered by symptoms that directly impact user experience, not because a threshold was reached.

 

  • The Challenge: Can’t reliably indicate any issues with user experience

If you use static thresholds, they can’t reliably indicate any issues with user experience. Alerts should be set up to detect failures that impact user experience. Traditional monitoring falls short in trying to do this. With traditional metrics-based monitoring, we rely on static thresholds to define optimal system conditions, which have nothing to do with user experience. However, modern systems change shape dynamically under different workloads. Static thresholds for monitoring can’t reflect impacts on user experience. They lack context and are too coarse.

 

The Need For Distributed Systems Observability

Systems observability and reliability in distributed systems is a practice. Rather than just focusing on a tool that does logging, metrics, or alerting, Observability is all about how you approach problems, and for this, you need to look at your culture. You could say that Observability is a cultural practice that allows you to be proactive about findings instead of relying on the reactive approach we were used to in the past.

Nowadays, we need a different viewpoint and want to see everything from one place. You want to know how the application works and how it interacts with the other infrastructure components, such as the underlying servers, physical or virtual, and the network, and what data transfer looks like in transit and in a steady state. What level of observation do you need to know that everything is performing as it should? And what should you be looking at to get this level of detail?

Monitoring is knowing the data points and the entities we are gathering them from. Observability, on the other hand, is like putting all the data together. So monitoring is collecting data, and Observability is putting it together in one single pane of glass. Observability is observing the different patterns and deviations from the baseline; monitoring is getting the data and putting it into the systems. A vital part of an Observability toolkit is service level objectives (SLOs).

 

The three pillars of distributed systems observability

We have three pillars of systems Observability: metrics, traces, and logging. It is an oversimplification to define or view Observability as just having these pillars, but for Observability you need them in place. Observability is all about connecting the dots between each of these pillars. If someone asked me which one I prefer, it would be distributed tracing. Distributed tracing allows you to visualize each step in a service request’s execution. As a result, it doesn’t matter if services have complex dependencies; you could say that the complexity of dynamic systems is abstracted away with distributed tracing.

 

  • Use Case: Challenges without tracing.

For example, latency can stack up if a downstream database service experiences performance bottlenecks. As a result, the end-to-end latency is high. When latency is detected three or four layers upstream, it can be complicated to identify which component of the system is the root of the problem because now that same latency is being seen in dozens of other services.

 

  • Distributed tracing: A winning formula

Modern distributed systems tend to scale into a tangled knot of dependencies. Therefore, distributed tracing shows the relationships between various services and components in a distributed system. Traces help you understand system interdependencies. Unfortunately, those inter-dependencies can obscure problems and make them challenging to debug unless their relationships are clearly understood.

 

 

system reliability

Reliability In Distributed System

 

 

Reliability In Distributed System

When considering reliability in a distributed system, considerable shifts in our environmental landscape have caused us to examine how we operate and run our systems and networks. We have had a mega shift with the introduction of various cloud platforms, their services, and containers, along with the complexity of managing distributed systems observability and microservices observability, which unveils significant gaps in our current technologies, not to mention the flaws in the operational practices around those technologies.

This has caused a knee-jerk reaction, but also a welcome drive in innovation around system reliability. Yet, some of the technologies and tools used to manage these innovations do not keep pace with the innovation itself. Many of these tools have stayed relatively static in our dynamic environment. So we have static tools used in a dynamic environment, which causes friction for reliability in distributed systems and raises the need for more efficient network visibility.

 

Preliminary Information: Useful Links to Relevant Content

Before you proceed, you may find the following post helpful:

  1. Distributed Firewalls

 



Reliability In Distributed Systems


Key Reliability in Distributed System Discussion Points:


  • Complexity managing distributed systems.

  • Static tools in a dynamic environment.

  • Observability vs Monitoring.

  • Creative failures and black holes.

  • SRE teams and service level objectives.

  • New tools: Distributed tracing.

 

  • A key point: Video reliability in the distributed system

In the following video, we will discuss the essential feature of any system, reliability, which is not a feature that a vendor can sell you. We will discuss the importance of distributed systems and the need to fully understand them with practices like Chaos Engineering and Site Reliability Engineering (SRE). We will also discuss the issues with monitoring and static thresholds.

 

 

A Key Point: Knowledge Check 

  • A key point: Back to basics with distributed systems.

Distributed systems are required to implement the reliability, agility, and scale expected of modern computer programs. Distributed systems are applications made of many different components running on many different machines. Containers are the foundational building block, and groups of containers co-located on a single machine make up the atomic elements of distributed system patterns.

Distributed System Observability

The big shift we see with software platforms is that they evolve much quicker than the products and paradigms we use to monitor them. We need to consider new practices and technologies with dedicated platform teams to enable a new era of system reliability in distributed systems, along with the practice of Observability, which is a step up from the traditional monitoring of static infrastructure: Observability vs monitoring.

 

Lack of Connective Event: Traditional Monitoring

If you examine traditional monitoring systems, they capture and examine signals in isolation. The monitoring systems work in a siloed environment, similar to developers and operators before the rise of DevOps. Existing monitoring systems cannot detect the “unknown unknowns” common in modern distributed systems, which often leads to disruptions of services. So you may be asking what an “unknown unknown” is.

I’ll put it to you this way: the distributed systems we see today don’t have much sense of predictability, certainly not enough predictability to rely on static thresholds, alerts, and old monitoring tools. If something is static, it can be automated, and we do have static events; in Kubernetes, for example, a pod reaching a limit. A ReplicaSet then introduces another pod on a different node if specific parameters are met, such as Kubernetes labels and node selectors, as sketched below. However, this is only a tiny piece of the failure puzzle in a distributed environment. Today, we have what’s known as partial failures and systems that fail in very creative ways.
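As a rough sketch of that static, automatable piece, here is a minimal ReplicaSet manifest built as a Python dict; the label values and image are placeholders, not taken from any particular environment.

# A sketch of a ReplicaSet manifest (as a Python dict) showing labels, a
# selector, and a nodeSelector. Kubernetes recreates pods to keep the replica
# count, placing them only on nodes carrying the matching label.
replica_set = {
    "apiVersion": "apps/v1",
    "kind": "ReplicaSet",
    "metadata": {"name": "web-rs"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "nodeSelector": {"disktype": "ssd"},   # placeholder node label
                "containers": [{"name": "web", "image": "nginx:1.25"}],
            },
        },
    },
}
# The dict would be rendered to YAML and applied with kubectl/oc, or submitted
# through a Kubernetes client library.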

 

Reliability In Distributed System: Creative ways to fail

So we know that some of these failures are easily predicted and actions taken. For example, if a Kubernetes node reaches a certain utilization, we can automatically reschedule pods onto a different node to stay within our known scale limits. We have predictable failures that can be automated, not just in Kubernetes but with any infrastructure; an Ansible script is useful when we have predictable events. However, we have much more to deal with than pod scaling; we have many partial and complicated failures, known as black holes.

 

In today’s world of partial failures

Microservices applications are distributed and susceptible to many external factors. On the other hand, if you examine the traditional monolithic application style, all the functions reside in the same process. It was either switched ON or OFF; not much happened in between. So if there was a failure in the process, the application as a whole would fail. The results were binary, usually either UP or DOWN.

With some basic monitoring, this was easy to detect, and failures were predictable. There was no such thing as a partial failure. In a monolithic application, all application functions are within the same process, and a significant benefit of these monoliths is that you don’t have partial failures.

However, in a cloud-native world, where we have broken the old monolith into a microservices-based application, a request made from a client can go through multiple hops of microservices, and we can have several problems to deal with. There is a lack of connectivity between the different domains. Many monitoring tools and knowledge will be tied to each domain, and alerts are often tied to thresholds or rate-of-change violations that have nothing to do with user satisfaction. User satisfaction is a critical metric to care about.

 

System reliability: Today, you have no way to predict

So the new modern and complex distributed systems place very different demands on your infrastructure—considerably different from the simple three-tier application where everything was generally housed in one location.  We can’t predict anything anymore, which puts the brakes on traditional monitoring approaches. When you can no longer predict what will happen, you can no longer rely on a reactive approach to monitoring and management. The move towards a proactive approach to system reliability is a welcomed strategy.

 

 

  • A note on Blackholes: Strange failure modes

When considering a distributed system, many things can happen. A service or region can disappear entirely, or disappear for a few seconds or milliseconds and reappear. We consider this going into a black hole: when something enters it, it disappears. These strange failure modes are unexpected and surprising; there is certainly nothing predictable about them. So what happens when your banking transactions are in a black hole? What if your bank balance is displayed incorrectly, or you make a transfer to an external account and it does not show up?

 

Highlighting Site Reliability Engineering (SRE) and Observability

Site Reliability Engineering (SRE) and Observability practices are needed to manage these types of unpredictable and unknown failures. SRE is about making systems more reliable, and everyone has a different way of implementing SRE practices. Usually, about 20% of your issues cause 80% of your problems. You need to be proactive and fix these issues upfront; you need to get ahead of the curve to stop incidents from occurring. This realization usually happens in the wake of a massive incident, which acts as a teachable moment and provides the reason to invest in a Chaos Engineering project.

 

New tools and technologies: Distributed tracing

We have new tools, such as distributed tracing. If the system becomes slow, what is the best way to find the bottleneck? Here you can use distributed tracing and OpenTelemetry. Tracing helps us instrument our system so we can figure out where the time is being spent, and it can be used across a distributed microservice architecture to troubleshoot problems. OpenTelemetry provides a standardized way of instrumenting our systems and producing those traces.
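As a small illustration, here is a minimal OpenTelemetry tracing sketch in Python; the service and span names are hypothetical, and a real deployment would export spans to a collector (for example over OTLP) rather than to the console.

# A minimal OpenTelemetry tracing sketch. Requires the opentelemetry-sdk
# package; service and span names below are illustrative only.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

def handle_request():
    # Each nested span records where time is spent within the request.
    with tracer.start_as_current_span("handle_request"):
        with tracer.start_as_current_span("query_database"):
            pass  # database call would go here
        with tracer.start_as_current_span("call_payment_api"):
            pass  # downstream service call would go here

handle_request()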


 

 

  • SLA, SLI, SLO, and Error Budgets

We don’t just want to know when something has happened and then react to an event; that is not looking from the customer’s perspective. We need to understand whether we are meeting the SLA by gathering the number and frequency of outages and any performance issues. Service Level Objectives (SLOs) and Service Level Indicators (SLIs) can assist with these measurements. They don’t just assist with measurement; they offer a tool for achieving better reliability and form the base of the Reliability Stack.
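A rough sketch of the arithmetic behind SLIs, SLOs, and error budgets, using made-up request counts and a hypothetical 99.9% availability target:

# SLI / SLO / error-budget arithmetic with illustrative numbers.
# The SLI here is simply the fraction of successful requests.

total_requests = 1_000_000
failed_requests = 1_200

sli = (total_requests - failed_requests) / total_requests   # measured availability
slo = 0.999                                                  # target: 99.9% success

error_budget = 1 - slo                                       # allowed failure fraction
budget_consumed = (failed_requests / total_requests) / error_budget

print(f"SLI: {sli:.4%}")                                # 99.8800%
print(f"Error budget consumed: {budget_consumed:.0%}")  # 120% -> budget blown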

 

 

Chaos Engineering

Baseline Engineering

 

 

Baseline Engineering

Baseline engineering and network baselining are critical components of projects, as they help ensure that all systems and components are designed, implemented, and operated according to the required specifications. This includes developing and maintaining a baseline architecture and correctly integrating system components. In addition, baseline engineering must ensure that all systems have the necessary security measures in place and provide adequate backups and data integrity.

Baseline engineering was easy in the past; applications ran in single private data centers, potentially two data centers for high availability. There may have been some satellite PoPs, but generally, everything was housed in a few locations. These data centers were on-premises, and all components were housed internally. As a result, troubleshooting, monitoring, and baselining any issues was relatively easy. The network and infrastructure were pretty static, the network and security perimeters were known, and there weren’t many changes to the stack on, say, a daily basis.

 

Before you proceed, you may find the following post helpful:

  1. Network Traffic Engineering

 



Baseline Engineering


Key Baseline Engineering Discussion Points:


  • Monitoring was easy in the past.

  • How to start a baseline engineering project.

  • Distributed components and latency.

  • Chaos Engineering Kubernetes.

 

  • A key point: Video on how to start a Baseline Engineering project

This educational tutorial will begin with guidance on how the application has changed from the monolithic style to the microservices-based approach and how this has affected failures. Then, I will introduce how this can be addressed by knowing exactly how your application and infrastructure perform under stress and what their breaking points are, with direction on how to start a Low Latency Network Design and Baseline Engineering project, specifically with Chaos Engineering.

 

 

However, nowadays, we are in a completely different environment where we have distributed applications with components and services located in many different places and types of places, on-premises and in the cloud, with dependencies on both local and remote services. We are spanning multiple sites and accommodating multiple workload types.

In comparison to the monolith, today’s applications have many different types of entry points to the external world. All of this calls for the practice of baseline engineering and Chaos Engineering on Kubernetes so you can fully understand your infrastructure and its scaling issues.

 

 Back to basics with baseline engineering

  • Chaos Engineering

Chaos engineering is a methodology of experimenting on a software system to build confidence in the system’s capability to withstand turbulent environments in production. It is an essential part of the DevOps philosophy, allowing teams to experiment with their system’s behavior in a safe and controlled manner.

This type of baseline engineering allows teams to identify weaknesses in their software architecture, such as potential bottlenecks or single points of failure, and take proactive measures to address them. By injecting faults into the system and measuring the effects, teams gain insights into system behavior that can be used to improve system resilience.

Finally, chaos Engineering teaches you to develop and execute controlled experiments that uncover hidden problems. For instance, you may need to inject system-shaking failures that disrupt system calls, networking, APIs, and Kubernetes-based microservices infrastructures.

Chaos engineering is defined as “the discipline of experimenting on a system to build confidence in the system’s capability to withstand turbulent conditions in production.” In other words, it’s a software testing method concentrating on finding evidence of problems before users experience them.
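As a minimal sketch of a controlled experiment, not tied to any particular chaos tool, the following wraps a hypothetical service call and injects bounded latency with a known probability so the steady-state hypothesis can be checked against a baseline:

# A minimal, controlled fault-injection sketch (not a specific chaos tool).
# inject_latency wraps a call and adds delay with a given probability so you
# can observe how the rest of the system behaves under a known, bounded fault.
import random
import time

def inject_latency(func, probability=0.2, delay_seconds=0.5):
    """Return a wrapped version of func that is sometimes artificially slow."""
    def wrapper(*args, **kwargs):
        if random.random() < probability:
            time.sleep(delay_seconds)   # the injected fault
        return func(*args, **kwargs)
    return wrapper

def fetch_price(item_id):               # hypothetical service call
    return {"item": item_id, "price": 9.99}

slow_fetch_price = inject_latency(fetch_price)

# Steady-state hypothesis: latency stays under an agreed bound even while
# 20% of calls are delayed. Measure, compare against the baseline, and stop
# the experiment if the blast radius grows beyond what was agreed.
start = time.time()
slow_fetch_price("sku-123")
print(f"call took {time.time() - start:.3f}s")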


 

  • Network Baselining

Network baselining involves taking measurements of the network’s performance at different times. This includes measuring the throughput, latency, and other performance metrics and the network’s configuration. It is important to note that the performance metrics can vary greatly depending on the type of network being used. This is why it is essential to establish a baseline for the network to be used as a reference point for comparison.


Network baselining is integral to network management as it allows organizations to identify and address potential issues before they become more serious. Organizations can be alerted to potential problems by baselining the network’s performance. This can help organizations avoid costly downtime and ensure their networks run at peak performance.
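A simple sketch of collecting such a baseline, sampling round-trip time to a placeholder health endpoint and recording summary statistics as the reference point:

# A latency-baselining sketch: sample round-trip time to an endpoint and
# record summary statistics as the baseline. The URL is a placeholder.
import statistics
import time
import urllib.request

def sample_latency(url, samples=20):
    """Measure round-trip time (seconds) for repeated HTTP requests."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=5).read()
        times.append(time.perf_counter() - start)
    return times

times = sample_latency("https://example.com/health")   # placeholder endpoint
baseline = {
    "p50_seconds": statistics.median(times),
    "p95_seconds": statistics.quantiles(times, n=20)[18],  # ~95th percentile
    "mean_seconds": statistics.mean(times),
}
print(baseline)   # store this as the reference point for later comparisons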

Diagram: Network Baselining. Source is DNSstuff.

 

Network Baselining: A Lot Can Go Wrong

There is a growing complexity of infrastructure, and let’s face it, a lot can go wrong. It’s imperative to have a global view of all the infrastructure components and a good understanding of the application’s performance and health. In a large-scale container-based application design, there are many moving pieces and parts, and trying to validate the health of each piece manually is hard to do.  

Therefore, monitoring and troubleshooting are much harder, especially as everything is interconnected, making it difficult for a single person in one team to understand what is happening entirely. Nothing is static anymore, and things are moving around all the time. This is why it is even more important to focus on the patterns and to be able to efficiently see the path of where the issue is.

Some modern applications could be in multiple clouds and different location types simultaneously. As a result, there are multiple data points to consider. If any of these segments are slightly overloaded, the sum of each overloaded segment results in poor performance on the application level. 

 

What does this mean for latency?

Distributed computing has lots of components and services, often far apart from each other. This contrasts with a monolith, which has all its parts in one location. As a result of the distributed nature of modern applications, latency can add up. So we have both network latency and application latency, and the network latency is several orders of magnitude more significant.

As a result, you need to minimize the number of round-trip times and reduce any unneeded communication to an absolute minimum. When communication is required across the network, it’s better to gather as much data together as possible into bigger packets that are more efficient to transfer, as illustrated below. Also, consider using different types of buffers, both small and large, which will have varying effects on the dropped packet test.
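A rough, simulated illustration of why batching helps, assuming a fixed round-trip time per request; the functions are hypothetical stand-ins for real client calls:

# Simulated comparison of per-item requests vs. one batched request.
# NETWORK_RTT and the fetch functions are illustrative assumptions only.
import time

NETWORK_RTT = 0.05   # assume ~50 ms round-trip time per request

def fetch_one(item_id):
    time.sleep(NETWORK_RTT)          # one round trip per item
    return {"id": item_id}

def fetch_many(item_ids):
    time.sleep(NETWORK_RTT)          # one round trip for the whole batch
    return [{"id": i} for i in item_ids]

items = list(range(100))

start = time.time()
for i in items:
    fetch_one(i)                     # 100 round trips
unbatched = time.time() - start

start = time.time()
fetch_many(items)                    # 1 round trip
batched = time.time() - start

print(f"unbatched: {unbatched:.2f}s, batched: {batched:.2f}s")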

Diagram: Dropped Packet Test and Packet Loss.

 

With the monolith, the application simply runs in a single process, and it is relatively easy to debug. Much of the traditional tooling and code instrumentation technology was built assuming a single process. The core challenge is that debugging microservices applications is hard; much of the tooling we have today was built for traditional monolithic applications. There are new monitoring tools for these new applications, but there is a steep learning curve and a high barrier to entry.

 

A new approach: Network baselining and Baseline engineering

For this, you need to understand practices like Chaos Engineering, along with service level objectives (SLOs), and how they can improve the reliability of the overall system. Chaos Engineering is a baseline engineering practice that provides the ability to perform tests in a controlled way. Essentially, we intentionally break things to learn how to build more resilient systems.

We inject various issues and faults in a controlled way to make the overall application more resilient. Implementing practices like Chaos Engineering will help you understand and manage unexpected failures and performance degradation. The purpose of Chaos Engineering is to build more robust and resilient systems.

 

  • A final note on baselines: Don’t forget them!!

Creating a good baseline is a critical factor. You need to understand how things work under normal circumstances. A baseline is a fixed point of reference used for comparison purposes. You need to know how long it usually takes from starting the application to the actual login, and how long the essential services take before there are any issues or heavy load. Baselines are critical to monitoring.

It’s like security: if you can’t see it, you can’t protect it. The same assumption applies here. Go for a good baseline and, if you can, have it fully automated, as in the sketch below. Tests need to be carried out against the baseline on an ongoing basis. You need to test constantly to see how long it takes users to use your services. Without baseline data, estimating any changes or demonstrating progress is difficult.
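A sketch of such an automated check against a stored baseline; the baseline value, tolerance, and the measure_login_seconds stand-in are all assumptions for illustration:

# Automated comparison of a current measurement against a stored baseline.
# The baseline value, tolerance, and timing stand-in are illustrative only.
import time

BASELINE = {"p95_seconds": 1.8}   # in practice, loaded from a stored baseline
TOLERANCE = 1.25                  # allow a 25% regression before failing

def measure_login_seconds():
    start = time.perf_counter()
    # ... drive the login flow here (API call, UI automation, etc.) ...
    return time.perf_counter() - start

def check_against_baseline():
    current = measure_login_seconds()
    limit = BASELINE["p95_seconds"] * TOLERANCE
    ok = current <= limit
    status = "OK" if ok else "REGRESSION"
    print(f"login took {current:.2f}s (limit {limit:.2f}s) -> {status}")
    return ok

if __name__ == "__main__":
    check_against_baseline()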

 


 

Docker network security

Docker Security Options

 


 

Docker Security Options

So you are currently in the virtual machine world and considering transitioning to a containerized environment because you want to smooth out your application pipeline and gain the benefits of a Docker containerized environment. But you have heard from many that containers are insecure, and you are concerned about Docker network security. There is a Docker attack surface to be concerned about; for example, containers run as root by default and have many capabilities that scare you. Yes, the containerized environment brings a lot of benefits, and for some application stacks containers are the only way to go. However, along with those benefits comes a new attack surface, forcing you to examine Docker security options. The following post will discuss security issues, a container security video to help you get started, and an example of Docker escape techniques.

Containers by themselves are secure, and the kernel is pretty much battle-tested. A container escape is hard to orchestrate unless misconfiguration could result in excessive privileges. So even though the bad actors’ intent may stay the same, we must mitigate a range of new attacks and protect new components. To combat these, you need to be aware of the most common Docker network security options and follow the recommended practices for Docker container security. A platform approach is also recommended, and OpenShift is a robust platform for securing and operating your containerized environment.

 

Preliminary Information: Useful Links to Relevant Content

For pre-information, you may find the following posts helpful: 

  1. OpenShift Security Best Practices
  2. Docker Default Networking 101
  3. What Is BGP Protocol in Networking

 



Docker Security Options


Key Docker Network Security Discussion Points:


  • Docker network security.

  • Docker attack surface.

  • Container security video.

  • Securing Docker containers.

  • Docker escape techniques.

 

  • A key point: Container security video on Docker network security

In this container security video, we will go through some basics of containers along with the Docker attack surface. The building blocks of containers are namespaces and control groups. However, they share the host kernel, which is a security concern. We will also address container orchestrators with Kubernetes and Kubernetes namespace and discuss PODs, Services, NAT, and VXLAN.

 

 

A Key Point: Knowledge Check 

  • A key point: Back to basics with Docker security

To use Docker safely in production and development, you must know the potential security issues and the primary tools and techniques for securing container-based systems. The defenses for your system should also consist of multiple layers. For example, your containers will most likely run in VMs so that if a container breakout occurs, another level of defense can prevent the attacker from getting to the host or other containers. Monitoring systems should be in place to alert admins in the case of unusual behavior. Finally, firewalls should restrict network access to containers, limiting the external attack surface.

 

 

 

 

Diagram: Docker container security supply chain.

Docker Attack Surface

Often, the tools and appliances in place are entirely blind to containers. The tools look at a running process and think: if the process is secure, then I’m secure. One of my clients ran a container from a Dockerfile and pulled an insecure image. The onsite tools did not know what an image was and could not scan it. As a result, we had malware right in the network’s core, a little bit too close to the database server for my liking.

Yes, we call containers a fancy process, and I’m to blame here, too, but we need to consider what is around the container to secure it fully. For a container to function, it needs the support of the infrastructure around it, such as the CI/CD pipeline and supply chain. For this, you must consider all the infrastructures to improve your security posture. If you are looking for quick security tips on docker network security, this course I created for Pluralsight may help you with Docker security options.


Ineffective Traditional Tools

The containers are not like traditional workloads. We can run an entire application with all its dependencies with a single command. The legacy security tools and processes often assume largely static operations and must be adjusted to adapt to the rate of change in containerized environments. With non-cloud-native data centers, Layer 4 is coupled with the network topology at fixed network points and lacks the flexibility to support containerized applications. There is often only inter-zone filtering, and east-to-west traffic may go unchecked. A container changes the perimeter, and it moves right to the workload. Just look at a microservices architecture. It has many entry points as compared to monolithic applications.


Docker container networking

When considering container networking, we are a world apart from the monolithic. Containers are short-lived and constantly spun down, and assets such as servers, I.P. addresses, firewalls, drives, and overlay networks are recycled to optimize utilization and enhance agility. Traditional perimeters designed with I.P. address-based security controls lag in a containerized environment.

In a rapidly changing container infrastructure, rules and signature-based controls can’t keep up. Securing hyper-dynamic container infrastructure using traditional network and endpoint controls won’t work. For this reason, you should adopt purpose-built tools and techniques for a containerized environment.

 

The Need for Observability

Not only do you need to implement good docker security options, but you also need to concern yourself with the recent observability tools. So we need proper observability of the state of security and the practices used in the containerization environment, and we need to automate this as much as possible. Not just to automate the development but also the security testing, container scanning, and monitoring.

You are only as secure as the containers you have running. You need observability into systems and applications and to be proactive with these findings. It is not something you can buy; it is a cultural change. You want to know how the application works with the server, how the network interacts with the application, and what data transfer looks like in transit and in a steady state.

 

Data points consolidated in a single platform:

  • Logs

  • Metrics

  • Traces

What level of observation do you need so you know that everything is performing as it should? There are several challenges to securing a containerized environment. Containerized technologies are dynamic and complex and require a new approach that can handle the agility and scale of today’s landscape. There are initial security concerns that you must understand before you get started with container security. This will help you explore a better starting strategy.

 

Docker attack surface: Container attack vectors 

We must consider a different threat model and understand how security principles such as least privilege and in-depth defense apply to Docker security options. With Docker containers, we have a completely different way of running applications and, as a result, a different set of risks to deal with.

Instructions are built into Dockerfiles, which run applications differently from a normal workload. With the right access, a bad actor could put anything into a Dockerfile; without guardrails that understand containers, there will be a threat.

Therefore, we must examine new network and security models, as old tools and methods won’t meet these demands. A new network and security model requires you to mitigate a new attack vector. Bad actors’ intent stays the same, and they are not going away anytime soon. But they now have a different, and if things are misconfigured, potentially easier attack surface.

I would consider the container attack surface pretty significant; if it is not locked down, many default tools will be at the disposal of bad actors. For example, we have image vulnerabilities, access control exploits, container escape, privilege escalation, application code exploits, and attacks on the Docker host and all the Docker components.

 

Docker security options: A final security note

Containers by themselves are secure, and the kernel is pretty much battle-tested. You will not often encounter kernel compromises, but they happen occasionally. A container escape is hard to orchestrate unless misconfiguration results in excessive privileges. From a security standpoint, you should stay clear of setting container capabilities that grant excessive privileges.

 

Minimise container capabilities: Reduce the attack surface.

If you minimize the container’s capabilities, you strip the container’s functionality down to a bare minimum, as we mentioned in the container security video. The attack surface is therefore limited, minimizing the attack vectors available to the attacker. You also want to keep an eye on CAP_SYS_ADMIN; this flag grants access to an extensive range of privileged activities. There are many other capabilities that containers run with by default that can cause havoc.
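As a hedged sketch using the Docker SDK for Python, the following runs a container with every capability dropped, a read-only root filesystem, a non-root user, and privilege escalation blocked; the image, command, and UID are placeholders, and a running Docker daemon plus the docker package are assumed.

# Run a deliberately stripped-down container with the Docker SDK for Python.
# Image, command, and UID are placeholders for illustration.
import docker

client = docker.from_env()

container = client.containers.run(
    "alpine:3.19",                       # example image
    ["sleep", "3600"],                   # example long-running command
    detach=True,
    cap_drop=["ALL"],                    # strip every default capability
    cap_add=["NET_BIND_SERVICE"],        # add back only what is required
    read_only=True,                      # read-only root filesystem
    user="1000",                         # run as a non-root UID
    security_opt=["no-new-privileges"],  # block privilege escalation
)
print(container.short_id)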

 
