Baseline Engineering



Baseline engineering and network baselining are critical components of projects, as they help ensure that all systems and components are designed, implemented, and operated according to the required specifications. This includes developing and maintaining a baseline architecture and correctly integrating system components. In addition, baseline engineering must ensure that all systems have the necessary security measures in place and provide adequate backups and data integrity.

Network baselining is an essential practice in the field of network management and optimization. By establishing a baseline, network administrators gain valuable insights into the normal behavior of their networks, which helps them identify deviations and troubleshoot issues promptly. In this blog post, we will delve into the concept of network baselining, its significance, and how it can contribute to enhancing network performance.


Highlights: Baseline Engineering

  • Traditional Network Infrastructure

Baseline Engineering was easy in the past. Applications ran in a single private data center, or perhaps two data centers for high availability. There may have been some satellite PoPs, but generally, everything was housed in a few locations. These data centers were on-premises, and all components were housed internally. As a result, troubleshooting, monitoring, and baselining were relatively easy. The network and infrastructure were fairly static, the network and security perimeters were known, and the stack did not change often; daily changes, for example, were rare.

  • Distributed Applications

However, nowadays, we are in a completely different environment. Applications are distributed, with components and services located in many places and types of places, both on-premises and in the cloud, and with dependencies on both local and remote services. We are spanning multiple sites and accommodating multiple workload types.

In comparison to the monolith, today’s applications have many different types of entry points to the external world. All of this calls for the practice of Baseline Engineering and Chaos Engineering on Kubernetes so you can fully understand your infrastructure and scaling issues.

  • The Role of Network Baselining

Network baselining involves capturing and analyzing network traffic data to establish a benchmark or baseline for normal network behavior. This baseline represents the typical performance metrics of the network under regular conditions. It encompasses various parameters such as bandwidth utilization, latency, packet loss, and throughput. By monitoring these metrics over time, administrators can identify patterns, trends, and anomalies, enabling them to make informed decisions about network optimization and troubleshooting.


Before you proceed, you may find the following posts helpful:

  1. Network Traffic Engineering
  2. Low Latency Network Design
  3. Transport SDN
  4. Load Balancing
  5. What is OpenFlow
  6. Observability vs Monitoring
  7. Kubernetes Security Best Practice


Key Baseline Engineering Discussion Points:

  • Monitoring was easy in the past.

  • How to start a baseline engineering project.

  • Distributed components and latency.

  • Chaos Engineering on Kubernetes.


  • A key point: Video on how to start a Baseline Engineering project

This educational tutorial begins with how applications have changed from the monolithic style to the microservices-based approach and how this has affected failures. Then, I will show how these failures can be addressed by knowing exactly how your application and infrastructure perform under stress and where their breaking points are.



Back to basics with baseline engineering

Chaos Engineering

Chaos engineering is a methodology of experimenting on a software system to build confidence in the system’s capability to withstand turbulent environments in production. It is an essential part of the DevOps philosophy, allowing teams to experiment with their system’s behavior in a safe and controlled manner.

This type of baseline engineering allows teams to identify weaknesses in their software architecture, such as potential bottlenecks or single points of failure, and take proactive measures to address them. By injecting faults into the system and measuring the effects, teams gain insights into system behavior that can be used to improve system resilience.

Finally, Chaos Engineering teaches you to develop and execute controlled experiments that uncover hidden problems. For instance, you may need to inject system-shaking failures that disrupt system calls, networking, APIs, and Kubernetes-based microservices infrastructures.

Chaos engineering is defined as “the discipline of experimenting on a system to build confidence in the system’s capability to withstand turbulent conditions in production.” In other words, it’s a software testing method that concentrates on finding evidence of problems before users experience them.


Network Baselining

Network baselining involves taking measurements of the network’s performance at different times. This includes measuring throughput, latency, and other performance metrics, as well as recording the network’s configuration. It is important to note that the performance metrics can vary greatly depending on the type of network being used. This is why it is essential to establish a baseline for the specific network so that it can be used as a reference point for comparison.

Network baselining is integral to network management as it allows organizations to identify and address potential issues before they become more serious. Organizations can be alerted to potential problems by baselining the network’s performance. This can help organizations avoid costly downtime and ensure their networks run at peak performance.

Diagram: Network Baselining. Source is DNSstuff



The Importance of Network Baselining:

Network baselining provides several benefits for network administrators and organizations:

1. Performance Optimization: Baselining helps identify bottlenecks, inefficiencies, and abnormal behavior within the network infrastructure. By understanding the baseline, administrators can optimize network resources, improve performance, and ensure a smoother user experience.

2. Security Enhancement: Baselining also plays a crucial role in detecting and mitigating security threats. By comparing current network behavior against the established baseline, administrators can identify any unusual or malicious activities, such as abnormal traffic patterns or unauthorized access attempts.

3. Capacity Planning: Understanding network baselines enables administrators to accurately forecast future capacity requirements. By analyzing historical data, they can determine when and where network upgrades or expansions may be necessary, ensuring consistent performance as the network grows.
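
As a rough illustration of the capacity-planning point, the following Python sketch fits a straight line to historical peak utilization and estimates when a threshold would be crossed. The sample data, the 80% threshold, and the function names are assumptions made for illustration only; real traffic growth is rarely this linear, so treat the result as a first estimate rather than a forecast.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_threshold(weekly_peak_pct, threshold_pct=80.0):
    """Estimate weeks from the last sample until the trend reaches the threshold.

    Returns None when the trend is flat or falling.
    """
    weeks = list(range(len(weekly_peak_pct)))
    slope, intercept = fit_line(weeks, weekly_peak_pct)
    if slope <= 0:
        return None
    crossing_week = (threshold_pct - intercept) / slope
    return max(0.0, crossing_week - weeks[-1])


# Hypothetical weekly peak link utilization, in percent.
history = [50.0, 52.0, 54.0, 56.0, 58.0]
print(weeks_until_threshold(history))  # prints 11.0
```

With utilization growing by two points per week from 50%, the trend line reaches 80% at week 15, which is 11 weeks after the last sample.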

Establishing a Network Baseline:

To establish an accurate network baseline, administrators follow a systematic approach:

1. Data Collection: Network traffic data is collected using specialized monitoring tools, such as network analyzers or packet sniffers. These tools capture and analyze network packets, providing detailed insights into various performance metrics.

2. Duration: Baseline data should ideally be collected over an extended period, typically ranging from a few days to a few weeks. This ensures that the baseline accounts for variations due to different network usage patterns.

3. Normalizing Factors: Administrators consider various factors that can impact network performance, such as peak usage hours, seasonal variations, and specific application requirements. By normalizing the data, they can establish a more accurate baseline that reflects typical network behavior.

4. Analysis and Documentation: Once the baseline data is collected, administrators analyze the metrics to identify patterns and trends. This analysis helps establish thresholds for acceptable performance and highlights any deviations that may require attention. Documentation of the baseline and related analysis is crucial for future reference and comparison.
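
The four steps above can be sketched in a few lines of Python. This is a minimal sketch, not a production tool: it assumes the samples have already been collected as (hour_of_day, latency_ms) pairs, it normalizes for time of day by grouping on the hour, and it uses "mean plus three standard deviations" as the threshold, which is a common rule of thumb rather than a standard. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(samples, k=3.0):
    """Group (hour, value) samples by hour of day and summarize each group.

    Returns {hour: {"mean": ..., "stdev": ..., "threshold": ...}} where
    threshold = mean + k * stdev is an assumed rule of thumb.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)

    baseline = {}
    for hour, values in sorted(by_hour.items()):
        avg = mean(values)
        spread = pstdev(values)
        baseline[hour] = {"mean": avg, "stdev": spread, "threshold": avg + k * spread}
    return baseline


def is_deviation(baseline, hour, value):
    """True when a new measurement exceeds the threshold for its hour."""
    return value > baseline[hour]["threshold"]


# Hypothetical latency samples in milliseconds: (hour_of_day, latency_ms).
samples = [(9, 20.0), (9, 22.0), (9, 24.0), (14, 40.0), (14, 44.0), (14, 48.0)]
baseline = build_baseline(samples)
print(is_deviation(baseline, 9, 45.0))   # prints True
print(is_deviation(baseline, 14, 45.0))  # prints False
```

The same 45 ms reading is a deviation at 09:00 but normal at 14:00, which is why the baseline is kept per hour instead of as a single global figure.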


Network Baselining: A Lot Can Go Wrong

Infrastructure is growing more complex, and let’s face it, a lot can go wrong. It’s imperative to have a global view of all the infrastructure components and a good understanding of the application’s performance and health. In a large-scale container-based application design, there are many moving pieces and parts, and validating the health of each piece manually is hard to do.

Therefore, monitoring and troubleshooting are much harder, especially as everything is interconnected, making it difficult for a single person in one team to understand entirely what is happening. Nothing is static anymore, and things are moving around all the time. This is why it is even more important to focus on the patterns and to be able to trace efficiently the path to where the issue is.

Some modern applications could be in multiple clouds and different location types simultaneously. As a result, there are multiple data points to consider. If several of these segments are each slightly overloaded, the small delays add up and result in poor performance at the application level.


  • A key point: Video on Observability vs Monitoring

We will start by discussing how our approach to monitoring needs to adapt to the current megatrends, such as the rise of microservices. Failures are unknown and unpredictable. Therefore, a pre-defined monitoring dashboard will have a hard time keeping up with the rate of change and unknown failure modes. For this, we should look to practice observability for software and monitoring for infrastructure.



What does this mean for latency?

Distributed computing has many components and services, often located far apart. This contrasts with a monolith, where all parts are in one location. As a result of the distributed nature of modern applications, latency can add up. So we have both network latency and application latency. A call across the network is typically several orders of magnitude slower than a call within a single process.

As a result, you need to minimize the number of round trips and reduce any unneeded communication to an absolute minimum. When communication is required across the network, it’s better to gather as much data together as possible to get bigger packets that are more efficient to transfer. Also, consider using different types of buffers, both small and large, which will have varying effects on the dropped packet test.
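
To make the round-trip argument concrete, here is a small back-of-the-envelope model in Python. It assumes requests are sent sequentially, that each batch costs exactly one round trip, and that server-side work is the same per request regardless of batching. The function name and all numbers are hypothetical.

```python
def total_latency_ms(requests, rtt_ms, per_request_ms, batch_size=1):
    """Estimate wall-clock time for sequential requests sent in batches.

    Each batch costs one round trip plus server time for every request in it.
    """
    batches = -(-requests // batch_size)  # ceiling division
    return batches * rtt_ms + requests * per_request_ms


# Hypothetical numbers: 100 lookups, 40 ms round trip, 0.5 ms of work each.
unbatched = total_latency_ms(100, rtt_ms=40.0, per_request_ms=0.5)
batched = total_latency_ms(100, rtt_ms=40.0, per_request_ms=0.5, batch_size=25)
print(unbatched)  # prints 4050.0
print(batched)    # prints 210.0
```

In this model the server-side work is identical in both cases (50 ms); the difference comes entirely from paying for 4 round trips instead of 100.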

Diagram: Dropped Packet Test and Packet Loss.


With the monolith, the application simply runs in a single process, and it is relatively easy to debug. Many traditional tooling and code instrumentation technologies have been built on the assumption of a single process. The core challenge is that microservices applications are much harder to debug, because so much of the tooling we have today has been built for traditional monolithic applications. There are new monitoring tools for these new applications, but they come with a steep learning curve and a high barrier to entry.


A new approach: Network baselining and Baseline engineering

For this, you need to understand practices like Chaos Engineering along with service level objectives (SLOs) and how they can improve the reliability of the overall system. Chaos Engineering is a baseline engineering practice that provides the ability to perform tests in a controlled way. Essentially, we intentionally break things to learn how to build more resilient systems.

So we inject various issues and faults in a controlled way to make the overall application more resilient. Implementing practices like Chaos Engineering will help you understand and manage unexpected failures and performance degradation. The purpose of Chaos Engineering is to build more robust and resilient systems.
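
A controlled fault-injection experiment can be sketched in plain Python without any chaos tooling. The sketch below is an assumption-laden toy: the FaultInjector wrapper, the retry helper, the lookup_user service, and the 95% success SLO are all hypothetical names and values chosen for illustration. It wraps a service call so that a configurable fraction of calls fail, then checks whether the client's retry logic keeps the observed success ratio above the SLO.

```python
import random


class FaultInjector:
    """Wraps a callable and fails a configurable fraction of calls."""

    def __init__(self, func, failure_rate, seed=None):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so an experiment can be repeated

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return self.func(*args, **kwargs)


def call_with_retry(func, attempts=3):
    """Retry on ConnectionError; re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise


def run_experiment(service, calls, slo_success_ratio):
    """Return (observed success ratio, whether it meets the SLO)."""
    successes = 0
    for _ in range(calls):
        try:
            call_with_retry(service)
            successes += 1
        except ConnectionError:
            pass
    ratio = successes / calls
    return ratio, ratio >= slo_success_ratio


def lookup_user():
    # Stand-in for a real remote call.
    return {"id": 1}


flaky = FaultInjector(lookup_user, failure_rate=0.2, seed=42)
ratio, met = run_experiment(flaky, calls=1000, slo_success_ratio=0.95)
print(ratio, met)
```

Real chaos experiments inject faults at the network, node, or pod level rather than in application code, but the shape is the same: state the expected steady state as an SLO, inject a bounded fault, and compare what you observe against that expectation.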


  • A final note on baselines: Don’t forget them!!

Creating a good baseline is a critical factor. You need to understand how things work under normal circumstances. A baseline is a fixed point of reference used for comparison purposes. You need to know how long it usually takes to get from starting the application to the actual login, and how long the essential services take, before there are any issues or heavy load. Baselines are critical to monitoring.

It’s like security: if you can’t see it, you can’t protect it. The same assumption applies here. Go for a good baseline and, if you can, have it fully automated. Tests need to be carried out against the baseline on an ongoing basis. You need to test constantly to see how long it takes users to use your services. Without baseline data, estimating any changes or demonstrating progress is difficult.
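
An automated check against a baseline might look something like the Python sketch below. It assumes a previously recorded baseline duration and a 20% tolerance, both of which are placeholder values, and it uses a fake_login function as a stand-in for whatever user-facing operation you actually time.

```python
import time


def measure_seconds(operation, runs=5):
    """Return the median wall-clock duration of several runs of operation().

    For an even number of runs this returns the upper of the two middle values.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    durations.sort()
    return durations[len(durations) // 2]


def within_baseline(measured, baseline, tolerance=0.20):
    """True when measured is no more than `tolerance` (a fraction) above baseline."""
    return measured <= baseline * (1.0 + tolerance)


def fake_login():
    # Stand-in for a real login flow.
    time.sleep(0.01)


baseline_login_s = 0.05  # hypothetical recorded baseline, in seconds
measured = measure_seconds(fake_login)
print(within_baseline(measured, baseline_login_s))
```

Run on a schedule or in a deployment pipeline, a check like this turns the baseline into a test that is carried out on an ongoing basis. The median is used instead of the mean so that a single slow run does not trigger a false alarm.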


Network baselining is a critical practice for maintaining optimal network performance and security. By establishing a baseline, administrators can proactively monitor, analyze, and optimize their networks. This approach enables them to identify and address performance issues promptly, enhance security measures, and plan for future capacity requirements. By investing time and effort in network baselining, organizations can ensure a reliable and efficient network infrastructure that supports their business objectives.


Matt Conran: The Visual Age