Prometheus monitoring example

Prometheus Metric Types

Prometheus Metric Types

Prometheus, the popular open-source monitoring and alerting toolkit, offers a wide range of metric types to collect and analyze data. In this blog post, we will delve into the various Prometheus metric types, their features, and use cases. Whether you're a beginner or an experienced Prometheus user, this guide will provide you with a solid understanding of the different metric types at your disposal.

Counter Metrics: Counter metrics in Prometheus represent a monotonically increasing value. They are commonly used to track the number of events or occurrences over time. With each scrape, the counter resets to zero and starts counting again. These metrics are perfect for measuring things like HTTP requests or the number of messages processed by a system.

Gauge Metrics: Unlike counters, gauge metrics in Prometheus can increase or decrease arbitrarily. They are ideal for capturing values that fluctuate over time, such as CPU usage or memory usage. Gauges provide a snapshot of the current state of a particular metric and are widely used for monitoring resource utilization.

Histogram Metrics: Histogram metrics offer insights into the distribution of values over a given range. They allow you to measure things like request durations or response sizes and provide detailed statistical information. Histograms automatically generate buckets to track observations falling within specific value ranges, enabling you to analyze data distribution effectively.

Summary Metrics: Similar to histograms, summary metrics provide statistical information about observed values. However, summaries are specifically designed to calculate percentiles. They are useful for measuring things like request latency, where you want to understand the distribution of response times across different percentiles. Summaries are lightweight and maintain a small sample size to efficiently calculate percentiles.

Highlights: Prometheus Metric Types

Open-Source Monitoring 

Prometheus is an open-source monitoring and alerting toolkit that was developed initially at SoundCloud. It provides a flexible and powerful platform for monitoring various components of your infrastructure, including servers, containers, and applications. Its robust data model, extensive metrics collection capabilities, and adaptable query language make it a preferred choice for many organizations.

It operates based on a time series database model, collecting and storing metrics from various targets, including servers, applications, or any other component that generates data. Prometheus monitoring offers a versatile set of metric types that cater to different monitoring needs.

Prometheus has become indispensable in the world of cloud-native and containerized applications. It efficiently collects and stores metrics as time series data, offering powerful querying capabilities. But to harness Prometheus’s full potential, it’s crucial to understand its metric types, as they are the foundation of effective monitoring and alerting.

**Use Cases and Benefits**

Prometheus monitoring’s diverse metric types offer immense benefits across various use cases. Let’s explore some notable examples:

1. Infrastructure Monitoring: Prometheus can monitor server resource utilization, network traffic, and system health by utilizing counters and gauges. This allows for proactive system maintenance and efficient capacity planning.

2. Application Performance Monitoring (APM): Histograms and summaries enable detailed analysis of application performance metrics, including response times, latency distributions, and error rates. This helps identify bottlenecks and optimize application performance.

3. Alerting and Incident Management: Prometheus monitoring’s flexible alerting system can trigger notifications based on specific metric thresholds or patterns. This enables rapid incident response and proactive issue resolution. Let’s take a closer look at some of the prominent metric types:

**Four Prometheus Metric Types**

1. Counter: Counters are monotonically increasing metrics used to track the number of occurrences of an event. They are handy for measuring things like request counts or error rates.

2. Gauge: Gauges represent a specific value at a particular point in time. They are ideal for tracking metrics that can fluctuate, such as CPU or memory usage.

3. Histogram: Histograms provide insights into the statistical distribution of a set of values. They divide observations into configurable buckets, allowing for analysis of data distribution and quantiles.

4. Summary: Similar to histograms, summaries provide statistical insights. However, they calculate quantiles on the client side instead of the server side, making them more suitable for monitoring long-running tasks or services.

**Choosing the Right Metric Type**

Selecting the appropriate metric type is vital for accurate data representation and analysis. Understanding the use case of each type helps in leveraging their strengths:

– Use **counters** for monotonically increasing values.

– Opt for **gauges** when tracking fluctuating values.

– Choose **histograms** for bucketed distribution analysis.

– Select **summaries** when quantile information is needed.

By aligning your metrics with the right type, you ensure that your monitoring system is both efficient and insightful.

Prometheus Scraping Metrics

Scraping HTTP Endpoints

Prometheus collects these metrics by scraping HTTP endpoints that expose metrics, known as a pull model. Prometheus exporters built by the community can natively expose those endpoints or via the monitored component. In addition to supporting a wide range of programming languages, Prometheus also provides client libraries that can be used to instrument your code.

Prometheus can also scrape metrics from OpenMetrics. HTTP metrics are exposed via text-based formats (which are more widely used) or more robust and efficient protocol buffer formats. This format is human-readable, so you can open it in a browser or retrieve exposed metrics using a tool like curl.

Prometheus Data Types

Prometheus’ metric model supports only four metric types and is only available in client libraries. The exposition format represents all metric types using one or more underlying Prometheus data types.

This Prometheus data type includes a metric name, labels, and a floating value. In addition, an agent or a monitoring backend adds the timestamp when scraping metrics (Prometheus, for example). PromQL, short for Prometheus Querying Language, is the primary way to query metrics within Prometheus. You can display an expression’s return as a graph or export it using the HTTP API.

PromQL uses three Prometheus data types: scalars, range vectors, and instant vectors. It also uses strings, but only as literals. 

Related: Before you proceed, you may find the following posts helpful:

  1. Observability vs Monitoring
  2. Microservices Observability

Prometheus Metric Types

For Prometheus metric types, we want as many metrics as possible. These need to be stored to follow trends, understand what has been happening from a historical view, and better predict any issues. So, there are several parts to a Prometheus monitoring solution; we must collect the metrics, known as scraping, store them, and then analyze them. In addition, we need to consider storage security, compliance, and regulatory concerns for distributed systems observability.

Monitoring the correct metric is key; having metrics lets you view how the system performs. The Prometheus metrics types represent raw measurement of resource usage, which can help you plan for upgrading and tell you how many resources are being used.  

To be clear, there are two kinds of “types” in Prometheus. There are the metric types of metrics and the data types of PromQL expressions.

Prometheus has four metric types

  1. Counters
  2. Gauges
  3. Histograms 
  4. Summaries 

PromQL subsequently has four data types:

  1. Floats (mostly scalars) 
  2. Range vectors
  3. Instant vectors
  4. Time (though it’s often not counted in this category)

Let me briefly explain them:

Prometheus Metric Types Best Practices:

Now that we have explored the different metric types offered by Prometheus let’s discuss some best practices for using them effectively:

1. Choose the appropriate metric type for your use case. Consider each type’s characteristics and relevance to your monitoring needs.

2. Avoid combining unrelated metrics into a single metric type. This can lead to confusion and make it harder to analyze the data.

3. Use labels to add additional dimensions to your metrics. Labels allow you to differentiate between instances or environments, providing more context to your monitoring data.

Prometheus Pull Approach

It’s easy to monitor a Kubernetes cluster with the pull model because of service discovery and shared network access within the cluster. However, watching a dynamic fleet of virtual machines, AWS Fargate containers, or Lambda functions with Prometheus is hard. How come?

Identifying metrics endpoints for scraping is difficult; network security policies may restrict access to those endpoints. Prometheus Agent Mode was released at the end of 2021 to solve some of these problems. This mode collects metrics and sends them to a monitoring backend via the remote write protocol.

Prometheus target labels
Diagram: Prometheus Service Discovery and Labels

A) Prometheus Service Discovery

Prometheus discovers targets to scrape from service discovery. These can be instrumented or third-party applications you can scrape via an exporter. The scraped data is stored, and you can use it in dashboards using PromQL or send alerts to the Alertmanager, which will convert them into pages, emails, and other notifications. Metrics do not typically magically spring forth from applications; someone has to add the instrumentation that produces them.

B) Default Prometheus configuration.

In the following lab guide, you will see Prometheus’s default configuration. I have done a cat prometheus.yml, and you can see Prometheus is scaping itself for metrics. Prometheus is configured through a single YAML file.

The Prometheus YAML file serves as the configuration file for Prometheus and defines the various aspects of the monitoring setup. It consists of a set of key-value pairs organized in a hierarchical structure, instructing Prometheus on what to scrape, how to scrape it, and how to handle the collected data.

Prometheus YAML file
Diagram: Prometheus YAML file

C) Global Configuration:

This section defines global configurations applicable to the entire Prometheus setup. Key elements in this section include the scrape_interval (specifying the time interval between scrapes) and the evaluation_interval (determining how frequently Prometheus evaluates the collected data).

D) Scrape Configuration:

Prometheus collects metrics from various targets using scrape configurations. In this section, users can define scrape jobs, specifying the target endpoints, scrape intervals, and other related parameters. Each scrape configuration corresponds to a specific target, such as an application, database, or server.

E) Alerting Rules:

Prometheus allows users to define alerting rules to trigger notifications based on specific conditions. The alerting_rules section in the YAML file is dedicated to configuring these rules. Users can define rules for metrics thresholds, anomalies, or other necessary conditions for their monitoring needs.

Core Metric Types

Core Metric Type
Diagram: Core Metric Type. Source is Sentinelone.

Direct Instrumentation with Client Libraries

Client Libraries
Diagram: Direct Instrumentation. Source is Sentinelone.

Indirect Instrumentation with Exporters

Indirect Instrumentation
Diagram: Indirect Instrumentation with Exporters. Source is Sentinelone.

You must remember that Prometheus is only used to collect and explore metrics. Because it uses a time series data model, data is identified by metric names and has key-value pairs.
Prometheus cannot be used for logging or event-driven architectures that require tracking of individual events. Notice that none of the examples or use cases I’ve used in previous sections relate to logs.

Prometheus is suitable for storing CPU usage, latency requests, error rates, or networking bandwidth.

Moreover, Prometheus makes trade-offs with the data it collects. This means that it will prefer to provide 99.99% accurate data (per their documentation) rather than degrade performance or break monitoring systems. You should avoid using this method to access essential information like a bank account balance because you may lose some data.

Starting with Prometheus Metric Types

Metrics can be applied to various components, are a unit of measurement for evaluating an item, and are consistently measured. Examples of common measurements include CPU utilization, memory utilization, and interface utilization. These are numbers about how your resources are performing.

For the metrics side of things, we have runtime, infrastructure, and application metrics, including Prometheus Exporters, response codes, and time-to-serve data. We also have CI/CD pipeline metrics such as build time and failures. Let’s discuss these in more detail.

**The process of exposition**

Exposition is the process of making metrics available to Prometheus. Exposition to Prometheus is done over HTTP. Usually, metrics are exposed under the /metrics path, and a client library handles the request. Prometheus supports both the Prometheus text format and OpenMetrics.

You can produce the exposition format by hand, which is easier with the Prometheus text format. If no suitable library exists for your language, a library is recommended. Most libraries will support both OpenMetrics and Prometheus text formats.

When metrics are defined, they are usually registered with the default registry. If one of the libraries you depend on has Prometheus instrumentation, you will benefit from it automatically. Some users prefer explicitly passing a registry down from the primary function, so you’d have to rely on each library between your application’s primary function and the Prometheus instrumentation to be aware of it. It assumes that all libraries in the dependency chain care about instrumentation and agree on which libraries to use.

**Detailing: Prometheus Metric Types**

The Prometheus client libraries offer four core metric types. These are currently only differentiated in the client libraries (to enable APIs tailored to the usage of the specific types) and in the wire protocol. The Prometheus server does not yet use the type information and flattens all data into untyped time series. This may change in the future.

a) Counters

The first metric type we’ll explore is the Counter. Counters are monotonically increasing values, meaning they only increase over time. These metrics often measure the number of requests served or the total number of events processed. Counters are reset to zero when the Prometheus server restarts, ensuring accurate measurements.

Countermetrics are used to increase measurements. Since they are cumulative, their value can only increase. Exceptionally, the counter’s value is reset to zero when it is restarted. A counter’s value could be more helpful on its own. However, a counter value is often used to compute the delta or rate of change between two timestamps.

Do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; instead, use a gauge.

Client library usage documentation for counters:

b) Gauges

Next up, we have Gauges. Gauges represent a single numerical value that can arbitrarily go up or down. They are commonly used to track metrics like CPU usage or memory consumption. Unlike counters, gauges hold their value until explicitly changed by the exporter.

Gauge metrics measure increases or decreases. This metric type is more familiar since the actual value without additional processing is meaningful. A gauge is, for instance, a metric that measures the temperature, CPU, memory usage, or queue size.

Client library usage documentation for gauges:

c) Histograms

Histograms measure the distribution of values in a given dataset. Prometheus histograms are divided into configurable buckets, allowing you to track metrics such as response times or request sizes. Each bucket captures the number of observations falling into a specific range, making it easier to analyze data distribution.

A histogram represents a distribution of measurements. For example, request durations or response sizes are often measured with them. A histogram counts how many measurements fall into each bucket based on the entire range of measurements.

Client library usage documentation for histograms:

Prometheus Metric Types
Diagram: Prometheus Metric Types. Source timescale.

NOTE: Beginning with Prometheus v2.40, there is experimental support for native histograms. A native histogram requires only a one-time series, including a dynamic number of buckets and the sum and count of observations. Native histograms allow much higher resolution at a fraction of the cost. Detailed documentation will follow once native histograms are closer to becoming stable.

d) Summary

The Summary metric type is similar to histograms but focuses on calculating quantiles over a sliding time window. This makes it helpful in measuring request durations or API response times. Summaries provide information about the distribution of values, including the minimum, maximum, and quantiles.

Client library usage documentation for summaries:

**Summary vs. Histogram**

A critical distinction between summaries and histograms is their storage requirements. Summaries must retain a sample for each observed value, while histograms only store aggregated data points. Therefore, summaries are more memory-intensive but provide more precise quantile calculations.

  • Untyped

The untyped metric type is a catch-all category that can be used when none other metric types fit the collected data. It represents a value that can change over time but does not fall into any specific category. Untyped metrics are helpful when you require flexibility and simplicity, but they lack the advanced analysis capabilities of other metric types.

Highlighting Prometheus Monitoring

Auto Scaling Observability

Previously, Heapsters was used as a monitoring solution that came out of the box with Kubernetes. We now have Prometheus as the de facto standard monitoring system for Kubernetes clusters, bringing many benefits. Firstly, Prometheus monitoring scales with a pull approach and the Prometheus federated options. The challenge is that if we run microservices at scale and the monitoring system pushes metrics out to a metric server, the monitoring system can flood the network.

Also, with a push-based approach, you may need to scale up instead of out, which could be costly. We can have a bunch of different systems we want to monitor. Therefore, the metrics content will differ for the systems and components, but Prometheus collects and exports the same. This provides a welcomed layer of unification for the different systems in the network.

Prometheus Monitoring
Diagram: Prometheus Monitoring. Source is Opcito

Exporters and Client Libraries

With Prometheus monitoring, you can get metrics from the systems you want to monitor using pre-built exporters and custom client libraries. Prometheus works very well with Docker and Kubernetes but can also work outside the container networking world with non-cloud native applications using exporters.

You can monitor your entire stack with a wide range of exporters and client libraries. We install the code library and gather custom applications and runtime metrics for cloud-native applications. Installing code in the application lets us see the custom metrics that matter most to us.

What are Prometheus Exporters?

Prometheus exporters are software components that expose metrics in a format that Prometheus can scrape. They act as intermediaries between Prometheus and the systems or applications being monitored.

These exporters collect metrics from various sources, including databases, web servers, cloud platforms, and custom applications. By providing a standardized interface, exporters allow Prometheus to collect and store metrics from diverse sources uniformly.

Prometheus exporters work by exposing a web server that serves metrics in a format known as the Prometheus exposition format. This format consists of simple text-based data, usually in the form of key-value pairs. The exporters periodically expose an HTTP endpoint that Prometheus scrapes to collect the metrics. The scraped metrics are then stored in the Prometheus time-series database, where they can be queried and visualized.

Types of Prometheus Exporters:

Prometheus exporters come in various forms, each designed to collect metrics from specific systems or applications. Some famous exporters include:

1. Node Exporter collects system-level metrics such as CPU, memory, disk, and network statistics from Linux or Unix-based systems.

2. Blackbox Exporter: Allows monitoring of network endpoints by performing HTTP, TCP, ICMP, and DNS probes.

3. MySQL Exporter: Collects metrics from MySQL databases, providing insights into query execution time, connection statistics, and more.

4. Redis Exporter: This tool gathers metrics from Redis, a popular in-memory data structure store. It enables monitoring key performance indicators such as memory usage, latency, and throughput.

What are Prometheus Client Libraries?

Prometheus provides client libraries in various programming languages, including, but not limited to, Go, Java, Python, Ruby, and JavaScript. These libraries bridge your application and Prometheus, allowing you to collect and expose relevant metrics effortlessly.

1. Go Library:

Prometheus’ Go client library, known as “client_golang,” is widely used due to its simplicity and robustness. It provides a straightforward API to instrument your Go applications, making it easy to expose custom metrics, collect and record them, and serve them via an HTTP endpoint. You can seamlessly integrate Prometheus monitoring into your Go-based projects with the Go client library.

2. Java Library:

Prometheus offers a dedicated client library for Java-based applications called “client_java.” This library allows you to instrument your Java code using simple annotations, making it effortless to expose metrics. The Java library simplifies monitoring Java applications with features like automatic metric exposition via HTTP endpoints and support for popular Java frameworks like Spring Boot and Micronaut.

3. Python Library:

Prometheus’ Python client library, known as “client_python,” provides a Pythonic way to instrument your Python applications. With its intuitive API, you can easily expose custom metrics, collect and record them, and serve them via an HTTP endpoint. The Python library also supports popular web frameworks like Flask and Django, making it a convenient choice for developers.

4. Ruby Library:

For Ruby developers, Prometheus provides a client library called “client_ruby.” This library allows you to instrument your Ruby applications effortlessly. It offers features like metric collection, recording, and exposing metrics via an HTTP endpoint. The Ruby library integrates well with popular Ruby frameworks like Ruby on Rails, enabling seamless monitoring of Ruby applications.

5. JavaScript Library:

Prometheus’ JavaScript client library, known as “client_js,” enables web developers to instrument their JavaScript-based applications. This library lets you easily expose custom metrics, collect and record them, and serve them via an HTTP endpoint. The JavaScript library is compatible with both browser-based JavaScript and Node.js, making it versatile for monitoring frontend and backend applications.

Additional Metric Types:

  • Prometheus Metric type: Runtime metrics

Runtime Metrics are statistics collected by the operating system and application host. These include CPU usage, memory load, and web server requests. For example, this could be CPU and memory usage from a Tomcat and JVM from a Java app.

  • Prometheus Metric type: Infrastructure metrics

We examine CPU utilization, latency, bandwidth, memory, and temperature metrics for Infrastructure metrics. These metrics should be collected over a long period and applied to infrastructure such as networking equipment, hypervisors, and host-based systems.

  • Prometheus Metric type: Application metrics

Then, we have Application metrics and custom statistics relevant only to the application and not the infrastructure. Application metrics pertain specifically to an application. This may include the number of API calls made during a particular time. 

This can be quickly done with web-based applications; here, we can get many status codes that provide information. These metrics are easy to measure, and the response codes are available immediately. For example, an HTTP status code 200 is good, and 400 or more is an issue. 

  • Prometheus Metric type: Time to first byte

Another important metric is the time a web server takes to respond to the data. The important metric here is time to the first byte (TTFB). This measures how long it takes for your application to send data. Time to the first byte refers to the time between the browser requesting a page and when it receives the first byte of information from the server. If this metric exceeds the usual, you may need caching, faster storage, or a better CPU.

Let us take the example of the content delivery network (CDN); what is an excellent time to the first byte? On average, anything with a TTFB under 100 ms is fantastic. Anything between 200 and 500 ms is standard, and anything between 500 ms and 1 second is less than ideal. Anything more significant than 1 second should likely be investigated further.

  • Prometheus Metric type: CI/CD pipeline metrics

For the CI/CD Pipeline metrics, we want to measure the time to do the static code analysis. Next, we want to count the number of errors while running the pipeline. Finally, we want to measure the build time and build failures. These metrics include the time it takes to build an application, the time it takes to complete tests, and how often builds fail.  

  • Prometheus Metric type: Docker metrics

Docker metrics come from the Docker platform. These may include container health checks, the number of online and offline nodes in a cluster, and the number of containers and actions. These container actions may be stopped, paused, or run. So, we have built-in metrics provided by Docker to give additional visibility to the running containers. When running containers in production, monitoring their runtime metrics, such as CPU and memory usage, is essential.

Docker Metrics

Metrics from the Docker Platform are essential for containers when Docker stops and starts applications for you. You can’t gather one metric type without the other. For example, if you look at the application metrics, you only look at half of the puzzle and may miss the problem.

For example, if one of your applications is performing poorly and the docker platform constantly spins up new containers, you would not see that just under the application metrics. Your application and runtime metrics may seem to be within the standard thresholds. However, combining this with the Docker Platform metrics shows the container stats, showing a spike in container creation.

Exposing application metrics to Prometheus

Application metrics give you additional information. Unlike runtime metrics you get for free, you need to record what you care about explicitly. Here, we have client libraries that Prometheus offers. All the major languages have a Prometheus client library, which provides the metrics endpoint. The client library makes application metrics available to Prometheus, giving you a very high level of visibility into your application.

With Prometheus client libraries, you can see what is happening inside the application. So, we have both Prometheus exporters and Prometheus client libraries that allow Prometheus to monitor everything.  

Exposing Docker Metrics to Prometheus

First, Docker Default Networking 101. The Docker Engine interacts with all clients and collects and exports metrics. When you build a Docker image, the Engine records a metric. We need insights into the Docker platform. Here, we can expose Docker metrics to Prometheus. The Docker Engine has a built-in mechanism to export metrics in Prometheus format. So we have, for example, the Docker metrics covering the Engine and container and metrics about images. 

Docker metric types: Three types

The types of metrics have three areas. First, we have the Docker Engine, Builds, and Containers. 

    1. The Engine will give you information on the host, such as the CPU count, O/S version, and build of the Docker Engine. 
    2. Then, for Build metrics, it is helpful for information such as the number of builds triggered, canceled, and failed. 
    3. Container metrics also show the number of containers stopped and paused and the number of health checks that are fired and failed.

Wrap up: Prometheus monitoring.

So, for Prometheus monitoring, we have Prometheus exporters that can get metrics for, let’s say, a Linux server and application metrics that can support Prometheus using a client library.

Both have an HTTP endpoint that returns metrics in the standard Prometheus format. Once the HTTP endpoint is up and running on the application ( legacy or cloud-native ), Prometheus will scrape ( collect ) the metric with dynamic or static approaches.

So we have Exporters that can add metrics to systems that don’t have Prometheus support. We also have Prometheus client libraries that can provide Prometheus support in the application. These client libraries can provide out-of-the-box runtime and custom metrics relevant to the applications.

Summary: Prometheus Metric Types

Regarding monitoring applications, Prometheus stands out as a powerful tool. One of its key features is the ability to work with different metric types, each serving a specific purpose. In this blog post, we dive into the world of Prometheus metric types and understand how they can enhance your application monitoring strategy.

Counter Metrics

Counter metrics are used to track the number of occurrences of a particular event. They start at 0 and can only increase over time. These metrics are ideal for measuring total requests served, successful logins, or errors encountered. By monitoring counter metrics, you can gain valuable insights into your application’s overall health and usage.

Gauge Metrics

Gauge metrics, unlike counter metrics, can increase or decrease over time. They often measure things that fluctuate, such as CPU usage, memory consumption, or the number of active connections. With gauge metrics, you can have a real-time view of the current state of your application and identify any potential bottlenecks or anomalies.

Section 3: Histogram Metrics

Histogram metrics provide a way to measure the distribution of values over a given period of time. They are commonly used to capture things like response times or request sizes. By collecting histogram metrics, you can analyze the distribution of these values and gain insights into the performance characteristics of your application.

Summary Metrics

Summary metrics are similar to histograms but provide more advanced features. They calculate percentiles, allowing you to measure the 90th percentile response time. Additionally, summary metrics provide a rolling time window for observations, making them more suitable for long-term analysis of your application’s performance.

In conclusion, understanding the different Prometheus metric types is critical to effectively monitoring your applications. By utilizing counter metrics, gauge metrics, histogram metrics, and summary metrics, you can gain valuable insights into the behavior and performance of your application. Whether tracking events, monitoring resource utilization, or analyzing response times, Prometheus metric types provide the flexibility and power to ensure your applications run smoothly.

prometheus Monitoring

Prometheus Monitoring: The Pull Approach

Prometheus Monitoring: The Pull Approach

In the world of monitoring and observability, Prometheus has emerged as a powerful tool for collecting and analyzing metrics. One of the key aspects of Prometheus is its unique approach to data collection, known as the pull approach. In this blog post, we will explore what the pull approach entails and why it has gained popularity among DevOps engineers and operators.

Prometheus is an open-source monitoring system that was developed at SoundCloud. It is designed to monitor highly dynamic containerized environments and provides a flexible and scalable solution for collecting time-series data. By using a multi-dimensional data model and a powerful query language, Prometheus allows users to gain deep insights into the performance and health of their systems.

Prometheus Monitoring offers a plethora of features that make it a preferred choice for monitoring modern systems. Its powerful query language, PromQL, allows users to slice and dice metrics, create alerts, and build custom dashboards. The multi-dimensional data model provides flexibility in organizing and querying metrics. Additionally, Prometheus' alerting system enables proactive identification of anomalies, helping organizations mitigate potential issues before they impact end-users.

Prometheus Server: The backbone of the Prometheus ecosystem is the Prometheus Server. This component is responsible for collecting time-series data, processing it, and storing it in a highly efficient and scalable manner. It leverages a pull-based model, periodically scraping metrics from configured targets. With its built-in data storage and querying capabilities, the Prometheus Server acts as a central hub for collecting and storing metrics from various sources.

Exporters: To monitor different types of systems and applications, Prometheus relies on exporters. Exporters are responsible for converting metrics from various formats into Prometheus-compatible data. Whether it's monitoring a database, a web server, or a cloud service, exporters provide the necessary bridges to gather metrics from these systems and make them available to the Prometheus Server. Popular examples include the Node Exporter, Blackbox Exporter, and Prometheus Pushgateway.

Alertmanager: Effective alerting is a crucial aspect of any monitoring system. Prometheus achieves this through its Alertmanager component. Alertmanager receives alerts from the Prometheus Server and applies various routing and grouping rules to ensure that the right people are notified at the right time. It supports multiple notification channels, such as email, PagerDuty, and Slack, making it highly flexible for integrating with existing incident management workflows.

Grafana Integration: Prometheus's power is further enhanced by its integration with Grafana, a popular open-source visualization tool. Grafana allows users to create stunning dashboards and visualizations using data from Prometheus. With its vast array of panels and plugins, Grafana enables users to gain deep insights into their monitoring data and build custom monitoring views tailored to their specific needs.

Highlights: Prometheus Monitoring

Created by SoundCloud

In this post, I would like to discuss Prometheus monitoring and its pull-based approach ( Prometheus Pull ) to the metric collection and the Prometheus metric types. Prometheus is a powerful open-source monitoring system created by SoundCloud to monitor and alert the infrastructure and applications within their environment. It has since become one of the most popular monitoring systems due to its ability to monitor various services, from simple applications to complex distributed systems.

Pull-based System:

– Prometheus is designed to be simple to use. It uses a pull-based system, meaning it collects data from the services it monitors rather than having the services push the data to Prometheus. This makes it easy to set up and configure, allowing for great flexibility in the services it can monitor. It also has an intuitive user interface, making it easy to use and navigate.

– Understanding Prometheus’s architecture is crucial for effectively deploying and leveraging its capabilities. At its core, Prometheus consists of a server responsible for data collection, a time-series database for storing metrics, and a user interface for visualization and querying. The server scrapes metrics from various targets using exporters or service discovery, while the time-series database facilitates efficient storage and retrieval of collected data.

– Prometheus Monitoring integrates with many systems, making it highly versatile in diverse environments. It provides exporters with popular technologies like Kubernetes, Docker, and AWS, allowing an accessible collection of relevant metrics. Moreover, Prometheus can be integrated with other monitoring tools, such as Grafana, to create comprehensive dashboards and visualizations.

Key Features of Prometheus Monitoring:

1. Time-Series Database: Prometheus utilizes its own time-series database, tailored explicitly for efficiently storing and querying time-series data. This allows for real-time monitoring and analysis, enabling quick troubleshooting and performance optimization.

2. Flexible Query Language: PromQL, Prometheus’s query language, offers a powerful and expressive way to retrieve and manipulate metrics data. It supports various operations, including filtering, aggregation, and mathematical calculations, empowering users to extract meaningful insights from their monitoring data.

3. Alerting and Alert Manager: Prometheus has a built-in alerting mechanism that allows users to define alert rules based on specific conditions. Prometheus can trigger alerts when the Alert Manager component meets, manages, and routers such situations.

**Challenge: The traditional approach** 

First, let us roll back in time before we had Prometheus network monitoring, say ten years, and look at the birth of monitoring and the tools used. The monitoring tools often operated in a silo, which led to more blind spots. 

The old approach to monitoring is considerably different from today’s approach to Observability.  Traditionally, you can use something like Ganglia to monitor. Ganglia was often used to monitor CDN networks involving several PoPs in different locations. However, within this CDN network, the PoPs look the same. The same servers, storage, etc., and only with the difference in the number of transit providers and servers.

Then, to alert people, we can use Icinga and have this on the back of the Ganglia. With this monitoring design, the infrastructure pushes metrics to the central collectors. The central collectors are in one location, maybe two for backup, but often two locations.

**Complexity and distributed systems**

However, you will see some issues as infrastructure grows ( infrastructure does grow at alarming rates ), and there is a need to push more metrics into Ganglia. For example, with some monitoring systems, the pushing style of the metric collection can cause scalability issues as the number of servers increases. Especially in more effective distributed systems observability use cases.

Within this CDN monitoring design, only one or two machines collect the telemetry for your infrastructures. So, as you scale your infrastructure and throw more data at the system, you have to scale up instead of out. This can be costly and will often hit bottlenecks.

**Scale with the Infrastructure**

However, you want a monitoring solution to scale your infrastructure growth. As you roll out new infrastructure to meet demands, you want to have monitoring systems that can scale. The monitoring system can scale as the infrastructure scales, such as in the use case with Prometheus network monitoring. However, with Ganglia and Icinga, we also have limited graphing functions.

Creating custom dashboards on unique metrics was difficult, and no alerting support existed. Also, there was no API to get and consume the metric data around that time. If you wanted to access the data and consume it in a different system or perform interesting analyses, all of it was locked into the Ganglia.

Related: Before you proceed, you may find the following post helpful.

  1. Correlate Disparate Data Points
  2. Service Level Objectives
  3. Kubernetes Networking 101

Prometheus Monitoring

Understanding Prometheus’ Pull-Based System

Prometheus operates on a pull-based model, meaning it actively fetches metrics from the targets it monitors. Instead of waiting for the targets to push metrics to it, Prometheus takes the initiative and pulls the data at regular intervals. This approach offers several advantages over traditional push-based systems.

1. Flexibility and Reliability: One of the key benefits of the pull-based system is its flexibility. Prometheus can quickly adapt to dynamic environments and handle changes in the configuration of monitored targets. It can automatically discover new targets and adjust the scraping frequency based on the importance of metrics. This flexibility ensures that Prometheus can keep up with the ever-changing nature of modern infrastructure.

2. Efficiency and Scalability: The pull-based system also provides efficiency and scalability. By fetching metrics directly from the targets, Prometheus reduces the resource overhead on each target. This is particularly beneficial in scenarios where the number of targets is large, or the resources on the targets are limited. Additionally, Prometheus can distribute the scraping workload across multiple instances, enabling horizontal scalability and ensuring smooth operations even under heavy loads.

3. Data Integrity and Consistency: Another advantage of the pull-based system is its ability to ensure data integrity and consistency. Since Prometheus fetches metrics directly from the targets, potential data loss in a push-based system is eliminated. By actively pulling the data, Prometheus guarantees that the most up-to-date and accurate metrics are available for analysis and alerting.

4. Alerting and Monitoring: Prometheus’ pull-based system seamlessly integrates with its powerful alerting and monitoring capabilities. By regularly fetching metrics, Prometheus can evaluate them against predefined rules and trigger alerts when certain thresholds are exceeded. This proactive approach to monitoring ensures that any potential issues or anomalies are promptly detected, allowing for timely remedial actions.

Example: Prometheus is Google Cloud

In the following example, I have set up a  Google Kubernetes Engine cluster ad and then deployed the Managed Service for Prometheus to ingest metrics from a simple application. Remember that the Managed Service for Prometheus is Google Cloud’s fully managed storage and query service for Prometheus metrics. This service is built on Monarch, the same globally scalable data store as Cloud Monitoring.

Note: A thin fork of Prometheus replaces existing Prometheus deployments and sends data to the managed service with no user intervention. This data can then be queried using PromQL through the Prometheus Query API supported by the managed service and the existing Cloud Monitoring query mechanisms.

In this last section, quickly use gcloud to deploy a custom monitoring dashboard that shows the metrics from this application in a line chart. Once created, navigate to Monitoring > Dashboards to see the newly created

Prometheus Network Monitoring

Prometheus network monitoring is an open-source, metrics-based system. It includes a robust data model and a query language that lets you analyze how your applications and infrastructure are performing. It does not try to solve problems outside the metrics space and works solely in metric-based monitoring.

However, it can be augmented with tools and a platform for additional observability. For Prometheus to work, you need to instrument your code.

Available Client libraries

Client libraries are available in all the popular languages and runtimes for instrumenting your code, including Go, Java/JVM, C#/.Net, Python, Ruby, Node.js, Haskell, Erlang, and Rust. In addition, software like Kubernetes and Docker are already instrumented with Prometheus client libraries. So you say these are out of the box, so are you ready for Prometheus to scrap their metrics?

In the following diagram, you will see the Prometheus settings from a fresh install. I have downloaded Prometheus from the Prometheus website on my local machine. Prometheus, by default, listens on port 9090 and contains a highly optimized Time Series Database (TSDB), which you can see is started. Also displayed at the very end of the screenshot is the default name of the Prometheus configuration file.

The Prometheus configuration file is written in YAML format and is defined by the scheme. I have done a CAT on the Prometheus configuration file to give you an idea of what it looks like. In the default configuration, a single job called Prometheus scrapes the time series data exposed by the Prometheus server. The job contains a single, statically configured target, the local host on port 9090. 

The Transition to Prometheus network monitoring

Around eight years ago, Ganglia introduced SaaS-based monitoring solutions. These now solved some problems with alerting built-in and API to get to the data. However, now there are two systems, and this introduces complexity. The collector and the agents are pushing to the SaaS-based system along with an on-premises design.

These systems may need to be managed by two different teams. There can be cloud teams looking after the cloud-based SaaS solution and on-premises network or security teams looking at the on-premises monitoring. So, there is already a communication gap, not to mention the creation of a considerably siloed environment in one technology set—monitoring.

Also, questions arise about where to put the metrics in the SaaS-based product or Ganga. For example, we could have different metrics in the same place or the same metrics in only one spot. How can you keep track and ensure consistency?

Ideally, if you have a dispersed PoP design and expect your infrastructure to grow and plan for the future, you don’t want to have centralized collectors. But unfortunately, most on-premise solutions still have a push-based centralized model. 

Prometheus Monitoring: Prometheus Pull

Then Prometheus Pull came around and offered a new approach to monitoring. It can handle millions of metrics on modest hardware. In general, rather than having external services push metrics to it, Prometheus uses a pull approach compared to a push approach.

Prometheus network monitoring is a server application written in GO. It is an open-source, decentralized monitoring tool that can be centralized using the federate option. Prometheus has a server component, which you run in each environment. You can also run a Prometheus container in each Kubernetes pod.

We use a time-series database for Prometheus monitoring, and every metric is recorded with a timestamp. Prometheus is not an SQL database; you need to use PromQL as its query language, which allows you to query the metrics. 

Prometheus Monitoring: Legacy System

So, let us expand on this and look at two environments for monitoring. We have a legacy environment and a modern Kubernetes environment. We are running a private cloud for the legacy with many SQL, Windows, and Linux servers. There is nothing new here. Here, you would run Prometheus on the same subnet. A Prometheus agent would also be installed. 

We would have Node Exporters for both Linux and Windows, extracting and creating a metric endpoint on your servers. The metric endpoint is needed on each server or host so Prometheus can scrap the metrics. So, a Daemon is running, collecting all of the metrics. These metrics are exposed on a page, for example, http://host:port/metrics, that allows Prometheus to scrape. 

There is also a Prometheus federation feature. You can have a federated endpoint, allowing Prometheus to expose its metrics to other services. This will enable you to pull metrics around different subnets. So, we can have another Prometheus in a different subnet, scrapping the other Prometheus. The federate option will enable you to link these two Prometheus solutions quickly. 

Prometheus Monitoring
Diagram: Prometheus Monitoring. Source is Opcito

Prometheus Monitoring: Modern Kubernetes

Here, we have a microservices observability platform and a bunch of containers or VMs running in a Kubernetes cluster. In this environment, we usually create a namespace; for example, we could call the namespace monitoring. So, we deploy a Prometheus pod in our environments.

The Prometheus pod YAML file will point to the Kubernetes API. The Kubernetes API has a metric server that gets all metrics from your environments. So here we are, getting metrics for the container processes. You can deploy the library in your code if you want to instrument your application.

This can be done with Prometheus code libraries. We now have a metrics endpoint similar to before, and we can grab metrics specific to your application. We also have a metrics endpoint on each container that Prometheus can scrape.

**What is Prometheus Exporter**

Prometheus exporter is a specialized component that extracts and exposes metrics from third-party systems, applications, and services. It bridges Prometheus, an open-source monitoring and alerting toolkit, and the target system or application.

By implementing the Prometheus exporter, users can conveniently collect and monitor custom metrics, enabling them to gain valuable insights into their systems’ health and performance.

How Does Prometheus Exporter Work?

Prometheus exporter follows a simple and efficient architecture. It utilizes a built-in HTTP server to expose metrics in a format that Prometheus understands. The exporter periodically collects metrics from the target system or application and makes them available over HTTP endpoints. Prometheus then scrapes these endpoints, discovers and stores the metrics, and performs analysis and alerting based on the defined rules.

Benefits of Prometheus Exporter:

1. Flexibility: The Prometheus exporter provides flexibility in monitoring various metrics, making it suitable for multiple systems and applications. With its support for custom metrics, users can easily monitor specific aspects and gain insights into their systems’ behavior.

2. Compatibility: Due to its popularity, many systems and applications offer native support for Prometheus exporters. This compatibility allows users to effortlessly integrate the exporter into their existing monitoring infrastructure, eliminating the need for complex configurations or additional tools.

3. Extensibility: Prometheus exporter encourages extensibility by offering a straightforward mechanism to develop and expose custom metrics. This capability enables users to monitor specific application-level metrics critical to their unique monitoring requirements.

4. Scalability: With Prometheus exporter, users can scale their monitoring infrastructure as their systems grow. The exporter’s lightweight design and efficient data collection mechanism ensure that monitoring remains reliable and efficient, even in high-throughput environments.

Exposing Runtime Metrics: The Prometheus Exporter

  • Exporter Types:

To enable Prometheus monitoring, you must add a metric API to the application containers to support this. For applications that don’t have their metric API, we use what is known as an Exporter. This utility reads the runtime metrics the app has already collected and exposes them on an HTTP endpoint.

Prometheus can then look at this HTTP endpoint. So we have different types of Exporters that collect metrics for other runtimes, such as a JAVA Exporter, which will give you a set of JVM statistics, and a .NET Exporter, which will provide you with a set of Windows performance metrics.

Essentially, we are adding a Prometheus endpoint to the application. In addition, we use an Exporter utility alongside the application. So, we will have two processes running in the container. 

With this approach, you don’t need to change the application. This could be useful for some regulatory environments where you can’t change the application code. So now you have application runtime metrics without changing any code.

This is the operating system and application host data already collected in the containers. To make these metrics available to Prometheus, add an Exporter to the Docker Image. Many use the Exporters for legacy applications instead of changing the code to support Prometheus monitoring.

Essentially, we are exporting the statistics to a metric endpoint.

Summary: Prometheus Monitoring

In today’s rapidly evolving digital landscape, businesses rely heavily on their IT infrastructure to ensure smooth operations and deliver uninterrupted customer services. As the complexity of these infrastructures grows, so does the need for effective network monitoring solutions.

Prometheus, an open-source monitoring and alerting toolkit, has emerged as a popular choice for organizations seeking deep insights into their network performance.

Prometheus offers a robust and flexible platform for monitoring various aspects of an IT infrastructure, including network components such as routers, switches, and servers. It collects and stores time-series data, allowing administrators to analyze historical trends, detect anomalies, and make informed decisions.

Prometheus pull-based model

One of the critical strengths of Prometheus lies in its ability to collect data through a pull-based model. Instead of relying on agents installed on each monitored device, Prometheus pulls data from the targets it monitors. This lightweight approach minimizes the impact on the monitored systems, making it an efficient and scalable solution for networks of any size.

Prometheus PromQL

Prometheus employs a powerful query language called PromQL, which enables administrators to explore and manipulate the collected data. With PromQL, users can define custom metrics, create complex queries, and generate insightful visualizations. This flexibility allows organizations to tailor Prometheus to their specific monitoring needs and gain a comprehensive understanding of their network performance.

Prometheus Monitoring and Alerting

Another notable feature of Prometheus is its alerting system. Administrators can define custom alert rules based on specific metrics or thresholds, ensuring that they are promptly notified of any network issues. This proactive approach to monitoring allows businesses to mitigate potential downtime, minimize the impact on end-users, and maintain a high level of service availability.

Prometheus also integrates seamlessly with other widespread monitoring tools and platforms, such as Grafana, allowing users to create visually appealing dashboards and gain even deeper insights into their network performance. This interoperability makes Prometheus a versatile choice for organizations that already have an existing monitoring ecosystem in place.

Furthermore, Prometheus has a thriving community of contributors, ensuring constant updates, bug fixes, and new features. The active development community ensures that Prometheus stays relevant and up-to-date with the latest industry trends and best practices in network monitoring.

Conclusion:

Prometheus Monitoring has revolutionized how developers and DevOps teams monitor and troubleshoot modern software systems. With its efficient time-series database, flexible query language, and extensive ecosystem, Prometheus provides a comprehensive solution for monitoring and alerting needs. By leveraging its powerful features, organizations can gain valuable insights, ensure system reliability, and foster continuous improvement in their software delivery pipelines.