In this post, I would like to discuss Prometheus monitoring, its pull-based approach to metric collection (Prometheus Pull), and the Prometheus metric types. Prometheus is a powerful open-source monitoring system created at SoundCloud to monitor and alert on the infrastructure and applications within their environment. It has since become one of the most popular monitoring systems due to its ability to monitor a wide range of services, from simple applications to complex distributed systems.
Prometheus is designed to be simple to use. It uses a pull-based system, meaning it collects data from the services it monitors rather than having the services push the data to Prometheus. This makes it easy to set up and configure, allowing for great flexibility in the services it can monitor. It also has an intuitive user interface, making it easy to use and navigate.
- The traditional approach
First, let us roll back in time, say ten years, to before we had Prometheus, and look at the birth of monitoring and the tools used. The monitoring tools often operated in silos, which led to blind spots.
The old approach to monitoring is considerably different from today’s approach to Observability. For more information on the difference between monitoring and Observability, you can visit this blog post: observability vs monitoring. There is also a newer approach, chaos engineering on Kubernetes, that can help you better understand your system and its capabilities by breaking it with controlled experiments.
For monitoring, traditionally, you could use something like Ganglia. Ganglia was often used to monitor CDN networks involving several PoPs in different locations. Within such a CDN network, the PoPs look the same: the same servers, storage, etc., differing only in the number of transit providers and servers. Then, for alerting, we could use Icinga on the back of Ganglia. With this monitoring design, the infrastructure pushes metrics to the central collectors. The central collectors sit in one location, or often two, with the second acting as a backup.
- A key point: Back to basics. Prometheus Network Monitoring.
Prometheus network monitoring is an open-source, metrics-based system. It includes a robust data model and a query language that lets you analyze how your applications and infrastructure are performing. It does not try to solve problems outside the metrics space and works solely in metric-based monitoring. However, it can be augmented with tools and a platform for additional observability. For Prometheus to work, you need to instrument your code.
There are client libraries for instrumenting your code in all the popular languages and runtimes, including Go, Java/JVM, C#/.NET, Python, Ruby, Node.js, Haskell, Erlang, and Rust. In addition, software like Kubernetes and Docker is already instrumented with Prometheus client libraries. So you could say these are ready out of the box for Prometheus to scrape their metrics.
In the following diagram, you will see the Prometheus settings from a fresh install. I have downloaded Prometheus from the Prometheus website onto my local machine. Prometheus, by default, listens on port 9090 and contains a highly optimized Time Series Database (TSDB), which you can see has started. Also displayed at the very end of the screenshot is the default name of the Prometheus configuration file.
The Prometheus configuration file is written in YAML and follows a documented scheme. I have run cat on the Prometheus configuration file to give you an idea of what it looks like. In the default configuration, a single job called prometheus scrapes the time series data exposed by the Prometheus server itself. The job contains a single, statically configured target: localhost on port 9090.
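To make the job and target relationship concrete, here is a minimal sketch that mirrors the default configuration as a Python dictionary and derives the URL each job would scrape. The keys follow the Prometheus configuration format; the 15s interval is assumed to match the sample file, and the helper name scrape_urls is my own invention for illustration.

```python
# Python mirror of the default prometheus.yml described above.
# The 15s scrape interval is an assumption based on the shipped sample file.
DEFAULT_CONFIG = {
    "global": {"scrape_interval": "15s"},
    "scrape_configs": [
        {
            "job_name": "prometheus",
            "static_configs": [{"targets": ["localhost:9090"]}],
        }
    ],
}


def scrape_urls(config):
    """Return (job_name, url) pairs for every statically configured target."""
    urls = []
    for job in config["scrape_configs"]:
        # Prometheus defaults to the http scheme and the /metrics path.
        scheme = job.get("scheme", "http")
        path = job.get("metrics_path", "/metrics")
        for group in job.get("static_configs", []):
            for target in group["targets"]:
                urls.append((job["job_name"], f"{scheme}://{target}{path}"))
    return urls


for job_name, url in scrape_urls(DEFAULT_CONFIG):
    print(job_name, url)
```

So with the default file, Prometheus ends up scraping its own metrics endpoint on port 9090.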
Prometheus Network Monitoring: The Challenges
However, you will see some issues as the infrastructure grows (and infrastructure does grow at alarming rates) and with it the need to push more metrics into Ganglia. With some monitoring systems, the push style of metric collection can cause scalability issues as the number of servers increases, especially in distributed systems observability use cases.
Within this CDN monitoring design, only one or two machines collect the telemetry for your entire infrastructure. So as you scale your infrastructure and throw more data at the system, you have to scale the collectors up instead of out. This can be costly and will often hit bottlenecks.
Instead, you want a monitoring solution that scales with your infrastructure growth. As you roll out new infrastructure to meet demand, the monitoring system should scale out with it, as is the case with Prometheus. Ganglia and Icinga also offered limited graphing functions.
Creating custom dashboards on unique metrics was hard, and no built-in alerting support existed. Also, there was no API to get and consume the metric data around that time. So if you wanted to consume the data in a different system or perform interesting analyses, you could not: all of it was locked inside Ganglia.
Transitions: Prometheus network monitoring and Prometheus Pull
Around eight years ago, SaaS-based monitoring solutions were introduced alongside Ganglia. These solved some problems, with alerting built in and an API to get to the data. However, now there are two systems, and this introduces complexity. The collectors and the agents push to the SaaS-based system, which runs alongside the on-premises design.
These systems may need to be managed by two different teams. There can be cloud teams looking after the cloud-based SaaS solution and on-premises network or security teams looking after the on-premises monitoring. So there is already a communication gap, not to mention a considerably siloed environment within one technology set: monitoring.
Also, questions arise about where to put the metrics: in the SaaS-based product or in Ganglia. For example, some metrics could end up in both places and others in only one. How can you keep track and ensure consistency? Ideally, if you have a dispersed PoP design, expect your infrastructure to grow, and plan for the future, you don’t want centralized collectors. But unfortunately, most on-premises solutions still have a push-based centralized model.
Prometheus Monitoring: Prometheus Pull
Then Prometheus came around and offered a new approach to monitoring, one that can handle millions of metrics on modest hardware. Rather than having external services push metrics to it, Prometheus pulls the metrics from those services.
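The direction of the connection is the whole point of the pull model, so here is a rough sketch of one scrape cycle. It is not how Prometheus itself is implemented (Prometheus is written in Go); the function names are hypothetical, and the up flag only loosely mirrors the up series that Prometheus records for each target.

```python
import urllib.request


def http_fetch(url, timeout=5.0):
    """Fetch the raw text served at a target's metrics URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def scrape_once(targets, fetch=http_fetch):
    """Pull metrics from every target; the targets never contact the collector.

    Returns a dict of target URL -> (up, body). A failed scrape is recorded
    as up=0 with an empty body instead of aborting the whole cycle.
    """
    results = {}
    for url in targets:
        try:
            results[url] = (1, fetch(url))
        except OSError:  # covers connection errors, HTTP errors, and timeouts
            results[url] = (0, "")
    return results
```

Because the collector initiates every request, a target that goes silent is noticed immediately as a failed scrape, rather than being mistaken for a service with nothing to report.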
Prometheus is a server application written in Go. It is an open-source, decentralized monitoring tool, but it can be centralized when you use the federate option. Prometheus has a server component, and you run this in each environment. You can, if you want, run a Prometheus container in each Kubernetes pod.
Prometheus stores its data in a time-series database, and every metric sample is recorded with a timestamp. Prometheus is not a SQL database; you query the metrics with its own query language, PromQL.
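As a small illustration, PromQL expressions can be sent to the Prometheus HTTP API at /api/v1/query. The sketch below only builds the request URL and unpacks an instant-vector response; the helper names are my own, and the base URL assumes a local server on the default port.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def vector_values(response):
    """Extract (labels, timestamp, value) tuples from an instant-vector response."""
    if response.get("status") != "success":
        raise ValueError("query failed")
    samples = []
    for item in response["data"]["result"]:
        timestamp, value = item["value"]  # value arrives as a string
        samples.append((item["metric"], float(timestamp), float(value)))
    return samples


print(instant_query_url("http://localhost:9090", 'up{job="prometheus"}'))
```

Note how each returned sample carries its own timestamp next to the value, which is the time-series model described above.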
Prometheus Monitoring: Legacy System
So let us now expand on this and look at two environments for monitoring: a legacy environment and a modern Kubernetes environment. For the legacy environment, we are running a private cloud with many SQL, Windows, and Linux servers. Nothing new here. Here you would run Prometheus on the same subnet. There would also be a Prometheus agent installed.
We would have exporters for both Linux and Windows (Node Exporter on Linux, Windows Exporter on Windows), extracting the metrics and creating a metric endpoint on your servers. The metric endpoint is needed on each server or host so Prometheus can scrape the metrics. So a daemon is running on each host, collecting all of the metrics. These metrics are exposed on a page, for example, http://host:port/metrics, that Prometheus can scrape.
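To give a feel for what such a metric endpoint serves, here is a minimal sketch of a daemon that exposes one host metric in the Prometheus text exposition format. It is not the Node Exporter; the metric name demo_cpu_count, the helper names, and port 8000 are assumptions made for the example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render {name: (help, type, value)} in the Prometheus text format."""
    lines = []
    for name, (help_text, metric_type, value) in sorted(samples.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect_host_samples():
    """Collect a single illustrative host metric (hypothetical name)."""
    return {
        "demo_cpu_count": ("CPUs visible to this process.", "gauge", os.cpu_count() or 0),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_host_samples()).encode("utf-8")
        self.send_response(200)
        # Content type used by the Prometheus text exposition format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on http://host:port/metrics."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() on a host and adding host:8000 as a target would be enough for Prometheus to start scraping it.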
There is also a Prometheus federation feature. You can have a federated endpoint, allowing Prometheus to expose its metrics to other services. This allows you to pull metrics across different subnets. So we can have another Prometheus in a different subnet scraping the first Prometheus. The federate option therefore allows you to link these two Prometheus deployments together very quickly.
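The federated endpoint lives at /federate on the Prometheus server and takes one or more match[] parameters that select which series to hand over. The sketch below only builds that URL; the host name prom-subnet-a and the selector are assumptions for illustration.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build the /federate URL a second Prometheus would scrape.

    Each selector becomes one match[] parameter; the brackets and the
    selector text are percent-encoded by urlencode.
    """
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url}/federate?{query}"


print(federate_url("http://prom-subnet-a:9090", ['{job="node"}']))
```

In practice, the scraping Prometheus lists the other server as a target, with /federate as the metrics path and the match[] values as request parameters.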
Prometheus Monitoring: Modern Kubernetes
So here we have a microservices observability platform and a bunch of containers or VMs running in a Kubernetes cluster. In this environment, we usually create a namespace; for example, we could call the namespace monitoring. So here, we deploy a Prometheus pod in our environment.
The Prometheus pod YAML file will point to the Kubernetes API. The Kubernetes API has a metrics server, which will get all metrics from your environment. So here we are getting metrics for the container processes. If you want instrumentation for your application, you can add a client library to your code.
This can be done with the Prometheus client libraries. So we now have a metrics endpoint similar to before, and we can grab metrics specific to your application. Each container has a metrics endpoint that Prometheus can scrape.
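To show what instrumenting application code amounts to, here is a hand-rolled sketch of a labelled counter. In a real service you would use the official client library for your language; the Counter class below is a hypothetical stand-in and not the API of any Prometheus library, and the metric name is invented for the example.

```python
import threading


class Counter:
    """Hypothetical stand-in for a client-library counter (only ever increases)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = {}
        self._lock = threading.Lock()

    def inc(self, amount=1, **labels):
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(sorted(labels.items()))
        with self._lock:
            self._values[key] = self._values.get(key, 0) + amount

    def render(self):
        """Render every labelled series in the Prometheus text format."""
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        with self._lock:
            for key, value in sorted(self._values.items()):
                label_text = ",".join(f'{k}="{v}"' for k, v in key)
                series = f"{self.name}{{{label_text}}}" if label_text else self.name
                lines.append(f"{series} {value}")
        return "\n".join(lines) + "\n"


REQUESTS = Counter("demo_http_requests_total", "Requests handled by the demo app.")


def handle_request(path):
    """Application code: do the work, then record that it happened."""
    REQUESTS.inc(path=path)
    return f"served {path}"
```

The output of REQUESTS.render() is what the application's metrics endpoint would return when Prometheus scrapes the container.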
Exposing Runtime Metrics: The Prometheus Exporter
- Exporter Types:
To enable Prometheus monitoring, the application containers must expose a metric API. For applications that don’t have their own metric API, we use what is known as an Exporter. This utility reads the runtime metrics the app has already collected and exposes them on an HTTP endpoint.
Prometheus can then scrape this HTTP endpoint. There are different types of Exporters that collect metrics for different runtimes, such as a Java Exporter, which will give you a set of JVM statistics, and a .NET Exporter, which will give you a set of Windows performance metrics.
Essentially, we are adding a Prometheus endpoint to the application by running an Exporter utility alongside it. So we will have two processes running in the container.
With this approach, you don’t need to change the application. So this could be useful for some regulatory environments where you can’t change the application code. So now you have application runtime metrics without changing any code.
The operating system and application host data is already being collected in the containers. To make these metrics available to Prometheus, add an Exporter to the Docker image. Many use Exporters for legacy applications instead of changing the code to support Prometheus monitoring. So essentially, what we are doing is exporting the statistics to a metric endpoint.
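The core job of an Exporter is translation. As a rough sketch, suppose a legacy application already publishes its statistics as a JSON status document; that JSON format, the legacyapp prefix, and the function name are all assumptions for the example. The Exporter reads those numbers and rewrites them in the Prometheus text format without touching the application.

```python
import json


def translate(status_json, prefix="legacyapp"):
    """Convert an app's native JSON status into Prometheus text format."""
    stats = json.loads(status_json)
    lines = []
    for key in sorted(stats):
        value = stats[key]
        # Only plain numbers can become samples; skip strings and booleans.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = f"{prefix}_{key}"
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(translate('{"connections": 12, "version": "1.2", "queue_depth": 3}'))
```

An Exporter process would run this translation on every scrape and serve the result on its own HTTP endpoint next to the application.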