Working With CPU Metrics From Node Exporter

Run stress -c 5 on your server before starting this lesson.

With the Node Exporter up and running, we now have access to a number of infrastructure metrics on Prometheus, including data about our CPU. The processing power of our server determines how well basically everything on our server runs, so keeping track of its cycles can be invaluable for diagnosing problems and reviewing trends in how our applications and services are running.

For almost all monitoring solutions, including Prometheus, data for this metric is pulled from the /proc/stat file on the host itself. In Prometheus, Node Exporter exposes these values as metrics whose names start with node_cpu. Assuming we're not running any guests on our host, the core metric we want to review is node_cpu_seconds_total.
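As a rough illustration of where these numbers come from, the sketch below converts one per-CPU line of /proc/stat into seconds per mode. It is not Node Exporter's actual code: the sample line is made up, the function name parse_cpu_line is hypothetical, and the tick rate of 100 per second is an assumption (on a real host, os.sysconf("SC_CLK_TCK") reports the actual value).

```python
# Field order of a "cpu" line in /proc/stat, as documented in proc(5).
MODES = ["user", "nice", "system", "idle", "iowait",
         "irq", "softirq", "steal", "guest", "guest_nice"]

# Assumption: the kernel reports 100 ticks per second (USER_HZ).
USER_HZ = 100


def parse_cpu_line(line, user_hz=USER_HZ):
    """Convert one per-CPU line of /proc/stat into seconds per mode."""
    name, *ticks = line.split()
    seconds = {mode: int(t) / user_hz for mode, t in zip(MODES, ticks)}
    return name, seconds


# Made-up sample line, not real output from any host.
sample = "cpu0 4705 150 1120 1600000 360 0 40 0 0 0"
cpu, seconds = parse_cpu_line(sample)
print(cpu, seconds["user"], seconds["idle"])
```

Each mode in the resulting dictionary corresponds to one value of the mode label on node_cpu_seconds_total, and the values only ever grow while the host stays up.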

node_cpu_seconds_total works as a counter: it keeps track of how long the CPU spends in each mode, in seconds, and adds that time to a persistent, ever-increasing count. Counters might not seem especially helpful on their own, but combined with a little math, we can get a lot of information out of them.

Most of the time, what would be helpful here is viewing the percentages and averages of time that our CPU spends in either the idle mode or any of the working modes. In Prometheus, we can do this with the rate and irate functions. rate calculates the per-second average rate of increase of a time series over the given range, while irate calculates an instant rate from only the last two data points in that range. irate is intended for fast-moving counters (like our CPU); both should be used with counter-based metrics only.
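To make the difference between the two functions concrete, here is a simplified sketch of the arithmetic behind each. It assumes hypothetical sample data and deliberately ignores what Prometheus itself does with counter resets and extrapolation at the edges of the range, so treat it as an illustration rather than the real implementation.

```python
def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical (timestamp, idle-seconds) samples scraped every 15 seconds.
# The CPU is mostly idle at first, then becomes busy.
idle = [(0, 1000.0), (15, 1014.0), (30, 1028.0), (45, 1030.0), (60, 1031.0)]

rate_pct = simple_rate(idle) * 100    # smoothed over the whole window
irate_pct = simple_irate(idle) * 100  # reflects only the latest interval
print(round(rate_pct, 1), round(irate_pct, 1))
```

With this data, the rate-style result still shows the CPU as roughly half idle, while the irate-style result shows it as almost fully busy, which is why irate reacts faster to spikes such as the one produced by stress.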

We can see what percentage of time our server spends in each mode by running irate(node_cpu_seconds_total[30s]) * 100 in the expression editor with a suggested limit of 30m, assuming you're using a cloud playground server.

Additionally, we can check for things like the percentage of time the CPU spends running userland processes:

irate(node_cpu_seconds_total{mode="user"}[1m]) * 100

Or we can average across all of the CPU cores on each instance in our fleet with Prometheus's avg aggregation operator:

avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
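The grouping performed by avg by (instance) can be sketched as follows. The instance names and per-core idle rates are hypothetical, and the helper avg_by_instance is only an illustration of the idea, not part of any Prometheus client library. The last step also derives a "busy" percentage as 100 minus the idle percentage, which is a common way to read this number.

```python
from collections import defaultdict


def avg_by_instance(series):
    """Average per-core values for each instance, like avg by (instance)."""
    grouped = defaultdict(list)
    for labels, value in series:
        grouped[labels["instance"]].append(value)
    return {inst: sum(vals) / len(vals) for inst, vals in grouped.items()}


# Hypothetical per-core idle rates (fraction of each second spent idle).
idle_rates = [
    ({"instance": "web1:9100", "cpu": "0"}, 0.90),
    ({"instance": "web1:9100", "cpu": "1"}, 0.70),
    ({"instance": "db1:9100", "cpu": "0"}, 0.25),
]

# Multiply by 100 after averaging, as the expression above does.
idle_pct = {inst: v * 100 for inst, v in avg_by_instance(idle_rates).items()}
busy_pct = {inst: 100 - v for inst, v in idle_pct.items()}
print(idle_pct, busy_pct)
```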

Other metrics to consider include the node_cpu_guest_seconds_total metric, which works similarly to node_cpu_seconds_total but is especially useful for any machine running guest virtual machines.
