System Monitoring using Prometheus and Grafana
Prometheus is a free and open-source system monitoring tool that stores all its data in a time series database. Because it offers a powerful query language and a multidimensional data model, you can fine-tune the definition of your metrics and generate more accurate reports. It can also graph the resulting data on dashboards.
Grafana is a leading graph and dashboard builder for visualizing time series infrastructure and application metrics. It also lets you create alerts, notifications, and ad-hoc filters for your data.
In this tutorial, we will learn how to set up system monitoring using Prometheus and Grafana. In the first half of the tutorial, we will install Prometheus and Grafana using Docker. In the second half, we will add Prometheus as a Grafana data source, import the Prometheus Stats dashboard, and analyze the data.
In this guide, we will install the Prometheus server using Docker on an Alibaba Cloud Elastic Compute Service (ECS) Ubuntu server.
- You must have Alibaba Cloud Elastic Compute Service (ECS) activated and a valid payment method verified. If you are a new user, you can sign up for a free Alibaba Cloud account. If you don’t know how to set up your ECS instance, refer to the ECS quick-start guide. Your ECS instance must have at least 1 GB of RAM and one CPU core.
- A user with sudo privileges
Update the System
We recommend installing new packages on a freshly updated server. So, let’s update the package index and upgrade all installed packages using the following commands.
sudo apt-get update
sudo apt-get -y upgrade
Because we will be installing Prometheus using Docker, let’s install Docker first if you haven’t already. Here we will install Docker with its official installation script, which is the quickest way to do it.
wget -qO- https://get.docker.com/ | sh
The above command downloads and executes a small installation script written by the Docker team.
Next, add your user to the docker group so that you can run Docker commands without sudo. Execute the following command to do so.
sudo usermod -aG docker $(whoami)
Log out and log back in to your server to activate your new group membership.
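You have successfully installed Docker on your server. Now, let’s install Docker Compose. Recent versions of the Docker installation script also install the Compose plugin, which you can check by running docker compose version. If it is not available, you can install the standalone docker-compose with pip, which you’ll have to install first (the package is python3-pip on current Ubuntu releases and python-pip on older ones).
sudo apt-get -y install python3-pip
Now, you can install Docker Compose using the following command.
sudo pip3 install docker-compose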
In this section, we will install the Prometheus server using Docker. The Prometheus server is responsible for collecting and storing metrics, processing expression queries, and generating alerts. There are several ways to organize the storage of metrics; here, we will use the Docker image’s default behavior of storing the metrics in a Docker data volume.
First of all, create a Prometheus configuration file named prometheus.yml in your home directory using any text editor, for example:
nano ~/prometheus.yml
Add the following contents to the file.
# A scrape configuration scraping a Node Exporter and the Prometheus server
scrape_configs:
  # Scrape Prometheus itself every 5 seconds.
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  # Scrape the Node Exporter every 5 seconds.
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['YourServerIP:9100']
Replace YourServerIP with your actual server IP address.
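If you script your server setup, you can generate this file instead of editing the placeholder by hand. The sketch below is a hypothetical helper (not part of Prometheus) that writes a complete scrape configuration for the two jobs described above, prometheus and node, each scraped every 5 seconds. It rejects anything that is not a valid IP address, so a forgotten YourServerIP placeholder fails loudly instead of producing a broken target.

```python
import ipaddress
from string import Template

# Hypothetical template: the node target's host is a $server_ip placeholder
# that string.Template fills in.
CONFIG_TEMPLATE = Template("""\
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['$server_ip:9100']
""")


def render_prometheus_config(server_ip: str) -> str:
    """Return prometheus.yml contents with the Node Exporter address filled in."""
    # ip_address() raises ValueError if the value is not a real IPv4/IPv6
    # address, e.g. when the YourServerIP placeholder was left unchanged.
    ip = ipaddress.ip_address(server_ip)
    # IPv6 addresses must be wrapped in brackets before appending ":9100".
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return CONFIG_TEMPLATE.substitute(server_ip=host)
```

For example, Path.home().joinpath("prometheus.yml").write_text(render_prometheus_config("203.0.113.10")), with from pathlib import Path, writes the file to the ~/prometheus.yml location that the docker run command below mounts. The address 203.0.113.10 is only a documentation example.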
Next, start the Prometheus Docker container with the external configuration file using the following command. Current Prometheus images (version 2.x and later) use double-dash flags and the TSDB storage engine, so the -storage.local.* options from Prometheus 1.x are no longer accepted.
docker run -d -p 9090:9090 -v ~/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus
The above command pulls the Prometheus image from Docker Hub if it is not already present. You can list all running Docker containers, including their container IDs, using the following command.
docker ps
You can inspect the logs of the running Prometheus server using its container ID. Execute the following command to do so.
docker logs container_id
(Note: replace container_id with your actual container ID.)
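Prometheus may need a few seconds to start after the container is launched. Prometheus 2.x serves a readiness endpoint at /-/ready that returns HTTP 200 once it can serve traffic, so a setup script can poll it before continuing. The sketch below assumes the server is reachable at the base URL you pass in; the helper names are hypothetical, and the fetch parameter exists only so the polling logic can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # the server answered with 4xx/5xx
        return err.code
    except OSError:  # connection refused, DNS failure, timeout, ...
        return 0


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 1.0,
                     fetch: Callable[[str], int] = http_status) -> bool:
    """Poll Prometheus' /-/ready endpoint until it returns 200 or attempts run out."""
    ready_url = base_url.rstrip("/") + "/-/ready"
    for attempt in range(attempts):
        if fetch(ready_url) == 200:
            return True
        if attempt < attempts - 1:  # no pointless sleep after the last try
            time.sleep(delay)
    return False
```

Calling wait_until_ready("http://localhost:9090") on the server returns True as soon as Prometheus reports ready, or False after roughly 30 seconds, in which case the docker logs output above is the place to look.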
Find out where on the host’s filesystem the metrics storage volume is stored using the following command.
docker inspect container_id
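In the output, look for the Mounts section: the entry whose Destination is /prometheus shows the host path of the metrics volume in its Source field.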
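To extract that path in a script, the minimal sketch below parses the JSON that docker inspect prints (an array with one object per inspected container) and returns the Source of the mount whose Destination is /prometheus. The function name is hypothetical.

```python
import json
from typing import Optional


def volume_source(inspect_output: str, destination: str = "/prometheus") -> Optional[str]:
    """Return the host path mounted at `destination` in `docker inspect` output."""
    # `docker inspect` prints a JSON array with one object per container.
    for container in json.loads(inspect_output):
        for mount in container.get("Mounts", []):
            if mount.get("Destination") == destination:
                return mount.get("Source")
    return None  # nothing is mounted at that path
```

For example, save the output with docker inspect container_id > inspect.json and call volume_source(open("inspect.json").read()). Docker can also print only the mounts directly with docker inspect --format '{{json .Mounts}}' container_id.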
Installing Node Exporter
You will need to install the Prometheus Node Exporter, which exposes Prometheus metrics about the host machine it is running on. Running the Node Exporter directly on the host system, outside of Docker, is generally recommended: if you run it in Docker without further options, it will only export metrics about the container’s environment, which differs from the host’s environment. If you do run it in Docker, as below, the container must share the host’s network and PID namespaces and have the host’s root filesystem mounted inside it.
Execute the following command to start the Node Exporter using Docker. Because the container uses the host’s network, the Node Exporter listens directly on port 9100 of the host.
docker run -d --net="host" --pid="host" -v "/:/host:ro,rslave" prom/node-exporter --path.rootfs=/host
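To see what the Node Exporter exposes, you can fetch http://YourServerIP:9100/metrics, which returns plain text in the Prometheus exposition format: # HELP and # TYPE comment lines followed by one sample per line, optionally with labels in braces. The simplified parser below is a hypothetical helper, not part of any Prometheus library. It does not unescape label values and will misread a label value that contains a closing brace; the prometheus_client package provides a complete parser (prometheus_client.parser.text_string_to_metric_families) for serious use.

```python
import re
from typing import Dict, Tuple

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Key = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_metrics(text: str) -> Dict[Key, float]:
    """Map (metric name, sorted label pairs) to the sample value."""
    samples: Dict[Key, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or a # HELP / # TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # a line this simplified parser does not understand
        labels = tuple(sorted(LABEL_RE.findall(match.group("labels") or "")))
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples
```

On the server, parse_metrics(urllib.request.urlopen("http://YourServerIP:9100/metrics").read().decode()) gives you a dictionary you can inspect, for example the key ("node_load1", ()) for the one-minute load average.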
The Prometheus server will now automatically start scraping the Node Exporter. Visit your Prometheus server’s targets page (Status > Targets) at http://YourServerIP:9090/targets and verify that the http://YourServerIP:9100/metrics target for the node job is now shown in the UP state.
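You can also check the targets from a script with the Prometheus HTTP API: an instant query for the up metric returns 1 for every target that was scraped successfully and 0 for every target whose last scrape failed. The sketch below builds the query URL and lists the failing targets from the JSON response. The helper names are hypothetical, and the network call is kept in a separate function so the parsing can be tried on a saved response.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(response_body: str) -> List[str]:
    """Return "job/instance" for every series whose `up` value is 0."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # instant vector: [unix_time, "value"]
        if float(value) == 0:
            labels = series["metric"]
            down.append(f"{labels.get('job')}/{labels.get('instance')}")
    return down


def fetch_text(url: str) -> str:
    """Perform the actual HTTP GET (needs a reachable Prometheus server)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

On the server, down_targets(fetch_text(query_url("http://YourServerIP:9090", "up"))) should return an empty list once both the prometheus and node jobs are UP.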
Grafana combines the metrics collected by one or more Prometheus servers into centralized dashboards. You will need to install Grafana to visualize the data Prometheus collects for system monitoring.
Execute the following commands to create a data directory and launch Grafana as a Docker container. Replace YourPassword with the admin password you want to use.
mkdir -p ~/grafana_db
docker run -d -p 3000:3000 --user "$(id -u)" -e "GF_SECURITY_ADMIN_PASSWORD=YourPassword" -v ~/grafana_db:/var/lib/grafana grafana/grafana
The above command downloads the Grafana Docker image from Docker Hub and bind-mounts the host directory ~/grafana_db at /var/lib/grafana in the container filesystem, so Grafana’s data survives container restarts. Running the container with your own user ID (--user) lets Grafana write to that directory. Grafana will automatically create its SQLite 3 database there.
You have successfully installed Grafana on your server. Verify the installation by visiting http://YourServerIP:3000/, where you should see Grafana’s login page.
Log in to Grafana with the username admin and the password you set in GF_SECURITY_ADMIN_PASSWORD, then click the Log in button.
After logging in, you should see Grafana’s main view.
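Besides opening the login page in a browser, you can check Grafana from a script through its /api/health endpoint, which returns a small JSON document whose database field reads "ok" when Grafana can reach its database. The sketch below assumes Grafana is reachable at the base URL you pass in; the function names are hypothetical.

```python
import json
import urllib.request


def grafana_healthy(body: str) -> bool:
    """Interpret the JSON body returned by Grafana's /api/health endpoint."""
    try:
        return json.loads(body).get("database") == "ok"
    except (ValueError, AttributeError):  # not JSON, or not a JSON object
        return False


def check_grafana(base_url: str) -> bool:
    """Fetch /api/health (needs a reachable Grafana) and report whether it is healthy."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/api/health", timeout=5) as resp:
            return grafana_healthy(resp.read().decode("utf-8"))
    except OSError:  # connection refused, HTTP error, timeout, ...
        return False
```

For example, check_grafana("http://YourServerIP:3000") returns True once the container is up and its SQLite database has been created.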
Adding Prometheus as a Grafana Data Source
We will now add Prometheus as a Grafana data source, import the Prometheus Stats dashboard, and analyze the data.
Let’s configure Grafana to use the Prometheus server as the data source for creating graphs. If you have more than one Prometheus server running, you will need to repeat this for every server, because each Prometheus server is a separate data source.
Open your web browser, visit http://YourServerIP:3000/, and log in to your account.
Once you are logged in, click the Grafana icon in the top left corner to show the main menu. Select Data Sources to navigate to the data source list page, then click Add New in the top navigation bar. In recent Grafana versions, data sources are found under Connections > Data sources instead, with an Add new data source button.
Create a new data source using the following values.
- URL: http://:9090 (the default Prometheus port is 9090)
- Basic Auth: enable this only if your Prometheus server requires HTTP basic authentication
Finally, click the Add button to add your data source, and then click Test Connection to verify that everything is working properly. In recent Grafana versions, a single Save & test button does both.
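If you prefer to script this step, Grafana also exposes an HTTP API for creating data sources (POST /api/datasources). The sketch below only builds the request with the standard library, assuming basic authentication with the admin account and the password you set earlier; the helper name and its defaults are hypothetical. The access value "proxy" corresponds to the Server access mode in the UI, where Grafana’s backend queries Prometheus.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url: str, admin_password: str, prometheus_url: str,
                       name: str = "Prometheus", user: str = "admin") -> urllib.request.Request:
    """Build (but do not send) a request that creates a Prometheus data source."""
    body = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # Grafana's server side talks to Prometheus
        "basicAuth": False,  # set True only if Prometheus itself needs basic auth
    }
    token = base64.b64encode(f"{user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) should create the data source. To register several Prometheus servers, build one request per server with a different name.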
Importing the Prometheus Stats Dashboard
You have successfully added the data source and checked that everything is working properly. Next, download and import the Prometheus Stats dashboard, which is provided as a JSON file. You can download this dashboard from here.
To import a dashboard from a local JSON file, click the Choose file button in the Import File section, locate the downloaded prometheus-dash.json file on your local file system, and import it.
Once you import the dashboard, you will be taken directly to the Prometheus Stats dashboard, where you will see statistics from your Prometheus server.
Remember to save your dashboard; otherwise, it will not appear after you close your browser.
Each panel of the Prometheus Stats dashboard is explained below.
- Uptime: This single-stat panel shows the uptime, that is, the time since the Prometheus server was brought online. It is especially useful for noticing that your server has restarted recently.
- Local Storage Memory Series: Displays the number of series currently held in memory.
- Internal Storage Queue Length: Ideally, this queue length should be empty (0) or a low number.
- Samples Ingested: This graph displays the number of samples ingested per second by the Prometheus server, as measured over the last 5 minutes, per time series in the range vector. When troubleshooting an issue on IRC or GitHub, this is often the first stat the Prometheus team asks for. This number should match the number of metrics you believe you are ingesting.
- Scrapes: Prometheus scrapes metrics from instrumented jobs, and there are two scrape graphs. The Target Scrapes graph shows how frequently targets are scraped, as measured over the last 5 minutes. The Scrape Duration graph shows how long the scrapes take, with percentiles available as series.
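Panels that are "measured over the last 5 minutes" are typically built with PromQL’s rate() function over a [5m] range of a counter. The sketch below, assuming a plain list of (timestamp, value) samples for a single counter series, shows roughly what that computation does: it sums the increases between consecutive samples, treats a drop as a counter reset, and divides by the elapsed time. It is a simplification with a hypothetical name; the real rate() also extrapolates to the edges of the window, so its results can differ slightly.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the given samples.

    Simplified version of PromQL rate(): counter resets are handled, but
    there is no extrapolation to the boundaries of the range window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. a process
        # restart), so the whole new value counts as increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For example, simple_rate([(0, 100), (60, 160), (120, 220)]) returns 1.0 (120 new samples in 120 seconds), and a reset as in [(0, 100), (60, 160), (120, 30)] still yields a positive 0.75 instead of a negative rate.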
Analyzing the Data
You have successfully installed and configured the Prometheus server and Grafana to monitor your system, and you have seen the graphs Grafana creates. The next step is to analyze this data and understand what it tells you about your system.
Because your Prometheus server has just been created, the graphs will look fairly flat and small at first. Over time, they will vary greatly depending on your particular use case and workloads. Ideally, these graphs should remain stable. If targets start exporting more metrics, the number of ingested samples per second will increase; adding targets or shortening the scrape interval increases the number of target scrapes per second.
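As a rough sanity check for these graphs, you can estimate the expected values yourself: each scrape stores about one sample per series the target exposes, and each target is scraped once per scrape interval. The sketch below assumes you know, or have estimated, the series count of each target; the function name and the counts in the example are hypothetical. Prometheus also records a few synthetic series per scrape, such as up and scrape_duration_seconds, so real numbers run slightly higher.

```python
from typing import Iterable, Tuple


def expected_load(targets: Iterable[Tuple[int, float]]) -> Tuple[float, float]:
    """Return (samples/s, scrapes/s) for targets given as (series_count, interval_s)."""
    samples = scrapes = 0.0
    for series, interval in targets:
        samples += series / interval  # one sample per series per scrape
        scrapes += 1 / interval       # one scrape per interval
    return samples, scrapes
```

With the two jobs in this tutorial, both scraped every 5 seconds, and assuming 500 series for Prometheus and 1,000 for the Node Exporter, expected_load([(500, 5), (1000, 5)]) returns (300.0, 0.4): about 300 ingested samples and 0.4 target scrapes per second.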
If your target scrapes drop unexpectedly, determine the cause (for example, a target that has gone down or become unreachable) and fix it so that the graph becomes stable again.
If the node on which the Prometheus server runs is not powerful enough to handle the load, the slow rule evaluation graph will show it.
Prometheus has an alert management component called Alertmanager, which sends alerts via email and other notification channels. You will need to define your alerting rules in a rules file, such as alert.rules, and list that file under rule_files in prometheus.yml, so that the Prometheus server can read the rules and fire alerts at the appropriate times.
For example, if the Prometheus server finds that the value of a metric has been outside the threshold range defined in your rules file for more than 30 seconds, it fires an alert and sends it to Alertmanager, which then delivers the notification.
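The 30-second condition corresponds to the for clause of a Prometheus alerting rule (for example, for: 30s). While the rule’s expression is true but that duration has not yet elapsed, the alert is pending; it becomes firing once the condition has held for the full duration, and it returns to inactive when the condition clears. The sketch below simulates that state progression for a single series against a hypothetical "value above threshold" condition. It illustrates the semantics only and is not Prometheus’s own implementation; the function name and sample data are made up.

```python
from typing import Iterable, List, Optional, Tuple


def alert_states(samples: Iterable[Tuple[float, float]], threshold: float,
                 for_seconds: float = 30.0) -> List[str]:
    """Return the alert state after each (timestamp, value) rule evaluation."""
    states: List[str] = []
    breach_started: Optional[float] = None
    for ts, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = ts  # condition just became true
            held = ts - breach_started
            states.append("firing" if held >= for_seconds else "pending")
        else:
            breach_started = None  # condition cleared, reset the timer
            states.append("inactive")
    return states
```

For example, evaluating every 15 seconds with a threshold of 80, the values 90, 95, 96 and 10 at timestamps 0, 15, 30 and 45 produce pending, pending, firing and inactive: only the third evaluation has seen the condition hold for 30 seconds.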