Run Telegraf - InfluxDB - Grafana stack through docker compose

Run Telegraf - InfluxDB - Grafana stack through docker compose

·

9 min read

Monitoring your docker containers using TIG stack with docker-compose

In the world of monitoring there are a lot of tools, sometimes gathered in stacks that make you possible to monitore your IT services from hardware to software. Basically these stacks help you to navigate through logs and datas produced by the different level of you IT services (hard, middle and software) and display them in a human readable way such as graph and dashboard.

One of these stack is Telegraf, InfluxDB and Grafana. Let's see how we can run these tools within a docker compose.

VM Configuration:

This example is running on the following VM configuation:

  • Hypervisor: VirtualBox
  • OS: Ubuntu 18.04
  • RAM: 8GB
  • CPU: 2
  • Disk size: 10GB

Use case: How monitore docker's container ?

This example does not take into account security needs of a production environment !

The stack

Below some useful links:

In order to quickly understand the role of each component let's summurize them:

  • Telegraf: Will be used as an agent that collects metrics from docker daemon
  • InfluxDB: Will be our time series database that store all up coming datas from telegraf
  • Grafana: Will be used to display our datas in pretty dashboards through a web interface.

Note: This a very quick sum up of these techs: for more information please consult the documentation

Requirements

To deploy the stack we will use the docker-compose CLI which relies on docker. So we will need both of them:

Here we go !

Let's build our stack step by step. As advised in the best practices it's is better to use a freeze version rather than the latest.

Note: want to know more about docker best practices ? See my article on it ;)

Prepare your workspace:

mkdir /home/$USER/TIG_workspace
cd /home/$USER/TIG_workspace

General configuration

  • docker-compose version: 3.3
  • 3 containers we will be created: one by service
  • These three containers will need a dedicated network managed by docker. Let' so call it TIG_net
  • As it is better to get use to name the container (notably for monitoring). We will follow this pattern: TIG_<service>

Telegraf

Requirements:

  • Docker image: telegraf:1.19.3
  • This container needs that the influx database is up and running
  • We need to access the telegraf configuration file which is located in: /etc/telegraf/telegraf.conf
  • We need to bind docker socket to the host machine to pull containers' data
  • Restart strategy: We want the the container restart it self in case of unexepected crash.

Now we can translate these requirements in yaml instruction:

version: "3.3"
services:
  telegraf:
    image: telegraf:1.19.3
    container_name: TIG_telegraf
    depends_on:
      - influxdb
    restart: unless-stopped
    volumes:
      - "./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf"
      - "/var/run/docker.sock:/var/run/docker.sock"
    networks:
      - TIG_net

Concerning the volume : before running the container we need to create the ./telegraf/telegraf.conf on the host side, otherwise docker will create telegraf.conf as directory and send an error when running the service:

# Go into your workspace
cd /home/$USER/TIG_workspace

# Create the telegraf directory
mkdir telegraf

# Get the config file with create a container from the telegraf:1.19.3 image
docker run  --rm telegraf:1.19.3 cat /etc/telegraf/telegraf.conf >> telegraf/telegraf.conf

Some explanations here:

  • docker run --rm telegraf:1.19.3 : create a container from the telegraf:1.19.3 image and remove it once the process within it is over
  • cat /etc/telegraf/telegraf.conf: overrite the entrypoint to display the default config file
  • >> telegraf/telegraf.conf: send the output in the config file on the host.

InfluxDB

Requirements:

  • Docker image: influxdb:1.8
  • InfluxDB is accessible through the 8086 port, this port will be used by Telegraf to send datas, and by Grafana to pull datas
  • Environments variables are mandatory:
    • INFLUX_DB
    • INFLUXDB_USER
    • INFLUXDB_USER_PASSWORD
  • Restart strategy: as we defined for telegraf
  • We need a docker volume to store our datas
  influxdb:
    image: influxdb:1.8
    container_name: TIG_influxdb
    environment:
      INFLUX_DB: $INFLUX_DB
      INFLUXDB_USER: $INFLUXDB_USER
      INFLUXDB_USER_PASSWORD: $INFLUXDB_USER_PASSWORD
    ports:
      - 8086:8086
    restart: unless-stopped
    volumes:
      - "influxdb_volume:/var/lib/influxdb"
    networks:
      - TIG_net

_Notes:

  • Concerning the volume, for telegraf's config file we use a bind mount, where we use a docker volume for influxdb. More information here
  • Environment variables are stored in an aside .env file so that we are not displaying sensitve information like passwords

Grafana

Let's finish this configuration with Grafana requirements:

  • Docker image: grafana/grafana:8.1.2
  • Restart strategy: as we defined for telegraf and influxdb
  • Grafana is accessible through a web interface on port 3000, we want to access it from the host machine, so will bind it to the same port on the host.
  • Environment variables are mandatory:
    • GF_SECURITY_ADMIN_USER
    • GF_SECURITY_ADMIN_PASSWORD
  • We need a docker volume to store grafana's data (such as plugins or afterwards dashboards)
  grafana:
    image: grafana/grafana:8.1.2
    container_name: TIG_grafana
    restart: unless-stopped
    ports:
      - 3000:3000
    environment:
      GF_SECURITY_ADMIN_USER: $GF_SECURITY_ADMIN_USER
      GF_SECURITY_ADMIN_PASSWORD: $GF_SECURITY_ADMIN_PASSWORD
    volumes:
      - "grafana_volume:/var/lib/grafana"
    networks:
      - TIG_net

The .env file

The last step is to create a .env file that will store our sensitive datas (such as passwords). Keep in mind that is not a production security level but it's way better than writing password directly in the docker-compose file.

Let's create it:

echo "INFLUX_DB=telegraf
INFLUXDB_USER=telegraf_user
INFLUXDB_USER_PASSWORD=telegraf_password

GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=grafana" > .env

Overview

Final yml file

Now we have a final configuration we can create and edit our docker-compose.yml file (using your favorite text editor). We just need to add some docker instruction to tell docker daemon to add both influxdb and grafana volumes and the TIG_net network. We usually write these informations at the end of the file.

vim docker-compose.yml
version: "3.3"
services:
  influxdb:
    image: influxdb:1.8
    container_name: TIG_influxdb
    environment:
      INFLUX_DB: $INFLUX_DB
      INFLUXDB_USER: $INFLUXDB_USER
      INFLUXDB_USER_PASSWORD: $INFLUXDB_USER_PASSWORD
    ports:
      - 8086:8086
    restart: unless-stopped
    volumes:
      - "influxdb_volume:/var/lib/influxdb"
    networks:
      - TIG_net

  telegraf:
    image: telegraf:1.19.3
    container_name: TIG_telegraf
    depends_on:
      - influxdb
    restart: unless-stopped
    volumes:
      - "./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf"
      - "/var/run/docker.sock:/var/run/docker.sock"
    networks:
      - TIG_net

  grafana:
    image: grafana/grafana:8.1.2
    container_name: TIG_grafana
    restart: unless-stopped
    ports:
      - 3000:3000
    environment:
      GF_SECURITY_ADMIN_USER: $GF_SECURITY_ADMIN_USER
      GF_SECURITY_ADMIN_PASSWORD: $GF_SECURITY_ADMIN_PASSWORD
    volumes:
      - "grafana_volume:/var/lib/grafana"
    networks:
      - TIG_net

networks:
  TIG_net:
    driver: bridge

volumes:
  influxdb_volume:
  grafana_volume:

Representation of our architecture

TIG_stack2.drawio.png

Up the containers

Now our docker-compose.yml is ready, we can run the containers:

docker-compose up -d

Output:

Creating network "03_tig_TIG_net" with driver "bridge"
Creating volume "03_tig_influxdb_volume" with default driver
Creating volume "03_tig_grafana_volume" with default driver
Creating TIG_influxdb ... done
Creating TIG_grafana  ... done
Creating TIG_telegraf ... done

At any time you can check each service log

docker logs TIG_telegraf

Note: This feature can be very useful when debugging services in general. We can add -f to follow the log progression on your stdout.

By now you should access your grafana server through this url: localhost:3000

Note: you can reach this server using the container IP address: 1. Get the container ip address:

docker inspect TIG_grafana | grep IP

Output

            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
                    "IPAMConfig": null,
                    "IPAddress": "172.27.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,

2. Connect to http://172.27.0.3:3000 (in my case)

Telegraf configuration

By now our containers are up and running but no datas from Docker are collected. Go in the telegraf config file and edit it, as the volume has been bind there is no need to exec anything in the TIG_telegraf container. Simply edit the telegraf config file on your host:

vim /home/$USER/TIG_workspace/telegraf/telegraf.conf

Note:

  • The telegraf.conf file is quite long (~ 8000 lines !) I won't display it here. When navigate through this file you can find out a dedicated section for and docker.
  • If your at ease with vim go ahead, else you should use your favorite text editor (regarding the file's amount of line)

Inputs section

Docker section extract:

 [[inputs.docker]]
   ## Docker Endpoint
   ##   To use TCP, set endpoint = "tcp://[ip]:[port]"
   ##   To use environment variables (ie, docker-machine), set endpoint = "ENV"
   endpoint = "unix:///var/run/docker.sock"

   ## Set to true to collect Swarm metrics(desired_replicas, running_replicas)
   gather_services = false

   ## Only collect metrics for these containers, collect all if empty
   container_names = []

   ## Set the source tag for the metrics to the container ID hostname, eg first 12 chars
   source_tag = false

   ## Containers to include and exclude. Globs accepted.
   ## Note that an empty array for both will include all containers
   container_name_include = []
   container_name_exclude = []
...

Here we will use endpoint = "unix:///var/run/docker.sock" to pull datas from the container (remember when we binded the docker socket on TIG_telegraf container ?)

Note:

  • It's definitely worth spending five minutes reading the docker section to understand how it works
  • In these case, documentation can be very useful: docker input doc

Output section

As we did for docker's input section let's configure the output which is going to be influxdb

[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
  urls = ["http://influxdb:8086"]

Note: here we modified the last url with urls = ["http://influxdb:8086"], in fact docker can manage domain name within the docker-compose.yml file

Check data in influxdb

To check datas are pulled to influxdb we can use the influxdb CLI from the TIG_influxdb container:

docker exec -it TIG_influxdb influx

Here some output:

> show databases
name: databases
name
----
_internal
telegraf

> use telegraf
Using database telegraf

> show measurements
name: measurements
name
----
cpu
disk
diskio
docker
docker_container_blkio
docker_container_cpu
docker_container_mem
docker_container_net
docker_container_status
kernel
mem
processes
swap
system

Note: more information about influxdb CLI here, syntax is similare to SQL

Configure Grafana

Set up database

Now that datas are pulled from vshere and docker, we will set grafana so that it will be able to display them in pretty dashboard

Go to localhost:3000

Pasted image 20220109191503.png

Go in the Configuration section (gear on the left pannel) and data sources

Pasted image 20220109191913.png

Click on Add data source and select influxDB. You should reach the InfluxDB configuration, feed it with:

Pasted image 20220109192215.png

(scroll down)

Pasted image 20220109192331.png

Write down the credential (they are in the .env in case you forgot) and click on Save & test This mean everything went well:

Pasted image 20220109192428.png

Import a complete dashboard

A community is designing dashboard through different use case. One them will interest us: The docker one

Let's import it through the grafana interface:

Pasted image 20220109193235.png

Pasted image 20220109193343.png

Then on the left Pannel go to Dashboard/Manage and click the Docker Dashboard

Pasted image 20220109193456.png

Enjoy your pretty dashboard ;)

Pasted image 20220109193700.png

Trouble shot

Cannot connect with admin/admin password with grafana on localhost:3000

Reset your passord using the grafana CLI. Without using a container the command is grafana-cli --homepath "/usr/share/grafana" admin reset-admin-password <new password>. Let's execute within our container

docker exec -it TIG_grafana grafana-cli --homepath "/usr/share/grafana" admin reset-admin-password admin

The output:

INFO[01-09|17:00:42] Connecting to DB                         logger=sqlstore dbtype=sqlite3
INFO[01-09|17:00:42] Starting DB migrations                   logger=migrator
INFO[01-09|17:00:42] migrations completed                     logger=migrator performed=0 skipped=330 duration=983.026µs