April 16, 2025 · 7 min read

Docker Compose monitoring: how to track all your services

Practical guide to monitoring a Docker Compose stack. Healthchecks, monitoring all services from one place, and getting alerted when one fails.

Tags: docker, docker-compose, monitoring, healthcheck, devops

Most Docker setups start with a docker-compose.yml. Web server, database, Redis, maybe a background worker — a handful of services running together on a single host. Docker Compose monitoring is the practice of keeping visibility over all of those services from one place and getting notified when any of them fails. This is a practical guide to healthchecks, what they test vs what they miss, and how to monitor a Compose stack properly.

How Docker Compose handles health

Docker Compose has a built-in healthcheck option that runs a command inside your container on a schedule. If the command exits with code 0, the container is "healthy." If it fails several times in a row (the retries setting), the container becomes "unhealthy."

services:
  api:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

The fields:

test: Command to run inside the container to check health
interval: How often to run the check (30s is Docker's default and a reasonable choice)
timeout: How long the healthcheck has to complete before it's considered timed out
retries: How many consecutive failures before marking unhealthy
start_period: Grace period during startup before failures count against the retries

# See healthcheck status
docker ps
# CONTAINER ID  ...  STATUS
# a1b2c3d4e5f6  ...  Up 2 minutes (healthy)
# f6e5d4c3b2a1  ...  Up 2 minutes (unhealthy)
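
To see how these fields interact, here is a simplified Python model of the status transitions. It is a sketch, not Docker's implementation: simulate_health and its parameters are hypothetical names. It assumes the first probe runs one interval after the container starts, that failures inside the start period are ignored until the first success, and that each probe finishes instantly.

```python
def simulate_health(results, interval=30, retries=3, start_period=40):
    """Replay probe results and return a list of (seconds_since_start, status).

    results: one boolean per probe, True meaning the command exited with 0.
    """
    status = "starting"
    failing_streak = 0
    seen_success = False
    timeline = []
    for i, ok in enumerate(results, start=1):
        t = i * interval  # assumption: first probe runs one interval after start
        if ok:
            # Any success resets the streak and marks the container healthy.
            failing_streak = 0
            seen_success = True
            status = "healthy"
        else:
            # Failures during the grace period do not count against retries.
            in_grace = (not seen_success) and t <= start_period
            if not in_grace:
                failing_streak += 1
                if failing_streak >= retries:
                    status = "unhealthy"
        timeline.append((t, status))
    return timeline


for t, status in simulate_health([False, True, False, False, False]):
    print(f"t={t:>3}s  {status}")
```

With the values from the example above (30s interval, 3 retries, 40s start period), a container that first passes at 60s and then fails three probes in a row is reported unhealthy at 150s. In other words, detection takes roughly interval × retries after the failure begins.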

Healthcheck patterns for common services

Web/API services

api:
  image: myapp:latest
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 30s

Your /health endpoint should:

Return HTTP 200 on success
Check that critical dependencies (DB connection, cache) are reachable
Respond in under 1s (not a full integration test)
Return non-200 if the service can't serve requests
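
As an illustration of that checklist, here is a minimal /health endpoint using only the Python standard library. It is a sketch under assumptions: check_database and check_cache are hypothetical placeholders you would replace with a cheap real probe (for example a SELECT 1 or a Redis PING with a short timeout), and the port matches the 8080 used in the Compose example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database():
    """Hypothetical placeholder: replace with a cheap query such as SELECT 1."""
    return True


def check_cache():
    """Hypothetical placeholder: replace with a cache PING using a short timeout."""
    return True


CHECKS = {"database": check_database, "cache": check_cache}


def health_report(checks):
    """Run every check; return (http_status, per-dependency results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = health_report(CHECKS)
        body = json.dumps(results).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep healthcheck polling out of the logs.
        pass


def serve(port=8080):
    """Blocks forever; call this from your entrypoint."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Because curl -f treats any response of 400 or above as a failure, the 503 returned when a dependency is down is enough to fail the Compose healthcheck.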

PostgreSQL

postgres:
  image: postgres:16
  healthcheck:
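    # $$ makes Compose pass a literal $ so the container's shell expands the variables;
    # assumes POSTGRES_USER and POSTGRES_DB are set in this service's environment
    test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]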
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 10s

pg_isready checks that PostgreSQL is accepting connections. It's included in the Postgres image — no additional tooling needed.

Redis

redis:
  image: redis:7-alpine
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 10s
    timeout: 3s
    retries: 5

redis-cli ping returns PONG if Redis is healthy. Quick and reliable.

Nginx / reverse proxy

nginx:
  image: nginx:alpine
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost/health"]
    interval: 30s
    timeout: 5s
    retries: 3

You'll need to add a /health location to your nginx config:

location /health {
    access_log off;
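    # default_type sets the Content-Type of the return body;
    # add_header would send a second Content-Type header instead
    default_type text/plain;
    return 200 "healthy\n";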
}

Background workers (queue consumers)

Workers are harder to healthcheck because they don't listen on HTTP. Options:

worker:
  image: myapp:latest
  command: python worker.py
  healthcheck:
    # Check a PID file that the worker writes on startup.
    # This only proves the worker started, not that it is still making progress.
    test: ["CMD", "test", "-f", "/tmp/worker.pid"]
    interval: 30s
    timeout: 5s
    retries: 3

Or use a custom health script that checks if the worker is making progress (reading a counter from Redis, for example):

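#!/bin/bash
# /usr/local/bin/worker-health.sh
# Assumes the worker writes a Unix timestamp to worker:last_processed_at
# and that Redis is reachable under the Compose service name "redis".
LAST_PROCESSED=$(redis-cli -h "${REDIS_HOST:-redis}" get worker:last_processed_at)
NOW=$(date +%s)
# Fail if the key is missing or not a number (worker never reported, or Redis is down)
case "$LAST_PROCESSED" in
  ''|*[!0-9]*) exit 1 ;;
esac
# Fail if no message processed in last 5 minutes
if [ $((NOW - LAST_PROCESSED)) -gt 300 ]; then
  exit 1
fi
exit 0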
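
If the worker image has no redis-cli, a file-based variant of the same idea also works. The sketch below is one possible shape, not a prescribed one: the heartbeat path, the 300-second threshold, and the function names are all assumptions, and the worker has to call beat() after each job (and on idle polls, so an empty queue is not mistaken for a stall).

```python
import os
import time

HEARTBEAT_PATH = "/tmp/worker.heartbeat"  # hypothetical path
MAX_AGE_SECONDS = 300  # same 5-minute threshold as the shell script


def beat(path=HEARTBEAT_PATH):
    """Called by the worker: create the file if needed and bump its mtime."""
    with open(path, "a"):
        os.utime(path, None)


def is_alive(path=HEARTBEAT_PATH, max_age=MAX_AGE_SECONDS, now=None):
    """True if the heartbeat file exists and was touched recently enough."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # Missing file: the worker never reported in.
        return False
    return age <= max_age


def main():
    """Exit code for a Docker healthcheck: 0 healthy, 1 unhealthy."""
    return 0 if is_alive() else 1
```

A small wrapper script that ends with sys.exit(main()) can then be used as the healthcheck test command in place of the PID file check.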

What healthchecks test vs what they miss

Docker healthchecks are useful but limited. Understanding the gap helps you decide where additional monitoring is needed.

What healthchecks tell you

The container process is running and responding
The service can accept basic requests
Critical dependencies (DB, cache) are reachable at the moment of the check

What healthchecks miss

Degraded performance: a healthcheck that passes in 0.1 seconds says nothing about a p99 latency of 8 seconds. The service is "healthy" by the check while users hit timeouts.

Memory leaks: a container leaking 50MB/hour passes every healthcheck until the kernel finally OOM-kills it. By then it's too late.

Queue backlogs: a worker container is "running" but has 10,000 unprocessed jobs backed up. The healthcheck passes while users wait.

Partial failures: healthchecks are per container, so a passing check on one replica tells you nothing about the others. In a multi-replica setup, one replica can be unhealthy while the rest keep serving traffic, and nothing flags it unless you watch every container.

Network isolation: the container thinks it's healthy, but outbound network access broke and it can't reach external APIs. A simple localhost healthcheck still passes.

This is why healthchecks are necessary but not sufficient for production monitoring. You need:

Healthchecks: immediate detection of complete service failure
Resource monitoring: detecting degradation before it becomes failure
Uptime tracking: historical view of when services were healthy vs not
Alerts: proactive notification when health status changes
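
The memory-leak case shows why resource monitoring catches what a healthcheck cannot. Given a few memory samples, a straight-line fit estimates how long the container has before it reaches its limit. This is an illustrative sketch: hours_until_limit is a hypothetical helper, the samples are made up, and real memory growth is rarely this linear.

```python
def hours_until_limit(samples, limit_mb):
    """Estimate hours from the last sample until memory reaches limit_mb.

    samples: list of (hour, memory_mb) pairs.
    Returns None when there is too little data or memory is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Least-squares slope in MB per hour.
    slope = sum((t - mean_t) * (m - mean_m) for t, m in samples) / var_t
    if slope <= 0:
        return None
    last_memory = samples[-1][1]
    return max(0.0, (limit_mb - last_memory) / slope)


# A container leaking 50 MB/hour with a 1024 MB limit (made-up numbers).
samples = [(0, 500), (1, 550), (2, 600), (3, 650)]
print(hours_until_limit(samples, limit_mb=1024))
```

Every healthcheck in those hours would pass, yet the trend already says the container has about seven and a half hours left.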

Monitoring all services in a Compose stack

Adding the Kernus agent to your Compose stack

You can run the Kernus agent as a service in your docker-compose.yml alongside your application services:

# docker-compose.yml
# (the top-level "version" key is obsolete in Compose v2, so it is omitted here)

services:
  api:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: myapp
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d myapp"]
      interval: 10s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s

  worker:
    image: myapp:latest
    command: python worker.py
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  # Add Kernus agent alongside your services
  kernus-agent:
    image: kernus/agent:latest
    environment:
      - KERNUS_AGENT_TOKEN=${KERNUS_AGENT_TOKEN}
      - KERNUS_HOST_NAME=${HOSTNAME:-my-server}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped
    network_mode: host  # Needed to resolve host metrics

The agent reads Docker's API via the mounted socket. It automatically discovers every container in your stack — you don't configure which services to monitor, it finds them all.
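
To give a feel for what discovery through the Docker API looks like, the sketch below groups a container listing by Compose project and service. It is not the Kernus agent's implementation, which the article does not describe. It assumes input shaped like the Docker Engine API's container list response and relies on the com.docker.compose.project and com.docker.compose.service labels that Compose attaches to the containers it creates; group_by_project is a hypothetical helper name.

```python
from collections import defaultdict

PROJECT_LABEL = "com.docker.compose.project"
SERVICE_LABEL = "com.docker.compose.service"


def group_by_project(containers):
    """Map project -> service -> list of container states (one per replica)."""
    stacks = defaultdict(dict)
    for container in containers:
        labels = container.get("Labels") or {}
        project = labels.get(PROJECT_LABEL, "(no compose project)")
        # Fall back to the container name for containers not started by Compose.
        service = labels.get(SERVICE_LABEL) or container["Names"][0].lstrip("/")
        state = container.get("State", "unknown")
        stacks[project].setdefault(service, []).append(state)
    return dict(stacks)


# Made-up sample in the shape of a container list response.
sample = [
    {"Names": ["/shop-api-1"], "State": "running",
     "Labels": {PROJECT_LABEL: "shop", SERVICE_LABEL: "api"}},
    {"Names": ["/shop-worker-1"], "State": "exited",
     "Labels": {PROJECT_LABEL: "shop", SERVICE_LABEL: "worker"}},
    {"Names": ["/adhoc"], "State": "running", "Labels": {}},
]
print(group_by_project(sample))
```

Because the grouping comes from labels rather than a hand-written service list, a new service added to the Compose file shows up without any monitoring configuration change.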

Alternatively, install the agent directly on the host instead of as a container, which is the approach we recommend for production:

curl -fsSL https://kernus.app/install | sh
kernus token YOUR_TOKEN --host company-backend
kernus agent start

Windows: Use WSL or Git Bash for this one-liner, or use PowerShell (see Windows install).

A typical compose monitoring setup

For a production api + postgres + redis + worker stack:

Alert rules to create:

Rule 1: Any service down
  Condition: Container status != running
  Duration: 1 minute
  Filter: (all containers)
  Channel: #production-alerts

Rule 2: High API memory
  Condition: Memory > 80% of limit
  Duration: 10 minutes
  Filter: api (container name)
  Channel: #infrastructure-warn

Rule 3: Worker restart storm
  Condition: Restart count increases by 5
  Duration: 30 minutes  
  Filter: worker
  Channel: #production-alerts

Rule 4: Any OOM kill
  Condition: Container OOM killed
  Duration: Immediate
  Filter: (all containers)
  Channel: SMS + Slack
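
The Duration field is what separates a real problem from a momentary spike: the condition has to hold continuously for that long before the alert fires. The sketch below shows that logic for Rule 2. It is a generic illustration, not Kernus's rule engine or configuration format, and rule_fires and its arguments are hypothetical names.

```python
def rule_fires(samples, threshold, duration_s):
    """Return True if value > threshold held continuously for duration_s.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= duration_s:
                return True
        else:
            # Any sample back under the threshold resets the clock.
            breach_start = None
    return False


# Memory as a percentage of the limit, sampled once a minute (made-up data).
sustained = [(t, 85) for t in range(0, 601, 60)]
with_dip = [(t, 70 if t == 300 else 85) for t in range(0, 601, 60)]

print(rule_fires(sustained, threshold=80, duration_s=600))
print(rule_fires(with_dip, threshold=80, duration_s=600))
```

The first series stays above 80% for the full 10 minutes and fires. The second dips once at the 5-minute mark, which resets the window, so it does not.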

The depends_on healthcheck pattern (critical for reliability)

One of the most common causes of container restart loops in Compose stacks is service startup order. Your API starts before the database is ready, tries to connect, fails, restarts. The solution is depends_on with condition: service_healthy:

services:
  api:
    depends_on:
      postgres:
        condition: service_healthy  # API won't start until postgres is HEALTHY
      redis:
        condition: service_healthy

  worker:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      api:
        condition: service_healthy  # Worker waits for API too (optional)

This requires that each dependency has a healthcheck configured. Without one, Compose has no health status to wait on, and bringing the stack up fails with an error saying the dependency has no healthcheck configured.

The full flow on docker compose up:

Postgres starts, healthcheck begins polling
Redis starts, healthcheck begins polling
Once both are healthy, the API starts; the worker starts with it, or after the API is healthy if it also depends on the API
If the API's healthcheck fails after startup, Docker marks it unhealthy. Docker won't restart it or its dependents on its own, but your monitoring will alert
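
The start order above is a topological sort of the depends_on graph. The sketch below reproduces it with Python's standard graphlib module (Python 3.9+). It models only the ordering, not Compose's actual scheduler, and startup_waves is a hypothetical helper name.

```python
from graphlib import TopologicalSorter


def startup_waves(depends_on):
    """Group services into waves that can start together.

    depends_on: mapping of service -> services it waits for.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Everything whose dependencies are already done can start now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# The depends_on graph from the example, with the worker also waiting for the API.
deps = {
    "api": ["postgres", "redis"],
    "worker": ["postgres", "redis", "api"],
}
print(startup_waves(deps))
```

For this graph the waves are postgres and redis first, then api, then worker. A circular dependency raises CycleError, which mirrors Compose refusing to start a stack whose services depend on each other in a loop.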

Viewing your Compose stack health

# See all services with health status
docker compose ps

# Output:
# NAME       IMAGE      SERVICE    STATUS                PORTS
# api        myapp      api        running               0.0.0.0:8080->8080/tcp
# postgres   postgres   postgres   running               5432/tcp
# redis      redis      redis      running               6379/tcp
# worker     myapp      worker     running (unhealthy)

# Watch in real time
watch docker compose ps

The (unhealthy) suffix means the container is running but failing its healthcheck. It's still running — it hasn't crashed — but something is wrong.
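
If you want to script this check instead of reading the table by eye, the sketch below parses the one-JSON-object-per-line output of docker ps --all --format '{{json .}}' and lists containers that need attention. The function names are hypothetical, and the sample input is made up to resemble that output.

```python
import json


def parse_docker_ps(output):
    """Turn JSON-lines output from docker ps into simple dicts."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        container = json.loads(line)
        status = container.get("Status", "")
        if "(unhealthy)" in status:
            health = "unhealthy"
        elif "(healthy)" in status:
            health = "healthy"
        elif "(health: starting)" in status:
            health = "starting"
        else:
            health = "none"  # no healthcheck configured
        rows.append({
            "name": container.get("Names", ""),
            "state": container.get("State", ""),
            "health": health,
        })
    return rows


def needs_attention(rows):
    """Names of containers that are unhealthy or not running at all."""
    return [r["name"] for r in rows
            if r["health"] == "unhealthy" or r["state"] != "running"]


# Made-up sample resembling the JSON-lines output.
sample_output = "\n".join([
    '{"Names": "api", "State": "running", "Status": "Up 2 minutes (healthy)"}',
    '{"Names": "worker", "State": "running", "Status": "Up 2 minutes (unhealthy)"}',
    '{"Names": "redis", "State": "exited", "Status": "Exited (1) 5 minutes ago"}',
])
print(needs_attention(parse_docker_ps(sample_output)))
```

To run it against a live host, feed it the stdout of subprocess.run(["docker", "ps", "--all", "--format", "{{json .}}"], capture_output=True, text=True, check=True).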

For setting up alerts when a service fails: How to set up Docker container alerts for Slack, Discord, Telegram. For the complete Docker monitoring approach: Docker container monitoring — complete guide 2025.

