February 26, 2025 · 8 min read

Docker container keeps restarting: how to debug and fix

Systematic guide to debugging a Docker container stuck in a restart loop. Exit codes explained, reading logs, common causes, and restart policy behavior.

Tags: docker, debugging, restart, containers, devops

If your Docker container keeps restarting, you're in the right place. This page is meant to be useful when something is on fire and you need answers fast. We'll walk through the most common causes systematically: reading the exit code, finding the actual crash reason in the logs, and checking the issues that catch people off guard.

Start at the top. Don't skip ahead.

Step 1: Look at what docker ps is showing you

docker ps -a

The output looks something like:

CONTAINER ID   IMAGE     COMMAND        STATUS                          NAMES
a1b2c3d4e5f6   myapp     "node app.js"  Restarting (1) 10 seconds ago   api-service

Key things to read:

STATUS: Restarting (1) 10 seconds ago. The (1) is the exit code of the most recent run. Exit code 1 is a generic application error. Exit code 137 means the process received SIGKILL (128 + 9), which includes OOM kills. Exit code 143 means it received SIGTERM (128 + 15).
Restart count: docker ps has no restart column (the output above also omits CREATED and PORTS for brevity). Get the count with docker inspect --format '{{.RestartCount}}' <container_name>. A count like 8 within a few minutes is a crash loop.

If the status shows Exited instead of Restarting, the container is stopped and won't come back on its own: either there's no restart policy, it hit the on-failure retry limit, or it was stopped manually.

Step 2: Read the exit code

docker inspect <container_name> --format='ExitCode: {{.State.ExitCode}} | OOMKilled: {{.State.OOMKilled}} | Error: {{.State.Error}}'

Match the exit code to a cause:

Exit code	Cause	What to do
`0`	Clean exit (not a crash)	Container finished intentionally — wrong restart policy
`1`	Application error	Read the logs — the app crashed
`2`	Shell error	Check entrypoint script
`126`	Permission denied	File permissions or SELinux issue
`127`	Command not found	Bad entrypoint, missing binary in image
`137` + OOMKilled=true	Out of memory	Increase memory limit, check for leak
`137` + OOMKilled=false	Killed by SIGKILL	Deployment, manual kill, or host under pressure
`139`	Segfault	Likely a C/C++/Rust binary issue; check core dumps
`143`	SIGTERM received	Graceful shutdown — check if intentional
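
The signal rows in the table follow one rule: an exit code above 128 is 128 plus the number of the signal that killed the process. A minimal sketch of that decoding follows. The function name describe_exit_code is made up for this example, and the signal name comes from the shell's built-in kill -l.

```shell
# describe_exit_code CODE
# Hypothetical helper: explains a container exit code using the 128 + signal rule.
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean exit"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l N prints the signal name for signal number N; fall back if unknown
    name=$(kill -l "$sig" 2>/dev/null) || name="unknown"
    echo "killed by signal $sig ($name)"
  else
    echo "exited with status $code (not a signal)"
  fi
}
```

Calling describe_exit_code 137 reports signal 9, and describe_exit_code 143 reports signal 15. The OOMKilled field from docker inspect is still needed to tell an OOM kill apart from any other SIGKILL.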

For exit code 1 (the most common): the cause is in the logs. Go to step 3.

For exit code 137, read our OOM kill deep dive.

Step 3: Read the logs

# Last 50 lines of output (logs persist across restarts, so this can span several runs)
docker logs --tail 50 <container_name>

# Last 50 lines with timestamps
docker logs --tail 50 -t <container_name>

# Follow logs in real time as the container restarts
docker logs -f <container_name>

# Logs since a specific time (useful if the container last crashed hours ago)
docker logs --since 2025-01-15T03:00:00 <container_name>

What to look for:

The crash message is almost always in the last 5-20 lines before the log cutoff. Common patterns:

# Node.js crashes:
Error: ENOENT: no such file or directory, open '/app/config.json'
Error: connect ECONNREFUSED 127.0.0.1:5432

# Python crashes:
ModuleNotFoundError: No module named 'psycopg2'
psycopg2.OperationalError: could not connect to server: Connection refused

# Java/Spring crashes:
APPLICATION FAILED TO START
Description: Failed to configure a DataSource: 'url' attribute is not specified

# Go crashes:  
panic: runtime error: invalid memory address or nil pointer dereference
dial tcp: lookup postgres on 127.0.0.11:53: no such host

# Database connection refused (common to all languages):
Error: Connection refused to postgres:5432

Notice that "connection refused" appears across all languages. That's often not an application bug — it's a startup order problem (more on this below).
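
When the log is noisy, a rough filter can surface lines like the patterns above. This is only a sketch: the function name crash_lines and the keyword list are assumptions, so tune them to your stack. Note that docker logs replays the container's stderr on its own stderr, so 2>&1 is needed before piping.

```shell
# crash_lines: reads log text on stdin and prints lines that match
# a hypothetical list of common crash keywords (case-insensitive).
crash_lines() {
  grep -iE 'error|exception|panic|fatal|refused|no such (file|host)|failed to start'
}

# Usage (replace the name with your container):
#   docker logs --tail 200 api-service 2>&1 | crash_lines
```

A keyword filter will miss crashes that log nothing recognizable, so fall back to reading the raw tail when it comes up empty.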

Step 4: Check if it's a startup order issue

This is the most common cause of containers restarting that people overlook. Your API container starts before the database is ready, tries to connect, fails, crashes, restarts, and the cycle continues.

Docker Compose's depends_on helps with container start order, but it only waits for the container to start, not for the service inside it to be ready:

# This only waits for the postgres CONTAINER to start, not postgres to be READY
depends_on:
  - postgres

The fix:

# This waits for postgres to be healthy (i.e., actually accepting connections)
services:
  api:
    image: myapp:latest
    depends_on:
      postgres:
        condition: service_healthy

  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myuser -d mydb"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

Alternative: add retry logic in your application code. Most ORMs support connection retries. Don't assume database connectivity on startup — retry with exponential backoff.
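
If you can't change the application, the same idea can live in the container's entrypoint script. The sketch below is one possible shape, not a drop-in: retry_with_backoff and the RETRY_BASE_DELAY variable are names invented for this example. It runs a command until it succeeds, doubling the wait after each failure.

```shell
# retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# The wait starts at RETRY_BASE_DELAY seconds (default 1) and doubles each time.
retry_with_backoff() {
  max_attempts=$1
  shift
  attempt=1
  delay=${RETRY_BASE_DELAY:-1}
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

In an entrypoint this could look like retry_with_backoff 6 pg_isready -h postgres -p 5432 followed by exec node app.js, assuming pg_isready is installed in the application image. With six attempts and a 1 second base delay, the script waits at most 31 seconds in total before giving up.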

Step 5: Check for missing environment variables

A common cause of exit code 1 is the application crashing because it can't find a required environment variable:

# See what environment variables the container has
docker inspect <container_name> --format='{{range .Config.Env}}{{println .}}{{end}}'

# Or run env inside the container (only works while it is up, which is hard to catch mid-restart-loop)
docker exec <container_name> env

Compare to what your application requires. Missing DATABASE_URL, SECRET_KEY, or API_KEY variables are a frequent culprit, especially after a config change or new deployment.
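
That comparison is easy to script. A minimal sketch, assuming a helper named missing_vars (invented here) that reads KEY=VALUE lines on stdin and prints each required name that is absent:

```shell
# missing_vars NAME...
# Hypothetical helper: reads KEY=VALUE lines on stdin and prints every
# required variable name (given as arguments) that does not appear.
missing_vars() {
  env_lines=$(cat)
  for name in "$@"; do
    if ! printf '%s\n' "$env_lines" | grep -q "^${name}="; then
      echo "$name"
    fi
  done
}

# Usage (replace the container name and the variable list with your own):
#   docker inspect api-service --format='{{range .Config.Env}}{{println .}}{{end}}' \
#     | missing_vars DATABASE_URL SECRET_KEY API_KEY
```

Empty output means every listed variable is set on the container. It does not check that the values are valid, only that the names exist.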

Step 6: Check for port conflicts

If your application fails to bind to a port because something else is already using it:

# See if a port is already in use on the host (root is needed to see processes owned by other users)
sudo ss -tlnp | grep :8080

# Or with netstat
netstat -tlnp | grep :8080

Application logs will typically say "address already in use" or "port 8080 is already in use." Inside a container, port conflicts can also occur if you're running multiple processes that try to bind the same port.

Step 7: Check file permissions and volume mounts

# See what volumes are mounted
docker inspect <container_name> --format='{{json .Mounts}}' | python3 -m json.tool

# Check permissions on a mounted directory
ls -la /path/to/host/directory

Common volume permission issues:

Host directory owned by root, container running as non-root user
Host SELinux/AppArmor policies blocking access
Read-only volume mount on a path the app tries to write to

How restart policies work

Your container's restart behavior depends on the restart policy set in your Compose file (restart:) or on the docker run command (--restart):

Policy	Behavior
`no`	Never restart (default if not set)
`always`	Always restart, even on clean exit (code 0)
`unless-stopped`	Restart unless manually stopped
`on-failure`	Only restart on non-zero exit code
`on-failure:5`	Only restart on non-zero exit, max 5 times

A container with restart: always will keep restarting forever, even if the exit code is 0. This is usually wrong if your process exits cleanly — use on-failure instead.

services:
  api:
    image: myapp:latest
    restart: on-failure:3  # Restart on crash, max 3 attempts

After 3 failed restarts, Docker stops attempting and the container stays in Exited state. This is usually what you want in production — stop the restart loop and alert so a human can investigate.
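
To confirm which policy a running container actually has, you can read it back from docker inspect. A small sketch, with show_restart_state as a made-up wrapper name; the template fields are the ones docker inspect exposes under HostConfig.RestartPolicy, plus the top-level RestartCount.

```shell
# show_restart_state CONTAINER
# Hypothetical wrapper: prints the restart policy, its retry limit,
# and how many times Docker has restarted the container.
show_restart_state() {
  docker inspect "$1" --format \
    'policy={{.HostConfig.RestartPolicy.Name}} max_retries={{.HostConfig.RestartPolicy.MaximumRetryCount}} restarts={{.RestartCount}}'
}

# Usage:
#   show_restart_state api-service
```

If the policy is not what you expected, docker update --restart on-failure:3 followed by the container name changes it without recreating the container.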

Common causes by symptom

Container restarts immediately (< 1 second)

Almost always a missing file, bad config, or immediate panic. Read the logs right after a restart:

docker logs --tail 20 <container_name>

Container runs for 30-120 seconds then crashes

Likely a lazy initialization issue: the app starts, warms up, then tries to connect to something it can't reach, or loads a large dataset and gets OOM killed:

# Watch memory growth in real time
docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}" <container_name>

Container crashes only under load

Concurrency bug or resource exhaustion. The problem only manifests when requests are hitting the service. Check for:

Connection pool exhaustion (too many concurrent DB connections)
Thread/goroutine leak that accumulates until OOM
Race condition in global state

Container crashes after a deployment

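The new image or its configuration is the most likely culprit. Roll back first, debug afterwards:

# Roll back to the previous image tag
docker stop api-service
docker rm api-service
# Re-add the flags the original container ran with (-p, -e, -v, --network, --restart)
docker run -d --name api-service myapp:previous-stable-tag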

Getting alerts before the restart count explodes

If you find out about a restart loop because a user reports it, you're too late. The right approach is to be notified at the first sign of trouble.

Kernus monitors restart counts across all your containers and can alert you when:

A container restarts more than N times in a time window
A container exits with a non-zero exit code
A container is OOM killed

The alert includes the exit code, OOM status, and the last log lines at the time of the crash — so you have context immediately, not after you've SSH'd in to investigate.

For OOM kill specifics: OOM kills in Docker — how to detect and prevent them. To set up restart alerts: How to set up alerts for Docker containers.

Monitor container restarts automatically — try Kernus free →
