Docker container keeps restarting: how to debug and fix
Systematic guide to debugging a Docker container stuck in a restart loop. Exit codes explained, reading logs, common causes, and restart policy behavior.
If your Docker container keeps restarting, you're in the right place. This page is meant to get you answers fast when something is on fire. We'll walk through the most common causes systematically: reading the exit code, finding the actual crash reason in the logs, and checking the issues that catch people off guard.
Start at the top. Don't skip ahead.
Step 1: Look at what docker ps is showing you
docker ps -a
The output looks something like this (CREATED and PORTS columns omitted):
CONTAINER ID   IMAGE   COMMAND         STATUS                          NAMES
a1b2c3d4e5f6   myapp   "node app.js"   Restarting (1) 10 seconds ago   api-service
Key things to read:
- STATUS: In Restarting (1) 10 seconds ago, the (1) is the exit code of the last run. Exit code 1 means application error. Exit code 137 means SIGKILL, often from the OOM killer. Exit code 143 means SIGTERM.
- Restart count: docker ps does not show it, so read it with docker inspect --format='{{.RestartCount}}' <container_name>. 8 restarts in a short period is a crash loop.
If the status shows Exited instead of Restarting, the container is stopped and Docker won't bring it back: there's no restart policy, the on-failure retry limit was reached, or the container was stopped manually.
Step 2: Read the exit code
docker inspect <container_name> --format='ExitCode: {{.State.ExitCode}} | OOMKilled: {{.State.OOMKilled}} | Error: {{.State.Error}}'
Match the exit code to a cause:
| Exit code | Cause | What to do |
|---|---|---|
| 0 | Clean exit (not a crash) | Container finished intentionally; wrong restart policy |
| 1 | Application error | Read the logs; the app crashed |
| 2 | Shell error | Check entrypoint script |
| 126 | Permission denied | File permissions or SELinux issue |
| 127 | Command not found | Bad entrypoint, missing binary in image |
| 137 + OOMKilled=true | Out of memory | Increase memory limit, check for leak |
| 137 + OOMKilled=false | Killed by SIGKILL | Deployment, manual kill, or host under pressure |
| 139 | Segfault | Likely a C/C++/Rust binary issue; check core dumps |
| 143 | SIGTERM received | Graceful shutdown; check if intentional |
For exit code 1 (the most common): the cause is in the logs. Go to step 3.
For exit code 137, read our OOM kill deep dive.
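Exit codes above 128 usually follow the shell convention of 128 plus the signal number, which is why SIGKILL (9) shows up as 137 and SIGTERM (15) as 143. As a rough sketch of that arithmetic, the helper below decodes a code into a hint. The name describe_exit_code is hypothetical (it is not part of Docker or any library), and the sketch assumes a Linux host. An application can also choose to exit with these numbers itself, so treat the result as a hint rather than proof.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return a short hint for a container exit code (hypothetical helper)."""
    if code == 0:
        return "clean exit"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if 128 < code <= 192:
        # Convention: 128 + signal number, e.g. 137 = 128 + 9 (SIGKILL).
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not every number maps to a named signal on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "application-defined error"
```

Feed it the value printed by the docker inspect command above; for 137, still check OOMKilled to tell an out-of-memory kill from any other SIGKILL.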
Step 3: Read the logs
# Last 50 lines of the container's log (it spans restarts, so earlier runs may appear too)
docker logs --tail 50 <container_name>
# Last 50 lines with timestamps
docker logs --tail 50 -t <container_name>
# Follow logs in real time as the container restarts
docker logs -f <container_name>
# Logs since a specific time (useful if the container last crashed hours ago)
docker logs --since 2025-01-15T03:00:00 <container_name>
What to look for:
The crash message is usually in the last 5-20 lines the process wrote before it exited. Common patterns:
# Node.js crashes:
Error: ENOENT: no such file or directory, open '/app/config.json'
Error: connect ECONNREFUSED 127.0.0.1:5432
# Python crashes:
ModuleNotFoundError: No module named 'psycopg2'
psycopg2.OperationalError: could not connect to server: Connection refused
# Java/Spring crashes:
APPLICATION FAILED TO START
Description: Failed to configure a DataSource: 'url' attribute is not specified
# Go crashes:
panic: runtime error: invalid memory address or nil pointer dereference
dial tcp: lookup postgres on 127.0.0.11:53: no such host
# Database connection refused (common to all languages):
Error: Connection refused to postgres:5432
Notice that "connection refused" appears across all languages. That's often not an application bug — it's a startup order problem (more on this below).
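If you triage the same kinds of crashes often, the patterns above can be turned into a small scanner. This is only a sketch: the pattern table, the hint strings, and the name classify_log_tail are all made up for illustration, and the list is nowhere near exhaustive.

```python
import re

# Hypothetical pattern table built from the example messages in this step.
CRASH_PATTERNS = [
    (re.compile(r"ECONNREFUSED|connection refused", re.IGNORECASE),
     "dependency refused the connection (check startup order)"),
    (re.compile(r"no such host"),
     "DNS lookup failed (check service name and network)"),
    (re.compile(r"ENOENT|no such file or directory", re.IGNORECASE),
     "missing file"),
    (re.compile(r"No module named"),
     "missing Python dependency"),
    (re.compile(r"APPLICATION FAILED TO START"),
     "Spring Boot startup failure (read the Description line)"),
    (re.compile(r"^panic:"),
     "Go panic"),
]


def classify_log_tail(lines):
    """Return (line, hint) pairs for log lines matching a known pattern."""
    findings = []
    for line in lines:
        for pattern, hint in CRASH_PATTERNS:
            if pattern.search(line):
                findings.append((line.strip(), hint))
                break  # first matching pattern wins
    return findings
```

One way to use it is to save the output of docker logs --tail 50 <container_name> 2>&1 to a file and pass the file's lines to classify_log_tail.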
Step 4: Check if it's a startup order issue
This is the most common cause of containers restarting that people overlook. Your API container starts before the database is ready, tries to connect, fails, crashes, restarts, and the cycle continues.
Docker Compose's depends_on helps with container start order, but it only waits for the container to start, not for the service inside it to be ready:
# This only waits for the postgres CONTAINER to start, not postgres to be READY
depends_on:
  - postgres
The fix:
# This waits for postgres to be healthy (i.e., actually accepting connections)
services:
  api:
    depends_on:
      postgres:
        condition: service_healthy
  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myuser -d mydb"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s
Alternative: add retry logic in your application code. Many database clients and ORMs support connection retries or make them easy to add. Don't assume the database is reachable at startup; retry with exponential backoff.
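Here is a minimal sketch of retrying with exponential backoff, using only the Python standard library. The function name wait_for_tcp and its default delays are assumptions chosen for illustration. It only checks that the port accepts TCP connections, which is a weaker signal than pg_isready, so a real application would more likely wrap its actual database connect call in the same loop.

```python
import socket
import time


def wait_for_tcp(host, port, attempts=6, base_delay=0.5, max_delay=8.0,
                 sleep=time.sleep):
    """Try to open a TCP connection, doubling the wait after each failure.

    Returns True once a connection succeeds, False when attempts run out.
    The sleep parameter exists so the delays can be replaced in tests.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            if attempt == attempts:
                return False
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff
    return False
```

With the defaults, the waits are 0.5, 1, 2, 4, and 8 seconds before giving up. Calling wait_for_tcp("postgres", 5432) at startup and exiting with a clear message on False makes the failure obvious in the logs.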
Step 5: Check for missing environment variables
A common cause of exit code 1 is the application crashing because it can't find a required environment variable:
# See what environment variables the container has
docker inspect <container_name> --format='{{range .Config.Env}}{{println .}}{{end}}'
# Or exec into the container (only works while it is running, so you may need to catch it between restarts)
docker exec -it <container_name> env
Compare to what your application requires. Missing DATABASE_URL, SECRET_KEY, or API_KEY variables are a frequent culprit, especially after a config change or new deployment.
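A startup check that names the missing variables makes this failure obvious in the logs. The sketch below is one way to do it. The variable names come from the examples above, while REQUIRED_VARS, missing_env_vars, and require_env are hypothetical names, and your application's real list will differ.

```python
import os
import sys

# Example names only; replace with what your application actually needs.
REQUIRED_VARS = ("DATABASE_URL", "SECRET_KEY", "API_KEY")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def require_env(required=REQUIRED_VARS, environ=None):
    """Exit with a clear message if any required variable is missing."""
    missing = missing_env_vars(required, environ)
    if missing:
        sys.exit("Missing required environment variables: " + ", ".join(missing))
```

Calling require_env() as the first thing in your entrypoint turns a confusing stack trace into a one-line explanation, and the non-zero exit still shows up as exit code 1.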
Step 6: Check for port conflicts
Your application may also be failing to bind to a port because something else is already using it:
# See if a port is already in use on the host
ss -tlnp | grep :8080
# Or with netstat
netstat -tlnp | grep :8080
Application logs will typically say "address already in use" or "port 8080 is already in use." Inside a container, port conflicts can also occur if you're running multiple processes that try to bind the same port.
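If ss and netstat are not available (minimal images often lack them), the same question can be answered by trying to bind the port yourself. This is a sketch with a hypothetical helper name, port_is_free. It can report a port as busy while old connections are still in TIME_WAIT, so read a False result as "probably in use".

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return False if binding host:port fails with 'address already in use'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. permission denied on ports below 1024
        return True
```

Run it in the same network namespace as the process you are debugging: on the host for published ports, or inside the container for processes that compete with each other there.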
Step 7: Check file permissions and volume mounts
# See what volumes are mounted
docker inspect <container_name> --format='{{json .Mounts}}' | python3 -m json.tool
# Check permissions on a mounted directory
ls -la /path/to/host/directory
Common volume permission issues:
- Host directory owned by root, container running as non-root user
- Host SELinux/AppArmor policies blocking access
- Read-only volume mount on a path the app tries to write to
How restart policies work
Your container's restart behavior depends on the restart policy set in your Compose file or Docker run command:
| Policy | Behavior |
|---|---|
| no | Never restart (default if not set) |
| always | Always restart, even on clean exit (code 0) |
| unless-stopped | Restart unless manually stopped |
| on-failure | Only restart on non-zero exit code |
| on-failure:5 | Only restart on non-zero exit, max 5 times |
A container with restart: always will keep restarting forever, even if the exit code is 0. This is usually wrong if your process exits cleanly — use on-failure instead.
services:
  api:
    image: myapp:latest
    restart: on-failure:3 # Restart on crash, max 3 attempts
After 3 failed restarts, Docker stops attempting and the container stays in Exited state. This is usually what you want in production — stop the restart loop and alert so a human can investigate.
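To make the policy table concrete, here is a simplified model of the decision Docker makes after a container exits. It is a sketch, not Docker's implementation: should_restart is a hypothetical function, and it ignores daemon restarts (where always and unless-stopped differ) and the delay Docker adds between attempts.

```python
def should_restart(policy, exit_code, restart_count=0, manually_stopped=False):
    """Simplified model of whether a restart policy triggers another restart."""
    if manually_stopped:
        # A manual stop is not followed by an immediate restart.
        return False
    name, _, limit = policy.partition(":")
    if name == "no":
        return False
    if name in ("always", "unless-stopped"):
        return True
    if name == "on-failure":
        if exit_code == 0:
            return False
        # Without a limit, on-failure retries indefinitely.
        return not limit or restart_count < int(limit)
    raise ValueError(f"unknown restart policy: {policy!r}")
```

For example, should_restart("always", 0) is True, which is the clean-exit loop described above, while should_restart("on-failure:3", 1, restart_count=3) is False because the limit has been reached.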
Common causes by symptom
Container restarts immediately (< 1 second)
Almost always a missing file, bad config, or immediate panic. Read the logs as soon as the container restarts:
docker logs --tail 20 <container_name> 2>&1
Container runs for 30-120 seconds then crashes
Likely a lazy initialization issue: the app starts, warms up, and then either tries to connect to something it can't reach or loads a large dataset and gets OOM-killed:
# Watch memory growth in real time
docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}" <container_name>
Container crashes only under load
Concurrency bug or resource exhaustion. The problem only manifests when requests are hitting the service. Check for:
- Connection pool exhaustion (too many concurrent DB connections)
- Thread/goroutine leak that accumulates until OOM
- Race condition in global state
Container crashes after a deployment
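If the crashes started right after a deploy, suspect the new image or its configuration. Roll back:
# Roll back to the previous image tag
docker stop api-service
docker rm api-service
# Re-add the flags the original container ran with (ports, env, volumes, restart policy)
docker run -d --name api-service myapp:previous-stable-tag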
Getting alerts before the restart count explodes
If you find out about a restart loop because a user reports it, you're too late. The right approach is to be notified at the first sign of trouble.
Kernus monitors restart counts across all your containers and can alert you when:
- A container restarts more than N times in a time window
- A container exits with a non-zero exit code
- A container is OOM killed
The alert includes the exit code, OOM status, and the last log lines at the time of the crash — so you have context immediately, not after you've SSH'd in to investigate.
For OOM kill specifics: OOM kills in Docker — how to detect and prevent them. To set up restart alerts: How to set up alerts for Docker containers.