OOM kills in Docker: how to detect and prevent them
Deep dive on Docker OOM kills — what they are at the kernel level, exit code 137 explained, how to detect them, set memory limits, and prevent them in production.
A Docker OOM kill is one of the most common and most misdiagnosed production incidents. The container just... restarts. Logs show a sudden cutoff. Exit code 137. Your users saw errors for 30 seconds. Without the right monitoring, you might not even know it happened. This post is a technical deep dive: what the OOM killer actually is, how Docker handles it, how to detect it, and how to prevent it.
What the OOM killer is (Linux kernel basics)
Every modern Linux kernel has an OOM (Out Of Memory) killer. When the system runs out of physical memory and all swap is exhausted, the kernel must make a decision: which process should be killed to reclaim memory?
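The OOM killer gives every process a badness score (visible in /proc/<pid>/oom_score) using a heuristic that considers:
- Memory footprint: processes using more memory score higher (more likely to be killed)
- OOM score adjustment: processes with a higher oom_score_adj are killed preferentially, and a lower value protects a process
Older kernels also weighed process age, favoring long-running processes over new ones, but modern kernels rely mainly on memory footprint and the adjustment value.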
When Docker runs a container with a memory limit (--memory 512m or mem_limit: 512m in Compose), it sets up a Linux cgroup with that memory constraint. When the container's processes together try to exceed that limit and the kernel cannot reclaim enough memory, the OOM killer runs inside that cgroup and kills one of its processes, not the container as a whole. If the victim is the container's main process, the container exits; if it is a child process, the container may keep running.
Docker then records:
- Exit code 137 (128 + 9, where 9 = SIGKILL)
- OOMKilled: true in the container inspect output
And if the container has a restart policy, Docker restarts it automatically. Which means the OOM kill can happen silently, over and over.
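The cgroup itself also keeps a counter of these kills, which is handy when restarts hide them. The sketch below assumes cgroup v2, where the kernel exposes a memory.events file containing an oom_kill counter; the path /sys/fs/cgroup/memory.events (as typically seen from inside a container) and the helper names are assumptions for illustration, and cgroup v1 hosts use different files.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the cgroup v2 file as seen from inside a container.
MEMORY_EVENTS = Path("/sys/fs/cgroup/memory.events")


def parse_memory_events(text: str) -> dict:
    """Parse the 'name value' lines of a cgroup v2 memory.events file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kill_count(path: Path = MEMORY_EVENTS) -> Optional[int]:
    """Return the oom_kill counter, or None if the file cannot be read."""
    try:
        return parse_memory_events(path.read_text()).get("oom_kill")
    except OSError:
        return None
```

A counter that keeps increasing while the container reports a healthy status suggests child processes are being killed and respawned without the container ever exiting.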
Exit code 137 explained
Exit codes in Docker follow Linux convention:
| Exit code | Meaning |
|---|---|
| 0 | Clean exit: the process finished normally |
| 1 | General error: application-level failure |
| 2 | Shell built-in misuse |
| 126 | Permission denied (can't execute) |
| 127 | Command not found |
| 137 | Killed by SIGKILL (128 + 9), often an OOM kill |
| 139 | Segmentation fault (128 + 11) |
| 143 | Killed by SIGTERM (128 + 15), usually a graceful shutdown |
Exit code 137 only tells you the process was killed by SIGKILL. That includes OOM kills, but also a manual docker kill, kill -9 PID, or docker stop escalating to SIGKILL after its grace period expires. The OOMKilled: true flag is the definitive check.
# Check both exit code AND OOMKilled flag
docker inspect my-container --format='ExitCode: {{.State.ExitCode}}, OOMKilled: {{.State.OOMKilled}}'
# ExitCode: 137, OOMKilled: true → Definitive OOM kill
# ExitCode: 137, OOMKilled: false → SIGKILL from somewhere else
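The same decision can be scripted. This is a minimal sketch that takes the JSON printed by docker inspect and applies the rule above (flag first, then exit code); the function names are hypothetical, and it only reads the State.ExitCode and State.OOMKilled fields already used in the command above.

```python
import json
import signal


def describe_exit(exit_code: int, oom_killed: bool) -> str:
    """Explain a container exit from its exit code and OOMKilled flag."""
    if oom_killed:
        return "OOM kill"
    if exit_code > 128:
        # Exit codes above 128 conventionally mean "terminated by signal N".
        number = exit_code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"killed by {name}, not flagged as OOM"
    if exit_code == 0:
        return "clean exit"
    return f"application exited with code {exit_code}"


def describe_inspect_output(inspect_json: str) -> str:
    """Accept the JSON array printed by `docker inspect <container>`."""
    state = json.loads(inspect_json)[0]["State"]
    return describe_exit(state["ExitCode"], state["OOMKilled"])
```

Checking the flag before the exit code matters: a 137 without the flag points at docker kill, docker stop, or another source of SIGKILL.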
How to detect OOM kills
Manual detection
The simplest check for a running or recently stopped container:
# Check OOM status
docker inspect <container-name> | grep -E '"ExitCode"|"OOMKilled"'
# Or formatted:
docker inspect <container-name> --format='{{.State.ExitCode}} {{.State.OOMKilled}}'
For system-level OOM events (useful on hosts without per-container tracking):
# Recent OOM events in kernel log
dmesg | grep -i "killed process\|out of memory\|oom"
# With timestamp (requires dmesg with -T support)
dmesg -T | grep -i "oom"
# In systemd journal
journalctl -k | grep -i "oom"
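If you want more than a grep, the kernel messages can be parsed into structured records. The sketch below assumes the common "Killed process <pid> (<name>)" wording and treats a "Memory cgroup out of memory" prefix as a container-level kill; exact wording varies between kernel versions, so the regular expression is a starting point, not a complete parser.

```python
import re

# Matches the "Killed process <pid> (<name>)" part of kernel OOM messages.
KILL_RE = re.compile(r"[Kk]illed process (\d+) \(([^)]+)\)")


def parse_oom_kills(kernel_log: str) -> list:
    """Extract PID, process name and scope from kernel OOM kill lines."""
    kills = []
    for line in kernel_log.splitlines():
        match = KILL_RE.search(line)
        if match is None:
            continue
        kills.append({
            "pid": int(match.group(1)),
            "name": match.group(2),
            # A "Memory cgroup" prefix means a cgroup limit was hit,
            # as opposed to the whole host running out of memory.
            "scope": "cgroup" if "memory cgroup" in line.lower() else "host",
        })
    return kills
```

Feed it the text from dmesg or journalctl -k; the "cgroup" scope is the one that corresponds to a container hitting its own limit.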
Watching for OOM in real time
# Watch status and uptime for all containers (an unexpectedly short uptime means a recent restart)
watch -n 5 'docker ps --format "table {{.Names}}\t{{.Status}}\t{{.RunningFor}}"'
# Stats with memory usage
docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
A container approaching its memory limit will show MemPerc near 100%. When it hits 100% and the process is killed, watch for the restart.
Automated detection with monitoring
Manually checking for OOM kills doesn't work at scale or at 3 AM. This is why automated monitoring exists.
Kernus automatically detects OOM kills by:
- Watching the Docker API for container exit events
- Checking OOMKilled in the container state
- Recording the exit code (137) and reason ("oom_killed")
- Capturing the last log lines at time of kill
- Firing an alert with all this context to your configured channels
An OOM kill alert from Kernus includes the container name, host, memory usage at time of kill, exit reason, and the last log lines — giving you enough context to diagnose the issue without SSHing into the server.
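To get a feel for the event-watching step, here is a rough approximation using the Docker CLI. It is not Kernus's implementation: it assumes events arrive as JSON lines from docker events --format '{{json .}}' and relies on the container "oom" and "die" actions plus the exitCode attribute Docker attaches to die events. The function name suspected_oom is hypothetical.

```python
import json
from typing import Optional


def suspected_oom(event_line: str) -> Optional[dict]:
    """Summarise an event that looks like an OOM kill or SIGKILL exit."""
    event = json.loads(event_line)
    if event.get("Type") != "container":
        return None
    attrs = event.get("Actor", {}).get("Attributes", {})
    action = event.get("Action")
    if action == "oom":
        reason = "oom"
    elif action == "die" and attrs.get("exitCode") == "137":
        # 137 alone only proves SIGKILL; confirm with .State.OOMKilled.
        reason = "sigkill_exit"
    else:
        return None
    return {"container": attrs.get("name"), "reason": reason}
```

One way to use it is to pipe docker events --filter type=container --format '{{json .}}' into a small loop that calls suspected_oom on each line and follows up with docker inspect for anything it returns.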
Common causes of Docker OOM kills
1. Memory limit set too low
The most common cause. You gave a container 256MB but the JVM wants 512MB at startup. Solution:
# docker-compose.yml
services:
  api:
    image: myapp:latest
    deploy:
      resources:
        limits:
          memory: 512m # Increase from 256m
        reservations:
          memory: 256m # What Docker guarantees (optional)
How to find the right limit: run the container without a limit for a while, observe peak memory usage with docker stats, then set the limit at 150% of peak.
2. Memory leak in the application
A container that OOM kills repeatedly isn't necessarily undersized — it might have a memory leak. Symptoms:
- Memory usage grows steadily over hours/days
- The growth never levels off (a healthy app's memory usage plateaus)
- After restart, memory starts low, then grows again
# Watch memory growth over time (run for 30 minutes and observe)
watch -n 60 'docker stats --no-stream my-container | grep -v NAME'
If memory grows 50MB every hour with no plateau, you have a leak. Fix it in the application code; raising the memory limit only delays the failure.
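Eyeballing a watch loop works, but the "grows steadily, never plateaus" test is easy to put into numbers. The sketch below fits a least-squares line through (hours, MB) samples you have collected yourself; the 5 MB/hour tolerance and both function names are assumptions for illustration, not a standard.

```python
def growth_mb_per_hour(samples):
    """Least-squares slope of (hours_since_start, memory_mb) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def has_plateaued(samples, tolerance_mb_per_hour=5.0):
    """Check whether the most recent half of the samples is roughly flat."""
    recent = samples[len(samples) // 2:]
    return abs(growth_mb_per_hour(recent)) <= tolerance_mb_per_hour
```

A warm-up ramp followed by a flat line passes has_plateaued; a container that keeps adding 50MB an hour does not, no matter how long you wait.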
Common leak sources by runtime:
- Node.js: Event listener accumulation, large objects kept in closure, unbounded caches
- Java/JVM: Class loader leaks, ThreadLocal not cleaned up, off-heap memory (use -XX:MaxDirectMemorySize)
- Go: Goroutine leaks (goroutines that never terminate), large heap retained by long-lived references
- Python: Circular references with __del__, keeping large datasets in memory between requests
3. Unexpected memory spike
Some workloads need much more memory for a specific operation:
- Image processing with a large file
- Building a large report/export
- Parsing a large JSON payload
If the spike is predictable and bounded, increase the memory limit. If it's unbounded (arbitrary input size), consider streaming the data instead of loading it all into memory.
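As an illustration of the streaming option, the sketch below aggregates newline-delimited JSON one record at a time, so memory use stays flat regardless of input size. The input format and the "amount" field are hypothetical; a single large JSON document would need an incremental parser instead.

```python
import json
from typing import IO, Iterator


def iter_records(stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed record at a time from newline-delimited JSON."""
    for line in stream:  # file objects are read lazily, line by line
        line = line.strip()
        if line:
            yield json.loads(line)


def total_amount(stream: IO[str]) -> float:
    """Sum a field without holding more than one record in memory."""
    return sum(record["amount"] for record in iter_records(stream))
```

The key point is that nothing ever calls read() on the whole input: peak memory depends on the largest single record, not on the size of the upload.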
4. Swap disabled
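By default, when a memory limit is set without a swap limit, a container can use swap equal to its memory limit (so memory plus swap totals double the limit), provided the host has swap configured. If the host system has swap disabled, that cushion is gone and OOM kills happen faster. Check: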
free -h
# If swap shows 0, swap is disabled
Adding swap isn't a solution to a leak, but it can prevent OOM kills from brief spikes while you investigate.
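The same check can be automated across hosts by reading /proc/meminfo, which reports a SwapTotal line on Linux. This is a small sketch with hypothetical function names; it only tells you whether the host has swap at all, not how much a given container is allowed to use.

```python
from pathlib import Path
from typing import Optional


def swap_total_kib(meminfo_text: str) -> Optional[int]:
    """Return the SwapTotal value from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None


def host_has_swap(meminfo_path: Path = Path("/proc/meminfo")) -> bool:
    """True if the host reports a non-zero amount of swap."""
    return bool(swap_total_kib(meminfo_path.read_text()))
```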
Setting proper memory limits
Rule of thumb
Observe your container's steady-state memory usage under normal load. Set your limit to 150-200% of steady-state to give headroom for spikes.
# Observe steady-state memory (run after container is warmed up)
docker stats --no-stream --format "{{.MemUsage}}" my-container
# e.g., "312MiB / 2GiB"
# Steady-state: ~312MB
# Good limit: 512MB-600MB
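The arithmetic behind that rule of thumb is simple enough to script. The sketch below parses the MemUsage string printed by docker stats and applies the 150-200% range; the helper names are hypothetical, and it assumes the binary units (MiB, GiB) that docker stats uses for memory.

```python
import math
import re

# Binary units as printed by docker stats for memory usage.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '312MiB' or '1.5GiB' to bytes."""
    match = re.fullmatch(r"([0-9.]+)\s*(B|KiB|MiB|GiB|TiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


def suggested_limit_mib(mem_usage: str, low: float = 1.5, high: float = 2.0):
    """Apply the 150-200% rule to the 'used' half of a MemUsage string."""
    used_mib = parse_size(mem_usage.split("/")[0]) / UNITS["MiB"]
    return math.ceil(used_mib * low), math.ceil(used_mib * high)
```

For the example above, suggested_limit_mib("312MiB / 2GiB") returns (468, 624), which is the range the 512MB-600MB suggestion is rounded from.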
Docker Compose memory limit syntax
services:
  api:
    image: myapp:latest
    mem_limit: 512m # Hard limit (deprecated but still works)
    memswap_limit: 512m # Set equal to mem_limit to disable swap for this container
    # Or with deploy (preferred for Compose v3):
    deploy:
      resources:
        limits:
          memory: 512m
JVM-specific: always set heap limits
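Java containers are notorious for OOM kills because older JVMs (before Java 10 and 8u191) size the default heap from the host's total memory, not the container's limit. Heap is also only part of the JVM's footprint, so it pays to set limits explicitly. Add JVM flags: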
environment:
- JAVA_OPTS=-Xmx256m -Xms128m -XX:MaxMetaspaceSize=128m
Or use the container-aware JVM flags (Java 10+ and 8u191+, where UseContainerSupport is already enabled by default):
environment:
- JAVA_OPTS=-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0
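It helps to work out what a percentage like this leaves over, because the container limit has to cover more than the heap: metaspace, thread stacks, the code cache, and direct buffers all count toward it. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def jvm_memory_budget(container_limit_mib: float, max_ram_percentage: float = 75.0):
    """Split a container limit into max heap and what is left for the rest."""
    max_heap = container_limit_mib * max_ram_percentage / 100.0
    return {
        "max_heap_mib": max_heap,
        # Everything that is not Java heap must fit in this remainder.
        "non_heap_mib": container_limit_mib - max_heap,
    }
```

With a 512MB limit and MaxRAMPercentage=75.0, the heap can grow to 384MB and 128MB remains for everything else. If the application uses a lot of off-heap memory, a lower percentage may be the safer choice.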
Node.js: explicitly set max old space
Node.js also doesn't respect container memory limits by default. Set:
node --max-old-space-size=450 server.js # For a 512MB container limit
Or via NODE_OPTIONS environment variable:
environment:
- NODE_OPTIONS=--max-old-space-size=450
Preventing OOM kills in production
1. Always set memory limits
Running containers without memory limits means one misbehaving container can crash the entire host. Set limits on everything in production. No exceptions.
2. Monitor memory trends, not just peaks
Set up an alert for "memory above 75% for 10 minutes" — this gives you warning before the OOM kill happens. You have time to investigate and increase the limit (or fix the leak) before the crash.
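The "above 75% for 10 minutes" rule is a sustained-threshold check, which is different from alerting on a single high reading. A minimal sketch, assuming you already collect (timestamp, memory percent) samples in chronological order; the function name and defaults are illustrative:

```python
def sustained_above(samples, threshold_pct=75.0, window_s=600.0):
    """True if memory % stayed above the threshold for at least window_s.

    samples: (unix_timestamp, mem_percent) pairs in chronological order.
    """
    breach_start = None
    for ts, pct in samples:
        if pct > threshold_pct:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach
            if ts - breach_start >= window_s:
                return True
        else:
            breach_start = None  # any dip resets the window
    return False
```

Resetting on every dip keeps short spikes from paging anyone, while a slow climb toward the limit still triggers well before the kill.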
3. Use memory reservations
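Set mem_reservation (soft limit) in addition to mem_limit (hard limit). The reservation is not enforced during normal operation: it kicks in when the host is under memory pressure, when the kernel tries to reclaim memory from containers that are above their reservation first. In Swarm, reservations are also used during scheduling to decide which node has room for the container.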
4. Capture log snapshots on exit
When an OOM kill happens, the container is killed mid-operation. The logs stop abruptly. Having the last 50-100 log lines captured at the moment of the kill is invaluable for diagnosing what the container was doing when it ran out of memory.
Kernus captures log snapshots automatically on crash events and includes them in alert notifications. You see the context without needing to reconstruct it after the fact from rotated log files.
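If you want to roll your own version of this, a bounded buffer is the usual building block. The sketch below is not how Kernus does it; it simply keeps the most recent lines of any log stream (for example, the output of docker logs --follow) using collections.deque, so memory stays constant however long the container runs.

```python
from collections import deque
from typing import Iterable, List


def tail_lines(lines: Iterable[str], keep: int = 100) -> List[str]:
    """Retain only the most recent `keep` lines from a log stream."""
    buffer = deque(maxlen=keep)  # older lines fall off the left end
    for line in lines:
        buffer.append(line.rstrip("\n"))
    return list(buffer)
```

When the stream ends because the container was killed, whatever is left in the buffer is the snapshot to attach to the alert.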
5. Test your memory limits during staging
Before deploying, run a load test against your staging environment. Watch memory usage. Confirm the container plateaus at a safe level. Fix leaks or adjust limits before they become production incidents.
To understand all exit codes and what they mean: Docker container keeps restarting — how to debug and fix. To set up OOM kill alerts: How to set up alerts for Docker containers.