February 19, 2025 · 8 min read

OOM kills in Docker: how to detect and prevent them

Deep dive on Docker OOM kills — what they are at the kernel level, exit code 137 explained, how to detect them, set memory limits, and prevent them in production.

Tags: docker, oom, memory, containers, debugging

A Docker OOM kill is one of the most common and most misdiagnosed production incidents. The container just... restarts. Logs show a sudden cutoff. Exit code 137. Your users saw errors for 30 seconds. Without the right monitoring, you might not even know it happened. This post is a technical deep dive: what the OOM killer actually is, how Docker handles it, how to detect it, and how to prevent it.

What the OOM killer is (Linux kernel basics)

Every modern Linux kernel has an OOM (Out Of Memory) killer. When the system runs out of physical memory and all swap is exhausted, the kernel must make a decision: which process should be killed to reclaim memory?

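The OOM killer assigns every process a "badness" score. On current kernels the score is driven by:

Memory footprint: the more memory a process uses (resident pages, swap, and page tables), the higher its score and the more likely it is to be killed
OOM score adjustment: a per-process oom_score_adj value from -1000 to 1000 is added to the score; -1000 exempts the process entirely, and higher values make it a preferred victim

Older kernels also factored in process age and CPU time, but that heuristic was removed in Linux 2.6.36.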
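
To see how the kernel ranks processes right now, you can read the score it exposes for each PID under /proc. The following is a minimal sketch, assuming a standard Linux /proc layout; the function name top_oom_candidates and its optional arguments are illustrative, not part of any tool.

```shell
# List the processes the kernel would currently prefer to kill, highest score first.
# Output columns (tab-separated): oom_score, oom_score_adj, PID, command name.
# Usage: top_oom_candidates [proc_root] [count]
top_oom_candidates() {
    proc_root="${1:-/proc}"
    count="${2:-5}"
    for dir in "$proc_root"/[0-9]*; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$dir/oom_score" ] || continue
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        adj=$(cat "$dir/oom_score_adj" 2>/dev/null || echo "?")
        name=$(cat "$dir/comm" 2>/dev/null || echo "?")
        printf '%s\t%s\t%s\t%s\n' "$score" "$adj" "${dir##*/}" "$name"
    done | sort -rn | head -n "$count"
}
```

Run top_oom_candidates on the host with no arguments to see the top five candidates. The proc_root argument exists only so the function can be pointed at a test directory.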

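When Docker runs a container with a memory limit (--memory 512m on the CLI, or mem_limit: 512m in Compose), it creates a Linux cgroup with that constraint. When the container's processes hit the limit, the kernel first tries to reclaim memory inside the cgroup. If it can't free enough, the OOM killer picks a victim among that cgroup's processes only and sends it SIGKILL; the rest of the host is untouched. If the victim is the container's main process (PID 1), the container exits. If it is a child process, the container can keep running with a worker missing.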

Docker then records:

Exit code 137 (128 + 9, where 9 = SIGKILL)
OOMKilled: true in the container inspect output

And if the container has a restart policy, Docker restarts it automatically. Which means the OOM kill can happen silently, over and over.

Exit code 137 explained

Exit codes in Docker follow Linux convention:

Exit code	Meaning
`0`	Clean exit — the process finished normally
`1`	General error — application-level failure
`2`	Shell built-in misuse
`126`	Permission denied (can't execute)
`127`	Command not found
`137`	Killed by SIGKILL (128 + 9); often an OOM kill, but confirm with OOMKilled
`139`	Segmentation fault (128 + 11)
`143`	Killed by SIGTERM (128 + 15) — graceful shutdown

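Exit code 137 only tells you that the process received SIGKILL. That covers OOM kills, but also a manual docker kill, kill -9 PID, or docker stop escalating to SIGKILL after its grace period (10 seconds by default) expires. The OOMKilled: true flag is what separates an OOM kill from the rest.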

# Check both exit code AND OOMKilled flag
docker inspect my-container --format='ExitCode: {{.State.ExitCode}}, OOMKilled: {{.State.OOMKilled}}'
# ExitCode: 137, OOMKilled: true → Definitive OOM kill
# ExitCode: 137, OOMKilled: false → SIGKILL from somewhere else
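
If you check this often, the decision logic is easy to wrap in a helper. This is a small sketch, not a Docker feature: classify_exit and its output labels are made-up names, and it only interprets the two fields shown above.

```shell
# Explain a container exit using the two fields from `docker inspect`.
# Usage: classify_exit EXIT_CODE OOM_KILLED
classify_exit() {
    code="$1"
    oom="$2"
    if [ "$oom" = "true" ]; then
        echo "oom-kill"
    elif [ "$code" -gt 128 ]; then
        # Codes above 128 mean "terminated by signal (code - 128)".
        echo "signal-$((code - 128))"
    elif [ "$code" -eq 0 ]; then
        echo "clean-exit"
    else
        echo "error-exit-$code"
    fi
}

# Example against a real container (requires Docker):
#   classify_exit $(docker inspect my-container --format '{{.State.ExitCode}} {{.State.OOMKilled}}')
```

The OOMKilled flag is checked first on purpose: the exit code alone cannot tell an OOM kill from any other SIGKILL.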

How to detect OOM kills

Manual detection

The simplest check for a running or recently stopped container:

# Check OOM status
docker inspect <container-name> | grep -E '"ExitCode"|"OOMKilled"'

# Or formatted:
docker inspect <container-name> --format='{{.State.ExitCode}} {{.State.OOMKilled}}'

For system-level OOM events (useful on hosts without per-container tracking):

# Recent OOM events in kernel log
dmesg | grep -i "killed process\|out of memory\|oom"

# With timestamp (requires dmesg with -T support)
dmesg -T | grep -i "oom"

# In systemd journal
journalctl -k | grep -i "oom"
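
The raw kernel lines are noisy. To pull out just the victims, you can filter for the "Killed process PID (name)" fragment. A sketch, assuming that wording, which recent kernels use but which can differ between kernel versions; oom_victims is an illustrative name.

```shell
# Extract "PID COMMAND" pairs from kernel log lines on stdin.
# Lines without a "Killed process <pid> (<name>)" fragment are ignored.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage on a host:
#   dmesg | oom_victims
#   journalctl -k | oom_victims
```

The command name the kernel reports is truncated to 15 characters, so match it against your process list loosely.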

Watching for OOM in real time

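# Watch status for all containers; a Status that keeps dropping back to "Up X seconds" means restarts
watch -n 5 'docker ps --format "table {{.Names}}\t{{.Status}}\t{{.RunningFor}}"'

# Restart count for a single container
docker inspect <container-name> --format='{{.RestartCount}}'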

# Stats with memory usage
docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

A container approaching its memory limit will show MemPerc near 100%. When it hits 100% and the process is killed, watch for the restart.
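
Rather than eyeballing the table, you can filter it down to the containers that are close to their limit. A minimal sketch; near_limit and the 90% default are assumptions chosen for illustration.

```shell
# Print containers whose memory percentage is at or above a threshold.
# Input (stdin): one "NAME PERCENT%" pair per line.
# Usage: near_limit [threshold_percent]
near_limit() {
    awk -v t="${1:-90}" '{ p = $2; sub(/%$/, "", p); if (p + 0 >= t + 0) print $1, $2 }'
}

# Usage (requires Docker):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | near_limit 90
```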

Automated detection with monitoring

Manually checking for OOM kills doesn't work at scale or at 3 AM. This is why automated monitoring exists.

Kernus automatically detects OOM kills by:

Watching the Docker API for container exit events
Checking OOMKilled in the container state
Recording the exit code (137) and reason ("oom_killed")
Capturing the last log lines at time of kill
Firing an alert with all this context to your configured channels

An OOM kill alert from Kernus includes the container name, host, memory usage at time of kill, exit reason, and the last log lines — giving you enough context to diagnose the issue without SSHing into the server.

Common causes of Docker OOM kills

1. Memory limit set too low

The most common cause. You gave a container 256MB but the JVM wants 512MB at startup. Solution:

# docker-compose.yml
services:
  api:
    image: myapp:latest
    deploy:
      resources:
        limits:
          memory: 512m  # Increase from 256m
        reservations:
          memory: 256m  # Soft reservation, not a hard guarantee (optional)

How to find the right limit: run the container without a limit for a while, observe peak memory usage with docker stats, then set the limit at 150% of peak.
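
The 150% rule is simple arithmetic once you have the samples. A sketch, assuming you have already collected memory readings in MiB, one per line; suggest_limit_mib is an illustrative helper, not a Docker command.

```shell
# Read memory samples in MiB (one per line) on stdin and suggest a limit
# as a percentage of the observed peak (default 150).
# Usage: suggest_limit_mib [percent_of_peak]
suggest_limit_mib() {
    awk -v pct="${1:-150}" '
        $1 + 0 > peak { peak = $1 + 0 }
        END {
            if (NR == 0) exit 1
            # Round up so the limit never lands below the requested headroom.
            limit = peak * pct / 100
            rounded = int(limit); if (rounded < limit) rounded++
            print rounded
        }'
}
```

For samples of 200, 340, and 310 MiB the peak is 340, so the suggested limit is 510 MiB. In practice you would round that to a convenient value such as 512m.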

2. Memory leak in the application

A container that OOM kills repeatedly isn't necessarily undersized — it might have a memory leak. Symptoms:

Memory usage grows steadily over hours/days
The growth never levels off (a healthy app's memory usage plateaus)
After restart, memory starts low, then grows again

# Watch memory growth over time (run for 30 minutes and observe)
watch -n 60 'docker stats --no-stream my-container | grep -v NAME'

If memory grows 50MB every hour with no plateau, you have a leak. Fix it in the application code; raising the memory limit only delays the failure.
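
To put a number on "grows steadily", fit a straight line through your samples and look at the slope. The sketch below uses an ordinary least-squares fit; the input format (elapsed minutes and MiB per line) and the name growth_mib_per_hour are assumptions for illustration.

```shell
# Estimate memory growth in MiB per hour from "MINUTES MIB" samples on stdin,
# using an ordinary least-squares line fit.
growth_mib_per_hour() {
    awk '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
        END {
            denom = n * sxx - sx * sx
            if (n < 2 || denom == 0) {
                print "need at least two distinct sample times" > "/dev/stderr"
                exit 1
            }
            slope_per_min = (n * sxy - sx * sy) / denom
            printf "%.1f\n", slope_per_min * 60
        }'
}
```

A slope near zero after warm-up suggests a plateau. A slope that stays clearly positive across several hours of samples is the leak pattern described above.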

Common leak sources by runtime:

Node.js: Event listener accumulation, large objects kept in closure, unbounded caches
Java/JVM: Class loader leaks, ThreadLocal not cleaned up, off-heap memory (use -XX:MaxDirectMemorySize)
Go: Goroutine leaks (goroutines that never terminate), large heap retained by long-lived references
Python: Reference cycles involving __del__ (uncollectable before Python 3.4), keeping large datasets in memory between requests

3. Unexpected memory spike

Some workloads need much more memory for a specific operation:

Image processing with a large file
Building a large report/export
Parsing a large JSON payload

If the spike is predictable and bounded, increase the memory limit. If it's unbounded (arbitrary input size), consider streaming the data instead of loading it all into memory.

4. Swap disabled

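By default, a container with a memory limit can also use swap equal to that limit, so memory plus swap totals double the limit (set --memory-swap to change this). If the host has swap disabled, that cushion does not exist and OOM kills happen sooner. Check: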

free -h
# If swap shows 0, swap is disabled

Adding swap isn't a solution to a leak, but it can prevent OOM kills from brief spikes while you investigate.
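
For scripts, reading SwapTotal from /proc/meminfo is more reliable than parsing free output. A small sketch; swap_total_mib is an illustrative name, and the optional file argument exists only for testing.

```shell
# Report host swap in MiB from a meminfo-style file (default: /proc/meminfo).
# Exits non-zero if the file has no SwapTotal line.
swap_total_mib() {
    awk '$1 == "SwapTotal:" { printf "%d\n", $2 / 1024; found = 1 }
         END { if (!found) exit 1 }' "${1:-/proc/meminfo}"
}

# Usage on a host:
#   [ "$(swap_total_mib)" -eq 0 ] && echo "swap is disabled on this host"
```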

Setting proper memory limits

Rule of thumb

Observe your container's steady-state memory usage under normal load. Set your limit to 150-200% of steady-state to give headroom for spikes.

# Observe steady-state memory (run after container is warmed up)
docker stats --no-stream --format "{{.MemUsage}}" my-container
# e.g., "312MiB / 2GiB"
# Steady-state: ~312MB
# Good limit: 512MB-600MB
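
The same calculation can be scripted straight from the MemUsage string. A sketch, assuming the "used / limit" format with B, KiB, MiB, or GiB units as in the example above; limit_range is a hypothetical helper.

```shell
# Convert the "used" half of a docker stats MemUsage value to MiB
# and print the 150%-200% range as a candidate limit.
# Usage: limit_range "312MiB / 2GiB"
limit_range() {
    printf '%s\n' "$1" | awk '
        {
            used = $1
            value = used + 0                        # leading number, e.g. 312
            unit = used; sub(/^[0-9.]+/, "", unit)  # trailing unit, e.g. MiB
            if (unit == "GiB") value *= 1024
            else if (unit == "KiB") value /= 1024
            else if (unit == "B") value /= 1024 * 1024
            else if (unit != "MiB") exit 1
            printf "%d-%d MiB\n", value * 1.5, value * 2
        }'
}
```

For "312MiB / 2GiB" this prints 468-624 MiB, which brackets the 512MB-600MB range suggested above once you round to convenient values.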

Docker Compose memory limit syntax

services:
  api:
    image: myapp:latest
    mem_limit: 512m         # Hard limit (Compose v2 syntax, also valid in the Compose Specification)
    memswap_limit: 512m     # Set equal to mem_limit to disable swap for this container

    # Or with deploy (Compose v3 syntax); pick one style, or keep both values identical:
    deploy:
      resources:
        limits:
          memory: 512m
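
For containers started with docker run instead of Compose, the same settings map onto the --memory, --memory-swap, and --memory-reservation flags. The helper below is only a sketch that assembles those flags from MiB values; mem_flags is an invented name.

```shell
# Build `docker run` memory flags that mirror the Compose memory settings.
# Usage: mem_flags LIMIT_MIB [RESERVATION_MIB]
mem_flags() {
    limit="$1"
    reservation="${2:-}"
    # --memory-swap equal to --memory means no swap for the container.
    flags="--memory=${limit}m --memory-swap=${limit}m"
    if [ -n "$reservation" ]; then
        flags="$flags --memory-reservation=${reservation}m"
    fi
    echo "$flags"
}

# Example (requires Docker; myapp:latest is the image from the Compose file):
#   docker run -d $(mem_flags 512 256) myapp:latest
```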

JVM-specific: always set heap limits

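Java containers are notorious for OOM kills because older JVMs (before Java 10 and 8u191) size the default heap from the host's total memory, not the container's limit. Even on container-aware JVMs, the heap is only part of the footprint: metaspace, thread stacks, and direct buffers come on top. Set explicit limits with JVM flags (JAVA_OPTS only takes effect if your image's start script passes it to java; JAVA_TOOL_OPTIONS is read by the JVM directly):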

environment:
  - JAVA_OPTS=-Xmx256m -Xms128m -XX:MaxMetaspaceSize=128m

Or size the heap as a percentage of the container limit (Java 10+, backported to 8u191, where container support is on by default):

environment:
  - JAVA_OPTS=-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0
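
To sanity-check what a percentage means in practice, you can read the limit the container actually sees and do the arithmetic. A sketch, assuming the usual cgroup mount at /sys/fs/cgroup inside the container; both function names are illustrative.

```shell
# Print the memory limit visible inside a container.
# cgroup v2 reports bytes, or the word "max" when unlimited;
# cgroup v1 reports bytes (a very large number when unlimited).
# Usage: cgroup_mem_limit [cgroup_root]
cgroup_mem_limit() {
    root="${1:-/sys/fs/cgroup}"
    if [ -r "$root/memory.max" ]; then
        cat "$root/memory.max"
    elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
        cat "$root/memory/memory.limit_in_bytes"
    else
        return 1
    fi
}

# Maximum heap in MiB that a given MaxRAMPercentage yields for a byte limit.
# Usage: heap_mib LIMIT_BYTES PERCENT
heap_mib() {
    awk -v b="$1" -v p="$2" 'BEGIN { printf "%d\n", b * p / 100 / 1048576 }'
}

# Usage inside a container:
#   limit=$(cgroup_mem_limit) && [ "$limit" != "max" ] && heap_mib "$limit" 75
```

With a 512 MiB limit, 75% gives a 384 MiB heap, leaving 128 MiB for metaspace, thread stacks, and other off-heap memory.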

Node.js: explicitly set max old space

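Node.js sizes the V8 heap from its own defaults, which may not match your container's memory limit. Set the old-space size explicitly, leaving room below the limit for buffers and other non-heap memory: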

node --max-old-space-size=450 server.js  # For a 512MB container limit

Or via NODE_OPTIONS environment variable:

environment:
  - NODE_OPTIONS=--max-old-space-size=450
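
If you run several services with different limits, you can derive the value instead of hard-coding it. A sketch: the 12% reserve is an assumption chosen to reproduce the 450-for-512 example above, not an official Node.js recommendation, and node_old_space_mib is a made-up name.

```shell
# Suggest a --max-old-space-size value (MiB) for a container limit (MiB),
# leaving a share of the limit for non-heap memory.
# Usage: node_old_space_mib LIMIT_MIB [reserve_percent]
node_old_space_mib() {
    limit="$1"
    reserve="${2:-12}"
    echo $(( limit * (100 - reserve) / 100 ))
}

# Example entrypoint line for a 512 MiB container:
#   exec node --max-old-space-size="$(node_old_space_mib 512)" server.js
```

Services that hold large Buffers or use native add-ons need a bigger reserve, since that memory lives outside the V8 heap.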

Preventing OOM kills in production

1. Always set memory limits

Running containers without memory limits means one misbehaving container can crash the entire host. Set limits on everything in production. No exceptions.

2. Monitor memory trends, not just peaks

Set up an alert for "memory above 75% for 10 minutes" — this gives you warning before the OOM kill happens. You have time to investigate and increase the limit (or fix the leak) before the crash.
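
The "sustained for N minutes" part matters: a single high reading should not page anyone. Here is a minimal sketch of that check, assuming one memory-percentage sample per minute on stdin, oldest first; sustained_above is an illustrative name.

```shell
# Succeed (exit 0) only when the last N samples on stdin are all above a threshold.
# Input: one memory percentage per line, oldest first (a trailing % is allowed).
# Usage: sustained_above [threshold_percent] [samples]
sustained_above() {
    tail -n "${2:-10}" | awk -v t="${1:-75}" -v need="${2:-10}" '
        { sub(/%$/, "", $1); if ($1 + 0 > t + 0) high++ }
        END { exit !(NR == need && high == need) }'
}

# Usage with a file of per-minute samples (samples.txt is a hypothetical file):
#   sustained_above 75 10 < samples.txt && echo "memory above 75% for 10 minutes"
```

Fewer than N samples counts as "not sustained", so the check stays quiet right after a restart.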

3. Use memory reservations

Set mem_reservation (soft limit) in addition to mem_limit (hard limit). The reservation is not a guarantee: when the host comes under memory pressure, the kernel tries to reclaim memory from containers that are above their reservation first. In Swarm mode, the scheduler also uses reservations to decide which node has room for a task.

4. Capture log snapshots on exit

When an OOM kill happens, the container is killed mid-operation. The logs stop abruptly. Having the last 50-100 log lines captured at the moment of the kill is invaluable for diagnosing what the container was doing when it ran out of memory.
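
If you want to try this by hand, Docker emits an oom container event that you can react to. The sketch below is one possible approach under stated assumptions, not how any particular monitoring product works; handle_events and snapshot_logs are illustrative names.

```shell
# Default snapshot action: print the last 100 log lines of a container.
snapshot_logs() {
    docker logs --tail 100 "$1" 2>&1
}

# Read "ACTION NAME" lines on stdin and snapshot logs for every OOM event.
handle_events() {
    while read -r action name; do
        if [ "$action" = "oom" ]; then
            echo "=== OOM kill: $name ==="
            snapshot_logs "$name"
        fi
    done
}

# Usage (requires Docker; runs until interrupted):
#   docker events --filter type=container --filter event=oom \
#       --format '{{.Action}} {{.Actor.Attributes.name}}' | handle_events
```

Redirect the output to a file or a notification hook of your choice. Containers started with --rm are removed on exit, so their logs may already be gone by the time the snapshot runs.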

Kernus captures log snapshots automatically on crash events and includes them in alert notifications. You see the context without needing to reconstruct it after the fact from rotated log files.

5. Test your memory limits during staging

Before deploying, run a load test against your staging environment. Watch memory usage. Confirm the container plateaus at a safe level. Fix leaks or adjust limits before they become production incidents.

To understand all exit codes and what they mean: Docker container keeps restarting — how to debug and fix. To set up OOM kill alerts: How to set up alerts for Docker containers.
