Why Headless Chrome Keeps Crashing
You set up Puppeteer or Playwright, it works locally, and then production goes sideways. Browsers crash. Memory spikes. Processes become zombies. You've been there.
Here's what's actually happening — and what you can do about it.
The most common causes
1. /dev/shm is too small in Docker
Chrome uses shared memory for rendering. In Docker, /dev/shm defaults to 64MB. Chrome needs more.
Symptom: Crash on any page with canvas, WebGL, or heavy CSS.
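Not sure whether your container is affected? Here is a minimal sketch, assuming a Linux host and Node.js 18.15 or later (for fs.statfsSync), that reads the size of /dev/shm and decides whether to fall back to Chrome's --disable-dev-shm-usage flag. The helper names and the 256 MB threshold are illustrative assumptions, not part of any library:

```javascript
const fs = require('node:fs');

// Illustrative threshold; 256 MB is an assumption, tune it for your workload.
const MIN_SHM_BYTES = 256 * 1024 * 1024;

// Returns the total size of /dev/shm in bytes, or null if it cannot be
// determined (non-Linux host, missing path, or Node.js without fs.statfsSync).
function shmSizeBytes(path = '/dev/shm') {
  if (typeof fs.statfsSync !== 'function') return null;
  try {
    const stats = fs.statfsSync(path);
    return stats.bsize * stats.blocks;
  } catch {
    return null;
  }
}

// Pure helper: which extra Chrome args to pass for a given /dev/shm size.
// An unknown size (null) adds nothing, so the caller keeps Chrome's defaults.
function shmArgs(sizeBytes, minBytes = MIN_SHM_BYTES) {
  if (sizeBytes !== null && sizeBytes < minBytes) {
    return ['--disable-dev-shm-usage'];
  }
  return [];
}

console.log('/dev/shm bytes:', shmSizeBytes(), 'extra args:', shmArgs(shmSizeBytes()));
```

Inside a default Docker container this should report roughly 64MB and suggest the flag. On a host with a larger /dev/shm it suggests nothing.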
Fix:
# docker-compose.yml
services:
  app:
    shm_size: '256mb'
Or pass --disable-dev-shm-usage to Chrome args:
await chromium.launch({
  args: ['--disable-dev-shm-usage']
});
2. No sandbox in restricted environments
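Chrome's sandbox requires kernel features, such as unprivileged user namespaces, that many cloud environments (Kubernetes pods, certain VMs) don't expose. Chrome also refuses to run sandboxed as root, which is the default user in most containers.
Symptom: Chrome exits at launch with an error such as "Running as root without --no-sandbox is not supported."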
Fix:
await chromium.launch({
  args: ['--no-sandbox', '--disable-setuid-sandbox']
});
Security note: Only do this inside Docker with proper network isolation. Never in a multi-tenant environment.
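One way to honor that security note is to make disabling the sandbox an explicit, per-deployment opt-in rather than hard-coding the flags. A small sketch follows. ALLOW_NO_SANDBOX is a made-up environment variable and sandboxArgs is a hypothetical helper, so treat both as placeholders:

```javascript
// Hypothetical helper: return the sandbox-disabling flags only when the
// deployment opts in with ALLOW_NO_SANDBOX=1 (an illustrative variable,
// not a Chrome, Puppeteer, or Playwright setting).
function sandboxArgs(env = process.env) {
  if (env.ALLOW_NO_SANDBOX === '1') {
    console.warn('Chrome sandbox disabled: only load trusted pages here');
    return ['--no-sandbox', '--disable-setuid-sandbox'];
  }
  return []; // default: leave the sandbox on
}

// Usage, assuming Playwright's chromium is already imported:
// await chromium.launch({ args: sandboxArgs() });
```

With this shape, a local or properly sandboxed environment never picks up the flags by accident.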
3. Memory leaks from unclosed contexts
If you're opening a new browser per request without closing it, you'll run out of memory within hours.
Wrong:
// Don't do this
app.get('/screenshot', async (req, res) => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  // ... take screenshot ...
  // browser never closed if an error occurs
});
Right: Use a browser pool. Launch browsers once at startup, reuse them across requests, close the page (not the browser) after each screenshot, and restart browsers periodically to prevent memory accumulation.
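Here is a rough sketch of such a pool, reduced to a single browser. The launch function is injected so the code does not depend on a particular library. The BrowserPool class and the maxUses recycle threshold are assumptions made for illustration, not an API from Puppeteer or Playwright:

```javascript
// Minimal single-browser pool (illustrative sketch).
// `launch` is any async function returning an object with newPage() and close(),
// for example () => chromium.launch().
class BrowserPool {
  constructor(launch, { maxUses = 50 } = {}) {
    this.launch = launch;
    this.maxUses = maxUses; // hypothetical threshold: pages served before recycling
    this.current = null;    // { browserPromise, uses, active }
  }

  // Return the current slot, starting a fresh browser if none exists yet
  // or the current one has served maxUses pages.
  slot() {
    const old = this.current;
    if (!old || old.uses >= this.maxUses) {
      this.current = { browserPromise: this.launch(), uses: 0, active: 0 };
      if (old && old.active === 0) this.closeSlot(old); // retire idle browser now
    }
    return this.current;
  }

  closeSlot(slot) {
    return slot.browserPromise.then((b) => b.close()).catch(() => {});
  }

  // Run `fn` with a fresh page. The page is always closed, even if `fn` throws.
  async withPage(fn) {
    const slot = this.slot();
    slot.uses += 1;
    slot.active += 1;
    let page;
    try {
      const browser = await slot.browserPromise;
      page = await browser.newPage();
      return await fn(page);
    } finally {
      if (page) await page.close().catch(() => {});
      slot.active -= 1;
      // A retired browser is closed once its last in-flight page finishes.
      if (slot !== this.current && slot.active === 0) await this.closeSlot(slot);
    }
  }

  // Shut the pool down; in-flight pages on the current browser are cut off.
  async close() {
    if (this.current) await this.closeSlot(this.current);
    this.current = null;
  }
}
```

A request handler would then call something like pool.withPage(async (page) => { await page.goto(url); return page.screenshot(); }). Crash detection and launch-failure handling are left out to keep the sketch short.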
4. GPU process crashes
Chrome's GPU process can crash in headless environments that have no GPU or display server.
Fix:
args: ['--disable-gpu', '--no-first-run']
5. Zombie processes after timeout
If a screenshot times out, the browser process may not be killed properly.
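Fix: Race the navigation against a timeout, and explicitly close the page (or its context) when the navigation fails or the timeout wins:
const timeout = new Promise((_, reject) => setTimeout(() => reject(new Error('Timeout')), 30000));
const result = await Promise.race([page.goto(url), timeout]).catch(async (err) => {
  await page.close(); // release the page so it cannot linger after a failure
  throw err;
});
6. OOM killer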
On low-memory instances, the Linux OOM killer will terminate Chrome without warning. Chrome is memory-hungry — each tab uses 100–300MB.
Fix: Run a browser pool with a max page count per browser, and limit concurrent requests.
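Limiting concurrent requests can be as simple as a counting semaphore in front of whatever takes the screenshot. The Limiter class below is an illustrative sketch, not something Puppeteer or Playwright provides. As a rough sizing guide, with about 2 GB to spare for Chrome and the 300MB upper figure per tab, a limit of around 6 is a reasonable starting point:

```javascript
// Counting semaphore: at most `limit` tasks run at once, the rest wait in FIFO order.
class Limiter {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.queue = []; // resolvers for tasks waiting on a free slot
  }

  async run(task) {
    if (this.running >= this.limit) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise((resolve) => this.queue.push(resolve));
    } else {
      this.running += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.queue.shift();
      if (next) next();        // pass the slot straight to the next waiter
      else this.running -= 1;  // nobody waiting: free the slot
    }
  }
}

// Usage, where takeScreenshot is a hypothetical function of your own:
// const limiter = new Limiter(6);
// const png = await limiter.run(() => takeScreenshot(url));
```

Requests beyond the limit wait rather than fail. If you would rather shed load, reject when the queue grows past some length.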
The hidden operational cost
Even if you solve all the above, you're now responsible for:
- Monitoring browser health and restarting crashed instances
- Memory management: recycling browsers after N screenshots
- Updates: new Chrome releases can change how pages render or break your automation
- Scaling: horizontal scaling of stateful browser processes is complex
For most teams, this is 20–40 hours of engineering time to get right, then ongoing maintenance.
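To give a feel for the first item on that list, here is a minimal supervisor sketch. It relies on the 'disconnected' event that both Puppeteer and Playwright browser objects emit when the browser closes or crashes. The supervise function and its return shape are assumptions made for illustration:

```javascript
// Keep one browser alive: relaunch whenever the current one disconnects.
// `launch` is any async function returning a browser-like object that
// supports on('disconnected', ...) and close().
function supervise(launch, { onRestart = () => {} } = {}) {
  let stopped = false;
  let current = null;

  async function start() {
    const browser = await launch();
    current = browser;
    browser.on('disconnected', () => {
      if (stopped) return; // intentional shutdown, do not relaunch
      onRestart();
      start().catch((err) => console.error('relaunch failed:', err));
    });
    return browser;
  }

  return {
    ready: start(),             // resolves once the first browser is up
    getBrowser: () => current,  // always returns the most recent browser
    async stop() {
      stopped = true;
      if (current) await current.close();
    },
  };
}
```

A real setup would also add backoff between relaunches and alerting on repeated crashes. Both are omitted here.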
When to use a managed API
If screenshots are not your core product, a managed API eliminates all of this:
- No Docker /dev/shm configuration
- No zombie processes
- No OOM monitoring
- No browser update management
- Scales automatically
You pay a small per-request cost and get back the engineering hours and operational overhead.
# 3 lines of bash instead of a browser operations runbook
curl "https://api.snapsharp.dev/v1/screenshot?url=https://example.com" \
  -H "Authorization: Bearer sk_live_..." \
  -o screenshot.png
If screenshots are your core product, you'll want control: run your own browser pool, apply the fixes above, and budget for ongoing monitoring and maintenance.
There's no universal answer. But know the true cost before deciding.