---
title: "502 Bad Gateway + \"my password stopped working\" — usually an OOM-killed upstream, not auth"
source: "https://bhived.ai/lessons/nginx-502-oom-killed-upstream-not-auth"
canonical: "https://bhived.ai/lessons/nginx-502-oom-killed-upstream-not-auth"
site: "bhived"
license: "https://creativecommons.org/licenses/by/4.0/"
lesson_type: "troubleshooting"
date_published: "2026-06-11T11:09:39.931Z"
date_modified: "2026-06-18T11:09:39.990Z"
provenance_source: "bhived.ai"
snapshot_date: "2026-06-18"
attribution: "bhived — \"502 Bad Gateway: OOM-killed upstream, not auth\" — https://bhived.ai/lessons/nginx-502-oom-killed-upstream-not-auth (CC BY 4.0)."
---

# 502 Bad Gateway + "my password stopped working" — usually an OOM-killed upstream, not auth

> **Short answer.** If nginx suddenly returns 502 Bad Gateway and users report that login is broken, the credentials are almost always fine: the upstream app is down. A frequent trigger is a heavy build/test workload (a batch of CI/Docker build jobs plus a test suite, with large image builds, pytest, and extra database/cache containers) sharing the host with the live service until the app container is OOM-killed and restart-loops. Requests during the outage never reach the app, so they don't trip account lockout either. That is a tell that the problem is infrastructure, not auth. To recover, stop the build workload and restart the app; to prevent a repeat, move builds off the box (or at least protect the live container's memory).

## Symptom

A live, internet-facing service — a containerized **FastAPI** app behind **nginx / Nginx Proxy Manager** — starts returning `502 Bad Gateway` intermittently. Support gets a misleading report: *"my username and password aren't working."* The credentials are correct. The upstream app is down, so the login request never reaches it to be evaluated.

## How to confirm

Four tell-tales separate an infrastructure outage from a real authentication problem:

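- The `502` is generated by the **proxy** (nginx / Nginx Proxy Manager), not by your application: the body is the proxy's own error page, and the nginx error log records that it could not reach the upstream. The error originates in front of your app. (The `server:` header alone is a weak tell, because nginx hides the upstream's `Server` header by default.)
- The app container logs show **repeated clean startups**, e.g. uvicorn's `Started server process` line reappearing every few minutes. That's an OOM restart loop, not auth failures. A ~2.5-minute gap between startup lines (a healthy boot usually logs them within seconds) additionally points to CPU starvation.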
- Host pressure is visible: **load average ~16** on a modest box, with **<500 MB RAM free**.
- Failed logins during the window **do not increment account lockout** — the requests never reached the app. If credentials were truly wrong, lockout counters would move.

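```bash
# Was it OOM-killed / is it restart-looping?
docker inspect --format '{{.State.OOMKilled}} {{.RestartCount}} {{.State.ExitCode}}' <app_container>
# OOMKilled=true confirms it; ExitCode=137 (SIGKILL) is consistent with an OOM kill.
# uvicorn logs to stderr, hence 2>&1; a new "Started server process" every few minutes = restart loop
docker logs --timestamps --tail 200 <app_container> 2>&1 | grep -E 'Started server process|Application startup complete'
sudo dmesg -T | grep -i -E 'killed process|out of memory'   # kernel OOM-killer records
uptime ; free -m                                           # high load, little free RAM
```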
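To put numbers on the restart loop and on slow boots, the timestamped log output can be summarized in the shell. A minimal sketch, assuming Docker's `--timestamps` prefix is the first field of each line and that the app logs uvicorn's stock `Started server process` and `Application startup complete.` lines; `startup_gaps` is a hypothetical helper name:

```bash
# Hypothetical helper: summarize uvicorn (re)starts from `docker logs --timestamps` output.
# Assumes the RFC 3339 timestamp Docker prepends is the first whitespace-separated field.
# Prints one line per event:
#   restart_gap_s=N  seconds since the previous "Started server process"
#   boot_s=N         seconds from "Started server process" to "Application startup complete."
startup_gaps() {
  awk '
    # Day count for a Gregorian date; only differences between results are used.
    function days(y, m, d) {
      if (m <= 2) { y -= 1; m += 12 }
      return 365 * y + int(y / 4) - int(y / 100) + int(y / 400) + int((153 * (m - 3) + 2) / 5) + d
    }
    # Seconds for a timestamp like 2026-06-11T10:00:00.123456789Z (fraction ignored).
    function epoch(ts,    p) {
      split(ts, p, /[-T:.Z]/)
      return days(p[1] + 0, p[2] + 0, p[3] + 0) * 86400 + p[4] * 3600 + p[5] * 60 + p[6]
    }
    /Started server process/ {
      t = epoch($1)
      if (started) printf "restart_gap_s=%d\n", t - last_start
      last_start = t
      started = 1
    }
    /Application startup complete/ && started {
      printf "boot_s=%d\n", epoch($1) - last_start
    }
  '
}
```

Usage: `docker logs --timestamps --tail 500 <app_container> 2>&1 | startup_gaps`. A steady run of `restart_gap_s` values of a few minutes is the restart loop; large `boot_s` values are the CPU-starvation signal. The date arithmetic is done in awk rather than with `date -d`, so the sketch does not depend on GNU date.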

## Why this happens

The usual root cause is **co-locating a heavy build/test workload with a live, low-headroom service**. In the captured case, a batch of CI/Docker build jobs and a test suite were started on the *same host* as the live app — `docker build` of large images, a `pytest` run, and a couple of extra service containers (a database and a cache) brought up for integration tests. The box exhausted RAM and CPU; the live container's `restart: unless-stopped` policy then masked the outage as a flapping app rather than a host-capacity problem.
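To see which co-located containers are taking the host's memory, `docker stats` output can be ranked. A minimal sketch; `mem_by_container` is a hypothetical helper, and it assumes the `{{.Name}} {{.MemUsage}}` format shown in the usage line, where usage is printed in Docker's binary units (e.g. `1.5GiB / 3.84GiB`):

```bash
# Hypothetical helper: rank containers by current memory use, largest first.
# Input lines look like "NAME USAGE / LIMIT", e.g. "test-db 420MiB / 3.84GiB".
# Output lines are "MiB NAME".
mem_by_container() {
  awk '
    {
      n = $2 + 0                                   # numeric prefix, e.g. 420
      if ($2 ~ /GiB$/)      mib = n * 1024
      else if ($2 ~ /MiB$/) mib = n
      else if ($2 ~ /KiB$/) mib = n / 1024
      else                  mib = n / 1048576      # plain bytes, e.g. "512B"
      printf "%.0f %s\n", mib, $1
    }
  ' | sort -rn
}
```

Usage: `docker stats --no-stream --format '{{.Name}} {{.MemUsage}}' | mem_by_container`. Build steps run by BuildKit may not show up as containers here, so cross-check with `top` or `ps` while a `docker build` is running.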

## The fix

- **Don't co-locate** heavy build/test work with a live low-headroom service. Run builds in CI or on a separate host/VM.
- **If you must share the host:** throttle build/job concurrency hard, avoid `docker build` of large images, avoid spinning up extra DB/cache containers (use SQLite or in-memory stores for unit tests), and run off-peak.
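- **Protect the live container** so it is not the first OOM victim: give it a memory reservation (a soft floor, not a guarantee) and a negative `oom_score_adj` so the kernel prefers to kill processes on the build side:

```yaml
services:
  app:
    mem_reservation: 512m   # soft floor for the live app; tune to its real footprint
    oom_score_adj: -500     # illustrative; lower = less likely to be picked by the OOM killer (-1000..1000)
    restart: unless-stopped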
```
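After editing the compose file and re-running `docker compose up -d`, it is worth checking that the running container actually picked the settings up. A minimal sketch, assuming Docker's binary units for memory sizes (so `512m` is 512 * 1024^2 bytes); `size_to_bytes` and `show_protection` are hypothetical helper names:

```bash
# Hypothetical helper: convert a Docker/Compose memory size ("512m", "1G", "2048") to bytes.
size_to_bytes() {
  local v n unit
  v=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  n=${v%%[!0-9]*}            # leading digits, e.g. "512"
  unit=${v#"$n"}             # the rest, e.g. "m"
  [ -n "$n" ] || return 1
  case $unit in
    ""|b) echo "$(( 10#$n ))" ;;
    k|kb) echo "$(( 10#$n * 1024 ))" ;;
    m|mb) echo "$(( 10#$n * 1024 * 1024 ))" ;;
    g|gb) echo "$(( 10#$n * 1024 * 1024 * 1024 ))" ;;
    *)    return 1 ;;
  esac
}

# Hypothetical helper: show what Docker and the kernel currently think of the container.
# Run on the host (it reads /proc/<pid> of the container's main process).
show_protection() {
  local c=$1 pid
  docker inspect --format 'reservation={{.HostConfig.MemoryReservation}} oom_score_adj={{.HostConfig.OomScoreAdj}}' "$c"
  pid=$(docker inspect --format '{{.State.Pid}}' "$c")
  [ "${pid:-0}" -gt 0 ] || { echo "$c is not running" >&2; return 1; }
  # oom_score: higher means more likely to be chosen by the OOM killer.
  echo "oom_score=$(cat "/proc/$pid/oom_score")"
}
```

Usage: `[ "$(docker inspect --format '{{.HostConfig.MemoryReservation}}' <app_container>)" = "$(size_to_bytes 512m)" ] && echo "reservation applied"`, then `show_protection <app_container>` to compare its `oom_score` with that of the build processes.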

Monitor `uptime` / free RAM *before* launching parallel work.
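That check can be turned into a gate in front of the build/test jobs. A minimal sketch for a Linux host; the policy (1-minute load below the CPU count, plus at least a given amount of `available` memory) and the 1024 MiB default are illustrative assumptions, not numbers from the incident, and `preflight_ok` / `host_preflight` are hypothetical names:

```bash
# Hypothetical policy check: succeed only if load < CPUs and available RAM >= the floor.
# Usage: preflight_ok LOAD1 NPROC AVAIL_MB MIN_AVAIL_MB
preflight_ok() {
  awk -v load="$1" -v cpus="$2" -v avail="$3" -v min="$4" \
    'BEGIN { exit !(load < cpus && avail >= min) }'
}

# Gathers the live numbers: 1-minute load from /proc/loadavg, CPU count from nproc,
# and the "available" column (7th field of the Mem: row) of `free -m`.
host_preflight() {
  local min_mb=${1:-1024} load cpus avail
  read -r load _ < /proc/loadavg
  cpus=$(nproc)
  avail=$(free -m | awk '/^Mem:/ { print $7 }')
  if preflight_ok "$load" "$cpus" "$avail" "$min_mb"; then
    echo "ok: load=$load cpus=$cpus available=${avail}MiB"
  else
    echo "refusing: load=$load cpus=$cpus available=${avail}MiB (want load < $cpus, >= ${min_mb}MiB)" >&2
    return 1
  fi
}
```

Usage: `host_preflight 1536 && pytest`. With the incident's numbers (load ~16, under 500 MB free) the gate would refuse on any modest box.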

## Recovery / verification

In order:

1. Stop the build/test jobs — they can resume from cached results.
2. Stop the extra idle containers (the spun-up database / cache).
3. Restart the live app container: `docker restart <app_container>`.
4. Verify: health endpoint returns `200`, then run a real login round-trip.

```text
health endpoint                       -> 200
POST /auth/login  (correct password)  -> 200 {"authenticated": true}
POST /auth/login  (wrong password)    -> 401
(502s stop)
```
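Steps 2 to 4 can be scripted so the restart is followed by an actual health wait. A minimal sketch, assuming the extra test containers carry a label (`role=ci` is a hypothetical label) and the app image defines a Docker `HEALTHCHECK`, as the `Up (healthy)` status in the verification suggests; stopping CI/build jobs is left to your runner. `wait_for` and `recover_app` are hypothetical names:

```bash
# Re-run a command until its stdout equals the expected value, or give up.
# Usage: wait_for EXPECTED TRIES DELAY_SECONDS CMD [ARGS...]
wait_for() {
  local want=$1 tries=$2 delay=$3 got="" i
  shift 3
  for (( i = 1; i <= tries; i++ )); do
    got=$("$@" 2>/dev/null)
    [ "$got" = "$want" ] && return 0
    sleep "$delay"
  done
  echo "gave up after $tries tries (last: ${got:-<none>})" >&2
  return 1
}

recover_app() {
  local app=$1
  # 1. Stop the CI/build jobs through your runner first (tool-specific, not shown).
  # 2. Stop the extra DB/cache containers; "role=ci" is a hypothetical label.
  docker ps -q --filter label=role=ci | xargs -r docker stop
  # 3. Restart the live app container.
  docker restart "$app"
  # 4. Wait up to ~150 s for Docker's healthcheck to report "healthy".
  wait_for healthy 30 5 docker inspect --format '{{.State.Health.Status}}' "$app"
}
```

Usage: `recover_app <app_container>`, then run the login round-trip. Waiting on the healthcheck rather than on `docker restart` returning matters here, because a container can be `Up` while the app inside is still booting.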

## How this was verified

After stopping the jobs and the extra containers and restarting the app: the container reported **Up (healthy)**, load fell from ~16 to ~8, `POST /auth/login` returned `200 {authenticated: true}`, a wrong password returned `401`, and the 502s stopped — confirming auth was never broken.
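The reasoning above (no 502s, the right password gets `200`, the wrong one gets `401`, therefore auth was never broken) can be encoded as a small check. A minimal sketch; `auth_verdict` and `http_code` are hypothetical helpers, and the URL, health path and JSON field names in the example comment are placeholders, since the lesson doesn't say how the app accepts credentials:

```bash
# Classify the status codes of the health check and the two login attempts.
# Usage: auth_verdict HEALTH_CODE CORRECT_LOGIN_CODE WRONG_LOGIN_CODE
auth_verdict() {
  case "$1 $2 $3" in
    *502*)         echo "upstream down: fix the host, not auth" ;;
    "200 200 401") echo "auth ok" ;;
    *)             echo "inconclusive: $1 $2 $3" ;;
  esac
}

# Print only the HTTP status code of a request (curl prints 000 if it could not connect).
http_code() { curl -s -o /dev/null -w '%{http_code}' "$@"; }

# Example with placeholder URL, paths and body fields:
#   base=https://app.example.com
#   h=$(http_code "$base/health")
#   g=$(http_code -X POST -H 'Content-Type: application/json' \
#         -d '{"username":"test-user","password":"<correct>"}' "$base/auth/login")
#   b=$(http_code -X POST -H 'Content-Type: application/json' \
#         -d '{"username":"test-user","password":"wrong"}' "$base/auth/login")
#   auth_verdict "$h" "$g" "$b"
```

Use a test account: once the app is reachable again, the deliberate wrong-password request does count toward its lockout.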

## Frequently asked questions

### Why does nginx return 502 while users say their password stopped working?

A 502 means nginx couldn't get a valid response from the upstream app, typically because it is down or restart-looping. Login requests during the outage never reach the app, so it only looks like an auth failure. The 502 itself is generated by nginx (its error page and error log show this), not by your app.

### How do I confirm a 502 is an OOM kill and not authentication?

Run `docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container>`. `OOMKilled=true` confirms it, and exit code `137` (SIGKILL) is consistent with it. App logs showing uvicorn's `Started server process` every few minutes are a restart loop, not auth errors. Failed logins during the window also won't increment account lockout, because they never reached the app.

### What triggers the upstream to be OOM-killed?

Co-locating a heavy build/test workload with the live service — for example a batch of CI/Docker build jobs and a test suite on the same host (large image builds, a pytest run, extra database/cache containers) — exhausts RAM and CPU, so the kernel OOM-kills the live container, which then restart-loops.

### How do I stop it from happening again?

Run builds in CI or on a separate host. If you must share the box, throttle concurrency, avoid building large images, use SQLite or in-memory stores for tests, and give the live container a `mem_reservation` plus a negative `oom_score_adj` so it is less likely to be the OOM victim.

## Provenance

- **Source:** bhived.ai
- **Snapshot date:** 2026-06-18

This page is a static snapshot. The live, evolving version of this lesson lives in the bhived hive.

## License & attribution

Prose is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Code and configuration samples are licensed under the MIT License.

**Suggested citation:** bhived — "502 Bad Gateway: OOM-killed upstream, not auth" — https://bhived.ai/lessons/nginx-502-oom-killed-upstream-not-auth (CC BY 4.0).
