Designing Production-Grade REST API Health Check Endpoints
Design robust REST API health check endpoints: liveness vs readiness, payload schema, dependencies, security, caching, and production-ready examples.
Why health checks matter
A health check endpoint is your API’s heartbeat. Load balancers, orchestrators, and uptime monitors rely on it to route traffic, trigger restarts, and alert humans. A well‑designed health check:
- Detects when the app process is alive and when it’s actually able to serve requests
- Minimizes false positives/negatives with deterministic rules and tight timeouts
- Communicates unambiguously to humans and machines with a stable response contract
- Respects security boundaries and doesn’t leak secrets or internals unnecessarily
This article outlines pragmatic patterns to design production‑grade REST API health checks that integrate cleanly with Kubernetes, serverless platforms, and traditional load balancers.
Liveness vs. readiness vs. startup
Not all “health” is equal. Distinguish checks by intent:
- Liveness (live): Is the process running and not deadlocked? If no, restart it. Keep this check fast and shallow.
- Readiness (ready): Can the service successfully handle real traffic now? Verify critical dependencies and configuration.
- Startup (optional): Has the app completed expensive one‑time initialization (migrations, warm caches) before other checks begin?
Recommended endpoints:
- GET /health/live → shallow; returns 200 if event loop/threads respond
- GET /health/ready → deeper; returns 200 only if critical dependencies pass
- GET /health/startup → 200 after initialization phase
Avoid using a single /health for all cases; split intent to reduce flapping and noisy restarts.
HTTP method, path, and semantics
- Method: Use GET. It is safe and idempotent.
- Paths: Prefer predictable, non-versioned paths under /health (e.g., /health/live, /health/ready). Health is infrastructure‑facing and not part of your public API versioning scheme.
- Latency budget: Target p95 < 50 ms for liveness and < 200 ms for readiness (excluding slow external dependencies where you’ll use timeouts and partial aggregation).
Status codes that convey truth
- 200 OK: Healthy for the intended check
- 503 Service Unavailable: Unhealthy or not ready; signals load balancers to stop routing traffic
- 500 Internal Server Error: Unexpected failure in the checker itself; treat as unhealthy
- 429 Too Many Requests: If you rate‑limit health hits (rare), but generally exempt health endpoints from limits
Do not return 2xx with an error payload; machines key off status codes.
Response contract (keep it small and stable)
Even though load balancers mostly care about status codes, a concise JSON body helps humans and observability tools.
Example minimal, stable response for readiness:
{
  "status": "pass",
  "service": "billing-api",
  "version": "2.4.1",
  "time": "2026-04-07T15:23:18Z",
  "uptime_s": 48213,
  "checks": {
    "db:primary": {"status": "pass", "latency_ms": 14},
    "redis": {"status": "pass", "latency_ms": 3},
    "queue": {"status": "warn", "detail": "lag=523"}
  }
}
- status: pass | warn | fail (map warn→200 with alerts; fail→503)
- checks: minimal, redacted fields only; exclude secrets
- version/commit: aids rollbacks and correlation with deployments
Optional JSON Schema for the response
Keep the contract explicit and testable.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/health.schema.json",
  "type": "object",
  "required": ["status", "service", "time"],
  "properties": {
    "status": {"enum": ["pass", "warn", "fail"]},
    "service": {"type": "string"},
    "version": {"type": ["string", "null"]},
    "time": {"type": "string", "format": "date-time"},
    "uptime_s": {"type": ["number", "null"]},
    "checks": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["status"],
        "properties": {
          "status": {"enum": ["pass", "warn", "fail"]},
          "latency_ms": {"type": ["number", "null"]},
          "detail": {"type": ["string", "null"]}
        }
      }
    }
  }
}
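As a lightweight alternative to wiring up a full JSON Schema validator, the contract above can be enforced with a few lines of hand-rolled Python; this is a sketch of the same rules (required fields, status enum), not a replacement for the schema:

```python
VALID_STATUSES = {"pass", "warn", "fail"}

def validate_health_body(body: dict) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    # Required top-level fields from the schema above.
    for field in ("status", "service", "time"):
        if field not in body:
            errors.append(f"missing required field: {field}")
    # Aggregate status must be one of the three allowed values.
    if body.get("status") not in VALID_STATUSES:
        errors.append(f"invalid status: {body.get('status')!r}")
    # Each per-check entry must also carry a valid status.
    for name, check in body.get("checks", {}).items():
        if check.get("status") not in VALID_STATUSES:
            errors.append(f"invalid status for check {name!r}")
    return errors
```

Running this against every health response in integration tests keeps the contract from drifting silently.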
What to check (and how deep)
- Liveness: Only app loop responsiveness and basic memory/CPU sanity. Avoid I/O. Example: respond “pass” if a trivial in‑process action completes within 50 ms.
- Readiness: Validate dependencies required to serve requests:
- Database connectivity (and a trivial SELECT 1 or ping)
- Cache reachability (PING)
- Message broker connection (open channel, publish/confirm noop)
- Critical downstream HTTP services (HEAD or GET on a cheap endpoint)
- Configuration/secrets presence
- Migration state if mandatory for correctness
Use timeouts per dependency (e.g., 100–300 ms each) and cap total readiness time with a global deadline. Return partial results with an aggregate status that errs on safety.
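One way to sketch per-dependency timeouts under a global deadline is to run each probe in a thread pool and bound each `result()` wait by both its own budget and the time remaining. The probe callables and the exact timeout values here are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_checks(probes: dict, global_deadline_s: float = 0.5) -> dict:
    """probes maps name -> (callable, per_check_timeout_s); returns per-check results."""
    results = {}
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(probes))) as pool:
        futures = {name: pool.submit(fn) for name, (fn, _) in probes.items()}
        for name, (_, timeout_s) in probes.items():
            # Each check gets the smaller of its own budget and what's left globally.
            remaining = global_deadline_s - (time.monotonic() - start)
            budget = max(0.0, min(timeout_s, remaining))
            t0 = time.monotonic()
            try:
                futures[name].result(timeout=budget)
                results[name] = {"status": "pass",
                                 "latency_ms": round((time.monotonic() - t0) * 1000)}
            except FutureTimeout:
                results[name] = {"status": "fail", "detail": "timeout"}
            except Exception as exc:
                results[name] = {"status": "fail", "detail": type(exc).__name__}
    return results
```

Note that a truly hung probe thread still runs to completion in the background; the check merely stops waiting for it, which is the "err on safety" behavior you want.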
Aggregation logic
- pass if all critical checks pass
- warn if non‑critical checks fail or exceed soft thresholds (e.g., queue lag high)
- fail if any critical check fails or the global timeout elapses
Document which checks are critical. Put this mapping in code, not tribal knowledge.
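The aggregation rules above fit in a few lines; the `CRITICAL` set here is an example of keeping that mapping in code:

```python
# Example criticality mapping: declared in code, not tribal knowledge.
CRITICAL = {"db:primary", "redis"}

def aggregate(checks: dict, timed_out: bool = False) -> str:
    """Collapse per-check results into pass | warn | fail."""
    if timed_out:
        return "fail"  # global deadline elapsed
    if any(c["status"] == "fail" and name in CRITICAL
           for name, c in checks.items()):
        return "fail"  # any critical failure drains traffic
    if any(c["status"] in ("fail", "warn") for c in checks.values()):
        return "warn"  # non-critical failure or soft threshold exceeded
    return "pass"
```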
Performance, timeouts, and frequency
- Keep checks cheap. Prefer cached results refreshed every N seconds over recomputing on every hit, especially when external calls are needed.
- Use circuit breakers: if a dependency is flapping, avoid amplifying load by hammering it from health checks.
- Recommended cadence: orchestrators often poll every 10s; your endpoint should handle bursts without material CPU/IO impact.
Caching patterns
- Liveness: no cache, fully in‑process
- Readiness: memoize dependency results for 2–10 seconds with per‑check TTLs
- HTTP headers: Cache-Control: no-store (for external monitors). For internal sidecars that tolerate slight staleness, allow short max-age.
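A per-check TTL memoizer can be sketched with a small decorator; the 5-second TTL and the `check_db` probe are illustrative:

```python
import time
from functools import wraps

def memoize_ttl(ttl_s: float):
    """Cache a zero-argument check's result for ttl_s seconds, then recompute."""
    def decorator(fn):
        cached = {"value": None, "expires": 0.0}
        @wraps(fn)
        def wrapper():
            now = time.monotonic()
            if now >= cached["expires"]:
                cached["value"] = fn()
                cached["expires"] = now + ttl_s
            return cached["value"]
        return wrapper
    return decorator

@memoize_ttl(ttl_s=5.0)
def check_db():
    # In a real service this would ping the database; the counter just
    # demonstrates that repeated hits within the TTL reuse the cached result.
    check_db.calls += 1
    return {"status": "pass"}
check_db.calls = 0
```

With this in place, a burst of readiness polls costs at most one real dependency call per TTL window per check.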
Security and exposure
- Network boundary first: expose liveness only on localhost or pod network; expose readiness to cluster/load balancer; avoid internet exposure.
- Authentication: usually not required inside trusted networks. If public, require auth or IP allowlists.
- Redaction: include high‑level status, not secrets, connection strings, or stack traces.
- Rate limiting: exempt cluster IPs; apply light limits to public monitors to prevent abuse.
Observability integration
- Emit a metric per check, e.g. health_check_status{check="db"} with a 0/1/2 value (pass/warn/fail), plus its latency
- Create alerts tied to readiness fail rate and sustained warn states
- Correlate status with deployment version labels to detect bad releases quickly
- Log a compact single‑line JSON entry per health evaluation
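A compact single-line JSON entry per evaluation might be rendered like this (the field names are an assumption, not a standard):

```python
import json
import time

def health_log_line(status: str, checks: dict, version: str) -> str:
    """Render one single-line JSON record per health evaluation."""
    record = {
        "event": "health_check",
        "status": status,
        "version": version,  # correlate with deployments
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Only the per-check statuses; details go to metrics, not logs.
        "checks": {name: c["status"] for name, c in checks.items()},
    }
    return json.dumps(record, separators=(",", ":"))
```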
Failure modes to design for
- Hung process: liveness must fail even if sockets accept but can’t make progress
- Slow dependencies: use timeouts so health returns within deadline; classify as fail/warn deterministically
- Partial outages: do not mask failures; return 503 on readiness fail so traffic drains
- Startup storms: use a separate startup probe to prevent premature restarts during migrations or JIT warmup
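To catch a hung process even while its sockets still accept connections, the liveness handler can check that a background heartbeat recently made progress. This is one sketch of the idea; the 5-second staleness threshold is an assumption to tune:

```python
import time

# Updated by a trivial periodic task; if the process wedges, it goes stale.
HEARTBEAT = {"last": time.monotonic()}

def beat() -> None:
    """Call from a lightweight periodic task (e.g. a 1 s daemon timer thread)."""
    HEARTBEAT["last"] = time.monotonic()

def is_live(max_staleness_s: float = 5.0) -> bool:
    """Liveness passes only if the heartbeat advanced recently."""
    return (time.monotonic() - HEARTBEAT["last"]) < max_staleness_s
```

At boot, schedule beat() on a daemon thread; the /health/live handler then returns 200 when is_live() is true and 503 otherwise, so a deadlocked worker fails liveness even though the listener is up.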
Implementation sketches
Below are minimal examples. The Node.js and Python sketches implement both /health/live and /health/ready; the Go snippet shows just the liveness handler.
Node.js (Express)
import express from 'express';
import { pingDb, pingRedis } from './deps.js';

const app = express();
const started = Date.now();

app.get('/health/live', (req, res) => {
  res.set('Cache-Control', 'no-store');
  return res.status(200).json({ status: 'pass', service: 'billing-api', time: new Date().toISOString() });
});

app.get('/health/ready', async (req, res) => {
  res.set('Cache-Control', 'no-store');
  const deadline = 300; // ms, global budget for the whole readiness evaluation
  const t0 = Date.now();
  const checks = {};
  const withTimeout = (p, ms) => Promise.race([
    p,
    new Promise((_, reject) => setTimeout(() => reject(new Error('timeout')), ms))
  ]);
  try { checks['db:primary'] = { status: 'pass', latency_ms: await time(() => withTimeout(pingDb(), 150)) }; }
  catch { checks['db:primary'] = { status: 'fail' }; }
  try { checks['redis'] = { status: 'pass', latency_ms: await time(() => withTimeout(pingRedis(), 80)) }; }
  catch { checks['redis'] = { status: 'fail' }; }
  const fail = Object.values(checks).some(c => c.status === 'fail');
  const elapsed = Date.now() - t0;
  const body = {
    status: fail || elapsed > deadline ? 'fail' : 'pass',
    service: 'billing-api',
    time: new Date().toISOString(),
    uptime_s: Math.floor((Date.now() - started) / 1000),
    checks
  };
  return res.status(body.status === 'pass' ? 200 : 503).json(body);
});

// Resolve with the elapsed milliseconds of a successful call.
function time(fn) { const t = Date.now(); return fn().then(() => Date.now() - t); }

app.listen(8080);
Python (Flask)
from flask import Flask, jsonify
import time
app = Flask(__name__)
@app.route('/health/live')
def live():
return jsonify(status='pass', service='billing-api', time=time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())), 200
@app.route('/health/ready')
def ready():
checks = {}
try:
# ping_db() should be a cheap query with a short timeout
ping_db(timeout_ms=150)
checks['db:primary'] = {'status': 'pass'}
except Exception:
checks['db:primary'] = {'status': 'fail'}
status = 'fail' if any(c['status']=='fail' for c in checks.values()) else 'pass'
return jsonify(status=status, service='billing-api', time=time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()), checks=checks), 200 if status=='pass' else 503
Go (net/http)
func live(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Cache-Control", "no-store")
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    w.Write([]byte(`{"status":"pass","service":"billing-api"}`))
}
Kubernetes and load balancer alignment
- LivenessProbe → /health/live with periodSeconds: 10, timeoutSeconds: 1
- ReadinessProbe → /health/ready with initialDelaySeconds tuned to startup time
- StartupProbe (if heavy init) → /health/startup to suppress early restarts
- For cloud load balancers, point health checks at /health/ready so draining occurs on fail (returns 503)
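In pod spec terms, the probe wiring above might look like the following sketch (port, thresholds, and delays are assumptions to tune per app):

```yaml
containers:
  - name: billing-api
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 2
      initialDelaySeconds: 5
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      periodSeconds: 5
      failureThreshold: 30   # allows up to ~150 s for heavy initialization
```

While the startup probe is failing, Kubernetes holds off liveness and readiness probes, which is what suppresses restarts during migrations or warmup.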
Versioning and compatibility
- Do not version health URLs; version the payload fields only if necessary, and maintain backward‑compatible keys.
- Additive changes are safest. Removing or renaming fields requires communication with platform/observability owners.
Common anti‑patterns
- Doing real business work in health checks (e.g., large queries)
- Hitting third‑party APIs synchronously on every request
- Long‑running checks without timeouts
- Returning 200 with an error message in the body
- Exposing secrets or detailed stack traces
- Single “/health” that mixes liveness and readiness, causing noisy restarts
A practical checklist
- Separate endpoints: /health/live, /health/ready (+ /health/startup if needed)
- Clear status codes: 200 pass, 503 fail
- Deterministic timeouts with global deadline
- Minimal JSON body with aggregate and per‑check results
- Safe exposure and redaction; no secrets
- Light caching/memoization for deeper checks
- Metrics, logs, and alerts wired to health outcomes
- Document critical vs. non‑critical dependencies
Conclusion
A health check is a contract between your service and the platform operating it. By separating liveness from readiness, using clear status codes, applying strict timeouts, and returning a compact, stable JSON body, you make that contract reliable. Pair this with conservative security, light caching, and good observability, and your REST API will fail fast, recover quickly, and scale predictably under real‑world conditions.