Designing Production-Grade REST API Health Check Endpoints
Design robust REST API health check endpoints: liveness vs readiness, payload schema, dependencies, security, caching, and production-ready examples.
Why health checks matter
A health check endpoint is your API’s heartbeat. Load balancers, orchestrators, and uptime monitors rely on it to route traffic, trigger restarts, and alert humans. A well‑designed health check:
- Detects when the app process is alive and when it’s actually able to serve requests
- Minimizes false positives/negatives with deterministic rules and tight timeouts
- Communicates unambiguously to humans and machines with a stable response contract
- Respects security boundaries and doesn’t leak secrets or internals unnecessarily
This article outlines pragmatic patterns to design production‑grade REST API health checks that integrate cleanly with Kubernetes, serverless platforms, and traditional load balancers.
Liveness vs. readiness vs. startup
Not all “health” is equal. Distinguish checks by intent:
- Liveness (live): Is the process running and not deadlocked? If no, restart it. Keep this check fast and shallow.
- Readiness (ready): Can the service successfully handle real traffic now? Verify critical dependencies and configuration.
- Startup (optional): Has the app completed expensive one‑time initialization (migrations, warm caches) before other checks begin?
Recommended endpoints:
- GET /health/live → shallow; returns 200 if event loop/threads respond
- GET /health/ready → deeper; returns 200 only if critical dependencies pass
- GET /health/startup → 200 after initialization phase
Avoid using a single /health for all cases; split intent to reduce flapping and noisy restarts.
HTTP method, path, and semantics
- Method: Use GET. It is safe and idempotent.
- Paths: Prefer predictable, non-versioned paths under /health (e.g., /health/live, /health/ready). Health is infrastructure‑facing and not part of your public API versioning scheme.
- Latency budget: Target p95 < 50 ms for liveness and < 200 ms for readiness (excluding slow external dependencies where you’ll use timeouts and partial aggregation).
Status codes that convey truth
- 200 OK: Healthy for the intended check
- 503 Service Unavailable: Unhealthy or not ready; signals load balancers to stop routing traffic
- 500 Internal Server Error: Unexpected failure in the checker itself; treat as unhealthy
- 429 Too Many Requests: If you rate‑limit health hits (rare), but generally exempt health endpoints from limits
Do not return 2xx with an error payload; machines key off status codes.
Response contract (keep it small and stable)
Even though load balancers mostly care about status codes, a concise JSON body helps humans and observability tools.
Example minimal, stable response for readiness:
{
  "status": "pass",
  "service": "billing-api",
  "version": "2.4.1",
  "time": "2026-04-07T15:23:18Z",
  "uptime_s": 48213,
  "checks": {
    "db:primary": {"status": "pass", "latency_ms": 14},
    "redis": {"status": "pass", "latency_ms": 3},
    "queue": {"status": "warn", "detail": "lag=523"}
  }
}
- status: pass | warn | fail (map warn→200 with alerts; fail→503)
- checks: minimal, redacted fields only; exclude secrets
- version/commit: aids rollbacks and correlation with deployments
Optional JSON Schema for the response
Keep the contract explicit and testable.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/health.schema.json",
  "type": "object",
  "required": ["status", "service", "time"],
  "properties": {
    "status": {"enum": ["pass", "warn", "fail"]},
    "service": {"type": "string"},
    "version": {"type": ["string", "null"]},
    "time": {"type": "string", "format": "date-time"},
    "uptime_s": {"type": ["number", "null"]},
    "checks": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["status"],
        "properties": {
          "status": {"enum": ["pass", "warn", "fail"]},
          "latency_ms": {"type": ["number", "null"]},
          "detail": {"type": ["string", "null"]}
        }
      }
    }
  }
}
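As a lightweight alternative to wiring up a full JSON Schema validator, the contract above can be enforced with a few lines of hand-rolled Python; this is a sketch of the same rules (required fields, status enum), not a replacement for the schema:

```python
VALID_STATUSES = {"pass", "warn", "fail"}

def validate_health_body(body: dict) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    # Required top-level fields from the schema above.
    for field in ("status", "service", "time"):
        if field not in body:
            errors.append(f"missing required field: {field}")
    # Aggregate status must be one of the three allowed values.
    if body.get("status") not in VALID_STATUSES:
        errors.append(f"invalid status: {body.get('status')!r}")
    # Each per-check entry must also carry a valid status.
    for name, check in body.get("checks", {}).items():
        if check.get("status") not in VALID_STATUSES:
            errors.append(f"invalid status for check {name!r}")
    return errors
```

Running this against every health response in integration tests keeps the contract from drifting silently.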
What to check (and how deep)
- Liveness: Only app loop responsiveness and basic memory/CPU sanity. Avoid I/O. Example: respond “pass” if a trivial in‑process action completes within 50 ms.
- Readiness: Validate dependencies required to serve requests:
- Database connectivity (and a trivial SELECT 1 or ping)
- Cache reachability (PING)
- Message broker connection (open channel, publish/confirm noop)
- Critical downstream HTTP services (HEAD or GET on a cheap endpoint)
- Configuration/secrets presence
- Migration state if mandatory for correctness
Use timeouts per dependency (e.g., 100–300 ms each) and cap total readiness time with a global deadline. Return partial results with an aggregate status that errs on safety.
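One way to sketch per-dependency timeouts under a global deadline is to run each probe in a thread pool and bound each `result()` wait by both its own budget and the time remaining. The probe callables and the exact timeout values here are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_checks(probes: dict, global_deadline_s: float = 0.5) -> dict:
    """probes maps name -> (callable, per_check_timeout_s); returns per-check results."""
    results = {}
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(probes))) as pool:
        futures = {name: pool.submit(fn) for name, (fn, _) in probes.items()}
        for name, (_, timeout_s) in probes.items():
            # Each check gets the smaller of its own budget and what's left globally.
            remaining = global_deadline_s - (time.monotonic() - start)
            budget = max(0.0, min(timeout_s, remaining))
            t0 = time.monotonic()
            try:
                futures[name].result(timeout=budget)
                results[name] = {"status": "pass",
                                 "latency_ms": round((time.monotonic() - t0) * 1000)}
            except FutureTimeout:
                results[name] = {"status": "fail", "detail": "timeout"}
            except Exception as exc:
                results[name] = {"status": "fail", "detail": type(exc).__name__}
    return results
```

Note that a truly hung probe thread still runs to completion in the background; the check merely stops waiting for it, which is the "err on safety" behavior you want.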
Aggregation logic
- pass if all critical checks pass
- warn if non‑critical checks fail or exceed soft thresholds (e.g., queue lag high)
- fail if any critical check fails or the global timeout elapses
Document which checks are critical. Put this mapping in code, not tribal knowledge.
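The aggregation rules above fit in a few lines; the `CRITICAL` set here is an example of keeping that mapping in code:

```python
# Example criticality mapping: declared in code, not tribal knowledge.
CRITICAL = {"db:primary", "redis"}

def aggregate(checks: dict, timed_out: bool = False) -> str:
    """Collapse per-check results into pass | warn | fail."""
    if timed_out:
        return "fail"  # global deadline elapsed
    if any(c["status"] == "fail" and name in CRITICAL
           for name, c in checks.items()):
        return "fail"  # any critical failure drains traffic
    if any(c["status"] in ("fail", "warn") for c in checks.values()):
        return "warn"  # non-critical failure or soft threshold exceeded
    return "pass"
```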
Performance, timeouts, and frequency
- Keep checks cheap. Prefer cached results refreshed every N seconds over recomputing on every hit, especially when external calls are needed.
- Use circuit breakers: if a dependency is flapping, avoid amplifying load by hammering it from health checks.
- Recommended cadence: orchestrators often poll every 10s; your endpoint should handle bursts without material CPU/IO impact.
Caching patterns
- Liveness: no cache, fully in‑process
- Readiness: memoize dependency results for 2–10 seconds with per‑check TTLs
- HTTP headers: Cache-Control: no-store (for external monitors). For internal sidecars that tolerate slight staleness, allow short max-age.
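A per-check TTL memoizer can be sketched with a small decorator; the 5-second TTL and the `check_db` probe are illustrative:

```python
import time
from functools import wraps

def memoize_ttl(ttl_s: float):
    """Cache a zero-argument check's result for ttl_s seconds, then recompute."""
    def decorator(fn):
        cached = {"value": None, "expires": 0.0}
        @wraps(fn)
        def wrapper():
            now = time.monotonic()
            if now >= cached["expires"]:
                cached["value"] = fn()
                cached["expires"] = now + ttl_s
            return cached["value"]
        return wrapper
    return decorator

@memoize_ttl(ttl_s=5.0)
def check_db():
    # In a real service this would ping the database; the counter just
    # demonstrates that repeated hits within the TTL reuse the cached result.
    check_db.calls += 1
    return {"status": "pass"}
check_db.calls = 0
```

With this in place, a burst of readiness polls costs at most one real dependency call per TTL window per check.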
Security and exposure
- Network boundary first: expose liveness only on localhost or pod network; expose readiness to cluster/load balancer; avoid internet exposure.
- Authentication: usually not required inside trusted networks. If public, require auth or IP allowlists.
- Redaction: include high‑level status, not secrets, connection strings, or stack traces.
- Rate limiting: exempt cluster IPs; apply light limits to public monitors to prevent abuse.
Observability integration
- Emit a metric per check, e.g. health_check_status{check="db"} with a 0/1/2 value (pass/warn/fail), plus its latency
- Create alerts tied to readiness fail rate and sustained warn states
- Correlate status with deployment version labels to detect bad releases quickly
- Log a compact single‑line JSON entry per health evaluation
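A compact single-line JSON entry per evaluation might be rendered like this (the field names are an assumption, not a standard):

```python
import json
import time

def health_log_line(status: str, checks: dict, version: str) -> str:
    """Render one single-line JSON record per health evaluation."""
    record = {
        "event": "health_check",
        "status": status,
        "version": version,  # correlate with deployments
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Only the per-check statuses; details go to metrics, not logs.
        "checks": {name: c["status"] for name, c in checks.items()},
    }
    return json.dumps(record, separators=(",", ":"))
```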
Failure modes to design for
- Hung process: liveness must fail even if sockets accept but can’t make progress
- Slow dependencies: use timeouts so health returns within deadline; classify as fail/warn deterministically
- Partial outages: do not mask failures; return 503 on readiness fail so traffic drains
- Startup storms: use a separate startup probe to prevent premature restarts during migrations or JIT warmup
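To catch a hung process even while its sockets still accept connections, the liveness handler can check that a background heartbeat recently made progress. This is one sketch of the idea; the 5-second staleness threshold is an assumption to tune:

```python
import time

# Updated by a trivial periodic task; if the process wedges, it goes stale.
HEARTBEAT = {"last": time.monotonic()}

def beat() -> None:
    """Call from a lightweight periodic task (e.g. a 1 s daemon timer thread)."""
    HEARTBEAT["last"] = time.monotonic()

def is_live(max_staleness_s: float = 5.0) -> bool:
    """Liveness passes only if the heartbeat advanced recently."""
    return (time.monotonic() - HEARTBEAT["last"]) < max_staleness_s
```

At boot, schedule beat() on a daemon thread; the /health/live handler then returns 200 when is_live() is true and 503 otherwise, so a deadlocked worker fails liveness even though the listener is up.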
Implementation sketches
Below are minimal examples. The Node.js and Python sketches implement both /health/live and /health/ready; the Go snippet shows just the liveness handler.
Node.js (Express)
import express from 'express';
import { pingDb, pingRedis } from './deps.js';

const app = express();
const started = Date.now();

app.get('/health/live', (req, res) => {
  res.set('Cache-Control', 'no-store');
  return res.status(200).json({ status: 'pass', service: 'billing-api', time: new Date().toISOString() });
});

app.get('/health/ready', async (req, res) => {
  res.set('Cache-Control', 'no-store');
  const deadline = 300; // ms, global budget for the whole readiness evaluation
  const t0 = Date.now();
  const checks = {};
  const withTimeout = (p, ms) => Promise.race([
    p,
    new Promise((_, reject) => setTimeout(() => reject(new Error('timeout')), ms))
  ]);
  try { checks['db:primary'] = { status: 'pass', latency_ms: await time(() => withTimeout(pingDb(), 150)) }; }
  catch { checks['db:primary'] = { status: 'fail' }; }
  try { checks['redis'] = { status: 'pass', latency_ms: await time(() => withTimeout(pingRedis(), 80)) }; }
  catch { checks['redis'] = { status: 'fail' }; }
  const fail = Object.values(checks).some(c => c.status === 'fail');
  const elapsed = Date.now() - t0;
  const body = {
    status: fail || elapsed > deadline ? 'fail' : 'pass',
    service: 'billing-api',
    time: new Date().toISOString(),
    uptime_s: Math.floor((Date.now() - started) / 1000),
    checks
  };
  return res.status(body.status === 'pass' ? 200 : 503).json(body);
});

// Resolve with the elapsed milliseconds of a successful call.
function time(fn) { const t = Date.now(); return fn().then(() => Date.now() - t); }

app.listen(8080);
Python (Flask)
from flask import Flask, jsonify
import time
app = Flask(__name__)
@app.route('/health/live')
def live():
return jsonify(status='pass', service='billing-api', time=time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())), 200
@app.route('/health/ready')
def ready():
checks = {}
try:
# ping_db() should be a cheap query with a short timeout
ping_db(timeout_ms=150)
checks['db:primary'] = {'status': 'pass'}
except Exception:
checks['db:primary'] = {'status': 'fail'}
status = 'fail' if any(c['status']=='fail' for c in checks.values()) else 'pass'
return jsonify(status=status, service='billing-api', time=time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()), checks=checks), 200 if status=='pass' else 503
Go (net/http)
func live(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Cache-Control", "no-store")
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    w.Write([]byte(`{"status":"pass","service":"billing-api"}`))
}
Kubernetes and load balancer alignment
- LivenessProbe → /health/live with periodSeconds: 10, timeoutSeconds: 1
- ReadinessProbe → /health/ready with initialDelaySeconds tuned to startup time
- StartupProbe (if heavy init) → /health/startup to suppress early restarts
- For cloud load balancers, point health checks at /health/ready so draining occurs on fail (returns 503)
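In pod spec terms, the probe wiring above might look like the following sketch (port, thresholds, and delays are assumptions to tune per app):

```yaml
containers:
  - name: billing-api
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 2
      initialDelaySeconds: 5
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      periodSeconds: 5
      failureThreshold: 30   # allows up to ~150 s for heavy initialization
```

While the startup probe is failing, Kubernetes holds off liveness and readiness probes, which is what suppresses restarts during migrations or warmup.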
Versioning and compatibility
- Do not version health URLs; version the payload fields only if necessary, and maintain backward‑compatible keys.
- Additive changes are safest. Removing or renaming fields requires communication with platform/observability owners.
Common anti‑patterns
- Doing real business work in health checks (e.g., large queries)
- Hitting third‑party APIs synchronously on every request
- Long‑running checks without timeouts
- Returning 200 with an error message in the body
- Exposing secrets or detailed stack traces
- Single “/health” that mixes liveness and readiness, causing noisy restarts
A practical checklist
- Separate endpoints: /health/live, /health/ready (+ /health/startup if needed)
- Clear status codes: 200 pass, 503 fail
- Deterministic timeouts with global deadline
- Minimal JSON body with aggregate and per‑check results
- Safe exposure and redaction; no secrets
- Light caching/memoization for deeper checks
- Metrics, logs, and alerts wired to health outcomes
- Document critical vs. non‑critical dependencies
Conclusion
A health check is a contract between your service and the platform operating it. By separating liveness from readiness, using clear status codes, applying strict timeouts, and returning a compact, stable JSON body, you make that contract reliable. Pair this with conservative security, light caching, and good observability, and your REST API will fail fast, recover quickly, and scale predictably under real‑world conditions.