Designing Idempotent APIs: A Practical Guide

A practical guide to API idempotency: keys, retries, storage design, error handling, and patterns for exactly-once effects in distributed systems.

ASOasis

Why Idempotency Matters in API Design

Idempotency is the property that performing the same operation multiple times has the same effect as performing it once. In distributed systems, networks fail, clients retry, and servers crash mid-response. Without idempotency, these realities turn a simple “create an order” into duplicates, double-charges, or data corruption.

This guide shows how to design, implement, and test idempotency for HTTP APIs—covering keys, storage, retries, status codes, concurrency, and long‑running operations.

Idempotency, Safety, and Determinism—Know the Differences

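  • Safe methods: Do not change server state (e.g., GET, HEAD). Every safe method is also idempotent, but the terms differ: “safety” means the client requests no side effects, while idempotency allows side effects as long as repeating the request does not add to them.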
  • Idempotent methods: Multiple identical requests result in the same state (e.g., PUT, DELETE). POST is not idempotent by default but can be made so.
  • Determinism: Given the same inputs you get the same outputs. Idempotency focuses on effects, not necessarily identical outputs (e.g., timestamps may differ).
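
To make the distinction concrete, here is a minimal sketch contrasting an idempotent operation with a non-idempotent one. The `account` dictionary and both function names are hypothetical and exist only for illustration.

```python
def set_balance(account: dict, value: int) -> dict:
    """Idempotent: applying it twice leaves the same state as applying it once."""
    account["balance"] = value
    return account


def add_to_balance(account: dict, delta: int) -> dict:
    """Not idempotent: every repeat changes the state again."""
    account["balance"] += delta
    return account


# A retried "set" converges on one state.
a = {"balance": 100}
set_balance(a, 150)
set_balance(a, 150)

# A retried "add" applies the change twice.
b = {"balance": 100}
add_to_balance(b, 50)
add_to_balance(b, 50)
```

After both calls, `a` holds the intended balance, while `b` has been credited twice. That second case is what an idempotency key is meant to prevent.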

When You Need Idempotency

  • Money and inventory: charges, refunds, transfers, reservations.
  • Resource creation where duplicates are harmful: user signups, orders, ticket issuance.
  • Webhooks and message consumption where at‑least‑once delivery is common.
  • Any endpoint clients are likely to retry due to timeouts, 5xx errors, or flaky networks.

HTTP Semantics and Idempotency

  • GET/HEAD/OPTIONS: Safe and idempotent. Cache aggressively; no idempotency key needed.
  • PUT: Replace a resource; must be idempotent by definition. Use ETags for concurrency control.
  • DELETE: Idempotent by definition; repeating a delete leaves the resource gone. Idempotency concerns server state, not the status code, so answering subsequent calls with either 204 or 404 is acceptable. Pick one and document it.
  • PATCH: Partially updates state; can be idempotent if operations are defined as such. Prefer conditional requests.
  • POST: Not idempotent by default. Use an Idempotency‑Key to make it effectively idempotent for a given intent.

Designing the Idempotency Key

  • Location: Prefer request header Idempotency-Key. Allow a body field only if headers are unavailable.
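  • Scope: Bind the key to the tuple (HTTP method, canonicalized path, authenticated principal/tenant, request body hash). This prevents cross‑route replays: a key reused on a different route, by a different principal, or with a different body is treated as a mismatch rather than a replay.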
  • Format: Opaque, hard‑to‑guess string (UUIDv4 or ULID). Length ≤ 255 bytes.
  • TTL: Set according to business risk and client retry windows. Common: 24–72 hours; for financial operations: 7–30 days.
  • Collision policy: If the same key arrives with a different payload hash, return 409 Conflict.
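
The binding described above can be sketched as a fingerprint function. This is one possible approach, assuming JSON request bodies; the function name, the newline separator, and the trailing-slash normalization are illustrative choices, not a standard.

```python
import hashlib
import json


def request_fingerprint(method: str, path: str, principal_id: str, body: dict) -> str:
    """Hash the request in a canonical form so equivalent payloads match."""
    # Sorted keys and fixed separators make key order and whitespace irrelevant.
    canonical_body = json.dumps(body, sort_keys=True, separators=(",", ":"))
    # Assumed path canonicalization: ignore a trailing slash.
    canonical_path = path.rstrip("/") or "/"
    # A separator between parts avoids ambiguous concatenations
    # such as ("ab", "c") versus ("a", "bc").
    material = "\n".join([method.upper(), canonical_path, principal_id, canonical_body])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

A server would store this value next to the key on the first request and compare it on every later request carrying the same key, returning 409 when the two differ.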

Client Responsibilities and Retry Strategy

  • Generate a fresh Idempotency-Key per user intent and reuse it only across retries of that same intent.
  • Retry on network timeouts and 5xx. Do not retry on most 4xx except 409/429 when documented.
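  • Use exponential backoff with jitter (e.g., full jitter). Keep the total retry window shorter than the server TTL; a retry that arrives after the key has expired may execute the operation a second time.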
  • Preserve the original key through redirects and proxy layers.

Example POST with key:

curl -X POST https://api.example.com/v1/charges \
  -H 'Authorization: Bearer <token>' \
  -H 'Idempotency-Key: 9c2b5e59-3b7a-4ea3-b55d-3d33b2e9c1f1' \
  -H 'Content-Type: application/json' \
  -d '{"amount": 4200, "currency": "USD", "source": "tok_abc123"}'
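
The client rules above can be sketched as a small retry loop. This is an illustrative sketch, not a production client: `send` is a hypothetical transport callable that returns an HTTP status code or raises `TimeoutError`, and the default base delay and cap are arbitrary.

```python
import random
import time
import uuid


def backoff_delays(max_attempts, base=0.5, cap=30.0):
    """Yield one full-jitter delay per attempt: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_attempts):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def send_with_retries(send, payload, max_attempts=5, sleep=time.sleep):
    """Send one intent, reusing a single idempotency key across all attempts."""
    key = str(uuid.uuid4())  # generated once per user intent, never per attempt
    status = None
    for attempt, delay in enumerate(backoff_delays(max_attempts)):
        if attempt > 0:
            sleep(delay)  # back off only before a retry
        try:
            status = send(payload, key)
        except TimeoutError:
            status = None  # outcome unknown: retry with the same key
            continue
        if status < 500 and status != 429:
            return key, status  # success or a non-retryable client error
    return key, status
```

The key is created outside the loop, which is the property that matters: every attempt for this intent presents the same key, so the server can recognize a retry.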

Server-Side Idempotency Store

You need a durable “idempotency store” that records the first successful (or in‑progress) result for a given key and replays it for subsequent attempts.

Core fields you’ll want:

  • key: string; primary key or unique constraint
  • request_fingerprint: hash(method + path + normalized body + principal)
  • status: in_progress | succeeded | failed (non‑retryable)
  • response_status, response_headers, response_body
  • created_at, updated_at, expires_at
  • lock_version or a short‑lived lock flag

Storage Backends

  • Relational DB: Strong transactional guarantees; use a unique constraint on key.
  • Redis (with AOF/RDB persistence): Fast; use SET with the NX and EX options to take a per‑key lock that expires on its own, and persist final results.
  • Hybrid: Fast lock in Redis, canonical record in DB.

Atomic “insert-or-get” pattern (SQL)

CREATE TABLE idempotency (
  key TEXT PRIMARY KEY,
  request_fingerprint TEXT NOT NULL,
  status TEXT NOT NULL CHECK (status IN ('in_progress','succeeded','failed')),
  response_status INT,
  response_headers JSONB,
  response_body JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  expires_at TIMESTAMPTZ NOT NULL
);

-- Supports the periodic cleanup job that deletes expired rows
CREATE INDEX idx_idem_expires ON idempotency (expires_at);
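
The insert-or-get step can be tried locally with Python's built-in `sqlite3` module. Note the substitution: SQLite stands in for the relational database here, the table is trimmed to four columns, and `try_begin` is a hypothetical helper name. In PostgreSQL the equivalent statement would use `INSERT ... ON CONFLICT (key) DO NOTHING`.

```python
import sqlite3


def try_begin(conn: sqlite3.Connection, key: str, fingerprint: str, expires_at: str) -> bool:
    """Return True if this call claimed the key, False if a record already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO idempotency (key, request_fingerprint, status, expires_at) "
        "VALUES (?, ?, 'in_progress', ?)",
        (key, fingerprint, expires_at),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when the primary key already existed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency ("
    "key TEXT PRIMARY KEY, request_fingerprint TEXT NOT NULL, "
    "status TEXT NOT NULL, expires_at TEXT NOT NULL)"
)
```

The database, not application code, decides which request wins: only the caller whose insert succeeds goes on to execute side effects.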

Minimal middleware flow (Python‑like pseudocode)

# Separate the parts so ("ab", "c") and ("a", "bc") cannot produce the same hash
fp = sha256("\n".join([method, canon_path, normalize_json(body), principal_id]))

rec = db.get_by_key(idem_key)
if rec:
    if rec.request_fingerprint != fp:
        return http_409_conflict({"error": "key-payload-mismatch"})
    if rec.status in ("succeeded", "failed"):
        return replay(rec.response_status, rec.response_headers, rec.response_body)
    # in_progress: either wait, or return 409/202 depending on your policy
    return http_202_accepted({"operation_id": idem_key})

# First writer wins: the unique constraint on key arbitrates concurrent inserts
try:
    db.insert({
        "key": idem_key,
        "request_fingerprint": fp,
        "status": "in_progress",
        "expires_at": now() + ttl
    })
except UniqueViolation:
    # A concurrent request with the same key inserted first
    return http_202_accepted({"operation_id": idem_key})

# Execute side effects exactly once
try:
    result = perform_business_logic()
except Exception:
    # Assumes the business logic rolled back its own partial work.
    # Release the key so it is not stuck in_progress and a retry can run.
    db.delete(idem_key)
    raise

# Persist final outcome and replayable response
db.update(idem_key, {
    "status": "succeeded",
    "response_status": result.status,
    "response_headers": whitelist_headers(result.headers),
    "response_body": result.body,
    "updated_at": now()
})

return result

Notes:

  • Bind the first successful response to the key. Subsequent retries return the same response (status, headers subset, body) even if the server node handling it differs.
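  • If the business logic commits but the response is lost (for example, the server crashes before replying), the next retry must detect the committed state and return the stored response. This works only if the idempotency record is written in the same transaction as the business effect, or if a domain‑level unique constraint lets the retry find the existing result.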

Handling Concurrency

Two identical requests may arrive concurrently:

  • Use a transaction or distributed lock per key to ensure only one executes side effects.
  • For the “loser” request: either block briefly and replay the stored response, or return 202/409 with instructions to retry.

For resource updates, pair idempotency with optimistic concurrency:

  • ETag/If-Match on PUT/PATCH to protect against lost updates.
  • 412 Precondition Failed when ETag mismatches.
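
A minimal sketch of the ETag check follows, assuming an in-memory dictionary as the resource store and an ETag derived from a hash of the resource. Both function names are hypothetical.

```python
import hashlib
import json


def etag_for(resource: dict) -> str:
    """Derive an ETag from the resource's canonical JSON form."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16] + '"'


def conditional_put(store: dict, resource_id: str, if_match: str, new_value: dict) -> int:
    """Replace the resource only if the caller saw the current version."""
    current = store.get(resource_id)
    if current is None:
        return 404
    if etag_for(current) != if_match:
        return 412  # Precondition Failed: the resource changed since the caller read it
    store[resource_id] = new_value
    return 200
```

One consequence worth noting: if a PUT succeeds but its response is lost, a blind retry with the old ETag gets 412. Replaying the stored response by idempotency key before evaluating the precondition avoids that false failure, which is one reason to pair the two mechanisms.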

Status Codes and Responses

  • 200/201: Successful; store and replay exactly.
  • 202: Accepted for long‑running work. Include operation_id and polling URL. Replays of the same key before completion may return 202 consistently.
  • 204: Successful no-content responses (e.g., DELETE) are fine to replay.
  • 409: Key‑payload mismatch or other semantic conflicts.
  • 422: Validation failed. If you consider the failure final, mark the key failed (non‑retryable), store the response, and replay it consistently. Otherwise do not store it, and expect the corrected request to arrive under a new key.
  • 429: Rate limited; do not store as final; return Retry-After.
  • 5xx: Do not store as final; client should retry.
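
The table above can be condensed into one decision function. This is a sketch of the policy as listed, with two assumptions made explicit: 422 is treated as final, and any status not listed falls back to "do not store". The function name and return labels are illustrative.

```python
def storage_decision(status: int) -> str:
    """Map a response status to what the idempotency store should do with it."""
    if status == 202:
        return "keep_in_progress"  # replay the same 202 until the operation finishes
    if status in (200, 201, 204):
        return "store_final"       # success: store and replay exactly
    if status == 422:
        return "store_final"       # assumption: validation failures are final
    # 409, 429, 5xx, and anything unlisted: leave the key open for a retry.
    return "do_not_store"
```

Defaulting to "do not store" is the conservative choice here: an unexpected status then costs a repeated attempt rather than a permanently cached wrong answer.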

Asynchronous and Long‑Running Operations

For operations that take time:

  • First request with Idempotency-Key creates an operation resource and returns 202 with Location: /operations/{id}.
  • Replays of the same key return the same 202 and URL until completion.
  • When complete, GET /operations/{id} returns the terminal result. You may also update the idempotency store to include the final response for direct replay.
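
The flow above might look like the following in-memory sketch. `OperationService` and its methods are hypothetical names, and dictionaries stand in for durable tables. As a simplification, `submit` keeps returning 202 with the same URL even after completion, leaving the terminal result to `GET /operations/{id}`.

```python
import uuid


class OperationService:
    """In-memory stand-in for an operations table indexed by idempotency key."""

    def __init__(self):
        self.by_key = {}      # idempotency key -> operation id
        self.operations = {}  # operation id -> {"status": ..., "result": ...}

    def submit(self, idem_key: str):
        """Return (status_code, location). Replays reuse the first operation."""
        op_id = self.by_key.get(idem_key)
        if op_id is None:
            op_id = str(uuid.uuid4())
            self.by_key[idem_key] = op_id
            self.operations[op_id] = {"status": "running", "result": None}
        return 202, f"/operations/{op_id}"

    def complete(self, op_id: str, result: dict) -> None:
        """Record the terminal result once the background work finishes."""
        self.operations[op_id] = {"status": "succeeded", "result": result}

    def get(self, op_id: str) -> dict:
        """What GET /operations/{id} would return."""
        return self.operations[op_id]
```

A real implementation would need the lookup and insert in `submit` to be atomic, for example through a unique constraint on the key, so that two concurrent submissions cannot create two operations.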

Worker-side idempotency:

  • Use an inbox/outbox pattern. Each job/event has a unique effect_id. Workers insert effect_id into a processed table with a unique constraint before executing irreversible effects; on conflict, skip.

Idempotent Webhooks and Messages

  • Require providers to include a unique event_id and send signatures.
  • Maintain a webhook_inbox table keyed by event_id. If insert fails due to uniqueness, drop the duplicate.
  • Make handlers idempotent: avoid additive side effects without guards, and use upserts/conditional updates.
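
A sketch of a webhook handler that applies these rules, using the standard library only. Several details are assumptions: the signature is taken to be a hex-encoded HMAC-SHA256 of the raw body (providers differ), SQLite stands in for the database, and the `counters` table is a placeholder for a real side effect.

```python
import hashlib
import hmac
import sqlite3


def handle_webhook(conn, secret: bytes, event_id: str, body: bytes, signature_hex: str) -> str:
    """Return 'rejected', 'duplicate', or 'processed' for one delivery."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature_hex):
        return "rejected"
    try:
        # One transaction: the inbox row and the side effect commit or roll back together.
        with conn:
            conn.execute("INSERT INTO webhook_inbox (event_id) VALUES (?)", (event_id,))
            conn.execute("UPDATE counters SET handled = handled + 1")  # stand-in side effect
    except sqlite3.IntegrityError:
        return "duplicate"  # event_id already recorded: drop the redelivery
    return "processed"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webhook_inbox (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE counters (handled INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES (0)")
conn.commit()
```

Recording the `event_id` and applying the effect in the same transaction means a crash between the two cannot leave an event marked as handled without its effect, or the reverse.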

Exactly‑Once vs. At‑Least‑Once

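“Exactly‑once delivery” across distributed components is impractical: a sender that sees a timeout cannot tell a lost request from a lost response, so it must either retry and risk a duplicate or give up and risk a loss. Target at‑least‑once delivery with idempotent consumers and de‑duplication at the effect boundary, which yields exactly‑once effects: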

  • Use natural unique constraints (e.g., transfer_id, order_number) to gate side effects.
  • Treat the idempotency store as control‑plane; enforce uniqueness in the domain model too.

Security and Privacy Considerations

  • Treat Idempotency-Key as semi‑sensitive: do not expose it in public URLs or logs paired with PII.
  • Bind the key to the authenticated principal to prevent cross‑tenant replays.
  • Redact sensitive fields in stored response bodies or encrypt at rest.
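
If you choose redaction, a recursive mask applied before the response is written to the store is one option. The field names in `SENSITIVE_FIELDS` are hypothetical examples, not a recommended list.

```python
SENSITIVE_FIELDS = {"card_number", "cvv", "source"}  # hypothetical field names


def redact(value):
    """Return a copy with sensitive fields masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k in SENSITIVE_FIELDS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value
```

There is a trade-off to weigh: a redacted record can no longer replay the original response byte for byte. If clients depend on exact replay, either apply the same redaction to the first response or encrypt the stored body instead.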

Observability and Operations

Track:

  • Requests using idempotency by route
  • Replay rate and reasons (timeout, 5xx, client retries)
  • Key collisions and payload mismatches
  • In‑progress timeouts and stuck keys
  • Storage latency/hit ratio and TTL expirations

Emit structured logs with fields: idem_key, fingerprint, principal_id, correlation_id, attempt, outcome.
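
As a small illustration, one JSON line per idempotency event keeps these fields queryable. The function name and the outcome values are assumptions; use whatever vocabulary your log pipeline already has.

```python
import json


def idempotency_log_line(idem_key, fingerprint, principal_id, correlation_id, attempt, outcome):
    """Serialize one idempotency event as a single JSON line."""
    return json.dumps(
        {
            "idem_key": idem_key,
            "fingerprint": fingerprint,
            "principal_id": principal_id,
            "correlation_id": correlation_id,
            "attempt": attempt,
            "outcome": outcome,  # e.g. "executed", "replayed", "mismatch"
        },
        sort_keys=True,
    )
```

Counting lines by `outcome` gives the replay rate and mismatch count listed above without extra instrumentation.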

Common Pitfalls

  • Storing results only in RAM or ephemeral caches—restarts break guarantees.
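  • Choosing TTLs without considering business risk: too short, and a late retry executes the operation a second time; too long, and storage grows while a key that a client reuses by mistake a week later replays a stale result.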
  • Failing to bind the key to the payload and route, enabling cross‑request replay bugs.
  • Returning different response shapes on replay—clients may break. Store and replay consistently.
  • Treating 5xx as final and storing them—blocks legitimate retries.

Practical Time Windows

  • Client retry window: 30 seconds to several minutes with backoff for interactive flows; longer for batch jobs.
  • Server TTL: Client window + replication delay + a safety margin. Example: 48 hours for orders; 7–30 days for financial operations.
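
To check that a client's retry policy fits inside the server TTL, you can bound the retry window from above. The sketch below assumes capped exponential backoff where every delay takes its maximum value and every attempt runs to its timeout; the function name and parameters are illustrative.

```python
def worst_case_retry_window(max_attempts: int, base: float, cap: float,
                            per_request_timeout: float) -> float:
    """Upper bound in seconds on the time from first attempt to last response."""
    # There is one delay between each pair of consecutive attempts.
    delays = sum(min(cap, base * 2 ** i) for i in range(max_attempts - 1))
    return delays + max_attempts * per_request_timeout


# 5 attempts, 0.5 s base, 30 s cap, 10 s timeout per request:
# delays 0.5 + 1 + 2 + 4 = 7.5 s, plus 5 * 10 s of timeouts.
window = worst_case_retry_window(5, 0.5, 30.0, 10.0)
ttl_seconds = 48 * 3600  # the 48-hour example TTL for orders
fits = window <= ttl_seconds
```

For interactive flows the window is usually a tiny fraction of the TTL; the check matters more for batch jobs and queued retries that can resume hours later.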

Minimal Go Handler Sketch

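// Sketch: store, Fingerprint, replay, opJSON, DoBusiness, Write, ttl, and ErrNotFound are assumed helpers.
func handle(w http.ResponseWriter, r *http.Request) {
  key := r.Header.Get("Idempotency-Key")
  if key == "" {
    http.Error(w, "missing idempotency key", http.StatusBadRequest)
    return
  }

  fp := Fingerprint(r)
  rec, err := store.Get(key)
  if err == nil { // a record already exists for this key
    if rec.Fingerprint != fp {
      http.Error(w, "key-payload-mismatch", http.StatusConflict)
      return
    }
    if rec.Done {
      replay(w, rec)
      return
    }
    w.WriteHeader(http.StatusAccepted)
    io.WriteString(w, opJSON(key))
    return
  }
  if !errors.Is(err, ErrNotFound) { // store failure, not a missing key
    http.Error(w, "idempotency store unavailable", http.StatusServiceUnavailable)
    return
  }

  if !store.TryBegin(key, fp, time.Now().Add(ttl)) { // insert if absent
    w.WriteHeader(http.StatusAccepted)
    io.WriteString(w, opJSON(key))
    return
  }

  res := DoBusiness(r.Context())
  store.Complete(key, res)
  Write(w, res)
}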

Testing Idempotency

  • Unit: Key–payload binding, replay equality, mismatch returns 409.
  • Concurrency: Fire N identical requests; assert only one side effect.
  • Resilience: Kill the server after committing the effect but before responding; retry must return the stored response.
  • Property tests: Randomized bodies, fuzzing headers, clock skew.
  • Chaos: Inject network timeouts and 5xx to measure replay rate and latency.
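
The concurrency test can be sketched with threads and a barrier so that all requests hit the store at the same moment. `InMemoryStore` is a hypothetical stand-in for the real store; against a real service you would fire HTTP requests and count rows or ledger entries instead.

```python
import threading


class InMemoryStore:
    """Stand-in store whose try_begin is atomic per key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def try_begin(self, key: str) -> bool:
        with self._lock:
            if key in self._records:
                return False
            self._records[key] = "in_progress"
            return True


def run_concurrent(store: InMemoryStore, key: str, n: int) -> int:
    """Fire n identical requests at once and return how many executed the side effect."""
    side_effects = []
    effects_lock = threading.Lock()
    barrier = threading.Barrier(n)  # release all workers together

    def worker():
        barrier.wait()
        if store.try_begin(key):
            with effects_lock:
                side_effects.append(key)  # the "side effect"

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(side_effects)
```

The assertion to make is that `run_concurrent(store, key, n)` returns exactly 1 for any `n`. Running the test many times in a loop raises the chance of catching a race that a single run would miss.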

Quick Checklist

  • Header: Idempotency-Key required for POST/PATCH/side‑effecting endpoints.
  • Bind: method + path + body + principal → fingerprint.
  • Store: durable record with unique key, request fingerprint, final response.
  • Concurrency: single-writer per key; others wait or get 202/409.
  • Errors: 409 on mismatch; never store transient 5xx as final.
  • TTLs: business‑appropriate and documented.
  • Observability: metrics, logs, and dashboards.
  • Domain guards: unique business identifiers to gate irreversible effects.

Conclusion

Idempotency is not a luxury—it’s a reliability contract. With a clear key strategy, a durable store, careful status codes, and idempotent consumers, your API can withstand retries, crashes, and duplicates without harming correctness or user trust.
