REST API Caching Strategies with ETags: A Practical Guide

A practical guide to REST API caching with ETags—how they work, when to use them, headers to pair with, and implementation tips with code.

ASOasis

Why ETags belong in every REST API

Caching is one of the most cost‑effective performance levers for APIs. It trims latency, shrinks bandwidth, and reduces CPU load—often without changing business logic. The challenge is freshness: clients need the latest state without always refetching the full payload. That’s where validators—especially ETags—shine. They let clients ask “has this changed?” and the server answer with a tiny 304 Not Modified when nothing did.

This article covers practical REST API caching strategies centered on ETags: how they work, when to use strong vs. weak tags, how to combine them with Cache-Control, Vary, and CDNs, and implementation tips with code.

A quick refresher: expiration vs. validation

HTTP caching offers two complementary models:

  • Expiration: the server declares how long a response is fresh using Cache-Control (e.g., max-age=60). While fresh, caches serve it without contacting the origin.
  • Validation: when a cached response becomes stale, or on every reuse if the server sent no-cache or no usable freshness information, the client sends a conditional request with a validator. If the representation is unchanged, the origin replies 304 Not Modified; otherwise, it returns a new 200 with the updated body.

ETag is the most precise validator because it represents the identity of the exact response representation.
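How the two models interact can be sketched as a small decision function. This is a simplified illustration, not a full cache: the entry shape (storedAtMs, maxAgeSec, etag) and the name nextAction are hypothetical, and real caches also account for the Age header, heuristics, and directives such as no-cache.

```javascript
// Decide what a cache should do with a stored response (simplified model).
// `entry` is a hypothetical record: { storedAtMs, maxAgeSec, etag }.
function nextAction(entry, nowMs) {
  if (!entry) return 'fetch';                    // nothing stored yet
  const ageSec = (nowMs - entry.storedAtMs) / 1000;
  if (ageSec < entry.maxAgeSec) return 'serve';  // expiration model: still fresh
  if (entry.etag) return 'revalidate';           // validation model: ask with If-None-Match
  return 'fetch';                                // stale and no validator to send
}
```

A response stored with max-age=60 and an ETag is served locally for the first minute, then revalidated with a conditional request instead of being refetched in full.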

ETags explained

An ETag (entity tag) is an opaque string that uniquely identifies a specific representation of a resource.

  • Strong ETag: indicates byte‑for‑byte equality. Two responses that carry the same strong ETag must have identical content, so the tag must change whenever the bytes change, including when the content encoding differs. Syntax example: "abc123" (the double quotes are part of the value).
  • Weak ETag: indicates semantic equivalence without byte identity. Useful when trivial differences (whitespace, field order) don’t matter. Syntax: W/"abc123".

Guidelines:

  • Use strong ETags by default for exact representations.
  • Use weak ETags when different encodings or minor formatting changes would otherwise invalidate the cache too often, or when generating a canonical byte‑identical form is expensive.
  • ETags are opaque—don’t leak business meaning, secrets, or PII inside them.
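The difference between strong and weak tags shows up in how they are compared. The sketch below follows the two comparison functions HTTP defines (strong and weak); the helper names parseETag, strongMatch, and weakMatch are made up for this example.

```javascript
// Parse one entity tag such as "abc123" or W/"abc123".
// Returns null when the value is not a well-formed tag.
function parseETag(raw) {
  const m = /^(W\/)?"([^"]*)"$/.exec(raw.trim());
  if (!m) return null;
  return { weak: Boolean(m[1]), value: m[2] };
}

// Strong comparison: both tags must be strong and their opaque values equal.
function strongMatch(a, b) {
  const x = parseETag(a);
  const y = parseETag(b);
  return Boolean(x && y) && !x.weak && !y.weak && x.value === y.value;
}

// Weak comparison: opaque values equal; the W/ flag is ignored.
function weakMatch(a, b) {
  const x = parseETag(a);
  const y = parseETag(b);
  return Boolean(x && y) && x.value === y.value;
}
```

If-None-Match uses weak comparison, while If-Match uses strong comparison, which is why a weak tag can never satisfy an If-Match precondition.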

Conditional GET with If-None-Match

The client workflow:

  1. GET /items/42 → server returns 200 OK with ETag: "v7" and Cache-Control.
  2. Later, the client sends GET /items/42 with If-None-Match: "v7".
  3. If unchanged, server returns 304 Not Modified with no body; client reuses its cached copy.

Example with curl:

# Initial fetch
curl -i https://api.example.com/items/42
# ... ETag: "v7"

# Conditional re-fetch
curl -i https://api.example.com/items/42 \
  -H 'If-None-Match: "v7"'
# 304 Not Modified, tiny response

Benefits:

  • Saves bandwidth and CPU.
  • Preserves correctness—the origin remains the source of truth for freshness checks.
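From the client side, the conditional GET workflow above can be sketched with two small helpers. Browsers do this automatically through their HTTP cache; the sketch is for non-browser clients. The Map-based cache and the names conditionalHeaders and handleResponse are hypothetical.

```javascript
// Hypothetical in-memory client cache: URL -> { etag, body }.
const cache = new Map();

// Build request headers for a URL, adding If-None-Match when a validator is stored.
function conditionalHeaders(url) {
  const entry = cache.get(url);
  return entry ? { 'If-None-Match': entry.etag } : {};
}

// Fold a response into the cache and return the body the caller should use.
function handleResponse(url, status, etag, body) {
  if (status === 304) {
    const entry = cache.get(url);
    if (!entry) throw new Error('304 received without a cached copy');
    return entry.body; // unchanged: reuse the stored body
  }
  if (status === 200 && etag) cache.set(url, { etag, body });
  return body;
}
```

A caller would pass conditionalHeaders(url) as the request headers, then hand the response status, ETag header, and body to handleResponse.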

Optimistic concurrency with If-Match

ETags also protect writes against lost updates.

Workflow:

  1. Client reads resource → receives ETag: "v7".
  2. Client updates with PUT/PATCH and If-Match: "v7".
  3. If the resource changed in the meantime (now "v8"), the server rejects with 412 Precondition Failed, prompting the client to re-read and merge.

Example:

curl -i -X PATCH https://api.example.com/items/42 \
  -H 'Content-Type: application/json' \
  -H 'If-Match: "v7"' \
  --data '{"name":"New title"}'
# 200 OK on success, or 412 Precondition Failed if out-of-date

Use cases:

  • Collaborative editing
  • High‑contention resources
  • Mobile/offline sync
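On the server, the If-Match check in the workflow above reduces to a small predicate. This is a minimal sketch: the name ifMatchPasses is hypothetical, currentEtag is assumed to be null when the resource does not exist, and the naive comma split assumes tags contain no commas.

```javascript
// Evaluate an If-Match header against the resource's current ETag.
// Returns true when the write may proceed, false when the server should send 412.
function ifMatchPasses(ifMatchHeader, currentEtag) {
  if (ifMatchHeader === undefined) return true;    // client sent no precondition
  const value = ifMatchHeader.trim();
  if (value === '*') return currentEtag !== null;  // "*" matches any existing representation
  // If-Match uses strong comparison, so a weak current tag can never match.
  if (currentEtag === null || currentEtag.startsWith('W/')) return false;
  // Weak tags in the list fail automatically because they differ textually.
  return value.split(',').map((t) => t.trim()).includes(currentEtag);
}
```

For real protection against lost updates, the comparison and the write should happen atomically, for example as a conditional update on a version column.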

Designing good ETags

Goals: stable, safe, and cheap to compute.

Common patterns:

  • Hash of response bytes (e.g., SHA‑256). Strong validator; simplest mental model.
  • Version column or monotonically increasing revision number from storage. Fast; works well when every update increments version.
  • Last‑updated timestamp with sufficient precision. Beware collisions if timestamps are coarse (e.g., one‑second resolution) or clocks are skewed across nodes.

Tips:

  • If you compress at the origin, compute the ETag over the compressed bytes (or derive a distinct tag per encoding) and send Vary: Accept-Encoding so caches keep encodings separated. For strong validation, the bytes that reach the client must match the ETag basis.
  • For weak ETags on JSON, you can hash a canonical form (sorted keys, normalized whitespace) to avoid cache busting from serialization noise.
  • Keep them opaque; prefer base64url or hex strings.
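The first two patterns above might look like this in Node.js. The function names are illustrative, and the version-based variant assumes your storage layer increments a revision number on every write. CommonJS require is used so the snippet runs standalone.

```javascript
const crypto = require('node:crypto');

// Pattern 1: hash of the exact response bytes (strong validator).
function etagFromBytes(bytes) {
  const digest = crypto.createHash('sha256').update(bytes).digest('base64url');
  return `"${digest}"`;
}

// Pattern 2: storage revision number; assumes every write increments it.
function etagFromVersion(version) {
  return `"v${version}"`;
}
```

The hash variant needs the full body before the headers can be sent, while the version variant can answer a conditional request without serializing the resource at all.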

Cache-Control that pairs well with ETags

The validator decides “is it the same?”; Cache-Control decides “when to ask again?” Combine both.

Useful directives:

  • max-age=60: allow caches to reuse for 60 seconds without revalidation.
  • s-maxage=300: give shared caches (CDNs, proxies) a different TTL.
  • must-revalidate: stale responses must be revalidated before use.
  • stale-while-revalidate=30: allow caches to serve a stale response while revalidating in the background—great for latency.
  • stale-if-error=300: serve stale content if the origin is failing.

Example header set for a list endpoint that can be slightly out of date:

Cache-Control: max-age=30, s-maxage=120, stale-while-revalidate=30, stale-if-error=300
ETag: "a1f5..."
Vary: Accept, Accept-Encoding
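To keep directives consistent across endpoints, a header value like the one above can be assembled from options. This is only a sketch; cacheControl and its option names are hypothetical, not part of any library.

```javascript
// Assemble a Cache-Control value from a plain options object.
function cacheControl(opts) {
  const parts = [];
  if (opts.private) parts.push('private');
  if (opts.maxAge !== undefined) parts.push(`max-age=${opts.maxAge}`);
  if (opts.sMaxAge !== undefined) parts.push(`s-maxage=${opts.sMaxAge}`);
  if (opts.mustRevalidate) parts.push('must-revalidate');
  if (opts.staleWhileRevalidate !== undefined) {
    parts.push(`stale-while-revalidate=${opts.staleWhileRevalidate}`);
  }
  if (opts.staleIfError !== undefined) parts.push(`stale-if-error=${opts.staleIfError}`);
  return parts.join(', ');
}
```

Calling it with maxAge 30, sMaxAge 120, staleWhileRevalidate 30, and staleIfError 300 reproduces the list-endpoint header shown above.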

Vary and cache keys

Caches store responses keyed by method + URI + selected request headers. Use Vary to declare which headers affect the representation.

  • Vary: Accept: if you return different media types (e.g., JSON vs. CSV).
  • Vary: Accept-Encoding: if you compress.
  • Vary: Authorization: if you serve user‑specific responses while still allowing shared caches to avoid mixing users. Many CDNs bypass caching when Authorization is present unless you opt in.

Avoid unnecessary Vary headers—they fragment the cache.
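The effect of Vary on cache keys can be illustrated with a simplified key function. Real caches compare stored request header values rather than concatenating strings, so treat cacheKey as a hypothetical model; it assumes request header names are already lower-cased, as Node.js provides them.

```javascript
// Build a cache key from method, URL, and the request headers named in Vary.
function cacheKey(method, url, varyHeader, requestHeaders) {
  const names = (varyHeader || '')
    .split(',')
    .map((n) => n.trim().toLowerCase())
    .filter(Boolean)
    .sort();
  // A header the request did not send contributes an empty value.
  const selected = names.map((n) => `${n}=${requestHeaders[n] ?? ''}`);
  return [method.toUpperCase(), url, ...selected].join('|');
}
```

Each header listed in Vary multiplies the number of distinct keys for the same URL, which is how an overly broad Vary fragments the cache.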

Authentication and privacy

  • For user‑specific data, prefer Cache-Control: private, max-age=0, must-revalidate with ETags so browsers can validate but shared caches do not store the response.
  • For highly sensitive data, use Cache-Control: no-store to prevent any caching.
  • Do not encode user identifiers in ETags. Keep validators representation‑scoped, not user‑scoped.

Compression, transformations, and strong vs. weak tags

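Strong validators require byte identity of the response content. Small differences in the body, such as a different gzip level or extra whitespace, break equality; header ordering does not, because headers are not part of the compared bytes.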

  • If you need strong validation and use compression, compute the ETag over the exact compressed payload you send, and include Vary: Accept-Encoding.
  • If intermediaries transform responses (minification, reformatting), switch to weak ETags or stop the transformations for API routes.
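Computing the tag over the bytes actually sent might look like the sketch below. It is deliberately simplified: the Accept-Encoding check ignores q-values, only gzip is handled, and encodeWithEtag and etagOfSentBytes are hypothetical names.

```javascript
const crypto = require('node:crypto');
const zlib = require('node:zlib');

// Strong ETag over whatever byte sequence goes on the wire.
function etagOfSentBytes(bytes) {
  return '"' + crypto.createHash('sha256').update(bytes).digest('hex') + '"';
}

// Encode the body for the negotiated encoding, then tag the encoded payload.
function encodeWithEtag(body, acceptEncoding) {
  const wantsGzip = /\bgzip\b/.test(acceptEncoding || ''); // simplified negotiation
  const payload = wantsGzip ? zlib.gzipSync(body) : Buffer.from(body);
  return {
    payload,
    headers: {
      ETag: etagOfSentBytes(payload),
      Vary: 'Accept-Encoding',
      ...(wantsGzip ? { 'Content-Encoding': 'gzip' } : {}),
    },
  };
}
```

Because the gzip and identity payloads differ, each encoding gets its own strong tag, and Vary: Accept-Encoding keeps caches from serving one in place of the other.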

CDNs and reverse proxies

ETags work well end‑to‑end if intermediaries forward them intact.

  • Use s-maxage and stale-while-revalidate to empower CDNs to keep traffic off the origin.
  • Ensure your CDN handles If-None-Match correctly: it should answer 304 itself when the client’s tag matches its cached copy, and otherwise forward the conditional request to the origin and relay the 304. Most do by default.
  • When using Authorization, decide policy: bypass shared caches, or cache per user with Vary: Authorization and per‑user cache keys (complex, error‑prone). Many teams keep authenticated endpoints private or no-store; with private, clients still send conditional requests to the origin.
  • Some CDNs support Surrogate-Control or custom surrogate keys for group invalidation. This complements, not replaces, ETags.

Implementation examples

Node.js/Express: strong ETag from canonical JSON

Express ships with ETag support, but you may want a content‑based strong tag for JSON.

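import crypto from 'node:crypto';
import express from 'express';

const app = express();

// `db` is assumed to be your data-access layer (not shown here);
// findById is assumed to resolve to null when the item does not exist.

// Serialize with object keys sorted at every nesting level, so equal data
// always produces identical bytes.
function canonicalJson(value) {
  return JSON.stringify(value, (key, val) =>
    val && typeof val === 'object' && !Array.isArray(val)
      ? Object.fromEntries(Object.keys(val).sort().map((k) => [k, val[k]]))
      : val
  );
}

function etagOf(body) {
  const hash = crypto.createHash('sha256').update(body).digest('base64url');
  return '"' + hash + '"'; // strong ETag
}

// True when a conditional header lists the given tag or is "*".
// If-None-Match uses weak comparison, so a W/ prefix is ignored when `weak` is true.
function headerMatches(header, etag, weak) {
  if (header.trim() === '*') return true;
  const strip = (t) => (weak ? t.replace(/^W\//, '') : t);
  return header.split(',').some((t) => strip(t.trim()) === etag);
}

app.get('/items/:id', async (req, res) => {
  const item = await db.items.findById(req.params.id);
  if (!item) return res.status(404).json({ error: 'Not Found' });
  const body = canonicalJson(item);
  const etag = etagOf(body);

  // Set validator and cache headers first so the 304 carries them too.
  res.setHeader('ETag', etag);
  res.setHeader('Cache-Control', 'max-age=30, stale-while-revalidate=30');
  res.setHeader('Vary', 'Accept, Accept-Encoding');

  const ifNoneMatch = req.headers['if-none-match'];
  if (ifNoneMatch && headerMatches(ifNoneMatch, etag, true)) {
    return res.status(304).end();
  }

  res.type('application/json').send(body);
});

app.patch('/items/:id', express.json(), async (req, res) => {
  const current = await db.items.findById(req.params.id);
  if (!current) return res.status(404).json({ error: 'Not Found' });
  const currentEtag = etagOf(canonicalJson(current));

  // If-Match uses strong comparison; a stale tag means someone else wrote first.
  // Note: this read-then-write is not atomic; use a conditional update in storage
  // if concurrent writers are possible.
  const ifMatch = req.headers['if-match'];
  if (ifMatch && !headerMatches(ifMatch, currentEtag, false)) {
    return res.status(412).json({ error: 'Precondition Failed' });
  }

  const updated = await db.items.update(req.params.id, req.body);
  const body = canonicalJson(updated);
  res.setHeader('ETag', etagOf(body));
  // Send the canonical bytes the ETag was computed over, not a re-serialized copy.
  res.type('application/json').send(body);
});

app.listen(3000);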

Nginx as reverse proxy

Ensure ETags from upstream are preserved and compression is coordinated.

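# Request headers, including If-None-Match and If-Match, are forwarded to the
# upstream by default. With proxy_cache enabled, nginx withholds conditional
# headers from the upstream and answers them from its own cache instead.
proxy_pass_request_headers on; # the default, shown for clarity

# Upstream ETag headers pass through unchanged unless nginx modifies the body,
# so no extra directive is needed to preserve them.

# Coordinate compression and caching
gzip on;
gzip_types application/json; # only text/html is compressed by default
gzip_vary on; # adds Vary: Accept-Encoding
# When nginx gzips a response itself, it downgrades a strong upstream ETag
# to a weak one (W/"...").

# Cache policy example (if using nginx cache or CDN in front).
# add_header appends a header; omit it if the upstream already sends Cache-Control.
add_header Cache-Control "s-maxage=120, stale-while-revalidate=30";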

Testing and observability

  • Use curl -I to inspect headers quickly.
  • Simulate validators: curl -i -H 'If-None-Match: "xyz"' followed by the URL.
  • Log conditional hits/misses at the origin; emit metrics for 304 rates, average payload size saved, and origin request reductions.
  • Watch for 412 Precondition Failed after deployments—this often indicates client write races you’re now catching correctly.
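The 304 and 412 metrics suggested above can start as simple in-process counters. This is a minimal sketch with hypothetical names (createValidationStats, record, notModifiedRate); in production you would more likely export these figures through your existing metrics library.

```javascript
// Minimal in-process counters for conditional-request outcomes.
function createValidationStats() {
  const counts = { total: 0, notModified: 0, preconditionFailed: 0 };
  return {
    // Call once per finished response with its status code.
    record(status) {
      counts.total += 1;
      if (status === 304) counts.notModified += 1;
      if (status === 412) counts.preconditionFailed += 1;
    },
    // Share of responses answered with 304 (0 when nothing has been recorded).
    notModifiedRate() {
      return counts.total === 0 ? 0 : counts.notModified / counts.total;
    },
    // Copy of the raw counters, safe to log or export.
    snapshot() {
      return { ...counts };
    },
  };
}
```

In an Express app, one way to feed it is a middleware that registers res.on('finish', ...) and calls record(res.statusCode).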

Common pitfalls to avoid

  • Per‑node ETags that differ across replicas (e.g., auto‑generated by file inode); clients bounce between servers and always miss. Generate ETags at the application layer or ensure deterministic computation across nodes.
  • Hashing uncompressed bytes but serving compressed payloads under the same strong ETag, so one tag covers different byte streams. Base the ETag on the bytes actually sent (or give each encoding its own tag) and send Vary: Accept-Encoding.
  • Omitting Vary when content negotiation is in play (Accept, Accept-Language, Accept-Encoding), leading to wrong representations from caches.
  • Encoding secrets or user ids inside ETags.
  • Overusing no-store; it disables useful browser validation and forces full refetches.
  • Returning 200 with empty body on If-None-Match instead of 304 Not Modified.
  • Weak ETags where clients rely on byte identity (e.g., Range requests + integrity checks).
  • TTLs that are too long for hot, frequently changing feeds; prefer short max-age plus validators and stale-while-revalidate.
  • Per‑user responses cached in shared caches without Vary: Authorization or private; can leak data.
  • Using Last-Modified with coarse granularity as the only validator; ETags are typically safer and more precise.

Decision recipes

  • Default for read‑mostly JSON: strong ETag + Cache-Control: max-age=30..120, stale-while-revalidate=30, Vary: Accept, Accept-Encoding.
  • Highly dynamic resources: set max-age=0, must-revalidate, still include ETag so clients validate cheaply.
  • Expensive to canonicalize or transformed by intermediaries: use weak ETag (W/"…") and short freshness.
  • Authenticated endpoints: usually Cache-Control: private, max-age=0, must-revalidate + ETag; or no-store for highly sensitive data.
  • Concurrency‑sensitive writes: require If-Match and return 412 on mismatch; consider 428 Precondition Required if clients must always send preconditions.

Putting it all together

The most effective strategy is layered:

  • Emit ETags for every cacheable GET.
  • Pair with sensible Cache-Control to cut round‑trips while keeping data fresh.
  • Use Vary deliberately to shape cache keys and avoid leaks.
  • Enforce optimistic concurrency on writes with If-Match.
  • Measure, iterate, and adjust TTLs by endpoint based on traffic patterns.

Do this well and you’ll deliver faster responses, lower operating costs, and a more robust client experience—without sacrificing correctness.
