API Feature Flag Integration Guide: Patterns, Security, and Production Rollouts

A practical, end-to-end guide to integrating feature flags via API—architecture, security, caching, rollouts, observability, and production checklists.

ASOasis

Overview

Feature flags let you ship code safely by toggling behavior at runtime. Integrating flags via an API (rather than only SDKs) gives you language-agnostic control, simpler governance, and the ability to run flags from any compute surface—servers, jobs, edge functions, or internal tools. This guide shows how to design and implement a robust API-first feature flag workflow that scales to production.

When and why to use an API

  • Heterogeneous stack: multiple languages, bespoke runtimes, or serverless/edge where SDKs are heavy.
  • Centralized governance: audit, approval, and policy live behind your API gateway.
  • Stateless compute: jobs and ephemeral workers can fetch decisions on demand.
  • Custom evaluation: route calls through your privacy, fraud, or entitlement layers.

Tradeoff: API evaluation adds network latency and a dependency on your flag service. You’ll mitigate this with caching, streaming updates, and graceful fallbacks described below.

Core concepts

  • Flag: a named toggle controlling behavior (boolean, multivariate, JSON payload).
  • Targeting attributes: context used to evaluate (userId, accountId, region, plan, appVersion, requestIp, etc.).
  • Segments: saved predicates (e.g., beta_testers, enterprise_accounts).
  • Rules: ordered targeting logic (if region == “EU” and plan == “pro” → variant “B”).
  • Bucketing: deterministic assignment to variants using hashing and salts.
  • Environments: dev, staging, prod—isolated flag states.
  • Governance: approvals, change reasons, owners, TTLs, and audit logs.
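
To make rule ordering concrete, here is a minimal Python sketch of first-match-wins evaluation. The `Rule` shape, the `evaluate` function, and the sample rules are illustrative assumptions, not any vendor's schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a predicate over the context plus the variant it serves."""
    rule_id: str
    matches: Callable[[Mapping[str, Any]], bool]
    variant: Any


def evaluate(rules: Sequence[Rule], context: Mapping[str, Any], default: Any) -> Tuple[Any, str]:
    """Return (variant, reason). Rules are checked in order and the first match wins."""
    for rule in rules:
        if rule.matches(context):
            return rule.variant, rule.rule_id
    # No rule matched: fall back to the caller-supplied default.
    return default, "default"


# The example from the list above: region == "EU" and plan == "pro" -> variant "B".
rules = [
    Rule("rule-1", lambda c: c.get("region") == "EU" and c.get("plan") == "pro", "B"),
    Rule("rule-2", lambda c: c.get("plan") == "pro", "A"),
]
```

Because order matters, a pro user in the EU gets "B" from rule-1 even though rule-2 would also match.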

Reference architecture

  1. Caller builds a context object (attributes) and default value.
  2. Caller requests an evaluation from the Flag API.
  3. API evaluates: rules → segments → bucketing → constraints.
  4. API returns the chosen variant, payload, and metadata (reason, ruleId, requestId, TTL).
  5. Caller caches decision locally and instruments metrics.

For high-throughput paths, combine: short-lived caches (30–120s), background refresh, and streaming updates to minimize latency and protect the control plane.
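
As one way to picture step 5 and the short-lived caches mentioned above, the following Python sketch keeps decisions in a bounded LRU with per-entry expiry. `DecisionCache`, `context_hash`, and `MISS` are hypothetical names introduced for illustration; the clock is injectable so expiry can be tested without sleeping.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Callable, Mapping

MISS = object()  # sentinel so a cached False or None is not mistaken for a miss


def context_hash(context: Mapping[str, Any]) -> str:
    """Stable digest of a context; sort_keys makes key order irrelevant."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DecisionCache:
    """LRU cache of flag decisions keyed by (flagKey, context hash)."""

    def __init__(self, max_entries: int = 10_000, clock: Callable[[], float] = time.monotonic):
        self._entries: OrderedDict = OrderedDict()
        self._max_entries = max_entries
        self._clock = clock

    def get(self, flag_key: str, context: Mapping[str, Any]) -> Any:
        key = (flag_key, context_hash(context))
        entry = self._entries.get(key)
        if entry is None:
            return MISS
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return MISS
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, flag_key: str, context: Mapping[str, Any], value: Any, ttl_seconds: float) -> None:
        key = (flag_key, context_hash(context))
        self._entries[key] = (value, self._clock() + ttl_seconds)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, flag_key: str) -> None:
        """Drop every cached decision for one flag, e.g. on a streamed invalidation."""
        for key in [k for k in self._entries if k[0] == flag_key]:
            del self._entries[key]
```

A caller would check `get(...)` first, call the Flag API only on `MISS`, then `put(...)` the decision with the `ttlSeconds` the API returned.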

Security and privacy

  • Authentication: prefer OAuth 2.0 client credentials for server-to-server calls and rotate secrets regularly. For internal services, mTLS is a strong alternative.
  • Authorization: least-privileged scopes (e.g., flags:read, flags:write). Never embed full-access secrets in user devices.
  • Request signing: optional HMAC for tamper detection between trusted services.
  • PII: hash stable identifiers (SHA-256 with a per-environment salt) for bucketing; only send attributes needed for rules. Redact or tokenize sensitive fields.
  • Data residency: keep evaluation in-region; if exporting decisions, avoid raw identifiers.
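
The request-signing bullet can be sketched with Python's standard `hmac` module. The canonical string layout (method, path, timestamp, body digest), the five-minute skew window, and both function names are assumptions made for this example; a real deployment would fix its own format.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 over a canonical string (the layout is an assumption)."""
    body_digest = hashlib.sha256(body).hexdigest()
    message = f"{method.upper()}\n{path}\n{timestamp}\n{body_digest}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int,
                   signature: str, now: Optional[int] = None, max_skew_seconds: int = 300) -> bool:
    """Reject stale timestamps, then compare signatures in constant time."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > max_skew_seconds:
        return False
    expected = sign_request(secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

Including the timestamp in the signed string limits replay of captured requests; `hmac.compare_digest` avoids leaking information through comparison timing.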

Data modeling

  • Key naming: use kebab or snake case (e.g., checkout-new-flow). Prefix for domain (checkout.new_flow) if that helps ownership.
  • Types: boolean, percentage rollout, multivariate (e.g., {"variant": "B", "payload": {"color": "#ff4d4f"}}), and JSON payloads.
  • Metadata: owner, team, TTL/sunset date, jiraTicket, riskLevel, killSwitch flag, and linked dashboards.
  • Segments: dynamic (rule-based) and static (uploaded IDs). Keep segment size reasonable; shard or compress when large.

API surface (reference)

Below is a vendor-neutral sketch you can adapt.

  • Fetch flags (config sync):
    • GET /v1/flags?environment=prod
    • Response headers: ETag, Cache-Control: max-age=60
  • Evaluate in batch (recommended for hot paths):
    • POST /v1/evaluate
    • Body: list of contexts; options for includeReasons, returnPayloads, explain.
  • Manage flags:
    • POST /v1/flags, PATCH /v1/flags/{key}, DELETE /v1/flags/{key}
    • Concurrency: If-Match with ETag.
  • Segments:
    • POST /v1/segments, PATCH /v1/segments/{key}
  • Streaming updates:
    • GET /v1/stream (Server-Sent Events) or /v1/ws (WebSocket) to push invalidations.
  • Audit:
    • GET /v1/audit?entity=flag:checkout-new-flow

Example: evaluate request

{
  "environment": "prod",
  "requests": [
    {
      "flagKey": "checkout-new-flow",
      "defaultValue": false,
      "context": {
        "userId": "8a7f4c91",
        "accountId": "acme-co-1337",
        "region": "US",
        "plan": "pro",
        "appVersion": "2.3.1"
      }
    },
    {
      "flagKey": "pricing-experiment",
      "defaultValue": "control",
      "context": {
        "accountId": "acme-co-1337",
        "country": "DE",
        "currency": "EUR"
      }
    }
  ],
  "options": {"includeReasons": true, "returnPayloads": true}
}

Example: evaluate response

{
  "requestId": "r-01J3Z3ZQ9P3ZQ7",
  "decisions": [
    {
      "flagKey": "checkout-new-flow",
      "variant": true,
      "reason": {"ruleId": "rule-2", "segment": "beta_testers"},
      "ttlSeconds": 60
    },
    {
      "flagKey": "pricing-experiment",
      "variant": "B",
      "payload": {"price": 23.00, "currency": "EUR"},
      "bucketing": {"seed": "pricing-experiment", "hash": "e1b3…"},
      "ttlSeconds": 120
    }
  ]
}
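
A caller still has to reconcile the response with what it asked for. This Python sketch maps each requested flag to its returned variant and falls back to the request's `defaultValue` when a decision is missing or malformed. The field names follow the example payloads above; `resolve_decisions` is a hypothetical helper.

```python
from typing import Any, Dict, List, Mapping


def resolve_decisions(requests_sent: List[Mapping[str, Any]],
                      response: Mapping[str, Any]) -> Dict[str, Any]:
    """Map each requested flagKey to its variant, or to its defaultValue if absent."""
    returned = {
        decision["flagKey"]: decision
        for decision in response.get("decisions", [])
        if "flagKey" in decision
    }
    resolved: Dict[str, Any] = {}
    for request in requests_sent:
        decision = returned.get(request["flagKey"])
        if decision is not None and "variant" in decision:
            resolved[request["flagKey"]] = decision["variant"]
        else:
            # Missing or incomplete decision: use the safe default the caller supplied.
            resolved[request["flagKey"]] = request["defaultValue"]
    return resolved
```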

Quick-start integration

cURL (smoke test)

curl -s -X POST https://flags.example.com/v1/evaluate \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "environment":"prod",
    "requests":[{"flagKey":"checkout-new-flow","defaultValue":false,"context":{"userId":"8a7f4c91","plan":"pro"}}],
    "options":{"includeReasons":true}
  }'

Node.js (fetch)

import fetch from 'node-fetch'; // optional on Node 18+, which provides a global fetch

const body = {
  environment: 'prod',
  requests: [{
    flagKey: 'checkout-new-flow',
    defaultValue: false,
    context: { userId: '8a7f4c91', plan: 'pro' }
  }],
  options: { includeReasons: true }
};

const res = await fetch('https://flags.example.com/v1/evaluate', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.FLAG_TOKEN}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(body)
});
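// Surface HTTP errors instead of parsing an error body as if it were a decision.
if (!res.ok) throw new Error(`Flag API returned ${res.status}`);
const json = await res.json();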

Python (requests)

import os, requests

payload = {
  'environment': 'prod',
  'requests': [{
    'flagKey': 'checkout-new-flow',
    'defaultValue': False,
    'context': {'userId': '8a7f4c91', 'plan': 'pro'}
  }]
}

auth = {'Authorization': f"Bearer {os.environ['FLAG_TOKEN']}"}
resp = requests.post('https://flags.example.com/v1/evaluate', json=payload, headers=auth, timeout=0.2)
resp.raise_for_status()
print(resp.json())

Caching and propagation

  • Client-side cache: cache decisions by (flagKey, context hash) for ttlSeconds. Use an LRU to bound memory.
  • Config cache: GET /v1/flags with ETag; on 304 Not Modified reuse prior config.
  • Streaming: subscribe to /v1/stream; invalidate the cache on messages such as {"type":"invalidate","flagKey":"checkout-new-flow"}.
  • Backoff and jitter: respect Retry-After and 429; exponential backoff with decorrelated jitter.
  • Budget: target p50 evaluation latency under 10 ms for local cache hits and at most 50–100 ms for remote evaluations.
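
The backoff bullet can be sketched as follows in Python. The delay rule is the commonly described "decorrelated jitter" variant, where each delay is drawn between the base and three times the previous delay, then capped. The parameter defaults and the `next_delay` helper are assumptions, and only the seconds form of `Retry-After` is handled.

```python
import random
from typing import Iterator, Optional


def decorrelated_jitter(base: float = 0.1, cap: float = 10.0,
                        rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield retry delays in seconds, each within [base, cap]."""
    rng = rng or random.Random()
    delay = base
    while True:
        # Draw between the base and 3x the previous delay, then apply the cap.
        delay = min(cap, rng.uniform(base, delay * 3))
        yield delay


def next_delay(backoff: Iterator[float], retry_after: Optional[str]) -> float:
    """Honor the server's Retry-After (seconds form) when it exceeds our own delay."""
    computed = next(backoff)
    if retry_after is not None:
        try:
            return max(computed, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    return computed
```

A retry loop would call `next_delay(backoff, response_headers.get("Retry-After"))` after each 429 or 5xx and sleep for the returned duration, giving up after a bounded number of attempts.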

Rollout strategies

  • Release toggles: hide incomplete code; default off in prod; enable for internal users.
  • Ops toggles (kill switches): fast-disable risky subsystems.
  • Permission toggles: gate premium features by plan/entitlements.
  • Experiment toggles: multivariate, 50/50 or weighted rollouts.
  • Progressive delivery: ramp 1% → 5% → 10% → 25% → 50% → 100%, pausing on error budgets or business KPI regressions.
  • Sticky bucketing: use consistent hashing of a stable identifier and seed; never re-bucket users mid-experiment.
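
Sticky bucketing and progressive ramps fit together naturally: hash a stable identifier with the flag's seed into a fixed bucket, then admit every bucket below the current percentage. The Python sketch below assumes SHA-256, a `seed:id` input format, and 10,000 buckets; all three are illustrative choices that every evaluator would need to share.

```python
import hashlib

BUCKETS = 10_000  # 0.01% granularity


def bucket_for(seed: str, stable_id: str) -> int:
    """Deterministically map (seed, id) to a bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{seed}:{stable_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_rollout(seed: str, stable_id: str, percentage: float) -> bool:
    """True if this id falls inside the first `percentage` percent of buckets."""
    return bucket_for(seed, stable_id) < percentage / 100 * BUCKETS
```

Because a user's bucket never changes, anyone admitted at 5% stays admitted at 10% and beyond; raising the percentage only adds users. Changing the seed or the hash would reshuffle everyone, which is why both should be documented and left fixed for the life of an experiment.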

Observability and governance

  • Metrics: evaluation QPS, p95 latency, error rate, cache hit ratio; business KPIs linked to key flags.
  • Tracing: include requestId and flag decisions as span attributes (flag.key, flag.variant, flag.ruleId).
  • Logging: structured logs with version, environment, and caller service.
  • Audit: who changed what and why; require change reasons and approval for prod flags.
  • TTL and cleanup: every flag has a sunset date; CI fails if expired flags remain active.
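
The cleanup rule lends itself to a small CI check. This Python sketch assumes each flag is exported as a mapping with `key`, `state`, and an ISO-formatted `sunset` date; those field names and the `expired_active_flags` helper are hypothetical.

```python
from datetime import date
from typing import Iterable, List, Mapping


def expired_active_flags(flags: Iterable[Mapping[str, str]], today: date) -> List[str]:
    """Return keys of flags that are past their sunset date but still switched on."""
    return sorted(
        flag["key"]
        for flag in flags
        if flag["state"] == "on" and date.fromisoformat(flag["sunset"]) < today
    )
```

A pipeline step could call this with `date.today()` and exit non-zero when the returned list is not empty, printing the offending keys for the owning teams.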

Reliability patterns

  • Defaults first: always provide a safe defaultValue on evaluate requests.
  • Circuit breaker: if error rate spikes or latency > threshold, short-circuit to defaults and log.
  • Idempotency: for writes (POST /v1/flags, rollout steps), include Idempotency-Key; server must return the same result on retries.
  • Concurrency control: use ETag + If-Match on PATCH to avoid lost updates.
  • Rate limits: on a 429, back off before retrying reads; apply stricter quotas to writes; expose X-RateLimit-* headers so callers can pace themselves.
  • Multi-region: run active-active; pin callers to nearest region; deterministic bucketing must use the same seed and hash across regions and languages.
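
A minimal Python sketch of the circuit-breaker pattern from the list above: after a run of consecutive failures the breaker opens and serves defaults without calling the API, then lets a single trial call through once a cool-down has passed. The thresholds and the `CircuitBreaker` class are assumptions; this version counts failures only and is not thread-safe.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Short-circuits to a default after repeated consecutive failures."""

    def __init__(self, failure_threshold: int = 5, reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], Any], default: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return default  # open: skip the remote call entirely
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            return default
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping each evaluate call as `breaker.call(lambda: fetch_decision(...), default_value)` keeps hot paths responsive during an outage; `fetch_decision` here stands for whatever client function the service uses.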

Mobile, web, and edge

  • Public clients: never expose full-access tokens. Use short-lived, scoped JWTs that list the allowed flag keys and permitted attributes.
  • Offline mode: bundle a minimal bootstrap (defaults or last-known values) signed by the server; verify signature before using.
  • Sync cadence: background refresh every 60–300 seconds; real-time invalidations via WebSocket/SSE where possible.

CI/CD and Infrastructure as Code

Represent flags as code so that changes can be reviewed and versioned.

Example flag config (YAML):

flagKey: checkout-new-flow
environment: prod
state: "off"  # quoted: YAML 1.1 parsers read a bare off as boolean false
owners: ["payments-team"]
rules:
  - id: rule-1
    when: plan == 'pro' and region == 'US'
    variant: true
bucketing:
  seed: checkout-new-flow
metadata:
  jira: PAY-1234
  ttl: 2026-12-31

Pipeline steps:

  • Lint rules (type checks, unknown attributes).
  • Validate no PII in payloads.
  • Unit-test evaluation fixtures.
  • Require approvals for prod env.
  • Apply via GitOps to the flag API.
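
The first two pipeline steps could look roughly like this in Python, operating on a flag config that has already been parsed into a dict. The required-field list, the PII deny-list, and the naive regular expression that pulls attribute names out of a `when` expression are all assumptions for illustration; a real linter would parse the rule grammar properly.

```python
import re
from typing import Any, Iterator, List, Mapping, Set

PII_KEYS = {"email", "phone", "ssn"}  # illustrative deny-list, not exhaustive
# Naive scan: an identifier immediately followed by a comparison operator.
ATTRIBUTE_REF = re.compile(r"\b([A-Za-z_]\w*)\s*(?:==|!=|<=|>=|<|>)")


def _payload_keys(value: Any) -> Iterator[str]:
    """Yield every key found anywhere inside a nested payload."""
    if isinstance(value, dict):
        for key, nested in value.items():
            yield str(key)
            yield from _payload_keys(nested)
    elif isinstance(value, list):
        for item in value:
            yield from _payload_keys(item)


def lint_flag(config: Mapping[str, Any], known_attributes: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config passed."""
    problems: List[str] = []
    for field in ("flagKey", "environment", "owners"):
        if not config.get(field):
            problems.append(f"missing field: {field}")
    for rule in config.get("rules", []):
        rule_id = rule.get("id", "?")
        for attribute in ATTRIBUTE_REF.findall(rule.get("when", "")):
            if attribute not in known_attributes:
                problems.append(f"{rule_id}: unknown attribute {attribute!r}")
        for key in _payload_keys(rule.get("payload")):
            if key.lower() in PII_KEYS:
                problems.append(f"{rule_id}: payload key {key!r} looks like PII")
    return problems
```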

Testing

  • Unit tests: deterministic bucketing and rule precedence with fixed seeds.
  • Contract tests: mock the API with OpenAPI schemas; verify error handling (timeouts, 429s, 5xx).
  • Integration: canary deploy calling the API; assert cache hit ratios and fallback behavior.
  • Chaos drills: deliberately fail the API to confirm safe defaults and circuit breakers.
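
A chaos drill can start as an ordinary unit test that swaps in a failing transport. The Python sketch below uses the standard `unittest` module; `evaluate_or_default` and the transport signature are hypothetical stand-ins for a service's real flag client.

```python
import unittest
from typing import Any, Callable, Mapping

Transport = Callable[[str, Mapping[str, Any]], Mapping[str, Any]]


def evaluate_or_default(transport: Transport, flag_key: str,
                        context: Mapping[str, Any], default: Any) -> Any:
    """Return the API's variant, or the default if the call fails or is malformed."""
    try:
        return transport(flag_key, context)["variant"]
    except Exception:
        return default


class ChaosDrillTest(unittest.TestCase):
    def test_timeout_falls_back_to_default(self):
        def timing_out(flag_key, context):
            raise TimeoutError("simulated outage")
        self.assertIs(evaluate_or_default(timing_out, "checkout-new-flow", {}, False), False)

    def test_malformed_response_falls_back_to_default(self):
        def malformed(flag_key, context):
            return {"unexpected": "shape"}
        self.assertEqual(evaluate_or_default(malformed, "pricing-experiment", {}, "control"), "control")

    def test_healthy_response_is_used(self):
        def healthy(flag_key, context):
            return {"flagKey": flag_key, "variant": "B"}
        self.assertEqual(evaluate_or_default(healthy, "pricing-experiment", {}, "control"), "B")
```

Saved as a test module, this runs with `python -m unittest`; the same fake transports can later be replaced by fault injection against a staging Flag API.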

Failure modes and how to handle them

  • API unreachable: use cached decisions or defaults; log once per interval, not per request (avoid alert storms).
  • Stale config: respect ttlSeconds; surface a metric when cache age > SLO.
  • Hash drift: if migrating hash algorithm, dual-write/dual-evaluate and compare decisions before cutover.
  • Segment bloat: warn when static lists exceed thresholds; migrate to indexed lookups or server-side joins.
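
For the "log once per interval" advice, a thin wrapper around the standard `logging` module is usually enough. In this Python sketch, `IntervalLogger` is a hypothetical helper that emits the first warning for a given key, suppresses repeats inside the interval, and reports how many were suppressed the next time it logs.

```python
import logging
import time
from typing import Callable, Dict


class IntervalLogger:
    """Emit at most one warning per key per interval."""

    def __init__(self, logger: logging.Logger, interval_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._logger = logger
        self._interval = interval_seconds
        self._clock = clock
        self._last_logged: Dict[str, float] = {}
        self._suppressed: Dict[str, int] = {}

    def warning(self, key: str, message: str) -> bool:
        """Log the message unless this key was logged recently. Returns True if logged."""
        now = self._clock()
        last = self._last_logged.get(key)
        if last is not None and now - last < self._interval:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        suppressed = self._suppressed.pop(key, 0)
        self._logger.warning("%s (suppressed %d similar messages)", message, suppressed)
        self._last_logged[key] = now
        return True
```

Pair this with a counter metric incremented on every failure so that dashboards still show the true error rate while logs and alerts stay quiet.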

Migration playbook (from env vars or config files)

  1. Inventory toggles and map owners, risk, and sunset dates.
  2. Create flags in the API with identical semantics; default to current behavior.
  3. Dual-read: call API but trust local value; compare outcomes over a week.
  4. Flip trust to API; keep env var as last-resort override for one release.
  5. Remove legacy toggles; update runbooks and dashboards.
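
Step 3 (dual-read) can be sketched in Python as a wrapper that always returns the legacy value while recording how often the API disagrees. `DualReader` and the shape of `api_lookup` are assumptions for this example.

```python
from typing import Any, Callable, Dict


class DualReader:
    """Calls the flag API for comparison but keeps trusting the local value."""

    def __init__(self, api_lookup: Callable[[str], Any]):
        self._api_lookup = api_lookup
        self.total = 0
        self.errors = 0
        self.mismatches: Dict[str, int] = {}

    def read(self, flag_key: str, local_value: Any) -> Any:
        self.total += 1
        try:
            api_value = self._api_lookup(flag_key)
        except Exception:
            self.errors += 1  # API problems must not affect behavior during dual-read
            return local_value
        if api_value != local_value:
            self.mismatches[flag_key] = self.mismatches.get(flag_key, 0) + 1
        return local_value
```

Once `mismatches` stays empty and `errors` stays low over the comparison window, step 4 swaps which side is trusted.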

Example: adding a rollout step safely

# Fetch current flag for editing
etag=$(curl -s -H "Authorization: Bearer $TOKEN" https://flags.example.com/v1/flags/checkout-new-flow | jq -r '.["etag"]')

# Propose: move from 10% to 25%
curl -s -X PATCH https://flags.example.com/v1/flags/checkout-new-flow \
  -H "Authorization: Bearer $TOKEN" \
  -H "If-Match: $etag" \
  -H 'Idempotency-Key: roll-2026-06-25-01' \
  -H 'Content-Type: application/json' \
  -d '{
    "changeReason": "Expanding canary after stable metrics",
    "rules": [{"id":"gradual","percentage":25,"seed":"checkout-new-flow"}]
  }'

Production checklist

  • Every evaluate call includes a safe default.
  • Timeouts < 200 ms; retries with jitter; circuit breaker in place.
  • Cache with TTL; subscribe to streaming invalidations.
  • Deterministic bucketing with documented seed and hash.
  • OAuth/mTLS; least-privileged scopes; no secrets in public clients.
  • Audit and approvals enforced in prod; flags have TTLs and owners.
  • Observability dashboards for latency, errors, cache hits, and business KPIs.
  • Runbooks for kill switches and rollback.

Conclusion

An API-first feature flag integration gives you consistent control across languages and runtimes, strong governance, and safer rollouts. Treat flags like code, secure the edges, instrument everything, and design for failure. With those practices, you can ship faster while protecting reliability and user experience.