API Key Management and Rotation: A Practical Security Playbook

A practical guide to API key management and rotation—covering strategies, automation patterns, detection, and governance without downtime.

ASOasis

Why API key rotation matters

API keys still power a large share of service‑to‑service integrations, third‑party SaaS connectors, and mobile or IoT workloads. They are simple and fast to adopt, and dangerously easy to misuse. Stale, over‑privileged, or leaked keys turn minor coding mistakes into breach‑scale incidents. Rotation is your pressure‑relief valve: it caps how long a leaked key stays useful, shortens attacker dwell time, and enforces operational discipline across teams.

This guide distills proven patterns for managing and rotating API keys with minimal downtime and maximum assurance.

Threat model: what you are defending against

  • Accidental exposure: keys committed to repos, pasted in tickets, or logged in plaintext.
  • Supply‑chain leakage: compromised CI runners, package scripts, or third‑party build steps.
  • Credential reuse and stuffing: the same key reused across environments or vendors, so a key leaked in one place can be replayed against the others.
  • Long‑lived access: stale keys that outlive projects, contractors, or devices.
  • Over‑privilege: single key grants broad, production‑level access.
  • Inadequate telemetry: no audit trail to answer “who used which key, when, from where?”

Scope: “API keys” here means classic static tokens (sent in a header or query parameter), vendor‑issued personal access tokens, and service tokens. OAuth 2.0 client credentials and mTLS client certificates are different mechanisms, but the rotation principles apply to them too, even where the mechanics differ.

Core principles

  • Least privilege by design: one key, one purpose, minimal scope.
  • Short lifetime by default: prefer ephemeral or time‑boxed keys.
  • Isolation: separate keys per environment, tenant, region, and workload.
  • Determinism and automation: rotation runs on a schedule or from a single trigger, with no manual steps in between.
  • Observability: every key has an owner, labels, usage logs, and expiry.
  • Crypto agility: make it easy to change algorithms, formats, and issuers.
  • Reversibility: safe rollback if a rotation breaks consumers.

The API key lifecycle

  1. Generate: create the key in a controlled system (KMS, HSM, or the provider's console) and tag it with owner, purpose, environment, and expiry.
  2. Distribute: deliver it through a secrets manager or workload identity, never over chat or email.
  3. Store: keep it at rest in a vault and only in memory inside apps; never in source files, container images, or client bundles.
  4. Use: load it at startup via environment injection or a sidecar, and never print it to logs.
  5. Rotate: replace it on a schedule or on a signal (leak, role change, vendor event).
  6. Revoke: disable the old key as soon as cutover is verified and any overlap window has closed.
  7. Retire: archive minimal metadata for audits and delete the secret value itself.
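
The Generate, Revoke, and Retire steps can be sketched as a small inventory record. This is a minimal Python sketch under stated assumptions: `KeyRecord`, `generate_key`, `revoke`, and `retire` are hypothetical names, the `acme_` prefix is a placeholder, and storing only a SHA‑256 hash assumes high‑entropy random keys (a slow password hash is unnecessary for those). It is not any vault's API.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class KeyRecord:
    """Inventory entry for one key. The plaintext key is never stored here."""
    key_id: str
    owner: str
    purpose: str
    environment: str
    expires_at: datetime
    key_hash: str          # SHA-256 of the plaintext, used to look the key up on use
    state: str = "active"  # active -> revoked -> retired


def generate_key(owner: str, purpose: str, environment: str, ttl_days: int = 30):
    """Generate step: create a random key tagged with owner, purpose, env, and expiry."""
    key_id = secrets.token_hex(8)
    # A distinctive (hypothetical) "acme_" prefix lets secret scanners match leaked keys.
    plaintext = f"acme_{environment}_{key_id}_{secrets.token_urlsafe(32)}"
    record = KeyRecord(
        key_id=key_id,
        owner=owner,
        purpose=purpose,
        environment=environment,
        expires_at=datetime.now(timezone.utc) + timedelta(days=ttl_days),
        key_hash=hashlib.sha256(plaintext.encode()).hexdigest(),
    )
    return plaintext, record  # hand the plaintext to the secrets manager once, then drop it


def revoke(record: KeyRecord) -> None:
    """Revoke step: the key stops authenticating immediately."""
    record.state = "revoked"


def retire(record: KeyRecord) -> None:
    """Retire step: keep audit metadata, delete everything derived from the secret."""
    if record.state != "revoked":
        raise ValueError("revoke a key before retiring it")
    record.key_hash = ""
    record.state = "retired"
```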

Rotation strategies

  • Rolling rotation (dual keys): create a new key while the old one remains valid. Update consumers to prefer the new key, validate traffic, then revoke the old one. Best for external SaaS that supports two active tokens.
  • Blue/green secrets: maintain “current” and “next” secret versions. Deploy consumers that read green (next), shift traffic to them gradually, then flip the alias to green and retire blue.
  • Ephemeral/short‑lived tokens: issue just‑in‑time tokens that live minutes to hours, minted by a broker service against a workload identity. Rotation then happens continuously by design.
  • Event‑driven rotation: rotate on detected leak, role change, incident declaration, or vendor compromise notice.
  • Envelope re‑encryption: if keys encrypt data, rotate the master (key‑encryption) key by re‑wrapping each data key under the new master key. Rotating the data keys themselves means re‑encrypting the data, so plan for a background migration.
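
Rolling rotation depends on the server accepting both keys during the overlap. A minimal sketch, assuming a service that validates keys itself (the hypothetical `DualKeyVerifier`): the new key is always accepted, and the old one only until the overlap deadline or an explicit revoke. Keys are compared by SHA‑256 digest with a constant‑time comparison.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def _digest(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()


class DualKeyVerifier:
    """Accepts the current key, plus the previous key until its overlap window closes."""

    def __init__(self, current_key: str):
        self.current = _digest(current_key)
        self.previous = None
        self.previous_valid_until = None

    def rotate(self, new_key: str, overlap: timedelta, now: datetime) -> None:
        # The old key stays valid for `overlap` so consumers can switch without errors.
        self.previous = self.current
        self.previous_valid_until = now + overlap
        self.current = _digest(new_key)

    def revoke_previous(self) -> None:
        """Close the overlap early, e.g. once telemetry shows no traffic on the old key."""
        self.previous = None
        self.previous_valid_until = None

    def verify(self, presented: str, now: datetime) -> bool:
        candidate = _digest(presented)
        if hmac.compare_digest(candidate, self.current):
            return True
        if self.previous is not None and now < self.previous_valid_until:
            return hmac.compare_digest(candidate, self.previous)
        return False
```

Shortening the overlap trades a higher risk of client errors for a shorter exposure window for the old key.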

Reference architecture

  • Identity: map workloads (pods, functions, VMs) to identities (workload federation, OIDC, SPIFFE/SPIRE). Prefer identity‑based access to the secrets manager over distributing long‑lived static keys.
  • Secrets management: central vault with versioning, access policies, audit logs, and automatic rotation hooks.
  • Key broker: an internal service that holds upstream static credentials and hands workloads short‑lived access tokens in their place, enforcing policy (rate limits, IP allowlists, device posture).
  • Policy engine: codify who can generate, approve, and rotate which classes of keys; gate changes via code review and change windows.
  • Telemetry: aggregate secret access logs, API usage logs, and egress flow logs. Add anomaly detection and “canary keys.”
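
The key broker's core job, minting short‑lived, audience‑scoped tokens and enforcing a scope policy, can be sketched in a few dozen lines. `KeyBroker`, the claim names, and the `body.signature` token layout are assumptions for illustration; a production broker would more likely emit a standard format such as JWT through a vetted library and keep its signing secret in a KMS or HSM.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


class KeyBroker:
    """Hypothetical broker: workloads get expiring tokens, never the upstream credential."""

    def __init__(self, signing_secret: bytes, policy: dict[str, set[str]], ttl_seconds: int = 900):
        self._secret = signing_secret
        self._policy = policy      # workload identity -> scopes it may request
        self._ttl = ttl_seconds

    def _sign(self, body: str) -> str:
        return _b64(hmac.new(self._secret, body.encode(), hashlib.sha256).digest())

    def issue(self, workload: str, audience: str, scopes: list[str], now: float | None = None) -> str:
        if not set(scopes) <= self._policy.get(workload, set()):
            raise PermissionError(f"{workload} may not request {scopes}")
        now = time.time() if now is None else now
        claims = {"sub": workload, "aud": audience, "scope": scopes, "exp": int(now) + self._ttl}
        body = _b64(json.dumps(claims, sort_keys=True).encode())
        return f"{body}.{self._sign(body)}"

    def verify(self, token: str, audience: str, now: float | None = None) -> dict | None:
        now = time.time() if now is None else now
        body, _, sig = token.partition(".")
        if not hmac.compare_digest(sig, self._sign(body)):
            return None            # forged, tampered, or signed with another secret
        claims = json.loads(_unb64(body))
        if claims["aud"] != audience or claims["exp"] <= now:
            return None            # wrong audience or expired
        return claims
```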

Implementation patterns by use case

  1. Third‑party SaaS that allows two keys per account

    • Keep old and new keys active during a short overlap (e.g., 24 hours).
    • Point the secret alias at the new version and redeploy or refresh consumers.
    • Verify that requests succeed with the new key and that traffic on the old key has dropped to zero, then revoke the old one.
  2. Internal microservices with a gateway

    • Replace static keys with gateway‑issued, signed tokens (short TTL, audience‑scoped).
    • Back the gateway with a root secret stored in a KMS/HSM and rotate it regularly.
  3. CI/CD pipelines

    • Remove static secrets from runners. Use OIDC workload identity to exchange each job's signed identity token for time‑scoped credentials.
    • Where vendors require static keys, store them in a secrets manager and inject at runtime; schedule rotation via pipeline jobs.
  4. Mobile/IoT clients

    • Never ship raw keys in apps or firmware. Use a device attestation plus token exchange flow that issues short‑lived tokens, pin the server's TLS certificate or public key, and enforce replay protection.
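
For use case 2, rotating the gateway's root secret without invalidating tokens already in flight usually means tagging each token with a key ID, similar to the `kid` header in JWS. In this hypothetical `RotatingSigner`, root secrets are held in memory purely for illustration (the text above keeps them in a KMS/HSM): new tokens are signed with the newest secret, and older secrets keep verifying until they are retired.

```python
import hashlib
import hmac
import secrets


class RotatingSigner:
    """Gateway-side key ring: sign with the newest root secret, verify with any retained one."""

    def __init__(self):
        self._ring = {}            # kid -> root secret
        self._version = 0
        self.current_kid = ""
        self.rotate()

    def rotate(self) -> str:
        """Add a fresh root secret and make it the signing key; older ones still verify."""
        self._version += 1
        kid = f"v{self._version}"
        self._ring[kid] = secrets.token_bytes(32)
        self.current_kid = kid
        return kid

    def retire(self, kid: str) -> None:
        """Drop an old secret once every token it signed has expired (wait >= one token TTL)."""
        if kid == self.current_kid:
            raise ValueError("cannot retire the active signing key")
        self._ring.pop(kid, None)

    def sign(self, payload: bytes):
        mac = hmac.new(self._ring[self.current_kid], payload, hashlib.sha256).hexdigest()
        return self.current_kid, mac   # the kid travels with the token

    def verify(self, kid: str, payload: bytes, mac: str) -> bool:
        secret = self._ring.get(kid)
        if secret is None:             # unknown or retired kid
            return False
        expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, mac)
```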

Automation blueprint (pseudo‑workflow)

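name: rotate-saas-api-key
on:
  schedule:
    - cron: "0 3 * * 1"   # Mondays 03:00 UTC
  workflow_dispatch:
jobs:
  rotate:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # lets the job request an OIDC token for workload identity
    env:
      VENDOR: https://api.vendor.example   # placeholder vendor API base URL
    # oidc_exchange, sm, deploy, and attach-logs are placeholder CLIs; the vendor's
    # POST /keys response shape ({"id": ..., "key": ...}) is assumed.
    steps:
      - name: Authenticate to cloud
        run: oidc_exchange   # short-lived creds for later steps; never write them to a file
      - name: Create new key in vendor and store it (versioned)
        run: |
          ADMIN=$(sm get --name saas/admin --version current)
          echo "::add-mask::$ADMIN"
          RESP=$(curl -sf -X POST "$VENDOR/keys" -H "Authorization: Bearer $ADMIN")
          KEY=$(jq -r '.key' <<<"$RESP")
          echo "::add-mask::$KEY"   # keep the new secret out of job logs
          # Store it in this step rather than passing the value between steps via $GITHUB_OUTPUT
          sm put --name saas/api --version next --value "$KEY" --meta key_id="$(jq -r '.id' <<<"$RESP")"
          sm promote --name saas/api --from next --to current   # old "current" becomes "previous"
      - name: Trigger rolling deploy
        id: deploy
        run: deploy --service payments --with-secret saas/api:current
      - name: Health check
        id: health
        run: curl -fsS --retry 5 --retry-delay 10 --retry-all-errors https://health.company.test/payments
      - name: Revoke old key by ID (defer to the end of the overlap window if clients need it)
        run: |
          ADMIN=$(sm get --name saas/admin --version current)
          echo "::add-mask::$ADMIN"
          OLD_ID=$(sm get --name saas/api --version previous --meta key_id)
          test -n "$OLD_ID"   # stop if there is no previous version to revoke
          curl -sf -X DELETE "$VENDOR/keys/$OLD_ID" -H "Authorization: Bearer $ADMIN"
      - name: Roll back alias if deploy or health check failed
        if: failure() && (steps.deploy.outcome == 'failure' || steps.health.outcome == 'failure')
        run: |
          sm promote --name saas/api --from previous --to current
          deploy --service payments --with-secret saas/api:current
      - name: Close change ticket with evidence
        run: attach-logs --job "$GITHUB_RUN_ID" --ticket ROT-1234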

Key ideas:

  • Use versioned secrets with aliases (current/previous).
  • Gate revocation on health checks and observable success.
  • Persist artifacts for audits: who rotated, what changed, evidence of success.
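
The same ideas can be modeled in a few lines. `VersionedSecret` is an in‑memory stand‑in for a secrets‑manager entry, and `rotate`, `health_check`, and `revoke_upstream` are hypothetical hooks; the point is the ordering: promote, check health, then either revoke the previous version by ID or roll the alias back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class VersionedSecret:
    """In-memory stand-in for one secrets-manager entry with current/previous aliases."""
    versions: Dict[int, str] = field(default_factory=dict)
    aliases: Dict[str, int] = field(default_factory=dict)

    def put(self, value: str) -> int:
        version = max(self.versions, default=0) + 1
        self.versions[version] = value
        return version

    def promote(self, version: int) -> None:
        """Point 'current' at `version`; the old current becomes 'previous'."""
        if "current" in self.aliases:
            self.aliases["previous"] = self.aliases["current"]
        self.aliases["current"] = version

    def rollback(self) -> None:
        """Swap the aliases back so consumers read the old, still-valid key."""
        a = self.aliases
        a["current"], a["previous"] = a["previous"], a["current"]

    def get(self, alias: str) -> str:
        return self.versions[self.aliases[alias]]


def rotate(secret: VersionedSecret, new_value: str,
           health_check: Callable[[], bool], revoke_upstream: Callable[[int], None]) -> bool:
    """Promote a new version; revoke the old one only if the health check passes."""
    secret.promote(secret.put(new_value))
    if not health_check():
        secret.rollback()
        return False
    revoke_upstream(secret.aliases["previous"])   # revoke by version/ID, never by value
    return True
```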

Validation, monitoring, and detection

  • Track usage: per‑key request counts, error rates, client IDs, source IPs, user agents.
  • Canary keys: plant a decoy key that no legitimate workload uses; any request made with it signals a leak.
  • Egress controls: restrict outbound traffic to known API endpoints and IPs.
  • DLP and secret scanning: run commit‑time and repo‑wide scanners; scan artifacts and logs.
  • Alerting: notify on keys nearing expiry, spikes in 401/403, or use from new geographies.
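
Several of these rules fit in one pass over usage logs. This sketch assumes each log event has already been reduced to a key ID, a country derived from the source IP, and an HTTP status; `UsageEvent`, `detect`, and the thresholds are illustrative, and the fixed 401/403 count is a crude stand‑in for real anomaly detection.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UsageEvent:
    key_id: str
    country: str   # derived from the source IP
    status: int    # HTTP status returned to the caller


def detect(events, canary_ids, known_countries, expiries, now,
           warn_window=timedelta(days=7), max_auth_failures=50):
    """Return sorted (rule, key_id) alerts for one batch of usage events."""
    alerts = set()
    auth_failures = Counter()
    for e in events:
        if e.key_id in canary_ids:
            alerts.add(("canary_used", e.key_id))          # any use of a decoy is a leak
        elif e.country not in known_countries.get(e.key_id, set()):
            alerts.add(("new_geography", e.key_id))
        if e.status in (401, 403):
            auth_failures[e.key_id] += 1
    for key_id, count in auth_failures.items():
        if count >= max_auth_failures:                     # fixed threshold, not a learned baseline
            alerts.add(("auth_failures", key_id))
    for key_id, expires_at in expiries.items():
        if expires_at - now <= warn_window:                # expiring soon or already expired
            alerts.add(("expiring_soon", key_id))
    return sorted(alerts)
```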

Incident response: when a key leaks

  1. Contain (minutes): revoke the exposed key, disable affected integrations, and rotate dependent secrets.
  2. Eradicate (hours): scrub logs, purge caches, rotate upstream credentials, invalidate sessions/tokens.
  3. Recover (hours–days): restore service with new keys; verify least‑privilege scopes.
  4. Learn (days): add detections, adjust TTLs, harden pipelines, update playbooks.

Maintain a pre‑approved “break glass” procedure with:

  • Contacts and on‑call rotations.
  • Scripts to mass‑revoke and mass‑rotate.
  • Communication templates for customers and vendors.
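
A mass‑revoke script is mostly about not stopping halfway. This is a minimal sketch: `mass_revoke` takes a caller‑supplied `revoke` function (a hypothetical vendor or vault call), runs revocations in parallel, retries failures, and returns whatever still failed so a human can follow up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def _try_revoke(revoke: Callable[[str], None], key_id: str) -> bool:
    try:
        revoke(key_id)
        return True
    except Exception:   # one failing key must not stop containment of the rest
        return False


def mass_revoke(key_ids: Iterable[str], revoke: Callable[[str], None],
                attempts: int = 3, max_workers: int = 8) -> List[str]:
    """Revoke keys in parallel, retrying failures; return IDs that still failed.

    `revoke` should be idempotent, so a retry after a partial failure is harmless.
    """
    pending = list(dict.fromkeys(key_ids))   # de-duplicate, keep order
    for _ in range(attempts):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            ok = list(pool.map(_try_revoke, [revoke] * len(pending), pending))
        pending = [k for k, done in zip(pending, ok) if not done]
    return pending
```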

Governance, risk, and compliance

  • Policy: define maximum key TTL (e.g., 30–90 days), approval tiers for privileged scopes, and required peer review for rotations.
  • Inventory: maintain a CMDB‑like index of all keys with owners, systems, scopes, and expiry.
  • Segregation of duties: different personas generate, approve, and deploy keys.
  • Evidence: store rotation logs, change tickets, and health‑check proofs for audits.
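
Policy rules like these can run as an automated gate before a rotation proceeds. The tier names, TTL limits, and `RotationRequest` fields below are examples, not recommendations; approval tiers for privileged scopes are simplified here to a tighter TTL cap plus a mandatory peer reviewer.

```python
from dataclasses import dataclass
from typing import List, Optional

# Example tiers only; set these from your own policy.
MAX_TTL_DAYS = {"standard": 90, "privileged": 30}


@dataclass
class RotationRequest:
    key_class: str                 # "standard" or "privileged"
    ttl_days: int
    generated_by: str
    approved_by: Optional[str]     # peer reviewer
    deployed_by: str


def policy_violations(req: RotationRequest) -> List[str]:
    """Return every rule the request breaks; an empty list means it may proceed."""
    problems = []
    limit = MAX_TTL_DAYS.get(req.key_class)
    if limit is None:
        problems.append(f"unknown key class {req.key_class!r}")
    elif req.ttl_days > limit:
        problems.append(f"TTL {req.ttl_days}d exceeds the {limit}d limit")
    if req.approved_by is None:
        problems.append("every rotation needs a peer reviewer")
    people = [p for p in (req.generated_by, req.approved_by, req.deployed_by) if p]
    if len(set(people)) != len(people):
        problems.append("segregation of duties: one person holds several roles")
    return problems
```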

Common pitfalls to avoid

  • One key for everything: breaks least privilege and complicates forensics.
  • Manual, ad‑hoc rotations: error‑prone and usually skipped during incidents.
  • Hard‑coding in images: forces disruptive redeploys; prefer runtime injection.
  • No overlap window: immediate revocation before clients switch causes outages.
  • Ignoring downstream caches: SDKs and gateways may cache keys, so keep cache TTLs shorter than the overlap window and force a refresh when a request fails authentication.
  • Lack of rollback: rotating without a safe way to point the alias back to the previous version.
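
The cache pitfall has a simple client‑side remedy. `CachedSecret` and `call_with_refresh` are hypothetical helpers: the secret is re‑read after a TTL that should stay shorter than the rotation overlap window, and a 401 forces one immediate refresh and retry.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Caches a secret in memory and re-reads it after `ttl` seconds or on demand."""

    def __init__(self, fetch: Callable[[], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # e.g. a read from the secrets manager
        self._ttl = ttl                # keep this shorter than the rotation overlap window
        self._clock = clock
        self._value: Optional[str] = None
        self._loaded_at = float("-inf")

    def get(self, force_refresh: bool = False) -> str:
        if force_refresh or self._clock() - self._loaded_at >= self._ttl:
            self._value = self._fetch()
            self._loaded_at = self._clock()
        return self._value


def call_with_refresh(secret: CachedSecret, send: Callable[[str], int]) -> int:
    """Send a request; on 401, bust the cache once and retry with a fresh key."""
    status = send(secret.get())
    if status == 401:
        status = send(secret.get(force_refresh=True))
    return status
```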

Metrics that matter

  • Rotation success rate: percentage of rotations completed without an outage.
  • Median time to rotate (MTR), tracked separately for scheduled and incident‑driven rotations.
  • Percentage of keys whose TTL is within policy (e.g., ≤ 60 days).
  • Number of unused or stale keys detected and removed per quarter.
  • Time from leak to detection, and from detection to containment.
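
Most of these metrics fall out of ordinary rotation records. The `Rotation` fields and the 60‑day policy default are assumptions; stale‑key counts and leak‑to‑containment times would come from the inventory and incident tracker instead.

```python
from dataclasses import dataclass
from datetime import timedelta
from statistics import median


@dataclass
class Rotation:
    duration: timedelta    # from start of rotation to old key revoked
    caused_outage: bool
    incident: bool         # True if triggered by an incident, False if scheduled


def _median(values):
    values = list(values)
    return median(values) if values else None


def rotation_metrics(rotations, key_ttls_days, policy_ttl_days=60):
    """Compute scorecard metrics from rotation records and current key TTLs."""
    return {
        "success_rate_pct": 100 * sum(not r.caused_outage for r in rotations) / len(rotations),
        "mtr_scheduled": _median(r.duration for r in rotations if not r.incident),
        "mtr_incident": _median(r.duration for r in rotations if r.incident),
        "ttl_compliance_pct": 100 * sum(t <= policy_ttl_days for t in key_ttls_days) / len(key_ttls_days),
    }
```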

Quick checklist

  • Keys scoped to single purpose, environment, and minimal permissions.
  • Central secrets manager with versioning, RBAC, and audit logs.
  • Automated, tested rotation workflows with health‑gated revocation.
  • Short TTLs or ephemeral tokens where possible.
  • Telemetry linking keys to identities, services, and geographies.
  • Incident playbook with scripts and contacts.
  • Regular secret scanning across repos, images, and logs.

Final thoughts

Rotation is not just a calendar event—it is an operating model. Treat every key as ephemeral, every rotation as code, and every change as an opportunity to narrow scope and improve observability. When rotation becomes boring and automated, your attack surface shrinks, your mean time to recover improves, and audits become routine rather than stressful.
