API Load Testing with k6: A Step-by-Step Tutorial for Reliable APIs
Learn API load testing with Grafana k6: install, script, model workloads, set thresholds, run in CI, and analyze results with practical examples.
Why k6 for API Load Testing
APIs are the backbone of modern systems. When traffic surges or a slow dependency appears, even small bottlenecks ripple into user-visible incidents. k6 (by Grafana) is a developer-centric, scriptable load testing tool that lets you write realistic scenarios in JavaScript, enforce performance SLOs with thresholds, and integrate seamlessly into CI/CD. This tutorial walks you through setup, writing tests, modeling workloads, analyzing results, and automating gates that prevent slow code from shipping.
Install and Verify
Choose one of the quick install paths:
- macOS (Homebrew): brew install k6
- Windows (Chocolatey): choco install k6
- Linux (Deb, RPM, or via package manager): install from your distro repo or from the official k6 packages.
- Docker (portable, great for CI): docker run --rm -i grafana/k6:latest run - <script.js
Verify:
k6 version
Your First Test in 5 Minutes
Create a minimal test, saved as smoke.js, that requests an endpoint, validates the response, and sets a latency SLO.
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
vus: 10,
duration: '30s',
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
http_req_failed: ['rate<0.01'], // <1% request errors
},
};
export default function () {
const res = http.get(`${__ENV.BASE_URL || 'https://api.example.com'}/health`);
check(res, {
'status is 200': (r) => r.status === 200,
});
sleep(1); // think time
}
Run it:
BASE_URL=https://your-api.example.com k6 run smoke.js
You’ll see a live progress bar and a final summary with p(95), RPS, and failure rate. Thresholds enforce SLOs; if they fail, k6 exits non‑zero—perfect for CI.
Core Concepts You’ll Use Every Day
- Virtual Users (VUs): lightweight JS contexts that run your default function.
- Iterations: one pass of the default function. VUs loop, producing traffic.
- Duration and Stages: define how long and how traffic ramps up/down.
- Scenarios and Executors: advanced control over arrival rate, shared iterations, or soak tests.
- Checks vs. Thresholds: checks assert on individual responses; thresholds enforce statistical SLOs across the test.
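As a concrete example of duration and stages, a classic VU ramp can be expressed with a top-level stages array (shorthand for the ramping-vus executor); the VU targets and durations here are illustrative:

```javascript
// Ramp VUs up, hold at peak, then ramp down.
export const options = {
  stages: [
    { duration: '1m', target: 20 }, // ramp up to 20 VUs
    { duration: '3m', target: 20 }, // hold steady
    { duration: '1m', target: 0 },  // ramp down
  ],
};
```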
Modeling Realistic Workloads with Scenarios
Relying only on VUs and duration often yields uneven RPS. For APIs with strict SLOs, model traffic with arrival-rate executors.
export const options = {
scenarios: {
constant_rps: {
executor: 'constant-arrival-rate',
rate: 100, // 100 requests per second
timeUnit: '1s',
duration: '2m',
preAllocatedVUs: 50, // pool size; tune to avoid over/underutilization
maxVUs: 200,
},
ramp_rps: {
executor: 'ramping-arrival-rate',
startRate: 20,
timeUnit: '1s',
preAllocatedVUs: 50,
maxVUs: 300,
stages: [
{ target: 100, duration: '1m' },
{ target: 200, duration: '2m' },
{ target: 50, duration: '1m' },
],
},
},
thresholds: {
http_req_duration: [
{ threshold: 'p(95)<400', abortOnFail: true, delayAbortEval: '30s' },
],
http_req_failed: ['rate<0.01'],
},
};
Use arrival-rate executors when you care about stable RPS. For a quick smoke test, try shared-iterations; for long stability (soak) checks, prefer an hours-long constant-arrival-rate run at modest RPS.
Structuring Tests with setup and teardown
Use setup() for one-time actions (e.g., login) and pass artifacts to VUs. Use teardown() for cleanup.
import http from 'k6/http';
import { check, sleep } from 'k6';
export function setup() {
const res = http.post(`${__ENV.BASE_URL}/auth`, JSON.stringify({
username: __ENV.USER,
password: __ENV.PASS,
}), { headers: { 'Content-Type': 'application/json' } });
check(res, { 'login ok': (r) => r.status === 200 });
const token = res.json('token');
return { token };
}
export default function (data) {
const headers = { Authorization: `Bearer ${data.token}` };
const res = http.get(`${__ENV.BASE_URL}/v1/orders?limit=25`, { headers });
check(res, {
'200': (r) => r.status === 200,
'payload ok': (r) => Array.isArray(r.json('items')),
});
sleep(Math.random() * 2);
}
export function teardown(data) {
// optionally revoke token, clear fixtures, etc.
}
Data-Driven Tests and Correlation
Use open() to load static files and SharedArray to avoid duplicating data per VU.
import { SharedArray } from 'k6/data';
import http from 'k6/http';
const users = new SharedArray('users', () => JSON.parse(open('./fixtures/users.json')));
export default function () {
const u = users[Math.floor(Math.random() * users.length)];
const login = http.post(`${__ENV.BASE_URL}/auth`, JSON.stringify(u), { headers: { 'Content-Type': 'application/json' } });
const token = login.json('token');
const create = http.post(`${__ENV.BASE_URL}/v1/things`, JSON.stringify({ name: `t-${__VU}-${__ITER}` }), { headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' } });
const id = create.json('id'); // correlation: use the id returned to fetch later
http.get(`${__ENV.BASE_URL}/v1/things/${id}`, { headers: { Authorization: `Bearer ${token}` } });
}
Custom Metrics, Tags, and Grouping
Add domain-specific visibility by tagging requests and creating custom metrics.
import http from 'k6/http';
import { Counter, Trend, Rate } from 'k6/metrics';
import { group } from 'k6';
const createLatency = new Trend('create_latency');
const notFoundRate = new Rate('not_found_rate');
const createdCount = new Counter('created_count');
export default function () {
group('catalog flow', () => {
const res = http.get(`${__ENV.BASE_URL}/v1/catalog`, { tags: { endpoint: 'catalog' } });
createLatency.add(res.timings.duration, { endpoint: 'catalog' });
notFoundRate.add(res.status === 404);
});
createdCount.add(1);
}
Tags make filtering and per-endpoint thresholds possible, for example http_req_duration{endpoint:catalog}.
Choosing the Right Test Type
- Smoke: fast validation of correctness under light load: k6 run --vus 1 --duration 30s api.js
- Load: typical daily peak for 15–60 minutes using constant or ramping arrival rate.
- Stress: escalate beyond expected peak to find system breaking points.
- Spike: sudden surge to test autoscaling and queue behavior.
- Soak: hours-long steady load to reveal memory leaks and resource exhaustion.
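Of these shapes, the spike is the least obvious to script. One way to sketch it is a ramping-arrival-rate scenario with a short, steep surge stage; all numbers here are illustrative:

```javascript
// Baseline load, a sudden 10-second surge, then recovery.
export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 200,
      maxVUs: 1000,
      stages: [
        { target: 10, duration: '30s' },  // baseline
        { target: 500, duration: '10s' }, // sudden surge
        { target: 10, duration: '1m' },   // recovery
      ],
    },
  },
};
```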
Practical Thresholds that Map to SLOs
Thresholds turn performance into a contract:
export const options = {
thresholds: {
'http_req_duration{endpoint:catalog}': ['p(95)<300'],
http_req_failed: ['rate<0.005'], // 0.5% max failures
checks: ['rate>0.99'],
},
};
- Pick p(95) or p(99) over averages.
- Separate read vs. write endpoints using tags.
- Use abortOnFail to stop early on egregious regressions.
Handling Timeouts, Retries, and Backoff
Match real client behavior. Global timeouts and per-request settings reduce false positives.
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
tlsAuth: [], // add mTLS if needed
insecureSkipTLSVerify: false, // only set true in non‑prod tests
};
const params = {
timeout: '3s',
headers: { 'Content-Type': 'application/json' },
};
function withRetry(fn, attempts = 3) {
let last;
for (let i = 0; i < attempts; i++) {
last = fn();
if (last.status && last.status < 500) return last;
sleep(Math.pow(2, i) / 10); // backoff 100ms, 200ms, 400ms
}
return last;
}
export default function () {
const res = withRetry(() => http.get(`${__ENV.BASE_URL}/v1/search?q=foo`, params));
check(res, { 'ok or 404 acceptable': (r) => r.status === 200 || r.status === 404 });
}
Running at Scale and Capturing Results
- Local run with a tuned summary: k6 run --summary-trend-stats=avg,p(90),p(95),p(99),min,max test.js
- Streaming JSON metrics for post-processing: k6 run --out json=results.json test.js
- Time-series backends (dashboards and long-term analysis):
  - InfluxDB: --out influxdb=http://localhost:8086/k6
  - Prometheus remote write: --out experimental-prometheus-remote-write=http://prom:9090/api/v1/write
Grafana dashboards on top of InfluxDB or Prometheus help visualize latency histograms, per-endpoint p(95), saturation, and error bursts.
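Beyond the built-in outputs, a script can export a handleSummary(data) hook that takes full control of the end-of-run report: the returned object maps file paths (or stdout) to contents. A minimal sketch; the file name is illustrative:

```javascript
// Runs once after the test; `data` is the aggregated summary object.
export function handleSummary(data) {
  const p95 = data.metrics.http_req_duration.values['p(95)'];
  return {
    'summary.json': JSON.stringify(data, null, 2), // machine-readable artifact for CI
    stdout: `p(95)=${p95}ms\n`, // one-line console recap
  };
}
```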
CI/CD Integration: Fail the Build on Regressions
You can use k6 in pipelines either via a hosted runner with k6 installed or through Docker. Example GitHub Actions job using Docker:
name: performance-gate
on: [push, pull_request]
jobs:
k6:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run k6
run: |
docker run --rm \
-e BASE_URL=${{ secrets.PERF_BASE_URL }} \
-v "$PWD:/scripts" grafana/k6:latest run /scripts/test.js
If thresholds fail, the step exits non‑zero and your PR is blocked until performance meets the bar.
Interpreting the k6 Summary Like a Pro
- http_req_duration: use p(95)/p(99), not avg. Compare against SLOs.
- http_req_failed: watch for spikes >1%; correlate with logs and traces.
- Iterations and data received/sent: validate workload realism.
- Checks: a low check rate indicates functional issues, not just slowness.
- Per-endpoint tags: identify the slow outliers rather than tuning blindly.
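To build intuition for what p(95) reports, here is a plain-JavaScript sketch of a nearest-rank percentile over raw request durations; k6's internal calculation may interpolate, so treat this as an approximation:

```javascript
// Nearest-rank percentile: sort the samples, take the value at rank ceil(p/100 * n).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Ten request durations in milliseconds; one slow outlier dominates the tail.
const durations = [120, 95, 310, 180, 150, 600, 130, 110, 140, 125];
console.log(percentile(durations, 95)); // 600: the tail, while the average is only ~196ms
```

This is why the summary sections above stress p(95)/p(99) over averages: a single slow outlier barely moves the mean but defines the tail your users feel.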
Common Pitfalls and How to Avoid Them
- Unstable RPS: use arrival-rate executors instead of only VUs+duration.
- Shared mutable globals: protect or isolate state; prefer server-generated IDs.
- Cold start bias: include a warm-up stage, then measure steady state.
- Caching illusions: bust caches when necessary or test cache-hit and miss paths separately.
- Connection reuse: most clients reuse keep-alive connections. Only disable it with noConnectionReuse: true if you're modeling worst-case behavior.
- Think time: simulate real users with sleep() and randomization.
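To address the cold-start pitfall above, one option is a dedicated warm-up scenario that runs first, with the steady-state scenario delayed via startTime and the SLO threshold scoped to it using the built-in scenario tag. A sketch with illustrative numbers:

```javascript
// Warm up caches and connections first, then measure only steady state.
export const options = {
  scenarios: {
    warmup: {
      executor: 'constant-arrival-rate',
      rate: 20,
      timeUnit: '1s',
      duration: '1m',
      preAllocatedVUs: 30,
    },
    steady: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '5m',
      startTime: '1m', // begin once the warm-up has finished
      preAllocatedVUs: 120,
      maxVUs: 300,
    },
  },
  thresholds: {
    // Evaluate the SLO against steady-state traffic only.
    'http_req_duration{scenario:steady}': ['p(95)<400'],
  },
};
```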
Advanced Protocols and Use Cases
- gRPC: test microservice-to-microservice paths.
- WebSockets: model streaming or push updates.
- Browser module: complement API tests with lightweight UX journeys.
- xk6 extensions: add custom protocols or exporters when needed.
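As a taste of the gRPC module, here is a minimal sketch; the proto file, server address, and service/method names (hello.proto, hello.HelloService/SayHello) are placeholders for your own definitions, and the script runs only under the k6 runtime:

```javascript
import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
// Load the service definition at init time; import path and proto file are placeholders.
client.load(['.'], 'hello.proto');

export default function () {
  client.connect('grpc.server.example.com:50051', { plaintext: true });
  const res = client.invoke('hello.HelloService/SayHello', { greeting: 'k6' });
  check(res, { 'grpc status OK': (r) => r && r.status === grpc.StatusOK });
  client.close();
}
```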
A Complete Example You Can Adapt Today
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
scenarios: {
read_heavy: {
executor: 'ramping-arrival-rate',
startRate: 50,
timeUnit: '1s',
preAllocatedVUs: 100,
maxVUs: 500,
stages: [
{ target: 200, duration: '2m' }, // ramp to peak
{ target: 200, duration: '5m' }, // sustain peak
{ target: 50, duration: '2m' }, // ramp down
],
tags: { scenario: 'read' },
},
write_light: {
executor: 'constant-arrival-rate',
rate: 10,
timeUnit: '1s',
duration: '9m',
preAllocatedVUs: 20,
maxVUs: 100,
tags: { scenario: 'write' },
},
},
thresholds: {
'http_req_duration{scenario:read}': ['p(95)<300'],
'http_req_duration{scenario:write}': ['p(95)<600'],
http_req_failed: ['rate<0.01'],
checks: ['rate>0.99'],
},
};
const commonHeaders = { 'Content-Type': 'application/json' };
export default function () {
// Read path
const q = ['laptops', 'monitors', 'keyboards'][Math.floor(Math.random() * 3)];
const r1 = http.get(`${__ENV.BASE_URL}/v1/search?q=${q}`, { headers: commonHeaders, tags: { endpoint: 'search' } });
check(r1, { 'search 200': (r) => r.status === 200 });
// Write path (lighter rate)
if (__ITER % 10 === 0) { // ~10% writes
const body = JSON.stringify({ sku: `sku-${__VU}-${__ITER}`, qty: 1 });
const r2 = http.post(`${__ENV.BASE_URL}/v1/cart/items`, body, { headers: commonHeaders, tags: { endpoint: 'add_to_cart' } });
check(r2, { 'add_to_cart 201': (r) => r.status === 201 });
}
sleep(Math.random());
}
This mixed workload models common read-heavy systems, tags endpoints for granular thresholds, and aligns SLOs to user impact.
Next Steps
- Wire results to Grafana via Prometheus or InfluxDB.
- Add canary performance gates for your hottest endpoints.
- Expand scenarios to cover retries, degraded dependencies, and downstream timeouts.
- Track historical p(95)/error trends to spot regressions early.
With k6, performance becomes a first-class, testable requirement—guarded by code, enforced by thresholds, and visible in your pipeline and dashboards.