API Load Testing with k6: A Step-by-Step Tutorial for Reliable APIs
Learn API load testing with Grafana k6: install, script, model workloads, set thresholds, run in CI, and analyze results with practical examples.
Why k6 for API Load Testing
APIs are the backbone of modern systems. When traffic surges or a slow dependency appears, even small bottlenecks ripple into user-visible incidents. k6 (by Grafana) is a developer-centric, scriptable load testing tool that lets you write realistic scenarios in JavaScript, enforce performance SLOs with thresholds, and integrate seamlessly into CI/CD. This tutorial walks you through setup, writing tests, modeling workloads, analyzing results, and automating gates that prevent slow code from shipping.
Install and Verify
Choose one of the quick install paths:
- macOS (Homebrew): brew install k6
- Windows (Chocolatey): choco install k6
- Linux (Deb, RPM, or via package manager): install from your distro repo or from the official k6 packages.
- Docker (portable, great for CI): docker run --rm -i grafana/k6:latest run - <script.js
Verify:
k6 version
Your First Test in 5 Minutes
Create a minimal test, saved as smoke.js, that requests an endpoint, validates the response, and sets a latency SLO.
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
vus: 10,
duration: '30s',
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
http_req_failed: ['rate<0.01'], // <1% request errors
},
};
export default function () {
const res = http.get(`${__ENV.BASE_URL || 'https://api.example.com'}/health`);
check(res, {
'status is 200': (r) => r.status === 200,
});
sleep(1); // think time
}
Run it:
BASE_URL=https://your-api.example.com k6 run smoke.js
You’ll see a live progress bar and a final summary with p(95), RPS, and failure rate. Thresholds enforce SLOs; if they fail, k6 exits non‑zero—perfect for CI.
Core Concepts You’ll Use Every Day
- Virtual Users (VUs): lightweight JS contexts that run your default function.
- Iterations: one pass of the default function. VUs loop, producing traffic.
- Duration and Stages: define how long and how traffic ramps up/down.
- Scenarios and Executors: advanced control over arrival rate, shared iterations, or soak tests.
- Checks vs. Thresholds: checks assert on individual responses; thresholds enforce statistical SLOs across the test.
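As a concrete example of duration and stages, a classic VU ramp can be expressed with a top-level stages array (shorthand for the ramping-vus executor); the VU targets and durations here are illustrative:

```javascript
// Ramp VUs up, hold at peak, then ramp down.
export const options = {
  stages: [
    { duration: '1m', target: 20 }, // ramp up to 20 VUs
    { duration: '3m', target: 20 }, // hold steady
    { duration: '1m', target: 0 },  // ramp down
  ],
};
```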
Modeling Realistic Workloads with Scenarios
Relying only on VUs and duration often yields uneven RPS. For APIs with strict SLOs, model traffic with arrival-rate executors.
export const options = {
scenarios: {
constant_rps: {
executor: 'constant-arrival-rate',
rate: 100, // 100 requests per second
timeUnit: '1s',
duration: '2m',
preAllocatedVUs: 50, // pool size; tune to avoid over/underutilization
maxVUs: 200,
},
ramp_rps: {
executor: 'ramping-arrival-rate',
startRate: 20,
timeUnit: '1s',
preAllocatedVUs: 50,
maxVUs: 300,
stages: [
{ target: 100, duration: '1m' },
{ target: 200, duration: '2m' },
{ target: 50, duration: '1m' },
],
},
},
thresholds: {
http_req_duration: [
{ threshold: 'p(95)<400', abortOnFail: true, delayAbortEval: '30s' },
],
http_req_failed: ['rate<0.01'],
},
};
Use arrival-rate executors when you care about stable RPS. For a quick smoke test, try shared-iterations; for long stability (soak) checks, prefer an hours-long constant-arrival-rate run at modest RPS.
Structuring Tests with setup and teardown
Use setup() for one-time actions (e.g., login) and pass artifacts to VUs. Use teardown() for cleanup.
import http from 'k6/http';
import { check, sleep } from 'k6';
export function setup() {
const res = http.post(`${__ENV.BASE_URL}/auth`, JSON.stringify({
username: __ENV.USER,
password: __ENV.PASS,
}), { headers: { 'Content-Type': 'application/json' } });
check(res, { 'login ok': (r) => r.status === 200 });
const token = res.json('token');
return { token };
}
export default function (data) {
const headers = { Authorization: `Bearer ${data.token}` };
const res = http.get(`${__ENV.BASE_URL}/v1/orders?limit=25`, { headers });
check(res, {
'200': (r) => r.status === 200,
'payload ok': (r) => Array.isArray(r.json('items')),
});
sleep(Math.random() * 2);
}
export function teardown(data) {
// optionally revoke token, clear fixtures, etc.
}
Data-Driven Tests and Correlation
Use open() to load static files and SharedArray to avoid duplicating data per VU.
import { SharedArray } from 'k6/data';
import http from 'k6/http';
const users = new SharedArray('users', () => JSON.parse(open('./fixtures/users.json')));
export default function () {
const u = users[Math.floor(Math.random() * users.length)];
const login = http.post(`${__ENV.BASE_URL}/auth`, JSON.stringify(u), { headers: { 'Content-Type': 'application/json' } });
const token = login.json('token');
const create = http.post(`${__ENV.BASE_URL}/v1/things`, JSON.stringify({ name: `t-${__VU}-${__ITER}` }), { headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' } });
const id = create.json('id'); // correlation: use the id returned to fetch later
http.get(`${__ENV.BASE_URL}/v1/things/${id}`, { headers: { Authorization: `Bearer ${token}` } });
}
Custom Metrics, Tags, and Grouping
Add domain-specific visibility by tagging requests and creating custom metrics.
import http from 'k6/http';
import { Counter, Trend, Rate } from 'k6/metrics';
import { group } from 'k6';
const createLatency = new Trend('create_latency');
const notFoundRate = new Rate('not_found_rate');
const createdCount = new Counter('created_count');
export default function () {
group('catalog flow', () => {
const res = http.get(`${__ENV.BASE_URL}/v1/catalog`, { tags: { endpoint: 'catalog' } });
createLatency.add(res.timings.duration, { endpoint: 'catalog' });
notFoundRate.add(res.status === 404);
});
createdCount.add(1);
}
Tags make filtering and per-endpoint thresholds possible, for example http_req_duration{endpoint:catalog}.
Choosing the Right Test Type
- Smoke: fast validation of correctness under light load: k6 run --vus 1 --duration 30s api.js
- Load: typical daily peak for 15–60 minutes using constant or ramping arrival rate.
- Stress: escalate beyond expected peak to find system breaking points.
- Spike: sudden surge to test autoscaling and queue behavior.
- Soak: hours-long steady load to reveal memory leaks and resource exhaustion.
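Of these shapes, the spike is the least obvious to script. One way to sketch it is a ramping-arrival-rate scenario with a short, steep surge stage; all numbers here are illustrative:

```javascript
// Baseline load, a sudden 10-second surge, then recovery.
export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 200,
      maxVUs: 1000,
      stages: [
        { target: 10, duration: '30s' },  // baseline
        { target: 500, duration: '10s' }, // sudden surge
        { target: 10, duration: '1m' },   // recovery
      ],
    },
  },
};
```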
Practical Thresholds that Map to SLOs
Thresholds turn performance into a contract:
export const options = {
thresholds: {
'http_req_duration{endpoint:catalog}': ['p(95)<300'],
http_req_failed: ['rate<0.005'], // 0.5% max failures
checks: ['rate>0.99'],
},
};
- Pick p(95) or p(99) over averages.
- Separate read vs. write endpoints using tags.
- Use abortOnFail to stop early on egregious regressions.
Handling Timeouts, Retries, and Backoff
Match real client behavior. Global timeouts and per-request settings reduce false positives.
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
tlsAuth: [], // add mTLS if needed
insecureSkipTLSVerify: false, // only set true in non‑prod tests
};
const params = {
timeout: '3s',
headers: { 'Content-Type': 'application/json' },
};
function withRetry(fn, attempts = 3) {
let last;
for (let i = 0; i < attempts; i++) {
last = fn();
if (last.status && last.status < 500) return last;
sleep(Math.pow(2, i) / 10); // backoff 100ms, 200ms, 400ms
}
return last;
}
export default function () {
const res = withRetry(() => http.get(`${__ENV.BASE_URL}/v1/search?q=foo`, params));
check(res, { 'ok or 404 acceptable': (r) => r.status === 200 || r.status === 404 });
}
Running at Scale and Capturing Results
- Local run with a tuned summary: k6 run --summary-trend-stats=avg,p(90),p(95),p(99),min,max test.js
- Streaming JSON metrics for post-processing: k6 run --out json=results.json test.js
- Time-series backends (dashboards and long-term analysis):
  - InfluxDB: --out influxdb=http://localhost:8086/k6
  - Prometheus remote write: --out experimental-prometheus-remote-write=http://prom:9090/api/v1/write
Grafana dashboards on top of InfluxDB or Prometheus help visualize latency histograms, per-endpoint p(95), saturation, and error bursts.
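Beyond the built-in outputs, a script can export a handleSummary(data) hook that takes full control of the end-of-run report: the returned object maps file paths (or stdout) to contents. A minimal sketch; the file name is illustrative:

```javascript
// Runs once after the test; `data` is the aggregated summary object.
export function handleSummary(data) {
  const p95 = data.metrics.http_req_duration.values['p(95)'];
  return {
    'summary.json': JSON.stringify(data, null, 2), // machine-readable artifact for CI
    stdout: `p(95)=${p95}ms\n`, // one-line console recap
  };
}
```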
CI/CD Integration: Fail the Build on Regressions
You can use k6 in pipelines either via a hosted runner with k6 installed or through Docker. Example GitHub Actions job using Docker:
name: performance-gate
on: [push, pull_request]
jobs:
k6:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run k6
run: |
docker run --rm \
-e BASE_URL=${{ secrets.PERF_BASE_URL }} \
-v "$PWD:/scripts" grafana/k6:latest run /scripts/test.js
If thresholds fail, the step exits non‑zero and your PR is blocked until performance meets the bar.
Interpreting the k6 Summary Like a Pro
- http_req_duration: use p(95)/p(99), not avg. Compare against SLOs.
- http_req_failed: watch for spikes >1%; correlate with logs and traces.
- Iterations and data received/sent: validate workload realism.
- Checks: a low check rate indicates functional issues, not just slowness.
- Per-endpoint tags: identify the slow outliers rather than tuning blindly.
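To build intuition for what p(95) reports, here is a plain-JavaScript sketch of a nearest-rank percentile over raw request durations; k6's internal calculation may interpolate, so treat this as an approximation:

```javascript
// Nearest-rank percentile: sort the samples, take the value at rank ceil(p/100 * n).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Ten request durations in milliseconds; one slow outlier dominates the tail.
const durations = [120, 95, 310, 180, 150, 600, 130, 110, 140, 125];
console.log(percentile(durations, 95)); // 600: the tail, while the average is only ~196ms
```

This is why the summary sections above stress p(95)/p(99) over averages: a single slow outlier barely moves the mean but defines the tail your users feel.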
Common Pitfalls and How to Avoid Them
- Unstable RPS: use arrival-rate executors instead of only VUs+duration.
- Shared mutable globals: protect or isolate state; prefer server-generated IDs.
- Cold start bias: include a warm-up stage, then measure steady state.
- Caching illusions: bust caches when necessary or test cache-hit and miss paths separately.
- Connection reuse: most clients reuse keep-alive connections. Only disable it with noConnectionReuse: true if you're modeling worst-case behavior.
- Think time: simulate real users with sleep() and randomization.
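To address the cold-start pitfall above, one option is a dedicated warm-up scenario that runs first, with the steady-state scenario delayed via startTime and the SLO threshold scoped to it using the built-in scenario tag. A sketch with illustrative numbers:

```javascript
// Warm up caches and connections first, then measure only steady state.
export const options = {
  scenarios: {
    warmup: {
      executor: 'constant-arrival-rate',
      rate: 20,
      timeUnit: '1s',
      duration: '1m',
      preAllocatedVUs: 30,
    },
    steady: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '5m',
      startTime: '1m', // begin once the warm-up has finished
      preAllocatedVUs: 120,
      maxVUs: 300,
    },
  },
  thresholds: {
    // Evaluate the SLO against steady-state traffic only.
    'http_req_duration{scenario:steady}': ['p(95)<400'],
  },
};
```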
Advanced Protocols and Use Cases
- gRPC: test microservice-to-microservice paths.
- WebSockets: model streaming or push updates.
- Browser module: complement API tests with lightweight UX journeys.
- xk6 extensions: add custom protocols or exporters when needed.
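As a taste of the gRPC module, here is a minimal sketch; the proto file, server address, and service/method names (hello.proto, hello.HelloService/SayHello) are placeholders for your own definitions, and the script runs only under the k6 runtime:

```javascript
import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
// Load the service definition at init time; import path and proto file are placeholders.
client.load(['.'], 'hello.proto');

export default function () {
  client.connect('grpc.server.example.com:50051', { plaintext: true });
  const res = client.invoke('hello.HelloService/SayHello', { greeting: 'k6' });
  check(res, { 'grpc status OK': (r) => r && r.status === grpc.StatusOK });
  client.close();
}
```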
A Complete Example You Can Adapt Today
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
scenarios: {
read_heavy: {
executor: 'ramping-arrival-rate',
startRate: 50,
timeUnit: '1s',
preAllocatedVUs: 100,
maxVUs: 500,
stages: [
{ target: 200, duration: '2m' }, // ramp to peak
{ target: 200, duration: '5m' }, // sustain peak
{ target: 50, duration: '2m' }, // ramp down
],
tags: { scenario: 'read' },
},
write_light: {
executor: 'constant-arrival-rate',
rate: 10,
timeUnit: '1s',
duration: '9m',
preAllocatedVUs: 20,
maxVUs: 100,
tags: { scenario: 'write' },
},
},
thresholds: {
'http_req_duration{scenario:read}': ['p(95)<300'],
'http_req_duration{scenario:write}': ['p(95)<600'],
http_req_failed: ['rate<0.01'],
checks: ['rate>0.99'],
},
};
const commonHeaders = { 'Content-Type': 'application/json' };
export default function () {
// Read path
const q = ['laptops', 'monitors', 'keyboards'][Math.floor(Math.random() * 3)];
const r1 = http.get(`${__ENV.BASE_URL}/v1/search?q=${q}`, { headers: commonHeaders, tags: { endpoint: 'search' } });
check(r1, { 'search 200': (r) => r.status === 200 });
// Write path (lighter rate)
if (__ITER % 10 === 0) { // ~10% writes
const body = JSON.stringify({ sku: `sku-${__VU}-${__ITER}`, qty: 1 });
const r2 = http.post(`${__ENV.BASE_URL}/v1/cart/items`, body, { headers: commonHeaders, tags: { endpoint: 'add_to_cart' } });
check(r2, { 'add_to_cart 201': (r) => r.status === 201 });
}
sleep(Math.random());
}
This mixed workload models common read-heavy systems, tags endpoints for granular thresholds, and aligns SLOs to user impact.
Next Steps
- Wire results to Grafana via Prometheus or InfluxDB.
- Add canary performance gates for your hottest endpoints.
- Expand scenarios to cover retries, degraded dependencies, and downstream timeouts.
- Track historical p(95)/error trends to spot regressions early.
With k6, performance becomes a first-class, testable requirement—guarded by code, enforced by thresholds, and visible in your pipeline and dashboards.