GraphQL Input Validation and Sanitization: A Practical, Defense‑in‑Depth Guide
A practical guide to GraphQL input validation and sanitization with schema design, scalars, directives, resolver checks, and query cost controls.
Overview
GraphQL makes data retrieval flexible and efficient, but that same flexibility can open the door to malformed inputs, business-logic abuse, and resource exhaustion. Solid input validation and careful sanitization are essential to keep your API predictable, secure, and operable at scale. This guide walks through a practical, defense-in-depth approach—covering schema design, custom scalars, directives, resolver-level checks, query cost controls, and safe handling of downstream systems.
Validation vs. Sanitization
- Validation: Decide whether input is acceptable. Reject early with precise errors when it is not.
- Sanitization: Transform acceptable input into a canonical, safe form for storage or further processing.
Prefer validating and rejecting bad data over “fixing” it silently. Use sanitization for benign transformations (for example, trimming whitespace) or canonicalization (for example, Unicode normalization) that preserve business meaning.
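As a sketch of the split (function names are illustrative): canonicalize first with a benign transform, then validate the canonical form and reject anything that fails.

```typescript
// Hypothetical username handling illustrating the split: sanitization is a
// benign, meaning-preserving canonicalization; validation rejects bad input.
function canonicalizeUsername(raw: string): string {
  return raw.trim().toLowerCase(); // benign transform before validation
}

function validateUsername(value: string): void {
  if (!/^[a-z0-9_]{3,30}$/.test(value)) {
    throw new Error('username must be 3-30 lowercase letters, digits, or underscores');
  }
}

// Canonicalize first, then validate the canonical form.
const username = canonicalizeUsername('  Alice_01  '); // 'alice_01'
validateUsername(username); // accepted; 'alice 01!' would be rejected
```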
Threat Model: What Can Go Wrong?
- Business-logic bypass: Missing or lax checks allow invalid states (negative prices, impossible dates, overlong names).
- Injection: Unsafe construction of SQL/NoSQL queries or shell commands from user inputs.
- XSS in downstream consumers: Returning unsanitized, user-controlled HTML/markdown that later renders in a browser.
- Denial of Service (DoS): Deeply nested or expensive queries, huge inputs, large file uploads, or repeated heavy operations.
- Data exfiltration: Overly permissive filters, lax enums, or weak ownership checks enable scraping or unauthorized access.
Where to Validate and Sanitize
Think in layers and enforce constraints as early as possible:
- Transport and gateway
  - Enforce content-type, maximum request size, authentication, TLS.
  - Apply rate limits and WAF rules.
- Operation-level controls
  - Enforce query depth/complexity limits; prefer persisted or allow‑listed queries for sensitive paths.
- Schema design
  - Leverage non-null, enums, and precise input types.
  - Use custom scalars and constraint directives for common patterns.
- Resolver/business logic
  - Validate shapes, ranges, and cross-field rules; canonicalize and sanitize inputs.
  - Enforce authorization and ownership checks here.
- Persistence and integrations
  - Parameterize queries; validate before writing; sanitize/escape when embedding in other contexts (HTML, CSV, logs).
Canonicalization and Sanitization Principles
- Normalize Unicode (for example, NFKC) to avoid homoglyph tricks.
- Trim whitespace where appropriate; normalize casing for identifiers (emails, usernames) if your rules allow.
- Allow-list permitted characters when feasible; bound lengths defensively.
- Avoid destructive HTML stripping unless your product specifically requires it; prefer structured formats (for example, markdown-to-HTML with a sanitizer) or reject unexpected HTML entirely.
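For instance, NFKC normalization collapses compatibility characters before comparison or storage (a minimal sketch):

```typescript
// NFKC folds compatibility characters (ligatures, fullwidth forms) into
// canonical equivalents. Note it does not merge cross-script homoglyphs
// such as Cyrillic 'а' vs Latin 'a'; those need separate handling.
function canonicalize(input: string): string {
  return input.normalize('NFKC').trim();
}

canonicalize('\uFB01le');       // 'file': the 'fi' ligature splits into 'f' + 'i'
canonicalize('ｇｒａｐｈｑｌ'); // 'graphql': fullwidth letters become ASCII
```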
Strong Schema Design: Your First Line of Defense
Define precise input types and enums. Avoid broad strings when a narrower type exists.
Example input and enum:
input CreateUserInput {
  username: String!   # further constrained via directive or resolver
  email: Email!
  displayName: String
  role: Role!         # enum guards acceptable values
}

enum Role {
  MEMBER
  ADMIN
}
Custom Scalars for Common Patterns
Custom scalars encapsulate validation and canonicalization (for example, emails, URLs, UUIDs, DateTimes). Below is a minimal Email scalar in TypeScript; use a robust library in production.
// email-scalar.ts
import { GraphQLError, GraphQLScalarType, Kind } from 'graphql';

function normalizeEmail(v: string) {
  const trimmed = v.trim();
  // Simple check; replace with a proper email validator
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(trimmed)) {
    throw new GraphQLError('Invalid email');
  }
  return trimmed.toLowerCase();
}

export const EmailScalar = new GraphQLScalarType({
  name: 'Email',
  description: 'Lowercased, trimmed, RFC 5322-like email (simplified)',
  serialize: (v) => String(v),
  parseValue: (v) => normalizeEmail(String(v)),
  parseLiteral: (ast) => {
    if (ast.kind !== Kind.STRING) throw new GraphQLError('Email must be a string');
    return normalizeEmail(ast.value);
  },
});
Register this scalar in your schema and resolvers so that any field of type Email benefits automatically.
Constraint Directives: Declarative Field Rules
Directives let you attach constraints to schema fields and inputs. You can implement your own or use a community solution. Here’s a conceptual example:
# Example constraint directive definition
directive @constraint(
  minLength: Int
  maxLength: Int
  pattern: String
) on INPUT_FIELD_DEFINITION | ARGUMENT_DEFINITION

input CreateUserInput {
  username: String! @constraint(minLength: 3, maxLength: 30, pattern: "^[a-zA-Z0-9_]+$")
  email: Email!
}
In the directive’s logic, validate the argument and fail fast with a BAD_USER_INPUT error if the constraint is violated.
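A hedged sketch of that check, written as a standalone helper a directive transformer might call per annotated field (names are illustrative; the error class mirrors GraphQLError's extensions shape rather than using a specific library):

```typescript
// Illustrative constraint enforcement for the @constraint arguments above.
// In a real server, throw a GraphQLError carrying these extensions instead.
interface ConstraintArgs {
  minLength?: number;
  maxLength?: number;
  pattern?: string;
}

class BadUserInputError extends Error {
  extensions: { code: 'BAD_USER_INPUT'; field: string };
  constructor(field: string, reason: string) {
    super(`${field} ${reason}`);
    this.extensions = { code: 'BAD_USER_INPUT', field };
  }
}

function enforceConstraint(field: string, value: string, c: ConstraintArgs): string {
  if (c.minLength !== undefined && value.length < c.minLength) {
    throw new BadUserInputError(field, `must be at least ${c.minLength} characters`);
  }
  if (c.maxLength !== undefined && value.length > c.maxLength) {
    throw new BadUserInputError(field, `must be at most ${c.maxLength} characters`);
  }
  if (c.pattern !== undefined && !new RegExp(c.pattern).test(value)) {
    throw new BadUserInputError(field, `must match ${c.pattern}`);
  }
  return value;
}
```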
Resolver-Level Validation with a Schema Library
Even with strong schemas, resolver-level checks are indispensable for cross-field rules and business constraints. Validation libraries also offer sanitization transforms.
Example with Zod in Apollo Server:
// user.resolver.ts
import { z } from 'zod';

const createUserSchema = z.object({
  username: z.string().min(3).max(30).regex(/^[a-zA-Z0-9_]+$/),
  email: z.string().email().transform((v) => v.trim().toLowerCase()),
  displayName: z.string().trim().max(80).optional(),
  role: z.enum(['MEMBER', 'ADMIN']),
});

export const resolvers = {
  Mutation: {
    async createUser(_, { input }, { db, auth, logger }) {
      // AuthZ first
      auth.ensure('user:create');
      // Validate and sanitize
      const data = createUserSchema.parse(input);
      // Parameterized DB query; never interpolate
      const user = await db.users.insert({
        username: data.username,
        email: data.email,
        display_name: data.displayName ?? null,
        role: data.role,
      });
      return { id: user.id, ...data };
    },
  },
};
Tip: Prefer throwing typed GraphQLErrors with extensions.code = BAD_USER_INPUT and redacted, user-safe messages.
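One way to apply that tip is to collect field-level issues (for example, from Zod's safeParse) and shape them into entries with a stable code (a sketch; the issue shape and helper name are illustrative):

```typescript
// Hypothetical helper mapping validation issues to user-safe error entries.
// A real server would wrap each entry in a GraphQLError carrying these
// extensions rather than returning plain objects.
interface FieldIssue {
  field: string;
  message: string;
}

function toBadUserInputErrors(issues: FieldIssue[]) {
  return issues.map(({ field, message }) => ({
    message, // field-specific and user-safe: no stack traces or internals
    extensions: { code: 'BAD_USER_INPUT', field },
  }));
}
```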
Query Cost Controls: Depth, Complexity, and Persisted Queries
Validation isn’t only about field values. You must restrict the shape and cost of incoming operations.
- Depth limit: Cap nesting to prevent pathological queries.
- Complexity scoring: Assign weights to fields and cap total cost.
- Persisted/allow‑listed queries: Only accept pre-registered operations (great for public APIs and mobile apps).
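The first two controls can be illustrated on a simplified selection tree (a toy model; real servers implement these as validation rules over the parsed document AST, as in the setup below):

```typescript
// Toy depth and cost scoring over a simplified selection tree.
interface Selection {
  name: string;
  cost?: number; // per-field weight (default 1)
  children?: Selection[];
}

function depth(sel: Selection): number {
  const kids = sel.children ?? [];
  if (kids.length === 0) return 1;
  return 1 + Math.max(...kids.map(depth));
}

function cost(sel: Selection): number {
  const kids = sel.children ?? [];
  return (sel.cost ?? 1) + kids.reduce((sum, c) => sum + cost(c), 0);
}

const query: Selection = {
  name: 'user',
  children: [{ name: 'friends', cost: 10, children: [{ name: 'name' }] }],
};

// depth(query) === 3 and cost(query) === 12; reject if above configured caps
```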
Conceptual setup (Node.js):
import { ApolloServer } from '@apollo/server';
import depthLimit from 'graphql-depth-limit';
import { createComplexityRule } from 'graphql-query-complexity';

const maxDepth = 8;
const complexityRule = createComplexityRule({
  maximumComplexity: 1000,
  estimators: [/* field config estimators here */],
});

const server = new ApolloServer({
  schema,
  validationRules: [depthLimit(maxDepth), complexityRule],
});
Persisted queries sketch:
import { ApolloServerPluginInlineTraceDisabled } from '@apollo/server/plugin/disabled';
import { apqPlugin } from 'some-apq-plugin'; // conceptual

const server = new ApolloServer({
  schema,
  plugins: [apqPlugin(), ApolloServerPluginInlineTraceDisabled()],
});
For highly sensitive endpoints, reject ad‑hoc queries entirely and accept only allow‑listed operation IDs.
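A minimal allow-list lookup might look like this (operation IDs and documents are illustrative):

```typescript
// Only pre-registered operations are executable; ad-hoc query text is refused.
const allowList = new Map<string, string>([
  ['GetUser:v1', 'query GetUser($id: ID!) { user(id: $id) { id name } }'],
]);

function resolveOperation(operationId: string): string {
  const doc = allowList.get(operationId);
  if (doc === undefined) {
    throw new Error(`Unknown operation id: ${operationId}`);
  }
  return doc; // hand the trusted document to the executor
}
```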
Handling File Uploads Safely
If your API accepts files via a GraphQL Upload scalar, validate type and size server-side and stream to storage—don’t buffer entire files in memory.
// upload.resolver.ts
import { pipeline } from 'stream/promises';

const MAX_BYTES = 10 * 1024 * 1024; // 10 MB
const ALLOWED = new Set(['image/png', 'image/jpeg']);

export const resolvers = {
  Mutation: {
    async uploadAvatar(_, { file }, { storage }) {
      const { filename, mimetype, createReadStream } = await file;
      if (!ALLOWED.has(mimetype)) throw new Error('Unsupported file type');
      const stream = createReadStream();
      let bytes = 0;
      stream.on('data', (chunk) => {
        bytes += chunk.length;
        // Abort as soon as the limit is exceeded
        if (bytes > MAX_BYTES) stream.destroy(new Error('File too large'));
      });
      // Sanitize or replace filename before using it in a storage key
      const key = `avatars/${Date.now()}-${filename}`;
      const write = storage.writeStream(key, { contentType: mimetype });
      // pipeline propagates errors from either stream, unlike stream.pipe
      await pipeline(stream, write);
      return { ok: true, url: storage.publicUrl(key) };
    },
  },
};
Downstream Safety: Databases, Search, and HTML
- Databases: Always use parameterized queries or query builders. Validate data before insertion; add database constraints (NOT NULL, CHECK, UNIQUE) as a backstop.
- Search engines: Escape special characters for the target DSL (for example, Elasticsearch, Lucene). Prefer parameterized APIs if available.
- HTML and emails: If returning rich text that will render in browsers, sanitize on output using a robust HTML sanitizer. Consider storing both the source (for example, markdown) and a sanitized, rendered form.
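For Lucene-style query syntax, a conservative escaper might look like this (a sketch based on the documented special-character list; < and > cannot be escaped in Lucene syntax, so they are dropped here):

```typescript
// Backslash-escape Lucene/Elasticsearch query_string special characters
// before embedding user input in a query DSL string. Two-character
// operators like && and || are escaped character-by-character, which is
// coarse but safe.
function escapeLucene(input: string): string {
  return input
    .replace(/[<>]/g, '') // cannot be escaped; drop instead
    .replace(/[+\-=&|!(){}[\]^"~*?:\\\/]/g, (ch) => '\\' + ch);
}

escapeLucene('C++'); // 'C\+\+'
```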
Language and Framework Notes
- Node.js: Apollo Server, GraphQL Yoga, NestJS GraphQL all support custom scalars, schema directives, and validation hooks.
- Java: graphql-java supports directives, instrumentation for complexity, and custom scalars.
- Go: gqlgen allows custom scalars and middlewares for validation.
- Python: Graphene and Ariadne support custom scalars and extension hooks for validation.
The patterns in this article generalize: push invariants into the schema, verify cross-field business rules in resolvers/services, and enforce resource limits at the operation boundary.
Error Handling, Logging, and Observability
- Error shapes: Use BAD_USER_INPUT for validation failures; avoid leaking stack traces. Include a stable error code and field-specific messages.
- Redaction: Remove PII from logs; never log raw credentials or tokens.
- Metrics: Track validation failure rates, top offending fields, and rejected operation complexity to spot abuse or UX issues.
- Tracing: Annotate traces with decision points (for example, “rejected: complexity=1250 > 1000”).
Example error response shape:
{
  "errors": [
    {
      "message": "username must match ^[a-zA-Z0-9_]+$",
      "extensions": { "code": "BAD_USER_INPUT", "field": "username" }
    }
  ],
  "data": null
}
Security and Privacy Checklist
- Inputs
  - Enforce exact shapes with input types; avoid overly generic strings.
  - Apply custom scalars for Email, URL, UUID, DateTime.
  - Add directive-based constraints for length/pattern/range.
  - Canonicalize (trim, case, Unicode) where appropriate.
  - Validate cross-field rules in resolvers/services.
- Operations
  - Depth and complexity limits in place.
  - Persisted/allow‑listed queries for public surfaces.
  - Maximum input sizes and upload limits enforced.
- Downstream
  - Parameterized DB queries; DB constraints as backstops.
  - Escape/sanitize for HTML or other rendering contexts.
- Platform
  - Rate limiting and authentication at the edge.
  - Safe error messages; redacted logs.
  - Observability on validation failures and query costs.
Putting It Together: A Minimal End-to-End Flow
- Client sends a persisted query ID with input variables.
- Gateway enforces auth, size limits, and rate limits.
- Server validates operation depth/complexity; rejects if exceeded.
- Schema-level scalars/directives validate primitive constraints.
- Resolver validates business rules (for example, uniqueness, ownership) and canonicalizes inputs.
- Persistence layer writes via parameterized queries; DB constraints enforce invariants.
- Responses are shaped with safe error messages; logs are redacted; metrics are recorded.
Conclusion
Robust GraphQL input validation and sanitization isn’t one trick—it’s a layered strategy. Express invariants in the schema, capture business logic in resolvers, restrict operation cost, and sanitize only when it preserves meaning. Combined with sound downstream hygiene and observability, these practices yield an API that is safer, faster to debug, and resilient under real-world traffic and adversarial inputs.