GraphQL Error Handling Best Practices: Clear, Secure, and Resilient APIs

A practical guide to GraphQL error handling: schema design, HTTP codes, partial data, masking, client patterns, observability, and examples.

ASOasis

Why GraphQL Error Handling Deserves First-Class Design

GraphQL makes it easy to fetch exactly the data you need, but poor error handling can turn an elegant schema into an opaque system. Done well, errors are clear, consistent, secure, and actionable for both clients and operators.

This guide distills best practices across schema design, transport and protocol behavior, server implementation, client patterns, security, and observability.

The Anatomy of a GraphQL Error

A GraphQL response can include both data and errors:

{
  "data": { "user": null },
  "errors": [
    {
      "message": "User not found",
      "path": ["user"],
      "locations": [{ "line": 2, "column": 3 }],
      "extensions": {
        "code": "NOT_FOUND",
        "httpStatus": 404,
        "severity": "info",
        "correlationId": "7f3b…"
      }
    }
  ]
}

Key fields:

  • message: Human-readable summary (safe for end users if masked).
  • path: Where the error happened in the response shape.
  • locations: Where in the query it originated (optional in production responses).
  • extensions: A machine-readable bag for codes, metadata, and tracing.

Partial data is normal in GraphQL; a field can resolve to null while siblings succeed. Embrace this instead of failing entire operations by default.
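
As a minimal sketch of how a client might type this response shape and tell complete, partial, and failed results apart. The type names and the classifyResponse helper are illustrative assumptions, not part of any library:

```typescript
// Shapes follow the response format shown above; all names are illustrative.
type PathSegment = string | number;

interface GraphQLErrorShape {
  message: string;
  path?: PathSegment[];
  locations?: { line: number; column: number }[];
  extensions?: Record<string, unknown>;
}

interface GraphQLResponse<T> {
  data?: T | null;
  errors?: GraphQLErrorShape[];
}

type Outcome = "complete" | "partial" | "failed";

// "partial" means a data object arrived alongside at least one error.
// A response with neither data nor errors is treated as failed.
function classifyResponse<T>(res: GraphQLResponse<T>): Outcome {
  const hasErrors = (res.errors?.length ?? 0) > 0;
  const hasData = res.data !== undefined && res.data !== null;
  if (!hasData) return "failed";
  return hasErrors ? "partial" : "complete";
}
```

The sample response above would classify as "partial": the data object is present even though its user field is null.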

Map Transport vs. Execution vs. Domain Errors

Treat errors in three layers:

  1. Transport/Protocol (HTTP-level)
  • 200: Successful GraphQL execution (even with GraphQL errors in the body).
  • 400: Syntax or validation errors that prevent execution.
  • 401/403: Authentication/authorization refused at the gateway.
  • 429: Rate limiting.
  • 500/502/503: Server or upstream outage.
  2. GraphQL Execution Errors
  • A resolver throws unexpectedly. Mask to a generic message; keep details in logs/trace.
  3. Domain/Business Errors
  • Prefer modeling as data (payload patterns or typed unions) or use GraphQL errors with stable codes in extensions. Whichever you choose for an operation, apply it consistently.
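
The transport layer of this mapping can be sketched as a single lookup. The FailureStage names are hypothetical labels for where a request failed; the status codes are the ones listed above:

```typescript
// Hypothetical stage names for where a request failed.
type FailureStage =
  | "parse" // malformed JSON or GraphQL syntax error
  | "validate" // query does not match the schema
  | "unauthenticated"
  | "forbidden"
  | "rate_limited"
  | "unavailable" // server or upstream outage
  | "execution"; // a resolver failed after execution began

function httpStatusFor(stage: FailureStage): number {
  switch (stage) {
    case "parse":
    case "validate":
      return 400;
    case "unauthenticated":
      return 401;
    case "forbidden":
      return 403;
    case "rate_limited":
      return 429;
    case "unavailable":
      return 503;
    case "execution":
      // Execution started, so the body carries errors[] and the status stays 200.
      return 200;
  }
}
```

Keeping this in one function makes the "execution errors still return 200" rule explicit instead of leaving it scattered across middleware.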

Design Your Schema for Predictable Errors

Schema choices determine how clients experience failure.

  1. Use Non-Null Thoughtfully
  • Non-null (!) declares a contract. If a non-null field fails, GraphQL propagates null up to the nearest nullable parent, so overusing ! can cause cascading nulls. Reserve it for truly mandatory data.
  2. Prefer Payload Patterns for Mutations
  • Return a payload object with data and user-facing errors, a pattern popularized by large GraphQL APIs:
type Mutation {
  createUser(input: CreateUserInput!): CreateUserPayload!
}

type CreateUserPayload {
  user: User
  userErrors: [UserError!]!
}

type UserError {
  message: String!
  code: UserErrorCode!
  fieldPath: [String!]
}

enum UserErrorCode {
  INVALID_INPUT
  EMAIL_TAKEN
  RATE_LIMITED
}

Benefits: predictable shape, multiple field-level issues at once, excellent UX for forms.
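
On the client, a form can consume this payload roughly as follows. This is a sketch: the TypeScript shapes mirror the schema above, the user field is reduced to an assumed id, and the "_form" key for errors without a field path is a hypothetical convention:

```typescript
interface UserError {
  message: string;
  code: "INVALID_INPUT" | "EMAIL_TAKEN" | "RATE_LIMITED";
  fieldPath?: string[] | null;
}

interface CreateUserPayload {
  user: { id: string } | null; // assumed minimal User shape
  userErrors: UserError[];
}

// Groups messages by dotted field path; errors without a path go under "_form".
function toFieldErrors(payload: CreateUserPayload): Record<string, string[]> {
  const byField: Record<string, string[]> = {};
  for (const err of payload.userErrors) {
    const key = err.fieldPath && err.fieldPath.length > 0 ? err.fieldPath.join(".") : "_form";
    const list = byField[key] ?? [];
    list.push(err.message);
    byField[key] = list;
  }
  return byField;
}
```

Each form input can then look up its own messages by path, while "_form" feeds a banner for errors that belong to no single field.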

  3. Consider Typed Result Unions for Queries
union UserResult = User | NotFoundError | PermissionError

interface AppError { message: String!, code: String! }

type NotFoundError implements AppError { message: String!, code: String! }

type PermissionError implements AppError { message: String!, code: String! }

This keeps domain errors in the type system, improving discoverability and codegen.
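
A sketch of what exhaustive handling can look like on the client. The TypeScript shapes stand in for what a code generator might emit, and the id and name fields on User are assumptions:

```typescript
// Client-side shapes as codegen might produce them; field sets are assumptions.
type UserResult =
  | { __typename: "User"; id: string; name: string }
  | { __typename: "NotFoundError"; message: string; code: string }
  | { __typename: "PermissionError"; message: string; code: string };

function describeUserResult(result: UserResult): string {
  switch (result.__typename) {
    case "User":
      return `Loaded ${result.name}`;
    case "NotFoundError":
      return `Not found: ${result.message}`;
    case "PermissionError":
      return `Access denied: ${result.message}`;
    default: {
      // Compile-time exhaustiveness check: a new union member that is not
      // handled above makes this assignment fail to type-check.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```

The query must select __typename for this discrimination to work at runtime.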

  4. Standardize Error Codes
  • Create an enum or documented registry, and keep codes stable across versions. Common families:
  • AUTH_xxx (UNAUTHENTICATED, FORBIDDEN)
  • INPUT_xxx (INVALID_INPUT, MISSING_FIELD)
  • DOMAIN_xxx (USER_EXISTS, INSUFFICIENT_BALANCE)
  • TRANSIENT_xxx (TIMEOUT, UPSTREAM_UNAVAILABLE)
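
One way to keep such a registry in code is a single typed constant. This is a sketch: the codes mirror the examples above, while the family and retryable attributes are assumed metadata rather than anything GraphQL defines:

```typescript
// A hypothetical registry; the codes mirror the families listed above.
const ERROR_CODES = {
  UNAUTHENTICATED: { family: "AUTH", retryable: false },
  FORBIDDEN: { family: "AUTH", retryable: false },
  INVALID_INPUT: { family: "INPUT", retryable: false },
  MISSING_FIELD: { family: "INPUT", retryable: false },
  USER_EXISTS: { family: "DOMAIN", retryable: false },
  INSUFFICIENT_BALANCE: { family: "DOMAIN", retryable: false },
  TIMEOUT: { family: "TRANSIENT", retryable: true },
  UPSTREAM_UNAVAILABLE: { family: "TRANSIENT", retryable: true },
} as const;

type ErrorCode = keyof typeof ERROR_CODES;

// Narrows an untrusted value (e.g. extensions.code) to a known code.
function isKnownCode(code: unknown): code is ErrorCode {
  return typeof code === "string" && Object.prototype.hasOwnProperty.call(ERROR_CODES, code);
}

// Unknown codes are treated as non-retryable.
function isRetryable(code: unknown): boolean {
  return isKnownCode(code) && ERROR_CODES[code].retryable;
}
```

Because ErrorCode is derived from the constant, adding or removing a code shows up as a type change wherever clients switch on it.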

Server-Side Implementation Patterns

  1. Mask Internal Errors by Default
  • Never leak stack traces, SQL, file paths, or PII. Convert unexpected exceptions to a generic message and log the details with a correlation ID.
import { GraphQLError } from 'graphql'

// Converts any unexpected exception into a generic, client-safe error.
// The original error is logged server-side under the correlation ID.
function toPublicError(e: unknown, opts: { code?: string; httpStatus?: number; contextId: string }) {
  console.error(`[${opts.contextId}]`, e) // swap in your structured logger
  return new GraphQLError('Internal server error', {
    extensions: {
      code: opts.code ?? 'INTERNAL_SERVER_ERROR',
      httpStatus: opts.httpStatus ?? 500,
      correlationId: opts.contextId,
    },
  })
}
  2. Map Known Exceptions to Structured GraphQLError
class EmailTakenError extends Error {}

// Translates known domain exceptions into stable, client-facing errors;
// anything unrecognized falls through to the masked generic error.
function mapDomainError(e: unknown, contextId: string) {
  if (e instanceof EmailTakenError) {
    // No hardcoded path: graphql-js attaches the field path when a resolver throws this error.
    return new GraphQLError('Email already in use', {
      extensions: { code: 'EMAIL_TAKEN', httpStatus: 409, correlationId: contextId, severity: 'info' },
    })
  }
  return toPublicError(e, { code: 'INTERNAL_SERVER_ERROR', contextId })
}
  3. Fail Fast Before Execution
  • Reject unauthorized requests and malformed JSON at the edge with proper HTTP status codes.
  • Apply input validation (schema, constraints) early; when feasible, return multiple validation issues in userErrors.
  4. Rate Limiting and Quotas
  • Use 429 with Retry-After at the transport layer when you can decide pre-execution.
  • If limits are per-field or per-identity during execution, return domain errors with clear codes and guidance.
  5. Federation and Subgraphs
  • Propagate subgraph errors with enough context in extensions (e.g., subgraphName). Avoid leaking internals; include only identifiers safe for clients and operators.
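
For the early-validation point, a sketch of a validator that reports every issue in one pass so the mutation can return them all in userErrors. The input fields, the simplistic email pattern, and the 12-character minimum are assumptions chosen for illustration:

```typescript
// Assumed input shape; the schema above does not define CreateUserInput's fields.
interface CreateUserInput {
  email: string;
  password: string;
}

// Same shape as UserError in the schema above.
interface InputError {
  message: string;
  code: "INVALID_INPUT";
  fieldPath: string[];
}

// Checks every field and returns all issues rather than stopping at the first.
function validateCreateUserInput(input: CreateUserInput): InputError[] {
  const errors: InputError[] = [];
  // Deliberately loose check: one "@" with non-space text on both sides.
  if (!/^[^@\s]+@[^@\s]+$/.test(input.email)) {
    errors.push({ message: "Enter a valid email address", code: "INVALID_INPUT", fieldPath: ["input", "email"] });
  }
  // The minimum length is an arbitrary example policy.
  if (input.password.length < 12) {
    errors.push({ message: "Password must be at least 12 characters", code: "INVALID_INPUT", fieldPath: ["input", "password"] });
  }
  return errors;
}
```

A resolver could return { user: null, userErrors } whenever this list is non-empty and skip the write entirely.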

Handling Partial Data Deliberately

Partial success is a GraphQL superpower. Design UI and clients to:

  • Render available data while highlighting incomplete sections.
  • Provide actionable toasts/tooltips using extensions.code.
  • Avoid retry storms; only retry idempotent, transient failures.

Example partial response for a feed where one card failed:

{
  "data": {
    "feed": [ { "id": "1", "title": "OK" }, null, { "id": "3", "title": "OK" } ]
  },
  "errors": [
    { "message": "Card unavailable", "path": ["feed", 1], "extensions": { "code": "NOT_FOUND" } }
  ]
}
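
To render a list like this, a client can pair each null slot with the error whose path points at it. The sketch below assumes the response shape shown above; Slot, resolveSlots, and the "UNKNOWN" fallback code are hypothetical names:

```typescript
type PathSegment = string | number;

interface ResponseError {
  message: string;
  path?: PathSegment[];
  extensions?: { code?: string };
}

type Slot<T> = { ok: true; value: T } | { ok: false; code: string; message: string };

// Pairs each list element with the first error (if any) whose path starts with [...listPath, index].
function resolveSlots<T>(items: (T | null)[], errors: ResponseError[], listPath: PathSegment[]): Slot<T>[] {
  return items.map((item, index): Slot<T> => {
    if (item !== null) return { ok: true, value: item };
    const match = errors.find((e) => {
      const p = e.path;
      if (p === undefined) return false;
      return listPath.every((seg, i) => p[i] === seg) && p[listPath.length] === index;
    });
    return {
      ok: false,
      code: match?.extensions?.code ?? "UNKNOWN",
      message: match?.message ?? "Unavailable",
    };
  });
}
```

Calling resolveSlots(data.feed, errors, ["feed"]) on the response above would mark the second card as failed with code NOT_FOUND while the other two render normally.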

Incremental Delivery: @defer and @stream

With deferred and streamed results, errors arrive with their own path context. Best practices:

  • Preserve ordering and associate errors with the chunk via path.
  • Surface non-fatal chunk errors in the UI region that owns that fragment.
  • Treat transport disconnects as network errors; reconcile partial UI state on reconnection.
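
Routing a chunk error to the UI region that owns it can be sketched as a longest-prefix match on path. This does not model any particular incremental-delivery wire format; the region registry and owningRegion are hypothetical:

```typescript
type PathSegment = string | number;

// Returns the registered region whose path is the longest prefix of the error path,
// or undefined when no region owns that part of the response.
function owningRegion(errorPath: PathSegment[], regions: Map<string, PathSegment[]>): string | undefined {
  let best: string | undefined;
  let bestLength = -1;
  regions.forEach((regionPath, name) => {
    const isPrefix =
      regionPath.length <= errorPath.length && regionPath.every((seg, i) => errorPath[i] === seg);
    if (isPrefix && regionPath.length > bestLength) {
      best = name;
      bestLength = regionPath.length;
    }
  });
  return best;
}
```

An error at ["user", "feed", 1, "title"] would land in a region registered at ["user", "feed"] rather than in the broader ["user"] region.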

Client-Side Patterns (Apollo/Relay and Beyond)

  1. Separate Network vs. GraphQL Errors
  • Network errors: HTTP failures, timeouts, DNS. Generally retryable with backoff.
  • GraphQL errors: Present in errors[]. Only retry if extensions.code signals a transient condition.
  2. Use Error Policies Intentionally
  • In Apollo Client, the errorPolicy option controls this:
  • none: treat any GraphQL error as fatal for the operation and discard the data.
  • all: keep partial data and surface the errors to the UI.
  • ignore: keep partial data and drop the errors.
  • Pick per-screen based on UX, not globally.
  3. Normalize and Centralize Handling
  • Add an error link/middleware to translate extensions.code into user messages and analytics events.
  • Use type policies or fragments to isolate error-prone fields behind boundaries.
  4. Make UIs Resilient
  • Display field-level validation messages from userErrors.
  • For forbidden/not found, prefer friendly empty states over modals.
  • For destructive operations, request confirmations and display precise failure reasons.
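
The retry rules above can be condensed into a small, client-agnostic policy. This is a sketch: the Failure shape, the set of transient codes, and the default limits are assumptions to adapt to your own registry:

```typescript
// Hypothetical failure shape: either the request produced no GraphQL response,
// or it did and carried an error code in extensions.code.
type Failure = { kind: "network" } | { kind: "graphql"; code: string };

// Codes treated as transient here are an assumption.
const TRANSIENT_CODES = new Set(["TIMEOUT", "UPSTREAM_UNAVAILABLE"]);

// attempt is the number of tries already made.
function shouldRetry(failure: Failure, attempt: number, idempotent: boolean, maxAttempts = 3): boolean {
  if (!idempotent || attempt >= maxAttempts) return false;
  return failure.kind === "network" || TRANSIENT_CODES.has(failure.code);
}

// Exponential backoff with full jitter, capped at maxMs.
// random is injectable so the delay can be tested deterministically.
function backoffMs(attempt: number, baseMs = 200, maxMs = 5000, random: () => number = Math.random): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

Jitter spreads retries out over time, which is what keeps many clients failing at once from producing a retry storm.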

Security and Privacy

  • Redact sensitive fields in logs and in any extensions.
  • Maintain an allowlist of public messages; default to generic text.
  • Include a correlationId in extensions to link user reports to server logs.
  • Throttle error responses to deter probing.
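
The allowlist idea might look like the following sketch. The message texts, PUBLIC_MESSAGES, and toClientError are hypothetical; the point is that anything not explicitly listed collapses to generic text:

```typescript
// Hypothetical allowlist: only these codes may carry a specific message to clients.
const PUBLIC_MESSAGES = new Map<string, string>([
  ["NOT_FOUND", "The requested resource was not found."],
  ["EMAIL_TAKEN", "Email already in use."],
  ["FORBIDDEN", "You do not have access to this resource."],
]);

interface PublicError {
  message: string;
  extensions: { code: string; correlationId: string };
}

// Codes missing from the allowlist become a generic error. The original details
// stay server-side and can be found through the correlation ID.
function toClientError(code: string, correlationId: string): PublicError {
  const message = PUBLIC_MESSAGES.get(code);
  if (message === undefined) {
    return {
      message: "Internal server error",
      extensions: { code: "INTERNAL_SERVER_ERROR", correlationId },
    };
  }
  return { message, extensions: { code, correlationId } };
}
```

Defaulting to the generic branch means a newly introduced internal code cannot reach clients until someone deliberately adds it to the list.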

Observability and SLOs

Instrument errors with structure, not strings.

  • Dimensions: code, operationName, fieldPath, httpStatus, userAgent, subgraph/service.
  • Emit metrics (counters) and traces (with spans per resolver) to correlate latency and failures.
  • Build dashboards for: top failing operations, top error codes, partial-data rate, retry rate, P95/P99 latency.
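
As an illustration of "structure, not strings", here is a minimal in-memory counter keyed by a subset of the dimensions above. ErrorRecord and ErrorCounter are hypothetical stand-ins for a real metrics client:

```typescript
// A structured error record using some of the dimensions listed above.
interface ErrorRecord {
  code: string;
  operationName: string;
  fieldPath: string;
  httpStatus: number;
}

// In-memory stand-in for a metrics client that supports labeled counters.
class ErrorCounter {
  private counts = new Map<string, number>();

  record(e: ErrorRecord): void {
    const key = [e.code, e.operationName, e.fieldPath, e.httpStatus].join("|");
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Top n label combinations by count, highest first.
  top(n: number): Array<[string, number]> {
    return Array.from(this.counts.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, n);
  }
}
```

Because the labels are fields rather than free text, "top error codes" and "top failing operations" become simple group-by queries.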

Testing Strategy

  • Unit-test resolver error mapping (domain exceptions → GraphQLError/union types).
  • Contract tests for canonical error cases (auth, input validation, rate limit, not found).
  • Snapshot representative responses to guard structure (data + errors + extensions).
  • Chaos tests for timeouts and upstream failures; verify masking and retry behavior.
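
A contract test can run every error in a response through a small checker like this sketch. The rules encode the conventions from this guide (a code, a correlation ID, no stack traces); checkErrorContract and the extension keys it looks for are assumptions:

```typescript
interface WireError {
  message: string;
  extensions?: Record<string, unknown>;
}

// Returns a list of contract violations; an empty list means the error passes.
function checkErrorContract(err: WireError): string[] {
  const problems: string[] = [];
  const ext = err.extensions ?? {};
  if (typeof ext["code"] !== "string") problems.push("extensions.code must be a string");
  if (typeof ext["correlationId"] !== "string") problems.push("extensions.correlationId must be a string");
  // Key names that some servers use for debug output; adjust to your stack.
  if ("stacktrace" in ext || "exception" in ext) problems.push("extensions must not expose stack traces");
  // Heuristic for a leaked stack frame such as "at fn (/srv/app.js:12:7)".
  if (/\bat .+\(.+:\d+:\d+\)/.test(err.message)) problems.push("message looks like a stack frame");
  return problems;
}
```

Asserting that the result is an empty list for each canonical case keeps masking regressions out of production.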

Versioning and Stability

  • Treat error codes as part of your public contract.
  • Deprecate codes like APIs: announce, dual-emit (old + new) during migration, then remove.
  • Keep messages stable enough for UX copy but consider them non-contractual; clients should key on extensions.code.
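
Dual-emitting during a rename could be sketched like this. The replacementCode field and the USER_EXISTS to EMAIL_TAKEN rename are purely hypothetical; the idea is that existing clients keep matching on code while updated clients can already read the new value:

```typescript
// Hypothetical migration table: deprecated code -> its replacement.
const CODE_REPLACEMENTS = new Map<string, string>([["USER_EXISTS", "EMAIL_TAKEN"]]);

interface CodeExtensions {
  code: string;
  replacementCode?: string; // assumed field name, emitted only during migration
}

// Keeps the old code in place and adds the new one alongside it.
function dualEmit(code: string): CodeExtensions {
  const replacement = CODE_REPLACEMENTS.get(code);
  return replacement === undefined ? { code } : { code, replacementCode: replacement };
}
```

Once clients have moved to the new value, the table entry is removed and code itself switches over.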

Reference Patterns You Can Mix and Match

  1. Payload + userErrors (mutations):
  • Best for form flows; batch multiple issues.
  2. Typed result unions (queries):
  • Great DX with codegen; exhaustive handling in clients.
  3. GraphQL errors with extensions:
  • Ideal for unexpected or cross-cutting concerns (auth, rate limit, upstream outages).

Consistent use matters more than picking a single pattern.

Practical Checklist

  • Define and document a stable error code taxonomy.
  • Decide per-operation strategy: payload errors, unions, or GraphQL errors.
  • Implement masking with correlation IDs.
  • Map known domain exceptions to structured errors.
  • Use correct HTTP statuses for transport failures.
  • Support partial data and choose error policies intentionally.
  • Instrument errors with metrics and traces.
  • Test canonical cases and chaos scenarios.

Conclusion

GraphQL’s flexibility makes error handling both powerful and easy to misuse. By designing errors as a first-class contract—typed where possible, consistently coded, properly masked, and well-instrumented—you’ll deliver APIs that are clearer for consumers and safer for operators, while preserving GraphQL’s strengths: partial data, incremental delivery, and schema-driven evolution.
