Error Handling Design
Non-Negotiables
- No swallowed errors. Silence is sabotage.
- Errors must be diagnosable. Every error needs context and a path to action.
- External errors must be stable. Clients depend on contracts.
- Security beats verbosity. Never leak secrets or internal details.
Minimum Requirements (Industry Standard Baseline)
Error Taxonomy
Define and enforce categories (examples):
- Validation (client fix)
- Authentication/Authorization (client fix)
- Not Found (client fix)
- Conflict (client fix / concurrency)
- Dependency Failure (server/operator action)
- Internal (server action)
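One way to enforce the taxonomy is to encode it as a closed enum with a single lookup table, so that adding a category forces a decision about its status and owner. The sketch below is illustrative only: the enum name, the split of Authentication/Authorization into two members, and the HTTP status mapping are assumptions, not part of this document's contract.

```python
from enum import Enum
from http import HTTPStatus


class ErrorCategory(Enum):
    """Categories from the taxonomy above (names are illustrative)."""
    VALIDATION = "validation"
    AUTHENTICATION = "authentication"   # assumption: split from authorization
    AUTHORIZATION = "authorization"
    NOT_FOUND = "not_found"
    CONFLICT = "conflict"
    DEPENDENCY_FAILURE = "dependency_failure"
    INTERNAL = "internal"


# Assumed HTTP mapping; adjust to your API's conventions.
HTTP_STATUS = {
    ErrorCategory.VALIDATION: HTTPStatus.BAD_REQUEST,
    ErrorCategory.AUTHENTICATION: HTTPStatus.UNAUTHORIZED,
    ErrorCategory.AUTHORIZATION: HTTPStatus.FORBIDDEN,
    ErrorCategory.NOT_FOUND: HTTPStatus.NOT_FOUND,
    ErrorCategory.CONFLICT: HTTPStatus.CONFLICT,
    ErrorCategory.DEPENDENCY_FAILURE: HTTPStatus.SERVICE_UNAVAILABLE,
    ErrorCategory.INTERNAL: HTTPStatus.INTERNAL_SERVER_ERROR,
}

# Categories the client can fix; everything else needs server/operator action.
CLIENT_FIXABLE = frozenset({
    ErrorCategory.VALIDATION,
    ErrorCategory.AUTHENTICATION,
    ErrorCategory.AUTHORIZATION,
    ErrorCategory.NOT_FOUND,
    ErrorCategory.CONFLICT,
})


def http_status_for(category: ErrorCategory) -> int:
    """Return the HTTP status code assumed for a category."""
    return int(HTTP_STATUS[category])


def is_client_fixable(category: ErrorCategory) -> bool:
    """True when the client, not the operator, is expected to act."""
    return category in CLIENT_FIXABLE
```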
Error Contract (API)
Return a consistent shape:
- code: stable machine-readable code
- message: human-readable summary
- details: optional structured fields (safe to expose)
- trace_id: correlation id for support
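A minimal sketch of the contract as a typed value, assuming JSON over HTTP. The class name `ErrorResponse`, the generated `trace_id` default, and the example code `ORDER_NOT_FOUND` are hypothetical; only the four field names come from the contract above.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ErrorResponse:
    """The external error shape: code, message, details, trace_id."""
    code: str                     # stable, machine-readable
    message: str                  # human-readable summary, no internals
    details: Dict[str, Any] = field(default_factory=dict)  # safe to expose
    # Assumption: a random id is generated when the caller has none to pass on.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        """Serialize with sorted keys so the output is deterministic."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    print(ErrorResponse(
        code="ORDER_NOT_FOUND",
        message="Order not found",
        details={"order_id": "1234"},
    ).to_json())
```

Keeping the shape in one frozen type makes it harder for individual handlers to drift from the contract.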
Boundaries
- Catch exceptions at boundaries only (HTTP handler, job entrypoint, message consumer).
- Inside core logic: prefer returning typed results or throwing domain errors.
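The boundary rule can be sketched as follows: core logic raises a domain error and never catches or logs, while a single boundary function translates every failure into the external shape. All names here (`DomainError`, `reserve_stock`, `handle_request`) and the status codes are hypothetical, and the handler returns a plain tuple rather than using any particular web framework.

```python
import logging
import uuid
from typing import Any, Dict, Optional, Tuple

logger = logging.getLogger("boundary")


class DomainError(Exception):
    """Hypothetical domain error carrying a stable code and safe details."""

    def __init__(self, code: str, message: str, status: int = 400,
                 details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.status = status
        self.details = details or {}


def reserve_stock(available: int, requested: int) -> int:
    """Core logic: raises domain errors, never catches or logs."""
    if requested <= 0:
        raise DomainError("INVALID_QUANTITY", "Quantity must be positive",
                          400, {"field": "quantity"})
    if requested > available:
        raise DomainError("INSUFFICIENT_STOCK", "Not enough stock", 409)
    return available - requested


def handle_request(available: Any, requested: Any) -> Tuple[int, Dict[str, Any]]:
    """Boundary: the only place that catches and translates errors."""
    trace_id = uuid.uuid4().hex
    try:
        return 200, {"remaining": reserve_stock(available, requested)}
    except DomainError as err:
        logger.warning("domain error", extra={"code": err.code, "trace_id": trace_id})
        return err.status, {"code": err.code, "message": err.message,
                            "details": err.details, "trace_id": trace_id}
    except Exception:
        # Unknown failure: keep the stack trace in the log, not in the response.
        logger.exception("unhandled error", extra={"trace_id": trace_id})
        return 500, {"code": "INTERNAL", "message": "Internal error",
                     "details": {}, "trace_id": trace_id}
```

The generic `except Exception` branch is what keeps internal details such as exception types and messages out of the client-facing body.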
Retry Policy
- Retries are for transient failures only.
- Always use:
  - max attempts
  - exponential backoff
  - jitter
  - circuit breakers where applicable
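A minimal sketch of the retry rules above: a bounded number of attempts, exponential backoff with a cap, and "full jitter" (a uniform random delay up to the backoff ceiling). `TransientError` is a hypothetical marker type for failures considered safe to retry, and the default delays are assumptions. Circuit breaking is left out of this sketch.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def retry(operation: Callable[[], T], max_attempts: int = 4,
          base: float = 0.1, cap: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying only TransientError, up to max_attempts calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(backoff_delay(attempt, base, cap))
    raise AssertionError("unreachable")
```

Any exception other than `TransientError` propagates on the first attempt, which keeps retries limited to transient failures. The `sleep` and `rng` parameters exist so tests can run without real delays.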
Observability
- Every error path must:
  - log once (structured)
  - emit metrics (error rate)
  - preserve stack traces internally
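The three requirements above can be combined in a single reporting helper that is called once per error, typically at the boundary. This is a sketch under assumptions: `report_error` and the event field names are hypothetical, and the in-process `Counter` stands in for a real metrics client.

```python
import json
import logging
import traceback
from collections import Counter
from typing import Any, Dict

logger = logging.getLogger("errors")

# Stand-in for a real metrics client (assumption): counts errors per code.
ERROR_COUNTS: Counter = Counter()


def report_error(err: BaseException, code: str, trace_id: str) -> Dict[str, Any]:
    """Log one structured event, bump one counter, keep the stack internal."""
    ERROR_COUNTS[code] += 1
    event = {
        "event": "error",
        "code": code,
        "trace_id": trace_id,
        "error_type": type(err).__name__,
        # Full stack trace goes to the log only, never to the client response.
        "stack": "".join(
            traceback.format_exception(type(err), err, err.__traceback__)
        ),
    }
    logger.error(json.dumps(event))
    return event
```

Calling this helper from one place per request avoids the duplicate log lines that appear when every layer logs and re-raises.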
References
- RFC 9457 (Problem Details for HTTP APIs): https://www.rfc-editor.org/rfc/rfc9457
- Google Cloud API Design Guide (errors): https://cloud.google.com/apis/design/errors
- Resilience patterns (timeouts/retries): https://learn.microsoft.com/en-us/azure/architecture/patterns/