In HTTP, clients use request messages to perform operations, defined by request methods, on resources identified by request URIs. However, servers aren’t always able or willing to completely and successfully perform the requested operations. This post presents proper ways for HTTP servers to express these non-success outcomes.

Status codes

The primary way to communicate the outcome of a request is the response message’s status code. The status code is a three-digit integer, and status codes are divided into five classes (list adapted from RFC 7231):

1xx (Informational): The request was received, continuing process
2xx (Successful): The request was successfully received, understood, and accepted
3xx (Redirection): Further action needs to be taken in order to complete the request
4xx (Client Error): The request contains bad syntax or cannot be fulfilled
5xx (Server Error): The server failed to fulfill an apparently valid request

The last two of these five classes, 4xx and 5xx, are used to represent non-success outcomes. The 4xx class is used when the request is not completely understood by the server (e.g. incorrect HTTP syntax) or fails to satisfy the server requirements for successful handling (e.g. client must be authenticated). These are commonly referred to as client errors.

On the other hand, 5xx codes should be strictly reserved for server errors, i.e., situations where the request is not successfully completed due to abnormal behavior on the server.

Here are some basic rules that I tend to use when choosing status codes:

Never use a 2xx to represent a non-success outcome. Namely, always use a 4xx or 5xx to represent those situations, except when the request can be completed by taking further actions, in which case a 3xx could be used.
Reserve the 5xx status codes for errors where the fault is indeed on the server side. Examples are infrastructure problems, such as the inability to connect to an external system like a database or service, and programming errors, such as an out-of-bounds index or a null dereference.
Inability to successfully fulfill a request due to malformed or invalid information in the request must instead be signaled with 4xx status codes. Some examples are: the request URI does not match any known resource; the request body uses an unsupported format; the request body has invalid information.
As a rule of thumb, and perhaps a little hyperbolically, if an error does not require waking someone up in the middle of the night, then it probably shouldn’t be signaled using a 5xx class code, because it does not signal a server malfunction.
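
The rules above can be sketched as a small mapping from failure kinds to status classes. This is only an illustration in Python: the exception names `InvalidRequestError` and `UpstreamUnavailableError` and the `status_for` helper are hypothetical and do not come from any framework.

```python
class InvalidRequestError(Exception):
    """Hypothetical: the request itself is malformed or invalid (client fault)."""


class UpstreamUnavailableError(Exception):
    """Hypothetical: a database or internal service could not be reached (server fault)."""


def status_for(error: Exception) -> int:
    """Pick a status code for a failure; this function never returns a 2xx."""
    if isinstance(error, InvalidRequestError):
        # The fault lies in the request, so a 4xx code is appropriate.
        return 400
    # Infrastructure problems and programming errors (IndexError,
    # AttributeError, ...) are server faults, so they map to a 5xx code.
    return 500
```

Anything that is not explicitly recognized as a client fault falls through to 500 here, which matches the idea that unexpected failures are the server's responsibility.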

The HTTP specification also defines a set of 41 concrete status codes and associated semantics, of which 19 belong to the 4xx class and 6 belong to the 5xx class. These standard codes are a valuable resource for the Web API designer, who should simultaneously respect and take advantage of this semantic richness when designing the API responses. Here are some rules of thumb:

Use 500 for unexpected server errors, reserving 503 for planned service unavailability.
Reserve the 502 and 504 codes for reverse proxies.
A failure when contacting an internal third-party system should still use a 500 when this internal system is not visible to the client.
Use 401 when the request has invalid or missing authentication/authorization information required to perform the operation.
If this authentication/authorization information is valid but the operation is still not allowed, then use 403.
Use 404 when the resource identified by the request URI does not exist or the server does not want to reveal its existence. For resources represented as lists, an empty list should use 200 and not 404, since the resource does exist.
Use 400 if parts of the request are not valid, such as fields in the request body. For invalid query string parameters I tend to use 404, since the query string is an integral part of the URI, although using 400 is also acceptable.
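
As a rough sketch, these rules of thumb could be encoded in a single decision function. The `RequestFacts` type, its field names, and the order in which the checks run are all assumptions made for illustration; a real API would derive these facts from its own authentication, routing, and validation layers.

```python
from dataclasses import dataclass


@dataclass
class RequestFacts:
    """Hypothetical summary of what the server learned while handling a request."""
    planned_unavailability: bool = False
    unexpected_failure: bool = False
    credentials_valid: bool = True
    operation_allowed: bool = True
    resource_exists: bool = True
    body_valid: bool = True


def choose_status(facts: RequestFacts) -> int:
    """Apply the rules of thumb; the order of the checks is an assumption."""
    if facts.planned_unavailability:
        return 503  # planned service unavailability
    if facts.unexpected_failure:
        return 500  # unexpected server error, including hidden internal systems
    if not facts.credentials_valid:
        return 401  # missing or invalid credentials
    if not facts.operation_allowed:
        return 403  # valid credentials, but the operation is not allowed
    if not facts.resource_exists:
        return 404  # unknown resource (an existing but empty list is still 200)
    if not facts.body_valid:
        return 400  # invalid parts of the request, such as body fields
    return 200
```

For example, `choose_status(RequestFacts(operation_allowed=False))` yields 403, while a request with no recorded problems yields 200.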

HTTP status codes are extensible, meaning that other specifications, such as WebDAV, can define additional values. The complete list of codes is maintained by IANA at the Hypertext Transfer Protocol (HTTP) Status Code Registry. This extensibility means that HTTP clients and intermediaries are not obliged to understand all status codes. However, they must understand the semantics of each code class. For instance, if a client receives the (not yet defined) 499 status code, then it should treat it as a 400 and not as a 200 or a 500.
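
A client-side sketch of this fallback is shown below. The `effective_status` helper is hypothetical, and the set of codes the client "understands" is approximated by Python's standard `http.HTTPStatus` enum, which is an assumption made only for this example.

```python
from http import HTTPStatus

# Codes this hypothetical client recognizes; here, whatever the standard library lists.
KNOWN_CODES = {status.value for status in HTTPStatus}


def effective_status(code: int) -> int:
    """Return the code itself if recognized, otherwise the x00 code of its class."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    if code in KNOWN_CODES:
        return code
    # Unrecognized code: fall back to the generic code of the same class.
    return (code // 100) * 100
```

With this helper, an unrecognized 499 is handled as a 400, while a recognized code such as 404 is left unchanged.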

Despite this richness, there isn’t an HTTP status code for every possible failure scenario. Namely, by being uniform, these status codes don’t have any domain-specific semantics. However, there are scenarios where the server needs to provide the client with a more detailed error cause, namely using domain-specific information.

Two common anti-patterns are:

Redefining the meaning of a standard code for a particular set of resources. This solution breaks the uniform interface contract: the semantics of the status code should be the same independently of the request’s target resource.
Using an unassigned status code in the 4xx or 5xx classes. Unless this is done via a proper registration of the new status code with IANA, this decision will hinder evolution and will most probably collide with future extensions to the HTTP protocol.

Error representations

Instead of fiddling with status codes, a better solution is to use the response payload to provide a complementary representation of the error cause. And yes, a response message may (and probably should) contain a body even when it represents an error outcome - response bodies are not exclusive to successful responses.

RFC 7807 - Problem Details for HTTP APIs is an IETF specification defining JSON and XML formats to represent such error information. The following excerpt, taken from the specification document, exemplifies how further information can be conveyed in a response with a 403 (Forbidden) status code, stating the domain-specific reason for the request prohibition.

HTTP/1.1 403 Forbidden
Content-Type: application/problem+json
Content-Language: en

{
    "type": "https://example.com/probs/out-of-credit",
    "title": "You do not have enough credit.",
    "detail": "Your current balance is 30, but that costs 50.",
    "instance": "/account/12345/msgs/abc",
    "balance": 30,
    "accounts": ["/account/12345","/account/67890"]
}

The application/problem+json media type informs the receiver that the payload is using this format and should be processed according to its rules. The payload is composed of a JSON object containing both fields defined by the specification and fields that are domain specific. The type, title, detail and instance fields are of the first kind, having their semantics defined by the specification:

type - URI identifier defining the domain-specific error type. If it is a URL, then dereferencing it can provide further information on the error type.
title - Human-readable description of the error type.
detail - Human-readable description of this specific error occurrence.
instance - URI identifier for this specific error occurrence.

On the other hand, the balance and accounts fields are domain-specific extensions, and their semantics are scoped to the type identifier. This allows the same extension names to be used by different HTTP APIs with different semantics, as long as the type identifiers used are different. I recommend that an HTTP API have a central place documenting all type values, as well as the domain-specific fields associated with each one of these values.
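
To make the structure concrete, here is a minimal sketch of how a server might assemble such a response using only the Python standard library. The `problem_response` helper and its signature are hypothetical; it covers only the four specification-defined fields discussed above plus arbitrary extensions.

```python
import json
from typing import Any, Optional

# Field names defined by the specification, as described above.
RESERVED_FIELDS = {"type", "title", "detail", "instance"}


def problem_response(status: int, type_uri: str, title: str,
                     detail: Optional[str] = None,
                     instance: Optional[str] = None,
                     **extensions: Any) -> tuple[int, dict[str, str], str]:
    """Build (status, headers, body) for a problem details response."""
    clashes = RESERVED_FIELDS & extensions.keys()
    if clashes:
        raise ValueError(f"extensions must not redefine {sorted(clashes)}")
    problem: dict[str, Any] = {"type": type_uri, "title": title}
    if detail is not None:
        problem["detail"] = detail
    if instance is not None:
        problem["instance"] = instance
    # Domain-specific extension fields, scoped to the type identifier.
    problem.update(extensions)
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(problem)
```

The example from the specification could then be produced with a call such as `problem_response(403, "https://example.com/probs/out-of-credit", "You do not have enough credit.", balance=30)`. Note that the status code is still carried by the response itself, not only by the payload.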

Using this format presents several advantages when compared with constantly “reinventing the wheel” with ad-hoc formats:

Taking advantage of rich and well defined semantics for the specification defined fields - type, title, detail and instance.
Making the non-success responses easier to understand and handle, namely for developers that are familiar with this common format.
Being able to use common libraries to produce and consume this format.

When using a response payload to represent the error details, one might wonder if there is still a need to use proper 4xx or 5xx class codes to represent errors. Namely, can’t we just use 200 for every response, independently of the outcome, and have the client use the payload to distinguish them? My answer is an emphatic no: using a 2xx status to represent non-success breaks the HTTP contract, which can have consequences on the behavior of intermediary components. For instance, a cache will happily cache a 200 response even if its payload is in the application/problem+json format. Notice that the operation of most intermediaries is independent of the message’s payload. And yes, HTTP intermediaries are still relevant in an HTTPS world: intermediaries can live before (e.g. client caching) and after (e.g. output caching) the TLS connection endpoints.
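
On the consuming side, the same principle means a client decides the outcome from the status code and only then looks at the payload for details. The following is a sketch under that assumption; the `interpret` function and its outcome labels are hypothetical.

```python
import json
from typing import Any, Optional


def interpret(status: int, content_type: str,
              body: str) -> tuple[str, Optional[dict[str, Any]]]:
    """Classify a response by status code first; use the body only for details."""
    if 200 <= status < 300:
        return "success", None
    if 400 <= status < 500:
        outcome = "client_error"
    elif 500 <= status < 600:
        outcome = "server_error"
    else:
        outcome = "other"
    # Ignore media type parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    problem = json.loads(body) if media_type == "application/problem+json" else None
    return outcome, problem
```

The payload never overrides the status code here: a problem document is parsed only after the status code has already marked the response as a non-success.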

The HTTP protocol and its associated ecosystem provide rich ways to express non-success outcomes, via response status codes and error representations. Taking advantage of those is harnessing the power of the Web for HTTP APIs.

Additional Resources

Indicating Problems in HTTP APIs - a post by Mark Nottingham, co-author of RFC 7807 - Problem Details for HTTP APIs, introducing this specification.
Succeeding in Failing - a video by Darrel Miller, presented at NDC Oslo 2015.