The circuit breaker states
Closed: The closed state is the default "everything is working as expected" state. Requests pass through freely. When failures cross a configured threshold, the breaker trips and moves from closed to open.
Open: Once the breaker trips, it enters the open state. While open, it rejects all requests for a fixed amount of time without attempting to send them, so any request to the service fails immediately.
Half-Open: The breaker allows a set number of requests through in order to test the status of the resource. The outcome of those requests determines whether the breaker returns to closed or goes back to open.
Transitions between these states depend on pre-set criteria, known as thresholds, such as error rates in a given time frame, latency, traffic, or even resource utilization. Many circuit breaker libraries, like opossum for Node.js, allow you to define a mix of threshold metrics as part of the configuration.
Once configured, the breaker handles the state changes and allows or denies requests accordingly. The circuit breaker pattern also lends itself nicely to state machine implementations, as the movement between states is well defined.
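To make the state machine idea concrete, here is a minimal sketch of the three states as a transition table. The event names (trip, resetTimeout, probeSucceeded, probeFailed) are made up for this illustration and do not come from any particular library.

```javascript
// Allowed transitions, keyed by current state and then by event name.
// The event names are illustrative assumptions, not a library API.
const TRANSITIONS = {
  closed: { trip: "open" },
  open: { resetTimeout: "halfOpen" },
  halfOpen: { probeSucceeded: "closed", probeFailed: "open" },
};

// Return the next state, or the current state if the event does not apply.
function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  return next || state;
}

// Walk one cycle: trip, wait, fail a test request, wait again, then recover.
const events = ["trip", "resetTimeout", "probeFailed", "resetTimeout", "probeSucceeded"];
const path = events.reduce((states, event) => {
  states.push(nextState(states[states.length - 1], event));
  return states;
}, ["closed"]);

console.log(path.join(" -> "));
// closed -> open -> halfOpen -> open -> halfOpen -> closed
```

Because every state only reacts to the events listed for it, an unexpected event (such as a second trip while already open) simply leaves the state unchanged.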
Determining the right threshold criteria
Threshold criteria can come in many forms. For APIs where speed is important, latency may be your core threshold. For an API that handles user accounts, uptime will be more important. Circuit breaker design is all down to choosing the right threshold criteria.
You can make these determinations by analyzing your API calls. If you don't already have a monitoring solution, this may be difficult to nail down on the first few tries. Alternatively, you can use something like the Bearer Agent to automatically monitor and log calls to APIs. Then, you can analyze the data and set up notifications. This provides a great foundation for informing your circuit breaker decisions.
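As a rough idea of what that analysis might look like, the sketch below computes an error rate and a latency percentile from a handful of logged calls. The record shape (durationMs, ok) and the sample values are assumptions for illustration; real monitoring data will look different.

```javascript
// Hypothetical call records, for example exported from a monitoring tool.
const calls = [
  { durationMs: 120, ok: true },
  { durationMs: 180, ok: true },
  { durationMs: 95, ok: true },
  { durationMs: 3100, ok: false },
  { durationMs: 210, ok: true },
  { durationMs: 150, ok: true },
  { durationMs: 2900, ok: false },
  { durationMs: 130, ok: true },
];

// Percentage of calls that failed.
function errorRate(records) {
  const failed = records.filter((r) => !r.ok).length;
  return (failed / records.length) * 100;
}

// Nearest-rank percentile of call duration (p is between 0 and 100).
function latencyPercentile(records, p) {
  const sorted = records.map((r) => r.durationMs).sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

console.log(errorRate(calls)); // 25
console.log(latencyPercentile(calls, 50)); // 150
```

Numbers like these give you a baseline: if a quarter of calls already fail on a normal day, a 50% error threshold leaves room before the breaker trips, while a median latency of 150 ms suggests a 3 second timeout is generous.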
Let's look at a basic example. Assume we have a function in our application, getPosts, that returns posts for a specific user. It is a wrapper around a third party's API client, apiClient. To implement the circuit breaker, let's use a popular library.
In opossum, this looks roughly like the following:
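// Require the library and client
const CircuitBreaker = require("opossum")
const apiClient = require("our-example-api-client")

// Wrap our client get method in the circuit breaker once, so that
// every request shares the same breaker state
const breaker = new CircuitBreaker((path) => apiClient.get(path), {
  timeout: 3000, // Treat calls that take longer than 3 seconds as failures
  errorThresholdPercentage: 50, // Open the breaker when 50% of calls fail
  resetTimeout: 5000 // Move to half-open after 5 seconds
})

// Our example request
const getPosts = (req, res, next) => {
  // Call the API from within the circuit breaker
  return breaker
    .fire("/api/posts")
    .then((posts) => res.json(posts))
    .catch(next)
}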
Now when there are problems with the /api/posts endpoint from the third-party API, our circuit breaker tracks them over a rolling window. It records whether each call succeeds or fails, and when failures cross the threshold it trips the breaker.
The configuration object allows us to set a timeout for requests (3 seconds), an error threshold percentage (50%), and a reset timeout (5 seconds) after which the breaker moves from the open state to half-open. It may look something like the following diagram:
Some common thresholds that can be used to trip the breaker in this pattern are server timeout, increase in errors or failures, failing status codes, and unexpected response types.
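A breaker can only count failures it can see, so responses that "succeed" at the network level but carry a failing status code or the wrong content type need to be turned into errors. The sketch below shows one way to do that. The response shape ({ status, headers, body }) and the helper names are assumptions for this example, not part of any specific client.

```javascript
// Treat a response as a failure when the status code or content type
// is not what we expect. The response shape is an assumption.
function assertHealthy(response) {
  // Only 5xx counts here: a 4xx usually means our request was wrong,
  // not that the service is unhealthy.
  if (response.status >= 500) {
    throw new Error(`Upstream error: HTTP ${response.status}`);
  }
  const type = response.headers["content-type"] || "";
  if (!type.includes("application/json")) {
    throw new Error(`Unexpected response type: ${type || "none"}`);
  }
  return response.body;
}

// Wrap any promise-returning request function so that unhealthy
// responses reject and can be counted by a circuit breaker.
const checked = (request) => (...args) => request(...args).then(assertHealthy);
```

A function wrapped with checked could then be handed to the breaker in place of the raw client call, so that these cases count toward the error threshold alongside timeouts.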
Reacting to failures
Circuit breakers are useful for delaying retries and preventing unnecessary requests, but the true power comes in how your application can react to the states of the breaker.
Existing libraries can help with this. For example, opossum allows you to run a fallback function when a call fails or the breaker is open. Alternatively, event emitters can notify your app that the state changed.
This gives your application the power to do things like:
- When certain services fail, replace them by using alternate APIs.
- Return cached data from a previous response, and notify the user.
- Provide feedback to the user and retry the action in the background.
- Log problems to your preferred logging solution, or use a logging service.
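As one illustration of the cached-data option, here is a small library-free sketch that remembers the last good response and serves it, flagged as stale, when the call fails. The helper name withCachedFallback and the in-memory Map are assumptions for this example. With opossum, similar logic would typically live in the function you pass to the breaker's fallback method.

```javascript
// Remember the last successful response per key so it can be served
// when the call fails. An in-memory Map is enough for a sketch.
const lastGood = new Map();

// `action` is any promise-returning function; `key` names the cache entry.
async function withCachedFallback(key, action) {
  try {
    const data = await action();
    lastGood.set(key, data);
    return { data, stale: false };
  } catch (err) {
    if (lastGood.has(key)) {
      // Flag the result so the UI can tell the user it may be out of date.
      return { data: lastGood.get(key), stale: true };
    }
    // Nothing cached yet: let the caller decide how to report the failure.
    throw err;
  }
}
```

The stale flag is what makes the "notify the user" part possible: the caller can render the cached posts and show a notice that the data may be outdated.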
The resulting state lifecycle
To bring it all back together, an example of the full lifecycle of a circuit breaker is as follows:
- The state starts closed. The service works as expected.
- Multiple failures occur when trying to reach the service. Some could be a timeout, while others may be server errors.
- The circuit breaker trips and moves into the open state. The breaker stays open for a set timeout before doing anything else.
- Any incoming requests to the service at this time immediately fail. The breaker continues to wait in the open state.
- After this timeout, the breaker moves to the half-open state.
- A portion of requests to the service are now allowed through. If a failure occurs (or a set number of failures occur), the breaker moves back to the open state and the timeout process begins again.
- If requests in the half-open state succeed, the breaker knows that the service is working as expected and moves back to the closed state.
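The lifecycle above can be traced with a deliberately small breaker. This is a sketch under simplifying assumptions, not how opossum works internally: it counts consecutive failures instead of an error percentage over a rolling window, runs synchronous actions only, and takes an injectable clock so the timeout can be simulated. The class and option names are made up for this example.

```javascript
class SimpleBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 5000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.state = "closed";
    this.failures = 0;
    this.openedAt = 0;
  }

  // Run `action` through the breaker. Throws immediately while open.
  call(action) {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error("Breaker is open");
      }
      this.state = "halfOpen"; // Timeout elapsed: allow a trial request.
    }
    try {
      const result = action();
      this.failures = 0;
      this.state = "closed"; // Any success closes the breaker.
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial request, or too many failures in a row, opens it.
      if (this.state === "halfOpen" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Simulate the lifecycle with a fake clock.
let clock = 0;
const breaker = new SimpleBreaker({ failureThreshold: 2, resetTimeoutMs: 5000, now: () => clock });
const fail = () => { throw new Error("service down"); };
const succeed = () => "ok";
const attempt = (action) => {
  try { breaker.call(action); } catch (err) { /* ignored for the trace */ }
  return breaker.state;
};

const trace = [];
trace.push(attempt(succeed)); // closed: the service works
trace.push(attempt(fail));    // closed: first failure, below threshold
trace.push(attempt(fail));    // open: threshold reached
trace.push(attempt(succeed)); // open: rejected without calling the service
clock += 5000;
trace.push(attempt(fail));    // half-open trial fails, back to open
clock += 5000;
trace.push(attempt(succeed)); // half-open trial succeeds, closed again
console.log(trace.join(" -> "));
// closed -> closed -> open -> open -> open -> closed
```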
Three choices shape how your application uses the circuit breaker: the timeout on the open state, the success criteria for requests made while the breaker is half-open, and the failure threshold for moving from closed to open.
Should you implement circuit breakers?
The uncertainty that comes with adding an external API can quickly add to your application's technical debt. By relying on proven patterns, like the circuit breaker, your team can build resiliency and graceful degradation into your application. While this article focused on external APIs and web services, this pattern also provides a great way to make sure that your own internal microservices don't cause your application to fail.
The nice part is that the design pattern itself, moving through the states when a service fails, is general enough that it can exist across different implementations.
In this article we briefly looked at opossum for Node and the browser, but most languages have a community library available:
- circuit_breaker or simple_circuit for Ruby
- go-circuitbreaker for Go
- circuitbreaker for Python
- failsafe for Rust