The Circuit Breaker Pattern

The circuit breaker states

Closed: The closed state is the default "everything is working as expected" state. Requests pass freely through. When failures cross the configured threshold, the breaker trips and the circuit moves from closed to open.

Open: Once the breaker trips, it enters the open state. For a fixed amount of time, the breaker rejects all requests without attempting to send them, so any request to the service fails immediately.

Half-Open: The breaker allows a set number of requests through in order to test the status of the resource. The outcome of these trial requests determines whether the circuit returns to closed or open.

Transitions between these states depend on pre-set criteria, known as thresholds, that might include metrics like error rates in a given time frame, latency, traffic, or even resource utilization. Many circuit breaker libraries, like opossum for Node.js, allow you to define a mix of threshold metrics as part of the configuration.

Once configured, the breaker handles the state changes and, as a result, allows or denies requests. The circuit breaker pattern also lends itself nicely to state machine implementations, as the movement between states is well defined.
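Because the movement between states is well defined, it can be written down as a small lookup table. The following is a minimal sketch and is not taken from any library. The event names (thresholdExceeded, resetTimeoutElapsed, trialSucceeded, trialFailed) are hypothetical labels chosen for illustration.

```javascript
// Hypothetical event names; real libraries use their own vocabulary.
const transitions = new Map([
  ["closed", new Map([["thresholdExceeded", "open"]])],
  ["open", new Map([["resetTimeoutElapsed", "halfOpen"]])],
  ["halfOpen", new Map([["trialSucceeded", "closed"], ["trialFailed", "open"]])]
])

// Return the next state, or stay in the current state when the
// event does not apply to it.
const nextState = (state, event) => {
  const row = transitions.get(state)
  return (row && row.get(event)) || state
}

// Only the closed and half-open states let requests through.
const allowsRequests = state => state !== "open"
```

Keeping the table separate from the code that counts failures or tracks time makes each transition easy to test on its own.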

Determining the right threshold criteria

Threshold criteria can come in many forms. For APIs where speed is important, latency may be your core threshold. For an API that handles user accounts, uptime will be more important. Much of circuit breaker design comes down to choosing the right threshold criteria.

You can make these determinations by analyzing your API calls. If you don't already have a monitoring solution, this may be a difficult task to nail down on the first few tries. Alternately, you can use something like the Bearer Agent to automatically monitor and log calls to APIs. Then, you can analyze the data and set up notifications. This provides a great foundation for informing your circuit breaker decisions.
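As a rough illustration of that kind of analysis, the sketch below summarizes a log of API calls into an error rate and a 95th percentile latency. The log shape (an array of objects with ok and durationMs fields) and the function name summarizeCalls are assumptions made for this example, not the output format of any particular monitoring tool.

```javascript
// Hypothetical log shape: one entry per outbound API call,
// e.g. { ok: true, durationMs: 120 }.
const summarizeCalls = calls => {
  if (calls.length === 0) {
    return { errorRate: 0, p95LatencyMs: 0 }
  }
  const failures = calls.filter(call => !call.ok).length
  const durations = calls.map(call => call.durationMs).sort((a, b) => a - b)
  // Nearest-rank 95th percentile, using integer math to avoid rounding drift
  const rank = Math.ceil((95 * durations.length) / 100)
  return {
    errorRate: (failures / calls.length) * 100,
    p95LatencyMs: durations[rank - 1]
  }
}
```

One possible starting point is to set the request timeout somewhat above the observed 95th percentile latency and the error threshold well above the normal error rate, then adjust as more data comes in.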

Let's look at a basic example. Assume we have a function in our application that returns posts for a specific user, getPosts. It is a wrapper around a third-party API client, apiClient. To implement the circuit breaker, let's use a popular library, opossum.

With opossum, this looks roughly like the following:

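// Require the library and client
// (current opossum releases export the CircuitBreaker class)
const CircuitBreaker = require("opossum")
const apiClient = require("our-example-api-client")

// Wrap our client get method in the circuit breaker.
// Create the breaker once, outside the request handler, so that its
// state is shared across requests instead of starting fresh on each call.
const breaker = new CircuitBreaker(path => apiClient.get(path), {
  timeout: 3000, // Treat calls that take longer than 3 seconds as failures
  errorThresholdPercentage: 50, // Trip the breaker at a 50% error rate
  resetTimeout: 5000 // Move from open to half-open after 5 seconds
})

// Our example request
const getPosts = (req, res, next) => {
  // Call the API from within the circuit breaker
  return breaker
    .fire("/api/posts")
    .then(data => res.json(data)) // Keep res bound when sending the response
    .catch(next)
}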

Now when there are problems with the /api/posts endpoint from the third-party API, our circuit breaker tracks them. It records the outcome of each call over a rolling window, and when the error rate within that window crosses the threshold, it trips the breaker.

The configuration object allows us to set a timeout for requests (3 seconds), an error threshold percentage (50%), and a reset timeout (5 seconds), after which the breaker transitions from the open state to half-open. It may look something like the following diagram:

[Figure: Circuit Breaker Diagram]
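To make the error threshold percentage more concrete, here is a simplified sketch of how an error rate could be evaluated over a rolling window. It is not how opossum implements its window. The names createRollingWindow and shouldTrip are hypothetical, and the choice to trip when the rate is greater than or equal to the threshold is an assumption of this sketch.

```javascript
// A simplified rolling window: keep a timestamped outcome for each call
// and discard entries older than windowMs.
const createRollingWindow = windowMs => {
  let outcomes = []
  const prune = now => {
    outcomes = outcomes.filter(entry => now - entry.at < windowMs)
  }
  return {
    // Record whether a call succeeded (ok = true) or failed (ok = false)
    record(ok, now = Date.now()) {
      prune(now)
      outcomes.push({ ok, at: now })
    },
    // Percentage of failed calls among those still inside the window
    errorPercentage(now = Date.now()) {
      prune(now)
      if (outcomes.length === 0) return 0
      const failures = outcomes.filter(entry => !entry.ok).length
      return (failures / outcomes.length) * 100
    }
  }
}

// Assumption: trip when the error rate reaches the threshold.
const shouldTrip = (rollingWindow, thresholdPercentage, now = Date.now()) =>
  rollingWindow.errorPercentage(now) >= thresholdPercentage
```

The timestamps are passed in as optional arguments so that the window can be exercised in tests without waiting for real time to pass.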

Some common thresholds that can be used to trip the breaker in this pattern are server timeout, increase in errors or failures, failing status codes, and unexpected response types.
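Before any of these signals can be counted, each call has to be classified as a success or a failure. The sketch below shows one way to do that. The outcome shape (timedOut, status, contentType) and the decision to count 5xx and 429 responses as failures are assumptions for illustration; which status codes should count depends on the API you are calling.

```javascript
// Hypothetical outcome shape: { timedOut, status, contentType }
const classifyOutcome = (outcome, expectedContentType = "application/json") => {
  if (outcome.timedOut) return "timeout"
  // Assumption: server errors and rate limiting count against the service;
  // other client errors (such as 404) do not.
  if (outcome.status >= 500 || outcome.status === 429) return "failing-status"
  if (!String(outcome.contentType || "").startsWith(expectedContentType)) {
    return "unexpected-response-type"
  }
  return "ok"
}

const countsAsFailure = outcome => classifyOutcome(outcome) !== "ok"
```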

Reacting to failures

Circuit breakers are useful for delaying retries and preventing unnecessary requests, but the true power comes in how your application can react to the states of the breaker.

Existing libraries can help with this. For example, opossum allows you to run a fallback function when a call fails or the breaker is open. Alternatively, the breaker emits events that can notify your app when the state changes.

This gives your application the power to do things like:

  • When certain services fail, replace them by using alternate APIs.
  • Return cached data from a previous response, and notify the user.
  • Provide feedback to the user and retry the action in the background.
  • Log problems to your preferred logging solution, or use a logging service.
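As an illustration of the cached data option, the sketch below wraps an asynchronous call so that a failure falls back to the last successful response for the same arguments. It does not use opossum's own fallback mechanism. The name withCachedFallback and the stale flag are hypothetical, and the cache is a plain in-memory Map with no expiry.

```javascript
// Wrap an async call so that a failure (including a rejection from an
// open breaker) falls back to the last successful response.
const withCachedFallback = call => {
  const cache = new Map()
  return async (...args) => {
    const key = JSON.stringify(args)
    try {
      const data = await call(...args)
      cache.set(key, data)
      return { data, stale: false }
    } catch (error) {
      // Nothing cached for these arguments: surface the original error
      if (!cache.has(key)) throw error
      // The stale flag lets the caller notify the user
      return { data: cache.get(key), stale: true }
    }
  }
}
```

With the breaker from the earlier example, this could be wired up as withCachedFallback(path => breaker.fire(path)), and the stale flag used to tell the user that they are looking at older data.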

The resulting state lifecycle

To bring it all back together, an example of the full lifecycle of a circuit breaker is as follows:

  • The state starts closed. The service works as expected.
  • Multiple failures occur when trying to reach the service. Some could be a timeout, while others may be server errors.
  • The circuit breaker trips and moves into the open state, where it waits for a set time before performing any action.
  • Any incoming requests to the service at this time immediately fail. The breaker continues to wait in the open state.
  • After this timeout, the breaker moves to the half-open state.
  • A portion of requests to the service are now allowed through. If a failure occurs (or a set number of failures occur), the breaker moves back to the open state and the timeout process begins again.
  • If requests in the half-open state succeed, the breaker knows that the service is working as expected and moves back to the closed state.
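The lifecycle above can be sketched as a small class. This is a minimal illustration under stated assumptions, not a production implementation and not how opossum works internally: it trips after a fixed number of consecutive failures rather than on an error percentage, and it does not limit how many trial requests run at once while half-open. The class name SimpleBreaker and its options are hypothetical.

```javascript
// A minimal sketch of the lifecycle. The now option exists so that
// time can be controlled in tests.
class SimpleBreaker {
  constructor(action, { failureThreshold = 3, resetTimeout = 5000, now = Date.now } = {}) {
    this.action = action
    this.failureThreshold = failureThreshold
    this.resetTimeout = resetTimeout
    this.now = now
    this.state = "closed"
    this.failures = 0
    this.openedAt = 0
  }

  async fire(...args) {
    if (this.state === "open") {
      // Still waiting: fail immediately without calling the service
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error("Breaker is open")
      }
      // Reset timeout elapsed: allow a trial request through
      this.state = "halfOpen"
    }
    try {
      const result = await this.action(...args)
      // Success closes the circuit and clears the failure count
      this.state = "closed"
      this.failures = 0
      return result
    } catch (error) {
      this.failures += 1
      // A failed trial, or too many failures while closed, opens the circuit
      if (this.state === "halfOpen" || this.failures >= this.failureThreshold) {
        this.state = "open"
        this.openedAt = this.now()
      }
      throw error
    }
  }
}
```

Reading the state property after each call to fire shows the same sequence as the list: closed, open, half-open during the trial request, and then either closed or open again.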

Three design choices determine how your application will use the circuit breaker: the timeout on the open state, the success criteria for requests made while the breaker is half-open, and the failure threshold for moving from closed to open.

Should you implement circuit breakers?

The uncertainty that comes with adding an external API can quickly add to your application's technical debt. By relying on proven patterns, like the circuit breaker, your team can build resiliency and graceful degradation into your application. While this article focused on external APIs and web services, this pattern also provides a great way to make sure that your own internal microservices don't cause your application to fail.

The nice part is that the design pattern itself, moving through the states when a service fails, is general enough that it can exist across different implementations.

In this article we briefly looked at opossum for Node and the browser, but most languages have a community library available.
