
Use cases for Node workers (or ways to improve performance of Node servers)


In the past, Node.js was often not an option when building applications that require CPU-intensive computation. This is because Node.js runs JavaScript on a single thread: its non-blocking, event-driven I/O architecture handles many I/O operations efficiently, but a long calculation occupies that one thread. With the advent of worker threads in Node.js, it is possible to use it for CPU-intensive applications. In this article, we will take a look at certain use cases of worker threads in a Node.js application.
Before continuing with the use cases of worker threads in Node.js, let’s do a quick comparison of I/O-bound vs CPU-bound in Node.

I/O-bound vs CPU-bound in Node.js

I/O bound

A program is said to be bound by a resource if an increase in that resource leads to improved performance of the program. Increasing the speed of the I/O subsystem (such as memory, hard disk or network connection) improves the performance of an I/O-bound program. This is typical of Node.js applications, as the event loop often spends time waiting for network, filesystem and perhaps database I/O to complete before continuing with code execution or returning a response. Increasing hard disk speed and/or network connection speed would usually improve the overall performance of the application or program.

CPU bound

A program is CPU bound if its processing time decreases when CPU speed increases. For instance, a program that calculates the hash of a file will run faster on a 2.2GHz processor and slower on a 1.2GHz one.
For CPU-bound applications, the majority of the time is spent using the CPU to do calculations. In Node.js, CPU-bound work blocks the event loop and causes other requests to be held up.
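The effect of blocking can be observed with a small experiment. The sketch below is only an illustration: busyWork and measureTimerDelay are hypothetical helper names, and the busy loop stands in for any heavy calculation. A timer requested for 0 ms cannot fire until the synchronous work has finished.

```javascript
// Spin synchronously for roughly durationMs; stands in for a heavy calculation.
function busyWork(durationMs) {
  const end = Date.now() + durationMs;
  let iterations = 0;
  while (Date.now() < end) {
    iterations++;
  }
  return iterations;
}

// Resolves with the number of milliseconds a "0 ms" timer actually had to wait.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    // Ask for the callback as soon as possible.
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    // This occupies the only JavaScript thread, so the timer callback
    // above cannot run until busyWork returns.
    busyWork(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`A 0 ms timer fired after about ${delay} ms`);
});
```

In a server, the delayed timer would be another client's request waiting for its turn.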

Node.js golden rule

Don’t block the event loop. Keep it running and avoid anything that could block the thread, like synchronous network calls or infinite loops.
Node runs in a single-threaded event loop, using non-blocking I/O calls, allowing it to support tens of thousands of concurrent connections, for example serving multiple incoming HTTP requests. This works well and is fast as long as the work associated with each client at any given time is small. But if you perform CPU-intensive calculations, your concurrent Node.js server will come to a screeching halt. Other incoming requests will wait, as only one request is being served at a time.
Certain strategies have been used to cope with CPU-intensive tasks in Node.js: multiple processes (for example, via the cluster API) so that all CPU cores are used, and child processes that are spawned to handle blocking tasks.
These strategies are advantageous because the event loop is not blocked. They also keep processes separate, so if something goes wrong in one process, it does not affect the others. However, since child processes run in isolation, they cannot share memory with each other, and data must be exchanged as messages (JSON by default), which requires serialization and deserialization.
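To make the serialization cost concrete, here is a minimal sketch of what a JSON round trip does to a message. The message object and its fields are made up for illustration. The data is copied as text, and values that JSON cannot represent change shape on the way.

```javascript
// A hypothetical message one process might send to another.
const message = {
  createdAt: new Date(0),
  pixels: new Uint8Array([1, 2, 3]),
};

// Sender side: turn the object into text. Receiver side: parse it back.
const wire = JSON.stringify(message);
const received = JSON.parse(wire);

// The Date arrives as an ISO string, not as a Date object.
console.log(typeof received.createdAt);

// The typed array arrives as a plain object keyed by index.
console.log(received.pixels instanceof Uint8Array);

// Every character of the serialized text had to be produced and copied.
console.log(wire.length);
```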
A more efficient solution for CPU-intensive computation in Node.js is to run multiple Node.js instances inside the same process, where memory can be shared (for example through a SharedArrayBuffer) and there is no need to pass data via JSON. This is exactly what worker threads do in Node.js.
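As a rough illustration of shared memory, the sketch below lets a worker update a counter that the main thread then reads without any copying. The worker source is kept inline with the eval option only so the example fits in one file, and incrementInWorker is a hypothetical name.

```javascript
const { Worker } = require("worker_threads");

// Inline worker source, used here only to keep the sketch in a single file.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  const counters = new Int32Array(workerData.shared);
  // Atomics makes the update safe to observe from the other thread.
  Atomics.add(counters, 0, 41);
  parentPort.postMessage("done");
`;

function incrementInWorker() {
  return new Promise((resolve, reject) => {
    // 4 bytes hold one 32-bit integer that both threads can see.
    const shared = new SharedArrayBuffer(4);
    const counters = new Int32Array(shared);
    counters[0] = 1;

    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared },
    });
    // Once the worker reports back, read the value it wrote.
    worker.once("message", () => resolve(Atomics.load(counters, 0)));
    worker.once("error", reject);
  });
}

incrementInWorker().then((value) => {
  console.log("value seen by the main thread:", value); // 42
});
```

The main thread wrote 1 and the worker added 41 to the same memory, so no result had to be serialized and sent back.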

Real-world CPU intensive tasks that can be done with worker threads

We will look at a few use cases of worker threads in a Node.js application. We will not cover the worker threads API itself, only the use cases. If you are not familiar with worker threads, you can visit this post to get started with the worker threads API.

Image resizing

Let’s say you are building an application that allows users to upload a profile image, and you then generate multiple sizes (e.g. 100 x 100 and 64 x 64) of the image for various use cases within the application. Resizing an image is CPU intensive, and producing two different sizes increases the time the CPU spends on it. The task of resizing the image can be outsourced to a separate thread while the main thread handles other lightweight tasks.
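// worker.js
const { parentPort, workerData } = require("worker_threads");
const sharp = require("sharp");

async function resize() {
    const { image, size } = workerData;
    // Include the size in the file name so that two workers started in the
    // same millisecond do not write to the same file.
    const outputPath = "public/images/" + Date.now() + "-" + size + ".png";

    await sharp(image)
        .resize(size, size, { fit: "cover" })
        .toFile(outputPath);

    // Tell the main thread where the resized image was written.
    parentPort.postMessage(outputPath);
}

resize();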
// mainThread.js
const { Worker } = require("worker_threads");

module.exports = function imageResizer(image, size) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(__dirname + "/worker.js", {
            workerData: { image, size }
        });
        worker.on("message", resolve);
        worker.on("error", reject);
        worker.on("exit", code => {
            if (code !== 0)
                reject(new Error(`Worker stopped with exit code ${code}`));
        });
    });
};
The main thread exports a function that creates a worker thread for each image to resize. It passes the size and the image to the worker through the workerData option. The worker resizes the image with sharp, writes the result to disk and sends the output path back to the main thread.

Video compression

Video compression is another CPU-intensive task that can be outsourced to a worker thread. Most video streaming applications keep multiple variations of a single video and show one of them to each user depending on their network connection. Worker threads can do the job of compressing the video to various sizes.
fluent-ffmpeg is a commonly used module for video processing in Node.js applications. It depends on ffmpeg, which is a complete, cross-platform solution to record, convert and stream audio and video.
Because of the overhead of creating a worker each time you need a new thread, it is recommended that you create a pool of workers and reuse them, as opposed to creating workers on the fly. To create a worker pool we use the NPM module node-worker-threads-pool, which creates a pool of worker threads using Node’s worker_threads module.
// worker.js
const { parentPort } = require("worker_threads");
const ffmpeg = require("fluent-ffmpeg");

function resizeVideo({ inputPath, size }) {
    const outputPath = "public/videos/" + Date.now() + "-" + size + ".mp4";
    ffmpeg(inputPath)
        .audioCodec("libmp3lame")
        .videoCodec("libx264")
        .size(size)
        .on("error", function(err) {
            console.log("An error occurred: " + err.message);
            // Rethrow so the failure reaches the main thread as the worker's
            // "error" event instead of leaving the caller waiting forever.
            throw err;
        })
        .on("end", function() {
            // Report the location of the compressed video.
            parentPort.postMessage(outputPath);
        })
        .save(outputPath);
}

// Each message from the pool describes one video to compress.
parentPort.on("message", resizeVideo);
// mainThread.js
const { StaticPool } = require("node-worker-threads-pool");

const filePath = __dirname + "/worker.js";
const pool = new StaticPool({
    size: 4,
    task: filePath
});

const videoSizes = ["1920x1080", "1280x720", "854x480", "640x360"];

// Resolves with the output paths once every size has been compressed.
module.exports = function compressVideo(inputPath) {
    // forEach does not wait for async callbacks, so map each size to a
    // promise and wait for all of them.
    return Promise.all(
        videoSizes.map(size => pool.exec({ inputPath, size }))
    );
};
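To show the idea behind a worker pool without any third-party module, here is a minimal sketch built on worker_threads alone. It is not how node-worker-threads-pool is implemented. TinyPool is a hypothetical class, the worker source is inlined with the eval option for brevity, and a Fibonacci function stands in for real work such as video compression.

```javascript
const { Worker } = require("worker_threads");

// Inline worker that answers every message with a CPU-heavy result.
const workerSource = `
  const { parentPort } = require("worker_threads");
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.on("message", (n) => parentPort.postMessage(fib(n)));
`;

class TinyPool {
  constructor(size) {
    this.idle = [];  // workers waiting for a task
    this.queue = []; // tasks waiting for a worker
    for (let i = 0; i < size; i++) {
      this.idle.push(new Worker(workerSource, { eval: true }));
    }
  }

  // Queue a task and resolve with the worker's answer.
  exec(input) {
    return new Promise((resolve, reject) => {
      this.queue.push({ input, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers until one of the lists is empty.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { input, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off("error", onError);
        this.idle.push(worker); // reuse the thread for the next task
        resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        worker.off("message", onMessage);
        reject(err); // a crashed worker is not replaced in this sketch
      };
      worker.once("message", onMessage);
      worker.once("error", onError);
      worker.postMessage(input);
    }
  }

  // Stop all idle workers so the process can exit.
  destroy() {
    return Promise.all(this.idle.map((worker) => worker.terminate()));
  }
}

const pool = new TinyPool(2);
Promise.all([20, 21, 22, 23].map((n) => pool.exec(n))).then((results) => {
  console.log(results); // [ 6765, 10946, 17711, 28657 ]
  return pool.destroy();
});
```

Four tasks are served by only two threads: the third and fourth tasks wait in the queue until a worker becomes idle, which is the saving a pool offers over starting one worker per task.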

File integrity

Suppose you have to store your files on cloud storage. You want to be sure that the files you store are not tampered with by any third party. You can do this by computing a hash of each file using a cryptographic hash algorithm. You save these hashes and their storage locations in your database. When you download the files, you compute the hashes again to see if they match. Computing the hash is CPU intensive and can be done in a worker thread:
// hashing.js
const {
    Worker, isMainThread, parentPort, workerData
} = require("worker_threads");
const crypto = require("crypto");
const fs = require("fs");

if (isMainThread) {
    module.exports = function hashFile(filePath) {
        return new Promise((resolve, reject) => {
            // Hand the file path to the worker through workerData.
            const worker = new Worker(__filename, { workerData: filePath });
            worker.on("message", resolve);
            worker.on("error", reject);
            worker.on("exit", (code) => {
                if (code !== 0)
                    reject(new Error(`Worker stopped with exit code ${code}`));
            });
        });
    };
} else {
    // SHA-1 is no longer collision resistant, so SHA-256 is used here.
    const algorithm = "sha256";
    const shasum = crypto.createHash(algorithm);
    // workerData holds the file path passed in by the main thread.
    const stream = fs.createReadStream(workerData);
    stream.on("data", function(data) {
        shasum.update(data);
    });
    stream.on("end", function() {
        const hash = shasum.digest("hex");
        parentPort.postMessage(hash);
    });
}
Notice that we have both the worker thread code and the main thread code in the same file. The isMainThread property of the worker_threads module helps us determine the current thread and run the code appropriate for each thread. The main thread creates a new worker and listens to events from the worker. The worker thread calculates the hash of a stream of data using createHash, a method of the Node.js crypto module.
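The verification step described earlier (compute the hash again after download and compare it with the stored one) could look roughly like the sketch below. For brevity it runs on the calling thread rather than in a worker. verifyFile and demo are hypothetical names, and the temporary file stands in for a downloaded file.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hash a file by streaming it, so large files are never fully held in memory.
function hashFile(filePath, algorithm = "sha256") {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm);
    fs.createReadStream(filePath)
      .on("data", (chunk) => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Compare the digest saved earlier with a freshly computed one.
async function verifyFile(filePath, expectedHex) {
  const actualHex = await hashFile(filePath);
  return actualHex === expectedHex;
}

async function demo() {
  // A uniquely named temporary file plays the role of the stored file.
  const suffix = crypto.randomBytes(4).toString("hex");
  const filePath = path.join(os.tmpdir(), `integrity-demo-${suffix}.txt`);
  fs.writeFileSync(filePath, "original contents");

  // In a real application this value would be saved in the database.
  const storedHash = await hashFile(filePath);

  const untouched = await verifyFile(filePath, storedHash);
  fs.appendFileSync(filePath, " plus tampering");
  const tampered = await verifyFile(filePath, storedHash);

  fs.unlinkSync(filePath);
  return { untouched, tampered };
}

demo().then((result) => console.log(result)); // { untouched: true, tampered: false }
```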

Conclusion

A Node.js worker thread is a great option when we want to improve performance by freeing up the event loop. One thing to note is that workers are useful for performing CPU-intensive JavaScript operations. Do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already handle it more efficiently than worker threads can.
