
Use cases for Node workers (or ways to improve performance of Node servers)


In the past, Node.js was rarely an option for applications that require CPU-intensive computation. Its non-blocking, event-driven I/O architecture runs JavaScript on a single thread, so a long computation stalls everything else. With the advent of worker threads in Node.js, it is now possible to use it for CPU-intensive applications. In this article, we will look at some use cases for worker threads in a Node.js application.
Before continuing with the use cases, let’s do a quick comparison of I/O-bound vs CPU-bound work in Node.

I/O-bound vs CPU-bound in Node.js

I/O bound

A program is said to be bound by a resource if an increase in that resource leads to improved performance of the program. An increase in the speed of the I/O subsystem (such as memory, hard disk or network connection) improves the performance of an I/O-bound program. This is typical of Node.js applications, which often spend most of their time waiting for network, filesystem and perhaps database I/O to complete before continuing with code execution or returning a response. A faster hard disk and/or network connection would usually improve the overall performance of such an application.

CPU bound

A program is CPU bound if its processing time drops when CPU speed increases. For instance, a program that calculates the hash of a file will run faster on a 2.2GHz processor than on a 1.2GHz one.
For CPU-bound applications the majority of the time is spent using the CPU to do calculations. In Node.js, CPU-bound work blocks the event loop and causes other requests to be held up.
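The effect of CPU-bound work on the event loop can be observed directly. The following is a minimal sketch, not taken from any particular application: `busyWork` and `measureTimerDelay` are hypothetical helper names, and the busy loop merely stands in for a real calculation. A timer scheduled for 0 ms cannot fire until the synchronous work has finished.

```javascript
// Spin the CPU for roughly `ms` milliseconds without yielding to the event loop.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: stands in for a CPU-bound calculation
  }
}

// Resolve with how late a 0 ms timer fires when CPU-bound work
// starts right after the timer has been scheduled.
function measureTimerDelay(blockMs) {
  return new Promise(resolve => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWork(blockMs);
  });
}

measureTimerDelay(200).then(delay => {
  console.log(`0 ms timer fired after about ${delay} ms`);
});
```

In a server, every pending request is in the same position as that timer: it waits until the calculation is done.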

Node.js golden rule

Don’t block the event loop. Keep it running and avoid anything that could block the thread, like synchronous network calls or infinite loops.
Node runs in a single-threaded event loop, using non-blocking I/O calls, allowing it to support tens of thousands of concurrent connections, for example serving many incoming HTTP requests at once. This works well and is fast as long as the work associated with each client at any given time is small. But if you perform CPU-intensive calculations, your concurrent Node.js server will come to a screeching halt. Other incoming requests will wait, as only one request is being served at a time.
Certain strategies have been used to cope with CPU-intensive tasks in Node.js: multiple processes (such as the cluster API), which make sure all CPU cores are used, and child processes, which spawn a new process to handle blocking tasks.
These strategies are advantageous because the event loop is not blocked. They also keep processes separate, so if something goes wrong in one process, it does not affect the others. However, since child processes run in isolation, they cannot share memory with each other, and data must be serialized (as JSON by default) to be passed between them and deserialized on the other side.
A better fit for CPU-intensive computation in Node.js is to run multiple Node.js instances inside the same process, where memory can be shared instead of copying everything between processes. This is exactly what worker threads do in Node.js.
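As a rough illustration of memory sharing between threads, the sketch below passes a SharedArrayBuffer to a worker, lets the worker modify it, and reads the result on the main thread without any copy. The worker source is kept inline with `eval: true` only so the example fits in one file; `incrementInWorker` and the numbers used are illustrative assumptions.

```javascript
const { Worker } = require("worker_threads");

// Worker code kept inline (eval: true) so the sketch is a single file.
const workerSource = `
  const { workerData, parentPort } = require("worker_threads");
  // View over the same memory the main thread allocated.
  const counter = new Int32Array(workerData.shared);
  Atomics.add(counter, 0, 5);
  parentPort.postMessage("done");
`;

// Start a worker that adds 5 to a shared counter, then read the counter here.
function incrementInWorker() {
  return new Promise((resolve, reject) => {
    const shared = new SharedArrayBuffer(4); // room for one 32-bit integer
    const counter = new Int32Array(shared);
    counter[0] = 37;

    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared } // shared with the worker, not copied
    });
    worker.on("message", () => resolve(Atomics.load(counter, 0)));
    worker.on("error", reject);
  });
}

incrementInWorker().then(value => console.log(value)); // logs 42
```

Only the "done" notification travels as a message; the counter itself lives in memory that both threads can see.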

Real-world CPU-intensive tasks that can be done with worker threads

We will look at a few use cases of worker threads in a Node.js application. We will not go through the worker thread APIs themselves, since the focus here is on use cases. If you are not familiar with worker threads, you can visit this post to get started with the worker thread APIs.

Image resizing

Let’s say you are building an application that allows users to upload a profile image, and you then generate multiple sizes (e.g. 100 x 100 and 64 x 64) of the image for the various use cases within the application. Resizing an image is CPU intensive, and producing two different sizes increases the time the CPU spends on it. The task of resizing the image can be outsourced to a separate thread while the main thread handles other lightweight tasks.
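// worker.js
const { parentPort, workerData } = require("worker_threads");
const sharp = require("sharp");

async function resize() {
    const { image, size } = workerData;
    // Include the size in the file name so two workers started in the
    // same millisecond do not write to the same path.
    const outputPath = "public/images/" + Date.now() + "-" + size + ".png";

    await sharp(image)
        .resize(size, size, { fit: "cover" })
        .toFile(outputPath);

    // Report the path of the resized file back to the main thread.
    parentPort.postMessage(outputPath);
}

resize().catch(err => {
    console.error(err);
    // Inside a worker, process.exit() ends only this thread; the non-zero
    // code makes the main thread's "exit" handler reject.
    process.exit(1);
});

// mainThread.js
const { Worker } = require("worker_threads");

module.exports = function imageResizer(image, size) {
    return new Promise((resolve, reject) => {
        // One worker per resize job; image and size arrive as workerData.
        const worker = new Worker(__dirname + "/worker.js", {
            workerData: { image, size }
        });
        worker.on("message", resolve);
        worker.on("error", reject);
        worker.on("exit", code => {
            if (code !== 0)
                reject(new Error(`Worker stopped with exit code ${code}`));
        });
    });
};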
The main thread exposes a function that creates a worker thread for each image to resize. It passes the size and the image to the worker through the workerData property. The worker resizes the image with sharp, writes the result to disk and sends the path of the resized file back to the main thread.
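To produce several sizes of one upload, the resizer can be called once per size and the results awaited together. The helper below is a sketch under assumptions: `resizeToAllSizes` is a hypothetical name, and the resize function is passed in as a parameter so the snippet runs on its own, with a stub standing in for the real worker-backed resizer.

```javascript
// Run one resize job per requested size and collect the output paths by size.
// `resize` is any function (image, size) => Promise<string>.
async function resizeToAllSizes(image, sizes, resize) {
  const paths = await Promise.all(sizes.map(size => resize(image, size)));
  return Object.fromEntries(sizes.map((size, i) => [size, paths[i]]));
}

// Stub used only for this illustration; a real call would pass the
// worker-backed resizer instead.
const fakeResize = async (image, size) => `resized/${size}/${image}`;

resizeToAllSizes("avatar.png", [100, 64], fakeResize).then(result => {
  console.log(result);
});
```

Because each call starts its own worker, the two resizes can run in parallel on separate cores.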

Video compression

Video compression is another CPU-intensive task that can be outsourced to a worker thread. Most video streaming applications keep multiple variations of a single video and serve the one that suits the user’s network connection. Worker threads can do the job of compressing the video to the various sizes.
fluent-ffmpeg is a commonly used module for video processing in Node.js applications. It depends on ffmpeg, a complete, cross-platform solution to record, convert and stream audio and video.
Creating a worker has a cost, so rather than creating workers on the fly each time you need a new thread, it is recommended that you create a pool of workers and reuse them. To create a worker pool we use the npm module node-worker-threads-pool, which builds a pool of worker threads on top of Node’s worker_threads module.
// worker.js
const { parentPort } = require("worker_threads");
const ffmpeg = require("fluent-ffmpeg");

function resizeVideo({ inputPath, size }) {
    const outputPath = "public/videos/" + Date.now() + size + ".mp4";
    ffmpeg(inputPath)
        .audioCodec("libmp3lame")
        .videoCodec("libx264")
        .size(size)
        .on("error", function(err) {
            console.log("An error occurred: " + err.message);
            // Rethrow so the failure surfaces as a worker error
            // instead of only being logged.
            throw err;
        })
        .on("end", function() {
            // Send the path of the compressed file back to the pool.
            parentPort.postMessage(outputPath);
        })
        .save(outputPath);
}

// Each message from the pool is one compression job.
parentPort.on("message", param => {
    resizeVideo(param);
});

// mainThread.js
const { StaticPool } = require("node-worker-threads-pool");

const filePath = __dirname + "/worker.js";
// Four long-lived workers, one per target resolution.
const pool = new StaticPool({
    size: 4,
    task: filePath
});

const videoSizes = ["1920x1080", "1280x720", "854x480", "640x360"];

module.exports = async function compressVideo(inputPath) {
    // forEach does not wait for async callbacks, so map each size to a
    // promise and await them all. Resolves with the list of output paths.
    return Promise.all(
        videoSizes.map(size => pool.exec({ inputPath, size }))
    );
};
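The pooling idea itself can be sketched with the built-in worker_threads module alone. This is a simplified illustration, not how node-worker-threads-pool is implemented: `TinyPool` is a hypothetical class, the worker just squares a number, and a worker that fails is not replaced.

```javascript
const { Worker } = require("worker_threads");

// Inline worker (eval: true): squares every number it receives.
const workerSource = `
  const { parentPort } = require("worker_threads");
  parentPort.on("message", n => parentPort.postMessage(n * n));
`;

class TinyPool {
  constructor(size) {
    this.idle = [];   // workers waiting for a job
    this.queue = [];  // jobs waiting for a worker
    for (let i = 0; i < size; i++) {
      this.idle.push(new Worker(workerSource, { eval: true }));
    }
  }

  // Queue a job; resolves with the worker's reply.
  exec(param) {
    return new Promise((resolve, reject) => {
      this.queue.push({ param, resolve, reject });
      this.drain();
    });
  }

  // Hand queued jobs to idle workers until one of the two runs out.
  drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { param, resolve, reject } = this.queue.shift();
      const onMessage = result => {
        worker.off("error", onError);
        this.idle.push(worker); // reuse the worker for the next job
        resolve(result);
        this.drain();
      };
      const onError = err => {
        worker.off("message", onMessage);
        reject(err); // a failed worker is not returned to the pool
      };
      worker.once("message", onMessage);
      worker.once("error", onError);
      worker.postMessage(param);
    }
  }

  // Stop all idle workers so the process can exit.
  destroy() {
    return Promise.all(this.idle.map(worker => worker.terminate()));
  }
}

const demoPool = new TinyPool(2);
Promise.all([1, 2, 3, 4].map(n => demoPool.exec(n)))
  .then(results => console.log(results))
  .finally(() => demoPool.destroy());
```

Four jobs are served by two workers here, so each worker is created once and used twice.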

File integrity

Suppose you have to store your files on cloud storage and want to be sure that they are not tampered with by any third party. You can do this by computing the hash of each file with a cryptographic hash algorithm. You save these hashes and their storage locations in your database. When you download a file, you compute its hash again and check that it matches the stored one. Computing the hash is CPU intensive and can be done in a worker thread:
// hashing.js
const {
  Worker, isMainThread, parentPort, workerData
} = require("worker_threads");
const crypto = require("crypto");
const fs = require("fs");

if (isMainThread) {
  module.exports = function hashFile(filePath) {
    return new Promise((resolve, reject) => {
      // Start this same file as a worker and hand it the path to hash.
      const worker = new Worker(__filename, { workerData: filePath });
      worker.on("message", resolve);
      worker.on("error", reject);
      worker.on("exit", (code) => {
        if (code !== 0)
          reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  const filePath = workerData;
  // SHA-256 is used here because SHA-1 is no longer collision resistant.
  const algorithm = "sha256";
  const shasum = crypto.createHash(algorithm);
  const stream = fs.createReadStream(filePath);
  stream.on("data", function(data) {
    shasum.update(data);
  });
  stream.on("error", function(err) {
    // Surface read failures through the main thread's "error" handler.
    throw err;
  });
  stream.on("end", function() {
    const hash = shasum.digest("hex");
    parentPort.postMessage(hash);
  });
}
Notice that we have both the worker thread code and the main thread code in the same file. The isMainThread property of the worker_threads module tells us which thread we are on, so each thread runs only the code meant for it. The main thread creates a new worker and listens for events from it. The worker thread calculates the hash of a stream of data using the createHash method of the Node.js crypto module.
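The check on download can be sketched as follows. This is an illustrative helper, not part of the code above: `sha256Hex` and `matchesStoredHash` are hypothetical names, the data is hashed in memory for brevity, and SHA-256 is an assumption about the algorithm used when the hash was stored.

```javascript
const crypto = require("crypto");

// Hex-encoded SHA-256 digest of a string or Buffer.
function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// True when the freshly computed hash equals the one saved at upload time.
function matchesStoredHash(data, storedHex) {
  const actual = Buffer.from(sha256Hex(data), "hex");
  const expected = Buffer.from(storedHex, "hex");
  // timingSafeEqual requires buffers of equal length, so check that first.
  return actual.length === expected.length &&
    crypto.timingSafeEqual(actual, expected);
}

const storedHash = sha256Hex("original file contents");
console.log(matchesStoredHash("original file contents", storedHash)); // true
console.log(matchesStoredHash("tampered file contents", storedHash)); // false
```

Any change to the file, however small, produces a different digest, so the comparison fails for tampered content.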

Conclusion

A Node.js worker thread is a great option when we want to improve performance by freeing up the event loop. One thing to note is that workers are useful for performing CPU-intensive JavaScript operations. Do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already handle I/O more efficiently than worker threads can.
