Threads in Node

The quick and simple answer is: to have Node excel in the one area where it has suffered in the past, which is dealing with heavy, CPU-intensive computations. This is mainly why Node.js is not strong in areas such as AI, machine learning, data science and similar. There are a lot of efforts in progress to solve that, but Node is still not as performant there as it is when deploying microservices, for instance.
So I’m going to try and simplify the technical documentation provided by the initial PR and the official docs into a more practical and simple set of examples. Hopefully that’ll be enough to get you started.


So how do we use the new Threads module?

To start with, you’ll be requiring the module called “worker_threads”.
Note that on the Node versions where this module is still experimental (10.5 up to 11.6), this will only work if you use the --experimental-worker flag when executing the script; otherwise the module will not be found. From Node 11.7 onward, the flag is no longer needed.
Notice how the flag refers to workers and not threads. This is how they’re referenced throughout the documentation: worker threads, or simply workers.
If you’ve used multi-processing in the past, you’ll see a lot of similarities with this approach, but if you haven’t, don’t worry, I’ll explain as much as I can.
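Before creating any workers, a quick sanity check can confirm that the module loads and tell you which thread a piece of code is running on. This is only a minimal sketch: describeThread is a name I made up for illustration, while isMainThread and threadId are exports of worker_threads.

```javascript
const { isMainThread, threadId } = require('worker_threads');

// Hypothetical helper: returns a label for the thread running this code.
function describeThread() {
  return isMainThread ? 'main thread' : `worker ${threadId}`;
}

console.log(describeThread()); // prints "main thread" when run directly with node
```

If the require call throws because the module cannot be found, your Node version most likely still needs the experimental flag.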

What can you do with them?

As I mentioned before, worker threads are meant for CPU-intensive tasks. Using them for I/O would be a waste of resources: according to the official documentation, the internal mechanism Node provides to handle async I/O is much more efficient than using a worker thread for it, so… don’t bother.
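To see why CPU-bound work is the real problem, here is a rough sketch with no workers involved. A timer is scheduled for 10 ms, and then the main thread is kept busy with a synchronous loop that stands in for a heavy computation. The function name measureTimerDelay and the chosen durations are my own assumptions for the demonstration.

```javascript
// Measures how late a 10 ms timer fires when the main thread is kept busy.
function measureTimerDelay(busyMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 10);

    // Synchronous busy loop standing in for a heavy computation.
    // Nothing else, timers included, can run on this thread until it ends.
    while (Date.now() - scheduledAt < busyMs) {
      // burn CPU
    }
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`Timer asked for 10 ms, fired after ${delay} ms`);
});
```

The timer cannot fire until the loop is done, so the reported delay is at least as long as the busy period. That kind of blocking is what moving the computation into a worker is meant to avoid.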
Let’s start with a simple example of how you would go about creating a worker and using it.

const { Worker, isMainThread, workerData } = require('worker_threads');
let currentVal = 0;
const intervals = [100, 1000, 500]; // tick length in ms: worker 0, worker 1, main thread
// Prints the counter for the given thread id and returns it.
function counter(id, i) {
  console.log("[", id, "]", i);
  return i;
}
if (isMainThread) {
  console.log("this is the main thread");
  // Spawn 2 workers that run this same file, passing each one its index.
  for (let i = 0; i < 2; i++) {
    new Worker(__filename, { workerData: i });
  }
  setInterval((a) => currentVal = counter(a, currentVal + 1), intervals[2], "MainThread");
} else {
  console.log("this isn't");
  // Each worker has its own currentVal and ticks at its own interval.
  setInterval((a) => currentVal = counter(a, currentVal + 1), intervals[workerData], workerData);
}
The above example will simply output a set of lines showing incremental counters, which increase their values at different speeds.
[Image: the output of the above piece of code]
Let’s break it down:
  1. The code inside the IF statement creates 2 worker threads. Their code is taken from the same file, thanks to the __filename parameter passed. Workers need the full path to their files right now; they can’t handle relative paths, which is why this value is used.
  2. The 2 workers are sent a value as a global parameter, in the form of the workerData attribute you see as part of the second argument. That value can then be accessed through a constant with the same name (see how the constant is created in the first line of the file and used later on in the last line).
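One detail worth seeing in isolation is that the worker gets its own copy of workerData rather than a shared reference. The sketch below is only an illustration under a couple of assumptions: the worker's source is kept in a string and started with the eval option so everything fits in one file (the article's examples use a file path instead), and roundTrip is a made-up helper name.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept as a string and run with { eval: true } so the sketch
// is self-contained; this is a convenience for the demo, not a requirement.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // This changes the worker's copy only, not the object in the main thread.
  workerData.visits.push('worker');
  parentPort.postMessage(workerData);
`;

// Hypothetical helper: sends data to a worker and resolves with what comes back.
function roundTrip(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: data });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const original = { id: 1, visits: ['main'] };
roundTrip(original).then((copy) => {
  console.log(copy.visits);     // [ 'main', 'worker' ]
  console.log(original.visits); // [ 'main' ]
});
```

Since the value is cloned, anything you want back from the worker has to be sent explicitly as a message.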
This first example is one of the most basic things you can do with this module, but it’s not really that fun, is it? Let’s look at another example.
Let’s try now to do some “heavy” computation while at the same time, doing some async stuff in the main thread.
const { Worker, isMainThread, parentPort } = require('worker_threads');
const request = require("request");
if (isMainThread) {
  console.log("This is the main thread");
  const w = new Worker(__filename, { workerData: null });
  w.on('message', (msg) => { // a message from the worker!
    console.log("First value is: ", msg.val);
    console.log("Took: ", (msg.timeDiff / 1000), " seconds");
  });
  w.on('error', console.error);
  w.on('exit', (code) => {
    if (code !== 0) {
      console.error(new Error(`Worker stopped with exit code ${code}`));
    }
  });
  // The async request runs on the main thread while the worker sorts.
  request.get('http://www.google.com', (err, resp) => {
    if (err) {
      return console.error(err);
    }
    console.log("Total bytes received: ", resp.body.length);
  });
} else { // the worker's code
  function random(min, max) {
    return Math.random() * (max - min) + min;
  }
  const sorter = require("./test2-worker");
  const start = Date.now();
  const bigList = Array(1000000).fill().map(() => random(1, 10000));
  sorter.sort(bigList);
  // Report the first sorted value and the elapsed time to the main thread.
  parentPort.postMessage({ val: sorter.firstValue, timeDiff: Date.now() - start });
}
This time around, we’re requesting the homepage for Google.com and at the same time, sorting a randomly generated array of 1 million numbers. This is going to take a few seconds, so it’s perfect for us to show how well this behaves. We’re also going to measure the time it takes for the worker thread to perform the sorting and we’re going to send that value (along with the first sorted value) to the main thread, where we’ll display the results.
[Image: the output from example #2]
The main takeaway from this example is the communication between threads.
The main thread can receive messages from a worker through the on method of the worker instance. The events we can listen to are the ones shown in the code. The message event is triggered whenever the worker sends a message using the parentPort.postMessage method. You can also send a message to the worker’s code by calling postMessage on your worker instance, and catch it there using the parentPort object.
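Here is a minimal sketch of that round trip, with the main thread sending a value and the worker answering. As assumptions for the demo, the worker's source lives in a string started with the eval option (to keep it in one file), and askWorker and the message shape are names I picked for illustration.

```javascript
const { Worker } = require('worker_threads');

// Worker source as a string, started with { eval: true } for a one-file demo.
const workerSource = `
  const { parentPort } = require('worker_threads');
  // Messages sent with worker.postMessage() in the main thread arrive here.
  parentPort.on('message', (msg) => {
    // Reply to the main thread with the doubled value.
    parentPort.postMessage({ input: msg.value, doubled: msg.value * 2 });
  });
`;

// Hypothetical helper: sends one value to a fresh worker and resolves with its reply.
function askWorker(value) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', (reply) => {
      resolve(reply);
      // The worker's 'message' listener would keep it alive, so stop it here.
      worker.terminate();
    });
    worker.once('error', reject);
    worker.postMessage({ value });
  });
}

askWorker(21).then((reply) => console.log(reply)); // { input: 21, doubled: 42 }
```

Note the explicit terminate call: a worker that listens on parentPort stays alive waiting for more messages, which is what you want for a long-lived worker but not for a one-off question.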
In case you’re wondering, the code for the helper module I used is here, although there is nothing noteworthy about it.
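If you just want something to run the examples with, a stand-in for that helper could look like the sketch below. To be clear, this is a hypothetical replacement and not the original module: the only things taken from the article are the file name test2-worker, the sort method and the firstValue property, and the implementation is my own guess at the simplest thing that fits.

```javascript
// Hypothetical stand-in for ./test2-worker.js; the original helper is not shown here.
const sorter = {
  firstValue: null,

  // Sorts the list numerically in place and remembers the smallest value.
  sort(list) {
    list.sort((a, b) => a - b);
    this.firstValue = list.length > 0 ? list[0] : null;
    return list;
  },
};

module.exports = sorter;
```

Because the examples call sorter.sort(bigList) as a method, this.firstValue ends up on the exported object, which is where the worker code reads it from.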
As a final example, I’m going to stick to the same functionality, but show you how you could clean the code up a bit and structure your worker thread’s code in a more maintainable way.
const { Worker } = require('worker_threads');
const request = require("request");
// Starts a worker from the given file and reports its result through cb(err, msg).
function startWorker(path, cb) {
  const w = new Worker(path, { workerData: null });
  w.on('message', (msg) => {
    cb(null, msg);
  });
  w.on('error', cb);
  w.on('exit', (code) => {
    if (code !== 0) {
      console.error(new Error(`Worker stopped with exit code ${code}`));
    }
  });
  return w;
}
console.log("this is the main thread");
const myWorker = startWorker(__dirname + '/workerCode.js', (err, result) => {
  if (err) return console.error(err);
  console.log("[[Heavy computation function finished]]");
  console.log("First value is: ", result.val);
  console.log("Took: ", (result.timeDiff / 1000), " seconds");
});
const start = Date.now();
request.get('http://www.google.com', (err, resp) => {
  if (err) {
    return console.error(err);
  }
  console.log("Total bytes received: ", resp.body.length);
  //myWorker.postMessage({finished: true, timeDiff: Date.now() - start}) //you could send messages to your workers like this
});
And your thread code can be inside another file, such as:
const { parentPort } = require('worker_threads');
function random(min, max) {
  return Math.random() * (max - min) + min;
}
const sorter = require("./test2-worker");
const start = Date.now();
const bigList = Array(1000000).fill().map(() => random(1, 10000));
/**
//you can receive messages from the main thread this way:
parentPort.on('message', (msg) => {
  console.log("Main thread finished on: ", (msg.timeDiff / 1000), " seconds...");
})
*/
sorter.sort(bigList);
parentPort.postMessage({ val: sorter.firstValue, timeDiff: Date.now() - start });
Breaking this one down, we see:
  1. Main thread and worker threads now have their code inside different files. This is easier to maintain and extend.
  2. The startWorker function returns the new instance, allowing you to later send messages to it if you so wanted.
  3. You no longer need to worry if your main thread’s code is actually the main thread (we removed the main IF statement).
  4. You can see in the worker’s code how you would receive a message from the main thread, allowing for two-way asynchronous communication.
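If you prefer promises over callbacks, the same structure can be wrapped so that the result of the heavy computation can simply be awaited. This is only a sketch of one possible variation, not part of the original example: runWorker is a name I made up, the worker's source is an inline string started with the eval option so the snippet runs on its own, and the sort is done with a plain Array sort instead of the helper module.

```javascript
const { Worker } = require('worker_threads');

// Inline worker source (assumption for the demo); in a real project this
// would live in its own file, like workerCode.js above.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const start = Date.now();
  const list = Array.from({ length: workerData.size }, () => Math.random());
  list.sort((a, b) => a - b);
  parentPort.postMessage({ val: list[0], timeDiff: Date.now() - start });
`;

// Hypothetical helper: resolves with the worker's first message,
// rejects on an error or on a non-zero exit code.
function runWorker(source, workerData) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

async function main() {
  const result = await runWorker(workerSource, { size: 1000000 });
  console.log('First value is:', result.val);
  console.log('Took:', result.timeDiff / 1000, 'seconds');
}

main().catch(console.error);
```

The trade-off is that a promise settles only once, so this shape fits workers that produce a single result. For a worker that keeps exchanging messages, the event-based version shown earlier is the better fit.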
That is going to be it for this article. I hope you got enough to understand how to get started playing around with this new module. Remember that:
  1. This is still highly experimental and things explained here can change in future releases.
  2. Go and read the PR comments and docs; there is more information in there. I just focused on the basic steps to get it going.
  3. Have fun! Play around, report bugs and suggest improvements; this is just starting.
