
Advanced Node.js: A Hands-On Guide to the Event Loop, Child Processes and Worker Threads in Node.js

What makes Node.js so performant and scalable? Why is Node the technology of choice for so many companies? In this article, we will answer these questions and look at some of the advanced concepts that make Node.js unique. We will discuss:
  1. Event Loop ➰
  2. Concurrency Model 🚈
  3. Child Process 🎛️
  4. Threads and Worker Threads 🧵
JavaScript developers with a deeper understanding of Node.js reportedly earn 20% to 30% more than their peers. If you are looking to grow your knowledge of Node.js, then this blog post is for you. Let’s dive in 🤿!!

What happens when you run a Node.js Program?

When we run our Node.js app, it creates:
  • 1 Process 🤖
  • 1 Thread 🧵
  • 1 Event Loop ➰
A process is an executing program, or a part of an executing program. An application can be made up of many processes; the Node.js runtime, however, initiates only one process.
A thread is the basic unit to which the operating system allocates processor time. Think of a thread as the unit that lets you use part of your processor.
An event loop is a continuously running loop (just like a while loop). It executes one command at a time (more on this later). For now, think of it as a while loop that keeps running until Node has executed every line of code.
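Conceptually, the event loop can be pictured as something like this. This is only a simplified mental model with made-up helper names, not Node's actual implementation:

// Simplified mental model of the event loop (not real Node internals)
while (thereIsWorkLeft()) {
  const task = getNextTask();   // pick the next queued task
  runToCompletion(task);        // execute it, one command at a time
}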
Now, let’s take a look at how our code runs inside a Node.js instance.
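The snippet discussed below looks roughly like this (a minimal sketch; the task names and the loop size are illustrative):

console.log('Task 1');
console.log('Task 2');

// A time-consuming, CPU-bound loop that keeps the event loop busy
let total = 0;
for (let i = 0; i < 3e9; i++) {
  total += i;
}

console.log('Task 3');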

What happens when we run this code? It will first print out Task 1, then Task 2, then it will run the time-consuming for loop (we won’t see anything in the terminal for a couple of seconds) and finally it will print out Task 3. Let’s look at a diagram of what’s actually happening.
[Diagram: Component 1]
Node puts all our tasks into an events queue and sends them one by one to the event loop. The event loop is single-threaded and can only run one thing at a time. So it goes through Task 1 and Task 2, then the very big for loop, and then it gets to Task 3. This is why we see a pause in the terminal after Task 2: the event loop is busy running the for loop.
Now let’s do something different. Let’s replace that for loop with an I/O event.
Pro tip: you can generate a 100 MB file on Linux or macOS just by running this command: dd if=/dev/urandom of=ridiculously_large_file.txt bs=1048576 count=100
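Replacing the loop with a file read might look like this (a minimal sketch; the file name matches the pro tip above):

const fs = require('fs');

console.log('Task 1');
console.log('Task 2');

// Asynchronous, non-blocking read of the large file
fs.readFile('./ridiculously_large_file.txt', (err, data) => {
  if (err) throw err;
  console.log('done reading file');
});

console.log('Task 3');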
We would naturally assume that this will output something similar: just like the for loop, reading a big file takes time, so execution on the event loop should pause. However, we get something totally different.

But what caused this? How did Task 3 get executed before the file was read? Let’s take a look at the visuals below to see what’s happening.
[Diagram: Component 2]
I/O tasks, network requests and database operations are classified as blocking tasks in Node.js. Whenever the event loop encounters one of these tasks, it sends it off to a different thread and moves on to the next task in the events queue. A thread from the thread pool is assigned to handle each blocking task, and when it is done, it puts the result in a callback queue. Once the event loop has finished executing everything in the events queue, it starts executing the tasks in the callback queue. That’s why we see done reading file at the end.
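A quick way to see the callback queue in action: even a zero-delay timer callback only runs after the synchronous work in the events queue has finished. A tiny sketch (the output order is the point, not the timings):

console.log('start');

// This callback goes to the callback queue, even with a 0 ms delay
setTimeout(() => console.log('timer callback'), 0);

// Synchronous work on the event loop always runs first
for (let i = 0; i < 1e8; i++) {}

console.log('end');
// Prints: start, end, timer callback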

What makes the Single-Threaded Event Loop Model Efficient? ⚙️

JavaScript was created to do simple things in web browsers, such as form validation or simple animations. This is why it was built with the single-threaded event loop model. Running everything in one thread is usually considered a disadvantage.
However, in 2009 Ryan Dahl, the creator of Node, saw this simple event loop model as an opportunity to build a lightweight web server.
To better understand what problem Node.js solves, we should look at what typical web servers were like before Node.js came into play.
This is how a traditional multi-threaded web application model handles request:
  1. It maintains a thread pool (a collection of available threads)
  2. When a client request comes in, a thread is assigned to it
  3. This thread takes care of reading the client request, processing it, performing any blocking I/O operations (if required) and preparing the response
  4. The thread is not freed until the response is sent back
The main drawback of this model is handling concurrent users. If more users visit our site than there are available threads, some users will need to wait until a thread frees up before they get a response. If a lot of users are performing blocking I/O tasks, this wait time increases further. The model is also very resource-heavy: if we are expecting one million concurrent users, we had better make sure we have enough threads to handle those requests.
Moreover, the server itself starts to slow down because of the increasing load. There is also the overhead of context switching between threads, and writing applications that optimize resource sharing between threads can be painful.
Because of its single-threaded model, Node.js doesn’t need to spin up a new thread for every single request. Node.js also delegates blocking tasks to other components, as we saw earlier. Since we don’t have to manage many threads, Node.js is very lightweight and ideal for microservice-based architectures.

Drawbacks of Node’s Single-Threaded Model !!!

The single-threaded event loop architecture uses resources efficiently, but it is not without drawbacks. A Node.js instance cannot immediately benefit from the multiple cores in your CPU. A Java application, for example, can immediately take advantage of more memory as we upgrade our hardware, but Node runs on a single thread.
This is 2020 😄 and we are seeing more and more complicated web applications. What if our application needs to do complex computation or run a machine learning algorithm? Or what if we want to run a complicated crypto algorithm? In these cases we have to harness the power of multiple cores to increase performance.
Languages like Java and C# can programmatically initiate threads and harness the power of multiple cores. In Node.js that is not an option, as we saw earlier. Node’s way of solving this problem is child_process.

Child Process in Node

The child_process module gives Node the ability to spawn child processes, either by running operating system commands or by forking new Node.js instances.
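For example, running an operating system command from Node might look like this (a small sketch using the module's exec method):

const { exec } = require('child_process');

// Run an OS command in a separate child process
exec('ls -la', (err, stdout, stderr) => {
  if (err) return console.error(err);
  console.log(stdout);
});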
Let’s assume we have a REST endpoint that has a long-running function and we would like to use multiple cores in our processor to execute this function.
Here’s our code
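The snippet below is a minimal sketch of such an endpoint, using Node's built-in http module and fork; the file names and the long-running loop are illustrative:

// server.js: the parent process
const http = require('http');
const { fork } = require('child_process');

const server = http.createServer((req, res) => {
  if (req.url === '/compute') {
    // Fork a separate Node.js process so the heavy work can use another core
    const child = fork('./long-computation.js');
    child.send({ count: 1e9 });            // pass input to the child
    child.on('message', (sum) => {         // receive the result back
      res.end(`Sum is ${sum}\n`);
    });
  } else {
    res.end('OK\n');
  }
});

server.listen(3000);

// long-computation.js: runs inside the forked child process
process.on('message', ({ count }) => {
  let sum = 0;
  for (let i = 0; i < count; i++) sum += i;
  process.send(sum);                       // send the result to the parent
  process.exit();
});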


In the example above we demonstrate how we can spin off a new process and share data between the parent and the child. Using the forked process, we can take advantage of multiple CPU cores.
You can take a look at all the child_process methods in the official Node docs.
Here is a diagram of how child processes work.
[Diagram: Component 3]
child_process is a good solution, but there’s another option. The child_process module spins off new instances of Node to distribute the workload, and each of these instances has 1 process, 1 thread and 1 event loop. In 2018 Node.js introduced worker_threads. This module gives Node the ability to have:
  • 1 Process
  • Multiple threads
  • 1 Event Loop per thread
Yes!! You read that right 😄.
[Diagram: Component 4]
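A minimal sketch of worker_threads usage (the worker names and the workload are illustrative):

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // Main thread: create two workers and pass each one some data
  for (const name of ['worker 1', 'worker 2']) {
    const worker = new Worker(__filename, { workerData: name });
    worker.on('message', (msg) => console.log(`${name}: ${msg}`));
  }
} else {
  // Worker thread: do some work, then post the result back to the main thread
  let sum = 0;
  for (let i = 0; i < 1e8; i++) sum += i;
  parentPort.postMessage(`done, sum = ${sum} (got "${workerData}")`);
}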


We check whether we are on the main thread and, if so, create two workers and pass messages to them. Inside the worker thread, the worker executes its task and the data gets passed back through the postMessage method.
Since worker_threads creates new threads inside the same process, it requires fewer resources. We are also able to pass data between these threads because they can have shared memory space.
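As an illustration of that shared memory space, the main thread and a worker can operate on the same SharedArrayBuffer (a small sketch; the counter and the value added are illustrative):

const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  // Both threads see the same underlying memory
  const shared = new SharedArrayBuffer(4);
  const counter = new Int32Array(shared);

  const worker = new Worker(__filename, { workerData: shared });
  worker.on('exit', () => console.log('counter =', Atomics.load(counter, 0)));
} else {
  // The worker updates the shared counter in place
  const counter = new Int32Array(workerData);
  Atomics.add(counter, 0, 42);
}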
As of January 2020, worker_threads are fully supported in the Node LTS version 12. I highly recommend reading the following post if you want to learn more about worker_threads.

And that’s it!!!

