Multi-threading in nodeJs

We have all been told that Node is single-threaded and that, even so, it can perform asynchronous tasks using a mechanism known as the event loop. Node runs the synchronous code of a program on a call stack, one call at a time. Whenever it encounters a call such as setTimeout, setInterval, or setImmediate, it registers the callback and moves on; the callback is placed on a message queue and runs only after the synchronous code currently on the stack has finished executing. That is why the code below prints B first and then A.

setTimeout(() => {
  console.log('A');
});
console.log('B');
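To see that a timer callback really waits for the synchronous code to finish, here is a minimal sketch. The busy-wait loop, the 50 ms figure, and the label names are my own illustration, not part of the example above. Even though the timer asks for 0 ms, its callback runs only after the loop is done.

```javascript
// Records the order in which the steps run.
const order = [];

// Ask for the callback to run after 0 ms.
setTimeout(() => {
  order.push('timer callback');
  console.log(order.join(' -> '));
}, 0);

order.push('sync start');

// Keep the main thread busy with roughly 50 ms of synchronous work.
const until = Date.now() + 50;
while (Date.now() < until) {
  // busy wait
}

order.push('sync end');
// prints: sync start -> sync end -> timer callback
```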

Things differ when we use promises or async/await, because their callbacks are pushed to a different queue called the job queue (also known as the microtask queue). Unlike the message queue, the job queue is drained as soon as the current synchronous code finishes, before any pending timer callback gets a turn. Run the code below to get a better idea of these two cases, which come up regularly in a Node developer's life.

const bar = () => console.log('bar');
const baz = () => console.log('baz');
const foo = () => {
  console.log('foo');

  setTimeout(bar, 0);

  new Promise((resolve, reject) =>
    resolve('should be right after baz, before bar')
  ).then((message) => console.log(message));

  baz();
};
foo();

The code above prints:

foo
baz
should be right after baz, before bar
bar

This validates the concept. But there is also a corner of the Node ecosystem that uses multithreading by stepping outside the event loop. This is possible because a large part of Node is written in C++, and Node uses a special JavaScript bridge to call into that code. In this article, we will look at a function named pbkdf2, which is part of Node's native crypto module. But before we go deep into how it uses threads, I want to encourage you to open your terminal. Type node, press Enter, and then press TAB twice. The result will look something like this:

[Image: step 1 to understanding the nodeJs internal architecture]

If you look closely, you will find every module and global that you have been using while developing your Node apps, such as process, module, Array, Promise, require, and JSON, and also http, stream, and net if you are an experienced developer who has worked below the level of expressJS or socket.io, or has tried to develop such a library yourself. The point I am making is that what you see here is nodeJs, and it is nothing but an API, a wrapper over a much bigger world lying underneath. Below is a small diagram of what that internal world looks like, and this is where the C++ implementations come into the picture.

A quick note: NodeJS is the abstraction layer wrapping V8 and libuv, which provide the real power of the ecosystem. One may ask why an abstraction layer is needed and why we cannot talk directly to the more powerful parts. In my view, two answers are sufficient:
1. We want to write JavaScript, the language we already know from the browser. V8 is written in C++ and libuv mostly in C, which creates the need for a JavaScript bridge, and NodeJs provides it.
2. Performance is a key issue. Many C++ libraries, especially in cryptography and image processing, are still relevant today because few other languages can match their speed. In such cases a bridge becomes necessary, one that can harness those C++ features while we code in JavaScript.
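As a small illustration of this layering, the sketch below asks a running Node process which versions of V8 and libuv it was built with, and checks that crypto is one of its built-in modules. It uses only process.versions and the builtinModules list from the module package; the exact version strings will depend on your installation.

```javascript
// process.versions lists the components bundled into this Node binary.
const { node, v8, uv } = process.versions;
console.log('node: ', node);
console.log('v8:   ', v8);
console.log('libuv:', uv);

// builtinModules names the modules that ship with Node itself.
const { builtinModules } = require('module');
console.log('crypto is built in:', builtinModules.includes('crypto'));
```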

By now, it may seem that we have drifted from multi-threading in Node to its internal architecture. But please stay with me, as I feel these concepts are essential before moving on to multi-threading and to what actually provides the multi-threading interface. Now, let us look at some code and see multi-threading in action.

const crypto = require('crypto');
const start = Date.now();
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('1:', Date.now() - start);
});

The program above uses the pbkdf2 function to derive a 512-byte key from the string 'password' with the salt 'salt', over a hundred thousand iterations of sha512, and prints the elapsed time from the callback at the end. The resulting time will differ from device to device, but on my machine it comes to almost a second. Something interesting happens when we run one more pbkdf2 call concurrently in the same script. Please refer to the code below.

const crypto = require('crypto');
const start = Date.now();
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('1:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('2:', Date.now() - start);
});

The above code outputs this on my device:

As can be seen, both pbkdf2 calls finished at almost the same time as the single call earlier, i.e. in about a second. In a single-threaded environment this should not happen: the second call should finish after almost 2 seconds.
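For contrast, here is a minimal sketch of what the single-threaded case looks like. It uses crypto.pbkdf2Sync, the synchronous variant, which runs on the main JavaScript thread, so the calls cannot overlap. The helper name timeSyncHashes and its iterations parameter are hypothetical, introduced only for this illustration. On most machines the second reported time should come out at roughly double the first.

```javascript
const crypto = require('crypto');

// Hypothetical helper: runs `count` synchronous hashes one after another
// and returns the elapsed time (ms) after each one finishes.
function timeSyncHashes(count, iterations = 100000) {
  const start = Date.now();
  const times = [];
  for (let i = 0; i < count; i++) {
    // Blocks the main thread until the key has been derived.
    crypto.pbkdf2Sync('password', 'salt', iterations, 512, 'sha512');
    times.push(Date.now() - start);
  }
  return times;
}

timeSyncHashes(2).forEach((ms, i) => console.log(`${i + 1}:`, ms));
```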

Now here is the new learning. Node is single-threaded as far as running your JavaScript goes, but as soon as it hands a task like pbkdf2 to libuv, the work leaves the JavaScript side of the ecosystem and moves to the native side, where there is a mechanism called the thread pool. The thread pool contains 4 threads by default and uses them to optimise resources and turnaround time. In our case above, libuv put our two pbkdf2 calls on two of its threads and ran them simultaneously, hence they took almost the same time. The diagram below explains how multi-threading happens in this case.

Now, to verify the 4-thread concept, let us run 5 instances of the pbkdf2 function and look at the output.

const crypto = require('crypto');
const start = Date.now();
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('1:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('2:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('3:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('4:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('5:', Date.now() - start);
});

The output above clearly shows a delay of about a second between the first 4 instances finishing and the fifth one finishing. However, the execution time of each of the first 4 calls has also increased by almost one second on my machine. This has to do with the number of cores available in the CPU that runs these threads, and that angle seems a little off-topic to explain at this moment. Still, we can clearly see that 4 threads are ready inside the thread pool to take your work and start running it almost instantly, and that any further instance has to wait for one of these threads to finish. I encourage you to play around with a larger number of pbkdf2 instances in your script; you should find very similar behaviour.
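To make that experiment easier, here is a minimal sketch that wraps the repeated calls in a loop. The helper name timeHashes and its iterations parameter are hypothetical, introduced only for this illustration; the pbkdf2 arguments are the same as in the scripts above. Change the count and watch where the completion times jump.

```javascript
const crypto = require('crypto');

// Hypothetical helper: starts `count` pbkdf2 calls at once and resolves
// with the elapsed time (ms) of each call, in the order they were started.
function timeHashes(count, iterations = 100000) {
  const start = Date.now();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve, reject) => {
      crypto.pbkdf2('password', 'salt', iterations, 512, 'sha512', (err) => {
        if (err) return reject(err);
        resolve(Date.now() - start);
      });
    }));
  }
  return Promise.all(jobs);
}

// Six calls against the default pool of four threads.
timeHashes(6).then((times) => {
  times.forEach((ms, i) => console.log(`${i + 1}:`, ms));
});
```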

A quick note: the number of threads in the thread pool is not fixed at 4. It can be changed according to your needs by setting process.env.UV_THREADPOOL_SIZE to the numeric value you want, before the thread pool is first used. A word of caution: more threads allow more concurrent execution, not faster completion of each task. In the last case we saw how execution time doubled when running 5 instances with 4 threads. It again boils down to how your CPU schedules threads and its raw capacity to complete the instructions given to it. You have to be a bit judicious when setting the thread pool size in your Node script.
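The variable can also be supplied from outside the script, so that it is already in place when the process starts. On Linux or macOS that typically means launching with something like UV_THREADPOOL_SIZE=5 node script.js. The sketch below does the same thing from Node itself by starting a child process with the variable in its environment; the value 5 and the one-line child script are only examples.

```javascript
const { spawnSync } = require('child_process');

// Launch a child Node process with UV_THREADPOOL_SIZE already set in its
// environment, so the value is present before any work reaches the pool.
const child = spawnSync(
  process.execPath,
  ['-e', 'console.log(process.env.UV_THREADPOOL_SIZE)'],
  { env: { ...process.env, UV_THREADPOOL_SIZE: '5' }, encoding: 'utf8' }
);

console.log('child sees pool size:', child.stdout.trim());
// prints: child sees pool size: 5
```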

Finally, we will run five instances of the same pbkdf2 call again, this time with five threads in the thread pool.

// Set the pool size before the first pbkdf2 call is made.
process.env.UV_THREADPOOL_SIZE = 5;
const crypto = require('crypto');
const start = Date.now();
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('1:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('2:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('3:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('4:', Date.now() - start);
});
crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
  console.log('5:', Date.now() - start);
});

Conclusion

NodeJs is not a completely single-threaded runtime. Certain functions and components interact with its native C++ side, which allows multi-threading through a mechanism called the thread pool. By default the pool contains 4 threads, and this number can be changed as required. The work is handed over through a JavaScript bridge between Node and libuv, and this is done for tasks that are resource-intensive, time-consuming, and have robust C++ implementations.
