Single-Threaded Node.js is a Myth

11 Jul 2024

Node.js is known as a blazingly fast server platform whose single-threaded architecture uses server resources efficiently. But is it really possible to achieve that performance using only one thread? The answer might surprise you.

In this article, we will reveal all the secrets and magic behind Node.js in a very simple manner.

Process vs Thread ⚙️

Before we begin, we have to understand what a process and a thread are and discover their differences and similarities.

A process is an instance of a program that is currently being executed. Each process runs independently of others. Processes have several substantial resources:

Execution code;
Data Segment - contains global and static variables that need to be accessible from any part of the program;
Heap - dynamic memory allocation;
Stack - local variables, function arguments, and function calls;
Registers - small, fast storage locations directly within the CPU used to hold data temporarily during the execution of programs (like the program counter and the stack pointer).

A thread is a single unit of execution within a process. A process may contain multiple threads performing different operations simultaneously. All threads of a process share its execution code, data segment, and heap, but each thread gets its own stack and registers.
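To make the distinction concrete, here is a minimal sketch using Node's built-in worker_threads module. The helper name inspectWorker is hypothetical. It starts a second thread and compares what each thread reports: the process ID is the same, while the thread ID differs. Note that Node gives each worker its own JavaScript engine instance, so ordinary JavaScript objects are not shared between workers even though the threads live in one process.

```javascript
const { Worker, threadId } = require("node:worker_threads");

// Starts one worker thread and resolves with what the worker reports about itself.
function inspectWorker() {
  return new Promise((resolve, reject) => {
    // With eval: true, the string below is run as the worker's script.
    const worker = new Worker(
      `const { parentPort, threadId } = require("node:worker_threads");
       parentPort.postMessage({ pid: process.pid, threadId });`,
      { eval: true }
    );
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

inspectWorker().then((info) => {
  console.log("main thread:  ", { pid: process.pid, threadId });
  console.log("worker thread:", info);
});
```

Both lines should show the same pid, because the worker is a thread inside the same process rather than a new process.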

JavaScript is Not Threaded ❗️

To avoid misunderstanding terms, it's important to note that JavaScript itself is neither single-threaded nor multi-threaded. The language has nothing to do with threading. It's just a set of instructions for the execution platform to handle. The platform handles these instructions in its own way - whether in a single-threaded or multi-threaded manner.

I/O operations 🧮

I/O (input/output) operations are generally considered slow compared to other computer operations. Here are some examples:

write data to the disk;
read data from the disk;
wait for user input (like mouse click);
send HTTP request;
perform a database operation.

I/O's are Slow 🐢

You might be wondering why reading data from disk is considered slow. The answer lies in the physical implementation of hardware components.

Accessing the RAM is in the order of nanoseconds, while accessing data on the disk or the network is in the order of milliseconds.

The same applies to the bandwidth. RAM has a transfer rate consistently in the order of GB/s, while the disk or network varies from MB/s to optimistically GB/s.

On top of that, we have to consider the human factor. In many circumstances, the input of an application comes from a real person (like, a key press). So the speed and frequency of I/O doesn't only depend on technical aspects.

I/O's Block the Thread 🚧

I/O's can significantly slow down a program. With blocking I/O, the calling thread remains blocked, and no further operations will be executed on it until the I/O is completed.
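The difference between a blocking and a non-blocking call can be seen in a small sketch with Node's fs module. The scratch file name is an assumption made for the example. The blocking read holds the thread until it finishes, while the non-blocking read only starts the operation and lets the following code run first.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical scratch file, created only so the example has something to read.
const file = path.join(os.tmpdir(), "node-io-demo.txt");
fs.writeFileSync(file, "hello");

const order = [];

// Blocking call: nothing else runs on this thread until the read has finished.
fs.readFileSync(file, "utf8");
order.push("blocking read finished");

// Non-blocking call: the read is only started here; the callback runs later.
fs.readFile(file, "utf8", (err) => {
  if (err) throw err;
  order.push("non-blocking read finished");
  console.log(order);
});

// This line runs before the callback above, while the read is still in progress.
order.push("code after the non-blocking call");
```

The logged array ends with "non-blocking read finished", even though that read was requested before the last push.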

Create More Threads! 🤪

Okay, why not just spawn more threads inside the program and handle each request separately? Well, it seems like a good idea. Now, each client request has its own thread, and the server can handle multiple requests simultaneously.

The program needs to allocate additional memory and CPU resources for each thread. This sounds reasonable. However, a significant issue arises when threads perform I/O operations - they become idle and spend most of their time doing no useful work while still holding their memory, waiting for the operation to complete. The more threads there are, the more resources are inefficiently utilized.

On top of that, managing threads is a challenging task leading to potential issues such as race conditions, deadlocks, and livelocks. The operating system needs to switch between threads, which can add overhead and reduce the efficiency gains from multithreading.

What is the Solution? 🤔

Luckily, humanity has already invented smart mechanisms to perform these kinds of operations efficiently.

Welcome to the Event Demultiplexer. It involves a process called Multiplexing - a method by which signals are combined into one signal over a shared resource. The aim is to share a scarce resource (in our case it's CPU and RAM). For example, in telecommunications, several telephone calls may be carried out using one wire.

The responsibilities of the Event Demultiplexer are divided into the following steps:

Identify event Sources. Each source can generate events;
Register event Sources. The registration involves specifying which events to monitor for each source;
Wait for events;
Send event notification.

Important! The Event Demultiplexer is not a component or device that exists in the real world. It's more like a theoretical model used to explain how to handle numerous simultaneous events efficiently.

To understand this complex process, let's go back to the past. Imagine an old phone switchboard: it identifies and registers sources of events (phones) and waits for new events (calls). Once there is a new event (a phone call), the switchboard delivers a notification (lights up a bulb). Then, the switchboard operator reacts to the notification by checking the target phone number and forwarding the call to its desired destination.

For computers, the principle is the same. However, the role of sources is played by things such as file descriptors, network sockets, timers, or user input devices. Each source can generate events like data available to read, space available to write, or connection requests.
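The register, wait, and notify steps can be sketched as a toy model in plain JavaScript. Everything here is an illustration: ToyDemultiplexer is a hypothetical class, and EventEmitter objects stand in for real sources such as sockets. A real demultiplexer lives in the operating system, not in JavaScript.

```javascript
const { EventEmitter } = require("node:events");

// Toy model only: it mimics the responsibilities, not the OS implementation.
class ToyDemultiplexer {
  constructor() {
    this.notifications = []; // events that have occurred, in arrival order
  }

  // Identify a source and register which of its events should be watched.
  register(name, source, eventName) {
    source.on(eventName, (payload) => {
      // Send a notification: record the event instead of handling it here.
      this.notifications.push({ name, eventName, payload });
    });
  }
}

const demux = new ToyDemultiplexer();
const socketA = new EventEmitter(); // stand-ins for two network sockets
const socketB = new EventEmitter();
demux.register("socketA", socketA, "readable");
demux.register("socketB", socketB, "readable");

// While waiting, the demultiplexer does nothing. Events arrive in any order.
socketB.emit("readable", "data from B");
socketA.emit("readable", "data from A");

console.log(demux.notifications.map((n) => n.name));
```

A single object watches both sources, and the notifications come out in the order the events happened (socketB first), not in the order the sources were registered.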

Major operating systems already implement the Event Demultiplexer mechanism: epoll (Linux), kqueue (macOS and BSD), event ports (Solaris), IOCP (Windows).

But Node.js is cross-platform. To govern this entire process while supporting cross-platform I/O, there is an abstraction layer that encapsulates these inter-platform and intra-platform complexities and exposes a generalized API for the upper layers of Node.

Libuv the King 🏆

Meet libuv - a cross-platform library (written in C) originally developed for Node.js to provide a consistent interface for non-blocking I/O across various operating systems. Libuv not only interfaces with the system's Event Demultiplexer but also incorporates two important components: the Event Queue and the Event Loop. These components work together to efficiently handle concurrent non-blocking resources.

The Event Queue is a data structure where all events are placed by the Event Demultiplexer, ready to be dequeued and processed sequentially by the Event Loop until the queue is empty.

The Event Loop is a continuously running loop (not a separate OS process) that waits for events in the Event Queue and then dispatches them to the appropriate handlers.
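The interplay of the two components can be sketched in a few lines of JavaScript. This is a deliberately simplified model with hypothetical names (eventQueue, enqueue, runLoop); the real libuv loop is written in C and works through several phases rather than one flat queue.

```javascript
// Toy event queue: each entry pairs an event with the callback that handles it.
const eventQueue = [];
const handled = [];

function enqueue(event, callback) {
  eventQueue.push({ event, callback });
}

// Toy event loop: take events off the queue one at a time until it is empty.
function runLoop() {
  while (eventQueue.length > 0) {
    const { event, callback } = eventQueue.shift();
    callback(event);
  }
}

enqueue("file read", (e) => handled.push(`handled ${e}`));
enqueue("socket data", (e) => {
  handled.push(`handled ${e}`);
  // A handler may add more events; the loop picks them up on a later turn.
  enqueue("timer", (t) => handled.push(`handled ${t}`));
});

runLoop();
console.log(handled);
```

Events are handled strictly one after another, which is why a slow handler delays everything queued behind it.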

Problem Solved? 🥳

This is what happens when we call an I/O operation:

Libuv initializes the appropriate event demultiplexer depending on the operating system;
Node.js (through its JavaScript engine) runs the code, pushing each function call onto the call stack;
Node.js sequentially executes operations in the call stack. However, for I/O operations, Node.js sends them to the Event Demultiplexer in a non-blocking way. This approach ensures that the I/O operation does not block the thread, allowing other operations to be executed concurrently.
The Event Demultiplexer identifies the source of the I/O operation and registers the operation using the OS's facilities;
The Event Demultiplexer continuously monitors the source (e.g., network sockets) for events (e.g., when data is available to read);
When the event occurs (such as data becoming available to read), the Event Demultiplexer signals and adds the event with the associated callback to the Event Queue;
The Event Loop continuously checks the Event Queue and processes the event callback.

While one request is waiting on I/O, Node.js can handle another. It does not wait for a request to complete before it starts processing the next one. By default, asynchronous requests in Node.js run concurrently - they do not wait for other requests to finish before starting.
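A small sketch shows what this concurrency means in practice. The fakeRequest helper is hypothetical: it uses a timer as a stand-in for a real I/O request that takes 200 ms. Two such requests started together finish in roughly the time of one, because the single thread is free while both are waiting.

```javascript
// Hypothetical stand-in for an I/O request that takes `ms` milliseconds.
function fakeRequest(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

async function main() {
  const started = Date.now();
  // Both requests are in flight at the same time on one thread.
  const results = await Promise.all([
    fakeRequest("A", 200),
    fakeRequest("B", 200),
  ]);
  return { results, elapsed: Date.now() - started };
}

main().then(({ results, elapsed }) => {
  // The elapsed time should be close to 200 ms rather than 400 ms.
  console.log(results, `${elapsed} ms`);
});
```

If the two requests were awaited one after the other instead of through Promise.all, the total would be about the sum of both waits.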

Hooray! It seems like the problem is solved. Node.js can run efficiently on a single thread since most of the complexities of blocking I/O operations have been solved by OS developers. Thank you!

Problem is NOT Solved 🫠

But if we take a closer look at the libuv structure, we find an interesting component: a Thread Pool.

Wait, Thread Pool? What? Yes, now we've delved deep enough to answer the main question - why is Node.js not (entirely) single-threaded?

Unveiling The Secret 🤫

Okay, we have a powerful tool and OS utilities that allow us to run asynchronous code in a single thread.

But there is a problem with the Event Demultiplexer. Since its implementation differs on each OS, some kinds of I/O operations are not fully supported in an asynchronous way. It is difficult to support all the different types of I/O on all the different OS platforms. These issues are especially related to file I/O. This also has an impact on some of Node.js's DNS functions.

Not only that. There are other kinds of operations that cannot be completed asynchronously by the OS, like:

DNS operations, like dns.lookup, which relies on a blocking system call that might need to query a remote server;
CPU-bound tasks, like cryptography;
Compression (zlib).

For these kinds of cases, the thread pool is used to perform the work in separate threads (there are 4 threads by default). So, the complete Node.js architecture diagram would look like this:

Yes, your JavaScript code in Node.js runs on a single thread, but the libraries Node.js uses internally, such as libuv with its thread pool for some I/O operations, are not single-threaded.

The Thread Pool, in conjunction with the Tasks Queue, is used to handle blocking I/O operations. By default, the Thread Pool includes 4 threads, but this can be changed by setting the UV_THREADPOOL_SIZE environment variable:

UV_THREADPOOL_SIZE=8 node my_script.js
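One way to observe the pool is to start more thread-pool jobs than there are threads. The sketch below assumes crypto.pbkdf2 as the workload, since Node runs it on the libuv thread pool; the helper name runBatch, the job count, and the iteration count are arbitrary choices for illustration. With the default of 4 threads, the first four jobs tend to finish at about the same time and the remaining two noticeably later, although exact timings depend on the machine and its number of CPU cores.

```javascript
const crypto = require("node:crypto");

// Starts `count` key derivations at once and resolves with each job's
// finish time in milliseconds, measured from the start of the batch.
function runBatch(count) {
  const started = Date.now();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(
      new Promise((resolve, reject) => {
        // The hashing itself runs on a thread pool thread, not the main thread.
        crypto.pbkdf2("secret", "salt", 100000, 64, "sha512", (err) => {
          if (err) reject(err);
          else resolve(Date.now() - started);
        });
      })
    );
  }
  return Promise.all(jobs);
}

runBatch(6).then((times) => {
  console.log(`pool size: ${process.env.UV_THREADPOOL_SIZE || "4 (default)"}`);
  console.log(times);
});
```

Running the same script with a larger UV_THREADPOOL_SIZE, as in the command above, should let all six jobs run in parallel on a machine with enough cores.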

When an I/O operation cannot be performed asynchronously by the OS, the flow is similar to the one described earlier, with these key differences:

When the Event Demultiplexer identifies the source of the I/O operation, it registers the operation in the Tasks Queue;
The Thread Pool continuously monitors the Tasks Queue for new tasks;
When a new task is placed in the Tasks Queue, the Thread Pool reacts by handling it with one of the pre-defined threads asynchronously;
After finishing the operation, the Thread Pool signals and adds the event with the associated callback to the Event Queue.
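The effect of these steps can be observed from JavaScript: while a thread pool thread does the heavy work, the main thread keeps running other callbacks, and the result arrives later as an ordinary callback. The sketch below assumes zlib.gzip as the workload, since Node performs its compression on the thread pool; the helper name compressWhileTicking and the buffer size are illustrative choices.

```javascript
const zlib = require("node:zlib");

// Compresses a large buffer and counts how many small tasks the main
// thread managed to run while the compression was in progress.
function compressWhileTicking() {
  return new Promise((resolve, reject) => {
    const input = Buffer.alloc(8 * 1024 * 1024, "a");
    let ticks = 0;
    let done = false;

    // Keeps scheduling tiny tasks on the main thread until compression is done.
    const tick = () => {
      if (done) return;
      ticks++;
      setImmediate(tick);
    };

    zlib.gzip(input, (err, compressed) => {
      done = true;
      if (err) reject(err);
      else resolve({ ticks, inputBytes: input.length, compressedBytes: compressed.length });
    });
    tick();
  });
}

compressWhileTicking().then((result) => console.log(result));
```

A tick count well above 1 indicates that the main thread was not blocked while the compression ran.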

There is no magic here. I/O cannot be truly non-blocking, and there is no way to achieve that (at least for now). Data cannot be transferred faster than physical constraints allow. Nothing is perfect, so until we find ways to increase data transfer speeds at the hardware level, we use a set of optimised algorithms to perform asynchronous operations in the most efficient way possible.

Thank you for reading and have a wonderful day :)