
Is the _id Property in MongoDB 100% Unique?

MongoDB is a NoSQL database that stores data as documents grouped into collections. Every document in MongoDB has a unique _id property. If you create a document without supplying an ID, it is created with an auto-generated object ID.

Who Generates the ID?

When filling in the properties of a document, we do not need to supply the object ID ourselves. If we look the document up in MongoDB after creating it, it has an object ID that looks like this:

{
  "_id": "5f1819229fdf8a0c7c2d8c36"
}

This saves us time when creating documents. The object ID is generated on the client side by the MongoDB driver, the library that talks to MongoDB. This brings several advantages:

  • You do not need to wait for MongoDB to create a new unique identifier.
  • Applications of MongoDB are highly scalable.
  • You can create several instances of MongoDB.
  • There is no need to talk to a central place to get a unique identifier.
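To make the last point concrete, here is a minimal sketch of how an ID can be produced locally without asking a server. It is a simplified stand-in, not the actual driver code: generateId, processUnique, and counter are hypothetical names, and random bytes are assumed in place of real machine and process identifiers. The 12-byte layout it fills in is the one described in the next section.

```javascript
const crypto = require('crypto');

// Illustrative only: a simplified client-side generator, not the driver's code.
// Assumption: 5 random bytes stand in for the machine and process identifiers.
const processUnique = crypto.randomBytes(5);

// Assumption: the counter starts at a random value and increments per ID.
let counter = crypto.randomBytes(3).readUIntBE(0, 3);

function generateId() {
  const buf = Buffer.alloc(12);
  // Bytes 0-3: creation time in seconds since the Unix epoch
  buf.writeUInt32BE(Math.floor(Date.now() / 1000), 0);
  // Bytes 4-8: value identifying this machine and process
  processUnique.copy(buf, 4);
  // Bytes 9-11: counter, wrapping around after 2^24 values
  counter = (counter + 1) % 0x1000000;
  buf.writeUIntBE(counter, 9, 3);
  return buf.toString('hex');
}

const first = generateId();
const second = generateId();
console.log(first, second);
```

Everything the function needs is available inside the process, which is why several application instances can generate IDs independently.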

Properties of the Object ID

An object ID is written as 24 hexadecimal characters. Every two characters encode one byte, so the ID contains 12 bytes in total. Here is what those 12 bytes tell us.

Timestamp

The first four bytes of the object ID represent the timestamp, in seconds, of when the ID was generated. This means we often do not need a separate property such as created-at in the document, which saves time and code.

Because the timestamp is part of the object ID, we can obtain the time the document was created from the ID alone. It also means that sorting documents by _id orders them roughly by creation time, so no separate field is needed for that.

You can obtain the timestamp of an object ID using the following code:

const mongoose = require('mongoose');
// Create an object ID in memory
const id = new mongoose.Types.ObjectId();
// Get the timestamp
console.log(id.getTimestamp());
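The same value can also be recovered by hand, which shows there is nothing hidden behind getTimestamp. The sketch below assumes only that the first eight hexadecimal characters hold the creation time in seconds since the Unix epoch. objectIdToDate is a hypothetical helper, not part of Mongoose.

```javascript
// Hypothetical helper (not part of Mongoose): decode the creation time
// from the first 4 bytes (8 hex characters) of an object ID string.
function objectIdToDate(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16); // seconds since the Unix epoch
  return new Date(seconds * 1000);                 // Date expects milliseconds
}

const createdAt = objectIdToDate('5f1819229fdf8a0c7c2d8c36');
console.log(createdAt.toISOString());
```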

Machine identifier

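The next three bytes represent the machine identifier (i.e. the machine on which the ID was generated). Suppose two documents are created at the same time on different machines. These three bytes will differ, keeping the two object IDs distinct. Note that this is the original ObjectId layout; in current MongoDB versions these three bytes and the two process bytes that follow are replaced by a single 5-byte random value generated once per process, which serves the same purpose.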

Process identifier

The next two bytes represent the process identifier (i.e. the process on that machine in which the ID was generated). Suppose two documents are created at the same time on the same machine, but in different processes. These two bytes will differ, keeping the two object IDs distinct.

Counter

The last three bytes represent a counter. It is an incrementing number, similar to the auto-increment counters found in other SQL and NoSQL databases, except that each process keeps its own instead of relying on a central one (a central counter, as in SQL, may hinder scalability). Suppose two documents are created at the same time, on the same machine, and in the same process. The counter bytes will differ, keeping the two object IDs distinct.
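The four fields can be pulled apart with plain string slicing. This is a sketch that assumes the 4 + 3 + 2 + 3 byte layout described in this article; splitObjectId is a hypothetical helper, and the field names are chosen here for illustration.

```javascript
// Hypothetical helper: slice a 24-character object ID string into the four
// fields described in this article (4 + 3 + 2 + 3 bytes).
function splitObjectId(hexId) {
  if (!/^[0-9a-f]{24}$/i.test(hexId)) {
    throw new Error('expected 24 hexadecimal characters');
  }
  return {
    timestamp: parseInt(hexId.slice(0, 8), 16),  // bytes 0-3, seconds since epoch
    machine: hexId.slice(8, 14),                 // bytes 4-6
    process: hexId.slice(14, 18),                // bytes 7-8
    counter: parseInt(hexId.slice(18, 24), 16),  // bytes 9-11
  };
}

const parts = splitObjectId('5f1819229fdf8a0c7c2d8c36');
console.log(parts);
```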

The Object ID Problem

In MongoDB, a problem arises with the counter in the object ID that may limit its uniqueness.

The object ID is only unique as long as the counter does not overflow!

The counter overflow problem occurs when the counter passes its maximum value and wraps around, so that a later document can receive an object ID that is already in use. Therefore, the object ID is almost unique but not 100% unique!

Here is why: the counter is allocated three bytes, so it can represent 2^24 = 16,777,216 distinct values (roughly 16 million). If more documents than that are generated within the same second, on the same machine, and in the same process, the counter wraps around and two documents can end up with the same object ID.
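The arithmetic behind that limit can be checked in a few lines. This is only an illustration of a three-byte counter wrapping around; COUNTER_VALUES and nextCounter are hypothetical names, not driver internals.

```javascript
// A 3-byte counter has 2^(8 * 3) possible values.
const COUNTER_BYTES = 3;
const COUNTER_VALUES = 2 ** (8 * COUNTER_BYTES);

// Hypothetical helper: advance the counter, wrapping to 0 after the last value.
function nextCounter(current) {
  return (current + 1) % COUNTER_VALUES;
}

console.log(COUNTER_VALUES);                  // 16777216
console.log(nextCounter(COUNTER_VALUES - 1)); // 0, the counter has wrapped around
```

Once the counter wraps within the same second, the timestamp, machine, and process bytes are all unchanged, so the repeated counter value produces a repeated ID.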

Using Mongoose

When building applications with Node.js and Express.js, we often use Mongoose. Mongoose is an abstraction over the MongoDB driver, so when a document is created, Mongoose relies on the driver to generate the new object ID.

Creating an object ID

// Create an object ID in memory, not in the database!
const mongoose = require('mongoose');
// Recent versions require the `new` keyword here
const id = new mongoose.Types.ObjectId();
console.log(id);
// The output is a new ID, e.g. 5f1819229fdf8a0c7c2d8c36

Validating object ID

With Mongoose, you can also validate an object ID using the static isValid method:


const mongoose = require('mongoose');
// create object Id
const id = new mongoose.Types.ObjectId();
// validate object Id
const isValid = mongoose.Types.ObjectId.isValid(id);
console.log(isValid);
// Output -> true
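When an ID arrives as a string, for example from a URL parameter, a similar check can be sketched without any library. isHexObjectId is a hypothetical helper; it only checks the shape of the string (24 hexadecimal characters), not whether a document with that ID exists.

```javascript
// Hypothetical helper: true only for a string of exactly 24 hex characters.
function isHexObjectId(value) {
  return typeof value === 'string' && /^[0-9a-fA-F]{24}$/.test(value);
}

console.log(isHexObjectId('5f1819229fdf8a0c7c2d8c36')); // true
console.log(isHexObjectId('not-an-id'));                // false
```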


Conclusion

The MongoDB object ID is unique unless the counter overflows, as discussed above. As software developers, we should therefore consider the size and complexity of the system when weighing the auto-generated ID against a custom object ID.

The approach is entirely up to you to decide. For more information regarding the MongoDB Object ID, you can refer to the official documentation of MongoDB and Mongoose.

I hope this story has educated you on the uniqueness and reliability of the Object ID in MongoDB. Have fun learning and enjoy coding!



















