
Reclaiming Disk Space From MongoDB

If you have used MongoDB, you probably have noticed that it follows a default disk usage policy a bit like "take what you can, give nothing back." Here's a simple example: Let's say you have 10 GB of data in a MongoDB database, and you delete 3 GB of that data. However, even though that data is deleted and your database is holding only 7 GB worth of data, that unused 3 GB will not be released to the OS. MongoDB will keep holding on to the entire 10 GB disk space it had before, so it can use that same space to accommodate new data. You can easily see this yourself by running a db.stats():

[Figure: a db.stats() example, showing dataSize, storageSize, and fileSize]

The dataSize parameter shows the size of the data in the database, while storageSize shows the size of data plus unused/freed space. The fileSize parameter, which is essentially the space your database is taking up on disk, includes the size of data, indexes, and unused/freed space.
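To make the relationship between these numbers concrete, here is a minimal sketch that turns a db.stats() result into a short summary. The helper name summarizeDiskUsage is hypothetical, and the sketch assumes the sizes are reported in bytes and that storageSize is data plus unused space, as described above. With WiredTiger compression, storageSize can be smaller than dataSize, so treat the unused figure as a rough indicator only.

```javascript
// Hypothetical helper: summarise how much space a database holds beyond its live data.
// `stats` is assumed to have the shape returned by db.stats(), with sizes in bytes.
function summarizeDiskUsage(stats) {
  var GB = Math.pow(1024, 3);
  return {
    dataGB: stats.dataSize / GB,
    storageGB: stats.storageSize / GB,
    // Space allocated for data that is not currently holding any documents.
    unusedGB: (stats.storageSize - stats.dataSize) / GB
  };
}
```

In the mongo shell, this could be used as printjson(summarizeDiskUsage(db.stats())). For the example above (7 GB of data in 10 GB of storage), it would report 3 GB as unused.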

MongoDB is commonly used to store large quantities of data, often in read-heavy situations where data manipulation operations are comparatively rare. In this kind of situation, it makes sense to anticipate that if you had to handle a certain amount of data before, then you might have to handle a similar amount again. Nevertheless, there will be situations (your development environment, for example) where you don't want MongoDB to keep hogging all your disk space. So, how do you reclaim this disk space? Depending on your setup and the storage engine your MongoDB deployment uses, you have a few choices.

Compact

The compact command works at the collection level, so each collection in your database will have to be compacted one by one. This completely rewrites the data and indexes to remove fragmentation. In addition, if your storage engine is WiredTiger, the compact command will also release unused disk space back to the system. You're out of luck if your storage engine is the older MMAPv1 though; it will still rewrite the collection, but it will not release the unused disk space. Running the compact command places a block on all other operations at the database level, so you have to plan for some downtime.

Usage example:

db.runCommand({compact:'collectionName'})
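Because compact works on one collection at a time, compacting a whole database means looping over its collections. The following is a minimal sketch of that loop, not a definitive script: the function name compactAllCollections is hypothetical, and skipping collections whose names start with "system." is an assumption you may want to change.

```javascript
// Hypothetical helper: run compact on every collection of a database, one by one.
// `database` is assumed to be a mongo shell database object, such as `db`.
function compactAllCollections(database) {
  var results = {};
  database.getCollectionNames().forEach(function (name) {
    // Assumption: leave internal system collections alone.
    if (name.indexOf('system.') === 0) {
      return;
    }
    // Same command as the usage example above, with the collection name filled in.
    results[name] = database.runCommand({ compact: name });
  });
  return results;
}
```

In the mongo shell, printjson(compactAllCollections(db)) would run the loop against the current database and print each command's result. Keep in mind that every compact call blocks other operations while it runs.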

Repair

If your storage engine is MMAPv1, this is your way forward. The repairDatabase command is used for checking and repairing errors and inconsistencies in your data. It performs a rewrite of your data, freeing up any unused disk space along with it. Like compact, it will block all other operations on your database. Running repairDatabase can take a lot of time depending on the amount of data in your db, and it will also completely remove any corrupted data it finds.

The repairDatabase command needs free space equivalent to the data in your database plus an additional 2 GB. It can be run either from the system shell or from within the mongo shell. Depending on the amount of data you have, it may be necessary to assign a separate volume for this using the --repairpath option.
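The space requirement is simple arithmetic, sketched below so you can check it before starting a repair. The function names repairSpaceNeeded and canRepairInPlace are hypothetical, and the sketch assumes all sizes are given in bytes and that you obtain the free space figure yourself (from df, for example).

```javascript
var TWO_GB = 2 * 1024 * 1024 * 1024;

// Free space required by a repair: the size of the data plus 2 GB.
function repairSpaceNeeded(dataSizeBytes) {
  return dataSizeBytes + TWO_GB;
}

// True when the volume holding the data has enough free space for the repair.
// If this returns false, a separate volume passed via --repairpath may be needed.
function canRepairInPlace(dataSizeBytes, freeBytes) {
  return freeBytes >= repairSpaceNeeded(dataSizeBytes);
}
```

For instance, a database holding 7 GB of data would need 9 GB of free space under this rule.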

Usage examples:

In the system shell

mongod --repair --repairpath /mnt/vol1

In the mongo shell

db.repairDatabase()

In the mongo shell, with runCommand

db.runCommand({repairDatabase:1})

Resync

In a replica set, unused disk space can be released by running an initial sync on a member. This involves stopping that member's mongod instance, emptying its data directory, and then restarting it so that it reconstructs its data through replication.
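After the restart, you will usually want to confirm that the member has caught up before moving on to the next one. Here is a minimal sketch of such a check, assuming the status object has the shape returned by rs.status() (a members array whose entries carry name and stateStr). The function name hasFinishedResync and the member name in the usage note are hypothetical.

```javascript
// Hypothetical helper: report whether a resynced member is a normal secondary again.
// `status` is assumed to be the document returned by rs.status().
function hasFinishedResync(status, memberName) {
  var members = status.members || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].name === memberName) {
      // A member reports STARTUP2 while its initial sync is still running.
      return members[i].stateStr === 'SECONDARY';
    }
  }
  // Member not found in the status output.
  return false;
}
```

In the mongo shell, this could be called as hasFinishedResync(rs.status(), 'db2.example.net:27017'), where the host name is a placeholder for your own member.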
