Reclaiming Disk Space From MongoDB

If you have used MongoDB, you have probably noticed that its default disk usage policy is a bit like "take what you can, give nothing back." Here's a simple example: say you have 10 GB of data in a MongoDB database, and you delete 3 GB of it. Even though the database now holds only 7 GB of data, the unused 3 GB is not released to the OS. MongoDB keeps holding on to the entire 10 GB of disk space it had before, so that it can reuse that space for new data. You can easily see this yourself by running db.stats():

A db.stats() example, showing dataSize, storageSize, and fileSize

The dataSize field shows the size of the data in the database, while storageSize shows the space allocated for that data, including unused/freed space. The fileSize field, which is reported only by the MMAPv1 storage engine, is essentially the space your database is taking up on disk: data, indexes, and unused/freed space.
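To make the relationship between these fields concrete, here is a minimal sketch that estimates the allocated-but-unused space from a db.stats() result. The function name unusedStorageBytes is made up for illustration. The result is clamped at zero because, with WiredTiger compression enabled, storageSize can be smaller than dataSize, so treat the number as a rough indicator rather than an exact figure.

```javascript
// Rough estimate of space that is allocated for data but not holding documents.
// `stats` is the document returned by db.stats(); sizes are in bytes by default.
// unusedStorageBytes is a hypothetical helper name, not part of MongoDB.
function unusedStorageBytes(stats) {
  var unused = stats.storageSize - stats.dataSize;
  // With compressed storage, storageSize may be below dataSize; report 0 then.
  return unused > 0 ? unused : 0;
}

// In the mongo shell:
//   unusedStorageBytes(db.stats())
```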

MongoDB is commonly used to store large quantities of data, often in read-heavy situations where writes, updates, and deletes are comparatively rare. In that kind of situation, it makes sense to anticipate that if you had to handle a certain amount of data before, you might have to handle a similar amount again. Nevertheless, there will be situations (your development environment, for example) where you don't want MongoDB to keep hogging all that disk space. So, how do you reclaim it? Depending on your setup and the storage engine you're using, you have a few choices.

Compact

The compact command works at the collection level, so each collection in your database has to be compacted one by one. The command rewrites the collection's data and indexes to remove fragmentation. In addition, if your storage engine is WiredTiger, compact will also release unused disk space back to the system. You're out of luck if your storage engine is the older MMAPv1, though: it will still rewrite the collection, but it will not release the unused disk space. Running the compact command places a block on all other operations at the database level, so you have to plan for some downtime.

Usage example:

db.runCommand({compact:'collectionName'})
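Since compact has to be run once per collection, a small loop can save some typing. The following is a sketch only: compactAllCollections is a hypothetical helper, and skipping collections whose names start with "system." is an assumption you may want to change. It uses the shell's db.getCollectionNames() and db.runCommand(), and because each compact blocks other operations, it is best run during a maintenance window.

```javascript
// Run compact on every collection of the given database handle.
// Returns one { collection, ok } entry per compacted collection.
// compactAllCollections is a hypothetical helper, not a MongoDB built-in.
function compactAllCollections(database) {
  var results = [];
  database.getCollectionNames().forEach(function (name) {
    // Assumption: leave system collections alone.
    if (name.indexOf("system.") === 0) {
      return;
    }
    var res = database.runCommand({ compact: name });
    results.push({ collection: name, ok: res.ok === 1 });
  });
  return results;
}

// In the mongo shell:
//   compactAllCollections(db)
```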

Repair

If your storage engine is MMAPv1, this is your way forward. (Both MMAPv1 and the repairDatabase command were removed in MongoDB 4.2, so this section applies only to older deployments.) The repairDatabase command is used for checking and repairing errors and inconsistencies in your data. It rewrites your data, freeing up any unused disk space in the process. Like compact, it blocks all other operations on your database. Running repairDatabase can take a long time depending on the amount of data in your database, and it will also completely remove any corrupted data it finds.

repairDatabase needs free disk space equal to the size of the data in your database plus an additional 2 GB. It can be run either from the system shell or from within the mongo shell. Depending on the amount of data you have, it may be necessary to assign a separate volume for this using the --repairpath option.
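The space requirement is simple arithmetic, sketched below. Both function names are hypothetical, and treating "2 GB" as 2 × 1024³ bytes is an assumption. Which number you pass in as the data set size (for example dataSize or fileSize from db.stats()) is also your call; using the larger value is the more cautious choice.

```javascript
// Assumption: "2 GB" is interpreted as 2 * 1024^3 bytes.
var TWO_GB_BYTES = 2 * 1024 * 1024 * 1024;

// Free space (in bytes) a repair needs for a data set of the given size.
function repairSpaceNeeded(dataSetBytes) {
  return dataSetBytes + TWO_GB_BYTES;
}

// True if the target volume has enough free space for the repair.
function hasRoomForRepair(dataSetBytes, freeBytes) {
  return freeBytes >= repairSpaceNeeded(dataSetBytes);
}
```

If hasRoomForRepair returns false for the volume holding your data directory, that is the case where pointing --repairpath at a separate, larger volume becomes necessary.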

Usage examples

In the system shell

mongod --repair --repairpath /mnt/vol1

In the mongo shell

db.repairDatabase()

In the mongo shell, with runCommand

db.runCommand({repairDatabase:1})

Resync

In a replica set, unused disk space can be released by running an initial sync on a member. This involves stopping the mongod instance on that member, emptying its data directory, and then restarting it so that it rebuilds its data through replication.
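After restarting the emptied member, you will probably want to know when it has finished syncing. As a rough sketch, the helper below reads a member's state from the document returned by rs.status(). The helper name memberState and the host name in the usage comment are hypothetical. A member normally reports STARTUP2 while the initial sync is running and SECONDARY once it has caught up.

```javascript
// Look up the state string of one replica set member in an rs.status() result.
// Returns null if no member with that name is found.
// memberState is a hypothetical helper, not a MongoDB built-in.
function memberState(status, memberName) {
  var members = status.members || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].name === memberName) {
      return members[i].stateStr;
    }
  }
  return null;
}

// In the mongo shell (the host name is a placeholder):
//   memberState(rs.status(), "mongo2.example.net:27017")
```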
