Let’s Talk about MongoDB’s Compact Command

By Zhang Youdong.

To understand how the compact command and other related commands work in MongoDB, let’s work through a series of questions that tend to come up when people first encounter the command.

Stay tuned for more helpful articles and tutorials about MongoDB from the team of engineers at Alibaba Cloud ApsaraDB for MongoDB.

Why Use the Compact Command

The compact command, as its name suggests, compacts the fragmented space of a collection and reclaims physical space, which the remove command simply doesn’t do. To make things even clearer, consider the visualization shown below:

The Remove and Drop Commands and How They Compare with the Compact Command

Now let’s discuss two related commands and how they differ from each other. Before we do so, let’s see how exactly each of them deletes data:

  • db.collection.remove({}, {multi: true}): With this command, documents are removed from the B-tree one by one until all of them are gone, but the physical space of the file is not reclaimed, as shown in the figure above.
  • db.collection.drop(): With this command, the collection’s physical files are deleted, and the space is reclaimed immediately.

As you can infer from the above descriptions, the remove command produces logical free space, which can be reused for new writes right away, but the total physical space occupied by the file is not reclaimed immediately. Generally, as long as data keeps being written, this fragmentation of physical space is not a major cause for concern, and the collection does not need to be compacted.

In some scenarios, however, a large amount of data is removed and few writes follow, so the freed space stays unused. If you want to reclaim that space, you need to explicitly call the compact command. This is the case where compact, rather than remove or drop, is the right tool.
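For reference, compact is issued per collection through db.runCommand. The snippet below is a minimal sketch: the helper compactCommand and the collection name "orders" are placeholders introduced here for illustration, and the guard around db only exists so the snippet also loads outside the shell, where no global db is defined.

```javascript
// Build the command document that asks MongoDB to compact one collection.
function compactCommand(collectionName) {
  return { compact: collectionName };
}

// "orders" is a hypothetical collection name used only for illustration.
const cmd = compactCommand("orders");

// Inside the mongo shell a global `db` exists, so the command is run there.
// Outside the shell this branch is skipped.
if (typeof db !== "undefined") {
  printjson(db.runCommand(cmd));
}
```

Because the command locks and rewrites data as described in the next section, try it against a test collection first.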

How the Compact Command Affects Write Performance

The compact command has drawbacks of its own, so you’ll have to be careful when you use it. Let me explain.

When you use the compact command to compact a collection, a mutex (mutual exclusion) write lock is taken on the database where the collection is located, which may block all read/write requests to that database. The compact command may also take a long time to execute. The time it takes generally grows with the amount of data in the collection.

None of this cancels out the benefits of the compact command. Still, it is generally recommended to run it during off-peak hours to avoid interrupting your business services. This is something we recommend to all our customers.

How the Compact Command Works

The compact command is ultimately carried out by the WiredTiger storage engine. That is, when you run a compact command, WiredTiger repeatedly moves data from the back of the collection file into free space at the front, and then gradually truncates the file to reclaim physical space.

Before performing each round of compaction, WiredTiger also checks whether one of the following conditions is met.

  1. The first 80% of the file contains free space amounting to at least 20% of the file, enough to take in the data from the last 20% of the file;
  2. The first 90% of the file contains free space amounting to at least 10% of the file, enough to take in the data from the last 10% of the file.

If neither of the above conditions is met, it means that running the compact command could not reclaim even 10% of the physical space (the threshold in the second condition). In this case, the compact command exits without doing any work.
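The two conditions can be sketched as a small function. This is a simplified illustration of the check as described in this post, not WiredTiger’s actual code: the function name tailPercentToCompact, the freeExtents list of { offset, size } regions, and the sample numbers are all assumptions made for the example.

```javascript
// Simplified sketch of the pre-check described above (not WiredTiger's code).
// fileSize: total size of the collection file in bytes.
// freeExtents: hypothetical list of free regions, each { offset, size } in bytes.
// Returns the percentage of the file tail worth compacting: 20, 10, or 0.
function tailPercentToCompact(fileSize, freeExtents) {
  // Sum the free bytes that lie entirely within the first `percent` of the file.
  const freeWithin = (percent) =>
    freeExtents
      .filter((e) => e.offset + e.size <= (fileSize * percent) / 100)
      .reduce((sum, e) => sum + e.size, 0);

  if (freeWithin(80) >= (fileSize * 20) / 100) return 20; // condition 1
  if (freeWithin(90) >= (fileSize * 10) / 100) return 10; // condition 2
  return 0; // neither condition holds: nothing is compacted
}

const fileSize = 1000;
// 250 free bytes near the front of the file: condition 1 holds.
console.log(tailPercentToCompact(fileSize, [{ offset: 100, size: 250 }])); // 20
// 120 free bytes that reach past the 80% mark: only condition 2 holds.
console.log(tailPercentToCompact(fileSize, [{ offset: 700, size: 120 }])); // 10
// Only 50 free bytes in total: neither condition holds.
console.log(tailPercentToCompact(fileSize, [{ offset: 0, size: 50 }])); // 0
```

The third call corresponds to the situation where compact returns without changing the file.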

In other words, when a large collection is compacted, the compact command sometimes returns “OK” immediately while the physical space of the collection remains unchanged. This is because WiredTiger has determined that the collection does not need to be compacted.

How Much Space Is Reclaimed by the Compact Command

To know how much space the compact command can reclaim, you need to know how much empty space WiredTiger has available for reuse. This amount is reported in the output of db.collection.stats(), in the wiredTiger section, under block-manager, as file bytes available for reuse.
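As a rough sketch, the value can be pulled out of a stats document as follows. The helper reusableBytes, the collection name "orders", and the numbers in sampleStats are made up for illustration and are not real output.

```javascript
// Read the reusable-space counter from a collection stats document.
// Returns undefined when the WiredTiger block-manager section is absent.
function reusableBytes(stats) {
  const blockManager = stats.wiredTiger && stats.wiredTiger["block-manager"];
  return blockManager
    ? blockManager["file bytes available for reuse"]
    : undefined;
}

// Made-up stats fragment; the numbers are illustrative, not real output.
const sampleStats = {
  storageSize: 104857600,
  wiredTiger: {
    "block-manager": { "file bytes available for reuse": 52428800 },
  },
};
console.log(reusableBytes(sampleStats)); // 52428800

// In the shell, with "orders" as a placeholder collection name:
// reusableBytes(db.getCollection("orders").stats())
```

Comparing this value with storageSize gives a rough idea of how much of the file is reusable space.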

For other related questions, check out this FAQ page.

What to Do Before You Run the Compact Command

Before running the compact command, make sure that you have read the content covered in this blog, especially the material referenced below, and that you understand the principle and impact of the compact command.

You can read more about this here.
