Many newcomers are unclear about what MongoDB database capacity planning actually involves. To help with that, this article explains it in detail; anyone who needs it is welcome to follow along, and hopefully you will take something away.
What is MongoDB database capacity planning?
Ultimately, the storage systems we talk about are application software running on top of an operating system, and the resources the operating system can offer them are nothing more exotic than disk, memory, CPU cache, and so on. The point of capacity planning, then, is to reasonably estimate the machine's required storage capacity, performance, and system configuration from the characteristics of the data to be stored and the volume of data expected over a given period.
In practice, this usually reduces to estimating memory, disk capacity, and disk performance.
Taking MongoDB as an example, let's go through some calculation methods for capacity estimation. A common rough requirement is to keep all hot data in memory, where hot data covers frequently accessed data, indexes, and system overhead. Let's look at each of these three in turn:
Frequently accessed data
You can estimate data access according to your application scenario. For example, suppose you use MongoDB to store forum posts, each post is about 1 KB, there are currently 100 million posts, and 1 million new posts are added every day. After 3 months there will be roughly 200 million posts, requiring about 200 GB of disk space.
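As a quick sanity check of that arithmetic, here is a minimal Python sketch; the post size, existing count, growth rate, and 90-day horizon are all just the assumptions stated above:

```python
# Back-of-the-envelope disk estimate for the forum-post example.
POST_SIZE_BYTES = 1024          # ~1 KB per post (assumed above)
existing_posts = 100_000_000    # 100 million posts today
daily_new_posts = 1_000_000     # 1 million new posts per day
horizon_days = 90               # roughly 3 months

total_posts = existing_posts + daily_new_posts * horizon_days
total_gib = total_posts * POST_SIZE_BYTES / 1024**3
print(f"posts after {horizon_days} days: {total_posts:,}")  # 190,000,000
print(f"disk needed: ~{total_gib:.0f} GiB")                  # ~181 GiB, i.e. roughly 200 GB
```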
The 1 million posts added each day are accessed frequently, but the posts actively read each day might total 2 million; in other words, the other 1 million are older posts. If we budget 1 GB of memory for hot data, it can hold only 1 million posts, not 2 million. Because post access is random, the worst case is that the data requested is never in memory (for example, we first access the 1 million posts that are not in memory, loading them in, and then access the data that was just evicted to disk, which must be loaded again), so the number of disk IOs equals the number of page views. A disaster! Even in the best case we need 1 million disk IOs (for example, we repeatedly access the 1 million posts that are in memory, then repeatedly access the 1 million that are not). At a uniform access rate, that works out to roughly 12 disk IOs per second.
Now let's raise the memory planned for hot data to 2 GB, so that a full day's 2 million hot posts fit, and see what happens. In the best case, 1 million disk IOs are needed (for example, the 1 million older posts among today's 2 million were all in yesterday's hot set, so only the other 1 million have to be reloaded); at a uniform access rate, that is about 12 disk IOs per second. In the worst case, 2 million disk IOs are needed (for example, today's 2 million accessed posts do not overlap with yesterday's hot set at all); at a uniform access rate, that is about 23 disk IOs per second.
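The per-second figures come from spreading the daily IO count over the 86,400 seconds in a day; a minimal sketch of the same arithmetic:

```python
# Disk IO rate for the hot-data scenarios above, assuming accesses
# are spread uniformly across a day.
SECONDS_PER_DAY = 86_400

def io_per_second(daily_disk_ios: int) -> float:
    """Average disk IOs per second for a given daily miss count."""
    return daily_disk_ios / SECONDS_PER_DAY

print(f"1M misses/day (best case):  {io_per_second(1_000_000):.1f} IO/s")  # ~11.6
print(f"2M misses/day (worst case): {io_per_second(2_000_000):.1f} IO/s")  # ~23.1
```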
Similarly, the more memory we add, the more likely the data we access is already in memory, which reduces the disk IO frequency.
The above is a simple example; you can evaluate and calculate according to your own data access patterns, and you should consider not just the average IO but also the peak IO.
At the same time, don't forget that MongoDB periodically calls fsync to flush dirty pages from memory to disk (by default, once a minute). You can estimate the IO cost of each flush from the amount or proportion of dirty data in your workload, and then decide whether you need to lower the fsync frequency.
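That flush interval corresponds to MongoDB's syncdelay server parameter (60 seconds by default). A hedged sketch of reading and adjusting it from pymongo, assuming a local unauthenticated instance; tune it only after measuring your dirty-data volume:

```python
# Sketch: inspect and adjust MongoDB's flush interval (syncdelay).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance

# Read the current flush interval (seconds between fsync flushes).
print(client.admin.command({"getParameter": 1, "syncdelay": 1}))

# Flush less often: fewer, larger IO bursts. Adjust with care.
client.admin.command({"setParameter": 1, "syncdelay": 120})
```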
If you have also enabled journaling, that IO needs to be added on top as well.
Of course, the next question is whether your disks can withstand the final IO load, and from there whether you need faster drives, RAID, or a move to SSDs, and so on.
Index size
Unlike frequently accessed data, indexes need to reside entirely in memory, so index capacity is relatively easy to calculate. You can check the size your indexes currently occupy with MongoDB's db.stats() command. Continuing the example above, if the indexes on 100 million documents take 5 GB, then 200 million documents would need roughly 10 GB, and that 10 GB of index must fit in memory.
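A minimal pymongo sketch of that check; db.command("dbstats") returns the same document as db.stats() in the shell, and the database name here is hypothetical:

```python
# Read the current total index size and extrapolate linearly with data growth.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["forum"]                               # hypothetical database name

stats = db.command("dbstats")     # equivalent to db.stats() in the mongo shell
index_gib = stats["indexSize"] / 1024**3
print(f"current index size: {index_gib:.2f} GiB")
# Rule of thumb from the article: index size grows roughly linearly with
# document count, so doubling the documents roughly doubles the index.
print(f"projected at 2x documents: {2 * index_gib:.2f} GiB")
```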
System overhead
The cost of the MongoDB daemon itself can basically be treated as a constant, so the system overhead here is mainly connection overhead, and that depends on the characteristics of your application. For example, suppose your peak concurrency is 100 operations, i.e. 100 connections open to MongoDB at the same time. Each connection costs one thread, and each thread's stack is the system's stack size, which defaults to 10 MB, giving about 1 GB in total (you can, of course, tune this value down as appropriate). If you sort data on the fly, you also need to account for the memory consumed during sorting.
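The connection arithmetic as a short sketch; the 10 MB per-thread stack and the 100-connection peak are the assumptions from the paragraph above:

```python
# Connection-thread memory overhead: one stack per connection.
STACK_MB = 10           # assumed per-thread stack size (the article's figure)
max_connections = 100   # assumed peak concurrent connections

total_mb = STACK_MB * max_connections
print(f"connection overhead: ~{total_mb} MB (~{total_mb / 1024:.1f} GB)")  # ~1 GB
```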
Summary
Of course, the above is only a simple estimation method, and we should not expect it to produce the true capacity figures; the evolution of Internet products is never that controllable. Still, estimating the relevant capacity against the business before deployment is very important: a good estimate strikes a reasonably balanced result among money, performance, and operations cost.