
What Are the Best Practices for MongoDB?


What are the best practices of MongoDB? Many less experienced users are unsure where to start, so this article brings the key practices together in one place. I hope it answers that question for you.

Preface

As a solutions architect at MongoDB, I spend most of my time interacting with MongoDB customers and users. Here I hope to collect, in a continuously updated living article, best practices worth understanding or following in MongoDB development and operations. I sincerely hope you will also take part in maintaining this document so that more users can benefit.

About Security

Enable authentication and authorization for the MongoDB cluster

Authentication is not enabled in a default MongoDB server installation. This means that anyone can connect directly to the mongod instance and perform arbitrary database operations. It is recommended to enable authentication by following the documentation: http://docs.mongoing.com/manual-zh/tutorial/enable-authentication.html
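A minimal sketch of what this looks like in the mongod configuration file (YAML format), using the standard security.authorization option:

security:
  authorization: enabled

After restarting mongod with this setting, clients must authenticate before performing operations.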

Assign different role permissions to different users

MongoDB supports a role-based permission system. Following the principle of least privilege, you should explicitly assign each user only the permissions it actually needs.
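A minimal sketch of creating a read-only user from the mongo shell; the database, user name, and password here are hypothetical:

use reporting
db.createUser({
  user: "reportUser",       // hypothetical application user
  pwd: "aStrongPassword",   // use a strong, managed password in practice
  roles: [ { role: "read", db: "reporting" } ]   // read-only on one database
})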

Use a central authentication server

Whenever possible, use a central authentication server such as LDAP or Kerberos, together with a strong password policy.

Create a whitelist of application servers that need to access MongoDB (firewall configuration)

If your server has multiple network interfaces, it is recommended to bind the service only to the intranet IP.
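A minimal sketch in the mongod configuration file; the intranet address is illustrative:

net:
  bindIp: 127.0.0.1,10.0.0.12   # loopback plus the intranet interface only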

Use an encryption engine for sensitive data

MongoDB Enterprise Edition supports encrypted storage; an encryption engine should be used to protect sensitive customer data.

About Deployment

Use replica sets of at least three data nodes

The recommended minimum MongoDB deployment is a replica set of three data nodes (see the initiation sketch after this list). Replica sets provide the following benefits:

High system availability (99.999%)

Automatic failover

Data redundancy

Disaster recovery deployment

Read/write separation
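A minimal sketch of initiating a three-node replica set from the mongo shell; the replica set name and host names are hypothetical:

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
})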

Don't shard too early

Sharding can be used to scale your system's read and write capacity, but it also brings new challenges, such as greater management complexity, higher cost, and the difficulty of choosing a suitable shard key. In general, you should exhaust the other performance-tuning options first, such as index optimization, schema optimization, code optimization, hardware resource optimization, and IO optimization, before considering sharding.

Select the appropriate number of shards

Typical conditions that trigger sharding include:

The total amount of data is too large for a single server to manage

The concurrency is too high for a single server to handle in time

Disk IO pressure is too high

A single machine's memory is not large enough to hold the hot data

The server's network card processing capacity has become a bottleneck

You want localized reads and writes in a multi-datacenter deployment

Based on which of these conditions triggers sharding, you can estimate the required number of shards by taking the total demand and dividing it by the capacity of each server.
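As an illustrative example of that calculation (the numbers are hypothetical): if the hot data set is about 600 GB and each server can devote about 200 GB of RAM to caching it, you would need roughly 600 / 200 = 3 shards, plus some headroom for growth.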

Deploy enough replica set members for each shard

Shards do not replicate data to one another, so each shard's data must be made highly available within the shard itself. Therefore, each MongoDB shard should be a replica set of at least 3 data nodes, to ensure the shard does not become unavailable when its primary node goes down.

Select the appropriate shard key

In a sharded scenario, one of the most important considerations is selecting the appropriate shard key. Choosing a shard key requires taking the application's read and write patterns into account. Generally speaking, a shard key optimizes either writes or reads; the trade-off should be based on which operations are more frequent.

The shard key should have very high cardinality, that is, the key should have many distinct values in the collection. For example, _id is a high-cardinality shard key because _id values never repeat.

Generally speaking, the shard key should not be monotonically increasing. For example, a timestamp is a monotonically increasing key. Such a key easily creates a hot shard, where all new writes concentrate on a single shard.

A good shard key should route a query to one (or a few) shards to improve query efficiency. Generally speaking, this means the shard key should include the fields used in the most common queries.

A good shard key should also be dispersed enough that new inserts are distributed across multiple shards, increasing the concurrent write rate.

You can combine several fields into a compound shard key to serve several of these goals at once (cardinality, dispersion, query routing, and so on).
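A minimal sketch of sharding a collection on a compound shard key from the mongo shell; the database, collection, and field names are hypothetical:

sh.enableSharding("mydb")
sh.shardCollection("mydb.events", { customerId: 1, timestamp: 1 })   // customerId disperses writes; timestamp supports range queries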

About the System

Use SSD or RAID10 to improve storage IOPS capability

MongoDB is a high-performance, high-concurrency database, and most of its IO operations are random updates. Generally speaking, locally attached SSDs are the best storage solution. If you use ordinary spinning disks, RAID10 striping is recommended to improve the concurrency of the IO channels.

Use separate physical volumes for data and journal/log

Many MongoDB performance bottlenecks are IO-related. It is recommended to put the logs (journal and system log) on a separate physical volume to reduce contention with data-disk IO.

The system log location can be specified directly on the command line or in the configuration file. The journal does not support being pointed at another directory directly, but this can be worked around by replacing the journal directory with a symbolic link.
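A minimal sketch of that workaround; the paths are illustrative, and mongod must be stopped first:

# stop mongod, then move the journal onto its own volume
mv /data/db/journal /journal-disk/journal
# leave a symbolic link where mongod expects the journal
ln -s /journal-disk/journal /data/db/journal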

Use the XFS file system

MongoDB recommends the XFS file system with the WiredTiger storage engine. Ext4 is the most common choice, but it does not perform well under IO pressure because the ext4 file system's internal journaling conflicts with WiredTiger's own journaling.

Use oversized WiredTiger caches with caution

WiredTiger flushes writes to disk asynchronously: by default it takes a checkpoint every 60 seconds. A checkpoint has to traverse all the dirty data in memory, sort it, and write it to disk. If the cache is very large (for example, more than 128 GB), the checkpoint takes a long time, and data write performance suffers while it runs. It is recommended to keep the actual cache at 64 GB or below.
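A minimal sketch of capping the cache in the mongod configuration file, using the standard storage.wiredTiger.engineConfig.cacheSizeGB option (64 GB follows the recommendation above):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 64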

Disable Transparent Huge Pages

Transparent Huge Pages (THP) is a Linux memory-management optimization that reduces Translation Lookaside Buffer (TLB) overhead by using larger memory pages. MongoDB mostly performs small, scattered reads and writes, for which THP has a negative effect, so it is recommended to disable it.

http://docs.mongoing.com/manual-zh/tutorial/transparent-huge-pages.html
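A minimal sketch of disabling THP for the current boot (the linked documentation describes how to make this persistent across reboots):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag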

Enable Log Rotation

Prevent MongoDB's log file from growing without bound and taking up too much disk space. Good practice is to enable log rotation and clean up historical log files in a timely manner.

http://docs.mongoing.com/manual-zh/tutorial/rotate-log-files.html
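A minimal sketch of triggering a rotation on demand from the mongo shell (scheduling it, for example via logrotate, is covered in the linked documentation):

db.adminCommand({ logRotate: 1 })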

Allocate enough Oplog space

Enough oplog space ensures you have enough time to resynchronize a secondary from scratch, or to perform time-consuming maintenance operations on it. If your longest offline maintenance operation takes H hours, your oplog should generally be able to hold at least 2×H to 3×H hours of operations.

If your MongoDB deployment was not set up with the correct oplog size, you can adjust it by following the link below:

http://docs.mongoing.com/manual-zh/tutorial/change-oplog-size.html
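For a new deployment, a minimal sketch of setting the oplog size up front in the mongod configuration file (the value is illustrative; resizing an existing replica set follows the linked procedure instead):

replication:
  oplogSizeMB: 20480   # 20 GB, illustrative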

Disable atime for the database files

Preventing the system from updating file access times noticeably improves file-read performance. This can be achieved by adding the noatime option in /etc/fstab. For example:

/dev/xvdb /data ext4 noatime 0 0

After modifying the file, you can re-mount:

# mount -o remount /data

Increase the default file descriptor and process/thread limits

The default limits on file descriptors and processes in Linux are generally too low for MongoDB; it is recommended to set both to 64000. The MongoDB server needs one file descriptor for every database file and every client connection, so if the limit is too low, it may fail or stop responding under large-scale concurrency. You can change the limits with the following commands:

ulimit -n 64000
ulimit -u 64000
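ulimit changes apply only to the current shell session; a minimal sketch of making them persistent in /etc/security/limits.conf (the mongod service account name is hypothetical):

mongod soft nofile 64000
mongod hard nofile 64000
mongod soft nproc 64000
mongod hard nproc 64000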

Disable NUMA

On multiprocessor Linux systems that use NUMA, you should disable NUMA for MongoDB. MongoDB running in a NUMA environment can sometimes slow down, especially under high process load.
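A minimal sketch of starting mongod with memory interleaved across NUMA nodes, which is the usual way to neutralize NUMA effects (the configuration file path is illustrative):

numactl --interleave=all mongod -f /etc/mongod.conf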

Readahead setting

Readahead is a file-system optimization: when a program requests one page, the file system reads the following pages as well and returns them together. The rationale is that the most time-consuming part of IO is often the disk seek; by reading ahead, the system returns the subsequent data in advance. If the program is reading sequentially, this saves many disk seeks.

MongoDB's access pattern, however, is mostly random. For random access, the readahead value should be set low; generally speaking, 32 (512-byte sectors, i.e. 16 KB) is a good choice.

You can display the current readahead values with the following command:

sudo blockdev --report

To change the readahead value, use the following command:

sudo blockdev --setra 32 <storage device>

Replace <storage device> with the appropriate device, for example /dev/xvdb.

Use an NTP time server

When using MongoDB replica sets or sharded clusters, be sure to use an NTP time server. This keeps the clocks of the cluster members correctly synchronized.

About Monitoring and Backup

Monitor and alert on important database metrics

Key metrics include (see the shell sketch after this list):

Disk space

CPU

RAM utilization

Ops counters (inserts, deletes, updates, queries)

Replication lag

Number of connections

Oplog window
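A minimal sketch of pulling several of these metrics from the mongo shell:

db.serverStatus().opcounters    // insert/query/update/delete counters
db.serverStatus().connections   // current and available connections
rs.printReplicationInfo()       // oplog size and window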

Monitor the slow query log

By default, MongoDB logs database operations that take longer than 100 ms to the log file (mongod.log).
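A minimal sketch of tuning that threshold, and optionally enabling the profiler, from the mongo shell (the 200 ms value is illustrative):

db.setProfilingLevel(0, 200)   // keep logging only, raise the slow-op threshold to 200 ms
db.setProfilingLevel(1, 100)   // additionally record slow operations in system.profile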

About Indexing

Build the appropriate index for each of your queries

This matters for collections with large amounts of data, for example more than tens of millions of documents. Without an index, MongoDB has to read every document from disk into memory, which puts great pressure on the MongoDB server and affects the execution of other requests.

Create appropriate compound indexes; do not rely on index intersection

If your query uses multiple fields, MongoDB has two indexing techniques available: index intersection and compound indexes. Index intersection builds a single-field index on each field and combines the corresponding single-field indexes to produce the query result. However, the query planner currently chooses index intersection only rarely, so if you have multi-field queries, it is recommended to use a compound index to ensure the index is actually used.

For example, if the application needs to find all Shenzhen marathon runners under the age of 30:

db.athelets.find({ sport: "marathon", location: "sz", age: { $lt: 30 } })

Then you may need an index like this:

db.athelets.ensureIndex({ sport: 1, location: 1, age: 1 })

Compound index field order: equality conditions first, range conditions after (Equality First, Range After)

Using the example above: when a compound index serves a query that has both equality and range conditions, the equality conditions (sport: "marathon" and location: "sz") should come before the range condition (age: { $lt: 30 }) in the index definition.
