1. Why choose MongoDB?
1. Performance
In the era of big data, the ability to handle large volumes of data has become one of the most important criteria when choosing a database. Keeping database performance excellent is one of MongoDB's primary goals, and it shapes much of MongoDB's design. In the era dominated by traditional mechanical hard drives, the disk was often the performance bottleneck, so MongoDB chooses to use memory as aggressively as possible as a cache and automatically selects the fastest index for each query. MongoDB also keeps the database layer as simple as possible, pushing as much work as possible to the client, which is another reason it can maintain excellent performance.
2. Scalability
Data volumes on the Internet have grown from the MB and GB of the past to the TB level of today. A single database clearly cannot bear this load, so scalability has become an important topic. Developers, however, often struggle with how to scale: should they scale horizontally or vertically?
Horizontal scaling (scale out) splits the database into chunks and distributes them across different machines by adding partitions. Its advantage is low cost; the drawback is that it is harder to manage.
Vertical scaling (scale up), unlike horizontal scaling, upgrades the existing server to give it more computing power. Its advantage is that it is easy to manage and avoids the many problems that partitioning brings, but the disadvantage is equally obvious: high cost. A mainframe-class machine is often very expensive, and when the data eventually outgrows it, there may simply be no machine with more computing power left to buy.
MongoDB, on the other hand, chose the more economical horizontal scaling: it can easily split data across different servers, and developers do not need to think about which server holds the data when reading it. MongoDB automatically routes requests to the correct server, freeing developers from the drawbacks of horizontal scaling so they can focus on application development.
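As a rough sketch of what setting up horizontal scaling (sharding) looks like in the mongo shell, run against a mongos router; the database name mydb and the shard key on title are invented here purely for illustration:
// enable sharding for a database (hypothetical name)
sh.enableSharding("mydb")
// shard a collection on a chosen shard key; queries are then routed to the right shard automatically
sh.shardCollection("mydb.book", { "title": 1 })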
3. Ease of use
MongoDB adopts a NoSQL design, which makes data manipulation more flexible. With a traditional RDBMS you have surely run into complex SQL statements of dozens or even hundreds of lines, full of joins, subqueries and similar constructs, which not only add complexity but also make performance tuning harder. MongoDB's document-oriented design uses flexible documents as the data model in place of rows in an RDBMS. This makes data access more flexible for developers: even complex nested relationships can often be queried with a single statement, so developers no longer have to rack their brains just to get at their data.
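For instance, a single query on an embedded field can replace what would be a join in SQL. A minimal sketch; the collection and field names here (book, author.name) follow the book/author example used later in this article and are otherwise hypothetical:
// find all books written by an author named "Ding Lei", using dot notation on the embedded array
db.book.find({ "author.name": "Ding Lei" })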
2. The influence of NoSQL on traditional database design thinking.
1. Predefined schema and dynamic schema
In traditional database design, the design phase of a project requires specifying the field names and field types of each database table. If you try to insert data that does not conform to this design, the database will reject it in order to preserve data integrity.
-- Database fields: NAME, SONG
INSERT INTO T_INFO VALUES ('John', 'Come Together');              -- succeeds
INSERT INTO T_INFO VALUES ('Xiaoming', 20, 'xiaoming@111.com');   -- fails
NoSQL appends documents (similar to "rows") to a collection (similar to a "table") dynamically. No schema is defined when the collection is created, and any document can be appended to any collection. For example, we can add the following two documents to one collection:
{"name": "John", "song": "Come Together"} {"name": "Xiaoming", "age": "20", "email": xiaoming@111.com}
Documents in MongoDB are formatted much like the familiar JSON. The first document has two fields, "name" and "song", while the second has three fields, "name", "age" and "email". Inserting both would be impossible under a predefined schema, but it works under MongoDB's dynamic schema. The advantage is that we do not have to design a separate table for every small variation in fields; documents with many different field combinations can live in a single collection. The disadvantage is just as obvious: when reading the data we have to distinguish the different document shapes within the same collection, which increases the amount of application code. So at the start of the design we need to weigh the pros and cons of the dynamic schema when deciding what goes into each collection.
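A minimal shell sketch of this; the collection name t_info is only an assumption carried over from the SQL example above:
// both inserts succeed even though the documents have different fields
db.t_info.insert({ "name": "John", "song": "Come Together" })
db.t_info.insert({ "name": "Xiaoming", "age": "20", "email": "xiaoming@111.com" })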
2. Normalization and denormalization
Normalization is a concept put forward in 1970 by Edgar Codd, the inventor of the relational model. Normalization spreads data across different tables and connects them through relationships. The advantage is that later modifications do not affect related data; only the record itself needs to change.
Denormalization is the opposite idea: the data belonging to the current document is stored inside it rather than being split out.
There is no absolute right or wrong between normalization and denormalization. Normalization gives better performance when we write, modify and delete, while denormalization improves performance when we query. Of course, NoSQL has no join queries, which itself improves query performance, but we can still model relationships by storing the IDs of related documents in a collection. It is clear, though, that denormalization carries more weight in NoSQL thinking than normalization does.
3. Performance and number of users
"how can you make the software have higher performance?" I think this is a question that most developers have thought about. Performance often determines the quality of a software, if you develop an Internet product, then your product performance will be more tested, because you are facing a large number of Internet users, they are not so patient. Seriously, every second increase in page loading speed may cause you to lose some users, that is, the loading speed is inversely proportional to the number of users. So what is the loading speed that users can accept?
As shown in the figure, if the page takes more than 10s to load, the user will leave. Between 1s and 10s, you need to show some kind of progress indicator. And how fast must the page load if we show no indicator at all? Yes, 1s.
Of course, that is the product manager's point of view. What about the technician's? The more users you have, the more data you need to process, and the slower loading becomes. This is an interesting tension: if your product takes off, then what you as a technician need to do is make the software's performance grow along with the number of users, or even faster than it.
The impact of database performance on the overall performance of the software is self-evident. So how do we improve database performance when we use MongoDB?
4. Normalization and denormalization
In the project design phase, defining the purpose of the collection is a very important step for performance tuning.
From the perspective of performance optimization, collection design needs to consider the common operations on the data in that collection. For example, suppose we need to design a log collection: logs are viewed only occasionally but written very frequently, so the common operation on this collection is writing (insert, delete, modify). What if we want to store a list of cities? That is clearly a collection that is read frequently but written rarely, so the common operation is querying.
For collections that are updated frequently and collections that are queried frequently, the most important thing to pay attention to is their degree of normalization. From the discussion of normalization and denormalization above, we learned that using them sensibly matters greatly for performance. The design choices here are very flexible. Suppose we now need to store books and their authors; the association in MongoDB can be expressed in the following forms:
1. Complete separation (normalized design)
Example 1:
{"_ id": ObjectId ("5124b5d86041c7dca81917"), "title": "how to use MongoDB", "author": [ObjectId ("144b5d83041c7dca84416"), ObjectId ("144b5d83041c7dca84418"), ObjectId ("144b5d83041c7dca84420"),]}
Here we add an array of author ids to the book as a field. This is what we call normalized design: in MongoDB, data that is not directly tied to the book's primary key is pulled out into another collection, and the association is made by storing that collection's primary keys. When we want to query a book together with its authors, we first have to query the book we need, then take the author ids from it, and finally query the author collection to assemble the complete book with its authors. In this case the query performance is obviously not ideal. However, when an author's information needs to be modified, the advantage of normalized design shows: we can modify the author's document without considering the books associated with that author.
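A rough sketch of that two-step lookup in the shell; the collection names book and author are assumed from the example, not prescribed by MongoDB:
// step 1: fetch the book and its array of author ids
var book = db.book.findOne({ "title": "how to use MongoDB" })
// step 2: fetch the referenced author documents in one query
var authors = db.author.find({ "_id": { $in: book.author } })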
2. Fully embedded (denormalized design)
Example 2:
{"_ id": ObjectId ("5124b5d86041c7dca81917"), "title": "how to use MongoDB", "author": [{"name": "Ding Lei"age": 40, "nationality": "china",} {"name": "Jack Ma", "age": 49, "nationality": "china",}, {"name": "Zhang Zhaozhong", "age": 59, "nationality": "china" },]}
In this example we embed the author documents completely inside the book, so when querying the book we get all of the corresponding authors' information at once. But because an author may have written more than one book, modifying an author's information means going through every book that contains that author and updating each one.
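A rough sketch of what such an update looks like; the new age value is invented, and the positional $ operator updates the first matching array element in each matched book:
// update Ding Lei's age in every book that embeds him
db.book.update(
    { "author.name": "Ding Lei" },
    { $set: { "author.$.age": 41 } },
    { multi: true }
)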
3. Partial embedding (compromise)
Example 3:
{"_ id": ObjectId ("5124b5d86041c7dca81917"), "title": "how to use MongoDB", "author": [{"_ id": ObjectId ("144b5d83041c7dca84416"), "name": "Ding Lei"}, {"_ id": ObjectId ("144b5d83041c7dca84418") "name": "Jack Ma"}, {"_ id": ObjectId ("144b5d83041c7dca84420"), "name": "Zhang Zhaozhong"},]}
This time we extract only the most commonly used part of the author fields. When all we need is the book title and the author names, we no longer have to query the author collection; querying the book collection alone is enough.
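A minimal sketch of such a query, using a projection to return only the title and the embedded author names:
// one query returns the title and author names without touching the author collection
db.book.find({ "title": "how to use MongoDB" }, { "title": 1, "author.name": 1 })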
This approach is a compromise: it preserves query efficiency while keeping updates efficient as well. But it is clearly harder to get right than the first two, and the difficulty lies in choosing, based on the actual business, which fields to extract. As example 3 shows, a name is not a frequently modified field, so extracting it is fine; but if the extracted field is one that changes often (such as age), we will still have to update it across a large number of book documents whenever it changes.
Of the three examples above, the first has the highest update efficiency but the lowest query efficiency, while the second has the highest query efficiency but the lowest update efficiency. So in actual work we need to design the fields in a collection according to our real needs in order to achieve the highest efficiency.
5. Understand the padding factor
What is the padding factor?
The padding factor is the growth space MongoDB reserves for documents to expand into. MongoDB documents are stored contiguously, one after another, and each document is packed very tightly, as shown in the figure.
(note: figure source: "MongoDB: The Definitive Guide")
1. There is no extra room for growth between elements.
2. When a document in the list grows, the space allocated to it is no longer sufficient, and the document has to be moved further back.
3. Once documents start being moved because of modifications, subsequently inserted documents are given a certain amount of padding so that frequently modified documents have room to grow. If no more documents are moved because of growth, the padding given to subsequently inserted documents shrinks accordingly.
Understanding the padding factor matters because moving a document costs performance, and frequent moves place a heavy burden on the system. In practice, the thing most likely to make a document grow is an array, so if our documents will be modified frequently and grow in size, we must take the padding factor fully into account.
So how can we improve performance if our documents grow frequently?
Two schemes
1. Increase the initially allocated space. Set the usePowerOf2Sizes property on the collection; when this option is true, the space initially allocated for subsequently inserted documents is rounded up to a power of 2.
This allocation mechanism suits collections whose data changes frequently, because it leaves each document more room to grow, but the space allocation is not as efficient as before: if the documents in your collection are rarely moved during updates, this allocation will make writes comparatively slower.
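As a sketch, this option is switched on for an existing collection with the collMod command in older (MMAPv1-era) MongoDB versions; the collection name book is just the one from the example in this article:
// turn on power-of-2 record allocation for an existing collection
db.runCommand({ collMod: "book", usePowerOf2Sizes: true })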
2. We can pre-pad documents with dummy data to forcibly expand the initially allocated space.
db.book.insert({
    "name": "MongoDB",
    "publishing": "Tsinghua University Press",
    "author": "john",
    "tags": [],
    "stuff": "ggggggggggggggggggggggggggggggggggggg"
})
Yes, it may not look very elegant, but sometimes it works! When this document later needs to grow, we simply delete the stuff field to free the reserved space. Of course, you can name the stuff field whatever you like, and the padding characters inside it can be anything as well.
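A rough sketch of reclaiming that reserved space once real data arrives; the field names follow the example above and the pushed tag value is invented:
// add the real data and drop the padding field in one update
db.book.update(
    { "name": "MongoDB" },
    { $push: { "tags": "database" }, $unset: { "stuff": "" } }
)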
6. Use indexes accurately
I am sure you understand the impact indexes have on a database. If a query arrives and the query optimizer cannot find a suitable index, the database performs a full collection scan (known as a full table scan in an RDBMS), and the impact of a full collection scan on performance is catastrophic.
A query without an index is like looking up a word in a dictionary that has no table of contents and no ordering: you can only search page by page. Such a search might take you hours, and if you were asked to look up words as often as users visit a site, I am sure you would shout "I quit!". Obviously a computer will not shout like that; it is always a diligent employee and will complete the task no matter how harsh the request. So please be kind to your computer and give it an index. :D
The index types in MongoDB are roughly the same as in an RDBMS, so we will not repeat them here. Let's look instead at how to use indexes more efficiently in MongoDB.
6.1 The fewer indexes, the better
Indexes can greatly improve query performance, so is it better to have as many as possible? The answer is no: the fewer indexes that still meet your needs, the better. Every time you build an index, the system adds an index structure for the specified column, and whenever you insert into or modify an indexed column, the database has to reorder that index. Reordering is expensive; with only a few indexes the pressure is manageable, but with a large number of indexes the impact on performance is easy to imagine. So we should create indexes carefully and make each one earn its keep: as long as the query needs are met, the fewer indexes, the better.
Implicit index
// build a compound index
db.test.ensureIndex({"age": 1, "no": 1, "name": 1})
With this index we can sort quickly on the age and no fields when querying. An implicit index means that if the fields we want to sort by form a prefix of an already built compound index, there is no need to build another index. For example:
db.test.find().sort({"age": 1, "no": 1})
db.test.find().sort({"age": 1})
Both of the sort queries above can use the existing compound index; there is no need to build a new one.
Reversed index direction
// create an index on a single field
db.test.ensureIndex({"age": 1})
Reversed index direction is easy to understand: when sorting a query we do not need to worry about the direction of the index column, because MongoDB can traverse an index in either direction. In this case, for example, we can write the sort condition as {"age": -1} and it still uses the index, with no impact on performance.
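A quick sketch of this with the single-field index above:
// both sorts can use the index on age, walking it forward or backward
db.test.find().sort({"age": 1})
db.test.find().sort({"age": -1})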
6.2 The finer the granularity of the index column, the better
What does finer granularity mean? How often each value repeats in an indexed column is its granularity, which is related to the cardinality (selectivity) of the index. If values repeat too often, the index cannot perform as well as it should. For example, suppose we have an index on the "age" column and 20-year-olds make up 50% of the rows. If we now want to find a 20-year-old named "Tom", we still have to examine 50% of the data in the collection, and the benefit of the index is greatly reduced. So when building an index we should try to put the columns with the finest granularity (the fewest repeated values) on the left side of a compound index, to ensure the index does as much work as possible.
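A rough sketch of that ordering, assuming name values repeat far less often than age values; the collection name person and the field names are purely illustrative:
// put the more selective column (name) before the less selective one (age)
db.person.ensureIndex({"name": 1, "age": 1})
// a query like this can now narrow down by name first before filtering on age
db.person.find({"name": "Tom", "age": 20})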
Summary
The above is the MongoDB optimization guide introduced here. I hope it is helpful to you. If you have any questions, please leave a message and the editor will reply to you in time. Thank you very much for your support!