What are the MongoDB design and naming conventions? This article walks through them in detail, with analysis and suggested practices, in the hope of helping readers who face this question find a simple, workable approach.
1. Databases
1. Database names must be all lowercase; any special characters other than `_` are prohibited, as are names beginning with a digit, such as `123abc`.
2. Each database is stored as a directory on disk, so special characters or other irregular naming can cause confusion at the filesystem level.
3. A database name may be at most 64 characters long.
4. Before creating a new database, evaluate its expected volume, QPS, and so on, and discuss with the DBA in advance whether a dedicated database, or even a dedicated cluster, should be created for it.
Case: a developer was handed a MongoDB instance by the DBA. Because MongoDB's permission control is loose, the business developers never discussed collection creation with the DBA and simply created every collection in one database. At first there was no problem because request volume was low. Half a year later the business had grown considerably, and the developers launched a new project with heavy write traffic, mostly batch updates. Because every collection lived in the same database, the batch updates of the new project caused frequent locking and elevated I/O. In the end, development and the DBA worked together to split the database into several new ones, turning "one database with N collections" into separate single-purpose databases, and the performance problem was resolved easily.
2. Collections
1. Collection names must be all lowercase; any special characters other than `_` are prohibited, names beginning with a digit such as `123abc` are prohibited, and names beginning with `system` are prohibited, since `system.` is the reserved prefix for system collections.
2. A collection name may be at most 64 characters long.
3. Heavy writes to a large collection affect the read and write performance of the other collections in the same database. If the busier collections share one DB, a maximum of about 80 collections is recommended, and disk I/O capacity should also be taken into account.
4. If a single collection is expected to hold a large amount of data, you can split the large table into several smaller tables and store each of them in a separate database, or shard the table.
5. MongoDB collections support automatic cleanup of expired data: simply add a TTL index on a date field of the documents. Note that the field must be of the BSON Date type (for example a value created with `new Date()`); whether to use this must be decided according to the actual business design.
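A minimal sketch of such a TTL index, assuming a hypothetical `logs` collection whose documents carry a `created_at` Date field (all names are illustrative):
// the field must hold a real Date, not a string or a numeric timestamp
db.logs.insertOne({ msg: "user login", created_at: new Date() })
// documents expire 7 days (604800 seconds) after created_at
db.logs.createIndex({ created_at: 1 }, { expireAfterSeconds: 604800 })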
6. Decide whether a polling/log-style collection should be designed as a capped collection strictly according to the actual business requirements.
7. Rules for creating collections
Different business scenarios call for different storage-engine configurations.
a. For a read-heavy, write-light table whose data volume is not too large, the page size can be set relatively small when the table is created, for example 16KB:
"internal_page_max=16KB,leaf_page_max=16KB,leaf_value_max=8KB,os_cache_max=1GB"
b. For a read-heavy, write-light table with a large data volume, a compression algorithm can also be configured for it, for example:
"block_compressor=zlib,internal_page_max=16KB,leaf_page_max=16KB,leaf_value_max=8KB"
c. Note: do not use the zlib compression algorithm carelessly; it is CPU-intensive. snappy costs roughly 20% CPU, whereas zlib can cost 90% or even 100% CPU.
d. For a write-heavy, read-light table, leaf_page_max can be set to 1MB and a compression algorithm enabled; you can also set an os_cache_max value to limit the table's page cache usage at the operating-system level, so that it does not occupy too much page cache and hurt read operations (a sketch for this case appears after the defaults list below).
e. Case
db.createCollection(
    "logs",
    { storageEngine: { wiredTiger: { configString: "internal_page_max=16KB,leaf_page_max=16KB,leaf_value_max=8KB,os_cache_max=1GB" } } }
)
f. Description
Read-heavy, write-light table:
internal_page_max=16KB (default 4KB)
leaf_page_max=16KB (default 32KB)
leaf_value_max=8KB (default 64MB)
os_cache_max=1GB (default 0)
Read-heavy, write-light table with a large data volume:
block_compressor=zlib (default snappy)
internal_page_max=16KB (default 4KB)
leaf_page_max=16KB (default 32KB)
leaf_value_max=8KB (default 64MB)
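For the write-heavy, read-light case described in item d above, a hedged sketch (the collection name and the exact sizes are illustrative assumptions, not values from the original text):
db.createCollection(
    "events",
    { storageEngine: { wiredTiger: { configString: "block_compressor=snappy,leaf_page_max=1MB,os_cache_max=1GB" } } }
)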
3. Documents
1. Keys in a document must not contain any special characters other than `_`.
2. Store documents of the same type in one collection and spread documents of different types across different collections. Documents of the same type greatly improve index utilization; if documents are mixed together, queries may often require full-collection scans.
3. Do not misuse `_id`, for example by writing custom content into `_id`.
Case: a business's MongoDB had a serious write performance problem, roughly as follows: I/O was saturated once writes reached about 300 per second. Investigation found that, for convenience, the application had been writing unordered md5-like data into `_id`. A MongoDB collection, like an InnoDB table, is an index-organized table: data is stored clustered around the primary key, and `_id` is MongoDB's default primary key. Once the value of `_id` is not monotonically increasing, every write may cause a substantial reorganization of the primary-key B-tree when the data volume reaches a certain size, making writes very expensive, so write throughput drops as the data volume grows. Be sure never to write custom content into `_id`.
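A hedged sketch of the recommended pattern: keep the default ObjectId as `_id` and put the business key (for example an md5 value) in its own indexed field. The collection and field names below are illustrative assumptions:
// let MongoDB generate a monotonically increasing ObjectId for _id
db.items.insertOne({ item_md5: "9e107d9d372bb6826bd81d3542a419d6", payload: "..." })
// look the document up through a secondary unique index instead of a custom _id
db.items.createIndex({ item_md5: 1 }, { unique: true, background: true })
db.items.find({ item_md5: "9e107d9d372bb6826bd81d3542a419d6" })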
4. Try not to use array fields as query conditions.
Case: a business created an index on an array field of a table and afterwards found that the table had grown considerably; troubleshooting showed a dramatic increase in index volume. In MongoDB, if you add an index to an array field, MongoDB automatically creates an index entry for every element of the array. For example, adding the index {a:1} to documents with the array field {a: [x, y, z]} effectively creates the index entries:
{a: x}
{a: y}
{a: z}
The array field in that business had 11 elements, so 11 index entries were created per document at once, which is the root cause of the dramatic growth in index size. Furthermore, if a compound index contains an array field, MongoDB creates a separate index entry for the combination of each array element with the other fields. For example, adding the index {a:1, b:1} to documents with the array field {a: [x, y, z]} and the field {b: qqq} effectively creates the index entries:
{a: x, b: qqq}
{a: y, b: qqq}
{a: z, b: qqq}
If a compound index contained two array fields, the number of index entries would be the Cartesian product of the elements of the two arrays, which is why MongoDB does not allow more than one array field in a single index.
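A hedged sketch illustrating the multikey-index behaviour described above (collection and field names are assumptions for illustration):
db.tags.insertOne({ a: ["x", "y", "z"], b: "qqq" })
// multikey index: one index entry is generated per element of the array
db.tags.createIndex({ a: 1 })
// compound multikey index: one entry per combination of an element of a with the value of b
db.tags.createIndex({ a: 1, b: 1 })
// two array fields in one compound index are not allowed: writing a document in which
// both indexed fields are arrays is rejected with a "cannot index parallel arrays" error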
5. If a field is large, compress it before storing it whenever possible.
Case: a service ran normally after launch, but once its volume tripled, the MongoDB server started raising network-traffic and I/O pressure alarms. The investigation found that the service stored a very long text field in MongoDB with an average size of about 7KB. At 2000 QPS, fetching 1 to 20 documents per request, MongoDB was pushing out nearly 100MB of data per second. For a database, both reads and writes are random I/O, so under such high data throughput the I/O reached the alarm threshold.
Because the text compressed well, the field was compressed before storage, reducing its average size to about 2KB, with decompression handled on the application side; throughput finally dropped to about 20MB/s.
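A hedged sketch of this compress-on-write, decompress-on-read pattern using the Node.js driver and zlib (connection string, collection and field names are assumptions):
const { MongoClient } = require("mongodb");
const zlib = require("zlib");

async function saveAndLoad(longText) {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const col = client.db("demo").collection("articles");

  // compress on the application side before writing; the Buffer is stored as BSON binary
  await col.insertOne({ docId: 1, body: zlib.deflateSync(Buffer.from(longText, "utf8")) });

  // decompress on the application side after reading
  const doc = await col.findOne({ docId: 1 });
  const text = zlib.inflateSync(doc.body.buffer).toString("utf8");

  await client.close();
  return text;
}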
If a field is large and will also be used as a query condition, for example a very long url, convert it to an md5 value and store that instead whenever possible.
Case: a stress test before a business went live found that query performance in one scenario was unsatisfactory. The query condition in that scenario looked like {url: xxxx}, most values of the url field were very long, and the field averaged about 0.5KB. In that situation the index becomes very large, so even though the request uses the index, efficiency is not ideal. The DBA and the developers therefore optimized the scenario together:
1. The field content was changed from the real url to the md5 of the url content, shrinking the field to a fixed 32 characters.
2. At query time the user still queries by url; the program computes the md5 of the url and queries with that value. Because the field is much smaller, query speed improved greatly. A second stress test after the optimization showed performance up to standard, about 6 times better than before.
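A hedged sketch of that pattern in Node.js (function, collection and field names are illustrative assumptions):
const crypto = require("crypto");

// store the 32-character hex md5 of the url instead of the raw url
function urlKey(url) {
  return crypto.createHash("md5").update(url).digest("hex");
}

// write: db.pages.insertOne({ url_md5: urlKey(url), title: "..." })
// index: db.pages.createIndex({ url_md5: 1 })
// read:  db.pages.find({ url_md5: urlKey(url) })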
6. MongoDB is case-sensitive. If a field does not need to distinguish case, store it in a uniform case (for example all lowercase) to improve query efficiency, or add an auxiliary field with a uniform case alongside it.
Case: a business needed to query by a field, e.g. {a: XxX}. In MongoDB the value of a is case-sensitive and cannot be configured to ignore case, but the business scenario required case-insensitive matching. To bridge this gap the business used a regular expression, {a: /xxx/i}, where the i flag makes the regex case-insensitive. After launch, query performance turned out to be very poor: in a collection of 2 million documents a single query took 2.8 to 7 seconds, and at 50 QPS the CPU of the server hosting the MongoDB instance reached 973%.
When MongoDB uses a regular expression in a query condition, it can use an index just like an ordinary exact match and query efficiently. But once the i flag is used to ignore case, the query optimizer has to normalize the case of every value before matching it, and the request degenerates into a full collection scan, which is the root cause of the inefficiency.
For this scenario, create a new field with a uniform case, for example all lowercase: suppose the original field is {a: aAbB}; add a corresponding all-lowercase field {a_low: aabb} and query through a_low to get an exact match (see the sketch below). With this improved scheme the query time for the scenario dropped to 2 milliseconds; although the new field makes the instance somewhat larger, the large performance gain makes it worthwhile.
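A hedged sketch of the auxiliary lowercase field (the pipeline-style update requires MongoDB 4.2+; collection and field names are illustrative):
// backfill a_low for existing documents (MongoDB 4.2+ pipeline update)
db.t.updateMany({}, [{ $set: { a_low: { $toLower: "$a" } } }])
// index the auxiliary field and always query with a lowercased value
db.t.createIndex({ a_low: 1 }, { background: true })
db.t.find({ a_low: "xxx" })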
7. Do not store overly long strings. If the field is a query condition, make sure its value does not exceed 1KB.
8. MongoDB indexes only support fields smaller than 1KB; if the stored value is longer than 1KB, it cannot be indexed.
4. Indexes
1. MongoDB's compound index strategy is consistent with MySQL's and follows the "leftmost prefix" principle.
2. An index name should not exceed 128 characters.
3. Evaluate the query scenarios as comprehensively as possible and, combining points 1 and 2, merge single-column indexes into compound indexes wherever you can to reduce the number of indexes.
MongoDB's compound index rules, like MySQL's, follow the leftmost-prefix principle. Suppose a compound index is {a:1, b:1, c:1}; then queries with the following conditions can use it:
{a:1}
{a:1, b:2}
{a:1, b:2, c:3}
The index is not usable for queries with the following conditions:
{b:1}
{b:1, c:2}
{c:2}
In addition, this principle can be used when designing indexes to reduce their number. If you need to query by {a:xxx} or by {a:xxx, b:xxx}, then create the single index:
{a:1,b:1}
It satisfies both query scenarios at the same time, without the need for a separate {a:1} index.
4. When creating a compound index, evaluate the fields it contains and try to put fields with high cardinality (many distinct values) at the front of the compound index.
Case: a query in a business scenario was very slow, taking about 1.7 seconds, and needed tuning. The query and the corresponding index were as follows:
Query: {name:baidu,status:0}
Index: {status:1,name:1}
At first glance there is no problem, because the query matches the index well. But analysis of the collection showed that it held 1.5 million documents in total, of which 1,499,930 had status=0. That is roughly 99% of the documents (very low cardinality), so although the index was used, MongoDB still had to look for name=baidu among about 1.49 million rows. The name field, by contrast, has many distinct values (high cardinality), so if the compound index is reordered to put name first, the query can extract far fewer documents via name and then filter them by status, which is fast:
{name:1, status:1} — after the adjustment, query time dropped to 3~5 milliseconds.
5. When the data volume is large, creating a MongoDB index is a slow process, so evaluate as far as possible before going live, or before the data volume grows, and create the indexes that will be needed in advance.
6. MongoDB supports TTL indexes, which can automatically delete data older than XXX seconds as required; try to have such deletions happen during business troughs. Consider whether the business needs this type of index.
7. If the data you store is geographic location information, such as longitude/latitude data, you can add one of the geospatial indexes supported by MongoDB, 2d or 2dsphere, to the field. They are different, and mixing them up will lead to inaccurate results.
2d: can only index points; suitable for flat (planar) maps and continuous data, such as game maps. 2dsphere: supports points, lines, and polygons; suitable for earth-surface (spherical) maps. If a 2d index is used for data on the surface of a sphere, there will be large distortions near the poles and the results will be inaccurate.
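A hedged sketch of a 2dsphere index and a proximity query (collection name and coordinates are illustrative assumptions):
db.places.createIndex({ location: "2dsphere" })
db.places.insertOne({ name: "cafe", location: { type: "Point", coordinates: [116.40, 39.90] } })
// find documents within 1000 meters of the given point
db.places.find({
    location: {
        $near: {
            $geometry: { type: "Point", coordinates: [116.40, 39.90] },
            $maxDistance: 1000
        }
    }
})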
8. MongoDB's full-text index is still at an "experimental" stage and its performance is not ideal, so it is not recommended for now.
9. Starting from MongoDB 2.4, indexes support an ICP-style capability, which can be used to reasonably reduce the number of indexes.
Starting with MongoDB 2.4, compound indexes can be used more flexibly. For example:
Suppose the existing compound index is {x:1, y:1} and a query filters on x and z. If the cardinality of the x field is very high and the condition on x matches very little data, there is no need to add a dedicated index: the existing {x:1, y:1} index can already deliver acceptable performance. Note, however, that this is not as efficient as a compound index that natively covers the query; if the efficiency turns out to be poor, a separate {x:1, z:1} index is still needed.
10. Indexes should be created in the background to avoid blocking normal DML and queries.
db.works.createIndex({ plan: 1, trainingpoints: 1, cmsOrder: 1, stateValue: 1 }, { background: true })
a. Add a unique index
db.bodys.createIndex({ user: 1, date: -1 }, { unique: true, background: true }) — from version 3.2, a unique index must be added this way
b. Add an index that combines an array field with other columns
db.antis.createIndex({ "action.last": 1, "type": 1 }, { background: true })
Where action.last is an array
c. TTL index: on the field create_date, data is automatically cleaned up after 180 days.
db.orders.createIndex({ "create_date": 1 }, { "expireAfterSeconds": 15552000 })
d. Case description
Create an index on location and status in order to quickly serve the query "unprocessed orders in a given place"; it is a multi-condition query, so a compound index is used.
The status field comes first because most queries rely on the status field
Db.order.createIndex ({"status": 1, "delivery.city": 1, "delivery.address": 1})
Another way to speed up the query in this demo is to create a partial index that contains only documents in a specified state.
For example, only documents whose status is delivering are added to the index, which effectively controls the index size and speeds up the query.
db.order.createIndex({ "delivery.city": 1, "delivery.address": 1 }, { partialFilterExpression: { status: { $eq: "delivering" } } })
11. Suggested field order when creating an index: equality conditions first, then sort fields, then range conditions.
5. Operations and performance
1. The -1 and 1 in an index definition are different: one means descending order, the other ascending. Choose the sort order that suits your business scenario. Note that an index can be traversed in both directions, so {a:1, b:-1} behaves the same as {a:-1, b:1}.
2. Check your program's performance as much as possible while developing the business. You can use the explain() function to inspect query execution details; in addition, the hint() function is the equivalent of MySQL's force index().
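A hedged sketch of both calls, reusing the order collection and index from the earlier examples:
// inspect how a query actually executes: chosen index, documents examined, time taken
db.order.find({ status: "delivering", "delivery.city": "beijing" }).explain("executionStats")
// force a particular index, the MongoDB counterpart of MySQL's force index()
db.order.find({ status: "delivering", "delivery.city": "beijing" }).hint({ "status": 1, "delivery.city": 1, "delivery.address": 1 })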
3. Some $ operators in queries lead to poor performance and should be avoided in business code where possible: $ne, $not, $exists, $nin, $or.
A. $exists: because the document structure is loose, the query has to traverse every document
B. $ne: if the negated value accounts for the majority, the whole index will be scanned
C. $not: may prevent the query optimizer from knowing which index to use, so it often degenerates into a full collection scan
D. $nin: full collection scan
E. $or: there are as many query branches as there are conditions, and the result sets have to be merged at the end, so use $in instead whenever possible (see the sketch below)
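A hedged sketch of rewriting an $or over a single field into $in (field and values are illustrative):
// avoid: db.order.find({ $or: [{ status: 1 }, { status: 2 }, { status: 3 }] })
// prefer:
db.order.find({ status: { $in: [1, 2, 3] } })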
4. If the documents have a fixed size or a fixed count, it is recommended to create a capped collection, which has very high write performance and needs no special cleanup of old data. Note that a capped collection does not support removing individual documents, and updates must not change a document's size.
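A hedged sketch of creating a capped collection (the name and the limits are illustrative assumptions):
// at most 100MB and 100,000 documents; the oldest documents are overwritten automatically
db.createCollection("op_logs", { capped: true, size: 100 * 1024 * 1024, max: 100000 })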
5. When writing data, if you need functionality similar to MySQL's INSERT ... ON DUPLICATE KEY UPDATE, you can use an upsert:
db.analytics.update(
    { "url": "/blog" },
    { "$inc": { "visits": 1 } },
    true
)
The third parameter indicates that this is an upsert.
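In newer shells and drivers, a hedged equivalent using the options-object form would be:
db.analytics.updateOne(
    { url: "/blog" },
    { $inc: { visits: 1 } },
    { upsert: true }
)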
6. Do not sort too much data at once. MongoDB currently supports in-memory sorting of result sets up to 32MB; if sorting is needed, try to limit the amount of data in the result set.
7. MongoDB's aggregation framework is very convenient: complex statistical queries can be implemented with simple syntax, and the performance is good.
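A hedged sketch of a simple statistical query with the aggregation framework, reusing the order collection from the earlier examples:
// count delivering orders per city, largest first
db.order.aggregate([
    { $match: { status: "delivering" } },
    { $group: { _id: "$delivery.city", total: { $sum: 1 } } },
    { $sort: { total: -1 } }
])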
8. If you need to clear all the data in a collection, remove() performs very poorly; drop() should be used in this scenario.
remove() deletes document by document, so it performs poorly when deleting large amounts of data.
9. You can use batch inserts when writing large amounts of data, but note that MongoDB currently accepts messages of at most 48MB each; anything larger is automatically split into multiple 48MB messages.
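A hedged sketch of a batch insert (the driver or shell splits oversized batches into multiple messages automatically; names are illustrative):
db.logs.insertMany(
    [
        { msg: "a", created_at: new Date() },
        { msg: "b", created_at: new Date() }
    ],
    { ordered: false }  // unordered inserts let the server continue past individual failures
)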
10. When an array field is used as a query condition, the query cannot be covered by the index.
This is because the array itself is stored in the index, so even if the array field is excluded from the fields to be returned, the index still cannot cover the query.
11. If a query contains a range condition, try to narrow it with equality (fixed-value) conditions as well, and when creating the index put the equality-matched fields before the range-queried fields (see the sketch below).
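A hedged sketch of the equality-before-range rule, with illustrative field names:
// equality field first, range field last
db.order.createIndex({ status: 1, create_date: 1 })
db.order.find({ status: "delivering", create_date: { $gte: ISODate("2020-01-01") } })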
That covers the MongoDB design and naming conventions. Hopefully the content above is of some help; if questions remain, further study of the topic is recommended.