2025-01-22 Update From: SLTechnology News & Howtos
Brief introduction
MongoDB is a database based on distributed file storage, written in C++ and designed to provide scalable, high-performance data storage for web applications.
MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the most similar to a relational database. It supports a very loose data structure in BSON, a binary format similar to JSON, so it can store complex data types. Mongo's most important feature is its powerful query language, whose syntax resembles an object-oriented query language; it can express most of what a single-table query in a relational database can, and it also supports indexing.
Features:
* Collection-oriented storage: object-type data is easy to store.
* Schema-free.
* Dynamic queries are supported.
* Full indexing is supported, including on embedded objects.
* Querying is supported.
* Replication and failure recovery are supported.
* Efficient binary data storage, including large objects such as video.
* Automatic sharding to support scalability at cloud-computing scale.
* Drivers for Golang, Ruby, Python, Java, C++, PHP, C#, and other languages.
* The storage format is BSON (an extension of JSON).
* Accessible over the network.
I. Naming rules
1. MongoDB version selection
New installations default to MongoDB 3.x Community Edition; version 3.2.10 or later is recommended.
2. Database naming rules
The database name can be any UTF-8 string that meets the following conditions:
(1) No special characters other than "_" may appear.
(2) It cannot contain " " (space), ".", "$", "/", "\", or "\0" (the null character).
(3) It should be all lowercase.
(4) It is at most 30 characters long.
(5) Database names must not begin with a digit.
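As a rough sketch, the database naming rules above can be collapsed into a single regular-expression check. The helper name and the regex are illustrative assumptions, not part of MongoDB:

```python
import re

# Lowercase letters, digits and "_" only, no leading digit, at most
# 30 characters; this also excludes space, ".", "$", "/", "\" and
# "\0". Illustrative helper, not a MongoDB API.
DB_NAME_RE = re.compile(r"^[a-z_][a-z0-9_]{0,29}$")

def is_valid_db_name(name: str) -> bool:
    return bool(DB_NAME_RE.match(name))
```

For example, "user_info" passes, while "1db", "My DB", or a 31-character name are rejected.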
3. Collection naming rules
The collection name can be any UTF-8 string that meets the following conditions:
(1) The collection name cannot be the empty string ""; no special characters other than "_" may appear, and names starting with a digit are prohibited.
(2) The collection name cannot begin with "system.", which is the prefix reserved for system collections. For example, the system.users collection holds the database's user information, and the system.namespaces collection holds information about all of the database's collections.
(3) User-created collection names cannot contain the reserved character "$". Do not put "$" in a name unless you intend to access a system-created collection.
(4) Collection names should be concise and clear, and lowercase wherever possible.
4. Field naming rules
(1) A field name cannot contain "\0" (the null character).
(2) Field names beginning with a digit are prohibited.
(3) A field name should not begin with "_", and no special characters other than "_" may appear.
(4) A field reference should be the referenced collection name plus the field name. For example, when the key id of the collection user is referenced in the collection user_info, use user_id as the key name.
(5) Only in references should the first letter of the collection name contained in the field be capitalized; everything else stays lowercase.
(6) If a field's value is large, it should be compressed as much as possible.
(7) If a field's value is large and will be used as a query condition (for example, a long URL), store its MD5 digest instead and query on that.
(8) Customizing the value of _id is prohibited.
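The long-URL rule above can be sketched with the Python standard library; the helper name url_key and the field name url_md5 are illustrative assumptions:

```python
import hashlib

def url_key(url: str) -> str:
    """Reduce a long URL to a fixed 32-character MD5 hex digest so a
    short, indexable value is stored and queried instead of the raw URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

# Store the digest alongside (or instead of) the long value:
doc = {"url_md5": url_key("https://example.com/a/very/long/path?q=1")}
```

The digest is always 32 hex characters, so the index key stays small regardless of how long the URL grows.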
II. Database design specification
1. General rules
(1) Plan capacity sensibly and split at the library level: when creating a new database, estimate in advance its number of collections, storage capacity, QPS, etc., and decide whether it should be placed in an existing cluster or a newly deployed one.
(2) Avoid putting all collections in the same database; doing so leaves too many collections in one library.
(3) Do not write business data into the _id field.
The _id field of MongoDB is the primary key by default, similar to the primary key of a MySQL InnoDB table. The collection is stored as a B+Tree, so if the business writes unordered data (such as a UUID or MD5 digest) into _id, the internal storage structure must constantly rebalance to keep the tree balanced. This makes writes expensive and can easily lead to poor write performance.
(4) MongoDB data is case-sensitive. If the business does not distinguish case, it is recommended to store a redundant all-uppercase (or all-lowercase) copy of the field for efficient case-insensitive retrieval.
Queries in Mongo are case-sensitive: a condition such as {f: "aA"} cannot match documents whose f field is "aa", "AA", or "Aa". Some businesses need to ignore case and handle it with a regular expression such as {f: /aa/i}; this achieves case-insensitivity, but query efficiency is very low and it consumes CPU resources. For this kind of requirement, store a redundant all-uppercase (or all-lowercase) field for case-insensitive lookups. For example, add a redundant f_upper field for the f field, store its content in all uppercase, and query {f_upper: "AA"}.
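A minimal sketch of the redundant-field idea, assuming a helper named with_upper (this is application-side code, not a MongoDB feature):

```python
def with_upper(doc: dict, field: str) -> dict:
    """Return a copy of doc with a redundant all-uppercase copy of
    `field` stored under `<field>_upper`, so equality lookups can
    ignore case without a slow case-insensitive regex."""
    out = dict(doc)
    out[field + "_upper"] = out[field].upper()
    return out

doc = with_upper({"f": "aA"}, "f")
# The application then queries {"f_upper": "AA"} instead of {f: /aa/i}.
```

The extra field costs some storage, but the equality lookup can use a normal index instead of a regex scan.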
(5) Compress large, frequently accessed fields before storing them.
For high-frequency queries that return large fields (say, more than 10 KB), it is easy to saturate the MongoDB server's network bandwidth as QPS rises.
Likewise, if such fields are written frequently, the resulting oplog entries become very large. It is recommended that this kind of high-frequency, large data be compressed in the business layer before being stored in MongoDB.
(6) Store an ObjectId as an ObjectId, not as a string.
Reasons: first, it makes queries easier (a string and an ObjectId do not match each other); second, an ObjectId carries useful information, such as the creation time embedded in its timestamp; third, the string representation of an ObjectId takes up twice as much disk space.
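The doubling in the third reason is just hex encoding: a 12-byte ObjectId becomes a 24-character string. A pure-Python sketch, using raw bytes as a stand-in for the driver's ObjectId type:

```python
raw = bytes(range(12))   # stand-in for an ObjectId's 12 raw bytes
as_string = raw.hex()    # how the same value looks stored as a string

# 12 bytes as binary vs. 24 characters as a hex string:
assert len(raw) == 12 and len(as_string) == 24
```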
2. Index design specification
(1) MongoDB indexes only support fields smaller than 1 KB. If the stored value is longer than 1 KB, it cannot be indexed.
(2) Index names should not be too long. Naming convention: idx_<field names>; it is recommended that a compound index's name include all of its field names, abbreviating names that are too long.
(3) Unique index naming convention: uniq_<field names>.
(4) Evaluate query scenarios as comprehensively as possible, and reduce the number of single-column indexes by merging them into compound indexes where possible.
(5) The more indexes there are, the slower inserts and updates become.
(6) Build indexes in the background to avoid blocking normal DML and queries, for example:
db.works.createIndex({field: 1}, {name: "idx_field", background: true})
(7) Do not create indexes on array fields.
(8) When creating a compound index, evaluate the fields it contains and put highly selective fields (those with many unique values) as far to the front as possible.
(9) Check program performance during development, and use explain() frequently to inspect execution plans.
(10) Redundant indexes are prohibited. For example, the index idx_account_sName_createTime {"account": 1, "sName": 1, "createTime": -1} makes the index idx_account {"account": 1} redundant, so idx_account can be dropped.
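The redundant-index rule above can be sketched as a prefix check on the key patterns (the function is an illustrative helper, not a MongoDB API):

```python
def is_redundant(shorter: list, longer: list) -> bool:
    """An index is redundant when its key pattern is a strict prefix
    of another index's key pattern: same fields, same order, same
    sort direction. Illustrative check, not a MongoDB API."""
    return len(shorter) < len(longer) and longer[:len(shorter)] == shorter

idx_account = [("account", 1)]
idx_account_sName_createTime = [("account", 1), ("sName", 1), ("createTime", -1)]
```

Note that direction matters: an index on ("account", -1) is not a prefix of one starting with ("account", 1).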
3. Query specification
(1) Check whether each query uses an index: there must be an index on the query-condition keys or on the sort keys (except for collections with small data volumes).
(2) Use limit() to bound the size of the result set, reducing the database server's resource consumption and the amount of data sent over the network.
(3) Project only the fields you use rather than all fields, and try not to make array fields query conditions.
(4) A remove() without a query condition should be treated as a warning or an error.
(5) Some $ operators can lead to poor performance; $ne, $not, $exists, $nin, $or, and $where should be avoided in business queries as far as possible.
(6) MongoDB's compound-index usage follows the leftmost-prefix principle; prefer covered indexes, and order the query's fields to follow the compound index's field order.
(7) Use hint() to force a particular index when necessary.
(8) For update operations, query first and then update; updating by the primary key improves update efficiency.
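The leftmost-prefix principle in (6) can be illustrated with a small sketch; this is not the real query planner, only the matching rule:

```python
def usable_prefix(index_fields, query_fields):
    """Return the leading run of compound-index fields that a query
    filtering on `query_fields` can use (leftmost-prefix rule)."""
    prefix = []
    for f in index_fields:
        if f not in query_fields:
            break          # the index is unusable past the first gap
        prefix.append(f)
    return prefix

# A filter on sName alone cannot use an index led by account:
unused = usable_prefix(["account", "sName", "createTime"], {"sName"})
used = usable_prefix(["account", "sName", "createTime"], {"account", "sName"})
```

Here `unused` comes back empty, while `used` covers the first two index fields.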
III. Application connection configuration
Configure read-write separation sensibly to reduce pressure on the primary node and improve cluster scalability.
(1) The Mongo client controls the routing of read-only queries through its read-preference setting.
(2) By default, all client queries are routed to the primary node, but many applications can route read-only queries to secondary nodes when the workload tolerates second-level synchronization delay; MongoDB's normal replication delay is under one second.
(3) For read-only workloads with modest freshness requirements (for example, a worst case of 60 seconds of delay is acceptable), it is recommended to set the driver's read-preference property to secondaryPreferred.
The Mongo client's read preference supports five modes:
Read Preference Mode: Description
primary: the default mode; all client read commands are sent to the primary node.
primaryPreferred: read-only queries go to the primary by default; when the primary is unavailable, they are forwarded to a secondary.
secondary: all read-only queries go to secondary nodes.
secondaryPreferred: all read-only queries go to secondary nodes; if no secondary is available, they are forwarded to the primary.
nearest: read-only queries go to the nearest available data node, regardless of its role.

IV. Sharding key specification: how to choose the MongoDB sharding key
1. Principles of sharding keys:
The sharding key is immutable.
The sharding key must be indexed.
The sharding key is limited to 512 bytes.
The sharding key is used for routing queries.
MongoDB will not accept an insert that lacks the sharding key on a collection that has been sharded.
Elements of a good shard key
A good shard key should have the following features:
1. Key values are sufficiently discrete (sufficient cardinality).
2. Write requests are evenly distributed.
3. Scatter-gather queries are avoided as far as possible (reads are targeted).
MongoDB's internal balancer ensures that each shard (replica set) holds roughly the same number of chunks.
The choice of shard key determines three important aspects:
1. Distribution of reads and writes
The most important point is the distribution of reads and writes. If you always write to one machine, that machine becomes a write bottleneck and the cluster's write performance is reduced; it does not matter how many nodes the cluster has, because all writes happen in one place. Therefore you should not use a monotonically increasing _id or timestamp as the shard key, which would keep appending data to the last shard.
Likewise, if your reads always land on the same shard, you had better hope your working set fits within that machine's memory. Spreading read requests across the shards lets your working data set scale linearly with the number of shards, distributing the load evenly across each machine's memory and disks.
2. Chunk size
The second aspect is chunk size. MongoDB can split large chunks into smaller ones, but only where the shard key values differ. If a large number of documents share the same shard key value, you get correspondingly huge chunks. Jumbo chunks are very bad: not only do they cause uneven data distribution, but once a chunk exceeds a certain size it can no longer be moved between shards.
3. Number of shards hit by each query
Finally, it is best to ensure that most queries hit as few shards as possible. A query's latency is bounded by the slowest shard it hits, so the fewer shards hit, the faster the query in theory. This is not a hard rule, but it pays to take into account, because the distribution of chunks across shards only approximately follows shard-key order rather than being strictly enforced.
Several sharding key strategies
(1) Fully distinct keys (for example, a hashed key): reads and writes are evenly distributed, and every document has a different shard key value, so chunks can be split very finely. The drawback: queries spanning multiple documents will hit all shards.
(2) Ascending keys: little chunk migration (an advantage), but because values only increase, insert write IO lands permanently on the last chunk, creating a write hotspot there; as the last chunk grows, its data keeps migrating to earlier chunks.
(3) Random keys: data is evenly distributed and insert write IO is spread across the shards (an advantage), but queries spanning multiple documents are bound to hit all shards, causing heavy random IO and overloading the disks.
To prevent jumbo chunks, it is recommended to use a compound shard key that introduces _id to refine splitting: {keyname: 1, _id: 1}.
The principle: keyname should be a frequently queried field with as high a cardinality as possible, and the _id field gives MongoDB many distinct values to split on. This strategy suits most business situations; if no suitable keyname field exists, fall back to a hashed _id.
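A toy illustration of why hashing spreads writes while a monotonic key does not. The 4-shard router below is an assumption for the sketch, not MongoDB's actual chunk logic:

```python
import hashlib

NUM_SHARDS = 4

def hashed_shard(key: str) -> int:
    """Bucket a key by hash, the way a hashed shard key spreads writes."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

def range_shard_for_new_insert() -> int:
    """With a monotonically increasing key, every new insert carries the
    largest value so far and falls in the last chunk range, i.e. the
    last shard: the write hotspot."""
    return NUM_SHARDS - 1

hashed_targets = {hashed_shard(f"user-{i}") for i in range(1000)}
range_targets = {range_shard_for_new_insert() for _ in range(1000)}
```

Hashed routing touches multiple shards, while the ascending key concentrates every insert on the last one.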