[Mongo] MongoDB Practical Information Series: Standardizing Fields with Variety and Document Validation


2026-01-05 Update From: SLTechnology News&Howtos shulou

Shulou (Shulou.com), 06/01 report

Original address: http://www.mongoing.com/archives/2282

Foreword

As the most popular document database of recent years, MongoDB is attracting more and more attention. However, very little MongoDB-related technical material is shared in China, and many friends have complained to me that they do not know where to start.

The "MongoDB Practical Information Series" shares practical MongoDB knowledge from the perspective of real-world use, covering tuning, troubleshooting, and other topics. I hope it will be helpful to you.

If you want to learn more about the basics of MongoDB, a quick Google search will turn up plenty of material.

We know that MongoDB is a document database and that being schema-free is one of its most important features. But how should we use this feature sensibly and manage the schema of a MongoDB deployment in production?

Main text

As everyone knows, MongoDB is a document database and is schema-free.

So what benefits does MongoDB's document model bring us? Here are a few:

JSON format: developers can store a JSON document directly in MongoDB, which is very developer-friendly.

High read and write performance: in relational databases we often need joins, subqueries, and similar operations to relate data, and these tend to cause more random IO. In MongoDB, a well-designed data model can satisfy many of these needs through embedding and denormalization, which reduces random IO.

Schema free: MongoDB's data model is flexible, there is no need to worry about online DDL, and different documents can have different structures.

We will not go into how to design and model a MongoDB schema here. For that, we recommend TJ's talk "MongoDB Advanced pattern Design" from the Open Source China year-end event, as well as "Retail Reference Architecture Part 1 to 4".

Instead, we will mainly discuss how to inspect and validate the schema once the initial modeling is done and the service is live.

Variety

Variety is a very useful open source tool for detecting the types and distribution of fields in MongoDB.

As the first sentence of its GitHub README says, "Meet Variety, a Schema Analyzer for MongoDB."

Variety detects the field types and their distribution in a MongoDB collection and produces a report. This lets us analyze the existing document structure and field types at a glance and spot hidden problems in the data model.

Let's explain it with examples:

First, create a collection and insert a few documents:

    db.users.insert({name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!"});
    db.users.insert({name: "Dick", bio: "I swordfight.", birthday: new Date("1974-03-14")});
    db.users.insert({name: "Harry", pets: "egret", birthday: new Date("1984-03-14")});
    db.users.insert({name: "Geneviève", bio: "Ça va?"});
    db.users.insert({name: "Jim", someBinData: new BinData(2, "1234")});

Let's take a look at the results Variety produces:

    $ mongo test --eval "var collection = 'users'" variety.js

    +------------------------------------------------------------------+
    | key                | types              | occurrences | percents |
    | ------------------ | ------------------ | ----------- | -------- |
    | _id                | ObjectId           |           5 |    100.0 |
    | name               | String             |           5 |    100.0 |
    | bio                | String             |           3 |     60.0 |
    | birthday           | Date               |           2 |     40.0 |
    | pets               | Array(1),String(1) |           2 |     40.0 |
    | someBinData        | BinData-old        |           1 |     20.0 |
    | someWeirdLegacyKey | String             |           1 |     20.0 |
    +------------------------------------------------------------------+

Here test is the database name and users is the collection name. For the five documents we inserted earlier, Variety reports the following:

All documents contain the _id and name fields. 60% of the documents contain the bio field. 40% contain birthday and 40% contain pets, and the pets field holds two different types (1 array, 1 string). 20% contain someBinData and 20% contain someWeirdLegacyKey.
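To make the report columns concrete, here is a minimal sketch in plain JavaScript of how per-key types, occurrences, and percentages could be tallied over an in-memory array of documents. It is not Variety's actual implementation: the names typeName, analyzeFields, and sampleUsers are hypothetical, only top-level keys are inspected, and the _id and BinData values are left out so that it runs under Node.js without a MongoDB connection.

```javascript
// Simplified sketch of a Variety-style report. Not Variety's real code.
// Assumption: only top-level keys are inspected.
function typeName(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "Array";
  if (value instanceof Date) return "Date";
  if (typeof value === "string") return "String";
  if (typeof value === "number") return "Number";
  if (typeof value === "boolean") return "Boolean";
  return "Object";
}

function analyzeFields(docs) {
  // key -> { types: { typeName: count }, occurrences }
  const stats = new Map();
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (!stats.has(key)) stats.set(key, { types: {}, occurrences: 0 });
      const entry = stats.get(key);
      const t = typeName(value);
      entry.types[t] = (entry.types[t] || 0) + 1;
      entry.occurrences += 1;
    }
  }
  // Most common keys first, ties broken alphabetically.
  return [...stats.entries()]
    .map(([key, { types, occurrences }]) => ({
      key,
      types,
      occurrences,
      percent: (occurrences * 100) / docs.length,
    }))
    .sort((a, b) => b.occurrences - a.occurrences || a.key.localeCompare(b.key));
}

// Documents shaped like the users inserted above (without _id and BinData).
const sampleUsers = [
  { name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!" },
  { name: "Dick", bio: "I swordfight.", birthday: new Date("1974-03-14") },
  { name: "Harry", pets: "egret", birthday: new Date("1984-03-14") },
  { name: "Geneviève", bio: "Ça va?" },
  { name: "Jim" },
];

const userReport = analyzeFields(sampleUsers);
for (const row of userReport) {
  console.log(row.key, JSON.stringify(row.types), row.occurrences, row.percent.toFixed(1));
}
```

A field whose types object has more than one entry, such as pets here, is exactly the kind of inconsistency the report is meant to surface.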

In production, however, the data volume is large. If a collection holds 1 billion documents, for example, scanning all of them takes a long time. We may only want to analyze 1000 documents, and we can use limit to do that:

    $ mongo test --eval "var collection = 'users', limit = 1000" variety.js

    +----------------------------------------------------+
    | key         | types       | occurrences | percents |
    | ----------- | ----------- | ----------- | -------- |
    | _id         | ObjectId    |        1000 |    100.0 |
    | name        | String      |        1000 |    100.0 |
    | someBinData | BinData-old |        1000 |    100.0 |
    +----------------------------------------------------+

Because MongoDB reduces the need for joins through embedding and reduces random IO through denormalization, our documents are likely to contain nested structures. Sometimes the nesting is so deep that it clutters the statistics. In that case we can limit the depth with maxDepth, as in the following example:

    db.users.insert({name: "Walter", someNestedObject: {a: {b: {c: {d: {e: 1}}}}}});

    $ mongo test --eval "var collection = 'users'" variety.js

    +----------------------------------------------------------------+
    | key                        | types    | occurrences | percents |
    | -------------------------- | -------- | ----------- | -------- |
    | _id                        | ObjectId |           1 |    100.0 |
    | name                       | String   |           1 |    100.0 |
    | someNestedObject           | Object   |           1 |    100.0 |
    | someNestedObject.a         | Object   |           1 |    100.0 |
    | someNestedObject.a.b       | Object   |           1 |    100.0 |
    | someNestedObject.a.b.c     | Object   |           1 |    100.0 |
    | someNestedObject.a.b.c.d   | Object   |           1 |    100.0 |
    | someNestedObject.a.b.c.d.e | Number   |           1 |    100.0 |
    +----------------------------------------------------------------+

    $ mongo test --eval "var collection = 'users', maxDepth = 3" variety.js

    +----------------------------------------------------------+
    | key                  | types    | occurrences | percents |
    | -------------------- | -------- | ----------- | -------- |
    | _id                  | ObjectId |           1 |    100.0 |
    | name                 | String   |           1 |    100.0 |
    | someNestedObject     | Object   |           1 |    100.0 |
    | someNestedObject.a   | Object   |           1 |    100.0 |
    | someNestedObject.a.b | Object   |           1 |    100.0 |
    +----------------------------------------------------------+
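The effect of a depth limit can be sketched in plain JavaScript as follows. This is an illustration of the idea only, not how Variety implements maxDepth: collectKeyPaths and walterDoc are hypothetical names, and arrays and dates are treated as leaves here for simplicity.

```javascript
// Sketch: collect dotted key paths from a nested document, stopping
// once a path is maxDepth segments long.
// Assumption: arrays and Date values are treated as leaves.
function collectKeyPaths(doc, maxDepth = Infinity) {
  const paths = [];
  function walk(value, prefix, depth) {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix ? `${prefix}.${key}` : key;
      paths.push(path);
      const isPlainObject =
        child !== null &&
        typeof child === "object" &&
        !Array.isArray(child) &&
        !(child instanceof Date);
      // Descend only while the current path is shorter than maxDepth.
      if (isPlainObject && depth < maxDepth) walk(child, path, depth + 1);
    }
  }
  walk(doc, "", 1);
  return paths;
}

// Same shape as the "Walter" document above (without _id).
const walterDoc = { name: "Walter", someNestedObject: { a: { b: { c: { d: { e: 1 } } } } } };

console.log(collectKeyPaths(walterDoc));    // every level, down to ...c.d.e
console.log(collectKeyPaths(walterDoc, 3)); // stops at someNestedObject.a.b
```

Capping the depth keeps the report short when deeply nested sub-documents are not what we want to inspect.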

We can also restrict the analysis to documents that match a condition. For example, to analyze only the documents where caredAbout is true:

    $ mongo test --eval "var collection = 'users', query = {'caredAbout':true}" variety.js

Or to sort the documents before they are analyzed:

    $ mongo test --eval "var collection = 'users', sort = {updated_at: -1}" variety.js

We can also specify the output format of the analysis result:

    $ mongo test --quiet --eval "var collection = 'users', outputFormat='json'" variety.js

In production we generally do not run the analysis on the primary. Instead, we can run it on a hidden secondary with priority 0. In that case, slaveOk must be specified:

    $ mongo secondary.replicaset.member:31337/somedb --eval "var collection = 'users', slaveOk = true" variety.js

Or we may want to store the analysis results in MongoDB:

    $ mongo test --quiet --eval "var collection = 'users', persistResults=true" variety.js

We can also specify where the results are stored:

resultsDatabase: the database in which the analysis results are stored

resultsCollection: the collection in which the analysis results are stored

resultsUser: the user name for the instance in which the analysis results are stored

resultsPass: the password for the instance in which the analysis results are stored

    $ mongo test --quiet --eval "var collection = 'users', persistResults=true, resultsDatabase='db.example.com/variety'" variety.js

Why should we use Variety?

Although MongoDB is schema-free, in most cases we still want each field to have a single, consistent type.

Inconsistent field types can introduce errors into our data. If a field holds more than one type and we are not aware of it, business queries that assume one type will silently miss the documents that use another, and the results will look incomplete or inaccurate.

In production, moreover, the application keeps iterating, requirements keep growing, and fields change along with them. Without a standardized release process and regular checks, inconsistent fields can accumulate in the database: for example, some documents have a given field and others do not. Variety can help us find these problems too.
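As a rough illustration of such a check, the sketch below scans a Variety-style report and flags keys that have more than one type or that are missing from some documents. The report shape (key, types, percent) and the names sampleReport and findSchemaIssues are assumptions made for this example. They are not Variety's exact JSON output format, so the field access would need adapting to the real output.

```javascript
// Hypothetical, simplified report rows modeled on the table shown earlier.
const sampleReport = [
  { key: "_id", types: { ObjectId: 5 }, percent: 100 },
  { key: "name", types: { String: 5 }, percent: 100 },
  { key: "bio", types: { String: 3 }, percent: 60 },
  { key: "birthday", types: { Date: 2 }, percent: 40 },
  { key: "pets", types: { Array: 1, String: 1 }, percent: 40 },
];

// Flag keys with mixed types, and keys that not every document contains.
function findSchemaIssues(report) {
  const issues = [];
  for (const row of report) {
    const typeNames = Object.keys(row.types);
    if (typeNames.length > 1) {
      issues.push({ key: row.key, issue: "mixed types", detail: typeNames.join(", ") });
    }
    if (row.percent < 100) {
      issues.push({ key: row.key, issue: "missing in some documents", detail: `${row.percent}%` });
    }
  }
  return issues;
}

for (const item of findSchemaIssues(sampleReport)) {
  console.log(`${item.key}: ${item.issue} (${item.detail})`);
}
```

Whether a partially present field is a real problem depends on the application; optional fields are legitimate, so a check like this is best treated as a list of things to review rather than a list of errors.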

Document Validation

MongoDB 3.2 introduced many powerful features, and Document Validation deserves a mention here. I think it is also MongoDB's way of saying "schema free, but you may need some rules". Haha, that is pure conjecture on my part.

A brief introduction to Document Validation:

We can place some restrictions on an otherwise schema-free MongoDB collection. This does not mean that MongoDB has become a relational database. If anything, I think it highlights what schema-free really means: be schema-free where you need it, and apply restrictions where they belong.

Suppose we want to create a new collection, contacts, with the following constraint:

The phone field is of type string, or the email field ends with "@mongodb.com", or status is "Unknown" or "Incomplete".

    db.createCollection("contacts", {
      validator: {
        $or: [
          {phone: {$type: "string"}},
          {email: {$regex: /@mongodb\.com$/}},
          {status: {$in: ["Unknown", "Incomplete"]}}
        ]
      }
    })
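To see which documents this rule would accept, the same $or condition can be mirrored as a plain JavaScript predicate. This is only a sketch for reasoning about the rule: passesContactsValidator is a hypothetical helper, it handles scalar field values only, and MongoDB evaluates the real validator on the server.

```javascript
// Plain-JavaScript mirror of the $or validator, for scalar fields only.
function passesContactsValidator(doc) {
  const phoneIsString = typeof doc.phone === "string";
  const emailMatches = typeof doc.email === "string" && /@mongodb\.com$/.test(doc.email);
  const statusAllowed = ["Unknown", "Incomplete"].includes(doc.status);
  // $or: one satisfied branch is enough.
  return phoneIsString || emailMatches || statusAllowed;
}

const candidateContacts = [
  { name: "Anne", phone: "+1 555 123 456" },
  { name: "Bob", email: "bob@mongodb.com" },
  { name: "Carl", status: "Unknown" },
  { name: "Amanda", status: "Updated" },
  { name: "Dana", phone: 5551234, email: "dana@example.com" },
];

for (const contact of candidateContacts) {
  console.log(contact.name, passesContactsValidator(contact) ? "valid" : "invalid");
}
```

Because the branches are joined with $or, a document only fails when all three conditions fail, as with the last two sample contacts.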

For a collection that already exists, we can add validation with collMod:

    db.runCommand({
      collMod: "contacts",
      validator: {
        $or: [
          {phone: {$type: "string"}},
          {email: {$regex: /@mongodb\.com$/}},
          {status: {$in: ["Unknown", "Incomplete"]}}
        ]
      },
      validationLevel: "moderate"
    })

Note the extra validationLevel parameter here. When setting up validation, it controls how strictly the rules are applied:

The default level is strict: MongoDB applies the validation rules to all inserts and all updates.

It can be set to moderate: MongoDB applies the validation rules to inserts and to updates of existing documents that already satisfy the rules. Updates to existing documents that do not satisfy the rules are not checked.

There is also a validationAction parameter, which specifies how the MongoDB instance handles an insert or update whose data does not comply with the validation rules.

The default action is error: MongoDB rejects any insert or update that violates the validation rules.

It can be set to warn: MongoDB logs the violation but allows the insert or update to proceed. The log entry looks like this:

    2015-10-15T11:20:44.260-0400 W STORAGE [conn3] Document would fail validation collection: example.contacts doc: {_id: ObjectId('561fc44c067a5d85b96274e4'), name: "Amanda", status: "Updated"}
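How validationLevel and validationAction combine can be summarized as a small decision function. This is a sketch of the documented behavior for the strict and moderate levels and the error and warn actions only. The function validationOutcome, its parameter names, and its return strings are hypothetical and are not part of any MongoDB API.

```javascript
// Sketch of the decision MongoDB makes for one write, based on the
// behavior described above. Parameter names are made up for this example.
function validationOutcome({
  operation,                   // "insert" or "update"
  newDocValid,                 // does the resulting document satisfy the validator?
  existingDocValid = true,     // for updates: did the stored document satisfy it?
  validationLevel = "strict",  // "strict" or "moderate"
  validationAction = "error",  // "error" or "warn"
}) {
  // moderate: updates to documents that were already invalid are not checked.
  const skipCheck =
    validationLevel === "moderate" && operation === "update" && !existingDocValid;
  if (skipCheck || newDocValid) return "accept";
  return validationAction === "warn" ? "accept-and-log" : "reject";
}

console.log(validationOutcome({ operation: "insert", newDocValid: false }));
console.log(validationOutcome({ operation: "insert", newDocValid: false, validationAction: "warn" }));
console.log(validationOutcome({
  operation: "update", newDocValid: false, existingDocValid: false, validationLevel: "moderate",
}));
```

The third call shows why moderate is convenient when adding validation to a collection that already contains non-conforming documents: those documents can still be updated while new writes are held to the rules.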

Validation restrictions

Validation cannot be set on collections in the admin, local, and config databases.

Validation cannot be set on system.* collections.

