
What is MongoDB?


This article mainly introduces what MongoDB is. Many people have doubts about this question in daily work, so the sections below collect simple, practical explanations that should help you answer "What is MongoDB?" for yourself. Let's get started!

I. Brief Introduction

MongoDB is a popular open-source document database, and judging from its name it has some ambition: the name comes from the English word "humongous", meaning "huge", signaling the intent to handle large amounts of data.

But the author prefers to call it the "Mango" database, partly because the transliteration is close, and partly because of two feelings from using MongoDB over recent years:

The first feeling is "cool". Using this document database imposes almost no restrictions: the JSON document structure is easy to understand, and the absence of schema constraints makes DDL management easy, so everything can move very quickly.

The second feeling is "sour", which anyone working in operations or support roles will appreciate more deeply. Because MongoDB's entry experience is "too friendly", some teams assume the database is easy to use well, so it is common for developers to bury pitfalls in production systems. As the saying goes: development feels great for a moment, but maintenance is a crematorium. That may be putting it a little strongly, but the subtext is this: like a traditional RDBMS, MongoDB needs careful consideration and care in use, otherwise you will hit plenty of holes.

So although choosing a document database deters some teams, that has not stopped it from winning considerable support, as the DB-Engines ranking shows:

Figure: DB-Engines ranking

MongoDB has long ranked fifth overall (first among document databases) and is the most popular NoSQL database. An active community, plus commercial backing (MongoDB listed on Nasdaq in 2017), have both contributed to the development of this open-source database.

Some features of the MongoDB database:

Document-oriented storage: flexible data structures expressed in JSON/BSON.

Dynamic DDL capability: no strong schema constraints, supporting fast iteration.

High-performance computing: fast, memory-backed data queries.

Easy scaling: data sharding supports massive data storage.

A rich feature set: secondary indexes, a powerful aggregation pipeline, and developer-oriented features such as automatic data aging (TTL) and capped collections.

Cross-platform builds and SDKs for many languages.

If you are new to MongoDB, the following content will help you form a picture of this database technology as a whole.

II. Basic Model

The data structure is critical to any software. MongoDB's conceptual model borrows from SQL databases, but is not identical to them.

On this point, it is often said that MongoDB is the most SQL-like of the NoSQL databases.

As shown in the following table:

database: the same concept as a SQL database; one database contains multiple collections (tables).

collection: equivalent to a table in SQL; it holds multiple documents (rows). The difference is that a collection's schema is dynamic, so there is no need to declare a strict table structure in advance; more importantly, by default MongoDB performs no schema check at all on written data.

document: equivalent to a row in SQL; it consists of multiple fields (columns) and is represented in BSON (JSON) format.

field: equivalent to a column in SQL, except that its type is more flexible, supporting for example nested documents and arrays. In addition, field names in MongoDB are case-sensitive, and the fields within a document are ordered.

In addition, SQL has some other concepts; the correspondences are as follows:

_id: the primary key. MongoDB uses an _id field by default to guarantee document uniqueness.

reference: loosely corresponds to a foreign key, in that a reference implements no foreign-key constraint; it is only a special type for which the associated query and resolution are carried out automatically by the client (driver).

view: no different from a SQL view; a layer of objects queried dynamically on top of collections, which can be virtual or physical (materialized views).

index: the same as a SQL index.

$lookup: an aggregation operator that implements functionality similar to a SQL JOIN.

transaction: starting with MongoDB 4.0, transactions are supported.

aggregation: MongoDB provides a powerful aggregation computing framework, of which group-by is one operation.
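
As a quick illustration of $lookup, here is a minimal sketch; the orders collection and its bookId field are hypothetical:

db.orders.aggregate([
  {$lookup: {
    from: "book",           // collection to join with
    localField: "bookId",   // field in orders
    foreignField: "_id",    // field in book
    as: "bookInfo"          // output array field
  }}
])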

BSON data types

MongoDB documents can be represented as JavaScript objects, and their format is based on JSON.

A typical document is as follows:

{
  "_id": 1,
  "name": {"first": "John", "last": "Backus"},
  "contribs": ["Fortran", "ALGOL", "Backus-Naur Form", "FP"],
  "awards": [
    {
      "award": "W. W. McDowell Award",
      "year": 1967,
      "by": "IEEE Computer Society"
    },
    {
      "award": "Draper Prize",
      "year": 1993,
      "by": "National Academy of Engineering"
    }
  ]
}

The advent and popularity of JSON made data transfer in the Web 2.0 era very easy, so JSON syntax is easy for developers to accept. But JSON has shortcomings of its own, such as no support for specific data types like dates, so MongoDB actually uses an extended form of JSON called BSON (Binary JSON).

The data types supported by BSON include:

Figure: BSON types
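
A minimal shell sketch of a few BSON-specific types (the types collection here is hypothetical):

db.types.insert({
  when: ISODate("2020-01-01T00:00:00Z"),  // Date
  big: NumberLong("9007199254740993"),    // 64-bit integer
  dec: NumberDecimal("9.99"),             // 128-bit decimal
  raw: BinData(0, "SGVsbG8=")             // binary data
})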

Distributed ID

In the single-machine era, most applications could use a database auto-incrementing ID as the primary key, and traditional RDBMSs support this; for example, MySQL implements auto-incrementing primary keys via auto_increment. However, once data is distributed, this approach no longer works, because there is no guarantee that primary keys generated on multiple nodes will not collide.

To guarantee the uniqueness of distributed IDs, application developers have proposed their own schemes, and most of them compose the ID from several segments. For example, the well-known snowflake algorithm combines a timestamp, machine number, process number, and sequence number to ensure uniqueness.

MongoDB uses the ObjectId type for its primary key; every document in the database has an _id field as the primary key. An _id is generated according to the following rules:

Figure: ObjectId layout

These include:

A 4-byte Unix timestamp

A 3-byte machine ID

A 2-byte process ID

A 3-byte counter (randomly initialized)

It is worth mentioning that _id generation is essentially done by the client (driver), which achieves better randomness and reduces load on the server. Of course, the server also checks whether a written document contains an _id field, and generates one automatically if not.
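
From the shell you can see the timestamp embedded in an ObjectId; a minimal sketch:

var oid = ObjectId()   // generated client-side by the shell
oid.getTimestamp()     // extracts the leading 4-byte Unix timestamp as a date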

III. Operational Syntax

In addition to the document model itself, the commands for manipulating data are also based on JSON/BSON syntax.

For example, to insert a document:

db.book.insert(
  {
    title: "My first blog post",
    published: new Date(),
    tags: ["NoSQL", "MongoDB"],
    type: "Work",
    author: "James",
    viewCount: 25,
    commentCount: 2
  }
)

Perform a document lookup:

db.book.find({author: "James"})

The command to update a document:

db.book.update(
  {"_id": ObjectId("5c61301c15338f68639e6802")},
  {"$inc": {"viewCount": 3}}
)

The command to delete a document:

db.book.remove({"_id": ObjectId("5c612b2f15338f68639e67d5")})

Traditional SQL syntax can limit the fields returned; in MongoDB this is expressed with a projection:

Db.book.find ({"author": "James"}

{"_ id": 1, "title": 1, "author": 1})

Implement a simple paging query:

db.book.find({})
  .sort({"viewCount": -1})
  .skip(10)
  .limit(5)

This BSON/JSON-based syntax is not complex, and its expressive power is arguably stronger than SQL's. ElasticSearch, a leader among search databases, takes a similar approach.

For a complete comparison of document operations with SQL, the official documentation gives a detailed description: https://docs.mongodb.com/manual/reference/sql-comparison/

So an interesting question is: can MongoDB be queried with SQL?

Of course you can!

However, note that these capabilities are not native to MongoDB; they require third-party tools or platforms:

On the client side, SQL can be used with tools such as mongobooster and studio3t.

On the server side, you can look at platforms such as Presto.

IV. Indexes

There is no doubt that indexes are a key database capability, and MongoDB supports a rich set of index types. With them we can achieve fast data lookups, and the various index types and properties are designed for different application scenarios.

The technical implementation of indexes depends on the underlying storage engine; current versions of MongoDB use WiredTiger as the default engine. Indexes are implemented with a B+ tree structure, no different from other traditional databases. That is good news: most index-tuning techniques from SQL databases remain applicable to MongoDB.

Figure: B+ tree

Using ensureIndex, you can declare an ordinary index on a collection:

db.book.ensureIndex({author: 1})

The 1 after author means ascending order; use -1 for descending order.

A compound index is declared as follows:

db.book.ensureIndex({type: 1, published: 1})

The order of the index keys matters only for compound indexes.

If the indexed field is an array, the index automatically becomes a multikey index:

db.book.ensureIndex({tags: 1})

A compound index in MongoDB may contain an array field, but at most one.

Index properties

When declaring an index, you can give it certain properties through options, including the following (a combined sketch follows this list):

unique: true, declaring a unique index

expireAfterSeconds: 3600, declaring a TTL index whose data ages out after 1 hour

sparse: true, declaring a sparse index, which indexes only documents where the field is non-null

partialFilterExpression: {rating: {$gt: 5}}, declaring a partial (conditional) index, which indexes only documents matching the condition
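
A minimal sketch combining these options (collection and field names are hypothetical):

db.users.ensureIndex({email: 1}, {unique: true})
db.events.ensureIndex({createdAt: 1}, {expireAfterSeconds: 3600})
db.book.ensureIndex({rating: 1}, {partialFilterExpression: {rating: {$gt: 5}}})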

Index classification

In addition to ordinary indexes, the supported types include the following (a declaration sketch follows this list):

Hash index: hashing is another data structure for fast retrieval, and MongoDB's hashed shard keys use a hash index.

Geospatial index: supports fast geospatial queries, such as finding shops within 1 km.

Text index: supports fast full-text retrieval.

Wildcard index: a flexible index based on matching rules, introduced in version 4.2.
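
A minimal sketch declaring two of these index types (collections and fields are hypothetical):

db.places.ensureIndex({location: "2dsphere"})  // geospatial index
db.articles.ensureIndex({content: "text"})     // full-text index

// find places within 1 km of a point
db.places.find({location: {$near: {
  $geometry: {type: "Point", coordinates: [116.4, 39.9]},
  $maxDistance: 1000
}}})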

Index evaluation and tuning

The explain() command analyzes the query plan, letting you evaluate whether an index is effective. For example:

> db.test.explain().find({a: 5})
{
  "queryPlanner": {
    ...
    "winningPlan": {
      "stage": "FETCH",
      "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {
          "a": 1
        },
        "indexName": "a_1",
        "isMultiKey": false,
        "direction": "forward",
        "indexBounds": {"a": ["[5.0, 5.0]"]}
      }
    }
  },
  ...
}

From the winningPlan in the result, you can see whether the execution plan is efficient, for example:

If no index is hit, the plan shows COLLSCAN.

If an index is hit, the plan shows IXSCAN.

If an in-memory sort occurs, the plan shows SORT.

For a description of explain results, refer to the documentation: https://docs.mongodb.com/manual/reference/explain-results/index.html

V. Clusters

Among the 4V characteristics often cited in the big-data field, Volume (large data volume) comes first. Because a single machine's vertical scaling capacity is limited, horizontal scaling is the more reliable path, and MongoDB has this capability built in: it stores data across multiple machines to provide greater capacity and load capacity. At the same time, to keep data highly available, MongoDB uses replica sets for data replication.

A typical MongoDB cluster architecture uses sharding and replica sets together, as shown below:

Figure: MongoDB sharded cluster

Architecture description

Shards store the actual cluster data. A shard can be a single mongod instance or a replica set; in production, a shard is generally a replica set to avoid a single point of failure. For a sharded collection, each shard stores part of the collection's data (split by the shard key); an unsharded collection's data is stored entirely on the database's primary shard.

Config servers hold the cluster's metadata, including the routing rules for each shard. The config servers themselves form a replica set.

Query routers (mongos) are the entry point to a sharded cluster and do not persist data themselves. After startup, a mongos loads the metadata from the config servers, begins serving, and routes each user request to the correct shard. A sharded cluster can deploy multiple mongos processes to share client request load.

Sharding mechanism

The following details are important for understanding and applying MongoDB's sharding mechanism, so they are worth mentioning:

1. How is the data split?

First, a unit of sharded data is called a chunk. A sharded collection contains multiple chunks, and which shard each chunk lives on is recorded on the config servers. When a mongos operates on a sharded collection, it automatically finds the corresponding chunk from the shard key and issues the request to the shard holding that chunk.

Data is split according to a sharding policy, which consists of a shard key (ShardKey) plus a sharding algorithm (ShardStrategy).

MongoDB supports two sharding algorithms (a setup sketch follows the two descriptions below):

Range sharding

As shown in the figure above, suppose the collection is sharded on field x, whose value range is [minKey, maxKey] (x is an integer, minKey and maxKey being the minimum and maximum integer values). The whole range is divided into multiple chunks (64MB each by default), each holding a small slice of the data: for example, Chunk1 contains all documents with x in [minKey, -75), and Chunk2 contains all documents with x in [-75, 25).

Range sharding serves range queries well: to query all documents with x in [-30, 10], a mongos can route the request directly to Chunk2 and find all matching documents there. Its drawback is that if the shard key has a clearly increasing (or decreasing) trend, most newly inserted documents land in the same chunk, so write capacity cannot scale out. One example is using _id as the shard key, since the high-order bits of MongoDB's auto-generated _id are a timestamp and increase continuously.

Hash sharding

Hash sharding first computes a hash value (a 64-bit integer) from the shard key, then distributes documents to chunks by range over that hash value. Because hash values are effectively random, hash sharding has good dispersion and spreads data randomly across chunks. It can fully scale out write capacity, making up for range sharding's weakness, but it cannot serve range queries efficiently: every range query must visit multiple chunks to find all matching documents.
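
A minimal setup sketch for both strategies (the database and collection names are hypothetical):

sh.enableSharding("mydb")

// range sharding on the author field
sh.shardCollection("mydb.book", {author: 1})

// hash sharding on the userId field
sh.shardCollection("mydb.events", {userId: "hashed"})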

2. How is balance ensured?

As described above, data is spread across chunks and chunks are allocated to shards, so how is the data (the chunks) on the shards kept balanced? In practice there are two situations:

A. Full pre-allocation: the numbers of chunks and shards are pre-defined. For example, with 10 shards storing 1000 chunks, each shard holds 100 chunks, and the cluster is already balanced (by assumption here).

B. No pre-allocation: this case is more complex. Generally, a chunk splits when it grows too large, and continual splits lead to imbalance; adding shards when scaling out dynamically also causes imbalance. The cluster balancer detects this, and once an imbalance is found, chunk data is migrated to restore balance.

MongoDB's balancer runs on the primary node of the config server replica set, which also drives the chunk migration process.

Figure: automatic data balancing

Whether the data is imbalanced is judged by the difference in chunk counts between the shards with the most and the fewest chunks exceeding a threshold; per the MongoDB documentation of this era, the migration thresholds are 2 for collections with fewer than 20 chunks, 4 for 20-79 chunks, and 8 for 80 or more.

MongoDB's data migration has a certain impact on cluster performance, which cannot be avoided entirely; currently the only mitigation is to confine the balancing window to off-peak business hours:

https://docs.mongodb.com/manual/tutorial/manage-sharded-cluster-balancer/#sharding-schedule-balancing-window

3. Application high availability

Application nodes can achieve high availability by connecting to multiple mongos at the same time, as follows:

Figure: mongos high availability

Of course, this connection-level high availability is implemented by the driver.
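
For example, a standard MongoDB connection URI can list several mongos hosts, and the driver fails over among them; the hostnames here are hypothetical:

mongodb://mongos1.example.net:27017,mongos2.example.net:27017/mydb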

Replica set

Replica sets are another topic. Besides appearing as a shard inside a sharded cluster, as in the architecture above, a replica set can also be deployed on its own for smaller workloads. A MongoDB replica set uses a one-primary, multiple-secondary structure, i.e. one primary node plus N secondary nodes: data is written on the primary and replicated to the secondaries.

A typical architecture is as follows:

Using a replica set, we can achieve:

High availability: when the primary node goes down, a secondary is automatically elected as the new primary.

Read/write separation: read requests can be diverted to secondaries to reduce the single-point pressure on the primary.

Note that read/write separation only increases the cluster's "read" capacity; it does nothing for a very high write load. To meet that requirement, use a sharded cluster and add shards, or improve the disk I/O and CPU capacity of the database nodes.

Election

A MongoDB replica set uses the Raft algorithm to elect the primary node; this happens automatically during initialization, as in the following commands:

config = {
  _id: "my_replica_set",
  members: [
    {_id: 0, host: "rs1.example.net:27017"},
    {_id: 1, host: "rs2.example.net:27017"},
    {_id: 2, host: "rs3.example.net:27017"}
  ]
}

rs.initiate(config)

The initiate command initializes the replica set; after the election completes, you can view the result with the isMaster() command:

> db.isMaster()
{
  "hosts": [
    "192.168.100.1:27030",
    "192.168.100.2:27030",
    "192.168.100.3:27030"
  ],
  "setName": "myReplSet",
  "setVersion": 1,
  "ismaster": true,
  "secondary": false,
  "primary": "192.168.100.1:27030",
  "me": "192.168.100.1:27030",
  "electionId": ObjectId("7fffffff0000000000000001"),
  "ok": 1
}

Because of the Raft algorithm, electing a primary requires a "majority" of voting members: with N members, a candidate needs more than N/2 votes, so a 3-node set tolerates 1 failure and a 5-node set tolerates 2.

Therefore, to avoid tied votes, replica sets are generally deployed with an odd number of nodes, such as 3.

Heartbeat

Heartbeats are critical to the high-availability mechanism: judging whether a node is down depends on whether its heartbeats are normal. Every node in a replica set periodically sends heartbeats to the others to sense changes, such as failures or role changes. Using heartbeats, a MongoDB replica set implements automatic failover, as shown below:

By default, every node, including the primary, sends a heartbeat to the other nodes every 2 seconds. If a secondary receives no response from the primary within 10 seconds, it actively initiates an election; a new round of voting begins, and the newly elected primary takes over the work of the old one. The whole process is transparent to the upper layers: in a sharded cluster, the mongos discovers these changes automatically, and if the application uses a plain replica set, the driver layer handles them.
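
From the shell you can watch member roles and health; a minimal sketch:

rs.status().members.forEach(function (m) {
  print(m.name + " " + m.stateStr + " health=" + m.health)  // e.g. PRIMARY / SECONDARY
})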

Replication

Replication between primary and secondary nodes is implemented through the oplog, which is very similar to MySQL's binlog. Every node in the replica set has a special collection named local.oplog.rs: when a write completes on the primary, an oplog entry is written to this collection, while the secondaries continuously pull new oplog entries from the primary and replay them locally to stay in sync.

Next, take a look at the concrete form of an oplog entry:

{
  "ts": Timestamp(1446011584, 2),
  "h": NumberLong("1687359108795812092"),
  "v": 2,
  "op": "i",
  "ns": "test.nosql",
  "o": {"_id": ObjectId("563062c0b085733f34ab4129"), "name": "mongodb", "score": "100"}
}

Some of the key fields are:

ts: the optime of the operation, containing not only the operation's timestamp but also an incrementing counter value.

h: a globally unique identifier of the operation.

v: the oplog version.

op: the operation type, such as i for insert and u for update.

ns: the namespace operated on, in database.collection form.

o: the concrete operation content; for an insert, it contains the entire document.

MongoDB is careful about the oplog's design. For example:

The oplog must be ordered, which is guaranteed by the optime.

An oplog entry must contain complete information so that it can be replayed.

The oplog must be idempotent: replaying the same entry multiple times produces the same result.

The oplog collection has a fixed size, and old entries are rolled over to avoid taking up too much space.

Interested readers can refer to the official documentation: https://docs.mongodb.com/manual/core/replica-set-oplog/index.html
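
A minimal sketch of peeking at the newest oplog entry from the shell:

db.getSiblingDB("local").oplog.rs.find().sort({$natural: -1}).limit(1).pretty()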

VI. Transactions and Consistency

For a long time, "no transaction support" was a criticism leveled at MongoDB; one could also call it a NoSQL tradeoff (giving up transactions in pursuit of high performance and high scalability). But in essence MongoDB has had a notion of transactions for a long time, only limited to a single document: operations on one document are guaranteed to be atomic. Starting with version 4.0, MongoDB supports multi-document transactions:

Version 4.0 supports multi-document transactions within a replica set.

Version 4.2 supports multi-document transactions across shards (based on two-phase commit).

In terms of transaction isolation, MongoDB supports the snapshot isolation level, which avoids dirty reads, non-repeatable reads, and phantom reads. Even with real transaction functionality, multi-document transactions have a certain performance cost, and applications should evaluate them fully before adopting (a minimal sketch follows).
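
A minimal shell sketch of a multi-document transaction on a replica set (requires version 4.0+; the database, collection, and documents are hypothetical):

var session = db.getMongo().startSession()
session.startTransaction()
var coll = session.getDatabase("test").book
try {
  coll.updateOne({_id: 1}, {$inc: {viewCount: 1}})
  coll.insertOne({_id: 2, title: "Another post"})
  session.commitTransaction()  // both writes become visible atomically
} catch (e) {
  session.abortTransaction()   // roll back on any error
}
session.endSession()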

Consistency

Consistency is a complex topic, and it is more often framed from the application's point of view, for example:

After writing a piece of data into the system, I should be able to read back what I wrote immediately.

The CAP theorem of distributed architecture, and many follow-on views, hold that because network partitions exist, a system must choose between consistency and availability; it cannot have both.

Figure: CAP theorem

In MongoDB, this choice is handed to the developer: clients can set levels or preferences on their operations, including:

readPreference: read preference; you can specify reading from the primary, from secondaries, primary-preferred, secondary-preferred, or the nearest node.

writeConcern: write concern; specifies what state a write must reach before returning, e.g. unacknowledged, acknowledged, or replicated to a majority of nodes.

readConcern: read concern; specifies the version state of the data read, e.g. local, majority, or linearizable. (See the sketch after the examples below.)

Different settings yield different choices between C (consistency) and A (availability), for example:

Set the read preference to primary: both reads and writes go to the primary node. This ensures consistency, but a primary failure makes reads fail too (reduced availability).

Set the read preference to secondaryPreferred: writes go to the primary while reads prefer secondaries, improving availability, but the data read may lag (inconsistency).

Set both read and write concern to majority: consistency improves, but availability drops (node failures cause majority writes to fail).
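
A minimal shell sketch of these knobs (the collection is hypothetical):

// wait until a majority of members acknowledge the write
db.book.insert({title: "t"}, {writeConcern: {w: "majority"}})

// prefer reading from a secondary when one is available
db.book.find().readPref("secondaryPreferred")

// read only data committed to a majority of members
db.book.find().readConcern("majority")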

This tradeoff has always been debated. Besides offering these options, MongoDB mainly relies on replication and heartbeat-based automatic failover to reduce the impact of failures and raise overall availability.

At this point, the study of "What is MongoDB?" is over. Hopefully it has resolved your doubts. Pairing theory with practice helps you learn better, so go and try it!
