How to realize Cluster in MongoDB 07/16 Update SLTechnology News&Howtos

How to realize Cluster in MongoDB

2025-07-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article is about how to achieve clustering in MongoDB. The editor thinks it is very practical, so I hope you can get something after reading this article. Let's take a look at it with the editor.

Basic concept

Document: a document is the core concept of MongoDB and the basic unit of data, similar to rows in a relational database. In MongoDB, a document is represented as an ordered set of key-value pairs. Documents are generally marked with the following styles:

{"title": "hello!"} {"title": "hello!", "recommend": 5} {"title": "hello!", "recommend": 5, "author": {"firstname": "paul", "lastname": "frank"}

As you can see from the above example, the value of a document has different data types and can even be a complete embedded document (* an example author is a document)

Collection: a collection is a collection of documents, which is equivalent to a data table in a relational database. MongoDB database is not a relational database and there is no concept of schema. Documents in the same collection can take different forms. For example:

{"name": "jack", "age": 19} {"name": "wangjun", "age": 22, "sex": "1"}

Can exist in the same collection.

Database (database): multiple documents form a collection, and multiple collections form a database. A MongoDB instance can host multiple databases, and each database can have 0 to more collections.

The main goal of MongoDB is to combine the advantages of both key-value pair storage (which provides high performance and high scalability) and traditional RDBMS (relational database) systems. MongoDB applies to the following scenarios:

Website data: Mongo is very suitable for real-time insertion, update and query, and has the replication and high scalability required for website real-time data storage.

Caching: due to its high performance, Mongo is also suitable as a cache layer for the information infrastructure. After the system is rebooted, the persistence cache built by Mongo can avoid the overload of data sources in the lower layer.

Large-size, low-value data: it may be expensive to use traditional relational databases to store some data. before that, many programmers tend to choose traditional files for storage.

Highly scalable scenario: Mongo is ideal for databases consisting of dozens or hundreds of servers

For the storage of objects and JSON data: Mongo's BSON data format is very suitable for document format storage and query.

Of course, MongoDB also has unsuitable scenarios:

A highly transactional system, such as a banking or accounting system. At present, traditional relational databases are more suitable for applications that require a large number of transactions.

Traditional business intelligence applications: BI databases for specific problems can provide highly optimized query methods. For such applications, the data warehouse may be a more appropriate choice (such as Hive in the Hadoop suite).

A problem that requires SQL.

Cluster strategy

In a commercial environment, MongoDB is usually used in the form of clusters for high availability. It is very easy to build a cluster environment for MongoDB. Here is an introduction.

Master-slave mode

The mode that we widely use when using MySQL database is that after the master node dies, the slave node can replace the host to continue the service after the dual backup. So this model is much more reliable than a single node.

Let's take a look at how to build the master-slave replication node of MongoDB step by step:

1. Prepare two machines 10.43.159.56 and 10.43.159.58. 10.43.159.56 as the master node and 10.43.159.58 as the slave node.

two。 Download the MongoDB installer packages separately. Create a folder / data/MongoDBtest/master,10.43.159.58 establishment folder / data/MongoDBtest/slave on 10.43.159.56.

3. Start the MongoDB master node program at 10.43.159.56. Notice the following "- master" parameter, marking the primary node:

Mongod-dbpath / data/MongoDBtest/master-master

The output log is as follows. It is successful:

[initandlisten] MongoDB starting: pid=18285 port=27017 dbpath=/data/MongoDBtest/master master=1

4. Start the MongoDB slave node program at 10.43.159.58. Key configuration: specify the ip address and port of the master node-source 10.43.159.56 slave 27017 and mark the slave node-port parameter:

Mongod-dbpath / data/MongoDBtest/slave-slave-source 10.43.159.56 data/MongoDBtest/slave-slave 27017

The output log is as follows. It is successful:

[initandlisten] MongoDB starting: pid=17888port=27017 dbpath=/data/MongoDBtest/slave slave=1

The log shows that the slave node replicates data synchronously from the master node:

[replslave] repl: from host: 10.43.159.56:27017

In this way, the MongoDB cluster with master-slave structure is set up, isn't it very simple?

Let's see what this cluster can do. Log in to the slave node shell first and perform the insert data:

Mongo 127.0.0.1 test3 27017 > db.testdb.insert ({"test3": "testval3"}); not master

You can see that the slave node of MongoDB can only be read and cannot perform write operations.

So if the master server dies, can the slave server take over?

You can try to force the MongoDB process on the master node to shut down, log in to the slave node, and insert the data again:

> db.testdb.insert ({"test3": "testval3"}); not master

It seems that the slave node does not automatically replace the work of the master node, so it can only be handled manually, stop the slave node, and then start the slave node in the way of master. Because the data on the slave node is the same as the master node, the slave node can replace the master node at this time, which belongs to manual switching.

In addition, we can build multiple slave nodes to achieve the separation of database read and write, such as the master node is responsible for writing, multiple slave nodes are responsible for reading, for mobile APP, most of the operations are read operations, which can achieve load sharing.

So, can the cluster with this master-slave structure be able to cope with the commercial environment? We found that there are still several problems that need to be solved:

Can I switch connections automatically when the primary node is dead? Currently, manual switching is required.

How to solve the problem of excessive writing pressure on the master node?

Each of the above data from the slave node is a full copy of the database, will the pressure from the slave node be too great?

Can the route access policy be automatically extended even if the slave node route is implemented?

Solving these problems depends on the replica set pattern described below.

Replica mode

MongoDB officials no longer recommend the use of master-slave mode, the alternative is to use the replica set mode, so what is the replica set? To put it simply, the replica set is a master-slave cluster with automatic fault recovery, or the master-slave mode is actually a single-copy application, without good scalability and fault tolerance. The replica set has multiple copies to ensure fault tolerance, even if a copy is dead, there are still many copies, and even better, the replica set is automated in many places, and it does a lot of management work for you. Smart readers have found that the manual switching of master-slave mode has been solved, no wonder MongoDB officials strongly recommend this mode. Let's look at the architecture diagram of the MongoDB replica set:

From the figure, you can see that the client connects to the entire replica set, regardless of which machine is down. The master server is responsible for reading and writing the whole replica set, and the replica set synchronizes data backup periodically. Once the master node dies, the replica node will elect a new master server, which does not need to be concerned about the application server. Let's take a look at the architecture after the primary server is down:

After the replica node in the replica set detects that the primary node is dead through the heartbeat mechanism, it will initiate the primary node election mechanism in the cluster and automatically elect a new master server. So Cool! Let's hurry up and deploy!

The officially recommended number of replica set machines is at least 3 (officially, the number of replica sets * * is odd), so we will also configure and test according to this number.

1. Prepare three machines 10.43.159.56, 10.43.159.58 and 10.43.159.60. 10.43.159.56 as the master node of the replica set, and 10.43.159.58 and 10.43.159.60 as the replica node of the replica set.

two。 Create a MongoDB replica set test folder on each machine

3. Download the installer package that installs MongoDB

4. Start MongoDB on each machine separately

Give your copy set a name, such as test:

/ data/MongoDBtest/MongoDB-linux-x86_64-2.4.8/bin/mongod-dbpath / data/MongoDBtest/replset/data-replSet test

You can see from the log that the replica set has not been initialized.

5. Initialize replica set

/ data/MongoDBtest/MongoDB-linux-x86_64-2.4.8/bin/mongo

Use the admin database:

Use admin

Define the replica set configuration variable, where the _ id: "test" is consistent with the above command parameter "- replSet test":

Config = {_ id: "test", members: [. {_ id:0,host: "10.43.159.56 test 27017"},. {_ id:1,host: "10.43.159.58 test 27017"},. {_ id:2,host: "10.43.159.60 test"}].}

Initialize the replica set configuration:

Rs.initiate (config)

Output successful:

{"info": "Config now saved locally. Should come online in about a minute.", "ok": 1}

Check the log. After the replica set is started successfully, 56 is the primary node PRIMARY,58 and 60 is the replica node SECONDARY. Note that this is the primary node elected by the three nodes, which is random to a certain extent.

View the status of the cluster node:

Rs.status ()

The entire replica set has been built successfully. Is it super simple?

The MongoDB of replica set mode is not only easy to build, but also powerful. Now let's look back at whether this mode can solve the problem we left behind: can the primary node automatically switch connections when the master node is dead?

First test whether the replica set data replication function is normal.

First insert the data on the primary node 56, and then view the data on the replica node and find that the log reports an error:

Error: {"$err": "not master and slaveOk=false", "code": 13435} at src/mongo/shell/query.js:128

This is because by default only data is read and written from the primary node, and the copy is not allowed to be read, as long as the copy is set to be readable. Execute: rs.slaveOk () on the replica node, then query the data and find that the data of the primary node has been synchronized.

Re-test the failover function of the replica set

Stop the process on master node 56 first, and you can see that the log on nodes 58 and 60 shows the voting process. Then execute rs.status () to see the cluster status update, 56 as unreachable, 58 as the master, and 60 as the replica. After starting 56 nodes, it is found that 58 is still the primary node and 56 becomes the replica node. In this way, the problem of automatic failover is solved.

So, for the master node read and write pressure is too great, how to solve it? The common solution is read-write separation. What about the read-write separation of MongoDB replica sets?

Look at the picture and say:

For mobile APP scenarios, write operations are usually far less than read operations, so one master node is responsible for writing and two replica nodes are responsible for reading. The client can choose which node to read from. There are five types of data reading parameters (Primary, PrimaryPreferred, Secondary, SecondaryPreferred, Nearest):

Primary: default parameter, read only from the primary node

PrimaryPreferred: most of the data is read from the primary node, and only from the Secondary node if the primary node is not available.

Secondary: read only from the Secondary node. The problem is that the Secondary node's data is "older" than the Primary node's data.

SecondaryPreferred: priority is given to reading from the Secondary node, and reading data from the master node when the Secondary node is not available

Nearest: no matter whether it is the master node or the Secondary node, read data from the node with network delay.

In a typical replica set network, there are not only replica nodes, but also other roles, such as arbitration nodes, as shown below:

The arbitration node does not store data, but only votes in the group responsible for failover, which reduces the pressure of data replication. In addition, there are Secondary-Only, Hidden, Delayed, Non-Voting and other roles.

Secondary-Only: cannot be a Primary node, but can only be a Secondary replica node to prevent some low-performance nodes from becoming master nodes.

Hidden: this type of node cannot be referenced by the client IP, nor can it be set as the master node, but it can vote and is generally used to back up data.

Delayed: you can specify a time delay to synchronize data from the Primary node. Mainly used for backup data, if real-time synchronization, mistakenly deleted data immediately synchronized to the slave node, recovery can not be restored.

Non-Voting: a Secondary node that does not have the right to vote, a pure backup data node.

The above is how to achieve clustering in MongoDB. The editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.