2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to implement sharding in MongoDB. The content is concise and easy to follow, and I hope you take something useful away from the detailed introduction below.
1. A brief introduction to sharding
Sharding is the process of splitting data up and distributing it across different machines; it is sometimes called partitioning. By scattering data across machines, you can store more data and handle heavier loads without needing a single powerful mainframe.
Almost any database software can be sharded manually: the application maintains connections to several different database servers, each of which is completely independent. The application manages different data on different servers, and storage and retrieval must be directed at the correct server. This approach can work well, but it is difficult to maintain: adding nodes to or removing nodes from the cluster, and adjusting the data distribution and load patterns, is not easy.
MongoDB supports autosharding, which frees you from manual sharding management: the cluster splits the data automatically and balances the load.
2. MongoDB's automatic sharding
The basic idea of MongoDB sharding is to divide a collection into small chunks. These chunks are scattered across several shards, each of which is responsible for only a portion of the total data. The application does not need to know which shard holds which data, or even that the data has been split at all, because a routing process named mongos runs in front of the shards. The router knows where all the data is located, so the application can connect to it and issue requests normally. As far as the application is concerned, it is connected to an ordinary mongod. The router knows which chunks live on which shards and forwards each request to the correct shard; when the responses come back, the router collects them and sends them on to the application.
Without sharding, the client connects to a mongod process; with sharding, the client connects to a mongos process. mongos hides the sharding details from the application: from the application's point of view there is no difference between a sharded and an unsharded deployment, so when you need to scale out, you don't have to modify the application's code.
Client connections without sharding:
Client connections with sharding:
When do you need sharding?
a. The machine does not have enough disk space.
b. A single mongod can no longer keep up with the performance demands on the data.
c. You want to keep a larger amount of data in memory to improve performance.
Generally speaking, start without sharding, and convert to a sharded deployment when it becomes necessary.
3. Shard keys
When you set up sharding, you need to select a key from the collection; MongoDB uses the values of that key to split the data. This key is called the shard key.
Suppose there is a collection of documents representing people. If you choose "name" as the shard key, one shard might hold the documents whose names begin with A-F, the next the documents beginning with G-P, and the last the documents beginning with Q-Z. As shards are added or removed, MongoDB rebalances the data so that each shard's traffic is fairly even and the amount of data on each stays within a reasonable range (for example, a shard receiving heavy traffic may end up holding less data than one receiving light traffic).
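The range-to-shard mapping just described can be sketched as a small simulation. This is illustrative Python, not MongoDB's actual implementation; the chunk boundaries and shard names are assumptions taken from the A-F / G-P / Q-Z example above.

```python
# Illustrative chunk table: each shard owns a contiguous range of the
# "name" shard key (lower bound inclusive, upper bound exclusive).
CHUNKS = [
    ("A", "G", "shard-1"),  # names starting with A-F
    ("G", "Q", "shard-2"),  # names starting with G-P
    ("Q", "[", "shard-3"),  # names starting with Q-Z ("[" sorts just after "Z")
]

def route(name):
    """Return the shard responsible for the given name."""
    for low, high, shard in CHUNKS:
        if low <= name < high:
            return shard
    raise ValueError(f"no chunk covers {name!r}")

print(route("Alice"))     # shard-1
print(route("Refactor"))  # shard-3
```

Rebalancing, in this picture, is just moving a boundary or reassigning a range to a different shard; clients never see the change because the router consults the chunk table on every request.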
4. Sharding an existing collection
Suppose there is a collection of log data that we now want to shard. We enable sharding and tell MongoDB to use "timestamp" as the shard key; initially, all of the data sits on a single shard. You can insert data freely, but it all lands on that one shard.
Then we add a shard. Once that shard is up and running, MongoDB splits the collection into two chunks, each containing the documents whose shard key values fall within a contiguous range: for example, one chunk holds the documents timestamped before 2011-11-11 and the other the documents timestamped after it. One of these chunks is then moved to the new shard. If a new document's timestamp is before 2011-11-11, it is added to the first chunk; otherwise, it goes to the second.
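The split-and-migrate sequence above can be sketched as a toy model. The shard names ("shard-A", "shard-B") are assumptions for illustration; this is not MongoDB's balancer, it only shows how one chunk becomes two and how routing changes after a chunk moves.

```python
from datetime import datetime

# One chunk initially covers the whole timestamp range on shard-A.
chunks = [{"min": datetime.min, "max": datetime.max, "shard": "shard-A"}]

def split(chunks, at):
    """Split the chunk containing `at` into two chunks at that boundary."""
    for i, c in enumerate(chunks):
        if c["min"] <= at < c["max"]:
            chunks[i:i + 1] = [
                {"min": c["min"], "max": at, "shard": c["shard"]},
                {"min": at, "max": c["max"], "shard": c["shard"]},
            ]
            return

split(chunks, datetime(2011, 11, 11))
chunks[1]["shard"] = "shard-B"  # the balancer migrates the newer chunk

def route(ts):
    """Return the shard holding documents with this timestamp."""
    return next(c["shard"] for c in chunks if c["min"] <= ts < c["max"])

print(route(datetime(2010, 1, 1)))  # shard-A (before 2011-11-11)
print(route(datetime(2012, 1, 1)))  # shard-B (after 2011-11-11)
```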
5. Ascending shard keys vs. random shard keys
The choice of shard key determines how insert operations are distributed across the shards.
If you select a key such as "timestamp", whose value keeps growing, all of the data will be sent to a single shard: the one whose chunk contains the dates after 2011-11-11. Even if you add a new shard and the data is split again, inserts still flow to a single server: MongoDB splits the post-2011-11-11 chunk into, say, 2011-11-11 through 2021-11-11, and any document timestamped after 2021-11-11 is inserted into the last chunk. This is unsuitable for high write loads, although querying by the shard key remains very efficient.
If the write load is high and you want to distribute it evenly across the shards, you have to choose a shard key whose values are evenly distributed. In the log example, a hash of the timestamp, or a key with no inherent pattern such as "logMessage", both satisfy this condition.
Whether the shard key jumps around randomly or increases steadily, it is important that its values vary widely. For example, if you have a "logLevel" key with only three values, "DEBUG", "WARN" and "ERROR", MongoDB can under no circumstances use it to divide the data into more than three chunks (because there are only three values). If a key varies too little but you still want to use it as a shard key, you can combine it with a key that does vary to create a compound shard key, such as the combination of "logLevel" and "timestamp".
Selecting a shard key is much like creating an index, so the principles behind the two are similar. In fact, the shard key is often the collection's most frequently used index.
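To see why an ever-increasing key funnels writes to one shard while a hashed key spreads them, here is a rough simulation. The shard count, the value range, and the use of md5 as the hash are assumptions for illustration only.

```python
import hashlib
from collections import Counter

N_SHARDS = 3
timestamps = range(1_000_000, 1_001_000)  # 1000 monotonically increasing values

# Ascending key: every new value falls past the last split point, so every
# insert lands in the final (highest-range) chunk.
last_split = 1_000_000
ascending = Counter(N_SHARDS - 1 if t >= last_split else 0 for t in timestamps)

# Hashed key: hashing the value scatters inserts roughly evenly.
def hashed_shard(value):
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

hashed = Counter(hashed_shard(t) for t in timestamps)

print(dict(ascending))  # all 1000 inserts hit the last shard
print(dict(hashed))     # roughly 333 inserts per shard
```

Note the trade-off visible here: the hashed key balances writes, but a range query over recent timestamps must then touch every shard.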
6. How the shard key affects operations
End users should not be able to tell whether a collection is sharded or not, but it is worth understanding how queries behave differently under different shard keys.
Suppose the same collection of people, sharded on "name", with three shards whose name ranges together cover the initials A-Z. The following queries are executed in different ways:
db.people.find({"name": "Refactor"})
mongos sends this query directly to the Q-Z shard and, after getting the response, forwards it straight to the client.
db.people.find({"name": {"$lt": "L"}})
mongos first sends the query to the A-F and G-P shards, then forwards the results to the client.
db.people.find().sort({"email": 1})
mongos queries all of the shards and merge-sorts the results to ensure they come back in the correct order. mongos uses cursors to fetch data from the individual servers, so it does not have to wait for all of the data before sending batches of results to the client.
db.people.find({"email": "re@msn.cn"})
mongos does not track the "email" key, so it cannot know which shard to send the query to. It therefore sends the query to all of the shards in sequence.
If you insert a document, mongos sends it to the appropriate shard based on the value of its "name" key.
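The routing decisions above boil down to one rule: a query that constrains the shard key can be targeted at specific shards, while anything else must be scattered to every shard. A minimal sketch, under the assumed A-F / G-P / Q-Z chunk map (this is illustrative, not mongos internals):

```python
# Chunk map for the "name" shard key (lower bound inclusive, upper exclusive).
CHUNKS = {"shard-1": ("A", "G"), "shard-2": ("G", "Q"), "shard-3": ("Q", "[")}

def targets(query):
    """Return the shards a find() with this filter must be sent to."""
    if "name" not in query:
        return sorted(CHUNKS)  # scatter-gather: every shard
    cond = query["name"]
    if isinstance(cond, str):  # exact match: exactly one shard
        return [s for s, (lo, hi) in sorted(CHUNKS.items()) if lo <= cond < hi]
    if "$lt" in cond:          # range query: only shards below the bound
        return sorted(s for s, (lo, _) in CHUNKS.items() if lo < cond["$lt"])
    return sorted(CHUNKS)

print(targets({"name": "Refactor"}))    # ['shard-3']
print(targets({"name": {"$lt": "L"}}))  # ['shard-1', 'shard-2']
print(targets({"email": "re@msn.cn"}))  # all three shards
```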
7. Creating shards
Establishing a sharded deployment takes two steps: starting the actual servers, and deciding how to split the data.
A sharded deployment generally has three components:
a. Shards
A shard is a container holding a subset of the data. A shard can be a single mongod server (for development and testing) or a replica set (for production). A shard can therefore contain multiple servers, but only one of them is the primary; the other servers hold the same data.
b. mongos
mongos is the router process that ships with MongoDB. It routes all requests and then aggregates the results. It stores no data or configuration information itself, although it does cache the config servers' information.
c. Config servers
The config servers store the cluster's configuration: the mapping between data and shards. Since mongos stores nothing permanently, it needs somewhere to keep the sharding configuration, and it synchronizes that data from the config servers.
8. Starting the servers
The first step is to start a config server and mongos. The config server must be started first, because mongos will use its configuration information. A config server is started just like an ordinary mongod:
mongod --dbpath "F:\mongo\dbs\config" --port 20000 --logpath "F:\mongo\logs\config\MongoDB.txt" --rest
A config server does not need much space or many resources (200 MB of actual data takes up roughly 1 KB of configuration space).
Next, start the mongos process for applications to connect to. This routing server does not even need a data directory, but you must tell it where the config server is:
mongos --port 30000 --configdb 127.0.0.1:20000 --logpath "F:\mongo\logs\mongos\MongoDB.txt"
Shard administration is usually done through mongos.
Adding shards
A shard is just an ordinary mongod instance (or replica set):
mongod --dbpath "F:\mongo\dbs\shard" --port 10000 --logpath "F:\mongo\logs\shard\MongoDB.txt" --rest
mongod --dbpath "F:\mongo\dbs\shard1" --port 10001 --logpath "F:\mongo\logs\shard1\MongoDB.txt" --rest
Now connect to the mongos started above and add the shards to the cluster. Start a shell and connect to mongos, making sure you connect to mongos rather than a mongod, then add the shards with the addshard command:
> mongo 127.0.0.1:30000
mongos> db.runCommand({"addshard": "127.0.0.1:10000", "allowLocal": true})
Sat Jul 21 10:46:38 uncaught exception: error {
    "$err": "can't find a shard to put new db on", "code": 10185
}
mongos> use admin
switched to db admin
mongos> db.runCommand({"addshard": "127.0.0.1:10000", "allowLocal": 1})
{ "shardAdded": "shard0000", "ok": 1 }
mongos> db.runCommand({"addshard": "127.0.0.1:10001", "allowLocal": 1})
{ "shardAdded": "shard0001", "ok": 1 }
(Note that the first attempt fails because addshard must be run against the admin database.)
When running shards on the same machine as the rest of the cluster, you have to set the allowLocal key to 1. MongoDB tries to avoid clusters being configured locally by mistake, so allowLocal lets it know that this is just development and that we know exactly what we are doing. In a production environment, the shards should be deployed on different machines.
Whenever you want to add a shard, run addshard; MongoDB takes care of integrating the shard into the cluster.
Splitting the data
MongoDB does not distribute stored data automatically; the sharding feature has to be enabled at both the database level and the collection level.
If you mistakenly connect to a config server instead:
E:\mongo\bin> mongo 127.0.0.1:20000
MongoDB shell version: 2.0.6
connecting to: 127.0.0.1:20000/test
> use admin
switched to db admin
> db.runCommand({"enablesharding": "test"})
{ "errmsg": "no such cmd: enablesharding", "bad cmd": {"enablesharding": "test"}, "ok": 0 }
You should instead connect to the routing server (mongos) and run:
db.runCommand({"enablesharding": "test"})  // enables sharding on the test database
Once sharding is enabled on a database, its collections can be stored on different shards; this is also the precondition for sharding those collections.
After sharding is enabled at the database level, you can shard a collection with the shardcollection command:
db.runCommand({"shardcollection": "test.refactor", "key": {"name": 1}})  // shards the refactor collection of the test database, with "name" as the shard key
If you now add data to the refactor collection, it is automatically distributed across the shards according to the value of "name".
9. Production configuration
Once you move to a production environment, you need a more robust sharding setup. A successful sharded deployment requires the following:
Multiple config servers
Multiple mongos servers
Every shard being a replica set
A correctly set w value (write concern)
Robust configuration
Setting up multiple config servers is easy, and is done the same way as setting up a single one:
mongod --dbpath "F:\mongo\dbs\config" --port 20000 --logpath "F:\mongo\logs\config\MongoDB.txt" --rest
mongod --dbpath "F:\mongo\dbs\config1" --port 20001 --logpath "F:\mongo\logs\config1\MongoDB.txt" --rest
mongod --dbpath "F:\mongo\dbs\config2" --port 20002 --logpath "F:\mongo\logs\config2\MongoDB.txt" --rest
When starting mongos, connect it to all three config servers:
mongos --port 30000 --configdb 127.0.0.1:20000,127.0.0.1:20001,127.0.0.1:20002 --logpath "F:\mongo\logs\mongos\MongoDB.txt"
The config servers use a two-phase commit mechanism, rather than MongoDB's normal asynchronous replication, to keep the separate copies of the cluster configuration consistent. This guarantees a consistent view of the cluster's state. It also means that if a config server goes down, the cluster's configuration becomes read-only: clients can still read and write data, but the data cannot be rebalanced until all of the config servers are back up.
Multiple mongos
There is no limit to the number of mongos processes. The recommendation is to run one mongos process per application server, so that each application server talks to a local mongos, and if an application server dies, no application is left trying to talk to a mongos that no longer exists.
Robust shards
In a production environment, each shard should be a replica set, so that a single server failure does not take down the whole shard. A replica set can be added as a shard with the addshard command; when adding it, just specify the replica set's name and a seed server.
To add a replica set named refactor, which contains the server 127.0.0.1:10000 (among other servers), add it to the cluster with the following command:
db.runCommand({"addshard": "refactor/127.0.0.1:10000"})
If 127.0.0.1:10000 later crashes, mongos knows that it is connected to a replica set and will switch to the new primary.
10. Managing shards
The sharding information is mainly stored in the config database, so that it can be accessed by any process connected to a mongos.
Configuration collections
Connect a shell to mongos and run use config. Then:
a. Shards
All of the shards can be found in the shards collection:
db.shards.find()
b. Databases
The databases collection contains a list of the databases that live on the shards, plus some related information:
db.databases.find()
The fields of the returned documents:
"_id"
The database name.
"partitioned"
Whether the sharding feature is enabled.
"primary"
This value, together with "_id", indicates where the database's home base is. Sharded or not, a database always has a home base. For a sharded database, a shard is chosen at random when the database is created; in other words, the home base is where creation of the database's documents starts. Although a sharded database will go on to use many other shards, it begins on this one.
c. Chunks
Chunk information is stored in the chunks collection. It shows how the data has been divided up across the cluster:
db.chunks.find()
Sharding commands
Get a summary:
db.printShardingStatus()
Remove a shard
Use removeshard to remove a shard from the cluster. removeshard migrates all of the chunks on the given shard to the other shards:
db.runCommand({"removeshard": "127.0.0.1:10001"})
While the migration runs, removeshard reports its progress.
The above covers how to implement sharding in MongoDB. I hope you have picked up some useful knowledge or skills from it.