

The concept and principle of MongoDB sharding: hands-on with a sharded cluster


I. Sharding

Sharding is a method of distributing data across multiple machines. MongoDB uses sharding to support deployments with very large datasets and high-throughput operations.

The problem:

Database systems with large datasets or high-throughput applications can strain the capacity of a single server. For example, a high query rate can exhaust the server's CPU capacity, and working set sizes larger than the system's RAM stress the I/O capacity of the disk drives.

There are two ways to solve system growth: vertical scaling and horizontal scaling.

Vertical scaling means increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or adding more storage. Limitations in available technology may restrict how powerful a single machine can be for a given workload, and cloud providers impose hard upper limits based on the hardware configurations they offer. As a result, vertical scaling has a practical maximum.

Horizontal scaling means dividing the system's dataset and load over multiple servers, adding servers as needed to increase capacity. Although the overall speed or capacity of a single machine may be modest, each machine handles only a subset of the overall workload, which can be more efficient than a single high-speed, high-capacity server. Expanding capacity only requires adding servers as needed, which may cost less overall than high-end hardware for a single machine. The tradeoff is increased complexity in infrastructure and deployment maintenance.

MongoDB supports horizontal scaling through sharding

II. Overview of sharding clusters

The MongoDB sharding cluster consists of the following components:

Shard: an ordinary, independent mongod process that stores the data. It can be a replica set or a standalone server.

Mongos: the router that applications connect to. It stores no data itself and loads the cluster metadata from the config servers at startup. To start the mongos process you need the config servers' addresses, specified with the configdb option.

Config server: a separate mongod process that stores the cluster and shard metadata, that is, which data each shard holds. Start the config servers first, with journaling enabled; they are started like a normal mongod, with the configsvr option. They need little space or resources: roughly 1 KB of config server data corresponds to about 200 MB of real data, since only the distribution table of the data is kept. When the config servers are unavailable, the cluster metadata becomes read-only and chunks can no longer be split or migrated.

The interaction diagram is as follows:

MongoDB shards the data at the collection level and distributes the collection data among the shards in the cluster.

1. Shard key

In order to distribute documents in the collection, MongoDB uses the sharding key to partition the collection. The sharding key consists of immutable fields that exist in each document in the target collection.

The shard key is chosen when sharding the collection and cannot be changed afterwards. A sharded collection can have only one shard key. To shard a non-empty collection, the collection must have an index that begins with the shard key; for an empty collection, MongoDB creates the index if a suitable one for the specified shard key does not already exist.
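A minimal sketch of these rules (the database, collection, and field names here are hypothetical, not from this walkthrough): sharding a non-empty collection requires an index that starts with the shard key, which can be created before calling sh.shardCollection.

mongos> sh.enableSharding("testdb")
mongos> db.getSiblingDB("testdb").people.createIndex({ name: 1 })   // required if the collection already contains data
mongos> sh.shardCollection("testdb.people", { name: 1 })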

The selection of sharding keys will affect the performance, efficiency and scalability of sharding clusters. Clusters with the best hardware and infrastructure may suffer bottlenecks due to the selection of sharding keys. Selecting the sharding key and its supporting index also affects the sharding strategy that the cluster can use.

2. Chunks

MongoDB partitions the sharded data into chunks. Each chunk has an inclusive lower bound and an exclusive upper bound based on the shard key.
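For a concrete look at chunk boundaries (a hedged sketch; the namespace is hypothetical), the chunk metadata in MongoDB 4.0 can be read from the config database, where each document records the min (inclusive) and max (exclusive) shard key values and the owning shard:

mongos> use config
mongos> db.chunks.find({ ns: "testdb.people" }, { min: 1, max: 1, shard: 1 }).pretty()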

3. Balancer and even chunk distribution

To achieve an even distribution of chunks across all shards in the cluster, the balancer runs in the background and moves chunks between shards.

4. Advantages of Sharding

(1) Reads and writes

MongoDB distributes the read and write workload across the shards in a sharded cluster, so that each shard handles a subset of the cluster's operations. Read and write workloads can be scaled horizontally by adding more shards.

For queries that include the shard key or a prefix of a compound shard key, mongos can target the query at a specific shard or set of shards. These targeted operations are generally more efficient than broadcasting to every shard in the cluster.
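As a rough illustration (the collection and field names are hypothetical, not from this walkthrough), explain() run through mongos reports whether a query was targeted or broadcast; its queryPlanner output lists the shards that were contacted:

mongos> db.people.find({ name: "alice" }).explain()        // filter includes the shard key: targeted to one shard
mongos> db.people.find({ age: { $gt: 30 } }).explain()     // no shard key in the filter: scatter/gather across shards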

(2) Storage capacity

Sharding distributes the data across the shards in the cluster, so each shard contains only a subset of the total cluster data. As the dataset grows, additional shards increase the storage capacity of the cluster.

(3) High availability

A sharded cluster can continue to perform partial read/write operations even if one or more shards are unavailable. Although the subset of data on the unavailable shards cannot be accessed during the downtime, reads and writes directed at the available shards still succeed.

Note:

(1) Starting with MongoDB 3.2, the config servers can be deployed as a replica set. A sharded cluster with a config server replica set (CSRS) can continue to process reads and writes as long as a majority of the replica set members are available.

(2) In version 3.4, MongoDB removed support for SCCC (mirrored) config servers.

(3) In a production environment, shards should be deployed as replica sets to provide greater redundancy and availability.

5. Sharded cluster considerations

(1) The infrastructure requirements and complexity of a sharded cluster call for careful planning, implementation, and maintenance.

(2) The choice of shard key needs careful consideration to ensure cluster performance and efficiency. After sharding, you can neither change the shard key nor un-shard the collection.

(3) Sharding has certain operational requirements and restrictions.

(4) If a query does not include the shard key or a prefix of a compound shard key, mongos performs a broadcast operation and queries every shard in the cluster. These scatter/gather queries can be long-running operations.

6. Sharding strategies

MongoDB supports two sharding strategies for distributing data across sharding clusters.

(1) Hashed sharding

Hashed sharding computes a hash of the shard key field's value; each chunk is then assigned a range based on the hashed shard key values.

(2) Ranged sharding

Ranged sharding divides the data into ranges based on the shard key values; each chunk is then assigned a range based on the shard key values.
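The two strategies differ only in the shard key specification passed to sh.shardCollection (a minimal sketch; the namespaces are hypothetical):

mongos> sh.shardCollection("testdb.logs", { _id: "hashed" })   // hashed sharding
mongos> sh.shardCollection("testdb.people", { age: 1 })        // ranged sharding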

7. MongoDB sharded cluster roles

(1) Shard Server

Each Shard Server stores the actual data and can be a single mongod instance or a group of mongod instances forming a Replica Set. To implement auto-failover within each shard, the official recommendation is that each Shard be a Replica Set.

(2) Config Server

To store a particular collection across multiple shards, you need to specify a shard key for that collection, for example {age: 1}; the shard key determines which chunk a record belongs to. Config Servers store the configuration information of all shard nodes: the shard key range of each chunk, the distribution of chunks across the shards, and the sharding configuration of every DB and collection in the cluster.

(3) Route Process

This is the front-end routing process that clients connect to. It asks the Config Servers which Shard to query or save a record on, then connects to the corresponding Shard to perform the operation and finally returns the result to the client. The client simply sends the queries or update requests it would normally send to mongod to the Routing Process unchanged, without caring which Shard the records are stored on.

III. Sharded cluster setup

1. Environment preparation

(1) Database environment

Hostname    | Database IP address | Database version | Role                                 | System
SQL_mongdb  | 172.169.18.128      | MongoDB 4.0.3    | 3 config servers, 1 router, 2 shards | CentOS 7.4
Node01      | 172.169.18.162      | MongoDB 4.0.3    | 1 router, 2 shards                   | CentOS 7.4
Node02      | 172.169.18.180      | MongoDB 4.0.3    | 2 shards                             | CentOS 7.4

(2) Temporarily disable the firewall and SELinux; re-enable the security rules after testing.

(3) Install MongoDB 4.0 via yum (steps omitted).

Similarly, install the service on node01 and node02.

Understand shard keys before deploying; a good shard key is very important for sharding. The shard key must be an indexed field (or fields) by which the data is split and distributed; sh.shardCollection creates the index automatically if needed. A monotonically increasing shard key is bad for writes and for even data distribution, because writes always land on one shard and only spill over to another shard after a threshold is reached, although querying by such a key can be very efficient. A random (for example hashed) shard key distributes data evenly. Try to avoid queries that span multiple shards: when a query runs on all shards, mongos must merge and sort the results.

(4) Architecture diagram

2. Config server setup (three instances on SQL_mongdb, ports 21000, 22000, and 23000)

Note: you must run 1 or 3 config servers. Running 2 reports an error:

(1) create a directory

[root@SQL_mongdb /]# mkdir -p /opt/mongodb/date1
[root@SQL_mongdb /]# mkdir -p /opt/mongodb/date2
[root@SQL_mongdb /]# mkdir -p /opt/mongodb/date3

(2) New configuration file

[root@SQL_mongdb ~]# cat /etc/mongodb_21000.conf
# data directory
dbpath=/opt/mongodb/date1/
# log file
logpath=/opt/mongodb/mongodb_21000.log
# append to the log
logappend=true
# port
port=21000
# maximum number of connections
maxConns=50
pidfilepath=/opt/mongodb/mongo_21000.pid
# journal (redo log)
journal=true
# journal commit interval (ms)
journalCommitInterval=200
# run as a daemon
fork=true
# how often data is flushed to disk (seconds)
syncdelay=60
# storageEngine=wiredTiger
# oplog size (in MB)
oplogSize=1000
# namespace file size: 16 MB by default, 2 GB maximum
nssize=16
noauth=true
unixSocketPrefix=/tmp
configsvr=true
replSet=jiangjj
bind_ip=172.169.18.128

[root@SQL_mongdb ~]# cat /etc/mongodb_22000.conf
# data directory
dbpath=/opt/mongodb/date2/
# log file
logpath=/opt/mongodb/mongodb_22000.log
# append to the log
logappend=true
# port
port=22000
# maximum number of connections
maxConns=50
pidfilepath=/opt/mongodb/mongo_22000.pid
# journal (redo log)
journal=true
# journal commit interval (ms)
journalCommitInterval=200
# run as a daemon
fork=true
# how often data is flushed to disk (seconds)
syncdelay=60
# storageEngine=wiredTiger
# oplog size (in MB)
oplogSize=1000
# namespace file size: 16 MB by default, 2 GB maximum
nssize=16
noauth=true
unixSocketPrefix=/tmp
configsvr=true
replSet=jiangjj
bind_ip=172.169.18.128

[root@SQL_mongdb ~]# cat /etc/mongodb_23000.conf
# data directory
dbpath=/opt/mongodb/date3/
# log file
logpath=/opt/mongodb/mongodb_23000.log
# append to the log
logappend=true
# port
port=23000
# maximum number of connections
maxConns=50
pidfilepath=/opt/mongodb/mongo_23000.pid
# journal (redo log)
journal=true
# journal commit interval (ms)
journalCommitInterval=200
# run as a daemon
fork=true
# how often data is flushed to disk (seconds)
syncdelay=60
# storageEngine=wiredTiger
# oplog size (in MB)
oplogSize=1000
# namespace file size: 16 MB by default, 2 GB maximum
nssize=16
noauth=true
unixSocketPrefix=/tmp
configsvr=true
replSet=jiangjj
bind_ip=172.169.18.128

(3) Start the config server instances

[root@SQL_mongdb]# mongod -f /etc/mongodb_21000.conf
[root@SQL_mongdb]# mongod -f /etc/mongodb_22000.conf
[root@SQL_mongdb]# mongod -f /etc/mongodb_23000.conf

Note:

To stop a process:

# mongod -f /etc/mongodb_21000.conf --shutdown

If the process exits abnormally, restarting it reports an error; the mongod.lock file must be deleted before it can start successfully.

(4) configure nodes to form a cluster (replica set)

Initiate the replica set from any node; here the SQL_mongdb node is used.

Log into the database

[root@SQL_mongdb]# mongo --host 172.169.18.128 --port 21000

> use admin

> cfg = { _id: "jiangjj", members: [ { _id: 0, host: '172.169.18.128:21000', priority: 3 }, { _id: 1, host: '172.169.18.128:22000', priority: 2 }, { _id: 2, host: '172.169.18.128:23000', priority: 1 } ] }

# apply the configuration
> rs.initiate(cfg)
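To confirm that the config server replica set initialized correctly, the standard status helpers can be run from the same shell (a quick check, not part of the original steps):

> rs.status()
> rs.conf()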

3. Routing configuration (one router on SQL_mongodb and one on node01, port 30000)

The routing server does not store data; it only writes logs.

(1) add a configuration file to SQL_mongodb

[root@SQL_mongdb ~]# vim /etc/mongodb_30000.conf
# log file
logpath=/opt/mongodb/mongodb_route.log
# append to the log
logappend=true
# port
port=30000
# maximum number of connections
maxConns=20000
# chunkSize=1
# bind address
bind_ip=0.0.0.0
pidfilepath=/opt/mongodb/mongo_30000.pid
# must list 1 or 3 config servers
configdb=jiangjj/172.169.18.128:21000,172.169.18.128:22000,172.169.18.128:23000
# configdb=127.0.0.1:20000   # reports an error
# run as a daemon
fork=true

(2) Start mongos

[root@SQL_mongdb]# mongos -f /etc/mongodb_30000.conf

(3) Configure an ordinary shard

View the status.

Following the same method, start the shard service and routing service on node01 (the configuration files are the same), and start the shard service on node02. With that, the config servers, routing servers, and shard servers for this cluster have all been deployed.

Note: one shard's configuration is shown below; the others are similar.

[root@node01 etc]# vim mongodb_60000.conf
# mongodb
dbpath=/opt/mongodb/date2/
logpath=/opt/mongodb_60000.log
pidfilepath=/opt/mongodb/mongodb_60000.pid
directoryperdb=true
logappend=true
bind_ip=172.169.18.162
port=60000
oplogSize=100
fork=true
noprealloc=true

4. Configure the shards: the following operations are performed in the mongo shell.

(1) Log in to the routing server mongos:

[root@SQL_mongdb]# mongo --port 30000

mongos> use admin
mongos> db.runCommand({ addshard: '172.169.18.128:60000' })
mongos> db.runCommand({ addshard: '172.169.18.162:60000' })
mongos> db.runCommand({ addshard: '172.169.18.180:60000' })

# view the result
mongos> sh.status()
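Equivalently, the sh.addShard() helper and the listShards command can be used to add and list shards (a sketch using the same hosts as above):

mongos> sh.addShard('172.169.18.128:60000')
mongos> db.adminCommand({ listShards: 1 })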

(2) Enable sharding: sh.enableSharding("database name"), sh.shardCollection("database.collection", { "key": 1 })

mongos> sh.enableSharding('jiangjj')
mongos> sh.status()
mongos> sh.shardCollection("jiangjj.text", { "name": 1 })

The following error is reported:

Cannot accept sharding commands if not started with --shardsvr

Add the following parameter to the shard's configuration file:

shardsvr=true

Restart the process and run the command above again.

View: mongos> sh.status()

View details:

mongos> sh.status({ "verbose": 1 })
or
mongos> db.printShardingStatus("vvvv")
or
mongos> printShardingStatus(db.getSisterDB("config"), 1)

# use these to determine whether sharding is in effect

5. Test and verify

1. Add a database and data on the mongos routing side, as follows.

2. The added databases are assigned to different shard nodes.

Note: the shards built here are single points; if a shard fails, its data is lost. Replica sets can be used to mitigate this.
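A minimal sketch of the test in step 1, inserting documents through mongos into the jiangjj.text collection sharded above and then checking how they are spread across the shards (the document count is arbitrary):

mongos> use jiangjj
mongos> for (var i = 0; i < 10000; i++) { db.text.insert({ name: "user" + i, age: i % 100 }) }
mongos> db.text.getShardDistribution()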

IV. High availability: Sharding + Replica Set

1. Add one shard member on each of the three nodes, port 50000, replica set name user01.

The configuration of one member is shown below; modify and adjust it according to the actual situation.

# mongodb
dbpath=/opt/mongodb/date1/
logpath=/opt/mongodb_50000.log
pidfilepath=/opt/mongodb/mongodb_50000.pid
# keyFile=/opt/mongodb/mongodb.key   # key file for authentication between nodes; contents must be identical on every node, permissions 600; only valid in replica set mode
directoryperdb=true
logappend=true
replSet=user01
bind_ip=172.169.18.162
port=50000
# auth=true
oplogSize=100
fork=true
noprealloc=true
# maxConns=4000
shardsvr=true

2. Configure the shards as a replica set

Log into the database

[root@SQL]# mongo --host 172.169.18.128 --port 50000

> use admin

> user01db = { _id: "user01", members: [ { _id: 0, host: '172.169.18.128:50000', priority: 3 }, { _id: 1, host: '172.169.18.162:50000', priority: 2 }, { _id: 2, host: '172.169.18.180:50000', priority: 1 } ] }

# apply the configuration
> rs.initiate(user01db)

3. Configure routing nodes

[root@SQL_mongdb]# mongo --port 30000

Switch to the admin database:

mongos> use admin

# add the shard (replica set) node
mongos> sh.addShard("user01/172.169.18.128:50000")

Note:

# you can also add the entire replica set directly on the routing node
mongos> sh.addShard("user01/172.169.18.128:50000,172.169.18.162:50000,172.169.18.180:50000")

# view cluster information
mongos> sh.status()

Supplementary note: the balancer

Balancer: the balancer is responsible for data migration. It periodically checks whether the shards are out of balance and, if they are, starts migrating chunks. The state field in the config.locks collection indicates whether the balancer is running: 0 means inactive, 2 means balancing is in progress. Balancing and migrating data increases the load on the system: the target shard must query all the documents of the chunk from the source shard and insert them, and the source shard then deletes its copy of the data. The balancer can be turned off, but this is not recommended: turning it off leads to uneven data distribution across shards and inefficient use of disk space.

View the status: mongos> sh.getBalancerState()

Stop the balancer: mongos> sh.stopBalancer()

Start the balancer: mongos> sh.setBalancerState(true)
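The config.locks state mentioned above can also be inspected directly from mongos (a hedged sketch; on newer versions the balancer lock is held permanently by the config server primary):

mongos> use config
mongos> db.locks.find({ _id: "balancer" }).pretty()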

4. Add two databases, user01 and user02, through mongos, and insert data to test the replica set.

They are assigned to different shard nodes; check the data on each.

5. Migrate a sharded database and remove a shard

When removing a shard, you must ensure that the data on that shard is moved to other shards. For sharded collections, the balancer migrates the chunks. For unsharded collections, you must move the collection's primary shard.

1) Migrate the database jiangjj04 from the shard0001 shard to the user01 shard (change its primary shard)

mongos> use admin
mongos> db.runCommand({ movePrimary: "jiangjj04", to: "user01" })

As follows:

2) Remove a shard:

# must be executed against the admin database (and needs to be run twice)
mongos> db.runCommand({ "removeshard": "jiangjj01" })

Note: the last remaining shard in MongoDB cannot be removed; otherwise the following error is reported

3) Drop a database on the shard

mongos> use jsqdb
mongos> db.dropDatabase()

V. Cluster user authentication settings

1. Create a verification key file

The role of the keyFile: security authentication between cluster members. Enabling keyFile authentication implicitly enables auth authentication as well, so to make sure I could still log in afterwards, I created a user first.

# cd /opt/mongodb/
# touch .keyFile
# chmod 600 .keyFile
# openssl rand -base64 102 > .keyFile

102 is the number of random bytes to encode into the key file.

Note: before creating a keyFile, you need to stop the mongod service of all master and slave nodes in the replica set, and then create it, otherwise the service may not start.

2. Add a root account to the Mongos node

mongos> use admin
mongos> db.createUser({
...   user: 'root',
...   pwd: '123456',
...   roles: [ { role: 'root', db: 'admin' } ]
... })

3. Update all config server and shard node configuration files; in the routing (mongos) configuration file, configure only the keyFile parameter.

keyFile=/home/data/.keyFile
auth=true

4. Start the replica set and test

Login authentication

Root user

Now only the two routing nodes (port 30000) can access the database; accessing the other nodes produces the error shown in the figure below.

Configure users for jiangjj01 on the mongos node

Add a jiangjj user with read and write permission

Login to Mongos node was verified successfully.
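A rough sketch of what the login and the jiangjj01 user setup above might look like (the jiangjj password is a placeholder, not given in the original text):

[root@SQL_mongdb]# mongo --port 30000 -u root -p 123456 --authenticationDatabase admin
mongos> use jiangjj01
mongos> db.createUser({ user: 'jiangjj', pwd: '<password>', roles: [ { role: 'readWrite', db: 'jiangjj01' } ] })
mongos> db.auth('jiangjj', '<password>')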

Additional knowledge points:

Granting and revoking permissions

1) how to increase permissions

db.grantRolesToUser(
    "<user>",
    [ { role: "<role>", db: "<db>" } ]
)

Note: this method takes two parameters (a user name and an array of roles) rather than a single object; roles the user already holds are not overwritten, the new roles are added.

For example:

db.grantRolesToUser('roleTest', [ { role: 'readWrite', db: 'test' } ])

2) How to revoke permissions

db.revokeRolesFromUser(
    "<user>",
    [ { role: "<role>", db: "<db>" } ]
)
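To check the effect of granting or revoking roles, a user's current roles can be listed (a small verification sketch; 'roleTest' reuses the example name above):

db.getUser('roleTest')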

For details of permission configuration, please refer to the official documentation.
