I. MongoDB configuration
The configuration items in the MongoDB configuration file /etc/mongodb.conf are simply mongod startup options (the same pattern memcached uses).
[root@Node7 ~]# mongod --help
Allowed options:

General options:
  -h [ --help ]                 show this usage information
  --version                     show version information
  -f [ --config ] arg           configuration file specifying additional options
  -v [ --verbose ]              be more verbose (include multiple times for more verbosity e.g. -vvvvv)
  --quiet                       quieter output
  --port arg                    specify port number - 27017 by default
  --bind_ip arg                 comma separated list of ip addresses to listen on - all local ips by default
  --maxConns arg                max number of simultaneous connections - 20000 by default
  --logpath arg                 log file to send writes to instead of stdout - has to be a file, not a directory
  --logappend                   append to logpath instead of over-writing
  --pidfilepath arg             full path to pidfile (if not set, no pidfile is created)
  --keyFile arg                 private key for cluster authentication
  --setParameter arg            set a configurable parameter
  --nounixsocket                disable listening on unix sockets
  --unixSocketPrefix arg        alternative directory for UNIX domain sockets (defaults to /tmp)
  --fork                        fork server process
  --syslog                      log to system's syslog facility instead of file or stdout
  --auth                        run with security
  --cpu                         periodically show cpu and iowait utilization
  --dbpath arg                  directory for datafiles - defaults to /data/db/
  --diaglog arg                 0=off 1=W 2=R 3=both 7=W+some reads
  --directoryperdb              each database will be stored in a separate directory
  --ipv6                        enable IPv6 support (disabled by default)
  --journal                     enable journaling
  --journalCommitInterval arg   how often to group/batch commit (ms)
  --journalOptions arg          journal diagnostic options
  --jsonp                       allow JSONP access via http (has security implications)
  --noauth                      run without security
  --nohttpinterface             disable http interface
  --nojournal                   disable journaling (journaling is on by default for 64 bit)
  --noprealloc                  disable data file preallocation - will often hurt performance
  --noscripting                 disable scripting engine
  --notablescan                 do not allow table scans
  --nssize arg (=16)            .ns file size (in MB) for new databases
  --profile arg                 0=off 1=slow 2=all
  --quota                       limits each database to a certain number of files (8 default)
  --quotaFiles arg              number of files allowed per db, requires --quota
  --repair                      run repair on all dbs
  --repairpath arg              root directory for repair files - defaults to dbpath
  --rest                        turn on simple rest api
  --shutdown                    kill a running server (for init scripts)
  --slowms arg                  value of slow for profile and console log
  --smallfiles                  use a smaller default file size
  --syncdelay arg (=60)         seconds between disk syncs (0=never, but not recommended)
  --sysinfo                     print some diagnostic system information
  --upgrade                     upgrade db if needed

Replication options:
  --oplogSize arg               size to use (in MB) for replication op log; default is 5% of disk space (i.e. large is good)

Master/slave options (old; use replica sets instead):
  --master                      master mode
  --slave                       slave mode
  --source arg                  when slave: specify master as <server:port>
  --only arg                    when slave: specify a single database to replicate
  --slavedelay arg              specify delay (in seconds) to be used when applying master ops to slave
  --autoresync                  automatically resync if slave data is stale

Replica set options:
  --replSet arg                 arg is <setname>[/<optionalseedhostlist>]
  --replIndexPrefetch arg       specify index prefetching behavior (if secondary) [none|_id_only|all]

Sharding options:
  --configsvr                   declare this is a config db of a cluster; default port 27019, default dir /data/configdb
  --shardsvr                    declare this is a shard db of a cluster; default port 27018

SSL options:
  --sslOnNormalPorts            use ssl on configured ports
  --sslPEMKeyFile arg           PEM file for ssl
  --sslPEMKeyPassword arg       PEM file password
  --sslCAFile arg               Certificate Authority file for SSL
  --sslCRLFile arg              Certificate Revocation List file for SSL
  --sslWeakCertificateValidation  allow client to connect without presenting a certificate
  --sslFIPSMode                 activate FIPS 140-2 mode at startup
Common configuration parameters:
fork = {true|false}  whether mongod runs as a daemon in the background
bind_ip = IP  the address(es) to listen on
port = PORT  the port to listen on; 27017 by default
maxConns = N  the maximum number of concurrent connections
logpath = /PATH/TO/SOME_FILE  the log file to write to (syslog = true sends logs to syslog instead)
httpinterface = true  enables the web monitoring interface, which listens on the mongod port + 1000
journal = {true|false}  whether the journal (transaction log) is enabled; it is on by default
slowms = N  the slow-query threshold in milliseconds; any operation that takes longer is logged as a slow query; 100 ms by default
repair = true  repairs the data files; enable this to recover after an unclean shutdown
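Putting these parameters together, a minimal /etc/mongodb.conf might look like the following sketch (paths, the bind address, and the connection cap are illustrative assumptions, not values from this article):

fork = true
bind_ip = 0.0.0.0                      # illustrative: listen on all addresses
port = 27017
maxConns = 1000                        # illustrative cap on concurrent connections
logpath = /var/log/mongodb/mongod.log
logappend = true
dbpath = /data/db                      # assumed data directory
journal = true
slowms = 100                           # operations slower than 100 ms are logged as slow

mongod is then started against it with: mongod -f /etc/mongodb.conf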
II. Indexes
Indexes can greatly improve query efficiency. Without an index, MongoDB must scan every document in a collection to find the ones that match the query criteria. Such full-collection scans are very inefficient: on large data sets a query can take tens of seconds or even minutes, which is fatal for website performance.
An index is a special data structure, stored in a form that is easy to traverse and read, which keeps the values of one or more fields in sorted order (analogous to an index on the columns of a relational table).
1. Index types
Common index implementations: B+ tree, hash, spatial, and full-text indexes.
Index types supported by MongoDB:
Single-key index and compound index (an index on multiple fields).
Multikey index: an index built on a field whose value is an array, indexing each element.
Spatial index: for location-based lookups.
Text index: the equivalent of a full-text index.
Hash index: for exact-match lookups; not suitable for range queries.
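As a quick sketch, each of these index types is declared in the mongo shell as follows (the collection and field names are invented for illustration):

> db.users.ensureIndex({name: 1, age: -1})     # compound index: name ascending, age descending
> db.places.ensureIndex({loc: "2dsphere"})     # spatial index for location-based queries
> db.articles.ensureIndex({body: "text"})      # text (full-text) index
> db.users.ensureIndex({uid: "hashed"})        # hash index: exact match only, no range scans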
2. Index management
Create:
db.mycoll.ensureIndex(keypattern[, options])
The collection help (db.mycoll.help()) describes it as:
db.mycoll.ensureIndex(keypattern[, options]) - options is an object with these possible fields: name, unique, dropDups
db.COLLECTION_NAME.ensureIndex({KEY: 1})
In this syntax, KEY is the field to index; 1 builds the index in ascending order, and -1 in descending order. ensureIndex() can also index several fields at once (what relational databases call a composite index): db.col.ensureIndex({"title": 1, "description": -1})
ensureIndex() also accepts optional parameters:
background (Boolean): building an index normally blocks other database operations; background: true builds the index in the background. Defaults to false.
unique (Boolean): whether the index is unique; true creates a unique index. Defaults to false.
name (string): the index name; if not specified, MongoDB generates one by concatenating the indexed field names and their sort order.
dropDups (Boolean): whether to drop duplicate documents when building a unique index. Defaults to false.
sparse (Boolean): do not index documents that lack the field. This parameter needs special attention: if set to true, documents missing the field cannot be found through queries on the indexed field. Defaults to false.
expireAfterSeconds (integer): a number of seconds, setting a TTL that controls how long documents live in the collection.
v (index version): the index version number; the default depends on the mongod version running when the index is created.
weights (document): the index weight, a value between 1 and 99,999, giving the score weight of this field relative to the other fields of a text index.
default_language (string): for a text index, determines the stop-word list and the rules for the stemmer and tokenizer. Defaults to english.
language_override (string): for a text index, names the document field whose value overrides the default language. Defaults to language.
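For example, a unique sparse index on a hypothetical email field built in the background, and a TTL index that expires documents an hour after their (assumed) createdAt timestamp:

> db.users.ensureIndex({email: 1}, {unique: true, sparse: true, background: true})
> db.sessions.ensureIndex({createdAt: 1}, {expireAfterSeconds: 3600})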
Query:
db.mycoll.getIndexes()
Delete:
db.mycoll.dropIndexes()  deletes all indexes of the current collection (except the default _id index)
db.mycoll.dropIndexes("index_name")  deletes the specified index
db.mycoll.reIndex()  rebuilds the collection's indexes
Example:
> for (i=1; i<=100; i++) db.students.insert({name: "student"+i, age: i})
> db.students.find().count()
100
> db.students.find()
{ "_id" : ObjectId("58d613021e8383d30814f846"), "name" : "student1", "age" : 1 }
{ "_id" : ObjectId("58d613021e8383d30814f847"), "name" : "student2", "age" : 2 }
{ "_id" : ObjectId("58d613021e8383d30814f848"), "name" : "student3", "age" : 3 }
{ "_id" : ObjectId("58d613021e8383d30814f849"), "name" : "student4", "age" : 4 }
{ "_id" : ObjectId("58d613021e8383d30814f84a"), "name" : "student5", "age" : 5 }
{ "_id" : ObjectId("58d613021e8383d30814f84b"), "name" : "student6", "age" : 6 }
{ "_id" : ObjectId("58d613021e8383d30814f84c"), "name" : "student7", "age" : 7 }
{ "_id" : ObjectId("58d613021e8383d30814f84d"), "name" : "student8", "age" : 8 }
{ "_id" : ObjectId("58d613021e8383d30814f84e"), "name" : "student9", "age" : 9 }
{ "_id" : ObjectId("58d613021e8383d30814f84f"), "name" : "student10", "age" : 10 }
{ "_id" : ObjectId("58d613021e8383d30814f850"), "name" : "student11", "age" : 11 }
{ "_id" : ObjectId("58d613021e8383d30814f851"), "name" : "student12", "age" : 12 }
{ "_id" : ObjectId("58d613021e8383d30814f852"), "name" : "student13", "age" : 13 }
{ "_id" : ObjectId("58d613021e8383d30814f853"), "name" : "student14", "age" : 14 }
{ "_id" : ObjectId("58d613021e8383d30814f854"), "name" : "student15", "age" : 15 }
{ "_id" : ObjectId("58d613021e8383d30814f855"), "name" : "student16", "age" : 16 }
{ "_id" : ObjectId("58d613021e8383d30814f856"), "name" : "student17", "age" : 17 }
{ "_id" : ObjectId("58d613021e8383d30814f857"), "name" : "student18", "age" : 18 }
{ "_id" : ObjectId("58d613021e8383d30814f858"), "name" : "student19", "age" : 19 }
{ "_id" : ObjectId("58d613021e8383d30814f859"), "name" : "student20", "age" : 20 }
Type "it" for more                         # only the first 20 are shown; "it" shows more
> db.students.ensureIndex({name: 1})       # build an index on the name key; 1 = ascending, -1 = descending
> show collections
students
system.indexes
t1
> db.students.getIndexes()
[
        {                                  # the default index
                "v" : 1,
                "name" : "_id_",
                "key" : {
                        "_id" : 1
                },
                "ns" : "students.students" # database.collection
        },
        {
                "v" : 1,
                "name" : "name_1",         # automatically generated index name
                "key" : {
                        "name" : 1         # index created on the name key
                },
                "ns" : "students.students"
        }
]
> db.students.dropIndexes("name_1")        # delete the specified index
{
        "nIndexesWas" : 2,
        "msg" : "non-_id indexes dropped for collection",
        "ok" : 1
}
> db.students.getIndexes()
[
        {
                "v" : 1,
                "name" : "_id_",
                "key" : {
                        "_id" : 1
                },
                "ns" : "students.students"
        }
]
> db.students.dropIndexes()                # the default _id index cannot be deleted
{
        "nIndexesWas" : 1,
        "msg" : "non-_id indexes dropped for collection",
        "ok" : 1
}
> db.students.getIndexes()
[
        {
                "v" : 1,
                "name" : "_id_",
                "key" : {
                        "_id" : 1
                },
                "ns" : "students.students"
        }
]
> db.students.find({age: "90"}).explain()  # show how the query is executed
{
        "cursor" : "BtreeCursor T1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 0,
        "nscannedObjectsAllPlans" : 0,
        "nscannedAllPlans" : 0,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 17,
        "indexBounds" : {                  # the index on "age" was used
                "age" : [
                        [ "90", "90" ]
                ]
        },
        "server" : "Node7:27017"
}
III. MongoDB sharding
1. Introduction to sharding
As the business grows and the data set gets larger and larger, CPU, memory, and I/O become bottlenecks, so MongoDB has to be scaled out.
Adding more MongoDB replicas only spreads the read pressure, not the write pressure, so the data set has to be sharded.
MongoDB supports sharding natively.
MySQL sharding solutions (frameworks), by contrast, require a senior DBA (5+ years of experience):
Gizzard, HiveDB, MySQL Proxy + HSCALE, Hibernate Shards, Pyshards
2. Roles in the sharding architecture
mongos: the router
It acts as a proxy, routing user requests to the appropriate shards for execution; it stores and queries no data itself.
config server: the metadata servers. Several are also needed, but they do not form a replica set; coordination has to be built with an external tool such as zookeeper.
They store the metadata describing where each part of the data set lives on the shard servers.
shard: a data node, i.e. a mongod instance.
zookeeper:
Commonly used to coordinate the central nodes of a distributed system; it provides a master-election mechanism, and zookeeper itself can also run distributed.
3. Sharding strategies
Sharding is done per collection.
To keep the data balanced across the shard nodes, each collection is cut into fixed-size chunks, which are then handed out to the shard nodes one by one.
Range-based sharding:
The index used must be an ordered index that supports sorting, such as a B-tree index.
Chunks are distributed evenly according to that index.
List-based sharding:
Discrete values are grouped into lists.
Hash-based sharding:
The shard key is hashed modulo the number of shard servers and the data stored accordingly, which scatters hot data.
Which strategy to use has to be decided from your own business model.
Principles for sharding:
Write discretely, read centrally (scatter writes across shards, keep reads on as few shards as possible).
sh.enableSharding("testdb")  # enable sharding for the testdb database
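For example, hash-based sharding on a hypothetical uid field would be declared on mongos like this (the collection and field names are assumptions for illustration):

mongos> sh.enableSharding("testdb")
mongos> use testdb
mongos> db.users.ensureIndex({uid: "hashed"})                # a hashed index backs the hashed shard key
mongos> sh.shardCollection("testdb.users", {uid: "hashed"})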
IV. A hands-on case
1. Architecture
2. Configuration process
1) Configure the config server node first
Configure it with configsvr=true; it does not need to join a replica set, and it listens on TCP port 27019.
2) mongos
When starting mongos, just point it at the config server with --configdb=172.16.100.16:27019; it listens on TCP port 27017 and acts as the proxy.
Options when mongos starts:
mongos --configdb=172.16.100.16 --fork --logpath=/var/log/mongodb/mongos.log
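Putting the three roles together, the startup sequence might look like the following sketch (the flags match the --help output above; the IPs, paths, and shard-node commands are assumptions consistent with this example):

# on the config server (172.16.100.16)
mongod --configsvr --dbpath=/data/configdb --fork --logpath=/var/log/mongodb/configsvr.log
# on the mongos router
mongos --configdb=172.16.100.16:27019 --fork --logpath=/var/log/mongodb/mongos.log
# on each shard node
mongod --shardsvr --dbpath=/data/db --fork --logpath=/var/log/mongodb/shard.log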
3) Add shard nodes through the mongos node
Help for the shard-related commands:
testSet:PRIMARY> sh.help()
        sh.addShard( host )                        server:port OR setname/server:port   # add a shard node; host can be a replica set name
        sh.enableSharding(dbname)                  enables sharding on the database dbname   # enable sharding for a given database
        sh.shardCollection(fullName, key, unique)  shards the collection   # shard a given collection
        sh.splitFind(fullName, find)               splits the chunk that find is in at the median
        sh.splitAt(fullName, middle)               splits the chunk that middle is in at middle
        sh.moveChunk(fullName, find, to)           move the chunk where 'find' is to 'to' (name of shard)
        sh.setBalancerState( <bool on or not> )    turns the balancer on or off; true=on, false=off
        sh.getBalancerState()                      return true if enabled
        sh.isBalancerRunning()                     return true if the balancer has work in progress on any mongos
        sh.addShardTag(shard, tag)                 adds the tag to the shard
        sh.removeShardTag(shard, tag)              removes the tag from the shard
        sh.addTagRange(fullName, min, max, tag)    tags the specified range of the given collection
        sh.status()                                prints a general overview of the cluster   # view the shard status
In the sh.status() output, "primary" marks the shard that holds a database's unsharded collections: a very small collection does not need to be sharded, and its data then lives only on that one node.
When sharding a collection, specify which key to use as the shard key:
sh.shardCollection(fullName, key, unique)
fullName: the full name, including database and collection: database_name.collection_name
Example: sh.shardCollection("testdb.students", {"age": 1})
This shards the students collection in the testdb database and builds an ascending index on the "age" field; the data in the testdb.students collection is then distributed automatically across the shard nodes.
use admin
db.runCommand("listShards")  # list the shard nodes
db.printShardingStatus()  # the same as sh.status()
sh.isBalancerRunning()  # check whether the balancer is running; it only runs automatically when balancing is needed
sh.getBalancerState()  # whether the balancing function is enabled
sh.moveChunk(fullName, find, to)  # move a chunk manually; not recommended
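A short end-to-end session on the mongos shell tying these commands together (the shard addresses are hypothetical):

mongos> sh.addShard("172.16.100.17:27018")     # register the first shard node (address assumed)
mongos> sh.addShard("172.16.100.18:27018")     # register a second shard node
mongos> use admin
mongos> db.runCommand("listShards")            # confirm both shards are registered
mongos> sh.enableSharding("testdb")
mongos> sh.shardCollection("testdb.students", {"age": 1})
mongos> sh.status()                            # chunks of testdb.students now spread across both shards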