

A Detailed Introduction to MongoDB




I. Overview

(1) Version history:

0.x: the starting point.
1.x: added replica sets and sharding.
2.x: richer database features.
3.x: acquired WiredTiger, a company specializing in storage engines, and built a more complete surrounding ecosystem.
4.x: added support for distributed transactions.

Stable MongoDB releases use even minor version numbers (x.even.x). Major versions (x.x) are upgraded about once a year, while patch versions mainly fix bugs and are usually released every month or two.

MongoDB supports native high availability: the application connects through a Driver to the Primary node, and the Primary node replicates to multiple Secondary nodes.

MongoDB supports horizontal scaling through sharded clusters: the Driver connects to multiple Mongos routers, each Mongos connects to multiple Shards, and each Shard is a replica set with one Primary and several Secondaries.

II. Replica sets

Replica sets are mainly used to achieve high service availability.

(1) Characteristics

A MongoDB replica set has the following main characteristics:

Fast replication: when data is written, it is quickly copied to the other nodes.
Failover: when the node accepting writes fails, a new node is automatically elected in its place.
Other functions: data distribution, read/write separation, remote disaster recovery.

(2) Data replication principle

A modification operation is recorded in the oplog. A thread listens on the oplog and, whenever there is a change, applies it to the other databases. Each slave node opens a tailable cursor on the master node, continually fetches newly appended oplog entries, and replays them on the slave.
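As a rough sketch of the tailable-cursor mechanism just described, the legacy mongo shell snippet below reads new oplog entries as they are appended (the starting timestamp is a placeholder assumption; real secondaries do this internally):

var lastAppliedTs = Timestamp(0, 0); // assumption: resume point, here the start of the oplog
var cursor = db.getSiblingDB("local").oplog.rs
    .find({ ts: { $gt: lastAppliedTs } })
    .addOption(DBQuery.Option.tailable)   // keep the cursor open as entries are appended
    .addOption(DBQuery.Option.awaitData); // block briefly waiting for new entries
while (cursor.hasNext()) {
    var entry = cursor.next(); // entry.op: operation type, entry.ns: namespace, entry.o: change
    // a real secondary would now apply this entry locally
}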

(3) Inter-node elections

A typical replica set consists of three or more voting nodes: one Primary, which accepts writes and votes in elections, and two Secondaries, which replicate the Primary's data and vote in elections.

Voting nodes exchange heartbeats with one another; a primary that misses five heartbeats is declared unreachable, and MongoDB then holds an election based on the Raft protocol. A replica set can have up to 50 nodes, of which at most 7 may vote.

Factors affecting an election: a majority of the nodes in the cluster must be alive.

Conditions for being elected primary: able to connect to a majority of the nodes; holds a sufficiently recent oplog; has a higher priority (configurable).

Commonly used configuration options (see the sketch below):

Vote (v parameter): whether the node participates in elections.
Priority (priority parameter): the higher the priority, the more likely the node becomes primary; a node with priority 0 can never become primary.
Hidden (hidden parameter): replicates data but is invisible to the application; a hidden node may vote, but its priority must be 0.
Delay (slaveDelay parameter): replicates data with a fixed lag of N seconds behind the primary.

Deployment of a replica set: the nodes should have identical configuration, independent hardware, and the same software version.
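A minimal mongo shell sketch of these options, assuming placeholder host names; note the hidden, delayed member keeps priority 0 as required:

rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "node1:27017", priority: 2 },            // preferred primary
        { _id: 1, host: "node2:27017", priority: 1 },            // ordinary secondary
        { _id: 2, host: "node3:27017", priority: 0, votes: 1,    // can vote, never primary
          hidden: true, slaveDelay: 3600 }                        // invisible to the app, 1h behind
    ]
});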

III. Sharded clusters

(1) mongos routing nodes provide the cluster's single entry point: they forward application requests, choose suitable data nodes for reads and writes, and merge the results returned by multiple data nodes. Mongos nodes are stateless; deploy at least two.

(2) config nodes store the cluster metadata: the map of how sharded data is distributed.

(3) Data nodes scale out in units of replica sets, up to a maximum of 1024 shards. Data must not be duplicated between shards; all shards together serve the workload.

(4) Characteristics: completely transparent to the application, with no special handling required; automatic balancing; dynamic scale-out without downtime; three sharding modes are provided.

(5) The three sharding modes. Range-based: good query performance on shard-key ranges, read-optimized, but data may be distributed unevenly and hot spots arise easily. Hash-based: uniform data distribution, write-optimized, inefficient for range queries; suited to logs, IoT, and other high-concurrency scenarios. Zone-based: data is divided into multiple Zones by region, time, or other attributes.

Initially, all documents (objects) in a collection are stored in a single chunk; the default chunk size is 64MB. Once the data grows past 64MB, chunks are split and may be migrated between shards. A command sketch follows.
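A minimal mongo shell sketch of the two most common modes, run on a mongos; the database and collection names are illustrative assumptions:

sh.enableSharding("logs");                                  // allow collections in "logs" to be sharded
sh.shardCollection("logs.events", { deviceId: "hashed" });  // hash-based: uniform write distribution
sh.shardCollection("logs.orders", { orderDate: 1 });        // range-based: efficient range queries
sh.status();                                                // inspect shards and chunk distribution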

(6) A reasonable architecture

A single shard should not exceed 3TB of data; try to keep it within 2TB. Frequently used indexes must fit in memory.

How many shards are needed?

Number of shards = max(required storage capacity / per-node storage capacity, working set size / (per-node memory × 0.6), total concurrency / (per-node concurrency × 0.7))
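For example, with illustrative numbers: 8TB of required storage at 2TB per node gives 8 / 2 = 4; a 400GB working set with 128GB of RAM per server gives 400 / (128 × 0.6) ≈ 5.2, rounded up to 6; 30,000 total concurrent operations at 10,000 per node gives 30000 / (10000 × 0.7) ≈ 4.3, rounded up to 5. The maximum of (4, 6, 5) calls for 6 shards.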

How to choose a shard key?

Cardinality: the larger, the better; for example, surnames have a larger cardinality than ages.
Write distribution: avoid keys whose writes are unevenly distributed, such as students' ages.
Directedness: mongos should be able to route a query directly to the right shard.

IV. Backup and disaster recovery

(1) Backup

mongodump -h HostName:Port -d DatabaseName -c CollectionName

Use the --oplog parameter to take an incremental backup: it copies all oplog entries from the start of the mongodump until its completion, exporting them to a dump/oplog.bson file.

(2) Restore

mongorestore -h HostName:Port -d DatabaseName -c CollectionName Filename.bson

Use the --oplogReplay parameter for incremental recovery. Point-in-time recovery is achieved with the --oplogLimit and --oplogFile parameters.

When backing up a sharded cluster, chunk migrations and balancing between shards may leave the backup inconsistent; this is solved by stopping the balancer first, as sketched below.
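A minimal mongo shell sketch, run on a mongos:

sh.stopBalancer();      // waits for any in-flight migration to finish, then disables balancing
sh.getBalancerState();  // should now return false
// ... take the backup (e.g. mongodump on each shard and the config servers) ...
sh.startBalancer();     // re-enable chunk migrations once the backup completes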

(3) Backup schemes: delayed-node backup; full backup plus oplog incremental backup; mongodump; copying the data files; file-system snapshots.

V. Transaction support

(1) Write transactions

The writeConcern parameter determines how many nodes a write must reach before it is considered successful.

w parameter: 1 (default): the write must have propagated to the standalone mongod instance or the Primary of the replica set. 0: no write acknowledgment required. majority: the write must have propagated to a majority of the data-bearing, voting members.

j parameter: true: the write counts as successful once it reaches the journal. false: the write counts as successful once it reaches memory.

(2) Read transactions. Where to read? The location is controlled by the readPreference parameter, with these values:

primary (default): the primary node; typical for a user placing an order.
primaryPreferred: prefer the primary; typical for a user checking an order.
secondary: a secondary node; commonly used for reporting.
secondaryPreferred: prefer a secondary.
nearest: the lowest ping time wins; typically used for globally distributed reads, such as serving uploaded images worldwide.
Tags can direct reads to specific nodes, by specifying {"purpose": ""}.

What kind of data can be read? Isolation is controlled by the readConcern parameter, with these values:

available: read all available data.
local (default): read all available data belonging to the current shard.
majority: read only data committed on a majority of nodes, preventing dirty reads. Implementation: nodes use MVCC to keep multiple versions, and the version confirmed by a majority of nodes serves as a snapshot; MongoDB links the versions by maintaining multiple snapshots, and a snapshot is discarded once it is no longer used.
linearizable: linearizes reads of the document, guaranteeing that all earlier writes are visible even during a network partition, because all nodes are checked at read time.
snapshot: read from the most recent snapshot.

How to read and write safely? Set readConcern to majority and writeConcern to majority (see the sketch after the ACID summary below).

(3) ACID. A (atomicity): single-document atomicity since version 1.0; multi-document, multi-table transactions on replica sets since 4.0 and on sharded clusters since 4.2. C = writeConcern and readConcern. I = readConcern. D = Journal and Replication.
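A minimal mongo shell sketch of majority read/write concerns and a multi-document transaction; database, collection, and documents are illustrative assumptions:

db.orders.insertOne(
    { item: "abc", qty: 1 },
    { writeConcern: { w: "majority", j: true, wtimeout: 5000 } } // majority + journaled
);
db.orders.find({ item: "abc" }).readConcern("majority"); // only majority-committed data

// Multi-document transaction (4.0+ on replica sets, 4.2+ on sharded clusters):
var session = db.getMongo().startSession();
session.startTransaction({ readConcern: { level: "majority" },
                           writeConcern: { w: "majority" } });
session.getDatabase("shop").orders.insertOne({ item: "xyz" });
session.commitTransaction();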

(4) ChangeStream

ChangeStream is used to track changes, similar to a trigger, and is implemented on top of the oplog. The returned _id is a resume token usable for breakpoint recovery; a cursor tracks changes that meet the majority condition and pushes them to the application.

It lets the application learn of data changes in real time. Requirements: the replication protocol must be version 1 and the WT storage engine must be used; only replica sets and sharded clusters are supported; MongoDB Driver 3.6+ is required and the featureCompatibilityVersion parameter must be set to 3.6; writeConcern must be configured.

Similarities and differences between ChangeStream and triggers: ChangeStream is asynchronous, based on an event-callback mechanism; ChangeStream fires once per client; ChangeStream supports resuming from a breakpoint, while triggers can only be rolled back by a transaction.

Application scenarios: coordinating microservices; cross-cluster replication. An interruption of a ChangeStream must not last longer than the oplog retention window, or it cannot resume. A usage sketch follows.
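A minimal mongo shell sketch (the collection name and token handling are illustrative assumptions):

var stream = db.orders.watch([], { fullDocument: "updateLookup" }); // full doc on updates
while (stream.hasNext()) {
    var change = stream.next();
    var resumeToken = change._id;  // persist this to resume after an interruption
    printjson(change.fullDocument);
}
// Later, resume from where we left off (must still be within the oplog window):
// db.orders.watch([], { resumeAfter: resumeToken });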

VI. Interview questions

What are the advantages of MongoDB?

Collection- and Document-oriented; stores data in a JSON-like format, with support for binary data and large objects.
High performance: embedded documents reduce database I/O, and full index support enables fast queries.
High availability: replica sets provide automatic failover.
High scalability: sharded clusters.
Support for aggregation pipelines and full-text indexing.

Support for pluggable storage engines, such as WiredTiger and the in-memory storage engine.

Data types supported by MongoDB:

Similar to Java: String (must be valid UTF-8), Integer, Double, Boolean, Arrays, Datetime.
Unique to MongoDB: ObjectId (stores the document's ID; as a distributed primary key it also underpins MongoDB sharding), Min/Max Keys (compare a value against the lowest and highest BSON elements), Code (JavaScript code), Regular Expression, Binary Data, Null, Object (embedded documents).

What is mongod and what are the default parameters?

mongod is the main daemon process of the MongoDB system. Default parameters: --dbpath=/data/db, --port=27017.

Differences between MySQL and MongoDB:

MongoDB is a non-relational database using virtual memory plus persistence; MySQL is a relational database using traditional SQL statements.
MongoDB's common architectures are replica sets and sharded clusters, while MySQL has MS, MHA, MMM, and other architectures.
MongoDB is memory-based, keeping hot data in physical memory to speed up reads and writes; in MySQL, each storage engine has its own characteristics.

Will the update operation fsync to disk immediately?

No. Disk writes are deferred by default; a write may take up to about 60 seconds to reach the data files, configurable via the syncPeriodSecs parameter.

What types of indexes are supported by MongoDB?

Single-field index
Compound index
Multikey index
Full-text (text) index
Hashed index
Wildcard index
2dsphere (geospatial) index

Collection A has a compound index on {B, C}. Will the queries {B, C} and {C, B} use the index?

Because MongoDB indexes are based on the B-tree principle, only the query {B, C} will use the index.

If a chunk move operation (moveChunk) fails, do I need to manually clean up the partially transferred documents?

No. The move operation is consistent and deterministic; after a failure it keeps retrying, and once it completes, the data exists only on the new shard.

When will the data be extended to multiple shards?

MongoDB sharding is range-based, so at first all documents of a collection live in a single chunk; the default chunk size is 64MB. Only once the data grows beyond 64MB and splits into more than one chunk are there multiple chunks for the balancer to distribute across shards.

What happens when you update a document on a block being migrated (Chunk)?

The update is applied immediately on the old chunk (Chunk), and the change is then replicated to the new shard before ownership of the chunk is transferred.

What happens when a query is issued while a Shard is down or responding slowly?

If a shard is down, the query returns an error unless the query has the "Partial" option set. If a shard responds slowly, MongoDB waits for its response.

What is Arbiter?

An arbiter does not keep a copy of the data set. Its sole purpose is to maintain quorum in the replica set by responding to heartbeats and election requests from the other replica set members.

What are the node types of replication sets?

Priority 0 (Priority 0) nodes
Hidden (Hidden) nodes
Delayed (Delayed) nodes
Voting (Vote) and non-voting nodes

VII. Application cases

(1) Typical application scenarios of MongoDB

MongoDB is an OLTP database; in principle, whatever MySQL and Oracle can do, MongoDB can do too. MongoDB has native scale-out capability and flexible schema support, making it well suited to rapid iteration and frequently changing data models, and its JSON data structures fit the microservices domain very well.

Feature-based selection:

Requirement — MongoDB vs. a traditional relational database:

100M+ records: Easy vs. hard (splitting databases and tables).
Flexible table structure: Easy vs. data dictionaries and join queries.
High-concurrency reads: Easy vs. Hard.
High-concurrency writes: Easy vs. Hard.
Cross-region clusters: Easy vs. Hard.
Data sharding: Easy vs. middleware required.
Geo-location queries: fully supported vs. PostgreSQL supports them, others are troublesome.
Aggregation: Easy vs. GROUP BY and complex SQL.
Heterogeneous data: Easy vs. data dictionaries and heavy join queries.
Wide tables: Easy vs. limited performance.

Scene-based selection:

Mobile applications and mini programs

Scene features: based on RESTful API, fast iteration, frequent changes in data structure, most functions are based on geographic information, explosive growth, high availability

Industry cases: Keep, Mobike, ADP

E-commerce with huge volumes of product data

Scene features: commodity information is all-inclusive, database schema design is difficult

Industry case: JD.com Mall, Xiao Hongshu, GAP

Content Management:

Scene features: diverse content data, difficult to expand

Industry cases: Adobe AEM, SiteCore

Internet of things IoT

Scene features: sensor data is typically semi-structured, and the volume of data collected in real time is huge, easily growing to the tens of billions of records.

Industry cases: Huawei, Bosch, MindSphere

SaaS application

Scene features: multi-tenant model, changeable demand, fast data growth

Industry case: ADP, Teambition

Host traffic offloading

Scene features: high-performance query, real-time synchronization mechanism

Industry case: financial industry

Real-time online analysis

Scene features: stream data calculation, fast calculation, second response

Industry cases: MongoDB caching mechanisms, the MongoDB aggregation framework, sharded architectures

Migrating from relational databases to MongoDB to carry more data and concurrency

Scene features: data growth degrades performance, and database/table splitting schemes are complex.

Industry cases: Toutiao, NetEase, Baidu, China Eastern Airlines, Bank of China

(2) MongoDB interfacing with MySQL and Oracle

There are several issues to consider when migrating from traditional relational databases to MongoDB:

Overall architecture

Operations and maintenance tools and scripts, permission settings, distributed monitoring, backup and recovery

Schema design

Table structures are folded into JSON documents

SQL statement / stored procedure / ORM layer

Original SQL

Stored procedure characteristics

ORM framework

Data migration

There are several ways to migrate data:

(1) Database export and import: export to JSON or CSV

(2) ETL bulk migration tools, Kettle, Talend

(3) Real-time synchronization tools such as Informatica and Tapdata (which runs an Agent), usually based on log parsing

(4) Application-driven migration

(3) MongoDB and Spark

As a storage layer for Spark, MongoDB offers finer-grained storage than HDFS and supports structured storage. MongoDB's indexing makes Spark reads faster. HDFS is write-once, read-many, whereas MongoDB suits Spark's mixed read-write scenarios. MongoDB is online storage with millisecond SLAs.

(4) Visualization and ETL

MongoDB can be queried with SQL through the BI Connector. The BI Connector automatically generates a DRDL mapping file, and we then write SQL statements against that mapping to present the data.

The BI Connector is an Enterprise Edition feature and runs as a standalone service.

The BI Connector exposes a MySQL-compatible interface, acting as a virtual MySQL service.

(5) Disaster-recovery levels for an advanced two-site, three-center cluster design (description / RPO / RTO):

Level 0: no disaster recovery, only local data backups; RPO 24 hours, RTO 4 hours.
Level 1: local backup plus off-site storage, with key data shipped to a remote site; RPO 24 hours, RTO 8 hours.
Level 2: dual-center active-standby, with hot backup over the network; RPO minutes, RTO minutes to half an hour.
Level 3: dual-center active-active; RPO and RTO in seconds.
Level 4: active-active plus a remote hot backup that takes over when both centers in one city are unavailable; RPO seconds, RTO seconds to minutes.

Network layer solution

GSLB performs health checks on the MongoDB load balancers and switches the application layer over via the domain name.

Application layer solution

Use load balancing and virtual IP technology, share the same Sessions, and use the same set of data.

Use HAProxy or Nginx as the local SLB (server load balancer).

Database layer solution

Copy data through log synchronization or storage mirroring.

Cross-center 2+2+1 replica set solution

The 2+2+1 layout (two nodes in the primary center, two in the standby center, one in a third site) keeps the primary center highly available, with oplog replication at millisecond latency. A configuration sketch follows.
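A minimal mongo shell sketch of the 2+2+1 layout, with placeholder host names; the fifth node in the third site is a tie-breaker so that either data center can still form a majority:

rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "dc1-a:27017", priority: 2 },  // primary center
        { _id: 1, host: "dc1-b:27017", priority: 2 },
        { _id: 2, host: "dc2-a:27017", priority: 1 },  // standby center
        { _id: 3, host: "dc2-b:27017", priority: 1 },
        { _id: 4, host: "dc3-a:27017", priority: 0 }   // third site: votes, never primary
    ]
});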

(6) Global multi-write

Since replica sets only solve the read problem, writes must still go to the Primary, so the experience of users in other countries cannot be guaranteed.

Global multi-write is essentially a special sharded cluster whose shards are deployed in different regions. Three conditions must be met to achieve global sharded multi-write:

Add a region field to the model of every collection to be sharded.

Add a region label to each shard in the cluster.

sh.addShardTag("shard0", "Asia")

Assign each region a range of shard-key values belonging to that region.

sh.addTagRange("dbName.tableName", { "location": "China" }, { "location": "Chinb" }, "Asia") // min inclusive, max exclusive
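Putting the three steps together, a minimal mongo shell sketch run on a mongos; the database, collection, and compound shard key are illustrative assumptions (the real helper is sh.addTagRange, which takes inclusive minimum and exclusive maximum bounds):

sh.enableSharding("orders");
sh.shardCollection("orders.t_order", { location: 1, _id: 1 }); // region field leads the shard key
sh.addShardTag("shard0", "Asia");                              // step 2: tag the shards
sh.addShardTag("shard1", "Europe");
sh.addTagRange("orders.t_order",                               // step 3: pin the key range
    { location: "China", _id: MinKey },
    { location: "China", _id: MaxKey },
    "Asia");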

Transactional issues of global multi-write:

When overseas users read data, they want to read locally, so set readPreference: "nearest".

When an overseas user places an order, the write must reach a majority of the local nodes to succeed, while nodes at home and abroad catch up via oplog synchronization, so set writeConcern: "majority".
When data from all regions must be read and summarized, it suffices to read the nearest local node: readPreference: "nearest".
If an overseas user places an order while in China, the write still goes to the remote overseas nodes, because writeConcern: "majority" requires a majority of that shard's nodes.

Of course, as with Oracle, MongoDB can also run two clusters at home and abroad in parallel, synchronized by third-party tools that also handle data conflicts. Common middleware: Tapdata and MongoShake, both of which are likewise based on the oplog.

VIII. Connection and development considerations

Connect to a replica set: mongodb://node1,node2/dbname?[options]
Connect to a sharded cluster: mongodb://mongos1,mongos2/dbname?[options]
DNS SRV resolution is supported: mongodb+srv://... Do not place a load balancer in front of the mongos or node addresses, because mongos provides its own load balancing. (Illustrative connection strings follow this list.)
Transaction support: use 4.2-compatible drivers. A transaction must complete within 60 seconds or it is aborted. Shards involved in a transaction cannot use arbiter nodes. Transactions affect the efficiency of chunk migration, and a chunk being migrated may cause a transaction to fail. Multi-document transactions must run on the Primary node. readConcern should be set only at the transaction level, not on every read and write.
Other: give each query a matching index wherever possible; prefer covered indexes; use projections to reduce what is returned to the Client; avoid count when paginating, just use limit; try to keep a transaction under 1,000 updated documents.
Checks before going live: disable NUMA, which can otherwise cause sudden heavy swapping; disable Transparent Huge Pages, which otherwise hurts database efficiency; set tcp_keepalive_time to 120 seconds to tolerate network problems; raise the maximum number of open file handles; turn off the file system's atime to improve access efficiency.
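A few illustrative connection strings (host names, database, and options are placeholder assumptions):

mongodb://node1:27017,node2:27017/dbname?replicaSet=rs0&readPreference=secondaryPreferred&w=majority (replica set)
mongodb://mongos1:27017,mongos2:27017/dbname (sharded cluster: list the mongos routers)
mongodb+srv://cluster0.example.net/dbname (hosts and options resolved from DNS SRV/TXT records)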

IX. Index management

Indexes in MongoDB are special structures that store a small, easily traversed subset of the data set, using a B-tree structure.

(1) Things to consider when creating an index: each index requires at least 8KB of space; an active index affects write performance, because every insert into the collection must also update the index; each index consumes disk space and memory.

(2) Limits: an index name cannot exceed 128 characters; a compound index cannot include more than 32 fields; a collection cannot have more than 64 indexes.

(3) Index management

Create an index

db.collection.createIndex(<keys>, <options>); the options are:

background (Boolean): index creation blocks database operations; specify true to build in the background.
unique (Boolean): whether to build a unique index.
name (String): the name of the index.
dropDups (Boolean): deprecated in 3.0; whether to drop duplicate records when building the index.
sparse (Boolean): do not index documents in which the field does not exist.
expireAfterSeconds (Integer): TTL in seconds.
v: the index version number.
weights (Document): index weight, a value between 1 and 99,999.
default_language (String): for text indexes, determines the tokenizer rules; defaults to English.
language_override (String): for text indexes, names the document field that overrides the language.
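A few minimal mongo shell sketches of these options; collection and field names are assumptions:

db.users.createIndex({ email: 1 }, { unique: true, name: "uniq_email" }); // unique index, custom name
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });  // TTL: expire after 1 hour
db.users.createIndex({ nickname: 1 }, { sparse: true });                  // skip docs missing the field
db.logs.createIndex({ ts: -1 }, { background: true });                    // build without blocking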

View indexes

db.collection.getIndexes()

Delete index

db.collection.dropIndexes(); db.collection.dropIndex(<indexName>)

View and terminate index builds

db.currentOp(); db.killOp(<opId>)

Usage statistics

$indexStats: returns index access statistics.
explain(): returns the query plan.
hint(): controls index selection, forcing MongoDB to use a specific index for a query.

(4) Single-field indexes

MongoDB can create an index on any field. By default it creates an index on the _id field; this index cannot be dropped and prevents clients from inserting documents with duplicate _id values. The _id index is used in sharded clusters as well.

(5) Compound indexes

Combines multiple keys together to speed up queries that match on multiple keys.

A compound index cannot include a hashed field.
The fields of a compound index are ordered.
Compound indexes support prefix matching on the leading fields.

db.collection.createIndex({ <field1>: <1 or -1>, <field2>: <1 or -1>, ... })

(6) Multikey indexes

MongoDB uses a multikey index to index each element of an array; multikey indexes can be built on arrays of strings, numbers, and embedded documents. If an indexed field contains an array value, MongoDB automatically decides to create a multikey index.

db.coll.createIndex({ <arrayField>: <1 or -1> })

(7) Full-text (text) indexes

MongoDB provides a text index type that supports searching string content within a collection.

db.collection.createIndex({ <field1>: "text", <field2>: "text", ... })

MongoDB supports weights and wildcard creation for text indexes. Query terms are space-separated strings, and prefixing a term with "-" excludes it. Each indexed field can be given a different weight (default 1); for each indexed field in a document, MongoDB multiplies the number of matches by the weight and sums the results, then uses that sum to compute the document's score. A sketch follows.
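A minimal mongo shell sketch of a weighted text index and a query that excludes a term; the collection, fields, and weights are assumptions:

db.articles.createIndex(
    { title: "text", body: "text" },
    { weights: { title: 10, body: 1 } } // a title match scores 10x a body match
);
db.articles.find(
    { $text: { $search: "mongodb index -mysql" } },   // "-mysql" excludes that term
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });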

Each collection may have at most one text index. If a query uses the $text expression, hint() cannot be used.

(8) Hashed indexes

A hashed index applies a hash function to the value of the indexed field. The hash function collapses embedded documents and hashes the entire value, but multikey (i.e. array) indexes are not supported.

db.collection.createIndex({ _id: "hashed" })

Hashed indexes support sharding with hashed shard keys: hash-based sharding uses a hashed index on a field as the shard key to partition data across the sharded cluster.

X. Security architecture

Enable the security option by adding the --auth parameter on the command line, or authorization: enabled in the configuration file.

Command-line client usage: mongo -u Username -p Password --authenticationDatabase DbName

(1) Security policies supported by MongoDB: username and password; certificates; LDAP (Enterprise Edition); Kerberos (Enterprise Edition).

(2) Authentication between cluster nodes: KeyFile, a random string copied identically to each node; X.509, a certificate-based mode with certificates issued by an internal or external CA, where each node holds a different certificate.

(3) User permissions supported by MongoDB

MongoDB's Roles are built from Actions and Resources: an Action defines an operation, and a Resource represents what that operation may act on. MongoDB's built-in roles form an inheritance hierarchy.

Custom roles and users can be created with createRole() and createUser(), respectively, as sketched below.
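A minimal mongo shell sketch; the user, password, role, and database names are placeholder assumptions:

use admin
db.createUser({
    user: "appUser",
    pwd: "changeMe",                              // placeholder only
    roles: [ { role: "readWrite", db: "shop" } ]  // built-in role scoped to one database
});
db.createRole({
    role: "orderReader",
    privileges: [ {
        resource: { db: "shop", collection: "orders" }, // the Resource
        actions: [ "find" ]                             // the Action
    } ],
    roles: []                                           // no inherited roles
});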

(4) Transmission encryption

MongoDB supports TLS/SSL to encrypt all network data transfers, whether internal nodes or client-to-server.

(5) Disk encryption (Enterprise Edition): a master key is generated and used to encrypt per-database keys; each database has a different key, so data at rest is encrypted with its own database key. Keys are managed through a key-management server using the KMIP protocol; MongoDB also supports keyfile management.

(6) Field-level encryption: MongoDB supports encrypting individual fields. When a request touches encrypted data, the MongoDB driver contacts the key manager for the key, queries the database with the given conditions, pulls back the encrypted data, decrypts it with the key, and returns plaintext. All encryption and decryption happen in the MongoDB driver.

(7) Auditing (Enterprise Edition): records are in JSON format and can be written to local files or syslog; recorded events include DDL, DML, and user authentication.

The audit log is recorded to syslog:

--auditDestination syslog

The audit log is recorded to the specified file:

--auditDestination file --auditFormat JSON --auditPath /path/to/auditLog.json

Audit the deletion:

--auditDestination file --auditFormat JSON --auditPath /path/to/auditLog.json --auditFilter '{ atype: { $in: [ "dropCollection" ] } }'

XI. Performance optimization

(1) mongostat

A tool for understanding the running status of MongoDB.

insert, query, update, delete: number of each operation in the last second.
getmore: cursor getMore operations in the last second.
command: commands (index creation, etc.) executed in the last second.
dirty: above 20%, new requests may block, since this is the proportion of cached data not yet flushed.
used: above 95%, new requests may block; MongoDB's cache is memory-based, and LRU eviction kicks in once the cache exceeds 80%.
qrw, arw: queued and active requests.
conn: current number of connections.

(2) mongotop

A tool for understanding per-collection load:

ns: the collection.
total: total time spent.
read: time spent reading.
write: time spent writing.

(3) The mongod log

MongoDB logs queries that exceed 100ms, together with their execution plans.

(4) mtools

pip install mtools

Common instructions:

mplotqueries LogFile: chart all slow queries.
mloginfo --queries LogFile: group slow queries by pattern, with counts, elapsed time, and so on.

https://github.com/rueckstiess/mtools

XII. GridFS

GridFS is a sub-module of MongoDB for storing files inside MongoDB, effectively a distributed file system built into the database. File data is stored in blocks across two default collections: fs.files holds a file's basic information, such as its name, size, upload time, and MD5 hash, while fs.chunks holds the actual data, with each file split into chunks of 256KB by default.

The advantage of GridFS is that you don't have to run a separate file system; the one built into MongoDB is enough, and backups and sharding ride on MongoDB itself, making it easy to maintain. A quick usage sketch follows.
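A minimal sketch using the mongofiles tool that ships with MongoDB; host, database, and file names are placeholder assumptions:

mongofiles --host localhost:27017 -d myfiles put backup.tar.gz   # writes fs.files + fs.chunks
mongofiles --host localhost:27017 -d myfiles list                # list stored files
mongofiles --host localhost:27017 -d myfiles get backup.tar.gz   # read the file back out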
