2025-02-23 Update From: SLTechnology News & Howtos
Shulou (Shulou.com), 05/31 report
This article walks through how our team went from stumbling into MongoDB to being won over by it. The content is concise and easy to follow, and I hope the detailed introduction below gives you something useful.
Background: our company is a social e-commerce business with millions of users and annual sales of nearly 5 billion. When the technology department was founded, we needed a database that could keep up with rapid user growth and constantly changing business requirements. After investigation and comparison, we chose MongoDB.
Yes, you read that right: all in on MongoDB!
1. Why use MongoDB
Our core business is social e-commerce, so database performance matters; and because commodity transactions are the company's main source of revenue, high availability matters just as much.
To summarize, our requirements for the database were:
Safe and stable
High availability
High performance
What did we weigh when selecting a database?
Data scale
Support read-write concurrent traffic
Latency and Throughput
In terms of data scale, important records such as orders, product SKUs, and member profiles will certainly keep growing over time. So we needed a database that not only meets today's requirements, but also makes it convenient to scale out to massive data volumes six months or a year from now.
Below we explain why we chose MongoDB from three angles: its architecture, its performance, and its document model.
2. MongoDB Architecture
2.1 High Availability
As the core of the system, the database should offer 99.99% availability. In MongoDB, that guarantee comes from the replica set, which keeps redundant copies of the data. Replication is built in: with a reasonable configuration, the failure of a single database node no longer makes the service unavailable.
A typical replica set (shown in the original figure) consists of:
One Primary node, which accepts reads and writes from the application server.
Two Secondary nodes, which replicate data from the Primary.
When the primary fails, the two secondaries hold an election and vote to produce a new primary, keeping the service available. (Note: data cannot be written during the election, but if a Secondary node is configured as readable, reads still work.) This is MongoDB high availability: simple to configure, with no extra middleware or plug-ins needed to handle failover between database nodes.
2.2 The Election Algorithm: Raft, a Distributed Consensus Algorithm
Raft is a consensus protocol that keeps distributed data consistent when the leader fails or a network partition causes split-brain. MongoDB uses a Raft-based algorithm to preserve data consistency when the primary fails or the network partitions. MongoDB's implementation differs slightly from the original Raft: for example, secondaries pull data from the primary rather than the primary pushing data to the secondaries, which reduces the load on the primary, along with other adjustments that suit database operation.
An animated demonstration of the Raft algorithm: http://thesecretlivesofdata.com/raft/
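To make the election mechanics concrete, here is a toy sketch of a Raft-style majority election. This is a deliberate simplification for illustration, not MongoDB's actual implementation: a candidate wins only with votes from a strict majority of members, and a voter only supports a candidate whose data (modeled here as a made-up `lastOpTime` field) is at least as fresh as its own.

```javascript
// Toy Raft-style election among replica-set members (illustrative only).
function holdElection(candidate, voters) {
  let votes = 1; // the candidate votes for itself
  for (const voter of voters) {
    // A voter supports the candidate only if the candidate's data
    // is at least as up to date as the voter's own.
    if (voter.lastOpTime <= candidate.lastOpTime) {
      votes += 1;
    }
  }
  // Majority of the total membership (voters plus the candidate).
  const majority = Math.floor((voters.length + 1) / 2) + 1;
  return { votes, majority, elected: votes >= majority };
}

// Three-member set: the candidate is fully up to date, so it wins.
const result = holdElection(
  { name: "nodeA", lastOpTime: 100 },
  [
    { name: "nodeB", lastOpTime: 95 },
    { name: "nodeC", lastOpTime: 100 },
  ]
);
console.log(result); // { votes: 3, majority: 2, elected: true }
```

Note how a stale candidate cannot gather a majority from fresher voters, which is the property that protects committed data during failover.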
2.3 Very Large Replica Sets
```javascript
{
  "_id": ,                       // value omitted in the original
  "host": ,                      // value omitted in the original
  "arbiterOnly": false,
  "buildIndexes": true,
  "hidden": false,
  "priority": 0,                 // set to 0
  "tags": {},
  "slaveDelay": NumberLong(0),
  "votes": 0                     // set to 0
}
```
MongoDB allows up to 50 members in a replica set, but at most 7 of them may vote; the remaining members must be configured as non-voting nodes, as in the configuration above, with votes: 0 and priority: 0.
Why only seven voting nodes? Recall the Raft algorithm from section 2.2: the more voters there are, the longer an election takes, and the longer it takes to elect a new primary. During that window there is no primary, so no writes can be accepted.
So what are all those non-voting nodes for? You have probably heard of MySQL read/write splitting, which improves database performance by separating reads from writes. MongoDB can do the same here: the primary takes the writes and secondaries serve the reads. You can dedicate one secondary to the BI department, another to finance, and another to operations.
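In MongoDB this per-department routing is done with read-preference tag sets on the secondaries. Below is a toy model of that routing logic; the member hosts and the "dept" tag are invented for illustration, and the real driver does this for you when you set a read preference with tags.

```javascript
// Toy read-preference router: pick a secondary whose tags match the
// requested tag set, falling back to any secondary. Illustrative only.
function pickSecondary(members, tagSet) {
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  const tagged = secondaries.filter((m) =>
    Object.entries(tagSet).every(([k, v]) => m.tags[k] === v)
  );
  return tagged[0] || secondaries[0] || null;
}

const members = [
  { host: "db1:27017", state: "PRIMARY",   tags: {} },
  { host: "db2:27017", state: "SECONDARY", tags: { dept: "BI" } },
  { host: "db3:27017", state: "SECONDARY", tags: { dept: "finance" } },
];

// The finance department's reads land on its dedicated secondary.
console.log(pickSecondary(members, { dept: "finance" }).host); // "db3:27017"
```

The fallback to "any secondary" mirrors the behavior you would want when no tagged member is available, though the exact fallback policy is configurable in a real deployment.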
2.4 Write Concern
Given that our replica set has at least three nodes (one primary plus two secondaries), and the secondaries stay consistent by replicating from the primary, how do we make sure data is safely persisted when we write?
Consider the following scenarios:
1. The write succeeds on the primary and the client is told the write succeeded. Before any secondary has replicated it, the primary dies: the data is lost!
2. The write succeeds on the primary, one secondary replicates it, and the client is told the write succeeded. If the primary then dies, no data is lost. But if the primary and that one synchronized secondary happen to die at the same time, the data is lost!
3. The write succeeds on the primary, both secondaries replicate it, and the client is told the write succeeded. If the primary then dies, no data is lost.
Comparing the three: in the first case there is a clear risk of data loss. In the second, loss is still possible, but far less likely. The third is the safest, but with many nodes the synchronization is time-consuming and users wait too long, so it is generally not considered.
The trade-off MongoDB recommends here is the write concern, which lets you balance data durability against latency.
```javascript
db.products.insert(
  { item: "envelopes", qty: 100, type: "Clasp" },
  { writeConcern: { w: "majority", wtimeout: 5000 } } // majority write concern, 5000 ms timeout
)
```
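The "majority" rule behind this setting is simple arithmetic: with n data-bearing members, a write is acknowledged once floor(n / 2) + 1 of them (the primary plus enough secondaries) have it. A minimal sketch of that rule, purely illustrative (the real server tracks replication optimes and the wtimeout deadline):

```javascript
// Sketch of write-concern acknowledgment: how many nodes must hold
// the write before the client is told it succeeded.
function isAcknowledged(totalMembers, nodesWithWrite, w) {
  const needed =
    w === "majority" ? Math.floor(totalMembers / 2) + 1 : w;
  return nodesWithWrite >= needed;
}

// 3-member set: primary + 1 secondary = 2 copies, and majority is 2.
console.log(isAcknowledged(3, 2, "majority")); // true
// Only the primary has the write: not yet acknowledged.
console.log(isAcknowledged(3, 1, "majority")); // false
```

This is exactly why scenario 2 above becomes safe under w: "majority": once acknowledged, the write survives the loss of any single node.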
3. MongoDB Sharding
3.1 How does large-scale data affect database efficiency?
Database performance is closely tied to the size of the data set itself. Taking a relational database as an example:
Querying a table with millions of rows performs very differently from querying one with tens of millions or over a hundred million rows; query performance deteriorates sharply.
On insert, maintaining indexes may trigger index-tree rebalancing and page splits.
3.2 How do we keep reads and writes efficient at massive scale?
To keep the database efficient over massive data, we apply divide and conquer: split large tables into small tables, and large databases into small databases.
In relational databases we usually solve this by splitting tables and databases:
For example, split the order database into an online library and an offline (archive) library: orders from the last three months live in the online library, while older order data moves to the offline library. The online data set shrinks dramatically, and database performance improves.
Or, when a user table grows past ten million rows and single-table query efficiency drops, split it into multiple user tables; this is horizontal partitioning.
How do we do it in MongoDB?
3.3 MongoDB Sharding
By partitioning the documents of one collection (Collection1) across different shards according to a shard key, the amount of data in any single data file is reduced, achieving the goal of splitting up a large data set.
Advantages of sharding: online scaling and dynamic expansion.
Shard: stores the actual data chunks. In a production environment, each shard role should be served by several machines grouped into a replica set, to prevent a single point of failure.
Config server: mongod instances that store the metadata and configuration of the entire cluster, including chunk information. Since MongoDB 3.4, config servers must be deployed as a replica set.
mongos: acts as the query router, providing the interface between client applications and the sharded cluster.
Data written by the application server is routed to the right shard through mongos. This is another convenience of MongoDB: you neither manage the routing yourself nor rely on third-party middleware to assist with it, which is reliable and reassuring.
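Conceptually, mongos looks up which chunk a shard-key value falls into and forwards the operation to the owning shard. The toy router below illustrates range-based chunk routing; the chunk boundaries and shard names are invented for this example, and real chunk metadata lives on the config servers.

```javascript
// Toy mongos: route a shard-key value onto range-based chunks.
// Each chunk covers keys in the half-open interval [min, max).
const chunks = [
  { min: -Infinity, max: 1000,     shard: "shardA" },
  { min: 1000,      max: 5000,     shard: "shardB" },
  { min: 5000,      max: Infinity, shard: "shardC" },
];

function routeToShard(shardKeyValue) {
  const chunk = chunks.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  return chunk.shard;
}

console.log(routeToShard(42));    // "shardA"
console.log(routeToShard(2500));  // "shardB"
console.log(routeToShard(99999)); // "shardC"
```

Because routing is driven entirely by the shard key, a query that does not include the shard key must be broadcast to every shard, which is one reason shard-key choice matters so much.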
Shard load balancing
Once our MongoDB replica set grows into a sharded cluster and the data volume increases, each shard gets bigger and bigger. Two situations can arise:
1. Hot and cold data: one shard holds a disproportionately large amount of data.
2. The total data volume is large: every shard in the cluster is getting too big.
In case (1), MongoDB's balancer automatically migrates data from the large shard to smaller shards. Note that this is no reason to relax; on the contrary, it should prompt us to reflect on the skewed data caused by our own poor choice of shard key. Chunk migration costs performance: the application server writes once to shard B, and shard B then rewrites the data to shard C, so the same data is invisibly written twice. A great waste!
In case (2), the answer is of course to add new shards to the oversized cluster to share the load.
Note: although MongoDB can shard online, doing so has some impact on normal read and write performance, so it is recommended to deploy shards during off-peak hours.
4. The MongoDB Document Model
The challenge of data modeling is balancing the needs of the application, the structures that suit the database engine, and the data-retrieval patterns. When designing a data model, consider how the application will use the data (queries, updates, and processing) as well as the inherent structure of the data itself.
4.1 Flexible Schema
In a relational database, data must be inserted according to a predetermined table structure. MongoDB, being a document database, imposes no such requirement by default. Concretely:
Documents in the same collection need not have the same fields, and the same field can hold different types in different documents.
To change a document's structure, such as adding a field, removing a field, or changing a field's type, you simply update that document.
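The two points above can be shown in a few lines. A plain array stands in for a MongoDB collection here, and the product fields are invented examples; in a real collection the equivalent of the in-place change would be an update with $set.

```javascript
// Flexible schema illustrated: documents in the same "collection"
// need not share fields or field types.
const products = [
  { _id: 1, name: "T-shirt",   price: 19.9,  sizes: ["S", "M", "L"] },
  { _id: 2, name: "Gift card", price: 50 },      // no sizes field
  { _id: 3, name: "Sticker",   price: "free" },  // price is a string here
];

// Adding a field later touches only that one document, like
// MongoDB's { $set: { tags: [...] } }. No ALTER TABLE needed.
products[0].tags = ["apparel", "summer"];

console.log(products.filter((p) => "sizes" in p).length); // 1
```

The flip side of this freedom is that the application, not the database, becomes responsible for keeping documents consistent enough to query.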
4.2 Example: Modeling a One-to-Many (1:N) Relationship
In e-commerce, one user may have multiple recipients and shipping addresses. In a relational database, we would need a contacts table and an address table, joined together. In MongoDB, a single collection gets this done!
Modeled relationally, the data looks like this:
```javascript
// patron document
{ _id: "joe", name: "Joe Bookreader" }

// address documents
{ patron_id: "joe", // reference to patron document
  street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }
{ patron_id: "joe",
  street: "1 Some Other Street", city: "Boston", state: "MA", zip: "12345" }
```
In MongoDB, we can design it like this:
```javascript
{
  "_id": "joe",
  "name": "Joe Bookreader",
  "addresses": [
    { "street": "123 Fake Street", "city": "Faketon", "state": "MA", "zip": "12345" },
    { "street": "1 Some Other Street", "city": "Boston", "state": "MA", "zip": "12345" }
  ]
}
```
Yes, the above is a single document in the collection. Flexible and convenient, isn't it? You can add category information or product tags to the SKU collection, keep redundant copies of basic SKU data in the inventory collection, and embed the relevant snapshot data directly in the order collection. This flexibility is one of the important reasons we chose MongoDB: it greatly lightens the developer's mental load, and you do not need to be a SQL master to write excellent queries in MongoDB.
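Embedding does not sacrifice queryability: MongoDB's dot notation (for example, db.patrons.find({ "addresses.city": "Boston" })) matches a document if any element of the embedded array matches. A minimal stand-in for that matching rule, using the patron document from above:

```javascript
// Array-field matching: a document matches if ANY embedded address
// has the requested city, mirroring MongoDB dot-notation queries.
const patron = {
  _id: "joe",
  name: "Joe Bookreader",
  addresses: [
    { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
    { street: "1 Some Other Street", city: "Boston", state: "MA", zip: "12345" },
  ],
};

function matchesAddressCity(doc, city) {
  return doc.addresses.some((a) => a.city === city);
}

console.log(matchesAddressCity(patron, "Boston"));  // true
console.log(matchesAddressCity(patron, "Chicago")); // false
```

An index on "addresses.city" (a multikey index) would make such queries efficient even with many documents.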
Of course, redundancy that feels great today can mean a painful refactor tomorrow: too much of it eventually bloats the data and degrades performance, among other problems. Keeping developers' urge to denormalize in check is a job for the team's technical lead, who should review such designs.
Internet business never stands still; the product, the users, and the market change all the time. We do not have the engineering strength to build a platform that adapts to every business twist, but we can choose a reliable, powerful, and flexible database: MongoDB.
That was our journey with MongoDB, from first pitfalls to genuine enthusiasm. Did you pick up some knowledge or skills along the way? If you want to learn more, you are welcome to follow the industry information channel.