This article walks through the seven-step process we used to move hundreds of millions of rows of MySQL data into MongoDB. The content is fairly detailed; I hope it is useful to anyone facing a similar migration.
I. The Problem
Inside Gooddoctor Online, the S3 system is responsible for the centralized storage, querying, and management of operation logs from each business party. The system currently handles tens of millions of queries and hundreds of thousands of inserts per day. As logs keep accumulating, the main table has grown to billions of rows, and a single table occupies more than 400 GB of disk space. S3 dates back to the early days of the business, when MySQL was chosen as the storage simply because it was quick to implement. With the business continuing to grow, performance and scalability now have to be considered together, so it was time to re-evaluate the storage.
The new project is named LogStore.
II. Objectives
1. Security
When the S3 system was first designed, data was not isolated by business system; records were stored directly as key (system + class name + id) plus a limited set of fixed fields and a serialized value, which is clearly inconvenient for later cluster splitting and management. LogStore needs to partition data logically by business: when accessing the system, the business party must specify its app so that the necessary permission checks can be performed and different business parties' data can be distinguished before inserts and queries are carried out.
2. Generality
S3 mainly provides a three-layer structure stored in fixed MySQL columns, which inevitably wastes column space. LogStore needs to provide a common log storage format in which the business party defines the meaning of each field itself, while still keeping a certain number of queryable dimensions. A sketch of what such a document might look like is shown below.
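As an illustration only, a generic LogStore document could carry the isolation key, a few indexable query dimensions, and a free-form payload whose meaning the business party defines. The field names below are hypothetical, not the actual LogStore schema:

    # Hypothetical example of a generic log document; field names are illustrative only.
    log_doc = {
        "app": "order-center",      # business party identifier used for isolation and permission checks
        "key": "Order:123456",      # business key (system + class name + id) the caller queries by
        "dim1": "pay",              # a small set of generic, indexable query dimensions
        "dim2": "success",
        "ctime": 1550000000,        # creation timestamp for range queries and archiving
        "content": {                # free-form payload; its meaning is defined by the business party
            "operator": "u_1001",
            "before": {"status": "unpaid"},
            "after": {"status": "paid"},
        },
    }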
3. High performance
The S3 system runs at 300+ QPS, with a maximum single record of about 1 KB. LogStore needs to support read and write throughput at least 10 times the current QPS.
4. Auditability
To meet internal security audit requirements, LogStore does not allow data to be updated; only inserts and queries are permitted.
5. Scalability
LogStore and its underlying storage must be able to scale out online, meet the company's log storage needs for the next 5 years or longer, and save as much disk space as possible.
III. Solution Selection
To achieve these goals, we investigated four storage options. They compare as follows:
1. Not suitable for us: sharding (splitting databases and tables)
Sharding middleware comes in two flavors: application-layer middleware and proxy middleware. Either way we would have to modify the existing PHP and Java frameworks, and it would also make it harder for DBAs to manage the data. To keep the architecture simple, the architecture team rejected introducing DB middleware and insisted on a solution that is easy to operate and cheap to maintain.
2. Not suitable for us: TiDB
TiDB was also a key candidate for a while, but since the company's DB ecosystem is currently built mainly on MGR, MongoDB, and MySQL, and our foreseeable needs would not make full use of TiDB's strengths, it was shelved for the time being.
3. Not suitable for us: Elasticsearch
The ELK stack does make Elasticsearch attractive, and the company has run ES clusters for a long time. ES excels at retrieval and data analysis, but precisely because of those powerful search and analysis features, its write, query, and storage costs are relatively high. For this log-processing scenario the cost-effectiveness is a bit low, so it was passed over as well.
4. The suitable choice: MongoDB
Business operation logs are read far more often than they are written, which fits the document database MongoDB very well. MongoDB is also widely used in the industry, many of the company's services already run on it, and we have accumulated some operational experience with it, so we finally chose MongoDB as the storage for the new log system.
IV. Performance Testing
To verify that MongoDB's performance could meet the requirements, we set up a MongoDB cluster. The machine configuration, architecture diagram, and test results are as follows:
1. Machine configuration
The MongoDB cluster consists of 3 machines, each configured as follows:
CPU: 8 cores
Memory: 15 GB
MongoDB memory allocation: 8 GB per node
Hard disk: 100 GB
OS: CentOS release 6.6 (Final)
MongoDB version: 3.2.17
2. Architecture diagram
3. Test scenarios and results
This round of testing used the YCSB (https://github.com/brianfrankcooper/YCSB) benchmarking tool. YCSB's workloads directory ships six predefined workload types representing different load profiles; we used five of them, listed below, followed by a sample invocation and the test results.
workloada - 100% inserts (used to load the test data)
workloadb - read-heavy: 90% reads, 10% updates
workloadc - read-only: 100% reads
workloadd - read-heavy: 90% reads, 10% inserts
workloadf - mixed read-write: 50% reads, 25% inserts, 25% updates
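For reference, an invocation along the following lines would roughly reproduce the load scenario above. The mongos address is a placeholder, and fieldcount/fieldlength are back-calculated from the ~5 KB average document size, so treat the property values as assumptions rather than the exact test configuration:

    # Load 1 million ~5 KB documents with 100 client threads (workloada, 100% insert)
    ./bin/ycsb load mongodb -s -P workloads/workloada -threads 100 \
        -p recordcount=1000000 -p fieldcount=10 -p fieldlength=500 \
        -p mongodb.url="mongodb://mongos1:27017/ycsb"

    # Replay the read-heavy scenario against the loaded data (workloadb, 90% read / 10% update)
    ./bin/ycsb run mongodb -s -P workloads/workloadb -threads 100 \
        -p operationcount=1000000 -p mongodb.url="mongodb://mongos1:27017/ycsb"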
(1) Insert test (loading the data): average document size 5 KB, 1 million records, 100 concurrent clients, about 5.265 GB of data in total. Execution time and disk pressure:
Conclusion: inserting 1 million records took 219 s, the average insert latency was 21.8 ms, and throughput was 4,568 ops/s.
(2) 90% reads, 10% updates, 100 concurrent clients:
Conclusion: total time 236 s, average read latency 23.6 ms, average update latency 23.56 ms, throughput 4,225 ops/s.
(3) Read-only, 100% reads, 100 concurrent clients:
Conclusion: total time 123 s, average read latency 12.3 ms, throughput 8,090 ops/s.
(4) 90% reads, 10% inserts, 100 concurrent clients:
Conclusion: total time 220 s, average read latency 21.9 ms, average insert latency 21.9 ms, throughput 4,541 ops/s.
(5) Mixed read-write: 50% reads, 25% inserts, 25% updates, 100 concurrent clients:
Conclusion: total time 267 s, average read latency 26.7 ms, average update latency 26.7 ms, average insert latency 26.6 ms, throughput 3,739 ops/s.
4. Comparison of test results
The results show that MongoDB performs best under read-heavy workloads, which matches our access pattern, and that both read and write rates meet production requirements.
V. Seamless Migration in Practice
To ensure a seamless migration for the business and minimize the effort required from business R&D engineers, we decided on a phased cut-over plan.
Step 1: Application-layer changes in S3 + building the LogStore system
First, read and write switches were built into the S3 system so that read and write traffic can be routed to LogStore independently; newly built applications can call LogStore directly. The architecture at this stage is shown below, followed by a minimal sketch of the switches.
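A minimal sketch of what such switches might look like, assuming hypothetical flag names and injected client callables (this is illustrative Python, not the actual S3 PHP/Java code):

    # Hypothetical application-layer switches; flag names and callables are illustrative.
    READ_FROM_LOGSTORE = False    # flipped to True in Step 6
    WRITE_TO_LOGSTORE = False     # flipped to True in Steps 5/7 (enables the extra LogStore write)

    def query_logs(app, key, mysql_query, logstore_query):
        """Route a read either to the old MySQL path or to LogStore."""
        if READ_FROM_LOGSTORE:
            return logstore_query(app, key)
        return mysql_query(key)

    def write_log(app, key, payload, mysql_insert, logstore_publish):
        """Keep the existing MySQL write; add an asynchronous LogStore write when enabled."""
        mysql_insert(key, payload)
        if WRITE_TO_LOGSTORE:
            logstore_publish({"app": app, "key": key, "payload": payload})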
Step 2: Incremental Data Synchronization
To keep new data in the S3 system consistent with LogStore, Maxwell subscribes to the MySQL binlog and synchronizes the changes into MongoDB at the database layer. The schematic is as follows:
Maxwell (http://maxwells-daemon.io) reads the MySQL binary log (binlog) in real time, generates JSON-formatted messages, and produces them to Kafka. The LogStore system consumes the messages from Kafka and writes them into the MongoDB database; a simplified consumer sketch is shown below.
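A simplified sketch of the consumer side, assuming the kafka-python and pymongo drivers; the topic name, hosts, and collection name are placeholders, not the production configuration:

    import json
    from kafka import KafkaConsumer      # pip install kafka-python
    from pymongo import MongoClient      # pip install pymongo

    consumer = KafkaConsumer(
        "maxwell",                                        # topic Maxwell produces to (placeholder)
        bootstrap_servers=["kafka1:9092"],
        group_id="logstore-sync",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    logs = MongoClient("mongodb://mongos1:27017")["logstore"]["logs"]

    for msg in consumer:
        event = msg.value                 # Maxwell event: {"database":..., "table":..., "type":..., "data": {...}}
        if event.get("type") != "insert":
            continue                      # the log table is insert-only, so only inserts are replayed
        logs.insert_one(event["data"])    # write the full row image into MongoDB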
At this point, for the business parties' existing log types, new data is effectively double-written at the storage layer: the S3 system and the LogStore system each hold a copy. A business party adding a new log type can call the LogStore interface directly. Next, we migrate the historical data of the existing log types.
Step 3: Migrating historical data
To migrate the old S3 data, several PHP scheduled-task scripts query the data and deliver it to a RabbitMQ queue; the LogStore system pulls the messages from RabbitMQ, consumes them, and stores them in MongoDB. The schematic is as follows, with a few practical notes (a simplified producer sketch follows the notes):
(1) Because the id column in the original MySQL table is varchar and not the primary key, the data can only be queried in batches via the ctime index, with smaller chunks delivered to the MQ queue where the data is dense.
(2) The data cannot be migrated in a single day, and the process may be interrupted. The script therefore runs as a scheduled task for 20 hours a day, pauses during busy online hours, and records the point at which it stopped in Redis so the next run can resume.
(3) Because the amount of data to migrate is large, run as many script instances as MQ and its consumers can bear, to shorten the migration time.
(4) While the scripts run, watch business latency and the MySQL monitoring at all times, and adjust immediately if any impact appears, so that normal business is not affected.
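The real migration scripts were PHP scheduled tasks; the following Python sketch only illustrates the batching and checkpointing idea. Hosts, table and queue names, the ctime window size, and the timestamps are all placeholders:

    import json
    import pika                            # pip install pika
    import pymysql                         # pip install pymysql
    import redis                           # pip install redis

    WINDOW = 60                            # seconds of ctime covered per batch; tune to data density
    STOP_AT = 1556668800                   # migrate up to this ctime (placeholder)

    r = redis.Redis(host="redis1")
    mq = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq1")).channel()
    mq.queue_declare(queue="logstore.migrate", durable=True)
    db = pymysql.connect(host="mysql1", user="reader", password="***",
                         db="s3", cursorclass=pymysql.cursors.DictCursor)

    start = int(r.get("migrate:ctime") or 1420041600)      # resume from the last checkpoint
    with db.cursor() as cur:
        while start < STOP_AT:
            end = start + WINDOW
            # id is varchar and not the primary key, so batches are driven by the ctime index
            cur.execute("SELECT * FROM logs WHERE ctime >= %s AND ctime < %s", (start, end))
            rows = cur.fetchall()
            if rows:
                mq.basic_publish(exchange="", routing_key="logstore.migrate",
                                 body=json.dumps(rows, default=str),
                                 properties=pika.BasicProperties(delivery_mode=2))
            r.set("migrate:ctime", end)                     # checkpoint so an interrupted run can resume
            start = end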
Step 4: Verifying the data
After the historical data import is complete, the old data is verified on two dimensions, with re-sends used to compensate for any discrepancy (an illustrative check is sketched below):
Data volume: for each old S3 record, check by its ID whether it exists in MongoDB, and re-send it if it is missing;
Data integrity: compute an MD5 over the S3 record and the corresponding MongoDB document using the same rules, and re-send the record if the digests do not match.
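An illustrative sketch of the check, assuming pymongo; the comparable field list, database name, and connection string are placeholders, since the real normalization rules depend on the S3 schema:

    import hashlib
    import json
    from pymongo import MongoClient

    FIELDS = ["id", "app", "key", "ctime", "content"]        # hypothetical comparable fields

    def digest(record):
        """MD5 of the comparable fields serialized in a deterministic order."""
        subset = {f: str(record.get(f, "")) for f in FIELDS}
        return hashlib.md5(json.dumps(subset, sort_keys=True).encode("utf-8")).hexdigest()

    mongo_logs = MongoClient("mongodb://mongos1:27017")["logstore"]["logs"]

    def verify_batch(mysql_rows, resend):
        """mysql_rows: a batch of old S3 rows; resend: callback that re-queues a row."""
        for row in mysql_rows:
            doc = mongo_logs.find_one({"id": row["id"]})
            if doc is None or digest(doc) != digest(row):    # missing or mismatched copy
                resend(row)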
Step 5: Double-writing data
Turn on the pre-built write switch in the application layer and route the write traffic into LogStore as well. MySQL traffic does not stop at this point, and the binlog synchronization keeps running. The architecture is as follows:
As the diagram shows, write traffic from the S3 call points goes into the backuplogs collection of the MongoDB database. Why not write it directly into the logs collection? We will leave that as a small cliffhanger and explain it later in the article.
Step 6: Grayscale switch of S3 reads to the LogStore system
As mentioned above, the S3 application-layer read and write call points have built-in switches. Turning on the read switch routes all read operations through LogStore. The schematic after switching is as follows:
Step 7: Grayscale switch of writes to the LogStore system
Turn on the application-layer write switch so that all write operations are written asynchronously to MongoDB through MQ. How, then, do we prove that every write call point in the application layer has actually been modified?
During the double-write phase above, one copy of each record goes to the logs collection and one copy to the backuplogs collection. The data synchronized from Maxwell's binlog is necessarily the most complete, so in theory count(logs) >= count(backuplogs). If the two collections grow by the same amount over a period of time, every write call point has been switched; the double write can then be removed and only LogStore kept. Otherwise, the remaining call points still have to be found and fixed. After the switch is complete, the schematic is as follows. The increment comparison itself can be done with a small script like the sketch below.
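For example, assuming pymongo; logs and backuplogs are the collections named in the article, while the database name, connection string, and observation window are placeholders:

    import time
    from pymongo import MongoClient

    db = MongoClient("mongodb://mongos1:27017")["logstore"]

    def deltas(window_seconds=3600):
        """How much each collection grows over the same observation window."""
        logs, backup = db["logs"], db["backuplogs"]
        l0, b0 = logs.estimated_document_count(), backup.estimated_document_count()
        time.sleep(window_seconds)
        l1, b1 = logs.estimated_document_count(), backup.estimated_document_count()
        return l1 - l0, b1 - b0

    logs_delta, backup_delta = deltas()
    # If the two increments match over the window, every write call point has been
    # switched and the double write can be removed.
    print(logs_delta, backup_delta, logs_delta == backup_delta)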
VI. MongoDB Usage and Fault Drills
Fault drills test whether a service is truly highly available, uncover the weak links of the system in time, and let us prepare contingency plans in advance to shorten recovery time. To verify that MongoDB is genuinely highly available, we built a MongoDB cluster online:
We also wrote scripts to simulate users inserting and reading MongoDB data, and used Gooddoctor Online's in-house fault drill platform to inject faults into the machines and observe their impact on users. The drill covered CPU, memory, disk, network, and process-kill operations, as detailed in the figure below:
Experimental results:
Saturating the CPU, filling the disk, and high disk load have little impact on the MongoDB cluster;
Memory overload may trigger a system OOM, causing the MongoDB process to be killed by the operating system; because MongoDB keeps data replicas and fails over between primary and secondary automatically, the impact on users is small;
Network jitter, latency, and packet loss make mongos connections to the servers take a long time and can block clients; this can be caught with network monitoring;
Actively killing MongoDB's primary, secondary, arbiter, mongos, and config nodes one at a time has little impact on the cluster as a whole.
Overall, MongoDB has replicas and automatic primary-secondary failover, and the client has an automatic reconnection mechanism, so a single-machine failure has little impact on overall cluster availability. Adding per-machine resource monitoring with threshold alarms further reduces fault detection and recovery time.
1. Using MongoDB
MongoDB data may be distributed unevenly across shards; in that case chunk balancing can be enabled. Since the balancer adds load to the system, it is best to let it run when traffic is low.
Choosing the shard key and indexes carefully will make queries faster; this needs to be analyzed per scenario. A sketch of both settings is shown below.
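A sketch of those two settings through pymongo, assuming a sharded cluster reached via mongos; the connection string, namespace, shard key choice, and time window are placeholders, not our production values:

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos1:27017")

    # Shard the log collection on a hypothetical hashed key so inserts spread evenly.
    client.admin.command("enableSharding", "logstore")
    client.admin.command("shardCollection", "logstore.logs", key={"key": "hashed"})

    # Restrict the balancer to an off-peak window so chunk migrations do not add load
    # during business hours (the activeWindow document in config.settings).
    client.config.settings.update_one(
        {"_id": "balancer"},
        {"$set": {"activeWindow": {"start": "01:00", "stop": "05:00"}}},
        upsert=True,
    )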
2. Migrating data
Always keep a field that uniquely identifies each record, ideally a primary key id, to make data verification easier;
Plan for multiple processes and automate the scripts, to shorten the migration time and minimize manual intervention;
During migration, watch database, middleware, and application metrics at all times, so that exporting and importing data does not affect normal business;
Rehearse fully in an identically configured environment and prepare data comparison test cases in advance to guard against data loss;
Every online operation (such as switching reads or writes) should have a corresponding rollback plan to minimize impact on the business.
That wraps up the seven-step process for moving hundreds of millions of rows of MySQL data into MongoDB; I hope the content above is helpful.