2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Most readers are unfamiliar with the topic of this article, "what are the pits used by TiDB", so the editor has summarized the following content in detail, with clear steps and some reference value. I hope you get something out of reading it. Let's take a look at "what are the pits used by TiDB?"
Original database architecture and database evaluation model
Let's take a look at our current database architecture:
According to the business, it is mainly divided into four layers:
The first is the business layer, meaning a specific product or process module that calls the core payment link or core platform to implement its own functions. This part of the architecture mainly uses x86 servers and low-end storage plus open-source products such as MySQL and Redis; this layer minimizes cost while implementing the business.
The second layer is the core payment link layer, the core of the business. If it has a problem, transactions fail across the whole platform, so this layer maximizes data security, availability, and consistency. Its architecture mainly uses minicomputers and high-end storage, running Oracle.
The third layer is the core platform layer: the platforms that assist the core payment link in completing the transaction process, or that provide capabilities of their own. This layer reduces the cost of high-end storage while ensuring performance and stability.
Finally, there is the offline archive layer. Given this architecture, our chosen route for landing TiDB runs from the business layer to the core platform layer and then to the core payment link layer. Both the business layer and the core platform layer have already adopted TiDB. Although we hit some minor problems along the way, on the whole there have been no issues at key points such as data consistency, and other problems around performance interpretation and parameter tuning have also been properly resolved. Moreover, during the rollout we established a set of application specifications for distributed databases, including development best practices and parallel-run schemes for the old and new architectures, to reduce risk and ensure smooth operation.
The figure above is the database evaluation model we established to make database selection recommendations quickly. For newly adopted projects we no longer consider Oracle as a relational database; instead we choose among our internal RDS, Sharding-Sphere, and TiDB. After extensive testing, including TPC-C, Sysbench, and business testing, plus thorough communication with the vendors, we finalized China Telecom Wing Pay's scenario-based database selection and evaluation model.
As shown in the following table, we mainly use three technology stacks, screened by capacity thresholds, performance thresholds, number of large tables, sharding rules, HTAP needs, and topology.
First, in terms of capacity: if the capacity is under 3 TB, QPS is below 20,000, and there are fewer than 10 large tables, RDS is used.
If the capacity is between 3 TB and 10 TB, QPS exceeds 20,000, there are few large tables, there are clear table-sharding rules, and there are no statistical query scenarios, Sharding-Sphere is selected.
If the capacity is greater than 3 TB, QPS is at least 20,000, there are relatively many large tables and the sharding rules are hard to define, or the workload is mixed, TiDB is chosen.
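As a rough illustration, the thresholds above can be captured in a small selection function. This is only a sketch: the function name, arguments, and the order in which the rules are tried are our own illustration, not part of the model itself.

```python
def choose_database(capacity_tb, qps, large_tables,
                    has_shard_key=False, has_statistical_queries=False):
    """Toy version of the selection model: thresholds (3 TB / 10 TB capacity,
    20,000 QPS, 10 large tables) come from the text; everything else is
    illustrative."""
    # Small scale: internal RDS
    if capacity_tb < 3 and qps < 20_000 and large_tables < 10:
        return "RDS"
    # Mid scale with clear shard keys and no statistical queries: Sharding-Sphere
    if (3 <= capacity_tb <= 10 and qps > 20_000 and large_tables < 10
            and has_shard_key and not has_statistical_queries):
        return "Sharding-Sphere"
    # Large scale, many large tables, hard-to-define sharding, or mixed workloads
    if capacity_tb > 3 and qps >= 20_000:
        return "TiDB"
    return "review manually"
```

In practice such a function would only be a first-pass filter; the text makes clear the final decision also weighs testing results and vendor communication.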
With this model we can make a quick selection for future cases.
Database evolution path
In business selection and adoption we started from the edge: we first verified the product's basic functionality and performance on the historical archive database, then cut in from peripheral businesses such as unified messaging and marketing, then transitioned to important businesses such as credit reporting, billing, accounts, and settlement reconciliation, which is where we are now. Next, we plan to go to the core and select pilot businesses in the CIF, payment, transaction, and accounting systems.
The principles of business application and switching are as follows:
Use business-level double write: adapt the application's database adaptation layer, run both architectures in parallel, and switch traffic step by step.
No data may be lost and no data may be wrong.
After double-write verification, sensitive businesses must switch quickly to the TiDB architecture.
It must be possible to switch back to the original architecture at any time during migration.
Tables split across sharded databases should be merged.
The following describes our practice and optimization with TiDB in detail. TiDB's application scenarios at Wing Pay are mainly OLTP plus some mixed scenarios, most of them on TB-scale databases.
Application of account reconciliation platform system
The reconciliation platform (reconciling the payment system against the channels) covers two dimensions. The first is information-flow reconciliation, i.e., business/transaction reconciliation, which mainly matches the payment information of acquiring transactions against the information-flow statements provided by the bank. Information-flow reconciliation can find orders dropped between the payment system and the banking system, and transactions where the two systems disagree on the amount or status of the same payment. The second is capital-flow reconciliation, i.e., funds reconciliation, which mainly matches the payment information of acquiring transactions against the capital-flow information provided by the bank. Capital-flow reconciliation can find differences between the actual changes in the payment system's bank accounts and the changes that should have occurred. The system involves multiple tables, single tables exceed 1 billion rows, and the overall data size is 8 TB. The business logic is relatively complex and concurrency is moderate. According to the architecture selection diagram and evaluation model, it is well suited to TiDB.
Overview of the reconciliation platform application
This is its data flow diagram. First, the core payment system generates transaction records and transmits them as files to the file-parsing service, which saves the parsed results to the distributed database. The reconciliation system completes the reconciliation process on top of the distributed database and provides query pages and query services to the web side.
The following two monitoring charts show daily TPS and response time after the reconciliation platform went live. Daily TPS currently stays below 7,000, and the corresponding response time meets our needs.
Application value of account reconciliation platform system
We compared the three most commonly used reconciliation channels on the billing platform:
UnionPay/Alipay channel: the overall run used to take two minutes on MySQL and now takes 40 seconds on TiDB, a 3x speedup.
UnionPay card-free quick payment: it used to take 3 to 5 minutes on MySQL and now takes 1 to 2 minutes on TiDB, a 2-3x speedup.
WeChat Pay: it used to take 3 minutes on MySQL and now takes about 1 minute on TiDB, also roughly a 3x speedup.
The system greatly improves the finance department's operating efficiency and greatly reduces the complexity of the technical team's work. Since going live, TiDB's performance has been quite satisfactory.
Application of the personal billing system
The personal billing system gives individual users management, query, classification, and statistics over the billing data of all transactions in the WingPay app, so users can better track everything they have done through WingPay.
The data mainly comes from a Kafka queue fed by the transaction system.
Personal billing data originally lived in MySQL, sharded with MyCat, but that still could not keep up with data growth and limited storage space, so only one year of data could be retained. Meanwhile, the main personal billing table holds about 8 billion rows; adding columns or indexes, whether with pt-online-schema-change or gh-ost, either took too long copying a temporary table, filled up the disk, or simply ran too long.
According to the evaluation model this is also a typical TiDB scenario, and switching and migration were completed in a short time following the switching principles above.
Here is the personal billing data flow. First, the transaction system publishes transaction information to Kafka; the personal billing system consumes it and serves it to the app side. For the migration from MySQL to TiDB we chose TiDB's DM. The DM tool supports importing MySQL data into TiDB from a full backup, and supports incremental synchronization to TiDB by parsing and executing MySQL binlogs. It also covers our scenario of merging multiple MyCat-sharded databases and tables into the same TiDB table, for which DM provides good support. In DM's working diagram, the top part is full data migration and the bottom data flow is the incremental data synchronization process.
For full data migration, DM first uses the dumper unit to export table structures and data from the upstream MySQL into SQL files, then the loader unit reads these SQL files and imports them into the downstream TiDB. For incremental synchronization, the relay unit first acts as a slave, connecting to the upstream MySQL and pulling binlogs to local relay logs; the syncer unit then reads and parses the relay logs and applies the statements to the downstream TiDB. This incremental process is very similar to MySQL master-slave replication.
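DM's two phases can be modeled with a toy replay loop. This is purely illustrative: real DM parses actual binlog events, not tuples like these, and the function and event names are our own.

```python
def replicate(snapshot, binlog_events):
    """Toy model of DM's flow: load a full dump, then replay incremental
    binlog-like events in order. Each event is (op, key, value)."""
    target = dict(snapshot)                  # full data migration (dump + load)
    for op, key, value in binlog_events:     # incremental sync (relay + syncer)
        if op in ("insert", "update"):
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target
```

The important property this models is ordering: incremental events must be applied in binlog order on top of a consistent snapshot, which is exactly why DM buffers relay logs locally before the syncer applies them.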
Our migration waited for DM to synchronize all full and incremental data to TiDB and verified data consistency in multiple ways, then picked a day for a brief write pause (about 10 minutes). The business had already been modified for double writing; at this point the double-write switch was turned on so writes went to TiDB as well, and the data in TiDB and MySQL was again verified to be consistent. Once confirmed, the MySQL synchronization was disconnected and the migration was complete.
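The pre-cutover consistency check can be sketched as chunked checksums computed in primary-key order on both sides, so the two databases can be compared without shipping full rows. This is illustrative only; in practice a dedicated comparison tool would run against both live databases.

```python
import hashlib

def table_checksum(rows, chunk_size=2):
    """Checksum a table chunk by chunk in primary-key order.
    `rows` is a {primary_key: row} stand-in for a real table."""
    digests = []
    ordered = sorted(rows.items())
    for i in range(0, len(ordered), chunk_size):
        chunk = ordered[i:i + chunk_size]
        digests.append(hashlib.sha256(repr(chunk).encode()).hexdigest())
    return digests

def tables_match(mysql_rows, tidb_rows):
    """Compare per-chunk digests; a mismatch pinpoints the diverging chunk."""
    return table_checksum(mysql_rows) == table_checksum(tidb_rows)
```

Chunking matters at this scale: with billions of rows, per-chunk digests let a mismatch be narrowed down and re-checked without rescanning the whole table.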
Application value of the personal billing system
Look at the monitoring charts: the top one is from Zabbix (we monitored with Zabbix while the system was on MySQL), where QPS usually peaked at about 3-4K; the bottom one is from TiDB. Normal QPS is about the same, but during promotions QPS rises several times over, and TiDB handled it without problems, showing it can carry this system under several-fold traffic growth.
It significantly improves the user experience, increases usage and activity, reduces users lost because they could not trace old transactions, and solves the capacity, retention-period, and query-efficiency limitations of the original sharded architecture.
Application of Anti-money laundering system
With many changes in the volume and types of monitored data, the anti-money laundering business's data demands grow daily and the scope of monitoring keeps expanding. The platform currently faces the following problems:
Significant performance bottlenecks in the database batch-processing system.
The statistical analysis system does not meet the timeliness requirements of anti-money laundering supervision.
Database performance cannot be scaled out.
Regulators require that suspicious-rule and risk-rating calculations complete within the T+1 window. Currently the batch jobs take several hundred minutes each, and the overall daily workload takes about 15 hours to process. As data volume grows this no longer meets the requirement, so transformation is needed.
Due to strict regulation, the anti-money laundering system also has strong performance requirements:
Comply with the SQL:2003 standard.
Multi-table joins over result sets under 10 million rows must respond in under 5 seconds.
Batch-loading a 20 GB data file must take no more than 30 minutes.
Deleting 500,000 rows from billions must respond within 10 seconds.
Deleting 20 million rows from 300 million must also respond within 10 seconds.
Updating 1 million rows out of 300 million must respond in about 5 minutes.
We estimated that TiDB could meet these performance requirements, so we chose the TiDB scheme.
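Large deletes and updates like those in the requirements above are usually executed in bounded batches rather than one huge transaction, to keep per-transaction lock time and memory small. A minimal sketch follows; the batch size and the in-memory dict standing in for the table are illustrative.

```python
def delete_in_batches(rows, predicate, batch_size=10_000):
    """Delete matching rows in bounded batches instead of one huge
    transaction. `rows` is a {primary_key: row} stand-in for a table;
    returns the number of rows deleted."""
    deleted = 0
    while True:
        # find the next batch of matching primary keys
        batch = [pk for pk, row in rows.items() if predicate(row)][:batch_size]
        if not batch:
            return deleted
        for pk in batch:          # one bounded "transaction" per batch
            del rows[pk]
        deleted += len(batch)
```

Against a real database, each loop iteration would be one `DELETE ... LIMIT`-style transaction committed separately; splitting the work this way is what keeps a 20-million-row delete from ballooning a single transaction.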
We upgraded the architecture around TiDB, synchronizing from the original Oracle to TiDB using OGG (Oracle GoldenGate) for MySQL. Some of the data lives on the big data platform, and we use its publishing function to synchronize directly from Hive to TiDB.
We ran four performance comparisons on the anti-money laundering system: query, insert, update, and delete. From the results, overall batch performance improved more than 3x, batch time dropped to one third of the original, and the platform's overall effective processing capacity increased more than 5x, better meeting anti-money laundering needs.
Go to the core
Finally, our next goal. In the next stage we will expand the scope of application and gradually migrate core-link systems with fast business growth and large scale to TiDB. The main reason is that the external environment has changed a great deal and there may be many restrictions on databases in the future, so we must prepare in advance.
On the other hand, performance is also a consideration. This monitoring chart shows the QPS and CPU of the core system during one Wing Pay promotion. At the peak of the activity, the system runs on two high-end minicomputers with high-end storage; single-node execution capacity is about 84,000, and CPU idle time drops to as low as 10%, indicating that performance has basically reached its ceiling. With further business growth and larger data scale there may be performance bottlenecks, and minicomputers are hard to scale out. This is why we are considering a distributed database on the core link: TiDB's distributed nature allows horizontal scaling so the business can expand rapidly. This is still at the research stage, because the stability and performance requirements of the core system are also a challenge for TiDB; it is our next goal.
At present our core databases hold hundreds of millions of rows, with more than 10 TB in a single database. Exploration has produced many ideas, mainly because the allowable downtime for core businesses is very short or zero, which makes the migration very difficult.
In this regard, we may need to update the development mode, including:
Dual-mode development with Oracle and TiDB coexisting.
A grayscale or double-write switching process.
Business verification capability.
Modular design and batched switchover schedules.
There will also be higher requirements in operations and maintenance, including windowed switching procedures, rollback plans, and so on.
Stepping through the pits together
1. Periodic insert timeouts.
In the early 2.0 version, one business saw periodic, sporadic 200 ms timeouts on TiDB inserts at around 20,000 inserts per second. The cause was the rapid growth in the number of Regions combined with the performance bottleneck of the single-threaded Raftstore; as a temporary fix we lengthened the heartbeat cycle. Later we upgraded to version 3.0.8, whose multi-threaded Raftstore solved the problem.
2. Incorrect execution plans choosing the wrong index.
In the early 2.0 version, inaccurate TiDB statistics several times caused wrong execution plans that affected the business. We tried analyzing every table in the database hourly, but the problem still occurred intermittently. It was basically resolved by upgrading to 3.0.8.
3. Backup concurrency too high, causing occasional business timeouts during the backup window.
On a business running TiDB 3.0.8 with about 8 TB of storage, a backup initiated at 3:00 AM caused occasional business timeouts. Investigation showed the backup was responsible: pulling data from TiKV consumed too much of the business network's bandwidth. Reducing the backup concurrency solved it.
4. TiDB + Keepalived issues.
Our TiDB load-balancing strategy uses HAProxy + Keepalived. Keepalived 2.0.18 showed occasional packet loss in health checks, causing business timeouts; we replaced it with the older version 1.2.8. We suggest that TiDB unify the access layer and implement it itself, since every extra external component adds another point of failure.
5. Optimistic and pessimistic locks.
TiDB versions before 3.0.8 use an optimistic locking model. Applications migrated from MySQL do not, as in MySQL, take row-level locks on the affected rows when executing DML statements inside a transaction; write conflicts are only checked when the transaction actually commits. This can be handled by modifying the application, but it raises migration costs and causes trouble for developers. However, the pessimistic locking feature in TiDB 4.0 is more mature, which raises our expectations and gives a better foundation for core migration.
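Under the optimistic model, a common application-side adaptation is to retry commits on write conflict. A minimal sketch follows; the exception type, function names, and retry policy are illustrative, not TiDB APIs.

```python
class WriteConflict(Exception):
    """Stand-in for a commit-time write-conflict error under optimistic locking."""


def commit_with_retry(commit, max_retries=3):
    """Retry a transaction commit on write conflict, as applications often
    must when conflicts are only detected at commit time. `commit` is a
    callable that performs the transaction and either returns or raises
    WriteConflict."""
    for attempt in range(max_retries):
        try:
            return commit()
        except WriteConflict:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
```

This is precisely the kind of extra application logic the text calls a migration cost; a pessimistic lock model removes it by blocking at statement time instead of failing at commit time.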
Finally, to quote Mr. Lu Xun: there were no roads in the world to begin with, but when many people walk one way, a road is made. Landing a new architecture is a hard exploration, but it is worth doing for every enterprise. In this era, regardless of company size, we should learn to use the power of open source and avoid reinventing the wheel. A small company can use open-source software directly to solve its pain points; a large one can join open-source development, become a contributor, and solve its own problems along the way.
Behind everything that looks easy, and everything that looks brilliant, lies unseen effort. Building a distributed database capability is a long road, and we at Wing Pay have the confidence and motivation to do it well.
That is the content of this article on "what are the pits used by TiDB". I hope the content shared here is helpful to you. To learn more, please follow the industry information channel.