Updated 2025-01-18. Source: SLTechnology News&Howtos (shulou)
Shulou(Shulou.com)06/01 Report--
Since OceanBase, the database developed in-house by Ant Financial Services Group, took first place in the TPC-C benchmark, it has attracted a great deal of attention inside and outside the industry. We sincerely thank you for your support of OceanBase, and we welcome opinions and suggestions from outside. To help readers better understand the technical details of the test, we invited core OceanBase R&D staff to give a technical interpretation of it. This is the first article in the series; follow-up articles will be published soon.
OceanBase was founded in 2010. Over the past nine years, the R&D team has steadily improved OceanBase and added new features one step at a time. OceanBase first served Alipay and then gradually opened up to serve customers in various industries. Along the way, we have wanted the outside world to have a more concrete picture of what OceanBase can do, so that customers have more confidence in our product. The TPC-C benchmark gave us an excellent stage for that.
Through this test, we also found some shortcomings in OceanBase. A traditional single-machine database can only increase its processing power by adding CPU, memory, and so on to that one machine. With its distributed architecture, OceanBase lets a large number of commodity machines process data as if they were a single computer, so improving performance only requires adding machines. However, OceanBase's performance on each individual machine still has a lot of room for improvement. In addition, OceanBase still lags behind the industry benchmark in functionality, ease of use, and database ecosystem.
Next, OceanBase will focus on two key directions. The first is compatibility with the features provided by Oracle Database, so that customers can switch between databases more easily. The second is stronger OLAP capability, that is, data analysis and mining: using the same engine to support both OLAP and OLTP and improving OceanBase's ability to process big data.
In the future, we will open source the TPC-C testing tool used in this test, and we hope to exchange ideas with our peers in the industry about the development and future of database technology.
Text
There are many articles about the TPC-C benchmark online, and some database vendors claim to have run TPC-C and obtained millions of tpmC on a single machine, tens of millions of tpmC on a distributed system, and so on. What is the real situation?
As many people know, the Transaction Processing Performance Council (TPC) is a non-profit organization founded by dozens of member companies. TPC-C is a benchmark developed by the TPC that models order creation and order payment for the sale of goods, and it is the authoritative benchmark for online transaction processing (OLTP) database systems. TPC-C has five transaction types, each with a prescribed share of the mix: order payment accounts for at least 43%; order query, order delivery, and inventory query account for at least 4% each; and the remainder is order creation (no more than 45%). The tpmC value is the number of order creation transactions executed per minute.
A TPC-C benchmark result must be audited by the TPC (more precisely, by an auditor approved by the TPC). Once a result has passed the audit, the complete and detailed test report (including the vendor tested, database version, detailed software and hardware configuration, testing process, and so on) is published on the TPC website (www.tpc.org). Claiming to have passed the TPC-C test and obtained XXX tpmC without passing a TPC audit is not only an infringement but also illegal. Apart from OceanBase, no domestic database, whether developed entirely in-house or modified from open source, has a TPC-C benchmark report on the TPC website.
Why must TPC-C benchmark results be audited by the TPC? The answer goes back to how the TPC came into being. Online transaction processing (OLTP) database systems emerged in the 1980s and greatly promoted the development of online transaction systems such as the automated teller machine (ATM). Every database vendor tried to prove to customers that its system had the best performance and the greatest processing power. But there was no unified standard for performance testing, and nobody supervised how performance tests were run or how results were published. As a result, customers could not compare different systems, and the performance figures published by database vendors were not convincing.
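The mix rule and the tpmC definition in the paragraph above can be sketched in a few lines of Python. This is only an illustration: the function names and the transaction keys (`new_order`, `payment`, `order_status`, `delivery`, `stock_level`) are hypothetical, and the thresholds are the ones quoted in the text rather than taken from an audited test kit.

```python
# Minimum shares (percent) as quoted in the text; key names are illustrative.
MIN_SHARE = {
    "payment": 43.0,       # order payment
    "order_status": 4.0,   # order query
    "delivery": 4.0,       # order delivery
    "stock_level": 4.0,    # inventory query
}
MAX_NEW_ORDER_SHARE = 45.0  # order creation takes the remainder of the mix


def mix_violations(counts):
    """Return a list of readable violations for a run's transaction counts."""
    total = sum(counts.values())
    if total == 0:
        return ["no transactions recorded"]
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    problems = []
    for name, minimum in MIN_SHARE.items():
        share = shares.get(name, 0.0)
        if share < minimum:
            problems.append(f"{name} share {share:.2f}% is below {minimum}%")
    new_order = shares.get("new_order", 0.0)
    if new_order > MAX_NEW_ORDER_SHARE:
        problems.append(f"new_order share {new_order:.2f}% is above {MAX_NEW_ORDER_SHARE}%")
    return problems


def tpmc(new_order_count, minutes):
    """tpmC counts only order creation (new order) transactions per minute."""
    return new_order_count / minutes
```

For example, a run with counts in the ratio 45/43/4/4/4 yields no violations, while a run that shifts work from payment to order creation is flagged on both counts.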
At the beginning of 1985, Jim Gray, together with 24 colleagues from academia and industry, published an article called "A Measure of Transaction Processing Power" and proposed a test method of online transaction processing capability, DebitCredit. DebitCredit defines some key characteristics of the database performance benchmark (http://www.tpc.org/information/about/history.asp):
It defines the functional requirements of the system under test rather than the software and hardware themselves.
It defines a scaling rule for the system under test: the reported performance must match the amount of data.
It requires transactions on the system under test to complete within a specified time (for example, 95% of transactions within 1 s).
It includes the overall cost of the system under test in the benchmark.
DebitCredit established a unified, scientific standard for measuring the performance of online transaction processing database systems, and later benchmarks were largely built on it. However, some vendors dropped key requirements of the DebitCredit standard when testing in order to obtain better numbers (a practice that some domestic database vendors now also use in TPC-C testing). As a result, there was still no truly unified standard for measuring OLTP database performance. It was as if Jim Gray and his colleagues had written a law (DebitCredit) for OLTP database benchmarks, but there was no enforcement body to make sure the law was followed. In 1988, Omri Serlin (http://www.tpc.org/information/who/serlin.asp), the founder of the TPC, persuaded eight companies to set up the non-profit TPC to define and publish benchmark standards in a unified way and to supervise and audit database benchmark tests.
After more than 30 years of development, the TPC has more than 20 members and has created and refined a number of database performance benchmarks that are accepted around the world. For example, the first version of TPC-C was released in 1992 and has since been revised several times to keep up with changing requirements and technologies. To prevent vendors from tampering with the TPC-C standard to obtain higher numbers, the TPC requires every TPC test result to be audited by a TPC-approved auditor, who reviews the test process and results in detail. After the audit is passed, the audit result and the complete test report are submitted to the TPC's Technical Advisory Board (TAB). After the TAB review, there is a 60-day public review period. If anyone raises an objection during that period, the vendor must prove that its test complies with the relevant TPC standard (and rerun the benchmark if necessary).
TPC-C is a good abstraction of real business systems such as the sale of and payment for goods. While preparing for the TPC-C test, we found many places where OceanBase performed poorly. After optimizing them, we found that OceanBase had already met its performance optimization goal for this year's (2019) Double 11. The reason is that the two most frequent of the five TPC-C transactions, order creation (new order, 45%) and order payment (payment, 43%), correspond directly to order creation and order payment in the production system. So although the TPC-C model looks simple, it abstracts real online transaction processing very well.
As a widely accepted standard, TPC-C is very rigorous and leaves very little room for cheating:
First, as a benchmark for OLTP systems, TPC-C requires the database under test to satisfy the ACID properties of transactions: atomicity, consistency, isolation, and durability. Isolation must be at the serializable level, and durability must withstand any single point of failure. Obviously, these are basic requirements for an OLTP database. In a distributed environment, 10% of order creation (new order) transactions and 15% of order payment (payment) transactions, the two main TPC-C transactions, are distributed transactions (spanning up to 15 nodes). Transactional ACID is a great challenge for distributed databases, especially at the serializable isolation level, and this is one of the main reasons why so few distributed databases have passed the TPC-C test. Some domestic vendors blur the concept of a distributed database: they stack several single-machine databases together and call the result a distributed database. In fact, although each single-machine database satisfies ACID on its own, the stack as a whole does not.
Second, TPC-C stipulates that the performance (tpmC) of the database under test must be proportional to the amount of data, which is also true of real business scenarios. The basic unit of data in TPC-C is the warehouse, and each warehouse usually holds about 70 MB of data (depending on the implementation). TPC-C requires each end user to choose among the five transaction types according to the prescribed proportions. Each end user also has a keying time (fixed for each transaction type) and a think time that is random within a certain range (computed with a logarithmic function). Under these requirements, the upper limit on tpmC per warehouse is 12.86 (assuming the database's response time is 0). Suppose a system achieves 1.5 million tpmC. That corresponds to about 120,000 warehouses, and at 70 MB per warehouse the amount of data is about 8.4 TB. TPC-C also requires the system to have enough storage for 60 days of operation at 8 hours of load per day, so the system may need 30 TB of storage or more. Some vendors, however, use only a few hundred or a few thousand warehouses for all of their data, ignore the per-warehouse tpmC limit, and then claim millions of tpmC. This does not reflect most real business scenarios, and it clearly violates the TPC-C specification, just as some companies did before the TPC was established.
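The arithmetic in this paragraph can be reproduced with a short Python sketch. The article gives only the results (12.86 tpmC per warehouse, about 70 MB per warehouse), so the per-transaction keying times, mean think times, and the figure of 10 terminals per warehouse below are assumptions recalled from the TPC-C specification, not values stated here; they should be checked against the specification before being relied on. All function names are illustrative.

```python
import math

# ASSUMED parameters (not given in the article):
# share of the mix, keying time in seconds, mean think time in seconds.
PROFILE = {
    "new_order":    (0.45, 18.0, 12.0),
    "payment":      (0.43, 3.0, 12.0),
    "order_status": (0.04, 2.0, 10.0),
    "delivery":     (0.04, 2.0, 5.0),
    "stock_level":  (0.04, 2.0, 5.0),
}
TERMINALS_PER_WAREHOUSE = 10  # ASSUMED, not stated in the article


def max_tpmc_per_warehouse():
    """Upper bound on tpmC per warehouse when database response time is 0."""
    # Average seconds one terminal spends per transaction (keying + thinking).
    cycle = sum(share * (keying + think) for share, keying, think in PROFILE.values())
    transactions_per_minute = 60.0 / cycle
    new_order_share = PROFILE["new_order"][0]
    return TERMINALS_PER_WAREHOUSE * transactions_per_minute * new_order_share


def min_warehouses(target_tpmc, cap=12.86):
    """Smallest warehouse count that can legally produce target_tpmc."""
    return math.ceil(target_tpmc / cap)


def data_size_tb(warehouses, mb_per_warehouse=70.0):
    """Initial data volume in decimal terabytes."""
    return warehouses * mb_per_warehouse / 1_000_000
```

Under these assumptions `max_tpmc_per_warehouse()` comes out at roughly 12.86, matching the limit quoted in the text. `min_warehouses(1_500_000)` gives a little over 116,600 warehouses, which the text rounds to about 120,000, and `data_size_tb(120_000)` gives 8.4 TB.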
Third, TPC-C requires that the database under test run for a long time with stable performance. Excluding the warm-up (ramp up) and wind-down (ramp down) phases, the database must run in a steady state for at least 8 hours, and the cumulative performance variation within the measurement interval (at least 2 hours) must not exceed 2%. As we all know, any computer system under extreme load shows large performance fluctuations and may even be overwhelmed and crash. To avoid that, real production environments never run a system at its limit, and the TPC-C rule is based on this production reality. The requirement that the database run for a long time likewise reflects what real production systems need. Some database vendors let the database hit a performance peak for a very short time and then claim that peak as the performance of their database, without showing that it can run stably over a long period, let alone with variation below 2%. In this benchmark test, OceanBase's performance variation over 8 hours was less than 0.5%.
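As a rough illustration of the stability requirement, the sketch below checks a series of per-interval throughput samples against a variation limit. The precise definition of cumulative variation is set by the TPC-C specification and is not reproduced in this article, so the metric used here, (max - min) / mean of the samples, is only an assumed stand-in, and the function names are hypothetical.

```python
def relative_spread_percent(samples):
    """(max - min) / mean of per-interval throughput samples, in percent.

    ASSUMPTION: a simple stand-in metric, not the specification's definition.
    """
    if not samples:
        raise ValueError("need at least one throughput sample")
    mean = sum(samples) / len(samples)
    return 100.0 * (max(samples) - min(samples)) / mean


def is_steady(samples, limit_percent=2.0):
    """True if the spread of the samples stays within limit_percent."""
    return relative_spread_percent(samples) <= limit_percent
```

With this metric, a run that briefly spikes to a peak and then drops fails the check even if its best sample looks impressive, which is the situation the paragraph above describes.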
Fourth, TPC-C requires that the results of write transactions be persisted to disk within a certain period of time (this refers to the database data, not the logs; the redo log is in fact persisted before the transaction commits). For databases with a checkpoint function, the interval between checkpoints must not exceed 30 minutes, and the time taken to persist the checkpoint data must not exceed the checkpoint interval. As we understand it, the purpose is to guarantee a short recovery time after abnormal events such as a power outage. A traditional database organizes its data in blocks (for example, 4 KB or 8 KB pages/blocks), so meeting this requirement means flushing dirty pages. OceanBase is different, for two reasons. First, OceanBase is deployed with multiple replicas across machines (3 replicas in this test), so when a single machine fails it can recover immediately (RTO=30s) with no data loss (RPO=0), and recovery does not depend on write-transaction data having been flushed to disk. Second, OceanBase uses a "baseline data on disk + incremental modifications in memory" structure. By design, the incremental data is persisted once a day (the daily merge, whose frequency can increase automatically as business volume grows), so a real production system neither needs nor relies on data being flushed within a short window such as 30 minutes. In the TPC-C benchmark test, OceanBase configured checkpointing so that every checkpoint interval was under 30 minutes and the time to persist checkpoint data was shorter than the checkpoint interval, in order to comply with the TPC-C specification.
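The two checkpoint rules quoted above (an interval of at most 30 minutes, and a persistence time no longer than the interval) can be expressed as a small check over a checkpoint log. This is a sketch only: the `(start, end)` tuple format in minutes and the function name are hypothetical, and it says nothing about how OceanBase or any audit tool actually records checkpoints.

```python
def checkpoints_comply(checkpoints, max_interval_min=30.0):
    """Check (start, end) checkpoint times, in minutes since test start.

    Rule 1: consecutive checkpoint starts are at most max_interval_min apart.
    Rule 2: each checkpoint finishes persisting within its own interval.
    """
    for (start, end), (next_start, _) in zip(checkpoints, checkpoints[1:]):
        interval = next_start - start
        duration = end - start
        if interval > max_interval_min or duration > interval:
            return False
    # The last checkpoint has no successor, so only bound its duration.
    if checkpoints:
        last_start, last_end = checkpoints[-1]
        if last_end - last_start > max_interval_min:
            return False
    return True
```

For instance, `[(0, 10), (25, 40), (50, 60)]` passes both rules, while `[(0, 10), (40, 50)]` fails because the starts are 40 minutes apart.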
Fifth, profile-directed optimization (PDO) can improve software performance. TPC-C allows PDO but with restrictions: for example, the PDO-optimized build must be one that customers actually use, and the database vendor must provide technical support for it. To avoid possible objections, OceanBase did not use PDO.
Finally, although the TPC-C specification is very strict, it still encourages the use of new technologies and methods. For example, instead of purchasing physical servers and storage as in previous TPC-C benchmarks, the OceanBase TPC-C test rented ECS virtual machines on the Aliyun public cloud. This made it easy to scale capacity up and down, and renting on demand greatly reduced the actual cost of testing.