
OLTP Through the Looking Glass, and What We Found There (1)


Authors: Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, Michael Stonebraker

Online transaction processing (OLTP) databases include a suite of features optimized for the computer technology of the 1970s: disk-resident B-trees and heap files, lock-based concurrency control, support for multithreading, and so on. Advances in modern processors, memory, and networks mean that today's computers are vastly different from those of 30 years ago: many OLTP databases now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has barely changed.

Based on this observation, we look at some interesting variants of conventional database systems that one could build to exploit recent hardware trends, and speculate on their performance by running a subset of the TPC-C benchmark on a transactional database system (Shore) and performing a detailed instruction-level breakdown of the major components involved. Rather than simply profiling Shore, we modify it step by step, so that after each feature removal or optimization we have a (faster) working system that fully runs our workload. Overall, we identify overheads and optimizations that account for roughly a 20x difference in raw performance. We also show that there is no single "bottleneck" in a modern (memory-resident) database system: substantial time goes to logging, latching, locking, B-tree, and buffer management operations.

Categories and Subject Descriptors

H.2.4 [Database Management]: Systems - transaction processing; concurrency.

General Terms

Measurement, Performance, Experimentation.

Keywords

Online transaction processing, OLTP, main memory transaction processing, database management system architecture.

1. Introduction

Modern general-purpose online transaction processing (OLTP) database systems include a standard suite of features: a collection of on-disk data structures for table storage, including heap files and B-trees; support for multiple concurrent queries via lock-based concurrency control; log-based recovery; and an efficient buffer manager. These features were developed to support transaction processing in the 1970s and 1980s, when an OLTP database was many times larger than main memory and the computers that ran it cost hundreds of thousands to millions of dollars.

Today, the situation is quite different. First, modern processors are so fast that the computation time of many OLTP transactions is measured in microseconds. For a few thousand dollars, one can buy a system with several gigabytes of main memory. Moreover, it is not uncommon for organizations to own networked clusters of many such workstations, with aggregate memory measured in hundreds of gigabytes, enough to hold many OLTP databases entirely in RAM.

Second, the rise of the Internet, as well as the variety of data-intensive applications in use across many domains, has led to growing interest in database-like applications that do not need the full suite of standard database features. Operating systems and networking conferences are now full of proposals for "database-like" storage systems with varying forms of consistency, reliability, concurrency, replication, and query capability [DG04, CDG+06, GBH+00, SMK+01].

This growing demand for database-like services, combined with dramatic performance improvements and cost reductions in hardware, suggests that a number of interesting alternative systems can be built, with feature sets quite different from what standard OLTP engines provide.

1.1 Alternative DBMS Architectures

Clearly, when the database fits in RAM, optimizing an OLTP system for main memory is a good idea. But many other database variants are possible; for example:

Logless databases. A database system without logging might not need recovery at all, or might perform recovery from other sites in a cluster (as proposed in systems such as Harp [LGG+91], Harbor [LM06], and C-Store [SAB+05]).

Single-threaded databases. Since multithreading in OLTP databases was traditionally important for hiding latency during slow disk writes, it is far less important in a memory-resident system. A single-threaded implementation may suffice in some cases, particularly if it provides good performance. Although one still needs a way to exploit multiple processor cores on the same hardware, recent advances in virtual machine technology provide a means to make those cores appear as distinct processing nodes without significant performance overhead [BDR97], which may make such designs feasible.

Transaction-less databases. Many systems do not need transactional support. In distributed Internet applications in particular, eventual consistency is often favored over transactional consistency [Bre00, DHJ+07]. In other cases, lightweight forms of transactions may be acceptable, for example when all reads are required to be done before any writes [AMS+07, SMA+07].

Indeed, several proposals have been made within the database community to build database systems with some or all of the above characteristics [WSA97, SMA+07]. An open question, however, is how well these different configurations would perform if they were actually built. That is the central question of this paper.

1.2 Measuring the Overheads of OLTP

To understand this question, we took a modern open-source database system (Shore; see http://www.cs.wisc.edu/shore/) and benchmarked it on a subset of the TPC-C benchmark. Our initial implementation, running on a modern desktop machine, executed about 640 transactions per second (TPS). We then modified it by removing features from the engine one at a time, benchmarking after each step, until we were left with a tiny kernel of query processing code that processed 12,700 TPS. This kernel is a single-threaded, lock-free, main-memory database system without recovery. During this decomposition we identified four major components whose removal substantially improved the throughput of the system (see the sketch after this list):

Logging. Assembling log records and tracking down all changes in database structures slows performance. Logging may be unnecessary if recoverability is not required, or if it is provided by other means (for example, other sites on the network).

Locking. Traditional two-phase locking imposes considerable overhead, because all accesses to database structures are governed by a separate entity, the lock manager.

Latching. In a multithreaded database, many data structures have to be latched before they can be accessed. Removing this feature and moving to a single-threaded approach has a noticeable performance impact.

Buffer management. A main-memory database system does not need to access pages through a buffer pool, eliminating a level of indirection on every record access.
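
To make the roles of these four components concrete, here is a minimal sketch of a record update that passes through a buffer pool, lock manager, latch, and log before touching data, with each layer switchable so it can be stripped away one at a time. This is our own toy illustration with hypothetical names, not Shore's code.

```python
# Illustrative only: a toy record update showing where logging, locking,
# latching, and buffer-management overheads sit on the access path.
# None of this is Shore code; every class and name here is a stand-in.
import threading

class ToyEngine:
    def __init__(self, use_log=True, use_locks=True,
                 use_latches=True, use_buffer_pool=True):
        self.use_log = use_log
        self.use_locks = use_locks
        self.use_latches = use_latches
        self.use_buffer_pool = use_buffer_pool
        self.log = []                        # append-only log records
        self.locks = {}                      # (page, key) -> owning transaction
        self.latch = threading.Lock()        # short-term physical protection
        self.buffer_pool = {}                # page id -> cached page copy
        self.heap = {1: {"A": 10, "B": 20}}  # "disk" pages

    def _get_page(self, page_id):
        if not self.use_buffer_pool:
            return self.heap[page_id]        # main-memory case: direct access
        # buffer-pool case: one extra level of indirection per access
        return self.buffer_pool.setdefault(page_id, dict(self.heap[page_id]))

    def update(self, txn, page_id, key, value):
        if self.use_locks:                   # two-phase locking, simplified
            self.locks[(page_id, key)] = txn
        if self.use_latches:
            self.latch.acquire()             # protect shared structures
        try:
            page = self._get_page(page_id)
            if self.use_log:                 # record (txn, old value, new value)
                self.log.append((txn, page_id, key, page.get(key), value))
            page[key] = value
        finally:
            if self.use_latches:
                self.latch.release()

# The fully stripped configuration: no log, no locks, no latches, no buffer pool.
engine = ToyEngine(use_log=False, use_locks=False,
                   use_latches=False, use_buffer_pool=False)
engine.update(txn=1, page_id=1, key="A", value=99)
print(engine.heap[1])   # {'A': 99, 'B': 20}
```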

1.3 Results

Figure 1 shows how each of these modifications affects the bottom-line performance of Shore (in terms of CPU instructions per TPC-C New Order transaction). We can see that each of these subsystems by itself accounts for between about 10% and 35% of the total runtime (the total height of the figure represents 1.73 million instructions). Here, "hand-coded optimizations" represent a collection of optimizations we made to the code, which primarily improved the performance of the B-tree package. The instructions needed to actually process the query (labeled "useful work", measured by the minimal implementation we built on top of a hand-coded main-memory B-tree package) are only about 1/60 of that total. The white box below the buffer manager represents our version of Shore after all features have been removed; it can still run transactions, but uses only about 1/15 of the instructions of the original system, or about four times the instructions of the useful work. The remaining overhead in our implementation is due to the call-stack depth in Shore and to the fact that we could not completely remove all references to transactions and the buffer manager.

Figure 1. Breakdown of instruction count for various DBMS components for the TPC-C New Order transaction. The top of the bar graph is the performance of the original Shore (memory-resident, no resource contention); the bottom dotted line is the useful work, measured by executing the transaction on a no-overhead kernel.
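
As a quick back-of-the-envelope check (arithmetic on the numbers already quoted above, not additional measurements), the reported throughputs and instruction counts are consistent with each other:

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
baseline_tps = 640        # original Shore on our desktop
stripped_tps = 12700      # single-threaded, lock-free, no-recovery kernel
print(f"throughput speedup: {stripped_tps / baseline_tps:.1f}x")  # ~19.8x, i.e. ~20x

total_instructions = 1.73e6                 # full Shore, New Order transaction
useful_work = total_instructions / 60       # "useful work" is ~1/60 of the total
stripped_kernel = total_instructions / 15   # fully stripped Shore is ~1/15
print(f"useful work: ~{useful_work:,.0f} instructions")              # ~29k
print(f"stripped kernel: ~{stripped_kernel:,.0f} instructions "
      f"(~{stripped_kernel / useful_work:.0f}x useful work)")        # ~115k, ~4x
```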

1.4 Contributions and Paper Organization

The main contributions of this paper are: 1) a careful analysis of where time is spent inside a modern database system; 2) careful measurements of the performance of various stripped-down variants of a modern database system; and 3) the use of these measurements to speculate on the performance of various data management systems one could build, such as systems without transactions or logs.

The rest of this paper is organized as follows. In Section 2 we discuss OLTP features that may soon become (or already are) obsolete. Section 3 describes the Shore DBMS, which is the starting point of our exploration, and our decomposition of it. Section 4 contains our experiments on Shore. Then, in Section 5, we use our measurements to discuss their implications for future OLTP engines and to speculate on the performance of some hypothetical data management systems. We cover related work in Section 6 and conclude in Section 7.

2. Trends in OLTP

As noted in the introduction, most popular relational database management systems (RDBMSs) trace their roots to systems developed in the 1970s and include features such as disk-based indexes and heap files, log-based transactions, and lock-based concurrency control. However, those architectural decisions are now 30 years old. Today's computing world is very different from the era in which these traditional systems were invented; the purpose of this section is to explore the consequences of those differences. We made a similar set of observations in [SMA+07].

2.1 Cluster Computing

Most of the current generation of RDBMSs was originally written for the shared-memory multiprocessors of the 1970s. Many vendors added support for shared-disk architectures in the 1980s. Over the past two decades, we have seen the widespread adoption of Gamma-style shared-nothing databases [DGS+90], as well as the rise of clusters of commodity machines for many large computing tasks. Any future database system should be designed from the ground up to run on such clusters.

2.2 in-memory database

Given the rapid growth in RAM capacity over the past several decades, there is every reason to believe that many OLTP systems already fit, or will soon fit, in main memory, especially the aggregate memory of a large cluster. This is largely because the size of most OLTP systems is not growing nearly as fast as RAM capacity: the number of customers, products, and other real-world entities they record information about does not grow according to Moore's Law. Given this, it makes sense for database vendors to build systems that optimize for the common case that the database fits in memory. In such systems, memory-optimized indexes [RR99, RR00], as well as tuple formats and page layouts designed for memory rather than disk (or the absence of page layouts altogether) [GS92], are important considerations.
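
As a very rough illustration of why disk-oriented page layouts become unnecessary baggage in a memory-resident system (our own sketch, not any of the cited designs), compare finding a row through a packed, fixed-size page image with keeping the tuple directly behind an in-memory index:

```python
# Sketch: disk-oriented storage serializes tuples into fixed-size pages and
# finds them via (page, slot offset); a main-memory layout can simply keep
# the tuple object behind an in-memory index with no page indirection.
import struct

PAGE_SIZE = 8192
ROW_FMT = "i16sd"                        # (int id, 16-byte name, float balance)
ROW_SIZE = struct.calcsize(ROW_FMT)

# Disk-style: pack rows into a page image and remember their byte offsets.
page = bytearray(PAGE_SIZE)
slot_dir = {}                            # key -> byte offset within the page
for slot, row in enumerate([(1, b"alice", 10.0), (2, b"bob", 20.0)]):
    offset = slot * ROW_SIZE
    struct.pack_into(ROW_FMT, page, offset, *row)
    slot_dir[row[0]] = offset

def disk_style_lookup(key):
    rid, name, bal = struct.unpack_from(ROW_FMT, page, slot_dir[key])
    return rid, name.rstrip(b"\x00"), bal

# Memory-style: the "index" points straight at the tuple.
memory_table = {1: (1, b"alice", 10.0), 2: (2, b"bob", 20.0)}

print(disk_style_lookup(2))   # unpacks bytes on every access
print(memory_table[2])        # direct reference, no (de)serialization
```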

2.3 Single Threading in OLTP Systems

All modern databases include extensive support for multithreading, including a collection of transaction concurrency control protocols and latching code sprinkled liberally throughout to support concurrent access to shared structures such as buffer pools and index pages. The traditional motivations for multithreading are to allow one transaction to make progress while another waits for data from disk, and to prevent long-running transactions from blocking the progress of short ones.

We assert that neither of these motivations remains valid. First, if the database resides in memory, there is no need to wait for data from disk. Furthermore, production transaction systems do not contain any user waits: transactions are executed almost exclusively through stored procedures. Second, OLTP workloads are very simple: a typical transaction consists of index lookups and updates, which can be completed in hundreds of microseconds on a memory-resident system. Moreover, as the modern database industry splits into a transaction processing market and a data warehouse market, long-running (analytical) queries are now served by data warehouses.
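
The following toy loop (ours, purely illustrative) shows what this buys: if every transaction is a short stored procedure over memory-resident data and there are no user waits, a single thread can simply run transactions to completion one after another, with no locks or latches anywhere on the path.

```python
# Sketch: a single-threaded executor that runs whole transactions serially.
# Because transactions are short stored procedures over in-memory data,
# nothing ever blocks, so no concurrency control is needed inside the loop.
accounts = {"A": 100, "B": 50}                 # toy in-memory "database"

def transfer(db, src, dst, amount):            # a "stored procedure"
    if db[src] < amount:
        return "abort"
    db[src] -= amount
    db[dst] += amount
    return "commit"

procedures = {"transfer": transfer}

def run_serially(queue):
    results = []
    for name, args in queue:                   # one transaction at a time
        results.append(procedures[name](accounts, *args))
    return results

queue = [("transfer", ("A", "B", 30)), ("transfer", ("B", "A", 200))]
print(run_serially(queue), accounts)   # ['commit', 'abort'] {'A': 70, 'B': 80}
```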

One concern is that multithreading is needed to exploit machines with multiple processors. We believe, however, that this can be addressed by treating one physical node with multiple processors as multiple nodes in a shared-nothing cluster, perhaps managed by a virtual machine monitor that dynamically allocates resources among these logical nodes [BDR97].

Another concern is that the network becomes the new disk, introducing latency into distributed transactions and reintroducing the waits that multithreading was designed to hide. In general this is certainly a valid concern, but for many transactional applications it is possible to partition the workload so that it is "single-sited" [Hel07, SMA+07], meaning every transaction can run entirely on a single node of the cluster.
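
A minimal sketch of what "single-sited" partitioning means in practice (our own example; the warehouse-based key follows TPC-C's natural partitioning, but the routing code is hypothetical): if every transaction touches only one warehouse's data, routing by warehouse id sends it to exactly one node, so no cross-node coordination or network stall is involved.

```python
# Sketch: route each transaction to a single node by its partitioning key
# (here a TPC-C-style warehouse id), so it never spans nodes.
NUM_NODES = 4

def node_for(warehouse_id: int) -> int:
    return warehouse_id % NUM_NODES            # simple hash partitioning

def route(transactions):
    per_node = {n: [] for n in range(NUM_NODES)}
    for txn in transactions:
        per_node[node_for(txn["warehouse_id"])].append(txn)
    return per_node

txns = [{"name": "new_order", "warehouse_id": w} for w in (1, 2, 5, 6)]
for node, batch in route(txns).items():
    print(node, [t["warehouse_id"] for t in batch])
# Node 1 gets warehouses 1 and 5, node 2 gets 2 and 6: each transaction
# runs wholly on one node, with no distributed coordination.
```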

Consequently, some classes of database applications will not need multithreading support at all; in such systems, traditional locking and latching code becomes unnecessary overhead.

2.4 High Availability and Logging

Production transaction processing systems require 24x7 availability. For this reason, most systems use some form of high availability, essentially doubling (or more) the hardware so that a spare is available in the event of a failure.

A recent paper [LM06] argues that, at least for warehouse systems, these available spares can also be used to facilitate recovery. In particular, recovery can be accomplished by copying lost state from other database replicas rather than by processing a REDO log. In earlier work we claimed that this approach also applies to transactional systems [SMA+07]. If so, the recovery code in traditional databases becomes another unnecessary overhead.
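
To illustrate the idea (a deliberately simplified sketch of copying state from a live replica; it is not the actual protocol of Harbor [LM06] or of our earlier work, and all names are ours), a recovering node can simply fetch current state from a healthy peer instead of replaying a local REDO log:

```python
# Sketch: recovery by copying state from a live replica instead of
# replaying a REDO log. All classes and names here are illustrative.
import copy

class Replica:
    def __init__(self):
        self.state = {}        # key -> value
        self.version = 0       # number of updates applied

    def apply(self, key, value):
        self.state[key] = value
        self.version += 1

    def snapshot(self):
        return copy.deepcopy(self.state), self.version

def recover_from_peer(failed: Replica, live: Replica):
    # Instead of reading a local log, ask a healthy replica for its state.
    failed.state, failed.version = live.snapshot()

live, crashed = Replica(), Replica()
for i, (k, v) in enumerate([("x", 1), ("y", 2), ("z", 3)]):
    live.apply(k, v)
    if i < 1:
        crashed.apply(k, v)    # the failed node saw only the first update

recover_from_peer(crashed, live)
print(crashed.state == live.state, crashed.version)   # True 3
```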

2.5 Transaction Variants

Although many OLTP systems clearly require transactional semantics, a number of recent proposals, particularly from the Internet community, target data management systems with weaker consistency. The usual argument is that availability matters more than transactional semantics, and that some form of eventual consistency is all that is needed [Bre00, DHJ+07]. Databases in such environments may not need the machinery developed for transactions, such as logs, locks, and two-phase commit.

Even when some form of strong consistency is required, many weaker consistency models are possible. For example, the widespread adoption of snapshot isolation (which is not serializable) suggests that many users are willing to trade transactional semantics for performance, gained in this case from the elimination of read locks.
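
A tiny multi-version sketch (ours; real snapshot-isolation implementations are far more involved) shows where the performance comes from: a reader picks the latest version committed at or before its snapshot timestamp and never takes a read lock, so it neither blocks nor is blocked by writers.

```python
# Sketch: snapshot reads over a multi-version store. Readers take no locks;
# they just pick the newest version committed at or before their snapshot.
import bisect

class MVStore:
    def __init__(self):
        self.versions = {}     # key -> list of (commit_ts, value), in ts order
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot_read(self, key, snapshot_ts):
        chain = self.versions.get(key, [])
        timestamps = [ts for ts, _ in chain]
        i = bisect.bisect_right(timestamps, snapshot_ts) - 1
        return chain[i][1] if i >= 0 else None

store = MVStore()
store.write("x", "v1")          # commit_ts = 1
ts = store.clock                # reader's snapshot taken here
store.write("x", "v2")          # commit_ts = 2, after the snapshot
print(store.snapshot_read("x", ts))           # v1: the writer never blocked the reader
print(store.snapshot_read("x", store.clock))  # v2: a later snapshot sees the new value
```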

Finally, recent research shows that restricted forms of transactions require much less machinery than standard database transactions. For example, if all transactions are "two-phase", meaning they perform all of their reads before any of their writes and are guaranteed not to abort after completing their reads, then no UNDO log is needed [AMS+07, SMA+07].
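
The following sketch (our illustration of the general idea in [AMS+07, SMA+07], not those systems' code) makes the point concrete: if a transaction does all of its reads and its abort decision before touching the database, then once it starts writing it is guaranteed to commit, so there is nothing to undo.

```python
# Sketch: a "two-phase" transaction does all reads (and the abort decision)
# in phase one, then applies all writes in phase two. Because nothing can
# abort after phase one, no UNDO information is ever needed.
def run_two_phase(db, txn):
    # Phase 1: reads only. The transaction may still abort here.
    reads = {key: db[key] for key in txn["reads"]}
    writes = txn["compute"](reads)      # decide what to write, or None to abort
    if writes is None:
        return "abort"                  # nothing was modified, nothing to undo
    # Phase 2: writes only. No abort is possible past this point.
    db.update(writes)
    return "commit"

db = {"stock": 5}

order = {
    "reads": ["stock"],
    "compute": lambda r: {"stock": r["stock"] - 3} if r["stock"] >= 3 else None,
}
print(run_two_phase(db, order), db)   # commit {'stock': 2}
print(run_two_phase(db, order), db)   # abort {'stock': 2}  (insufficient stock)
```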

2.6 Summary

As our references show, a number of research groups, including Amazon [DHJ+07], HP [AMS+07], NYU [WSA97], and MIT [SMA+07], have expressed interest in building systems that differ fundamentally from the classic OLTP design. In particular, the MIT H-Store [SMA+07] system has shown that removing all of the above features can yield a two-orders-of-magnitude speedup in transaction throughput, suggesting that some of these database variants are likely to offer excellent performance. It therefore seems prudent for traditional database vendors to consider products that explicitly disable these features. To help practitioners understand the performance implications of the different variants they might consider building, we now proceed with a detailed performance study of Shore and the variants we built.

(to be continued)
