
How to choose a database

2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to select a database and is shared here for reference.

A database is a modular system made up of several parts: a transport layer that accepts requests, a query processor that determines the most efficient way to run queries, an execution engine that carries out the operations, and a storage engine.

A storage engine (or database engine) is the software component of a database that stores, retrieves, and manages data in memory and on disk, and is designed to preserve the data on each node over the long term [REED78]. The database can respond to complex queries, while the storage engine takes a more granular view of the data and provides a simple data manipulation API that lets users create, update, delete, and retrieve records. From one perspective, a database is an application built on top of a storage engine that adds a table schema, a query language, indexes, transactions, and many other useful features.
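The create, update, delete, and retrieve operations described above can be sketched as a very small interface. This is only an illustration: the class name `InMemoryEngine` and its method names are hypothetical, and a real storage engine would also persist data to disk.

```python
from typing import Dict, Optional


class InMemoryEngine:
    """Toy storage engine (hypothetical): maps byte-string keys to byte-string values."""

    def __init__(self) -> None:
        self._records: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Create a record, or update it if the key already exists.
        self._records[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        # Retrieve a record; None means the key is absent.
        return self._records.get(key)

    def delete(self, key: bytes) -> bool:
        # Remove a record and report whether it was present.
        return self._records.pop(key, None) is not None


engine = InMemoryEngine()
engine.put(b"user:1", b"alice")
engine.put(b"user:1", b"bob")      # update
print(engine.get(b"user:1"))       # b'bob'
print(engine.delete(b"user:1"))    # True
print(engine.get(b"user:1"))       # None
```

Everything a full database offers (schemas, query language, transactions) would be layered on top of calls like these.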

For flexibility, keys and values can be arbitrary byte sequences without a preset format. Their ordering and representation semantics are defined in higher-level subsystems. For example, you can use int32 (a 32-bit integer) as a key in one table and ascii (an ASCII string) in another; from the storage engine's point of view, both keys are just serialized entries.
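As a minimal sketch of this idea, the snippet below serializes an int32 key and an ASCII key into plain bytes using Python's standard `struct` module. The big-endian byte order and the sample key values are assumptions chosen for illustration.

```python
import struct

# Table A uses a 32-bit integer key; ">i" means big-endian signed int32.
int_key = struct.pack(">i", 42)

# Table B uses an ASCII string key.
str_key = "user-42".encode("ascii")

# To the storage engine both are opaque byte sequences.
print(int_key)   # b'\x00\x00\x00*'
print(str_key)   # b'user-42'

# The higher-level layer, which knows the schema, decodes them again.
decoded_int = struct.unpack(">i", int_key)[0]
decoded_str = str_key.decode("ascii")
print(decoded_int, decoded_str)   # 42 user-42
```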

Storage engines such as Berkeley DB, LevelDB (and its descendant RocksDB), LMDB (and its descendant libmdbx), Sophia, and HaloDB were developed independently of the databases they are now embedded in. Pluggable storage engines let database developers build database systems on top of existing storage engines and focus on the other subsystems.

At the same time, clear decoupling between database system components makes it possible to switch between different engines that may suit specific use cases. For example, the popular database MySQL has several storage engines, including InnoDB, MyISAM, and RocksDB (in the MyRocks distribution), while MongoDB allows switching between the WiredTiger, In-Memory, and (now deprecated) MMAPv1 storage engines.

The choice of a database system can have long-term consequences. If the database you choose turns out to be inappropriate (because of performance issues, consistency issues, or operational challenges), it is best to find this out early in the development cycle, since migrating to a different system may not be easy and in some cases may require significant changes to the application code.

Every database system has advantages and disadvantages. To reduce the risk of a costly migration, invest some time before committing to a database to build confidence that it can meet your application's needs.

Trying to compare databases based on their components (for example, which storage engine they use, or how data is shared, replicated, and distributed), their ranking (popularity values published by consulting agencies such as ThoughtWorks or by database comparison sites such as DB-Engines and Database of Databases), or their implementation language (C, Java, Go, and so on) can lead to invalid and premature conclusions. These methods are suitable only for high-level comparisons, and can be as crude as choosing between HBase and SQLite. Even a superficial understanding of how each database works and what is inside it can help you reach a more reliable conclusion.

Every comparison should start with clearly defined objectives, because even the smallest bias can invalidate the entire investigation. If you are looking for a database that fits your current (or future) workloads, the best thing you can do is simulate those workloads on different database systems, measure the performance metrics that matter to you, and compare the results. Some issues, especially performance and scalability issues, only begin to show over time or as data volume grows. To detect potential problems, it is best to run long-duration tests in an environment that is as close as possible to real production.

Simulating a real-world workload not only helps you understand how the database performs, but also helps you learn how to operate and debug it, and shows how friendly and helpful its community is. The choice of database is always a combination of these factors, and performance is usually not the most important one: a database that stores data slowly is usually much better than one that loses it quickly.

To compare databases, it helps to understand the use case in detail and to define the current and expected values of variables such as:

Table structure and record size

Number of clients

Query types and access patterns

Read/write query rate

Expected changes in any of these variables

Identifying these variables can help answer the following questions:

Does the database support the required queries?

Can the database handle the amount of data we plan to store?

How many reads and writes can a single node handle?

How many nodes should the system have?

Given the expected growth rate, how do we scale the cluster?

What is the maintenance process?
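Several of these questions come down to simple arithmetic once the variables are written down. The sketch below is a rough estimate only: the class names, the `nodes_needed` function, the default replication factor of 3, and all sample numbers are hypothetical, and it assumes every write is applied on each replica while each read is served by one replica.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    record_size_bytes: int   # average serialized record size
    record_count: int        # records expected in the data set
    reads_per_sec: int       # peak read rate
    writes_per_sec: int      # peak write rate


@dataclass
class NodeCapacity:
    storage_bytes: int       # usable disk per node
    reads_per_sec: int       # measured per-node read throughput
    writes_per_sec: int      # measured per-node write throughput


def nodes_needed(w: Workload, n: NodeCapacity, replication_factor: int = 3) -> int:
    """Return the node count implied by the tightest of the three limits."""
    total_bytes = w.record_size_bytes * w.record_count * replication_factor
    by_storage = math.ceil(total_bytes / n.storage_bytes)
    # Assumption: a write lands on every replica, a read on a single replica.
    by_writes = math.ceil(w.writes_per_sec * replication_factor / n.writes_per_sec)
    by_reads = math.ceil(w.reads_per_sec / n.reads_per_sec)
    # Never go below the replication factor itself.
    return max(by_storage, by_writes, by_reads, replication_factor)


workload = Workload(record_size_bytes=1024, record_count=2_000_000_000,
                    reads_per_sec=50_000, writes_per_sec=10_000)
node = NodeCapacity(storage_bytes=10**12, reads_per_sec=20_000, writes_per_sec=5_000)
print(nodes_needed(workload, node))   # 7 (storage is the limiting factor here)
```

The per-node figures should come from your own measurements rather than from vendor numbers, and rerunning the estimate with projected growth shows when the cluster would need to scale.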

After answering these questions, you can build a test cluster and simulate your workload. Most databases already ship with stress-testing tools that can be used to reproduce specific use cases. If the database ecosystem has no standard stress tool for generating realistic randomized workloads, this could be a red flag. If something prevents you from using the tools that come with the database, try one of the existing general-purpose tools or implement one from scratch.
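If you do end up writing a tool from scratch, its core can be quite small. The following is a minimal sketch, not a replacement for a real stress tool: a plain Python dict stands in for the database client, and the function names, the 100-byte value size, and the parameters are all assumptions made for illustration.

```python
import math
import random
import time
from typing import Dict, List


def run_workload(store: Dict[bytes, bytes], operations: int,
                 read_ratio: float, key_space: int, seed: int = 0) -> List[float]:
    """Issue a random read/write mix and return per-operation latencies in seconds."""
    rng = random.Random(seed)              # fixed seed keeps runs repeatable
    latencies: List[float] = []
    for _ in range(operations):
        key = str(rng.randrange(key_space)).encode("ascii")
        start = time.perf_counter()
        if rng.random() < read_ratio:
            store.get(key)                 # read path
        else:
            store[key] = b"x" * 100        # write path, 100-byte value
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# 90% reads, 10% writes over 1,000 distinct keys.
lat = run_workload({}, operations=10_000, read_ratio=0.9, key_space=1_000)
print("p50 (s):", percentile(lat, 50))
print("p99 (s):", percentile(lat, 99))
```

To test a real system, replace the dict operations with calls to the database client, and report tail percentiles rather than averages, since averages hide slow outliers.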

If the test results are satisfactory, it may be helpful to get more familiar with the database code. To read the source code, first understand the main parts of the database and where to find the code for each component, and then browse through the components. Even a cursory understanding of the code base helps you better understand the log records the database produces and its configuration parameters, and helps you spot problems in applications that use the database, and even in the database code itself.

Some people think it is fine to treat databases as black boxes without knowing what is inside. But practice shows that sooner or later you will run into bugs, service outages, performance regressions, or other problems. It is better to be prepared for them: if you understand the internal structure of your database, you can reduce business risk and are more likely to recover quickly.

A popular tool for benchmarking, performance evaluation, and comparison is the Yahoo! Cloud Serving Benchmark (YCSB). YCSB provides a framework and a common set of workloads that can be applied to different data stores. As with anything generic, you should use this tool with care, because it is easy to draw the wrong conclusions from it. To make a fair comparison and an informed decision, you need to invest enough time to understand the real-world conditions under which the database will operate and adjust the benchmark accordingly.

The Transaction Processing Performance Council (TPC) provides a set of benchmarks that database vendors use to compare and advertise the performance of their products. TPC-C is an online transaction processing (OLTP) benchmark: a mix of read-only and update transactions that simulates common application workloads.

The benchmark focuses on the performance and correctness of concurrently executed transactions. The primary performance metric is throughput: the number of transactions a database system can process per minute. Executed transactions are required to preserve ACID properties and to conform to the set of properties defined by the benchmark itself.

This benchmark does not focus on any particular business segment, but provides an abstract set of operations that are important for most applications that use OLTP databases. It includes several tables and entities, such as warehouses, inventory, customers, and orders, and specifies table layouts, details of the transactions that can be performed on the tables, the minimum number of rows per table, and data durability constraints.

This does not mean that benchmarks can be used only to compare databases. Benchmarks can also be used to define and test the details of service-level agreements (SLAs), understand system requirements, and plan capacity. The more you know about the database before you use it, the more time you save when running it in production.

Choosing a database is a long-term decision, and it is best to track new releases, understand what has changed and why, and develop an upgrade strategy. New releases often contain improvements and fixes for bugs and security issues, but may also introduce new bugs, performance regressions, or unexpected behavior, so it is critical to test new releases before deploying them. Looking at how the database developers have handled upgrades in the past may give you a good idea of what to expect in the future. A smooth upgrade in the past does not guarantee that future upgrades will be equally smooth, but a complicated upgrade in the past may be a sign that future upgrades will not be easy either.

As users, we can see how databases behave under different conditions, but when building databases, we have to make choices that directly affect this behavior.

Designing a storage engine is certainly much more complex than implementing a textbook data structure: there are many details and edge cases that are hard to get right from the start. We need to design the physical data layout and organize pointers, decide on the serialization format, understand how data will be garbage collected and how the storage engine fits into the semantics of the database system as a whole, figure out how to make it work in a concurrent environment, and finally ensure that no data is lost under any circumstances.

Not only are there many things to decide, but most of these decisions involve trade-offs. For example, if we save records in the order they were inserted into the database, we can store them faster; but if we need to retrieve them in lexicographical order, we have to re-sort them before returning the results to the client. As you will see in this book, there are many different approaches to storage engine design, and each implementation has its own advantages and disadvantages.
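The insertion-order trade-off can be illustrated with two toy in-memory stores. This is a simplified sketch with hypothetical class names; it assumes each key is written only once and ignores persistence entirely.

```python
import bisect
from typing import List, Tuple

Record = Tuple[bytes, bytes]


class InsertionOrderStore:
    """Appends records as they arrive; sorts only when a scan is requested."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def put(self, key: bytes, value: bytes) -> None:
        self._records.append((key, value))           # cheap write: amortized O(1)

    def scan(self) -> List[Record]:
        return sorted(self._records)                 # costly read: O(n log n) each time


class SortedStore:
    """Keeps records ordered by key; pays for the ordering at write time."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def put(self, key: bytes, value: bytes) -> None:
        bisect.insort(self._records, (key, value))   # costlier write: O(n) element shifts

    def scan(self) -> List[Record]:
        return list(self._records)                   # cheap read: already in key order


fast_writes, fast_reads = InsertionOrderStore(), SortedStore()
for k, v in [(b"c", b"3"), (b"a", b"1"), (b"b", b"2")]:
    fast_writes.put(k, v)
    fast_reads.put(k, v)

print(fast_writes.scan())   # [(b'a', b'1'), (b'b', b'2'), (b'c', b'3')]
print(fast_reads.scan())    # [(b'a', b'1'), (b'b', b'2'), (b'c', b'3')]
```

Both stores return the same result; they differ only in whether the sorting cost is paid on the write path or on the read path.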

As we explore the different storage engines, we will discuss their advantages and disadvantages. If there were an absolutely optimal storage engine for every conceivable use case, everyone would use it. But there is no such storage engine, so we need to make smart choices based on the workload and use case of the service.

There are many storage engines available that use a variety of data structures and are implemented in different languages, from low-level languages such as C to high-level languages such as Java. All storage engines face the same challenges and limitations. An analogy can be made to urban planning: we can build a city for a given population and choose whether to build up or build out. Both approaches fit the same number of people into the city, but they lead to very different lifestyles. When a city is built up, people live in apartments, and the population density is likely to cause more traffic in a smaller area; in a more spread-out city, people are more likely to live in houses, but have to commute farther.

Similarly, developers of storage engines make design decisions that make them better suited for different situations: some optimize for low read and write latency, some try to maximize storage density (the amount of data stored per node), and some focus on operational simplicity.

