01
Introduction
With the growth of users and the expansion of business, the data volume in the business systems of large enterprises keeps increasing, and the performance of super-large tables has become a major obstacle to delivering business functions. Among them, the flow (transaction journal) table, the most common kind of super-large table, is a performance bottleneck that enterprise users frequently run into.
This article discusses performance tuning of super-large flow tables stored in SequoiaDB. As a new-generation distributed OLTP database, SequoiaDB is widely used in scenarios with massive data storage and highly concurrent operations, where distributed databases have a natural advantage over traditional ones. By making reasonable use of SequoiaDB's features, the performance problems of super-large tables can be solved with relative ease.
02
Data storage planning is very important
For super-large flow tables, early data storage planning is particularly important. Reasonable storage planning makes effective use of the cluster's hardware resources and provides higher performance and more efficient data services.
Cluster size evaluation and hardware configuration
At the beginning of database cluster planning, it is necessary to investigate the application scale the cluster must support, the system's positioning, and the long-term business development plan, in order to evaluate the cluster size and the appropriate allocation of CPU, memory, disks, and network cards on each server.
Accurately evaluating the size of a database cluster is a large and complex undertaking that needs to be supported by business requirements data. In general, because business requirements change quickly and business growth tends to exceed expectations, small cluster planning can be evaluated at 1.5 times the figures from the business survey, and large cluster planning at a ratio of roughly 1:1.5.
Cluster size is evaluated from three types of information: business scale, data storage scale, and the 3-year data scale of the largest flow table. Business scale evaluation requires researching business access volume, the proportion of each kind of database operation, and the distribution of operations over time, in order to derive transaction TPS and the concurrency of each kind of operation.
The data storage scale mainly evaluates the flow data: existing data, the estimated 3-year incremental data, data throughput, and so on. From this the cluster's storage size and data throughput are obtained, and disk IOPS is estimated in combination with the transaction TPS. The data volume of basic information tables has little impact on cluster sizing; as reference information, only their overall scale needs to be evaluated. For example, a cluster that stores 1,000 basic tables averaging 0.1 GB each needs about 100 GB in total, which is negligible against a cluster size of hundreds of TB.
In practice, first determine the cluster's storage size, then weigh ordinary disks against solid-state drives according to IOPS and data throughput, decide the size of each disk and how many disks to mount per server, and configure CPU and memory according to TPS, the mix of operation types, and concurrency. In general, a CPU-to-memory ratio of 1:8 is recommended, with single-disk capacities of 1.5 TB to 3 TB; the larger a single disk, the more likely it is to become an IO bottleneck. Depending on cluster size and throughput, either a gigabit or a 10-gigabit network can be chosen.
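As a rough illustration of how such an estimate is assembled, here is a tiny sketch runnable in the SequoiaDB sdb shell (or any JavaScript engine, with println replaced accordingly); every figure in it (TPS, record size, retention period) is a hypothetical assumption, not a recommendation.
// Hypothetical sizing sketch: every number below is an illustrative assumption.
var tps = 2000;                    // assumed average insert TPS of the flow table
var recordKB = 1;                  // assumed average flow record size, in KB
var days = 365 * 3;                // 3-year retention period
var rows = tps * 86400 * days;     // total flow rows accumulated over 3 years
var rawTB = rows * recordKB / 1024 / 1024 / 1024;   // raw data volume in TB, before replicas and indexes
println("rows: " + rows + ", raw data: " + rawTB.toFixed(1) + " TB");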
2. How best to structure a flow table
Flow data usually has two dimensional characteristics: the business date dimension and the business primary key dimension. For flow tables, the most common design is a multi-dimensional partitioned table (see figure 1). First, in the main collection, data of different business dates is split into different sub-collections, so that the data volume of a single collection does not grow too large; then, within each sub-collection, the data is scattered across the cluster's data nodes by the business primary key, keeping the data volume of a collection on a single node at the million-record level. In addition, partitioning by business date makes horizontal expansion of the cluster easy.
Note that when designing the collection spaces for a flow table, it is best to create one collection space per year or per month, so that once data exceeds its retention period the storage space can be quickly backed up, dropped, and released.
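As an illustration of this layout, here is a minimal sketch in the SequoiaDB shell; the collection space, collection, domain, and field names (txn_date, serial_no, flowDomain, and so on) are hypothetical, and the concrete sharding options should follow the cluster's own plan.
// Assumes a connection and an existing data domain named "flowDomain" over the data groups.
var db = new Sdb("localhost", 11810);
// One collection space per month, so expired data can later be backed up, dropped and released quickly.
db.createCS("flow_202501", {Domain: "flowDomain"});
db.createCS("flow_main");
// Main collection: range-partitioned on the business date, holds no data itself.
var main = db.flow_main.createCL("txn_flow", {IsMainCL: true, ShardingKey: {txn_date: 1}, ShardingType: "range"});
// Sub-collection: hash-sharded on the business primary key so the data spreads over the data nodes.
db.flow_202501.createCL("txn_flow_202501", {ShardingKey: {serial_no: 1}, ShardingType: "hash", AutoSplit: true});
// Attach the January 2025 sub-collection to the main collection by business-date range.
main.attachCL("flow_202501.txn_flow_202501", {LowBound: {txn_date: "20250101"}, UpBound: {txn_date: "20250201"}});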
3. However reasonable the plan, it cannot last forever
No matter how reasonable the data storage planning is, it can look powerless in the face of rapidly growing data and ever-changing business requirements: data growth will always exceed expectations, and a performance bottleneck will appear sooner or later. At that point, targeted tuning for the specific performance problem is needed.
03
Performance bottleneck analysis of hardware resources
The usage of the cluster's hardware resources is an important basis for analyzing performance bottlenecks. When hardware resource usage changes, it is a direct sign that the cluster's business or operations have changed, and an analysis approach can be worked out from those changes.
1. Monitoring data of cluster hardware resources is the foundation.
The daily performance monitoring of the hardware resources of the cluster server is very important. Through the daily monitoring data, we can understand the fluctuation of the cluster performance.
nmon is a well-known open-source system performance monitoring tool for AIX/Linux. It records resource usage information, can write the results to files, and the nmon_analyser tool can then turn those files into data files and charts. Figure 2 shows nmon's main runtime screen. There are plenty of tutorials online covering nmon installation, real-time monitoring, data collection, and report generation.
2. Multi-dimensional analysis of cluster hardware resource usage
Hardware resource usage analysis is divided into real-time monitoring analysis and monitoring report data analysis, both of which need to be used together.
For performance problems that can currently be reproduced, first use real-time monitoring to see which hardware resources specific processes consume, then use report analysis over a longer monitoring window to find the starting point of the performance change, determine whether the change is periodic, random, or gradual, and locate the problem from the production changes and data operations around that point in time.
For performance problems that cannot be reproduced for the time being, first analyze, from the monitoring reports, how hardware resources changed before and after the problem occurred; then lengthen the monitoring window to look for a pattern in the resource changes; next, use application logs, database logs, and production change records to identify the operations that caused the problem; finally, try to re-run the relevant operations under real-time performance monitoring to verify the hypothesis.
Finding the inherent logic behind performance changes from hardware resource monitoring is the key to problem analysis. Mastering how hardware resources change is a first grasp of the cluster's performance and the first step of performance tuning.
04
Business situation and operation analysis of the database cluster
Understanding the behavior of hardware resources is the first step of performance optimization; it must then be combined with the cluster's business situation and the proportion and time distribution of the various operations in order to locate performance problems more precisely.
1. Find out the business situation
Mapping the business situation generally means continuously collecting business access data through the business monitoring interfaces and, combined with the business logic, working out the database operations each business action performs. This yields business TPS, concurrency, the distribution of data operation types, data throughput, and so on, from which the pressure on the database cluster can be assessed.
2. Analysis of data distribution
Performance problems caused by uneven data distribution are very common. In super-large flow tables, uneven distribution is mainly caused by a business primary key field with weak randomness that does not hash well. One option is to use another field with stronger randomness instead; another approach worth trying is range partitioning based on the characteristics of the data.
How can you determine whether the data is evenly distributed? First, check the main collection's catalog information to see whether the business-date split is balanced; then check each sub-collection's catalog information to confirm that the data is scattered by business primary key across the nodes of the data domain; finally, spot-check 2-3 sub-collections to see whether the data volume on each node is roughly balanced. A sketch of these checks follows below.
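A minimal sketch of these checks in the SequoiaDB shell, assuming the collection names from the earlier example; snapshot field layouts can vary between versions, so treat this as an outline rather than an exact recipe.
// Assumes: var db = new Sdb("localhost", 11810);
// Main collection catalog: shows each sub-collection and its business-date range.
db.snapshot(SDB_SNAP_CATALOG, {Name: "flow_main.txn_flow"});
// Sub-collection catalog: shows how the business primary key hash ranges map to data groups.
db.snapshot(SDB_SNAP_CATALOG, {Name: "flow_202501.txn_flow_202501"});
// Collection snapshot: per-node details (record counts and sizes) for a spot-checked sub-collection.
db.snapshot(SDB_SNAP_COLLECTIONS, {Name: "flow_202501.txn_flow_202501"});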
3. Access plan analysis
The detailed access plan of a query can be obtained with query.explain({Detail: true, Run: true}). By analyzing the access plan of a data operation statement, you can see which sub-collections are involved and the access plan on every node of each sub-collection, including the scan method, index usage, data volume, resource usage, and other information.
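For example, a hedged sketch of examining a flow query's access plan (collection and field names are the hypothetical ones used earlier):
// Run: true actually executes the query and returns real costs; Detail: true shows the plan on every node.
db.flow_main.txn_flow.find({txn_date: "20250115", serial_no: "20250115000123"}).explain({Run: true, Detail: true});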
4. Query monitoring and full table scan detection
By capturing the database cluster's session snapshot with db.snapshot(SDB_SNAP_SESSIONS), you can obtain the data operation details of each query, including session state, thread information, index usage, the number of records operated on, operation time, resource usage, and so on.
By repeatedly grabbing the session snapshot and checking whether the "LastOpInfo" field contains the word "tbscan", you can detect whether any data operation is performing a full table scan.
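A minimal sketch of that detection loop in the SequoiaDB shell; the field names come from the session snapshot, but the matching logic itself is only illustrative.
// Assumes: var db = new Sdb("localhost", 11810);
var cur = db.snapshot(SDB_SNAP_SESSIONS);
var rec;
while ((rec = cur.next())) {
    var s = rec.toObj();
    // Flag sessions whose last operation was executed as a table scan.
    if (s.LastOpInfo && s.LastOpInfo.indexOf("tbscan") >= 0) {
        println("possible full table scan in session " + s.SessionID + ": " + s.LastOpInfo);
    }
}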
05
Performance optimization guidelines
Performance optimization of super-large flow tables generally follows an approach that goes from operations to software to hardware, from simple to complex. Before optimizing, it is necessary to fully understand both software and hardware: data operations, database configuration, system performance, and so on.
1. Create efficient indexes in moderation
An index is a special object used to speed up data access. Used properly, indexes improve data retrieval efficiency; used improperly, they slow retrieval down and, in serious cases, degrade the overall service performance of the database.
Using indexes to tune super-large flow tables is the most economical, simplest, and most efficient approach. So how do you create indexes properly, so that they improve the efficiency of data operations without hurting the database service?
First, you need to understand some of the index characteristics that can be used for tuning, as well as the associated performance overhead.
Some related features:
An index lookup uses binary search to locate data quickly, with an average complexity of O(log N).
Index data is much smaller than the table data, so it can stay cached in memory for a long time.
The index uses a B-tree structure; a three-level B-tree can address millions of records, and walking the index greatly reduces the number of disk IOs. The smaller the indexed field, the more efficient the index.
The number of disk IOs depends on the height H of the B-tree. Assuming the table holds N records and each disk block holds M index entries, then H = log(M+1) N (the base-(M+1) logarithm of N): with N fixed, the larger M is, the smaller H is. M equals the disk block size divided by the entry size; the disk block size is the size of a data page and is fixed, so the less space each entry occupies, the more entries fit in a block and the lower the tree height. This is why each entry, that is, the index field, should be as small as possible: an int occupies 4 bytes, half of bigint's 8 bytes. It is also why the B-tree keeps the real data in leaf nodes rather than inner nodes; placing data in inner nodes drastically reduces the number of entries per disk block and raises the tree height. In the extreme case of one entry per block, the B-tree degenerates into a linear list.
Indexes have the leftmost-prefix matching property; composite indexes should be designed according to field repetition rates and the fields actually used by queries.
A B-tree entry is a composite data structure, and the search tree is built by comparing fields from left to right. For example, when data like (Xiao Zhang, 22, female) is retrieved, the B-tree first compares name to decide the next search direction; if name is equal, it compares age and then gender in turn, finally reaching the target data. But when the query is (22, female) with no name value, the B-tree does not know which node to visit next, because name was the first comparison factor when the tree was built and the search must follow name to know where to go. Likewise, when retrieving data like (Xiao Zhang, male), the B-tree can choose a direction by name, but the next field, age, is missing, so it can only fetch all data whose name is "Xiao Zhang" and then filter for gender "male".
Index data is ordered and supports both ascending and descending traversal; used well, this optimizes sorting and grouping in queries.
The lower the data repetition rate, the higher the index efficiency.
Related performance overhead:
Additional maintenance of indexes is required for data operations, and the more indexes, the greater the maintenance performance cost.
Creating an index requires sorting, extra storage space, and a table lock, and consumes a large amount of memory and CPU while it is being built.
A data query involves at least two IO operations: one to read the index and one to read the table data.
A data insert produces at least 1 + n IO operations (n is the number of indexes). When a leaf node becomes too full and triggers recursive node splits, the number of IO operations rises sharply.
A data delete produces at least 3 + n IO operations (n is the number of indexes). When a leaf node becomes empty and triggers recursive node merges, the number of IO operations rises sharply.
A data update produces 3 IO operations if the updated field is not an index field; if it is an index field, it produces at least 3 + n IO operations (n is the number of indexes), and index node splits and merges may also occur (particularly with variable-length field types).
With the index characteristics and performance overhead in mind, combine them with the characteristics of super-large flow tables: evaluate the proportions of insert, update, and query operations, and investigate whether the updated fields overlap with the query fields, in order to decide whether indexes are needed and which indexes to create.
Generally, a super-large flow table needs a primary-key index on the serial number to guarantee the uniqueness of each flow record. Other indexes can be evaluated against the update and query conditions, data volume, and similar information. Two to three indexes usually give the best balance; if query operations dominate, the number of indexes can be increased as appropriate. For query conditions composed of multiple fields, a composite index can be created according to the repetition rate of the condition fields; for fields used in sorting and grouping, a composite index can be built on those fields.
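A sketch of the two kinds of indexes discussed above, created on the hypothetical sub-collection from the earlier examples (acct_no and the index names are assumptions):
// Unique index on the flow serial number to guarantee uniqueness of each flow record.
db.flow_202501.txn_flow_202501.createIndex("idx_serial_no", {serial_no: 1}, true);
// Composite index for queries that filter by account and sort by business date;
// field order follows the leftmost-prefix rule and the repetition rate of the fields.
db.flow_202501.txn_flow_202501.createIndex("idx_acct_date", {acct_no: 1, txn_date: -1});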
2. Optimize query statements
For query optimization of super-large flow tables there are two basic principles: first, use index tuning to force the query onto the intended index; second, limit the business date range of the query to reduce the amount of data that has to be searched.
Index tuning can be combined with the reasonable index creation described in the "Create efficient indexes in moderation" section above, and the execution plan should be checked to confirm that the query really uses the intended index.
Making good use of SequoiaDB's main/sub-table mechanism and fixing the business date dimension of a query can greatly reduce the amount of data searched and save cluster resources. For example, an application needs to look up a flow record by its serial number (the primary key), and the serial number embeds the business date. Normally the serial number alone finds the record precisely and quickly, but a super-large flow table may hold years or even decades of data spread across hundreds of sub-tables. If only the serial number is used, the database has to probe the index of every sub-table, even though only one of them actually returns data. If instead the business date is extracted from the serial number and added as a query condition, the main table's business-date partition information locates the sub-table containing the record, and the database probes only that sub-table's index.
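For instance, a hedged sketch combining the two principles (names as in the earlier examples): the business date extracted from the serial number routes the query to a single sub-table, and hint() forces the intended index.
// Without txn_date the query would have to probe the index of every sub-table;
// with it, the main collection's date partitions route the query to one sub-collection only.
db.flow_main.txn_flow.find({txn_date: "20250115", serial_no: "20250115000123"}).hint({"": "idx_serial_no"});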
In addition, for paged queries that span a large business date range, you can query one business-date partition at a time and complete the query in several passes. For queries with large result sets, paging is usually used, and the number of records per page should be kept below a few hundred.
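A sketch of paging within one business-date partition (the page size, field names, and values are illustrative only):
// Page through a single month's partition; each page stays well below a few hundred records.
var pageSize = 200;
var pageNo = 3;   // page requested by the application, starting from 0
db.flow_main.txn_flow
  .find({txn_date: {$gte: "20250101", $lt: "20250201"}, acct_no: "6222020000001"})
  .sort({serial_no: 1})
  .skip(pageNo * pageSize)
  .limit(pageSize);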
In general, there is no fixed, standard procedure for tuning query statements. It is a step-by-step process: only through repeated testing, optimization, re-testing, and re-optimization is query efficiency gradually improved.
3. Adjust the data split granularity
Deciding when to adjust the data split granularity of a super-large flow table may be the hardest part of this section; it requires weighing cluster hardware resource usage, query optimization, the data volume of each sub-table on each node, index size, and more.
From the hardware-resource point of view, disk utilization should be below 70%; above 70%, direct cluster expansion can be considered instead. The typical picture is that cluster CPU and memory stay below 60%, only IOPS remains high, read IO is under pressure while write IO is normal, and query TPS is far lower than IOPS, which indicates that a single query generates a large number of read IOs. From the query-optimization point of view, there are no full-table-scan queries and the query statements are already well optimized, yet queries still cannot meet the requirement. Looking at the data volume of a single sub-table on each node, the scale should be controlled at the million-record level; the number of fields and the record length also affect retrieval, and especially updates, where more fields and longer records mean lower efficiency. Evaluate the business date range used in queries: the more precise it is, the more flexibility there is in adjusting the split granularity. Finally, from the field types and actual size of the indexes, evaluate whether the index can be fully cached in memory, and adjust the split granularity as far as possible so that the index data fits entirely in memory.
In essence, adjusting the data split granularity of a super-large flow table increases the degree of parallelism and reduces the index size of a single sub-table on a single node, so that the whole index can stay resident in memory and IO overhead is minimized. Adjusting the split granularity therefore increases memory and CPU consumption.
How to adjust the split granularity is the next question. For a flow table built according to the main/sub-table rules described in this article, the adjustment is simply to divide each business-date interval of the main table into several smaller intervals, that is, to split the data of one sub-table across several sub-tables. The key question is how many sub-tables are reasonable; the general recommendation is to split one sub-table into 2-3, but the most reliable approach is to simulate it with production-like testing.
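A sketch of one way to do this for future dates, assuming sub-tables were previously attached per month and are now attached at roughly ten-day granularity; all names and boundaries are hypothetical.
// Assumes: var db = new Sdb("localhost", 11810); and the domain/collection names used earlier.
db.createCS("flow_202502", {Domain: "flowDomain"});
// Finer-grained sub-collection covering only the first ten days of February 2025.
db.flow_202502.createCL("txn_flow_20250201", {ShardingKey: {serial_no: 1}, ShardingType: "hash", AutoSplit: true});
db.flow_main.txn_flow.attachCL("flow_202502.txn_flow_20250201",
    {LowBound: {txn_date: "20250201"}, UpBound: {txn_date: "20250211"}});
// Repeat for the remaining intervals of the month, then validate with a production-like test.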
4. Upgrade the bottleneck hardware
Taking a server down to upgrade its weakest hardware is a simple matter for SequoiaDB, which is highly available and disaster tolerant by design. Simply switch the database master nodes on the server to be upgraded over to other servers, then stop the database service and perform the upgrade. If a disk is replaced, mount the new disk, create the corresponding data directories, restart the database service, and the database will synchronize the data automatically.
Through the hardware resource bottleneck analysis, the resource bottleneck can be identified clearly, and the weak hardware can be upgraded according to business needs: adding memory modules, replacing disks with solid-state drives, or even replacing the whole server.
5. Cluster expansion
When the cluster's storage space is insufficient, or the optimizations above cannot resolve the performance bottleneck, it is time to expand the cluster.
Cluster expansion should be adjusted and optimized according to the state of the existing cluster. For example, if the original data domain has 3 servers and incremental data has grown sharply, the newly planned data domain may need 6 or more servers; or, using the old servers' resource usage as a reference, the new servers' configuration can be adjusted, trimming over-provisioned resources and strengthening the ones that were bottlenecks. With an existing cluster as a reference object, the sizing and hardware evaluation for the expansion becomes much easier.
The expansion operation itself is very simple for a flow table. Install the SequoiaDB software on the new servers, add the new data nodes to the original cluster as a new data domain, create the sub-collections for subsequent business dates in the new domain, and attach them to the main collection according to the partitioning rules; the expansion is then complete. As shown in the following figure, when the hardware resources of data domain 1 become insufficient, a new data domain 2 can be added to store the flow data from 2030 onward.
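A minimal sketch of that expansion in the SequoiaDB shell; host names, ports, group names, domain names, and paths are placeholders.
// Assumes: var db = new Sdb("localhost", 11810);
// Create a new replica group on the newly added servers (one createNode per replica; group5/group6 alike).
var rg = db.createRG("group4");
rg.createNode("sdbserver4", 11820, "/opt/sequoiadb/database/data/11820");
rg.start();
// Put the new groups into a new data domain that will hold flow data from 2030 onward.
db.createDomain("flowDomain2", ["group4", "group5", "group6"], {AutoSplit: true});
// Collection spaces and sub-collections for the later business dates are then created in this domain
// and attached to the main collection exactly as in the earlier layout sketch.
db.createCS("flow_2030", {Domain: "flowDomain2"});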
06
Summary
Performance optimization of super-large flow tables is a continuously iterative process. It must be built on a full understanding of the cluster's business layer, data layer, and hardware resource layer, and proceed through hardware resource analysis, cluster operation analysis and detection, and business analysis to locate the performance problem.
At the same time, when SequoiaDB users face the optimization of a super-large table, they can make reasonable use of SequoiaDB's multi-dimensional partitioning and simple horizontal expansion mechanism to solve the performance problem with relative ease.