Giant Sequoia Tech | SequoiaDB data domain and storage planning

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

1 Background

In recent years the enterprise's business has grown rapidly and its customer base keeps increasing. The load on the back-end service systems has risen accordingly, and hardware resources have become very tight. We therefore want to introduce big data technology, within a controllable level of technical risk, to upgrade the existing IT systems and build a service platform that stores and manages historical and near-line data in one place while supporting high-concurrency, low-latency data queries. The goals are to increase the computing power of the IT systems, reduce their construction cost, optimize the service architecture, and provide higher-quality IT services to the business units.

Within the overall IT architecture, such a platform is essentially a system that takes load off the core business systems. SequoiaDB (the Giant Sequoia database) supports massive distributed data storage with both vertical and horizontal partitioning. With these features, historical and near-line data can be stored in SequoiaDB and served by high-concurrency, low-latency queries. This article explains how to use SequoiaDB's data domain feature to plan storage for historical and near-line data scenarios, so that the business systems' requirements for performance, storage, and ease of maintenance are met.

2 Related concepts

Multi-dimensional data partition

SequoiaDB supports horizontal and vertical partitioning. Horizontal partitioning, by hash or by range, distributes data across multiple nodes to increase throughput and speed up queries and writes. Vertical partitioning, by range, logically divides the data within a node into multiple intervals, each acting as an independent storage unit, which reduces network I/O during queries and speeds them up further.

Horizontal partitioning

Hash horizontal partitioning applies a hash function to the chosen partition key and distributes each record to the partition that corresponds to the hash value. Range horizontal partitioning compares the partition key directly against predefined ranges and stores each record in the partition whose range matches. Each method has its own applicable scenarios, which depend closely on the business being run. Range horizontal partitioning is generally not recommended unless the range partition key (such as the month) keeps the data balanced (for example, when every month holds a similar volume of data). This is shown in figure 2-1.

Figure 2-1. SequoiaDB horizontal partition
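The difference between the two horizontal methods can be sketched in a few lines of Python. This is only an illustration of the routing idea, not SequoiaDB's actual implementation: the group names, the use of MD5 as the hash function, and the month boundaries are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right

GROUPS = ["group1", "group2", "group3"]  # hypothetical replication group names


def hash_partition(key: str, groups=GROUPS) -> str:
    """Pick a group from a stable hash of the partition key (MD5 is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return groups[int.from_bytes(digest[:4], "big") % len(groups)]


# Range partitioning: each group owns a fixed interval of months.
# Illustrative boundaries: months 1-4 -> group1, 5-8 -> group2, 9-12 -> group3.
RANGE_BOUNDS = [5, 9]


def range_partition(month: int, groups=GROUPS) -> str:
    """Pick a group by matching the partition key against the fixed ranges."""
    return groups[bisect_right(RANGE_BOUNDS, month)]
```

Hashing spreads keys evenly no matter how they are distributed, whereas the range version stays balanced only if each interval receives a similar amount of data, which is the caveat stated above.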

Vertical partitioning

Vertical partitioning divides the data of a collection within a node into multiple data segments according to a chosen field. Each range represents one vertical partition. When data is queried or written, it is routed automatically to the corresponding partition. Vertical partitioning greatly reduces disk access, reduces network I/O, and speeds up queries. Vertical partitions share resources (they sit on the same physical machine); their purpose is to isolate hot data from cold data, as shown in figure 2-2.

Figure 2-2. SequoiaDB vertical partition
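As a rough sketch of why range-based vertical partitioning cuts down the data touched by a query, the following Python routes a record to the one segment whose range contains its date and selects only the segments that overlap a queried interval. The segment names and date ranges are hypothetical, and the routing logic is a simplified stand-in for what the database does internally.

```python
from datetime import date

# Hypothetical segments, each covering a half-open date range [start, end).
SUB_TABLES = [
    (date(2024, 1, 1), date(2024, 2, 1), "history.trade_202401"),
    (date(2024, 2, 1), date(2024, 3, 1), "history.trade_202402"),
    (date(2024, 3, 1), date(2024, 4, 1), "history.trade_202403"),
]


def route(record_date: date) -> str:
    """Return the segment whose range contains record_date."""
    for start, end, name in SUB_TABLES:
        if start <= record_date < end:
            return name
    raise KeyError(f"no segment covers {record_date}")


def tables_for_query(first: date, last: date) -> list[str]:
    """Return only the segments that overlap the queried interval [first, last]."""
    return [name for start, end, name in SUB_TABLES if start <= last and first < end]
```

A query restricted to late January and early February would scan two segments and skip the third, which is where the savings in disk and network I/O come from.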

Replication groups and domains

A partition group, also known as a replication group, can contain one or more data nodes (or catalog nodes). The nodes in a group keep their data eventually consistent through asynchronous log replication.

A domain is a logical unit made up of several replication groups (ReplicaGroup). Each domain can manage its data automatically according to defined policies, such as data splitting and data isolation.

Take three servers as an example, each with 9 disks. The physical deployment of the replication groups and the logical composition of the domains are shown in figure 2-3:

With 3 replicas, data nodes are deployed per disk: each machine runs 9 data nodes, and the matching nodes across the 3 machines form one data group, for a total of 9 data groups. For example, domain 1 includes data groups 1-3, domain 2 includes data groups 5-9, and domain 3 includes data groups 1-6. A domain is therefore a logical combination of data groups, and the same data group can belong to more than one domain.
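The layout in this example can be modelled as plain sets of group numbers, which makes the overlap between domains easy to see. The sketch below is only a data model for the example above; the host names and the one-disk-per-group convention in `nodes_of_group` are assumptions added for illustration.

```python
# Domain membership as given in the example: a domain is a set of data groups.
DOMAINS = {
    "domain1": set(range(1, 4)),   # groups 1-3
    "domain2": set(range(5, 10)),  # groups 5-9
    "domain3": set(range(1, 7)),   # groups 1-6
}


def shared_groups(a: str, b: str) -> set[int]:
    """Groups that belong to both domains."""
    return DOMAINS[a] & DOMAINS[b]


def nodes_of_group(group: int, hosts=("host1", "host2", "host3")) -> list[str]:
    """One replica of the group per host (hypothetical host and disk naming)."""
    return [f"{h}:disk{group}" for h in hosts]
```

Here `shared_groups("domain1", "domain3")` contains groups 1 to 3, while domain 1 and domain 2 share nothing and are therefore physically isolated from each other.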

3 Business scenarios

As the number of users grows and the business develops, the data volume in the business systems of large enterprises keeps increasing. Most of the original systems are built on relational databases with complex table structures, and each query has to join several large tables, so join query performance is very poor.

We can therefore use SequoiaDB to store large volumes of historical and near-line data behind a single query entry point, keeping this data online and managed uniformly according to data life cycle rules. The platform also provides high-concurrency, real-time query services, which addresses the slow join queries that relational databases suffer on massive data.

Based on the business systems' requirements for historical and near-line data, historical and near-line storage areas are set up to hold the raw data imported directly from the source systems, including data that has passed the production system's retention period and data that must be backed up on schedule. At the same time, to support online processing with medium-to-high concurrency and small result sets, the storage can be divided into multiple areas by source system, and the cluster can be managed by category through data domains.

4 Data domain division method

When a business system loads structured data into SequoiaDB through the access platform, the system first needs to be classified using the information gathered in the data survey: its storage capacity, level of concurrency, data life cycle, and so on. This information supports the storage planning of structured data in SequoiaDB. When the structured data is stored, data domains can be used to partition the cluster into areas for each business system. The specific division method is as follows:

Business systems with massive data or highly concurrent queries

These business systems have high query concurrency, need a large amount of storage, and place heavy demands on CPU, memory, and network. Isolating such a system in its own domain lets it make full use of the physical resources of the machines that belong to that domain, which improves performance.

Business systems with a small amount of data or few concurrent queries

These business systems generally place low demands on CPU, memory, and network and occupy little storage. They can therefore share data groups, through a common data domain, with other low-concurrency, low-storage business systems to save machine resources.
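The two categories above amount to a simple decision rule. The sketch below expresses it in Python; the `BusinessSystem` fields and both threshold values are hypothetical placeholders, since the real cut-offs would come from the data survey of each system.

```python
from dataclasses import dataclass


@dataclass
class BusinessSystem:
    name: str
    data_tb: float     # expected data volume in TB
    peak_queries: int  # expected peak concurrent queries


# Hypothetical thresholds; real values come from the data survey.
LARGE_DATA_TB = 50.0
HIGH_CONCURRENCY = 500


def plan_domain(system: BusinessSystem) -> str:
    """Return 'dedicated' for massive or highly concurrent systems, else 'shared'."""
    if system.data_tb >= LARGE_DATA_TB or system.peak_queries >= HIGH_CONCURRENCY:
        return "dedicated"
    return "shared"
```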

5 Expanding the cluster horizontally by using data domains

Some enterprise business systems generate a large and growing volume of structured data every year. Once such a system goes into production, the cluster's available capacity shrinks as business volume grows, so before the system is connected to the cluster we need to plan how the whole cluster will be expanded horizontally when storage runs out. Because SequoiaDB is a distributed database, expanding the cluster can bring near-linear growth in cluster performance. The two main concerns in an expansion are data storage capacity and overall cluster performance: as the data keeps growing and usage spreads after launch, the cluster must be expanded both to add storage space and to improve performance.

For cluster management, SequoiaDB defines the concept of a data domain, and a data domain can include multiple data groups. A cluster can be divided into different data domains for different business systems. This isolates the data of different business systems at the physical level while still allowing it to be scheduled and managed in a unified way, and future expansion can be carried out for a single domain according to that domain's needs.

Expansion planning and implementation therefore need to take both the SequoiaDB data domains and the business system requirements into account. To expand the storage for structured data, there are two options. Data groups can be added to the data domain that holds the data, and the data is then split evenly onto the newly added machines. Alternatively, each sub-table can be placed in its own collection space, and the collection spaces of new sub-tables can be assigned to the data groups on the newly added machines.
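The first expansion option, adding groups to a domain and then evening out the data, can be illustrated with a small simulation. This is not SequoiaDB's splitting algorithm: the partition count of 4096, the round-robin starting layout, and the group names are assumptions chosen to show how only a fraction of the hash partitions has to move when groups are added.

```python
from collections import Counter

PARTITIONS = 4096  # assumed number of hash partitions, for illustration


def initial_layout(groups: list[str]) -> list[str]:
    """Assign hash partitions to groups round-robin."""
    return [groups[p % len(groups)] for p in range(PARTITIONS)]


def expand(layout: list[str], new_groups: list[str]) -> list[str]:
    """Move just enough partitions from existing groups to the new groups
    so that every group ends up with roughly the same share."""
    layout = list(layout)
    all_groups = sorted(set(layout)) + new_groups
    target = PARTITIONS // len(all_groups)
    counts = Counter(layout)
    # Each new group should receive `target` partitions in total.
    receivers = iter(g for g in new_groups for _ in range(target))
    for p, owner in enumerate(layout):
        if counts[owner] > target:
            new_owner = next(receivers, None)
            if new_owner is None:
                break  # the new groups are full
            layout[p] = new_owner
            counts[owner] -= 1
    return layout
```

Going from three groups to six in this model relocates about half of the partitions and leaves the rest in place. The second option, placing new sub-tables on the new groups, moves no existing data at all, at the cost of older sub-tables staying on the original machines.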

6 Specific case

The historical data of one system at a large financial customer totals 60 TB, and about 80 GB of incremental data arrives every day. If storage is planned for the total data volume over 3 years, the total storage required is about (60 + 80 GB × 365 / 1024) × 3 ≈ 266 TB.
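The arithmetic behind this estimate can be checked with a few lines of Python. The sketch reproduces the formula exactly as the article gives it, including the final multiplication by 3; the constant names are introduced here for readability, and the sketch does not settle how that factor should be interpreted.

```python
HISTORICAL_TB = 60       # existing historical data, in TB
DAILY_INCREMENT_GB = 80  # incremental data per day, in GB
FACTOR = 3               # the final multiplier in the article's formula


def planned_storage_tb() -> float:
    """Evaluate (60 + 80 * 365 / 1024) * 3 in TB."""
    yearly_increment_tb = DAILY_INCREMENT_GB * 365 / 1024  # GB per year -> TB
    return (HISTORICAL_TB + yearly_increment_tb) * FACTOR


print(round(planned_storage_tb()))  # prints 266
```

One year of increments comes to roughly 28.5 TB, so the bracketed term is about 88.5 TB and the product is about 265.5 TB, which rounds to the 266 TB quoted above.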

Suppose the hardware configuration information provided by the customer is as follows:

The specific installation and deployment is shown in figure 6-1:

Figure 6-1

Storage rule

Based on the information above, this system falls into the high-concurrency, massive-storage category. Combining the data domain division method with future expansion needs, the structured data of the business systems is stored according to the following data domain rules:

1. Business systems with massive data and highly concurrent queries are stored in their own independent domains.

2. Main and sub-tables are used to split the data by time, and each sub-table is distributed by ID hash across all the machines of the domain.

3. The sub-tables of a high-concurrency, massive-data business system each use a separate collection space.

4. Business systems with a small amount of data or few concurrent queries can share a domain.

5. To expand a domain for structured data, either add data groups and then split the data evenly onto the newly added machines, or keep each sub-table in a separate collection space and assign the collection spaces of new sub-tables to the data groups on the newly added machines.

Collection space and collection design

According to the storage rules above, structured data and unstructured data are stored in this business scenario as follows:

Main and sub-tables are used to split the data by time. Each sub-table covers one month, is hash-partitioned by ID or by a business field, and is placed in its own collection space.
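Put together, this design means that a record's business date decides which monthly sub-table it belongs to, and its ID decides which group of the domain holds it. The Python sketch below illustrates that two-level placement. The naming scheme for collection spaces and sub-tables, the group names, and the use of MD5 are all assumptions for the example, not SequoiaDB conventions.

```python
import hashlib
from datetime import date

DOMAIN_GROUPS = ["group1", "group2", "group3"]  # hypothetical groups of the domain


def sub_table_name(business_date: date, system: str = "trade") -> str:
    """One sub-table per month, each in its own collection space.
    The '<system>_<yyyymm>.<system>_<yyyymm>' naming scheme is an assumption."""
    tag = f"{system}_{business_date:%Y%m}"
    return f"{tag}.{tag}"


def target_group(record_id: str) -> str:
    """Spread a sub-table's records over the domain's groups by hashing the ID."""
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    return DOMAIN_GROUPS[int.from_bytes(digest[:4], "big") % len(DOMAIN_GROUPS)]


def place(record_id: str, business_date: date) -> tuple[str, str]:
    """Return (sub-table, group) for a record."""
    return sub_table_name(business_date), target_group(record_id)
```

Because every month lives in its own collection space, an old month can be archived or a new month placed on newly added groups without touching the other sub-tables.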

7 Summary

A data domain is logically composed of one or more data groups, each of which corresponds physically to specific data nodes, and the same data group can belong to several domains. Business systems can therefore use main and sub-tables, together with the flexibility in choosing which data groups a domain contains, to divide the cluster into different areas for structured data and make full use of the cluster's computing and storage resources. SequoiaDB supports massive data storage with vertical and horizontal partitioning and provides high-concurrency, low-latency data query services.
