How to play HTAP scenarios in distributed databases 07/11 Update SLTechnology News&Howtos

How to play HTAP scenarios in distributed databases

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces you how to play HTAP scenarios in distributed databases. The content is very detailed. Interested friends can refer to it for reference. I hope it can help you.

Pain points for traditional database architectures

1. Cluster dispersion is not conducive to integration, and the workload of data structure synchronization is large.

The first trick: data synchronization of data sources

Keep production in real time by developing programs that consume kafka to synchronize data into SequoiaDB data. The following is the architecture diagram of synchronous data loading:

Second move: flexible expansion

The third trick: multi-mode data engine use

MySQL instance is suitable for operators familiar with MySQL, and is suitable for precise query, business data writing, counter query, OLTP scenarios. Add, delete, check and modify operations are exactly the same as MySQL, and the underlying data is stored in SequoiaDB.

PostgreSQL instances are suitable for operators who are familiar with PostgreSQL and are suitable for precise queries and OLAP scenarios. Support add, delete, check and modify functions and PostgreSQL use exactly the same, using the appearance of the way to save data in SequoiaDB.

Spark instances are distributed clusters, and SparkSQL is suitable for report analysis, large table associative queries, and OLAP scenarios. Cross-database association query is friendly, supporting standard SQL, JDBC access, Python docking query.

The fourth trick: multiple SQL engine association use methods

4.1 MySQL creates tables

Create a temp.test table with the following fields:

create table temp.test ( numcode smallint, agentcode char(12), bankname varchar(120), flag decimal(8,4), timecode datetime );

Insert the following four entries into the temp.test table:

insert into temp.test (numcode,agentcode,bankname,flag,timecode)values(1,'test1','beijingbank1',10.1,'2019-06-21 10:07:52');insert into temp.test (numcode,agentcode,bankname,flag,timecode)values(2,'test2','beijingbank2',10.2,'2019-06-22 10:07:52');insert into temp.test (numcode,agentcode,bankname,flag,timecode)values(3,'test3','beijingbank3',10.3,'2019-06-23 10:07:52');insert into temp.test (numcode,agentcode,bankname,flag,timecode)values(4,'test4','beijingbank4',10.4,'2019-06-24 10:07:52');

Update bankname to "guangzhoubank" in records with numcode=1 in temp.test

mysql> update temp.test set bankname="guangzhoubank" where numcode=1; Query OK, 0 rows affected (0.00 sec)Rows matched: 0 Changed: 0 Warnings: 0

Query again after updating, showing that the update has been successful

mysql> select * from temp.test;

Delete numcode=1 in temp.test table

mysql> delete from temp.test where numcode=1; Query OK, 0 rows affected (0.01 sec) mysql> select * from temp.test;

Create a mapping table in PostgreSQL client to query data

temp=# create foreign table test temp-# ( temp(# numcode int, temp(# agentcode text, temp(# bankname text, temp(# flag decimal(8,4), temp(# timecode text temp(# ) temp-# server sdb_server temp-# options ( collectionspace 'temp', collection 'test', decimal 'on' );

Connect SparkSQL client to create mapping table and query data

create table temp.test ( numcode int, agentcode string, bankname string, flag decimal(8,4), timecode string )USING com.sequoiadb.spark OPTIONS ( host '10.139.***.***: 11810', collectionspace 'temp', collection 'test') ;

The above proves that MySQL, PostgreSQL and Spark have data communication, and data can be shared.

4.2 Using Spark to generate subtables

Connect the Spark client and create a new table test2 using create table as

create table temp.test2 USING com.sequoiadb.spark OPTIONS ( host '10.139.***.***: 11810', domain 'allDomain', collectionspace 'temp', collection 'test2', ignoreduplicatekey 'true' , shardingkey '{"_id":1}', shardingType 'hash' , compressiontype 'lzw' , autosplit 'true' )as select * from temp.test ;

Connect MySQL client and map new tables created by Spark to query synchronized data

mysql> create table temp.test2 -> ( -> numcode smallint, -> agentcode char(12), -> bankname varchar(120), -> flag decimal(8,4), -> timecode datetime -> ); mysql> select * from temp.test2;

Fifth move: the application of the multi-copy mechanism

5.1 High availability architecture for three copies in the same city

There are two main and standby computer rooms, in which two nodes are deployed in the main computer room and one node is deployed in the standby computer room. The three machines together form a data set in which the election logic follows the Raft protocol.

5.2 Primary and standby consistency settings

In distributed systems, consistency refers to the property that data remains consistent across multiple replicas. SequoiaDB supports different levels of active-standby consistency policies to adapt to different application scenarios. Users can choose different consistency policies according to the requirements of business for data security and service availability.

1) Strong consistency

Write All Nodes When a write occurs, the database returns only after ensuring that all replication group nodes are synchronized. After the write operation is successfully processed, the subsequent data read must be the latest in the current replication group. The advantage is that it can effectively ensure the integrity and security of data, while the disadvantage is that it will reduce the write performance of the replication group, and when there is a node failure or exception in the cluster, data cannot be written, reducing high availability.

In core transactional businesses, strong consistency is recommended to ensure data security while sacrificing write performance.

2) Final consistency

In order to improve the high availability of the database and achieve the separation of read and write data, SequoiaDB adopts the "final consistency" policy by default. When read and write are separated, the data read may not be up to date for some time, but the data is eventually consistent between copies.

Write Master After the master node successfully executes the write operation, the write operation can be returned. For services that do not require high data query consistency, such as historical data query platforms, batch data import at night, and query services provided during the day, it is recommended to use the final consistency strategy of writing master nodes.

Whether the set is strongly consistent or ultimately consistent is specified by ReplSize parameter when creating the set. If ReplSize is set to-1 when creating the set, it means strong consistency. By default, ReplSize value is 1 to mean final consistency. Depending on the usage scenario to choose whether to use strong consistency or final consistency, users can modify the ReplSize attribute via db.setAttributes().

Step 6: Diversified monitoring tools

SequoiaPerf tool not only helps users to quickly locate slow queries, but also helps users to fully monitor SequoiaDB data clusters. On the SequoiaPerf homepage, users can take a macro view of SequoiaDB database cluster operations and quickly check the current cluster operations.

More detailed information about the server is available on SequoiaPerf's Server Resources page.

For example, I/O usage of server disks can be zoomed in to get more detailed data. At the same time, users can also select to view the recent resource usage through the time bar in the upper right corner of the page.

summary

1. Outstanding Ability of Database in Data Management

SequoiaDB is a finance-grade distributed relational database. The product engine adopts native distributed architecture, 100% compatible with MySQL syntax and protocol, and supports complete ACID and distributed transactions. SequoiaDB also provides a multi-model database storage engine, which natively supports multi-data center disaster recovery mechanism and is the first choice for a new generation of distributed databases. SequoiaDB giant cedar database can bring the following value to users:

Fully compatible with traditional relational data, data fragmentation is completely transparent to applications

High performance with unlimited horizontal scalability

Distributed transactions and ACID capabilities

Supports structured, semi-structured and unstructured data simultaneously

Finance-level security features, multi-data center disaster recovery to achieve RPO = 0

HTAP mixes loads, running online transactions and batch tasks simultaneously without interference

Multi-tenant capability, supporting multiple levels of physical and logical isolation in cloud environments

2. practical achievements

The database perfectly solves the pain points faced by traditional databases, reduces IT costs, improves operation and maintenance efficiency, and enables data to effectively provide services to enterprises. Its advantages are as follows:

Improve query efficiency, use Spark large table query efficiency increased by 20 times.

It has multiple capacities and merges and synchronizes all the data of multiple production libraries and historical libraries.

New schema, supports multiple data types, structured and unstructured.

Simple expansion, support rapid expansion and capacity reduction, basic machine can be expanded.

Easy to use, one-click deployment, reduce dba learning costs.

Rich query engines, support multiple SQL query engines, enrich business use.

Rich interfaces, in addition to SQL also supports JDBC, ODBC and API a variety of interfaces, flexible use.

The system is complete and new business systems can be accessed at any time.

How to play HTAP scenarios on distributed databases is shared here. I hope the above content can be of some help to everyone and learn more knowledge. If you think the article is good, you can share it so that more people can see it.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.