
What are the common ClickHouse cluster deployment architectures?

2025-02-21 Update From: SLTechnology News&Howtos


This article describes the common ClickHouse cluster deployment architectures. It is intended as a practical reference; read on for the details.

1. Overview

Unlike distributed systems with a master-slave architecture, such as Elasticsearch and HDFS, ClickHouse adopts a multi-master (decentralized) architecture. Every node in the cluster plays an equal role, and a client gets the same result no matter which node it connects to.

ClickHouse uses shards to split data horizontally, and shards are defined in terms of clusters: each cluster consists of one or more shards, and each shard corresponds to one ClickHouse (CH) service node. The upper limit on the number of shards is therefore the number of nodes (one shard maps to exactly one service node).
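A cluster is declared in the server configuration under `remote_servers`. A minimal two-shard sketch is shown below; the cluster name `ch_cluster` and the hostnames `node1`/`node2` are placeholders, not names from the original article:

```xml
<remote_servers>
    <!-- assumed cluster name; each <shard> lists the node(s) that hold it -->
    <ch_cluster>
        <shard>
            <replica><host>node1</host><port>9000</port></replica>
        </shard>
        <shard>
            <replica><host>node2</host><port>9000</port></replica>
        </shard>
    </ch_cluster>
</remote_servers>
```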

However, ClickHouse does not offer the highly automated sharding found in some other distributed systems. Instead, CH provides the concepts of local tables and distributed tables. A local table is equivalent to one data shard. A distributed table is a logical table that stores no data itself; it acts as an access proxy for the local tables, much like sharding middleware. Through the distributed table, a query is fanned out to multiple data shards, realizing distributed queries. Of course, data distribution can also be implemented at the application layer.
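The local/distributed pair can be sketched with the following DDL. Table, database, and cluster names (`events_local`, `events_all`, `default`, `ch_cluster`) and the columns are illustrative assumptions:

```sql
-- Local table: one data shard, created on every node
CREATE TABLE events_local
(
    event_date Date,
    user_id    UInt64
)
ENGINE = MergeTree()
ORDER BY (event_date, user_id);

-- Distributed table: stores no data itself; proxies reads and writes
-- to events_local on every shard of the cluster, routing rows randomly
CREATE TABLE events_all AS events_local
ENGINE = Distributed(ch_cluster, default, events_local, rand());
```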

ClickHouse also supports data replicas. The concept is similar to replicas in Elasticsearch, except that in CH a shard is really a logical concept: its physical storage is carried by the replicas.

Data replicas in ClickHouse are generally implemented with the ReplicatedMergeTree family of table engines, with data consistency between replicas coordinated through ZooKeeper. Alternatively, a distributed table can itself take care of writing to both shards and replicas.
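A replicated local table can be sketched as follows. The ZooKeeper path identifies the shard and the replica name identifies this copy; the `{shard}` and `{replica}` substitutions come from the `macros` section of each node's configuration. Table name and columns are placeholders:

```sql
-- One ZooKeeper path per shard; each copy registers under its replica name
CREATE TABLE events_local
(
    event_date Date,
    user_id    UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (event_date, user_id);
```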

2. Cluster deployment architecture

Take a four-node cluster implementing multiple shards with two replicas as an example:

Option 1

(In the original figure, each shard serves as the primary replica.)

Create a data table on each node as a data shard, use the ReplicatedMergeTree table engine to implement the data replicas, and use a distributed table as the entry point for writes and queries.

This is the most common cluster implementation.
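Under this option, assuming a cluster named `ch_cluster` in the configuration and per-node `{shard}`/`{replica}` macros (all placeholder names), both tables can be created on every node with one `ON CLUSTER` statement each:

```sql
-- Replicated local table on every node; replicas of the same shard
-- share a ZooKeeper path and sync through it
CREATE TABLE default.events_local ON CLUSTER ch_cluster
(
    event_date Date,
    user_id    UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (event_date, user_id);

-- Distributed table as the single entry point for writes and queries
CREATE TABLE default.events_all ON CLUSTER ch_cluster
AS default.events_local
ENGINE = Distributed(ch_cluster, default, events_local, rand());
```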

Option 2

Create a data table on each node as a data shard, and let the distributed table handle writing to both shards and replicas.

In this implementation there is no need for replicated tables, but the node hosting the distributed table must write to both shards and replicas, which can easily become a single-point write bottleneck.
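The key to this option is setting `internal_replication` to `false` in the cluster configuration, so that the distributed table itself writes every row to all replicas of a shard instead of delegating replication to the table engine. Hostnames and the cluster name are placeholders:

```xml
<ch_cluster>
    <shard>
        <!-- false: the Distributed table writes each row to both replicas -->
        <internal_replication>false</internal_replication>
        <replica><host>node1</host><port>9000</port></replica>
        <replica><host>node3</host><port>9000</port></replica>
    </shard>
    <shard>
        <internal_replication>false</internal_replication>
        <replica><host>node2</host><port>9000</port></replica>
        <replica><host>node4</host><port>9000</port></replica>
    </shard>
</ch_cluster>
```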

Option 3

Create a data table on each node as a data shard, and at the same time create two distributed tables, each covering only half of the data.

Replicas still have to be implemented with the ReplicatedMergeTree family of table engines.

Option 4

Create two data tables on each node, place the two replicas of each data shard on different nodes, and let each distributed table cover the full data set.

This scheme achieves data distribution and redundancy on fewer nodes, but the deployment is slightly more cumbersome.
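This layout is often called circular replication. A two-node sketch is shown below: each node carries one replica of each of the two shards, with each shard's tables kept in its own database (via `default_database`) so two shards can coexist on one node. All names are placeholders:

```xml
<!-- Circular sketch: 2 nodes carrying 2 shards x 2 replicas -->
<ch_cluster>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>node1</host><port>9000</port>
            <default_database>shard_1</default_database>
        </replica>
        <replica>
            <host>node2</host><port>9000</port>
            <default_database>shard_1</default_database>
        </replica>
    </shard>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>node2</host><port>9000</port>
            <default_database>shard_2</default_database>
        </replica>
        <replica>
            <host>node1</host><port>9000</port>
            <default_database>shard_2</default_database>
        </replica>
    </shard>
</ch_cluster>
```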

Note that sharding and replication in CH are configured entirely through configuration files and cannot be managed automatically, so as the cluster grows, its operation and maintenance cost rises.

Data replicas rely on ZooKeeper for synchronization; when the data volume is large, ZooKeeper may become a bottleneck.

If resources are sufficient, Option 1 is recommended, with the primary and secondary replicas of each shard on different nodes, which better achieves read-write separation and load balancing.

If resources are limited, Option 4 can be used: each node hosts two replicas, at the cost of a slightly more complex deployment.

Thank you for reading! That concludes this article on the common ClickHouse cluster deployment architectures. I hope it has been helpful; if you found it useful, feel free to share it.
