How to Build a ClickHouse Cluster
This article explains how to build a ClickHouse cluster: preparing the environment, installing the packages, configuring the cluster, creating local and distributed tables, and connecting to it from Python.
ClickHouse is a column-oriented database with a native vectorized execution engine. It does not build on the Hadoop ecosystem for big-data workloads; instead it uses locally attached storage, so its I/O path is free of Hadoop's limitations. Linear scalability and reliability guarantees, together with native support for sharding and replication, make it suitable for large-scale production deployments. It also exposes a SQL interface and ships with a rich set of native clients.
Characteristics of the ClickHouse database:

Fast: ClickHouse outperforms most columnar databases on the market and is roughly 10-1000 times faster than traditional databases. On a 100-million-row dataset it is about 5 times faster than Vertica, 279 times faster than Hive, and 801 times faster than MySQL; on a 1-billion-row dataset it is still about 5 times faster than Vertica, while MySQL and Hive can no longer complete the queries.

Feature-rich: 1. SQL-like queries; 2. a large set of built-in functions (IP conversion, URL parsing, approximate calculation / HyperLogLog, etc.); 3. arrays (Array) and nested data structures (Nested Data Structure); 4. remote replication deployment.
Note that ClickHouse's fast queries are paid for in system resources: keep an eye on the amount of data stored on each node and on each node machine's resources. Aggregation is performed in memory during queries, so the number of concurrent queries should not be too large, otherwise the server can exhaust its resources. A hedged sketch of the relevant limits follows.
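As a sketch of where such limits live (the values are illustrative assumptions, not recommendations from this article): max_memory_usage caps the memory a single query may use and belongs in a users.xml profile, while max_concurrent_queries caps simultaneously running queries and belongs in config.xml.

<!-- users.xml, inside the profile used by your users: memory cap per query (assumed value) -->
<max_memory_usage>10000000000</max_memory_usage>

<!-- config.xml: cap on simultaneously executing queries (assumed value) -->
<max_concurrent_queries>100</max_concurrent_queries>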
Environment configuration
Initialize the environment (all nodes)

# modify the machine's hostname
vi /etc/hostname

# configure hosts
vi /etc/hosts
192.168.143.20 node1
192.168.143.21 node2
192.168.143.22 node3
After the modification, run hostname node1 (node2 and node3 on the respective machines) so the new hostname takes effect without rebooting.
Download and install ClickHouse (all nodes)
There are four main files to download:
clickhouse-client
clickhouse-common-static
clickhouse-server
clickhouse-server-common
rpm -ivh *.rpm

Install zookeeper (any node)

# I chose node1
docker run -d --net host --name zookeeper zookeeper

Configure the cluster (all nodes)
Modify /etc/clickhouse-server/config.xml

Make the server listen on all addresses (::) so the other nodes and remote clients can connect, and point the data and temporary directories at the desired locations (the values used here are /var/lib/clickhouse/ and /home/clickhouse/). A sketch of the corresponding settings follows.
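A minimal sketch of the config.xml fragment, assuming :: is the listen address, /var/lib/clickhouse/ the data path, and /home/clickhouse/ the temporary-file path; the mapping of the two directories to these elements is an assumption.

<!-- listen on all IPv4/IPv6 addresses so other nodes and clients can connect -->
<listen_host>::</listen_host>

<!-- data directory (assumed mapping of the article's value) -->
<path>/var/lib/clickhouse/</path>

<!-- temporary-file directory (assumed mapping; could equally be an alternate data location) -->
<tmp_path>/home/clickhouse/</tmp_path>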
Modify /etc/clickhouse-server/users.xml

Set the per-query memory limit for the default profile (5000000000000), the user's password (xxxx...xxxx is a placeholder for the real hash), the allowed networks (::/0, i.e. any address), and the user's profile and quota (both default). A sketch follows.
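A minimal sketch of the users.xml fragment with these values; the element names follow ClickHouse's standard user configuration and are assumptions on my part, with xxxx...xxxx kept as the placeholder from the article.

<profiles>
    <default>
        <!-- per-query memory limit -->
        <max_memory_usage>5000000000000</max_memory_usage>
    </default>
</profiles>

<users>
    <default>
        <!-- placeholder kept from the article; put the real SHA-256 hash here (or use <password> for plain text) -->
        <password_sha256_hex>xxxx...xxxx</password_sha256_hex>
        <networks>
            <!-- allow connections from any address -->
            <ip>::/0</ip>
        </networks>
        <profile>default</profile>
        <quota>default</quota>
    </default>
</users>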
Add the configuration file /etc/metrika.xml

This file defines the cluster: three shards with one replica each on node1, node2 and node3 (port 9000, user root, password 123456, internal_replication enabled), the ZooKeeper server on node1 port 2181, networks open to ::/0, the replica macro for the current node (node1 here), and compression settings (min_part_size 10000000000, min_part_size_ratio 0.01, method lz4). A sketch of the full file follows.
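A sketch of /etc/metrika.xml assembled from those values, laid out in the conventional structure of this file; the cluster name test_cluster matches the one used by the Distributed table below, and the macros replica value should be changed to node2/node3 on the other machines.

<yandex>
    <clickhouse_remote_servers>
        <test_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node1</host>
                    <port>9000</port>
                    <user>root</user>
                    <password>123456</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node2</host>
                    <port>9000</port>
                    <user>root</user>
                    <password>123456</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node3</host>
                    <port>9000</port>
                    <user>root</user>
                    <password>123456</password>
                </replica>
            </shard>
        </test_cluster>
    </clickhouse_remote_servers>

    <zookeeper-servers>
        <node index="1">
            <host>node1</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

    <macros>
        <!-- change to node2 / node3 on the other machines -->
        <replica>node1</replica>
    </macros>

    <networks>
        <ip>::/0</ip>
    </networks>

    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>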
Restart the clickhouse service
service clickhouse-server restart

# if the restart is not successful, start the server manually:
nohup /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml &

Create the data tables (all nodes)
Use a visualization tool to connect to each node and create a MergeTree table on it.
create database test;

create table test.data (country String, province String, value String)
engine = MergeTree()
partition by (country, province)
order by value;

Create the distributed table (on node1)

create table test.mo as test.data
ENGINE = Distributed(test_cluster, test, data, rand());

Connect to ClickHouse using Python
Install clickhouse-driver
pip install clickhouse-driver
Run the following script:
from clickhouse_driver import Client

# connect to the node on which the distributed table was created
client = Client('192.168.143.20', user='root', password='123456', database='test')
print(client.execute('select count(*) from mo'))

This concludes the walkthrough of building a ClickHouse cluster. Pairing the theory above with hands-on practice is the best way to learn, so give it a try.