Linux system: building a ClickHouse column-store database on CentOS 7

2025-07-19 Update. From: SLTechnology News&Howtos. Shulou (Shulou.com) 06/02 report.

I. Introduction to ClickHouse

1. Basic introduction

ClickHouse is Yandex's open-source database for data analysis. It is well suited to streaming or batch-loading time-series data. It should not be used as a general-purpose database; it is a distributed, real-time analytics platform for querying massive datasets at very high speed. Aggregate queries (such as GROUP BY) are particularly fast.

Download repository: https://repo.yandex.ru/clickhouse
Chinese documentation: https://clickhouse.yandex/docs/zh/

2. Database features

(1) Column-oriented database

A column-oriented database stores data by column rather than by row. This architecture is mainly suited to batch data processing and real-time queries.
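
To make the difference concrete, here is a minimal Python sketch (not ClickHouse code) that lays out the same three rows both ways. The sample data and the names rows, columns, total_amount_row_store and total_amount_column_store are made up for illustration.

```python
# Hypothetical sample data: each tuple is one row (id, city, amount).
rows = [
    (1, "Paris", 10),
    (2, "Berlin", 20),
    (3, "Paris", 30),
]

# Column-oriented layout: all values of one column are stored together.
columns = {
    "id": [r[0] for r in rows],
    "city": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_row_store(table):
    """Sum one field; every complete row has to be visited."""
    return sum(row[2] for row in table)


def total_amount_column_store(table):
    """Sum one field; only the 'amount' column is read."""
    return sum(table["amount"])
```

Both functions return the same total. The difference is how much data each one has to touch, which is why an analytical query that reads only a few columns benefits from the column layout.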

(2) Data compression

Some column-oriented database management systems do not use data compression. However, data compression plays a key role in achieving excellent performance.
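
As a rough illustration of why column data compresses well, the Python sketch below compresses a single column whose values repeat. It is only a sketch under stated assumptions: the column contents are invented, and zlib from Python's standard library is a stand-in, not the codec ClickHouse itself uses.

```python
import zlib

# Hypothetical column: 1,000 rows that all share the same date value.
create_day_column = ["2019-06-02"] * 1000
raw = "\n".join(create_day_column).encode("utf-8")

# zlib is a stand-in for illustration; it is not ClickHouse's codec.
compressed = zlib.compress(raw)


def compression_ratio(original: bytes, packed: bytes) -> float:
    """Return how many times smaller the packed bytes are."""
    return len(original) / len(packed)


print(len(raw), len(compressed), compression_ratio(raw, compressed))
```

Values from one column tend to resemble each other, so storing them next to each other gives a compressor long repeated runs to work with.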

(3) Disk storage of data

Many column-oriented databases can only work in memory, which pushes hardware budgets higher than is actually necessary. ClickHouse is designed to work on conventional disks, which gives a lower storage cost per GB.

(4) Multi-core parallel processing

Large queries can be parallelized in ClickHouse in a natural way to use all the resources available on the current server.

(5) Multi-server distributed processing

In ClickHouse, data can be stored on different shards. Each shard consists of a group of replicas used for fault tolerance, and a query can be processed on all shards in parallel.
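
The idea of sharded processing can be sketched in a few lines of Python. This is a simplified model, not ClickHouse's implementation: the modulo routing, NUM_SHARDS, the sample rows and all function names are assumptions for illustration, replicas are not modelled, and the shards are visited in a plain loop rather than in parallel.

```python
from collections import Counter

NUM_SHARDS = 3  # hypothetical cluster size

# Hypothetical rows: (id, city).
sample_rows = [(1, "Paris"), (2, "Berlin"), (3, "Paris"), (4, "Rome")]


def shard_for(row_id: int) -> int:
    """Route a row to a shard with a simple modulo on its id."""
    return row_id % NUM_SHARDS


def distribute(rows):
    """Split rows into NUM_SHARDS lists, one per shard."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for row in rows:
        shards[shard_for(row[0])].append(row)
    return shards


def count_by_city(rows):
    """Partial aggregation, as each shard would run it on its own rows."""
    return Counter(city for _, city in rows)


def distributed_count(rows):
    """Merge the partial results of all shards into one final result."""
    total = Counter()
    for shard_rows in distribute(rows):
        total += count_by_city(shard_rows)
    return dict(total)
```

Each shard only aggregates its own rows, and the final step merges the small partial results, which is what allows the per-shard work to run side by side.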

(6) SQL and index support

ClickHouse supports an SQL-based query language that is largely compatible with the SQL standard. Supported queries include GROUP BY, ORDER BY, IN, JOIN, and non-correlated subqueries. Window functions and correlated subqueries are not supported. Because data is sorted by primary key, ClickHouse can find a specific value or a range of values with low latency, on the order of tens of milliseconds.
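
Why sorting by primary key makes range lookups cheap can be shown with a small Python sketch of a sparse index: one index entry per block of rows, and a binary search to decide which blocks to scan. This is a simplified model under stated assumptions, not ClickHouse's actual data structures. GRANULARITY is set to 4 for the demo (the CREATE TABLE statement in this article passes 8192), and all names are hypothetical.

```python
import bisect

GRANULARITY = 4  # rows per block in this demo


def build_sparse_index(sorted_keys):
    """Keep only the first key of every block of GRANULARITY rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), GRANULARITY)]


def find_rows(sorted_keys, index, low, high):
    """Return keys in [low, high], scanning only blocks that may hold them."""
    # Start one block before the first index entry >= low, because that
    # earlier block can still end with keys equal to or above low.
    first = max(bisect.bisect_left(index, low) - 1, 0)
    # Blocks whose first key is greater than high cannot match.
    last = bisect.bisect_right(index, high)
    start = first * GRANULARITY
    stop = min(last * GRANULARITY, len(sorted_keys))
    return [k for k in sorted_keys[start:stop] if low <= k <= high]


keys = list(range(0, 40, 2))      # sorted primary-key values 0, 2, ..., 38
index = build_sparse_index(keys)  # one entry per block
print(find_rows(keys, index, 10, 18))
```

The index stays small because it holds one entry per block rather than one per row, and a range query reads only the few blocks that the binary search selects.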

(7) Vector engine

To use the CPU efficiently, data is not only stored by column but also processed in vectors (chunks of columns).

(8) Real-time data update

ClickHouse supports tables with a primary key. So that range queries on the primary key are fast, data is sorted incrementally using MergeTree. As a result, data can be written to a table continuously and efficiently, and no locks are taken while writing.

II. Installation process under Linux

1. Add the package repository

curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo os=centos dist=7 bash

2. View the available packages

sudo yum list 'clickhouse*'

3. Install the service

sudo yum install -y clickhouse-server clickhouse-client

4. View the installed packages

sudo yum list installed 'clickhouse*'

Console output

Installed Packages
clickhouse-client.noarch
clickhouse-common-static.x86_64
clickhouse-server.noarch

5. View the configuration

cd /etc/clickhouse-server/
vim config.xml

Data directory: /var/lib/clickhouse/
Temporary directory: /var/lib/clickhouse/tmp/
Log directory: /var/log/clickhouse-server
HTTP port: 8123
TCP port: 9000
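
Once the server is running, the HTTP port listed above accepts queries passed in a query URL parameter. The Python sketch below only builds such a URL and does not send anything. The host name localhost and the helper name http_query_url are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def http_query_url(sql, host="localhost", port=8123):
    """Build a URL for ClickHouse's HTTP interface; nothing is sent."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    params = urlencode({"query": sql}, quote_via=quote)
    return f"http://{host}:{port}/?{params}"


print(http_query_url("SELECT 1"))
```

The resulting URL can be opened with any HTTP client, which is a quick way to check that the server is reachable on port 8123.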

6. Configure access permissions

Remove the comments configured below from the config.xml file.

7. Start the service

/etc/rc.d/init.d/clickhouse-server start

8. View the service

ps -aux | grep clickhouse

III. Basic operation

1. Create table statement

CREATE TABLE cs_user_info (
  `id` UInt64,
  `user_name` String,
  `pass_word` String,
  `phone` String,
  `email` String,
  `create_day` Date DEFAULT CAST(now(), 'Date')
) ENGINE = MergeTree(create_day, intHash32(id), 8192)

Note: MergeTree is the officially recommended engine.

The most powerful table engines in ClickHouse are MergeTree and the other engines in its family (*MergeTree). The basic idea of the MergeTree family is as follows: when you have a very large amount of data to insert into a table, you write it efficiently in batches, and the written pieces are merged in the background according to certain rules. This is much more efficient than continually rewriting the stored data on every insert.
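
The write-then-merge strategy can be modelled in a few lines of Python. This is a toy sketch of the idea only, not how ClickHouse stores data on disk: parts are plain sorted lists, and the function names insert_batch and merge_parts are hypothetical.

```python
import heapq


def insert_batch(parts, rows):
    """Each batch becomes its own sorted part; existing parts are untouched."""
    parts.append(sorted(rows))


def merge_parts(parts):
    """Background-style merge: combine all sorted parts into one sorted part."""
    return [list(heapq.merge(*parts))]


parts = []
insert_batch(parts, [3, 1])
insert_batch(parts, [2, 5, 4])
print(parts)               # two small sorted parts
print(merge_parts(parts))  # one larger sorted part
```

An insert only appends a new part, so writing is cheap, and the more expensive work of combining parts is deferred to a later merge step.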

2. Batch write

INSERT INTO cs_user_info (id, user_name, pass_word, phone, email) VALUES
(1, ..., ..., '13923456789', ...),
(2, ..., ..., ..., ...),
(3, ..., '345', ..., ...);

3. Query statements

SELECT * FROM cs_user_info;
SELECT * FROM cs_user_info WHERE user_name='smile' AND pass_word='234';
SELECT * FROM cs_user_info WHERE id IN (1, 2);
SELECT * FROM cs_user_info WHERE id=1 OR id=2 OR id=3;
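
Because a batch write is a single INSERT with several value tuples, the statement can be assembled programmatically. The Python sketch below builds such a statement as a string and sends nothing to a server. The helper names sql_literal and batch_insert_sql are hypothetical, and the sample rows are invented placeholders, not the article's original data.

```python
def sql_literal(value):
    """Render an int or str as a SQL literal, backslash-escaping quotes."""
    if isinstance(value, int):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def batch_insert_sql(table, columns, rows):
    """Build one INSERT statement that carries every row in the batch."""
    values = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"


# Hypothetical sample rows for illustration only.
demo_rows = [
    (1, "user_a", "pw_a", "10000000001", "a@example.com"),
    (2, "user_b", "pw_b", "10000000002", "b@example.com"),
]
print(batch_insert_sql(
    "cs_user_info",
    ["id", "user_name", "pass_word", "phone", "email"],
    demo_rows,
))
```

Sending many rows in one statement matches the batch-oriented write pattern that the MergeTree engine is designed for.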

The query statements are very similar to those used with MySQL.

Source code address

GitHub address: https://github.com/cicadasmile/linux-system-base
GitEE address: https://gitee.com/cicadasmile/linux-system-base
