2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces ClickHouse SQL optimization techniques in some detail. It has practical reference value, and interested readers are encouraged to read it through.
Most slow SQL queries are caused by high CPU load, high I/O, or by filtering on columns that are not covered by an index. Note: ClickHouse itself does not handle high-concurrency scenarios well; an excessive QPS drives CPU usage up on the ClickHouse server, which in turn makes queries slow.
In these cases, check whether the SQL contains complex operations, whether the volume of data queried is too large, and whether the index on the queried columns is actually effective.
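Before optimizing, it helps to identify which statements are slow in the first place. A minimal sketch using ClickHouse's built-in query log (assuming query logging is enabled, which it is by default):

```sql
-- List the slowest recently finished queries, along with rows read and memory used.
SELECT query_duration_ms, read_rows, memory_usage, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 DAY
ORDER BY query_duration_ms DESC
LIMIT 10
```

Queries that read far more rows than they return are the usual candidates for partition pruning or index changes like the ones below.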
SQL query characteristics: large data volume and a wide partition span.
The table holds more than 800 million rows and is partitioned by p_data_day.
SELECT sn, COUNT(1) AS valueQt
FROM data
WHERE sn = '70A0600018109'
  AND p_day >= '2017-01-01'
  AND p_data_day < '2020-08-13'
GROUP BY sn
The query traverses every partition and takes about 1 second on average to return.
Optimization idea: avoid traversing unnecessary data (partition pruning), and make full use of ClickHouse's index (the GROUP BY / ORDER BY key).
Since queries filter by sn, build a materialized view that redistributes the 800 million rows into 256 partitions keyed by the sn number and device_id (mac_code).
CREATE MATERIALIZED VIEW IF NOT EXISTS data_sn_materialized
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{ck_cluster}/data_sn_materialized', '{replica}')
PARTITION BY sn_sort_key
ORDER BY (sn_sort_key, sn, p_day)
AS SELECT halfMD5(sn) % 256 AS sn_sort_key, sn, p_day, count() AS cnt
FROM data
GROUP BY sn_sort_key, sn, p_day
Query statement: keeping the original inputs and outputs unchanged, the data now returns within 200 ms.
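The article does not show the rewritten query, but as an illustrative sketch it might look like the following. The key point is that sn_sort_key must be recomputed with the same halfMD5 expression used in the view definition, so that ClickHouse can prune all but one of the 256 partitions:

```sql
-- Hypothetical rewrite against the materialized view (not from the original article).
SELECT sn, SUM(cnt) AS valueQt
FROM data_sn_materialized
WHERE sn_sort_key = halfMD5('70A0600018109') % 256
  AND sn = '70A0600018109'
  AND p_day >= '2017-01-01'
  AND p_day < '2020-08-13'
GROUP BY sn
```

SUM(cnt) replaces COUNT(1) because the view already pre-aggregates one row per (sn, p_day).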
SQL query characteristics: large data volume, but only the most recent partition is ever queried.
The table holds more than 1 billion rows; its CREATE statement is as follows.
CREATE TABLE data (
  `data_day` Date,
  `flow_type` UInt32 DEFAULT CAST(0, 'UInt32'),
  ...
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{ck_cluster}/data', '{replica}')
PARTITION BY data_day
ORDER BY (flow_type, data_day)
SETTINGS index_granularity = 8192
Query statement:
SELECT ... FROM data WHERE data_day = '2020-09-11'
We observed that queries always target yesterday's data, and historical data is never used again.
Optimization idea: use ClickHouse's TTL to cap the table's size by expiring rows older than 7 days:
CREATE TABLE dwrt.lc_order_flow (
  `data_day` Date,
  ...
  `flow_type` UInt32 DEFAULT CAST(0, 'UInt32'),
  ...
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{ck_cluster}/data', '{replica}')
PARTITION BY data_day
ORDER BY (data_day, flow_type)
TTL data_day + toIntervalDay(7)
SETTINGS index_granularity = 8192
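To confirm the TTL is actually expiring old data, the active parts can be inspected per partition. An illustrative check (not from the original article; it assumes the dwrt.lc_order_flow table created above):

```sql
-- After TTL merges have run, only partitions from roughly the last 7 days
-- should still have active parts.
SELECT partition, sum(rows) AS row_count
FROM system.parts
WHERE database = 'dwrt' AND table = 'lc_order_flow' AND active
GROUP BY partition
ORDER BY partition
```

Because the table is partitioned by data_day, expired data is removed a whole partition at a time, which is far cheaper than row-level deletes.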
The above is the full content of "What are the ClickHouse SQL optimization techniques?". Thank you for reading, and I hope the content has been helpful.