2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces ClickHouse SQL optimization techniques in some detail. It has practical reference value, and interested readers are encouraged to read it through.
Most slow SQL queries are caused by high CPU load, high I/O, or by filtering on columns that are not covered by an index. Note: ClickHouse itself does not handle high-concurrency scenarios well; an excessive QPS drives CPU usage up on the ClickHouse server, which in turn makes queries slow.
In these cases, check whether the SQL contains complex operations, whether the volume of data queried is too large, and whether the index on the queried columns is actually effective.
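Before optimizing, it helps to identify which statements are slow in the first place. A minimal sketch using ClickHouse's built-in query log (assuming query logging is enabled, which it is by default):

```sql
-- List the slowest recently finished queries, along with rows read and memory used.
SELECT query_duration_ms, read_rows, memory_usage, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 DAY
ORDER BY query_duration_ms DESC
LIMIT 10
```

Queries that read far more rows than they return are the usual candidates for partition pruning or index changes like the ones below.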
SQL query characteristics: large data volume and a wide partition span.
The table holds more than 800 million rows and is partitioned by p_data_day.
SELECT sn, COUNT(1) AS valueQt
FROM data
WHERE sn = '70A0600018109'
  AND p_day >= '2017-01-01'
  AND p_data_day < '2020-08-13'
GROUP BY sn
The query traverses every partition and takes about 1 second on average to return.
Optimization idea: avoid traversing unnecessary data (partition pruning), and make full use of ClickHouse's index (the GROUP BY / ORDER BY key).
Since queries filter by sn, build a materialized view that redistributes the 800 million rows into 256 partitions keyed by the sn number and device_id (mac_code).
CREATE MATERIALIZED VIEW IF NOT EXISTS data_sn_materialized
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{ck_cluster}/data_sn_materialized', '{replica}')
PARTITION BY sn_sort_key
ORDER BY (sn_sort_key, sn, p_day)
AS SELECT halfMD5(sn) % 256 AS sn_sort_key, sn, p_day, count() AS cnt
FROM data
GROUP BY sn_sort_key, sn, p_day
Query statement: keeping the original inputs and outputs unchanged, the data now returns within 200 ms.
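The article does not show the rewritten query, but as an illustrative sketch it might look like the following. The key point is that sn_sort_key must be recomputed with the same halfMD5 expression used in the view definition, so that ClickHouse can prune all but one of the 256 partitions:

```sql
-- Hypothetical rewrite against the materialized view (not from the original article).
SELECT sn, SUM(cnt) AS valueQt
FROM data_sn_materialized
WHERE sn_sort_key = halfMD5('70A0600018109') % 256
  AND sn = '70A0600018109'
  AND p_day >= '2017-01-01'
  AND p_day < '2020-08-13'
GROUP BY sn
```

SUM(cnt) replaces COUNT(1) because the view already pre-aggregates one row per (sn, p_day).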
SQL query characteristics: large data volume, but only the most recent partition is ever queried.
The table holds more than 1 billion rows; its CREATE statement is as follows.
CREATE TABLE data (
  `data_day` Date,
  `flow_type` UInt32 DEFAULT CAST(0, 'UInt32'),
  ...
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{ck_cluster}/data', '{replica}')
PARTITION BY data_day
ORDER BY (flow_type, data_day)
SETTINGS index_granularity = 8192
Query statement:
SELECT ... FROM data WHERE data_day = '2020-09-11'
We observed that queries always target yesterday's data, and historical data is never used again.
Optimization idea: use ClickHouse's TTL to cap the table's size by expiring rows older than 7 days:
CREATE TABLE dwrt.lc_order_flow (
  `data_day` Date,
  ...
  `flow_type` UInt32 DEFAULT CAST(0, 'UInt32'),
  ...
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{ck_cluster}/data', '{replica}')
PARTITION BY data_day
ORDER BY (data_day, flow_type)
TTL data_day + toIntervalDay(7)
SETTINGS index_granularity = 8192
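To confirm the TTL is actually expiring old data, the active parts can be inspected per partition. An illustrative check (not from the original article; it assumes the dwrt.lc_order_flow table created above):

```sql
-- After TTL merges have run, only partitions from roughly the last 7 days
-- should still have active parts.
SELECT partition, sum(rows) AS row_count
FROM system.parts
WHERE database = 'dwrt' AND table = 'lc_order_flow' AND active
GROUP BY partition
ORDER BY partition
```

Because the table is partitioned by data_day, expired data is removed a whole partition at a time, which is far cheaper than row-level deletes.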
The above is the full content of "What are the ClickHouse SQL optimization techniques?". Thank you for reading, and I hope the content has been helpful.