Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use SQL in ClickHouse performance improvement

2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

Today, I will talk to you about how to improve the performance of ClickHouse in the use of SQL, many people may not understand, in order to make you better understand, the editor summed up the following, I hope you can get something according to this article.

1. Don't use select *

Counterexample:

Select * from app.user_model

Positive example:

Select login_id,name,sex from app.user_model reason: querying only the required fields can reduce disk io and network io, and improve query performance by 2. 5%. Do not construct virtual columns on large result sets

Counterexample:

Select id, pv, uv, pv/uv rate from app.scene_model

Positive example:

Select id, pv, uv from app.scene_model reason: virtual columns consume a lot of resources and performance, so the ratio is constructed when it is displayed at the front end after getting the pv uv. 3. Do not group or remove duplicates on unique columns or large cardinality columns

Counterexample:

Select id, count (1) cn from app.user_model group by id

Positive example:

Select id from app.user_model reason: too much cardinality consumes too much io and memory. 4. Query a specified range of data as needed (where)

Counterexample:

Select login_id,name,sex from app.user_model

Positive example:

Select login_id,name,sex from app.user_model where create_time > '2020-03-30' reason: reduce disk io and network io, improve query performance 5. The small table comes after the associated query (large table join small table)

Counterexample:

Select login_id,name,sex,a.scene_name from app.scene_model a join app.user_model b on a.create_user=b.id

Positive example:

Select login_id,name,sex,a.scene_name from app.user_model a join app.scene_model b on a.id=b.create_user reason:

Whether Left Join, Right Join, or Inner Join, always take each record in the right table to the left table to find out if the record exists.

6. Use uniqCombined instead of distinct

Counterexample:

SELECT count (DISTINCT create_user) from app.scene_model

Positive example:

SELECT uniqCombined (create_user) from app.scene_model reason: uniqCombined optimizes deduplication and improves query performance tenfold through approximate deduplication. Limit the number of entries returned by using limit

Counterexample:

Select id,scene_name,code,pv from app.scene_model order by pv desc

Positive example:

Select id,scene_name,code,pv from app.scene_model order by pv desc limit 100 reasons:

Use limit to return the specified number of result sets without scanning down, which greatly improves the efficiency of the query.

8. Try not to use string types

Counterexample:

CREATE TABLE scene_model (id String, scene_name String, pv String, create_time String) ENGINE =.

Positive example:

CREATE TABLE scene_model (id String, scene_name String, pv Int32, create_time Date) ENGINE =. Reason: time types will eventually be converted to numeric types for processing, and numeric types are much better than strings in terms of execution efficiency and storage.

9. Specify the query partition to get the necessary data

Suppose the partition field is day

Counterexample:

Select type,count (1) from app.user_model group by type

Positive example:

Select type,count (1) from app.user_model where day = '2020-03-30' group by type reason: specifying partition fields reduces the number of files scanned by the underlying database and improves query performance by 10. 5%. Filter unnecessary fields before grouping

Counterexample:

Select type,count (1) from app.user_model group by type

Positive example:

Select type,count (1) from app.user_model where type ='1' or type ='2' group by type reason: by limiting the number of result sets before grouping, query performance can generally prompt tens or even hundreds of times after reading the above, do you have any further understanding of how to use SQL in ClickHouse performance improvement? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report