Shulou (Shulou.com), SLTechnology News&Howtos, Database. Updated 2025-01-16.
This article walks through the mistakes that often occur when writing SQL queries, with a short example of each.
SQL is widely used for data analysis and data extraction. It is easy to pick up and well regarded in the industry.
Although SQL is easy to start writing, it is also easy to get wrong.
Here are 5 mistakes that are often made when writing SQL queries.
The examples are short and may seem simple, but in larger queries these errors are not obvious at a glance. Some of the examples are specific to AWS Redshift, while others apply to other SQL databases (Postgres, MySQL, and so on). The examples can be run on a local database or online using SQLFiddle.
The sample SQL queries are available for download.
Setup
Create two temporary tables with several entries to help with the example.
Sales table
The table contains sales entries with timestamps, products, prices, and so on. Note that the key column is unique and the values in other columns can be repeated (for example, the ts column).
DROP TABLE IF EXISTS sales;
CREATE TEMPORARY TABLE sales (
    key varchar(6),
    ts timestamp,
    product integer,
    completed boolean,
    price float
);
INSERT INTO sales VALUES
    ('sale_1', '2019-11-08 00:00', 0, TRUE, 1.1),
    ('sale_2', '2019-11-08 01:00', 0, FALSE, 1.2),
    ('sale_3', '2019-11-08 01:00', 0, TRUE, 1.3),
    ('sale_4', '2019-11-08 01:00', 1, FALSE, 1.4),
    ('sale_5', '2019-11-08 02:00', 1, TRUE, 1.5),
    ('sale_6', '2019-11-08 02:00', 1, TRUE, 1.5);
SELECT * FROM sales;
Hourly delay table
The table contains the hourly delay time for a given day. Note that the ts column is unique in the following table.
DROP TABLE IF EXISTS hourly_delay;
CREATE TEMPORARY TABLE hourly_delay (
    ts timestamp,
    delay float
);
-- The delay value for 02:00 is missing from the source; 70.3 is a placeholder.
INSERT INTO hourly_delay VALUES
    ('2019-11-08 00:00', 80.1),
    ('2019-11-08 01:00', 100.2),
    ('2019-11-08 02:00', 70.3);
SELECT * FROM hourly_delay;
1. Ordering by identical timestamps
Retrieve the most recent price of each product:
SELECT price
FROM (
    SELECT price,
           row_number() OVER (PARTITION BY product ORDER BY ts DESC) AS ix
    FROM sales
) AS q1
WHERE ix = 1;
The problem with the above query is that multiple sales can share the same timestamp, so consecutive runs of the query on the same data may return different results. Here, product 0 is sold twice at 01:00 on 2019-11-08, at prices of 1.2 and 1.3.
This query is fixed in mistake 5.
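One way to spot this problem before relying on row_number() is to look for rows that share both the partition column and the ordering column. The following is a minimal sketch, not part of the original article: it uses Python's built-in sqlite3 module as a stand-in for Redshift or Postgres, and the 00:00 timestamp for sale_1 is an assumption.

```python
import sqlite3

# In-memory SQLite stands in for Redshift/Postgres (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (key TEXT, ts TEXT, product INTEGER, completed INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("sale_1", "2019-11-08 00:00", 0, 1, 1.1),  # 00:00 is assumed
        ("sale_2", "2019-11-08 01:00", 0, 0, 1.2),
        ("sale_3", "2019-11-08 01:00", 0, 1, 1.3),
        ("sale_4", "2019-11-08 01:00", 1, 0, 1.4),
        ("sale_5", "2019-11-08 02:00", 1, 1, 1.5),
        ("sale_6", "2019-11-08 02:00", 1, 1, 1.5),
    ],
)

# Any (product, ts) pair that occurs more than once makes
# ORDER BY ts ambiguous inside PARTITION BY product.
ties = conn.execute(
    """
    SELECT product, ts, count(*) AS n
    FROM sales
    GROUP BY product, ts
    HAVING count(*) > 1
    ORDER BY product, ts
    """
).fetchall()
print(ties)
```

If this check returns any rows, the ORDER BY in the window needs an additional unique column to make the ranking deterministic.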
2. Calculating an average with a condition
Calculate the average price of completed sales. The expected value is (1.1 + 1.3 + 1.5 + 1.5) / 4, or 1.35.
SELECT avg(price)
FROM (
    SELECT CASE WHEN completed = TRUE THEN price ELSE 0 END AS price
    FROM sales
) AS q1;
When running the query, the value is 0.9. Why? Because the calculation is (1.1 + 0 + 1.3 + 0 + 1.5 + 1.5) / 6, which is 0.9. The error in the query is assigning 0 to rows that should not be included at all. NULL should be used instead of 0, because avg ignores NULL values.
SELECT avg(price)
FROM (
    SELECT CASE WHEN completed = TRUE THEN price ELSE NULL END AS price
    FROM sales
) AS q1;
Now the output is 1.35, as expected.
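The difference between the two versions can be checked directly. This is a hedged sketch using Python's built-in sqlite3 module as a stand-in database (an assumption; the article targets Redshift and Postgres), with completed stored as 1/0 instead of a boolean. The WHERE variant is an equivalent alternative added for comparison, not something the article uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (completed INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 1.1), (0, 1.2), (1, 1.3), (0, 1.4), (1, 1.5), (1, 1.5)],
)

# ELSE 0 keeps the excluded rows in the denominator: 5.4 / 6.
avg_with_zero = conn.execute(
    "SELECT avg(CASE WHEN completed = 1 THEN price ELSE 0 END) FROM sales"
).fetchone()[0]

# ELSE NULL drops them, because avg() skips NULL values: 5.4 / 4.
avg_with_null = conn.execute(
    "SELECT avg(CASE WHEN completed = 1 THEN price ELSE NULL END) FROM sales"
).fetchone()[0]

# Filtering the rows first gives the same result as the NULL version.
avg_filtered = conn.execute(
    "SELECT avg(price) FROM sales WHERE completed = 1"
).fetchone()[0]

print(round(avg_with_zero, 2), round(avg_with_null, 2), round(avg_filtered, 2))
```

The CASE form is the one to prefer when the same query also needs aggregates over all rows, since a WHERE clause would filter those too.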
3. Calculating the average of an integer column
Calculate the average of the product column, which contains integers.
SELECT avg(product) FROM sales;
The product column contains three 0s and three 1s, so the expected average is 0.5. Most databases, such as recent versions of Postgres, return 0.5, but Redshift returns 0 because it does not automatically cast the product column to float. The column has to be cast to float explicitly:
SELECT avg(product::FLOAT) FROM sales;
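The truncation can be illustrated without a Redshift cluster. This sketch rests on stated assumptions: SQLite (via Python's sqlite3) always returns a float from avg(), so it cannot reproduce the Redshift result directly; instead the average is written out as sum / count, which SQLite evaluates with integer division when both operands are integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?)", [(0,), (0,), (0,), (1,), (1,), (1,)]
)

# Integer sum divided by integer count truncates: 3 / 6 -> 0.
truncated = conn.execute(
    "SELECT sum(product) / count(*) FROM sales"
).fetchone()[0]

# Casting to a floating-point type first keeps the fraction: 3.0 / 6 -> 0.5.
exact = conn.execute(
    "SELECT sum(CAST(product AS REAL)) / count(*) FROM sales"
).fetchone()[0]

print(truncated, exact)
```

CAST(product AS REAL) plays the role of product::FLOAT here, since SQLite does not support the :: cast syntax.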
4. Inner join
Suppose you want to sum the hourly delays for each day and calculate the average sale price for each day.
SELECT t2.ts::DATE, sum(t2.delay), avg(t1.price)
FROM hourly_delay AS t2
INNER JOIN sales AS t1 ON t1.ts = t2.ts
GROUP BY t2.ts::DATE;
The result is wrong! The join repeats each hourly_delay row once for every matching sale, so the same delay is summed several times. This happens because the join is on the timestamp, which is unique in the hourly_delay table but repeats in the sales table.
To fix this problem, calculate the statistics for each table in a separate subquery, and then join the summaries. This makes the join column unique in both inputs.
SELECT t1.ts, daily_delay, avg_price
FROM (
    SELECT t2.ts::DATE AS ts, sum(t2.delay) AS daily_delay
    FROM hourly_delay AS t2
    GROUP BY t2.ts::DATE
) AS t2
INNER JOIN (
    SELECT ts::DATE AS ts, avg(price) AS avg_price
    FROM sales
    GROUP BY ts::DATE
) AS t1 ON t1.ts = t2.ts;
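The row duplication caused by the join can be made visible by comparing row counts and sums before and after joining. This is a rough sketch under several assumptions: Python's sqlite3 stands in for the real database, the 00:00 timestamp for sale_1 is assumed, and the 70.3 delay for 02:00 is a placeholder value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (key TEXT, ts TEXT, price REAL)")
conn.execute("CREATE TABLE hourly_delay (ts TEXT, delay REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("sale_1", "2019-11-08 00:00", 1.1),  # 00:00 is assumed
        ("sale_2", "2019-11-08 01:00", 1.2),
        ("sale_3", "2019-11-08 01:00", 1.3),
        ("sale_4", "2019-11-08 01:00", 1.4),
        ("sale_5", "2019-11-08 02:00", 1.5),
        ("sale_6", "2019-11-08 02:00", 1.5),
    ],
)
conn.executemany(
    "INSERT INTO hourly_delay VALUES (?, ?)",
    [
        ("2019-11-08 00:00", 80.1),
        ("2019-11-08 01:00", 100.2),
        ("2019-11-08 02:00", 70.3),  # placeholder value
    ],
)

join_sql = "FROM hourly_delay AS t2 INNER JOIN sales AS t1 ON t1.ts = t2.ts"

hourly_rows = conn.execute("SELECT count(*) FROM hourly_delay").fetchone()[0]
joined_rows = conn.execute(f"SELECT count(*) {join_sql}").fetchone()[0]

# Summing after the join counts the 01:00 delay three times and the 02:00 delay twice.
naive_sum = conn.execute(f"SELECT sum(t2.delay) {join_sql}").fetchone()[0]
true_sum = conn.execute("SELECT sum(delay) FROM hourly_delay").fetchone()[0]

print(hourly_rows, joined_rows, round(naive_sum, 1), round(true_sum, 1))
```

A joined row count larger than the row count of the table being summed is a quick signal that the aggregate will be inflated.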
5. Adding a column to ORDER BY
The remedy for the first mistake looks obvious: add the unique key column to the ORDER BY so that the query returns the same result every time it runs on the same data. A quick fix:
SELECT price
FROM (
    SELECT price,
           row_number() OVER (PARTITION BY product ORDER BY ts, key DESC) AS ix
    FROM sales
) AS q1
WHERE ix = 1;
Why is the query result different from the last run? During the "quick fix", the key column was placed in the wrong position in the ORDER BY. It should come after the DESC keyword, not before it. DESC applies only to the column it follows, so ts is now sorted in ascending order and the query returns the first sale of each product, not the last one. Make another correction.
SELECT product, price
FROM (
    SELECT product,
           price,
           row_number() OVER (PARTITION BY product ORDER BY ts DESC, key) AS ix
    FROM sales
) AS q1
WHERE ix = 1;
This fix makes the results repeatable.
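The two orderings can be compared side by side. This is a hedged sketch: Python's sqlite3 stands in for Redshift or Postgres, the 00:00 timestamp for sale_1 is an assumption, and the helper name first_ranked is hypothetical. It requires an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (key TEXT, ts TEXT, product INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("sale_1", "2019-11-08 00:00", 0, 1.1),  # 00:00 is assumed
        ("sale_2", "2019-11-08 01:00", 0, 1.2),
        ("sale_3", "2019-11-08 01:00", 0, 1.3),
        ("sale_4", "2019-11-08 01:00", 1, 1.4),
        ("sale_5", "2019-11-08 02:00", 1, 1.5),
        ("sale_6", "2019-11-08 02:00", 1, 1.5),
    ],
)


def first_ranked(order_by):
    """Return (product, price) of the top-ranked row per product (hypothetical helper)."""
    return conn.execute(
        f"""
        SELECT product, price
        FROM (
            SELECT product, price,
                   row_number() OVER (PARTITION BY product ORDER BY {order_by}) AS ix
            FROM sales
        ) AS q1
        WHERE ix = 1
        ORDER BY product
        """
    ).fetchall()


# DESC binds only to key, so ts is ascending: the earliest sale wins.
misplaced = first_ranked("ts, key DESC")
# Latest timestamp first, with key as a deterministic tie-breaker.
corrected = first_ranked("ts DESC, key")

print(misplaced)
print(corrected)
```

With key ascending as the tie-breaker, the row with the smallest key among the latest timestamps is selected; using key DESC instead would pick the largest, and either choice is deterministic.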