In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Structured query language (SQL) is an indispensable skill in the data mining and analysis industry, and generally speaking, it is relatively easy to learn. For SQL, writing a query is only the first step, and it is most important to ensure that the query is efficient and suitable for your database operations. This tutorial will provide you with steps to evaluate your query statements.
First of all, we should understand the importance of learning SQL for data mining and analysis.
Next, you should first learn the processing and execution of SQL query statements so that you can better understand how important it is to write high-quality queries. Specifically, you should understand how queries are parsed, rewritten, optimized, and finally evaluated
Once you have mastered the above, you not only need to review the query reverse model that beginners use when writing queries, but also need to know about alternatives and solutions that may go wrong. At the same time, you should also learn more about set-based programming methods in query work.
You also need to focus on the reverse model in terms of performance. In addition to manually improving SQL queries, you also need to analyze your queries in a more structured and in-depth manner so that you can use other tools to complete the entire query.
Before executing the query, you also need to have a deeper understanding of the time complexity of executing the query plan.
Finally, you should know how to further adjust your query statement.
Why learn SQL?
Looking for a job in the data mining and analysis industry, SQL is one of the most needed skills, whether it's applying for a data analysis job, a data engine job, a data mining analysis job, or other jobs. In the 2016 compensation report for data Science Practitioners published by O'Reilly, 70 per cent of respondents confirmed this, saying they needed to use SQL in a professional environment. In addition, SQL is far better than programming languages such as R (57%) and Python (54%) in this survey. So in the field of data mining and analysis, SQL is a necessary skill.
Let's analyze why SQL has been around for so long since it was developed in the early 1970s:
First, companies basically store data in relational database management systems (RDBMS) or relational data flow management systems (RDSMS), so they need to use SQL to achieve access. SQL is a universal data language, and you can use SQL to interact with almost any other database, and even set up your own database locally!
Second, only a small number of SQL implementations do not follow the standards and are not compatible between vendors. Therefore, understanding the SQL standard is a necessary requirement to gain a foothold in the data mining and analysis industry.
The most important thing is that SQL is also accepted by newer technologies, such as Hive or Spark SQL. Hive is a SQL-like query language interface for querying and managing large datasets; Spark SQL can be used to execute SQL queries.
In short, here's why you should learn this query language:
Even for beginners, SQL is easy to learn. The learning curve is smooth, and it takes little time to write an SQ query.
SQL follows the principle of "learn once, available anytime, anywhere", so it's cost-effective to spend time learning SQL!
SQL is an excellent complement to programming languages; in some cases, writing queries even takes precedence over writing code!
...
SQL processing and query execution
To improve the performance of SQL queries, you first need to know what happens internally when you run the query.
The process of query execution is as follows:
First of all, parse the query into a "parsing tree" and analyze whether the query meets the syntax and semantic requirements. The parser will create an internal representation of the input query and pass this output to the rewriting engine.
Then, the task of the optimizer is to find the best execution or query plan for a given query. The execution plan accurately defines the algorithm used for each operation and how to coordinate the execution of the operation.
Finally, in order to find the best execution plan, the optimizer enumerates all possible execution plans and determines the quality or cost of each plan in order to obtain information about the current database status, and finally select the best execution plan. Because the query optimizer may not be perfect, database users and administrators sometimes need to manually review and adjust the plans generated by the optimizer for better performance.
Now you know what a good execution plan is.
As we learned earlier, the cost quality of the plan plays an important role. More specifically, factors such as the number of disk I / Os required to evaluate the plan, the planned CPU cost, and the overall response time and total execution time of the database client are critical. This is the concept of time complexity. We will continue to learn about it later.
Next, the selected query plan is executed, evaluated by the system's execution engine, and the query results are returned.
Write a SQL query
It needs to be further explained that the garbage collection principle (GIGO) is originally expressed in query processing and execution: the person who formulates the query also determines the performance of the SQL query.
This means that when writing a query, something can be done at the same time. As described at the beginning of the article, there are two criteria for writing a query: first, the written query needs to meet certain criteria, and second, it should also deal with performance problems that can occur in the query.
In general, there are four clauses and keywords that make it easy for beginners to consider performance issues:
WHERE clause
INNER JOIN and LEFT JOIN keywords
HAVING clause
Although this approach is simple and naive, these methods are a good guide for a beginner. These are also the places where mistakes are easy to occur when you first start writing, and these errors are also difficult to find.
At the same time, if you want to improve performance and make it meaningful, you can't leave the context: when considering SQL performance, you can't arbitrarily assume that the above clauses and keywords are not good. Clauses that use WHERE or HAVING can also be good query statements.
Use the following sections to learn more about reverse models and alternatives when writing queries, and use these tips and techniques as guidance. How to rewrite the query and whether it needs to be rewritten depends on the amount of data, the number of times the database and the query need to be executed, and so on. It all depends on your query goal, it is very important to have some knowledge about the data in advance!
1. Retrieve only the data you need
When writing SQL queries, it is not always better to have more data. Therefore, you need to be careful when using SELECT statements, DISTINCT clauses, and LIKE operators.
SELECT statement
After writing the query, the first thing you need to do is check that the select statement is concise. Your goal should be to delete unnecessary select columns. In order to get only the data that meets the purpose of your query.
If there are related subqueries that use exists, you should use constants in the select statement instead of selecting the value of the actual column. This is particularly convenient when checking entities.
Keep in mind that related subqueries are subqueries that use values from external queries, and in this case, NULL can be used as a "constant", which is really confusing!
You can understand the meaning of using constants through the following example:
SELECT driverslicensenr, nameFROM DriversWHERE EXISTS (SELECT'1' FROM Fines WHERE fines.driverslicensenr = drivers.driverslicensenr)
Tip: it's easy to find that using related subqueries is not always a good idea, so consider avoiding related subqueries in the following ways, such as using INNER JOIN rewriting:
SELECT driverslicensenr, nameFROM driversINNER JOIN fines ON fines.driverslicensenr = drivers.driverslicensenr
DISTINCT clause
The SELECT DISTINCT statement is used to return different values. DISTINCT is a clause, so you don't have to use it as much as possible, because if you add DISTINCT to the query statement, it will lead to an increase in execution time.
LIKE operator
When using the LIKE operator in a query, the index is not used if the pattern starts with% or _. It will prevent the database from using indexes, if any. Of course, from another point of view, you can also think that this type of query may relax the conditions and retrieve many records that do not necessarily meet the query objectives.
In addition, your understanding of the data stored in the data can help you develop a pattern that can be used to filter all the data correctly in order to find the data you want most.
two。 Shrink the query results
If you can't avoid using SELECT statements, you can consider shrinking the query results in other ways. For example, use LIMIT clauses and data type conversion methods.
TOP,LIMIT and ROWNUM clauses
You can add LIMIT or TOP clauses to the query to set the maximum number of rows for the query results. Here is an example:
SELECT TOP 3 * FROM Drivers
Please note that you can further specify PERCENT.
For example, if you want to change the first line of the query SELECT TOP 50 PERCENT *.
SELECT driverslicensenr, nameFROM DriversLIMIT 2
In addition, you can add a ROWNUM clause corresponding to the LIMIT used in the query:
SELECT * FROM DriversWHERE driverslicensenr = 123456 AND ROWNUM
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.