Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Fully understand the steps of SQL

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

What this article shares with you is about how to fully understand the steps of SQL. The editor thinks it is very practical, so I share it with you to learn. I hope you can get something after reading this article.

1. SQL is a declarative language

First of all, keep this concept in mind: "declaration". The SQL language is an example of what you want from the raw data for the computer, rather than telling the computer how to get the result. The execution engine of sql will get the corresponding data based on the results of the data you declare.

SELECT first_name, last_name FROM employees WHERE salary > 100000

The above example is easy to understand. We don't care where these employee records come from. All we need are employees with a salary of more than 10W.

Where did we learn this?

If the SQL language is so simple, then what makes people "smell SQL change color"?

The main reason is that we subconsciously think about problems in the way of imperative programming. It's like this: "computer, perform this step first, then that step, but check to see if condition An and condition B are met before that." For example, passing parameters with variables, using looping statements, iterating, calling functions, and so on, are all habits of thinking in this imperative programming.

2. The syntax of SQL is not in syntactic order.

One feature of SQL statements that puzzles most people is that the execution order of SQL statements is not consistent with the grammatical order of their statements. The syntactic order of SQL statements is:

SELECT [DISTINCT] FROM WHERE GROUP BY HAVING UNION ORDER BY

For ease of understanding, not all the SQL syntax structures are listed above, but it is sufficient to show that the syntax order of SQL statements is completely different from the order in which they are executed. Take the above statements as an example, the execution order is as follows:

FROM WHERE GROUP BY HAVING SELECT DISTINCT UNION ORDER BY

With regard to the execution order of SQL statements, there are three things that deserve our attention:

1. FROM is the * step of SQL statement execution, not SELECT. The database's step in executing the SQL statement is to load data from the hard disk into the data buffer so that it can be manipulated.

2. SELECT is executed after most statements have been executed, strictly speaking, after FROM and GROUP BY. It's important to understand this, which is why you can't use fields with aliases set in SELECT as criteria in WHERE.

SELECT A.x + A.y AS z FROM A WHERE z = 10-z is not available here because SELECT is a statement executed by *!

If you want to reuse the alias z, you have two choices. Or rewrite the expression z represents:

SELECT A.x + A.y AS z FROM A WHERE (A.x + A.y) = 10

Or resort to derived tables, generic data expressions, or views to avoid alias reuse.

3. UNION always precedes ORDER BY in terms of syntax and order of execution. Many people think that ORDER BY sorting can be used for every UNION segment, but according to the SQL language standard and the differences in the execution of each database SQL, this is not true.

Although some databases allow SQL statements to sort subqueries (subqueries) or derived tables (derived tables), this does not mean that the sort remains in the sorted order after the UNION operation. Note: not all databases use the same parsing method for SQL statements. As in MySQL, PostgreSQL, and SQLite, it will not be executed in the way described in the second point above.

What have we learned?

Since not all databases perform SQL in the way described above, what are our gains?

The lesson is to always remember that the syntax order of SQL statements is not consistent with the order in which they are executed, so that we can avoid general errors. If you can remember the differences between the syntax order and execution order of SQL statements, you can easily understand some common SQL problems.

Of course, if a language is designed as a syntax order that directly reflects the order in which its statements are executed, then the language is very friendly to programmers, and this programming language design concept has been applied to the LINQ language by Microsoft.

3. The core of SQL language is the reference to table (table references)

Because of the different syntax order and execution order of SQL statements, many students think that the field information in SELECT is the core of SQL statements. In fact, the real core is the reference to the table.

According to the SQL standard, the FROM statement is defined as:

:: = FROM [{}...]

The "output" of the FROM statement is a federated table that comes from the union of all referenced tables in a certain dimension. Let's take our time to analyze:

FROM a, b

The output of the above FROM statement is a federated table that joins table an and table b. If table a has three fields and table b has five fields, then the output table has 8 (= 5, 3) fields.

The data in this joint table is aquib, the Cartesian product of an and b. In other words, every piece of data in table an is paired with every piece of data in table b. If table a has three pieces of data and table b has five pieces of data, then the federated table will have 15 (= 53.3) pieces of data.

The result of FROM output is filtered by WHERE statement and processed by GROUP BY statement to form a new output result.

If we look at it from the point of view of set theory (relational algebra), the table of a database is the relationship of a set of data elements, and each SQL statement will change one or more relationships, resulting in new data element relationships (that is, generating new tables).

What have we learned?

When thinking about the problem, think about the question from the perspective of the table, so it is easy to understand how the data has changed in the "pipeline" of the SQL statement.

4. Flexible reference tables can make SQL statements more powerful.

Flexible reference tables can make SQL statements more powerful. A simple example is the use of JOIN.

Strictly speaking, the JOIN statement is not a part of SELECT, but a special table reference statement.

The join definition of the table in the SQL language standard is as follows:

:: = | |

Take the previous example:

FROM a, b

A may enter connections such as the following table:

A1 JOIN a2 ON a1.id = a2.id

Put it in the previous example and it becomes:

FROM A1 JOIN a2 ON a1.id = a2.id, b

Although it is not a common practice to join one join table with another with a comma, you can do so. As a result, the final output table has a1+a2+b fields.

The application of derived tables in SQL statements is even more powerful than table joins, so let's talk about table joins.

What have we learned?

When thinking about problems, start from the point of view of table references, which makes it easy to understand how data is processed by SQL statements, and helps you understand what complex table references do.

More importantly, understand that JOIN is the keyword for building the join table and is not part of the SELECT statement. There are some databases that allow the use of JOIN in INSERT, UPDATE, DELETE.

5. Table joins are recommended in SQL statements.

Let's take a look at what we just said:

FROM a, b

Advanced SQL programmers may give you advice: try not to use commas instead of JOIN to join tables. This will improve the readability of your SQL statements and avoid some errors. Simplifying SQL statements with commas can sometimes cause confusion. Consider the following statement:

FROM a, b, c, d, e, f, g, h WHERE a.a1 = b.bx AND a.a2 = c.c1 AND d.d1 = b.bc-- etc...

It is not difficult to see that the advantage of using JOIN statements is: security. The JOIN is very close to the table to be joined so that errors can be avoided.

For more ways of joining, JOIN statements can distinguish between outer connections and inner connections, etc.

What have we learned?

Remember to use JOIN as much as possible to join tables, and never join tables with commas after FROM.

6. Different join operations in SQL statements

In the SQL statement, table joins are fundamentally divided into five ways:

EQUI JOIN SEMI JOIN ANTI JOIN CROSS JOIN DIVISION

EQUI JOIN is one of the most common JOIN operations, which includes two connection methods:

INNER JOIN (or JOIN)

OUTER JOIN (including: LEFT, RIGHT, FULL OUTER JOIN)

It is easiest to illustrate the difference with examples:

-- This table reference contains authors and their books. -- There is one record for each book and its author. -- authors without books are NOT included author JOIN book ON author.id = book.author_id-- This table reference contains authors and their books-- There is one record for each book and its author. -. OR there is an "empty" record for authors without books-- ("empty" meaning that all book columns are NULL) author LEFTOUTER JOIN book ON author.id = book.author_id

SEMI JOIN

There are two ways to express this connection in SQL: using IN or using EXISTS. "SEMI" means "half" in Latin. This join method is only part of the join target table. What does this mean?

Think again about the link between the author and the title above. Let's imagine a situation where we don't need a combination of author / title, just the author information of the books on the title list. Then we can write like this:

-Using IN FROM author WHERE author.id IN (SELECT book.author_id FROM book)-Using EXISTS FROM author WHERE EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)

Although there are no strict rules on when you should use IN and when you should use EXISTS, here are some things you should know:

IN is more readable than EXISTS.

EXISTS is more expressive than IN (more suitable for complex statements)

There is no difference in performance between the two (but for some databases) because the author information corresponding to the book in the title table can also be obtained by using INNER JOIN, so many beginners think that they can be deduplicated through DISTINCT, and then write the SEMI JOIN statement like this:

-- Find only those authors who also have books SELECT DISTINCT first_name, last_name FROM author JOIN book ON author.id = book.author_id

This is a bad way to write for the following reasons:

SQL statement performance is poor: because de-repeat operations (DISTINCT) require the database to repeatedly read data from the hard disk into memory.

This is not entirely true: although it may not be a problem to write it now, as SQL statements become more complex, it becomes very difficult for you to get the right results back.

ANTI JOIN

The relationship of this connection is just the opposite of SEMI JOIN. This connection can be used by adding a NOT keyword before IN or EXISTS. For example, we list authors who do not have books on the list of titles:

-Using IN FROM author WHERE author.id NOT IN (SELECT book.author_id FROM book)-Using EXISTS FROM author WHERE NOT EXISTS (SELECT 1 FROM book WHEREbook.author_id = author.id)

You can also refer to SEMI JOIN for performance, readability, expressiveness and other features.

CROSS JOIN

This join process is the product of two joined tables: each piece of data in the table corresponds to each piece of data in the second table. As we've seen before, this is the use of commas in FROM statements. In practical applications, CROSS JOIN is rarely used, but once you do, you can express it with a SQL statement like this:

-- Combine every author with every bookauthor CROSS JOIN book

DIVISION DIVISION is really a freak. In short, if JOIN is a multiplication, then DIVISION is the inverse of JOIN. The relationship of DIVISION is difficult to express in SQL, since this is a novice guide, explaining DIVISION is beyond our purpose.

What have we learned?

I've learned a lot! Let's think about it again in our minds. SQL is a reference to a table, and JOIN is a complex way to refer to a table. But there is a difference between the expression of the SQL language and the actual logical relationship we need, not all logical relations can find the corresponding JOIN operation, so it is necessary for us to accumulate and learn more relational logic at ordinary times, so that you can choose the appropriate JOIN operation in writing SQL statements in the future.

7. Derived tables like variables in SQL

Before that, we learned that SQL is a declarative language and that SQL statements cannot contain variables. But you can write statements similar to variables, which are called derived tables:

To put it bluntly, the so-called derived table is a subquery in parentheses:

-A derived table FROM (SELECT * FROM author)

It is important to note that sometimes we can define a related name (what we call an alias) to the derived table.

-- A derived table with an alias FROM (SELECT * FROM author) a

Derived tables can effectively avoid problems caused by SQL logic.

For example: if you want to reuse a result that is queried with SELECT and WHERE statements, you can write it (take Oracle as an example):

-Get authors' first and last names, and their age in days SELECT first_name, last_name, age FROM (SELECT first_name, last_name, current_date- date_of_birth age FROM author)-If the age is greater than 10000 days WHEREage > 10000

It is important to note that in some databases, as well as in the SQL: 1990 standard, derived tables are classified as a lower level-universal table statements (common table experssion). This allows you to reuse derived tables multiple times in a SELECT statement.

The above example is (almost) equivalent to the following statement:

WITH an AS (SELECT first_name, last_name, current_date- date_of_birth age FROM author) SELECT * FROM a WHERE age > 10000

Of course, you can also create a separate view for "a" so that you can reuse the derived table on a wider scale.

What have we learned?

We have repeatedly stressed that, in general, SQL statements are references to tables, not fields. Take advantage of this and don't be afraid to use derived tables or other more complex statements.

8. GROUP BY in the SQL statement is an operation on the reference to the table

Let's recall the previous FROM statement:

FROM a, b

Now, let's apply GROUP BY to the above statement:

GROUP BY A.x, A.y, B.z

The result of the above statement is a reference to a new table with three fields. Let's take a closer look at this sentence: when you apply GROUP BY, columns that do not use aggregate functions after SELECT will appear after GROUP BY. (translator's note: when you are using GROUP BY, there are fewer columns on which you can perform the next level of logical operation, including those in SELECT). It is important to note that other fields can use aggregate functions:

SELECT A.x, A.y, SUM (A.z) FROM A GROUP BY A.x, A.y

It is also worth noting that MySQL does not adhere to this standard, which is indeed very confusing. (translator's note: this is not to say that MySQL does not have the function of GROUP BY) but don't be fooled by MySQL. GROUP BY changes the way tables are referenced. You can both reference a field in SELECT and group it in GROUP BY like this.

What have we learned?

GROUP BY, again, operates on a reference to the table to transform it into a new reference.

9. The SELECT in the SQL statement is essentially a mapping of relations.

I personally prefer the word "mapping", especially when it is used in relational algebra. (translator's note: the original word is projection, which has two meanings: prediction, planning and design, and the second meaning is projection and mapping. After repeated deliberation, I think the function of SELECT can be expressed more intuitively by mapping. Once you have created a reference to the table, after modification and deformation, you can map it to another model step by step.

The SELECT statement is like a "projector". We can think of it as a function that converts the data in the source table into the target table data according to a certain logic.

Through the SELECT statement, you can manipulate each field and generate the required data through complex expressions.

There are many special rules for SELECT statements, at least you should be familiar with the following:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

You can only use fields that can be referenced by a table.

If you have a GROUP BY statement, you can only use the fields or aggregate functions that follow the GROUP BY statement

When there is no GROUP BY in your statement, you can use the windowing function instead of the aggregate function

When there is no GROUP BY in your statement, you cannot use aggregate functions and other functions at the same time

There are some ways to encapsulate ordinary functions in aggregate functions

……

There are enough more complex rules to write another article. For example, why can't you use both ordinary and aggregate functions in a SELECT statement without GROUP BY? (section 4 above)

The reasons are as follows:

Intuitively, this doesn't make sense logically. If intuition can't convince you, then grammar can. SQL: 1999 standard introduces GROUPING SETS,SQL: 2003 standard introduces group sets: GROUP BY (). Whenever an aggregate function appears in your statement and there is no explicit GROUP BY statement, an ambiguous, empty GROUPING SET will be applied to the SQL. Therefore, the rules of the original logical order are broken, and the mapping (that is, SELECT) relations will first affect the logical relations, followed by the grammatical relations. (translator's note: the original paragraph is rather difficult, which can be simply understood as follows: in a SQL statement with both aggregate and ordinary functions, if there is no GROUP BY to group, the SQL statement defaults to treat the whole table as a group. when the aggregate function aggregates a field, each record in the referenced table loses its meaning and all the data is aggregated into a statistical value. It doesn't make sense for you to use other functions for each record at this time. Confused? Yeah, me too. Let's go back and look at something simple.

What have we learned?

The SELECT statement is probably the hardest part of the SQL statement, even though it looks simple. The purpose of other statements is actually different forms of references to the table. The SELECT statement integrates these references and maps the source table to the target table through logical rules, and the process is reversible, so we can clearly know where the data of the target table comes from.

If you want to learn the SQL language well, you should understand other sentences before using the SELECT statement. Although SELECT is the first keyword in the grammatical structure, it should be the first one to master it.

10. A few simple keywords in the SQL statement: DISTINCT, UNION, ORDER BY and OFFSET

After learning the complex SELECT, let's take a look at something simple:

Set operation (set operation): the main operation of set operation is on the set, which in fact refers to an operation on the table. Conceptually, they are easy to understand:

DISTINCT de-duplicates the data after mapping

UNION splices the two subqueries and removes the duplicates

UNION ALL splices two subqueries but does not duplicate them.

EXCEPT removes the results from the second word query from the * subqueries.

INTERSECT retains the results in both subqueries and removes the duplicates

Sort operation (ordering operation):

Sorting operations have nothing to do with logical relationships. This is a feature unique to SQL. The sort operation is performed not only in the SQL statement, but also during the run of the SQL statement. Use ORDER BY and OFFSET... FETCH is the most efficient way to ensure that data can be arranged in order. All other sorting methods have a certain degree of randomness, although the results they get are reproducible. OFFSE SET is a statement without a unified syntax, and different databases have different expressions, such as MySQL and PostgreSQL's LIMIT. TOP for OFFSET, SQL Server and Sybase... START AT et al.

The above is a full understanding of the steps of SQL, and the editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report