An article teaches you how to understand SQL 12/31 Update SLTechnology News&Howtos

An article teaches you how to understand SQL

2025-12-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

This article mainly introduces an article to teach you how to understand SQL, has a certain reference value, interested friends can refer to, I hope you can learn a lot after reading this article, the following let the editor take you to understand it.

SQL is one of the few declarative languages, and it runs in a completely different way from the command-line language, object-oriented programming language, and even functional language as we know it (although some people think that SQL is also a functional language).

We write SQL every day and apply it to the open source software jOOQ. So I want to introduce the beauty of SQL to those who still have a headache about it, so this article is specially written for the following readers:

1. People who use SQL at work but don't fully understand it.

2. Someone who is proficient in using SQL but does not understand its syntax and logic.

3. People who want to teach others SQL.

This article focuses on the SELECT sentence pattern, and other DML (Data Manipulation Language data manipulation language commands) will be introduced in other articles.

10 simple steps to fully understand SQL

1. SQL is a declarative language

First of all, keep this concept in mind: "declaration". The SQL language is an example of what you want from the raw data for the computer, rather than telling the computer how to get the result. Isn't this great?

(translator's note: to put it simply, the SQL language declares the properties of the result set, and the computer picks out the declared data from the database based on what the SQL declares, rather than telling the computer how to operate like traditional programming thinking. )

SELECT first_name, last_name FROM employees WHERE salary > 100000

The above example is easy to understand. We don't care where these employee records come from. All we need is data from those with high salaries (salary > 100000).

Where did we learn this?

If the SQL language is so simple, then what makes people "smell SQL change color"? The main reason is that we subconsciously think about problems in the way of imperative programming. It's like this: "computer, perform this step first, then that step, but check to see if condition An and condition B are met before that." For example, passing parameters with variables, using looping statements, iterating, calling functions, and so on, are all habits of thinking in this imperative programming.

2. The syntax of SQL is not in syntactic order.

One feature of SQL statements that puzzles most people is that the execution order of SQL statements is not consistent with the grammatical order of their statements. The syntactic order of SQL statements is:

SELECT [DISTINCT]

FROM

WHERE

GROUP BY

HAVING

UNION

ORDER BY

For ease of understanding, not all the SQL syntax structures are listed above, but it is sufficient to show that the syntax order of SQL statements is completely different from the order in which they are executed. Take the above statements as an example, the execution order is as follows:

FROM

WHERE

GROUP BY

HAVING

SELECT

DISTINCT

UNION

ORDER BY

With regard to the execution order of SQL statements, there are three things that deserve our attention:

1. FROM is the first step in the execution of a SQL statement, not SELECT. The first step of the database in executing the SQL statement is to load the data from the hard disk into the data buffer so that it can be manipulated. (translator's note: the original text is "The first thing that happens is loading data from the disk into memory, in order to operate on such data.", but this is not the case. Take Oracle and other common databases as an example, the data is extracted from the hard disk into the data buffer for operation. )

2. SELECT is executed after most statements have been executed, strictly speaking, after FROM and GROUP BY. It's important to understand this, which is why you can't use fields with aliases set in SELECT as criteria in WHERE.

SELECT A.x + A.y AS zFROM AWHERE z = 10-z is not available here because SELECT is the last statement executed!

If you want to reuse the alias z, you have two choices. Or rewrite the expression z represents:

SELECT A.x + A.y AS zFROM AWHERE (A.x + A.y) = 10

... Or resort to derived tables, generic data expressions, or views to avoid alias reuse. Take a look at the examples below.

3. UNION always precedes ORDER BY in terms of syntax and order of execution. Many people think that ORDER BY sorting can be used for every UNION segment, but according to the SQL language standard and the differences in the execution of each database SQL, this is not true. Although some databases allow SQL statements to sort subqueries (subqueries) or derived tables (derived tables), this does not mean that the sort remains in the sorted order after the UNION operation.

Note: not all databases use the same parsing method for SQL statements. As in MySQL, PostgreSQL, and SQLite, it will not be executed in the way described in the second point above.

What have we learned?

Since not all databases perform SQL predictions in the way described above, what are our gains? The lesson is to always remember that the syntax order of SQL statements is not consistent with the order in which they are executed, so that we can avoid general errors. If you can remember the differences between the syntax order and execution order of SQL statements, you can easily understand some common SQL problems.

Of course, if a language is designed as a syntax order that directly reflects the order in which its statements are executed, then the language is very friendly to programmers, and this programming language design concept has been applied to the LINQ language by Microsoft.

3. The core of SQL language is the reference to table (table references)

Because of the different syntax order and execution order of SQL statements, many students think that the field information in SELECT is the core of SQL statements. In fact, the real core is the reference to the table.

According to the SQL standard, the FROM statement is defined as:

:: = FROM [{}...]

The "output" of the FROM statement is a federated table that comes from the union of all referenced tables in a certain dimension. Let's take our time to analyze:

FROM a, b

The output of the above FROM statement is a federated table that joins table an and table b. If table a has three fields and table b has five fields, then the output table has 8 (= 5, 3) fields.

The data in this joint table is ab, the Cartesian product of an and b. In other words, every piece of data in table an is paired with every piece of data in table b. If table a has three pieces of data and table b has five pieces of data, then the federated table will have 15 (= 53) pieces of data.

The result of FROM output is filtered by WHERE statement and processed by GROUP BY statement to form a new output result. We will discuss this issue later.

If we look at it from the point of view of set theory (relational algebra), the table of a database is the relationship of a set of data elements, and each SQL statement will change one or more relationships, resulting in new data element relationships (that is, generating new tables).

What have we learned?

When thinking about the problem, think about the question from the perspective of the table, so it is easy to understand how the data has changed in the "pipeline" of the SQL statement.

4. Flexible reference tables can make SQL statements more powerful.

Flexible reference tables can make SQL statements more powerful. A simple example is the use of JOIN. Strictly speaking, the JOIN statement is not a part of SELECT, but a special table reference statement. The join definition of the table in the SQL language standard is as follows:

:: = | |

Take the previous example:

FROM a, b

A you may enter a connection to the following table:

A1 JOIN a2 ON a1.id = a2.id

Put it in the previous example and it becomes:

FROM A1 JOIN a2 ON a1.id = a2.id, b

Although it is not a common practice to join one join table with another with a comma, you can do so. As a result, the final output table has a1+a2+b fields.

(translator's note: in the original text, the word is degree and translated as dimension. If we visualize a table, we can imagine that each table is composed of horizontal and vertical dimensions, horizontal dimension is what we call fields or columns, English columns; vertical dimension represents each piece of data, English is record, according to the context, the author here should refer to the number of fields. )

Derived table references in SQL statements are even more powerful than table joins, so let's talk about table joins.

What have we learned?

When thinking about problems, start from the point of view of table references, which makes it easy to understand how data is processed by SQL statements, and helps you understand what complex table references do.

More importantly, understand that JOIN is the keyword for building the join table and is not part of the SELECT statement. There are some databases that allow the use of JOIN in INSERT, UPDATE, DELETE.

5. Table joins are recommended in SQL statements.

Let's take a look at the sentence just now:

FROM a, b

Advanced SQL programmers may learn to give you advice: try not to use commas instead of JOIN to join tables. This will improve the readability of your SQL statements and avoid some errors.

Simplifying SQL statements with commas can sometimes cause confusion. Consider the following statement:

FROM a, b, c, d, e, f, g, hWHERE a.a1 = b.bxAND a.a2 = c.c1AND d.d1 = b.bcmury-etc...

It is not difficult to see that the benefits of using JOIN statements are:

It's clear. The JOIN is very close to the table to be joined so that errors can be avoided.

For more ways of joining, JOIN statements can distinguish between outer connections and inner connections, etc.

What have we learned?

Remember to use JOIN as much as possible to join tables, and never join tables with commas after FROM.

6. Different join operations in SQL statements

In the SQL statement, table joins are fundamentally divided into five ways:

EQUI JOIN

SEMI JOIN

ANTI JOIN

CROSS JOIN

PISION

EQUI JOIN

This is the most common JOIN operation, which consists of two connection methods:

INNER JOIN (or JOIN)

OUTER JOIN (including: LEFT, RIGHT, FULL OUTER JOIN)

It is easiest to illustrate the difference with examples:

-- This table reference contains authors and their books.-- There is one record for each book and its author.-- authors without books are NOT includedauthor JOIN book ON author.id = book.author_id-- This table reference contains authors and their books-- There is one record for each book and its author.-- OR there is an "empty" record for authors without books-- ("empty" meaning that all book columns are NULL) author LEFT OUTER JOIN book ON author.id = book.author_id

There are two ways to express this connection in SQL: using IN or using EXISTS. "SEMI" means "half" in Latin. This join method is only part of the join target table. What does this mean? Think again about the link between the author and the title above. Let's imagine a situation where we don't need a combination of author / title, just the author information of the books on the title list. Then we can write like this:

-Using INFROM authorWHERE author.id IN (SELECT book.author_id FROM book)-Using EXISTSFROM authorWHERE EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)

Although there are no strict rules on when you should use IN and when you should use EXISTS, here are some things you should know:

IN is more readable than EXISTS.

EXISTS is more expressive than IN (more suitable for complex statements)

There is no difference in performance between the two (but the performance difference can be very large for some databases)

Because you can also get the author information corresponding to the book in the title table by using INNER JOIN, many beginners think that you can remove the duplicates through DISTINCT, and then write the SEMI JOIN statement like this:

-- Find only those authors who also have booksSELECT DISTINCT first_name, last_nameFROM authorJOIN book ON author.id = book.author_id

This is a bad way to write for the following reasons:

SQL statement performance is poor: because de-repeat operations (DISTINCT) require the database to repeatedly read data from the hard disk into memory. (translator's note: DISTINCT is indeed a resource-consuming operation, but each database may operate differently with DISTINCT).

This is not entirely true: although it may not be a problem to write it now, as SQL statements become more complex, it becomes very difficult for you to get the right results back.

ANTI JOIN

The relationship of this connection is just the opposite of SEMI JOIN. This connection can be used by adding a NOT keyword before IN or EXISTS. For example, we list authors who do not have books on the list of titles:

-Using INFROM authorWHERE author.id NOT IN (SELECT book.author_id FROM book)-Using EXISTSFROM authorWHERE NOT EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)

You can also refer to SEMI JOIN for performance, readability, expressiveness and other features.

This blog post introduces what to do when you encounter NULL when using NOT IN. Because it deviates from this topic a little bit, you can read it if you are interested.

CROSS JOIN

This join process is the product of two joined tables: each piece of data in the first table corresponds to each piece of data in the second table. As we've seen before, this is the use of commas in FROM statements. In practical applications, CROSS JOIN is rarely used, but once you do, you can express it with a SQL statement like this:

-- Combine every author with every bookauthor CROSS JOIN book

PISION

PISION is really a freak. In short, if JOIN is a multiplication, then pISION is the inverse of JOIN. The relationship of pISION is difficult to express in SQL, since this is a novice guide, explaining pISION is beyond our purpose.

What have we learned?

I've learned a lot! Let's think about it again in our minds. SQL is a reference to a table, and JOIN is a complex way to refer to a table. But there is a difference between the expression of the SQL language and the actual logical relationship we need, not all logical relations can find the corresponding JOIN operation, so it is necessary for us to accumulate and learn more relational logic at ordinary times, so that you can choose the appropriate JOIN operation in writing SQL statements in the future.

7. Derived tables like variables in SQL

Before that, we learned that SQL is a declarative language and that SQL statements cannot contain variables. But you can write statements similar to variables, which are called derived tables:

To put it bluntly, the so-called derived table is a subquery in parentheses:

-A derived tableFROM (SELECT * FROM author)

It is important to note that sometimes we can define a related name (what we call an alias) to the derived table.

-- A derived table with an aliasFROM (SELECT * FROM author) a

Derived tables can effectively avoid problems caused by SQL logic. For example: if you want to reuse a result that is queried with SELECT and WHERE statements, you can write it (take Oracle as an example):

-Get authors' first and last names, and their age in daysSELECT first_name, last_name, ageFROM (SELECT first_name, last_name, current_date-date_of_birth ageFROM author)-- If the age is greater than 10000 daysWHERE age > 10000

It is important to note that in some databases, as well as in the SQL: 1990 standard, derived tables are classified as a lower level-universal table statements (common table experssion). This allows you to reuse derived tables multiple times in a SELECT statement. The above example is (almost) equivalent to the following statement:

WITH an AS (SELECT first_name, last_name, current_date-date_of_birth age FROM author) SELECT * FROM aWHERE age > 10000

Of course, you can also create a separate view for "a" so that you can reuse the derived table on a wider scale.

What have we learned?

We have repeatedly stressed that, in general, SQL statements are references to tables, not fields. Take advantage of this and don't be afraid to use derived tables or other more complex statements.

8. GROUP BY in the SQL statement is an operation on the reference to the table

Let's recall the previous FROM statement:

FROM a, b

Now, let's apply GROUP BY to the above statement:

GROUP BY A.x, A.y, B.z

The result of the above statement is a reference to a new table with three fields. Let's take a closer look at this sentence: when you apply GROUP BY, columns that do not use aggregate functions after SELECT will appear after GROUP BY. (translator's note: when you are using GROUP BY, there are fewer columns on which you can perform the next level of logical operation, including those in SELECT).

It is important to note that other fields can use aggregate functions:

SELECT A.x, A.y, SUM (A.z) FROM AGROUP BY A.x, A.y

It is also worth noting that MySQL does not adhere to this standard, which is indeed very confusing. (translator's note: this is not to say that MySQL does not have the function of GROUP BY) but don't be fooled by MySQL. GROUP BY changes the way tables are referenced. You can both reference a field in SELECT and group it in GROUP BY like this.

What have we learned?

GROUP BY, again, operates on a reference to the table to transform it into a new reference.

9. The SELECT in the SQL statement is essentially a mapping of relations.

I personally prefer the word "mapping", especially when it is used in relational algebra. (translator's note: the original word is projection, which has two meanings, the first meaning is prediction, planning and design, and the second meaning is projection and mapping. After repeated deliberation, I think the function of SELECT can be expressed more intuitively by mapping here. Once you have created a reference to the table, after modification and deformation, you can map it to another model step by step. The SELECT statement is like a "projector". We can think of it as a function that converts the data in the source table into the target table data according to a certain logic.

Through the SELECT statement, you can manipulate each field and generate the required data through complex expressions.

There are many special rules for SELECT statements, at least you should be familiar with the following:

You can only use fields that can be referenced by a table.

If you have a GROUP BY statement, you can only use the fields or aggregate functions that follow the GROUP BY statement

When there is no GROUP BY in your statement, you can use the windowing function instead of the aggregate function

When there is no GROUP BY in your statement, you cannot use aggregate functions and other functions at the same time

There are some ways to encapsulate ordinary functions in aggregate functions

……

There are enough more complex rules to write another article. For example, why can't you use both ordinary and aggregate functions in a SELECT statement without GROUP BY? (section 4 above)

The reasons are as follows:

Intuitively, this doesn't make sense logically.

If intuition can't convince you, then grammar can. The SQL: 1999 standard introduces the GROUPING SETS,SQL:2003 standard and group sets: GROUP BY (). Whenever an aggregate function appears in your statement and there is no explicit GROUP BY statement, an ambiguous, empty GROUPING SET will be applied to the SQL. Therefore, the rules of the original logical order are broken, and the mapping (that is, SELECT) relations will first affect the logical relations, followed by the grammatical relations. (translator's note: the original paragraph is rather difficult, which can be simply understood as follows: in a SQL statement with both aggregate and ordinary functions, if there is no GROUP BY to group, the SQL statement defaults to treat the whole table as a group. when the aggregate function aggregates a field, each record in the referenced table loses its meaning and all the data is aggregated into a statistical value. It doesn't make sense for you to use other functions for each record at this time.

Confused? Yeah, me too. Let's go back and look at something simple.

What have we learned?

The SELECT statement is probably the hardest part of the SQL statement, even though it looks simple. The purpose of other statements is actually different forms of references to the table. The SELECT statement integrates these references and maps the source table to the target table through logical rules, and the process is reversible, so we can clearly know where the data of the target table comes from.

If you want to learn the SQL language well, you should understand other sentences before using the SELECT statement. Although SELECT is the first keyword in the grammatical structure, it should be the last one we master.

10. A few simple keywords in the SQL statement: DISTINCT, UNION, ORDER BY and OFFSET

After learning the complex SELECT Henan opera, let's take a look at something simple:

Set operations (DISTINCT and UNION)

Sort operation (ORDER BY,OFFSET... FETCH)

Set operation (set operation):

The main operation of set operation is on the set, which in fact refers to an operation on the table. Conceptually, they are easy to understand:

DISTINCT de-duplicates the data after mapping

UNION splices the two subqueries and removes the duplicates

UNION ALL splices two subqueries but does not duplicate them.

EXCEPT removes the results from the second word query from the first subquery

INTERSECT retains the results in both subqueries and removes the duplicates

Sort operation (ordering operation):

Sorting operations have nothing to do with logical relationships. This is a feature unique to SQL. The sort operation is performed not only at the end of the SQL statement, but also during the run of the SQL statement. Use ORDER BY and OFFSET... FETCH is the most efficient way to ensure that data can be arranged in order. All other sorting methods have a certain degree of randomness, although the results they get are reproducible.

OFFSET... SET is a statement without a unified syntax, and different databases have different expressions, such as MySQL and PostgreSQL's LIMIT. TOP for OFFSET, SQL Server and Sybase... START AT et al.

Let's make full use of SQL in our work!

Like other languages, it takes a lot of practice to learn the SQL language well. The above 10 simple steps can help you better understand the SQL statements you write every day. On the other hand, a lot of experience can be accumulated from common mistakes.

Thank you for reading this article carefully. I hope the article "an article that teaches you how to understand SQL" shared by the editor will be helpful to everyone. At the same time, I also hope that you will support us and pay attention to the industry information channel. More related knowledge is waiting for you to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.