After years of silence, SQL is making a comeback

Original: https://www.enmotech.com/web/detail/1/861/1.html

Introduction: After years of silence, SQL is making a comeback. Why? And what does this mean for the data community? This article offers an analysis. The following is a translation.

Ever since we began doing things with computers, the amount of data we collect has grown exponentially, and with it the demand for technologies to store, process, and analyze that data. Over the past decade, software developers abandoned SQL because it could not meet these demands, and NoSQL gradually emerged: MapReduce, Bigtable, Cassandra, MongoDB, and so on.

However, SQL is now making a comeback. Major cloud vendors all offer popular managed relational database services, such as Amazon RDS, Google Cloud SQL, and Azure Database for PostgreSQL (which Azure is releasing this year). In Amazon's own words, Aurora, its PostgreSQL- and MySQL-compatible database, has been "the fastest-growing service in AWS history." SQL interfaces on top of Hadoop and Spark continue to flourish. And just last month, Kafka launched SQL support.

In this article, we examine why SQL is making a comeback now and what this means for the future of data engineering and analysis.

Why is SQL making a comeback?

To understand why SQL is making a comeback, start with why it was designed in the first place.

Our story begins at IBM Research in the early 1970s, when the relational database was born. The query languages of that time relied on complex mathematical logic and notation. Donald Chamberlin and Raymond Boyce, who had just completed their Ph.D.s, were impressed by the relational data model but saw that the query language would become a major bottleneck to its adoption. So they set out to design a new query language (in their own words) "to make it easier for users who have no formal training in math and computer programming."

Comparison of two query languages

Think about it. Before the advent of the Internet, before the advent of the personal computer, when the programming language C was first being introduced to the world, two young computer scientists realized that "the success of the computer industry depends largely on training users other than trained computer experts." What they wanted was a query language as easy to read as English, one that would also cover database administration and manipulation.

The result was SQL, first introduced to the world in 1974. Over the following decades, SQL proved to be very popular. As relational databases such as System R, Ingres, DB2, Oracle, SQL Server, PostgreSQL, MySQL, and others took over the software industry, SQL became the preeminent language for interacting with databases and the lingua franca of an increasingly crowded and competitive ecosystem.

(Sadly, Raymond Boyce never had the chance to witness SQL's success. He died of a brain aneurysm one month after giving one of the earliest SQL presentations, at just 26 years old, leaving behind a wife and a young daughter.)

For a time, it seemed that SQL had successfully completed its task, but then the Internet appeared.

The counterattack of NoSQL

While Chamberlin and Boyce were developing SQL, they did not realize that a second group of engineers in California was working on another budding project, one that would later spread widely and threaten SQL's existence. That project was ARPANET, born on October 29, 1969.

Some of the creators of ARPANET, which eventually evolved into today's Internet.

SQL was doing just fine until 1989, when another engineer came along and invented the World Wide Web.

Like weeds, the Internet and the Web flourished, massively disrupting our world. For the data community, they also created a particular headache: new data sources that generated data at much higher volumes and velocities than before.

As the Internet kept growing, the software community found that the relational databases of the time could not handle this new load. There was a disturbance in the Force, as if a million databases had suddenly been overloaded.

Then two new Internet giants made breakthroughs and developed their own non-relational distributed systems to cope with this onslaught of data: MapReduce and Bigtable, published by Google, and Dynamo, published by Amazon. These groundbreaking papers led to even more non-relational databases, including Hadoop, Cassandra, and MongoDB. Because these new systems were largely written from scratch, they did not use SQL, which led to the rise of the NoSQL movement.

Software developers embraced NoSQL as well, arguably more broadly than the early adopters had embraced SQL. The reason is easy to understand: NoSQL was new and fashionable; it promised scale and power; it seemed like a shortcut to project success. But then the problems began to appear.

A typical software developer seduced by NoSQL. Don't be like this guy.

Developers soon discovered that not having SQL was actually quite limiting. Each NoSQL database offered its own unique query language, which meant: more languages to learn (and to teach to colleagues); greater difficulty connecting these databases to applications, leading to brittle, tightly coupled code; and the lack of a third-party ecosystem, forcing companies to build their own operational and visualization tools.

These NoSQL languages, being new, were also not fully developed. For example, years of work had already gone into adding necessary features (such as JOIN) to SQL in relational databases; the immaturity of the NoSQL languages meant that more complexity was needed at the application level. The lack of JOIN also led to denormalization, which in turn led to data bloat and rigidity.
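To make the JOIN point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The devices and readings data are hypothetical and stand in for any two related datasets. join_in_application shows the kind of stitching an application has to do itself when the query language has no JOIN, while join_in_sql leaves that work to the database.

```python
import sqlite3

# Hypothetical sample data: (device id, location) and (device id, temperature).
devices = [(1, "office"), (2, "garage")]
readings = [(1, 21.5), (1, 22.0), (2, 15.0)]


def join_in_application(devices, readings):
    """Without JOIN, the application builds its own lookup table and merges rows."""
    location_by_id = dict(devices)
    return sorted((location_by_id[dev_id], temp) for dev_id, temp in readings)


def join_in_sql(devices, readings):
    """With JOIN, the database combines the two tables for us."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, location TEXT)")
    conn.execute("CREATE TABLE readings (device_id INTEGER, temperature REAL)")
    conn.executemany("INSERT INTO devices VALUES (?, ?)", devices)
    conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    rows = conn.execute(
        "SELECT d.location, r.temperature "
        "FROM readings AS r JOIN devices AS d ON d.id = r.device_id "
        "ORDER BY d.location, r.temperature"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(join_in_application(devices, readings))
    print(join_in_sql(devices, readings))
```

Both functions return the same rows. The difference is where the logic lives: in the second version, the matching and ordering are stated once in the query rather than re-implemented in every application.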

Some NoSQL databases added their own "SQL-like" query languages, such as Cassandra's CQL. But this often made the problem worse. Using an interface that is almost, but not quite, identical to something more common actually created more mental friction: engineers did not know what was supported and what was not.

SQL-like query languages are like the Star Wars Holiday Special. Accept no imitations.

(And always avoid the Star Wars Holiday Special.)

Some people in the community saw the problems with NoSQL early on (for example, DeWitt and Stonebraker in 2008). Over time, through hard-won personal experience, more and more software developers came to agree.

The return of SQL

Initially seduced by the dark side, the software community began to see the light, and SQL staged a heroic return.

First came the SQL interfaces on top of Hadoop (and later Spark), which led the industry to reinterpret NoSQL as "Not Only SQL".

Then came the rise of NewSQL: new scalable databases that fully embraced SQL. H-Store (published in 2008), from researchers at the Massachusetts Institute of Technology (MIT) and Brown University, was one of the first scale-out OLTP databases. Google again led the way, building a geo-replicated database with a SQL interface, described in their Spanner paper (published in 2012), whose authors include the original MapReduce authors. Other pioneers such as CockroachDB (2014) followed.

At the same time, the PostgreSQL community began to revive, adding key improvements such as the JSON data type (2012) and a potpourri of new features in PostgreSQL 10: better native support for partitioning and replication, full-text search support for JSON, and more (scheduled for release later this year). Others, such as CitusDB (2016) and TimescaleDB (released this year), found new ways to scale PostgreSQL for specialized data workloads.

In fact, our own journey in developing TimescaleDB closely mirrors the path the industry has taken. Early builds of TimescaleDB used our own SQL-like query language, ioQL. Yes, we too failed to resist the dark side: building our own query language felt powerful. But while it seemed like the easy path, we soon realized that it meant a lot more work. We also found ourselves constantly looking up the proper syntax for queries that we could already express in SQL.

One day, we realized that building our own query language made no sense, and that the key was to embrace SQL. It was one of the best design decisions we have ever made. Immediately, a whole new world opened up. Today, even though our database is only 5 months old, users can run it in production and get many other wonderful things along with it: visualization tools (Tableau), connectors to common ORMs, a variety of tooling and backup options, an abundance of online tutorials and syntax explanations, and so on.

Don't take our word for it: take Google's

Google has been leading the way in data engineering and infrastructure for more than a decade. We should pay close attention to what they are doing.

Take a look at Google's second major Spanner paper, released just four months ago (Spanner: Becoming a SQL System, May 2017), and you will find that it supports our own findings.

For example, Google started out building on top of Bigtable, but later found that the lack of SQL created a lot of problems (emphasis in all of the quotes below is ours):

While these systems provide some of the advantages of database systems, they lack the traditional database features that many application developers often rely on. A key example is a robust query language, which means that developers must write complex code to process and aggregate data in an application. Therefore, we decided to turn Spanner into a complete SQL system, and query execution is closely integrated with other architectural features of Spanner, such as strong consistency and global replication.
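As a rough illustration of the "complex code to process and aggregate data" that the quote mentions, the sketch below computes a per-location average twice: once by hand in application code and once with a single GROUP BY query. It uses Python's standard-library sqlite3 module as a stand-in database; the readings data and both function names are hypothetical, and none of this is Spanner's actual API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (location, temperature).
rows = [("office", 20.0), ("office", 22.0), ("garage", 15.0)]


def average_in_application(rows):
    """Aggregate by hand: track a running sum and count per location."""
    totals = defaultdict(lambda: [0.0, 0])
    for location, temperature in rows:
        totals[location][0] += temperature
        totals[location][1] += 1
    return {loc: total / count for loc, (total, count) in totals.items()}


def average_in_sql(rows):
    """Let the query language express the same aggregation declaratively."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (location TEXT, temperature REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = dict(
        conn.execute(
            "SELECT location, AVG(temperature) FROM readings GROUP BY location"
        )
    )
    conn.close()
    return result


if __name__ == "__main__":
    print(average_in_application(rows))
    print(average_in_sql(rows))
```

The hand-written version is short here, but it grows quickly once filters, multiple grouping keys, or several aggregates are involved, whereas the query mostly gains clauses.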

Later in the paper, they further capture the rationale for the transition from NoSQL to SQL:

Spanner's original API provided NoSQL methods for point lookups and range scans of individual and interleaved tables. While the NoSQL methods provided a simple path to launching Spanner and continue to be useful in simple retrieval scenarios, SQL provides significant additional value in expressing more complex data access patterns and pushing computation to the data.
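For readers unfamiliar with the two NoSQL access methods named in the quote, here is a toy sketch of what a point lookup and a range scan look like over a store that keeps its keys sorted. SortedKV is a hypothetical in-memory class written for this illustration; it is not Spanner's API or any real database's.

```python
import bisect


class SortedKV:
    """Toy in-memory key-value store that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point lookup: fetch one value by its exact key (None if absent)."""
        return self._values.get(key)

    def scan(self, start, end):
        """Range scan: yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


if __name__ == "__main__":
    kv = SortedKV()
    for key, value in [("user:003", "carol"), ("user:001", "alice"), ("user:002", "bob")]:
        kv.put(key, value)
    print(kv.get("user:002"))
    print(list(kv.scan("user:001", "user:003")))
```

These two operations cover simple retrieval well. Anything beyond them, such as filtering on a value, joining two key ranges, or aggregating, has to be written by the caller on top of get and scan, which is the gap the paper says SQL fills.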

The paper also describes how the adoption of SQL does not stop at Spanner, but actually extends across the rest of Google, where multiple systems now share a common SQL dialect:

Spanner's SQL engine shares a common SQL dialect, called "Standard SQL", with several other systems at Google, including internal systems such as F1 and Dremel (among others) and external systems such as BigQuery …

For users within Google, this lowers the barrier to working across systems. A developer or data analyst who writes SQL against a Spanner database can transfer their understanding of the language to Dremel without having to worry about subtle differences in syntax, NULL handling, and so on.
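NULL handling is a good example of the kind of subtle semantics that a shared dialect pins down. The sketch below uses Python's standard-library sqlite3 module to show a few behaviors; it demonstrates SQLite's behavior only and is not a statement about Google's Standard SQL. The table t and the scalar helper are hypothetical names used for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Comparing NULL with "=" yields NULL (None in Python), not true.
null_equals_null = scalar("SELECT NULL = NULL")

# So "x = NULL" matches no rows, while "x IS NULL" matches the NULL row.
rows_matching_equals = scalar("SELECT COUNT(*) FROM t WHERE x = NULL")
rows_matching_is_null = scalar("SELECT COUNT(*) FROM t WHERE x IS NULL")

# COUNT(*) counts rows; COUNT(x) skips rows where x is NULL.
count_rows = scalar("SELECT COUNT(*) FROM t")
count_non_null = scalar("SELECT COUNT(x) FROM t")

if __name__ == "__main__":
    print(null_equals_null, rows_matching_equals, rows_matching_is_null)
    print(count_rows, count_non_null)
```

When every system an analyst touches agrees on rules like these, a query that is correct in one place stays correct in another.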

The success of this approach speaks for itself. Spanner has become the "source of truth" for major Google systems, including AdWords and Google Play, while "potential cloud customers are very interested in using SQL".

Considering that Google helped launch the NoSQL movement in the first place, it is quite remarkable that it is now embracing SQL. (This has led some to wonder recently: "Did Google Send the Big Data Industry on a 10-Year Head Fake?")

SQL will become a thin waist.

In computer networking, there is a concept called the "thin waist".

The idea emerged to solve a key problem: on any given networked device, imagine a stack with layers of hardware at the bottom and layers of software on top. There may be a variety of networking hardware at the bottom; similarly, there may be a variety of software and applications on top. We need a way to ensure that the software can still connect to the network no matter what the hardware is, and that the networking hardware knows how to handle network requests no matter what the software is.

In networking, the role of the thin waist is played by the Internet Protocol (IP), which acts as a common interface between the lower-level networking protocols designed for local area networks and the higher-level application and transport protocols. And (in a broad oversimplification) this common interface became the lingua franca of computers, enabling networks to interconnect and devices to communicate, and allowing this "network of networks" to grow into today's rich and diverse Internet.

We believe that SQL has become the thin waist for data analysis.

We live in an era in which data is becoming "the most valuable resource in the world" (The Economist, May 2017). As a result, we have seen a Cambrian explosion of specialized databases (OLAP, time-series, document, graph, etc.), data processing tools (Hadoop, Spark, Flink), data buses (Kafka, RabbitMQ), and so on. We also have more applications that rely on this data infrastructure, whether third-party data visualization tools (Tableau, Grafana, PowerBI, Superset), web frameworks (Rails, Django), or custom-built data-driven applications.

As in networking, we have a complex stack, with infrastructure at the bottom and applications on top. Typically, we end up writing a lot of glue code to make this stack work. But glue code can be brittle: it needs careful operation and maintenance.

What we need is a common interface that allows the parts of this stack to communicate with one another. Ideally, it would be something the industry has already standardized on, something that minimizes the friction between the different layers.

This is the power of SQL. Like IP, SQL is a common interface.
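One way to picture SQL as a common interface: the small "tool" below knows only SQL and the generic Python database API (PEP 249), so in principle it does not care which engine sits underneath. This is a sketch under assumptions. run_report, REPORT_SQL, make_demo_connection, and the readings table are hypothetical names, and the demo runs against the standard-library sqlite3 module only; other drivers may differ in details such as parameter style.

```python
import sqlite3

# The only thing the tool knows about the data: a plain SQL query.
REPORT_SQL = (
    "SELECT location, COUNT(*) FROM readings "
    "GROUP BY location ORDER BY location"
)


def run_report(conn):
    """Run the report through any PEP 249 (DB-API 2.0) style connection."""
    cur = conn.cursor()
    try:
        cur.execute(REPORT_SQL)
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


def make_demo_connection():
    """Build an in-memory SQLite database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (location TEXT, temperature REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("office", 21.5), ("office", 22.0), ("garage", 15.0)],
    )
    return conn


if __name__ == "__main__":
    print(run_report(make_demo_connection()))
```

The design choice is the point: run_report contains no engine-specific code, which is what lets visualization tools, ORMs, and web frameworks plug into any database that speaks SQL.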

But SQL is actually much more than IP, because data also needs to be analyzed by humans. And true to one of the initial goals set by its creators, SQL is highly readable.

Is SQL perfect? No, but most people in the community already know the language. And although engineers are already working on more natural-language interfaces, what will those systems ultimately connect to? SQL.

So there is another layer at the very top of the stack. That layer is us humans.

SQL is back

SQL is back. Not just because writing glue code to stitch NoSQL tools together is annoying. Not just because learning a host of new languages is hard. Not just because standards bring all kinds of advantages.

But also because the world is filled with data. It surrounds us and binds us. At first, we relied on our human senses and nervous systems to process it. Now, our software and hardware systems are also becoming smart enough to help us. As we collect more and more data to better understand our world, the complexity of the systems that store, process, analyze, and visualize that data will only continue to grow.

We can either live in a world of brittle systems and a million different interfaces, or we can continue to embrace SQL and restore balance to the Force.
