A brief introduction to Spark SQL

Updated 2025-01-19. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article gives a brief introduction to Spark SQL. Many people have questions about Spark SQL in their day-to-day work, so the editor consulted a range of materials and put together a simple, easy-to-follow overview. I hope it helps answer those questions. Let's get started.

What is it?

Spark SQL was introduced in Spark 1.0 and is one of the most active components in the Spark ecosystem. It lets you use Spark to store and process structured data. Structured data can come from external sources such as Hive, JSON, and Parquet (JDBC is supported starting with Spark 1.2), or it can be obtained by adding a schema to an existing RDD.
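As a minimal sketch of adding a schema to an existing RDD, the PySpark snippet below parses raw text lines into tuples and then attaches column names and types. The sample data, the `parse_line` helper, and the function name are hypothetical, and it assumes a recent PySpark where `SparkSession` is the entry point (the Spark 1.x releases discussed here used `SQLContext` instead).

```python
from typing import List, Tuple

# Hypothetical raw input: "name,age" text lines, as they might sit in a text RDD.
RAW_LINES = ["Alice,34", "Bob,17", "Carol,52"]


def parse_line(line: str) -> Tuple[str, int]:
    """Turn one raw text line into a typed (name, age) tuple."""
    name, age = line.split(",")
    return name.strip(), int(age)


def rdd_to_dataframe_demo() -> List[Tuple[str, int]]:
    """Attach a schema to an RDD and return the resulting rows."""
    # PySpark is imported lazily so parse_line stays usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    spark = SparkSession.builder.master("local[1]").appName("schema-demo").getOrCreate()
    try:
        # An RDD of plain tuples carries no column names or types.
        rdd = spark.sparkContext.parallelize(RAW_LINES).map(parse_line)
        schema = StructType([
            StructField("name", StringType(), nullable=False),
            StructField("age", IntegerType(), nullable=False),
        ])
        # Adding the schema turns the RDD into a DataFrame.
        people = spark.createDataFrame(rdd, schema)
        return [(row["name"], row["age"]) for row in people.collect()]
    finally:
        spark.stop()
```

Calling `rdd_to_dataframe_demo()` requires a local Spark installation; the parsing step can be tried on its own.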

Spark SQL currently uses the Catalyst optimizer to optimize SQL statements and produce a better execution plan.
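To give a feel for what "optimizing a statement" can mean, here is a toy rule in plain Python that rewrites an expression tree into an equivalent, cheaper one (constant folding). It is only an illustration of rule-based tree rewriting, not Catalyst's actual code; the `Literal`, `Column`, and `Add` classes and `fold_constants` are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Column, Add]


def fold_constants(expr: Expr) -> Expr:
    """Rewrite the tree bottom-up, replacing Add(Literal, Literal) with one Literal."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            return Literal(left.value + right.value)
        return Add(left, right)
    return expr


# x + (1 + 2) becomes x + 3, so the constant part is computed once, not per row.
plan = Add(Column("x"), Add(Literal(1), Literal(2)))
optimized = fold_constants(plan)
```

A real optimizer applies many such rules repeatedly, but each rule has this same shape: match a pattern in the tree and return an equivalent tree.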

Part of the ecosystem

More importantly, because it is based on DataFrame, Spark SQL integrates seamlessly with Spark Streaming, MLlib, and other components, so you can use a single technology stack for batch processing, streaming, and interactive queries.

Spark SQL allows you to query structured data inside Spark programs using either SQL or the familiar DataFrame API. It is available from Java, Scala, Python, and R.
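The two query styles can be sketched side by side in PySpark. The sample rows, the view name `people`, and the function name are hypothetical, and the snippet assumes a recent PySpark with `SparkSession`.

```python
from typing import Dict, List

# Hypothetical sample rows: (name, age).
PEOPLE = [("Alice", 34), ("Bob", 17), ("Carol", 52)]

# The question "who is at least 18?" written as SQL.
ADULTS_SQL = "SELECT name FROM people WHERE age >= 18 ORDER BY name"


def adult_names() -> Dict[str, List[str]]:
    """Run the same query through SQL and through the DataFrame API."""
    # PySpark is imported lazily so this module loads even without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("sql-vs-dataframe").getOrCreate()
    try:
        people = spark.createDataFrame(PEOPLE, ["name", "age"])
        # Registering a temporary view makes the DataFrame visible to SQL.
        people.createOrReplaceTempView("people")

        via_sql = spark.sql(ADULTS_SQL)
        via_api = people.filter(people.age >= 18).select("name").orderBy("name")

        return {
            "sql": [row["name"] for row in via_sql.collect()],
            "dataframe": [row["name"] for row in via_api.collect()],
        }
    finally:
        spark.stop()
```

Both entries of the returned dictionary should hold the same names, since the SQL text and the DataFrame calls describe the same query.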

Hive vs Shark vs Spark SQL

Hive is the predecessor of Shark and Shark is the predecessor of Spark SQL.

According to test data from the Berkeley lab, Shark's in-memory computing performance is about 100 times that of Hive, and even its disk-based performance is about 10 times that of Hive. Spark SQL improves on Shark further.

Hive is the foundational data warehouse framework on Hadoop and one of the earliest SQL-on-Hadoop tools. However, Hive is based on MapReduce, and a large number of intermediate results are written to disk during computation. This consumes a lot of I/O and greatly reduces efficiency. For this reason, many optimized SQL-on-Hadoop tools appeared, the most prominent of which was Shark.

Shark is built directly on Apache Hive. It extends Hive and modifies three modules of the Hive architecture (memory management, physical planning, and execution) so that they run on the Spark engine. As a result, it supports almost all Hive features, data formats, and UDFs, and reuses the Hive parser, query optimizer, and so on (as shown in the following figure).

In 2014, Databricks announced a full shift from Shark to Spark SQL.

For Hive compatibility, Spark SQL depends only on the HQL parser, the Hive Metastore, and Hive SerDes. In other words, once HQL has been parsed into an abstract syntax tree (AST), everything from that point on is taken over by Spark SQL, and Catalyst is responsible for generating and optimizing the execution plan.

In addition to supporting existing Hive scripts, Spark SQL has a built-in, streamlined SQL parser and a Scala DSL. If we use the built-in Spark SQL dialect or the Scala DSL to manipulate native RDD objects, we do not need to depend on Hive at all. Spark SQL also borrows strengths from Shark, such as in-memory columnar storage.
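Columnar in-memory storage can be pictured with a small plain-Python sketch: the same records are kept as one list per column rather than one tuple per row. This only illustrates the layout idea and is not how Spark SQL or Shark implement it; the data and the `to_columns` helper are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Row-oriented layout: one tuple per record.
ROWS: List[Tuple[str, int, str]] = [
    ("Alice", 34, "Paris"),
    ("Bob", 17, "Oslo"),
    ("Carol", 52, "Lima"),
]
COLUMN_NAMES = ("name", "age", "city")


def to_columns(rows: Sequence[tuple], names: Sequence[str]) -> Dict[str, list]:
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


columns = to_columns(ROWS, COLUMN_NAMES)

# An aggregate over "age" now reads a single list and never touches names or cities.
average_age = sum(columns["age"]) / len(columns["age"])
```

Because each column holds values of one type, a column-oriented layout is also easier to compress and scan than a list of mixed-type rows.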

DataFrames and SQL provide a common way to access a variety of data sources, including Hive, Avro, Parquet, ORC, JSON, and JDBC. You can even join data across these sources.
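Combining data from two different sources might look like the PySpark sketch below, which reads one table from JSON and another from Parquet and joins them. The file names, sample data, and helper functions are hypothetical, and it assumes a recent PySpark with `SparkSession`.

```python
import json
import os
import tempfile
from typing import List, Tuple

# Hypothetical sample data for the two sources.
USERS = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
ORDERS = [(1, 30.0), (1, 12.5), (2, 8.0)]


def write_json_lines(path: str, records: List[dict]) -> None:
    """Write one JSON object per line, the layout spark.read.json reads by default."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def join_json_with_parquet() -> List[Tuple[str, float]]:
    """Read users from JSON and orders from Parquet, then join them on the user id."""
    # PySpark is imported lazily so write_json_lines works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("join-demo").getOrCreate()
    try:
        with tempfile.TemporaryDirectory() as workdir:
            users_path = os.path.join(workdir, "users.json")
            orders_path = os.path.join(workdir, "orders.parquet")

            # Prepare the two sources in different formats.
            write_json_lines(users_path, USERS)
            spark.createDataFrame(ORDERS, ["user_id", "amount"]).write.parquet(orders_path)

            # Both sources are loaded through the same DataFrame reader interface.
            users = spark.read.json(users_path)
            orders = spark.read.parquet(orders_path)

            joined = users.join(orders, users["id"] == orders["user_id"])
            rows = joined.select("name", "amount").collect()
            return [(row["name"], row["amount"]) for row in rows]
    finally:
        spark.stop()
```

Once loaded, the two DataFrames are joined the same way regardless of the format they came from.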

Reference

https://spark.apache.org/sql/

That concludes this brief introduction to Spark SQL. I hope it has answered your questions. Combining theory with practice is the best way to learn, so go and try it out!
