Today I would like to share some knowledge about how to use SparkSQL. The content is detailed and the logic is clear. Many people are still unfamiliar with this topic, so I am sharing this article for your reference; I hope you get something out of it. Let's take a look.
I. Introduction to SparkSQL
1. SparkSQL
SparkSQL is a Spark module for processing structured data. It provides a data abstraction called DataFrame (the core programming abstraction), and it acts as a distributed SQL query engine.
Spark SQL converts a SQL statement into a job and submits it to the cluster to run, similar to the way Hive executes queries. A minimal sketch of this usage follows.
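As a hedged illustration (the session settings and sample data below are mine, not the article's), this is roughly what "SQL in, distributed job out" looks like:

import org.apache.spark.sql.SparkSession

object SqlEngineSketch {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration; on a real cluster, master comes from spark-submit
    val spark = SparkSession.builder()
      .appName("sql engine sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Register a small in-memory table, then query it with plain SQL;
    // Spark SQL compiles the query into a distributed job, much as Hive would
    Seq(("Zhang San", 10), ("Li Si", 15)).toDF("name", "age")
      .createOrReplaceTempView("person")
    spark.sql("select name from person where age > 10").show()

    spark.stop()
  }
}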
2. How SparkSQL works
Spark SQL statements are converted into RDD operations and then submitted to the cluster for execution. The resulting plan can be inspected, as the sketch below shows.
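To see this translation concretely, Dataset.explain(true) prints the parsed, analyzed, optimized, and physical plans; the physical plan is what ultimately runs as RDD operations on the cluster. A small sketch (the sample data and names are illustrative, not from the article):

import org.apache.spark.sql.SparkSession

object PlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("plan sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("Zhang San", 10), ("Li Si", 15)).toDF("name", "age")
    // Prints all plan stages, from the parsed logical plan down to the physical one
    df.where($"age" > 10).select("name").explain(true)

    spark.stop()
  }
}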
3. Characteristics of SparkSQL
(1) Easy to integrate: Spark SQL ships as part of Spark itself.
(2) Unified data access: JSON, CSV, JDBC, Parquet, and other sources are all read through the same API (see the sketch after this list).
(3) Compatible with Hive.
(4) Standard data connectivity: JDBC and ODBC.
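For point (2), a hedged sketch of the unified reader API; every file path and connection setting below is a placeholder, not something from the original article:

import org.apache.spark.sql.{DataFrame, SparkSession}

object UnifiedReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unified read")
      .master("local[*]")
      .getOrCreate()

    // The same spark.read entry point covers every supported format
    val fromJson: DataFrame    = spark.read.json("people.json")
    val fromCsv: DataFrame     = spark.read.option("header", "true").csv("people.csv")
    val fromParquet: DataFrame = spark.read.parquet("people.parquet")
    val fromJdbc: DataFrame    = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test") // placeholder connection
      .option("dbtable", "person")
      .option("user", "root")
      .option("password", "secret")
      .load()

    fromJson.show()
    spark.stop()
  }
}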
II. Application of SparkSQL
package sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.junit.Test

class Intro {

  @Test
  def dsIntro(): Unit = {
    val spark: SparkSession = new SparkSession.Builder()
      .appName("ds intro")
      .master("local[6]")
      .getOrCreate()

    // Import implicit conversions (enables toDS/toDF and the 'column syntax)
    import spark.implicits._

    val sourceRDD: RDD[Person] = spark.sparkContext.parallelize(
      Seq(Person("Zhang San", 10), Person("Li Si", 15)))
    val personDS: Dataset[Person] = sourceRDD.toDS()
    // personDS.printSchema() // would print the inferred schema

    val resultDS: Dataset[Person] = personDS.where('age > 10)
      .select('name, 'age)
      .as[Person]
    resultDS.show()
  }

  @Test
  def dfIntro(): Unit = {
    val spark: SparkSession = new SparkSession.Builder()
      .appName("df intro")
      .master("local")
      .getOrCreate()

    import spark.implicits._

    val sourceRDD: RDD[Person] = spark.sparkContext.parallelize(
      Seq(Person("Zhang San", 10), Person("Li Si", 15)))
    val df: DataFrame = sourceRDD.toDF() // implicit conversion
    df.createOrReplaceTempView("person") // register a temporary view

    // The upper bound of the predicate is garbled in the source; "age <= 20" is a guess
    val resultDF: DataFrame = spark.sql("select name from person where age >= 10 and age <= 20")
    resultDF.show()
  }

  @Test
  def database2(): Unit = {
    val spark: SparkSession = new SparkSession.Builder()
      .master("local[6]")
      .appName("database2")
      .getOrCreate()

    import spark.implicits._

    val dataset: Dataset[Person] = spark.createDataset(
      Seq(Person("Zhang San", 10), Person("Li Si", 20)))

    // Whatever type the Dataset holds, the final execution plan operates on InternalRow:
    // queryExecution.toRdd directly returns the RDD of the analyzed and resolved plan
    val executionRdd: RDD[InternalRow] = dataset.queryExecution.toRdd
    // dataset.rdd converts the underlying RDD back to the Dataset's element type via a decoder
    val typedRdd: RDD[Person] = dataset.rdd

    println(executionRdd.toDebugString)
    println()
    println()
    println(typedRdd.toDebugString)
  }

  @Test
  def database3(): Unit = {
    // 1. Create a SparkSession
    val spark: SparkSession = new SparkSession.Builder()
      .appName("database1")
      .master("local[6]")
      .getOrCreate()

    // 2. Import implicit conversions
    import spark.implicits._

    val dataFrame: DataFrame = Seq(Person("zhangsan", 15), Person("lisi", 20)).toDF()

    // 3. See what a DataFrame can do
    // equivalent to: select name from ... where age > 10
    dataFrame.where('age > 10).select('name).show()
  }

  // @Test
  // def database4(): Unit = {
  //   // 1. Create a SparkSession
  //   val spark: SparkSession = new SparkSession.Builder()
  //     .appName("database1")
  //     .master("local[6]")
  //     .getOrCreate()
  //   // 2. Import implicit conversions
  //   import spark.implicits._
  //   val personList = Seq(Person("zhangsan", 15), Person("lisi", 20))
  //   // 1. toDF
  //   val df1: DataFrame = personList.toDF()
  //   val df2: DataFrame = spark.sparkContext.parallelize(personList).toDF()
  //   // 2. createDataFrame
  //   val df3: DataFrame = spark.createDataFrame(personList)
  //   // 3. read (the path is elided in the source)
  //   val df4: DataFrame = spark.read.csv("")
  //   df4.show()
  // }

  // toDF() converts to a DataFrame, toDS() converts to a Dataset.
  // A DataFrame, i.e. Dataset[Row], represents weakly typed operations: column types
  // live inside Row and are only checked at run time. A Dataset[T] represents strongly
  // typed operations and is checked at compile time.
  @Test
  def database4(): Unit = {
    // 1. Create a SparkSession
    val spark: SparkSession = new SparkSession.Builder()
      .appName("database1")
      .master("local[6]")
      .getOrCreate()

    // 2. Import implicit conversions
    import spark.implicits._

    val personList = Seq(Person("zhangsan", 15), Person("lisi", 20))

    // A DataFrame is weakly typed, so mistakes are not caught at compile time
    val df: DataFrame = personList.toDF()
    // A Dataset is strongly typed
    val ds: Dataset[Person] = personList.toDS()
    ds.map((person: Person) => Person(person.name, person.age))
  }

  @Test
  def row(): Unit = {
    // 1. How is a Row created, and what is it?
    //    A Row must be paired with a Schema object to have column names.
    val p: Person = Person("zhangsan", 15)
    val row: Row = Row("zhangsan", 15)

    // 2. How to get data out of a Row
    row.getString(0)
    row.getInt(1)

    // 3. Row also behaves like a case class and supports pattern matching
    row match {
      case Row(name, age) => println(name, age)
    }
  }
}

case class Person(name: String, age: Int)
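The row() test notes that a Row only gains column names when paired with a schema. As a hedged follow-up sketch (not part of the original code), this is how an RDD[Row] plus an explicit StructType becomes a named, queryable DataFrame:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RowSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("row schema sketch")
      .master("local[*]")
      .getOrCreate()

    val rows = spark.sparkContext.parallelize(
      Seq(Row("zhangsan", 15), Row("lisi", 20)))
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)))

    // The schema supplies the column names the bare Rows lack
    val df = spark.createDataFrame(rows, schema)
    df.where("age > 15").show()

    spark.stop()
  }
}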
That is all the content of this article on how to use SparkSQL. Thank you for reading! I hope you gained something from it.