
Examples of converting between RDD and DataFrame in Spark SQL

2025-03-31 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains, through worked examples, how to convert between RDD and DataFrame in Spark SQL. The explanations are kept simple and clear; follow along to study the conversion patterns.

1. The first way: converting an RDD to a DataFrame by reflection

1.1 Official website

1.2 Explanation: with reflection, all of the schema information is defined in a case class.

1.3 Code:

```scala
package core

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Test")
      .master("local[2]")
      .getOrCreate()
    val mess = spark.sparkContext.textFile("file:///D:\\test\\person.txt")
    import spark.implicits._
    val result = mess.map(_.split(","))
      .map(x => Info(x(0).toInt, x(1), x(2).toInt))
      .toDF()
    // result.map(x => x(0)).show()  // worked in 1.x; in 2.x go through .rdd first
    result.rdd.map(x => x(0)).collect().foreach(println)
    result.rdd.map(x => x.getAs[Int]("id")).collect().foreach(println)
  }
}

case class Info(id: Int, name: String, age: Int)
```

1.4 Note: before Spark 2.2 the number of constructor parameters of the case class was limited; from 2.2 on there is no such limit.
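For a quick, file-free check of the reflection approach, the same pipeline can be run on an in-memory Seq; this is a minimal sketch, and the rows below are made-up sample data rather than the article's person.txt:

```scala
package demo

import org.apache.spark.sql.SparkSession

// Top-level case class, as required for reflective schema inference.
case class Info(id: Int, name: String, age: Int)

object ReflectDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReflectDemo")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    // Parallelize made-up comma-separated lines instead of reading a file,
    // so the example runs without any input data on disk.
    val lines = spark.sparkContext.parallelize(Seq("1,alice,20", "2,bob,30"))
    val df = lines.map(_.split(","))
      .map(x => Info(x(0).toInt, x(1), x(2).toInt))
      .toDF()
    df.printSchema() // columns id, name, age inferred from the case class
    df.show()
    spark.stop()
  }
}
```

Running it requires a Spark dependency on the classpath; the schema is inferred entirely from the Info case class fields.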

2. The second way: programmatically specifying the schema

2.1 Official website

2.2 Explanation: the schema information is built programmatically and applied to an RDD of Row.

2.3 Steps

2.4 Step-by-step: create an RDD of Row from the original RDD (e.g. one loaded with textFile); build a StructType that matches the data structure in each Row, one StructField per column; then associate the schema with the RDD through createDataFrame.

2.5 Source code interpretation: StructType

2.6 Source code interpretation: StructField. A StructType can be understood as a list of StructField.

2.7 The final code:

```scala
package core

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}

object TestRDD2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TestRDD2")
      .master("local[2]")
      .getOrCreate()
    val mess = spark.sparkContext.textFile("file:///D:\\test\\person.txt")
    val result = mess.map(_.split(","))
      .map(x => Row(x(0).toInt, x(1), x(2).toInt))
    val schema = StructType(Array(
      StructField("id", IntegerType, true),
      StructField("name", StringType, true),
      StructField("age", IntegerType, true)
    ))
    val info = spark.createDataFrame(result, schema)
    info.show()
  }
}
```

2.8 Classic error

2.9 Fixing it: the schema you define yourself must match the types in each Row. This version triggers the error, because id and age stay as String while the schema declares them as Int:

```scala
val result = mess.map(_.split(","))
  .map(x => Row(x(0), x(1), x(2)))
val schema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)
))
```

Be careful to convert the types explicitly:

```scala
val result = mess.map(_.split(","))
  .map(x => Row(x(0).toInt, x(1), x(2).toInt))
```

3. Points that frequently cause errors

3.1 spark-shell vs. compiled code: spark-shell imports the implicit conversions for you, so df.select('name).show works there directly; in compiled code it does not, and you must add import spark.implicits._ yourself.

3.2 show source code: show() defaults to truncate = true and displays at most 20 rows, cutting each cell off at 20 characters. Passing false shows full cell contents: show(30, false) displays 30 untruncated rows, while show(5) displays 5 rows but still truncates long cells.
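The truncation behavior of show can be demonstrated with a minimal sketch; the DataFrame below is made-up data, with a column long enough to trigger the 20-character cutoff:

```scala
import org.apache.spark.sql.SparkSession

object ShowDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ShowDemo")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    // A made-up 30-character string column to exercise truncation.
    val df = Seq(("a" * 30, 1), ("b" * 30, 2)).toDF("text", "n")
    df.show()          // default: up to 20 rows, cells truncated to 20 chars
    df.show(false)     // full cell contents, no truncation
    df.show(1, false)  // one row, untruncated
    spark.stop()
  }
}
```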

3.3 select method source code

3.4 Where the select variants come from:

```scala
df.select("name").show(false)
// the next two forms require import spark.implicits._
df.select('name).show(false)
df.select($"name").show(false)
```

The first form resolves to the String overload of select; the other two resolve to the Column overload.

3.5 head source code: head defaults to returning the first row; pass a count to fetch several, e.g. head(3).

3.6 first() source code: it simply delegates to head under the hood.

3.7 sort source code: sort is ascending by default; use a descending column expression for descending order.
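Ascending and descending sorts can be sketched as follows; the rows are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object SortDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SortDemo")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    // Made-up rows for illustration.
    val df = Seq((1, "alice", 30), (2, "bob", 20)).toDF("id", "name", "age")
    df.sort("age").show()                    // ascending, the default
    df.sort($"age".desc).show()              // descending
    df.sort($"age".desc, $"id".asc).show()   // mixed ordering across columns
    spark.stop()
  }
}
```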

4. Operating through SQL

4.1 Temporary views (official website)

4.2 Global temporary views: a global view is registered in the global_temp database, so queries must prefix the view name with global_temp.
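The two kinds of views can be sketched like this; the table and column names are taken from the earlier person example, with made-up rows:

```scala
import org.apache.spark.sql.SparkSession

object ViewDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ViewDemo")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    val df = Seq((1, "alice", 20), (2, "bob", 30)).toDF("id", "name", "age")

    // Session-scoped temporary view: visible only in this SparkSession.
    df.createOrReplaceTempView("people")
    spark.sql("select name from people where age > 25").show()

    // Global temporary view: shared across sessions, but it lives in the
    // global_temp database and must be qualified accordingly.
    df.createGlobalTempView("people_g")
    spark.sql("select name from global_temp.people_g where age > 25").show()
    spark.newSession().sql("select count(*) from global_temp.people_g").show()
    spark.stop()
  }
}
```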

5. Miscellaneous

5.1 Error

5.2 Cause and code:

```scala
val spark = SparkSession.builder()
  .appName("Test")
  .master("local[2]")
  .getOrCreate()
val mess = spark.sparkContext.textFile("file:///D:\\test\\person.txt")
import spark.implicits._
val result = mess.map(_.split(","))
  .map(x => Info(x(0).toInt, x(1), x(2).toInt))
  .toDF()
// result.map(x => x(0)).show()  // fine in 1.x; in 2.x you must go through .rdd
result.rdd.map(x => x(0)).collect().foreach(println)
```

Two ways to pull a field out of each Row:

```scala
result.rdd.map(x => x(0)).collect().foreach(println)
result.rdd.map(x => x.getAs[Int]("id")).collect().foreach(println)
```

5.3 Pay attention to escape characters in delimiters: with a | delimiter you must escape it when splitting, otherwise the data comes out wrong.

Thank you for reading. That covers the usage of RDD and DataFrame conversion examples in Spark SQL. After studying this article you should have a deeper understanding of these conversions; verify the specifics in practice. More related articles will follow, so stay tuned!
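The delimiter-escaping point can be shown with plain Scala and no Spark at all, since String.split takes a regular expression and | is a regex metacharacter:

```scala
object SplitDemo {
  def main(args: Array[String]): Unit = {
    val line = "1|alice|20"
    // Wrong: "|" is regex alternation of two empty patterns, so the line
    // is split between every character instead of at the pipes.
    println(line.split("|").mkString(" / "))
    // Right: escape the pipe so it is matched literally.
    val parts = line.split("\\|")
    println(parts.length)   // 3
    parts.foreach(println)  // 1, alice, 20
  }
}
```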
