I recently used Spark SQL in a project to do some statistical analysis of our data, and I am writing the notes down here while I have a spare moment. Straight to the code:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object SparkSQL {

  // Two case classes, A and B:
  // A holds a user's basic information: customer id, ID number, and gender.
  // B holds a user's transactions: customer id, amount spent, and status.
  case class A(custom_id: String, id_code: String, sex: String)
  case class B(custom_id: String, money: String, status: Int)

  def main(args: Array[String]): Unit = {
    // With small data volumes, tests showed local[*] to be more efficient
    // than both plain local mode and running on YARN.
    val sc = new SparkContext("local[*]", "SparkSQL")
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    // Build the two RDDs. Fields in the source files are separated by a
    // doubled char(1), i.e. "\u0001\u0001"; pick out the relevant columns.
    val A_RDD = sc.textFile("hdfs://172.16.30.2:25000/usr/tmpdata/A.dat")
      .map(_.split("\u0001\u0001"))
      .map(t => A(t(0), t(4), t(13)))
    val B_RDD = sc.textFile("hdfs://172.16.30.3:25000/usr/tmpdata/B.dat")
      .map(_.split("\u0001\u0001"))
      .map(t => B(t(16), t(33), t(71).toInt))

    // The implicit conversion turns the plain RDDs into SchemaRDDs,
    // which can then be registered as temporary tables.
    A_RDD.registerTempTable("A_RDD")
    B_RDD.registerTempTable("B_RDD")

    // Safe string-to-int conversion; unparseable values become 9999.
    def toInt(s: String): Int = {
      try { s.toInt } catch { case e: Exception => 9999 }
    }

    // Length of the ID number (used to keep only 18-digit IDs).
    def myfun2(id_code: String): Int = id_code.length

    // Chinese zodiac animal, derived from the birth year in the ID number.
    // Note that Scala/Java substring(begin, end) is 0-based and
    // end-exclusive, unlike Oracle's SUBSTR.
    def myfun5(id_code: String): String = {
      var year = ""
      if (id_code.length == 18) {
        val md = toInt(id_code.substring(6, 10)) // birth year, e.g. 1985
        val i = 1900                             // 1900 was a year of the rat
        val years = Array("rat", "ox", "tiger", "rabbit", "dragon", "snake",
          "horse", "goat", "monkey", "rooster", "dog", "pig")
        year = years((md - i) % years.length)
      }
      year
    }

    // Age bracket, again from the birth year in the ID number.
    def myfun3(id_code: String): String = {
      var rt = ""
      if (id_code.length == 18) {
        val y = toInt(id_code.substring(6, 10))
        if (y >= 1910 && y < 1920) rt = "1910 ~ 1920"
        else if (y >= 1920 && y < 1930) rt = "1920 ~ 1930"
        else if (y >= 1930 && y < 1940) rt = "1930 ~ 1940"
        else if (y >= 1940 && y < 1950) rt = "1940 ~ 1950"
        else if (y >= 1950 && y < 1960) rt = "1950 ~ 1960"
        else if (y >= 1960 && y < 1970) rt = "1960 ~ 1970"
        else if (y >= 1970 && y < 1980) rt = "1970 ~ 1980"
        else if (y >= 1980 && y < 1990) rt = "1980 ~ 1990"
        else if (y >= 1990 && y < 2000) rt = "1990 ~ 2000"
        else if (y >= 2000 && y < 2010) rt = "2000 ~ 2010"
        else if (y >= 2010) rt = "2010 ~"
      }
      rt
    }

    // Consumption bracket. Only the boundary values survived in my notes
    // (10000, 50000, 60000, 80000, 100000, 150000, 200000, 1000000,
    // 5000000, 10000000); the labels below are mechanically derived from
    // them. The comparison is done numerically via toInt, since comparing
    // the raw money strings lexicographically would be wrong.
    def myfun4(money: String): String = {
      val m = toInt(money)
      if (m < 10000) "< 10000"
      else if (m < 50000) "10000 ~ 50000"
      else if (m < 60000) "50000 ~ 60000"
      else if (m < 80000) "60000 ~ 80000"
      else if (m < 100000) "80000 ~ 100000"
      else if (m < 150000) "100000 ~ 150000"
      else if (m < 200000) "150000 ~ 200000"
      else if (m < 1000000) "200000 ~ 1000000"
      else if (m < 5000000) "1000000 ~ 5000000"
      else if (m < 10000000) "5000000 ~ 10000000"
      else ">= 10000000"
    }

    // Western star sign from the month/day part (positions 10-13) of the
    // ID number. The boundaries 421, 522, 622, 723, 824, 1024, 1123 and
    // 1223 are from my notes; the remaining ones are filled in from the
    // standard star-sign dates.
    def myfun1(id_code: String): String = {
      var rt = ""
      if (id_code.length == 18) {
        val md = toInt(id_code.substring(10, 14)) // MMDD
        if (md >= 321 && md < 421) rt = "Aries"
        else if (md >= 421 && md < 522) rt = "Taurus"
        else if (md >= 522 && md < 622) rt = "Gemini"
        else if (md >= 622 && md < 723) rt = "Cancer"
        else if (md >= 723 && md < 824) rt = "Leo"
        else if (md >= 824 && md < 924) rt = "Virgo"
        else if (md >= 924 && md < 1024) rt = "Libra"
        else if (md >= 1024 && md < 1123) rt = "Scorpio"
        else if (md >= 1123 && md < 1223) rt = "Sagittarius"
        else if (md >= 1223 || md < 120) rt = "Capricorn"
        else if (md >= 120 && md < 219) rt = "Aquarius"
        else rt = "Pisces"
      }
      rt
    }

    // Register the functions so they can be called from SQL.
    sqlContext.registerFunction("fun1", (x: String) => myfun1(x))
    sqlContext.registerFunction("fun2", (y: String) => myfun2(y))
    sqlContext.registerFunction("fun3", (z: String) => myfun3(z))
    sqlContext.registerFunction("fun4", (m: String) => myfun4(m))
    sqlContext.registerFunction("fun5", (n: String) => myfun5(n))

    // Star-sign statistics. Note that the fun2(id_code) = 18 filter is
    // required: fun1 itself returns "" for malformed IDs, so the grouping
    // field is restricted, but without the where clause count(*) would
    // still include those rows.
    val result1 = sqlContext.sql(
      "select fun1(id_code), count(*) from A_RDD t " +
      "where fun2(id_code) = 18 group by fun1(id_code)")

    // Chinese zodiac statistics.
    val result2 = sqlContext.sql(
      "select fun5(a.id_code), count(*) from A_RDD a " +
      "where fun2(a.id_code) = 18 group by fun5(a.id_code)")

    // Number of customers and total amount per consumption bracket.
    val result3 = sqlContext.sql(
      "select fun4(a.money), count(distinct a.custom_id), sum(a.money) " +
      "from B_RDD a where a.status = 8 and a.custom_id in " +
      "(select b.custom_id from A_RDD b where fun2(b.id_code) = 18) " +
      "group by fun4(a.money)")

    // Print the results...
    result3.collect().foreach(println)
    // ...or save them to the local filesystem / HDFS.
    result2.saveAsTextFile("file:///tmp/age")
  }
}
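A quick aside on the substring remark in the comments above: Scala strings use Java's String.substring(begin, end), which is 0-based and end-exclusive, whereas Oracle's SUBSTR takes a 1-based start position and a length. A minimal illustration (the ID value below is made up):

object SubstringDemo {
  def main(args: Array[String]): Unit = {
    val id = "110101198507124321"  // made-up 18-digit ID, for illustration only
    // Indices 6..9, end-exclusive: the birth year.
    println(id.substring(6, 10))   // prints "1985"
    // Indices 10..13: the month/day part.
    println(id.substring(10, 14))  // prints "0712"
    // The Oracle equivalent of the first call would be SUBSTR(id, 7, 4).
  }
}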
When testing result3, an error was found:
Exception in thread "main" java.lang.RuntimeException: [1.101] failure: ``NOT'' expected but `select' found

select fun5(a.id_code), count(*) from A_RDD a where fun2(a.id_code) = 18 and a.custom_id IN (select distinct b.custom_id from B_RDD b where b.status = 8) group by fun5(a.id_code)
                                                                                             ^
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
    at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:74)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:267)
    at SparkSQL$.main(SparkSQL.scala:198)
    at SparkSQL.main(SparkSQL.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
This is still at the debugging stage, but the likely cause is that this version of Spark SQL simply cannot parse an IN (select ...) subquery in the WHERE clause (just a guess on my part).
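For what it is worth, the usual workaround when a parser rejects IN (select ...) is to materialize the subquery as its own temporary table and turn the predicate into an ordinary join. A sketch along those lines, untested against this exact setup; ELIGIBLE is a name I made up for the intermediate table:

// Materialize the subquery, then join instead of using IN.
// "ELIGIBLE" is a hypothetical table name used only in this sketch.
val eligible = sqlContext.sql(
  "select distinct b.custom_id from A_RDD b where fun2(b.id_code) = 18")
eligible.registerTempTable("ELIGIBLE")

val result3b = sqlContext.sql(
  "select fun4(a.money), count(distinct a.custom_id), sum(a.money) " +
  "from B_RDD a join ELIGIBLE e on a.custom_id = e.custom_id " +
  "where a.status = 8 group by fun4(a.money)")
result3b.collect().foreach(println)

Because ELIGIBLE carries distinct customer ids, the join does not duplicate rows from B_RDD, so the aggregates should match what the IN form would have produced.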
If anyone with more experience sees what I am missing here, advice is very welcome.