
How to read hbase data and convert it to dataFrame by spark

2025-04-05 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how Spark reads HBase data and converts it into a DataFrame. The method introduced here is simple, fast, and practical; interested readers may wish to take a look and follow along.

Over the past two days we have looked into having Spark read HBase data directly and convert it into a DataFrame, for two reasons:

1. The company's data is stored mainly in HBase.

2. With a DataFrame, computations are easier to express.

Although the HBase project provides an hbase-spark module, it had not yet been released at the time of writing; the current project needed this capability, and there are few references on the subject online.

The code is therefore given below, for reference only.

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

/**
 * Created by seagle on 6-28-16.
 */
object HBaseSpark {

  def main(args: Array[String]): Unit = {

    // Run in local mode for easy testing
    val sparkConf = new SparkConf().setMaster("local").setAppName("HBaseTest")

    // Create the HBase configuration and point it at the source table
    val hBaseConf = HBaseConfiguration.create()
    hBaseConf.set(TableInputFormat.INPUT_TABLE, "bmp_ali_customer")

    // Create the Spark context and SQL context
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Read from the data source as an RDD of (rowkey, Result) pairs
    val hbaseRDD = sc.newAPIHadoopRDD(hBaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Map the data to a table: convert the RDD to a DataFrame with a schema
    val shop = hbaseRDD.map(r => (
      Bytes.toString(r._2.getValue(Bytes.toBytes("info"), Bytes.toBytes("customer_id"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("info"), Bytes.toBytes("create_id")))
    )).toDF("customer_id", "create_id")

    shop.registerTempTable("shop")

    // Test
    val df2 = sqlContext.sql("SELECT customer_id FROM shop")
    df2.foreach(println)
  }
}

The code runs on two premises: 1. the spark-sql jar is referenced by the project; 2. hbase-site.xml is configured and placed in the root directory of the project.

At this point, you should have a deeper understanding of how Spark reads HBase data and converts it into a DataFrame. The best way to consolidate it is to try it in practice.
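One pitfall in the mapping step above: Result.getValue returns null when a row has no cell for the requested column, and Bytes.toString then throws a NullPointerException. Below is a minimal null-safe sketch in plain Scala (the object and method names are our own, and it assumes cell values are UTF-8 encoded strings):

```scala
// Sketch: null-safe decoding of a raw HBase cell value.
// getValue returns null when the column is absent for a row;
// wrapping the byte array in Option avoids a NullPointerException downstream.
object CellDecoder {
  def decodeCell(raw: Array[Byte]): Option[String] =
    Option(raw).map(bytes => new String(bytes, "UTF-8"))
}
```

In the map above one could then write, for example, decodeCell(r._2.getValue(...)).getOrElse("") so that rows with missing columns do not crash the job.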
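For the first premise (referencing the spark-sql jar), a build definition along the following lines would work in sbt. This is only a sketch: the version numbers are assumptions and should match the Spark and HBase versions of your cluster.

```scala
// Hypothetical build.sbt fragment; versions are examples, not recommendations.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2",
  "org.apache.spark" %% "spark-sql"  % "1.6.2",
  "org.apache.hbase" %  "hbase-client" % "1.2.2",
  "org.apache.hbase" %  "hbase-server" % "1.2.2"  // provides TableInputFormat
)
```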





© 2024 shulou.com SLNews company. All rights reserved.
