When writing data to HBase, the common approaches are the HBase client API and batch imports with MapReduce. With these approaches, the rough flow of writing a single record into the HBase database is shown in the figure.
The data is first written to the write-ahead log (WAL), then to the MemStore, and finally flushed to an HFile. Writing this way does not lose data and keeps the data correctly ordered, but when a large volume of writes arrives, write speed is hard to guarantee. This is why we introduce a higher-performance write method: BulkLoad.
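For comparison, before moving on to BulkLoad, here is a minimal sketch of the client-API write path described above. It reuses the BulkLoadDemo table and info column family created later in this article; the row key and values are taken from the sample data purely for illustration.

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SimplePutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("BulkLoadDemo"))) {
            // A normal Put goes through the RegionServer: WAL first, then MemStore,
            // and the data only reaches an HFile when the MemStore is flushed.
            Put put = new Put(Bytes.toBytes("44979"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("user_id"), Bytes.toBytes("100640791"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("item_id"), Bytes.toBytes("134060896"));
            table.put(put);
        }
    }
}
```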
Bulk-writing data with BulkLoad is divided into two main parts:
1. Use a MapReduce job you write yourself, with HFileOutputFormat2 as the output format, to write HFiles to an HDFS directory. Because the data written to HBase must be sorted by key, HFileOutputFormat2.configureIncrementalLoad() performs the required job configuration.
2. Load the HFiles from HDFS into the HBase table. The approximate process is shown in Figure 1.
Example code. First, the Maven pom dependencies:
```
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.4</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.4.0</version>
</dependency>
```

The mapper reads each input line, splits it into fields, and emits the row key together with a Put:

```
package com.yangshou;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class BulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Read each line of the file, using the serial number as the row key
        String line = value.toString();
        // Split the line; the fields are: serial number, user id, item id, user behavior, item category, time, address
        String[] str = line.split(" ");
        String id = str[0];
        String user_id = str[1];
        String item_id = str[2];
        String behavior = str[3];
        String item_type = str[4];
        String time = str[5];
        String address = "156";
        // Build the row key and the Put
        ImmutableBytesWritable rowkey = new ImmutableBytesWritable(id.getBytes());
        Put put = new Put(id.getBytes());
        put.addColumn("info".getBytes(), "user_id".getBytes(), user_id.getBytes());
        put.addColumn("info".getBytes(), "item_id".getBytes(), item_id.getBytes());
        put.addColumn("info".getBytes(), "behavior".getBytes(), behavior.getBytes());
        put.addColumn("info".getBytes(), "item_type".getBytes(), item_type.getBytes());
        put.addColumn("info".getBytes(), "time".getBytes(), time.getBytes());
        put.addColumn("info".getBytes(), "address".getBytes(), address.getBytes());
        // Write the record out
        context.write(rowkey, put);
    }
}
```

The driver configures the job, writes the HFiles, and then loads them into the table:

```
package com.yangshou;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        // Get the HBase configuration
        Configuration conf = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(conf);
        Table table = conn.getTable(TableName.valueOf("BulkLoadDemo"));
        Admin admin = conn.getAdmin();

        // Set up the job
        Job job = Job.getInstance(conf, "BulkLoad");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(BulkLoadMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        // Set the input/output formats and file paths
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(HFileOutputFormat2.class);
        FileInputFormat.setInputPaths(job, new Path("hdfs://hadoopalone:9000/tmp/000000_0"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoopalone:9000/demo1"));

        // Configure the job to write HFiles sorted for the target table
        HFileOutputFormat2.configureIncrementalLoad(job, table, conn.getRegionLocator(TableName.valueOf("BulkLoadDemo")));

        // When the job finishes, load the generated HFiles into the HBase table
        if (job.waitForCompletion(true)) {
            LoadIncrementalHFiles load = new LoadIncrementalHFiles(conf);
            load.doBulkLoad(new Path("hdfs://hadoopalone:9000/demo1"), admin, table, conn.getRegionLocator(TableName.valueOf("BulkLoadDemo")));
        }
    }
}
```
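Once the HFiles have been written to HDFS, they can also be loaded in a separate step from the command line with the LoadIncrementalHFiles (completebulkload) tool that ships with HBase. This step is not in the original code above; the path and table name simply mirror the example:

```
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://hadoopalone:9000/demo1 BulkLoadDemo
```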
Sample data (one record per line):

```
44979 100640791 134060896 1 5271 2014-12-09 Tianjin City
44980 100640791 96243605 1 13729 2014-12-02 Xinjiang
```
Create the table in the HBase shell:

```
create 'BulkLoadDemo','info'
```
Package the project and run it:

```
hadoop jar BulkLoadDemo-1.0-SNAPSHOT.jar com.yangshou.BulkLoadDriver
```

Note: before running hadoop jar, add the HBase library jars to the Hadoop classpath:

```
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*
```
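To confirm the load worked, one quick check (not part of the original article) is to scan a few rows in the HBase shell:

```
scan 'BulkLoadDemo', {LIMIT => 2}
```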