1. HBase integrates MapReduce
In offline task scenarios, MapReduce can access HBase data directly, which speeds up analysis and extends what can be analyzed.
Read data from HBase (Result):
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReadHBaseDataMR {

    private static final String ZK_KEY = "hbase.zookeeper.quorum";
    private static final String ZK_VALUE = "hadoop01:2181,hadoop02:2181,hadoop03:2181";
    private static Configuration conf;

    static {
        conf = HBaseConfiguration.create();
        conf.set(ZK_KEY, ZK_VALUE);
        // The job reads from HBase and writes to our own HDFS cluster,
        // so the HDFS configuration files must be loaded as well
        conf.addResource("core-site.xml");
        conf.addResource("hdfs-site.xml");
    }

    // Job driver
    public static void main(String[] args) {
        Job job = null;
        try {
            // Create the job from the HBase configuration
            job = Job.getInstance(conf);
            job.setJarByClass(ReadHBaseDataMR.class);

            // Full-table scan
            Scan scans = new Scan();
            String tableName = "user_info";

            // Wire up the MapReduce/HBase integration on the mapper side
            TableMapReduceUtil.initTableMapperJob(
                    tableName,
                    scans,
                    ReadHBaseDataMR_Mapper.class,
                    Text.class,
                    NullWritable.class,
                    job,
                    false);

            // Set the number of reducer tasks to 0
            job.setNumReduceTasks(0);

            // Set the output path on HDFS, deleting it first if it already exists
            Path output = new Path("/output/hbase/hbaseToHDFS");
            if (output.getFileSystem(conf).exists(output)) {
                output.getFileSystem(conf).delete(output, true);
            }
            FileOutputFormat.setOutputPath(job, output);

            // Submit the job
            boolean waitForCompletion = job.waitForCompletion(true);
            System.exit(waitForCompletion ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Mapper
    // Use TableMapper to read data from a table in HBase
    private static class ReadHBaseDataMR_Mapper extends TableMapper<Text, NullWritable> {
        Text mk = new Text();
        NullWritable kv = NullWritable.get();

        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            // By default each call processes the cells of one rowkey
            List<Cell> cells = value.listCells();
            // A cell is located by four coordinates: row key, column family, column and timestamp
            for (Cell cell : cells) {
                String row = Bytes.toString(CellUtil.cloneRow(cell));          // row key
                String cf = Bytes.toString(CellUtil.cloneFamily(cell));        // column family
                String column = Bytes.toString(CellUtil.cloneQualifier(cell)); // column
                String values = Bytes.toString(CellUtil.cloneValue(cell));     // value
                long time = cell.getTimestamp();                               // timestamp
                mk.set(row + "\t" + cf + "\t" + column + "\t" + values + "\t" + time);
                context.write(mk, kv);
            }
        }
    }
}
Write data to HBase (Put):
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HDFSToHbase {

    private static final String ZK_CONNECT_KEY = "hbase.zookeeper.quorum";
    private static final String ZK_CONNECT_VALUE = "hadoop02:2181,hadoop03:2181,hadoop01:2181";
    private static Configuration conf;

    static {
        conf = HBaseConfiguration.create();
        conf.set(ZK_CONNECT_KEY, ZK_CONNECT_VALUE);
        // The job reads from our own HDFS cluster and writes to HBase,
        // so the HDFS configuration files must be loaded as well
        conf.addResource("core-site.xml");
        conf.addResource("hdfs-site.xml");
    }

    // Job driver
    public static void main(String[] args) {
        try {
            Job job = Job.getInstance(conf);
            job.setJarByClass(HDFSToHbase.class);

            job.setMapperClass(MyMapper.class);
            // Specify the map-side output types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(NullWritable.class);

            String tableName = "student";
            // Integrate the MapReduce reducer with HBase;
            // passing null for the partitioner means the default one is used
            TableMapReduceUtil.initTableReducerJob(tableName, MyReducer.class, job, null);

            // Specify the input path of the job
            Path input = new Path("/in/mingxing.txt");
            FileInputFormat.addInputPath(job, input);

            // Submit the job
            boolean waitForCompletion = job.waitForCompletion(true);
            System.exit(waitForCompletion ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Mapper
    private static class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        NullWritable mv = NullWritable.get();

        // The map side does nothing with the data; each line is passed straight to the reduce side
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, mv);
        }
    }

    // Reducer, implemented with TableReducer
    /**
     * TableReducer<KEYIN, VALUEIN, KEYOUT>
     * KEYIN:   key output by the mapper
     * VALUEIN: value output by the mapper
     * KEYOUT:  key output by the reducer
     * The implicit fourth output type is Mutation, i.e. a put/delete operation
     */
    private static class MyReducer extends TableReducer<Text, NullWritable, NullWritable> {
        // Column families
        String[] family = {"basicinfo", "extrainfo"};

        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            // Example input line: zhangfenglun,M,20,13522334455,zfl@163.com,23521472
            for (NullWritable value : values) {
                String[] fields = key.toString().split(",");
                // Use the name as the rowkey
                Put put = new Put(fields[0].getBytes());
                put.addColumn(family[0].getBytes(), "sex".getBytes(), fields[1].getBytes());
                put.addColumn(family[0].getBytes(), "age".getBytes(), fields[2].getBytes());
                put.addColumn(family[1].getBytes(), "phone".getBytes(), fields[3].getBytes());
                put.addColumn(family[1].getBytes(), "email".getBytes(), fields[4].getBytes());
                put.addColumn(family[1].getBytes(), "qq".getBytes(), fields[5].getBytes());
                context.write(value, put);
            }
        }
    }
}
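Once the job has run, the imported record can be spot-checked with the plain HBase client. This is only a rough sketch, assuming the student table already exists with the two column families above, the example line was in the input file, and an HBase 1.x-style client API; the class name is purely illustrative:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckStudentRow {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop02:2181,hadoop03:2181,hadoop01:2181");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("student"))) {
            // The reducer used the name field as the rowkey
            Result result = table.get(new Get(Bytes.toBytes("zhangfenglun")));
            for (Cell cell : result.rawCells()) {
                System.out.println(
                        Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
                        Bytes.toString(CellUtil.cloneQualifier(cell)) + " = " +
                        Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}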
2. Import MySQL data into HBase

# Import data from MySQL into HBase with Sqoop
# --connect        MySQL connection URL
# --username       user name for logging in to MySQL
# --password       password for logging in to MySQL
# --table          MySQL source table to import
# --hbase-table    target HBase table name
# --column-family  column family in the HBase table
# --hbase-row-key  MySQL column to use as the rowkey
sqoop import \
  --connect jdbc:mysql://hadoop01:3306/test \
  --username hadoop \
  --password root \
  --table book \
  --hbase-table book \
  --column-family info \
  --hbase-row-key bid

# ps: because of a version incompatibility, the target HBase table must be created in advance;
# the --hbase-create-table option cannot be used here.
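Because --hbase-create-table cannot be used, the book table with the info column family from the command above has to exist before the import runs, for example via the HBase shell (create 'book','info') or programmatically. A minimal Java sketch of the same step, assuming an HBase 1.x-style client API and hbase-site.xml on the classpath; the class name is illustrative:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateBookTable {
    public static void main(String[] args) throws IOException {
        // Picks up hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("book");
            if (!admin.tableExists(name)) {
                // One column family, "info", matching --column-family in the Sqoop command
                HTableDescriptor desc = new HTableDescriptor(name);
                desc.addFamily(new HColumnDescriptor("info"));
                admin.createTable(desc);
            }
        }
    }
}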
3. HBase integrates Hive

Principle: Hive and HBase integrate through their respective external APIs, communicating mainly via HBaseStorageHandler. With HBaseStorageHandler, Hive can obtain the HBase table name, column families and columns that correspond to a Hive table, as well as the InputFormat and OutputFormat classes to use, and it can create and delete HBase tables.
When Hive accesses data in an HBase table, it essentially reads the HBase data through MapReduce: inside the MR job, HiveHBaseTableInputFormat splits the HBase table and provides the RecordReader objects that read the data.
The splitting principle for the HBase table: each region becomes one split, so the MapReduce job gets as many map tasks as the table has regions.
The RecordReader reads HBase data with a scanner that scans the table. If the query has a filter condition, it is converted into an HBase filter; when the condition is on the rowkey, it is converted into rowkey-based filtering.
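To make the rowkey case concrete: a condition that pins the rowkey can be answered by a rowkey-restricted scan instead of a full-table scan. A rough sketch of what that conversion amounts to at the HBase client level, assuming the mingxing table mapped in the DDL below, an HBase 1.x-style client API, and a purely illustrative rowkey:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowkeyScanSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop02:2181,hadoop03:2181,hadoop04:2181");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mingxing"))) {
            // An equality condition on the rowkey narrows the scan to a single row range
            // instead of scanning the whole table
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("zhangfenglun"));
            scan.setStopRow(Bytes.toBytes("zhangfenglun\0")); // stop just after this rowkey
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}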
Specific operations:
# Specify the ZooKeeper quorum used by HBase (the default port 2181 can be omitted):
hive> set hbase.zookeeper.quorum=hadoop02:2181,hadoop03:2181,hadoop04:2181;

# Specify the root znode that HBase uses in ZooKeeper:
hive> set zookeeper.znode.parent=/hbase;

# Create a Hive table backed by the HBase table:
hive> create external table mingxing(rowkey string, base_info map<string,string>, extra_info map<string,string>)
    > row format delimited fields terminated by '\t'
    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > with serdeproperties ("hbase.columns.mapping" = ":key,base_info:,extra_info:")
    > tblproperties ("hbase.table.name" = "mingxing", "hbase.mapred.output.outputtable" = "mingxing");

# ps: org.apache.hadoop.hive.hbase.HBaseStorageHandler is the handler that manages the Hive-to-HBase mapping
# ps: hbase.columns.mapping defines how HBase column families and columns map to Hive columns
# ps: hbase.table.name is the HBase table name
Although Hive is integrated with HBase, the actual data still lives in HBase, and the file under the corresponding Hive table directory stays empty. Whenever data is added in HBase, Hive sees the new values the next time the table is queried.
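As an illustration of this behavior, a row written straight into HBase with the client API becomes visible to the next Hive query against mingxing, with no Hive-side load step. A minimal sketch, assuming the mingxing table above, an HBase 1.x-style client API, and made-up rowkey and values:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutVisibleToHive {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop02:2181,hadoop03:2181,hadoop04:2181");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mingxing"))) {
            // Write one row directly into HBase; no Hive-side load is needed
            Put put = new Put(Bytes.toBytes("rk0001"));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
            put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("hobby"), Bytes.toBytes("reading"));
            table.put(put);
        }
        // A subsequent Hive query such as
        //   select rowkey, base_info, extra_info from mingxing;
        // returns the new row, because the query is answered from HBase at query time.
    }
}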