
How HBase Transfers Data Between Different Versions of HDFS Clusters


This article explains how to transfer data into HBase across HDFS clusters running different versions. The approach is simple, and the code below walks through it step by step.

A common need is to write data from one HDFS cluster into an HBase table that lives on another HDFS cluster. When the two clusters run similar versions, such a program is easy to write. Sometimes, however, the gap spans major versions. At the author's company, for example, all data sits on an HDFS cluster based on a modified Hadoop 0.19.2, while the destination cluster runs 0.20.2+. The client jars of the two versions are incompatible, so the same Hadoop jar cannot talk to both clusters. How can the transfer be done?

The easiest way is to export the data from the source (src) cluster to local disk, then start a second process that uploads the local copy to the destination (des) cluster.
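For concreteness, here is a minimal sketch of step one under that scheme, with hypothetical paths and namenode address; a second process, built against the destination jars, would run the mirror-image copyFromLocalFile:

Java code

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Process 1, linked against the 0.19.2 client jar: pull src data to local disk.
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://src-namenode:9000"); // hypothetical src namenode
FileSystem srcFs = FileSystem.get(conf);
srcFs.copyToLocalFile(new Path("/data/export"), new Path("/tmp/staging"));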

But there are a few problems:

Reduced efficiency

Local disk space consumption

Inability to meet real-time import requirements

The need to coordinate two processes, which increases complexity

A better approach is to read from the src cluster and write to the des cluster within a single process. That, however, amounts to loading two versions of the Hadoop jars into the same process space, which requires using two classloaders in the program.

The following code uses a custom classloader to load the destination-version jar and create the required Configuration object:

Java code

URL[] jarUrls = new URL[1];
jarUrls[0] = new File(des_jar_path).toURI().toURL();
ClassLoader jarloader = new URLClassLoader(jarUrls, null);
Class proxy = Class.forName("yourclass", true, jarloader);
Configuration conf = (Configuration) proxy.newInstance();


However, this conf object is needed when constructing the HTable, and the code performing the cast was itself loaded by the default classloader, i.e., from the 0.19.2 jar. So the Configuration type in the cast on the last line above is still the 0.19.2 class, and an object created by the new classloader is not an instance of it: the cast fails. What to do?
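A minimal, self-contained sketch of the underlying issue (jar path hypothetical), assuming the 0.19.2 Configuration is on the default classpath and the 0.20.205 jar sits on disk:

Java code

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class TwoLoadersDemo {
    public static void main(String[] args) throws Exception {
        URL[] urls = { new File("hadoop-0.20.205.jar").toURI().toURL() };
        ClassLoader isolated = new URLClassLoader(urls, null); // null parent: fully isolated

        // Same fully qualified name, but two distinct Class objects:
        Class<?> c1 = Class.forName("org.apache.hadoop.conf.Configuration");
        Class<?> c2 = Class.forName("org.apache.hadoop.conf.Configuration", true, isolated);
        System.out.println(c1 == c2); // false

        Object conf = c2.newInstance();
        // The cast below targets c1 (the class seen by the default loader),
        // so it throws ClassCastException at runtime:
        org.apache.hadoop.conf.Configuration bad =
                (org.apache.hadoop.conf.Configuration) conf;
    }
}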

After thinking it over for a while, I found that to make this work, the new classloader must be the one that creates the HTable object and performs every subsequent HBase operation. The new classloader therefore has to load all of the jars the program needs except the 0.19.2 one, all HBase operations have to be encapsulated in a class it loads, and the outer code calls that class via reflection.

In such a case the constructor usually takes arguments, so Constructor must be used to invoke the parameterized constructor.

The code snippet is as follows:

Java code

Main.java

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// proxy (held as a plain Object), param, path, tablename, line, rowkey,
// field, column, encode and getLine() come from the surrounding class.

void init() throws Exception {
    ClassLoader jarloader = generateJarLoader();
    Class proxyClass = Class.forName("test.writer.hbasewriter.HBaseProxy", true, jarloader);
    Constructor con = proxyClass.getConstructor(new Class[] { String.class, String.class, boolean.class });
    boolean autoflush = param.getBoolValue(ParamsKey.HbaseWriter.autoFlush, true);
    // Keep the instance as Object: its real type belongs to the new classloader.
    proxy = con.newInstance(new Object[] { path, tablename, autoflush });
}

void put() throws Exception {
    ...
    while ((line = getLine()) != null) {
        // One row per input line: generatePut, then addPut per cell, then putLine.
        proxy.getClass().getMethod("generatePut", String.class).invoke(proxy, line.getField(rowkey));
        Method addPut = proxy.getClass().getMethod("addPut",
                new Class[] { String.class, String.class, String.class });
        addPut.invoke(proxy, new Object[] { field, column, encode });
        proxy.getClass().getMethod("putLine").invoke(proxy);
    }
}

ClassLoader generateJarLoader() throws IOException {
    String libPath = System.getProperty("java.ext.dirs");
    // Pick up every jar in the lib directory except the 0.19.2 hadoop jar.
    FileFilter filter = new FileFilter() {
        @Override
        public boolean accept(File pathname) {
            if (pathname.getName().startsWith("hadoop-0.19.2"))
                return false;
            else
                return pathname.getName().endsWith(".jar");
        }
    };
    File[] jars = new File(libPath).listFiles(filter);
    URL[] jarUrls = new URL[jars.length + 1];
    int k = 0;
    for (int i = 0; i < jars.length; i++) {
        jarUrls[k++] = jars[i].toURI().toURL();
    }
    jarUrls[k] = new File("hadoop-0.20.205.jar").toURI().toURL();
    // Null parent: this loader sees only the jars above, never the 0.19.2 classpath.
    ClassLoader jarloader = new URLClassLoader(jarUrls, null);
    return jarloader;
}
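One detail worth noting: because the loader's parent is null, it cannot fall back to the application classpath, so the compiled HBaseProxy class itself must live inside one of the jars handed to the loader (here, some jar under java.ext.dirs). A hedged sketch of making that explicit, with a hypothetical jar name:

Java code

// Variant of the end of generateJarLoader(): explicitly include the jar
// that contains test.writer.hbasewriter.HBaseProxy, since the null-parent
// loader cannot see the application classpath.
URL[] jarUrls = new URL[jars.length + 2];
int k = 0;
for (int i = 0; i < jars.length; i++) {
    jarUrls[k++] = jars[i].toURI().toURL();
}
jarUrls[k++] = new File("hadoop-0.20.205.jar").toURI().toURL();
jarUrls[k] = new File("hbase-proxy.jar").toURI().toURL(); // hypothetical jar holding HBaseProxy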


Java code

HBaseProxy.java

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class HBaseProxy {
    private Configuration config;
    private HTable htable;
    private HBaseAdmin admin;
    private Put p; // the Put currently being assembled

    public HBaseProxy(String hbase_conf, String tableName, boolean autoflush)
            throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path(hbase_conf)); // hbase-site.xml, passed in as a path
        config = new Configuration(conf);
        htable = new HTable(config, tableName);
        admin = new HBaseAdmin(config);
        htable.setAutoFlush(autoflush);
    }

    // column is "family:qualifier"; encode is the charset of the value.
    public void addPut(String field, String column, String encode) throws IOException {
        try {
            p.add(column.split(":")[0].getBytes(), column.split(":")[1].getBytes(),
                    field.getBytes(encode));
        } catch (UnsupportedEncodingException e) {
            p.add(column.split(":")[0].getBytes(), column.split(":")[1].getBytes(),
                    field.getBytes());
        }
    }

    public void generatePut(String rowkey) {
        p = new Put(rowkey.getBytes());
    }

    public void putLine() throws IOException {
        htable.put(p);
    }
}


In short, when multiple classloaders coexist in one process, keep in mind that an object loaded by classloader A cannot be cast to a type loaded by classloader B, let alone used as one. Calls across the two spaces can only exchange Java's basic (JDK) types or go through reflection.
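Why do String and the primitives cross the boundary safely? Because JDK classes are loaded by the bootstrap loader, which both sides share even when the URLClassLoader's parent is null. A small sketch of the distinction (jar path hypothetical):

Java code

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class SharedTypesDemo {
    public static void main(String[] args) throws Exception {
        URL[] jarUrls = { new File("hadoop-0.20.205.jar").toURI().toURL() };
        ClassLoader isolated = new URLClassLoader(jarUrls, null);

        // JDK classes come from the bootstrap loader, which even a null-parent
        // loader delegates to, so both sides see the same Class object:
        Class<?> s = Class.forName("java.lang.String", true, isolated);
        System.out.println(s == String.class); // true: Strings cross safely

        // Application classes, by contrast, get a distinct Class per loader,
        // which is why all other calls must go through reflection.
    }
}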
