Problems and precautions when using the HBase Thrift interface

2025-02-23 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article covers the use of the HBase Thrift interface and the precautions that go with it, as a reference for readers who work with it.

HBase provides a Thrift interface to support non-Java languages. Based on experience with the HBase Thrift interface (HBase version 0.92.1), this article summarizes some of the problems encountered and the related precautions.

1. Storage order of bytes

In HBase, rows are sorted lexicographically by row key, column family, column qualifier and timestamp. When short, int, long and similar values are converted to a byte array with Bytes.toBytes(…), they are stored in big-endian order (high-order bytes at low addresses, low-order bytes at high addresses), so that the byte-wise order matches the numeric order for non-negative values. The same applies to values. Therefore, when using the Thrift API (C++, PHP, Python, etc.), it is best to pack and unpack both row keys and values in big-endian order.

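For example, in C++ an int can be appended to a key in lexicographic (big-endian) order as follows. Note that appending the raw bytes of the variable would store them in host order, which is little-endian on x86, so the value is converted with htonl first:

#include <arpa/inet.h>  // htonl, ntohl
#include <cstdint>
#include <cstring>      // std::memcpy
#include <string>

std::string key;
int32_t timestamp = 1352563200;
uint32_t be = htonl(static_cast<uint32_t>(timestamp));       // host order -> big-endian
key.append(reinterpret_cast<const char*>(&be), sizeof(be));  // append the 4 big-endian bytes

To convert the big-endian bytes back to an int:

uint32_t be = 0;
std::memcpy(&be, key.data(), sizeof(be));             // key must hold at least 4 bytes
int32_t timestamp = static_cast<int32_t>(ntohl(be));  // big-endian -> host order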

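PHP provides the pack and unpack functions for this conversion ("N" means an unsigned 32-bit integer in big-endian order). Note that unpack returns an array whose first element has index 1:

$key = pack("N", $num);

$num = unpack("N", $key)[1];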
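The ordering argument in this section can be checked with a small sketch. The helpers below (encode_be, decode_be, encode_le) are hypothetical names introduced for illustration, not part of HBase or Thrift. They build the byte string by shifting, so they do not depend on the byte order of the host, and they assume unsigned input values.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical helper: encode an unsigned integer as big-endian bytes.
template <typename T>
std::string encode_be(T value) {
    static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
    std::string out(sizeof(T), '\0');
    // Fill from the last byte backwards: lowest-order byte goes last.
    for (std::size_t i = sizeof(T); i-- > 0;) {
        out[i] = static_cast<char>(value & 0xFF);
        value = static_cast<T>(value >> 8);
    }
    return out;
}

// Hypothetical helper: decode the first sizeof(T) big-endian bytes.
template <typename T>
T decode_be(const std::string& bytes) {
    static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
    if (bytes.size() < sizeof(T)) {
        throw std::invalid_argument("not enough bytes");
    }
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        value = static_cast<T>((value << 8) |
                               static_cast<unsigned char>(bytes[i]));
    }
    return value;
}

// Little-endian variant, shown only to contrast the sort order.
template <typename T>
std::string encode_le(T value) {
    std::string out = encode_be(value);
    std::reverse(out.begin(), out.end());
    return out;
}
```

With these helpers, the key for 255 compares less than the key for 256 in big-endian form, while the little-endian keys compare the other way round, which is why a little-endian row key would break range scans.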

2. Pitfalls when using TScan

In the HBase PHP Thrift API, the startRow, stopRow, columns, filter and other attributes of TScan can be set directly. They are all null by default and become non-null once set, either through the TScan constructor or by assigning to its member variables. When the write() method serializes the request for the RPC to Thrift Server, it simply checks whether each attribute is non-null and, if so, transmits it to the Thrift Server side through the Thrift protocol.

However, in the C++ Thrift API, TScan contains a member __isset of type _TScan__isset, whose structure is as follows:

typedef struct _TScan__isset {
  _TScan__isset() : startRow(false), stopRow(false), timestamp(false), columns(false), caching(false), filterString(false) {}
  bool startRow;
  bool stopRow;
  bool timestamp;
  bool columns;
  bool caching;
  bool filterString;
} _TScan__isset;

The write() method of TScan checks the bool flags in _TScan__isset to decide whether startRow, stopRow, columns, filter and the other attributes are transmitted to the Thrift Server side through the Thrift protocol. These attributes must be set through the __set_xxx() methods to take effect! The default constructor of TScan does not set the __isset flag of any of these attributes to true!

Therefore, if startRow, stopRow, columns, filter and other attributes are initialized without calling the __set_xxx() methods (for example, directly through the TScan constructor), they are never sent and the table is scanned from the beginning. Only a call to __set_xxx() sets the corresponding bool flag to true, so that the server side receives startRow, stopRow, columns, filter and the other attributes from Thrift Server and scans accordingly.
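The difference between direct assignment and the setters can be illustrated with a simplified stand-in. MockScan, MockScanIsset and fields_to_send below are hypothetical and only mimic the pattern described above; they are not the generated TScan class, and a real program would use the types from the generated HBase Thrift headers instead.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for _TScan__isset (only two of the flags).
struct MockScanIsset {
    bool startRow = false;
    bool stopRow = false;
};

// Simplified stand-in for TScan. The member and setter names mirror the
// generated naming style (__isset, __set_xxx) purely for illustration.
struct MockScan {
    std::string startRow;
    std::string stopRow;
    MockScanIsset __isset;

    void __set_startRow(const std::string& value) {
        startRow = value;
        __isset.startRow = true;  // the setter also raises the flag
    }
    void __set_stopRow(const std::string& value) {
        stopRow = value;
        __isset.stopRow = true;
    }
};

// Mimics the decision described for write(): a field is serialized
// only when its __isset flag is true, whatever the member holds.
std::vector<std::string> fields_to_send(const MockScan& scan) {
    std::vector<std::string> sent;
    if (scan.__isset.startRow) sent.push_back("startRow");
    if (scan.__isset.stopRow) sent.push_back("stopRow");
    return sent;
}
```

In this sketch, assigning scan.startRow = "row-100" leaves fields_to_send(scan) empty, whereas scan.__set_startRow("row-100") makes startRow part of the request.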

3. Number of concurrent access threads

First, to minimize the time overhead of network transmission, the HBase Thrift Server is best deployed on the same machine as the application client. When starting Thrift Server, configure the number of concurrent threads with parameters; otherwise the Thrift Server threads can easily be exhausted, and it stops responding to client read and write requests. The specific command is: bin/hbase-daemon.sh start thrift --threadpool -m 200 -w 500, where -m sets the minimum and -w the maximum number of worker threads (for more parameters, see: bin/hbase-daemon.sh start thrift -h).

4. Maximum heap memory configuration

If a client scans through Thrift Server to read data sequentially and sets a number of rows to cache (the int32_t caching member of TScan), the cached rows may occupy a considerable part of the Thrift Server heap, especially when multiple clients access it concurrently.

Therefore, increase the maximum heap size before starting Thrift Server, especially if scans set a large caching value; otherwise the process may be killed by a java.lang.OutOfMemoryError. The heap size is set with export HBASE_HEAPSIZE in conf/hbase-env.sh (the value is in MB and the default is 1000).
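As a rough way to judge whether the heap needs to be raised, the memory held by cached scan rows can be estimated as a simple product. This is only a back-of-the-envelope sketch: the function name and the inputs are hypothetical, and the estimate counts raw payload bytes only, ignoring the JVM object overhead, so the real footprint is likely higher.

```cpp
#include <cstdint>

// Hypothetical estimate: raw bytes buffered by Thrift Server when every
// concurrent scanner holds one full batch of cached rows.
std::uint64_t estimate_scan_buffer_bytes(std::uint64_t concurrent_scanners,
                                         std::uint64_t caching,
                                         std::uint64_t avg_row_bytes) {
    return concurrent_scanners * caching * avg_row_bytes;
}
```

For example, 50 concurrent scanners with caching set to 1000 and rows averaging 2048 bytes give 102,400,000 bytes, about 98 MiB, which is already close to a tenth of a 1000 MB heap before any overhead is counted.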
