2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article introduces the use of the HBase Thrift interface and the points to watch out for when using it.
HBase provides a Thrift interface for non-Java languages. Based on experience with the HBase Thrift interface (HBase version 0.92.1), this article summarizes some problems encountered and the related precautions.
1. Storage order of bytes
In HBase, rows are sorted lexicographically by their bytes (row key, then column family, column qualifier, and timestamp). Values of types such as short, int, and long must therefore be stored in big-endian order (high-order bytes at low addresses, low-order bytes at high addresses), which is what Bytes.toBytes(...) produces when it converts them to a byte array. The same applies to values. When using the Thrift API from C++, PHP, Python, and so on, it is therefore best to pack and unpack both row keys and values as big-endian.
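For example, in C++ an int variable can be converted to a lexicographically ordered byte string as follows. The value is first converted to big-endian with htonl (declared in <arpa/inet.h>), because appending the raw bytes of the int would produce the wrong order on a little-endian host:
std::string key;
int32_t timestamp = 1352563200;
uint32_t be = htonl(static_cast<uint32_t>(timestamp));  // host order to big-endian
key.append(reinterpret_cast<const char*>(&be), sizeof(be));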
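The byte string can be converted back to an int as follows (std::memcpy, from <cstring>, avoids an unaligned read through a cast pointer):
const char* ts = key.c_str();
uint32_t be;
std::memcpy(&be, ts, sizeof(be));
int32_t timestamp = static_cast<int32_t>(ntohl(be));  // big-endian to host order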
PHP provides the pack and unpack functions for this conversion. The "N" format is an unsigned 32-bit big-endian integer, and unpack returns an array whose first element has index 1:
$key = pack("N", $num);
$num = unpack("N", $key)[1];
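The big-endian packing described in this section can also be written without relying on platform headers. The following is a minimal sketch, not part of the HBase Thrift API: the helper names appendBigEndian32 and readBigEndian32 are made up for illustration, and it only handles unsigned 32-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append a 32-bit value to `key` in big-endian order
// (most significant byte first), independent of the host's byte order.
void appendBigEndian32(std::string& key, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        key.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Hypothetical helper: read a big-endian 32-bit value starting at `offset`.
std::uint32_t readBigEndian32(const std::string& key, std::size_t offset = 0) {
    assert(key.size() >= offset + 4);  // caller must supply at least 4 bytes
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value = (value << 8) | static_cast<unsigned char>(key[offset + i]);
    }
    return value;
}
```

With this encoding, comparing two keys as byte strings gives the same order as comparing the numbers, so 255 sorts before 256. Note that this holds for non-negative values only: a negative signed int has its top bit set and would sort after all positive values under byte order.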
2. Pitfalls when using TScan
In HBase's PHP Thrift API, the TScan attributes startRow, stopRow, columns, filter, and so on can be set directly. They are all null by default and become non-null once set, either through the TScan constructor or by assigning to the member variables of TScan. When the write() method serializes the object for an RPC to the Thrift Server, it simply checks whether each attribute is non-null and, if so, transmits it to the Thrift Server via the Thrift protocol.
In the C++ Thrift API, however, TScan has a member __isset of type _TScan__isset, whose structure is as follows:
typedef struct _TScan__isset {
  _TScan__isset() : startRow(false), stopRow(false), timestamp(false), columns(false), caching(false), filterString(false) {}
  bool startRow;
  bool stopRow;
  bool timestamp;
  bool columns;
  bool caching;
  bool filterString;
} _TScan__isset;
The write() method of TScan checks the bool flags in _TScan__isset to determine whether startRow, stopRow, columns, filter, and the other attributes have been set, and only then transmits them to the Thrift Server via the Thrift protocol. These attributes must be set through the corresponding __set_xxx() method to take effect. The default constructor of TScan does not set the __isset flags for these attributes to true.
Therefore, if startRow, stopRow, columns, filter, and the other attributes are initialized directly through the TScan constructor, they are never transmitted and the table is scanned from the beginning. Only when the __set_xxx() methods are called are the corresponding bool flags set to true, so that the Thrift Server receives startRow, stopRow, columns, filter, and the other attributes and uses them for the scan.
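The effect of the __isset flags can be illustrated with a simplified stand-in. This is a sketch only: MockScan and serializedStartRow are hypothetical names, not the Thrift-generated code, and only startRow is modeled. The member names __isset and __set_startRow mirror the generated ones.

```cpp
#include <string>

// Simplified stand-in for the generated _TScan__isset (only one flag modeled).
struct MockScanIsset {
    bool startRow = false;
};

// Simplified stand-in for the generated TScan; NOT the real class.
struct MockScan {
    std::string startRow;
    MockScanIsset __isset;

    // Mirrors the generated setter: stores the value and marks the field as set.
    void __set_startRow(const std::string& val) {
        startRow = val;
        __isset.startRow = true;
    }
};

// Stand-in for write(): an optional field is sent only if its flag is true.
// Returns the bytes that would be transmitted for startRow (empty if skipped).
std::string serializedStartRow(const MockScan& scan) {
    return scan.__isset.startRow ? scan.startRow : std::string();
}
```

Assigning `scan.startRow = "row-100"` directly leaves the flag false, so the stand-in serializer skips the field, while `scan.__set_startRow("row-100")` causes it to be sent. With the real TScan, the same distinction decides whether the server sees the start row at all.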
3. Number of concurrent access threads
First, to minimize the time overhead of network transmission, the HBase Thrift Server is best deployed on the same machine as the application client. When starting the Thrift Server, configure the number of concurrent threads through command-line parameters; otherwise its threads can easily be exhausted, and it stops responding to client read and write requests. The command is: bin/hbase-daemon.sh start thrift --threadpool -m 200 -w 500 (for more parameters, run bin/hbase-daemon.sh start thrift -h).
4. Maximum heap memory configuration
If clients read data sequentially by running scans through the Thrift Server and set a number of rows to cache (through the int32_t caching variable of TScan), the cached rows may take up a considerable part of the Thrift Server's heap memory, especially when many clients access it concurrently.
Therefore, increase the maximum heap size before starting the Thrift Server, especially if scans use a large caching value; otherwise the process may be killed by a java.lang.OutOfMemoryError. The heap size is set through HBASE_HEAPSIZE in conf/hbase-env.sh (the default is export HBASE_HEAPSIZE=1000, in MB).
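To get a feel for how much heap the cached rows might need, a back-of-the-envelope estimate can help. The sketch below is an assumption-laden simplification, not a formula from HBase: estimateScanCacheBytes is a hypothetical helper, it assumes every concurrent scanner holds a full batch of `caching` rows at once, and it ignores JVM object overhead.

```cpp
#include <cstdint>

// Hypothetical helper: rough upper bound on bytes buffered for scans.
// Assumes each concurrent scanner holds up to `caching` rows of about
// `avgRowBytes` bytes each; JVM per-object overhead is ignored.
std::uint64_t estimateScanCacheBytes(std::uint64_t concurrentScanners,
                                     std::uint64_t caching,
                                     std::uint64_t avgRowBytes) {
    return concurrentScanners * caching * avgRowBytes;
}
```

For example, with illustrative numbers of 50 concurrent scanners, caching set to 1000, and rows of about 10 KiB, `estimateScanCacheBytes(50, 1000, 10 * 1024)` gives 512,000,000 bytes, which is already about half of a 1000 MB heap.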