2025-04-09 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Today I will walk through the design principles for HBase rowkeys and how to apply them, a topic that many people find hard to grasp. The summary below is meant to make it easier to follow, and I hope you take something useful from it.
One: how HBase stores data
Internally, HBase stores data as KeyValue pairs. The key is made up of rowkey:family:column:timestamp, and the value is the stored content. Within a region, KeyValues are sorted in ascending order on every key component except the timestamp, which is sorted in descending order so that the newest version comes first. Consequently, the further left a piece of information sits in the key, the more effectively it can be used for retrieval. When designing a key, put the important information on the left and the less important information on the right to speed up queries. Because the rowkey is the leftmost component, designing a suitable rowkey is the most important step in making lookups fast.
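The ordering described above can be illustrated with a small sketch. It does not use the HBase client; it only mimics the comparison rule with a plain Python sort, and the `Cell` type and sample values are made up for illustration.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for an HBase KeyValue."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def sort_key(cell: Cell):
    # Row, family and qualifier sort ascending; the timestamp is negated
    # so that newer versions of the same column sort first.
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


cells = [
    Cell(b"user2", b"cf", b"name", 100, b"bob"),
    Cell(b"user1", b"cf", b"name", 100, b"alice-old"),
    Cell(b"user1", b"cf", b"name", 200, b"alice-new"),
    Cell(b"user1", b"cf", b"age", 100, b"30"),
]

ordered = sorted(cells, key=sort_key)
for cell in ordered:
    print(cell.row, cell.family, cell.qualifier, cell.timestamp, cell.value)
```

All cells for `user1` come before `user2`, and within `user1`/`name` the version with timestamp 200 precedes the one with timestamp 100.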
Two: rowkey design principles
1. Length principle: keep the rowkey as short as possible, and never longer than 64 KB. An overly long rowkey hurts in two ways. First, it greatly reduces HFile storage efficiency, because the rowkey is repeated in every KeyValue. Second, it wastes MemStore memory, so less data fits in the cache and retrieval efficiency drops.
2. Uniqueness principle: every rowkey must be unique. There is not much more to say about this one.
3. Locality principle: as far as possible, keep rowkeys that are frequently used together in the same region, which improves retrieval efficiency. At the same time, avoid creating hotspots.
4. For frequently queried rowkeys, prefer a tall table (many rows, few columns) over a wide table (many columns, few rows).
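As a rough illustration of the length principle, the sketch below compares a text rowkey with a fixed-width binary one. The field layout (a 4-byte user id followed by an 8-byte millisecond timestamp) and both function names are assumptions chosen for the example, not something HBase prescribes.

```python
import struct


def text_rowkey(user_id: int, event_ts_ms: int) -> bytes:
    # Human-readable, but longer, and not zero-padded.
    return f"user_{user_id}_{event_ts_ms}".encode("ascii")


def packed_rowkey(user_id: int, event_ts_ms: int) -> bytes:
    # Big-endian unsigned fields: 4 bytes + 8 bytes = 12 bytes.
    # For unsigned big-endian integers, byte order matches numeric order.
    return struct.pack(">IQ", user_id, event_ts_ms)


text = text_rowkey(42, 1700000000000)
packed = packed_rowkey(42, 1700000000000)
print(len(text), len(packed))

# Unpadded text keys sort "user_10_..." before "user_9_...",
# while the packed keys keep user 9 before user 10.
print(text_rowkey(9, 5) < text_rowkey(10, 1))
print(packed_rowkey(9, 5) < packed_rowkey(10, 1))
```

The packed form is shorter and sorts in numeric order, at the cost of being harder to read in a shell. Zero-padding the text fields would also restore the ordering, but makes the key longer still.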
Three: several solutions to hotspots caused by rowkey
1. Salting: add a redundant prefix to the front of the rowkey so that the data is spread across different regions. Advantage: it effectively keeps rowkeys from concentrating in one or a few regions, and so avoids hotspots. Disadvantages: it increases the length of the rowkey, and range scans can no longer be used effectively.
2. Swapping fields to promote one of them: if the rowkey is made up of several fields, reorder them so that a better-distributed field comes first. Disadvantage: it cannot help when the rowkey has only a single field, or when every ordering still leads to a region hotspot.
3. Random keys: hash the rowkey so that rows are distributed across different servers. This is similar to salting.
Ranking by sequential read performance, from high to low (write performance ranks in the opposite order): sequential key -> salted key -> promoted-field key -> random key
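The three approaches above can be sketched as plain key-building functions. This is a minimal illustration, not HBase client code: the bucket count, the use of MD5, the choice to derive the salt from the key itself (so a reader can recompute it), and all function names are assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical number of salt buckets


def salted_key(rowkey: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    # Derive a one-byte salt from the key so a point read can recompute it.
    salt = hashlib.md5(rowkey).digest()[0] % buckets
    return bytes([salt]) + rowkey


def salted_scan_ranges(start: bytes, stop: bytes, buckets: int = NUM_BUCKETS):
    # The cost of salting: one logical range becomes one scan per bucket.
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(buckets)]


def promoted_key(timestamp_ms: int, device_id: str) -> bytes:
    # Field promotion: put the device id ahead of the monotonically
    # increasing timestamp so writes spread across devices.
    return f"{device_id}:{timestamp_ms:013d}".encode("ascii")


def hashed_key(rowkey: bytes) -> bytes:
    # Random key: replace the rowkey with its hash (ordering is lost).
    return hashlib.md5(rowkey).digest()


# Sequential timestamp keys would all land next to each other unsalted.
keys = [f"{ts:013d}".encode("ascii") for ts in range(1700000000000, 1700000001000)]
used_buckets = {salted_key(k)[0] for k in keys}
print(sorted(used_buckets))
print(len(salted_scan_ranges(keys[0], keys[-1])))
print(promoted_key(1700000000000, "dev7"))
```

With the salt in front, neighbouring timestamps no longer sit side by side, which is why a range scan has to be issued once per bucket and merged on the client.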
A few additional points:
1. Prefer range queries to prefix queries.
2. When there is a lot of data, use paged queries.
After reading the above, you should have a better understanding of the design principles and implementation of HBase rowkeys.
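Range queries and paging combine naturally: each page is a bounded range scan that starts just after the last row of the previous page. The sketch below simulates this over a sorted in-memory list rather than a real table; `scan` and `paged_scan` are hypothetical helpers, and the half-open `[start_row, stop_row)` convention is an assumption of the example.

```python
import bisect


def scan(sorted_keys, start_row: bytes, stop_row: bytes, limit: int):
    """Return up to `limit` keys in the half-open range [start_row, stop_row)."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:min(hi, lo + limit)]


def paged_scan(sorted_keys, start_row: bytes, stop_row: bytes, page_size: int):
    """Yield pages of keys, resuming each page after the previous page's last row."""
    while True:
        page = scan(sorted_keys, start_row, stop_row, page_size)
        if not page:
            return
        yield page
        # Appending a zero byte gives the smallest key strictly greater
        # than the last row returned, so no row is read twice.
        start_row = page[-1] + b"\x00"


keys = sorted(f"order:{i:04d}".encode("ascii") for i in range(10))
pages = list(paged_scan(keys, b"order:0002", b"order:0009", page_size=3))
for page in pages:
    print(page)
```

Only the last rowkey of a page has to be remembered between requests, so the client never needs to skip over rows it has already seen.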
© 2024 shulou.com SLNews company. All rights reserved.