2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article covers how to design HBase rowkeys. The editor found it practical and shares it here as a reference.
1. Properties of the table
(1) Maximum number of versions: usually 3. For applications with frequent updates it can be set to 1, so that obsolete data is discarded quickly, which saves storage space and improves query speed. However, this kind of requirement is relatively rare in the field of massive data.
(2) Compression algorithm: consider trying Snappy. Compared with LZO, its compression ratio is similar, its compression speed is slightly higher, and its decompression speed is much higher.
(3) inmemory: the attribute that keeps a table in memory, which is often overlooked. If the data is held entirely in memory, how HBase's performance compares with popular in-memory stores such as memcached and Redis remains to be measured.
(4) bloomfilter: whether it should work at rowkey or column granularity depends on the application. It helps to understand the principle, though: a bloomfilter is used to locate the HFile that holds a record within a region. The more HFiles a region has, the more visible its benefit, so it suits applications where compaction cannot keep up with flushes.
2. rowkey
2.1 Sorting problem
Sorting numeric rowkeys from largest to smallest: HBase natively sorts only in ascending order, which is awkward for queries such as rankings. The workaround is to convert the rowkey with rowkey = Integer.MAX_VALUE - rowkey, so that the largest value becomes the smallest and the smallest becomes the largest. The application layer converts the value back to obtain the required ordering.
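The conversion can be sketched in plain Java as follows. This is a minimal illustration, not HBase client code: the class name DescendingRowkey, the sample scores, and the choice of a zero-padded 10-digit string key are assumptions, and a TreeMap stands in for HBase's ascending rowkey order. It also assumes non-negative values.

```java
import java.util.TreeMap;

final class DescendingRowkey {
    // Map a non-negative score to a value that sorts in the opposite order.
    static int invert(int score) {
        if (score < 0) {
            throw new IllegalArgumentException("score must be non-negative");
        }
        return Integer.MAX_VALUE - score;
    }

    // Zero-pad to 10 digits so that lexicographic order equals numeric order.
    static String toRowkey(int score) {
        return String.format("%010d", invert(score));
    }

    // Application-layer conversion back to the original score.
    static int fromRowkey(String rowkey) {
        return Integer.MAX_VALUE - Integer.parseInt(rowkey);
    }

    public static void main(String[] args) {
        // TreeMap iterates keys in ascending order, like an HBase scan (assumption: stand-in only).
        TreeMap<String, String> table = new TreeMap<>();
        table.put(toRowkey(70), "carol");
        table.put(toRowkey(95), "alice");
        table.put(toRowkey(82), "bob");
        table.forEach((key, name) -> System.out.println(fromRowkey(key) + " " + name));
    }
}
```

Iterating the map visits the highest score first, because its converted key is the smallest.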
2.2 Hotspot problem
Rows in HBase are sorted in lexicographic order of rowkey. This design optimizes scans, because related rows, and rows that will be read together, are stored near one another. However, poor rowkey design is the source of hotspots. A hotspot occurs when a large number of client requests go directly to one node, or only a few nodes, of the cluster (the access may be reads, writes, or other operations). The heavy traffic pushes the single machine hosting the hot region beyond its capacity, causing performance degradation and even region unavailability. It also affects other regions on the same RegionServer, because the host cannot serve their requests. A well-designed data access pattern lets the cluster be used fully and evenly.
To avoid write hotspots, design rowkeys so that rows that genuinely need to be in the same region are, while on the whole data is written to multiple regions across the cluster rather than to one.
Here are some common ways to avoid hotspots and their advantages and disadvantages:
Salt
Salting here has nothing to do with salting in cryptography. It means adding a random number in front of the rowkey: specifically, assigning the rowkey a random prefix so that it sorts differently from how it otherwise would. The number of possible prefixes should match the number of regions you want to spread the data across. After salting, rowkeys are scattered across the regions according to their randomly generated prefixes, which avoids hotspots.
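A minimal sketch of salting in plain Java, under stated assumptions: the class name SaltedRowkey, the two-digit prefix format "NN-", and the bucket count are illustrative choices, not anything HBase prescribes. Because the prefix is random, a reader that knows only the original rowkey has to try every possible prefix, which the candidates helper makes explicit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SaltedRowkey {
    // Prefix the rowkey with a randomly chosen bucket number in [0, buckets).
    // The two-digit format assumes at most 100 buckets.
    static String salt(String rowkey, int buckets, Random random) {
        return format(random.nextInt(buckets), rowkey);
    }

    // Recover the original rowkey by dropping everything up to the first '-'.
    static String unsalt(String salted) {
        return salted.substring(salted.indexOf('-') + 1);
    }

    // All keys a reader must try, since the prefix used at write time is unknown.
    static List<String> candidates(String rowkey, int buckets) {
        List<String> keys = new ArrayList<>();
        for (int bucket = 0; bucket < buckets; bucket++) {
            keys.add(format(bucket, rowkey));
        }
        return keys;
    }

    private static String format(int bucket, String rowkey) {
        return String.format("%02d-%s", bucket, rowkey);
    }

    public static void main(String[] args) {
        System.out.println(salt("row-20250117", 4, new Random()));
        System.out.println(candidates("row-20250117", 4));
    }
}
```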
Hash
Hashing always gives the same row the same prefix. Like salting, it spreads the load across the cluster, but reads stay predictable: with a deterministic hash the client can reconstruct the complete rowkey and use a get operation to fetch exactly that row.
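The difference from random salting can be sketched as follows. This is an illustration under assumptions: the class name HashedRowkey, the "NN-" prefix format, and the use of String.hashCode as the hash are hypothetical choices, and a production design might prefer a stronger hash such as MD5.

```java
final class HashedRowkey {
    // Derive the prefix from the rowkey itself, so the same row always gets the same prefix.
    // String.hashCode is defined by the Java specification, so the result is stable across JVMs.
    static String withHashPrefix(String rowkey, int buckets) {
        int bucket = Math.floorMod(rowkey.hashCode(), buckets);
        return String.format("%02d-%s", bucket, rowkey);
    }

    public static void main(String[] args) {
        // A reader recomputes the same full key from the original rowkey alone.
        String written = withHashPrefix("user123", 16);
        String lookedUp = withHashPrefix("user123", 16);
        System.out.println(written + " " + written.equals(lookedUp));
    }
}
```

Because the prefix is a pure function of the rowkey, a single point lookup is enough; no fan-out over all prefixes is needed.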
Reverse
The third way to prevent hotspots is to reverse a fixed-length or numeric rowkey, so that the part that changes most often (the least significant part) comes first. This effectively randomizes rowkeys, but sacrifices their ordering.
For example, when the mobile phone number is the rowkey, the reversed phone number string can be used as the rowkey instead, which avoids the hotspots caused by phone numbers starting with the same fixed prefix.
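The phone number example can be sketched in a few lines of Java. The class name ReversedRowkey and the sample numbers are made up for illustration.

```java
final class ReversedRowkey {
    // Reverse a fixed-length key so that its fastest-changing digits come first.
    static String reverse(String fixedLengthKey) {
        return new StringBuilder(fixedLengthKey).reverse().toString();
    }

    public static void main(String[] args) {
        // Two hypothetical numbers that share the prefix "138" but differ in the last digit.
        System.out.println(reverse("13800000001"));
        System.out.println(reverse("13800000002"));
    }
}
```

After reversal the two keys differ in their first character, so they no longer sort next to each other. Applying reverse again restores the original number.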
Timestamp reversal
A common data processing problem is to quickly get the latest version of the data. Using a reversed timestamp as part of the rowkey is very useful here. You can append Long.MAX_VALUE - timestamp to the end of the key, for example [key][reverse_timestamp]. Because rowkeys in HBase are ordered, the latest value of [key] is then the first record returned by a scan on [key]: the first record is the most recently written data.
For example, suppose you need to store a user's operation records sorted by operation time in descending order. The rowkey can be designed as [reversed userId][Long.MAX_VALUE - timestamp]. To query all operation records of a user, scan with startRow [reversed userId][000000000000] and stopRow [reversed userId][Long.MAX_VALUE - timestamp]. To query the operation records for a certain period, note that later times map to smaller reversed values, so startRow is [reversed userId][Long.MAX_VALUE - end time] and stopRow is [reversed userId][Long.MAX_VALUE - start time].
The rowkey is the key in HBase's key-value storage. Usually the field the user queries by is used as the rowkey and the query result as the value. Several different query requirements can be met through careful design.
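The rowkey layout [reversed userId][Long.MAX_VALUE - timestamp] can be sketched in plain Java as below. This is an illustration under assumptions: the class and method names are hypothetical, user IDs are taken to be fixed-length, and the timestamp is stored as 8 big-endian bytes so that unsigned byte order matches numeric order. Note that a later time produces a smaller reversed value, so a time-range scan starts at the key for the end time and stops at the key for the start time.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReverseTimestampKey {
    // Build [reversed userId][Long.MAX_VALUE - timestamp] as raw bytes.
    static byte[] rowkey(String userId, long timestampMillis) {
        byte[] prefix = new StringBuilder(userId).reverse().toString()
                .getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES) // big-endian by default
                .put(prefix)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Scan bounds for a time range: the newest record sorts first,
    // so the scan starts at the end time and stops at the start time.
    static byte[] startRow(String userId, long endTimeMillis) {
        return rowkey(userId, endTimeMillis);
    }

    static byte[] stopRow(String userId, long startTimeMillis) {
        return rowkey(userId, startTimeMillis);
    }

    // True if key a sorts before key b in unsigned lexicographic byte order.
    static boolean sortsBefore(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        byte[] older = rowkey("u0001", 1_000L);
        byte[] newer = rowkey("u0001", 2_000L);
        System.out.println(sortsBefore(newer, older));
    }
}
```

If the stop row is treated as exclusive, as is common for range scans, a record written exactly at the start time falls outside the range, so the start time may need to be adjusted by one unit.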
3. columnfamily
Use as few column families as possible, because too many column families interfere with one another.
4. column
For applications whose columns need to be extended, columns can be designed in the ordinary way. For applications with relatively fixed columns, it is best to encapsulate a whole row of records into a single column, which saves storage space. Protocol Buffers is recommended for the encapsulation.
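The idea of packing a fixed set of fields into one cell value can be sketched as follows. To stay dependency-free, this sketch uses java.io.DataOutputStream as a stand-in for Protocol Buffers, which is what the text recommends. The class name PackedUserRow and its three fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PackedUserRow {
    final String name;
    final int age;
    final long lastLoginMillis;

    PackedUserRow(String name, int age, long lastLoginMillis) {
        this.name = name;
        this.age = age;
        this.lastLoginMillis = lastLoginMillis;
    }

    // Serialize all fields into a single byte array to store as one column value.
    byte[] toCellValue() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeInt(age);
            out.writeLong(lastLoginMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the fields back in the same order they were written.
    static PackedUserRow fromCellValue(byte[] value) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(value))) {
            return new PackedUserRow(in.readUTF(), in.readInt(), in.readLong());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] value = new PackedUserRow("ann", 30, 1_700_000_000_000L).toCellValue();
        PackedUserRow row = fromCellValue(value);
        System.out.println(row.name + " " + row.age + " " + row.lastLoginMillis);
    }
}
```

The trade-off of this layout is that individual fields can no longer be read or filtered as separate columns; the whole value has to be decoded.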
The following introduces some special table structure designs for particular scenarios. These are only explorations, and discussion is welcome:
Table structure design for scenarios with too many values:
I have come across a key-value data structure in which a single key contains so many columns that three problems occur: OOM when querying on the client side, OOM when writing with bulkload, and region split failures. Generally speaking, the number of columns in HBase should not exceed the order of millions. This is confirmed both by the official documentation and by my own tests.
There are two approaches. The first is to handle these special rowkeys separately; the second is as follows:
Consider turning columns into rowkeys. For example, the original rowkey is uid1 and the columns are uid2, uid3, and so on. After the redesign, the rowkeys are uid1~uid2, uid1~uid3, and so on. You may wonder how to query in this layout, for instance how to fetch all the uids under uid1. Note that get is not HBase's only random-read method: it also offers the range scan scan(startkey, endkey), which is as efficient as get. To fetch the records under uid1, all you need is new Scan("uid1~", "uid1~~").
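Why the range from "uid1~" to "uid1~~" selects exactly the rows under uid1 can be checked with a small simulation. This sketch uses a TreeMap as a stand-in for HBase's sorted rowkeys, not the HBase client; the class name ColumnToRowkey and the sample uids are hypothetical. It assumes uids are ASCII and contain no character that sorts at or after '~'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class ColumnToRowkey {
    // Fold the former column name into the rowkey: owner~member.
    static String rowkey(String owner, String member) {
        return owner + "~" + member;
    }

    // Range scan from "owner~" (inclusive) to "owner~~" (exclusive).
    static List<String> membersOf(TreeMap<String, String> table, String owner) {
        List<String> members = new ArrayList<>();
        for (String key : table.subMap(owner + "~", true, owner + "~~", false).keySet()) {
            members.add(key.substring(owner.length() + 1));
        }
        return members;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowkey("uid1", "uid2"), "v");
        table.put(rowkey("uid1", "uid3"), "v");
        table.put(rowkey("uid10", "uid5"), "v"); // shares the text prefix "uid1" but not "uid1~"
        table.put(rowkey("uid2", "uid1"), "v");
        System.out.println(membersOf(table, "uid1"));
    }
}
```

The separator matters here: "uid10~uid5" sorts before "uid1~" because '0' sorts before '~', so rows of uid10 are not picked up by the scan for uid1.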