This article introduces common big data HBase interview questions. Many people have doubts about these topics, so the editor has consulted a variety of materials and compiled simple, practical answers. Hopefully they help resolve your doubts about HBase interview questions. Follow along and study!
1. Briefly describe the use of compaction in HBase: when is it triggered, what two types is it divided into, what is the difference between them, and what are the relevant configuration parameters?
In HBase, each time MemStore data is flushed to disk, a StoreFile is formed. When the number of StoreFiles reaches a certain threshold, the StoreFiles need to be compacted. Compaction serves to:
1. Merge files.
2. Clear out expired data and excess versions.
3. Improve read and write efficiency.
The configuration parameters most often asked about are sketched below.
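A minimal sketch of the commonly tuned parameters, set programmatically here purely for illustration (in practice they live in hbase-site.xml; the values shown match common HBase 2.x defaults, which can vary by version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Minimum number of StoreFiles in a store before a minor
        // compaction is considered (default 3).
        conf.setInt("hbase.hstore.compaction.min", 3);
        // Maximum number of StoreFiles merged in one minor compaction (default 10).
        conf.setInt("hbase.hstore.compaction.max", 10);
        // Interval between automatic major compactions, in milliseconds
        // (default 7 days).
        conf.setLong("hbase.hregion.majorcompaction", 7L * 24 * 60 * 60 * 1000);
        // Once a store accumulates this many StoreFiles, flushes are
        // blocked until compaction catches up.
        conf.setInt("hbase.hstore.blockingStoreFiles", 16);
    }
}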
2. There are two ways to implement compaction in HBase, minor and major. The difference between the two compaction methods is:
1. A minor compaction merges only a subset of the StoreFiles. It can drop expired cells (when minVersions=0 and a TTL is set), but it does not otherwise clean up deleted data or excess versions.
2. A major compaction merges all the StoreFiles of each HStore within a region, and the end result is a single sorted, merged file per store. A sketch of triggering both kinds through the Admin API follows.
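A minimal sketch using the standard HBase Admin client (the table name "orders" is hypothetical; both calls are asynchronous and only queue the compaction request on the servers):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactionDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableName orders = TableName.valueOf("orders");
            admin.compact(orders);      // request a minor compaction of each store
            admin.majorCompact(orders); // request a major compaction: all StoreFiles
                                        // per store are rewritten into one
        }
    }
}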
3. What is the implementation principle of HBase filters? Based on actual project experience, describe several scenarios in which filters are used.
HBase provides a set of filters for filtering data. Through them you can filter HBase data along multiple dimensions (row, column, data version); the data a filter selects can ultimately be refined down to a specific cell (located by row key, column name, and timestamp). Typical examples are RowFilter and PrefixFilter. HBase filters are set on a Scan, so they filter based on the results of a scan. There are many filter types, but they fall into two broad categories: comparison filters and special-purpose filters. A filter's job is to decide on the server side whether the data meets the condition, and to return only matching data to the client. For example, in order development we used a row-key filter to fetch all the orders of a given user; a sketch of that scenario follows.
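A minimal sketch of the order scenario, assuming a hypothetical "orders" table whose row keys are prefixed with the user id:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class UserOrdersScan {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table orders = conn.getTable(TableName.valueOf("orders"))) {
            // Server-side filter: keep only rows whose key starts with "user123_"
            Scan scan = new Scan();
            scan.setFilter(new PrefixFilter(Bytes.toBytes("user123_")));
            try (ResultScanner scanner = orders.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}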
4. What is the internal mechanism of HBase?
In HBase, whether you add new rows or modify existing ones, the internal process is the same. After receiving the command, HBase either saves the change or fails the write and throws an exception. By default, a write goes to two places: the write-ahead log (WAL, also known as the HLog) and the MemStore. Recording writes in both places is how HBase guarantees durability: the write is considered complete only when the change has been written and confirmed in both places. A minimal write example follows.
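A minimal sketch (the table, row key, and column names are hypothetical) of a default-durability write, which returns only after both the WAL append and the MemStore update have succeeded:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("orders"))) {
            Put put = new Put(Bytes.toBytes("user123_20240601_0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("42.00"));
            table.put(put); // durable once both WAL and MemStore confirm
        }
    }
}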
The MemStore is an in-memory write buffer in which HBase accumulates data before it is permanently written to disk. When a MemStore fills up, its data is flushed to disk, producing an HFile, the underlying storage format HBase uses. HFiles correspond to column families: a column family can have multiple HFiles, but a single HFile never stores data from more than one column family. On each node of the cluster, each column family has its own MemStore. Flushes normally happen automatically, but one can also be requested by hand, as sketched below.
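A small sketch of forcing a flush through the Admin API (the table name "orders" is again hypothetical):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Ask the RegionServers to write every MemStore of this table
            // out to disk as new HFiles.
            admin.flush(TableName.valueOf("orders"));
        }
    }
}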
Hardware failures are common in large distributed systems, and HBase is no exception. Imagine the server crashing before the MemStore is flushed: the data in memory that has not yet been written to disk would be lost. HBase's answer is to write to the WAL before the write action completes. Every server in an HBase cluster maintains a WAL, a file on the underlying file system, to record changes. The write action is not considered successful until the new WAL record has been written successfully. This is how HBase, together with the file system beneath it, achieves durability.
In most cases, HBase uses the Hadoop Distributed File System (HDFS) as its underlying file system. If an HBase server goes down, the data that was not yet written from MemStore to HFile can be recovered by replaying the WAL. You do not have to do this by hand; it happens automatically, as a recovery process that is part of HBase's internal mechanism. Each HBase server has one WAL, and all the tables on that server (and their column families) share it. You might think that skipping the WAL would improve write performance, but we do not recommend disabling the WAL unless you are willing to lose data when something goes wrong. If you want to test it, the code below disables the WAL. Note: not writing the WAL increases the risk of data loss in the event of a RegionServer failure. With the WAL turned off, HBase may be unable to recover data after a failure, and any written data not yet flushed to disk will be lost.
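A sketch of disabling the WAL for a single mutation with the modern client API (Durability.SKIP_WAL; older client versions exposed the same behavior as put.setWriteToWAL(false)):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipWalDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("orders"))) {
            Put put = new Put(Bytes.toBytes("user123_20240601_0002"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("9.99"));
            put.setDurability(Durability.SKIP_WAL); // DANGER: bypasses the WAL
            table.put(put); // lost if the RegionServer dies before the MemStore flushes
        }
    }
}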
5. How does HBase deal with downtime?
Downtime falls into two cases: HMaster failure and HRegionServer failure. If an HRegionServer goes down, the HMaster redistributes the regions it was serving to other live RegionServers. Because both the data and the logs are persisted in HDFS, this operation causes no data loss, so the consistency and safety of the data are guaranteed. The HMaster itself has no single-point-of-failure problem: multiple HMasters can be started in HBase, and ZooKeeper's master-election mechanism keeps exactly one of them active at all times. In other words, ZooKeeper ensures that there is always an HMaster providing service.
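As a small illustration of the multi-master setup, the active and backup masters can be inspected through the Admin API (a sketch; method names per the HBase 2.x client):

import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MasterStatus {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            ClusterMetrics metrics = admin.getClusterMetrics();
            System.out.println("Active master: " + metrics.getMasterName());
            for (ServerName backup : metrics.getBackupMasterNames()) {
                System.out.println("Backup master: " + backup);
            }
        }
    }
}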
This concludes the study of these big data HBase interview questions; hopefully it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try these things yourself! If you want to keep learning, continue to follow this site for more practical articles.