2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article collects popular HBase FAQ questions and answers for reference. I hope you learn a lot from reading it. Let's take a look.
Q: HBase writes are very slow. One column family with more than 200 columns per row, writing 30,000 rows per second using mutate, a 10 MB client write buffer, four test machines with 128 GB of RAM each and 60 GB allocated to HBase. How can this be optimized?
Answer: you can write with bulkload: produce HFiles with a MapReduce job, then import the generated HFiles directly with bulkload, which is very fast.
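The bulkload path is fast because HFiles are immutable files sorted by rowkey: the MapReduce job sorts and partitions the data by region split boundaries, so each output file can be handed straight to a region without passing through the WAL or memstore. A minimal Python sketch of that sort-and-partition step (the split keys and row data here are made up for illustration):

```python
import bisect

def partition_rows(rows, split_keys):
    """Sort rows by rowkey and bucket them by region split boundaries,
    mimicking what the bulkload MR job's total-order partitioner does.
    Region i holds keys in [split_keys[i-1], split_keys[i])."""
    buckets = [[] for _ in range(len(split_keys) + 1)]
    for rowkey, value in sorted(rows):
        # bisect_right finds the first split key > rowkey -> region index
        idx = bisect.bisect_right(split_keys, rowkey)
        buckets[idx].append((rowkey, value))
    return buckets

# Hypothetical table pre-split at 'g' and 'p' -> 3 regions
splits = ["g", "p"]
rows = [("zebra", 1), ("apple", 2), ("mango", 3), ("gnu", 4)]
regions = partition_rows(rows, splits)
```

Each bucket corresponds to one region's HFile; because every bucket is already sorted and disjoint, the real bulkload can move the files into place with no write-path overhead.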
Q: HBase lost data on a large scale, the whole database system went down, and the error log said that the internal HDFS file hbase.version was missing. Has anyone encountered a similar problem? It is a self-built cluster.
A: check whether any service ports are exposed to the public network and whether the cluster has been attacked; self-built clusters carry some risk. Then check your HBase configuration, and look at your data backups.
Q: there is this passage in start-hbase.sh:

if [ "$distMode" == 'false' ]
then
  "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master $@
else
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper
  "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_REGIONSERVERS}" $commandToRun regionserver
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup
fi

distMode being false means standalone and true means cluster. The script seems to start only the master in standalone mode. Does that mean zookeeper and regionserver are not needed in a standalone environment? Yet some sources say master and zookeeper run in the same JVM in standalone mode. Can anyone familiar with HBase clarify?
A: in standalone mode all services are started inside a single JVM process, which includes ZooKeeper, HMaster, and RegionServer, and the underlying file system is the local file system. The other modes require you to start zk, hmaster, and regionserver as separate processes.
Q: we have nearly 100 HBase tags per user for large-scale user profiles. Is HBase suitable? A: HBase suits scenarios with hundreds of thousands of columns and can even support millions, but it is recommended to keep the commonly used columns under 100,000.
Q: how are the transactions built into HBase 2 doing? What isolation level is supported, and what do HBase distributed transactions rely on? A: HBase transactions are still region-level. HBase can do multi-row transactions, but only for rows within the same region.
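Because multi-row transactions only work within one region, a common rowkey design is to give all rows that must change together a shared prefix, so they sort contiguously and (given sensible split points) land in the same region. A small sketch of that idea, with hypothetical keys and split points:

```python
import bisect

def region_of(rowkey, split_keys):
    """Return the index of the region holding rowkey, for a table
    pre-split at split_keys (same lexicographic rule HBase uses)."""
    return bisect.bisect_right(split_keys, rowkey)

# Hypothetical: every row of one order shares the 'order123|' prefix,
# so the rows sort together and fall into a single region, which is
# what makes a region-scoped multi-row mutation possible.
splits = ["order200"]
keys = ["order123|item1", "order123|item2", "order123|payment"]
regions = {region_of(k, splits) for k in keys}
```

If the keys were hashed independently instead, they could scatter across regions and no single region-level transaction could cover them.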
Q: what is the fastest way to delete HBase data in batches? A: the fastest is to set a TTL on the column family. If that does not fit the business, scheduling batch Delete calls directly is the next fastest option.
Q: how is query performance optimized for HBase 2.0?
Answer: both the read and write paths of HBase generate a large amount of memory garbage and fragmentation. On a write, data must be copied from the Connection's ByteBuffer into KeyValue structures, and when those KeyValues are written into the memstore they are copied again into MSLAB. Building WAL edits, flushing the memstore, and so on all produce large numbers of short-lived temporary objects, so GC pressure grows as write pressure grows. The read path has the same problem: cache replacement, block decoding, copies on the network write path, and so on all quietly add to the GC burden.

The full-link offheap feature introduced in HBase 2.0 was built to solve these GC problems. Java memory is divided into onheap and offheap, and GC can only collect the onheap portion. Full-link offheap means that during HBase reads and writes the entire life cycle of a KeyValue is spent offheap, with HBase managing the offheap memory itself, which reduces GC pressure and GC pauses.
The offheap work on the write path includes the following optimizations:
In the RPC layer, KeyValues are read from the network stream directly into offheap ByteBuffers.
The MSLAB pool uses offheap memory.
A Protobuf version that supports offheap is used (3.0+).
The offheap work on the read path mainly includes the following optimizations:
BucketCache blocks are reference-counted so reads can avoid copying data out of the cache.
The server-side KeyValue is implemented on top of ByteBuffer, so KeyValues can be stored in offheap memory.
A series of performance optimizations to BucketCache.
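The MSLAB idea mentioned above is worth a small illustration: instead of letting millions of small cell objects churn the heap, each cell's payload is copied into a large preallocated chunk, so memory is tracked and reclaimed a chunk at a time (and with offheap MSLAB, the chunk lives outside the Java heap entirely). A toy Python sketch of the chunk allocator, with invented sizes and payloads:

```python
class MSLABChunk:
    """Toy MemStore-Local Allocation Buffer: many small cell payloads
    are copied into one large preallocated chunk and addressed by
    (offset, length) handles, so the memory manager sees a few big
    chunks instead of millions of tiny objects."""

    def __init__(self, size=2 * 1024 * 1024):
        self.buf = bytearray(size)  # one big preallocated slab
        self.pos = 0                # bump-pointer allocation cursor

    def copy_cell(self, payload: bytes):
        """Copy payload into the chunk; return an (offset, length) handle."""
        if self.pos + len(payload) > len(self.buf):
            raise MemoryError("chunk full; a real MSLAB allocates a new chunk")
        off = self.pos
        self.buf[off:off + len(payload)] = payload
        self.pos += len(payload)
        return off, len(payload)

chunk = MSLABChunk(size=64)
h1 = chunk.copy_cell(b"row1/cf:q1/v1")
h2 = chunk.copy_cell(b"row2/cf:q2/v2")
```

When the memstore flushes, the whole chunk is released at once, which is exactly the fragmentation-avoiding behavior the offheap write path builds on.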
Q: does HBase bulkload distinguish between full and incremental loads?
Answer: use a snapshot for the full load, then bulkload the increments.
Q: what about performance when using Hive on HBase to analyze more than a billion rows? A: there is a performance penalty. Hive supports manipulating HBase data through SQL-like syntax, but it runs slower than native Hive tables.
Q: how much does reading HFiles directly improve performance over going through the HBase client? A: for full table scans, using Spark to read HFiles directly is more than twice as fast as scanning through HBase, and it does not affect other HBase read and write traffic.
Q: how should the number of HBase regions be chosen? A: ideally a multiple of the number of regionservers, so regions are distributed evenly across the servers; also pay attention to rowkey dispersion.

Thank you for reading this article carefully. I hope this collection of popular HBase FAQ questions and answers is helpful to you.
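On the rowkey-dispersion point from the last answer: a common technique is to prefix each rowkey with a stable hash-derived salt so that monotonically increasing keys (timestamps, sequence ids) spread across pre-split regions instead of hotspotting the last one. A sketch under assumed parameters (8 salt buckets, invented date-based keys):

```python
import hashlib

def salted_key(rowkey: str, num_buckets: int) -> str:
    """Prefix the rowkey with a deterministic hash-derived salt so that
    sequential keys scatter across num_buckets pre-split regions."""
    salt = int(hashlib.md5(rowkey.encode()).hexdigest(), 16) % num_buckets
    return f"{salt:02d}|{rowkey}"

# Hypothetical monotonically increasing keys (e.g. date + sequence number)
keys = [salted_key(f"2025033100{i:02d}", 8) for i in range(100)]
buckets = {k.split("|")[0] for k in keys}
```

The salt is derived from the key itself, so point reads can recompute it; the trade-off is that range scans must now fan out across all buckets.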
© 2024 shulou.com SLNews company. All rights reserved.