How to use YCSB for HBase performance testing

This article is about how to use YCSB for HBase performance testing. The editor finds it very practical and shares it here for your reference; follow along to have a look.

When running any performance benchmark tool on a cluster, the key decision is always what dataset size to use. Here we demonstrate why choosing the "right" dataset size matters when running HBase performance tests on your cluster.

HBase cluster configuration and dataset size can change the performance of a workload and its test results on the same cluster, so you should choose the dataset size based on what you want to learn about the cluster's performance. To show the difference between a working set that fits in the available memory cache and one that must be read from the underlying storage, we ran two YCSB workload tests with appropriately chosen dataset sizes on the same CDP Private Cloud Base 7.2.2 Operational Database cluster. The dataset sizes we used are 40GB and 1TB, and the throughput of the different YCSB workloads is compared below. In the chart, the higher the column, the better the throughput.

Note: throughput = operations per second

When an application attempts to read data from an HBase cluster, the region server that handles the request first checks whether the required data block is already in its block cache, local to its process. If the data block is there, the client request can be served directly from the cache, which counts as a cache hit. If the block is not currently local to the region server process, it counts as a cache miss and must be read from an HFile in HDFS storage. Depending on cache utilization, the block may then be kept in the cache for future requests.

As expected and shown in the summary chart, workloads whose dataset mostly fits in the cache have lower latencies and higher throughput than workload runs that must access data from HFiles in HDFS storage.

To select a workload dataset size that meets our test goals, it is important to check the size of the RegionServer heap, the L1 and L2 caches, and the OS buffer cache, and then set an appropriate dataset size. After a YCSB workload run has finished, a good metric to check, as a way to verify that things behaved as expected, is how much data was served from the cache (cache hits) and how much was accessed from HDFS storage. The ratio of the region server's cache hits to its total read requests is the cache hit ratio.
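In other words, computed from the hit and miss counts that the region server exposes:

cache hit ratio = cache hits / (cache hits + cache misses)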

You can find this information in the L1 cache hit ratio metric "l1CacheHitRatio". If both L1 and L2 caches are configured in the cluster, the L1 cache serves index blocks and the L2 cache serves data blocks, and you can record both the L1 "l1CacheHitRatio" and L2 "l2CacheHitRatio" metrics for reference.

The rest of this post details the test setup, how we selected the dataset sizes, and how we ran YCSB with those dataset sizes.

HBase cluster configuration for this test

Cluster used: 6-node cluster (1 master node + 5 region servers)

Hardware: Dell PowerEdge R430, 20c/40t Xeon E5-2630 v4 @ 2.2 GHz, 128GB RAM, 4 x 2TB disks

Security: not configured (no Kerberos)

CDP version: CDP Private Cloud Base 7.2.2, 6-node HBase cluster with 1 master + 5 region servers

JDK: jdk1.8_232

The HBase Region server is configured with a 32GB heap

The HBase master is configured with a 4GB heap

L1 cache uses LruBlockCache with a 12.3GB cache size (block cache sizing is sketched after this list)

Total L1 cache across the cluster is 61GB (12.3GB * 5 = 61GB)

L2 off-heap cache is not configured on the cluster
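For reference, the share of the region server heap given to the L1 LruBlockCache is governed by the hfile.block.cache.size property (a fraction of the heap). A quick way to check it on a cluster node, assuming the client configuration lives in the usual /etc/hbase/conf location (adjust the path for your deployment):

# Show the configured L1 block cache fraction
# (roughly 0.4 of a 32GB region server heap corresponds to the ~12.3GB cache above)
grep -A1 'hfile.block.cache.size' /etc/hbase/conf/hbase-site.xml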

Sizing case 1: the data fits perfectly into the available cache on the cluster

In our HBase cluster, a total of 61GB (12.3GB * 5) is allocated to the L1 block cache across the five region servers. For a dataset that fits entirely in the cache, we chose a 40GB dataset.

Sizing case 2: the dataset is larger than the available cache on the cluster

In the second case, we want the data to be much larger than the available cache. To select an appropriate dataset size, we looked at the HBase block cache and the OS buffer cache configured in the cluster. In this HBase cluster, the configured L1 block cache totals 61GB when aggregated across the region servers. Each server node has 128GB of RAM, and the OS can use any memory not dedicated to server processes to cache the underlying HDFS blocks and improve overall throughput. In our test configuration, about 96GB of OS cache is available on each region server node for this purpose (ignoring the memory used by the DataNode or OS processes, to simplify the calculation). Summing across the five region servers, that gives roughly 500GB of potential buffer cache (96GB * 5 region servers). We therefore chose a dataset size of 1TB.

To convert the target data size into YCSB parameters: each row in YCSB is 1KB by default, so you can estimate the size of the YCSB usertable from the number of rows loaded into it. For example, if you load 1,000,000 rows, then 1,000,000 * 1KB = 1GB of data has been loaded into the YCSB usertable. A sketch of the load command follows the list below.

The dataset sizes used by our two tests are:

40 GB data and 40 million rows

1 TB data and 1 billion rows
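As a hedged sketch of how such a load might be kicked off (the hbase20 binding name, config path and thread count here are assumptions; adjust them for your YCSB release and cluster):

# Load 40 million rows (~40GB at the default 1KB/row: 10 fields x 100 bytes) into usertable
bin/ycsb load hbase20 -P workloads/workloada -cp /etc/hbase/conf \
    -p table=usertable -p columnfamily=family \
    -p recordcount=40000000 \
    -threads 32 -s

# For the 1TB test, reload with recordcount=1000000000 instead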

Testing method

CDP Private Cloud Base 7.2.2 was installed on the 6-node cluster, 40 million rows of workload data were generated (total dataset size => 40GB), and the YCSB workloads were run. After loading, we waited for all compaction operations to complete before starting the workload tests. The YCSB workloads run against HBase were:

Workload A: 50% read and 50% update

Workload C: 100% read

Workload F: 50% read and 50% read-modify-write update

Custom update-only workload: 100% update

Each of the YCSB workloads (A, C, F and UpdateOnly) was run for 15 minutes, and the full run was repeated 5 times without restarting between runs, to measure YCSB throughput*. The results shown are the averages of the last three of the five runs; the first two runs were ignored to avoid the first- and second-run penalty.
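A hedged sketch of one such 15-minute run (the workload file, binding name, thread count and the deliberately large operationcount are assumptions; maxexecutiontime is in seconds, so 900 caps the run at 15 minutes):

# Run workload A for 15 minutes; repeat for workloads C, F and the custom update-only workload
bin/ycsb run hbase20 -P workloads/workloada -cp /etc/hbase/conf \
    -p table=usertable -p columnfamily=family \
    -p recordcount=40000000 -p operationcount=2000000000 \
    -p maxexecutiontime=900 \
    -threads 32 -s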

Once the 40GB runs were finished, we dropped the usertable, regenerated 1 billion rows to create a 1TB dataset, and reran the tests on the same cluster in the same way.

Test results

YCSB results with 40GB dataset

In the 40GB case, the data fits entirely in the 61GB L1 cache on the cluster. The L1 cache hit ratio observed in the cluster during the test was close to 99%.

Tip: for smaller datasets where the data fits in the cache, we can also use the cache-on-load option, i.e. the table option PREFETCH_BLOCKS_ON_OPEN, to warm the cache and get a 100% cache hit ratio; a sketch of this is shown below.
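A minimal sketch of that warm-up, assuming the usertable table and a column family named family as in the YCSB HBase examples; it enables prefetch on the column family so data blocks are pulled into the block cache as regions open:

echo "alter 'usertable', { NAME => 'family', PREFETCH_BLOCKS_ON_OPEN => 'true' }" | hbase shell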

Each YCSB workload was run 5 times for 15 minutes each, and the last three runs were averaged to avoid the first-run penalty.

The following table shows the results observed with the 40GB dataset, where the L1 cache hit ratio on the region servers reached 99%:

Workload      Num Ops      Throughput (ops/sec)  Avg Latency (ms)  95th pct Latency (ms)  99th pct Latency (ms)
Workload C    148558364    165063                0.24              0.30                   0.48
UpdateOnly    56727908     63030                 0.63              0.78                   1.57
Workload A    35745710     79439                 0.40              0.54                   0.66
Workload F    24823285     55157                 0.58              0.70                   0.96

YCSB results with 1TB dataset

In the 1TB case, the data fits in neither the 61GB L1 cache nor the ~500GB OS buffer cache on the cluster. The L1 cache hit ratio observed in the cluster during the test was 82-84%.

We ran each workload 5 times for 15 minutes each and took the average of the last three runs to avoid the first-run penalty.

The following table shows the results observed with the 1TB dataset, where the L1 cache hit ratio on the region servers was 82-84%:

Workload      Num Ops      Throughput (ops/sec)  Avg Latency (ms)  95th pct Latency (ms)  99th pct Latency (ms)
Workload C    2727037      3030                  13.19             55.50                  110.85
UpdateOnly    56345498     62605                 0.64              0.78                   1.58
Workload A    3085135      6855                  10.88             48.34                  97.70
Workload F    3333982      3704                  10.45             47.78                  98.62

* throughput (ops / sec) = operations per second

Analysis

Comparing the test results for the two dataset sizes above, we can see how the throughput of the same workload changes from 3K operations per second to 165K operations per second when the data is served from the warmed-up cache of the 40GB dataset rather than accessed from HDFS storage.

The following chart compares how the throughput of the different workloads changes when running with the two dataset sizes. In the chart, the higher the bar, the better the throughput.

Note: throughput = operations per second

As shown in the figure, the YCSB workloads that read data, such as Workload A, Workload C and Workload F, have much better throughput in the 40GB case, where the data is easily cached, than in the 1TB case, where the HFile data must be accessed from HDFS.

In terms of cache hit ratio, the 40GB dataset had a cache hit ratio close to 99%, while the 1TB dataset's was about 85%, so in the 1TB case 15% of the data had to be accessed from HDFS storage.

In both cases, the custom update-only YCSB workload had the same throughput, because it only performs updates and no reads.

During HBase performance testing, we keep a close eye on the 95th and 99th percentile latencies. The average latency is only an aggregate across the whole run, whereas the 95th and 99th percentiles expose the real outliers that affect total workload throughput. In the 1TB case, the high-latency outliers at the 95th and 99th percentiles drag throughput down, while in the 40GB case, low-latency hits even at the 99th percentile allow higher total throughput.

The following figure compares the average, 95th percentile and 99th percentile latencies, and shows how latency differs across workloads when running with the two dataset sizes.

In the chart above, the bars representing the latency of the 40GB dataset are hard to see because they are very low compared to the latencies observed for the 1TB dataset, which accesses data from HDFS.

We therefore plotted the latency chart using the logarithm of the latency values to show the difference, as below.

As shown above, in the 40GB case the cache hit ratio is close to 99% and most of the workload data is available in the cache, so latencies are much lower than with the 1TB dataset, where the cache hit ratio is about 85% and HFile data must be accessed from HDFS storage.

In the 40GB case, Workload C, which returns 99% of its data from the warmed-up cache, had average and 99th percentile latencies of about 2-4 ms. In the 1TB case, the 99th percentile latency of the same read-only Workload C was approximately 100ms.

This indicates that a cache hit from the on-heap block cache returns a read in about 2ms, while a cache miss that has to fetch the record from HDFS can take about 100ms.

Recommendation

When running the YCSB benchmark, the dataset size can have a significant impact on the performance results, so sizing the test appropriately is important. At the same time, looking at the cache hit ratio and at the difference between the minimum latency and the 99th percentile latency will help you compare the latency of a cache hit with that of accessing data from the cluster's underlying storage.

Note

To check the cache hit ratio of a workload on a region server, you can use the following command:

curl http://<regionserver-host>:22102/jmx | grep -e l1CacheHitRatio -e l2CacheHitRatio

You can also follow these steps to view the cache hit ratio from the HBase Web UI:

In the HBase Web UI, click the region server

Under the Block Cache section, select L1 (or L2 if L2 is configured) to view the cache hit ratio.

(Screenshot: the cache hit ratio as shown in the L1 block cache section of the HBase Web UI.)

For more information about the HBase block cache shown above, see https://docs.cloudera.com/runtime/7.2.2/configuring-hbase/topics/hbase-blockcache.html

Thank you for reading! This is the end of this article on "How to use YCSB for HBase performance testing". I hope the above content has been of some help and that you have learned something new. If you think the article is good, please share it for more people to see!
