This article describes how to install and use the HBase performance testing tool YCSB. The editor finds it very practical and shares it here as a reference; follow along and take a look.
YCSB
I. Background and concepts
Full English name: Yahoo! Cloud Serving Benchmark (YCSB). It is a tool from Yahoo! for benchmarking cloud serving systems, with the goal of facilitating performance comparisons across the new generation of cloud data serving systems. It defines a core set of benchmarks and reports results for four widely used systems: Cassandra, HBase, PNUTS, and a simple sharded MySQL implementation.
II. Resource acquisition
First, either download the source code and compile it yourself, or download the release package directly from the official site:
https://github.com/brianfrankcooper/YCSB/releases/tag/0.10.0
Compiling the Maven project pulls in many other resource packages and is troublesome, so downloading the package directly is recommended.
A few notes on compiling from source:
Download the latest source code.
Extract it locally and enter the source root directory YCSB-0.10.0.
To compile the full version, simply run:
mvn clean package
If the compilation succeeds, the corresponding ycsb package appears in the YCSB-0.10.0/distribution directory; copy it out, extract it and use it.
The full version of ycsb built this way supports many different databases, so it depends on too many libraries, the resulting artifact is large and the build takes a long time; compiling this way is not recommended.
It is recommended instead to build a separate ycsb that can only test a single database (hbase here):
mvn -pl com.yahoo.ycsb:hbase10-binding -am clean package   # this is how I built it
Compilation result excerpt:
...
[INFO] Building tar: / opt/YCSB-0.10.0/hbase10/target/ycsb-hbase10-binding-0.10.0.tar.gz
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] YCSB Root.. SUCCESS [1.903s]
[INFO] Core YCSB.. SUCCESS [8.384s]
[INFO] Per Datastore Binding descriptor. SUCCESS [0.497s]
[INFO] YCSB Datastore Binding Parent. SUCCESS [0.582s]
[INFO] HBase 1.0 DB Binding.. SUCCESS [51.209s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 1:03.143s
[INFO] Finished at: Sat Jul 09 11:30:05 PHT 2016
[INFO] Final Memory: 52M/1694M
[INFO]
If the build succeeds by this method, the ycsb-hbase10-binding-0.10.0.tar.gz package shown above is our target file; extract it and it is ready to use.
The mvn repository in this environment uses the mirror on 172.7.1.216. During the actual build you may hit dependencies that are missing or whose versions do not match; you can manually search for and download the corresponding artifacts on search.maven.org and place them in the matching directory under the local ~/.m2/repository/.
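If a dependency has to be installed by hand, it goes under the directory derived from its Maven coordinates; the jar name and version below are only a hypothetical illustration, not a specific missing dependency:
# hypothetical example: place a manually downloaded jar into the local repository
mkdir -p ~/.m2/repository/org/apache/hbase/hbase-client/1.0.2/
cp /tmp/hbase-client-1.0.2.jar ~/.m2/repository/org/apache/hbase/hbase-client/1.0.2/
Alternatively, mvn install:install-file with the matching -DgroupId/-DartifactId/-Dversion options achieves the same thing.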
We are testing hbase here, so we downloaded ycsb-hbase10-binding-0.10.0.tar.gz directly (note that this package only applies to hbase10, i.e. HBase 1.0; the official site also offers a package that works with all databases, but at roughly 300 MB it is too large to be worth downloading).
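A minimal sketch of fetching and unpacking that package (assuming the per-binding tarball is published as an asset of the 0.10.0 release; check the release page above for the exact file name):
wget https://github.com/brianfrankcooper/YCSB/releases/download/0.10.0/ycsb-hbase10-binding-0.10.0.tar.gz
tar xzf ycsb-hbase10-binding-0.10.0.tar.gz
cd ycsb-hbase10-binding-0.10.0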
III. Configuration and use
The first thing to do is to check that the configuration in hbase-site.xml is correct, otherwise ycsb will report errors when it runs, for example properties such as:
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.4</value>
</property>
...
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
</property>
Extract the downloaded ycsb-hbase10-binding-0.10.0.tar.gz for use.
Directory structure:
[root@node1 ycsb-hbase10-binding-0.10.0]# ll
total 24
drwxr-xr-x 2 root root   46 Jul  8 15:12 bin
drwxr-xr-x 2 root root 4096 Jul  8 15:12 lib
-rw-r--r-- 1 root root 8082 Jul  7 20:45 LICENSE.txt
-rw-r--r-- 1 root root  615 Jul  7 20:45 NOTICE.txt
-rw-r--r-- 1 root root 5484 Jul  7 20:45 README.md
drwxrwxr-x 2 root root  126 Jul  7 20:45 workloads
For comparison, here is the directory structure of a full version of ycsb:
[root@node1 ycsb-0.1.4]# ll
total 20
drwxrwxrwx 2  501 games   31 Feb 24  2012 bin
drwxr-xr-x 3 root root    16 Jul  8 11:49 cassandra-binding
-rw-r--r-- 1  501 games 2291 Feb 24  2012 CHANGELOG
drwxr-xr-x 3 root root    16 Jul  8 11:49 core
drwxr-xr-x 4 root root    27 Jul  8 11:49 gemfire-binding
drwxr-xr-x 4 root root    27 Jul  8 11:49 hbase-binding
drwxr-xr-x 4 root root    27 Jul  8 11:49 infinispan-binding
drwxr-xr-x 4 root root    27 Jul  8 11:49 jdbc-binding
-rw-r--r-- 1  501 games 8082 Feb 19  2012 LICENSE.txt
drwxr-xr-x 3 root root    29 Jul  8 11:49 mapkeeper-binding
drwxr-xr-x 3 root root    16 Jul  8 11:49 mongodb-binding
drwxr-xr-x 4 root root    40 Jul  8 11:49 nosqldb-binding
-rw-r--r-- 1  501 games  479 Feb 19  2012 NOTICE.txt
-rw-r--r-- 1  501 games  952 Feb 24  2012 README
drwxr-xr-x 3 root root    16 Jul  8 11:49 redis-binding
drwxr-xr-x 2 root root    27 Jul  8 14:38 results
drwxr-xr-x 4 root root    27 Jul  8 11:49 voldemort-binding
drwxrwxrwx 2  501 games  102 Feb 22  2012 workloads
The full version requires you to manually copy the relevant hbase libraries into the lib directory of hbase-binding and to copy hbase-site.xml into the conf directory of hbase-binding.
Our ycsb-hbase10-binding-0.10.0 is more targeted and more concise: there is no need to copy the various library files from hbase's lib directory into its lib directory, and there is no conf directory; we simply use the ycsb script under bin directly.
Here's how to use it:
1. Create a table in hbase. YCSB needs a table named usertable in HBase; the table contains one column family whose name can be customized. Execute the following two commands in sequence:
hbase(main):011:0> n_splits = 120
=> 120
hbase(main):015:0> create 'usertable', 'family', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
0 row(s) in 18.3610 seconds
=> Hbase::Table - usertable
What these commands do: create the table with a pre-splitting strategy (pre-partitioned regions).
Creating some pre-split regions while building the table in the HBase shell prevents hot-spot issues when data is inserted for the first time; the split points generated by the expression above are sketched below.
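As a minimal illustration of what the SPLITS expression produces, here is the same integer arithmetic in bash with a small n_splits of 4 for readability (the table above uses 120; the boundaries scale the same way):
# compute the pre-split row-key boundaries for n_splits=4
n=4
for i in $(seq 1 $n); do
  echo "user$((1000 + i*(9999-1000)/n))"
done
# prints user3249, user5499, user7749 and user9999, i.e. boundaries spread evenly over the user1000~user9999 key space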
2. Run the test with the following commands.
First, load (initialize) the data. The options used below are:
-cp specifies the path to hbase's configuration files, /usr/hdp/2.4.2.0-258/hbase/conf/ (as mentioned earlier, the stripped-down ycsb has no conf directory of its own).
You first initialize the database with load and then execute the workload with run; the dbname argument (hbase10 here) specifies the target database.
Currently YCSB ships with six core workloads, workloada through workloadf, under the workloads/ directory (for example, workloada is a 50/50 read/update mix and workloadc is read-only). Users can customize the proportions of the operations (read, update, insert and scan) and the distribution used to select the target records: uniform (every record is equally likely), zipfian (some records are hot and selected far more often) and latest (the most recently written records are the hot ones).
-P specifies the path of the workload file.
-p sets individual parameters, such as the ip and port of the database. The target database must of course be running before YCSB is started. After the test completes, YCSB prints the average/minimum/maximum latency and other statistics. Parameters such as recordcount can be passed this way, and options such as -threads can also be appended to the command.
-s prints the running status periodically, which is useful for long runs.
[root@node1 ycsb-hbase10-binding-0.10.0]# bin/ycsb load hbase10 -P workloads/workloada -cp /usr/hdp/2.4.2.0-258/hbase/conf/ -p table=usertable -p columnfamily=family -s -threads 10 -p recordcount=100
# the workloada workload is loaded here; its operation mix (for the run phase) is half reads and half updates
2016-07-08 16:19:15,371 WARN [Thread-2] util.DynamicClassLoader: Failed to identify the fs of dir hdfs://node1.dcom:8020/apps/hbase/data/lib, ignored
java.io.IOException: No FileSystem for scheme: hdfs    (this error can be ignored!)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2579)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.hbase.util.DynamicClassLoader.(DynamicClassLoader.java:104)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.(ProtobufUtil.java:232)
at org.apache.hadoop.hbase.ClusterId.parseFrom(ClusterId.java:64)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:75)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:86)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:833)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.(ConnectionManager.java:623)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
at com.yahoo.ycsb.db.HBaseClient10.init(HBaseClient10.java:149)
at com.yahoo.ycsb.DBWrapper.init(DBWrapper.java:99)
at com.yahoo.ycsb.ClientThread.run(Client.java:418)
at java.lang.Thread.run(Thread.java:745)
Analysis of the results ycsb prints after the load finishes:
[OVERALL], RunTime(ms), 2787.0   (time taken to load the data: 2.787 seconds)
[OVERALL], Throughput(ops/sec), 35.88087549336204   (load throughput: on average 35.88 operations per second)
[TOTAL_GCS_PS_Scavenge], Count, 1.0
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 20.0
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.7176175098672408
[TOTAL_GCS_PS_MarkSweep], Count, 0.0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0.0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 1.0
[TOTAL_GC_TIME], Time(ms), 20.0
[TOTAL_GC_TIME_%], Time(%), 0.7176175098672408
[CLEANUP], Operations, 2.0   (total number of cleanup operations: 2)
[CLEANUP], AverageLatency(us), 63575.0   (average latency 63.575 ms)
[CLEANUP], MinLatency(us), 14.0   (minimum latency 0.014 ms)
[CLEANUP], MaxLatency(us), 127167.0   (maximum latency 127.167 ms)
[CLEANUP], 95thPercentileLatency(us), 127167.0   (95% of cleanup operations completed within 127.167 ms)
[CLEANUP], 99thPercentileLatency(us), 127167.0   (99% of cleanup operations completed within 127.167 ms)
[INSERT], Operations, 100.0   (total number of insert operations: 100)
[INSERT], AverageLatency(us), 13681.54   (average latency per insert: 13.68154 ms)
[INSERT], MinLatency(us), 5556.0   (minimum insert latency: 5.556 ms)
[INSERT], MaxLatency(us), 201343.0   (maximum insert latency: 201.343 ms)
[INSERT], 95thPercentileLatency(us), 30063.0   (95% of inserts completed within 30.063 ms)
[INSERT], 99thPercentileLatency(us), 53183.0   (99% of inserts completed within 53.183 ms)
[INSERT], Return=OK, 100   (number of successful returns: 100)
Then execute the run phase:
[root@node1 ycsb-hbase10-binding-0.10.0]# bin/ycsb run hbase10 -P workloads/workloada -cp /usr/hdp/2.4.2.0-258/hbase/conf/ -p table=usertable -p columnfamily=family -s -threads 10 -p recordcount=100
Analysis of the results printed after the ycsb run phase:
[OVERALL], RunTime(ms), 6921.0   (total run time: 6.921 seconds)
[OVERALL], Throughput(ops/sec), 144.48779078167894   (run throughput: on average 144.48 operations per second)
[TOTAL_GCS_PS_Scavenge], Count, 1.0
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 20.0
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.2889755815633579
[TOTAL_GCS_PS_MarkSweep], Count, 0.0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0.0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 1.0
[TOTAL_GC_TIME], Time(ms), 20.0
[TOTAL_GC_TIME_%], Time(%), 0.2889755815633579
[CLEANUP], Operations, 2.0   (total number of cleanup operations: 2)
[CLEANUP], AverageLatency(us), 71591.5   (average latency 71.5915 ms)
[CLEANUP], MinLatency(us), 15.0   (minimum latency 0.015 ms)
[CLEANUP], MaxLatency(us), 143231.0   (maximum latency 143.231 ms)
[CLEANUP], 95thPercentileLatency(us), 143231.0   (95% of cleanup operations completed within 143.231 ms)
[CLEANUP], 99thPercentileLatency(us), 143231.0   (99% of cleanup operations completed within 143.231 ms)
[READ], Operations, 480.0   (total number of read operations: 480)
[READ], AverageLatency(us), 5027.9625   (average read latency: 5.027 ms)
[READ], MinLatency(us), 2254.0   (minimum read latency: 2.254 ms)
[READ], MaxLatency(us), 158847.0   (maximum read latency: 158.847 ms)
[READ], 95thPercentileLatency(us), 10767.0   (95% of reads completed within 10.767 ms)
[READ], 99thPercentileLatency(us), 14599.0   (99% of reads completed within 14.599 ms)
[READ], Return=OK, 480   (number of successful returns: 480)
[UPDATE], Operations, 520.0   (total number of update operations: 520)
[UPDATE], AverageLatency(us), 5812.123076923077   (average update latency: 5.812 ms)
[UPDATE], MinLatency(us), 3302.0   (minimum update latency: 3.302 ms)
[UPDATE], MaxLatency(us), 86207.0   (maximum update latency: 86.207 ms)
[UPDATE], 95thPercentileLatency(us), 9991.0   (95% of updates completed within 9.991 ms)
[UPDATE], 99thPercentileLatency(us), 11839.0   (99% of updates completed within 11.839 ms)
[UPDATE], Return=OK, 520   (number of successful returns: 520)
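As a quick sanity check on the two result sets above, the reported throughput is simply the operation count divided by the run time (all numbers taken from the output above):
echo "scale=2; 100 / 2.787" | bc           # load phase: ~35.88 ops/sec (100 inserts in 2.787 s)
echo "scale=2; (480 + 520) / 6.921" | bc   # run phase: ~144.48 ops/sec (480 reads + 520 updates in 6.921 s)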
IV. Use case introduction
Here is an introduction to the approach we use in our hbase performance test cases with ycsb.
The tests are driven directly by the ycsb_load.sh and ycsb_run.sh scripts; the user only needs to write a configuration file similar to a workload file.
Corresponding directory structure:
[root@node5 test]# pwd
/root/ycsb-hbase10-binding-0.10.0/workloads/test
[root@node5 test]# ll
total 4338592
-rw-r--r-- 1 root hadoop       1189 Aug 27 22:35 TR1003
-rw-r--r-- 1 root hadoop 4080039015 Aug 29 03:12 TR1003.report
-rw-r--r-- 1 root hadoop        800 Aug 30 16:20 TR1004
-rw-r--r-- 1 root hadoop  306542869 Aug 30 16:31 TR1004.report
-rw-r--r-- 1 root hadoop        751 Aug 30 10:06 TR1005
-rw-r--r-- 1 root hadoop   56106292 Aug 30 10:20 TR1005.report
-rw-r--r-- 1 root hadoop        631 Aug 23 19:07 workload_test_template
-rwxr-xr-x 1 root hadoop        590 Aug 23 17:52 ycsb_load.sh
-rwxr-xr-x 1 root hadoop        609 Aug 30 16:35 ycsb_run.sh
The files with the .report suffix are the complete record of everything printed to the terminal while the script and ycsb were running (they can be large and may be deleted once the key values have been extracted after the test).
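A minimal sketch for pulling the key result lines back out of a report file (the file name comes from the listing above; the metric names match the ycsb output shown earlier):
grep -E '^\[(OVERALL|READ|UPDATE|INSERT|SCAN|CLEANUP)\]' TR1004.report | grep -E 'Throughput|AverageLatency|95thPercentile|99thPercentile|Return='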
The content of the ycsb_load.sh script:
Echo "* Loading test begin*"
# define the path of ycsb
YCSB= "/ root/ycsb-hbase10-binding-0.10.0"
# define the path of hbase_site.xml
Hbase=$YCSB "/ bin"
# define test path
Test=$YCSB "/ workloads/test/"
# define the file path of workload
Workload=$test$1
# define the log file name
Report=$test$1 ".report"
# define ycbs runner
Runner=$YCSB "/ bin/ycsb"
# define measurement param
Raw_file=$test$1 ".raw"
Measurement_param= "measurement.raw.output_file=" $raw_file
# run test
$runner load hbase10-cp $hbase-P $workload-s-jvm-args='-Xmx32g' 1 > > $report 2 > > $report
Echo "* Loading test end*"
The script content is fairly simple, and the ycsb_run.sh script is almost identical (the load in the final ycsb command is changed to run, as sketched below). The $1 variable is the configuration file the user specifies at run time (workload, TR1003, and so on).
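Based on that description, the final command of ycsb_run.sh would look like the following sketch (inferred from the load script above, not copied from the actual file):
$runner run hbase10 -cp $hbase -P $workload -s -jvm-args='-Xmx32g' 1>>$report 2>>$report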
-jvm-args='-Xmx32g' sets the memory of the jvm that ycsb runs in, meaning the ycsb client process may use up to 32 GB of heap.
Specific usage: sh ycsb_load.sh TR1003
The ycsb configuration file TR1003 for the load phase (it works the same way as a workload file):
# The thread count
threadcount=20
# The number of fields in a record
fieldcount=1
# The size of each field (in bytes)
fieldlength=9216
# Number of records to be loaded
recordcount=1500000000
# Number of operations to be performed in the run phase
operationcount=1500000000
readallfields=true
insertorder=hashed
insertstart=0
insertcount=500000000
# Control the proportion of each hbase operation type
readproportion=0
updateproportion=0
scanproportion=1
insertproportion=0
# The following params are always fixed
# The table name
table=usertable
# The column family
columnfamily=cf
# The workload class
workload=com.yahoo.ycsb.workloads.CoreWorkload
# The measurement type
measurementtype=raw
clientbuffering=true
writebuffersize=25165824
# requestdistribution=zipfian
This configuration file is used on its own when loading the database in the load phase. The clientbuffering item enables the write buffer on the hbase client side; enabling the write buffer reduces the rpc overhead of hbase write operations (see the hbase test tuning document for details), and it is not set by default. The default value of writebuffersize is 1024*1024*12 (12 MB). The requestdistribution item specifies how operation requests are distributed across the records.
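As a quick check of those numbers (values taken from the configuration file above and the stated default):
echo $((25165824 / 1024 / 1024))            # configured writebuffersize: 24 MB
echo $((1024 * 1024 * 12 / 1024 / 1024))    # default writebuffersize: 12 MB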
The run-phase configuration file TR1003:
# The thread count
threadcount=100
# The number of fields in a record
fieldcount=1
# The size of each field (in bytes)
fieldlength=9216
# Number of records to be loaded
recordcount=1500000000
# Number of operations to be performed in the run phase
operationcount=2000000
readallfields=true
# insertorder=hashed
# insertstart=0
# insertcount=500000000
# Control the proportion of each hbase operation type
readproportion=0
updateproportion=0
scanproportion=1
insertproportion=0
# The following params are always fixed
# The table name
table=usertable
# The column family
columnfamily=cf
# The workload class
workload=com.yahoo.ycsb.workloads.CoreWorkload
# The measurement type
measurementtype=raw
maxscanlength=1000000
# hbase.usepagefilter=false
# scanlengthdistribution=zipfian
# requestdistribution=latest
This configuration file is used together with ycsb_run.sh in the run phase, which here mainly reads data back from the hbase database.
Scan:
The maxscanlength item specifies the maximum number of records per scan (in the ycsb code each scan reads a random number of records between 1 and maxscanlength).
The hbase.usepagefilter item controls whether the results of a scan are returned page by page; it is enabled by default.
The scanlengthdistribution item decides how the scan length is drawn from the 1~maxscanlength range; the default is uniform (equal-probability random).
It is also worth noting that while scan operations are running, the ycsb operations-per-second figure (current ops/sec) can be very small:
2016-08-30 04:21:40:791 60 sec: 1 operations; 0.1 current ops/sec; est completion in 1388 days 21 hours [SCAN count: 1, average latency (us): 49083737.00]
This is due to the way ycsb counts: it only counts completed operations in each interval, not the total number of records scanned, so 0.1 current ops/sec can be read as one scan operation completing roughly every 10 seconds (averaged over those 10 seconds, that is 0.1 ops/sec).
For scan operations we can instead use the network bandwidth to estimate how many records a single operation scans.
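A rough back-of-the-envelope sketch of that estimate (the 90 MB/s figure is only an assumed example of observed client bandwidth; the record size comes from fieldlength=9216 in the configuration above):
# records scanned per second ≈ client network bandwidth / record size
echo $(( 90 * 1024 * 1024 / 9216 ))   # ~10240 records per second at an assumed 90 MB/s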
Read:
Reads are similar to the load-phase output: each operation looks up one record at a time.
Note:
The load phase actually loads the data, inserting the records into hbase; recordcount in the workload file is the number of records to insert. The run phase then performs the various operations against hbase, and operationcount is the number of operations. The load phase must therefore complete correctly, otherwise errors such as [READ-FAILED] will appear in the run phase.
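One way to confirm that the load really finished before starting the run phase (a suggestion beyond the original text; RowCounter is HBase's standard MapReduce row-counting tool, and its result should match the number of records actually loaded) is:
hbase org.apache.hadoop.hbase.mapreduce.RowCounter usertable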
Thank you for reading! This concludes the article on how to install the HBase performance testing tool YCSB. I hope the content above is of some help and lets you learn something new; if you found the article useful, please share it so more people can see it.