2025-03-31 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report
This article explains in detail how to use the AWS S3 offloader to offload data stored in BookKeeper. Interested readers may find it a useful reference; we hope it is helpful to you.
For large volumes of data that do not need to be accessed quickly, we recommend using Apache Pulsar's built-in tiered storage feature, a native advantage of Pulsar's segment-based architecture.
With tiered storage, you can offload data from Apache BookKeeper to scalable, virtually unlimited, inexpensive cloud-native storage (such as AWS S3 or Google Cloud Storage) or a filesystem, building a high-performance messaging cluster while reducing operations and maintenance costs.
The AWS S3 offloader is a Pulsar plug-in hosted on StreamNative Hub.
Below, we walk through how to offload data stored in BookKeeper to AWS S3 with the AWS S3 offloader.
Installation
Follow these steps to install the AWS S3 offloader. This example uses Pulsar 2.5.1.
Prerequisites
Apache jclouds: 2.2.0 or later
1. Choose any of the following ways to download the Pulsar package:
Download from an Apache mirror:
https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
Download from the Pulsar download page:
https://pulsar.apache.org/download
Download with the wget command:
wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
2. Download and extract the Pulsar offloaders package.
wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
Note:
When running Pulsar on a bare-metal cluster, make sure the extracted `offloaders` directory is available in the Pulsar directory of every broker.
When running Pulsar in Docker, or deploying Pulsar from a Docker image (for example on Kubernetes or DC/OS), use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image; the `apachepulsar/pulsar-all` image already includes the tiered storage offloaders.
3. Move the extracted offloaders directory into your local Pulsar directory.
mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
ls offloaders
>> output
As the output below shows, Pulsar supports AWS S3 and GCS through Apache jclouds (https://jclouds.apache.org).
tiered-storage-file-system-2.5.1.nar
tiered-storage-jcloud-2.5.1.nar
Usage
The following are the detailed steps for using the AWS S3 offloader in Pulsar.
Step 1: configure the AWS S3 offloader driver
Before using the AWS S3 offloader driver, you need to configure a few properties for it. For more information on configuring the driver's properties, see:
https://hub.streamnative.io/offloaders/aws-s3/2.5.1/#configuration
This example assumes that the following parameters have been configured and Pulsar is running in stand-alone mode.
In `standalone.conf`, configure the following parameters.
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=test-pulsar-offload
s3ManagedLedgerOffloadRegion=us-west-2
Note:
To speed up ledger rollover, you can use the following settings in a test environment; they are not recommended in production.
managedLedgerMinLedgerRolloverTimeMinutes=1
managedLedgerMaxEntriesPerLedger=5000
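As an aside before configuring `conf/pulsar_env.sh` below: the AWS SDK credential chain used by the offloader can usually also pick up credentials from the standard AWS credentials file instead of environment variables. This is a hedged alternative (the key values below are the same placeholders used throughout this example):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id=ABCDEFG123456789
aws_secret_access_key=QWERYHBDSSGJBVCCDCCC
```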
In `conf/pulsar_env.sh`, configure the following parameters.
export AWS_ACCESS_KEY_ID=ABCDEFG123456789
export AWS_SECRET_ACCESS_KEY=QWERYHBDSSGJBVCCDCCC
Step 2: create an AWS S3 bucket
Before uploading data to AWS S3, you need to create a bucket in an AWS region to store the data. Once a bucket is created, you can upload an unlimited amount of data into it.
You can configure bucket properties, including the geographic region, access settings for the objects in the bucket, and other metadata.
1. Log in to the AWS Management Console and open the Amazon S3 console.
2. Click Create bucket.
3. Set the bucket name and region.
It is important to note:
After the bucket is created, the bucket name cannot be changed. For more information about bucket naming, see the bucket naming rules:
https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html#bucketnamingrules
The AWS S3 offloader driver was configured earlier. The bucket name here must match the configured `s3ManagedLedgerOffloadBucket` value, and the region must match the configured `s3ManagedLedgerOffloadRegion` value.
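The core of the bucket naming rules linked above (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; not shaped like an IP address) can be sketched as a small validator. This helper is hypothetical, for illustration only; it is not part of Pulsar or the AWS SDK and omits some finer rules (such as forbidding adjacent dots):

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Check the core S3 bucket naming rules (a simplified sketch)."""
    if not 3 <= len(name) <= 63:
        return False
    # Lowercase letters, digits, dots, hyphens; must start and end alphanumeric.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name):
        return False
    # Must not be formatted like an IP address (e.g. 192.168.0.1).
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return True

print(is_valid_bucket_name("test-pulsar-offload"))  # True
print(is_valid_bucket_name("Test_Bucket"))          # False
```

The bucket name used in this example, `test-pulsar-offload`, passes these checks.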
4. Under the bucket's Block Public Access settings, choose the public access settings you want for the bucket.
5. Click Create bucket. You have now successfully created a bucket.
Step 3: create a group
1. Log in to the AWS Management Console and open the IAM console.
2. In the left navigation bar, click Groups > Create New Group.
3. In the group name dialog, enter a group name and click Next Step.
4. In the attach policy list, select the policies you want to apply to all members of the group, and click Next Step.
5. Review the selected configuration and click Create Group.
Once the group has been created successfully, it appears in the IAM console.
Step 4: create a user
1. Log in to the AWS Management Console and open the IAM console.
2. In the navigation pane, select Users > Add user.
3. Enter the user name (case-insensitive) and select the AWS access type.
4. Click Next: Permissions.
5. On the Set permissions screen, select the permissions you want to grant the user.
6. Click Next: Tags.
7. (Optional) Add tags for the user and click Next: Review.
It is important to note:
For more information about tags in IAM, see "Tagging IAM users and roles":
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags.html
8. Review all the settings and click Create user.
9. Click Show next to the secret access key to view the user's access keys (the access key ID and secret access key).
The AWS S3 offloader driver was configured earlier. The access key ID here must match the configured `AWS_ACCESS_KEY_ID` value, and the secret access key must match the configured `AWS_SECRET_ACCESS_KEY` value.
It is important to note:
Click Download .csv to save the access key file; this is your only chance to view or download the access keys. The user needs these keys before calling the AWS API, so store the new access key ID and secret access key safely. You will not be able to retrieve them again after this step.
Step 5: offload data from BookKeeper to AWS S3
Run the following commands from your local Pulsar directory (for example, `~/path/to/apache-pulsar-2.5.1`).
1. Start Pulsar in standalone mode.
./bin/pulsar standalone -a 127.0.0.1
2. To make sure the generated data is not deleted immediately, it is recommended to set a retention policy. The retention policy can be a size limit and/or a time limit; the higher the values, the longer the data is retained.
./bin/pulsar-admin namespaces set-retention public/default --size 10G --time 3d
Tip:
For more information about the `pulsar-admin namespaces set-retention` command (including its flags, descriptions, and default values), see:
http://pulsar.apache.org/tools/pulsar-admin/2.6.0-SNAPSHOT/#-em-set-retention-em-
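The `--size` and `--time` flags accept human-readable values such as `10G` and `3d`. A hypothetical parser, not the actual pulsar-admin implementation, showing how such values map to bytes and seconds:

```python
def parse_size(value: str) -> int:
    """Convert a size like '10G' to bytes (hypothetical sketch)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)  # bare number: already bytes

def parse_time(value: str) -> int:
    """Convert a duration like '3d' to seconds (hypothetical sketch)."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)  # bare number: already seconds

print(parse_size("10G"))  # 10737418240
print(parse_time("3d"))   # 259200
```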
3. Use pulsar-perf to produce data.
./bin/pulsar-perf produce -r 1000 -s 2048 test-topic
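At 1,000 messages per second (`-r`) and 2,048 bytes per message (`-s`), the producer writes roughly 2 MB/s, so with the test settings used in this example a 10 MB offload threshold is reached within seconds. A quick back-of-the-envelope check:

```python
rate = 1000                    # messages per second (-r)
size = 2048                    # bytes per message (-s)
threshold = 10 * 1024 * 1024   # a 10 MB offload threshold

bytes_per_second = rate * size              # 2,048,000 bytes/s
seconds_to_threshold = threshold / bytes_per_second
print(round(seconds_to_threshold, 2))       # ~5 seconds
```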
4. The offload operation does not begin until a ledger rollover has occurred. To make sure the offload succeeds, it is recommended to wait for a few more ledger rollovers (with the rollover time set to 1 minute above, this takes about a minute each). You can check the ledger status with pulsar-admin.
./bin/pulsar-admin topics stats-internal test-topic
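The stats-internal output is JSON, and its `ledgers` array shows whether rollovers have already happened: all entries except the last (current) ledger are closed and therefore candidates for offloading. A small sketch over an abridged, made-up sample of that output (the field names follow the 2.5.x format, but treat the values as illustrative only):

```python
import json

# Abridged, made-up sample of `pulsar-admin topics stats-internal` output.
sample = json.loads("""
{
  "currentLedgerEntries": 120,
  "ledgers": [
    {"ledgerId": 10, "entries": 5000, "size": 10240000, "offloaded": false},
    {"ledgerId": 11, "entries": 5000, "size": 10240000, "offloaded": false},
    {"ledgerId": 12, "entries": 0, "size": 0, "offloaded": false}
  ]
}
""")

# Every ledger except the last one in the list is already closed.
closed = [l["ledgerId"] for l in sample["ledgers"][:-1]]
print(f"closed ledgers (offload candidates): {closed}")  # [10, 11]
```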
5. After a ledger rollover, you can trigger the offload operation manually.
You can also configure the offload operation to trigger automatically. For more information on configuring automatic offloading, see:
https://hub.streamnative.io/offloaders/aws-s3/2.5.1/#configure-aws-s3-offloader-to-run-automatically
./bin/pulsar-admin topics offload --size-threshold 10M public/default/test-topic
> > output
Offload triggered for persistent://public/default/test-topic for messages before 12:0:-1
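The `--size-threshold` value bounds how much data may remain in BookKeeper: closed ledgers are offloaded oldest-first until the data kept locally fits under the threshold. A hypothetical sketch of that selection logic (illustrative only, not Pulsar's actual implementation):

```python
def ledgers_to_offload(ledger_sizes, threshold):
    """Pick the oldest ledgers to offload so that the newest ledgers
    kept in BookKeeper total at most `threshold` bytes.
    `ledger_sizes` maps ledger ID -> size in bytes, oldest first.
    (Hypothetical sketch, not Pulsar's actual code.)"""
    ids = list(ledger_sizes)
    kept = 0
    keep_from = len(ids)  # index of the oldest ledger we keep locally
    # Walk newest-first, keeping ledgers while still under the threshold.
    for i in range(len(ids) - 1, -1, -1):
        if kept + ledger_sizes[ids[i]] > threshold:
            break
        kept += ledger_sizes[ids[i]]
        keep_from = i
    return ids[:keep_from]  # everything older gets offloaded

sizes = {10: 8_000_000, 11: 8_000_000, 12: 8_000_000}
print(ledgers_to_offload(sizes, 10 * 1024 * 1024))  # [10, 11]
```

With a 10 MB threshold and three 8 MB ledgers, only the newest ledger stays in BookKeeper; the two older ones are selected for offload.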
6. Check the status of the offload operation.
./bin/pulsar-admin topics offload-status -w public/default/test-topic
The offload operation may take some time to complete.
> > output
Offload was a success
Once the operation completes, the data has been successfully offloaded to AWS S3.
That is all on how to use the AWS S3 offloader to offload data stored in BookKeeper. We hope the content above has been helpful. If you found the article useful, feel free to share it so more people can see it.