2025-04-10 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article describes how HBase implements multi-tenancy.
Multi-tenancy, following the Wikipedia definition, is about how to share a single system or program among multiple users while still keeping each user's data isolated. With the rise of cloud computing, multi-tenancy has become increasingly important for cloud services. HBase accordingly provides a number of multi-tenancy features that offer resource isolation when multiple users share one HBase cluster. This article covers three of them: Namespace & ACL, Quota, and RSGroup.
Namespace&ACL
In HBase, creating a namespace is a lightweight operation, and placing the tables of different businesses in different namespaces is the simplest way to isolate resources. The common isolation mechanisms such as ACL, quota, and RSGroup can all be set on a namespace.
ACL stands for Access Control List. It restricts which operations different users may perform on different resources.
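To use ACL, add the following configuration to hbase-site.xml: set hbase.security.authorization to true, and add org.apache.hadoop.hbase.security.access.AccessController to hbase.coprocessor.master.classes, hbase.coprocessor.region.classes, and hbase.coprocessor.regionserver.classes.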
1. ACL concepts
Users are divided into ordinary users and superusers. Superusers are the user that started the HBase service plus the users listed in hbase.superuser; they can manage the cluster. Ordinary users must be granted permissions before they can access or operate on HBase. Scope can be understood as the granularity of a resource.
The Action required by each HBase operation is listed in the official HBase documentation: http://hbase.apache.org/book.html#appendix_acl_matrix
The best way to control user permissions is to grant each user the appropriate actions on the appropriate scope, according to what that user needs to access or do.
2. Setting or revoking permissions
Permissions are set or revoked in the HBase shell or through the HBase API. The shell operations are shown in the figure:
To set a permission on a namespace, prefix the namespace name with @:
To set a permission on a cell:
3. Storage of permissions
Permissions are stored in the hbase:acl table, with the rowkey derived from the scope. The acl table is structured as follows:
Cell permissions are stored using HFile v3 tags.
4. Permission checking
Permission checking determines whether a user has the right to perform an operation. It is done in AccessController, a coprocessor that implements MasterObserver, RegionServerObserver, RegionObserver, and so on, and checks permissions in the hooks for master, region server, and region operations. Each RS maintains a complete PermissionCache, so the check consists of verifying that the PermissionCache contains the required permission; if it does not, an AccessDeniedException is thrown.
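The check described above can be pictured with a small in-memory model. This is only a sketch and not HBase's actual AccessController code: the PermissionCache class below, its method names, and the scope strings ("" for global, "@ns" for a namespace, "ns:table" for a table) are assumptions made for illustration, and column family, qualifier, and cell scopes are left out.

```python
from dataclasses import dataclass, field


class AccessDeniedError(Exception):
    """Stands in for HBase's AccessDeniedException in this sketch."""


@dataclass
class PermissionCache:
    # Maps (user, scope) -> set of granted actions, e.g. {"R", "W"}.
    # Hypothetical scope encoding: "" global, "@ns" namespace, "ns:table" table.
    grants: dict = field(default_factory=dict)

    def grant(self, user, scope, actions):
        self.grants.setdefault((user, scope), set()).update(actions)

    def is_allowed(self, user, namespace, table, action):
        # A grant on a broader scope also covers the narrower scopes below it.
        scopes = ["", "@" + namespace, namespace + ":" + table]
        return any(action in self.grants.get((user, s), set()) for s in scopes)

    def require(self, user, namespace, table, action):
        # Mirrors the "throw if the permission is missing" step.
        if not self.is_allowed(user, namespace, table, action):
            raise AccessDeniedError(
                f"{user} lacks {action} on {namespace}:{table}")
```

For instance, after cache.grant("alice", "@ns1", {"R"}), a read by alice on any table in ns1 passes, while a write, or a read in another namespace, raises AccessDeniedError.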
5. The grant / revoke process
The process of granting or revoking a permission is shown in the following figure:
(1) The client sends the grant or revoke request to the region server hosting the acl region.
(2) On receiving the request, that region server puts the new permission into, or deletes it from, the acl table.
(3) In the region's postPut and postDelete hooks, AccessController checks whether the operation was on the acl region; if so, it reads the updated permissions from the acl table and writes them to zk.
(4) Through zk's watch mechanism, the master and region servers are notified to update their PermissionCache, which synchronizes the permissions to the master and the other region servers.
6. Granting / revoking permissions based on Procedure
To synchronize permissions with a Procedure, the grant/revoke request must first be sent to the master for processing; see HBASE-21739. Granting or revoking then has two key steps: recording the permission in the acl table, and synchronizing the updated permission to all RegionServers. UpdatePermissionProcedure is designed to do this; see HBASE-22271 (currently not merged into the community master branch). In the UpdatePermissionStorage phase, it updates the acl table, zk, and the PermissionCache on the master. In the UpdatePermissionCacheOnRS phase, it launches UpdatePermissionRemoteProcedure to update the PermissionCache on each RS.
UpdatePermissionProcedure has to handle five kinds of permission synchronization:
Grant: add a permission
Revoke: delete a permission
Delete Namespace: remove all permissions on the namespace
Delete Table: remove all permissions on the table
Reload: reload all permissions
In the new scheme, zk is no longer used to notify RS to update PermissionCache; it is used only to store acls. Because the acl table is not necessarily online when an RS or the master starts, permissions have to be loaded from zk at that point. When the permissions in the acl table and on zk disagree, the acl table prevails. Therefore, once the master has started and the acl table is online, it launches an UpdatePermissionProcedure of type Reload, which updates the permissions on zk and the PermissionCache on each RS.
Quota&Throttle
Because the resources and service capacity of a cluster are finite, Quota is used to limit the data volume and access rate of each resource.
The quota feature of HBase must be enabled in the configuration by setting hbase.quota.enabled to true.
The quota concepts in HBase and how they relate to each other are shown in the following figure:
1. Throttle Quota
A throttle limits the number of requests to a resource, or the amount of data it transfers, per unit of time.
Supported time units include sec, min, hour, and day.
Use req to limit the number of requests.
Use B, K, M, G, T, P to limit the amount of data requested.
Use CU to limit the read / write capacity units requested. One read / write capacity unit corresponds to a request that reads / writes up to 1 KB of data at a time; a request that reads 2.5 KB of data consumes 3 capacity units. The amount of data per capacity unit can be configured through hbase.quota.read.capacity.unit and hbase.quota.write.capacity.unit.
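The capacity unit arithmetic is a round-up division. The helper below is a minimal sketch of that rule, not HBase code; the function name and the 1024-byte default are assumptions chosen to match the 1 KB figure in the text.

```python
import math


def capacity_units(request_bytes, unit_bytes=1024):
    """Number of capacity units consumed by a request of request_bytes.

    unit_bytes plays the role of hbase.quota.read.capacity.unit or
    hbase.quota.write.capacity.unit; 1024 is assumed here.
    """
    return math.ceil(request_bytes / unit_bytes)
```

With the defaults, a 2.5 KB read (2560 bytes) costs 3 units, and a read of exactly 1 KB costs 1 unit.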
Machine scope means the throttle quota applies on each single RS. Cluster scope means the throttle quota is shared by all RS in the cluster. If QuotaScope is not specified, the default is Machine.
The shell commands to set a throttle are as follows:
Set the throttle of RegionServers (currently only the all keyword, meaning all RegionServers, is supported; a quota cannot be set for one specific RegionServer). The RS quota generally represents the service ceiling of the RS, and it is recommended to set it per second:
Set a quota with Cluster scope:
A Cluster-scope quota is divided among the RS as follows:
For a table quota: TableMachineLimit = ClusterLimit / TotalTableRegionNum * MachineTableRegionNum
For a namespace quota: NamespaceMachineLimit = ClusterLimit / RsNum. Note that RSGroup is not taken into account here: if the namespace is isolated to an RSGroup, the throttle limit assigned to each RS comes out too small, so this calculation needs to be improved.
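The two formulas, and the RSGroup shortfall they lead to, can be checked with a few lines of arithmetic. The function names below are hypothetical and simply restate the formulas from the text.

```python
def table_machine_limit(cluster_limit, total_table_regions, machine_table_regions):
    """Share of a table's cluster-scope limit given to one RS,
    proportional to how many of the table's regions that RS hosts."""
    return cluster_limit / total_table_regions * machine_table_regions


def namespace_machine_limit(cluster_limit, rs_num):
    """Share of a namespace's cluster-scope limit given to one RS,
    split evenly over all RS in the cluster."""
    return cluster_limit / rs_num


# A table with 10 regions and a cluster limit of 1000 req/sec:
# an RS hosting 3 of those regions gets 300 req/sec.
table_share = table_machine_limit(1000, 10, 3)

# A namespace limited to 1000 req/sec on a 10-RS cluster gets 100 req/sec per RS.
# If the namespace is isolated to an RSGroup of 2 RS, only those 2 RS serve it,
# so the effective total is 200 req/sec instead of the configured 1000.
per_rs = namespace_machine_limit(1000, 10)
effective_total_in_group = per_rs * 2
```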
GlobalBypass has global scope; it is configured on a user and lets that user skip throttling.
2. Space Quota
A space quota limits the data volume of a resource and is configured on a namespace or table. When the data volume reaches the limit, the configured violation policy is enforced. The policies are:
Disable: disable the table, or the tables of the namespace
NoInserts: forbid Mutation operations other than Delete; allow Compaction
NoWrites: forbid Mutation operations; allow Compaction
NoWritesCompactions: forbid Mutation operations; forbid Compaction
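The four policies above form a ladder of increasing strictness, which the following sketch encodes as a lookup. It is an illustration only: the Policy enum, the is_allowed function, and the operation names ("read", "put", "delete", "compaction") are hypothetical, and the sketch assumes that reads stay allowed under every policy except Disable.

```python
from enum import Enum


class Policy(Enum):
    DISABLE = "Disable"
    NO_INSERTS = "NoInserts"
    NO_WRITES = "NoWrites"
    NO_WRITES_COMPACTIONS = "NoWritesCompactions"


def is_allowed(policy, operation):
    """Whether operation is permitted once policy is in force."""
    if policy is Policy.DISABLE:
        return False  # the table is disabled, so nothing is served
    if operation == "read":
        return True  # assumption: the write policies never block reads
    if operation == "compaction":
        return policy is not Policy.NO_WRITES_COMPACTIONS
    if operation == "delete":
        return policy is Policy.NO_INSERTS  # only NoInserts still lets deletes through
    return False  # puts and other mutations are blocked by all three policies
```

Allowing deletes under NoInserts gives users a way to free space and drop back under the limit without administrator intervention.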
To view a snapshot of the current space quota (this is not an HBase snapshot; it refers to the current space used by the table, the configured limit, and the state of any triggered policy):
To limit the number of tables or regions in a namespace, use:
hbase.namespace.quota.maxtables / hbase.namespace.quota.maxregions
If the limit is exceeded, a QuotaExceededException is thrown.
Space quota is implemented as follows:
(1) Each RS periodically reports Region size information to the master: RegionSizeReportingChore
(2) The master computes table sizes and the triggered policies and stores them in the quota table: QuotaObserverChore
(3) Each RS periodically reads the quota table and enforces the policy: SpaceQuotaRefresherChore
3. Soft limit
A throttle limit can be configured as a soft limit: when the cluster has spare resources, requests are allowed to exceed their quota. Use the following command to enable or disable this:
Note that exceeding the quota means a user may go beyond the configured user/namespace/table quota as long as the RS quota still has headroom. The RS quota must therefore be set before this feature can be enabled. It is recommended to set the RS quota per second: with a longer time unit, once the RS quota has been used up by other users' requests it takes a long time to recover, which can affect subsequent requests even if they are within their own user/namespace/table quota.
4. Quota storage
Quota-related information is stored in the hbase:quota table.
The main row key types are:
n.namespace: quota of a namespace
t.table: quota of a table
u.user: quota of a user
r.all: quota of RegionServers
exceedThrottleQuota: whether exceeding the throttle quota is allowed
Throttle-related quota is stored in the q column family, and space-related quota in the u column family.
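The row key scheme is a one-letter prefix followed by the resource name. The helper below is a hypothetical illustration of that layout (the function name and the kind strings are assumptions); it only builds strings and does not talk to HBase.

```python
def quota_row_key(kind, name=None):
    """Build the hbase:quota row key for a resource, per the list above."""
    if kind == "regionserver":
        return "r.all"  # only the "all" RegionServer key exists
    if kind == "exceed_throttle_quota":
        return "exceedThrottleQuota"
    prefixes = {"namespace": "n.", "table": "t.", "user": "u."}
    if kind not in prefixes or not name:
        raise ValueError(f"unsupported kind or missing name: {kind!r}")
    return prefixes[kind] + name
```

For example, quota_row_key("namespace", "ns1") gives "n.ns1" and quota_row_key("user", "alice") gives "u.alice".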
Whether throttling is enabled is stored on the zk node /hbase/rpc-throttle, with a value of true or false. It is kept in zk because enabling or disabling throttling takes effect immediately, whereas other quota settings are picked up by the RS through periodic reads of the quota table and so take effect with a delay.
5. Throttle
Setting a throttle takes two steps:
(1) The client sends the set-quota request to the master, and the master stores the quota in the hbase:quota table.
(2) Each RS loads the latest quota values from the quota table and updates its QuotaCache every five minutes. A newly set quota can therefore take up to five minutes to take effect (the interval is configurable through hbase.quota.refresh.period).
When a read or write request reaches the RS, the throttling process is as shown in the figure:
Before the data is read, the quota this request will consume is estimated. The current community code estimates 100 bytes per get or mutate and 1000 bytes per scan. This could be improved by adjusting the estimate dynamically according to the amount of data the previous request actually read.
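A rough sketch of both estimation approaches follows. The fixed figures come from the text; the AdaptiveScanEstimator class is a hypothetical rendering of the suggested improvement and does not correspond to existing HBase code.

```python
GET_OR_MUTATE_ESTIMATE_BYTES = 100
SCAN_ESTIMATE_BYTES = 1000


def static_estimate(num_gets=0, num_mutations=0, num_scans=0):
    """Fixed per-operation estimate, using the figures quoted in the text."""
    return ((num_gets + num_mutations) * GET_OR_MUTATE_ESTIMATE_BYTES
            + num_scans * SCAN_ESTIMATE_BYTES)


class AdaptiveScanEstimator:
    """Hypothetical refinement: reuse the size of the previous scan result."""

    def __init__(self, default_bytes=SCAN_ESTIMATE_BYTES):
        self.default_bytes = default_bytes
        self.last_actual_bytes = None

    def estimate(self):
        # Fall back to the fixed figure until a real result size is known.
        if self.last_actual_bytes is None:
            return self.default_bytes
        return self.last_actual_bytes

    def record(self, actual_bytes):
        # Called after a request completes, with the bytes actually read.
        self.last_actual_bytes = actual_bytes
```

A scanner whose first batch returned 50000 bytes would then be charged 50000 bytes up front for its next batch instead of 1000, which keeps large scans from slipping under the limit.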
A throttle limit is defined over a time unit and is refilled gradually as time passes. There are two main refill strategies:
(1) Average Interval Refill (default): the quota refilled is proportional to the time elapsed between the previous refill and now, and never exceeds the configured limit.
For example, with 100 resources per second configured, 10 resources are refilled after 100 ms. After 2 seconds, 100 resources are refilled, not 200.
(2) Fixed Interval Refill: the full quota is refilled after each fixed time interval.
For example, with 100 resources per second configured, the full 100 resources are refilled one second after the previous refill, and nothing is refilled before then.
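The difference between the two strategies is easiest to see side by side. The functions below are a simplified sketch of the refill arithmetic only (names and signatures are assumptions); they compute how much quota comes back after a given elapsed time and ignore how much is currently available.

```python
def average_interval_refill(limit, unit_ms, elapsed_ms):
    """Quota restored after elapsed_ms: proportional to elapsed time,
    capped at the configured limit."""
    return min(limit, limit * elapsed_ms // unit_ms)


def fixed_interval_refill(limit, unit_ms, elapsed_ms):
    """Quota restored after elapsed_ms: nothing until a full time unit
    has passed, then the whole limit at once."""
    return limit if elapsed_ms >= unit_ms else 0
```

With a limit of 100 per second, average_interval_refill(100, 1000, 100) restores 10 and average_interval_refill(100, 1000, 2000) restores 100, matching the example in the text, while fixed_interval_refill(100, 1000, 100) restores nothing until the full second has elapsed.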
Enable or disable throttling:
When throttling is disabled, the configured throttles are not enforced, even if the quota feature is enabled in the cluster.
RSGroup
RSGroup assigns RS to different groups and then assigns a namespace or table to an RSGroup, which achieves isolation. Each RSGroup can be thought of as a small cluster of its own.
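To use RSGroup, add the following configuration: set hbase.coprocessor.master.classes to org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint and hbase.master.loadbalancer.class to org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer.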
When RSGroup is enabled, all RS initially belong to the default group.
After creating a new group, first move RS into it; only then can a namespace or table be moved to the group.
Add a new RSGroup:
First move the RS to the group, then move the namespace to the group:
RSGroup is mainly implemented in RSGroupAdminEndpoint, an Endpoint that implements MasterObserver. In the hooks for master operations, it moves the regions of a table to the corresponding RSGroup.
RSGroup information is stored in the hbase:rsgroup table and also in zk. If the rsgroup table is not yet online when the cluster starts, the RSGroup information is read from zk.
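The ordering rule for RSGroup (servers first, then namespaces or tables) can be illustrated with a toy model. The RSGroups class, its method names, and RSGroupError are hypothetical and only mimic the behaviour described in this section; they are not the HBase admin API.

```python
class RSGroupError(Exception):
    """Raised when a table is moved to a group that has no servers."""


class RSGroups:
    def __init__(self, servers):
        # When RSGroup is enabled, every RS starts in the default group.
        self.groups = {"default": set(servers)}
        self.table_group = {}

    def add_group(self, name):
        self.groups.setdefault(name, set())

    def move_servers(self, servers, target):
        servers = set(servers)
        for members in self.groups.values():
            members.difference_update(servers)
        self.groups[target].update(servers)

    def move_table(self, table, target):
        # A group without servers could not host the table's regions.
        if not self.groups[target]:
            raise RSGroupError(f"group {target!r} has no servers")
        self.table_group[table] = target

    def candidate_servers(self, table):
        """Servers allowed to host the table's regions."""
        return self.groups[self.table_group.get(table, "default")]
```

In this model, moving a table into a freshly created group fails until at least one server has been moved there, after which the table's regions are confined to that group's servers.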