How to realize the principle of On Yarn by HetuEngine 04/18 Update SLTechnology News&Howtos


2025-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how HetuEngine implements On Yarn deployment and the principles behind it.

What is On Yarn?

As the name implies, On Yarn means that processes run on Yarn, which manages and schedules their resources.

TrinoDB/PrestoDB and openLooKeng are all deployed by running the coordinator and worker processes directly on the host, where they share resources with other applications. This approach provides no resource isolation and makes scaling difficult.

MRS HetuEngine uses the capabilities of Yarn Service to run the coordinator and worker processes in Yarn containers as a Yarn application. Because the MRS cluster is divided into tenants, a HetuEngine computing instance can be launched in a specific tenant queue, which provides resource isolation.

HetuEngine architecture

The following figure is a topology diagram of HetuEngine. HetuEngine can connect to various data sources (such as Hive, GaussDB, HBase, and Elasticsearch) and provides CLI and JDBC interfaces to users. Within a single MRS cluster, HetuEngine can start multiple computing instances in different tenant queues, with one computing instance per tenant queue. The HSBroker instance of HetuEngine interacts with Yarn Service and binds each tenant queue to its computing instance. HSConsole provides an O&M management page for the computing instances, supporting operations such as starting, stopping, and deleting instances, configuring their resources, and scaling them out.
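The rule "one computing instance per tenant queue" can be modeled as a small registry. The Java sketch below is hypothetical: `QueueBindingRegistry` and its methods are illustrative names, not HetuEngine classes, and the source does not describe how HSBroker actually stores these bindings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of HSBroker-style bookkeeping: at most one computing
// instance may be bound to each tenant queue. Not HetuEngine's actual code.
final class QueueBindingRegistry {
    // tenant queue name -> computing instance name
    private final Map<String, String> bindings = new HashMap<>();

    // Binds the instance to the queue; returns false if the queue is already taken.
    boolean bind(String queue, String instance) {
        return bindings.putIfAbsent(queue, instance) == null;
    }

    // Releases the queue, e.g. after its computing instance is deleted.
    void unbind(String queue) {
        bindings.remove(queue);
    }

    // Looks up which instance, if any, runs on the queue.
    Optional<String> instanceOf(String queue) {
        return Optional.ofNullable(bindings.get(queue));
    }
}
```

Binding a second instance to an occupied queue returns `false`, so the caller can reject the request before any Yarn application is submitted.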

HetuEngine On Yarn principle

As mentioned earlier, On Yarn means running processes in Yarn containers. So how does HetuEngine run its coordinator and workers in Yarn?

Yarn Service provides a set of APIs and a generic ApplicationMaster (AM). Users call the APIs to submit applications to Yarn, and Yarn handles containerization as well as resource and lifecycle management of the containers. For details, see the open source community documentation: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html

HetuEngine's On Yarn implementation relies on these Yarn Service capabilities: HSBroker calls the Yarn Service API to launch an application that runs HetuEngine's own processes, the coordinator and the workers, in containers. The key points are as follows:

Yarn Service API

The API for creating a Yarn service is /app/v1/services, and the request body is a JSON spec with the following structure:

POST /app/v1/services
{
  "name": "hello-world",
  "version": "1.0.0",
  "description": "hello world example",
  "components": [
    {
      "name": "hello",
      "number_of_containers": 1,
      "artifact": {
        "id": "nginx:latest",
        "type": "DOCKER"
      },
      "launch_command": "./start_nginx.sh",
      "resource": {
        "cpus": 1,
        "memory": "256",
        "additional": {
          "yarn.io/gpu": {
            "value": 4,
            "unit": ""
          }
        }
      }
    }
  ]
}
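Submitting such a spec is a plain HTTP POST to the ResourceManager. The sketch below uses the JDK's `java.net.http` client; `rm-host:8088` stands for the ResourceManager web address and is an assumption, and authentication (for example Kerberos/SPNEGO on a secured cluster) is deliberately left out. It is not HSBroker's actual code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: POST a service spec to the Yarn Service REST API.
// Authentication is omitted; a secured cluster would need SPNEGO.
final class YarnServiceSubmit {
    // Builds the request without sending it, so it can be inspected or logged.
    static HttpRequest createRequest(String rmAddress, String specJson) {
        return HttpRequest.newBuilder(URI.create("http://" + rmAddress + "/app/v1/services"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(specJson))
                .build();
    }

    // Sends the request and returns the HTTP status code reported by Yarn.
    static int submit(String rmAddress, String specJson) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(createRequest(rmAddress, specJson), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Separating `createRequest` from `submit` keeps the request construction testable without a running cluster.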

name: the service name, displayed in the Yarn ResourceManager web UI.

version: the version number.

description: a description of the service.

components: a service can contain multiple components, each running a different task.

components.name: the component name.

number_of_containers: the number of containers in the component.

artifact: the resource file that the process depends on, described by id and type. type supports DOCKER and TARBALL.

launch_command: the command that starts the process.

resource: the resources (CPU, memory, and optionally additional resource types such as GPUs) required by each container of the component.
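To see how these fields fit together for HetuEngine, the sketch below assembles a spec with one component per role. It is hypothetical: the component names, artifact path, and launch commands in the usage are made-up values, and strings are formatted without JSON escaping (values are assumed to contain no quotes), where real code would use a JSON library.

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch: build a Yarn Service spec from per-role settings.
// Field names follow the spec above; the values passed in are illustrative.
final class ServiceSpecBuilder {
    static final class Component {
        final String name;
        final int containers;
        final int cpus;
        final int memoryMb;
        final String artifactId;
        final String launchCommand;

        Component(String name, int containers, int cpus, int memoryMb,
                  String artifactId, String launchCommand) {
            this.name = name;
            this.containers = containers;
            this.cpus = cpus;
            this.memoryMb = memoryMb;
            this.artifactId = artifactId;
            this.launchCommand = launchCommand;
        }

        // Renders one entry of the "components" array (no escaping performed).
        String toJson() {
            return String.format(
                    "{\"name\": \"%s\", \"number_of_containers\": %d, "
                            + "\"artifact\": {\"id\": \"%s\", \"type\": \"TARBALL\"}, "
                            + "\"launch_command\": \"%s\", "
                            + "\"resource\": {\"cpus\": %d, \"memory\": \"%d\"}}",
                    name, containers, artifactId, launchCommand, cpus, memoryMb);
        }
    }

    // Renders the top-level service object with all components.
    static String serviceJson(String name, String version, List<Component> components) {
        StringJoiner joined = new StringJoiner(", ", "[", "]");
        for (Component c : components) {
            joined.add(c.toJson());
        }
        return String.format("{\"name\": \"%s\", \"version\": \"%s\", \"components\": %s}",
                name, version, joined);
    }
}
```

For example, a "coordinator" component with 1 container and a "worker" component with 3 containers would produce one service whose workers can later be flexed independently.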

HetuEngine's HSBroker constructs this JSON from user input and then calls the Yarn Service API to run the instance On Yarn. Yarn Service also provides APIs such as stop and delete, which HSBroker calls to implement O&M operations such as stopping and deleting HetuEngine computing instances.
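In the Yarn Service REST API, stopping a service is a PUT of `{"state": "STOPPED"}` to the service URL, and destroying it is a DELETE on the same URL. A minimal sketch of those two requests follows; the ResourceManager address and service name in the usage are hypothetical, and how HSBroker wraps these calls is not described in the source.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: the REST requests behind "stop" and "delete" of a Yarn service.
final class ServiceLifecycle {
    private static URI serviceUri(String rmAddress, String serviceName) {
        return URI.create("http://" + rmAddress + "/app/v1/services/" + serviceName);
    }

    // Stop: containers are released, but the service definition is kept.
    static HttpRequest stop(String rmAddress, String serviceName) {
        return HttpRequest.newBuilder(serviceUri(rmAddress, serviceName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{\"state\": \"STOPPED\"}"))
                .build();
    }

    // Delete: destroys the service and its definition.
    static HttpRequest delete(String rmAddress, String serviceName) {
        return HttpRequest.newBuilder(serviceUri(rmAddress, serviceName))
                .DELETE()
                .build();
    }
}
```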

Dependent files

Yarn Service can start processes from resource files stored on HDFS: its API accepts artifacts in the form of tar packages or Docker images, and Yarn Service localizes the files from HDFS by itself. HetuEngine therefore only needs to deploy its dependent JAR packages and resource files to a specified HDFS location in advance and reference them when calling the Yarn Service API.
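Referencing a pre-deployed package then comes down to a TARBALL artifact whose id is the HDFS path. The sketch below is hypothetical: the path in the usage is an example, not HetuEngine's real HDFS layout, and the scheme check is only a guard added for illustration.

```java
import java.net.URI;

// Sketch: describe a package already uploaded to HDFS as a TARBALL artifact,
// which Yarn Service localizes into each container before launch.
final class HdfsArtifact {
    static String tarballArtifactJson(String hdfsPath) {
        URI uri = URI.create(hdfsPath);
        // Illustrative guard: reject paths that do not point at HDFS.
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected an hdfs:// path: " + hdfsPath);
        }
        return String.format("{\"id\": \"%s\", \"type\": \"TARBALL\"}", hdfsPath);
    }
}
```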

Tenant binding

HetuEngine supports binding a computing instance to a Yarn tenant queue, and each queue runs one set of coordinator and workers. Given the Yarn Service capabilities described above, this only requires specifying the queue when constructing the JSON. Besides the queue, you can also set a placement policy for the containers; it is not covered in detail here, so refer to the Yarn documentation.
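The Yarn Service spec has a service-level `queue` field, so binding the whole instance (coordinator and workers) to a tenant queue can be sketched as below. The queue and service names in the usage are hypothetical, and the blank-name check is an illustrative guard rather than documented HetuEngine behavior.

```java
// Sketch: submit the whole service to one tenant queue via the
// service-level "queue" field of the Yarn Service spec.
final class QueueBinding {
    static String withQueue(String serviceName, String queue, String componentsJson) {
        if (queue == null || queue.isBlank()) {
            throw new IllegalArgumentException("a tenant queue must be specified");
        }
        return String.format("{\"name\": \"%s\", \"queue\": \"%s\", \"components\": %s}",
                serviceName, queue, componentsJson);
    }
}
```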

Resource management

HetuEngine lets users customize the number of coordinators and workers as well as their CPU and memory sizes. As shown in the following figure, on the HetuEngine HSConsole page users can set the CPU, memory, and number of nodes of a computing instance. Internally, HSBroker receives this input and writes the resources each container needs into the resource section of the JSON.
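A minimal sketch of that translation step, assuming the console collects vCores, memory in GB, and a node count (the input units are an assumption). The Yarn Service spec expects memory as a string in MB, so the sketch converts GB to MB.

```java
// Sketch: map console input to the "number_of_containers" and "resource"
// fields of one component. Input units (GB) are an assumption.
final class ResourceSection {
    static String componentResources(int nodes, int cpus, int memoryGb) {
        if (nodes < 1 || cpus < 1 || memoryGb < 1) {
            throw new IllegalArgumentException("nodes, cpus and memory must be positive");
        }
        long memoryMb = memoryGb * 1024L; // Yarn Service memory is given in MB
        return String.format(
                "\"number_of_containers\": %d, \"resource\": {\"cpus\": %d, \"memory\": \"%d\"}",
                nodes, cpus, memoryMb);
    }
}
```

For example, 3 worker nodes with 8 vCores and 16 GB each become 3 containers of 8 CPUs and 16384 MB.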

HetuEngine currently supports scaling the number of workers horizontally for flexible resource scaling. Even while a computing instance is running, the number of workers can be adjusted manually without restarting the instance. This relies on the flex capability of the Yarn Service API, which increases or decreases the number of containers of a running application.
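In the Yarn Service REST API, flexing a component is a PUT to the component URL carrying the new `number_of_containers`. The sketch below builds such a request for a worker component; the service name, component name "worker", and address are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: flex a running component to a new container count.
final class WorkerFlex {
    static HttpRequest flex(String rmAddress, String serviceName, String component, int containers) {
        if (containers < 0) {
            throw new IllegalArgumentException("containers must be >= 0");
        }
        String body = String.format(
                "{\"name\": \"%s\", \"number_of_containers\": %d}", component, containers);
        URI uri = URI.create("http://" + rmAddress + "/app/v1/services/"
                + serviceName + "/components/" + component);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Because only the worker component is addressed, the coordinator keeps running while workers are added or removed.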

Client usage

After a HetuEngine computing instance is created, users can access it through hetu-cli or JDBC programs. Before submitting tasks to a given queue, the user must be granted permissions on the corresponding tenant queue.

Hetu CLI example:

hetu-cli --catalog hive --tenant tenantName --schema schemaName

--tenant: (optional) the tenant name, which specifies the tenant resource queue where HetuEngine runs; if it is not specified, the tenant's default queue is used. When this parameter is used, the user authenticated with kinit must have the permissions of the tenant's corresponding role.

Hetu JDBC example:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Properties properties = new Properties();
// ... (url and other connection settings omitted)
// Run on the On Yarn computing instance bound to the "default" tenant queue
properties.setProperty("tenant", "default");
properties.setProperty("deploymentMode", "on_yarn");
Connection connection = DriverManager.getConnection(url, properties);
