
How to deploy DataSphere Studio in an Ambari 2.7.4 cluster


Many newcomers are not sure how to deploy DataSphere Studio in an Ambari 2.7.4 cluster. To help with this, the article below explains the process in detail; hopefully you will get something out of it.

Deploy DataSphere Studio in the Ambari 2.7.4 cluster

WeDataSphere

1. Overview

DataSphere Studio (DSS for short) is a one-stop data application development and management portal developed by WeBank.

Built on a pluggable integration framework and the Linkis computation middleware, it connects easily to a variety of upper-layer data application systems, making data development simple and easy to use.

Under a unified UI, DataSphere Studio covers the whole data application development process, including data exchange, desensitization and cleaning, analysis and mining, quality checking, visualization, scheduled execution, and data output to applications, with a graphical drag-and-drop workflow development experience.

Thanks to its pluggable integration framework, DSS lets users quickly replace the integrated functional components, or add new ones, according to their needs.

Building on the connection, reuse, and simplification capabilities of the Linkis computation middleware, DSS natively gains financial-grade execution and scheduling capabilities: high concurrency, high availability, multi-tenant isolation, and resource control.

However, the official installation documentation is rather brief, and some configuration details are omitted for the sake of simplicity, so the installation gets stuck for some users in different environments. For this reason, I have written up my own installation process for reference.

Contents

1 Overview

2 configure stand-alone client machine based on Ambari cluster

2.1 big data cluster environment

2.2 DataSphere Studio dedicated client machine configuration

2.2.1 basic configuration description

2.2.2 hosts and hostname

2.2.3 configure password-free login from the host to client

2.3 add DataSphere Studio dedicated Client to Ambari cluster

2.3.1 specify the host

2.3.2 confirm the host to be registered

2.3.3 specify the components installed on the client

2.3.4 specify configuration group

2.3.5 Review configuration

2.3.6 install, start, test

2.3.7 complete

2.4 location of components and configuration files installed on the client

3 install DataSphere Studio

3.1 installation package

3.2 dependency installation

3.3 Install nginx with yum

3.4 revoke the cp/mv/rm alias

3.5 modify configuration

3.5.1 Configuration file example

3.6 modify database configuration

3.6.1 create a database

3.6.2 configuration

3.7 execute the installation script

3.7.1 installation steps

3.7.2 verify that the installation is successful

3.8 access address

3.9 FAQ

4 start the service

4.1 start the service

4.2 check whether the startup is successful

5 Pitfalls

5.1 failed to submit upload resource task

5.2 some services show that they are already running

5.3 failed to start linkis

5.4 failed to upload resources

5.4.1 owner of tmp/linkis in hdfs is root:hdfs

6 Appendix

2. Configure a stand-alone client machine based on the Ambari cluster

2.1 big data cluster environment

A four-node big data cluster has been deployed with Ambari 2.7.4. The components on each node were configured automatically by Ambari, not manually.

MySQL 5.7 Community Edition is installed on the dn1 node, and the metadata of the big data components is stored in that MySQL database.

The four nodes are as follows:

2.2 DSS dedicated client machine configuration

2.2.1 basic configuration description

CentOS 7 minimal installation

Comes with Python 2.7

Uninstall the bundled OpenJDK and replace it with Oracle JDK 1.8 (see the command sketch after this list)

# yum -y install wget

16 GB of memory, 4 cores
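A minimal sketch of this preparation, assuming the Oracle JDK 1.8 RPM has already been downloaded to the machine (package and file names are illustrative):

# find and remove any bundled OpenJDK packages
rpm -qa | grep -i openjdk
yum -y remove java-1.8.0-openjdk java-1.8.0-openjdk-headless
# install wget, then the Oracle JDK RPM downloaded separately
yum -y install wget
rpm -ivh jdk-8u*-linux-x64.rpm
java -version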

2.2.2 hosts and hostname

# vi /etc/hosts

Add:

<IP address> <FQDN domain name of the node>

For example:

192.168.94.132 datastudio.sinobd

# vi /etc/hostname

Add the FQDN name of the node.

For example

datastudio.sinobd

Reboot for the change to take effect.
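A minimal sketch of these two edits, reusing the example address from this document; substitute your own IP and FQDN:

# append the client's IP address and FQDN to /etc/hosts
echo "192.168.94.132 datastudio.sinobd" >> /etc/hosts
# set the hostname persistently (equivalent to editing /etc/hostname)
hostnamectl set-hostname datastudio.sinobd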

2.2.3 configure password-free login from the host to client

# ssh-copy-id -i ~/.ssh/id_rsa.pub <client IP address or hostname>
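For example, assuming the client is 192.168.94.132 and the login user is root (generate the key pair first if the master does not have one yet):

# generate a key pair on the Ambari master if ~/.ssh/id_rsa does not exist yet
ssh-keygen -t rsa
# copy the public key to the client, then verify password-free login
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.94.132
ssh root@192.168.94.132 hostname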

2.3 add DSS dedicated Client to Ambari cluster

In the Ambari console, click the Hosts menu.

2.3.1 specify the host

Enter the hostname

Upload the id_rsa private key file from the master host.

Copy-pasting the output of cat ~/.ssh/id_rsa sometimes introduces errors, so uploading the file is safer.

If the private key is configured correctly, the Ambari agent does not have to be installed on the client manually.

If you do not use the private key, you can instead install and start the ambari-agent on the node manually; the wizard shows the following prompt box:

2.3.2 confirm the host to be registered

2.3.3 specify the components installed on the client

By default, only the client components are installed.

2.3.4 specify configuration group

2.3.5 Review configuration

2.3.6 install, start, test

2.3.7 complete

2.4 location of components and configuration files installed on the client

The components that Ambari installs on the client all sit under the /usr/hdp/current directory.

The configuration files are in the corresponding component directories under /etc/, but these are actually soft links to the configuration files of the corresponding components under /usr/hdp/current.

Knowing where these files live helps later when setting HADOOP_HOME, HIVE_HOME, and SPARK_HOME and when locating configuration files.
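A quick way to check these locations, and a sketch of the environment variables the DSS/Linkis configuration will need later; the exact directory names depend on the HDP version Ambari installed:

# list the client components Ambari installed
ls /usr/hdp/current/
# the /etc entries are soft links back into /usr/hdp
ls -l /etc/hadoop/conf /etc/hive/conf /etc/spark2/conf
# example environment variables for the later DSS/Linkis configuration
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/usr/hdp/current/hive-client
export HIVE_CONF_DIR=/etc/hive/conf
export SPARK_HOME=/usr/hdp/current/spark2-client
export SPARK_CONF_DIR=/etc/spark2/conf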

3 install DSS

3.1 installation package

Use the DSS & Linkis all-in-one, one-click deployment installation package (1.3 GB) (reply "whole family bucket installation package" to the official WeChat account to get it).

3.2 dependency installation

3.3 Install nginx with yum
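A minimal sketch, assuming the machine can reach the EPEL repository (nginx is not in the base CentOS 7 repositories):

# nginx lives in EPEL on CentOS 7
yum -y install epel-release
yum -y install nginx
systemctl enable nginx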

3.4 revoke the cp/mv/rm alias

CentOS aliases cp/mv/rm to their interactive forms by default in .bashrc, which makes many cp commands during installation ask whether to overwrite files. Run alias, and if cp, mv, or rm are aliased, remove the aliases to avoid a flood of prompts. The method is:

# vi ~/.bashrc
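One way to do this, sketched below; the sed expression assumes the default CentOS aliases of the form alias cp='cp -i' in ~/.bashrc:

# comment out the interactive cp/mv/rm aliases in ~/.bashrc
sed -i 's/^alias \(cp\|mv\|rm\)=/#&/' ~/.bashrc
# drop them from the current shell session as well
unalias cp mv rm
# confirm no cp/mv/rm aliases remain
alias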

3.5 modify configuration

Copy config.sh.standard.template in the conf directory to config.sh:

cp conf/config.sh.standard.template conf/config.sh

You can modify the relevant configuration parameters as needed:

vi conf/config.sh

The parameters are described as follows:

3.5.1 Configuration file example
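As an illustration, an excerpt of the kind of parameters config.sh holds. The variable names follow the config.sh template of the DSS & Linkis one-click installer as best I recall and should be treated as assumptions to verify against your own template; the values reuse the access addresses from section 3.8 and the hadoop deploy user mentioned in section 5.4:

# deployment user; must match the user who later starts the services (see pitfall 5.1)
deployUser=hadoop
# Linkis gateway address, matching section 3.8
GATEWAY_INSTALL_IP=127.0.0.1
GATEWAY_PORT=9001
# DSS web front end served by nginx, matching section 3.8
DSS_NGINX_IP=127.0.0.1
DSS_WEB_PORT=8088
# workspace and result-set paths (see pitfall 5.4 on /tmp/linkis ownership in HDFS)
WORKSPACE_USER_ROOT_PATH=file:///tmp/linkis/
HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis
RESULT_SET_ROOT_PATH=hdfs:///tmp/linkis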

3.6 modify database configuration

3.6.1 create a database

On the host where MySQL is installed:

# mysql -uroot -pMysql12#
create database linkis;
GRANT ALL PRIVILEGES ON linkis.* TO linkis@'%' IDENTIFIED BY 'sinosoft1234' WITH GRANT OPTION;

3.6.2 configuration
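As an illustration: the one-click installer keeps the database settings in a separate file under conf (db.sh, as best I recall; both the file name and the variable names are assumptions to verify against your package). The values below reuse the database created in 3.6.1 on the dn1 node:

# conf/db.sh (file and variable names may differ between installer versions)
MYSQL_HOST=dn1
MYSQL_PORT=3306
MYSQL_DB=linkis
MYSQL_USER=linkis
MYSQL_PASSWORD=sinosoft1234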

3.7 execute the installation script

sh bin/install.sh

Note: the installation script uses relative paths, so do not cd into the bin directory to run it; execute it from the installation root so that it installs correctly.

3.7.1 installation steps

The install.sh script asks for the installation mode. There are two modes, a lite version and a standard version; choose the one that matches the environment you have prepared. This document uses option 2, the standard version.

The install.sh script also asks whether to initialize the database and import metadata; both Linkis and DSS ask this.

For the first installation, you must choose yes.

3.7.2 verify that the installation is successful

Check the log information printed by the console to see if the installation is successful.

If there is an error message, you can check the specific reason for the error.

3.8 access address

DSS web access Port: 8088

Address of Linkis gateway: http://127.0.0.1:9001

DSS web static file address: / dss_linkis/web/dist

DSS web installation path: / dss_linkis/web

DSS nginx ip:127.0.0.1
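A quick way to confirm the two endpoints above are reachable from the client (the gateway may answer with an error page, which is enough to show it is listening):

# DSS web front end served by nginx
curl -I http://127.0.0.1:8088/
# Linkis gateway
curl -I http://127.0.0.1:9001/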

3.9 FAQ

You can also get answers to our installation FAQs (official account reply: installation FAQs).

4 start the service

4.1 start the service

Start all services by executing the following command in the installation directory:
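# sh bin/start-all.sh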

If startup produces an error message, you can check the specific cause. After startup, the microservices run communication checks against one another; if anything is abnormal, this helps locate the relevant log and the cause of the exception.

You can get answers to our startup FAQs (official account reply: startup FAQs).

Tip:

You can extend the time of sleep in start-all.sh under the bin directory of linkis and dss, for example, to 20 seconds.

You can also run sh bin/start-all.sh to see the service startup directly on the console.

4.2 check whether the startup is successful

The first service to start is Eureka. Once it is up, you can check the startup status of the Linkis & DSS backend microservices on the Eureka page (http://<IP address>:20303/). As shown in the figure below, if the following microservices appear on the Eureka home page, all services have started successfully and can serve requests normally:
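For a quick check from the shell (assuming the standard Spring Cloud Eureka REST endpoint is exposed alongside the dashboard):

# list the registered micro-services via the Eureka REST API
curl -s http://127.0.0.1:20303/eureka/apps | grep -o '<name>[^<]*</name>'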

5 Pitfalls

5.1 failed to submit upload resource task

Failure: failed to submit upload resource task

Operation failed! Reason: HttpClientResultException: errCode: 10905, desc: URL http://127.0.0.1:9001/api/rest_j/v1/bml/upload request failed! ResponseBody is {"method": null, "status": 1, "message": "error code: 50073, error message: failed to submit upload resource task: errCode: 50001, desc: HDFS configuration was not read, please configure hadoop.config.dir or add env:HADOOP_CONF_DIR, ip: datastudio.sinobd, port: 9113, serviceKind: bml-server.", "data": {"errorMsg": {"serviceKind": "bml-server", "level": 2, "port": 9113, "errCode": 50073, "ip": "datastudio.sinobd", "desc": "failed to submit upload resource task: errCode: 50001, desc: HDFS configuration was not read, please configure hadoop.config.dir or add env:HADOOP_CONF_DIR, ip: datastudio.sinobd, port: 9113, serviceKind: bml-server"}}, ip: datastudio.sinobd, port: 9004, serviceKind: dss-server

Solution: the installation user must be consistent with the startup user

5.2 some services show that they are already running

Failure: when restarting, some services are reported as already running, for example:

Solution: in the installation directory, run

# sh bin/stop-all.sh

to stop all services cleanly, and then start them again.

5.3 failed to start linkis

Fault:

Solution: extend the sleep time in linkis/bin/start-all.sh and dss/bin/start-all.sh; for example, I extended it to 20 seconds.
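One way to do it, assuming the scripts use plain sleep N lines (check with grep first; the sed call keeps .bak backups of the originals):

# show the current sleep calls in both start scripts
grep -n 'sleep' linkis/bin/start-all.sh dss/bin/start-all.sh
# raise every sleep to 20 seconds
sed -i.bak 's/^\(\s*\)sleep [0-9]\+/\1sleep 20/' linkis/bin/start-all.sh dss/bin/start-all.sh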

5.4 failed to upload resources

Fault:

Reason:

Although the installation script specifies the hadoop user, the installation script itself was run as root, so the /tmp/linkis folder created in HDFS is owned by root. Changing the owner with an ordinary hdfs command does not work; you need to use the following command:

5.4.1 owner of tmp/linkis in hdfs is root:hdfs

sudo -u hdfs hadoop fs -chown -R hadoop:hadoop /tmp/linkis
