This article explains how to connect kettle (Pentaho Data Integration) to Hive 3.1.0 on the HDP3 stack in order to access data. It should be a useful reference; interested readers can follow along, and I hope you learn something from it.
Troubleshooting the kettle connection errors
Since I had never used kettle before, I started by downloading the latest version at the time, kettle 7.0, and pieced together the Hive configuration and jar packages from various Baidu searches, but I could never connect to Hive. It kept reporting all kinds of errors (not listed one by one here), until it finally reported: No suitable driver found for jdbc:hive2.
log4j:ERROR No output stream or file set for the appender named [pdi-execution-appender].
Sep 29, 2020 4:16:05 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be /i18n
2020/10/18 16:16:05 - dept.0 - Error occurred while trying to connect to the database
2020/10/18 16:16:05 - dept.0 - Error connecting to database: (using class org.apache.hive.jdbc.HiveDriver)
2020/10/18 16:16:05 - dept.0 - No suitable driver found for jdbc:hive2://worker1.hadoop.ljs:10000/lujisen
My Hadoop cluster here is a relatively new HDP 3.1.4, with component versions Hadoop 3.1.1.3.1 and Hive 3.1.0. Neither repeated Baidu searches nor the official site solved the problem. It took a long time to realize the real cause was a version mismatch: when kettle connects to a Hadoop cluster, the target is either CDH or HDP, and under the directory data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations you can see that each kettle release can only connect to the cluster versions listed there. The file pentaho-hadoop-shims*.jar is what matches a kettle release to a cluster version, and the two cannot be mixed arbitrarily, because the shims versions published on the official site are limited.
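Before changing anything in kettle, it can help to confirm that HiveServer2 itself accepts the JDBC URL from the error above. A minimal check, assuming beeline is available on a cluster node (host and database are taken from the log; this check is my own addition, not part of the original write-up):
beeline -u "jdbc:hive2://worker1.hadoop.ljs:10000/lujisen" -n hive
# if beeline connects but kettle does not, the problem is on the kettle side (driver / shims version)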
Kettle download
Here I downloaded the latest stable version, 8.3, which, as far as I can tell, ships with a shims package corresponding to HDP 3.0.
Download address: https://fr.osdn.net/projects/sfnet_pentaho/releases#
Kettle configuration
1. After downloading and unzipping, there is just a single data-integration directory; now start configuring the Hive connection. Every Hadoop version supported by the kettle release you downloaded has a corresponding folder and shims package under its plugin subdirectory, as shown below:
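As a quick sanity check (my own addition, assuming the default kettle 8.3 layout; folder names other than hdp30 vary by release), you can list the shipped configurations and confirm the shims package exists:
cd data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations
ls                                         # expect folders such as cdh*, emr*, hdp26, hdp30, mapr*
ls hdp30/lib | grep pentaho-hadoop-shims   # the shims jar that ties this kettle release to HDP 3.0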
2. Find the folder for the matching version, hdp30, and replace its XML configuration files with the ones from your cluster:
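A sketch of the copy step, assuming the usual HDP client-config locations (/etc/hadoop/conf and /etc/hive/conf), run from the hadoop-configurations directory; the host name is the example one used elsewhere in this article:
scp root@master.hadoop.ljs:/etc/hadoop/conf/core-site.xml   hdp30/
scp root@master.hadoop.ljs:/etc/hadoop/conf/hdfs-site.xml   hdp30/
scp root@master.hadoop.ljs:/etc/hadoop/conf/yarn-site.xml   hdp30/
scp root@master.hadoop.ljs:/etc/hadoop/conf/mapred-site.xml hdp30/
scp root@master.hadoop.ljs:/etc/hive/conf/hive-site.xml     hdp30/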
3. In the plugin.properties file under data-integration\plugins\pentaho-big-data-plugin, modify the configuration item that specifies which cluster configuration to use. Here I want to enable hdp30, as shown below:
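For reference, a minimal sketch of the change, assuming the property name used by recent kettle releases (active.hadoop.configuration):
# data-integration/plugins/pentaho-big-data-plugin/plugin.properties
active.hadoop.configuration=hdp30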
4. Copy the jar packages needed to connect to Hive: from the lib directory of the cluster's Hive installation, copy all jars whose names start with hive. Some of them are not actually needed, but rather than pick through them I simply copied them all into hdp30/lib. If errors persist, you can also place a copy under the unzipped root directory data-integration/lib, as shown below:
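A sketch of the jar copy, assuming the usual HDP location of the Hive client libraries (/usr/hdp/current/hive-client/lib) and run from the hadoop-configurations directory; adjust the host and paths to your installation:
scp "root@master.hadoop.ljs:/usr/hdp/current/hive-client/lib/hive-*.jar" hdp30/lib/
# optional fallback if kettle still reports missing classes: also place a copy under data-integration/lib
cp hdp30/lib/hive-*.jar ../../../lib/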
5. After placing the dependent jars, kettle needs a restart. Before restarting, it is best to clean out the caches and data directories (if present) under data-integration/system/karaf:
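A sketch of the cleanup, assuming the kettle 8.3 directory layout and run from the data-integration directory (the data directory may not exist in every installation):
rm -rf system/karaf/caches
rm -rf system/karaf/data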
6. Restart kettle; the connection test now succeeds.
Steps 1-6 are enough for kettle to connect to Hive when Kerberos is not enabled. If Kerberos is enabled in the cluster, then after completing the six steps above you also need to do the following:
It should be noted here that the principal used to access the Hive service is hive/master.hadoop.ljs@HADOOP.COM, not the user's own principal; the user's own principal is only used on the client machine to obtain the authentication ticket.
7. Create the file kettle.login with the following contents and save it:
# On Windows, place it at C:/ProgramData/MIT/Kerberos5/kettle.login
# On Linux, place it at /etc/kettle.login
com.sun.security.jgss.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  keyTab="C:/ProgramData/MIT/Kerberos5/testuser.keytab"
  principal="testuser/master.hadoop.ljs@HADOOP.COM"
  doNotPrompt=true
  debug=true
  debugNative=true;
};
Note:
The keyTab and principal above should be replaced with the keytab file and principal of the user who logs in. Pay attention to this.
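Before touching the kettle startup scripts, it is worth checking on the client machine that this keytab and principal can actually obtain a ticket; a quick sanity check using the example values from the file above:
kinit -kt C:/ProgramData/MIT/Kerberos5/testuser.keytab testuser/master.hadoop.ljs@HADOOP.COM
klist   # should now show a valid TGT for testuser/master.hadoop.ljs@HADOOP.COM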
8. Modify Kettle startup script
# 8.1 Windows: edit data-integration\Spoon.bat
# Modify the OPT variable at around line 98 to the following; mainly add these four parameters
# (note that each parameter is enclosed in quotation marks and separated by spaces; if you want to
# see the startup output, add pause as the last line of the script):
# "-Djava.security.auth.login.config=C:/ProgramData/MIT/Kerberos5/kettle.login"
# "-Djava.security.krb5.realm=HADOOP.COM"          your own cluster realm name
# "-Djava.security.krb5.kdc=192.168.33.9"          your own KDC server address
# "-Djavax.security.auth.useSubjectCredsOnly=false"
set OPT=%OPT% %PENTAHO_DI_JAVA_OPTIONS% "-Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2" "-Djava.library.path=%LIBSPATH%" "-Djava.security.auth.login.config=C:/ProgramData/MIT/Kerberos5/kettle.login" "-Djava.security.krb5.realm=HADOOP.COM" "-Djava.security.krb5.kdc=192.168.0.201" "-Djavax.security.auth.useSubjectCredsOnly=false" "-DKETTLE_HOME=%KETTLE_HOME%" "-DKETTLE_REPOSITORY=%KETTLE_REPOSITORY%" "-DKETTLE_USER=%KETTLE_USER%" "-DKETTLE_PASSWORD=%KETTLE_PASSWORD%" "-DKETTLE_PLUGIN_PACKAGES=%KETTLE_PLUGIN_PACKAGES%" "-DKETTLE_LOG_SIZE_LIMIT=%KETTLE_LOG_SIZE_LIMIT%" "-DKETTLE_JNDI_ROOT=%KETTLE_JNDI_ROOT%"
# 8.2 Linux: edit data-integration/spoon.sh at around line 205; similar to Windows, add the same four parameters:
OPT="$OPT $PENTAHO_DI_JAVA_OPTIONS -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -Djava.library.path=$LIBPATH -Djava.security.auth.login.config=/etc/kettle.login -Djava.security.krb5.realm=HADOOP.COM -Djava.security.krb5.kdc=192.168.0.101 -Djavax.security.auth.useSubjectCredsOnly=false -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES -DKETTLE_LOG_SIZE_LIMIT=$KETTLE_LOG_SIZE_LIMIT -DKETTLE_JNDI_ROOT=$KETTLE_JNDI_ROOT"
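If the connection still fails with ticket errors after these changes, one optional extra (not part of the original steps) is to append the JVM's Kerberos debug flag to the same OPT line so the GSS negotiation is printed to the console:
-Dsun.security.krb5.debug=true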
9. Connect to Hive
Connection name: lujisen
Connection type: Hadoop Hive 2
Host name: the HiveServer2 service address, for example 192.168.0.101
Database name: lujisen;principal=hive/master.hadoop.ljs@HADOOP.COM
Port number: 10000
User name: hive
Password: (left empty)
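For reference, these fields should correspond to a HiveServer2 JDBC URL roughly like the following (my own reading of how the fields combine; adjust host, database and principal to your cluster):
jdbc:hive2://192.168.0.101:10000/lujisen;principal=hive/master.hadoop.ljs@HADOOP.COM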
Note:
In the Database name field above, the principal appended after the database name must be the Hive service (administrator) principal, not the principal of the logged-in user. Pay close attention to this.
10. The connection test is now successful.
Thank you for reading. I hope this walkthrough of connecting kettle to Hive 3.1.0 on HDP3 is helpful to you.