Many newcomers do not know how to use the Hermes-MR index plug-in from the big data suite. This article summarizes the causes of the common problems and their solutions; I hope it helps you resolve them.
Hermes is a powerful multidimensional analysis tool; using it involves two steps: index creation and data distribution.
Hermes has not yet been integrated into the TBDS suite (version 3.0), and external customers need to run Hermes components on their own deployed clusters, so Hermes must be adapted to external Hadoop clusters.
After Hermes was integrated with a customer's external cluster, the stand-alone Hermes index-creation plug-in was used in a stress test (2 TB of data, 445,604,010 rows, a full index over 477 fields). Because of the large data volume, the plug-in crashed with Out of Memory and similar anomalies, and the resulting index covered far less than the actual data volume. For these reasons, Siping provided an MR-based index-creation plug-in to improve indexing efficiency.
The following documents the process of adapting the MR index plug-in, which is built against Hadoop 2.2, to the external cluster.
I. Cluster component versions
Hermes version: hermes-2.1.0-1.x86_64
Hadoop cluster version: Hadoop 2.7.1.2.3.0.0-2557
hadoop-common used by the Hermes-MR index plug-in: hadoop-common-2.2.0.jar
II. Hermes-MR plug-in usage
1. Configuration to modify (the plug-in home directory is denoted $HERMES_INDEX_MR_HOME below):
$HERMES_INDEX_MR_HOME/conf/hermes.properties
Changes: set hermes.zkConnectionString to the cluster's ZooKeeper address; set hermes.hadoop.conf.dir to the cluster's Hadoop configuration directory; set hermes.hadoop.home to the home directory of the cluster's Hadoop installation.
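For illustration, a hermes.properties fragment using placeholder hosts and paths (the property names are the ones listed above; adjust the values to your cluster):
hermes.zkConnectionString=zk1:2181,zk2:2181,zk3:2181
hermes.hadoop.conf.dir=/etc/hadoop/conf
hermes.hadoop.home=/usr/hdp/current/hadoop-client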
$HERMES_INDEX_MR_HOME/conf/hermes_index.properties
Changes: set hermes.hadoop.conf to the cluster's Hadoop configuration directory; set hermes.index.user.conf to the absolute path of the plug-in's user configuration file.
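Again with placeholder paths (the second entry points at the user_conf.xml described next):
hermes.hadoop.conf=/etc/hadoop/conf
hermes.index.user.conf=/usr/local/hermes-index-mr/conf/user_conf.xml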
$HERMES_INDEX_MR_HOME/conf/user_conf.xml
Changes: this is the user configuration file of the Hermes-MR index plug-in, and the default values are generally fine. Note that the plug-in supports specifying the field delimiter of the file to be indexed, via the configuration items higo.input.record.split and higo.input.record.ascii.split. higo.input.record.ascii.split has the higher priority: once it is specified, higo.input.record.split is ignored. higo.input.record.split takes the delimiter characters directly (such as |, \, ;), while higo.input.record.ascii.split takes the ASCII code of the delimiter.
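As an illustrative user_conf.xml fragment (124 is the ASCII code of '|', so the two properties below declare the same delimiter in two ways, and per the priority rule the first one wins):
<property>
  <name>higo.input.record.ascii.split</name>
  <value>124</value>
</property>
<property>
  <name>higo.input.record.split</name>
  <value>|</value>
</property>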
2. Run the plug-in
Execute the following command from the plug-in's home directory (labcluster is the nameservice of the HDFS NameNode HA pair):
sh bin/submit_index_job.sh \
  clk_tag_info_test_500 \
  20160722 \
  hdfs://labcluster/apps/hive/market_mid/clk_tag_info_test/ \
  hdfs://labcluster/user/hermes/demo_dir/clk_tag_info_test_500/ \
  hdfs://labcluster/user/hermes/demo_dir/schema/clk_tag_info_test_500_hermes.schema \
  key_id \
  3
Parameter description:
sh bin/submit_index_job.sh takes, in order: table name; data time (time partition); source data path on HDFS (single file or directory); index output directory on HDFS; schema file path on HDFS (must be created and uploaded manually); primary key; number of index shards.
3. Log observation
After running, the index-creation plug-in writes hermes.log and index.log under $HERMES_INDEX_MR_HOME/logs. The former records Hermes-related events; the latter records the index-creation process, including MR task information. Normally index.log records the successful submission of the MR task and the corresponding jobid, whose status you can watch on Hadoop's RM management page; index.log also records Map/Reduce progress and, on completion, prints "Job ${job.id} completed successfully" along with MR task details. Any error logged there needs detailed analysis. The problems encountered during this cluster adaptation are summarized below; all have been verified on a TBDS 3.0 (Hadoop 2.7) cluster.
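To follow a run from the command line, two hedged commands (use whatever jobid index.log reports; the application id below is only an example taken from the logs later in this article):
tail -f $HERMES_INDEX_MR_HOME/logs/index.log
yarn application -status application_1469110119300_0022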
4. Basic process of adaptation
As mentioned earlier, the Hermes-MR index plug-in uses version 2.2 of hadoop-common.jar, while the cluster itself runs Hadoop 2.7. Executing the plug-in directly to create an index produces the following "strange" exception.
Diagnostics: Exception from container-launch.
Container id: container_e07_1469110119300_0022_02_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Searching the exception logs turned up nothing, so a Hadoop expert was consulted, who recommended replacing the hadoop-*.jar packages used by the Hermes-MR index plug-in with the cluster's versions. This caused a series of problems at first, but in the end the plug-in worked well in the Hadoop 2.7 environment.
The adaptation approach boils down to the following (step 2 is sketched as a script below):
1. Replace every hadoop-*.jar used by the Hermes-MR index plug-in with the version used in the cluster.
2. Errors in the plug-in's logs are generally caused by new jar dependencies introduced in the new version (2.7). For each missing-class error, find the jar containing the missing class and add it to the $HERMES_INDEX_MR_HOME/lib directory; repeat until no missing-class errors remain.
3. While doing so, make sure the version of each jar added for a missing class matches the version actually used by the cluster (a problem discovered while repeating step 2).
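A minimal sketch of step 2, assuming an HDP-style layout under /usr/hdp/current and a hypothetical missing class name; it searches the cluster's jars for the class reported in the error and then copies the match into the plug-in's lib directory:
# find which cluster jar provides the missing class (class name is illustrative)
for j in $(find /usr/hdp/current -name '*.jar' 2>/dev/null); do
  if unzip -l "$j" 2>/dev/null | grep -q 'org/apache/hadoop/SomeMissingClass.class'; then
    echo "$j"
  fi
done
# then copy the jar that was found
cp /usr/hdp/current/hadoop-client/found-dependency.jar $HERMES_INDEX_MR_HOME/lib/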
5. Summary of problems
The problems encountered while adapting the plug-in to the cluster are summarized below.
Configuration item mapreduce.framework.name exception
2016-07-21 15:39:15,600 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Solution: check the cluster's Hadoop configuration (that is, the configuration directory specified in hermes.properties; you can also copy the cluster's configuration and modify the copy independently). The mapreduce.framework.name item in mapred-site.xml was set to yarn-tez, but the plug-in currently supports only yarn; changing this item to yarn resolves the exception.
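The corresponding mapred-site.xml entry, applied to the plug-in's copy of the configuration rather than the live cluster configuration:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>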
The plug-in failed to submit tasks to the cluster
2016-07-21 20:14:40,494 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hermes (auth:SIMPLE) cause:java.io.IOException: Failed to run job: org.apache.hadoop.security.AccessControlException: User hermes cannot submit applications to queue root.default
Solution: the hermes user did not have permission to submit tasks to YARN. Adjust the YARN cluster's permissions to allow the hermes user; TBDS 3.0 provides a convenient access-control page for this.
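If you maintain the scheduler configuration by hand instead of through the TBDS page, the setting involved is presumably the Capacity Scheduler submit ACL (in capacity-scheduler.xml) for the root.default queue named in the error, roughly:
<property>
  <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
  <value>hermes</value>
</property>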
Variable substitution exception when submitting a task
Exception message:
/hadoop/data1/hadoop/yarn/local/usercache/hermes/appcache/application_1469110119300_0004/container_e07_1469110119300_0004_02_000001/launch_container.sh: line 9: $PWD:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*: bad substitution
/hadoop/data1/hadoop/yarn/local/usercache/hermes/appcache/application_1469110119300_0004/container_e07_1469110119300_0004_02_000001/launch_container.sh: line 67: $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/hadoop/data1/yarn/container-logs/application_1469110119300_0004/container_e07_1469110119300_0004_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhdp.version=${hdp.version} -Xmx5120m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/hadoop/data1/yarn/container-logs/application_1469110119300_0004/container_e07_1469110119300_0004_02_000001/stdout 2>/hadoop/data1/yarn/container-logs/application_1469110119300_0004/container_e07_1469110119300_0004_02_000001/stderr: bad substitution
Stack trace: ExitCodeException exitCode=1: (the same two bad substitution lines repeated)
Solution: bad substitution shows that some configured variables are not being substituted. The variables appearing in the exception are $PWD, $JAVA_HOME, ${hdp.version} and $HADOOP_CONF_DIR; find where they are used in the Hadoop configuration files and replace them one by one with actual values until the error no longer appears. In practice the cause was that the variable hdp.version had no value; either define it in the Hadoop configuration or replace each use of the variable with the actual value.
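One hedged way to define it: Hadoop's configuration files substitute ${...} from other configuration properties and system properties, so an hdp.version property in the plug-in's mapred-site.xml makes ${hdp.version} resolve. The value below is the cluster's version string from section I; verify yours, for example with ls /usr/hdp/:
<property>
  <name>hdp.version</name>
  <value>2.3.0.0-2557</value>
</property>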
A "strange" mistake
2016-07-22 15:25:40,657 INFO org.apache.hadoop.mapreduce.Job: Job job_1469110119300_0022 failed with state FAILED due to: Application application_1469110119300_0022 failed 2 times due to AM Container for appattempt_1469110119300_0022_000002 exited with exitCode: 255
For more detailed output, check application tracking page: http://bdlabnn2:8088/cluster/app/application_1469110119300_0022 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e07_1469110119300_0022_02_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
Solution: this error was the hardest to resolve; it was finally fixed by the plug-in/cluster version adaptation described in "Basic process of adaptation" above. The list of replaced or added jar packages is as follows:
jackson-core-2.2.3.jar
jersey-json-1.9.jar
jersey-client-1.9.jar
jersey-core-1.9.jar
jackson-xc-1.9.13.jar
jersey-guice-1.9.jar
jersey-server-1.9.jar
jackson-jaxrs-1.9.13.jar
commons-io-2.5.jar
htrace-core-3.1.0-incubating.jar
hermes-index-2.1.2.jar
hadoop-cdh4-hdfs-2.2.0.jar
hadoop-cdh4-core-2.2.0.jar
hadoop-yarn-common-2.7.2.jar
hadoop-yarn-client-2.7.2.jar
hadoop-yarn-api-2.7.2.jar
hadoop-mapreduce-client-jobclient-2.7.2.jar
hadoop-mapreduce-client-core-2.7.2.jar
hadoop-mapreduce-client-common-2.7.2.jar
hadoop-hdfs-2.7.2.jar
hadoop-common-2.7.2.jar
hadoop-auth-2.7.2.jar
Unable to connect to the YARN RM's task submission port
After a task was submitted in the TBDS 3.0 environment, the log kept reporting failed reconnection attempts to the RM server.
Solution: inspecting the startup process showed that the internal cluster receives MR requests on port 8032. After changing the port in the RM server address configuration item, tasks were submitted successfully.
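The configuration item in question is presumably yarn.resourcemanager.address in yarn-site.xml (the hostname below is illustrative; 8032 is the standard RM submission port):
<property>
  <name>yarn.resourcemanager.address</name>
  <value>resourcemanager-host:8032</value>
</property>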
Exception occurring after all jar packages had been replaced or added
Exception in thread "main" java.lang.VerifyError: class org.codehaus.jackson.xc.JaxbAnnotationIntrospector overrides final method findDeserializer. (Lorg/codehaus/jackson/map/introspect/Annotated;) Ljava/lang/Object At java.lang.ClassLoader.defineClass1 (Native Method) at java.lang.ClassLoader.defineClass (ClassLoader.java:800) at java.security.SecureClassLoader.defineClass (SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass (URLClassLoader.java:449) at java.net.URLClassLoader.access$100 (URLClassLoader.java:71) at java.net.URLClassLoader$1.run (URLClassLoader.java:361) at java.net.URLClassLoader$1.run (URLClassLoader.java:355) at java.security.AccessController.doPrivileged (Native Method) at java.net.URLClassLoader FindClass (URLClassLoader.java:354) at java.lang.ClassLoader.loadClass (ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass (Launcher.java:308) at java.lang.ClassLoader.loadClass (ClassLoader.java:358) at java.lang.Class.getDeclaredMethods0 (Native Method) at java.lang.Class.privateGetDeclaredMethods (Class.java:2615) at java.lang.Class.getDeclaredMethods (Class.java:1860) at com.sun.jersey.core.reflection.MethodList.getAllDeclaredMethods (MethodList.java:70) at com. Sun.jersey.core.reflection.MethodList. (MethodList.java:64) at com.sun.jersey.core.spi.component.ComponentConstructor.getPostConstructMethods (ComponentConstructor.java:131) at com.sun.jersey.core.spi.component.ComponentConstructor. (ComponentConstructor.java:123) at com.sun.jersey.core.spi.component.ProviderFactory.__getComponentProvider (ProviderFactory.java:165) at com.sun.jersey.core.spi.component.ProviderFactory._getComponentProvider (ProviderFactory.java:159) at com.sun.jersey.core. Spi.component.ProviderFactory.getComponentProvider (ProviderFactory.java:153) at com.sun.jersey.core.spi.component.ProviderServices.getComponent (ProviderServices.java:251)
Solution: the exception class belongs to the jackson*.jar family, so the problem lies in that series of packages. Inspection showed that the Hermes-MR index plug-in shipped
jackson-core-asl-1.7.3.jar
jackson-mapper-asl-1.7.3.jar
jackson-core-asl-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
that is, two versions of each of these two packages. The Hadoop cluster uses version 1.9.13; after deleting the two 1.7.3 jars from the plug-in's lib directory, the plug-in ran normally. The root cause was a jar version conflict.
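A quick cleanup for this case, using the plug-in home directory defined above:
ls $HERMES_INDEX_MR_HOME/lib | grep jackson   # spot duplicate versions
rm $HERMES_INDEX_MR_HOME/lib/jackson-core-asl-1.7.3.jar \
   $HERMES_INDEX_MR_HOME/lib/jackson-mapper-asl-1.7.3.jar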
Error: the MR framework path cannot be found
Exception in thread "main" java.lang.IllegalArgumentException: Could not locate MapReduce framework name 'mr-framework' in mapreduce.application.classpathat org.apache.hadoop.mapreduce.v2.util.MRApps.setMRFrameworkClasspath (MRApps.java:231) at org.apache.hadoop.mapreduce.v2.util.MRApps.setClasspath (MRApps.java:258) at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext (YARNRunner.java:458) at org.apache.hadoop.mapred.YARNRunner.submitJob (YARNRunner.java:285) at org.apache.hadoop.mapreduce .JobSubmitter.submitJobInternal (JobSubmitter.java:240) at org.apache.hadoop.mapreduce.Job$10.run (Job.java:1290) at org.apache.hadoop.mapreduce.Job$10.run (Job.java:1287) at java.security.AccessController.doPrivileged (Native Method) at javax.security.auth.Subject.doAs (Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs (UserGroupInformation.java:1657) at org.apache.hadoop.mapreduce.Job.submit (Job.java:1287) at org.apache .hadoop.mapreduce.Job.waitForCompletion (Job.java:1308) at com.tencent.hermes.hadoop.job.HermesIndexJob.subRun (HermesIndexJob.java:262) at com.tencent.hermes.hadoop.job.HermesIndexJob.run (HermesIndexJob.java:122) at org.apache.hadoop.util.ToolRunner.run (ToolRunner.java:70) at com.tencent.hermes.hadoop.job.SubmitIndexJob.call (SubmitIndexJob.java:194) at com.tencent.hermes.hadoop.job.SubmitIndexJob.main (SubmitIndexJob.java:101)
Solution: the message says the framework name 'mr-framework', referenced by mapreduce.application.framework.path, cannot be located in mapreduce.application.classpath. Checking mapred-site.xml confirmed that this configuration item was missing the framework entries; adding the mr-framework paths to the item fixed the problem (the addition is the configuration shown below, which was marked in red in the original).
<property>
  <name>mapreduce.application.classpath</name>
  <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/2.2.0.0-2041/hadoop/lib/hadoop-lzo-0.6.0.2.2.0.0-2041.jar:/etc/hadoop/conf/secure</value>
</property>