An example analysis of common Kylin problems

2025-01-16 Update From: SLTechnology News&Howtos



Many people are at a loss when they first run into problems with Kylin. This article works through the problems I encountered, with their causes and solutions; I hope it helps you resolve yours.

My kylin.properties configuration:

### SERVICE ###
# Kylin server mode, valid value [all, query, job]
kylin.server.mode=all
# Optional information for the owner of kylin platform, it can be your team's email
# Currently it will be attached to each kylin's htable attribute
kylin.owner=whoami@kylin.apache.org
# List of web servers in use, this enables one web server instance to sync up with other servers.
kylin.rest.servers=192.168.64.16:7070
# Display timezone on UI, format like [GMT+N or GMT-N]
kylin.rest.timezone=GMT+8

### SOURCE ###
# Hive client, valid value [cli, beeline]
kylin.hive.client=cli
# Parameters for beeline client, only necessary if hive client is beeline
#kylin.hive.beeline.params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u 'jdbc:hive2://localhost:10000'
kylin.hive.keep.flat.table=false

### STORAGE ###
# The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
# The storage for final cube file in hbase
kylin.storage.url=hbase
# In seconds (2 days)
kylin.storage.cleanup.time.threshold=172800000
# Working folder in HDFS, make sure the user has the right access to the hdfs directory
kylin.hdfs.working.dir=/kylin
# Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.hbase.default.compression.codec=none
# HBase Cluster FileSystem, which serves hbase, format as hdfs://hbase-cluster:8020
# Leave empty if hbase is running on the same cluster as hive and mapreduce
kylin.hbase.cluster.fs=hdfs://master1:8020
# The cut size for an hbase region, in GB
kylin.hbase.region.cut=5
# The hfile size in GB; a smaller hfile gives the hfile-converting MR job more reducers, making it faster.
# Set 0 to disable this optimization.
kylin.hbase.hfile.size.gb=2
kylin.hbase.region.count.min=1
kylin.hbase.region.count.max=500

### JOB ###
# Max job retries on error, default 0: no retry
kylin.job.retry=0
kylin.job.jar=$KYLIN_HOME/lib/kylin-job-1.5.4.jar
kylin.coprocessor.local.jar=$KYLIN_HOME/lib/kylin-coprocessor-1.5.4.jar
# If true, the job engine will not assume the hadoop CLI resides on the same server as itself;
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password.
# It should not be set to "true" unless you're NOT running kylin.sh on a hadoop client machine
# (thus the kylin instance has to ssh to a real hadoop client machine to execute hbase/hive/hadoop commands)
kylin.job.run.as.remote.cmd=false
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=
kylin.job.remote.cli.port=22
kylin.job.remote.cli.username=
kylin.job.remote.cli.password=
# Used by test cases to prepare synthetic data for the sample cube
kylin.job.remote.cli.working.dir=/tmp/kylin
# Max count of concurrent jobs running
kylin.job.concurrent.max.limit=10
# Time interval to check hadoop job status
kylin.job.yarn.app.rest.check.interval.seconds=10
# Hive database name for the intermediate flat tables
kylin.job.hive.database.for.intermediatetable=default
# The percentage of sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100
# Whether to get job status from the resource manager with kerberos authentication
kylin.job.status.with.kerberos=false
kylin.job.mapreduce.default.reduce.input.mb=500
kylin.job.mapreduce.max.reducer.number=500
kylin.job.mapreduce.mapper.input.rows=1000000
kylin.job.step.timeout=7200

### CUBE ###
# 'auto', 'inmem', 'layer', or 'random' for testing
kylin.cube.algorithm=auto
kylin.cube.algorithm.auto.threshold=8
kylin.cube.aggrgroup.max.combination=4096
kylin.dictionary.max.cardinality=5000000
kylin.table.snapshot.max_mb=300

### QUERY ###
kylin.query.scan.threshold=10000000
# 3G
kylin.query.mem.budget=3221225472
kylin.query.coprocessor.mem.gb=3
# Enable/disable ACL check for cube queries
kylin.query.security.enabled=true
kylin.query.cache.enabled=true

### SECURITY ###
# Spring security profile, options: testing, ldap, saml
# With the "testing" profile, users can log in with a pre-defined name/pwd like KYLIN/ADMIN
kylin.security.profile=testing
# Default roles and admin roles in LDAP, for ldap and saml
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN
# LDAP authentication configuration
ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=
# LDAP user account directory
ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=
# LDAP service account directory
ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=
## SAML configurations for SSO
# SAML IDP metadata file location
saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin

### MAIL ###
# If true, will send email notifications
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

### WEB ###
# Help info, format {name|displayName|link}, optional
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|
# Guide users on how to build a streaming cube
kylin.web.streaming.guide=http://kylin.apache.org/
# Hadoop url link, optional
kylin.web.hadoop=
# Job diagnostic url link, optional
kylin.web.diagnostic=
# Contact mail on web page, optional
kylin.web.contact_mail=
crossdomain.enable=true

1. Running ./bin/find-hive-dependency.sh to check whether the Hive environment is configured correctly reported that the HCAT_HOME path could not be found.

Solution: export HCAT_HOME=$HIVE_HOME/hcatalog

Then rerun the script
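The fix amounts to pointing HCAT_HOME at the hcatalog directory shipped inside the Hive distribution before invoking the script again. A minimal sketch (the Hive install path here is an assumption; substitute your own):

```shell
# Assumed install location; substitute your actual Hive home.
export HIVE_HOME=/usr/local/hive
# HCatalog ships inside the Hive distribution.
export HCAT_HOME=$HIVE_HOME/hcatalog
echo "$HCAT_HOME"
# then rerun: ./bin/find-hive-dependency.sh
```

To make the setting survive new shells, put the export line in the profile of the user that runs Kylin.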

2. Loading a Hive table in the Kylin web interface fails with a "failed to take action" prompt.

Solution:

vi ./bin/kylin.sh

You need to make two changes to this script:

1. Change export KYLIN_HOME=/home/grid/kylin (use an absolute path, not a relative one)

2. Change export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX (add $hive_dependency to the path)

3. How to add login users to Kylin

The official doc gives a solution: Kylin uses the Spring Security framework for user authentication, so you need to configure the sandbox,testing profile section of ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/classes/kylinSecurity.xml

......

The password must be encrypted with Spring's BCrypt encoder:

Add the Maven dependency org.springframework.security:spring-security-core:4.0.0.RELEASE, then generate the encoded password:

String password = "123456";
org.springframework.security.crypto.password.PasswordEncoder encoder = new org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder();
String encodedPassword = encoder.encode(password);
System.out.print(encodedPassword);

4. Building the cube fails with: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

An inexplicable error: the root cause is not visible in kylin.log. You need to check the log configured in Hive (set via log4j; the default directory is /tmp/$user/). The cause turned out to be the error message: "Error: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z".

It turns out to be a compression-format problem: by default Kylin does not use Hadoop's LZO compression format but Snappy.

There are three solutions:

1. Redeploy with apache-kylin-1.5.2.1-HBase1.x-bin.tar.gz instead of apache-kylin-1.5.2.1-bin.tar.gz. Since I was using HBase 0.98, this option was ruled out.

2. Switch to LZO compression, which is somewhat more troublesome; see http://kylin.apache.org/docs15/install/advance_settings.html for details.

3. Have Hive and HBase use no compression at all (cube build time may increase; evaluate for your case): in the configuration files conf/kylin.properties and conf/*.xml, search for snappy (grep snappy) and remove all the snappy and compression settings.
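To find every setting that solution 3 asks you to remove, grep the config directory for snappy keys. A sketch against a scratch copy (the file path and contents below are illustrative assumptions, not your real config; in practice run the grep inside $KYLIN_HOME):

```shell
# Build a scratch config just to demonstrate the search (illustrative values only).
mkdir -p /tmp/kylin-conf-demo
printf 'kylin.hbase.default.compression.codec=snappy\nkylin.server.mode=all\n' \
  > /tmp/kylin-conf-demo/kylin.properties

# In a real deployment: cd $KYLIN_HOME && grep -rn 'snappy' conf/
grep -n 'snappy' /tmp/kylin-conf-demo/kylin.properties
# prints: 1:kylin.hbase.default.compression.codec=snappy
```

Each hit is a line to delete or switch to an uncompressed setting before rebuilding the cube.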

5. Building the cube fails at step 3, "Extract Fact Table Distinct Columns", with: java.net.ConnectException: Call From master1/192.168.64.11 to localhost:18032 failed on connection exception: java.net.ConnectException: Connection refused

Solution:

This issue took far too much time. Posts online suggested it was a problem with the YARN port configuration, but modifying yarn-site.xml did not help. I then suspected the Hive metastore server, but after changing that the problem remained.

In the end I had to switch to HBase 1.1.6, and accordingly to the matching hbase1.x build of Kylin. That solved the problem.

6. After the cube is created successfully, querying SQL returns "error in coprocessor"

Solution:

This problem bothered me for several days. I had already set kylin.coprocessor.local.jar=/../kylin/lib/kylin-coprocessor-1.5.4.jar. The fix suggested online is to set hbase_dependency=<absolute path>/hbase-1.1.6/lib in the find-hbase-dependency.sh script, but that did not work either.

Finally, I deleted HBase's data on HDFS completely and restarted HBase, and the query succeeded. My guess is that something went wrong during cube creation; this remains to be verified.

7. About Kylin SQL

From experience with count distinct and other issues, here are the differences I have found in Kylin SQL:

LIMIT with an offset (LIMIT beg, end) is not supported; only LIMIT length works

UNION and UNION ALL are not supported

The WHERE EXISTS clause is not supported

8. Clean up the intermediate storage data of Kylin

Kylin generates a lot of intermediate data on HDFS while building cubes. In addition, when we build/drop/merge cubes, some HBase tables may be left behind in HBase; these tables are no longer queried, so we need to clean up this offline storage periodically. The steps are as follows:

1. Check which resources need cleaning; this operation does not delete anything:

${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete false

Time taken: 1.339 seconds
OK
kylin_intermediate_kylin_sales_cube_desc_2b8ea1a6_99f4_4045_b0f5_22372b9ffc60
kylin_intermediate_weibo_cube_0d26f9e5_0935_409a_9a6d_6c1d03773fbd
kylin_intermediate_weibo_cube_1d21fe49_990c_4a34_9267_e693421689f2
Time taken: 0.33 seconds, Fetched: 3 row(s)
-- Intermediate Hive Tables To Be Dropped --
2016-10-12 15:24:25,881 INFO [main] CubeManager:132 : Initializing CubeManager with config kylin_metadata@hbase
2016-10-12 15:24:25,897 INFO [main] CubeManager:828 : Loading Cube from folder kylin_metadata(key='/cube')@kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO [main] CubeDescManager:91 : Initializing CubeDescManager with config kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO [main] CubeDescManager:197 : Reloading Cube Metadata from folder kylin_metadata(key='/cube_desc')@kylin_metadata@hbase
2016-10-12 15:24:26,038 DEBUG [main] CubeDescManager:222 : Loaded 2 Cube(s)
2016-10-12 15:24:26,038 DEBUG [main] CubeManager:870 : Reloaded new cube: userlog_cube with reference being CUBE[name=userlog_cube] having 1 segments: KYLIN_WEK77BKP6M
2016-10-12 15:24:26,040 DEBUG [main] CubeManager:870 : Reloaded new cube: weibo_cube with reference being CUBE[name=weibo_cube] having 1 segments: KYLIN_5N8ZRC7Z1F
2016-10-12 15:24:26,040 INFO [main] CubeManager:841 : Loaded 2 cubes, fail on 0 cubes
2016-10-12 15:24:26,218 INFO [main] StorageCleanupJob:218 : Skip /kylin/kylin_metadata/kylin-779df736-75b0-4263-b045-6a49401b4516 from deletion list, as the path belongs to segment userlog_cube[19700101000000_20160930000000] of cube userlog_cube
2016-10-12 15:24:26,218 INFO [main] StorageCleanupJob:218 : Skip /kylin/kylin_metadata/kylin-e9805d06-559a-4c15-ab1e-d6e947460093 from deletion list, as the path belongs to segment weibo_cube[19700101000000_20140430000000] of cube weibo_cube
-- HDFS Path To Be Deleted --
/kylin/kylin_metadata/kylin-07e8f9b1-8dfc-4c57-8e5b-e9800392af0d
/kylin/kylin_metadata/kylin-0855f8ed-89a5-4676-a9bb-f8c301ead327
/kylin/kylin_metadata/kylin-0cdef491-d0b7-438d-ba54-091678cb463d
/kylin/kylin_metadata/kylin-121752c8-ab9d-434b-812f-73f766796436
/kylin/kylin_metadata/kylin-12b442a0-0c6d-43e7-830f-2f6e5826f23a
/kylin/kylin_metadata/kylin-5ba7affe-d584-4f6e-85b2-2588e31a985c
/kylin/kylin_metadata/kylin-5e1818bd-4644-4e8e-b332-b5bb59ff9677
/kylin/kylin_metadata/kylin-680f7549-48be-496a-82c5-084434bfee74
/kylin/kylin_metadata/kylin-707d1a65-392e-456f-97ea-d7d553b52950
/kylin/kylin_metadata/kylin-7520fc6e-8b76-43cc-9fb8-bfba969040da
/kylin/kylin_metadata/kylin-75e5b484-4594-4d31-83ce-729a6b3de1c2
/kylin/kylin_metadata/kylin-79535d79-cd36-4711-858c-d8fa28266f7f
/kylin/kylin_metadata/kylin-81eb9119-c806-4003-a6d6-fc43281a8c01
/kylin/kylin_metadata/kylin-839e80d8-d116-4061-80d6-379c85db7114
/kylin/kylin_metadata/kylin-843b185d-ed09-48c7-958c-1ee1e0e2cde5
/kylin/kylin_metadata/kylin-97c0cdc6-c53e-4115-995e-b90f4381d307
/kylin/kylin_metadata/kylin-998aa0aa-279c-44f0-8367-807b9110ae74
/kylin/kylin_metadata/kylin-ad2ad0c7-bee5-46f2-9fc3-e60b10941ffa
/kylin/kylin_metadata/kylin-b5939b9b-2a6e-4acb-aaf7-888a83113ad7
/kylin/kylin_metadata/kylin-b65b555d-90e5-4455-95ce-10b215b00482
/kylin/kylin_metadata/kylin-d5ac36b3-b021-4ac6-87ae-f3a38f90eb06
/kylin/kylin_metadata/kylin-e7a9b0d1-a788-4ddf-88f5-37671eaa7dc3
/kylin/kylin_metadata/kylin-f7094827-00f8-474b-9542-ea001797a148
2016-10-12 15:24:26,475 INFO [main] StorageCleanupJob:91 : Exclude table KYLIN_WEK77BKP6M from drop list, as it is newly created
2016-10-12 15:24:26,475 INFO [main] StorageCleanupJob:102 : Exclude table KYLIN_5N8ZRC7Z1F from drop list, as the table belongs to cube weibo_cube with status READY
-- Tables To Be Dropped --

2. As shown above, the command lists the tables and files in Hive/HDFS/HBase that can be deleted (recently generated or recently queried tables are filtered out automatically). Based on the output, confirm that each listed table really is no longer needed. Once you are sure, rerun the command from step 1 with "--delete false" changed to "--delete true", and the cleanup begins.
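Reviewing a long dry-run log by hand is error-prone. One way (a sketch; the log file name and saving step are assumptions, and the stand-in log below only mimics the markers of the real output) is to save the dry-run output and cut out just the HDFS-path section before deciding:

```shell
# Save the dry-run output first, e.g.:
#   ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob \
#     --delete false 2>&1 | tee /tmp/cleanup-dryrun.log
# Here we build a small stand-in log with the same markers as the real output.
cat > /tmp/cleanup-dryrun.log <<'EOF'
-- HDFS Path To Be Deleted --
/kylin/kylin_metadata/kylin-07e8f9b1-8dfc-4c57-8e5b-e9800392af0d
/kylin/kylin_metadata/kylin-0855f8ed-89a5-4676-a9bb-f8c301ead327
-- Tables To Be Dropped --
EOF

# Print only the candidate HDFS paths between the two marker lines.
sed -n '/HDFS Path To Be Deleted/,/Tables To Be Dropped/p' /tmp/cleanup-dryrun.log | grep '^/kylin'
```

Each printed path can then be checked (e.g. with hdfs dfs -ls) before you rerun the job with --delete true.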

${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete true

