Deployment of big data's analysis platform Apache Kylin (used by Cube construction) 07/13 Update SLTechnology News&Howtos


2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Preface

Apache Kylin is an open source distributed analysis engine originally contributed to the open source community by developers at eBay. It provides a SQL query interface and multidimensional analysis (OLAP) capabilities on top of Hadoop for large-scale data. It can handle TB- and even PB-scale analysis tasks, query huge Hive tables with sub-second latency, and support high concurrency.

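The core idea behind Kylin is trading space for time: aggregates are precomputed and stored, so a query reads ready-made results instead of scanning the raw data.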

Kylin reads source data from Hive, the most commonly used data warehouse on Hadoop, uses MapReduce as the engine for building cubes, stores the precomputed results in HBase, and exposes query interfaces through a REST API, JDBC, and ODBC.
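To make "space for time" concrete: a full cube precomputes one aggregate table (a cuboid) for every combination of dimensions, so n dimensions give up to 2^n cuboids. The minimal sketch below enumerates them for three hypothetical dimension names. It is only an illustration of the idea; real Kylin cubes can prune this set (for example with aggregation groups), which the sketch ignores.

```python
from itertools import combinations

def enumerate_cuboids(dimensions):
    """Return every subset of `dimensions`, i.e. every cuboid of a full cube."""
    cuboids = []
    for size in range(len(dimensions) + 1):
        # combinations() yields each subset of the given size exactly once
        cuboids.extend(combinations(dimensions, size))
    return cuboids

# Hypothetical dimensions, chosen only for illustration
dims = ["date", "cate", "ip"]
cuboids = enumerate_cuboids(dims)
print(len(cuboids))  # 2 ** 3 = 8
for c in cuboids:
    print(c)  # () is the grand total, ('date', 'cate', 'ip') the base cuboid
```

Each extra dimension doubles the number of cuboids to store, which is the "space" side of the trade.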

Deploy Kylin

(1) Download and install

At the time of writing, the latest version was 2.0.0 beta and the latest stable release was 1.6.0, so I used 1.6.0.

You can either download the source package and compile it yourself, or download the binary package that matches your Hadoop environment.

I am using HDP 2.4.2 with HBase 1.1.2, so I downloaded the binary package built for HBase 1.x and installed it directly.

$ cd /opt
$ wget http://ftp.tc.edu.tw/pub/Apache/kylin/apache-kylin-1.6.0/apache-kylin-1.6.0-hbase1.x-bin.tar.gz
$ tar xf apache-kylin-1.6.0-hbase1.x-bin.tar.gz
$ vim /etc/profile
# add the following line to /etc/profile:
export KYLIN_HOME=/opt/apache-kylin-1.6.0-hbase1.x-bin
$ source /etc/profile

(2) Check the environment

$ cd /opt/apache-kylin-1.6.0-hbase1.x-bin
$ ./bin/check-env.sh
KYLIN_HOME is set to /opt/apache-kylin-1.6.0-hbase1.x-bin
mkdir: Permission denied: user=root, access=WRITE, inode="/kylin":hdfs:hdfs:drwxr-xr-x
failed to create /kylin, Please make sure the user has right to access /kylin
# The error says to run as the hdfs user.
# check-env.sh checks the local Hive, HBase, Hadoop and other environments,
# and creates a working directory for Kylin in HDFS.
$ su hdfs
$ ./bin/check-env.sh
KYLIN_HOME is set to /opt/apache-kylin-1.6.0-hbase1.x-bin
$ hadoop fs -ls /
# a /kylin directory has been added
drwxr-xr-x   - hdfs hdfs          0 2017-01-19 10:08 /kylin

(3) Start

# run chown as root, then start Kylin as the hdfs user
$ chown -R hdfs:hadoop /opt/apache-kylin-1.6.0-hbase1.x-bin
$ ./bin/kylin.sh start
A new Kylin instance is started by hdfs, stop it using "kylin.sh stop"
Please visit
You can check the log at /opt/apache-kylin-1.6.0-hbase1.x-bin/logs/kylin.log

(4) Open the web UI

Open http://localhost:7070/kylin in a browser.

Log in with user ADMIN and password KYLIN (the default credentials).

Use Kylin

(1) Add a new project

Give the project a name and add a project description.

Add data sources to the project (load Hive tables).

On the data source page, you can enter the Hive table name manually.

The source table is loaded successfully.

You can see the column properties of the loaded table.

(2) Create a model

Create a new model

Edit model name and description

Select the data table

Next, select dimensions and metrics, the two most important concepts in building a precomputed cube.

Metric (measure): a metric is the aggregated value being examined, such as sales quantity, sales amount, or average purchase per customer. In SQL terms, a metric is an aggregate function.

For example: select cate, count(1), sum(num) from fact_table where date > '20161112' group by cate

Here count(1) and sum(num) are metrics.

Dimension: a dimension is the angle from which the data is observed, such as date of sale or place of sale. In SQL terms, dimensions are the columns that appear in the WHERE and GROUP BY clauses.

For example, in the same query: select cate, count(1), sum(num) from fact_table where date > '20161112' group by cate

Here date and cate are dimensions.
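To see the split concretely, the example query can be mirrored in a few lines of Python over a hypothetical in-memory fact_table (the rows are made up for illustration). The dimension columns decide which rows are kept and how they are grouped; the metrics are what gets accumulated per group.

```python
from collections import defaultdict

# Hypothetical rows of fact_table: (date, cate, num)
fact_table = [
    ("20161111", "books", 3),
    ("20161112", "books", 5),
    ("20161113", "books", 2),
    ("20161113", "toys", 4),
    ("20161114", "toys", 1),
    ("20161114", "books", 7),
]

def query(rows, min_date):
    """Python equivalent of:
    select cate, count(1), sum(num) from fact_table
    where date > min_date group by cate
    """
    agg = defaultdict(lambda: [0, 0])  # cate -> [count, sum]
    for date, cate, num in rows:
        if date > min_date:          # dimension `date` filters (WHERE)
            agg[cate][0] += 1        # metric count(1), grouped by dimension `cate`
            agg[cate][1] += num      # metric sum(num)
    return {cate: tuple(v) for cate, v in agg.items()}

print(query(fact_table, "20161112"))  # {'books': (2, 9), 'toys': (2, 5)}
```

Comparing yyyyMMdd strings works here because that format sorts lexically in date order.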

Select the dimension columns to analyze

Select the metric columns to analyze

Set the time (partition) column of the table

(3) Create a cube

Building a cube depends on the model created earlier. Select the model and set the cube name.

From the dimensions defined in the model above, select the ones you want to analyze.

Select the metrics.

The first one, _COUNT_, is computed by default.

The second, COUNT_DISTINCT, computes a deduplicated count; here it gives the number of distinct IP addresses, i.e. the usual UV (unique visitors).

(COUNT_DISTINCT lets you choose the precision: the more precise the result, the longer the computation takes.)
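The precision trade-off comes from how approximate distinct counting works. Kylin's approximate COUNT_DISTINCT is based on HyperLogLog, where more registers give a smaller error at the cost of more storage and computation. The sketch below is a textbook HyperLogLog in plain Python, not Kylin's own implementation; the parameter `p` (2^p registers, relative error roughly 1.04/sqrt(2^p)) plays the role of the precision setting.

```python
import hashlib
import math

def hll_estimate(items, p=12):
    """Toy HyperLogLog distinct-count estimate using 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # 64-bit hash taken from SHA-1 (any well-mixed hash would do)
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                 # first p bits choose a register
        rest = h & ((1 << (64 - p)) - 1)    # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        # small-range correction: fall back to linear counting
        estimate = m * math.log(m / zeros)
    return estimate

# 10,000 distinct hypothetical IP addresses
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10000)]
print(len(set(ips)), round(hll_estimate(ips)))  # exact vs. approximate UV
```

Raising `p` shrinks the error but multiplies the register memory, which mirrors the "more precise, slower" choice in the cube designer.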

The third, TOP_N, computes rankings.
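The semantics of a TOP_N measure can be shown with an exact Python version: sum a value per key, then keep the N largest. This is a hypothetical illustration of what the measure answers, not how Kylin stores or computes it; the IPs and values are made up.

```python
import heapq
from collections import defaultdict

def top_n(rows, n):
    """Return the n keys with the largest summed value, largest first."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    # nlargest keeps only n candidates in a heap instead of sorting everything
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])

# Hypothetical (ip, traffic) rows
rows = [("10.0.0.1", 5), ("10.0.0.2", 9), ("10.0.0.1", 6), ("10.0.0.3", 2)]
print(top_n(rows, 2))  # [('10.0.0.1', 11), ('10.0.0.2', 9)]
```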

The fourth, MAX, computes the maximum value.

Other aggregate expressions such as MIN and SUM are also available.

The remaining steps need no special settings: just click Next, and finally save the cube.

(4) Build the cube

Creating the cube only gives us a computation model. The data still has to be computed according to this model to produce the results.

To start building the cube, choose Build from the Action menu.

Select the time range to build. (If data is continuously written to the Hive table, you can keep building newer time ranges into the same cube.)
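Each build over a time range produces one cube segment, and building incrementally means building the next, non-overlapping range as new data arrives. Builds can also be triggered through Kylin's REST API; as far as I know it takes the range as startTime/endTime in epoch milliseconds with a buildType of "BUILD", but treat those field names as an assumption and check the REST API docs for your version. The sketch below only computes daily ranges and the corresponding JSON bodies; it sends nothing.

```python
import json
from datetime import datetime, timedelta, timezone

def to_epoch_ms(d):
    """Interpret a naive datetime as UTC and return epoch milliseconds."""
    return int(d.replace(tzinfo=timezone.utc).timestamp() * 1000)

def daily_segments(start, days):
    """Split [start, start + days) into consecutive one-day [begin, end) ranges."""
    return [(start + timedelta(days=i), start + timedelta(days=i + 1))
            for i in range(days)]

def build_request_body(begin, end):
    """JSON body for one build; the field names are an assumption (see above)."""
    return json.dumps({
        "startTime": to_epoch_ms(begin),
        "endTime": to_epoch_ms(end),
        "buildType": "BUILD",
    })

# Build three days incrementally, one segment per day
for begin, end in daily_segments(datetime(2017, 1, 1), 3):
    print(build_request_body(begin, end))
```

The first line printed is {"startTime": 1483228800000, "endTime": 1483315200000, "buildType": "BUILD"}, and each following range starts exactly where the previous one ended, so segments never overlap.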

Go to Monitor to view the cubes currently being built and those built previously.

(5) Query

Once the cube has been built, the data has been precomputed and the results are stored in HBase. At this point we can query it with SQL in Kylin.

Let's compare the query speed in Kylin with querying Hive directly.

Run a query with GROUP BY and ORDER BY.

SQL: select ip, max(loadmax) as loadmax, max(connectmax) as connectmax, max(eth0max) as eth0max, max(eth2max) as eth2max, max(rospace) as rospace, max(team) as team from resource group by ip order by loadmax asc
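Besides the web UI, the same SQL can be sent to Kylin's REST endpoint POST /kylin/api/query, which takes a JSON body with "sql" and "project" and uses HTTP Basic authentication (here the default ADMIN/KYLIN account). The sketch below only builds the request with the standard library; the project name "my_project" is hypothetical, since the article does not name its project, and the query is shortened for readability.

```python
import base64
import json
import urllib.request

def make_query_request(sql, project, host="localhost", port=7070,
                       user="ADMIN", password="KYLIN"):
    """Build (but do not send) a POST to Kylin's /kylin/api/query endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    payload = json.dumps({"sql": sql, "project": project}).encode()
    return urllib.request.Request(
        url=f"http://{host}:{port}/kylin/api/query",
        data=payload,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

sql = ("select ip, max(loadmax) as loadmax from resource "
       "group by ip order by loadmax asc")
req = make_query_request(sql, project="my_project")  # hypothetical project name
# To run it against a live server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```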

With Kylin's precomputation, this query took only 0.11 s.

Running the same query directly in Hive took 30.05 s.

That makes Kylin roughly 270 times faster (30.05 s / 0.11 s ≈ 273)!

(6) Sample data

Kylin ships with a sample dataset of 10,000 rows.

$ ./bin/sample.sh
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin server or reload the metadata from web UI to see the change.
$ ./bin/kylin.sh stop
stopping Kylin:15334
$ ./bin/kylin.sh start

You can now see the learn_kylin project in Kylin, with a ready-made model and cube for reference and learning.

