How to Use Kylin, an Extensible Ultra-Fast OLAP Engine

2025-02-23 Update. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

Many people who are new to Kylin, an extensible ultra-fast OLAP engine, are unsure how to use it. This article summarizes how Kylin works, its core concepts, and the main steps for using it.

Kylin adopts the Cube concept from traditional data warehouse technology: it precomputes aggregates over very large data sets along a limited set of dimensions, then loads the Cube into HBase for users to query.

Kylin trades space for time to support interactive queries over large-scale data sets on Hadoop with sub-second latency. It precomputes result sets and saves them to HBase. The original row-based relational model is transformed into columnar storage based on key-value pairs, with the combination of dimension values serving as the HBase RowKey, so queries no longer need expensive table scans. This makes high-speed, high-concurrency analysis possible. Kylin provides a standard SQL query interface that supports most SQL functions, and it integrates with mainstream BI products through ODBC/JDBC.

How Kylin works

1. Specify the data model and define the dimensions and measures.

2. Precompute the Cube: calculate every Cuboid and save each one as a materialized view.

3. At query time, read the matching Cuboid, perform any remaining calculation, and return the query result.
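
The three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Kylin's implementation: the fact rows, the `build_cube` and `query` helpers, and the use of in-memory dicts in place of HBase are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (date, city, sales). Not taken from the article.
ROWS = [
    ("2024-01-01", "Beijing", 10),
    ("2024-01-01", "Shanghai", 20),
    ("2024-01-02", "Beijing", 5),
]
DIMENSIONS = ("date", "city")


def build_cube(rows, dimensions):
    """Precompute SUM(sales) for every combination of dimensions (every cuboid)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # last column is the measure
            cube[tuple(dimensions[i] for i in dims)] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY query by reading the matching cuboid, without scanning ROWS."""
    return cube[tuple(group_by)]


cube = build_cube(ROWS, DIMENSIONS)
print(len(cube))              # 4 cuboids for 2 dimensions (2 ** 2)
print(query(cube, ["city"]))  # {('Beijing',): 15, ('Shanghai',): 20}
print(query(cube, []))        # {(): 35}
```

With N dimensions there are 2^N cuboids, which is why the space cost of precomputation grows quickly as dimensions are added.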

The main features of Kylin

1. Standard SQL interface

2. Support very large datasets

3. Sub-second response

4. Scalability and high throughput

5. Integration with BI and visualization tools

Several core concepts

Data warehouse: a repository that holds a large amount of historical data for analysis.

OLAP: online analytical processing. It analyzes data along multiple dimensions and supports operations such as roll-up, drill-down, and pivoting. It differs from online transaction processing (OLTP), which focuses on day-to-day transactions: insert, delete, update, and select.

BI: business intelligence.

Dimension and measure: a dimension is a perspective from which data is viewed, usually an attribute of the data record, e.g. time or place. A measure is a value computed from the data, e.g. sales or number of users.

Fact table and dimension table: a fact table stores fact records, e.g. system logs or sales records, and keeps growing. A dimension table stores the attribute values of a dimension, e.g. a date table or a location table.
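
To make these terms concrete, here is a small sketch with one fact table, one dimension table, and one measure. The table contents and the `total_by` helper are hypothetical; the point is only to show a measure being aggregated at two levels of a dimension (city, then rolled up to country).

```python
from collections import defaultdict

# Hypothetical dimension table: maps each city to its country.
CITY_DIM = {"Beijing": "China", "Shanghai": "China", "Tokyo": "Japan"}

# Hypothetical fact table: one row per sale, as (city, amount).
SALES_FACT = [("Beijing", 10), ("Shanghai", 20), ("Tokyo", 7), ("Beijing", 5)]


def total_by(fact_rows, level):
    """Sum the sales measure, grouped by the chosen dimension level."""
    totals = defaultdict(int)
    for city, amount in fact_rows:
        key = city if level == "city" else CITY_DIM[city]
        totals[key] += amount
    return dict(totals)


by_city = total_by(SALES_FACT, "city")        # drill-down view
by_country = total_by(SALES_FACT, "country")  # roll-up view
print(by_city)     # {'Beijing': 15, 'Shanghai': 20, 'Tokyo': 7}
print(by_country)  # {'China': 35, 'Japan': 7}
```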

Cube, Cuboid and Cube Segment

Cube: a data cube, a technique commonly used for data analysis and indexing. It builds a multi-dimensional index over the original data, and querying through the Cube is much faster than querying the raw data.

Cuboid: in Kylin, the data calculated for one specific combination of dimensions.

Cube Segment: the Cube data calculated for one slice of the source data. Data in a data warehouse usually grows over time, so Cube Segments are built in chronological order.
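
The relationship between time-ordered segments and merging can be sketched as follows. The `Segment` class and `merge` function are hypothetical stand-ins, not Kylin classes, and the example assumes a SUM measure, for which merging two segments is just adding their aggregates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a Cube Segment covering [start, end)."""
    start: str       # ISO date, inclusive
    end: str         # ISO date, exclusive
    totals: Counter  # SUM(sales) per city for that time range


def merge(a: Segment, b: Segment) -> Segment:
    """Merge two adjacent segments; SUM aggregates can simply be added."""
    if a.end != b.start:
        raise ValueError("segments must be adjacent in time")
    return Segment(a.start, b.end, a.totals + b.totals)


jan = Segment("2024-01-01", "2024-02-01", Counter({"Beijing": 15, "Shanghai": 20}))
feb = Segment("2024-02-01", "2024-03-01", Counter({"Beijing": 8}))
merged = merge(jan, feb)
print(merged.start, merged.end)  # 2024-01-01 2024-03-01
print(dict(merged.totals))       # {'Beijing': 23, 'Shanghai': 20}
```

Building one segment per new time slice keeps each build small, while merging keeps the number of segments a query has to read from growing without bound.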

The main process of using Apache Kylin is:

1. Prepare the data: the data should follow a star schema. Design the dimension tables (Kylin loads dimension tables into memory for processing, so no dimension table should be too large) and partition the Hive tables.

2. Design the Cube: import the Hive table definitions and create the data model.

3. Create the Cube: Kylin stores the Cube in HBase as key-value pairs. The HBase key, that is, the RowKey, is composed of the values of the dimensions.

4. Build the Cube: incremental build or full build.

5. Refresh and merge historical data (Segments).

6. Query the Cube with standard SQL SELECT statements.
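
Step 3 says the RowKey is composed of dimension values. The sketch below shows that idea with a deliberately simplified, text-based key; the `row_key` helper, the bitmask prefix, and the `|` separator are assumptions for illustration, and Kylin's actual binary key layout is different. A plain dict stands in for HBase.

```python
# Simplified, hypothetical RowKey layout: "<cuboid bitmask>|<dimension values...>".
DIMENSIONS = ("date", "city")


def row_key(dim_values: dict) -> str:
    """Build a key from whichever dimensions are present in dim_values."""
    mask = "".join("1" if d in dim_values else "0" for d in DIMENSIONS)
    values = [dim_values[d] for d in DIMENSIONS if d in dim_values]
    return "|".join([mask, *values])


# A dict stands in for the key-value store (HBase in the article).
store = {
    row_key({"date": "2024-01-01", "city": "Beijing"}): 10,
    row_key({"city": "Beijing"}): 15,
    row_key({}): 35,
}

print(row_key({"city": "Beijing"}))         # 01|Beijing
print(store[row_key({"city": "Beijing"})])  # 15
```

Because the key already encodes the dimension combination and its values, a query for a precomputed aggregate becomes a key lookup rather than a table scan.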

Supported build methods:

Batch build: either a full build or an incremental build.

Streaming build: updates data in near real time by consuming from Kafka. At the time of writing, there is a risk of data loss.

Supported access methods:

1. Web GUI (Insight page)

2. REST API

3. ODBC/JDBC

4. BI tools such as Tableau
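
As an illustration of the REST API route, the sketch below builds a query request using only the Python standard library. It assumes Kylin's query endpoint at `/kylin/api/query` with HTTP Basic authentication and a JSON body carrying `sql` and `project`; the host, port, credentials, project name, and table name are placeholders, and the `build_query_request` helper is hypothetical. Check the REST API documentation for your Kylin version before relying on it.

```python
import base64
import json
import urllib.request

# Assumption: placeholder host and port; adjust to your deployment.
KYLIN_URL = "http://localhost:7070/kylin/api/query"


def build_query_request(sql: str, project: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a SQL query as JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"sql": sql, "project": project}).encode()
    return urllib.request.Request(
        KYLIN_URL,
        data=body,
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder SQL, project, and credentials.
req = build_query_request("select count(*) from SALES_FACT", "demo_project", "USER", "PASSWORD")
print(req.get_method(), req.full_url)

# Sending the request requires a running Kylin instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```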
