Source: Shulou (Shulou.com), SLTechnology News&Howtos. Report dated 06/01; updated 2025-01-30.
This article explains the concept of the OLAP system in big data. The content is concise and easy to understand, and I hope the detailed introduction below is useful to you.
1.1 What is OLAP?
OLAP (OnLine Analytical Processing) performs multidimensional analysis of business data and provides complex calculation, trend analysis, and sophisticated data modeling capabilities. It is mainly used to support management decision analysis in enterprises and is the technology behind many business intelligence (BI) applications. OLAP lets end users analyze data ad hoc across multiple dimensions to gain the knowledge they need to make better decisions. OLAP technology has been defined as the ability to "quickly access shared multidimensional information".
1.2 Why do you need multidimensional analysis?
Business is inherently multidimensional. Enterprises track their business activities through many variables, and when these variables are tracked in a spreadsheet, they are placed on its axes (x and y). For example, sales can be tracked month by month over a year, with the sales figures on the y-axis and the months on the x-axis. To analyze the health of the business and plan future activities, many groups of variables, or parameters, must be tracked continuously. For example, a business should consider at least the following: customers, locations, periods, salespeople, and products. These dimensions form the basis of the company's planning, analysis, and reporting activities; together, they represent the overall business situation.
1.3 Origin of OLAP
The term OLAP was first proposed in 1993 by Edgar F. Codd, known as the father of the relational database, in his white paper "Providing OLAP to User-Analysts: An IT Mandate". In this white paper, he set out 12 rules for evaluating OLAP products:
Multidimensional Conceptual View: from the user-analyst's point of view, an enterprise is naturally multidimensional. For example, profits can be viewed by region, product, time period, or scenario (such as actual, budget, or forecast). A multidimensional data model lets users work with data more directly and intuitively, including "slicing and dicing".
Transparency: OLAP should be part of an open system architecture and can be embedded wherever the user wants without affecting the functionality of the host tool. The data sources behind an OLAP tool should not be exposed to users, and those sources may be homogeneous or heterogeneous.
Accessibility: OLAP tools should apply their own logical structure to access heterogeneous data sources and perform whatever transformations are needed to present a coherent view to the user. The tool, not the user, should be concerned with where the physical data comes from.
Consistent Reporting Performance: as the number of dimensions increases, the performance of an OLAP tool should not degrade significantly.
Client-Server Architecture: the server component of an OLAP tool should be intelligent enough that various clients can connect to it easily. The server should be able to map and consolidate data across different databases.
Generic Dimensionality: every data dimension should be equivalent in both its structure and its operational capabilities.
Dynamic Sparse Matrix Handling: the physical structure of the OLAP server should handle sparse matrices optimally.
Multi-User Support: OLAP tools must provide concurrent retrieval and update access while preserving integrity and security.
Unrestricted Cross-dimensional Operations: computation facilities must allow calculation and data manipulation across any number of data dimensions and must not restrict any relationship between data cells.
Intuitive Data Manipulation: operations inherent in the consolidation path, such as drilling down or rolling up (zooming out), should be performed by directly manipulating the cells of the analytical model, not through menus or multiple trips across the user interface.
Flexible Reporting: reporting tools should be able to present information in any way the user wants to view it.
Unlimited Dimensions and Aggregation Levels: an OLAP tool should not restrict the number of dimensions or aggregation levels.
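Rule 7 (dynamic sparse matrix handling) matters because most cells of a real cube are empty: not every region sells every product in every month. The Python sketch below, built on a hypothetical toy cube with made-up (region, product, month) members and values, stores only the non-empty cells in a dictionary instead of allocating a dense array. It illustrates the idea only and is not how any particular OLAP server implements sparsity.

```python
# Hypothetical dimension members for a toy cube.
regions = ["Zhejiang", "Jiangsu", "Shanghai"]
products = ["electronics", "clothing", "food", "books"]
months = [f"2010-{m:02d}" for m in range(1, 13)]

# Sparse storage: only cells that actually hold a measure are kept.
sparse_cube = {
    ("Zhejiang", "electronics", "2010-04"): 120.0,
    ("Zhejiang", "electronics", "2010-05"): 95.0,
    ("Shanghai", "books", "2010-06"): 40.0,
}

def cell(region, prod, month):
    """Return the measure for a cell, treating a missing cell as empty (0.0)."""
    return sparse_cube.get((region, prod, month), 0.0)

# Compare the size a dense array would need with the cells actually stored.
dense_size = len(regions) * len(products) * len(months)
density = len(sparse_cube) / dense_size
print(dense_size, len(sparse_cube), f"{density:.1%}")
```

A dense layout would reserve all 144 cells. The dictionary keeps three and answers every other lookup with the default, which is the trade-off a sparse-aware engine makes automatically.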
1.4 History of OLAP
Although the concept of OLAP was proposed in 1993, the history of OLAP-related products can be traced back to 1975:
The first OLAP product, Express, was introduced in 1975. It remained in use for more than 30 years, including after its acquisition by Oracle, and was eventually succeeded by the OLAP capabilities of Oracle 9i.
In 1979, VisiCalc, the first electronic spreadsheet for personal computers, was launched. VisiCalc had the basic row-and-column structure that is standard in most spreadsheet applications today.
In 1982, Comshare developed System W, a new decision support system. It was not only the first OLAP tool for the financial domain but also the first tool to apply the hypercube approach to multidimensional modeling.
In 1983, Lotus Development Corporation launched Lotus 1-2-3. Its structure was similar to VisiCalc's, and it quickly replaced VisiCalc, becoming the dominant spreadsheet application before Windows.
In 1984, Metaphor, the first ROLAP product, was released. It introduced new concepts such as client/server computing, multidimensional processing of relational data, workgroup processing, and object-oriented development.
In 1985, Excel 1.0 was released. Microsoft's later integration of PivotTables into Excel is probably one of the most important enhancements to the product, because the PivotTable has become the most popular and widely used tool for multidimensional analysis.
In 1989, the SQL-89 standard was published; SQL can extract and process business data from relational databases. This may have been a turning point: in the 1980s, spreadsheets were absolutely dominant in OLAP applications, but from the 1990s onward, more and more database-based OLAP applications appeared:
1992: Arbor Software (which later merged with Hyperion Software to form Hyperion Solutions) released Essbase (Extended Spreadsheet Database), which had become a major OLAP server product on the market by 1997.
1997: PARIS Technologies launched PowerOLAP, which integrates spreadsheets with transactional databases so that data in spreadsheet applications such as Excel can be updated instantly.
1999: Microsoft OLAP Services was released; it was renamed Microsoft Analysis Services in 2000.
2012: PARIS Technologies launched OLATION, which combines relational and multidimensional database technologies (on SQL Server, SAP HANA, Oracle, and others) to provide "true online" updates to actual and planned data.
1.5 Core concepts and basic operations of OLAP
1.5.1 Core concepts
Dimension: a dimension is a set of attributes that describes a business subject; a single attribute or a group of attributes can form a dimension. For example, time, geographic location, age, and gender are all dimensions.
Dimension level (Level of Dimension): a dimension often has multiple levels. For example, the time dimension can be divided into year, quarter, month, and day, and a regional dimension into country, region, province, and city. A level represents how fine-grained the data is and corresponds to a concept hierarchy; the roll-up operation described later maps lower-level concepts to higher-level ones. A concept hierarchy can be defined not only by total or partial orders among concepts but also by discretizing or grouping data values.
Member of dimension (Member of Dimension): if a dimension has multiple levels, a combination of values from its different levels forms a dimension member. A subset of the levels can also form a member; for example, "a given year, a given quarter" and "a given quarter, a given month" can both be members of the time dimension.
Measure: the value of a fact at a particular dimension member. For example, "the development department has 39 Han male employees" expresses the headcount measure at one member of the three dimensions department, ethnicity, and gender.
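To tie the four concepts together, here is a small Python sketch with hypothetical data: a time dimension whose levels are year, quarter, and month; a function that turns a month into its full member path; and fact rows whose headcount measure is looked up at one member of the department, ethnicity, and gender dimensions. Apart from the figure 39 taken from the example above, the names and numbers are made up for illustration.

```python
# Hypothetical time dimension with three levels, from coarse to fine.
TIME_LEVELS = ("year", "quarter", "month")

def time_member(month: str) -> tuple:
    """Map a month such as '2010-05' to its member path (year, quarter, month)."""
    year, mm = month.split("-")
    quarter = f"Q{(int(mm) - 1) // 3 + 1}"
    return (year, quarter, month)

# Fact rows: dimension values (department, ethnicity, gender) plus a measure.
facts = [
    {"department": "development", "ethnicity": "Han", "gender": "male", "headcount": 39},
    {"department": "development", "ethnicity": "Han", "gender": "female", "headcount": 12},
    {"department": "sales", "ethnicity": "Han", "gender": "male", "headcount": 20},
]

def measure(department, ethnicity, gender):
    """Return the headcount measure at one member of the three dimensions."""
    key = (department, ethnicity, gender)
    return sum(f["headcount"] for f in facts
               if (f["department"], f["ethnicity"], f["gender"]) == key)

print(dict(zip(TIME_LEVELS, time_member("2010-05"))))
print(measure("development", "Han", "male"))
```

Here the month is the finest level, and the (year, quarter) prefix of the member path is the coarser member that a roll-up would aggregate to.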
1.5.2 Basic operations
OLAP operations are essentially queries, that is, SELECT operations against a database. These queries can be very complex: a query on a relational database may join multiple tables and use aggregate functions such as COUNT, SUM, and AVG. On top of the multidimensional model, OLAP defines a set of common analysis-oriented operations that make such queries more intuitive.
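The kind of query described above can be sketched with Python's built-in sqlite3 module. The two tables and their rows are hypothetical. The query joins them and applies COUNT, SUM, and AVG per product category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales (product_id INTEGER REFERENCES product(product_id),
                    region TEXT, amount REAL);
INSERT INTO product VALUES (1, 'electronics'), (2, 'clothing');
INSERT INTO sales VALUES
  (1, 'Zhejiang', 100.0), (1, 'Jiangsu', 300.0), (2, 'Zhejiang', 50.0);
""")

# Join two tables and aggregate: one output row per product category.
rows = conn.execute("""
SELECT p.category,
       COUNT(*)      AS orders,
       SUM(s.amount) AS total,
       AVG(s.amount) AS avg_amount
FROM sales AS s JOIN product AS p ON s.product_id = p.product_id
GROUP BY p.category
ORDER BY p.category
""").fetchall()
print(rows)
```

Every OLAP operation that follows can ultimately be expressed as a variation of this pattern: different GROUP BY columns, different WHERE predicates, or a different layout of the result.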
The multidimensional analysis operations of OLAP include drill-down, roll-up, slice, dice, and pivot. Let's take the data cube as an example.
Drill-down: moves from a higher level of a dimension to a lower one, splitting summary data into more detailed data. For example, drilling down from total sales for the second quarter of 2010 shows the monthly sales for April, May, and June 2010, as shown above; likewise, drilling down into Zhejiang Province shows the sales figures for cities such as Hangzhou, Ningbo, and Wenzhou.
Roll-up: the reverse of drill-down, aggregating fine-grained data into higher-level data. For example, summarizing the sales of Jiangsu, Zhejiang, and Shanghai gives the total sales for the Jiangsu-Zhejiang-Shanghai region, as shown in the figure above.
Slice: selects a specific value of one dimension for analysis, such as only the sales data for electronic products, or only the data for the second quarter of 2010.
Dice: selects a range of values or a set of specific values in the dimensions for analysis, such as sales data from the first quarter to the second quarter of 2010, or sales data for electronics and daily goods.
Pivot: swaps the positions of dimensions, much like transposing the rows and columns of a two-dimensional table; in the figure, for example, the product and region dimensions are swapped by rotation.
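The five operations above can be sketched in a few lines of plain Python over a list of fact rows. The rows, and the city/province and month/quarter hierarchies they encode, are hypothetical. The helper names `aggregate` and `where` are invented for this sketch and do not belong to any OLAP library. A real engine would push the same logic into its storage layer.

```python
from collections import defaultdict

# Hypothetical fact rows: city, province, month, quarter, product, sales.
COLS = ("city", "province", "month", "quarter", "product", "sales")
FACTS = [
    ("Hangzhou", "Zhejiang", "2010-04", "2010-Q2", "electronics", 10),
    ("Ningbo",   "Zhejiang", "2010-05", "2010-Q2", "electronics", 20),
    ("Wenzhou",  "Zhejiang", "2010-06", "2010-Q2", "food",         5),
    ("Nanjing",  "Jiangsu",  "2010-04", "2010-Q2", "food",         7),
    ("Nanjing",  "Jiangsu",  "2010-01", "2010-Q1", "electronics",  3),
    ("Hangzhou", "Zhejiang", "2010-07", "2010-Q3", "clothing",     8),
]

def aggregate(rows, by):
    """Sum the sales measure grouped by the given dimension columns."""
    out = defaultdict(int)
    for r in rows:
        rec = dict(zip(COLS, r))
        out[tuple(rec[c] for c in by)] += rec["sales"]
    return dict(out)

def where(rows, **conds):
    """Keep rows whose value for each named column is in the allowed set."""
    return [r for r in rows
            if all(dict(zip(COLS, r))[c] in allowed for c, allowed in conds.items())]

# Roll-up: aggregate cities up to provinces.
by_province = aggregate(FACTS, ["province"])
# Drill-down: split the 2010-Q2 total into its months.
q2_by_month = aggregate(where(FACTS, quarter={"2010-Q2"}), ["month"])
# Slice: fix the product dimension to a single value.
electronics = aggregate(where(FACTS, product={"electronics"}), ["province"])
# Dice: restrict several dimensions to ranges or sets of values.
diced = aggregate(where(FACTS, quarter={"2010-Q1", "2010-Q2"},
                        product={"electronics", "food"}), ["quarter", "product"])
# Pivot: swap which dimension indexes rows and which indexes columns.
grid = aggregate(FACTS, ["province", "product"])
pivoted = {(prod, prov): v for (prov, prod), v in grid.items()}

print(by_province, q2_by_month, electronics, diced, pivoted, sep="\n")
```

Slice and dice share the same filter here; the only difference is whether one dimension is pinned to a single value or several dimensions are restricted to sets or ranges.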
1.6 Classification of OLAP
According to the data storage mode, it can be divided into MOLAP, ROLAP and HOLAP.
1.6.1 Multidimensional OLAP (MOLAP)
MOLAP is the classic form of OLAP. MOLAP stores data in optimized multidimensional arrays rather than in a relational database. Dimension attribute values are mapped to subscripts (or subscript ranges) of the multidimensional array, and the measure data is stored as values in the array's cells. Because MOLAP adopts a new storage structure implemented at the physical layer, it is also called physical OLAP (PhysicalOLAP). ROLAP, by contrast, is implemented mainly through software tools or middleware, while the physical layer still uses the storage structure of a relational database, so it is called virtual OLAP (VirtualOLAP).
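The mapping described above, from dimension values to array subscripts with measures stored in the cells, can be sketched in Python as follows. The region and quarter members and the stored values are hypothetical. A nested list stands in for the optimized multidimensional array a real MOLAP engine would use.

```python
# Hypothetical dimension members; each list position becomes an array subscript.
regions = ["Zhejiang", "Jiangsu", "Shanghai"]
quarters = ["Q1", "Q2", "Q3", "Q4"]
region_idx = {name: i for i, name in enumerate(regions)}
quarter_idx = {name: i for i, name in enumerate(quarters)}

# Dense 2-D array: cube[region][quarter] holds the sales measure.
cube = [[0.0] * len(quarters) for _ in regions]

def store(region, quarter, value):
    """Translate dimension values into subscripts and write the measure."""
    cube[region_idx[region]][quarter_idx[quarter]] = value

def lookup(region, quarter):
    """Read a cell by translating dimension values into subscripts."""
    return cube[region_idx[region]][quarter_idx[quarter]]

store("Zhejiang", "Q2", 250.0)
store("Shanghai", "Q1", 80.0)
print(lookup("Zhejiang", "Q2"))
# A whole row of the cube is reached by direct indexing, with no search or join.
print(sum(cube[region_idx["Zhejiang"]]))
```

Direct subscript arithmetic is what makes MOLAP lookups fast. The cost is that every combination of members gets a cell, even if most cells stay empty.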
Some MOLAP tools require data to be precomputed and stored. Such tools usually rely on precomputed data sets called "data cubes", which contain the answers to every question within a given scope, so they respond to queries very quickly. On the other hand, depending on the degree of precomputation, updates may take a long time, and precomputation can also lead to so-called data explosion.
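A rough sketch of full precomputation shows where data explosion comes from. With n dimensions there are 2^n group-by combinations (cuboids), and hierarchy levels multiply this further. The Python code below, using hypothetical facts and a made-up `build_cube` helper, materializes every cuboid so that later queries become dictionary lookups.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("region", "product", "quarter")
facts = [
    {"region": "Zhejiang", "product": "electronics", "quarter": "Q1", "sales": 10},
    {"region": "Zhejiang", "product": "food",        "quarter": "Q2", "sales": 5},
    {"region": "Jiangsu",  "product": "electronics", "quarter": "Q2", "sales": 7},
]

def build_cube(rows, dims):
    """Precompute SUM(sales) for every subset of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(int)
            for r in rows:
                agg[tuple(r[d] for d in group)] += r["sales"]
            cube[group] = dict(agg)
    return cube

cube = build_cube(facts, DIMS)
print(len(cube))                          # 2**3 cuboids
print(cube[("region",)][("Zhejiang",)])   # precomputed regional total
print(cube[()][()])                       # precomputed grand total
```

Adding a fourth dimension doubles the number of cuboids to 16, and rebuilding them all is what makes updates slow in heavily precomputed MOLAP systems.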
1.6.2 Relational OLAP (ROLAP)
ROLAP stores the multidimensional data for analysis in a relational database. This approach relies on SQL to implement the traditional OLAP slice-and-dice functions; in essence, slicing, dicing, and similar actions amount to adding WHERE clauses to SQL statements. Instead of using precomputed cubes, ROLAP tools query standard relational databases and their tables to get the data needed to answer a question. ROLAP tools can therefore answer any question, because the method (SQL) is not limited to the contents of a cube.
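The point that slicing and dicing reduce to WHERE clauses can be shown with sqlite3 and a hypothetical sales table. The slice pins one dimension with an equality predicate. The dice combines a range (BETWEEN) on the time dimension with a value list (IN) on the product dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, category TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2010-Q1", "electronics", "Zhejiang", 100.0),
    ("2010-Q2", "electronics", "Jiangsu",  200.0),
    ("2010-Q2", "food",        "Zhejiang",  50.0),
    ("2010-Q3", "clothing",    "Jiangsu",   70.0),
])

# Slice: fix one dimension to a single value with an equality predicate.
slice_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE category = ?", ("electronics",)
).fetchone()[0]

# Dice: restrict several dimensions with a range and a value list.
dice_rows = conn.execute(
    """SELECT quarter, category, SUM(amount) FROM sales
       WHERE quarter BETWEEN ? AND ? AND category IN (?, ?)
       GROUP BY quarter, category
       ORDER BY quarter, category""",
    ("2010-Q1", "2010-Q2", "electronics", "food"),
).fetchall()

print(slice_total)
print(dice_rows)
```

The quarter range works here only because the hypothetical labels sort correctly as strings. A real schema would more likely filter on a date key in a date dimension.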
Although ROLAP uses relational databases as its underlying storage, these databases are generally optimized for ROLAP, for example with parallel storage, parallel queries, parallel data management, cost-based query optimization, bitmap indexes, and SQL OLAP extensions (CUBE, ROLLUP). A database designed for OLTP generally does not perform as well as one optimized for ROLAP.
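In databases that support the SQL OLAP extensions (for example, PostgreSQL or Oracle), `GROUP BY ROLLUP(region, product)` returns detail rows, per-region subtotals, and a grand total in a single statement. SQLite, used here only because it ships with Python, does not support ROLLUP. The sketch below therefore emulates the same result with UNION ALL over a hypothetical sales table. Being able to write this in one clause is part of what an OLAP-optimized relational database adds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Zhejiang", "electronics", 100.0),
    ("Zhejiang", "food",         50.0),
    ("Jiangsu",  "electronics", 200.0),
])

# Emulation of GROUP BY ROLLUP(region, product): detail rows, per-region
# subtotals, and a grand total, with NULL marking the rolled-up columns.
rows = conn.execute("""
SELECT region, product, SUM(amount) FROM sales GROUP BY region, product
UNION ALL
SELECT region, NULL, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
""").fetchall()

for r in rows:
    print(r)
```

The emulation scans the table three times, whereas a database with native ROLLUP support can plan the hierarchy of aggregates in one pass.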
1.6.3 Hybrid OLAP (HOLAP)
Because MOLAP and ROLAP each have their own strengths and weaknesses, and their structures differ greatly, designing an OLAP architecture is a difficult problem for analysts. To address this, a new OLAP architecture, Hybrid OLAP (HOLAP), was proposed. It bridges the technological gap between the two approaches by allowing both a multidimensional database (MDDB) and a relational database (RDBMS) to serve as data storage. Model designers decide which data goes into the MDDB and which into the RDBMS; for example, large volumes of detailed data are stored in relational tables, while precomputed aggregates are stored in cubes. At present, the industry has not reached a clear consensus on what "hybrid OLAP" means.
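The split described above can be sketched as a simple query router. The setup is hypothetical: a dictionary of precomputed region totals plays the MDDB role, and an SQLite table holding city-level detail plays the RDBMS role. Since the industry has no agreed definition of HOLAP, treat this as one possible interpretation rather than a reference design.

```python
import sqlite3
from collections import defaultdict

# Detail data stays in a relational table (the "RDBMS" side).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Zhejiang", "Hangzhou", 100.0),
    ("Zhejiang", "Ningbo",    40.0),
    ("Jiangsu",  "Nanjing",   60.0),
])

# Pre-aggregated region totals live in an in-memory "cube" (the "MDDB" side).
region_cube = defaultdict(float)
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    region_cube[region] += amount

def query(region, city=None):
    """Serve region totals from the cube; fall back to SQL for city detail."""
    if city is None:
        return region_cube.get(region, 0.0)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND city = ?",
        (region, city),
    ).fetchone()
    return row[0]

print(query("Zhejiang"))            # served from the precomputed cube
print(query("Zhejiang", "Ningbo"))  # served from the relational table
```

Summary queries get MOLAP-style speed, while detailed queries keep ROLAP's flexibility. The designer decides at which level to draw the line.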
1.6.4 Comparative Analysis of MOLAP and ROLAP
1.7 Relationship between OLAP and other concepts
1.7.1 OLAP vs OLTP
The two are designed for completely different goals:
OLTP (On-Line Transaction Processing) is commonly used in operational business systems. OLTP places very high demands on transaction processing; such systems are generally highly available online systems built mainly on traditional relational databases. Their workloads consist mostly of small transactions and small queries, and they are usually evaluated by the number of transactions and SQL statements executed per second. In such a system, a single database often processes hundreds or even thousands of transactions (inserts, updates, deletes) per second and executes thousands or even tens of thousands of SELECT statements per second. Typical OLTP systems include e-commerce systems, banking transaction systems, and securities trading systems.
OLAP is generally used in analytical systems. Its workloads consist mainly of queries over large volumes of data, with few updates or deletes. In such systems, the number of SQL statements executed is not a meaningful metric, because a single statement may run for a long time and read a great deal of data. Instead, these systems are usually evaluated by throughput, response time for complex queries, data loading performance, and so on.
A detailed comparison between the two is as follows:
1.7.2 OLAP vs data warehouse / data mart
There are several ways to model a data warehouse:
ER model (entity-relationship model)
Data Vault model
Anchor model
Dimensional model
The first three models mainly aim to integrate data from the various business systems into a unified data warehouse and process it consistently, providing data models and atomic data in third normal form (3NF) or higher. Such a data warehouse is called an enterprise data warehouse under the CIF (Corporate Information Factory) architecture, which is advocated by Inmon, the father of the data warehouse. However, because the model is normalized, querying this atomic data is very difficult, and the architecture cannot directly support analytical decision-making. To better support analysis, this architecture usually builds data subsets, known as data marts, on top of the data warehouse. These data marts usually use dimensional models, and OLAP tools work on top of them; in other words, in this architecture OLAP systems are usually built on data marts.
The fourth model, the dimensional model, was proposed by Kimball, another leading figure in data warehousing, and is currently the most popular data warehouse modeling method. A dimensional model not only supports decision-making analysis but also delivers good response times for large-scale complex queries, and it can be connected directly to OLAP tools. In the data warehouse architecture Kimball advocates (see the accompanying figure), the warehouse can provide OLAP capabilities directly, so a data warehouse built this way is itself an OLAP system.
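A minimal star schema helps make the dimensional model concrete: a central fact table holds foreign keys and numeric measures, and the surrounding dimension tables hold descriptive attributes, including hierarchy levels. All table names and rows below are hypothetical, and the sketch uses sqlite3 only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables carry descriptive attributes, including hierarchy levels.
CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT, province TEXT);
-- The fact table holds a foreign key to each dimension plus numeric measures.
CREATE TABLE fact_sales (date_key  INTEGER REFERENCES dim_date(date_key),
                         store_key INTEGER REFERENCES dim_store(store_key),
                         amount REAL);
INSERT INTO dim_date  VALUES (20100415, '2010-04', '2010-Q2', 2010),
                             (20100520, '2010-05', '2010-Q2', 2010);
INSERT INTO dim_store VALUES (1, 'Hangzhou', 'Zhejiang'), (2, 'Nanjing', 'Jiangsu');
INSERT INTO fact_sales VALUES (20100415, 1, 100.0), (20100520, 1, 30.0),
                              (20100520, 2, 70.0);
""")

# A typical dimensional query: join the fact table to its dimensions and
# group by attributes at the desired level (quarter, province).
rows = conn.execute("""
SELECT d.quarter, s.province, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_date  AS d ON f.date_key  = d.date_key
JOIN dim_store AS s ON f.store_key = s.store_key
GROUP BY d.quarter, s.province
ORDER BY s.province
""").fetchall()
print(rows)
```

Because every hierarchy level is just a column in a dimension table, rolling up or drilling down only changes the GROUP BY list. This is why a dimensional model connects so directly to OLAP tools.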
1.7.3 OLAP vs BI tools
BI stands for Business Intelligence. It is a collection of technologies that use data to improve the quality of decision-making, a process of extracting information and knowledge from large amounts of data. OLAP and BI often appear together, and OLAP is one of the underlying technologies of BI tools. BI tools can usually connect to OLAP systems, but they are not limited to them; they can also connect directly to other databases and storage systems.
1.7.4 OLAP vs ad hoc query
Ad hoc is a common Latin phrase meaning "for this (purpose)", that is, created for a specific purpose or on a temporary basis. An ad hoc query (Ad Hoc Query) is a query that users create dynamically according to their own needs, as opposed to a predefined query.
Ad hoc queries do not require a particular data model, only the ability to run queries dynamically, whereas OLAP systems generally require a multidimensional data model. ROLAP systems usually provide ad hoc query capabilities as well, and the difference between the two is small, so the terms are often used interchangeably.
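The difference between a predefined report and an ad hoc query can be sketched as a small query builder. Instead of a fixed SQL statement, the grouping columns and filters arrive at run time from the user. The `ad_hoc` function, its column whitelist, and the sales table are all hypothetical. Column names are checked against the whitelist because identifiers cannot be passed as SQL parameters, while filter values are bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Zhejiang", "electronics", "2010-Q1", 100.0),
    ("Zhejiang", "food",        "2010-Q2",  50.0),
    ("Jiangsu",  "electronics", "2010-Q2", 200.0),
])

# Only known column names may appear as identifiers (guards against SQL injection).
ALLOWED = {"region", "product", "quarter"}

def ad_hoc(group_by, filters):
    """Build and run a query from user choices at run time, not a fixed report."""
    if not group_by or not set(group_by) <= ALLOWED or not set(filters) <= ALLOWED:
        raise ValueError("unknown or missing column")
    cols = ", ".join(group_by)
    where = " AND ".join(f"{c} = ?" for c in filters) or "1 = 1"
    sql = (f"SELECT {cols}, SUM(amount) FROM sales WHERE {where} "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql, tuple(filters.values())).fetchall()

print(ad_hoc(["region"], {"product": "electronics"}))
print(ad_hoc(["quarter", "product"], {}))
```

Nothing here depends on a multidimensional model: any table with suitable columns would do. This matches the point above that ad hoc querying needs only dynamic query capability.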