2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the basic architecture of a data warehouse: where its data comes from, how the data is stored and processed, and how it is used.
Basic Architecture of the Data Warehouse
The purpose of a data warehouse is to build an analysis-oriented, integrated data environment that provides decision support for the enterprise. The data warehouse itself neither "produces" nor "consumes" data: the data comes from external sources and is made available to external applications, which is why it is called a "warehouse" rather than a "factory". The basic architecture therefore centers on how data flows in and out, and can be divided into three layers: source data, the data warehouse, and data applications.
As the figure shows, the data in the data warehouse comes from different sources and serves a variety of data applications. The data flows into the data warehouse from top to bottom and is then opened to the applications in the upper layer, while the data warehouse itself is only an intermediate platform for integrated data management.
Data Sources of the Data Warehouse
The process by which the data warehouse obtains data from its various sources, and by which data is converted and moved within the warehouse, can be regarded as ETL (Extract, Transform, Load). ETL is the assembly line of the data warehouse and can also be thought of as its bloodstream: it sustains the metabolism of data in the warehouse, and most of the effort in day-to-day warehouse management and maintenance goes into keeping ETL running normally and stably.
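The three ETL stages can be sketched in a few lines of Python. This is only an illustration, not a production pipeline: the source rows, the orders table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an export from an operational system.
SOURCE_ROWS = [
    {"user_id": "1", "province": " Zhejiang ", "amount": "19.90"},
    {"user_id": "2", "province": "guangdong", "amount": "5.00"},
    {"user_id": "3", "province": "", "amount": "not-a-number"},
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: normalize types and values, dropping rows that cannot be repaired."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        province = row["province"].strip().title()
        if not province:
            continue  # skip rows without a usable province
        cleaned.append((int(row["user_id"]), province, amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(user_id INTEGER, province TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order and return the number of rows in the table."""
    load(conn, transform(extract()))
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)
```

Keeping the stages as separate functions makes it easier to monitor and rerun each one independently, which is where most of the maintenance effort described above tends to go.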
Data Storage of the Data Warehouse
The data warehouse does not need to store all of the original data, but it does need to store detailed data, and that data should be organized by subject. A brief explanation:
a. Why not store all the raw data? A data warehouse is oriented toward analysis, and some source data either has no analytical value or has a possible value far lower than the implementation and performance cost of storing it in the warehouse. For example, knowing a user's province and city is usually sufficient; the exact street address may matter only to the logistics provider. Similarly, a user's blog comments might be useful only for text mining, and storing those lengthy comments in the data warehouse costs more than it is worth.
b. Why keep detailed data? Detailed data is necessary because the analysis requirements placed on a data warehouse change constantly, and with detailed data we can adapt to whatever changes come. If we stored only data models built for specific requirements, we would clearly be at a loss when requirements change frequently.
c. Why should it be subject-oriented? Subject orientation is the first characteristic of a data warehouse. It means organizing data sensibly so that analysis can be carried out by subject area. Source data is organized in many different ways: clickstream data is in an unoptimized format, and the data in front-end databases is organized and optimized for OLTP operations, which may not suit analysis. Reorganizing the data by subject does help analysis. For example, sorting the clickstream log into three subjects, page (Page), visit (Visit or Session), and user (Visitor), noticeably improves the efficiency of analysis.
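The clickstream example in point c can be illustrated with a small Python sketch. The log layout, field names, and the organize_by_subject function are assumptions made for illustration; a real clickstream log will look different.

```python
from collections import defaultdict

# Hypothetical raw clickstream records:
# (visitor_id, session_id, page_url, seconds_on_page)
RAW_LOG = [
    ("v1", "s1", "/home", 10),
    ("v1", "s1", "/product", 30),
    ("v2", "s2", "/home", 5),
    ("v1", "s3", "/home", 20),
]


def organize_by_subject(log):
    """Reorganize a flat click log into page, visit, and visitor subjects."""
    pages = defaultdict(lambda: {"pageviews": 0, "seconds": 0})
    visits = defaultdict(lambda: {"visitor": None, "pages": []})
    visitors = defaultdict(set)
    for visitor_id, session_id, url, seconds in log:
        # Page subject: one entry per URL.
        pages[url]["pageviews"] += 1
        pages[url]["seconds"] += seconds
        # Visit subject: one entry per session, with the pages seen in order.
        visits[session_id]["visitor"] = visitor_id
        visits[session_id]["pages"].append(url)
        # Visitor subject: one entry per visitor, with the sessions they made.
        visitors[visitor_id].add(session_id)
    return dict(pages), dict(visits), dict(visitors)


pages, visits, visitors = organize_by_subject(RAW_LOG)
```

Once the log is split this way, a question such as "which pages did visit s1 include" becomes a single lookup instead of a scan over the raw log.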
On top of the detailed data it keeps, the data warehouse processes the data so that it can actually be used for analysis. This processing covers three main aspects:
1. Aggregation of data
Aggregated data here refers to simple aggregation based on specific requirements (aggregation over multiple dimensions belongs to the multidimensional data model). Simple aggregates can be site-wide totals such as Pageviews, Visits, and Unique Visitors, or averages such as Avg. Time on Page and Avg. Time on Site. These figures can be shown directly in reports.
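As a rough sketch of such simple aggregates, the following Python computes Pageviews, Visits, Unique Visitors, and Avg. Time on Page from a handful of records. The record layout and the simple_aggregates function are hypothetical.

```python
# Hypothetical page-view records: (visitor_id, session_id, seconds_on_page)
PAGE_VIEWS = [
    ("v1", "s1", 10),
    ("v1", "s1", 30),
    ("v2", "s2", 5),
    ("v1", "s3", 15),
]


def simple_aggregates(page_views):
    """Compute site-wide totals and one average from detailed page-view rows."""
    pageviews = len(page_views)
    visits = len({session for _, session, _ in page_views})
    unique_visitors = len({visitor for visitor, _, _ in page_views})
    total_seconds = sum(seconds for _, _, seconds in page_views)
    avg_time_on_page = total_seconds / pageviews if pageviews else 0.0
    return {
        "pageviews": pageviews,
        "visits": visits,
        "unique_visitors": unique_visitors,
        "avg_time_on_page": avg_time_on_page,
    }


summary = simple_aggregates(PAGE_VIEWS)
```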
2. Multidimensional data model
The multidimensional data model supports multi-angle, multi-level analysis. For example, a sales star schema or snowflake schema built on a time dimension and a region dimension allows cross-queries and drill-down across both dimensions. Applications of the multidimensional data model are therefore generally based on online analytical processing (Online Analytical Processing, OLAP), and data marts for specific groups of users are also built on multidimensional data models.
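A minimal star schema for the sales example might look like the sketch below, which uses Python's built-in sqlite3 module. The table names (dim_time, dim_region, fact_sales), columns, and sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "angles" of analysis.
CREATE TABLE dim_time   (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, province TEXT, city TEXT);
-- The fact table holds the measures and points at each dimension.
CREATE TABLE fact_sales (
    time_id   INTEGER REFERENCES dim_time(time_id),
    region_id INTEGER REFERENCES dim_region(region_id),
    amount    REAL
);
INSERT INTO dim_time   VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_region VALUES (1, 'Zhejiang', 'Hangzhou'), (2, 'Guangdong', 'Shenzhen');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0), (2, 1, 30.0);
""")


def sales_by_month_and_province(conn):
    """Cross-query the time and region dimensions: total sales per month and province."""
    return conn.execute("""
        SELECT t.year, t.month, r.province, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time   AS t ON t.time_id = f.time_id
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY t.year, t.month, r.province
        ORDER BY t.year, t.month, r.province
    """).fetchall()


rows = sales_by_month_and_province(conn)
```

Changing the GROUP BY columns (for example, to year only, or to city instead of province) gives a coarser or finer view of the same facts. A snowflake schema would further normalize the dimension tables, for instance by splitting province into its own table.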
3. Business model
The business model here refers to data models built for particular data analysis and decision support needs, such as a user evaluation model, an association recommendation model, or an RFM analysis model, as well as decision support models such as linear programming and inventory models. The data preprocessing required for data mining can also be done at this stage.
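Taking RFM (Recency, Frequency, Monetary) as one example of such a business model, the sketch below derives the three raw values per customer from detailed order data. The order layout, the rfm function, and the reference date are hypothetical, and a real RFM model would usually go on to bucket these values into scores.

```python
from datetime import date

# Hypothetical detailed orders: (customer_id, order_date, amount)
ORDERS = [
    ("c1", date(2024, 3, 1), 100.0),
    ("c1", date(2024, 3, 20), 50.0),
    ("c2", date(2024, 1, 5), 20.0),
]


def rfm(orders, as_of):
    """Compute raw recency (days), frequency (orders), and monetary (total) per customer."""
    summary = {}
    for customer, order_date, amount in orders:
        entry = summary.setdefault(
            customer, {"last": order_date, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], order_date)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in summary.items()
    }


result = rfm(ORDERS, as_of=date(2024, 3, 31))
```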
Data Applications of the Data Warehouse
Report display
Reports are an indispensable data application in almost every data warehouse. They present aggregated data and multidimensional analysis data, and are the simplest and most intuitive way to deliver data.
Ad hoc (instant) query
In theory, all data in the data warehouse (detailed data, aggregated data, multidimensional data, and analytical data) should be open to ad hoc query. This is a flexible way to obtain data: users can query for exactly the data they need.
Data analysis
Most data analysis is based on the business models that have been built. Aggregated data can also be used for trend analysis, comparative analysis, correlation analysis, and so on, while the multidimensional data model provides the data basis for multidimensional analysis. Drawing sample data from the detailed data for a specific analysis is also a common approach.
Data mining
Data mining applies more advanced algorithms to uncover results in the data that are often surprising. It can build on the business models already present in the data warehouse, but most of the time it starts directly from the detailed data, and the data warehouse provides data interfaces for mining tools such as SAS and SPSS.
The development process of the data warehouse:
On day 1, there are only a number of systems that essentially perform operational processing.
On day 2, data is loaded into the first few tables of the first subject area of the data warehouse. This arouses some curiosity, and users begin to discover the data warehouse and its analytical processing.
On day 3, more data is loaded into the data warehouse, and as the volume grows it attracts more users. Once users find an integrated data source that is easy to access and has a historical basis for looking at data over time, their interest goes beyond curiosity. Around this time, serious DSS analysts are gradually drawn to the data warehouse.
On day 4, as more data is loaded, some of the data that used to reside in the operational environment is properly placed in the data warehouse. The data warehouse is now "discovered" as a source of information for analytical processing, and all sorts of DSS applications spring up. In fact, with the large volume of data now stored in the warehouse, there are so many users and so many processing requests that some users' requests and analyses are delayed. Competition for access to the data warehouse becomes an obstacle to its use.
On day 5, departmental databases (data marts, or OLAP) begin to appear. Departments find that they can do their processing cheaply and easily by bringing data from the data warehouse into their own departmental processing environment. The data at the departmental level attracts some DSS analysts.
On day 6, the departmental systems are thriving. It is cheaper, faster, and easier to get data from a departmental system than from the data warehouse, and end users soon move away from the detailed data in the warehouse to departmental processing.
On day n, the architecture is fully developed. Only operational processing remains in the original set of production systems. The data warehouse holds a wealth of data and has a few direct users and many departmental databases. Because it is easy and cheap to get the data needed for processing at the departmental level, most DSS analytical processing is done there.
Of course, the evolution from day 1 to day n takes a long time, usually several years. Throughout that period, the DSS environment continues to improve while remaining functional.
Metadata Management
Metadata should really be called explanatory data, or a data dictionary: it is data about data. It mainly records the definitions of the models in the data warehouse, the mappings between layers, the monitored data state of the warehouse, and the running status of ETL tasks. Metadata is generally stored and managed centrally in a metadata repository (Metadata Repository). Its main purpose is to keep the design, deployment, operation, and management of the data warehouse coordinated and consistent.
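To make the idea concrete, here is a toy in-memory metadata repository in Python. The class names, fields, and table names are all hypothetical; real metadata repositories are typically backed by a database and carry far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Definition of one table: its layer, its columns, and where its data comes from."""
    name: str
    layer: str                      # e.g. "source", "warehouse", "aggregate"
    columns: dict                   # column name -> type description
    source_tables: list = field(default_factory=list)


@dataclass
class MetadataRepository:
    """Toy repository holding model definitions, mappings, and ETL task status."""
    tables: dict = field(default_factory=dict)
    etl_status: dict = field(default_factory=dict)

    def register(self, table):
        self.tables[table.name] = table

    def record_run(self, task, status):
        self.etl_status[task] = status

    def lineage(self, name):
        """Return every upstream table of `name`, nearest first."""
        upstream, queue = [], list(self.tables[name].source_tables)
        while queue:
            current = queue.pop(0)
            if current in upstream:
                continue
            upstream.append(current)
            if current in self.tables:
                queue.extend(self.tables[current].source_tables)
        return upstream


repo = MetadataRepository()
repo.register(TableMetadata("raw_clicks", "source", {"url": "text"}))
repo.register(TableMetadata("dw_visit", "warehouse", {"session_id": "text"}, ["raw_clicks"]))
repo.register(TableMetadata("agg_daily_visits", "aggregate", {"visits": "integer"}, ["dw_visit"]))
repo.record_run("load_dw_visit", "success")
```

With the mappings recorded, a question such as "which tables feed agg_daily_visits" can be answered from metadata alone, without reading the ETL code.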