2025-01-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains what a data mart is in the context of big data and how it differs from a data warehouse.
1. What is a data mart? What is the difference between a data mart and a data warehouse?
A data warehouse is a subject-oriented, integrated, non-volatile (relatively stable), time-variant (reflecting historical change) collection of data used to support management decisions. The concept can be understood on two levels. First, a data warehouse supports decision-making and is oriented toward analytical data processing, which distinguishes it from an enterprise's existing operational databases. Second, a data warehouse effectively integrates multiple heterogeneous data sources, reorganizes the data by subject, and contains historical data; data stored in the warehouse is generally not modified afterwards. (Note: this definition comes from the book "Building the Data Warehouse" by the well-known data warehouse expert W. H. Inmon.)
A data mart is a repository that collects data from operational systems and other data sources to serve a particular group of professionals. In terms of scope, its data is extracted from enterprise-wide databases, from data warehouses, or from more specialized data warehouses. The point of a data mart is that it caters to the specific needs of its professional users in terms of analysis, content, performance, and ease of use. Users of a data mart expect the data to be expressed in terms they are familiar with.
A data mart is a subset of the enterprise data warehouse. It is mainly oriented toward departmental business and covers only a specific subject. To resolve the tension between flexibility and performance, the data mart is added to the data warehouse architecture as a small, department-level or workgroup-level data warehouse. Data marts store data precomputed for specific users to meet their performance needs, and they can relieve the bottleneck of accessing the data warehouse to some extent.
The main characteristics of a data mart are: 1) small scale; 2) department-oriented; 3) built for specific applications; 4) defined, designed, and developed by business units; 5) managed and maintained by business units; 6) quick to implement; 7) cheaper to purchase; 8) fast return on investment; 9) tightly integrated tool sets; 10) provides a more detailed, pre-existing, summarized subset of the data warehouse; 11) can be upgraded to a complete data warehouse.
The main difference between a data mart and a data warehouse is scope. A data warehouse is enterprise-level and can provide decision support for the operations of every department in the enterprise. A data mart, by contrast, is a miniature data warehouse: it usually holds less data, covers fewer subject areas, and keeps less history. It therefore operates at the departmental level and can only serve managers within a limited scope, which is why it is also called a departmental data warehouse.
Aspect | Data warehouse | Data mart
------ | -------------- | ---------
Source of data | Production systems, external data, etc. | Data warehouse
Scope and scale | Enterprise level | Departmental or workgroup level
Subject | The enterprise as a whole | Departmental or special-purpose analysis
Data granularity | Finest granularity | Coarser granularity
Data structure | Third normal form, normalized structure | Star schema, snowflake schema, constellation schema
Historical data | A large amount of historical data | A moderate amount of historical data
Optimized for | Handling massive data, data exploration | Easy access and analysis, fast queries
Indexes | Highly indexed | Highly indexed
Data marts can be divided into two types: independent data marts and dependent (subordinate) data marts. An independent data mart obtains its data directly from the operational environment, whereas a dependent data mart obtains its data from the enterprise data warehouse. The architecture with dependent data marts is shown in figure 2.
A data warehouse is large in scale and takes a long time to build, which some smaller enterprises find hard to bear. As an effective way to solve an enterprise's immediate practical problems quickly, the independent data mart has therefore become a fait accompli. An independent data mart is an analytical environment set up to meet the needs of specific users (usually at the departmental level). It can solve certain specific problems quickly, and the investment required is much smaller than for a data warehouse.
The existence of independent data marts creates the illusion that one can build data marts independently and then convert them directly into a data warehouse once they reach a certain scale. Some salespeople promote this view, but the real reason is often that the sales cycle for an enterprise data warehouse is too long for them to work with.
Piling up multiple independent data marts does not produce an enterprise-level data warehouse. This follows from the nature of the two: data marts are used by individual departments or workgroups, so inconsistencies between different marts are inevitable. Because independent data marts sit outside any data warehouse, once several of them grow to a certain scale with no unified warehouse to coordinate them, the enterprise merely gains more isolated islands of information and still cannot analyze its data from an enterprise-wide view. To borrow Inmon's analogy, piling up small fish in the sea does not make a whale, which also shows that a data warehouse and a data mart differ in essence.
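The inconsistency between independent marts can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the order records, the field names, and the two departmental definitions of "revenue" are hypothetical and do not come from any real system.

```python
# Hypothetical order records from a shared operational source.
orders = [
    {"order_id": 1, "amount": 100.0, "refunded": False},
    {"order_id": 2, "amount": 250.0, "refunded": True},
    {"order_id": 3, "amount": 75.0, "refunded": False},
]


def sales_mart_revenue(rows):
    """Hypothetical sales mart: counts every booked order as revenue."""
    return sum(r["amount"] for r in rows)


def finance_mart_revenue(rows):
    """Hypothetical finance mart: excludes refunded orders."""
    return sum(r["amount"] for r in rows if not r["refunded"])


sales_total = sales_mart_revenue(orders)
finance_total = finance_mart_revenue(orders)

# The two marts report different "revenue" for the same period.
print(sales_total, finance_total)
```

Both figures are correct within their own mart, yet neither can serve as the enterprise-wide number without a shared definition, which is the coordinating role the text assigns to the data warehouse.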
If the enterprise ultimately wants to build a unified, enterprise-wide data warehouse and analyze its data from an enterprise-wide view, the independent data mart may not be the right choice. In other words, the approach of "building data marts independently first and converting them directly into a data warehouse once they reach a certain scale" is not appropriate. In the long run, the dependent (subordinate) data mart is architecturally more stable than the independent data mart, and it can be regarded as the main direction for building data marts in the future.
2. Why is there a data mart? What are the characteristics of a good data mart?
Although OLTP and legacy systems hold valuable information, extracting meaningful information from them can be difficult and slow. These systems generally support reports for predefined operations, but they often cannot meet an organization's need for historical, consolidated, intelligent, or easily accessible information. The data is spread over many tables across systems and platforms and is usually "dirty", containing inconsistent and invalid values, which makes it hard to analyze.
A data mart merges data from different source systems to meet business information needs. If implemented effectively, it gives quick and easy access to simple information as well as to consolidated and historical views. A well-designed data mart has the following characteristics (some are shared with data warehouses, and some are stated relative to data warehouses):
(1) It holds the information needed by a specific user group, usually the users of one department or a specific organization, and is not subject to the large volume of requirements and the operational crises of the source systems (relative to the data warehouse).
(2) It supports access to non-volatile business information. (Non-volatile information is updated at predetermined intervals and is not affected by ongoing updates in the OLTP systems.)
(3) It reconciles information from the organization's multiple operational systems, such as accounting, sales, inventory, and customer management, as well as industry data from outside the organization.
(4) It provides cleansed data by supplying default valid values, keeping values consistent across systems, and adding descriptions that make implicit codes meaningful.
(5) It provides reasonable query response times for ad hoc analysis and predefined reports (because a data mart is departmental, query and analysis response times are much shorter than with a huge data warehouse).
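The three cleansing steps named in item (4) can be sketched as follows. This is a minimal illustration under stated assumptions: the field names, the code tables, and the choice of 0 as the default quantity are all hypothetical, and real rules would come from the source systems.

```python
# Hypothetical lookup tables; real decoding rules come from the source systems.
GENDER_CODES = {"M": "Male", "F": "Female"}
STATUS_SYNONYMS = {"ACTIVE": "active", "A": "active", "CLOSED": "closed", "C": "closed"}


def cleanse(record):
    """Return a cleansed copy of one source record (a dict)."""
    cleaned = dict(record)
    # 1. Default valid value: replace a missing or invalid quantity with 0.
    qty = record.get("quantity")
    cleaned["quantity"] = qty if isinstance(qty, int) and qty >= 0 else 0
    # 2. Consistency: map each system's status spelling onto one vocabulary.
    raw_status = str(record.get("status", "")).strip().upper()
    cleaned["status"] = STATUS_SYNONYMS.get(raw_status, "unknown")
    # 3. Description: decode an implicit code into a readable label.
    cleaned["gender_desc"] = GENDER_CODES.get(record.get("gender"), "Unknown")
    return cleaned


rows = [
    {"quantity": 3, "status": "A", "gender": "F"},
    {"quantity": -1, "status": " closed ", "gender": "X"},
]
cleaned_rows = [cleanse(r) for r in rows]
for row in cleaned_rows:
    print(row)
```

The original coded fields are kept alongside the added description, so analysts can still trace a cleansed row back to its source values.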
3. Data structure of a data mart
The structure of the data in a data mart is usually described as a star structure or a snowflake structure. A star structure consists of two basic parts: a fact table and the dimension tables that support it.
(1) Fact table
The fact table holds the most voluminous data in a data mart. In a phone company, call data is typically the most voluminous; in a bank, data related to account reconciliation and ATMs is; in retail, sales and inventory data are; and so on.
A fact table is a combination of several kinds of data that have been joined in advance. It includes: the primary key of the entity for which the fact table was built (an order, a sale, a phone call, and so on); information describing that primary key; foreign keys that link the fact table to the dimension tables; and non-key foreign data carried along with the foreign keys. Such non-key foreign data is included in the fact table when it is used frequently in analyses of the fact table. The fact table is highly indexed: 30 to 40 indexes on a fact table are quite common, and sometimes every column is indexed, which makes the data in the fact table very easy to access. However, the resources needed to load and maintain those indexes must be factored into the equation. In general, data in the fact table cannot be changed. New records can be inserted, but once a record has been entered correctly, nothing about it may be changed.
(2) Dimension table
Dimension tables are built around the fact table. They contain the less voluminous data and are connected to the fact table through foreign keys. Typical dimension tables in a data mart include product catalogs, customer lists, vendor lists, and so on.
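A star structure of this kind can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_customer`), the columns, and the sample rows are hypothetical choices for illustration, not a schema taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small descriptive lists.
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per sale, linked to the dimensions by foreign keys.
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity    INTEGER,
    amount      REAL
);

-- Fact tables are typically heavily indexed; two indexes shown here.
CREATE INDEX idx_sales_product  ON fact_sales(product_id);
CREATE INDEX idx_sales_customer ON fact_sales(customer_id);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(10, "Acme"), (11, "Globex")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 10, 2, 20.0),
    (101, 2, 10, 1, 15.0),
    (102, 1, 11, 5, 50.0),
])

# A typical star query: join the fact table to a dimension and aggregate.
sales_by_product = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(sales_by_product)
```

A snowflake structure would further normalize the dimension tables, for example by splitting a product category out of `dim_product` into its own table.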
The data in a data mart comes from the enterprise data warehouse. With one exception, all data should pass through the enterprise data warehouse before being loaded into the data mart. The exception is data that is specific to the data mart and has no use anywhere else in the data warehouse environment; external data often falls into this category. If the data is used elsewhere in the decision support system, it must go through the enterprise data warehouse.
A data mart usually contains two types of data: detailed data and summary data.
(1) Detailed data
The detailed data in a data mart is held in the star structure. The star structure may well already be a summarization of the data as it passes out of the enterprise data warehouse. In that case, the enterprise data warehouse holds the most elemental data, while the data mart holds data at a coarser level of granularity. From the data mart user's point of view, however, the data in the star structure is as detailed as data gets.
(2) Summary data
The second type of data in a data mart is summary data. Analysts usually create a variety of summaries from the data in the star structure. A typical summary might be total monthly sales per sales region. Because summaries are maintained on an ongoing basis, history accumulates in the data mart, but most of that history is stored at the summary level. Very little history is kept in the star structure itself.
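The "total monthly sales per sales region" summary can be sketched with `sqlite3` as well. The table and column names (`fact_sales`, `summary_monthly_sales`, `sale_date`, `region`, `amount`) and the sample rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detailed data, as it might sit in the star structure.
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    ("2024-01-05", "North", 100.0),
    ("2024-01-20", "North", 50.0),
    ("2024-01-11", "South", 80.0),
    ("2024-02-02", "North", 30.0),
])

# Summary table: one row per (month, region), computed once and stored
# so that later queries do not have to rescan the detailed rows.
conn.execute("""
    CREATE TABLE summary_monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS month,
           region,
           SUM(amount) AS total_amount
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), region
""")

monthly = conn.execute("""
    SELECT month, region, total_amount
    FROM summary_monthly_sales
    ORDER BY month, region
""").fetchall()
print(monthly)
```

Keeping old rows in the summary table while purging old detail rows is one way to retain history mostly at the summary level, as the text describes.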
A data mart is refreshed from the enterprise data warehouse. Refreshing about once a week is very common, but the interval can be shorter or longer than a week, depending mainly on the needs of the department that owns the data mart.
4. How to build a data mart?
Data warehouses (and data marts) can be designed iteratively. In iterative development, each iteration adds new functionality to the result of the previous one. The order in which functionality is added takes into account the balance between iterations and the early discovery of major risks. Put simply, imperfect intermediate products are delivered to the customer several times before formal delivery. Some of these intermediate products lack features or are not yet stable, but once the customer requests changes, the developers understand the customer's needs better. Repeating this cycle brings the quality of the product gradually closer to what the customer requires. This development method has a long cycle and a high cost, but it avoids the risk of the whole project being overturned, so it is better suited to large and high-risk projects.
In theory, an overall data warehouse concept should exist before any data mart is built. In practice, this is rarely how data warehouses (or marts) are built in China. The usual approach there is to start with a data mart for a specific subject (such as the enterprise's customer information) and build the data warehouse afterwards. Which of the two is built first is closely tied to the design method. Data warehousing is an engineering discipline in which there is no absolute right or wrong: the main criterion is whether the current practical problems are solved while keeping enough scalability for problems that may arise in the future.
5. Data warehouse modeling and data mart modeling
Data is simply a record of all of an enterprise's business activities, resources, and results. The data model is a well-organized abstraction of that data, so it is natural that the data model becomes the best way to understand and manage the enterprise's business. The data model guides and plans the implementation of the data warehouse. Building the data model for each business area before implementation begins helps ensure that the result is an effective data warehouse and helps reduce implementation cost.
(1) Modeling of the data warehouse
Modeling the data in a data warehouse is the process of turning requirements into diagrams and the supporting metadata that represent those requirements. For readability, this article discusses requirements and modeling separately, but in practice the two steps often overlap. Once some initial requirements have been documented, the initial model begins to take shape, and as the requirements become more complete, so does the model.
The most important goal is to give end users a logical model of the data warehouse that is well integrated and easy to interpret. These logical models are one of the cores of the data warehouse's metadata. Simplicity for end users and the integration and consolidation of historical data are the key principles a modeling method should help deliver.
(2) Data modeling of the data mart
Because warehouse end users interact directly with the data mart, data mart modeling is one of the most effective tools for capturing end-user business requirements. The modeling process of a data mart depends on many factors. The three most important are described below:
Data mart modeling is end-user-driven. End users must take part in the modeling process because they are the people who will use the data mart. Since end users should be expected to be entirely unfamiliar with complex data models, the modeling techniques and the process as a whole should be organized so that this complexity is hidden from them.
Data mart modeling is driven by business requirements. Data mart models are useful for capturing business requirements because they are usually used directly by end users and are easy to understand.
Data mart modeling is strongly influenced by data analysis techniques. These techniques affect both the type of data model chosen and its content. Several data analysis techniques are in common use today: query and reporting, multidimensional analysis, and data mining.
If the goal is only to provide query and reporting capabilities, an ER model with normalized or denormalized data structures is the most appropriate. A dimensional data model may also be a good choice, because it is user-friendly and performs better. If the goal is multidimensional data analysis, the dimensional data model is the only option. Data mining, however, usually works best at the lowest level of detail available, so if the data warehouse is to be used for data mining, the model should include data at a lower level of detail.
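To make "multidimensional analysis" concrete, the following sketch rolls up and slices a handful of fact rows along their dimensions in plain Python. The dimension names (`region`, `product`, `month`), the helper functions `roll_up` and `slice_rows`, and the sample data are all hypothetical; real deployments would normally rely on an OLAP engine or SQL rather than hand-written loops.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure.
facts = [
    {"region": "North", "product": "Widget", "month": "2024-01", "amount": 100.0},
    {"region": "North", "product": "Gadget", "month": "2024-01", "amount": 40.0},
    {"region": "South", "product": "Widget", "month": "2024-01", "amount": 60.0},
    {"region": "North", "product": "Widget", "month": "2024-02", "amount": 25.0},
]


def roll_up(rows, dimensions, measure="amount"):
    """Sum the measure for each combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep only the rows whose dimensions match the fixed values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


by_region = roll_up(facts, ["region"])
by_region_product = roll_up(facts, ["region", "product"])
january_by_product = roll_up(slice_rows(facts, month="2024-01"), ["product"])

print(by_region)
print(by_region_product)
print(january_by_product)
```

The same rows answer several questions simply by changing which dimensions are grouped or fixed, which is why the dimensional model suits this style of analysis.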