Shulou(Shulou.com)06/03 Report--
A data warehouse is a database designed to enable business intelligence activities: it helps users understand and enhance their organization's performance. It is designed for query and analysis rather than for transaction processing, and it usually contains historical data derived from transaction data, though it can include data from other sources. The data warehouse separates the analysis workload from the transaction workload and enables an organization to consolidate data from several sources. This helps to:
Maintain historical records
Analyze data to better understand and improve the business
In addition to a relational database, a data warehouse environment can include an extraction, transportation, transformation, and loading (ETL) solution, statistical analysis, reporting, data mining capabilities, client analysis tools, and other applications that manage the process of gathering data, transforming it into useful, actionable information, and delivering it to business users.
To achieve the goal of enhanced business intelligence, the data warehouse works with data collected from multiple sources. The source data may come from internally developed systems, purchased applications, third-party data aggregators, and other sources. It may involve transactions, production, marketing, human resources, and more. In today's world of big data, the data may also be billions of clicks on a website or large data streams from sensors built into complex machines.
A data warehouse is different from an online transaction processing (OLTP) system. With a data warehouse you separate the analysis workload from the transaction workload, so a data warehouse is a very read-oriented system: it sees far more data reads than writes and updates. This enables far better analytical performance and avoids impacting your transaction systems. A data warehouse system can be optimized to consolidate data from many sources to achieve a key goal: it becomes your organization's "single source of truth". Having a consistent source of data that all users can look to is valuable; it prevents many disputes and improves decision-making efficiency.
Data warehouses usually store many months or years of data to support historical analysis. The data in a data warehouse is typically loaded through an extraction, transformation, and loading (ETL) process from multiple data sources. Modern data warehouses are moving toward an extraction, loading, and transformation (ELT) architecture, in which all or most of the data transformation is performed on the database that hosts the data warehouse. It is important to note that defining the ETL process is a large part of the design effort of a data warehouse. Similarly, the speed and reliability of ETL operations are the foundation of the data warehouse once it is up and running.
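To make the ETL idea concrete, here is a minimal, hypothetical sketch in Python: it extracts rows from an illustrative operational SQLite database, applies a small transformation, and bulk-loads the result into a warehouse table. The file paths, table names, and columns are assumptions made up for this example, not part of any particular product.

```python
import sqlite3

# Hypothetical example: extract orders from an operational store, transform
# them (derive a revenue column), and load them into a warehouse fact table.

def extract(source_path):
    """Pull raw order rows from the (assumed) operational database."""
    with sqlite3.connect(source_path) as conn:
        return conn.execute(
            "SELECT order_id, customer_id, quantity, unit_price, order_date "
            "FROM orders"
        ).fetchall()

def transform(rows):
    """Derive revenue and keep only the columns the warehouse needs."""
    return [
        (order_id, customer_id, quantity * unit_price, order_date)
        for order_id, customer_id, quantity, unit_price, order_date in rows
    ]

def load(warehouse_path, facts):
    """Bulk-insert the transformed rows into the warehouse fact table."""
    with sqlite3.connect(warehouse_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sales_fact ("
            "order_id INTEGER, customer_id INTEGER, revenue REAL, order_date TEXT)"
        )
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", facts)
        conn.commit()

if __name__ == "__main__":
    # Seed a tiny illustrative source database so the sketch runs end to end.
    with sqlite3.connect("oltp.db") as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "order_id INTEGER, customer_id INTEGER, quantity INTEGER, "
            "unit_price REAL, order_date TEXT)"
        )
        conn.execute("INSERT INTO orders VALUES (1, 42, 3, 19.99, '2024-06-01')")
        conn.commit()
    load("warehouse.db", transform(extract("oltp.db")))
```

In an ELT variant, the `transform` step would instead run as SQL inside the warehouse database after the raw rows have been loaded.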
Users of the data warehouse perform data analyses that are often time-related. Examples include analysis of sales over the past year, inventory analysis, and profit by product and by customer. Whether time-focused or not, users want to "slice and dice" their data however they see fit, and a well-designed data warehouse is flexible enough to meet those demands. Sometimes users need highly aggregated data, and at other times they need to drill down into the details. More sophisticated analyses include trend analysis and data mining, which use existing data to forecast trends or predict the future. The data warehouse acts as the underlying engine of the middleware business intelligence environment, providing end users with reports, dashboards, and other interfaces.
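As an illustration of "slicing and dicing," the hypothetical snippet below summarizes the same sales data at two granularities: totals per month, then per-customer detail for a single product. The pandas library and the column names are assumptions for the example only.

```python
import pandas as pd

# Illustrative sales data; in a real warehouse this would be a fact table.
sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "product":    ["widget", "gadget", "widget"],
    "customer":   ["acme", "acme", "globex"],
    "revenue":    [120.0, 80.0, 200.0],
})

# Highly aggregated view: revenue per month.
monthly = sales.groupby(sales["order_date"].dt.to_period("M"))["revenue"].sum()
print(monthly)

# Sliced a different way: revenue per customer for one product.
widget_by_customer = (
    sales[sales["product"] == "widget"]
    .groupby("customer")["revenue"]
    .sum()
)
print(widget_by_customer)
```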
Although the discussion above has focused on the term "data warehouse", there are two other important terms to mention: the data mart and the operational data store (ODS).
A data mart serves the same role as a data warehouse, but its scope is limited: it may serve one particular department or line of business. The advantage of a data mart over a data warehouse is that it can be created much faster because of its limited coverage. However, data marts can also create inconsistencies, and it takes strict discipline to keep data and calculation definitions consistent across them. This problem has been widely recognized, so data marts exist in two styles. Independent data marts are fed directly from source data; they can become islands of inconsistent information. Dependent data marts are fed from an existing data warehouse. Dependent data marts avoid the inconsistency problem, but they require that an enterprise-level data warehouse already exist.
Operational data stores exist to support day-to-day operations. The ODS data is cleaned and validated, but it is not historically deep: it may contain just the data for the current day. Rather than supporting the historically rich queries that a data warehouse can handle, the ODS gives the data warehouse a place from which to access the most current data, which has not yet been loaded into the warehouse. The ODS may also be used as a source for loading the data warehouse. As data warehouse loading techniques become more advanced, data warehouses may have less need for an ODS as a loading source. Instead, constant trickle-feed systems can load the data warehouse in near real time.
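One simple way to picture a trickle feed is an incremental load that copies only rows newer than what the warehouse has already seen. The sketch below is a hypothetical illustration using SQLite and a high-water-mark timestamp; the `orders` table and `order_ts` column are assumptions made up for the example.

```python
import sqlite3

def incremental_load(source_conn, warehouse_conn):
    """Copy only rows newer than the warehouse's current high-water mark.

    Illustrative only: assumes both databases have an `orders` table
    with an `order_ts` timestamp column stored as ISO-8601 text.
    """
    # Find the newest timestamp already loaded (the high-water mark).
    (high_water,) = warehouse_conn.execute(
        "SELECT COALESCE(MAX(order_ts), '') FROM orders"
    ).fetchone()

    # Pull only the rows the warehouse has not seen yet.
    new_rows = source_conn.execute(
        "SELECT order_id, customer_id, amount, order_ts FROM orders "
        "WHERE order_ts > ?",
        (high_water,),
    ).fetchall()

    # Append the new rows; run this on a short interval for near real time.
    warehouse_conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)", new_rows
    )
    warehouse_conn.commit()
    return len(new_rows)
```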
A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon:
Subject oriented
Integrated
Nonvolatile
Time variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a data warehouse that concentrates on sales. Using this warehouse, you can answer questions such as "Who was our best customer for this item last year?" or "Who is likely to be our best customer next year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
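A hypothetical query against such a sales-focused warehouse might look like the following sketch; the in-memory database, the `sales_fact` table, and its columns are assumptions invented for illustration.

```python
import sqlite3

# Illustrative in-memory warehouse with a tiny sales fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (customer TEXT, item TEXT, revenue REAL, year INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [("acme", "widget", 1200.0, 2024),
     ("globex", "widget", 900.0, 2024),
     ("acme", "gadget", 300.0, 2023)],
)

# "Who was our best customer for this item last year?"
best = conn.execute(
    "SELECT customer, SUM(revenue) AS total "
    "FROM sales_fact WHERE item = 'widget' AND year = 2024 "
    "GROUP BY customer ORDER BY total DESC LIMIT 1"
).fetchone()
print(best)  # ('acme', 1200.0)
```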
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the data warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyze what has occurred.
Time Variant
A data warehouse's focus on change over time is what is meant by the term time variant. In order to discover trends and identify hidden patterns and relationships in the business, analysts need large amounts of data. This is in great contrast to online transaction processing (OLTP) systems, whose performance requirements demand that historical data be moved to an archive.
1.1.1 Key Characteristics of a Data Warehouse
The key characteristics of a data warehouse are as follows:
Data is structured for ease of access and high-speed query performance.
End users are time-sensitive and desire speed-of-thought response times.
Large amounts of historical data are used.
Queries typically retrieve large amounts of data, perhaps thousands of rows.
Both predefined and ad hoc queries are common.
Data loading involves multiple sources and transformations.
In general, fast query performance with high data throughput is the key to a successful data warehouse.
1.2 Comparing OLTP and Data Warehousing Environments
There are important differences between an OLTP system and a data warehouse. One major difference between the types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalization common in OLTP environments.
Data warehouses and OLTP systems have very different requirements. Here are some examples of the differences between a typical data warehouse and an OLTP system:
Workload
Data warehouses are designed to accommodate ad hoc queries and data analysis. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query and analytical operations.
OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.
Data modification
A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not update it directly, except when using analytical tools such as data mining to make predictions with associated probabilities, assign customers to market segments, or develop customer profiles.
In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date and reflects the current state of each business transaction.
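The contrast can be sketched in a few lines of Python: a warehouse load applies many changes in one bulk operation, while an OLTP application issues an individual statement as each business event happens. The tables and values below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (customer TEXT, revenue REAL)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1001, 'OPEN')")

# Warehouse style: a periodic batch load applies many rows at once.
nightly_batch = [("acme", 1200.0), ("globex", 900.0), ("initech", 450.0)]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", nightly_batch)

# OLTP style: an end user's action triggers one small modification.
conn.execute("UPDATE orders SET status = 'SHIPPED' WHERE order_id = ?", (1001,))
conn.commit()
```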
Schema design
Data warehouses often use partially denormalized schemas to optimize query and analytical performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete performance and to guarantee data consistency.
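The schema difference can be illustrated with two hypothetical DDL fragments: a partially denormalized fact/dimension pair on the warehouse side, and a more fully normalized layout on the OLTP side. Every table and column name here is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Warehouse side: a wide, partially denormalized dimension plus a fact table,
-- laid out for fast scans and joins on a handful of keys.
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY,
    customer_name TEXT,
    city TEXT,
    region TEXT,            -- repeated per customer rather than normalized out
    segment TEXT
);
CREATE TABLE sales_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    date_key INTEGER,
    revenue REAL
);

-- OLTP side: the same information split into narrow, fully normalized tables
-- so each fact is stored exactly once and updates stay cheap and consistent.
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT,
                   region_id INTEGER REFERENCES region(region_id));
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                       city_id INTEGER REFERENCES city(city_id));
""")
```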
Typical operation
A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
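The two typical operations could look like the hypothetical queries below: an aggregate that scans the entire fact table versus a lookup that touches a single row by key. The schema and data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (customer TEXT, month TEXT, revenue REAL)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, status TEXT)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [("acme", "2024-05", 100.0), ("globex", "2024-05", 250.0)])
conn.execute("INSERT INTO orders VALUES (1001, 'acme', 'OPEN')")

# Warehouse-style operation: scan and aggregate the whole fact table.
total = conn.execute(
    "SELECT SUM(revenue) FROM sales_fact WHERE month = '2024-05'"
).fetchone()[0]

# OLTP-style operation: fetch a single current record by its key.
order = conn.execute(
    "SELECT * FROM orders WHERE order_id = ?", (1001,)
).fetchone()
print(total, order)
```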
Historical data
Data warehouses usually store many months or years of data. This is to support historical analysis and reporting.
OLTP systems usually store data from only a few weeks or months. The OLTP system stores only the historical data needed to successfully meet the requirements of the current transaction.
1.3 Common Data Warehouse Tasks
As an Oracle data warehouse administrator or designer, you can expect to be involved in the following tasks:
Configure the Oracle database to use as a data warehouse
Design data warehouses
Upgrade database and data warehouse software to a new version
Manage schema objects, such as tables, indexes, and materialized views
Manage users and security
Develop routines for extract, transform, and load (ETL) procedures
Create a report based on the data in the data warehouse
Back up the data warehouse and perform a restore if necessary
Monitor the performance of the data warehouse and take preventive or corrective actions as needed
In a small-to-midsize data warehouse environment, you might be the sole person performing these tasks. In large enterprise environments, the job is often divided among several DBAs and designers, each with a particular specialty, such as database security or database tuning.