In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
Xiaobian today takes you to understand how OLTP and OLAP queries in data warehouses are, and the knowledge points in the article are introduced in great detail. Friends who feel helpful can browse the content of the article together with Xiaobian, hoping to help more friends who want to solve this problem find the answer to the problem. Let's learn more about OLTP and OLAP queries in data warehouse together with Xiaobian.
In the early days of business data processing, writes to databases usually corresponded to ongoing business transactions-making sales, placing orders with suppliers, paying employees, etc. As the database expanded into uninvolved domains involving currency changing hands, the term transaction remained, referring to a set of read and write operations that make up a logical unit. These types of queries are called transaction processing system queries (OLTP). Systems designed for these queries are usually user-oriented, meaning they are likely to see a large number of requests. To handle the load, applications typically touch only a small number of records per query. The application uses some sort of key to request records, and the storage engine uses an index to look up data for the requested key. Disk lookup time is usually the bottleneck here.
But databases are also starting to be used more and more for data analysis, which has very different access patterns. Often, analytical queries require scanning a large number of records, reading only a few columns of each record, and calculating summary statistics (such as counts, sums, or averages) rather than returning raw data to the user. For example, if your data is a sales transaction table, the analytical query might be:
What was our gross revenue per store in January?
How many more iPhones have we sold than usual in recent promotions?
Which brand of milk is most often purchased with Kellogg's cornflakes?
These queries are typically written by business analysts and fed into reports that help company management make better decisions (business intelligence). To distinguish this pattern of using databases from transaction processing, it is called online analytical processing (OLAP). They are little-known because they are handled by business analysts rather than end users. They handle a much smaller number of queries than OLTP systems, but each query is often demanding, requiring millions of records to be scanned in a short period of time. Disk bandwidth (not seek time) is often the bottleneck here, and column-oriented storage is an increasingly popular solution for such workloads.
The difference between OLTP and OLAP is not always clear, but some typical characteristics are listed below.
First, use the same database for both transaction processing and analytical queries. SQL has proven to be very flexible in this regard: it works well for both OLTP and OLAP type queries. Nevertheless, in the late 1980s and early 1990s, there was a trend for companies to stop using OLTP systems for analysis and instead run analysis on separate databases. This independent database is called a data warehouse.
A business may have dozens of different transaction processing systems: systems powering customer-facing websites, point-of-sale (checkout) systems in physical stores, inventory tracking in warehouses, vehicle routing, supplier management, employee management, etc. One of these systems is complex and requires a team to maintain it, so these systems end up operating independently of each other. These OLTP systems are generally expected to have high availability and process transactions with low latency because they are often critical to business operations. Therefore, database administrators closely protect their OLTP databases. They are generally reluctant to have business analysts run ad hoc analytical queries on OLTP databases because these queries are often expensive and scan large portions of the dataset, which can compromise the performance of concurrently executing transactions.
data warehouse
In contrast, a data warehouse is a stand-alone database whose inner contents analysts can query without affecting OLTP operations. A data warehouse contains read-only copies of data from all of the company's various OLTP systems. Extract data from OLTP databases (using periodic data dumps or continuous streams of updates), convert it into patterns that are easy to analyze, clean it, and load it into a data warehouse. The process of putting data into a warehouse is called Extract-Transform-Load (ETL). Data warehouses now exist in almost all large businesses, but are almost unheard of in small businesses. This is probably because most small companies don't have many different OLTP systems; and most small companies have small amounts of data-small enough to be queried in regular SQL databases and even analyzed in spreadsheets. In a large company, it takes a lot of hard work to do something that is simple in a small company.
A big advantage of using a separate data warehouse for analytics rather than querying OLTP systems directly is that the data warehouse can be optimized for analytics access patterns. Some databases, such as Microsoft SQL Server and SAP HANA, support transaction processing and data warehousing in the same product. However, they are increasingly becoming two separate storage and query engines that happen to be accessible through a common SQL interface. Data warehouse vendors such as Teradata, Vertica, SAP HANA and ParAccel typically sell their systems under expensive commercial licenses. Amazon RedShift is a hosted version of ParAccel. Recently, there have been many open source SQL-on-Hadoop projects. They are young but designed to compete with commercial data warehouse systems. These include Apache hive, Spark SQL, Cloudera Impala, Facebook Presto, Apache Tajo and Apache Drill. Some of them are based on Google Dremel ideas.
Analytics Storage Architecture
Depending on the requirements of the application, a variety of different data models are used in the transaction domain. On the other hand, there is much less diversity in data models in analysis. Many data warehouses are used in a fairly formulaic way called star schema (also known as dimensional modeling). Often, facts are captured as a single event because this maximizes analysis later. However, this means that fact tables can get very large.
Stars and snowflakes:
The name "star schema" comes from the fact that when table relationships are visualized, the fact table is in the middle and surrounded by its dimension table; the connections of these tables are like the light of stars. A variation of this template is called snowflake mode, where the size is further subdivided into multiple subdimensions. A large enterprise like Apple, Walmart, or eBay may have tens of petabytes of transaction history in its data warehouse, most of which are actually tables.
Column storage:
Although fact tables typically exceed 100 columns, a typical data warehouse query can access only four or five of them at a time. In most OLTP databases, storage is laid out in a row-oriented manner: all values in a row of a table are stored next to each other. To process an analytical query such as "Find the average sales of an item X in December," the row-oriented storage engine still needs to load all of these rows (each row contains more than 100 attributes) from disk into memory, parse them, and filter out non-conforming conditions, which can take a long time. The idea behind column-oriented storage is simple: instead of storing all values in a row together, store all values in each column together. If each column is stored in a separate file, the query only needs to read and parse those columns used in the query, which can save a lot of work.
Column compression:
Typically, the number of distinct values in a column is small compared to the number of rows (e.g., a retailer may make billions of sales transactions but only 100,000 distinct products). Depending on the data in the column, different compression techniques can be used-one technique that is particularly effective in data warehouses is bitmap encoding.
Now we can take a column containing n different values and convert it into n separate bitmaps-one bitmap for each different value, one bit per row. This bit is 1 if the row has this value, 0 otherwise. If n is small (e.g., a country column might have about 200 different values), these bitmaps can be stored one bit per row.
Thank you for reading, the above is "OLTP and OLAP queries in the data warehouse is how" all the content, learn friends to quickly operate it. I believe that Xiaobian will definitely bring you better quality articles. Thank you for your support of the website!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.