Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use Index in data Warehouse

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

Today, I will talk to you about how to use the index in the data warehouse, many people may not know much about it. In order to make you understand better, the editor summed up the following content for you. I hope you can get something from this article.

How to use Index in data Warehouse

Indexing of data warehouses is a thorny problem. If there are too many indexes, the data is inserted quickly but the query response is slow. If there are too many indexes, the data import is slow and the data storage space is larger, but the query response is faster. The function of the index in the database is to accelerate the query speed, whether it is the traditional database or the data warehouse. Especially for tables with a large amount of data and complex queries for designing table joins. There was little contact with the data warehouse before, so here is just a little experience.

Of course, there are some parameters to consider when creating a data warehouse index, such as data warehouse type, dimension table and fact table size, whether to partition, whether to AD hoc, and so on. These parameters determine your index structure. This article focuses on how to index relational tables in a data warehouse, paying attention to relational tables in a relational database, not data tables in SSAS.

Dimensional index

If you plan to build an index on the primary key of the dimension table, which is a surrogate key, not a natural or business key (such as user name or ID). Be careful not to create a clustered index on the surrogate key of the dimension table or on the columns that realize the gradient.

The dimension table contains a natural or business key (such as a transaction code or ID), which we call a business key from the business system. Although the business key may not be *, for slowly changing dimension tables, it is better to index the identity column (such as user ID, etc.), as shown below:

Clustered indexes in user and product dimension tables are built on business keys, and such indexes can enhance query speed, especially those that are used in where statements. This key value is often used in where expressions to query dimensional data.

Building a clustered index through a business key can avoid lock escalation (for example, row lock to table lock, with the intention of excluding to exclusive), because lock escalation may occur if there is a nonclustered index on the proxy key and all rows are added to the end of the file during the ETL process, and if the exclusive lock is escalated from a row lock to a table lock, it will cause blocking or even deadlock of other read or ETL or general operations, and the final program timeout.

In the figure above, the Date dimension and the Time dimension have external data sources or business keys. Consider using YYYYMMDD and HHMMSSSSS formats as primary keys for both tables and establishing a clustered index. This value ensures the indexing order, simplifies the range query in the fact table, and the key value also contains a date or time, which no longer requires a specific time.

For large slowly gradual dimension tables (for example, you need to type new data here), it may be possible to create a four-part nonclustered index that includes business keys, record start times, record end times, and surrogate keys. To be efficient and prevent storage from growing, use Include to include the record end time and surrogate key, as follows:

3CREATE NONCLUSTERED INDEX MyDim_CoveringIndex ON (NaturalKEY, RecordStartDate) INCLUDE (RecordEndDate, SurrogateKEY)

This index is very effective in querying and manipulating historical data during ETL, reducing the number of columns through nonclustered indexes, thus reducing unnecessary storage space. The relational database engine can obtain data directly from the index without directly accessing the dimensional data, which reduces the IO and improves the query speed.

If there are other columns in the dimension table for querying, sorting, and grouping, you can also create nonclustered indexes, just as you would in a transactional database. If there is an embedded hierarchy in the dimension table, such as the class-subclass-product ID hierarchy relationship in the product dimension table, considering indexing on the key values of the hierarchy will significantly improve the data query without affecting the data import.

Index on the fact table

Similar to indexing on a dimension table, conditions such as partitioning need to be taken into account. You can create a clustered index on a date column or a mixed date + time column. Because BI analysis always uses date / time components, the fact table contains date or datetime columns, and the use of clustered indexes here helps build cube. For this reason, data records are also stored in the order of date or datetime. The query for history has its advantages. If the fact table has more than one such column, you need to index on the column that queries or builds cube most frequently.

If you are partitioning on a date column, you can use a clustered index on that column. When it is found that the partitioning and clustered indexes are on the same column and on the filegroup that holds the partitioned fact table, SQLServer will automatically partition the index with the fact table partition (for example, the index will have the same partitioning functions and columns as the fact table). When the index is partitioned according to the fact table, the table and its index are automatically aligned, especially if you create a partition or switch partition switches frequently, which is much more convenient.

Next, create a nonclustered index on the foreign key of each fact table, and consider mixing the foreign key and date key, as shown in figure 1, you can see an index similar to CustomerKEY + DateKEY. Queries with the same foreign key values will be sorted with time, which will speed up the query. Note that when dealing with foreign keys, consider maintaining the integrity of the relationship.

Improve the index architecture

Over time, the data warehouse changes to adapt to changes in the organizational structure, and the index structure must be changed. Most data warehouses or BI systems connect relational tables directly, so index modifications can be made using relational table-tuned methods, such as evaluation queries and data mixing, to adjust the index accordingly. If the relational data warehouse is only used to represent the SSAS structure, then the index we discussed earlier may not be needed. SSAS prefers to use the same query over and over again, so you can use the index optimization wizard or fine tune the query. Start a simple, rigorous and thorough evaluation to build an index in the data warehouse.

After reading the above, do you have any further understanding of how to use indexes in data warehouses? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report