Common Solutions for Slowly Changing Dimensions in Big Data


2025-04-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article describes the common solutions for handling slowly changing dimensions in big data warehouses.

I. Definition

Slowly changing dimensions:

An important characteristic of a data warehouse is that it reflects historical change, so handling changes to dimensions is one of the key tasks of dimensional design. The concept of a slowly changing dimension arises because, in the real world, dimension attributes are not static: they change slowly over time. Compared with fact tables, whose data grows relatively quickly, dimensions change relatively slowly.

In some cases, keeping historical data has little analytical value; in others, retaining it is very important. Kimball's dimensional modeling theory describes three ways of handling slowly changing dimensions, commonly referred to as Type 1, Type 2, and Type 3.

II. Solutions

1. Overwrite the dimension value

With this approach, historical data is not retained; the dimension always holds the latest value.

# Commodity table and order table before the change

Commodity dimension table:

| Commodity key | Commodity id | Commodity title | Category | Other dimension attributes |
|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 1 | ... |

Order fact table:

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |

# Commodity table and order table after the change

Commodity dimension table:

| Commodity key | Commodity id | Commodity title | Category | Other dimension attributes |
|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 2 | ... |

Order fact table:

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
| 9001 | 2020-04-13 | 1000 | 52.00 | ... |
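
The overwrite approach (often called a Type 1 change) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production load job: the table names `dim_commodity` and `fact_order`, the column names, and the helper `overwrite_category` are hypothetical and only mirror the tables above.

```python
import sqlite3

# Hypothetical schema mirroring the commodity and order tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_commodity (
        commodity_key INTEGER PRIMARY KEY,
        commodity_id  TEXT,
        title         TEXT,
        category      TEXT
    );
    CREATE TABLE fact_order (
        order_key     INTEGER PRIMARY KEY,
        date_key      TEXT,
        commodity_key INTEGER,
        amount        REAL
    );
    INSERT INTO dim_commodity VALUES (1000, 'Item1', 'Title1', 'Category 1');
    INSERT INTO fact_order VALUES (9000, '2020-04-10', 1000, 131.00);
""")


def overwrite_category(conn, commodity_id, new_category):
    """Overwrite the attribute in place; the previous value is lost."""
    conn.execute(
        "UPDATE dim_commodity SET category = ? WHERE commodity_id = ?",
        (new_category, commodity_id),
    )


# The category changes, then a new order arrives with the same commodity key.
overwrite_category(conn, "Item1", "Category 2")
conn.execute("INSERT INTO fact_order VALUES (9001, '2020-04-13', 1000, 52.00)")

# Report the transaction amount by category.
totals = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_order AS f
    JOIN dim_commodity AS d ON d.commodity_key = f.commodity_key
    GROUP BY d.category
""").fetchall()
print(totals)  # both orders roll up under 'Category 2'
```

Because the fact rows keep pointing at key 1000 and that row now says Category 2, the order placed before the change is also reported under the new category.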

2. Insert a new dimension row

With this approach, a new dimension row is inserted and historical data is retained.

Facts recorded before the dimension value changed stay associated with the old dimension value, and facts recorded after the change are associated with the current dimension value.

# Commodity table and order table after the change

Commodity dimension table:

| Commodity key | Commodity id | Commodity title | Category | Other dimension attributes |
|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 1 | ... |
| 1001 | Item1 | Title1 | Category 2 | ... |

Order fact table:

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
| 9001 | 2020-04-13 | 1001 | 52.00 | ... |
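
The new-row approach (often called a Type 2 change) might look like the following sketch, again using sqlite3. The `is_current` flag is an assumption added here so the loader can find the active row; it does not appear in the tables above, and real warehouses often use effective-date columns instead. All table, column, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_commodity (
        commodity_key INTEGER PRIMARY KEY,
        commodity_id  TEXT,
        title         TEXT,
        category      TEXT,
        is_current    INTEGER  -- assumption: not part of the tables above
    );
    CREATE TABLE fact_order (
        order_key     INTEGER PRIMARY KEY,
        date_key      TEXT,
        commodity_key INTEGER,
        amount        REAL
    );
    INSERT INTO dim_commodity VALUES (1000, 'Item1', 'Title1', 'Category 1', 1);
""")


def current_key(conn, commodity_id):
    """Return the surrogate key of the current row for a commodity id."""
    row = conn.execute(
        "SELECT commodity_key FROM dim_commodity "
        "WHERE commodity_id = ? AND is_current = 1",
        (commodity_id,),
    ).fetchone()
    return row[0]


def insert_new_version(conn, commodity_id, new_key, new_category):
    """Retire the current row and insert a new row with a new surrogate key."""
    title = conn.execute(
        "SELECT title FROM dim_commodity "
        "WHERE commodity_id = ? AND is_current = 1",
        (commodity_id,),
    ).fetchone()[0]
    conn.execute(
        "UPDATE dim_commodity SET is_current = 0 WHERE commodity_id = ?",
        (commodity_id,),
    )
    conn.execute(
        "INSERT INTO dim_commodity VALUES (?, ?, ?, ?, 1)",
        (new_key, commodity_id, title, new_category),
    )


def load_order(conn, order_key, date_key, commodity_id, amount):
    """Attach the fact to whichever dimension row is current at load time."""
    conn.execute(
        "INSERT INTO fact_order VALUES (?, ?, ?, ?)",
        (order_key, date_key, current_key(conn, commodity_id), amount),
    )


load_order(conn, 9000, "2020-04-10", "Item1", 131.00)
insert_new_version(conn, "Item1", 1001, "Category 2")
load_order(conn, 9001, "2020-04-13", "Item1", 52.00)

totals = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_order AS f
    JOIN dim_commodity AS d ON d.commodity_key = f.commodity_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(totals)  # each order is reported under the category in force when it was loaded
```

Each fact keeps the surrogate key that was current when it was loaded, so the same join now splits the amount between the two categories.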

3. Add a dimension column

With the second approach, the facts recorded before and after the change cannot all be attributed to either the old dimension value or the new one. For example, if the business needs all April transactions counted under category 2, the second approach cannot deliver that. The third approach solves this problem by keeping the previous value in an additional column, so a report can group by either the new or the old attribute column.

# Commodity table and order table before the change

Commodity dimension table:

| Commodity key | Commodity id | Commodity title | New category | Old category | Other dimension attributes |
|---|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 1 | | ... |

Order fact table:

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |

# Commodity table and order table after the change

Commodity dimension table:

| Commodity key | Commodity id | Commodity title | New category | Old category | Other dimension attributes |
|---|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 2 | Category 1 | ... |

Order fact table:

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
| 9001 | 2020-04-13 | 1000 | 52.00 | ... |
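
The added-column approach (often called a Type 3 change) can be sketched the same way. The names `new_category`, `old_category`, `shift_category`, and `totals_by` are hypothetical, and starting with an empty old category before the first change is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_commodity (
        commodity_key INTEGER PRIMARY KEY,
        commodity_id  TEXT,
        title         TEXT,
        new_category  TEXT,
        old_category  TEXT
    );
    CREATE TABLE fact_order (
        order_key     INTEGER PRIMARY KEY,
        date_key      TEXT,
        commodity_key INTEGER,
        amount        REAL
    );
    -- Assumption: the old category is empty until the first change.
    INSERT INTO dim_commodity VALUES (1000, 'Item1', 'Title1', 'Category 1', NULL);
    INSERT INTO fact_order VALUES (9000, '2020-04-10', 1000, 131.00);
""")


def shift_category(conn, commodity_id, new_category):
    """Move the current value into old_category, then store the new value."""
    # In an SQL UPDATE the right-hand sides see the row as it was before
    # the statement ran, so old_category receives the previous new_category.
    conn.execute(
        "UPDATE dim_commodity "
        "SET old_category = new_category, new_category = ? "
        "WHERE commodity_id = ?",
        (new_category, commodity_id),
    )


def totals_by(conn, column):
    """Sum transaction amounts grouped by one of the two category columns."""
    if column not in ("new_category", "old_category"):
        raise ValueError(f"unsupported column: {column}")
    return conn.execute(
        f"SELECT d.{column}, SUM(f.amount) "
        "FROM fact_order AS f "
        "JOIN dim_commodity AS d ON d.commodity_key = f.commodity_key "
        f"GROUP BY d.{column}"
    ).fetchall()


shift_category(conn, "Item1", "Category 2")
conn.execute("INSERT INTO fact_order VALUES (9001, '2020-04-13', 1000, 52.00)")

print(totals_by(conn, "new_category"))  # all April orders under the new category
print(totals_by(conn, "old_category"))  # the same orders under the old category
```

The dimension still has a single row per commodity, so every fact can be reported under either category, but only one previous value is kept per column added.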

There is no single correct way to handle slowly changing dimensions; the choice depends on business needs. Take, for example, reporting the April 2020 transaction amount by commodity category, where the commodity moved from category 1 to category 2 on April 13, 2020. If the business side does not care about history and wants the entire amount counted under the latest category 2, there is no need to keep historical data. If, however, category 1 belongs to one business unit and category 2 to another, and each unit needs to report its own performance, the historical data must be retained.
