Updated 2025-03-26. From: SLTechnology News&Howtos
This article introduces the common solutions for slowly changing dimensions in big data, with a worked example for each.
I. Definition
Slowly changing dimensions:
An important characteristic of a data warehouse is that it reflects historical changes, so handling dimension changes is one of the key tasks of dimensional design. The concept of the slowly changing dimension exists because, in the real world, dimension attributes are not static: they change slowly over time. Compared with fact tables, whose data grows relatively quickly, dimensions change relatively slowly.
In some cases, preserving historical data has little analytical value; in others, retaining it is very important. In Kimball's theory, there are three ways to handle slowly changing dimensions.
II. Solutions
1. Overwrite the dimension value
With this approach, historical data is not retained and queries always see the latest value.
# Commodity table and order table before the change

| Commodity key | Commodity id | Commodity title | Category | Other dimension attributes |
|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 1 | ... |

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
# Commodity table and order table after the change

| Commodity key | Commodity id | Commodity title | Category | Other dimension attributes |
|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 2 | ... |

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
| 9001 | 2020-04-13 | 1000 | 52.00 | ... |
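The overwrite approach (commonly called a Type 1 change) can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the table and column names (dim_commodity, fact_order, category, and so on) and the helper overwrite_category are assumptions chosen to mirror the tables above, not part of the original article.

```python
import sqlite3

# Hypothetical schema mirroring the commodity and order tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_commodity (
        commodity_key INTEGER PRIMARY KEY,
        commodity_id  TEXT,
        title         TEXT,
        category      TEXT
    );
    CREATE TABLE fact_order (
        order_key INTEGER PRIMARY KEY, date_key TEXT,
        commodity_key INTEGER, amount REAL
    );
    INSERT INTO dim_commodity VALUES (1000, 'Item1', 'Title1', 'Category 1');
    INSERT INTO fact_order VALUES (9000, '2020-04-10', 1000, 131.00);
""")


def overwrite_category(conn, commodity_key, new_category):
    """Overwrite the attribute in place; the old value is lost."""
    conn.execute(
        "UPDATE dim_commodity SET category = ? WHERE commodity_key = ?",
        (new_category, commodity_key),
    )


overwrite_category(conn, 1000, "Category 2")
conn.execute("INSERT INTO fact_order VALUES (9001, '2020-04-13', 1000, 52.00)")

# Every order, old or new, now joins to the overwritten category.
totals_by_category = dict(conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_order f JOIN dim_commodity d USING (commodity_key)
    GROUP BY d.category
"""))
print(totals_by_category)
```

After the overwrite, the order placed on 2020-04-10 is also reported under Category 2, which is exactly the loss of history described above.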
2. Insert a new dimension row
This approach retains historical data by inserting a new dimension row with a new key.
Facts recorded before the dimension value changed stay associated with the old dimension value, and facts recorded after the change are associated with the current dimension value.
# Commodity table and order table after the change

| Commodity key | Commodity id | Commodity title | Category | Other dimension attributes |
|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 1 | ... |
| 1001 | Item1 | Title1 | Category 2 | ... |

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
| 9001 | 2020-04-13 | 1001 | 52.00 | ... |
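A sketch of the new-row approach (commonly called a Type 2 change) is shown below, again assuming sqlite3 and hypothetical table, column, and function names. Here the commodity key acts as a surrogate key that identifies one version of the commodity, while the commodity id is shared by all versions. The way the next key is picked (largest existing key plus one) is only an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_commodity (
        commodity_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
        commodity_id  TEXT,                 -- natural id, shared by versions
        title         TEXT,
        category      TEXT
    );
    CREATE TABLE fact_order (
        order_key INTEGER PRIMARY KEY, date_key TEXT,
        commodity_key INTEGER, amount REAL
    );
    INSERT INTO dim_commodity VALUES (1000, 'Item1', 'Title1', 'Category 1');
    INSERT INTO fact_order VALUES (9000, '2020-04-10', 1000, 131.00);
""")


def insert_new_version(conn, old_key, new_category):
    """Copy the row under a new surrogate key, carrying the new category."""
    (new_key,) = conn.execute(
        "SELECT MAX(commodity_key) + 1 FROM dim_commodity"
    ).fetchone()
    conn.execute(
        """INSERT INTO dim_commodity
           SELECT ?, commodity_id, title, ? FROM dim_commodity
           WHERE commodity_key = ?""",
        (new_key, new_category, old_key),
    )
    return new_key


new_key = insert_new_version(conn, 1000, "Category 2")
# Orders placed after the change reference the new dimension row.
conn.execute(
    "INSERT INTO fact_order VALUES (9001, '2020-04-13', ?, 52.00)", (new_key,)
)

totals_by_category = dict(conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_order f JOIN dim_commodity d USING (commodity_key)
    GROUP BY d.category
"""))
print(totals_by_category)
```

Each order stays attached to the category that was current when it was placed, so the two categories are reported separately.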
3. Add a dimension column
With the second approach, facts recorded before and after the change cannot all be grouped under the old dimension value, nor all under the new one. For example, if the business needs every transaction in April counted under Category 2, the second approach cannot deliver that. The third approach solves this problem: it retains the historical value in a separate column, so either the new or the old attribute column can be used for grouping.
# Commodity table and order table before the change

| Commodity key | Commodity id | Commodity title | New category | Old category | Other dimension attributes |
|---|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 1 | Category 1 | ... |

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
# Commodity table and order table after the change

| Commodity key | Commodity id | Commodity title | New category | Old category | Other dimension attributes |
|---|---|---|---|---|---|
| 1000 | Item1 | Title1 | Category 2 | Category 1 | ... |

| Order key | Date key | Commodity key | Transaction amount | Other facts |
|---|---|---|---|---|
| 9000 | 2020-04-10 | 1000 | 131.00 | ... |
| 9001 | 2020-04-13 | 1000 | 52.00 | ... |
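The added-column approach (commonly called a Type 3 change) might look like the sketch below, under the same assumptions as before: sqlite3, an in-memory database, and hypothetical names (new_category, old_category, change_category, totals_by).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_commodity (
        commodity_key INTEGER PRIMARY KEY,
        commodity_id  TEXT,
        title         TEXT,
        new_category  TEXT,  -- current value
        old_category  TEXT   -- value before the most recent change
    );
    CREATE TABLE fact_order (
        order_key INTEGER PRIMARY KEY, date_key TEXT,
        commodity_key INTEGER, amount REAL
    );
    INSERT INTO dim_commodity
        VALUES (1000, 'Item1', 'Title1', 'Category 1', 'Category 1');
    INSERT INTO fact_order VALUES (9000, '2020-04-10', 1000, 131.00);
""")


def change_category(conn, commodity_key, new_category):
    """Move the current value into old_category, then store the new one."""
    conn.execute(
        """UPDATE dim_commodity
           SET old_category = new_category, new_category = ?
           WHERE commodity_key = ?""",
        (new_category, commodity_key),
    )


def totals_by(conn, column):
    """Sum order amounts grouped by one of the two category columns."""
    # The column name is interpolated into the SQL, so allow only known names.
    if column not in ("new_category", "old_category"):
        raise ValueError(f"unexpected column: {column}")
    return dict(conn.execute(f"""
        SELECT d.{column}, SUM(f.amount)
        FROM fact_order f JOIN dim_commodity d USING (commodity_key)
        GROUP BY d.{column}
    """))


change_category(conn, 1000, "Category 2")
conn.execute("INSERT INTO fact_order VALUES (9001, '2020-04-13', 1000, 52.00)")

print(totals_by(conn, "new_category"))
print(totals_by(conn, "old_category"))
```

All April orders can be reported under either column, which is what the second approach cannot do. The trade-off in this sketch is that only the most recent previous value is kept.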
There is no single correct answer as to which approach to use for slowly changing dimensions; the choice depends on business needs. For example, suppose we report the April 2020 transaction volume by commodity category, and a commodity's category changed from Category 1 to Category 2 on April 13, 2020. If the business side does not care about history and wants all transactions counted under the latest category, Category 2, there is no need to keep historical data. If instead Category 1 belongs to one business unit and Category 2 to another, and each unit needs to report its own performance, the historical data must be retained.