2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
In this issue, the editor explains what a zipper table is in big data design. The article looks at the topic from a professional point of view, and I hope you get something out of reading it.
There is a saying that different trades are separated by mountains. In these fast-moving times the saying is heading toward two extremes: for some people the gap really is a mountain, and for others the mountain is as thin as a sheet of paper. In many organizations, more and more DBAs are becoming responsible for some big data work, so the more you understand about big data and data warehouse design, the better for your current and future work.
Today let's talk about one of the table types used in big data design, the zipper table.
First of all, what is a zipper table? DBAs usually hear about table types that are tied to some physical technology, such as temporary tables, template tables, inheritance tables, and so on. The zipper table is not a physical technology but a logical one: a man-made design for achieving certain goals.
Where is the zipper table used? Strictly speaking, it belongs in big data settings, and the data warehouse is where it originated. Of course, it can also be borrowed for some designs in ordinary databases (not covered today).
In big data work, historical data needs to be processed along various dimensions for statistical analysis, and this gives rise to a problem: data volume. Let's take an example. We have a record of the items each user has added to the shopping cart. A customer may put some products in the cart today and delete them a few days later. Only the customer knows the reason, but the company still wants to find something actionable in these changes. The history of shopping cart changes yields a lot of dimensional information that can be used for promotions, and of course you can also get acquainted with XX.
OK, enough preamble. Here is a definition of the zipper table.
A zipper table is a way to store and process historical data: it records historical data together with its changing state, which compresses storage space and makes same-period comparison and periodic analysis of history easier.
The zipper table gets its name for a reason:
1 each row stores a start time and an end time (bound to the record and its business logic)
2 the start and end times of successive rows connect to each other, forming a chain
3 this avoids storing a full copy of the records every day
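The three points above can be illustrated with a minimal sketch. The column layout, the dates, and the OPEN_END sentinel are assumptions made for illustration, not something the article specifies.

```python
from datetime import date

# Hypothetical sentinel meaning "this version is still current".
OPEN_END = date(9999, 12, 31)

# Each tuple is one version of a cart record (illustrative data):
# (cart_item_id, quantity, start_date, end_date)
zipper_rows = [
    (1001, 1, date(2024, 12, 1), date(2024, 12, 5)),
    (1001, 3, date(2024, 12, 5), date(2024, 12, 9)),
    (1001, 2, date(2024, 12, 9), OPEN_END),
]


def is_chained(rows):
    """Return True if each version ends exactly where the next one starts."""
    ordered = sorted(rows, key=lambda r: r[2])
    return all(prev[3] == nxt[2] for prev, nxt in zip(ordered, ordered[1:]))


def version_on(rows, day):
    """Return the version valid on `day` (start inclusive, end exclusive)."""
    for row in rows:
        if row[2] <= day < row[3]:
            return row
    return None
```

Only three rows are stored for this record, yet `version_on` can answer what the cart looked like on any day, which is the point of chaining the start and end times.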
For example, suppose we want to count the number of items customers add to and delete from their shopping carts each month. (At the very least, we can learn something about the relationship between their desire to buy and their wallets.)
Take an MPP architecture as an example:
1 Load the full data of the shopping cart table as of the day before the first day of the month.
The full-data table contains at least 3 non-business fields: the start time, the end time, and the last operation recorded for the row (I, D, U for insert, delete, update). The start time can be the time the record was imported into the data warehouse, or the time the record was inserted into the business table; which one to use depends on the business analysis and the resources you have.
2 Design the zipper table as a partitioned table covering the days of the month. The partition key is generally the start time, or a field that represents the business logic.
3 Obtain, by some means, the records of the shopping cart table that changed on the next day and store them in a temporary table.
These are the I, D, U operations (insert, delete, update) on the business table for that day. The deletes may be logical operations; here they are assumed to be physical.
4 Left join the previous day's data change history table with the changes recorded for the current day. This yields the deleted records, the updated records, and the unchanged records; adding the day's inserted records then gives the overall business table for the next day.
5 Repeating this gives the data changes for the whole month. (You can also exclude deleted records from the next day's historical partition of the business history table in the data warehouse, based on each row's last operation status, whether the delete was physical or logical.)
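Steps 3 and 4 can be sketched roughly as follows. This is a simplified in-memory illustration, not warehouse SQL: the function name, the `qty` and `last_op` fields, and the assumption of at most one change per record per day are all hypothetical.

```python
def next_day_table(previous, changes):
    """Build the next day's table from the previous day's table plus changes.

    previous: dict mapping record id -> row dict (the previous day's table)
    changes:  list of (op, record_id, row), where op is "I", "U" or "D"
    Assumes at most one change per record id per day.
    """
    changed = {record_id: (op, row) for op, record_id, row in changes}

    result = {}
    # The "left join": every previous row is kept and matched against changes.
    for record_id, old_row in previous.items():
        op, new_row = changed.get(record_id, (None, None))
        if op == "U":
            result[record_id] = {**new_row, "last_op": "U"}
        elif op == "D":
            # Keep the row but flag it, so it can be excluded later.
            result[record_id] = {**old_row, "last_op": "D"}
        else:
            result[record_id] = dict(old_row)  # unchanged

    # Finally, add the day's newly inserted records.
    for op, record_id, row in changes:
        if op == "I" and record_id not in previous:
            result[record_id] = {**row, "last_op": "I"}
    return result


# Illustrative data only.
previous = {
    1: {"qty": 2, "last_op": "I"},
    2: {"qty": 1, "last_op": "I"},
    3: {"qty": 5, "last_op": "I"},
}
changes = [("U", 1, {"qty": 4}), ("D", 2, None), ("I", 4, {"qty": 1})]
today = next_day_table(previous, changes)
```

Deleted rows are flagged rather than dropped here so that, as step 5 notes, they can be filtered out afterwards by their last operation status.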
With this in place, you can chart how all customers' shopping carts changed over the month, find the products deleted most often, or analyze on which day carts were deleted most in a certain region, on which day certain products were added most, the monthly maximum, and so on.
The benefits of doing so
Could we simply synchronize the full daily data to the data warehouse once a day? The answer is yes, but there are problems:
1.1 The shopping cart is dynamic. You can read from a slave (replica) database when synchronizing, but with a large amount of data this is still clumsy. It also raises the question of data volume: one full copy of the business table every day.
Benefits: the zipper table is built from changed data only, which significantly reduces the pressure on the business system as well as the load on the network, storage, and other systems. Most of the work is done in the warehouse itself.
Suppose, using the method above, we have 1 billion records and 2 million changes per day. Over 30 days, the number of records to store is 1 billion for the first day plus 2 million * 29 days, about 1.058 billion. Compared with storing one full slice per day, which is 30 days * 1 billion (or more as the table grows), this saves a great deal of storage space.
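The arithmetic can be checked with a few lines. The variable names are illustrative, and the sketch ignores table growth during the month.

```python
total_records = 1_000_000_000   # full table on day 1
daily_changes = 2_000_000       # changed records per day
days = 30

# Zipper approach: one full load plus 29 days of changes.
zipper_rows = total_records + daily_changes * (days - 1)

# Naive approach: one full copy of the table every day.
full_snapshot_rows = total_records * days

ratio = full_snapshot_rows / zipper_rows

print(zipper_rows)         # 1058000000
print(full_snapshot_rows)  # 30000000000
```

Under these numbers the daily full snapshot stores roughly 28 times as many rows as the zipper approach.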
For example, this is our full table for December 1.
The following records form the zipper table, which stores the daily changes of each record. With the initial table plus the zipper table, all data changes in the month can be analyzed by dimension, or a data table with a new time dimension can be generated.
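As a rough illustration of "initial table + zipper table", the sketch below replays dated change records over the initial full table to rebuild the state on any day of the month. The tuple layout, field names, and dates are assumptions for illustration, and deletes are treated as physical.

```python
from datetime import date


def table_as_of(initial, change_log, day):
    """Replay changes dated on or before `day` over the initial full table.

    initial:    dict mapping record id -> row dict (the December 1 table)
    change_log: list of (change_day, op, record_id, row), op in "I", "U", "D"
    """
    state = {record_id: dict(row) for record_id, row in initial.items()}
    for change_day, op, record_id, row in sorted(change_log, key=lambda c: c[0]):
        if change_day > day:
            break
        if op == "D":
            state.pop(record_id, None)   # physical delete
        else:                            # "I" or "U"
            state[record_id] = dict(row)
    return state


# Illustrative data only.
initial = {1: {"qty": 2}, 2: {"qty": 1}}
change_log = [
    (date(2024, 12, 2), "U", 1, {"qty": 3}),
    (date(2024, 12, 3), "D", 2, None),
    (date(2024, 12, 4), "I", 3, {"qty": 1}),
]
dec_3 = table_as_of(initial, change_log, date(2024, 12, 3))
```

Calling `table_as_of` for each day of the month is one way to produce the kind of table with an added time dimension mentioned above.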
That is what the editor has to share about the zipper table in big data design. If you have had similar questions, the analysis above may help. If you want to know more, you are welcome to follow the industry information channel.