This article mainly introduces what a zipper table is. Many people have doubts about zipper tables in their daily work, so this article collects the relevant material and organizes it into a simple, easy-to-follow walkthrough. I hope it helps answer the question "what is a zipper table?" Let's get started!
I. Introduction to the zipper table
Zipper table: a table that maintains both the historical states and the latest state of the data. A zipper table is essentially a set of snapshots taken at the zipper's granularity, optimized by removing the records that have not changed. With a zipper table, the record as it existed at any point in time can easily be restored.
II. Zipper table scenarios
When designing the data model of a data warehouse, we often encounter requirements like these:
Some fields in the table will be updated, such as a user's address, product descriptions, brand information, and so on.
We need to view a historical snapshot as of some point in time or over some period, for example the status of a product at a point in the past, or how many times a user's record was updated during a period.
The proportion and frequency of changes are small. For example, out of 10 million members, only about 100,000 are added or changed each day.
III. A commodity data case
Requirement: a commodity table with the following structure:
| Column | Type | Description |
|---|---|---|
| goods_id | varchar(50) | item number |
| goods_status | varchar(50) | item status (pending review, for sale, on sale, deleted) |
| createtime | varchar(50) | item creation date |
| modifytime | varchar(50) | item modification date |
The data for December 20, 2019 is as follows:
| goods_id | goods_status | createtime | modifytime |
|---|---|---|---|
| 001 | pending review | 2019-12-18 | 2019-12-20 |
| 002 | for sale | 2019-12-19 | 2019-12-20 |
| 003 | on sale | 2019-12-20 | 2019-12-20 |
| 004 | deleted | 2019-12-15 | 2019-12-20 |
The status of a commodity changes over time, and we need to preserve the history of all of these changes. How can this be achieved?
Plan 1: snapshot each day's data into the data warehouse (diagram)
The scheme works as follows:
Keep a full copy every day and synchronize all of the data into the warehouse (MySQL is used here to simulate the warehouse).
Many records that have not changed are stored over and over again.
December 20 (4 rows)
| goods_id | goods_status | createtime | modifytime |
|---|---|---|---|
| 001 | pending review | 2019-12-18 | 2019-12-20 |
| 002 | for sale | 2019-12-19 | 2019-12-20 |
| 003 | on sale | 2019-12-20 | 2019-12-20 |
| 004 | deleted | 2019-12-15 | 2019-12-20 |
December 21 (10 rows)
The December 20 snapshot (4 rows):

| goods_id | goods_status | createtime | modifytime |
|---|---|---|---|
| 001 | pending review | 2019-12-18 | 2019-12-20 |
| 002 | for sale | 2019-12-19 | 2019-12-20 |
| 003 | on sale | 2019-12-20 | 2019-12-20 |
| 004 | deleted | 2019-12-15 | 2019-12-20 |

The December 21 snapshot (6 rows):

| goods_id | goods_status | createtime | modifytime |
|---|---|---|---|
| 001 | for sale (changed from pending review) | 2019-12-18 | 2019-12-21 |
| 002 | for sale | 2019-12-19 | 2019-12-20 |
| 003 | on sale | 2019-12-20 | 2019-12-20 |
| 004 | deleted | 2019-12-15 | 2019-12-20 |
| 005 | pending review (new commodity) | 2019-12-21 | 2019-12-21 |
| 006 | pending review (new commodity) | 2019-12-21 | 2019-12-21 |
December 22 (18 rows)
The December 20 snapshot (4 rows):

| goods_id | goods_status | createtime | modifytime |
|---|---|---|---|
| 001 | pending review | 2019-12-18 | 2019-12-20 |
| 002 | for sale | 2019-12-19 | 2019-12-20 |
| 003 | on sale | 2019-12-20 | 2019-12-20 |
| 004 | deleted | 2019-12-15 | 2019-12-20 |

The December 21 snapshot (6 rows):

| goods_id | goods_status | createtime | modifytime |
|---|---|---|---|
| 001 | for sale | 2019-12-18 | 2019-12-21 |
| 002 | for sale | 2019-12-19 | 2019-12-20 |
| 003 | on sale | 2019-12-20 | 2019-12-20 |
| 004 | deleted | 2019-12-15 | 2019-12-20 |
| 005 | pending review | 2019-12-21 | 2019-12-21 |
| 006 | pending review | 2019-12-21 | 2019-12-21 |

The December 22 snapshot (8 rows):

| goods_id | goods_status | createtime | modifytime |
|---|---|---|---|
| 001 | for sale | 2019-12-18 | 2019-12-21 |
| 002 | for sale | 2019-12-19 | 2019-12-20 |
| 003 | deleted (changed from on sale) | 2019-12-20 | 2019-12-22 |
| 004 | deleted | 2019-12-15 | 2019-12-20 |
| 005 | pending review | 2019-12-21 | 2019-12-21 |
| 006 | deleted (changed from pending review) | 2019-12-21 | 2019-12-22 |
| 007 | pending review (new commodity) | 2019-12-22 | 2019-12-22 |
| 008 | pending review (new commodity) | 2019-12-22 | 2019-12-22 |
Plan 1: MySQL-to-MySQL code implementation
MySQL initialization
1. Create the zw database and the commodity table in MySQL as the raw data layer
-- create database
create database if not exists zw;
-- create commodity table
create table if not exists `zw`.`t_product`(
    goods_id varchar(50),     -- item number
    goods_status varchar(50), -- item status
    createtime varchar(50),   -- item creation time
    modifytime varchar(50)    -- item modification time
);
2. Create the ods and dw layers in MySQL to simulate the data warehouse
-- ods layer commodity table
create table if not exists `zw`.`ods_t_product`(
    goods_id varchar(50),     -- item number
    goods_status varchar(50), -- item status
    createtime varchar(50),   -- item creation time
    modifytime varchar(50),   -- item modification time
    cdat varchar(10)          -- simulated hive partition
) default character set = 'utf8';
-- dw layer commodity table
create table if not exists `zw`.`dw_t_product`(
    goods_id varchar(50),     -- item number
    goods_status varchar(50), -- item status
    createtime varchar(50),   -- item creation time
    modifytime varchar(50),   -- item modification time
    cdat varchar(10)          -- simulated hive partition
) default character set = 'utf8';
Import of the December 20 data
1. Import the December 20 raw data (4 rows)
insert into `zw`.`t_product`(goods_id, goods_status, createtime, modifytime)
values
('001', 'pending review', '2019-12-18', '2019-12-20'),
('002', 'for sale', '2019-12-19', '2019-12-20'),
('003', 'on sale', '2019-12-20', '2019-12-20'),
('004', 'deleted', '2019-12-15', '2019-12-20');
Note: because MySQL is used here to simulate the data warehouse, the data is simply loaded with insert into. In an enterprise, the warehouse is usually Hive, and the data is synchronized with kettle, sqoop, or datax.
# import from the raw data layer to the ods layer
insert into zw.ods_t_product select *, '20191220' from zw.t_product;
# synchronize from the ods layer to the dw layer
insert into zw.dw_t_product select * from zw.ods_t_product where cdat = '20191220';
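As a quick check (not part of the original flow, just a convenience), the December 20 partition of the dw layer can be queried the same way as on the following days:

# verify the December 20 partition in the dw layer
select * from zw.dw_t_product where cdat = '20191220';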
Incremental import of the December 21 data
1. Apply the December 21 changes to the raw data layer (the table now holds 6 rows)
UPDATE `zw`.`t_product` SET goods_status = 'for sale', modifytime = '2019-12-21' WHERE goods_id = '001';
INSERT INTO `zw`.`t_product`(goods_id, goods_status, createtime, modifytime)
VALUES
('005', 'pending review', '2019-12-21', '2019-12-21'),
('006', 'pending review', '2019-12-21', '2019-12-21');
2. Import the data into the ods layer and the dw layer
# import from the raw data layer to the ods layer
insert into zw.ods_t_product select *, '20191221' from zw.t_product;
# synchronize from the ods layer to the dw layer
insert into zw.dw_t_product select * from zw.ods_t_product where cdat = '20191221';
3. View the running results of the dw layer
select * from zw.dw_t_product where cdat = '20191221';
Incremental import of the December 22 data
1. Apply the December 22 changes to the raw data layer (the table now holds 8 rows)
UPDATE `zw`.`t_product` SET goods_status = 'deleted', modifytime = '2019-12-22' WHERE goods_id = '003';
UPDATE `zw`.`t_product` SET goods_status = 'deleted', modifytime = '2019-12-22' WHERE goods_id = '006';
INSERT INTO `zw`.`t_product`(goods_id, goods_status, createtime, modifytime)
VALUES
('007', 'pending review', '2019-12-22', '2019-12-22'),
('008', 'pending review', '2019-12-22', '2019-12-22');
2. Import the data into the ods layer and the dw layer
# import from the raw data layer to the ods layer
insert into zw.ods_t_product select *, '20191222' from zw.t_product;
# synchronize from the ods layer to the dw layer
insert into zw.dw_t_product select * from zw.ods_t_product where cdat = '20191222';
3. View the running results of the dw layer
select * from zw.dw_t_product where cdat = '20191222';
From the above case, we can see that:
Keeping a full copy of the table every day stores a large amount of unchanged information again and again. If the data volume is large, this is a great waste of storage (the partition row counts sketched below make this concrete).
The table can instead be designed as a zipper table, which both preserves the historical states of the data and saves as much storage space as possible.
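To make the duplication concrete, here is a small sketch that counts the rows stored per daily partition of the Plan 1 dw table (with the example data above it should return 4, 6, and 8 rows, 18 in total, even though only 8 distinct commodities exist):

-- rows stored per daily partition in the full-snapshot scheme
select cdat, count(*) as stored_rows
from zw.dw_t_product
group by cdat
order by cdat;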
Plan 2: use a zipper table to save historical snapshots (idea / diagram)
The zipper table stores no redundant data; a row only needs to be saved when it changes, which saves storage space compared with a full synchronization every day.
Historical snapshots can still be queried.
Two extra columns (dw_start_date, dw_end_date) are added to record the life cycle of each data row.
Data for merchandise zipper tables on December 20:
| goods_id | goods_status | createtime | modifytime | dw_start_date | dw_end_date |
|---|---|---|---|---|---|
| 001 | pending review | 2019-12-18 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 002 | for sale | 2019-12-19 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 003 | on sale | 2019-12-20 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 004 | deleted | 2019-12-15 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
The December 20 data is all new, so it is simply imported into the dw table.
dw_start_date is the start of a row's life cycle, that is, the date from which the row is valid (the effective date).
dw_end_date is the end of a row's life cycle, that is, the date on which the row stops being valid, exclusive (the expiration date).
A dw_end_date of 9999-12-31 means that the row is the latest, currently valid data; it will not expire until 9999-12-31 (the query sketch below builds on this).
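Because the current rows are exactly those with dw_end_date = 9999-12-31, the latest state of every commodity can be read straight from the zipper table. A minimal sketch, using the dw_t_product2 table created in the Plan 2 code later in this article:

-- latest (still valid) state of every commodity
select goods_id, goods_status, createtime, modifytime
from zw.dw_t_product2
where dw_end_date = '9999-12-31';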
Data for commodity zipper tables for December 21
| goods_id | goods_status | createtime | modifytime | dw_start_date | dw_end_date |
|---|---|---|---|---|---|
| 001 | pending review | 2019-12-18 | 2019-12-20 | 2019-12-20 | 2019-12-21 |
| 002 | for sale | 2019-12-19 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 003 | on sale | 2019-12-20 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 004 | deleted | 2019-12-15 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 001 (changed) | for sale | 2019-12-18 | 2019-12-21 | 2019-12-21 | 9999-12-31 |
| 005 (new) | pending review | 2019-12-21 | 2019-12-21 | 2019-12-21 | 9999-12-31 |
| 006 (new) | pending review | 2019-12-21 | 2019-12-21 | 2019-12-21 | 9999-12-31 |
Analysis of the December 21 data:
The zipper table stores no redundant data (rows that have not changed do not need to be synchronized).
The status of commodity 001 has changed (pending review → for sale). The dw_end_date of its original row must be changed from 9999-12-31 to 2019-12-21, meaning the pending-review state was valid from 2019-12-20 (inclusive) to 2019-12-21 (exclusive).
The new 'for sale' state of 001 is saved as a new row, with dw_start_date 2019-12-21 and dw_end_date 9999-12-31.
The new rows 005 and 006 get dw_start_date 2019-12-21 and dw_end_date 9999-12-31.
Data for commodity zipper tables on December 22nd
| goods_id | goods_status | createtime | modifytime | dw_start_date | dw_end_date |
|---|---|---|---|---|---|
| 001 | pending review | 2019-12-18 | 2019-12-20 | 2019-12-20 | 2019-12-21 |
| 002 | for sale | 2019-12-19 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 003 | on sale | 2019-12-20 | 2019-12-20 | 2019-12-20 | 2019-12-22 |
| 004 | deleted | 2019-12-15 | 2019-12-20 | 2019-12-20 | 9999-12-31 |
| 001 | for sale | 2019-12-18 | 2019-12-21 | 2019-12-21 | 9999-12-31 |
| 005 | pending review | 2019-12-21 | 2019-12-21 | 2019-12-21 | 9999-12-31 |
| 006 | pending review | 2019-12-21 | 2019-12-21 | 2019-12-21 | 2019-12-22 |
| 003 (changed) | deleted | 2019-12-20 | 2019-12-22 | 2019-12-22 | 9999-12-31 |
| 006 (changed) | deleted | 2019-12-21 | 2019-12-22 | 2019-12-22 | 9999-12-31 |
| 007 (new) | pending review | 2019-12-22 | 2019-12-22 | 2019-12-22 | 9999-12-31 |
| 008 (new) | pending review | 2019-12-22 | 2019-12-22 | 2019-12-22 | 9999-12-31 |
Analysis of the December 22 data:
The statuses of commodities 003 and 006 have changed (003: on sale → deleted; 006: pending review → deleted). The dw_end_date of their original rows must be changed from 9999-12-31 to 2019-12-22; for example, the on-sale state of 003 was valid from 2019-12-20 (inclusive) to 2019-12-22 (exclusive).
The new 'deleted' states of 003 and 006 are saved as new rows, with dw_start_date 2019-12-22 and dw_end_date 9999-12-31.
The new rows 007 and 008 get dw_start_date 2019-12-22 and dw_end_date 9999-12-31.
Plan 2: zipper table code implementation
Operation procedure:
Add the two life-cycle columns (dw_start_date, dw_end_date) to the dw layer table.
Synchronize only the data modified that day to the ods layer.
Apply the zipper table algorithm to recalculate dw_end_date.
The resulting zipper table data = the day's latest data UNION ALL the historical data.
Code implementation:
1. Create the zw database and the commodity table in MySQL as the raw data layer
-- create database
create database if not exists zw;
-- create commodity table
create table if not exists `zw`.`t_product_2`(
    goods_id varchar(50),     -- item number
    goods_status varchar(50), -- item status
    createtime varchar(50),   -- item creation time
    modifytime varchar(50)    -- item modification time
) default character set = 'utf8';
2. Create the ods and dw layers in MySQL to simulate the data warehouse
-- ods layer commodity table
create table if not exists `zw`.`ods_t_product2`(
    goods_id varchar(50),     -- item number
    goods_status varchar(50), -- item status
    createtime varchar(50),   -- item creation time
    modifytime varchar(50),   -- item modification time
    cdat varchar(10)          -- simulated hive partition
) default character set = 'utf8';
-- dw layer commodity table
create table if not exists `zw`.`dw_t_product2`(
    goods_id varchar(50),      -- item number
    goods_status varchar(50),  -- item status
    createtime varchar(50),    -- item creation time
    modifytime varchar(50),    -- item modification time
    dw_start_date varchar(12), -- effective date
    dw_end_date varchar(12),   -- expiration date
    cdat varchar(10)           -- simulated hive partition
) default character set = 'utf8';
Full import of the December 20, 2019 data
1. Import the December 20 data into the raw data layer (4 rows)
insert into `zw`.`t_product_2`(goods_id, goods_status, createtime, modifytime)
values
('001', 'pending review', '2019-12-18', '2019-12-20'),
('002', 'for sale', '2019-12-19', '2019-12-20'),
('003', 'on sale', '2019-12-20', '2019-12-20'),
('004', 'deleted', '2019-12-15', '2019-12-20');
2. Import the data into the ods layer of the warehouse
insert into zw.ods_t_product2 select *, '20191220' from zw.t_product_2 where modifytime >= '2019-12-20';
3. Import data from the ods tier to the dw tier
insert into zw.dw_t_product2
select goods_id, goods_status, createtime, modifytime, modifytime, '9999-12-31', cdat
from zw.ods_t_product2
where cdat = '20191220';
Incremental import of the December 21, 2019 data
1. Apply the December 21 changes to the raw data layer (the table now holds 6 rows)
UPDATE `zw`.`t_product_2` SET goods_status = 'for sale', modifytime = '2019-12-21' WHERE goods_id = '001';
INSERT INTO `zw`.`t_product_2`(goods_id, goods_status, createtime, modifytime)
VALUES
('005', 'pending review', '2019-12-21', '2019-12-21'),
('006', 'pending review', '2019-12-21', '2019-12-21');
2. Synchronize the raw data layer to the ods layer
insert into zw.ods_t_product2 select *, '20191221' from zw.t_product_2 where modifytime >= '2019-12-21';
3. Merge the ods layer into the dw layer, recalculating dw_end_date
Note: the SQL statement below only produces the result set; I do not write it back (overwrite) into the dw table, but this does not affect the result. The procedure for December 22 is the same as for the 21st, so it is not repeated.
select t1.goods_id,
       t1.goods_status,
       t1.createtime,
       t1.modifytime,
       t1.dw_start_date,
       case when (t2.goods_id is not null and t1.dw_end_date > '2019-12-21')
            then '2019-12-21'
            else t1.dw_end_date
       end as dw_end_date,
       t1.cdat
from zw.dw_t_product2 t1
left join (select * from zw.ods_t_product2 where cdat = '20191221') t2
       on t1.goods_id = t2.goods_id
union
select goods_id, goods_status, createtime, modifytime,
       modifytime, '9999-12-31', cdat
from zw.ods_t_product2
where cdat = '20191221';
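MySQL has no insert overwrite like Hive, so one possible way to actually write the recalculated result back, sketched here under the assumption that rebuilding the whole table is acceptable at this data volume (the staging table name tmp_dw_t_product2 is made up for this example), is to stage the query result and then reload dw_t_product2:

-- staging table with the same structure as the dw zipper table (hypothetical name)
create table zw.tmp_dw_t_product2 like zw.dw_t_product2;

-- stage the recalculated zipper data (same query as above)
insert into zw.tmp_dw_t_product2
select t1.goods_id, t1.goods_status, t1.createtime, t1.modifytime, t1.dw_start_date,
       case when (t2.goods_id is not null and t1.dw_end_date > '2019-12-21')
            then '2019-12-21' else t1.dw_end_date end as dw_end_date,
       t1.cdat
from zw.dw_t_product2 t1
left join (select * from zw.ods_t_product2 where cdat = '20191221') t2
       on t1.goods_id = t2.goods_id
union
select goods_id, goods_status, createtime, modifytime,
       modifytime, '9999-12-31', cdat
from zw.ods_t_product2
where cdat = '20191221';

-- replace the dw table contents with the recalculated rows
truncate table zw.dw_t_product2;
insert into zw.dw_t_product2 select * from zw.tmp_dw_t_product2;
drop table zw.tmp_dw_t_product2;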
Query result
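Once the recalculated rows are in place, any historical snapshot can be restored from the zipper table, which is the pay-off promised at the start. A minimal sketch that rebuilds the December 20 state (the date literal is just an example):

-- restore the commodity snapshot as of 2019-12-20
select goods_id, goods_status, createtime, modifytime
from zw.dw_t_product2
where dw_start_date <= '2019-12-20'
  and dw_end_date > '2019-12-20';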
This concludes the study of what a zipper table is. I hope it has answered your questions; pairing the theory with hands-on practice is the best way to learn, so go and try it!