In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
In this issue, the editor will bring you a table about how to merge in mysql. The article is rich in content and analyzes and narrates it from a professional point of view. I hope you can get something after reading this article.
De-duplication of mysql merge table
Goal:
Existing tables an and b merge the data from the two tables and rejoin them into table c. The amount of data in tables an and b is about 20 million.
Basic situation
Operating system version: CentOS release 5.664 bit
Operating system memory: 8G
Version: 5.1.56-community 64 bit
Database initialization parameters: default
Database tables and data volume
Table a: www.2cto.com
> desc a2kw
+-+ +
| | Field | Type | Null | Key | Default | Extra | |
+-+ +
| | C1 | varchar (20) | YES | MUL | NULL |
| | c2 | varchar (30) | YES | | NULL | |
| | c3 | varchar (12) | YES | | NULL |
| | c4 | varchar (20) | YES | | NULL |
+-+ +
4 rows in set (0.00 sec)
Table b
Mysql > desc b2kw
+-+ +
| | Field | Type | Null | Key | Default | Extra | |
+-+ +
| | C1 | varchar (20) | YES | | NULL |
| | c2 | varchar (30) | YES | | NULL | |
| | c3 | varchar (12) | YES | | NULL |
| | c4 | varchar (20) | YES | | NULL |
+-+ +
4 rows in set (0.00 sec)
The data in tables an and b are summarized as follows
Mysql > select * from a2kw limit 10
+-+
| | C1 | c2 | c3 | c4 |
+-+
| | 662164461 | 131545534 | TOM0 | 20120520 | |
| | 226662142 | 605685564 | TOM0 | 20120516 | |
| | 527008225 | 172557633 | TOM0 | 20120514 | |
| | 574408183 | 350897450 | TOM0 | 20120510 | |
| | 781619324 | 583989494 | TOM0 | 20120510 | |
| | 158872754 | 775676430 | TOM0 | 20120512 | |
| | 815875622 | 631631832 | TOM0 | 20120514 | |
| | 905943640 | 477433083 | TOM0 | 20120514 | |
| | 660790641 | 616774715 | TOM0 | 20120512 | |
| | 999083595 | 953186525 | TOM0 | 20120513 | |
+-+
10 rows in set (0.01 sec)
Basic steps www.2cto.com
1. Create an index on Table B
Mysql > select count (*) from b2kw
+-+
| | count (*) |
+-+
| | 20000002 |
+-+
1 row in set (0.00 sec)
Mysql > create index ind_b2kw_c1 on b2kw (C1)
Query OK, 20000002 rows affected (1 min 2.94 sec)
Records: 20000002 Duplicates: 0 Warnings: 0
Data volume: 20000002, time: 1 min 2.94 sec
2. Insert an and b into the intermediate table temp respectively
Create an intermediate table
Mysql > create table temp select * from c2kw where 1: 2
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Insert data
Mysql > insert into temp select * from a2kw
Query OK, 20000002 rows affected (13.23 sec)
Records: 20000002 Duplicates: 0 Warnings: 0
Mysql > insert into temp select * from b2kw
Query OK, 20000002 rows affected (13.27 sec)
Records: 20000002 Duplicates: 0 Warnings: 0
Www.2cto.com
Mysql > select count (*) from temp
+-+
| | count (*) |
+-+
| | 40000004 |
+-+
1 row in set (0.00 sec)
Data volume: 40000004, time: 26.50 sec
3. Temp establishes a federated index to force the index to remove duplicate data.
Mysql > create index ind_temp_c123 on temp (C1, c2, c3)
Query OK, 40000004 rows affected (3 min 43.87 sec)
Records: 40000004 Duplicates: 0 Warnings: 0
View the execution plan
Mysql > explain select C1 ~ c2 ~ c3 ~ max (c4) from temp FORCE INDEX
(ind_temp_c123) group by C1, c2, and c3
+-
-+
| | id | select_type | table | type | possible_keys | key |
| | key_len | ref | rows | Extra | |
+-
-- +
| | 1 | SIMPLE | temp | index | NULL | ind_temp_c123 | 71 |
| | NULL | 40000004 |
+-+ www.2cto.com-+-+-
-+
1 row in set (0.05sec)
Mysql > insert into c2kw select C1 ~ c2 ~ c3 ~ max (c4) from temp
FORCE INDEX (ind_temp_c123) group by C1, c2pm c3
Query OK, 20000004 rows affected (2 min 0.85 sec)
Records: 20000004 Duplicates: 0 Warnings: 0
The actual cost is about: 6 min
4. Delete the intermediate table
Mysql > drop table temp
Query OK, 0 rows affected (0.99 sec)
The actual approximate cost is: 1 sec
5. Set up c index
Mysql > create index ind_c2kw_c1 on c2kw (C1)
Query OK, 20000004 rows affected (49.74 sec)
Records: 20000004 Duplicates: 0 Warnings: 0
Mysql > create index ind_c2kw_c2 on c2kw (c2)
Query OK, 20000004 rows affected (1 min 47.20 sec)
Records: 20000004 Duplicates: 0 Warnings: 0
Mysql > create index ind_c2kw_c3 on c2kw (c3)
Query OK, 20000004 rows affected (2 min 42.02 sec)
Records: 20000004 Duplicates: 0 Warnings: 0
The actual cost is about 5 minutes www.2cto.com
6. Clear tables an and b
Mysql > truncate table a2kw
Query OK, 0 rows affected (1.15 sec)
Mysql > truncate table b2kw
Query OK, 0 rows affected (1.34 sec)
The above is the mysql merge table shared by the editor. If you happen to have similar doubts, you might as well refer to the above analysis to understand. If you want to know more about it, you are welcome to follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.