In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly introduces the MySQL bitmap index how to solve the user portrait problem, has a certain reference value, interested friends can refer to, I hope you can learn a lot after reading this article, the following let the editor take you to understand it.
The original table of user portraits has 100 million records and more than 100 dimensions (more than 100 columns), such as age, gender, hobbies, whether there is a car, whether there is a house, etc.
Test environment 800w data, about 5G
The first problem to be solved is to select any N column among 100 columns, filter the query, and the execution time is less than one second. In fact, N is usually between 5 and 10.
That is, similar
Select * from Portrait form where Sex = 'male' and age between 20 and 30 and has a car = 'yes' and has a house =' yes' and married = 'no'
Problem 2: random aggregation of all data, execution time less than 5 seconds
such as
Select age, sex, count (*) from picture table group by age, sex
There is some trouble in solving this problem in the database, and the traditional optimization of indexing doesn't work anymore.
If you choose a query with more than 100 columns at will, it is impossible to build so many indexes in advance.
Look at the test data first.
CREATE TABLE `ohumhuaxiang` (`id` bigint (20) NOT NULL AUTO_INCREMENT, `umc_ id` bigint (20) DEFAULT NULL, `umc_ sex` varchar (20) DEFAULT NULL, `age` varchar (30) DEFAULT NULL, PRIMARY KEY (`id`))
To deal with this problem, I naturally want to simulate a bitmap.
There are several types of general portrait data
1. Numerical type
two。 Date Typ
3. Date time type
4. String type
Where date and string types can be used as discrete values
Date-time types can also be converted to date types and treated as discrete values.
The numerical type is troublesome, which requires human intervention to determine whether it is a discrete value, and if not, a range is also needed.
In short, all values should be mapped to discrete values
Then the first five data in the above figure are taken as an example to map the discrete values to bitmaps.
Male 0 0 0 1
Unknown 1 0 0 1 0
Women 0 1 1 0 0
A bigint is 8 bytes, and for rounding, I store the bit information of 60 records.
Then create the bit chart as follows
CREATE TABLE `bitmap20` (original table name of `table_ name` varchar (32) NOT NULL DEFAULT''comment' bit chart record', `comment name` varchar (32) NOT NULL DEFAULT 'comment' column name', `min_ id`int (11) DEFAULT NULL comment 'start ID', `max_ id`int (11) DEFAULT NULL comment' end ID', `gid`int (11) NOT NULL DEFAULT'0' comment 'grouping ID, each group of 1200 records', `grouped` varchar (32) NOT NULL DEFAULT 'comment' discrete value' `total` bigint (21) NOT NULL DEFAULT'0' comment 'Total', `c20` bigint (20) NOT NULL DEFAULT '0numbers, `c19` bigint (20) NOT NULL DEFAULT' 0numbers, `c18` bigint (20) NOT NULL DEFAULT '0numbers, `c17` bigint (20) NOT NULL DEFAULT' 0numbers, `c16` bigint (20) NOT NULL DEFAULT '0numbers, `c15` bigint (20) NOT NULL DEFAULT' 0numbers, `c14` bigint (20) NOT NULL DEFAULT '0numbers, `c13` bigint (20) NOT NULL DEFAULT' 0' `c12` bigint (20) NOT NULL DEFAULT '0mm, `c11` bigint (20) NOT NULL DEFAULT' 0mm, `c10` bigint (20) NOT NULL DEFAULT '0mm, `c9` bigint (20) NOT NULL DEFAULT' 0mm, `c8` bigint (20) NOT NULL DEFAULT '0mm, `c7` bigint (20) NOT NULL DEFAULT' 0mm, `c6` bigint (20) NOT NULL DEFAULT '0mm, `c5` bigint (20) NOT NULL DEFAULT' 0mm, `c4` bigint (20) NOT NULL DEFAULT '0mm, `c3` bigint (20) NOT NULL DEFAULT' 0' `c2` bigint (20) NOT NULL DEFAULT '0chart, `c1` bigint (20) NOT NULL DEFAULT' 0chart, PRIMARY KEY (`column_ name`, `gid`, `grouped`) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED comment 'bit chart'
C1-c20, a total of 20 fields of type bigint, each bigint records 60 bits of information.
That is, the bitchart stores 1200 bitmap information of the original records per row, and the bitchart enables compression.
Test environment
4C 8G memory (innodb buffer 2G) SSD hard disk
8 million original portrait data, occupying 5G of hard disk
Initialization bit chart
Insert into bitmap20 select 'table_name,' umc_sex' column_name, ((g1200-1) * 60) * 20 min_id, ((g1200-1) * 60) * 20 "1200 max_id, v2.*from (select g1200, grouped, sum (total) total, ifnull (max (case when abs (g1200-1) * 20-g60) = 20 then bitmap else null end), 0) c20 Ifnull (max (case when abs ((g1200-1) * 20-g60) = 19 then bitmap else null end), 0) c19, ifnull (max (case when abs ((g1200-1) * 20-g60) = 18 then bitmap else null end), 0) c18, ifnull (max (case when abs ((g1200-1) * 20-g60) = 17 then bitmap else null end), 0) c17, ifnull (max (case when abs ((g1200-1) * 20-g60) = 16 then bitmap else null end) 0) c16, ifnull (max (case when abs ((g1200-1) * 20-g60) = 15 then bitmap else null end), 0) c15, ifnull (max (case when abs ((g1200-1) * 20-g60) = 14 then bitmap else null end), 0) c14, ifnull (max (case when abs ((g1200-1) * 20-g60) = 13 then bitmap else null end), 0) c13 Ifnull (max (case when abs ((g1200-1) * 20-g60) = 12 then bitmap else null end), 0) c12, ifnull (max (case when abs ((g1200-1) * 20-g60) = 11 then bitmap else null end), 0) c11, ifnull (max (case when abs ((g1200-1) * 20-g60) = 10 then bitmap else null end), 0) c10, ifnull (max (case when abs ((g1200-1) * 20-g60) = 9 then bitmap else null end) 0) c9, ifnull (max (case when abs ((g1200-1) * 20-g60) = 8 then bitmap else null end), 0) c8, ifnull (max (case when abs ((g1200-1) * 20-g60) = 7 then bitmap else null end), 0) c7, ifnull (max (case when abs ((g1200-1) * 20-g60) = 6 then bitmap else null end), 0) c6 Ifnull (max (case when abs ((g1200-1) * 20-g60) = 5 then bitmap else null end), 0) c5, ifnull (max (case when abs ((g1200-1) * 20-g60) = 4 then bitmap else null end), 0) c4, ifnull (max (case when abs ((g1200-1) * 20-g60) = 3 then bitmap else null end), 0) c3, ifnull (max (case when abs ((g1200-1) * 20-g60) = 2 then bitmap else null end) 0) c2, ifnull (max (case when abs ((g1200-1) * 20-g60) = 1 then bitmap else null end), 0) C1 from (SELECT CEIL (id / 60) G60, CEIL (id / 1200) g1200, umc_sex grouped, COUNT (*) total BIT_OR (1
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.