In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
Editor to share with you how HIVE counts the most popular TOP3 products in various regions. I hope you will get something after reading this article. Let's discuss it together.
Environment:
Hadoop-2.6.0-cdh6.7.0
Hive-1.1.0-cdh6.7.0
Sqoop-1.4.6-cdh6.7.0
MySQL5.6.39
Demand: HIVE counts the most popular TOP3 products in each region, and exports the statistical results to MySQL
There are city_info city information table and product_info commodity information table in MySQL.
There are user_click user behavior logs in HIVE, which are partitioned by date
I. MySQL database building, table building and initialization data
1.1 MySQL create ruozedata database:
CREATE DATABASE ruozedata
1.2 create a city_info table
DROP TABLE if exists city_info
CREATE TABLE `city_ info` (
`city_ id` int (11) DEFAULT NULL
`city_ name` varchar (255) DEFAULT NULL
`area` varchar (255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Initialize city_ info table data
Insert into 'city_ info` (' city_ id`, 'city_ name`, `area`) values (1), (2), (2), (3), (3), (3), (4), (5), (6), (6), (7), (7), (8), (9), (10, and 10) 'NE')
1.3Create product_info table
DROP TABLE if exists product_info
CREATE TABLE `product_ info` (
`product_ id` int (11) DEFAULT NULL
`product_ name` varchar (255) DEFAULT NULL
`extend_ info` varchar (255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Initialize product_ info table data
Insert into product_info (product_id,product_name,extend_info) values (1 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (2 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (3) "product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (4 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (5, product_status, 5, production, 5)
Insert into product_info (product_id,product_name,extend_info) values (6) "product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (7 recordings, product7 copies,'{product_status ": 1}')
Insert into product_info (product_id,product_name,extend_info) values (8, product_status, 8, product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (9 recordings production 9 minutes'{"product_status": 0}')
Insert into product_info (product_id,product_name,extend_info) values (10 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (11 recordings, product11 copies,'{product_status ": 0}')
Insert into product_info (product_id,product_name,extend_info) values (12 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (13dint product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (14, product_status, 14, product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (15, product_status, 15, production, 15: {"product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (16 recordings, product16 copies,'{"product_status": 0}')
Insert into product_info (product_id,product_name,extend_info) values (17 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (18maths, product18packs,'{"product_status": 0}')
Insert into product_info (product_id,product_name,extend_info) values (19recast, production, 19pm,'{product_status ": 1}')
Insert into product_info (product_id,product_name,extend_info) values (20 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (21 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (22 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (23mai product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (24recordings, product24copies,'{"product_status": 0}')
Insert into product_info (product_id,product_name,extend_info) values (25, "product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (26maths, product26th,'{"product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (27 recordings, product27 copies,'{product_status ": 0}')
Insert into product_info (product_id,product_name,extend_info) values (28 recordings, product28,'{"product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (29 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (30 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (31 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (32 product_status product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (33 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (34 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (35 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (36 cinematic products 36 minutes,'{"product_status": 0}')
Insert into product_info (product_id,product_name,extend_info) values (37 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (38 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (39 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (40 recordings, product40 copies,'{"product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (41 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (42 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (43 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (44 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (45 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (46 cinematic products 46 minutes'{"product_status": 1}')
Insert into product_info (product_id,product_name,extend_info) values (47 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (48 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (49 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (50, product_status, 50, production, 50: 1}')
Insert into product_info (product_id,product_name,extend_info) values (51 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (52 product_status product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (53 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (54 product_status product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (55 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (56 product_status product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (57 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (58 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (59 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (60 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (61 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (62 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (63 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (64 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (65 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (66 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (67 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (68mai / product68pm /'{"product_status": 0}')
Insert into product_info (product_id,product_name,extend_info) values (69 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (70 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (71 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (72 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (73 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (74 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (75 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (76 cinematic products 76 minutes'{"product_status": 0}')
Insert into product_info (product_id,product_name,extend_info) values (77 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (78 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (79 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (80 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (81 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (82 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (83 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (84 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (85 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (86 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (87 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (88 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (89 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (90th product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (91 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (92 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (93 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (94 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (95 product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (96 product_status product_status: 0}')
Insert into product_info (product_id,product_name,extend_info) values (97 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (98 product_status: 1}')
Insert into product_info (product_id,product_name,extend_info) values (99 product_status / product 99: 0}')
Insert into product_info (product_id,product_name,extend_info) values (100 product_status: 1}')
1.4 create MySQL database product statistical table product_stat
Drop table if exists `product_ stat`
CREATE TABLE `product_ stat` (
`product_ id` int (11) DEFAULT NULL
, `product_ name` varchar (255) DEFAULT NULL
, `area` varchar (255) DEFAULT NULL
, `click_ count` int (11)
, `rank` int
, `days` varchar (20)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Use sqoop to import data from MySQL's city_ info table and product_info table into hive (including creating tables)
2.1 check the tables in hive before you operate
Hive > show databases
OK
Default
Hive2
Hive2_ruozedata
Ruozedata
Sqoophive
Time taken: 0.06 seconds, Fetched: 5 row (s)
Hive > use ruozedata
OK
Time taken: 0.069 seconds
Hive > show tables
OK
Emp_hive
Gw_test
Hive_wc
Rating_json
Ruozedata_emp
Ruozedata_emp_managed
Ruozedata_emp_partition
Ruozedata_person
Time taken: 0.061 seconds, Fetched: 8 row (s)
Hive >
2.2 Import the city_ info table (import data from MySQL to hive's ruozedata.city_info table):
Sqoop import\
-- connect jdbc:mysql://localhost:33066/ruozedata\
-- username root\
-- password root\
-- table city_info-m 1\
-- mapreduce-job-name FromMySQLToHive\
-- delete-target-dir\
-- create-hive-table\
-- hive-table ruozedata.city_info\
-hive-import-hive-overwrite
2.3 Import the product_ info table (import data from MySQL to hive's ruozedata.product_info table):
Sqoop import\
-- connect jdbc:mysql://localhost:33066/ruozedata\
-- username root\
-- password root\
-- table product_info-m 1\
-- mapreduce-job-name FromMySQLToHive\
-- delete-target-dir\
-- create-hive-table\
-- hive-table ruozedata.product_info\
-hive-import-hive-overwrite
2.4After operation, look at the table in hive:
Hive > show tables
OK
City_info
Emp_hive
Gw_test
Hive_wc
Product_info
Rating_json
Ruozedata_emp
Ruozedata_emp_managed
Ruozedata_emp_partition
Ruozedata_person
Time taken: 0.061 seconds, Fetched: 10 row (s)
Hive > select * from city_info
OK
1 BEIJING NC
2 SHANGHAI EC
3 NANJING EC
4 GUANGZHOU SC
5 SANYA SC
6 WUHAN CC
7 CHANGSHA CC
8 XIAN NW
9 CHENGDU SW
10 HAERBIN NE
Time taken: 0.554 seconds, Fetched: 10 row (s)
Hive > select * from product_info
OK
1 product1 {"product_status": 1}
2 product2 {"product_status": 1}
3 product3 {"product_status": 1}
4 product4 {"product_status": 1}
5 product5 {"product_status": 1}
6 product6 {"product_status": 1}
7 product7 {"product_status": 1}
8 product8 {"product_status": 1}
9 product9 {"product_status": 0}
10 product10 {"product_status": 1}
11 product11 {"product_status": 0}
12 product12 {"product_status": 0}
13 product13 {"product_status": 0}
14 product14 {"product_status": 0}
15 product15 {"product_status": 1}
16 product16 {"product_status": 0}
17 product17 {"product_status": 1}
18 product18 {"product_status": 0}
19 product19 {"product_status": 1}
20 product20 {"product_status": 1}
21 product21 {"product_status": 0}
22 product22 {"product_status": 0}
23 product23 {"product_status": 0}
24 product24 {"product_status": 0}
25 product25 {"product_status": 1}
26 product26 {"product_status": 1}
27 product27 {"product_status": 0}
28 product28 {"product_status": 1}
29 product29 {"product_status": 0}
30 product30 {"product_status": 0}
31 product31 {"product_status": 0}
32 product32 {"product_status": 0}
33 product33 {"product_status": 1}
34 product34 {"product_status": 1}
35 product35 {"product_status": 0}
36 product36 {"product_status": 0}
37 product37 {"product_status": 1}
38 product38 {"product_status": 0}
39 product39 {"product_status": 0}
40 product40 {"product_status": 1}
41 product41 {"product_status": 1}
42 product42 {"product_status": 1}
43 product43 {"product_status": 1}
44 product44 {"product_status": 0}
45 product45 {"product_status": 1}
46 product46 {"product_status": 1}
47 product47 {"product_status": 0}
48 product48 {"product_status": 1}
49 product49 {"product_status": 1}
50 product50 {"product_status": 1}
51 product51 {"product_status": 1}
52 product52 {"product_status": 0}
53 product53 {"product_status": 0}
54 product54 {"product_status": 1}
55 product55 {"product_status": 0}
56 product56 {"product_status": 0}
57 product57 {"product_status": 1}
58 product58 {"product_status": 1}
59 product59 {"product_status": 1}
60 product60 {"product_status": 1}
61 product61 {"product_status": 0}
62 product62 {"product_status": 1}
63 product63 {"product_status": 1}
64 product64 {"product_status": 0}
65 product65 {"product_status": 0}
66 product66 {"product_status": 1}
67 product67 {"product_status": 1}
68 product68 {"product_status": 0}
69 product69 {"product_status": 1}
70 product70 {"product_status": 0}
71 product71 {"product_status": 0}
72 product72 {"product_status": 0}
73 product73 {"product_status": 1}
74 product74 {"product_status": 0}
75 product75 {"product_status": 1}
76 product76 {"product_status": 0}
77 product77 {"product_status": 0}
78 product78 {"product_status": 1}
79 product79 {"product_status": 0}
80 product80 {"product_status": 0}
81 product81 {"product_status": 0}
82 product82 {"product_status": 1}
83 product83 {"product_status": 1}
84 product84 {"product_status": 1}
85 product85 {"product_status": 0}
86 product86 {"product_status": 1}
87 product87 {"product_status": 1}
88 product88 {"product_status": 1}
89 product89 {"product_status": 1}
90 product90 {"product_status": 1}
91 product91 {"product_status": 1}
92 product92 {"product_status": 0}
93 product93 {"product_status": 0}
94 product94 {"product_status": 1}
95 product95 {"product_status": 0}
96 product96 {"product_status": 0}
97 product97 {"product_status": 1}
98 product98 {"product_status": 1}
99 product99 {"product_status": 0}
100 product100 {"product_status": 1}
Time taken: 0.141 seconds, Fetched: 100row (s)
Hive >
Sqoop successfully imported data from MySQL's city_ info table and product_info table into hive (including creating tables)
Third, operate the table and data in hive, and finally make statistics.
3.1Create user behavior log table user_click partition table
Create table user_click
(
User_id int
, session_id string
, action_time string
, city_id int
, product_id int
)
Partitioned by (date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY','
Upload user_click.txt data file and import the data into the table.
Upload the user_click.txt data file to the / home/hadoop/data directory.
[hadoop@hadoop002 data] $ll
Total 62912
-rw-rw-r--. 1 hadoop hadoop 652 Jun 4 16:34 emp.txt
-rw-r--r--. 1 hadoop hadoop 84 Jun 7 09:53 hive_row_number.txt
-rw-rw-r--. 1 hadoop hadoop 34 Jun 11 15:17 hive_wc.txt
-rw-r--r--. 1 hadoop hadoop 63602280 Jun 7 09:54 rating.json
-rw-rw-r--. 1 hadoop hadoop 67 Jun 6 18:30 student.txt
-rwxrwxrwx. 1 hadoop hadoop 725264 Jun 9 21:28 user_click.txt
[hadoop@hadoop002 data] $pwd
/ home/hadoop/data
Import data into the user_ Click table:
LOAD DATA LOCAL INPATH'/ home/hadoop/data/user_click.txt' OVERWRITE INTO TABLE user_click PARTITION (date='2018-06-20')
Create a product statistics partition table product_stat in hive to write statistical results
Create table product_stat
(
Product_id int
, product_name string
, area string
, click_count int
, rank int
, days string
)
Partitioned by (date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY'\ t'
3.4 count the most popular TOP3 products in each region and write them into the statistical table product_stat
Insert overwrite table product_stat partition (date='2018-06-20')
Select
T.product_id
, t.product_name
, t.area
, t.click_count
, t.rank
, '2018-06-20' as days
From
(
Select
Ci.area as area
, uc.product_id as product_id
, pd.product_name as product_name
, count (uc.product_id) as click_count
, (row_number () over (partition by ci.area order by count (uc.product_id) desc)) as rank
From city_info ci
Left join user_click uc on uc.city_id = ci.city_id and uc.date='2018-06-20'
Left join product_info pd on pd.product_id = uc.product_id
Group by ci.area,uc.product_id,pd.product_name
) t where t.rank select * from product_stat where date='2018-06-20'
OK
7 product7 CC 39 1 2018-06-20 2018-06-20
26 product26 CC 39 2 2018-06-20 2018-06-20
70 product70 CC 38 3 2018-06-20 2018-06-20
4 product4 EC 40 1 2018-06-20 2018-06-20
96 product96 EC 32 2 2018-06-20 2018-06-20
99 product99 EC 31 3 2018-06-20 2018-06-20
9 product9 NC 16 1 2018-06-20 2018-06-20
40 product40 NC 16 2 2018-06-20 2018-06-20
94 product94 NC 13 3 2018-06-20 2018-06-20
NULL NULL NE 01 2018-06-20 2018-06-20
67 product67 NW 201 2018-06-20 2018-06-20
56 product56 NW 20 2 2018-06-20 2018-06-20
48 product48 NW 19 3 2018-06-20 2018-06-20
38 product38 SC 35 1 2018-06-20 2018-06-20
88 product88 SC 34 2 2018-06-20 2018-06-20
33 product33 SC 34 3 2018-06-20 2018-06-20
16 product16 SW 2018-06-20 2018-06-20
95 product95 SW 19 2 2018-06-20 2018-06-20
60 product60 SW 19 3 2018-06-20 2018-06-20
Time taken: 0.345 seconds, Fetched: 19 row (s)
Hive >
4. Use sqoop to export the statistical table product_stat data of hive to the statistical table product_stat of MySQL:
. / sqoop export\
-- connect jdbc:mysql://localhost:33066/ruozedata\
-- username root\
-- password root\
-- table product_stat\
-export-dir / ruozedata_03/product_stat/date=2018-06-20\
-- input-fields-terminated-by'\ t'\
-- input-null-string'--input-null-non-string 0\
-columns "product_id,product_name,area,click_count,rank,days"\
-- update-key product_id,area-- update-mode allowinsert\
Go to MySQL to check whether the statistics have been successfully exported:
Mysql > select * from product_stat where days='2018-06-20 'order by area,rank
+-+ +
| | product_id | product_name | area | click_count | rank | days | |
+-+ +
| | 7 | product7 | CC | 39 | 1 | 2018-06-20 | |
| | 26 | product26 | CC | 39 | 2 | 2018-06-20 | |
| | 70 | product70 | CC | 38 | 3 | 2018-06-20 | |
| | 4 | product4 | EC | 40 | 1 | 2018-06-20 | |
| | 96 | product96 | EC | 32 | 2 | 2018-06-20 | |
| | 99 | product99 | EC | 31 | 3 | 2018-06-20 | |
| | 9 | product9 | NC | 16 | 1 | 2018-06-20 | |
| | 40 | product40 | NC | 16 | 2 | 2018-06-20 | |
| | 94 | product94 | NC | 13 | 3 | 2018-06-20 | |
| | 67 | product67 | NW | 20 | 1 | 2018-06-20 | |
| | 56 | product56 | NW | 20 | 2 | 2018-06-20 | |
| | 48 | product48 | NW | 19 | 3 | 2018-06-20 | |
| | 38 | product38 | SC | 35 | 1 | 2018-06-20 | |
| | 88 | product88 | SC | 34 | 2 | 2018-06-20 | |
| | 33 | product33 | SC | 34 | 3 | 2018-06-20 | |
| | 16 | product16 | SW | 20 | 1 | 2018-06-20 | |
| | 95 | product95 | SW | 19 | 2 | 2018-06-20 | |
| | 60 | product60 | SW | 19 | 3 | 2018-06-20 | |
+-+ +
18 rows in set (0.00 sec)
After reading this article, I believe you have a certain understanding of "how HIVE counts the most popular TOP3 products in various regions". If you want to know more about it, you are welcome to follow the industry information channel. Thank you for reading!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.