This article explains how to import Hadoop logs into Hive and analyze them. The method introduced here is simple, fast, and practical, so interested readers may wish to follow along.
To be clear up front: what we import here is not a raw system log but data from a traffic-statistics system built by our engineers; the basic format, however, is similar to raw log data.
Create the data table:
-- Create an external table. The advantage of an external table is that when you
-- execute DROP TABLE, only the table metadata is deleted, not the underlying data.
CREATE EXTERNAL TABLE weblog (
  id string,
  ip string,
  url string,
  referrer string,
  urlflow string,
  useragent string,
  usercolordepth string,
  userlanguages string,
  userresolution string,
  username string)
PARTITIONED BY (year string, month string)  -- partition by year and month
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'  -- use "|" as the field delimiter
STORED AS TEXTFILE;
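As an aside, you can verify the external-table behavior mentioned above; a minimal sketch (the HDFS path shown in the comment is hypothetical; the files live wherever the table's location points):
-- Dropping the table removes only the Hive metadata; the data files stay on HDFS:
DROP TABLE weblog;
-- hadoop fs -ls /user/hive/warehouse/weblog   (hypothetical path; the files are still there)
-- Re-running the CREATE EXTERNAL TABLE statement above restores the table over the same files.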
Import data:
LOAD DATA LOCAL INPATH '/home/hadoop/20130206.txt' OVERWRITE INTO TABLE weblog PARTITION (year='2013', month='2');
Execute a query: SELECT COUNT(*) FROM weblog;
Results:
The top 10 URLs with the highest traffic:
SELECT url, COUNT(url) AS num FROM weblog GROUP BY url ORDER BY num DESC LIMIT 10;
In the test environment, only a small amount of data was imported, and the result was good.
1. Applying functions:
The parse_url function can extract the host name or query parameters from a URL, which makes it easier to analyze user behavior. For example, we can turn the top-10 most-visited URLs into the top-10 most-visited domains:
SELECT parse_url(url, 'HOST'), COUNT(parse_url(url, 'HOST')) AS num FROM weblog GROUP BY parse_url(url, 'HOST') ORDER BY num DESC LIMIT 10;
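parse_url can also pull a single query parameter out of a URL, which is handy for behavior analysis. A minimal sketch, assuming a hypothetical search parameter named 'keyword' in our URLs:
-- Count the most common values of a ?keyword=... parameter ('keyword' is a hypothetical name):
SELECT parse_url(url, 'QUERY', 'keyword') AS kw, COUNT(*) AS num
FROM weblog
GROUP BY parse_url(url, 'QUERY', 'keyword')
ORDER BY num DESC LIMIT 10;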
2. Editor KPI analysis:
This assumes our database already has a table that maps each article ID to its editor, which any typical CMS provides.
Create the table:
CREATE EXTERNAL TABLE articles (id string, title string, username string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'  -- use "|" as the field delimiter
STORED AS TEXTFILE;
Importing data was covered above, so I won't repeat it.
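For completeness, the load follows the same pattern as before; the articles table has no partitions, and the file path here is hypothetical:
LOAD DATA LOCAL INPATH '/home/hadoop/articles.txt' OVERWRITE INTO TABLE articles;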
The article IDs on our website are GUIDs, so we can use the regexp_extract function to match the ID value in each URL with a regular expression.
Get the visit count, title, and responsible editor for the top 10 most-visited pages of the day:
SELECT nid, num, title, username
FROM (SELECT nid, COUNT(nid) AS num
      FROM (SELECT regexp_extract(url, '([A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12})', 1) AS nid
            FROM weblog) t1
      GROUP BY nid ORDER BY num DESC) t2
JOIN articles ON (articles.id = concat('{', t2.nid, '}'))
LIMIT 10;
Running result:
Because this SQL statement is more complex than the previous ones, you can see that Hive splits it into several MapReduce jobs.
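If you want to inspect that breakdown yourself, Hive's EXPLAIN statement prints the job plan without running it; for example, applied to the earlier top-10 query:
EXPLAIN
SELECT url, COUNT(url) AS num FROM weblog GROUP BY url ORDER BY num DESC LIMIT 10;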
Other analyses follow the same pattern. For example, if a user deletes an item from the shopping cart, the URL must contain a product ID and an action like delete or remove, so we can work out which items users remove most often each day, and so on; a sketch of this follows below.
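A minimal sketch of that cart analysis, assuming (hypothetically) that a removal shows up in the log as a URL like /cart?action=remove&pid=123:
-- 'action' and 'pid' are hypothetical parameter names; adjust them to the real URL format.
SELECT parse_url(url, 'QUERY', 'pid') AS product_id, COUNT(*) AS removals
FROM weblog
WHERE parse_url(url, 'QUERY', 'action') = 'remove'
GROUP BY parse_url(url, 'QUERY', 'pid')
ORDER BY removals DESC
LIMIT 10;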
At this point, I believe you have a deeper understanding of how to import Hadoop logs. You might as well try it in practice.