2025-10-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article walks through a simple application of Spark in a log analysis system, explaining each step in detail, in the hope of giving readers who face this problem a simple and workable approach.
1. Download and run Spark
wget http://apache.fayea.com/apache-mirror/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz
I downloaded version 1.0.0 here. Because we are only testing Spark, there is no need to configure a Spark cluster; just extract the archive and enter the bin/ folder.
Spark supports Scala, Java, and Python.
For the Scala console, run ./spark-shell; for the Python console, run ./pyspark. (Java has no interactive console; Java programs are compiled and then submitted.) Let's take Python as an example:
That is all it takes, with no configuration at all. A good start is half the battle, so let's move on.
2. Simple application
To read a text file, taking the earlier log file as an example:
>>> file = sc.textFile("/home/hadoop/20130207.txt")
PS: Python is case-sensitive. Use the full path; otherwise the path is resolved relative to the directory where you ran the command. Encoding support has long been a sore point in Python 2.7.x, so try to keep the source file in one consistent encoding, such as "utf-8".
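As a minimal sketch of the encoding advice above, a log file can be rewritten as UTF-8 before it is loaded. The file names and the assumed source encoding (GBK) are hypothetical and only serve the illustration; this is plain Python, not part of Spark.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical paths; a temporary directory keeps the sketch self-contained.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "source.log")
dst_path = os.path.join(tmp_dir, "source.utf8.log")

# Create a small sample file in an assumed non-UTF-8 encoding (GBK).
with io.open(src_path, "w", encoding="gbk") as f:
    f.write(u"/index.html|首页\n")

# Re-encode line by line: decode with the source encoding, write as UTF-8.
with io.open(src_path, "r", encoding="gbk") as src:
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)

# Read the converted file back to confirm the text survived.
with io.open(dst_path, "r", encoding="utf-8") as f:
    converted = f.read()
print(converted)
```

The converted file can then be passed to sc.textFile by its full path.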
Display the total number of lines:
>>> file.count()
265063
Display the first line:
>>> file.first()
Count all URL requests made from IE8:
>>> file.filter(lambda line: "MSIE 8.0" in line).count()
98670
PS: lambda is the syntax for an anonymous function. filter passes each whole line to that function, so the code above goes through every line and counts the lines that contain the string "MSIE 8.0".
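The filter-then-count step can be tried without Spark. The sketch below applies the same predicate to an ordinary Python list; the sample lines are made up for illustration and are not taken from the real log.

```python
# Hypothetical sample lines standing in for the real log file.
lines = [
    "/index.html|Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",
    "/about.html|Mozilla/5.0 (Windows NT 6.1) Chrome/24.0",
    "/index.html|Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)",
]

# The same predicate that is passed to filter in the Spark example.
is_ie8 = lambda line: "MSIE 8.0" in line

# Keep only the matching lines, then count them.
ie8_lines = [line for line in lines if is_ie8(line)]
ie8_count = len(ie8_lines)
print(ie8_count)
```

Spark evaluates the same predicate, but spreads the lines across partitions and adds up the per-partition counts.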
Get the number of fields in the row with the largest number of fields:
>>> file.map(lambda line: len(line.split("|"))).reduce(lambda a, b: a if a > b else b)
PS: map(lambda line: len(line.split("|"))) splits each line on "|" into a list and returns the number of items in that list.
Python's built-in reduce takes a binary function func() and applies it cumulatively to a data set (list, tuple, etc.): it first applies func() to the first and second items, then applies func() to that result and the third item, and so on until a single result remains. Spark's reduce on an RDD follows the same idea, but because the data is combined partition by partition, the function should be commutative and associative.
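To see the map and reduce steps in isolation, here is a small plain-Python sketch using functools.reduce. The sample lines are hypothetical; the two lambdas are the ones from the Spark example.

```python
from functools import reduce

# Hypothetical "|"-delimited lines with different field counts.
lines = ["a|b|c", "a|b", "a|b|c|d|e", "a"]

# map step: number of fields in each line.
field_counts = list(map(lambda line: len(line.split("|")), lines))

# reduce step: keep the larger of each pair until one value remains.
max_fields = reduce(lambda a, b: a if a > b else b, field_counts)

print(field_counts)
print(max_fields)
```

Taking the larger of two values is both commutative and associative, so it gives the same answer no matter how the pairs are grouped.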
Count the number of occurrences of each string (field contents):
>>> file.flatMap(lambda line: line.split("|")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()
The result contains far too much data to read comfortably, so let's try another way.
Count the number of occurrences of each string (field contents) and display only the 10 most frequent:
>>> file.flatMap(lambda line: line.split("|")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).map(lambda (k, v): (v, k)).sortByKey().top(10)
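sortByKey sorts by key, and top(10) returns the 10 largest records, similar to ORDER BY ... DESC with LIMIT in Hive. Since there is no sortByValue method in Spark, the key and value are swapped before sorting. Note that top already picks the largest records by itself, so the sortByKey call is not strictly required, and that the lambda (k, v) form with tuple unpacking only works in Python 2.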
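The count, swap, and top steps can be sketched in plain Python as follows. The sample lines are hypothetical, collections.Counter stands in for reduceByKey, and heapq.nlargest stands in for top; both are substitutions made for illustration, not Spark calls.

```python
from collections import Counter
import heapq

# Hypothetical "|"-delimited lines.
lines = ["a|b|c", "a|b", "a|d"]

# flatMap + map + reduceByKey: count every field value.
counts = Counter()
for line in lines:
    for word in line.split("|"):
        counts[word] += 1

# Swap (word, count) into (count, word) so that ordering is by count first.
swapped = [(v, k) for k, v in counts.items()]

# Take the 3 largest pairs, largest first; ties fall back to the word itself.
top3 = heapq.nlargest(3, swapped)
print(top3)
```

Because tuples compare element by element, putting the count first is what makes the ordering follow the count rather than the word.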
That covers the simple application of Spark in a log analysis system. I hope the above is of some help to you.