How to Apply Spark in a Log Analysis System

2025-10-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article walks through a simple application of Spark in a log analysis system, covering each step in detail, in the hope of giving readers who face this problem a simpler and more practical approach.

1. Download Spark and run it

wget http://apache.fayea.com/apache-mirror/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz

I downloaded version 1.0.0 here. Since we are only trying Spark out, there is no need to configure a Spark cluster: just extract the archive and change into the bin/ folder.

Spark supports Scala, Java, and Python.

For Scala, run ./spark-shell (there is no separate interactive shell for Java); for Python, run ./pyspark. Either command opens a console. Let's take Python as an example:

No configuration is needed, so this step is simple. A good start is half the battle, so let's move on.

2. Simple application

Start by reading a text file. We will use an earlier log file as the example:

>>> file = sc.textFile("/home/hadoop/20130207.txt")

PS: Python is case-sensitive. Give the full path; a relative path is resolved against the directory where you ran the command. Python 2.7.x has long-standing problems with encoding support, so try to keep the source file in a single encoding such as UTF-8.
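If the log file is not already in UTF-8, it can be converted before loading it. The following is a minimal sketch, not part of the original workflow: the helper name convert_to_utf8, the output path, and the "gbk" source encoding in the usage comment are all assumptions for illustration. It relies only on the standard io module, which behaves the same on Python 2.7 and Python 3.

```python
import io

def convert_to_utf8(src_path, dst_path, src_encoding):
    """Copy a text file, decoding it as src_encoding and writing it as UTF-8."""
    with io.open(src_path, "r", encoding=src_encoding) as src:
        with io.open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                dst.write(line)

# Hypothetical usage (output path and source encoding are assumptions):
# convert_to_utf8("/home/hadoop/20130207.txt",
#                 "/home/hadoop/20130207.utf8.txt", "gbk")
```

The converted file can then be passed to sc.textFile in place of the original.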

Display the total number of lines:

>>> file.count()

265063

Display the first line:

>>> file.first()

Count all URL requests made from IE8:

>>> file.filter(lambda line: "MSIE 8.0" in line).count()

98670

PS: lambda defines an anonymous function. filter passes each entire line to it, so the code above iterates over every line and counts the lines that contain the string "MSIE 8.0".
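To see what the filter step does without a Spark installation, here is a minimal plain-Python sketch of the same logic. The sample lines and the helper name count_matching are made up for illustration and are not taken from the real log; Python's built-in filter stands in for the RDD's filter method.

```python
# Hypothetical sample log lines (not from the real log), fields separated by "|".
sample_lines = [
    "10.0.0.1|/index.html|Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",
    "10.0.0.2|/about.html|Mozilla/5.0 (X11; Linux x86_64) Firefox/21.0",
    "10.0.0.3|/index.html|Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)",
]

def count_matching(lines, needle):
    """Count the lines that contain the substring `needle`."""
    # The lambda receives one whole line at a time, as in the Spark call.
    return len(list(filter(lambda line: needle in line, lines)))

ie8_count = count_matching(sample_lines, "MSIE 8.0")
print(ie8_count)  # 2 of the 3 sample lines mention MSIE 8.0
```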

Get the number of fields in the line with the most fields:

>>> file.map(lambda line: len(line.split("|"))).reduce(lambda a, b: a if a > b else b)

PS: map(lambda line: len(line.split("|"))) splits each line into a list of fields and returns the length of that list.

Python's built-in reduce function takes a two-argument function func() and applies it cumulatively to the items of a sequence (list, tuple, etc.): it first applies func() to the first and second items, then applies func() to that result and the third item, and so on, until a single result remains. The reduce method used above combines the elements pairwise in the same way.
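The map and reduce steps can be tried out in plain Python. This is only a sketch under assumptions: the three sample lines are invented, a list comprehension stands in for the RDD's map, and functools.reduce stands in for the RDD's reduce.

```python
from functools import reduce  # a builtin on Python 2; must be imported on Python 3

# Hypothetical sample lines with 3, 5 and 2 fields respectively.
sample_lines = ["a|b|c", "a|b|c|d|e", "a|b"]

# map step: replace each line with the number of "|"-separated fields it has.
field_counts = [len(line.split("|")) for line in sample_lines]

# reduce step: combine the values pairwise, keeping the larger one each time:
# (3, 5) gives 5, then (5, 2) gives 5.
max_fields = reduce(lambda a, b: a if a > b else b, field_counts)

print(field_counts)  # [3, 5, 2]
print(max_fields)    # 5
```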

Count the number of occurrences of each string (field contents):

>>> file.flatMap(lambda line: line.split("|")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()

The result contains far too much data to read comfortably, so let's try another way.

Count the number of occurrences of each string (field contents) and display only the 10 most frequent:

>>> file.flatMap(lambda line: line.split("|")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).map(lambda (k, v): (v, k)).sortByKey().top(10)

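sortByKey sorts the records by key, and top(10) returns the 10 largest records, similar to ORDER BY ... DESC combined with LIMIT in Hive (top orders the records itself, so the sortByKey call is not strictly required). Since this version of Spark has no sortByValue method, key and value are swapped before sorting so that the count becomes the key.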
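The stages of this pipeline can be mimicked in plain Python to see what each one produces. This is a sketch under assumptions, not Spark code: the sample lines and the helper name top_fields are invented, a dictionary stands in for reduceByKey, and heapq.nlargest stands in for top.

```python
import heapq
from collections import defaultdict

# Hypothetical sample lines, not from the real log.
sample_lines = ["GET|/index.html|200", "GET|/about.html|200", "POST|/index.html|404"]

def top_fields(lines, n):
    """Return the n most frequent "|"-separated field values as (count, value) pairs."""
    # flatMap: split every line and flatten the pieces into one list.
    words = [word for line in lines for word in line.split("|")]
    # map + reduceByKey: pair each word with 1 and sum the ones per word.
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    # Swap (word, count) to (count, word) so that ordering compares counts first.
    swapped = [(count, word) for word, count in counts.items()]
    # top(n): take the n largest pairs, largest first.
    return heapq.nlargest(n, swapped)

print(top_fields(sample_lines, 3))
```

Pairs with equal counts are ordered by the string that follows the count, which is a side effect of comparing whole tuples.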

That concludes this simple application of Spark in a log analysis system. I hope the content above is of some help to you.
