Installation
First, install Java and Scala, then download the Spark distribution. Make sure PATH and JAVA_HOME are set, then build Spark with Scala's SBT as follows:
$ sbt/sbt assembly
The build takes quite a while. Once it completes, verify that the installation succeeded by running the following commands:
$ ./bin/spark-shell
scala> val textFile = sc.textFile("README.md") // create an RDD that points to README.md
scala> textFile.count                          // count the number of lines in the file
scala> textFile.first                          // print the first line
Apache Access Log Analyzer
First we need an analyzer for the Apache access log written in Scala. Fortunately, someone has already written one: download the Apache logfile parser code, then compile and package it with SBT:
sbt compile
sbt test
sbt package
The package name is assumed to be AlsApacheLogParser.jar.
Then start Spark on the Linux command line:
// this works
$ MASTER=local[4] SPARK_CLASSPATH=AlsApacheLogParser.jar ./bin/spark-shell
For Spark 0.9, the following approaches do not work:
// does not work
$ MASTER=local[4] ADD_JARS=AlsApacheLogParser.jar ./bin/spark-shell

// does not work
spark> :cp AlsApacheLogParser.jar
Once the JAR is loaded successfully, create an AccessLogParser instance in the Spark REPL:
import com.alvinalexander.accesslogparser._
val p = new AccessLogParser
You can now read the Apache access log accesslog.small just as you read README.md earlier:
scala> val log = sc.textFile("accesslog.small")
14/03/09 11:25:23 INFO MemoryStore: ensureFreeSpace(32856) called with curMem=0, maxMem=309225062
14/03/09 11:25:23 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 32.1 KB, free 294.9 MB)
log: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:15

scala> log.count
(a lot of output here)
res0: Long = 100000
Analyzing the Apache Log
We can count the number of 404 responses in the Apache log by first creating a method like the following:
def getStatusCode(line: Option[AccessLogRecord]) = {
  line match {
    case Some(l) => l.httpStatusCode
    case None    => "0"
  }
}
Here Option[AccessLogRecord] is the return type of the parser.
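For orientation, here is a minimal sketch of what the AccessLogRecord case class might look like. This is an assumption inferred from the two fields this article uses (request and httpStatusCode), not the library's actual definition, which covers all the columns of an Apache log line:

// hypothetical sketch, not the library's real definition;
// only the two fields used in this article are shown
case class AccessLogRecord(
  request: String,        // e.g. "GET /the-requested-url HTTP/1.1"
  httpStatusCode: String  // e.g. "200" or "404"
)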
Then use the following on the Spark command line:
log.filter(line => getStatusCode(p.parseRecord(line)) == "404").count
This returns the number of lines whose httpStatusCode is 404.
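Since the following sections run several more queries over the same data, it may be worth caching the RDD in memory first. This is standard Spark usage rather than a step from the original walkthrough:

// cache the log RDD so repeated queries don't re-read the file from disk
log.cache

// the same 404 count; after the first action, queries run against the cached copy
log.filter(line => getStatusCode(p.parseRecord(line)) == "404").count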
Digging Deeper
If we want to know which URLs are problematic (for example, a URL containing a space that causes a 404 error), we need to take the following steps:
1. Filter out all of the 404 records.
2. Get the request field from each 404 record (the URL string that was requested, so we can check for spaces and the like).
3. Return no duplicate records.
Create the following method:
// get the `request` field from an access log record
def getRequest(rawAccessLogString: String): Option[String] = {
  val accessLogRecordOption = p.parseRecord(rawAccessLogString)
  accessLogRecordOption match {
    case Some(rec) => Some(rec.request)
    case None      => None
  }
}
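As an aside, the same method can be written more concisely by mapping over the Option; this is a stylistic alternative, not code from the original article:

// equivalent one-liner: map over the Option instead of pattern matching
def getRequest(rawAccessLogString: String): Option[String] =
  p.parseRecord(rawAccessLogString).map(_.request)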
Paste this code into the Spark REPL, then run the following:
log.filter(line => getStatusCode(p.parseRecord(line)) == "404").map(getRequest(_)).count

val recs = log.filter(line => getStatusCode(p.parseRecord(line)) == "404")
              .map(getRequest(_))

val distinctRecs = log.filter(line => getStatusCode(p.parseRecord(line)) == "404")
                      .map(getRequest(_))
                      .distinct
distinctRecs.foreach(println)
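Going one step further than the original article, we can count how often each URL produced a 404, so the worst offenders surface first. A sketch reusing the log, p, getStatusCode, and getRequest defined above:

// count 404s per distinct request and print the ten most frequent
val counts = log
  .filter(line => getStatusCode(p.parseRecord(line)) == "404")
  .map(getRequest(_))
  .filter(_.isDefined)        // drop lines the parser could not parse
  .map(req => (req.get, 1))   // pair each request string with a count of 1
  .reduceByKey(_ + _)         // sum the counts per distinct request

counts.map(_.swap)            // (count, request) so we can sort by count
      .sortByKey(ascending = false)
      .take(10)
      .foreach(println)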
Summary
Of course, grep is fine for simple analysis of access logs, but Spark is the better fit for more complex queries. It is hard to judge Spark's performance on a single machine, because Spark is designed for large files on distributed systems.