Shulou (Shulou.com) 06/01 report, updated 2025-04-05, from SLTechnology News&Howtos
How do you use Spark to analyze website logs? Many people without much experience are unsure where to start, so this article walks through a real problem, its cause, and how it was solved.
Since yesterday I have been frustrated: my personal website kept triggering 504 alarms. After logging in to the machine, I found errors in the php-fpm log. Restarting php-fpm cleared them, but the alarm came back a few hours later. The site had run without problems for almost a year, so this was strange.
[28-Sep-2016 11:53:19] NOTICE: ready to handle connections
[28-Sep-2016 11:53:19] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
I assumed this value was too small, so I edited the configuration and raised it (to 20, as the next log shows).
[28-Sep-2016 15:51:43] NOTICE: fpm is running, pid 28179
[28-Sep-2016 15:51:43] NOTICE: ready to handle connections
[28-Sep-2016 15:51:43] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 15:52:12] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 7 total children
[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
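Both logs show the same pattern: the pool keeps hitting its pm.max_children ceiling, whether that ceiling is 5 or 20. A minimal plain-Scala sketch for pulling those warnings out of the php-fpm log, assuming the lines look exactly like the ones above; the FpmWarnings object and its regular expression are illustrative, not part of php-fpm.

```scala
import scala.util.matching.Regex

object FpmWarnings {
  // Matches lines such as:
  // [28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
  // (Regex extractors must match the whole line, hence the trailing .*)
  val MaxChildren: Regex =
    """\[([^\]]+)\] WARNING: \[pool (\S+)\] server reached pm\.max_children setting \((\d+)\).*""".r

  // Returns (timestamp, pool, limit) for every max_children warning; other lines are ignored
  def maxChildrenHits(lines: Seq[String]): Seq[(String, String, Int)] =
    lines.collect { case MaxChildren(ts, pool, limit) => (ts, pool, limit.toInt) }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "[28-Sep-2016 15:51:43] NOTICE: ready to handle connections",
      "[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it",
      "[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it"
    )
    val hits = maxChildrenHits(sample)
    println(s"${hits.size} max_children warnings")
    hits.foreach(println)
  }
}
```

If warnings keep arriving at the same rate after the limit is raised, that suggests the incoming traffic, not the limit itself, is the thing to investigate.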
The result was the same: a few hours later the 504 alarm fired again. I then checked the nginx log and found that a few unfamiliar IPs were generating a very large amount of traffic. Suspecting malicious access, I decided to count the visits per IP in the access log.
root@iZ28bhfjhgkZ:/var/log/nginx# vim access.log
121.42.53.180 - - [25/Sep/2016:06:26:29 +0800] "POST /wp-cron.php?doing_wp_cron=1474755989.0131719112396240234375 HTTP/1.0" 499 0 "-" "WordPress/4.3.1; http://zhwen.org"
182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] "GET /?p=675 HTTP/1.1" 200 8204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/themes/sparkling/inc/css/font-awesome.min.css?ver=4.3.1 HTTP/1.1" 200 26711 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70 HTTP/1.1" 200 374 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:58 +0800] "GET /wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.3.1 HTTP/1.1" 200 771 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
121.43.107.174 - - [25/Sep/2016:06:29:18 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
115.28.189.208 - - [25/Sep/2016:06:29:33 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] "GET /?paged=14 HTTP/1.1" 11164 "-" "YisouSpider"
182.92.148.207 - - [25/Sep/2016:06:31:29 +0800] "GET / HTTP/1.1" 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /?p=articles/cscope-tags HTTP/1.1" 10681 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 151 "-" "Safari/12602.1.50.0.10 CFNetwork/807.0.4 Darwin/16.0.0 (x86_64)"
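Each access-log line above follows nginx's predefined "combined" format: client address, two placeholder fields, a bracketed timestamp, the quoted request line, status, body bytes, referer and user agent. A minimal Scala sketch of parsing one line, assuming well-formed combined-format input; the LogLine case class and parse function are hypothetical helpers. Damaged lines, such as the entries above where only one number survives after the request, simply return None.

```scala
object AccessLog {
  // The fields of nginx's "combined" format that this sketch keeps
  final case class LogLine(ip: String, time: String, request: String,
                           status: Int, bytes: Long, userAgent: String)

  // $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
  private val Combined =
    """(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+) "[^"]*" "([^"]*)".*""".r

  // None for lines that do not follow the combined format
  def parse(line: String): Option[LogLine] = line match {
    case Combined(ip, time, request, status, bytes, ua) =>
      Some(LogLine(ip, time, request, status.toInt, bytes.toLong, ua))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val sample = "203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] \"GET /?p=675 HTTP/1.1\" 200 8204 \"-\" " +
      "\"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\""
    println(parse(sample))
  }
}
```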
So I ran a simple per-IP count on the access log:
1) First extract the IP column to reduce the amount of data (alternatively, compress the whole log and download it), then download the result to the local machine:
root@iZ28bhfjhgkZ:/var/log/nginx# cat access.log | awk '{print $1}' > tt
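The awk command keeps only the first whitespace-separated field of each line, which in this log is the client IP. A minimal Scala equivalent for readers who would rather not use awk; the file names access.log and tt come from the command above, and everything else is an assumption of this sketch.

```scala
import java.io.PrintWriter
import scala.io.Source

object ExtractIps {
  // Like awk '{print $1}': the first whitespace-separated field, or None for a blank line
  // (awk would print an empty line there instead)
  def firstField(line: String): Option[String] =
    line.trim.split("\\s+").headOption.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // File names mirror the shell command; both are assumed to be in the working directory
    val src = Source.fromFile("access.log")
    val out = new PrintWriter("tt")
    try src.getLines().foreach(line => firstField(line).foreach(ip => out.println(ip)))
    finally { src.close(); out.close() }
  }
}
```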
Then run the following code in spark-shell:
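// Read the extracted IPs, one per line
val line = sc.textFile("/data1/data/t1")
// Count visits per IP, swap to (count, ip), sort ascending by count into one partition, save
val counts = line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
val byCount = counts.map(e => (e._2, e._1)).sortByKey(true, 1)
byCount.saveAsTextFile("/data1/data/t3")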
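The Spark job is a classic word count followed by a key/value swap and a sort: reduceByKey sums the 1s per IP, the map turns (ip, count) into (count, ip) so that sortByKey(true, 1) orders by count in a single partition, and saveAsTextFile writes each tuple as text. To try the logic without a cluster, here is a plain Scala collections sketch of the same pipeline, assuming the input fits in memory; countByIp is a hypothetical name, and breaking ties by IP is an addition for deterministic output.

```scala
object IpCount {
  // Local stand-in for the Spark pipeline:
  //   flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)  -> groupBy + size
  //   map(e => (e._2, e._1))                                -> swap to (count, ip)
  //   sortByKey(true, 1)                                    -> ascending sort
  def countByIp(lines: Seq[String]): Seq[(Int, String)] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .toSeq // convert before mapping, otherwise equal counts would collide as Map keys
      .map { case (ip, hits) => (hits.size, ip) }
      .sorted // by count, then by IP for ties

  def main(args: Array[String]): Unit = {
    val ips = Seq("182.92.148.207", "191.96.249.53", "191.96.249.53", "61.135.169.81", "191.96.249.53")
    // Tuple2.toString prints "(count,ip)", the same shape saveAsTextFile wrote to t3
    countByIp(ips).foreach(println)
  }
}
```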
2) The final result in t3 looks like this (earlier lines omitted). These IPs generate a lot of traffic, especially 191.96.249.53:
...
(855,182.92.148.207)
(3100,121.8.136.75)
(3889,61.135.169.81)
(53513,191.96.249.53)
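In these results, 191.96.249.53 made more than ten times as many requests as the next IP shown. Rather than eyeballing t3, one could flag outliers automatically. The rule below (count at least 10 times the median count) is an assumption chosen for illustration, not what the author did, and only the four rows visible above are used as sample data.

```scala
object Suspicious {
  // Median of a non-empty sequence (mean of the two middle values for even sizes)
  def median(xs: Seq[Int]): Double = {
    val s = xs.sorted
    val n = s.size
    if (n % 2 == 1) s(n / 2).toDouble else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  // IPs whose count is at least `factor` times the median count; factor is an assumed tuning knob
  def flag(counts: Seq[(Int, String)], factor: Double = 10.0): Seq[String] = {
    require(counts.nonEmpty, "need at least one (count, ip) pair")
    val m = median(counts.map(_._1))
    counts.collect { case (n, ip) if n >= factor * m => ip }
  }

  def main(args: Array[String]): Unit = {
    // The four rows visible in the article's output (the full file has more)
    val shown = Seq((855, "182.92.148.207"), (3100, "121.8.136.75"),
                    (3889, "61.135.169.81"), (53513, "191.96.249.53"))
    println(flag(shown))
  }
}
```

A median-based threshold is less sensitive than a mean to the very outlier being hunted, which is why it is used here.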
3) Finally, I added an iptables rule to drop traffic from that IP, and the problem was solved. Spark makes this kind of statistical analysis very simple: one chain of transformations does the whole job.
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
root@iZ28bhfjhgkZ:/var/log# iptables -A INPUT -s 191.96.249.53 -j DROP
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  DEDICATED.SERVER     anywhere
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
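The rule added above has the form iptables -A INPUT -s <ip> -j DROP. When the analysis flags more than one address, the same commands can be generated instead of typed by hand. A small sketch that only builds the command strings (it executes nothing, and the output should be reviewed before running it as root); the simplified IPv4 check and the DropRules name are assumptions of this sketch.

```scala
object DropRules {
  // Simplified IPv4 check: exactly four dot-separated decimal octets in 0..255
  def isIpv4(s: String): Boolean = {
    val parts = s.split("\\.", -1)
    parts.length == 4 && parts.forall { p =>
      p.nonEmpty && p.length <= 3 && p.forall(c => c >= '0' && c <= '9') && p.toInt <= 255
    }
  }

  // One "iptables -A INPUT -s <ip> -j DROP" string per distinct valid IP; nothing is executed
  def dropCommands(ips: Seq[String]): Seq[String] =
    ips.filter(isIpv4).distinct.map(ip => s"iptables -A INPUT -s $ip -j DROP")

  def main(args: Array[String]): Unit =
    dropCommands(Seq("191.96.249.53", "not-an-ip")).foreach(println)
}
```

Validating each address first keeps a malformed line in the analysis output from turning into a malformed firewall command.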