How to use MongoDB to analyze Nginx logs

2025-02-21 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces how to use MongoDB to analyze Nginx logs, a topic many people have questions about in day-to-day operations. The following walks through a simple, practical workflow; I hope it helps answer those questions. Let's get started.

Log parsing process

Normally, the process for parsing nginx logs is as follows:

In general, we split the logs to be parsed in advance; a common scheme is to rotate them by date and keep one week of files. Then comes the parsing itself, using tools or programming languages such as awk, grep, perl, or python.

The final storage and visualization step generally depends on the business; there are no hard requirements.
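The date-based split mentioned above can be sketched in python. The timestamp regex and the per-day bucket keys below are assumptions based on nginx's default time format ("10/Jul/2016:07:28:32 +0800"), not something prescribed here:

```python
# Sketch: group nginx log lines into per-day buckets, assuming the
# default "[10/Jul/2016:07:28:32 +0800]" timestamp format.
import re
from collections import defaultdict

TS = re.compile(r"\[(\d{2})/(\w{3})/(\d{4}):")
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1)}

def split_by_day(lines):
    """Return {'YYYY-MM-DD': [lines]} buckets keyed by request date."""
    buckets = defaultdict(list)
    for line in lines:
        m = TS.search(line)
        if not m:
            continue  # skip lines with no recognizable timestamp
        day, mon, year = m.groups()
        buckets[f"{year}-{MONTHS[mon]:02d}-{day}"].append(line)
    return buckets
```

Each bucket can then be written to its own file, or fed straight into the parsing step.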

The solution of log query

The common solutions for nginx log parsing are as follows:

Parsing through awk and grep

Log mapping through postgresql external tables

Log query through the combination of python and mongodb

Query through elk, an open source suite

Among them, the postgresql external-table method has been used at our company before, mainly to handle multiple logs of around 3 GB each. We do not have much practical experience with the first and fourth options, so this article focuses on the third: python combined with mongodb.

Log format

For log parsing, the most common approach is matching with regular expressions; a frequently used library is nginxparser, which can be installed directly via pip. Of course, there are other ways to parse, depending on the business.
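As an alternative to a ready-made library, a minimal regex parser for nginx's default combined format might look like the following sketch. The group names are illustrative, and the pattern would need adjusting for a custom log_format:

```python
# Sketch: parse one line of nginx's default combined log format
# with a regular expression; returns a dict of named fields.
import re

LOG_PATTERN = re.compile(
    r'(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<req>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None
```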

In log parsing, the most important thing is the log format. By default, the log format of nginx is as follows:

log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for" '
                '$upstream_addr $upstream_response_time $request_time';

Let's look at a real business application. The company once ran a WeChat red-packet giveaway, and some users reported that they could not grab a red packet for several days running. Our team suspected there might be cheating in the process, so we decided to parse the nginx logs.

Here is a record of a real log:

101.226.89.14 - - [10/Jul/2016:07:28:32 +0800] "GET /pocketmoney-2016-xikxcpck.html HTTP/1.1" 302 231 "-" "Mozilla/5.0 (Linux; Android 5.1; OPPO R9tm Build/LMY47I) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/37.0.0.0 Mobile MQQBrowser/6.2 TBS/036548 Safari/537.36 MicroMessenger/6.3.22.821 NetType/WIFI Language/zh_CN"

Log analysis

Parsing through awk

Next, let's look at how to use awk to find the IP addresses with the most requests. Any awk reference will cover the syntax used here:

dog@dog-pc:~$ awk '{a[$1]++} END {for (i in a) print i" "a[i]}' nginx.log | sort -t' ' -k2 -rn | head -n 10
111.167.50.208 26794
183.28.6.143 16244
118.76.216.77 9560
14.148.114.213 3609
183.50.96.127 3377
220.115.235.21 3246
222.84.160.249 2905
121.42.16 2212
14.208.240.200 2000
14.17.37.143

By default, awk splits fields on whitespace, so $1 picks up the remote address in nginx's default format. Here we build an associative array keyed by IP, incrementing the corresponding count each time the key appears. Finally we iterate over the array, sort by count, and take the top 10 records with head.
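The same top-10 count can be written in a few lines of python with collections.Counter, mirroring awk's use of the first whitespace-separated field:

```python
# Sketch: count occurrences of the first field (the remote address
# in nginx's default format) and return the n most frequent.
from collections import Counter

def top_ips(lines, n=10):
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return counts.most_common(n)
```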

Of course, this method has a large margin of error, because we did not filter on conditions such as the status code. Let's look at the data filtered by status code and request method:

dog@dog-pc:~$ awk '{if ($9 == 200 && substr($6, 2) == "GET") a[$1]++} END {for (i in a) print i" "a[i]}' nginx.log | sort -t' ' -k2 -rn | head -n 10
222.84.160.249 2856
183.28.6.143 2534
116.1.127.110 1625
14.208.240.200 1521
14.17.37.143 1335
219.133.40.13 1014
219.133.40.15 994
14.17.37.144 988
14.17.37.161 960
183.61.51.195 944

With this we can analyze these 10 IPs and consider next steps, such as blocking them or rate-limiting their visits with iptables.

Through postgresql

With postgresql, after the logs have been loaded into the database they can be queried with sql, as the following two screenshots show:

The first screenshot shows the total count of each request status code in the log. The second shows the top 10 IPs filtered to status code 200:

You can see the results are basically the same as with the awk parsing above.

Query through mongodb

We know that mongodb is a document database; it helps with some workloads that relational databases do not handle well.

In python, the main mongodb client driver is pymongo. We can establish a connection in the following ways:

In [1]: from pymongo import MongoClient

In [2]: client = MongoClient()

Since we are using the default host and port here, no arguments are passed to MongoClient.

Here is the format of the log documents we insert into mongodb:

{
    "status": 302,                      // http status code
    "addr": "101.226.89.14",            // remote ip address
    "url": "-",
    "req": "/pocketmoney-2016-xikxcpck.html",  // requested path
    "agent": "mozilla/5.0 (linux; android 5.1; oppo r9tm build/lmy47i) applewebkit/537.36 (khtml, like gecko) version/4.0 chrome/37.0.0.0 mobile mqqbrowser/6.2 tbs/036548 safari/537.36 micromessenger/6.3.22.821 nettype/wifi language/zh_cn",  // request user-agent
    "referer": "-",
    "t": "2016-07-10 06:28:32",         // request time
    "size": 231,                        // response size
    "method": "get",                    // request method
    "user": "-"                         // user name
}

Here we parse the log with python, assemble each record into the format above, and insert it into mongodb, mainly using the collection's insert_one method to insert one record at a time.

db = client['log']
col = db['nginx']
data = {...}  # a document in the format shown above
col.insert_one(data)
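Putting the pieces together, the parse-and-insert step might look like this sketch. The to_document helper and its field mapping are hypothetical (the article does not show its own), and the mongodb call is commented out so the snippet runs without a server:

```python
# Sketch: map raw parsed log fields onto the document schema shown
# above. "fields" stands for the output of any line parser.
def to_document(fields):
    """Build a mongodb-ready document from raw string fields."""
    return {
        "status": int(fields["status"]),
        "addr": fields["addr"],
        "req": fields["req"],
        "method": fields["method"],
        "size": 0 if fields["size"] == "-" else int(fields["size"]),
        "referer": fields["referer"],
        "agent": fields["agent"],
        "user": fields["user"],
        "t": fields["time"],
    }

# Inserting then uses insert_one, as described above:
# from pymongo import MongoClient
# col = MongoClient()["log"]["nginx"]
# col.insert_one(to_document(fields))
```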

Then we query these records, mainly using the aggregation pipeline provided by mongodb to perform the grouping. The corresponding python code is:

In [3]: db = client['log']

In [4]: col = db['nginx']

In [5]: pipeline = [
   ...:     {"$match": {"status": 200}},
   ...:     {"$group": {"_id": "$addr", "count": {"$sum": 1}}},
   ...:     {"$sort": {"count": -1}},
   ...:     {"$limit": 10}
   ...: ]

In [6]: list(col.aggregate(pipeline))
Out[6]:
[{'_id': '222.84.160.249', 'count': 2856},
 {'_id': '183.28.6.143', 'count': 2534},
 {'_id': '116.1.127.110', 'count': 1625},
 {'_id': '14.208.240.200', 'count': 1521},
 {'_id': '14.17.37.143', 'count': 1335},
 {'_id': '219.133.40.13', 'count': 1014},
 {'_id': '219.133.40.15', 'count': 994},
 {'_id': '14.17.37.144', 'count': 988},
 {'_id': '14.17.37.161', 'count': 960},
 {'_id': '183.61.51.195', 'count': 944}]

You can see that the result is consistent with the two approaches above.
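For intuition, the $match / $group / $sort / $limit stages of that pipeline correspond to this pure-python equivalent over a list of documents:

```python
# Sketch: the same aggregation expressed in plain python.
from collections import Counter

def top_200_ips(docs, limit=10):
    # $match (status == 200) and $group with {"$sum": 1} per addr
    counts = Counter(d["addr"] for d in docs if d["status"] == 200)
    # $sort by count descending, then $limit
    return [{"_id": ip, "count": c} for ip, c in counts.most_common(limit)]
```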

About Visualization

For visualization, we can choose some libraries of javascript, such as:

Baidu's echarts

D3.js and its derived libraries

For python, some of the following libraries can be used for visualization:

Matplotlib

Pandas

Of course, there are some other libraries that will not be described here.

The following is a chart drawn with Baidu echarts:

At this point, the study of "how to use MongoDB to analyze Nginx logs" is complete. I hope it has resolved your doubts; pairing theory with practice is the best way to learn, so go and try it yourself!
