
How to analyze user information


This article explains how to analyze user information from nginx access logs. The methods introduced here are simple, fast, and practical, so if you are interested, read on!

Don't rush to start

When we want to analyze a log, we should first use the ls -lh command to check the size of the log file. If the log file is very large, it is best not to analyze it directly on the production server.

For example, the log below is 6.5 MB, which is not big, so analyzing it in the production environment is not a problem.
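
A minimal sketch of the check, assuming the log file is named access.log (the owner and timestamp in the output are illustrative):

$ ls -lh access.log
-rw-r--r-- 1 www www 6.5M Jun  3 10:00 access.log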

If the log file is very large, directly running a cat command on it will affect the production environment and increase the load on the server, which may even make the server unresponsive.

When the log turns out to be very large, we can use the scp command to transfer the file to an idle server for further analysis. The scp command is used as shown below:
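
A minimal sketch, where the destination user@192.168.0.100 and the path /tmp/ are illustrative placeholders for your idle server:

$ scp access.log user@192.168.0.100:/tmp/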

Use cat cautiously

We all know that the cat command is used to view the contents of a file, but it reads however much data the file contains, all at once, which is obviously not suitable for large files.

For large files, we should get into the habit of using the less command instead, because less does not load the whole file: it loads on demand, first printing one screenful of content and loading more only as you read further down.
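
For example (assuming the log is named access.log; press the space bar to page down and q to quit):

$ less access.log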

Each line of nginx's access.log is the record of one user visit, containing the following information from left to right (a sample line follows this list):

IP address of the client

Access time

HTTP request method, path, and protocol version, plus the returned status code

User Agent, usually the client's operating system and version, browser and version, and so on
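
A sample line in the common nginx combined log format (all values are illustrative):

192.168.0.5 - - [03/Jun/2025:10:23:45 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/100.0"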

However, sometimes we only want to see the newest part of the log; for that we can use the tail command. For example, when you want to see the last five lines, you can use this command:
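
$ tail -n 5 access.log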

If you want to watch the log being written in real time, you can use the tail -f command. It blocks while you watch, and whenever a new log line is output, it is displayed immediately.
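
For example (stop watching with Ctrl+C):

$ tail -f access.log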

PV analysis

PV is short for Page View: each visit to a page counts as one PV. On most blog platforms, for example, each click on a page adds 1 to its read count, so the PV figure does not represent the real number of users, only the number of clicks.

For nginx's access.log file, analyzing PV is relatively easy: since every line is one access record, there are as many PVs as there are log lines.

We can get the total PV directly with the wc -l command. As shown below, there are 49903 PVs in total.
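
A sketch, assuming the log is named access.log:

$ wc -l access.log
49903 access.log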

PV grouping

nginx's access.log contains access-time information, so we can group by access time, for example grouping by day to see each day's total PV, which gives us more intuitive data.

To group by time, we first need to extract the access time. The awk command can handle this: awk is a powerful tool for processing text.

By default, awk uses spaces as delimiters. Since the access time is in column 4 of the log, the command awk '{print $4}' access.log extracts the access-time information. The result is as follows:
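
A sketch of the output (the timestamps are illustrative); note that with a space delimiter, column 4 still carries the leading "[":

$ awk '{print $4}' access.log
[03/Jun/2025:10:23:45
[03/Jun/2025:10:23:46
[03/Jun/2025:10:24:01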

The output above also contains hours, minutes, and seconds. If you only want the year, month, and day, you can use awk's substr function to take the 11 characters starting from the second character.
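
A sketch: starting at character 2 skips the leading "[", and 11 characters cover a date such as 03/Jun/2025 (dates shown are illustrative):

$ awk '{print substr($4, 2, 11)}' access.log
03/Jun/2025
03/Jun/2025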

Next, we can use sort to order the dates and then uniq -c to count them, which gives us the PV grouped by day.

As you can see, the daily PV is about 2000-2800:
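
A sketch of the full pipeline; the dates and counts are illustrative of that 2000-2800 range:

$ awk '{print substr($4, 2, 11)}' access.log | sort | uniq -c
2589 01/Jun/2025
2788 02/Jun/2025
2100 03/Jun/2025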

Note that sorting with sort is required before running uniq -c, because uniq deduplicates by comparing adjacent lines, removing the second and subsequent copies of a line. So we use sort first to make all duplicate lines adjacent, and only then run uniq.

UV analysis

UV is short for Unique Visitor and represents the number of distinct visitors. For example, an official account's read count is a UV statistic: no matter how many times one user clicks, it counts as a single read.

Although access.log contains no user identity information, we can use the client IP address to approximate the UV.
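
A sketch of the command, assuming the log is named access.log:

$ awk '{print $1}' access.log | sort | uniq | wc -l
2589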

The output of this command is 2589, which means the UV is 2589. From left to right, the parts of the command mean the following:

awk '{print $1}' access.log takes column 1 of the log, which is the client IP address

sort orders the lines

uniq removes duplicate records

wc -l counts the number of remaining lines

UV grouping

Suppose we want to analyze the number of UVs per day. This case is slightly more complicated and needs a longer chain of commands.

Since we want to count UV by day, we have to extract the "date + IP address" pairs and then deduplicate them. The command is as follows:
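
One way to write it, reusing the substr date extraction from above (the dates and IP addresses in the output are illustrative):

$ awk '{print substr($4, 2, 11), $1}' access.log | sort | uniq
01/Jun/2025 192.168.0.5
01/Jun/2025 192.168.0.9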

The specific analysis is as follows:

The first awk extracts the date from column 4 and the client IP address from column 1, joining them with a space

Then sort orders the output of the first awk

Then uniq removes duplicate records, so only one row is kept per identical date + IP pair.

The above only lists the deduplicated UV data; it does not yet count anything.

If you need to count the UV per day, you can append the command awk '{uv[$1]++; next} END {for (ip in uv) print ip, uv[ip]}' to the pipeline above. The result is as follows:
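
A sketch of the complete pipeline (the dates and counts shown are illustrative):

$ awk '{print substr($4, 2, 11), $1}' access.log | sort | uniq \
    | awk '{uv[$1]++; next} END {for (ip in uv) print ip, uv[ip]}'
01/Jun/2025 312
02/Jun/2025 295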

awk processes its input line by line; once a line has been handled, the next keyword tells awk to skip ahead and take the following line as input.

For each input line, awk increments a counter keyed on column 1 (that is, the date), so the IP addresses sharing the same date accumulate into that day's UV count.

The END keyword acts as a trigger: the statements inside END {} run only after all the input has been consumed. Here, the END block iterates over every key in uv and prints the UV counts grouped by day.

Terminal analysis

The User Agent information at the end of each nginx access.log line describes the tool the client used to access the server, which may be a mobile phone, a browser, and so on.

Therefore, we can use this information to analyze which terminals have accessed the server.

The User Agent information starts in column 12 of the log, so we first use awk to extract column 12, sort it, then use uniq -c to deduplicate and count, and finally use sort -rn (r for reverse order, n for numeric sort) to rank the counts. The results are as follows:
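
A sketch (the counts and agents shown are illustrative; note that with a space delimiter, column 12 captures only the first token of the User Agent string):

$ awk '{print $12}' access.log | sort | uniq -c | sort -rn
40012 "Mozilla/5.0
8931 "curl/7.29.0
960 "python-requests/2.25.1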

Analyze the TOP 3 requests

In access.log, column 7 is the path of the client request. First use awk to extract column 7, sort it, then use uniq -c to deduplicate and count, then use sort -rn to rank the counts, and finally use head -n 3 to keep the TOP 3 requests. The result is as follows:
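
A sketch (the paths and counts shown are illustrative):

$ awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -n 3
10012 /
4200 /login
1800 /api/user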

At this point, I believe you have a deeper understanding of how to analyze user information. Why not try it out in practice yourself? Follow us and keep learning!
