"all the past is a prelude."
-Shakespeare
This article was originally written for my company. Because it was put together in a hurry, the original version did not analyze any code and only presented the data, but I found it valuable enough to share again here.
Those who know me already know what I always ask first.
Please repeat after me: all hail Python.
Everything in this article is just one person's opinion; if anything is off the mark, corrections are welcome.
In my view, data that cannot be visualized is, to a large extent, worthless, and even visualization is only the first step of a long march. Data is like a pile of stones: this article may not show you the whole picture, but it should at least give you a glimpse of it.
So what does data look like?
It might look like this:
Or like this?
If your Excel skills are good enough, you can certainly produce a decent chart with Excel, but only decent.
In my opinion, data should at least look like this:
Or like this:
Note: the charts above were produced with the ELK stack.
For ELK installation documentation, see: http://youerning.blog.51cto.com/10513771/1726338
But this is only the tip of the iceberg, and it is far from enough. Knowing which tool to use and how to use it is one level; reusing the data on top of that is another. To keep the length manageable, this article focuses mainly on the latter and restricts itself to log data; other kinds of data are not considered for now.
Log data should serve at least three purposes:
First: the data should explain a problem or phenomenon.
Second: the data should help solve the problem.
Third: the data should help predict and prevent problems.
The first point is easy to understand: through visualization, the data should directly explain a problem or phenomenon. Even with the most basic extraction and filtering we can learn the volume of visits over a period of time, what the client devices are, and what the response times look like. Refining further, we can get the number of visits per IP, which content is accessed and which content is accessed most often, and, given the response times, which requests take the longest, and so on, including some things that may not even be your responsibility. The details are left for you to think about.
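For illustration, here is a minimal pandas sketch of that kind of basic extraction, assuming the access log has already been parsed into a flat file; the column names (client_ip, url, response_time, device) are hypothetical and should be adapted to your own log format.

# -*- coding: utf-8 -*-
# Minimal sketch: basic counts from a parsed access log (hypothetical columns).
import pandas as pd

df = pd.read_csv("access_log.csv")                      # hypothetical parsed log export
visits_per_ip = df["client_ip"].value_counts()          # number of visits per IP
top_urls = df["url"].value_counts().head(15)            # most frequently accessed content
devices = df["device"].value_counts()                   # client device breakdown
slowest = (df.groupby("url")["response_time"]           # which requests take the longest
             .mean()
             .sort_values(ascending=False)
             .head(15))
print(visits_per_ip.head(15))
print(slowest)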
Here is a simple heat map of global access by IP:
Note: don't ask why I didn't use ELK's built-in heat map; in a word, willfulness.
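For reference, one way such a map could be built outside Kibana is sketched below: it resolves IPs to coordinates with the geoip2 library and a local GeoLite2-City database and plots the counts as a simple scatter map. Both the library and the database path are assumptions, not necessarily what produced the original figure.

# -*- coding: utf-8 -*-
# Minimal sketch: geolocate IPs and plot access hotspots (assumes GeoLite2-City.mmdb).
import pandas as pd
import geoip2.database
import geoip2.errors
import matplotlib.pyplot as plt

df = pd.read_csv("access_log.csv")                       # hypothetical parsed log export
visits_per_ip = df["client_ip"].value_counts()

reader = geoip2.database.Reader("GeoLite2-City.mmdb")
lats, lons, sizes = [], [], []
for ip, count in visits_per_ip.items():
    try:
        loc = reader.city(ip).location
    except geoip2.errors.AddressNotFoundError:
        continue                                          # skip private/unknown addresses
    if loc.latitude is None or loc.longitude is None:
        continue
    lats.append(loc.latitude)
    lons.append(loc.longitude)
    sizes.append(count)

# A plain longitude/latitude scatter; a real basemap would add coastlines,
# but this is enough to see where the hotspots are.
plt.scatter(lons, lats, s=sizes, alpha=0.5)
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.title("Global access hotspots by IP")
plt.savefig("ip_hotspots.png", dpi=100)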
Second, since logs can explain a problem, they can of course also help solve it. Besides the most basic info records, log files also contain debug information. From the debug output we can tell where in the program a bug was thrown and why it was thrown, and, in order to respond immediately, we also need to quickly locate the host that threw it.
Through a simple query we can quickly locate the host on which a 404 occurred, when it occurred, what device the client was using, and which request produced the 404.
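A minimal sketch of such a query against the same kind of Logstash index is shown below; the field names (type, response, host.raw, agent, request) are assumptions about the mapping and should be adjusted to your own.

# -*- coding: utf-8 -*-
# Minimal sketch: find the latest 404s and the hosts that produce them most.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://IP:9200/"])
query = {
    "size": 20,                                          # latest 20 matching hits
    "sort": [{"@timestamp": {"order": "desc"}}],
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"type": "xxxxx_access"}},
                        {"term": {"response": 404}},
                    ]
                }
            }
        }
    },
    "aggs": {                                            # which hosts throw the most 404s
        "hosts": {"terms": {"field": "host.raw", "size": 10}}
    },
}
resp = es.search(index="logstash-*", body=query)
for hit in resp["hits"]["hits"]:
    src = hit["_source"]
    print(src.get("@timestamp"), src.get("host"), src.get("agent"), src.get("request"))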
Third, a problem itself is not terrible; what is terrible is a problem that cannot be handled in time and keeps recurring while we can do nothing about it. So making effective use of the data, together with a reliable real-time monitoring and alerting mechanism, is very important. As for prediction, data that certain algorithms can quantify can then be evaluated and simulated.
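As one hedged example of pairing the data with an alerting mechanism, the sketch below counts recent server errors in Elasticsearch and prints an alert above a threshold; the index pattern, field names, and threshold are all assumptions.

# -*- coding: utf-8 -*-
# Minimal sketch: a naive threshold alert on server errors in the last 5 minutes.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://IP:9200/"])
query = {
    "size": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {"range": {"@timestamp": {"gte": "now-5m"}}},
                        {"range": {"response": {"gte": 500}}},
                    ]
                }
            }
        }
    },
}
resp = es.search(index="logstash-*", body=query)
errors = resp["hits"]["total"]                           # an integer in ES 1.x
if errors > 50:                                          # arbitrary threshold
    print("ALERT: %d server errors in the last 5 minutes" % errors)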
Having covered the basics, let's focus on the reuse of log data.
It falls into three parts:
First: simple statistics. Whether with rrdtool, Excel, or Python, as long as you have the data, pre-selection and data cleaning will get you the data you want, and once you have it, visualization follows naturally.
Second: statistical refinement. Visualization may only be one part of data analysis, because it can only show fairly simple results and cannot hear what the data is whispering underneath, so some grounding in statistics and programming matters a great deal. Fortunately, Python has more than enough supporting libraries.
Third: statistical analysis. This part is probably beyond what most people need, and most do not care about it, so it is only touched on briefly here.
One: simple statistics
Let's start with one day's Top IP, Top URL, and Top City.
Top IP
Top URL
Top City
From the three charts above we can directly see which IPs and URLs were visited most frequently that day, as well as which cities the traffic came from. Unusually frequent individual IPs deserve attention, the URLs help us evaluate what matters, and the cities tell us the geographic distribution of our audience; one of the simplest applications is deciding where a CDN should accelerate. Other uses are left for you to think about and are not expanded on here.
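For reference, the per-city numbers can be pulled the same way as the Top IP / Top URL aggregations shown later in this article; a minimal sketch follows, where the field name geoip.city_name.raw is an assumption about a Logstash GeoIP mapping.

# -*- coding: utf-8 -*-
# Minimal sketch: Top 15 cities for one day via a terms aggregation.
from elasticsearch import Elasticsearch
from pandas import DataFrame
import arrow

es = Elasticsearch(["http://IP:9200/"])
index_today = "logstash-" + arrow.now().format("YYYY.MM.DD")
query = {
    "size": 0,
    "aggs": {
        "city": {"terms": {"field": "geoip.city_name.raw", "size": 15}}
    },
}
resp = es.search(index=index_today, body=query)
df_city = DataFrame(resp["aggregations"]["city"]["buckets"])
print(df_city[["key", "doc_count"]])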
Two: statistical refinement
Building on the above, we can refine further, for example the breakdown of terminal devices in each city, as shown in the following figure.
Of course, you can also look at it the other way around.
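Here is a minimal pandas sketch of that kind of refinement, assuming a per-request DataFrame with hypothetical city and device columns; transposing the table gives the "other way around" view.

# -*- coding: utf-8 -*-
# Minimal sketch: terminal device usage per city, plus the transposed view.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("access_log.csv")                       # hypothetical parsed log export
by_city = pd.crosstab(df["city"], df["device"])          # devices per city
by_device = by_city.T                                    # cities per device ("the other way around")

sns.heatmap(by_city, annot=True, fmt="d")
plt.savefig("device_by_city.png", dpi=100)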
Three: statistical analysis
Let's take a brief look at the correlation between Android and Apple among user terminals.
First, a basic trend chart:
Then the relationship between the terminals, as follows:
Or:
The correlations:
Whether judged from the trend or from the correlation coefficient, Android and Apple traffic showed a certain correlation that day.
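A minimal sketch of how such a one-day trend and correlation could be computed, assuming the per-request rows carry a timestamp and a device column (hypothetical names); the traffic is resampled into hourly counts per device before correlating.

# -*- coding: utf-8 -*-
# Minimal sketch: hourly trend and correlation between Android and Apple traffic.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("access_log.csv", parse_dates=["timestamp"])   # hypothetical columns

hourly = (df.set_index("timestamp")
            .groupby("device")
            .resample("H")
            .size()
            .unstack(level=0)                            # one column per device
            .fillna(0))

print(hourly[["android", "apple"]].corr())               # correlation coefficient

hourly[["android", "apple"]].plot()                      # trend chart
plt.savefig("trend.png", dpi=100)
sns.jointplot(x="android", y="apple", data=hourly, kind="reg")
plt.savefig("corr.png", dpi=100)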
As for the charts in this third part, most readers will probably understand only the trend chart; I will not explain the various parameters and relationships in detail here, because explaining them could take longer than this article. Although this is not big data, I still want to borrow a sentence from the age of big data to close this article: "Big data tells us 'what', not 'why'. In the age of big data, we do not need to know the reason behind a phenomenon; we just have to let the data speak for itself."
Note: since this is only a single day of data, and there are plenty of irregularities and errors in the processing, the results should not be over-interpreted; the main purpose of this article is to give you some feel for the data.
Well, the above is the legendary PPT. Even without reading the text, you can probably tell that through data visualization we can present data at the level shown above. What follows explains only part of the visualization code.
Because the data in this article is stored in Elasticsearch, the prerequisite is that your data is already indexed there. If you know a little Pandas, a look at the code should make clear what is going on.
# -*- coding: utf-8 -*-
#==========================================
# Used to generate Top IP, Top URL, and the historical Top IP / Top URL
#==========================================
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from elasticsearch import Elasticsearch
import arrow

## es api
es = Elasticsearch(["http://IP:9200/"])

## time settings
# time_now = arrow.now().format("X") + "000"
index_today = "logstash-" + arrow.now().format("YYYY.MM.DD")
index_all = "logstash-*"
# time_yesterday = arrow.now().replace(days=-1).format("X") + "000"
# time_year = arrow.now().replace(years=-1).format("X") + "000"

## query fields
q_url = "xxx_url"
q_ip = "xxxx_ip"

## build the aggregation query for a given field
def top_search(query_str):
    rets = """{
      "size": 0,
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [{"term": {"type": "xxxxx_access"}}]
            }
          },
          "query": {
            "query_string": {"query": "!xxxxx", "analyze_wildcard": true}
          }
        }
      },
      "aggs": {
        "%s": {"terms": {"field": "%s", "size": 15}}
      }
    }""" % (query_str, query_str + ".raw")
    return rets

## execute the queries
today_top_ip = es.search(index=index_today, body=top_search(q_ip))
today_top_url = es.search(index=index_today, body=top_search(q_url))
year_top_ip = es.search(index=index_all, body=top_search(q_ip))
year_top_url = es.search(index=index_all, body=top_search(q_url))

## turn the aggregation buckets into DataFrames
df_today_ip = DataFrame(today_top_ip["aggregations"][q_ip]["buckets"])
df_today_url = DataFrame(today_top_url["aggregations"][q_url]["buckets"])
df_all_ip = DataFrame(year_top_ip["aggregations"][q_ip]["buckets"])
df_all_url = DataFrame(year_top_url["aggregations"][q_url]["buckets"])

## plot and save the charts
p1 = sns.factorplot(x="key", y="doc_count", data=df_today_ip, kind="bar", palette="summer")
p1.set_xticklabels(rotation=90)
p1.set_titles("Today Top 15 IP")
p1.savefig("topip_today.png", dpi=100)

p3 = sns.factorplot(x="key", y="doc_count", data=df_all_ip, kind="bar", palette="summer")
p3.set_xticklabels(rotation=90)
p3.set_titles("Top 15 IP")
p3.savefig("topip.png", dpi=100)
With the above code, we can generate the current Top 15 IP and the historical Top 15 IP.
It is worth noting that you can filter data quite easily through Kibana, and the result looks good too, so why use Python? Because Kibana can visualize the data, but reusing it is not so convenient, for example for generating reports or more advanced visualization customization. The main point here is to give you a basic idea of calling the Elasticsearch API from Python.
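As a tiny illustration of that kind of reuse, the lines below dump one of the aggregation DataFrames produced by the script above into CSV and Excel files that can feed a daily report; the file names are arbitrary and to_excel assumes an Excel writer such as openpyxl is installed.

# Minimal sketch: export the df_today_ip DataFrame from the script above for a report.
df_today_ip[["key", "doc_count"]].to_csv("top_ip_today.csv", index=False)
df_today_ip[["key", "doc_count"]].to_excel("top_ip_today.xlsx", index=False)  # needs openpyxl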
Here are some important parts to explain.
First install the dependency library:
pip install elasticsearch
For the installation of scientific analysis libraries such as pandas, please refer to: http://youerning.blog.51cto.com/10513771/1711008
And then the basic call.
from elasticsearch import Elasticsearch
import arrow

## es api
es = Elasticsearch(["http://IP:9200/"])
For the query syntax, refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search.html
doc = """{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [{"term": {"type": "xxxxx_access"}}]
        }
      },
      "query": {
        "query_string": {"query": "!xxxxx", "analyze_wildcard": true}
      }
    }
  },
  "aggs": {
    "ip": {"terms": {"field": "xxx_ip", "size": 15}}
  }
}"""
The meaning of the above statement: size = 0 means no raw hits are returned; we only want the aggregation results.
It then selects documents whose type is "xxxxx_access" and whose content does not match "xxxxx", i.e. it filters out the strings containing xxxxx.
Then there is aggs, which stands for aggregation; data processing is one of Elasticsearch's strengths. With the aggregation on the field xxx_ip, the returned result is a statistic, such as the total number of hits for each IP. Here size = 15 means 15 aggregation buckets are returned.
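To make the response structure concrete, here is a minimal sketch of running the doc query above with the es client defined earlier and turning the aggregation buckets into a DataFrame, the same pattern used in the full script.

# Minimal sketch: each bucket carries a "key" (the IP) and a "doc_count".
from pandas import DataFrame

resp = es.search(index="logstash-*", body=doc)
buckets = resp["aggregations"]["ip"]["buckets"]
df_ip = DataFrame(buckets)                               # columns: key, doc_count
print(df_ip.head(15))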
Then we execute the query:
In [19]: es.search(index=index_today, body=top_search(q_ip))
The query results are as above
Conclusion: the uses of data are endless, as long as your imagination is big enough.
Postscript: operations engineers actually have a great many resources in their hands, but neither their managers nor the engineers themselves pay much attention to them, partly because programming has a certain barrier to entry, and partly because people draw a circle on the ground and call it a prison. An ops engineer like me may not look much like a traditional one, even though the most essential duty is still to maintain systems and handle failures. Of the three levels of data reuse described in this article, I suspect most of my colleagues feel the first level is more than enough; but as times change and data grows explosively, we really do have a lot of resources in our hands. Don't you care?