
Big Data Analysis, Explained in Plain Language

2025-01-28 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report --

This article walks through the main ideas behind big data analysis in a detailed, easy-to-understand way, and should serve as a useful reference. If you are interested, read on.

The word "big data" has spread to various fields in IT circles. If you really want to ask "how to achieve big data analysis", I am afraid it is IT.

Many people in the circle can't explain clearly at 01:30. Therefore, it is meaningful to try to make a profound and simple analysis of big data's analysis. The benevolent see benevolence, the wise see wisdom, the ability is limited, if the expression is not accurate, I hope you can use a tolerant attitude to understand and guide.

First, take five seconds to scan the following snippet:
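(The line below is an illustrative access-log record in the common Apache/Nginx "combined" format; the IP address, URL, and timestamp are invented for demonstration.)

192.168.1.101 - - [28/Jan/2015:13:05:26 +0800] "GET /product/view?id=123 HTTP/1.1" 200 5316 "http://www.example.com/index.html" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 Chrome/39.0.2171.95 Safari/537.36"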

If you recognized the above as a fragment of a log file, please raise your hand. Dare I ask: might your Excellency be a respected programmer?

If it reads like complete gibberish, please raise your hand too. Do not doubt your abilities: this proves you are a normal person, and your life is still full of hope and light.

If the same log information is summarized as follows, does it start to make sense?
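Read field by field, the illustrative record above says roughly this:

Visitor's IP address: 192.168.1.101
Time of visit: 28 January 2015, 13:05:26
Page requested: /product/view?id=123 (request succeeded; 5,316 bytes returned)
Arrived from: http://www.example.com/index.html
Browser and system: Chrome 39 on Windows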

Whenever you visit a website, from the moment you open its home page until the moment you leave, every move you make can produce a log record like this, as long as the site chooses to record it. The visits of countless users produce an enormous number of such records, and that is how a website's "user visit" big data comes into being.

Now think: what is the value of this "user visit" big data?

That's right! It is used for website user behavior analysis: understanding what users do and what they prefer on the site, then recommending content they are more likely to be interested in, providing data to support operational decisions, and so on. Summed up with a bit of technical flair, this process is "panning for gold in the logs": log mining.

Log mining is one concrete application scenario of big data analysis. The information in the raw log file (the data source) is vast and all-encompassing, and its structure is messy and hard to read, so log mining really is like panning for gold: filtering and washing the valuable key information, the KPIs (the gold), out of an ocean of data.

So keep thinking: how, technically, do we filter the "KPIs" out of the "data source"? Here is a brief sketch of the data flow in log mining. Please look at it with a little patience (the text explanation that follows will make everything click into place):
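In text form (the labels and node counts are schematic):

user visits a page
→ log file: one huge file of raw access records
→ HDFS: the file is split into 64 MB pieces, each kept as 3 copies on different storage nodes
→ MapReduce: computing nodes clean and count the pieces in parallel, then merge their results
→ KPIs: the extracted key indicators, written back into HDFS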

The behavior users generate while browsing is recorded in the log file. Because a busy website receives a very large number of visits, the resulting log file is also very large. To analyze this file more efficiently, it is saved into a distributed file system called HDFS.

During this process, the complete log file is split into n small files (roughly 64 MB each), and each small file is copied twice more (so n small files become 3n). These small files are then saved on the storage nodes of the HDFS system (a storage node can be understood simply as a computer). When saving, a small file and its copies are placed on different storage nodes, so that if some machines break down, the backups ensure no file goes missing.


With the file stored this way, the next step is to pull data out of the big log file: a group of computing nodes (computers again) searches the n small files in parallel at the same time, and the results from each node are then merged and summarized. This step is called MapReduce data cleaning.

The process sounds a bit complicated, so take an example: count how many times each word appears in a file containing a list of words (think of it as the "log file"). First split the big file into three small files, then count the occurrences of each word in each small file separately, and finally merge the three partial counts.
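Here is that word count sketched in Python. The three-way split mirrors the three small files in the example; in a real cluster, each Counter below would run on its own computing node (the words themselves are made up):

from collections import Counter
from functools import reduce

big_file = "apple pear apple banana pear apple"   # stand-in for the "log file"
words = big_file.split()

# Map step: cut the big file into 3 "small files" and count each one independently.
chunk = -(-len(words) // 3)                       # ceiling division
small_files = [words[i:i + chunk] for i in range(0, len(words), chunk)]
partial = [Counter(f) for f in small_files]       # one partial count per "node"

# Reduce step: merge the per-file statistics into the final answer.
total = reduce(lambda a, b: a + b, partial, Counter())
print(total)   # Counter({'apple': 3, 'pear': 2, 'banana': 1})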

After MapReduce data cleaning, the key indicator data we need has been extracted from that huge, all-inclusive, irregularly structured log file. Note that the extracted data is still stored in HDFS.
