
Misunderstandings about big data

2025-01-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

A personal note: this article only looks at big data from another perspective. If you disagree, please just laugh it off; there is no need for pointless flaming. That's all.

1 What is big data

Nowadays many people are keen to talk about big data, but ask them what big data is and what it has to do with them, and few can give a clear answer. There are two reasons. First, people have a deep, instinctive appetite for new technology, if only so as not to look like a "hillbilly" in conversation. Second, there are too few cases in everyday work and life where people can actually take part in big data practice.

McKinsey was the first to announce the arrival of the big data era: "Data has become an important production factor in every industry and business function. The mining and use of massive data heralds a new wave of productivity growth and consumer surplus."

IBM first summed up the characteristics of big data as four "V"s (large Volume, multiple Variety, Value, and fast Velocity). First, the amount of data is huge: big data is measured in units of at least P (1,000 T), E (1 million T) or Z (1 billion T). Second, there are many data types, for example web logs, videos, pictures and geographic location information. Third, the value density is low but the commercial value is high. Fourth, the processing speed is fast; this last point is what fundamentally distinguishes big data from traditional data mining technology. In fact, these four Vs cannot clearly capture all the characteristics of big data. The following picture explains some of the related characteristics.

Viktor Mayer-Schönberger gives a variety of examples in his book The Big Data Era, all to illustrate one point: now that the era of big data has arrived, we should use big data thinking to explore the potential value of big data. The examples the author returns to most often are how Google uses people's search records to mine the secondary value of data, such as predicting the trend of a local flu outbreak; how Amazon uses users' purchase and browsing history to make targeted book recommendations that effectively boost sales; and how Farecast uses all airline ticket price data from the past decade to predict whether now is the right time to buy a ticket.

The core of big data, according to the book, is prediction. It involves three shifts in thinking: 1, not random samples, but all the data; 2, not exactness, but messiness; 3, not causality, but correlation.

2 Analysis of the current situation

According to the main data bulletin of the third National Economic Census, issued on December 16, 2014, there are a total of 10.857 million legal entities engaged in secondary and tertiary industry activities, accounting for 95.6% of all enterprise legal entities. They have 356.023 million employees, an average of 32.8 per unit. These figures show that the vast majority of Chinese enterprises are small and medium-sized. In that case, how many enterprises actually have huge amounts of data?

Let's look at the data from another angle and check the website rankings of several typical domestic companies. The site used for the query is Alexa.

Yonyou:

Neusoft:

NSFOCUS:

It can be seen that Yonyou has the largest PV, about 63,000 a day, which comes to about 23 million page views a year. Even together with other data, the order of magnitude is G, which is far from T, let alone P. At this level, one good PC server can handle most of the requirements, or two at most if reliability is taken into account. From this analysis we can see that in China, the vast majority of companies do not have much data.
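The arithmetic behind this estimate can be reproduced in a few lines. The following is only a rough sketch: the 63,000 daily page views come from the text, but the size of each page-view record (`bytes_per_view`, set here to 1,000 bytes) is an assumption made purely for illustration, and the units follow the decimal steps used in section 1 (P = 1,000 T).

```python
# Back-of-the-envelope estimate of yearly data volume from daily page views.
# ASSUMPTION: bytes_per_view is a hypothetical average record size;
# the article only gives the page-view count, not the size of each record.

UNITS = ["B", "K", "M", "G", "T", "P", "E", "Z"]  # decimal steps of 1000


def yearly_views(daily_views: int, days: int = 365) -> int:
    """Total page views in a year at a constant daily rate."""
    return daily_views * days


def human_size(num_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value below 1000."""
    idx = 0
    while num_bytes >= 1000 and idx < len(UNITS) - 1:
        num_bytes /= 1000
        idx += 1
    return f"{num_bytes:.1f} {UNITS[idx]}"


if __name__ == "__main__":
    bytes_per_view = 1_000  # assumed, not from the article
    views = yearly_views(63_000)
    print("page views per year:", views)
    print("estimated volume:", human_size(views * bytes_per_view))
```

Even if the assumed record size were off by a factor of ten or a hundred, the result would stay in the G to low T range, which is the point the paragraph above is making.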

3 The core value of big data

The core value of big data, according to The Big Data Era, is prediction. Yet when we talk about big data, what we usually talk about is big data technology, such as Hadoop, Spark, Storm, HBase and Hive, and people are always happy to discuss it. The reality, however, is often that data can only verify the present; data cannot foresee the future!

Take a recent example:

Big data told us that there must be a rebound after the stock market plummets. So after the 6.25 crash, everyone thought there must be a rebound on Friday. Instead, on Friday everyone was taught a harsh lesson by the big market players. After the double cut of 6.28 (an interest rate cut and a reserve ratio cut), everyone said the market would rise on Monday, but on Monday the big players in China made it clear to retail investors that data and experience are just wishful thinking, and gave them no room to breathe.

Any technology that does not start from solving a business problem is just fooling around. Computer technology develops very rapidly, and a technology may be eliminated or superseded before long. Without a business scenario to support it, mastering the finer points of big data technology is not of much value. What the author advocates is putting what you learn into practice. A very obvious feature of the brain is forgetfulness: if you don't use these skills, you will forget them after a while. Apart from the basic principles, you might as well put off learning them for now and learn them later, when you need to use them.

Secisland is original, please do not reprint it.

4 Is the data really valuable?

In many cases data is not as valuable as we think, especially data that can easily be collected on the Internet. Take crawlers: I didn't understand them at first, but after spending a little time I had basically figured them out. Whether you write one yourself in Python or use one of the many off-the-shelf tools, you can deploy it quickly and start collecting. China has a large number of programmers, as well as computer enthusiasts who know a little programming, and crawler software lets a novice learn to collect data with little effort. So, first, the threshold for collection keeps getting lower. Second, data is easy to copy, which makes it cheap, especially unstructured data. The large number of reprinted articles on the Internet shows that spreading and copying knowledge costs very little.

It is the use of data that is valuable. For example, what use is it for someone to look at dozens of scattered figures every day if nobody tells him how the behavioral data relates to the business data? A company's CEO sees dozens of figures every day, but PV, PU, UV and the like are meaningless to them. They just need to know: Is there a problem? What is the problem? Has anything new been found? What needs to be done? That's it.
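To give a sense of how low that threshold is, here is a minimal sketch of the core of a crawler written with nothing but the Python standard library. The class and function names (`LinkExtractor`, `extract_links`, `fetch`) are made up for this illustration and do not come from any particular crawler package; a real crawler would also need politeness delays, robots.txt handling and error handling.

```python
# Minimal crawler building blocks using only the Python standard library.
# All names here are hypothetical and chosen for illustration.
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    """Return all absolute link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page as text (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `extract_links(fetch(url), url)` on a start page and repeating it for each returned link is, in essence, the whole collection loop, which is why collected data of this kind is cheap.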

5 The big data bubble

The answer comes from Professor Michael Jordan of Berkeley, one of the most respected machine learning experts in the world. (The following translation is by Quinn Sure, a Zhihu author.)

1. At present, the reliability of the results produced by big data is too low. Rushing to apply it in practice is like starting to build a bridge without having learned civil engineering properly: all you can produce is a "tofu-dreg project". A large wave of false positives is approaching, because the growth rate of the data is not enough to support our desire to apply big data everywhere. As a science it is not yet rigorous enough (the original text says "without error bar"). Civil engineering, after years of accumulation, can tell us clearly under what conditions a bridge can be built and under what conditions it cannot. Big data cannot.

2. At present, progress in the field of computer vision is still very limited, and recognition works only in a very narrow range, such as face recognition. (Although this is not a direct reference to big data, it shows that he thinks we are still a long way from sensing everything, and that big data's collection ability is, after all, still limited.)

3. An artificial neural network is not the same thing as the neural network of the human brain. Our understanding of the brain is far from the point where it can be borrowed by computer science. The back propagation technique used by deep learning today is obviously not how the brain operates, and the structure of the network is completely different. Claims that fuzzy processing of data has reached the level of the human brain are mainly media talk.

A summary of his views:

Some media have used analogies to make things easy for the public to understand, but these analogies have caused too much misunderstanding and too much hype. Big data is still a science that is not rigorous enough. It may make some useful predictions with a certain probability, but improper use and premature over-reliance will lead to disastrous consequences.

People often burst into enthusiasm for a technology prematurely, in the hope that it can change the world. If there are no results in a short period of time, the enthusiasm may suddenly turn cold; people feel they made a mistake and hasten to withdraw resources from the technology. Clearly, Michael is worried that the current public enthusiasm is not based on an understanding of the technology, which may lead to exactly such a change in attitude. He does believe that the field is real and that many important applications will create value over time. But much of the current media publicity, and even investment behavior, is a bubble.
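Point 1 above, the wave of false positives, can be illustrated with a small simulation. This is only a sketch of the general multiple-comparisons effect and is not taken from Professor Jordan's remarks: it generates pure random noise, so every "correlation" it finds is spurious by construction. The function names, the sample size of 20 and the 0.5 threshold are arbitrary choices for the illustration.

```python
# Illustration of false positives: test enough unrelated variables against
# a target and some will look correlated purely by chance.
# All data here is random noise; names and parameters are illustrative only.
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_features, num_samples=20, threshold=0.5, seed=0):
    """Count random features whose |correlation| with a random target exceeds threshold."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_samples)]
    hits = 0
    for _ in range(num_features):
        feature = [rng.gauss(0, 1) for _ in range(num_samples)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, "random features ->", count_spurious(m), "spurious hits")
```

The number of spurious hits grows with the number of features tested while the number of samples stays fixed, which is one way to read the warning that results reported without error bars cannot be trusted.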

6 Start with small data

So what should we do? Start with small data. Small data is the digital information of each of our organizations. To take a personal example: I have a drink or two every day, and one day I suddenly get a stomachache after drinking. So I ask myself, what was different about this day? It turns out that the wine I drank that day was a new brand, so it may be the new brand that gave me the stomachache. This is the "small data" in my life. It is not as grand as big data.

First of all, you should understand your own enterprise and what the core of your industry is. In competition, many enterprises are defeated not by their current competitors but by players they never regarded as competitors. A very simple example: everyone thinks of Amazon as an e-commerce business, but that picture is incomplete, because a large share of its profit now comes from cloud services. So the most important thing is to find the core data of the enterprise. Only on that basis should you analyze this data and then extend outward. Second, look for data related to that core and let it grow slowly. It's a bit like a snowball: the first layer is the core, the second layer is the peripheral data, the third layer is structured data from external institutions, and the fourth layer is social data and all kinds of so-called unstructured data. You have to work through it layer by layer and find what is valuable and relevant to you. Only then can your data be put to use.


*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.


© 2024 shulou.com SLNews company. All rights reserved.
