2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article discusses the depth and breadth of big data. The editor finds it practical and shares it here in the hope that readers will take something away from it.
The depth and breadth of big data
If big data simply means a huge amount of data, it is a very vague concept. It becomes little more than a synonym for information, and the question of what information can do is obviously hard to answer.
At this point, classification usually helps to move the thinking forward. If we take time and space as the most basic perspectives, the first distinction to draw is between the depth and the breadth of big data. Viewed along the time axis, big data is a complete history; viewed across space, it is a trace of activity everywhere. The former can be regarded as depth and the latter as breadth. Different scenarios place different emphasis on each.
For some vertical industries, such as medical care, the depth of big data matters more. Once the full history can be found in the data, people can better understand and optimize the industry.
For society as a whole, breadth matters more in many cases. We may have only a little information about any one scene, but when such information is plentiful enough and covers enough ground, it becomes possible to describe the whole picture in a relatively timely manner. The frequently cited example of Google predicting infectious disease outbreaks relies on this breadth.
This determines how big data applications will develop. Where depth matters, organizations such as companies need to be the main actors, and the difficulty lies in crossing the boundaries of data ownership. Clearly it would be good for hospitals to digitize and share all their treatment cases, but if only one hospital does so, it is more likely to face a privacy backlash than to benefit.
Where breadth matters, companies can also benefit, as search companies do, but the organization that can really benefit from big data is the government. The wider the data, the larger the subject it describes; if it describes society as a whole, then the party primarily responsible for society is clearly the one that benefits. This is a matter of common sense, almost the same as not taking the medicine prescribed by the doctor when you see a doctor. CCTV sometimes shows the population movement map that Baidu produces during the Spring Festival, which illustrates the point indirectly. Such a map is far less useful to the company that can make it than to the government.
To sum up, depth and breadth place different requirements on data. The former requires more detailed, higher-quality data sources, while the latter is less demanding, but both face the problem of unequal returns in application. Big data tends to describe the whole, yet it is often an individual organization that has the ability to collect or process it, and that organization's return is not easily reflected in the improvement of the whole.
Therefore the bottleneck in the development of big data is not technology but the establishment of the distribution relationship it requires. If that relationship does not work, data will remain in isolated islands, with each organization holding its own data and calling it "big data". To straighten out this relationship, we have to return to a classic question: can a "Commons" be established?
The vision of a data Commons
Big data is actually a bit like the Commons. One of the most famous arguments in economics is the tragedy of the Commons. American Economic History gives an easy-to-understand example of what the tragedy of the Commons is:
... This economic reasoning helps explain how collective ownership and output sharing (equal or fixed shares) lead to the "free rider" problem. To illustrate, consider jointly owned land and 10 workers who together produce 100 bushels of corn, consuming an average of 10 bushels per person. Suppose one worker starts to slack off and halves his effort, reducing output by 5 bushels. Under the output-sharing system, the slacker's consumption, like that of the other workers, is now 9.5 bushels. Although his effort has fallen by 50%, his consumption has fallen by only 5%. The slacker is free riding on other people's work.
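The arithmetic in the corn example can be checked with a short sketch. It assumes, as the quoted numbers imply, that output is linear in effort and that each worker produces 10 bushels at full effort; the function name and parameters are illustrative and do not come from the book.

```python
def shared_consumption(efforts, output_per_full_effort=10.0):
    """Equal-share consumption per worker.

    efforts: one effort level per worker (1.0 = full effort).
    Assumption: output is linear in effort, 10 bushels per full-effort worker.
    """
    total_output = sum(e * output_per_full_effort for e in efforts)
    return total_output / len(efforts)


# Ten workers at full effort: 100 bushels in total, 10 bushels each.
baseline = shared_consumption([1.0] * 10)

# One worker halves his effort: total output drops by 5 bushels to 95.
with_slacker = shared_consumption([0.5] + [1.0] * 9)

# Relative fall in the slacker's consumption (his effort fell by 50%).
consumption_drop = (baseline - with_slacker) / baseline

print(f"consumption before: {baseline} bushels")
print(f"consumption after:  {with_slacker} bushels")
print(f"consumption fell by {consumption_drop:.0%}")
```

Under these assumptions the slacker gives up half of his effort but loses only a twentieth of his consumption, because the lost output is spread across all ten workers.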
There is a profound problem of human nature behind this: even when working together creates more wealth and each individual gets a larger share of it, the obvious tendency of an individual within a group is to work less and share more. This is related to the prisoner's dilemma.
At present there is no complete solution to this problem in the physical world; we can only rely on some broadly accepted order of distribution, such as natural selection in the past. Bit-based digital wealth, however, now seems to offer a possible way to solve it.
The difference between bit-based data and physical goods is that when you take data, I do not lose it. In addition, the price of hardware is falling rapidly, and open source has made data access tools essentially free. Together these make a data Commons possible.
The interesting question here is this. If people care more about whether the absolute value of what they get increases, a data Commons is more likely to form, because with a data Commons every participant (enterprise) is bound to gain more. But if people care more about whether they have more than others, building a data Commons will face many obstacles, because a Commons in effect puts all the relevant parties on the same competitive starting line.
The problem of big data is a technical one when it comes to using data, but a socio-economic one when it comes to data sources, and the latter is harder. The development of big data applications therefore depends not on the development of technology but on the speed of socio-economic change. In limited areas such as search, e-commerce, and cloud computing, the technology is already well developed. At present, who pays and who benefits is the most important question in turning small data into big data.
Which way does big data go?
The internal driving force of data development is that the more complete the data, the greater its value. This is in fact a network effect, and from a macro point of view it leaves only two possible trends in the development of data ownership:
One is that, as with today's mobile devices, every player has its own private data sources and they compete to the death until one of them survives, which also achieves the goal of unifying the data.
The other is that competitors begin to cooperate and build the data Commons mentioned above.
As mentioned earlier, industry data is very different in nature from data about society as a whole, so the two should be discussed separately.
For industry data, frank cooperation between competitors is unlikely unless a very exceptional person appears. In this case, the easiest way is to bring in a third party.
For example, the operators hold the mobile data of almost all Internet users, but it is hard for them to cooperate openly with one another to integrate this data and create value. If a third party steps in, a workable benefit distribution plan becomes possible.
If this can be achieved, the one remaining key point is whether the revenue of the corresponding business model can exceed the cost of data processing. It must be stressed that the value density of big data is very low, and many things are valuable but not necessarily worth doing. One of the key reasons video websites struggle to make money is that bandwidth and storage are relatively expensive. For big data without a good business model, the situation may be even worse than for video sites. In any case, mining is worthwhile only when its cost is lower than the value mined.
This problem may not be too serious for industry data. Generally speaking, the value density of industry data is higher, and because it is relatively vertical, its total volume is limited. Big data applications in industry are therefore relatively easy to develop.
But for social data, this is a problem in many cases. We all know that the comprehensiveness of a sample is more valuable than the amount of data, but if collecting more is the only way to ensure comprehensiveness, then the data must be complete before doing anything with it makes sense.
Social data has two kinds of application directions. One involves data that a single enterprise can handle, as Google does. The other involves the activity data of people, as in smart cities; this data belongs to the social level and can hardly belong to one enterprise alone. The latter needs the support of the data Commons mentioned above.
From the perspective of data, there are two forms of data holding. In one, an enterprise like Google holds all the data on one cross-section of society; this should be a special case, and the data is limited to public information. In the other, data related to human behavior is fragmented: shopping data sits with e-commerce companies, relationship data with social networks and IM, offline service data with O2O enterprises, rail travel data with 12306, and so on. Google has all of the public data but does not hold human behavior data, so an enterprise like Google in effect holds a cross-section of the whole society, while all other companies hold data only in a vertical area.
If we rely on enterprises to attempt this kind of data unification, then in the former case we see things like a 20 billion investment in O2O, because it would fill in the missing data, and in the latter case we see e-commerce companies that want to do social networking, social networking companies that want to do e-commerce, and so on. Similar stories can also play out on devices. The goal of all these moves is for one company to do everything, but that is impossible, and not only for economic reasons. If the data cannot be connected, all that remains is to do self-styled "big data" on fragmented data.
So in essence the question is whether a data Commons can be established, and to build one we must at least settle who will do it. The lesson of open source is crucial here: first, it cannot be a for-profit organization; second, it should be supported by many enterprises. Because data involves privacy, the rules defining how data may be used must be even clearer than those of open source.
That covers the depth and breadth of big data. Some of these points may come up in daily work, and the editor hopes readers can learn from them.
© 2024 shulou.com SLNews company. All rights reserved.