2025-04-07 Update From: SLTechnology News & Howtos
Shulou (Shulou.com) 06/03 Report
Editor's note: in an era when everyone talks about big data, many people's impression of it stops at buzzwords. In fact, big data is not as magical, mysterious, or omnipotent as people claim. Today we compare it with traditional data to see which characteristics have placed big data at the crest of the times.
This article is excerpted from "Starting from 1: The Road to Growing as a Data Analyst".
Compared with traditional data, the main characteristics of big data can be summarized as follows: the volume of data is large, the types of data are complex, and the value of the data is immense.
The large volume is easy to understand. In the past, the unit we used for stored data was the KB: an Excel sheet was only tens to hundreds of KB. Now we routinely talk about data on the order of GB, TB, or even PB. The relationships between these units are shown below.
1MB=1024KB
1GB=1024MB
1TB=1024GB
1PB=1024TB
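The unit relationships above can be sketched in a few lines of Python; the conversion table and helper function below are illustrative, not part of the book's text.

```python
# Binary storage units: each step up is a factor of 1024.
UNITS = {"KB": 1, "MB": 1024, "GB": 1024**2, "TB": 1024**3, "PB": 1024**4}

def to_kb(value, unit):
    """Convert a size in the given unit to KB."""
    return value * UNITS[unit]

print(to_kb(1, "PB"))                   # 1099511627776 (KB in one PB)
print(to_kb(7, "TB") / to_kb(1, "GB"))  # 7168.0 (7 TB expressed in GB)
```

Walking the table in either direction is just repeated multiplication or division by 1024, which is why jumping from KB-scale spreadsheets to PB-scale stores spans roughly twelve orders of magnitude in bytes.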
More intuitively, 1KB holds roughly 512 Chinese characters, and 1MB roughly the text of six volumes of A Dream of Red Mansions. Taobao generated about 7TB of data per day in March 2015, equivalent to 40 million copies of A Dream of Red Mansions, while the National Library of China, the country's largest library, holds 30 million volumes. From this perspective, big data truly involves a huge volume of data. What produces such a volume? We can examine the question from three angles: how data is acquired, how it is transmitted, and how it is stored.
The qualitative change in how data is acquired is the core factor that made big data possible. Traditionally, data was obtained manually, and its defining feature was manual entry. For a period of time, supermarkets collected user data by requiring cashiers to key in user characteristics on a keyboard like the one shown in Figure 3-3.
Supermarkets collected user data this way, analyzed it, and built profiles of their users and customer groups. But consider: with so many customers passing through every day, can a cashier guarantee accurate entry? And how much data can manual input collect in a day? There are many manual-entry methods similar to this keyboard; without listing them all, traditional data recording is inevitably small in scale, limited in volume, and inaccurate. Today, most data is acquired through URL transmission and API interfaces. Broadly speaking, there are several acquisition channels: crawler scraping, user retention, user upload, data transactions, and data sharing.
Self-owned data and external data are the two main acquisition channels. For self-owned data, we can use crawler software to scrape purposefully, for example the Weibo follow data of a group of users, or the price quotes for various models on a car forum. Retained data arises because users interact with the company's products or services, leaving behind a trail of behavioral data that forms the main body of our database; routine data analysis is mostly based on this retained data. User-uploaded data, such as authorized selfies, address books, and call histories, requires active authorization from the user and is often key data in business operations. Compared with acquiring self-owned data, acquiring external data is much simpler: the vast majority is transmitted through API interfaces, with a small amount delivered offline as tables or files. Such data is either sold at a clearly stated price per record, or shared under an agreement in which both parties promise to exchange data and develop together.
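As a small illustration of the crawler channel described above, the sketch below uses Python's standard-library HTML parser to pull price quotes out of a page fragment. The markup and the `price` class name are invented for the example and do not come from any real forum.

```python
from html.parser import HTMLParser

class QuoteParser(HTMLParser):
    """Collect the text of every <span class="price"> element."""
    def __init__(self):
        super().__init__()
        self.in_quote = False
        self.quotes = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_quote = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_quote = False

    def handle_data(self, data):
        if self.in_quote:
            self.quotes.append(data.strip())

# A hypothetical fragment of a car-forum listing page.
html = '<div><span class="price">189,800</span><span class="price">205,900</span></div>'
parser = QuoteParser()
parser.feed(html)
print(parser.quotes)  # ['189,800', '205,900']
```

A real crawler would fetch pages over HTTP and handle pagination and rate limits, but the extraction step, turning markup into structured records, is the heart of the channel.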
So far we can see that data acquisition in the new era is far more diverse and efficient than the traditional approach.
Likewise, big data's transmission methods are completely different from traditional ones. Traditional data was transmitted as offline files, by mail, or through third-party software. But as API interfaces have matured and spread, they have converged much as mobile-phone charging ports did: from a bewildering variety to, today, two main camps, iPhone and Android. API interfaces have gradually been standardized and unified with the times; a programmer can complete an API interface in two days, and data transfer through an API can reach millisecond latency.
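A minimal sketch of what standardized API-style transfer looks like in practice, using Python's standard-library `json` module; the field names are illustrative, since every real interface defines its own schema.

```python
import json

# Sender side: serialize a record to JSON, the common wire format of most APIs.
record = {"user_id": 1001, "event": "purchase", "ts": "2015-03-01T12:00:00"}
payload = json.dumps(record)

# Receiver side: parse the payload back into a structure.
received = json.loads(payload)
assert received == record  # lossless round trip
print(received["event"])   # purchase
```

It is precisely this kind of agreed-upon, machine-readable format that lets two systems exchange data in milliseconds instead of mailing files back and forth.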
In terms of storage, big data's storage environment has jumped several orders of magnitude beyond traditional data storage. A decade or so ago, floppy disks were considered advanced, and a disk holding up to 20MB was already expensive, to say nothing of today's USB drives and portable hard drives.
Another significant difference between big data and traditional data is the richness of data types. Traditional data focuses on describing an object, while big data tends to record a process. To make the difference easier to grasp, here is a simple example contrasting the two recording styles.
Traditional data is recorded as in the following table.
Big data records the same scene as follows.
Clearly, the biggest difference between the two is that the big-data record does not merely describe the object; it adds dimensions such as time and place and thereby records a process: everything from before Xiaoming entered the restaurant until he left is captured. The traditional record, by contrast, is a simple description of the result.
Of course, the dining data big data can record is by no means limited to the fields listed above. Ideally, big-data monitoring would even record how users eat, how they behave, and their facial expressions. Such data reflects how users feel about the dining environment and the taste of the food, and can be used further to improve both and to offer suggestions.
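To make the contrast concrete, here is a sketch of the two recording styles; all field names, timestamps, and amounts are invented for illustration, not taken from the book's tables.

```python
# Traditional style: one row describing the outcome.
traditional = {"name": "Xiaoming", "restaurant": "A", "spend": 86}

# Big-data style: timestamped events covering the whole dining process.
events = [
    {"ts": "18:02", "name": "Xiaoming", "action": "enter"},
    {"ts": "18:05", "name": "Xiaoming", "action": "order"},
    {"ts": "18:40", "name": "Xiaoming", "action": "pay", "spend": 86},
    {"ts": "18:41", "name": "Xiaoming", "action": "leave"},
]

# The process record can always be reduced to the traditional summary,
# but not the other way around.
summary = {"name": events[0]["name"],
           "spend": sum(e.get("spend", 0) for e in events)}
print(summary)  # {'name': 'Xiaoming', 'spend': 86}
```

The asymmetry is the point: the event stream carries the result plus the time, place, and sequence around it, while the single summary row discards that process entirely.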
The core difference between big data and traditional data lies in its inestimable value. The value of traditional data lies in conveying and representing information: it describes and reflects a phenomenon so that people can understand it through the data. Big data, being a full record of the process behind a phenomenon, lets us not only understand the object but also analyze it, grasp the rules by which it operates, mine its internal structure and characteristics, and even learn things the object itself does not know.
Take an encyclopedia entry about a person: it records height, weight, date of birth, hobbies, daily activities, relatives and friends. That is traditional data, and through it you can know and recognize the person. Recording a person the big-data way captures a stream of process data: when he gets up, how well he sleeps, his physical condition, what he is doing at each point in time, and so on. From such process data we not only know the person but also learn his habits and personality, and can even dig out emotions and inner activity hidden in his daily routine. None of this is reflected in traditional data; it is the wealth of information big data carries, and enormous value hides behind that richness, value that can even bring people to the state of "what they think is what they get".
What makes big data's value special is that it must be mined. Given the same pile of data, different people extract different depths of insight. It is like meeting a person: some only notice whether his looks are good, while others read mental activity in his expression, experience in his eyes, taste in his clothes, and habits in his shoes. Digging out this deep, non-obvious content takes skill and effort, and that is what we call data analysis and data mining.