2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Hadoop and big data have been among the hottest terms of the past two years. More and more companies are interested, but most of the people I deal with, whether engineers or bosses, do not know how to use these technologies to improve their company's business. While helping them work through the problem, I pulled out a few key points and recorded them here.
Are big data and the cloud the same thing?
This is one of the most commonly confused concepts. Personally, I think they are two different things. A cloud service, whether a cloud host, cloud storage, or some other cloud application, gives users an interface; behind that interface sits virtual machine technology, distributed storage, or other distributed computing technology. In short, the idea of the cloud is that I provide a service to you, and you do not need to care how complex its architecture or implementation is. As an analogy, to use electricity in the pre-cloud era we would have had to build our own power plant, install generators, and build substations before we could use any. A cloud service is like someone else having built the power plant and run the wire straight into your home: you just plug in and do not need to care how the electricity is made. Generating the power and maintaining the equipment is the State Grid's job. In IT terms, we used to buy our own servers, install our own systems, rack them, set up our own load balancing, and maintain our own software and hardware environment. With the cloud, all of this is handled by the cloud service provider's virtual machine technology. The provider also takes care of data security and network security, so you do not need to hire people to maintain a pile of equipment.
Big data, for its part, can be cloud-based or not. Its processing technology is different from the technology used to provide cloud services, although there is some overlap. You could say that cloud services are the infrastructure and municipal works, while big data is a high-rise building in the city.
Technically, most domestic cloud service providers mainly offer virtual machine services. This is a "divide" concept: one physical server is split into many small virtual servers so that its physical resources are used as fully as possible and not wasted. Big data works the other way: it merges many servers into one giant virtual server and allocates computing resources across them so that work gets done quickly. An old Chinese saying describes big data and Hadoop well: three cobblers together are a match for Zhuge Liang. The combined computing resources can exceed the computing power of a minicomputer or midrange computer. There is still a cloud-like aspect here: you do not need to care how the data is stored or computed, you just use it.
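The "merge many servers into one" idea can be sketched in a few lines of Python. This is only an illustration of the split, compute, and combine pattern, not Hadoop's actual API: threads stand in for servers, and the function names and sample log lines are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_workers):
    """Deal the records out round-robin, one chunk per simulated server."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_chunk(chunk):
    """Work done independently on one server: count the words in its chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Merge the per-server partial results into a single answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_word_count(records, n_workers=3):
    """Split the data, count each chunk in parallel, then combine."""
    chunks = split_into_chunks(records, n_workers)
    # Threads are a stand-in for separate machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical sample data.
logs = ["buy beer", "buy diapers", "buy beer diapers", "view beer"]
result = distributed_word_count(logs, n_workers=3)
print(result)
```

The caller sees one function and one result, which is the "giant virtual server" view. How the records were divided among workers stays hidden.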
Does big data technology necessarily require a large amount of data? If the amount of data is small, is it unnecessary?
That is the usual view, but it is not absolute: a job with many computation dimensions or a complex computation process can also count as big data. In other words, if the results you need cannot be computed within the time you have, you may need big data technology.
On the one hand, if the volume of data you store exceeds what a database or data warehouse can hold, you may need big data technology. On the other hand, if your computation cannot finish in time with traditional data processing methods, you may also need it. The typical computing-power challenges come from data mining and multi-dimensional analysis: the amount of data may be small, but the algorithm and process are very complex. Examples include making recommendations to users and targeting advertising based on user segmentation; in traditional industries, computing weather forecasts or processing geological data for oil and mineral exploration; and in finance, building mathematical models from historical data to predict the risk of securities, futures, and loans. Why are Alibaba's forecasts of China's economy and of imports and exports more accurate than those of the Ministry of Commerce and the Bureau of Statistics? Apart from its team of mathematics and statistics experts, big data is an indispensable technical tool.
Is big data technology just Hadoop?
Obviously not. The big data field has many vendors and applications, some open source and some paid. Examples of big data products that are not based on Hadoop include EMC's Greenplum and Splunk from the company of the same name. They share a common drawback: they are expensive. As a result, most companies use open source software for their big data processing, and the best-established open source option is Hadoop, which is why Hadoop has more or less become synonymous with big data processing. Many commercial companies build on Hadoop because the Apache license does not prohibit commercial use. Cloudera and MapR, both well known in China, base their commercial products on Hadoop and its surrounding ecosystem.
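The "can it finish in time?" test can be put as rough arithmetic. The sketch below assumes throughput scales linearly with the number of servers, which real clusters only approximate, and every figure in it is hypothetical.

```python
import math


def hours_needed(data_gb, gb_per_hour_per_server, servers=1):
    """Estimated run time, assuming perfectly linear scaling across servers."""
    return data_gb / (gb_per_hour_per_server * servers)


def servers_required(data_gb, gb_per_hour_per_server, deadline_hours):
    """Smallest server count that meets the deadline under the same assumption."""
    return math.ceil(data_gb / (gb_per_hour_per_server * deadline_hours))


# Hypothetical workload: 12,000 GB to process, 50 GB/hour per server,
# and the result is needed within an 8-hour overnight window.
single = hours_needed(12_000, 50)          # 240.0 hours on one machine
needed = servers_required(12_000, 50, 8)   # 30 servers to fit the window
print(single, needed)
```

If the single-machine estimate is far beyond the time available, as it is here, the job is a candidate for a distributed approach regardless of how "big" the data looks.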
How should big data promote the development of the company's business?
This takes imagination. Once you have large capacity and large computing power, what remains is working out how to use them. Existing data keeps serving its original purpose, but beyond the familiar pairings of beer and diapers or chewing gum and condoms, here is another vivid example: a company in the United States places a sensor every mile across each grain-producing area to collect data such as air humidity and soil nitrogen content. It then applies big data processing methods and algorithms to predict what the region's harvest is likely to be, and sells the forecast report to American agricultural insurance companies.
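The beer-and-diapers case comes down to counting which items show up together in the same purchase. Here is a minimal sketch of that counting step on made-up baskets. A real analysis would run over far more data and use measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair a stable order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical purchase records.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "chips"],
    ["diapers", "beer", "milk"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('beer', 'diapers') 3
```

On one machine this is trivial. With years of receipts from thousands of stores, the same counting is the kind of job that gets split across a cluster.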
What are the advantages and disadvantages of Hadoop?
The advantage of Hadoop is that data capacity, computing power, and data backup safety are all greatly improved. Hadoop 1.0 supports parallel storage and computing on clusters of up to about 4000 servers, and 2.0 supports about 6000. However, 2.0 is not yet mature, so 1.0 is recommended for production. I think a 4000-server cluster has enough capacity and computing power to rival an IBM mainframe. Judging from the large-scale Bank of China outage on December 15 last year, however secure a mainframe is, it is still a single point of failure: when something really goes wrong, no one dares to switch over to the backup mainframe. Hadoop 1.0 already has many ways to work around its single point of failure, and 2.0 supports failover natively. Perhaps Hadoop will keep developing and eventually surpass the mainframe across the board. In fact, IBM has already started releasing its own Hadoop distribution.
As for disadvantages, Hadoop 1.0 still has a single point of failure. Other techniques can provide hot failover to compensate, but they demand a high level of skill from the operations staff. Another disadvantage is that computations take a long time, so real-time queries and fast decision response are still out of reach. Many other projects work alongside Hadoop to fill this gap, such as Apache Drill and Cloudera's Impala, both counterparts to Google's Dremel. For real-time computing there is Storm, open-sourced by Twitter: its cluster design resembles Hadoop's, but it processes a real-time data stream and produces results immediately, so answers are available as soon as you ask for them.
With the support of the various open source communities and the joint efforts of programmers all over the world, big data processing capability is advancing rapidly, and programmers are transforming the world with their own ingenuity.
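The capacity and backup claims can be made concrete with a back-of-the-envelope calculation. HDFS keeps three copies of each block by default. The disk size per server below is a hypothetical figure, and the sketch ignores space reserved for the operating system and temporary data.

```python
def usable_capacity_tb(servers, disk_tb_per_server, replication=3):
    """Raw disk space divided by the number of copies kept of every block."""
    return servers * disk_tb_per_server / replication


def copies_left(replication, failed_copies):
    """Copies of a block still readable after some of its replicas are lost."""
    return max(replication - failed_copies, 0)


# 4000 servers as in the text; 8 TB of disk per server is an assumption.
capacity = usable_capacity_tb(4000, 8)
print(round(capacity), "TB usable")

# Losing one server that held a copy still leaves two readable replicas.
print(copies_left(3, 1))
```

Replication costs two thirds of the raw space, but it is what lets the data survive individual server failures.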
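The difference between batch and stream computing can be shown with a toy average. This is not Storm's API, only a sketch of the idea: the batch function needs all the data before it can answer, while the streaming version updates its answer as each value arrives. The readings are made up.

```python
def batch_average(values):
    """Batch style: wait for the full data set, then compute once."""
    return sum(values) / len(values)


class RunningAverage:
    """Stream style: keep a small running state and update it per event."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        """Fold one new value in and return the current average."""
        self.count += 1
        self.total += value
        return self.total / self.count


# Hypothetical sensor readings arriving one at a time.
readings = [10.0, 20.0, 60.0]

stream = RunningAverage()
live = [stream.update(v) for v in readings]
print(live)                     # [10.0, 15.0, 30.0]
print(batch_average(readings))  # 30.0
```

Both approaches agree once all the data is in. The streaming version also had a usable answer after every single reading.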
© 2024 shulou.com SLNews company. All rights reserved.