2025-04-03 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
This article looks at common misunderstandings about Hadoop, which many people still get wrong. It is shared here for reference.
Hadoop is an open source software framework for storing and analyzing large datasets, and it can process data distributed across many existing servers. It is often described as an operating system for big data because it suits diverse, high-volume data from mobile phones, e-mail, social media, sensor networks and other channels. That description is the source of the first misunderstanding:
1. Hadoop is a complete solution.
This is not the case. Whether you call it a "framework" or a "platform" matters little, but do not assume that Hadoop can solve every big data problem.
"There are no standard Hadoop products on the market," said Phil Simon, author of Too Big to Ignore: The Business Case for Big Data. "It's not like anything else. You can get a standard database from IBM or SAP."
Simon, however, does not see this as a long-term problem. First, because Hadoop is an open source project, many other Hadoop-related projects, such as Cassandra and HBase, can meet specific needs. HBase, for example, provides a distributed database that supports structured data storage for very large tables.
Second, just as Red Hat, IBM and others package Linux into user-friendly products, many big data startups are doing the same for Hadoop. So although Hadoop by itself is not a complete solution, most enterprises will actually encounter it as part of a fairly complete big data solution.
2. Hadoop is a kind of database.
Hadoop is often treated as a database, but it is not one. "There is nothing in core Hadoop that resembles querying or indexing," said Marshall Bockrath-Vandegrift, a software engineer at Damballa Security, which uses Hadoop to analyze real-time security risks.
"We use HBase to help our risk analysts run real-time queries against passive DNS data. HBase and other real-time technologies not only complement Hadoop, but also mostly rely on Hadoop's core distributed storage technology (HDFS) for high-performance access to distributed datasets," he added.
Prateek Gupta, a scientist at BloomReach, a data marketing analytics company, made a similar point: "Hadoop is not made to replace database systems, but it can be used to build database systems."
3. Enterprise Hadoop applications are too risky.
Many enterprises worry that Hadoop is too new, too untested and unsuitable for enterprise use. Nothing could be further from the truth. Remember that Hadoop is modeled on the Google File System, a distributed storage platform, and on Google MapReduce, the data analysis tool that runs on top of that file system. Yahoo invested money and effort in Hadoop and in 2008 launched its first large Hadoop application, a search "webmap" that indexes all known web pages along with the metadata needed to search them.
Today Hadoop is used by companies including Netflix, Twitter and eBay, and vendors including Microsoft, IBM and Oracle sell Hadoop tools. As with any big data platform, it is too early to call Hadoop a "mature" technology, but it has been adopted and validated by large enterprises.
This does not make it a risk-free platform; security in particular remains a thorny issue. But companies should not be scared off by Hadoop's youth.
4. To use Hadoop, you have to hire a bunch of programmers.
Depending on what you want to do, this may be true. If you plan to develop an outstanding next-generation Hadoop big data suite, you may well need experienced Java and MapReduce programmers. If, on the other hand, you are willing to build on the work of others, programming need not be an obstacle. Syncsort, a data integration provider, recommends that analysts use Hadoop-compatible data integration tools to run advanced queries without writing any code.
Most data integration tools have graphical interfaces that hide the complexity of MapReduce programming, and many come with preset templates. In addition, startups including Alpine Data Labs, Continuuity and Hortonworks provide tools that simplify big data and Hadoop applications.
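To give a sense of the programming model that such tools hide, here is a minimal sketch of a MapReduce-style word count in plain Python. It does not use Hadoop or any Hadoop API; it only imitates the map, shuffle and reduce steps in a single process, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

On a real cluster the map and reduce steps would run on many machines and the framework would handle the shuffle, which is the complexity the graphical tools take off the analyst's hands.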
5. Hadoop is not suitable for small and medium-sized enterprises.
Many small and medium-sized enterprises worry that the "big data" trend will shut them out. Large vendors such as IBM and Oracle naturally tend to push large, expensive solutions, but that does not mean there are no suitable tools on the market for smaller companies.
Cloud computing is rapidly making cutting-edge technology widely accessible. "Cloud computing is turning capital expenditure into operating costs," Phil Simon points out. "You can use the same cloud services as Netflix. The same thing is starting to happen in big data: a company with only five employees can still use Kaggle."
Kaggle describes itself as "a market for bridging data problems and data solutions." For example, the startup Jetpac offered a $5,000 prize for an algorithm that finds the most appealing vacation photos. Most vacation photos are not very good, and sifting through them is tedious and time-consuming.
Jetpac had 30,000 photos rated by hand and wanted an algorithm that would rank photos much as the human reviewers did, but using only metadata (photo size, title, description). Developing the algorithm in-house would have cost more than $5,000, and the company would have ended up with a single solution rather than a range of entries to choose from. Jetpac's image processing tools eventually helped it secure $2.4 million in venture capital.
6. Hadoop is cheaper.
This misconception applies to any open source software. Saving on the initial purchase does not mean you will save money overall. One problem with cloud computing, for example, is that spinning up a research project on Amazon is so easy that many people start projects on AWS, forget about them, and keep paying for them.
Virtual server sprawl has come to dwarf the growth of physical servers. And although Hadoop can help you store and analyze data, how do you import old data into the new system? How do you visualize the data? How do you share it? And how do you protect data once it is shared more widely?
Hadoop is in effect a patchwork solution. You can buy a complete enterprise solution from a company like Cloudera, or you can build your own highly customized one. Whichever route you choose, budget carefully, because free software is never really free.
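The arithmetic behind a forgotten cloud project is simple enough to sketch. The figures below (four instances at $0.10 per hour) are made-up placeholders, not real AWS prices, and monthly_cost is a hypothetical helper; the point is only that an idle project bills the same as a busy one.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 hours per year / 12)


def monthly_cost(instance_count, hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Cost in USD of instances left running for the given number of hours."""
    return instance_count * hourly_rate_usd * hours


# Hypothetical example: 4 forgotten instances at an assumed $0.10 per hour.
forgotten = monthly_cost(instance_count=4, hourly_rate_usd=0.10)
print(f"Forgotten project: about ${forgotten:.2f} per month")
```

Multiplying the result by the number of months a project sits unnoticed gives a rough sense of how quickly "free" software on rented hardware adds up.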