In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly shows you "what are the advantages of hadoop", the content is easy to understand, clear, hope to help you solve your doubts, the following let the editor lead you to study and learn "what are the advantages of hadoop" this article?
First, Hadoop is a distributed system infrastructure developed by the Apache Foundation.
Hadoop is a distributed computing platform that makes it easy for users to structure and use. Users can easily develop and run applications that deal with huge amounts of data on Hadoop. It mainly has the following advantages:
1. High reliability. The ability of Hadoop to store and process data bit by bit is trustworthy.
two。 High scalability. Hadoop distributes data and completes computing tasks among available computer clusters, which can be easily extended to thousands of nodes.
3. High efficiency. Hadoop can move data dynamically between nodes and ensure the dynamic balance of each node, so the processing speed is very fast.
4. High fault tolerance. Hadoop can automatically save multiple copies of data and automatically reassign failed tasks.
5. Low cost. Compared with all-in-one machines, commercial data warehouses and data marts such as QlikView and Yonghong Z-Suite, hadoop is open source, so the software cost of the project will be greatly reduced.
Hadoop comes with a framework written in the Java language, so it is ideal to run on the Linux production platform.
In fact, we need to know the essential characteristics of big data: in view of the large amount of structured, unstructured and semi-structured data in increments, how to mine efficient market data quickly and repeatedly? With this problem infiltrated into the business to analyze, you will know what business scenarios hadoop needs to be applied to! If a relational database can handle the work, do you still need hadoop?
With regard to hadoop, I am more impressed by the "second marketing" mentioned by someone in Zhihu. What is "second marketing"? To put it bluntly:
1. Calculate your personal information through big data
two。 And then make a precise push.
What else can hadoop do?
For example:
Large data storage: distributed storage
Log processing: Hadoop is good at this
Mass computing: parallel computing
ETL: data extraction to oracle, mysql, DB2, mongdb and mainstream databases
Using HBase for data analysis: using extensibility to deal with a large number of write operations-Facebook constructs a real-time data analysis system based on HBase
Search engine: hadoop + lucene implementation
Data Mining: a popular Advertising recommendation
Read sequentially from a large number of files. HDFS optimizes sequential reads at the cost of high load for random access.
Any server may fail, and the performance will not be greatly affected by a large amount of data replication.
Personalized advertising recommendation
Having said so much, it doesn't matter if you don't understand. Let's pick the core and talk about it.
The core designs of Hadoop are HDFS and MapReduce.
1.Hdfs provides massive data storage.
For more information, please see HDFS (personal recommendation, this blog has a lot of documentation support)
2.MapReduce provides the calculation of the data.
2.1MapReduce programming model
MapReduce adopts the idea of "divide and conquer", distributes the operation of large-scale data sets to each sub-node under the management of a master node, and then obtains the final result by integrating the intermediate results of each node. To put it simply, MapReduce is "the decomposition of tasks and the summary of results".
In Hadoop, there are two machine roles for performing MapReduce tasks: one is JobTracker;, the other is TaskTracker,JobTracker, which is used to schedule work, and TaskTracker is used to perform work. There is only one JobTracker in a Hadoop cluster.
In distributed computing, MapReduce framework is responsible for dealing with complex problems in parallel programming, such as distributed storage, job scheduling, load balancing, fault-tolerant balancing, fault-tolerant processing and network communication. The processing process is highly abstracted into two functions: map and reduce,map are responsible for decomposing tasks into multiple tasks, and reduce is responsible for summarizing the results of multitasking after decomposition.
It should be noted that the dataset (or task) processed with MapReduce must have the characteristics that the dataset to be processed can be decomposed into many small datasets, and each small dataset can be processed in full parallel.
2.2MapReduce process
In Hadoop, each MapReduce task is initialized to a Job, and each Job can be divided into two phases: the map phase and the reduce phase. These two stages are represented by two functions, namely, map function and reduce function. The map function receives a formal input, and then produces a formal intermediate output. The Hadoop function receives a formal input, and then processes the value set. Each reduce produces 0 or 1 output, and the output of reduce is also formal.
The process of dealing with large data sets by MapReduce
The above is all the contents of this article "what are the advantages of hadoop?" Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.