Hadoop distributed cluster deployment and some pitfalls encountered in the process

2025-09-13 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/03 report

When learning Hadoop, the first step is to deploy a pseudo-distributed cluster and then a fully distributed one.

While deploying the cluster, I used this blog post as a reference: http://www.powerxing.com/install-hadoop-cluster/

I ran into some problems along the way.

For example, to run a simple MapReduce job written in Python, you first need to download the Hadoop Streaming JAR. Put simply, this package wraps some commonly used interfaces: the Python scripts exchange data with it through standard input and standard output, and the work is ultimately carried out by the Java implementation inside the JAR.
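To make the standard-input contract concrete, here is a minimal sketch of what happens to a streaming script, reduced to a single child process on one machine: lines are written to the script's stdin, and tab-separated key/value lines are read back from its stdout. The inline script `WORD_SCRIPT` and the helper `pipe_through` are hypothetical names used only for illustration and are not part of Hadoop; the sketch assumes Python 3.

```python
import subprocess
import sys

# Hypothetical stand-in for a streaming script: it reads lines from stdin
# and writes "key<TAB>value" lines to stdout.
WORD_SCRIPT = r'''
import sys
for line in sys.stdin:
    for word in line.split():
        sys.stdout.write("%s\t%d\n" % (word, 1))
'''


def pipe_through(script_source, text):
    """Feed text to a child Python process on stdin and return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", script_source],
        input=text,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange str instead of bytes
        check=True,               # raise if the child exits with an error
    )
    return result.stdout.splitlines()


pairs = pipe_through(WORD_SCRIPT, "hello hadoop\nhello world\n")
for pair in pairs:
    print(pair)
```

The script never touches Hadoop directly; it only sees text on stdin and produces text on stdout, which is why any language that can read and write standard streams can be used this way.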

Download address: http://www.java2s.com/Code/JarDownload/hadoop-streaming/

The Python mapper is mapper.py:

    #!/usr/bin/env python
    import sys

    # Read input lines from standard input.
    for line in sys.stdin:
        line = line.strip()
        words = line.split()
        # Emit "word<TAB>1" for every word on the line.
        for word in words:
            print("%s\t%s" % (word, 1))

And the reducer is reducer.py:

    #!/usr/bin/env python
    import sys

    current_word = None
    current_count = 0
    word = None

    # The input is sorted by key, so identical words arrive on consecutive lines.
    for line in sys.stdin:
        line = line.strip()
        word, count = line.split('\t', 1)
        try:
            count = int(count)
        except ValueError:
            # If count is not a number, just ignore the line.
            continue
        if current_word == word:
            current_count += count
        else:
            if current_word:
                print("%s\t%s" % (current_word, current_count))
            current_count = count
            current_word = word

    # Don't forget the final output.
    if word == current_word:
        print("%s\t%s" % (current_word, current_count))
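Before submitting anything to the cluster, the word-count logic can be checked on a single machine. The following is a minimal sketch that mirrors the map, sort, and reduce steps in plain Python. The function names `map_words`, `reduce_counts`, and `word_count` are hypothetical, and the reducer here uses `itertools.groupby` instead of the manual `current_word` bookkeeping in reducer.py, so it illustrates the same idea rather than reproducing the script line for line.

```python
from itertools import groupby


def map_words(lines):
    """Mapper logic: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer logic: sum the counts of consecutive pairs that share a key."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    # sorted() stands in for the sort that happens between map and reduce.
    return list(reduce_counts(sorted(map_words(lines))))


result = word_count(["hello hadoop", "hello world"])
for word, count in result:
    print("%s\t%s" % (word, count))
```

Like reducer.py, `reduce_counts` only works because its input is sorted by key; feeding it unsorted pairs would produce several partial counts for the same word.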

To run the job:

    hadoop jar ./hadoop-streaming-2.6.0.jar -file ./mapper.py -mapper ./mapper.py -file ./reducer.py -reducer ./reducer.py -input /input -output /output

Here -file ships each script to the cluster nodes, and -mapper and -reducer name the scripts to run.

Note that /input must be on the Hadoop file system. Upload it first:

    hadoop fs -put input /input

/output must not already exist. If it does, delete it before running the job.

The first line of each Python script must be #!/usr/bin/env python, otherwise an error may be reported. For the specific reasons, see this blog post: http://andylue2008.iteye.com/blog/1622260

In addition, if a command such as hadoop fs -ls reports that the directory cannot be found, it is because the user's home directory has not been created on the Hadoop file system. Create it with:

    hadoop fs -mkdir -p /user/hadoop
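The preparation steps and the job submission can be collected in a small helper script. This is only a sketch under stated assumptions: the function names are hypothetical, the `-mapper` and `-reducer` options are included on the assumption that the job should actually execute the two scripts, and `hadoop fs -rm -r -f` is used as one way to clear an existing output directory. The functions only build argument lists; nothing is executed unless `run` is called on a machine where the `hadoop` command is available.

```python
import subprocess


def preparation_commands(local_input, hdfs_input, hdfs_output,
                         home="/user/hadoop"):
    """Build the HDFS commands to run before submitting the job."""
    return [
        # Create the home directory so that relative HDFS paths resolve.
        ["hadoop", "fs", "-mkdir", "-p", home],
        # Upload the local input to the Hadoop file system.
        ["hadoop", "fs", "-put", local_input, hdfs_input],
        # Remove a leftover output directory; -f ignores a missing path.
        ["hadoop", "fs", "-rm", "-r", "-f", hdfs_output],
    ]


def streaming_command(jar, mapper, reducer, hdfs_input, hdfs_output):
    """Build the argument list for a Hadoop Streaming job."""
    return [
        "hadoop", "jar", jar,
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", hdfs_input,
        "-output", hdfs_output,
    ]


def run(commands):
    """Execute each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.check_call(command)


job = streaming_command("./hadoop-streaming-2.6.0.jar",
                        "./mapper.py", "./reducer.py", "/input", "/output")
print(" ".join(job))
```

On a cluster node, `run(preparation_commands("input", "/input", "/output") + [job])` would perform the whole sequence. Note that the upload step fails if /input already exists, so it may need to be skipped on repeated runs.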
