2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
In this article, we will share some common questions that come up when setting up a Hadoop cluster. In work and study alike, many of these questions are simple, yet you can search for a long time without finding the answers you need, and the same happens while learning and using Hadoop. Here are some common questions in Hadoop cluster setup:
What are the three modes in which a Hadoop cluster can run?
Stand-alone (local) mode, pseudo-distributed mode, and fully distributed mode.
What are the points for attention in stand-alone (local) mode?
There are no daemons in stand-alone (standalone) mode; everything runs in a single JVM. There is no DFS either; the local file system is used. Stand-alone mode is suitable for running MapReduce programs during development and is the least used mode.
What are the points for attention in pseudo-distributed mode?
Pseudo-distributed (Pseudo) is suitable for development and test environments, where all daemons run on the same machine.
Can VM be called Pseudo?
No. They are two different things: a VM is a virtual machine, while pseudo-distributed (Pseudo) refers specifically to how Hadoop's daemons are deployed, all on one machine.
What are the points for attention in the fully distributed model?
Fully distributed mode is usually used in production environments, where N hosts form a Hadoop cluster and Hadoop daemons run on each host. There will be hosts running the Namenode, hosts running Datanodes, and hosts running Task Trackers. In a distributed environment, the master node and the slave nodes are separated.
Does Hadoop follow the UNIX pattern?
Yes. As in UNIX, Hadoop also has a "conf" directory.
In what directory is Hadoop installed?
Cloudera and Apache use the same directory structure; Hadoop is installed in /usr/lib/hadoop-0.20/.
What are the port numbers for Namenode, Job tracker, and task tracker?
The default web UI ports are: Namenode, 50070; Job Tracker, 50030; Task Tracker, 50060.
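For reference, 50070, 50030 and 50060 are the stock Hadoop 1.x web UI ports (they can be changed in the configuration files). A minimal Python sketch that builds the corresponding status-page URLs; the hostnames are placeholders:

```python
# Default Hadoop 1.x web UI ports; these are the stock defaults and
# can be overridden in the *-site.xml configuration files.
DEFAULT_WEB_PORTS = {
    "namenode": 50070,
    "jobtracker": 50030,
    "tasktracker": 50060,
}

def web_ui_url(daemon, host="localhost"):
    """Return the HTTP URL of a daemon's web status page."""
    return "http://%s:%d/" % (host, DEFAULT_WEB_PORTS[daemon])

print(web_ui_url("namenode"))  # http://localhost:50070/
```

For example, `web_ui_url("jobtracker", host="master")` gives the Job Tracker page on a (hypothetical) host named master.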
What is the core configuration of Hadoop?
The core configuration of Hadoop was done through two XML files: 1) hadoop-default.xml and 2) hadoop-site.xml. These files are in XML format, and each contains a number of properties consisting of names and values; however, these files no longer exist.
Then how to configure it at the moment?
Hadoop now has three configuration files: 1) core-site.xml, 2) hdfs-site.xml, 3) mapred-site.xml. These files are stored in the conf/ subdirectory.
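As a hedged illustration of the property format these files share, here is a minimal conf/core-site.xml; the hdfs://localhost:8020 value is a placeholder assumption, not a value from this article:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: minimal sketch. The Namenode address below is a
     placeholder; adjust it to your own cluster. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```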
What is the spill factor of RAM?
The spill factor (Spill factor) controls how much data is kept in memory before being written out to temporary files, i.e. to the hadoop-temp directory.
Is fs.mapr.working.dir just a single directory?
Yes, fs.mapr.working.dir is just one directory.
What are the three main attributes of hdfs-site.xml?
dfs.name.dir determines the path where metadata is stored and whether DFS is stored on disk or remotely
dfs.data.dir determines the path where data is stored
fs.checkpoint.dir is used by the secondary Namenode
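The three properties above can be sketched in a conf/hdfs-site.xml fragment; the local paths are placeholder assumptions, not values from this article:

```xml
<?xml version="1.0"?>
<!-- conf/hdfs-site.xml: sketch of the three main properties.
     The /var/hadoop/* paths are placeholders; use your own storage paths. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/var/hadoop/checkpoint</value>
  </property>
</configuration>
```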
How do I exit input mode?
The ways to exit input mode (in vi) are: 1) press ESC; 2) type :q (if you have not made any changes) or :wq (if you have made changes), then press Enter.
What has happened to the system when typing "hadoop fsck /" causes a "connection refused java exception"?
This means that Namenode is not running on your VM.
We use Ubuntu and Cloudera, so where should we download Hadoop, or install it with Ubuntu by default?
This is the default build of Hadoop, which you must download from Cloudera or from Edureka's Dropbox and run on your system. Of course, you can also configure it yourself, but you need a Linux box, Ubuntu or Red Hat. Installation steps are available on the Cloudera website or in Edureka's Dropbox.
What is the use of the "jps" command?
This command checks whether Namenode, Datanode, Task Tracker, and Job Tracker are working properly.
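jps prints one "pid ClassName" line per running JVM. A small Python sketch that checks which Hadoop daemons appear in such output; SAMPLE_OUTPUT is made-up illustrative text, not captured from a real cluster:

```python
# Check which Hadoop daemons appear in `jps`-style output.
# SAMPLE_OUTPUT is a hard-coded example; on a real node you would
# capture it with subprocess.check_output(["jps"]) instead.
SAMPLE_OUTPUT = """\
2101 NameNode
2257 DataNode
2411 JobTracker
2560 TaskTracker
2703 Jps
"""

EXPECTED = {"NameNode", "DataNode", "JobTracker", "TaskTracker"}

def running_daemons(jps_output):
    """Return the set of JVM process names reported by jps."""
    return {line.split(None, 1)[1] for line in jps_output.splitlines() if line.strip()}

missing = EXPECTED - running_daemons(SAMPLE_OUTPUT)
print("missing daemons:", missing or "none")
```

If any expected daemon is missing from the output, it is not running and should be restarted.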
How do I restart Namenode?
Run stop-all.sh and then start-all.sh. Alternatively, type sudo hdfs (Enter), su - hdfs (Enter), /etc/init.d/ha (Enter), and /etc/init.d/hadoop-0.20-namenode start (Enter).
The full name of Fsck?
Full name: File System Check.
How do I check that Namenode is working properly?
If you want to check whether the Namenode is working properly, use the command /etc/init.d/hadoop-0.20-namenode status or simply jps.
What is the function of the mapred.job.tracker command?
It can let you know which node is Job Tracker.
What is the purpose of /etc/init.d?
/etc/init.d shows the location and status of daemons (services); it is actually a Linux feature and has little to do with Hadoop.
How do I find Namenode in a browser?
If you need to reach the Namenode in a browser, note that its port is not 8021; the Namenode web UI listens on port 50070, so browse to localhost:50070.
How do I transfer from SU to Cloudera?
To go from SU to Cloudera, just type exit.
What files are used for startup and shutdown commands?
Slaves and Masters.
What does Slaves consist of?
The slaves file consists of a list of hosts, one per line, describing the Datanodes.
What does Masters consist of?
The masters file is also a list of hosts, one per line, describing the secondary Namenode server.
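Both files are plain host lists, one host per line. A Python sketch that parses such a file's contents, skipping blank lines and '#' comments as Hadoop's scripts do; the sample hostnames are made up:

```python
# Parse a Hadoop conf/slaves (or conf/masters) file: one hostname per
# line; blank lines and '#' comments are ignored. SAMPLE is made-up content.
SAMPLE = """\
# data nodes
slave1.example.com
slave2.example.com

slave3.example.com
"""

def parse_host_list(text):
    """Return the hostnames listed in a slaves/masters-style file."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts

print(parse_host_list(SAMPLE))  # ['slave1.example.com', 'slave2.example.com', 'slave3.example.com']
```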
What is hadoop-env.sh used for?
hadoop-env.sh provides the environment Hadoop runs in, such as JAVA_HOME.
Can the masters file contain multiple entries?
Yes, you can have multiple entries in the masters file.
What is the current location of the hadoop-env.sh file?
hadoop-env.sh is now located in the conf/ directory.
What does PID stand for in Hadoop_PID_DIR?
PID stands for "Process ID".
What is / var/hadoop/pids used for?
/ var/hadoop/pids is used to store PID.
What is the function of hadoop-metrics.properties file?
hadoop-metrics.properties is used for "Reporting": it controls Hadoop's metrics reporting, and the initial state is "not to report".
What kind of network does Hadoop need?
The Hadoop core uses the shell over SSH to launch the server processes on the slave nodes, and it uses password-less SSH connections between the master node and the slave nodes.
Why is password-less SSH needed in a fully distributed environment?
This is mainly because communication in the cluster is very frequent, and the Job Tracker needs to publish tasks to the Task Trackers as quickly as possible.
Does this cause security problems?
Don't worry at all. Hadoop clusters are completely isolated and generally cannot be reached from the Internet; a configuration that exposes them is unusual. So we do not need to worry about security vulnerabilities at this level, such as intrusion over the Internet. Moreover, Hadoop provides a relatively secure way for machines to connect to each other.
What is the port number on which SSH works?
The port on which SSH works is 22; it can of course be changed, but 22 is the default port number.
What are the other points for attention in SSH?
SSH is just secure shell communication; it can be thought of as a protocol running on port 22, and access can be secured by configuring a password.
Why does the SSH local host need a password?
Passwords are used in SSH mainly to increase security; in some cases no password is configured at all.
If you add key to SSH, do you still need to set the password?
Yes; even if you add a key to SSH, you still need to set a password.
What if there is no data in Namenode?
A Namenode without data cannot be called a Namenode. Usually, a Namenode must have data.
What happens to Namenode when Job Tracker goes down?
When Job Tracker fails, the cluster still works fine, as long as the Namenode is fine.
Is the shard input decided by the client or Namenode?
It is not determined by the client; the configuration files determine the details of how the input is split.
Is it possible to build a Hadoop cluster on your own?
Yes, you can do this as long as you are familiar with the Hadoop environment.
Can I run Hadoop on Windows?
You'd better not. Red Hat Linux and Ubuntu are the best operating systems for Hadoop. Windows is usually not used for Hadoop installations because of the problems it causes, so Windows is definitely not a recommended system for Hadoop.
These are some common questions about Hadoop clusters. I hope they are of some help to you.