What Are the Commonly Asked Hadoop Interview Questions?

2025-04-02 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explains in detail the commonly asked Hadoop interview questions. The editor finds them quite practical and shares them here for reference; I hope you gain something from reading it.

1 What are the configuration files in Hadoop, and what is each used for?

hadoop-env.sh: JAVA_HOME, HADOOP_CONF_DIR, HADOOP_LOG_DIR, HADOOP_PID_DIR, HADOOP_CLASSPATH, JVM options for the Hadoop daemons, and other environment settings
core-site.xml: fs.defaultFS, hadoop.tmp.dir, ha.zookeeper.quorum, io.compression.codecs, io.file.buffer.size
hdfs-site.xml: NameNode URL information, dfs.name.dir, dfs.data.dir, dfs.replication, dfs.namenode.shared.edits.dir, dfs.journalnode.edits.dir, dfs.hosts.exclude
slaves: the list of DataNode hosts
mapred-site.xml: mapreduce.framework.name, mapreduce.map.output.compress.codec
yarn-site.xml: ResourceManager information, including the node exclusion list
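Most of the files above are XML property lists. As a hedged sketch (the hostname, port, and path are placeholders, not values from this article), here is what a minimal core-site.xml looks like and how its properties can be read programmatically:

```python
# Sketch: a minimal core-site.xml fragment and a helper that reads a
# property from it. The hdfs://namenode:9000 value and /data/hadoop/tmp
# path are illustrative placeholders.
import xml.etree.ElementTree as ET

CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>"""

def get_conf(xml_text, key):
    """Return the <value> of the <property> whose <name> is key, else None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None

print(get_conf(CORE_SITE, "fs.defaultFS"))  # hdfs://namenode:9000
```

Every *-site.xml file (core, hdfs, mapred, yarn) follows this same configuration/property/name/value layout.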

2 What is the HDFS storage mechanism?

1. HDFS stores files by splitting them: a file is divided into pieces that are stored separately.
2. Large files are divided into fixed-size storage blocks, and stored data is preprocessed according to preset optimization settings, which addresses both the storage and the computation needs of large files.
3. An HDFS cluster has two main components, the NameNode and the DataNodes; typically one NameNode works together with many DataNodes in a cluster.
4. The NameNode is the cluster's master server. It maintains the metadata for all files in HDFS, continuously tracks the health and working status of the DataNodes, and persists its state by reading and writing the image and edit log files.
5. DataNodes are the worker nodes of the cluster. A file is divided into data blocks of equal size, which are stored across several DataNodes. Each DataNode regularly reports its running status and stored blocks to the NameNode and acts on the instructions the NameNode sends.
6. The NameNode receives client requests and returns the storage locations of the requested file; the client then contacts the DataNodes directly to operate on the data.
7. The block is the basic storage unit of HDFS; the default size is 64 MB (128 MB in Hadoop 2).
8. HDFS keeps multiple replicas of each stored block, copying each block to at least 3 independent machines so that damaged data can be recovered quickly.
9. Users manipulate files in HDFS through the provided API.
10. When a client hits a read error, it reports the error to the NameNode and asks it to exclude the faulty DataNode and re-sort the remaining replicas by distance, yielding a new DataNode to read from. If every DataNode holding the block reports a read failure, the whole read fails.
11. On a problem during a write, the FSDataOutputStream is not closed immediately. The client reports the error to the NameNode and writes directly to the DataNodes that hold the backups; a backup DataNode is promoted to the preferred one, the missing replicas are copied to the remaining 2 DataNodes, and the NameNode marks the faulty DataNode for later handling.
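Points 7 and 8 above can be made concrete with a little arithmetic. The following sketch (file size chosen for illustration) shows how a file maps onto 128 MB blocks and how a replication factor of 3 multiplies the raw storage used:

```python
# Sketch: HDFS block splitting and replication arithmetic.
# 128 MB is the Hadoop 2 default block size (64 MB in Hadoop 1);
# 3 is the default dfs.replication. The 300 MB file is illustrative.
import math

BLOCK_SIZE = 128 * 1024 * 1024   # dfs.blocksize, Hadoop 2 default
REPLICATION = 3                  # dfs.replication default

def block_layout(file_size):
    """Return (number_of_blocks, size_of_last_block, raw_bytes_stored)."""
    blocks = max(1, math.ceil(file_size / BLOCK_SIZE))
    last = file_size - (blocks - 1) * BLOCK_SIZE   # the tail block is smaller
    return blocks, last, file_size * REPLICATION

blocks, last, raw = block_layout(300 * 1024 * 1024)  # a 300 MB file
print(blocks, last // (1024 * 1024), raw // (1024 * 1024))  # 3 44 900
```

So a 300 MB file occupies three blocks (128 + 128 + 44 MB), and with three replicas it consumes 900 MB of raw cluster storage.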

3 How do you view, delete, move, and copy files in Hadoop?

hdfs dfs -text ...   (view)
hdfs dfs -rm ...     (delete)
hdfs dfs -mv ...     (move)
hdfs dfs -cp ...     (copy)

4 Hadoop Combiners

A Combiner works like a local reduce function: it aggregates values for the same key on the map side, which reduces the I/O pressure of the shuffle from map to reduce.

5 How does MapReduce work? Briefly explain the MR workflow.

(The original answer here was a diagram.) In brief: the input is divided into splits, each fed to a map task; map output is partitioned by key, sorted, and spilled to local disk; during the shuffle, each reducer copies its partition from every mapper and merge-sorts it; the reduce function then aggregates the values for each key and writes the result to HDFS.
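The flow can be sketched in miniature with word count (an illustrative stand-in, not the diagram from the original): map emits key/value pairs, a partition function routes each key to a reducer, each partition is sorted, and reduce aggregates per key.

```python
# Sketch of the MapReduce flow: map -> partition -> sort (shuffle) -> reduce.
# The input lines and reducer count are illustrative.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    for w in line.split():
        yield (w, 1)

def partition(key, num_reducers):
    """Decide which reducer receives this key (hash partitioning)."""
    return hash(key) % num_reducers

def reducer(key, values):
    return (key, sum(values))

lines = ["big data", "big hadoop data", "data"]
NUM_REDUCERS = 2

# Shuffle: route each (k, v) to its partition, then sort each partition by key.
partitions = [[] for _ in range(NUM_REDUCERS)]
for line in lines:
    for k, v in mapper(line):
        partitions[partition(k, NUM_REDUCERS)].append((k, v))

results = {}
for part in partitions:
    part.sort(key=itemgetter(0))                      # sort phase
    for k, group in groupby(part, key=itemgetter(0)): # one reduce call per key
        key, total = reducer(k, [v for _, v in group])
        results[key] = total

print(sorted(results.items()))  # [('big', 2), ('data', 3), ('hadoop', 1)]
```

The result is correct regardless of how keys hash across partitions, because all values for one key always land in the same partition.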

6 How does Hive differ from Oracle, and which features does Hive currently not support (list more than 5)?

Hive handles very large data volumes with high latency: it is built on HDFS and translates HQL into MapReduce jobs for execution, and it does not support modifying data (no row-level UPDATE or DELETE). Oracle handles comparatively small data volumes with low latency, offers a full relational feature set, and supports data modification.

7 Common basic HBase shell commands: create a table, add a record, view a record, delete a record

create 'tablename', 'columnfamily1', 'columnfamily2', 'columnfamilyN'
put 'tablename', 'rowkey', 'columnfamily:column', 'value'
get 'tablename', 'rowkey', 'columnfamily:column'
delete 'tablename', 'rowkey', 'columnfamily:column'

8 See below (the question itself was given as an image; the answer works with a Hive table of network-traffic records).

-- create table
create table net_info (
  device_number int,
  lac int,
  ci int,
  imei bigint,
  start_time timestamp,
  end_time timestamp,
  duration int,
  send_bytes int,
  recv_bytes int,
  total_bytes int
) row format delimited fields terminated by '|';

-- load data
load data local inpath '/home/hadoop/text.txt' into table net_info;

select * from net_info;

-- calculate (the rest of the end_time condition is cut off in the original)
select sum(total_bytes) from net_info where start_time >= '2014-12-31' and end_time
