Hadoop Cluster problem set

2025-04-03 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

1. bigdata is not allowed to impersonate xxx

Reason: the proxy-user settings have not taken effect. Check that core-site.xml is configured correctly:

    <property>
        <name>hadoop.proxyuser.bigdata.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.bigdata.groups</name>
        <value>*</value>
    </property>

Note: the XXX in hadoop.proxyuser.XXX.hosts and hadoop.proxyuser.XXX.groups is the user name shown after User: in the exception message.

The two properties can also be restricted instead of wildcarded: setting hadoop.proxyuser.bigdata.hosts to host1,host2 means the superuser bigdata can connect only from host1 and host2 when impersonating a user, and setting hadoop.proxyuser.bigdata.groups to group1,group2 allows it to impersonate only members of group1 and group2.
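As a sketch, the restricted form of the configuration in core-site.xml might look like this (host1, host2, group1, and group2 are placeholders, not values from the article):

```xml
<!-- core-site.xml: restrict where bigdata may connect from
     and which groups it may impersonate -->
<property>
    <name>hadoop.proxyuser.bigdata.hosts</name>
    <value>host1,host2</value>
</property>
<property>
    <name>hadoop.proxyuser.bigdata.groups</name>
    <value>group1,group2</value>
</property>
```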

After adding the above configuration there is no need to restart the cluster. You can reload the two property values directly on the NameNode using an administrator account. The commands are as follows:

    $ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
    Refresh super user groups configuration successful
    $ yarn rmadmin -refreshSuperUserGroupsConfiguration
    19/01/16 15:02:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8033

If the cluster is configured with HA, execute the following command to reload all namenode nodes:

    # hadoop dfsadmin -fs hdfs://ns -refreshSuperUserGroupsConfiguration
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    Refresh super user groups configuration successful for master/192.168.99.219:9000
    Refresh super user groups configuration successful for node01/192.168.99.173:9000

2. org.apache.hadoop.hbase.exceptions.ConnectionClosingException

Symptom: when calling HiveServer2 through beeline, JDBC, or Python, queries and CREATE TABLE statements against HBase-associated tables fail.

Solution: set hive.server2.enable.doAs to false in hive-site.xml. Setting this property to true makes HiveServer2 execute Hive operations as the user making the call.

    <property>
        <name>hive.server2.enable.doAs</name>
        <value>false</value>
    </property>

Create an HBase-associated table in Hive:

    -- table name: test_tb
    CREATE TABLE test_tb (key int, value string)
    -- specify the storage handler
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    -- declare the column family and column names
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    -- hbase.table.name declares the HBase table name; optional, defaults to the Hive table name
    -- hbase.mapred.output.outputtable specifies the table written to on insert;
    -- it must be set if you need to insert data into the table later
    TBLPROPERTIES ("hbase.table.name" = "test_tb", "hbase.mapred.output.outputtable" = "test_tb");

3. Clean up the Spark work directory regularly

When tasks run in Spark standalone mode, every submission generates a folder under the work directory of each node, named app-xxxxxxx-xxxx. The folder holds the resource files that the node downloaded from the master when the task was submitted. These directories are created on every run and are not cleaned up automatically, so with many tasks they will fill up the disk.
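The cleanup that Spark's worker options automate can also be approximated by hand, for example from a cron job. A minimal Python sketch, assuming a standalone work directory containing app-* folders (the path layout and TTL value here are illustrative, not from the article):

```python
import os
import shutil
import time


def clean_spark_work_dir(work_dir: str, ttl_seconds: float) -> list:
    """Remove app-* directories under work_dir whose last modification
    time is older than ttl_seconds. Returns the removed directory names."""
    removed = []
    now = time.time()
    for name in os.listdir(work_dir):
        path = os.path.join(work_dir, name)
        # Spark standalone names per-application folders app-<date>-<seq>
        if name.startswith("app-") and os.path.isdir(path):
            if now - os.path.getmtime(path) > ttl_seconds:
                shutil.rmtree(path)
                removed.append(name)
    return removed
```

Note that the spark.worker.cleanup.* properties shown below do this natively inside the worker; a manual sweep like this is only a fallback when those options cannot be enabled.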

The directory of each application contains the dependency packages needed for the Spark task to run. Enable automatic cleanup in spark-env.sh on every worker:

    # spark.worker.cleanup.enabled    - whether to enable automatic cleanup
    # spark.worker.cleanup.interval   - cleanup cycle: how often to run, in seconds
    # spark.worker.cleanup.appDataTtl - how long application data is retained, in seconds
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"

4. Too many ZooKeeper connections cause HBase and Hive connection failures

    2019-01-25 03:26:41,627 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /172.17.0.1 - max is 60

Adjust the HBase and Hive ZooKeeper connection settings according to the production environment:

    hbase-site.xml
        hbase.zookeeper.property.maxClientCnxns

    hive-site.xml
        hive.server2.thrift.min.worker.threads
        hive.server2.thrift.max.worker.threads
        hive.zookeeper.session.timeout

    zoo.cfg
        # Limits the number of concurrent connections (at the socket level) that a
        # single client, identified by IP address, may make
        maxClientCnxns=200
        # The minimum session timeout in milliseconds that the server will allow the client to negotiate
        minSessionTimeout=1000
        # The maximum session timeout in milliseconds that the server will allow the client to negotiate
        maxSessionTimeout=60000
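To find which clients are hitting the connection limit, the ZooKeeper log can be scanned for the warning shown above. A hedged Python sketch; the regex matches the log format quoted in this section, and nothing else is assumed:

```python
import re
from collections import Counter

# Matches ZooKeeper's warning, e.g.
# "... - Too many connections from /172.17.0.1 - max is 60"
WARN_RE = re.compile(r"Too many connections from /(\S+) - max is (\d+)")


def offending_clients(log_lines) -> Counter:
    """Count how often each client IP triggered the warning."""
    counts = Counter()
    for line in log_lines:
        m = WARN_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

The IPs with the highest counts are the clients (often HBase region servers or HiveServer2 thread pools) whose connection usage should be investigated before raising maxClientCnxns.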

To be continued.
