1. An exception when starting spark with ./bin/spark-shell: java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries!
Solution: add export SPARK_LOCAL_IP="127.0.0.1" to spark-env.sh
2. Java Kafka producer error: ERROR kafka.utils.Utils$ - fetching topic metadata for topics [Set(words_topic)] from broker [ArrayBuffer(id:0, host:xxxxxx, port:9092)] failed
Solution: set 'advertised.host.name' in the Kafka broker's server.properties to the server's real IP (the same value as the producer's 'metadata.broker.list' property)
3. java.net.NoRouteToHostException: No route to host
Solution: make sure the zookeeper IP is written correctly
4. Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.net.UnknownHostException: linux-pic4.site
Solution: add your hostname to /etc/hosts: 127.0.0.1 localhost linux-pic4.site
5. org.apache.spark.SparkException: A master URL must be set in your configuration
Solution: SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount").setMaster("local")
6. Failed to locate the winutils binary in the hadoop binary path
Solution: install hadoop first
7. When starting spark: Failed to get database default, returning NoSuchObjectException
Solution: 1) Copy winutils.exe from here (https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin) to some folder, say C:\Hadoop\bin, and set HADOOP_HOME to C:\Hadoop. 2) Open an admin command prompt and run C:\Hadoop\bin\winutils.exe chmod 777 /tmp/hive
8. org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
Solution: use the constructor JavaStreamingContext(sparkContext: JavaSparkContext, batchDuration: Duration) instead of new JavaStreamingContext(sparkConf, Durations.seconds(5))
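For illustration, a minimal Java sketch of that fix (the names and the batch interval are illustrative): the streaming context is built on the one existing JavaSparkContext instead of a second SparkConf, so only one SparkContext exists in the JVM.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    SparkConf conf = new SparkConf().setAppName("JavaDirectKafkaWordCount").setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);   // the only SparkContext in this JVM
    JavaStreamingContext jssc = new JavaStreamingContext(jsc, Durations.seconds(5)); // reuses jsc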
9. Reconnect due to socket error: java.nio.channels.ClosedChannelException
Solution: make sure the kafka broker IP is written correctly
10. java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
Solution: the RDD produced by the last transformation must have a corresponding action, such as messages.print()
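As a hedged sketch (the socket source and the variable names are only illustrative), the final print() is the output operation that makes the streaming job executable:
    import org.apache.spark.streaming.api.java.JavaDStream;

    // assuming a JavaStreamingContext named jssc already exists (see item 8)
    JavaDStream<String> messages = jssc.socketTextStream("localhost", 9999);
    messages.print();        // output operation; without one this exception is thrown
    jssc.start();
    jssc.awaitTermination();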
11. Experience: writing data to ElasticSearch from spark must happen inside an action, at the granularity of an RDD
12. Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use
Solution: the master and slave are configured with the same IP; they must use different IPs.
13. CALL TO LOCALHOST/127.0.0.1:9000
Solution: configure the host correctly in /etc/sysconfig/network, /etc/hosts, and /etc/sysconfig/network-scripts/ifcfg-eth0
13. Opening the namenode:50070 page, Datanode Information displays only one node
Solution: caused by an SSH configuration error; hostnames must match strictly, so reconfigure passwordless ssh login
14. Experience: when building a cluster, configure the hostname first, and restart the machine to make the configured hostname take effect.
15. INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
Solution: if the master and slave nodes can ping each other, turn off the firewall: service iptables stop
16. Experience: do not format HDFS at will; it causes many problems such as inconsistent data versions. Empty the data folders before formatting.
17. namenode1: ssh: connect to host namenode1 port 22: Connection refused
Solution: sshd is stopped or not installed; check with which sshd whether it is installed. If it is installed, restart sshd, then ssh the local hostname to verify the connection succeeds.
18. Log aggregation has not completed or is not enabled.
Solution: add yarn.log-aggregation-enable=true to yarn-site.xml to enable log aggregation
19. failed to launch org.apache.spark.deploy.history.HistoryServer (see the full log)
Solution: correctly configure SPARK_HISTORY_OPTS in spark-env.sh and the history-server settings in spark-defaults.conf
20. Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Solution: no solution yet for this exception in yarn-client mode.
21. Hadoop files cannot be downloaded and the YARN Tracking UI cannot access the history logs
Solution: Windows cannot resolve the hostnames; copy the hostname mappings from the cluster's hosts file into the Windows hosts file
22. Experience: an HDFS file path is written as hdfs://master:9000/<file path>, where master is the namenode hostname and 9000 is the HDFS port number.
23. Yarn JobHistory Error: Failed redirect for container
Solution: configure http://:19888/jobhistory/logs in yarn-site.xml and restart yarn and the JobHistoryServer
24. When accessing the hdfs folder through hadoop UI, the prompt Permission denied: user=dr.who appears
Solution: on the namenode, run: hdfs dfs -chmod -R 755 /
25. Experience: Spark's Driver only receives results when an Action is executed
26. Experience: Spark should use an accumulator (Accumulator) when it needs global aggregate variables
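For illustration, a small hedged Java sketch of a counting accumulator (variable names are illustrative):
    import org.apache.spark.Accumulator;
    import org.apache.spark.api.java.JavaRDD;

    // assuming jsc is an existing JavaSparkContext and lines is an existing JavaRDD<String>
    Accumulator<Integer> emptyLines = jsc.accumulator(0);
    lines.foreach(line -> {
        if (line.isEmpty()) {
            emptyLines.add(1);        // executors send increments back to the driver
        }
    });
    System.out.println("empty lines: " + emptyLines.value());  // read the value on the driver only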
27. Experience: Kafka dispatches messages by topic and consumer group: a topic's messages are consumed by every consumer group that subscribes to it. If you want a single consumer to receive all messages of a topic, put only one consumer in its group. The number of consumers in a group cannot exceed the topic's partition count, otherwise the extra consumers receive nothing.
28. java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor
Solution: unify the ES version and avoid creating an ES client directly in spark
29. Returned Bad Request (400) - failed to parse; Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes; Bailing out..
Solution: correct the format of the data written to ES
30. java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
Solution: ensure password-free login between all nodes
31. In cluster mode, spark cannot write data to elasticsearch
Solution: use this write form (cfg is the Map of ES configuration): results.foreachRDD(javaRDD -> { JavaEsSpark.saveToEs(javaRDD, esSchema, cfg); return null; })
32. Experience: all custom classes must implement the serializable interface, otherwise they will not take effect in the cluster
33. Experience: resource files should be read on the Spark Driver side and passed into closure functions as local variables
34. When reading a resource file through nio, java.nio.file.FileSystemNotFoundException at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem (ZipFileSystemProvider.java:171)
Solution: after packaging into a jar, the resource URI changes from a plain file: URI to a jar:file:...!/... URI, so a zip FileSystem has to be opened first:
final Map<String, String> env = new HashMap<>();
final String[] array = uri.toString().split("!");
final FileSystem fs = FileSystems.newFileSystem(URI.create(array[0]), env);
final Path path = fs.getPath(array[1]);
35. Experience: a DStream transformation only produces a new (temporary) stream object; keep a reference to it if you want to continue using it.
36. Experience: jobs submitted to a yarn cluster cannot print directly to the console; output to log files using log4j instead
37. java.io.NotSerializableException: org.apache.log4j.Logger
Solution: a serializable class cannot contain a non-serializable object, so keep the logger instance out of default serialization by making it either transient or static. Static final is the preferred option: a transient logger is null after deserialization, and any logger.debug() call then throws a NullPointerException, because neither the constructor nor instance initializer blocks run during deserialization. A static final logger is also thread-safe and shared by all instances of the class; this error is one of the reasons a Logger should be declared static and final in Java.
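A short hedged illustration of the static final form (the Customer class is made up):
    import java.io.Serializable;
    import org.apache.log4j.Logger;

    public class Customer implements Serializable {
        // static final: never serialized, shared by all instances, still valid after deserialization
        private static final Logger LOG = Logger.getLogger(Customer.class);
        private String name;

        public void rename(String newName) {
            LOG.debug("renaming " + name + " to " + newName);
            this.name = newName;
        }
    }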
38. log4j:WARN Unsupported encoding
Solution: 1) change the encoding to lowercase utf-8; 2) remove the stray space on the line that sets the encoding.
39. MapperParsingException [Malformed content, must start with an object
Solution: use the interface JavaEsSpark.saveJsonToEs, because saveToEs can only handle objects, not strings
40. ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application
Solution: either too many resources were requested, or .setMaster("local[*]") was not removed.
41. WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
Solution: write the broker id correctly in the configuration file, and use the real IP in the command.
42. User class threw exception: org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't find leaders for Set ([mywaf,7], [mywaf,1])
Solution: configure kafka correctly and recreate the topic
43. In the ES interface, some nodes' shards are not displayed.
Solution: the node does not have enough disk capacity; clean up the disk to free space.
44. The method updateStateByKey(Function2<..., Optional<...>, Optional<...>>, int) in the type JavaPairDStream is not applicable for the given arguments
Solution: Spark's Java API uses com.google.common.base.Optional, not the JDK's java.util.Optional
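For illustration, a hedged sketch of a running count with updateStateByKey using the Guava Optional expected by the Spark 1.x Java API (the names are illustrative; updateStateByKey also requires jssc.checkpoint(...) to be set):
    import java.util.List;
    import com.google.common.base.Optional;          // not java.util.Optional
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    // assuming pairs is an existing JavaPairDStream<String, Integer>
    Function2<List<Integer>, Optional<Integer>, Optional<Integer>> updateFunc =
        (values, state) -> {
            int sum = state.or(0);                   // previous count, 0 if absent
            for (Integer v : values) {
                sum += v;
            }
            return Optional.of(sum);
        };
    JavaPairDStream<String, Integer> counts = pairs.updateStateByKey(updateFunc);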
45. NativeCrc32.nativeComputeChunkedSumsByteArray
Solution: add the 64-bit hadoop.dll (version 2.6) to the hadoop home bin folder used by eclipse and to system32
46. Experience: Spark Streaming includes three computing models: stateless, stateful, and windowed
47. Yarn ResourceManager (RM) single point of failure
Solution: complete Yarn HA through three-node zookeeper cluster and yarn-site.xml configuration file
48. Experience: kafka can be configured to use its own bundled zookeeper cluster
49. Experience: all Spark operations are, in the end, operations on RDDs.
50. How to ensure the strong order of kafka message queues
Solution: set only one partition for the topic that needs to be strongly ordered
51. Setting up mutual SSH trust between many Linux machines in batch
Solution: merge the public keys of all machines into one authorized_keys file and distribute it
52. org.apache.spark.SparkException: Failed to get broadcast_790_piece0 of broadcast_790
Solution: remove spark.cleaner.ttl configuration from spark-defaults.conf
53. In Yarn HA environment, accessing history logs through web is redirected to 8088 and cannot be displayed.
Solution: restore Yarn Http default port 8088
54. but got no response. Marking as slave lost
Solution: this happens when submitting jobs in yarn-client mode; no solution for the time being.
55. Using config: /work/poa/zookeeper-3.4.6/bin/../conf/zoo.cfg Error contacting service. It is probably not running.
Solution: incorrect configuration files, such as hostname mismatch
56. Experience: to deploy Spark tasks, you don't have to copy the entire package; copy only the modified files, then compile and package on the target server.
57. Spark setAppName doesn't appear in Hadoop running applications UI
Solution: set it on the spark-submit command line: --name BetterName
58. How to monitor whether Spark Streaming jobs have died
Solution: monitor the Driver port, or write Linux cron scripts that check via yarn commands
59. Kafka internal and external network problems
Solution: the kafka machine has two network cards; in server.properties do not use an IP but a hostname/domain name, so that external producers and internal consumers each resolve it to the appropriate IP.
60. Experience: do not set kafka's log.dirs to a directory under /tmp; the tmp directory appears to have file-count and disk-capacity limits.
61. After moving kafka to new machines, topics are auto-created in the new cluster and only one broker carries the load
Solution: add delete.topic.enable=true and auto.create.topics.enable=false to server.properties, delete the old topic, recreate the topic, and restart kafka
62. After installing sbt, the sbt command gets stuck at "Getting org.scala-sbt sbt 0.13.6 ..."
Solution: sbt takes some time to download its jars the first time it runs; do not quit before it finishes
63. Experience: ES shards are similar to kafka partitions
64. OOM exception occurred in kafka
Solution: in the kafka broker startup script, increase the JVM heap parameters: export KAFKA_HEAP_OPTS="-Xmx24G -Xms1G"
65. The linux server disk is full. Check for files that exceed the specified size.
Solution: find / -type f -size +10G
66. Rate-limiting spark direct kafka streaming
Solution: set spark.streaming.kafka.maxRatePerPartition to limit the number of records read per kafka partition per second
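A minimal hedged sketch (the limit of 1000 is illustrative):
    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .setAppName("DirectKafkaRateLimited")
        // read at most 1000 records per second from each kafka partition
        .set("spark.streaming.kafka.maxRatePerPartition", "1000");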
67. org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error returned Not Found (404) - [EngineClosedException CurrentState [CLOSED]
Solution: in the kopf plugin, close the index and then open it again. The cause may be that a shard was broken when the index was created.
68. Job aborted due to stage failure: Task not serializable:
Solution: make the class Serializable; declare the instance only within the lambda function passed to map; make the NotSerializable object static so it is created once per machine; or call rdd.forEachPartition and create the NotSerializable object there (see the sketch below)
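A hedged sketch of the foreachPartition variant; HttpClientPool is a hypothetical non-serializable helper, created on the executor instead of being captured by the closure:
    // assuming rdd is an existing JavaRDD<String>
    rdd.foreachPartition(iterator -> {
        HttpClientPool pool = new HttpClientPool();   // hypothetical class; built per partition, never serialized
        while (iterator.hasNext()) {
            pool.send(iterator.next());
        }
        pool.close();
    });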
69. Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable
Solution: this cannot be done as of Spark 1.6, spark version needs to be upgraded
70. After importing a scala project from git, IDEA flags everything as "variable never used"
Solution: mark the src folder as Sources Root
71. Run configurations in IntelliJ result in "Cannot start compilation: the output path is not specified for module xxx. Specify the output path in Configure Project."
Solution: In the default intellij options, "Make" was checked as "Before Launch". Unchecking it fixed the issue.
72. UDFRegistration$$anonfun$register$26 $$anonfun$apply$2 cannot be cast to scala.Function1
Solution: aggregate functions cannot use UDF, but should define UDAF
73. SPARK SQL replacement for mysql GROUP_CONCAT aggregate function
Solution: customize UDAF
74. In an intellij idea maven project, you cannot create new scala files
Solution: add the scala-tools plugin configuration to pom.xml, then download and update the dependencies
75. Error:scala: Error: org.jetbrains.jps.incremental.scala.remote.ServerException
Solution: modify the pom.xml configuration file and change scala to the latest version
76. Balancing HADOOP nodes whose disks are full
Solution: run hdfs balancer -threshold 3, or run the start-balancer.sh script: $HADOOP_HOME/bin/start-balancer.sh -threshold <N>. The parameter 3 is a percentage (3%), meaning the disk-utilization deviation between DataNodes should stay below 3%.
77. Experience: the second parameter (input: Row) of the update function in a sparkSQL UDAF corresponds not to a row of the DataFrame but to the row projected by inputSchema
78. Error: No TypeTag available for String sqlContext.udf.register ()
Solution: inconsistent scala versions, unify all scala versions
79. How to add a constant column in a Spark DataFrame?
Solution: The second argument for DataFrame.withColumn should be a Column, so you have to use a literal: df.withColumn('new_column', lit(10))
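The Java equivalent, as a hedged sketch:
    import static org.apache.spark.sql.functions.lit;
    import org.apache.spark.sql.DataFrame;

    // assuming df is an existing DataFrame
    DataFrame withConstant = df.withColumn("new_column", lit(10));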
80. Error: scalac: Error: object VolatileDoubleRef does not have a member create
Solution: inconsistent scala versions; unify the scala version of the development environment with that of the system
81. java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet
Solution: unify the scala versions used by scala and spark
82. Excluding unwanted dependencies when packaging a Maven project, to keep the target jar from being too large
Solution: mark the dependency scope as provided so it is not put into the target jar, and package with the maven shade plugin
83. Packaging a mixed scala and java project with Maven
Solution: use the command mvn clean scala:compile compile package
84. sparkSQL's udf registration cannot register a UDAF aggregate function
Solution: change the UDAF custom class from an object declaration to a class declaration
85. Experience: deleting hadoop data directories at run time will break JOBs that depend on HDFS
86. [IllegalArgumentException [Document contains at least one immense term in field=XXX
Solution: tokenize (analyze) long text fields when creating the ES index
87. A resource file was not included in the jar built by maven shade
Solution: put the resources folder under src/main/, alongside the scala or java folder
88. Experience: spark Graph builds the graph from the edge set; the vertex set only specifies which vertices in the graph are valid
89. When using a regex query in ES: Determinizing automaton would result in more than 10000 states.
Solution: the regular expression is too long and complex; keep regex matching concise and avoid enumerating alternatives.
90. java.lang.StackOverflowError at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
Solution: the where condition of the sql statement is too long; the string causes a stack overflow
91. org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
Solution: increase executor memory, reduce the number of executors, and increase each executor's concurrency
92. ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 61.0 GB of 61 GB physical memory used
Solution: remove the RDD cache operations, increase the job's spark.storage.memoryFraction value, and increase the job's spark.yarn.executor.memoryOverhead value (a sketch follows).
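A hedged sketch of those two settings (the values are illustrative; they can equally be passed with spark-submit --conf):
    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .set("spark.storage.memoryFraction", "0.4")           // fraction of executor heap used for Spark storage/caching
        .set("spark.yarn.executor.memoryOverhead", "2048");   // extra off-heap MB per executor on YARN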
93. EsRejectedExecutionException [rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction
Solution: reduce the number of spark concurrency and reduce the concurrent reading of ES
94. Experience: do not set the number of executor cores of a single spark task too high, otherwise it will delay other JOBs
95. Experience: data skew only occurs in the shuffle process. Operators that may trigger shuffle operations are: distinct, groupByKey, reduceByKey, aggregateByKey, join, cogroup, repartition, and so on.
96. How to locate data skew in spark
Solution: in the Spark Web UI, look at the amount of data and the execution time of each task in the stage, then use the stage-partitioning rules to locate the corresponding shuffle operator in the code.
97. How to solve spark data skew
Solutions: 1) filter out the few keys that cause the skew (only when discarding those keys barely affects the job); 2) increase the parallelism of the shuffle operation (limited improvement); 3) two-stage aggregation (local aggregation plus global aggregation): first split the same key into several keys by adding a random prefix, shuffle and aggregate locally, then strip the prefix and shuffle again for the global aggregation (this only applies to aggregation-type shuffles, where the effect is obvious; it does not help join-type shuffles; a sketch of this approach is given below); 4) convert the reduce join into a map join: broadcast the small table and, during map operations on the large table, traverse the small table's data (only applies when one of the tables/RDDs is small); 5) join using random prefixes and an expanded RDD: add a random prefix within n to every record of one RDD, expand the other RDD n-fold with flatMap and prefix each copy accordingly, then join the two re-keyed RDDs (this greatly alleviates join skew but consumes a lot of memory)
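A hedged Java sketch of approach 3, two-stage aggregation on a skewed word count (the prefix width of 10 and the variable names are illustrative):
    import java.util.Random;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    // assuming counts is a skewed JavaPairRDD<String, Integer>
    Random rand = new Random();
    JavaPairRDD<String, Integer> salted = counts.mapToPair(
        t -> new Tuple2<>(rand.nextInt(10) + "_" + t._1(), t._2()));              // 1) add a random prefix
    JavaPairRDD<String, Integer> partial = salted.reduceByKey((a, b) -> a + b);   // 2) local aggregation
    JavaPairRDD<String, Integer> unsalted = partial.mapToPair(
        t -> new Tuple2<>(t._1().substring(t._1().indexOf('_') + 1), t._2()));    // 3) strip the prefix
    JavaPairRDD<String, Integer> result = unsalted.reduceByKey((a, b) -> a + b);  // 4) global aggregation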
98. Experience: after a stage finishes, shuffle write partitions the data handled by each task by key, so that the next stage can run its shuffle operators: records with the same key are written to the same disk file, and each disk file belongs to exactly one task of the downstream stage. Data is first written to an in-memory buffer before being spilled to disk. Each task of the current stage creates as many disk files as there are tasks in the next stage.
99. java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 0
Solution: remember to escape regex metacharacters (a sketch follows)
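A short hedged example (the input strings are illustrative): quoting the token, or escaping it by hand, avoids the dangling '?' error:
    import java.util.regex.Pattern;

    String token = "?";
    String[] parts = "a?b?c".split(Pattern.quote(token));  // escapes the metacharacter
    String[] same  = "a?b?c".split("\\?");                 // manual escaping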
100. Spark dynamic (elastic) resource allocation
Solution: configure the spark shuffle service and enable spark.dynamicAllocation.enabled (a sketch follows)
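A hedged sketch of the related settings (the bounds are illustrative; the external shuffle service itself must also be configured on the NodeManagers):
    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .set("spark.shuffle.service.enabled", "true")        // requires the external shuffle service
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "20");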
101. Experience: kafka's consumer groupId has no effect with spark direct streaming
102. Started hadoop yarn and found that only the ResourceManager started, not the NodeManager
Solution: there is a problem with the yarn-site.xml configuration, check and standardize the configuration
103. How to view the hadoop system logs
Solution: in Hadoop 2.x, the YARN service logs consist of the ResourceManager log and the NodeManager logs. The ResourceManager log is in the logs directory under the Hadoop installation directory (yarn-*-resourcemanager-*.log); each NodeManager log is in the logs directory under the hadoop installation directory on that NodeManager node.
104. Experience: files smaller than 128m each occupy a 128m BLOCK; merging or deleting small files saves disk space.
105. How to remove Non DFS Used
Solution: 1) clear the user cache files in the hadoop data directory: cd /data/hadoop/storage/tmp/nm-local-dir/usercache; du -h; rm -rf `find -type f -size +10M`; 2) clean up junk data on the Linux file system
106. Experience: Non DFS Used refers to all files that are not part of HDFS
107. Isolating Linux profile configuration
Solution: cd /etc/profile.d and create a new configuration script there
108. The reference to entity "autoReconnect" must end with the ';' delimiter
Solution: replace & with &amp; in the XML configuration
109. Service hiveserver not found
Solution: run bin/hive --service hiveserver2 instead of hive --service hiveserver for this version of apache hive
110. Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException (Failed to create spark client.)'
Solution: do not use a prebuilt spark; recompile spark and make sure its version matches the one in hive's pom
111. java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS at org.apache.hive.spark.client.rpc.RpcConfiguration.(RpcConfiguration.java:45)
Solution: the hive and spark versions must match, and spark must be compiled without the -Phive parameter
112. javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
Solution: add mysql connector to hive's lib
113. org.apache.hadoop.hive.ql.metadata.HiveException (Failed to create spark client.) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Solution: there are many possible causes; check hive.log to locate the problem further.
114. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
Solution: spark was compiled with the hadoop-provided parameter, so hadoop-related packages are missing
115. In linux, pressing the delete key prints ^H
Solution: run the command stty erase ^H
116. Experience: check the appropriate spark version in hive's pom.xml; as long as the versions are consistent they match, for example spark 1.6.0 and 1.6.2 both work
117. Experience: open the Hive command-line client and check whether the startup log prints "SLF4J: Found binding in" to determine which StaticLoggerBinder.class hive is bound to
118. Started yarn and found that only some of the NodeManagers started
Solution: unstarted nodes lack yarn-related packages. Keep jar packages consistent for all nodes.
119. Error: Could not find or load main class org.apache.hive.beeline.BeeLine
Solution: recompile spark with the -Phive-thriftserver parameter
120. Experience: for hive on spark, do not add the -Phive parameter when compiling spark; add -Phive only if you need sparkSQL to support hive syntax
121. User class threw exception: org.apache.spark.sql.AnalysisException: path hdfs://XXXXXX already exists.
Solution: df.write.format("parquet").mode("append").save("path.parquet")
122. check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_SELECT_LIMIT=DEFAULT' at line 1
Solution: use the new version of mysql-connector
123. org.apache.hadoop.ipc.RemoteException (org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate
Solution: in core-site.xml set hadoop.proxyuser.root.hosts to * and hadoop.proxyuser.root.groups to *, then restart yarn
124. java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$MessageTypeBuilder.addFields([Lorg/apache/parquet/schema/Type;)Lorg/apache/parquet/schema/Types$BaseGroupBuilder
Solution: a version conflict; unify the parquet component versions used by hive and spark
125. Experience: spark.executor.instances, spark.executor.cores, spark.executor.memory and similar settings can be tuned through hive-site.xml to optimize hive on spark performance, and they work best together with dynamic resource allocation.
126. WARN SparkContext: Dynamic Allocation and num executors both set, thus dynamic allocation disabled.
Solution: if you want dynamic resource allocation, do not set the number of executors
127. Invalid configuration property node.environment: is malformed (for class io.airlift.node.NodeConfig.environment)
Solution: the node.environment property (in the node.properties file) is set but does not match the regular expression [a-z0-9][_a-z0-9]*; rename it accordingly
128. com.facebook.presto.server.PrestoServer: No factory for connector hive-XXXXXX
Solution: connector.name in hive.properties is wrong; it must name the specific version so that presto uses the right adapter, e.g. connector.name=hive-hadoop2
129. org.apache.spark.SparkException: Task failed while writing rows Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
Solution: ES is overloaded; repair the ES cluster
130. Experience: if maven downloads are slow, it is most likely caused by the GFW; add a domestic mirror under the mirrors tag of settings.xml in the maven conf directory, for example:
<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>*</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
131. ERROR ApplicationMaster: Uncaught exception: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
Solution: in pom.xml, exclude the signature files under the maven-shade-plugin filters tag:
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
132. scala.MatchError: Buffer(10.113.80.29, None) (of class scala.collection.convert.Wrappers$JListWrapper)
Solution: clean up dirty data in ES that is not compatible with scala data types
133. How to restore HDFS files deleted by mistake
Solution: add the following to core-site.xml:
<property>
  <name>fs.trash.interval</name>
  <value>2880</value>
</property>
The HDFS trash makes accidental deletions recoverable; the value is in minutes, and 0 disables the trash.
To restore a file, run: hdfs dfs -mv /user/root/.Trash/Current/<deleted file> <original path>
134. The order of some tasks in a linux cron script was changed, so some tasks were not executed and some executed repeatedly
Solution: a Linux script change takes effect immediately; modify the script only after all its runs have finished, to avoid side effects.
135. Experience: spark has two repartitioning methods, coalesce and repartition: the former is a narrow dependency and leaves the data unevenly distributed, while the latter is a wide dependency that triggers a shuffle and distributes the data evenly.
136. org.apache.spark.SparkException: Task failed while writing rows scala.MatchError: Buffer(10.113.80.29, None) (of class scala.collection.convert.Wrappers$JListWrapper)
Solution: the ES data is not compatible with sparksql's type conversion; read the ES data as JSON strings with EsSpark.esJsonRDD and then convert the rdd to a dataframe (a sketch follows).
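A hedged Java sketch of that approach (the index/type name is illustrative):
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;
    import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

    // assuming jsc (JavaSparkContext) and sqlContext (SQLContext) already exist
    JavaPairRDD<String, String> docs = JavaEsSpark.esJsonRDD(jsc, "myindex/mytype"); // docId -> JSON string
    JavaRDD<String> json = docs.values();
    DataFrame df = sqlContext.read().json(json);   // let sparkSQL infer the schema from the JSON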
137. Container exited with a non-zero exit code 143 Killed by external signal
Solution: not enough resources are allocated; increase the memory, or adjust the code so that large objects such as JsonObject do not consume too much memory, or include the properties below in yarn-site.xml and restart yarn:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
138. Manually installing an existing jar as a maven dependency
Solution: mvn install:install-file -Dfile=spark-assembly-1.6.2-hadoop2.6.0.jar -DgroupId=org.apache.repack -DartifactId=spark-assembly-1.6.2-hadoop2.6.0 -Dversion=2.6 -Dpackaging=jar
139. FAILED: SemanticException [Error 10006]: Line 1:122 Partition not found '2016-08-01'
Solution: the hive version is too new (a bug in hive itself); downgrade hive from 2.1.0 to 1.2.1
140. ParseException line 1:17 mismatched input 'hdfs' expecting StringLiteral near 'inpath' in load statement
Solution: remove the hdfs IP:port prefix, write the absolute HDFS path directly, and enclose it in single quotation marks
141. [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Solution: export HADOOP_USER_CLASSPATH_FIRST=true
142. A shell script started from crontab does not work properly, although manual execution is fine
Solution: add source /etc/profile as the first line of the script, because the cron process does not automatically load the profile files in the user's directory
143. SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted
Solution: insufficient cluster resources; make sure the actual free memory is larger than the memory requested by the spark job
144. PrestoException: ROW comparison not supported for fields with null elements
Solution: replace != null with is not null
145. Starting the presto server, some nodes fail to start
Solution: the memory allocated to the JVM must be less than the actual free memory
146. Experience: once the presto process starts, the JVM server holds that memory permanently
147. Error injecting constructor, java.lang.IllegalArgumentException: query.max-memory-per-node set to 20GB, but only 10213706957B of useable heap available
Solution: Presto will claim 0.40 * max heap size for the system pool, so your query.max-memory-per-node must not exceed this. You can increase the heap or decrease query.max-memory-per-node.
148. failed: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. Failed java.util.concurrent.CancellationException: Task was cancelled
Solution: this kind of exception is caused by the timeout limit; extend the wait time by setting exchange.http-client.request-timeout=50s in the worker node's config
149. What are the mainstream stacks for big data ETL and visualization
Solution: the technology stacks that can be considered are ELK (elasticsearch+logstash+kibana) or HPA (hive+presto+airpal)
150. Experience: a presto cluster does not need to run on yarn: hadoop depends on HDFS, so machines with very small disks are a problem for it, while presto is pure in-memory computing and does not depend on disk. Installed independently, it can span multiple clusters; wherever there is memory, presto can run.