This article mainly introduces the pits you may run into with Spark. It has some reference value, so interested readers are welcome to refer to it; I hope you learn a lot from reading it.
1.1 Errors integrating Scala with IntelliJ
After installing Scala successfully, I was ready to write Scala code in IntelliJ and got everything set up (there is plenty of information online on how to configure it). Then running a Scala program reported an error.
Error:
Error:scalac: Multiple 'scala-library*.jar' files (scala-library.jar, scala-library.jar, scala-library.jar) in Scala compiler classpath in Scala SDK scala-sdk-2.12.2
Solution: I found a fix on Stack Overflow. Open Project Structure in IntelliJ, delete the existing Scala path (my Scala is installed under /usr/local/Cellar/scala/2.12.2), and re-add the /usr/local/Cellar/scala/2.12.2/idea/lib directory.
Before the change
After modification
1.2 IntelliJ does not recognize Scala syntax
I wrote a Scala HelloWorld in IntelliJ with the following code:
/* Created by jackie on 17-5-7. */
package com.jackie.scala.s510

object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("hello world")
    println(increaseAnother(5))
    // the array contents and lambda bodies were garbled in the source; a small array and x + 1 are a reconstruction
    println(Array(1, 2, 3, 4).map { (x: Int) => x + 1 }.mkString(","))
    println(Array(1, 2, 3, 4).map { (x: Int) => x + 1 } mkString ",")
    println(Array(1, 2, 3, 4) map { (x: Int) => x + 1 } mkString ",")
    // test object
    var person = new Person()
    person.name_=("john") // name_=() corresponds to the setter method in java
    println("Person name:" + person.name)
    person.name = "Jackie"
    println("Person name:" + person.name)
    var mp = new MyPerson()
    mp.name_("ali")
    println("MyPerson name:" + person.name)
    var pwp = new PersonWithParam("Jackie", 18)
    println("PersonWithParam:" + pwp.toString())
  }

  def increaseAnother(x: Int): Int = x + 1
}
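The Person, MyPerson, and PersonWithParam classes used above are not shown in the article; the following is only a minimal sketch of shapes that would make the snippet compile (the field names and the name_ helper are assumptions):

package com.jackie.scala.s510

// Assumed class shapes; not from the original article.
class Person {
  var name: String = _   // a var generates both name and name_= (getter/setter)
}

class MyPerson {
  private var _name: String = _
  def name: String = _name
  def name_(n: String): Unit = { _name = n }  // setter-style method called as mp.name_("ali")
}

class PersonWithParam(val name: String, val age: Int) {
  override def toString: String = s"name=$name, age=$age"
}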
At runtime, mkString was not recognized.
Error: mkString can't be resolved
Solution: first, the versions in my environment: IntelliJ 14.0, JDK 8, Scala 2.12.2. The highest Scala version that could be selected in that IntelliJ was 2.11, so I later upgraded IntelliJ to version 2017.1. It then reported Error:scalac: Error: org.jetbrains.jps.incremental.scala.remote.ServerException, so I opened Project Structure in IntelliJ, changed Scala from 2.12.2 to 2.11.7, and the problem was solved.
1.3 Integration of Spark and IntelliJ
With the Spark environment installed, I wanted to run a Spark program in IntelliJ, but after adding the Spark dependencies I found it would not compile.
Error:
Exception NoSuchMethodError: com.google.common.collect.MapMaker.keyEquivalence
Solution: the project had always referenced the spark-core_2.10 artifact declared in Maven, and that is where the error appeared. I traced the problem to Guava, then found every jar that indirectly depends on Guava and excluded them all, but the problem remained. During this period I also added many Spark dependencies, none of which helped. In the end, switching to spark-core_2.11 solved the problem (version compatibility can really be painful sometimes).
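For reference, a minimal sketch of what this change looks like in a Maven pom (the coordinates are the standard Spark ones; the version number is illustrative and should match your installed Spark):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>  <!-- was spark-core_2.10 -->
    <version>2.1.1</version>                  <!-- illustrative version -->
</dependency>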
1.4 Uploading local files to HDFS with hadoop
To upload a local file to HDFS, use hadoop fs -put localDir hdfsDir; just make sure Hadoop is running first.
Error:
Jackie@jackies-MacBook-Pro:~ | ⇒ hadoop fs-put ~ / Documents/doc/README.md / 10:56:39 on 17-05-13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable 17-05-13 10:56:40 WARN ipc.Client: Failed to connect to server: localhost/127.0.0.1:8020: try once and fail. Java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect (Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect (SocketChannelImpl.java:717) at org.apache.hadoop.net.SocketIOWithTimeout.connect (SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect (NetUtils.java:531) at org.apache.hadoop.net.NetUtils.connect (NetUtils.java:495) at org.apache.hadoop.ipc.Client$Connection.setupConnection (Client.java:681) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams (Client.java:777) at org.apache.hadoop.ipc.Client$Connection.access$3500 (Client.java:409) at org.apache.hadoop.ipc.Client.getConnection (Client.java:1542) at org.apache.hadoop.ipc.Client.call (Client.java:1373) at org.apache.hadoop.ipc.Client.call (Client.java:1337) at org.apache.hadoop .ipc.ProtobufRpcEngine $Invoker.invoke (ProtobufRpcEngine.java:227) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke (ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy10.getFileInfo (Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo (ClientNamenodeProtocolTranslatorPB.java:787) at sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethod) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43 ) at java.lang.reflect.Method.invoke (Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod (RetryInvocationHandler.java:398) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod (RetryInvocationHandler.java:163) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke (RetryInvocationHandler.java:155) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce (RetryInvocationHandler.java:95) at org.apache.hadoop .io.retry.RetryInvocationHandler.invoke (RetryInvocationHandler.java:335) at com.sun.proxy.$Proxy11.getFileInfo (Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo (DFSClient.java:1700) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall (DistributedFileSystem.java:1436) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall (DistributedFileSystem.java:1433) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve (FileSystemLinkResolver.java:81) at org. 
Apache.hadoop.hdfs.DistributedFileSystem.getFileStatus (DistributedFileSystem.java:1433) at org.apache.hadoop.fs.Globber.getFileStatus (Globber.java:64) at org.apache.hadoop.fs.Globber.doGlob (Globber.java:282) at org.apache.hadoop.fs.Globber.glob (Globber.java:148) at org.apache.hadoop.fs.FileSystem.globStatus (FileSystem.java:1685) at org.apache.hadoop.fs.shell.PathData.expandAsGlob (PathData.java At org.apache.hadoop.fs.shell.CommandWithDestination.getRemoteDestination (CommandWithDestination.java:195) at org.apache.hadoop.fs.shell.CopyCommands$Put.processOptions (CopyCommands.java:256) at org.apache.hadoop.fs.shell.Command.run (Command.java:164) at org.apache.hadoop.fs.FsShell.run (FsShell.java:315) at org.apache.hadoop.util.ToolRunner.run (ToolRunner.java:76) at org.apache .hadoop.util.ToolRunner.run (ToolRunner.java:90) at org.apache.hadoop.fs.FsShell.main (FsShell.java:378) put: Call From jackies-macbook-pro.local/192.168.73.56 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Solution: go to the Hadoop installation directory (mine is /usr/local/Cellar/hadoop) and run ./start-all.sh under sbin to start the Hadoop services.
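A short sketch of the whole sequence using the paths from this article (the local file path is just an example):

cd /usr/local/Cellar/hadoop/sbin
./start-all.sh
jps                                           # should now list NameNode, DataNode, etc.
hadoop fs -put ~/Documents/doc/README.md /
hadoop fs -ls /                               # verify the file landed in the HDFS root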
1.5 Starting Spark
The previous article did not configure the spark-defaults.conf file when setting up Spark, so running ./start-all.sh in the Spark installation directory (mine is /usr/local/Spark) reported an error.
Error:
spark-shell Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel (newLevel). 17-05-13 13:42:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable 17-05-13 13:42:51 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 192.168.73.56 org.apache.spark.SparkException: Exception thrown in awaitResult at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse (RpcTimeout.scala:77) at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse (RpcTimeout.scala:75) at scala.runtime.AbstractPartialFunction.apply (AbstractPartialFunction.scala:36) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse (RpcTimeout.scala:59) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse (RpcTimeout.scala:59) at scala.PartialFunction$OrElse.apply (PartialFunction.scala:167) at org.apache.spark.rpc.RpcTimeout.awaitResult (RpcTimeout.scala:83) at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI (RpcEnv.scala:88) at org.apache.spark.rpc.RpcEnv.setupEndpointRef (RpcEnv.scala:96) at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run (StandaloneAppClient.scala:106) at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511) at java.util.concurrent.FutureTask.run (FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617) at java.lang.Thread.run (Thread.java:745) Caused by: java.io.IOException: Failed to connect to /192.168.73.56:7077
Solution: make a copy of spark-defaults.conf.template under conf in the Spark installation directory, rename it to spark-defaults.conf, and configure it according to https://sanwen8.cn/p/3bac5Bj.html. Starting Spark after that still reports an error.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel (newLevel). 17-05-13 14:19:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable 17-05-13 14:19:15 ERROR SparkContext: Error initializing SparkContext. java.net.ConnectException: Call From jackies-MacBook-Pro.local/192.168.73.56 to 192.168.73.56:8021 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0 (Native Method)
So, following a suggestion on Stack Overflow, I changed spark.eventLog.enabled in spark-defaults.conf from true to false, and Spark then started successfully.
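For reference, a minimal spark-defaults.conf along these lines (the master URL is illustrative; the change that mattered here was disabling the event log):

# conf/spark-defaults.conf (sketch; values are illustrative)
spark.master             spark://jackie:7077
spark.eventLog.enabled   false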
Note: here I repeatedly switched the configuration between localhost and my own IP. What it finally came down to is: as long as you map a hostname to the IP in /etc/hosts, you can use that hostname directly instead of writing the IP, and you should keep the Hadoop configuration files consistent with the Spark ones, otherwise you will wear yourself out.
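A sketch of the /etc/hosts mapping described above (the hostname and IP are the ones used in this article; substitute your own), which both the Hadoop and Spark config files can then reference by name:

# /etc/hosts
192.168.73.56   jackie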
1.6 Errors when submitting a job to Spark
Run the following Demo program
package com.jackie.scala.s513;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

/* Created by jackie on 17-5-13. */
public class Simple {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
        // create the Spark configuration
        SparkConf conf = new SparkConf().setAppName("Simple").setMaster("local");
        // create the Spark context object, which is the entry point for the data
        JavaSparkContext spark = new JavaSparkContext(conf);
        // get the data source
        JavaRDD<String> lines = spark.textFile("hdfs://jackie:8020/");
        /*
         * The data obtained from the data source is first split into words,
         * then computed with map and reduceByKey, and the results are printed.
         */
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(SPACE.split(s)).iterator();
            }
        });
        // use RDD's map and reduce methods to compute
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });
        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<String, Integer> tuple : output) {
            // print the computed result
            System.out.println(tuple._1() + ": " + tuple._2());
        }
        spark.stop();
    }
}
This program needs to read the README.md file in the root directory of HDFS, but before that I had executed "hadoop namenode -format" (note: this operation caused a series of problems later). So I was going to upload README.md again with hadoop fs -put localDir hdfsDir, but this reported an error.
Error:
Hadoop fs-put / Users/jackie/Documents/doc/README.md / 15:47:15 on 17-05-13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable 15:47:16 on 17-05-13 WARN hdfs.DataStreamer: DataStreamer Exception org.apache.hadoop.ipc.RemoteException (java.io.IOException): File / README.md._COPYING_ could only be replicated to 0 nodes instead of minReplication (= 1). There are 0 datanode (s) running and no node (s) are excluded in this operation. At org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock (BlockManager.java:1733) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock (FSDirWriteFileOp.java:265) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock (FSNamesystem.java:2496) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock (NameNodeRpcServer.java:828)
Later I found that the datanode had not started, and then began to look into why, with the help of http://www.aboutyun.com/thread-7931-1-1.html
When we format the file system, a current/VERSION file is saved in the namenode's data folder (that is, the dfs.name.dir path from the configuration file, on the local filesystem); it records the namespaceID and identifies the version of the formatted namenode. If we format the namenode repeatedly, the current/VERSION file kept by the datanode (that is, the dfs.data.dir path from the configuration file, on the local filesystem) still holds the namenode ID saved at the first format, which leaves the datanode and namenode with inconsistent IDs.
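A quick way to see the mismatch is to compare the namespaceID in the two VERSION files; a sketch, with illustrative paths (use the dfs.name.dir / dfs.data.dir values from your own hdfs-site.xml):

grep namespaceID /usr/local/Cellar/hadoop/hdfs/name/current/VERSION   # namenode side
grep namespaceID /usr/local/Cellar/hadoop/hdfs/data/current/VERSION   # datanode side
# if the two namespaceID values differ, the datanode will refuse to start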
Solution: the approach taken was to make hadoop namenode -format run through to a success prompt.
At this point, running the jps command again shows the datanode.
Similarly, executing hadoop fs -put /Users/jackie/Documents/doc/README.md / again reports the following error
Hadoop fs-put / Users/jackie/Documents/doc/README.md / 09:51:04 on 17-05-15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable 17-05-15 09:51:05 WARN ipc.Client: Failed to connect to server: jackie/192.168.73.56:8020: try once and fail. Java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect (Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect (SocketChannelImpl.java:717) at org.apache.hadoop.net.SocketIOWithTimeout.connect (SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect (NetUtils.java:531) at org.apache.hadoop.net.NetUtils.connect (NetUtils.java:495) at org.apache.hadoop.ipc.Client$Connection.setupConnection (Client.java:681) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams (Client.java:777) at org.apache.hadoop.ipc.Client$Connection.access$3500 (Client.java:409)
At first I thought it was the IP configuration, but repeated changes got me nowhere. Later I noticed from jps that the namenode had not started, so I searched online and found http://blog.csdn.net/bychjzh/article/details/7830508
So delete the tmp directory originally configured in core-site.xml under /usr/local/Cellar/hadoop/hdfs, create a new hadoop_tmp directory, and change core-site.xml to
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/Cellar/hadoop/hdfs/hadoop_tmp</value>
  <description>A base for other temporary directories.</description>
</property>
Then execute hadoop namenode -format; after that, start all the services with start-all.sh and the upload finally succeeds.
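Putting the steps of this section together, roughly (paths follow this article's setup and are illustrative; note that reformatting wipes whatever was stored in HDFS):

/usr/local/Cellar/hadoop/sbin/stop-all.sh            # stop running services first
rm -rf /usr/local/Cellar/hadoop/hdfs/tmp             # old tmp dir configured in core-site.xml (assumed path)
mkdir -p /usr/local/Cellar/hadoop/hdfs/hadoop_tmp    # new hadoop.tmp.dir from core-site.xml above
hadoop namenode -format
/usr/local/Cellar/hadoop/sbin/start-all.sh
jps                                                  # NameNode and DataNode should both appear
hadoop fs -put /Users/jackie/Documents/doc/README.md /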
Thank you for reading this article carefully. I hope this article on the pits you will encounter in Spark is helpful to everyone. At the same time, I hope you will support us and follow the industry information channel; more related knowledge is waiting for you to learn!