What pitfalls are encountered during Spark development?

This article introduces the pitfalls encountered during Spark development, using situations from a real project. Many people run into the same problems, so let me walk you through how to deal with them. I hope you read it carefully and get something out of it!

Let's start with the background:

Three servers: Hadoop, HBase, and Spark are all deployed as clusters, and all of them run across these three machines.

The plan is to run the Spark application remotely, with the driver running on the development machine.
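
To make "remote driver" concrete, here is a minimal sketch of that setup, assuming a standalone Spark master on the cluster; the master host name and driver IP are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder hosts/IPs; substitute your own cluster master and dev machine.
val conf = new SparkConf()
  .setAppName("remote-driver-demo")
  .setMaster("spark://master-host:7077")      // standalone master on the cluster
  .set("spark.driver.host", "192.168.1.100")  // the dev machine; must be reachable from the cluster (see Pit 1)

val sc = new SparkContext(conf)
```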

Pit 1: Development was done in an Ubuntu virtual machine, and the automatically assigned IP address was not on the same network segment as the server cluster, so the cluster could not communicate with the driver. Switching the virtual machine's network to bridged mode solved it. Done!

Pit 2: Most of the material on setJars, including the official Apache documentation, presents its examples through spark-submit or the shell and barely mentions setJars at all, which led to all kinds of inexplicable problems. Only later did I learn that setJars is what ships the driver's jar out to the Spark cluster. Done!
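
A sketch of the fix, added to the SparkConf from the background sketch above (it has to be set before the SparkContext is created); the jar path is a placeholder for whatever your build produces:

```scala
// Ship the driver application's own jar to the cluster so the executors can
// load the classes and closures defined in it. The path is a placeholder.
conf.setJars(Seq("target/scala-2.11/my-spark-app.jar"))
```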

Pit 3: Still about setJars. Because the job needs to access HBase, I pulled in the HBase jar packages. At first I assumed that putting them under lib and bundling them into the driver's jar would be enough; it turned out not to work, and the jars have to be shipped to the cluster separately. Done! (Is there a way to set the external dependency jars through Spark's environment variables instead? I tried SPARK_CLASSPATH and it didn't work; I don't know of another way.)
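
In practice that means the external jars go into the same setJars list; a sketch with illustrative HBase paths and versions (adjust them to your installation):

```scala
// Ship external dependencies explicitly instead of bundling them inside the
// driver jar's lib/ directory. Paths and versions are illustrative only.
conf.setJars(Seq(
  "target/scala-2.11/my-spark-app.jar",
  "/opt/hbase/lib/hbase-client-1.2.0.jar",
  "/opt/hbase/lib/hbase-common-1.2.0.jar",
  "/opt/hbase/lib/hbase-protocol-1.2.0.jar",
  "/opt/hbase/lib/metrics-core-2.2.0.jar"  // the jar from Pit 4 below
))
```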

Pit 4: Er... still about setJars. I used saveAsHadoopDataset to write an RDD directly into HBase (the code is in my earlier post). After the job started and printed a few lines of log output, the logs simply stopped moving. I searched everywhere and posted for help, and nobody answered. Finally I raised the Spark log level to DEBUG and found a SocketTimeoutException on a connection to port 10620. Suspecting the port was occupied, I changed the region server port to something else, but the error stayed the same. Out of options, I went to dinner; by the time I got back, Spark had finally given up after n retries and printed the real error: a missing jar package. Adding metrics-core-2.2.0.jar fixed it. Done!
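
The actual code is in the author's earlier post; the following is only a minimal sketch of the saveAsHadoopDataset-to-HBase pattern being described, with made-up table, column-family, and ZooKeeper names. The write runs on the executors, which is presumably why the missing jar showed up as cluster-side connection retries rather than an obvious error in the driver console:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf

// Table, column family, and ZooKeeper quorum are made up for illustration.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "node1,node2,node3")

val jobConf = new JobConf(hbaseConf)
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "demo_table")

// sc is the SparkContext from the background sketch above.
val rdd = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))

rdd.map { case (rowKey, value) =>
  val put = new Put(Bytes.toBytes(rowKey))
  // HBase 1.x: addColumn; older 0.9x releases use put.add instead.
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
}.saveAsHadoopDataset(jobConf)
```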

That's all for "What pitfalls are encountered during Spark development?". Thank you for reading. If you want to learn more about the industry, you can follow the site, where the editor will keep publishing practical, high-quality articles for you!
