How to run Spark machine learning code in Eclipse
This article mainly introduces how to run Spark machine learning code in Eclipse. The walkthrough is quite detailed and has real reference value; interested readers should follow it through to the end!
The code runs directly in Eclipse: no Hadoop installation and no Spark cluster setup are needed, only a complete set of dependencies in pom.xml.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

object MLlib {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Book example: Scala").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Load 2 types of emails from text files: spam and ham (non-spam).
    // Each line has text from one email.
    val spam = sc.textFile("file:/Users/xxx/Documents/hadoopTools/scala/eclipse/Eclipse.app/Contents/MacOS/workspace/spark_ml/src/main/resources/files/spam.txt")
    val ham = sc.textFile("file:/Users/xxx/Documents/hadoopTools/scala/eclipse/Eclipse.app/Contents/MacOS/workspace/spark_ml/src/main/resources/files/ham.txt")

    // Create a HashingTF instance to map email text to vectors of 100 features.
    val tf = new HashingTF(numFeatures = 100)
    // Each email is split into words, and each word is mapped to one feature.
    val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
    val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

    // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
    val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
    val negativeExamples = hamFeatures.map(features => LabeledPoint(0, features))
    val trainingData = positiveExamples ++ negativeExamples
    trainingData.cache() // Cache data since Logistic Regression is an iterative algorithm.

    // Create a Logistic Regression learner which uses the SGD optimizer.
    val lrLearner = new LogisticRegressionWithSGD()
    // Run the actual learning algorithm on the training data.
    val model = lrLearner.run(trainingData)

    // Test on a positive example (spam) and a negative one (ham).
    // First apply the same HashingTF feature transformation used on the training data.
    val posTestExample = tf.transform("O M G GET cheap stuff by sending money to...".split(" "))
    val negTestExample = tf.transform("Hi Dad, I started studying Spark the other...".split(" "))

    // Now use the learned model to predict spam/ham for new emails.
    println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
    println(s"Prediction for negative test example: ${model.predict(negTestExample)}")

    sc.stop()
  }
}
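(For reference: with these training files the two println lines typically report 1.0 for the spam test message and 0.0 for the ham test message, matching the labels assigned to the positive and negative training sets above; the exact result depends on the SGD run.)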
The argument passed to sc.textFile is the absolute path of the file on the local filesystem.
SetMaster ("local [2]") indicates that it is running locally, using only two cores
HashingTF is used to build a term-frequency feature vector from a document; here the vector dimension is set to 100.
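For example, this minimal sketch (standard Spark 1.3 MLlib API; the object name HashingTFDemo and the sample words are made up for illustration) hashes a single document into a 100-dimensional sparse vector:

import org.apache.spark.mllib.feature.HashingTF

object HashingTFDemo {
  def main(args: Array[String]) {
    val tf = new HashingTF(numFeatures = 100)
    // "get" appears twice, so its hash bucket accumulates 2.0
    // (barring collisions among the 100 buckets).
    val vector = tf.transform("get cheap stuff get".split(" ").toSeq)
    println(vector) // prints a SparseVector of size 100
  }
}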
TF-IDF (term frequency-inverse document frequency) is a feature-vectorization method widely used in text mining; it reflects how important a word is to a document within a corpus. Let t denote a term, d a document, and D the corpus: the document frequency DF(t, D) is the number of documents that contain t. If we measured importance by term frequency alone, we would overweight words that occur often but carry little information, such as "a", "the", and "of"; a word that appears frequently across the whole corpus carries little information specific to any one document. Inverse document frequency (IDF) is a numerical measure of how much information a word carries: Spark MLlib computes IDF(t, D) = log((|D| + 1) / (DF(t, D) + 1)), and the TF-IDF weight is the product TFIDF(t, d, D) = TF(t, d) · IDF(t, D).
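The article's training code uses raw term frequencies only, but MLlib in Spark 1.3 also provides an IDF estimator, so the full TF-IDF pipeline looks roughly like the following sketch (the object name TfIdfExample and the two toy documents are illustrative assumptions):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{HashingTF, IDF}

object TfIdfExample {
  def main(args: Array[String]) {
    val sc = new SparkContext(
      new SparkConf().setAppName("TfIdfExample").setMaster("local[2]"))

    // Two toy documents, each tokenized into a Seq of words.
    val docs = sc.parallelize(Seq(
      "the cat sat on the mat",
      "the dog ate my homework"
    )).map(_.split(" ").toSeq)

    val tf = new HashingTF(numFeatures = 100)
    val tfVectors = tf.transform(docs)          // raw term-frequency vectors
    tfVectors.cache()                           // reused: once to fit IDF, once to transform
    val idfModel = new IDF().fit(tfVectors)     // document frequencies -> IDF weights
    val tfidfVectors = idfModel.transform(tfVectors) // TF rescaled by IDF

    tfidfVectors.collect().foreach(println)
    sc.stop()
  }
}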
pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.yanan.spark_maven</groupId>
  <artifactId>spark1.3.1</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>spark_maven</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- the property name for "1.9.13" was lost in extraction; it is most likely a Jackson version -->
    <jackson.version>1.9.13</jackson.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.10</artifactId>
      <version>1.3.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
    <repository>
      <id>cloudera-repo-releases</id>
      <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
  </repositories>
</project>
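With this pom in place the project should build with the standard Maven lifecycle (for example mvn compile or mvn package, which trigger the maven-scala-plugin's compile and testCompile goals); inside Eclipse, running the MLlib object as a Scala application should then work once the Maven dependencies have been resolved.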
ham.txt (one email per line)

Dear Spark Learner, Thanks so much for attending the Spark Summit 2014! Check out videos of talks from the summit at...
Hi Mom, Apologies for being late about emailing and forgetting to send you the package. I hope you and bro have been...
Wow, hey Fred, just heard about the Spark petabyte sort. I think we need to take time to try it out immediately...
Hi Spark user list, This is my first question to this list, so thanks in advance for your help! I tried running...
Thanks Tom for your email. I need to refer you to Alice for this one. I haven't yet figured out that part either...
Good job yesterday! I was attending your talk, and really enjoyed it. I want to try out GraphX...
Summit demo got whoops from audience! Had to let you know. -- Joe
spam.txt (one email per line)

Dear sir, I am a Prince in a far kingdom you have not heard of. I want to send you money via wire transfer so please...
Get Vi_agra real cheap! Send money right away to...
Oh my gosh you can be really strong too with these drugs found in the rainforest. Get them cheap right now...
YOUR COMPUTER HAS BEEN INFECTED! YOU MUST RESET YOUR PASSWORD. Reply to this email with your password and SSN...
THIS IS NOT A SCAM! Send money and get access to awesome stuff really cheap and never have to...

("Vi_agra" should be read with the underscore removed; it stands for "Viagra".)
That is all of the content of the article "How to run Spark machine learning code in Eclipse". Thank you for reading! I hope the shared content is helpful to you; for more related knowledge, welcome to follow the industry information channel!