How to analyze the essence of MLlib in Spark 07/19 Update SLTechnology News&Howtos

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

How should one make sense of MLlib in Spark? This article gives an overview of Spark's two machine learning packages, org.apache.spark.ml and org.apache.spark.mllib. It covers their subpackages, core concepts, supported algorithms, data types and models, and ends with a worked example.

org.apache.spark.ml, the DataFrame-based API (http://spark.apache.org/docs/latest/ml-guide.html)

org.apache.spark.ml.attribute
org.apache.spark.ml.classification
org.apache.spark.ml.clustering
org.apache.spark.ml.evaluation
org.apache.spark.ml.feature
org.apache.spark.ml.param
org.apache.spark.ml.recommendation
org.apache.spark.ml.regression
org.apache.spark.ml.source.libsvm
org.apache.spark.ml.tree
org.apache.spark.ml.tuning
org.apache.spark.ml.util

org.apache.spark.mllib, the older RDD-based API, in maintenance mode since Spark 2.0 (http://spark.apache.org/docs/latest/mllib-guide.html)

org.apache.spark.mllib.classification
org.apache.spark.mllib.clustering
org.apache.spark.mllib.evaluation
org.apache.spark.mllib.feature
org.apache.spark.mllib.fpm
org.apache.spark.mllib.linalg
org.apache.spark.mllib.linalg.distributed
org.apache.spark.mllib.pmml
org.apache.spark.mllib.random
org.apache.spark.mllib.rdd
org.apache.spark.mllib.recommendation
org.apache.spark.mllib.regression
org.apache.spark.mllib.stat
org.apache.spark.mllib.stat.distributed
org.apache.spark.mllib.stat.test
org.apache.spark.mllib.tree
org.apache.spark.mllib.tree.configuration
org.apache.spark.mllib.tree.impurity
org.apache.spark.mllib.tree.loss
org.apache.spark.mllib.tree.model
org.apache.spark.mllib.util

ML concepts

DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.

Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.

Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.

Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.

Parameter: All Transformers and Estimators share a common API for specifying parameters.
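To make the Estimator/Transformer/Pipeline contract concrete, here is a minimal plain-Scala sketch that does not use Spark at all. Everything in it is hypothetical: the Frame type (rows as maps from column name to value), the traits Stage, Transformer and Estimator, and the stages Doubler and MeanCenterer only mirror the definitions above and are not Spark's actual classes.

```scala
// Hypothetical plain-Scala model of the spark.ml abstractions (NOT Spark's API).
// A "DataFrame" is modelled as a sequence of rows mapping column names to doubles.
type Frame = Seq[Map[String, Double]]

sealed trait Stage
// Transformer: turns one frame into another, usually by appending a column.
trait Transformer extends Stage { def transform(df: Frame): Frame }
// Estimator: fit on a frame, returns a Transformer (the fitted "model").
trait Estimator extends Stage { def fit(df: Frame): Transformer }

// A stateless Transformer: appends x2 = 2 * x.
object Doubler extends Transformer {
  def transform(df: Frame): Frame = df.map(r => r + ("x2" -> r("x") * 2))
}

// An Estimator: learns the mean of column `in`; its model writes `in - mean` to `out`.
class MeanCenterer(in: String, out: String) extends Estimator {
  def fit(df: Frame): Transformer = {
    val mean = df.map(r => r(in)).sum / df.size
    new Transformer {
      def transform(d: Frame): Frame = d.map(r => r + (out -> (r(in) - mean)))
    }
  }
}

// A Pipeline is itself an Estimator: fit() walks the stages in order,
// fitting each Estimator on the output of the stages before it.
class Pipeline(stages: Seq[Stage]) extends Estimator {
  def fit(df: Frame): Transformer = {
    val (fitted, _) = stages.foldLeft((List.empty[Transformer], df)) {
      case ((acc, data), t: Transformer) => (acc :+ t, t.transform(data))
      case ((acc, data), e: Estimator) =>
        val model = e.fit(data)
        (acc :+ model, model.transform(data))
    }
    // The fitted pipeline applies every fitted stage in order.
    new Transformer {
      def transform(d: Frame): Frame = fitted.foldLeft(d)((data, t) => t.transform(data))
    }
  }
}

val train: Frame = Seq(Map("x" -> 1.0), Map("x" -> 3.0))
val fittedPipeline = new Pipeline(Seq(Doubler, new MeanCenterer("x2", "centered"))).fit(train)
val out = fittedPipeline.transform(Seq(Map("x" -> 4.0)))
println(out)
```

Fitting learns the mean of x2 on the training rows (4.0 here), and the returned Transformer then applies both stages to new rows. This is the same shape as spark.ml, where Pipeline.fit returns a PipelineModel.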

ML classification and regression

Classification: logistic regression, decision tree classifier, random forest classifier, gradient-boosted tree classifier, multilayer perceptron classifier, One-vs-Rest classifier (a.k.a. One-vs-All)
Regression: linear regression, decision tree regression, random forest regression, gradient-boosted tree regression, survival regression
Decision trees
Tree ensembles: random forests, gradient-boosted trees (GBTs)

ML clustering

K-means, Latent Dirichlet allocation (LDA)

MLlib data types

Local vector, labeled point, local matrix, distributed matrix (RowMatrix, IndexedRowMatrix, CoordinateMatrix, BlockMatrix)
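The local types need only the spark-mllib jar and no SparkContext, so they can be tried directly in spark-shell. A minimal sketch, assuming the RDD-based linalg package org.apache.spark.mllib.linalg (since Spark 2.0 the DataFrame-based API uses the parallel org.apache.spark.ml.linalg package instead). The distributed matrices are built from RDDs and need a SparkContext, so they are left out here.

```scala
// Assumes spark-mllib on the classpath (e.g. inside spark-shell).
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

// The local vector (1.0, 0.0, 3.0) in dense form.
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
// The same vector in sparse form: size 3, non-zero indices 0 and 2, and their values.
val sv: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Labeled points: a Double label (here 1.0 / 0.0 for a binary task) plus a feature vector.
val pos = LabeledPoint(1.0, dv)
val neg = LabeledPoint(0.0, sv)

// Local dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)), values given column-major.
val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

println(s"dense=$dv sparse=$sv pos=$pos rows=${dm.numRows} cols=${dm.numCols}")
```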

MLlib classification and regression

Binary classification: linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
Multiclass classification: logistic regression, decision trees, random forests, naive Bayes
Regression: linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
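As a sketch of how the RDD-based API is used for binary classification, the snippet below trains logistic regression with LogisticRegressionWithLBFGS on an RDD[LabeledPoint]. It assumes spark-core and spark-mllib on the classpath and runs in local mode; the master URL, app name and the four training points are illustrative toy choices.

```scala
// Assumes spark-core and spark-mllib on the classpath; local mode for illustration.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Reuses an existing SparkContext (e.g. sc in spark-shell) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("mllib-lr-sketch"))

// The RDD-based API trains on an RDD[LabeledPoint] rather than a DataFrame.
val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, 1.0)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))
))

// run() fits the model and returns a LogisticRegressionModel.
val model: LogisticRegressionModel =
  new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)

// predict() returns a class label (0.0 or 1.0) using the default 0.5 threshold.
val prediction = model.predict(Vectors.dense(0.0, 1.1, 0.0))
println(s"weights=${model.weights} intercept=${model.intercept} prediction=$prediction")
```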

MLlib clustering

K-means, Gaussian mixture, power iteration clustering (PIC, which clusters the vertices of a graph given pairwise similarities), Latent Dirichlet allocation (LDA, a topic model that infers topics from a collection of text documents), bisecting k-means, streaming k-means
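K-means alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points (Lloyd's iteration). The single-machine sketch below illustrates only that idea; the function kMeans and the type Point are hypothetical. MLlib's KMeans runs this kind of iteration over an RDD and by default picks initial centers with k-means|| rather than taking the first k points as done here.

```scala
// Hypothetical single-machine sketch of Lloyd's k-means on 2-D points (NOT MLlib's code).
type Point = (Double, Double)

// Squared Euclidean distance between two points.
def dist2(a: Point, b: Point): Double = {
  val dx = a._1 - b._1
  val dy = a._2 - b._2
  dx * dx + dy * dy
}

def kMeans(points: Seq[Point], k: Int, maxIter: Int): IndexedSeq[Point] = {
  // Naive initialisation for illustration only: the first k points.
  var centers: IndexedSeq[Point] = points.take(k).toIndexedSeq
  var iter = 0
  var changed = true
  while (changed && iter < maxIter) {
    // Assignment step: group points by the index of their nearest center.
    val clusters: Map[Int, Seq[Point]] =
      points.groupBy(p => centers.indices.minBy(i => dist2(p, centers(i))))
    // Update step: move each center to the mean of its points (keep it if it has none).
    val updated = centers.indices.map { i =>
      clusters.get(i) match {
        case Some(ps) => (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)
        case None     => centers(i)
      }
    }
    changed = updated != centers
    centers = updated
    iter += 1
  }
  centers
}

val data = Seq((0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0))
val centers = kMeans(data, k = 2, maxIter = 20)
println(centers)
```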

MLlib Models

DecisionTreeModel, DistributedLDAModel, GaussianMixtureModel, GradientBoostedTreesModel, IsotonicRegressionModel, KMeansModel, LassoModel, LDAModel, LinearRegressionModel, LocalLDAModel, LogisticRegressionModel, MatrixFactorizationModel, NaiveBayesModel, PowerIterationClusteringModel, RandomForestModel, RidgeRegressionModel, StreamingKMeansModel, SVMModel, Word2VecModel

Example

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.sql.Row

    // Spark 2.0+ spark-shell: `spark` is the predefined SparkSession.
    // (Spark 1.x used sqlContext and org.apache.spark.mllib.linalg vectors instead.)
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, 1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    // LogisticRegression is an Estimator; print its parameters and defaults.
    val lr = new LogisticRegression()
    println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")

    // Set parameters with setters, then fit; model1 is a Transformer.
    lr.setMaxIter(10).setRegParam(0.01)
    val model1 = lr.fit(training)
    println("Model 1 was fit using parameters: " + model1.parent.extractParamMap())

    // Alternatively use a ParamMap; the later put overwrites maxIter (20 -> 30).
    val paramMap = ParamMap(lr.maxIter -> 20)
      .put(lr.maxIter, 30)
      .put(lr.regParam -> 0.1, lr.threshold -> 0.55)
    // ParamMaps can be combined; this one renames the probability output column.
    val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability")
    val paramMapCombined = paramMap ++ paramMap2
    // Params passed to fit() override those set earlier via setters.
    val model2 = lr.fit(training, paramMapCombined)
    println("Model 2 was fit using parameters: " + model2.parent.extractParamMap())

    val test = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
      (0.0, Vectors.dense(3.0, 2.0, 0.1)),
      (1.0, Vectors.dense(0.0, 2.2, 1.5))
    )).toDF("label", "features")

    // transform() reads only "features"; collect() so println runs on the driver.
    model2.transform(test)
      .select("features", "label", "myProbability", "prediction")
      .collect()
      .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
        println(s"($features, $label) -> prob=$prob, prediction=$prediction")
      }
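The logistic regression example fits a single Estimator. The Pipeline concept defined earlier chains several stages; a minimal sketch in the same spark.ml API, assuming Spark 2.0+ and local mode, tokenizes text, hashes the words into feature vectors and fits logistic regression as one Pipeline. The column names, the numFeatures value and the toy sentences are illustrative choices.

```scala
// Assumes Spark 2.0+ with spark-sql and spark-mllib on the classpath; local mode.
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Reuses the active session (e.g. in spark-shell) or starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("pipeline-sketch").getOrCreate()

// Toy labeled documents: (id, text, label).
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Two Transformers followed by one Estimator.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Pipeline is an Estimator; fit() returns a PipelineModel, which is a Transformer.
val model: PipelineModel = pipeline.fit(training)

val test = spark.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n"),
  (6L, "spark hadoop spark"),
  (7L, "apache hadoop")
)).toDF("id", "text")

// The fitted stages run in order: tokenize, hash, then predict.
val predictions = model.transform(test).select("id", "text", "prediction").collect()
predictions.foreach(println)
```

The resulting PipelineModel holds the Tokenizer, the HashingTF and the fitted LogisticRegressionModel as its stages, so the same feature steps are applied to test data as to training data.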
