2025-01-16 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report--
How should you analyze the essence of MLlib in Spark? Many newcomers are unsure where to start, so this article walks through the structure of Spark's two machine-learning packages, their core concepts, and a worked example. I hope it helps you understand the topic.
org.apache.spark.ml (http://spark.apache.org/docs/latest/ml-guide.html)
org.apache.spark.ml.attribute, org.apache.spark.ml.classification, org.apache.spark.ml.clustering, org.apache.spark.ml.evaluation, org.apache.spark.ml.feature, org.apache.spark.ml.param, org.apache.spark.ml.recommendation, org.apache.spark.ml.regression, org.apache.spark.ml.source.libsvm, org.apache.spark.ml.tree, org.apache.spark.ml.tuning, org.apache.spark.ml.util
org.apache.spark.mllib (http://spark.apache.org/docs/latest/mllib-guide.html)
org.apache.spark.mllib.classification, org.apache.spark.mllib.clustering, org.apache.spark.mllib.evaluation, org.apache.spark.mllib.feature, org.apache.spark.mllib.fpm, org.apache.spark.mllib.linalg, org.apache.spark.mllib.linalg.distributed, org.apache.spark.mllib.pmml, org.apache.spark.mllib.random, org.apache.spark.mllib.rdd, org.apache.spark.mllib.recommendation, org.apache.spark.mllib.regression, org.apache.spark.mllib.stat, org.apache.spark.mllib.stat.distribution, org.apache.spark.mllib.stat.test, org.apache.spark.mllib.tree, org.apache.spark.mllib.tree.configuration, org.apache.spark.mllib.tree.impurity, org.apache.spark.mllib.tree.loss, org.apache.spark.mllib.tree.model, org.apache.spark.mllib.util
ML concepts
DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.
Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.
Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.
Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.
Parameter: All Transformers and Estimators now share a common API for specifying parameters.
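To see how these concepts compose, here is a minimal sketch of building a Pipeline in Scala (assuming spark-mllib is on the classpath; the column names "text", "words", and "features" are illustrative, not from the article):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Tokenizer and HashingTF are Transformers; LogisticRegression is an
// Estimator whose fit() produces a model, which is itself a Transformer.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

// A Pipeline chains the stages into a single Estimator; calling
// pipeline.fit(trainingDF) would run the Transformers in order and
// then fit the LogisticRegression stage.
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
```

Calling fit on the pipeline with a training DataFrame would return a PipelineModel whose transform method applies all stages to new data.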
ML classification and regression
Classification: Logistic regression, Decision tree classifier, Random forest classifier, Gradient-boosted tree classifier, Multilayer perceptron classifier, One-vs-Rest classifier (a.k.a. One-vs-All)
Regression: Linear regression, Decision tree regression, Random forest regression, Gradient-boosted tree regression, Survival regression
Decision trees
Tree ensembles: Random Forests, Gradient-Boosted Trees (GBTs)
ML clustering
K-means, Latent Dirichlet allocation (LDA)
MLlib data types
Local vector, Labeled point, Local matrix, Distributed matrix (RowMatrix, IndexedRowMatrix, CoordinateMatrix, BlockMatrix)
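The local types above can be constructed directly without a running Spark cluster; a minimal sketch (assuming spark-mllib on the classpath; the values are illustrative):

```scala
import org.apache.spark.mllib.linalg.{Matrices, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

// A dense local vector, and a sparse one holding the same logical values:
// size 3, non-zeros at indices 0 and 2.
val dense = Vectors.dense(1.0, 0.0, 3.0)
val sparse = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// A labeled point pairs a Double label with a feature vector.
val pos = LabeledPoint(1.0, dense)

// A 3x2 dense local matrix; values are given in column-major order.
val m = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
```

The distributed matrix types (RowMatrix, IndexedRowMatrix, CoordinateMatrix, BlockMatrix), by contrast, wrap RDDs and therefore require a SparkContext.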
MLlib classification and regression
Binary classification: linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
Multiclass classification: logistic regression, decision trees, random forests, naive Bayes
Regression: linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
MLlib clustering
K-means, Gaussian mixture, Power iteration clustering (PIC, mostly used for image recognition), Latent Dirichlet allocation (LDA, mostly used for topic classification), Bisecting k-means, Streaming k-means
MLlib Models
DecisionTreeModel, DistributedLDAModel, GaussianMixtureModel, GradientBoostedTreesModel, IsotonicRegressionModel, KMeansModel, LassoModel, LDAModel, LinearRegressionModel, LocalLDAModel, LogisticRegressionModel, MatrixFactorizationModel, NaiveBayesModel, PowerIterationClusteringModel, RandomForestModel, RidgeRegressionModel, StreamingKMeansModel, SVMModel, Word2VecModel
Example
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row

// Prepare training data as (label, features) pairs.
val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

// Create the Estimator and inspect its parameters.
val lr = new LogisticRegression()
println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")

// Set parameters via setters, then fit model 1.
lr.setMaxIter(10).setRegParam(0.01)
val model1 = lr.fit(training)
println("Model 1 was fit using parameters: " + model1.parent.extractParamMap)

// Alternatively, specify parameters via a ParamMap;
// later puts overwrite earlier ones (maxIter ends up 30).
val paramMap = ParamMap(lr.maxIter -> 20)
  .put(lr.maxIter, 30)
  .put(lr.regParam -> 0.1, lr.threshold -> 0.55)

// ParamMaps can be combined; rename the output probability column.
val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability")
val paramMapCombined = paramMap ++ paramMap2
val model2 = lr.fit(training, paramMapCombined)
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

// Prepare test data and make predictions with model 2 (a Transformer).
val test = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
  (0.0, Vectors.dense(3.0, 2.0, -0.1)),
  (1.0, Vectors.dense(0.0, 2.2, -1.5))
)).toDF("label", "features")

model2.transform(test)
  .select("features", "label", "myProbability", "prediction")
  .collect()
  .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
  }

Have you mastered how to analyze the essence of MLlib in Spark? Thank you for reading!