What are the key points of PySpark linear regression, gradient descent, and cross-validation?

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

In this issue, the editor walks through the key points of cross-validating a PySpark linear regression model trained with gradient descent, analyzed from a practical point of view. I hope you get something useful out of it.

I'm trying to perform cross-validation on an SGD model in PySpark. I'm using LinearRegressionWithSGD from pyspark.mllib.regression, while ParamGridBuilder and CrossValidator both come from the pyspark.ml.tuning module.

Following the examples in the Spark documentation, I expected this to work:

    lr = LinearRegressionWithSGD()
    pipeline = Pipeline(stages=[lr])
    paramGrid = ParamGridBuilder() \
        .addGrid(lr.stepSize, [0.1, 0.01]) \
        .build()
    crossval = CrossValidator(estimator=pipeline,
                              estimatorParamMaps=paramGrid,
                              evaluator=RegressionEvaluator(),
                              numFolds=10)

But LinearRegressionWithSGD() has no stepSize attribute (and I had no luck with any other parameter name I tried).

I can switch lr to LinearRegression, but then I can't use SGD to fit the model while cross-validating.

There is a kFold method in Scala, but I'm not sure how to access it from PySpark.

Solution

You can define the step size with the step parameter of LinearRegressionWithSGD, but your code fails because you are mixing incompatible libraries: pyspark.ml and pyspark.mllib. Unfortunately, I don't know a way to cross-validate an SGD-optimized model with the ml tuning utilities (I'd like to know one myself). Specifically, you cannot use LinearRegressionWithSGD with the pyspark.ml library; you must use pyspark.ml.regression.LinearRegression.

The good news is that pyspark.ml.regression.LinearRegression has a solver property, which you may be able to set to 'gd'. You might then be able to configure the 'gd' optimizer to run as SGD, but I'm not sure where the solver is documented or how to set its properties (such as the batch size). The API shows Param() accessors on the LinearRegression object, but I'm not sure whether they reach the pyspark.mllib optimizer. If someone knows how to set the solver's properties, they could answer your question by letting you use the ml Pipeline, ParamGridBuilder, and CrossValidator packages for model selection and parameter tuning of LinearRegression with SGD optimization.

Those are the key points of PySpark linear regression, gradient descent, and cross-validation shared here. If you have similar doubts, the analysis above may help you understand them. If you want to learn more, you are welcome to follow the industry information channel.
