Today we will look at how to perform data scaling in sklearn. Many people may not be familiar with it, so this article summarizes the essentials; I hope you find it useful.
I. Data scaling
Your preprocessed dataset may contain feature columns measured on very different scales, such as US dollars, kilograms, and prices; some feature columns range from the hundreds into the millions.
Many machine learning models do not handle such large differences well: broadly speaking, the smaller the difference in scale between the attributes of a dataset, the better these models perform. This is not a universal rule, though, and the specific reasons are worth exploring on your own.
Method 1: Data normalization
Data normalization usually refers to rescaling the original data to the range 0 to 1.
Scaling input attributes this way works well for models that depend on the magnitude of values, for example the distance measure in the K-nearest neighbors model and the preparation of coefficients in regression.
Next, we demonstrate data normalization on the well-known iris dataset:
# Normalize the iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing

# load the data
iris = load_iris()
print(iris.data.shape)  # (150, 4)

# split the dataset into independent variables and the dependent variable
X = iris.data
y = iris.target

# normalization: preprocessing.normalize rescales each sample (row) to unit norm
normalized_X = preprocessing.normalize(X)
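Note that preprocessing.normalize rescales each sample (row) to unit length rather than mapping each feature into the 0-1 range described above. For per-feature 0-1 scaling, sklearn provides MinMaxScaler; below is a minimal sketch (the variable names are illustrative, not from the original code):

# Rescale each feature of the iris data to [0, 1] with MinMaxScaler.
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
X = iris.data

scaler = MinMaxScaler(feature_range=(0, 1))  # (0, 1) is also the default range
rescaled_X = scaler.fit_transform(X)

# every column now lies in [0, 1]
print(rescaled_X.min(axis=0))  # [0. 0. 0. 0.]
print(rescaled_X.max(axis=0))  # [1. 1. 1. 1.]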
Method 2: Data standardization
Data standardization refers to rescaling the data so that the distribution of each attribute has a mean of 0 and a standard deviation of 1.
Standardizing features is very useful for models that depend on the distribution of the features, such as Gaussian processes.
Again using the iris dataset:
# Standardize the iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing

# load the data
iris = load_iris()
print(iris.data.shape)  # (150, 4)

# split the dataset into independent variables and the dependent variable
X = iris.data
y = iris.target

# standardization: center each feature to mean 0 and scale to unit variance
standardized_X = preprocessing.scale(X)
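preprocessing.scale is a one-off convenience function. In practice, the estimator form StandardScaler is often preferable, because the mean and standard deviation learned from the training data can be reapplied to new data. Below is a minimal sketch under that assumption; the train/test split and variable names are illustrative, not from the original article:

# Standardize with StandardScaler so the fitted statistics can be reused.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # fit on the training data only
X_test_std = scaler.transform(X_test)        # reuse the training statistics

# each training column now has mean ~0 and standard deviation ~1
print(X_train_std.mean(axis=0).round(2))
print(X_train_std.std(axis=0).round(2))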
After reading the above, do you have a better understanding of how to perform data scaling in sklearn? I hope this article helped; thank you for your support.