Example Analysis of Logistic Regression and Unsupervised Learning in Python

This article explains, with examples, logistic regression and unsupervised learning in Python. The editor thinks it is very practical, so it is shared here for your reference. I hope you get something out of it after reading.

First, logistic regression

1. Saving and loading of the model

After the model is trained, it can be saved directly; this requires the joblib library. The model is saved in pkl format (binary) with the dump method, and can be loaded back with the load method.

Install joblib: conda install joblib

Save: joblib.dump(rf, 'test.pkl')

Load: estimator = joblib.load('model path')

After loading, the test set can be passed directly to the loaded model for prediction.
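As a minimal sketch (assuming any already-fitted scikit-learn estimator stored in a variable rf, and the placeholder file name test.pkl), the save/load cycle looks like this:

import joblib
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
rf = Ridge()                          # any fitted estimator works the same way
rf.fit(X, y)

joblib.dump(rf, 'test.pkl')           # save: binary pkl file
estimator = joblib.load('test.pkl')   # load it back later
print(estimator.predict(X[:3]))       # the loaded model predicts directly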

2. Principle of logistic regression

Logistic regression is a classification algorithm, but it classifies by feeding the regression output h(x) into the sigmoid function, which maps any value of h(x) to a number between 0 and 1. We interpret this output as a probability and compare it against a threshold. Suppose the threshold is 0.5: if the output is greater than 0.5 we predict class 1, otherwise class 0. Logistic regression is suited to binary classification problems.

① Input of logistic regression

As you can see, the input is still a linear regression model, with weights w and feature values x, and our goal is still to find the most appropriate w.

② sigmoid function

The function is g(z) = 1 / (1 + e^(-z)); its graph is an S-shaped curve whose output always lies between 0 and 1.

z is the result of the regression h(x). Through the sigmoid transformation, no matter what value z takes, the output lies between 0 and 1. We then need to choose the most appropriate weights w so that the output probabilities are as close as possible to the target values of the training set. Therefore, logistic regression also has a loss function, called the log-likelihood loss; minimizing it yields the target w.
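A small sketch of the sigmoid transformation (the values printed in the comment are approximate):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])   # regression outputs h(x)
print(sigmoid(z))   # roughly [0.00005 0.269 0.5 0.731 0.99995], always between 0 and 1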

③ Loss function of logistic regression

For a true label y = 1 the loss is -log(h(x)), and for y = 0 it is -log(1 - h(x)).

If the true class is 1, then the closer the output h(x) is to 1, the smaller the loss, and vice versa; the same holds when y is 0. Therefore, when the loss function reaches its minimum, we have found the target w.
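A quick numeric check of this behaviour (the probabilities below are made up for illustration):

import numpy as np

h = np.array([0.1, 0.5, 0.9, 0.99])   # predicted probabilities h(x)
print(-np.log(h))                      # loss when y = 1: decreases as h(x) -> 1
print(-np.log(1 - h))                  # loss when y = 0: increases as h(x) -> 1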

④ Characteristics of logistic regression

Logistic regression is also solved by gradient descent. For the mean squared error there is only one minimum and no local minima, but for the log-likelihood loss there may be multiple local minima. There is currently no complete solution to the local-minimum problem, so we can only mitigate it by random re-initialization several times and by adjusting the learning rate. However, even if the final result is a local optimum, it is usually still a good model.

3. Logistic regression API

sklearn.linear_model.LogisticRegression

Among its parameters, penalty selects the regularization type and C controls the regularization strength (in scikit-learn, C is the inverse of the regularization strength, so a smaller C means stronger regularization).
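A minimal sketch of the constructor (the values shown are scikit-learn's defaults):

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(penalty='l2', C=1.0)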

4. Logistic regression case

① Overview of the case

In the given data, multiple features are used together to judge whether a tumor is malignant.

② Specific process

Because the flow of the algorithm is basically the same as before and the focus is on processing the data and features, it will not be described in detail here; the code flow is sketched below:
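The sketch below is not the original listing; it assumes the commonly used UCI breast-cancer-wisconsin.data file, whose Class column holds 2 (benign) and 4 (malignant). The URL and column names are assumptions made for illustration:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# assumed data source: the classic UCI breast cancer Wisconsin data set
url = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
       'breast-cancer-wisconsin/breast-cancer-wisconsin.data')
columns = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
           'Uniformity of Cell Shape', 'Marginal Adhesion',
           'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
           'Normal Nucleoli', 'Mitoses', 'Class']
data = pd.read_csv(url, names=columns)

# missing values are marked with '?'; drop those rows
data = data.replace(to_replace='?', value=np.nan).dropna()

x = data[columns[1:10]]   # feature columns
y = data['Class']         # target: 2 = benign, 4 = malignant (no conversion needed)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

# standardize the features
std = StandardScaler()
x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)

lg = LogisticRegression(C=1.0)
lg.fit(x_train, y_train)
y_predict = lg.predict(x_test)

print('accuracy:', lg.score(x_test, y_test))
# labels must be passed so the report knows which class is benign and which is malignant
print(classification_report(y_test, y_predict,
                            labels=[2, 4], target_names=['benign', 'malignant']))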

Note:

The target values here are not 0 and 1 but 2 and 4; they do not need to be converted, as the algorithm automatically maps them to 0 and 1.

After prediction, if you want to see the recall, pay attention to the class names: the labels must be passed in before the target names (as in the classification_report call in the sketch above); otherwise the method does not know which class is benign and which is malignant, and the order of the labels must match the order of the names.

In general, the recall of the class with fewer samples is the one to watch. For example, if malignant samples are few, the evaluation focuses on how reliably malignant cases are identified.

5. Logistic regression summary

Applications: predicting ad click-through rates, predicting illness, and other binary classification problems

Advantages: suitable for scenarios where you need to obtain a classification probability

Disadvantages: when the feature space is very large, logistic regression does not perform very well (depending on hardware capabilities)

Second, unsupervised learning

Unsupervised learning means the correct answers are not given; in other words, the data contain only feature values and no target values.

1. Principle of the k-means clustering algorithm

Suppose we cluster into three groups; the process is as follows:

① Randomly take three samples from the data as the three cluster centers.

② For each remaining point, compute its distance to the three centers and assign it to the nearest one, forming three clusters.

③ Compute the mean of each of the three clusters and compare the three means with the previous three centers. If they are the same, the clustering ends; if not, take the three means as the new cluster centers and repeat step ② (a minimal sketch of these steps follows the list).
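A minimal from-scratch sketch of these three steps (not the scikit-learn implementation; the two-dimensional toy data and k = 3 are made up, and the empty-cluster edge case is ignored):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) + rng.integers(0, 3, size=(300, 1)) * 4   # toy data, 3 groups
k = 3

# step 1: pick k samples at random as the initial centers
centers = X[rng.choice(len(X), size=k, replace=False)]

while True:
    # step 2: assign every point to its nearest center
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # step 3: recompute each cluster's mean; stop when the centers no longer move
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print(centers)   # final cluster centers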

2. k-means API

sklearn.cluster.KMeans

Usually, clustering is done before classification: first the samples are clustered and labeled, and then, when new samples arrive, they can be classified according to the criteria produced by the clustering.
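A small sketch of this pattern (the blob data is made up for illustration): cluster and label the existing samples first, then assign new samples with the learned centers.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)       # cluster and label the existing samples

X_new, _ = make_blobs(n_samples=5, centers=3, random_state=0)   # new points from the same blobs
print(km.predict(X_new))         # new samples are assigned to the learned clusters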

3. Clustering performance evaluation

① Principle of performance evaluation

To put it simply, for each point we look at its distance to the other points within its own cluster and its distance to points in other clusters: the closer it is to points in its own cluster, and the farther it is from points outside it, the better.

The silhouette coefficient of a sample i is sc_i = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from i to the other points in its own cluster and b_i is the mean distance from i to the points in the nearest other cluster.

If sc_i is less than 0, the intra-cluster distance a_i is greater than the distance b_i to the nearest other cluster, and the clustering effect is poor.

The larger sc_i is, the smaller a_i is relative to b_i, and the better the clustering effect.

The silhouette coefficient lies in [-1, 1]; the closer it is to 1, the better the cohesion and separation.

② Performance evaluation API

sklearn.metrics.silhouette_score

The clustering algorithm easily converges to a local optimum; this can be mitigated by running the clustering multiple times with different initializations.
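A hedged sketch of evaluating clusterings with the silhouette coefficient (the data and the choice of k values are made up; n_init re-runs k-means with several initializations to reduce the risk of a poor local optimum):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    # closer to 1 means better cohesion and separation
    print(k, silhouette_score(X, labels))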

This is the end of the article "Example Analysis of Logistic Regression and Unsupervised Learning in Python". I hope the above content has been helpful and that you have learned something from it. If you think the article is good, please share it for more people to see.
