

What does Python linear classification mean?


This article explains what linear classification in Python means. The editor finds it very practical and shares it here for your reference; follow along and have a look.

By constraining the class covariance matrices to be equal, the Bayesian classifier simplifies to a linear classifier.

We compare the performance of generative and discriminative models on a challenging classification task.

In this lab lesson, we will compare the "generative modelling" and "discriminative modelling" approaches to linear classification. For the generative approach, we will revisit the Bayesian classification code used in the previous exercise, but we will restrict the system to an equal covariance matrix, that is, a single covariance matrix shared by all classes instead of each class having its own. In this case the system becomes a linear classifier. We will compare it with the discriminative approach, in which the perceptron learning algorithm is used to learn the parameters of a linear classifier directly (a minimal sketch follows below).
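The perceptron code itself is not included in this excerpt. As a minimal sketch of what perceptron learning looks like (my own illustration, not the original lab code; it assumes a feature matrix X and labels y in {+1, -1}, so the abalone labels 1/2 would first need to be mapped, e.g. y = np.where(labels == 1, 1, -1)):

import numpy as np

def perceptron_train(X, y, n_epochs=100, lr=1.0):
    # X: (n_samples, n_features); y: labels in {+1, -1}.
    # Append a constant 1 to every sample so the bias is learned as a weight.
    Xb = np.hstack((X, np.ones((X.shape[0], 1))))
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:  # misclassified (or on the boundary)
                w += lr * yi * xi        # nudge the boundary towards the sample
                errors += 1
        if errors == 0:                  # training data perfectly separated
            break
    return w

def perceptron_predict(X, w):
    Xb = np.hstack((X, np.ones((X.shape[0], 1))))
    return np.sign(Xb @ w)               # +1 or -1 (0 only exactly on the boundary)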

In this notebook, we will use another dataset from the UCI machine learning repository: the abalone data. An abalone is a kind of sea snail. The age of a specimen can be determined by cutting the shell through the cone and counting the rings under a microscope (much like tree rings), but this is a time-consuming and expensive process. The task here is to try to predict the number of rings from simple weight and size measurements of the animal. For the dataset we are using, the true number of rings is known (that is, the rings were counted after the animal was measured). The counts range from 1 to 29 rings, so this is usually treated as a 29-class classification problem. To simplify things, I have regrouped the data into two classes of roughly equal size: young (fewer than 10 rings) and old (10 or more rings). I have also kept only the female samples. There are seven measurements (all highly correlated) for predicting the class label.
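To illustrate the regrouping described above (a hypothetical snippet; the preprocessed data/abalone.txt used below already carries the two-class label in its first column):

import numpy as np

rings = np.array([4, 9, 10, 15, 7])   # hypothetical ring counts
labels = np.where(rings < 10, 1, 2)   # 1 = young (< 10 rings), 2 = old (>= 10)
print(labels)                         # -> [1 1 2 2 1]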

Generative modelling: Bayesian classification with equal-covariance multivariate normal distributions.

There are many more samples this time than in the previous exercise (1306 vs. 178), so we do not need to worry about making the most of every sample; instead we can simply cut the data into test and training sets of equal size, as before.

Modify the code you wrote last time so that a multivariate normal distribution with a full covariance matrix is used to evaluate the performance of the Bayesian classifier. When considering the changes, note that the main difference is that there are only two classes in this notebook, not three. (If you prefer, you can try wrapping the code in a function designed to work for any number of classes; one possible sketch follows below.)
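For instance, such a wrapper might look like this (a hypothetical helper, not the original lab code; each training array holds features only, and classes are numbered 1..K):

import numpy as np
from scipy.stats import multivariate_normal

def bayes_classify(train_sets, test_features):
    # train_sets: list of K arrays of training features, one per class.
    # test_features: (n_samples, n_features) array to classify.
    # Fit one full-covariance Gaussian per class...
    dists = [multivariate_normal(mean=np.mean(t, axis=0),
                                 cov=np.cov(t, rowvar=False))
             for t in train_sets]
    # ...and pick the class whose density is highest for each test sample.
    p = np.vstack([d.pdf(test_features) for d in dists])
    return np.argmax(p, axis=0) + 1   # class labels 1..K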

How well does your classifier perform? Scores on this task are typically between 60% and 70%, so do not worry if the performance seems much worse than on the previous task. If it falls below 60%, though, you should check the code for possible bugs.

import numpy as np
from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt
%matplotlib inline

X = np.loadtxt(open("data/abalone.txt", "r"))
X.shape

# Split by class label (column 0), then take alternate rows for test/train.
abalone1 = X[X[:, 0] == 1, :]
abalone2 = X[X[:, 0] == 2, :]
abalone1_test = abalone1[0::2, :]
abalone1_train = abalone1[1::2, :]
abalone2_test = abalone2[0::2, :]
abalone2_train = abalone2[1::2, :]
abalone_test = np.vstack((abalone1_test, abalone2_test))
abalone_test.shape

# Fit one full-covariance Gaussian per class (features are columns 1 onwards).
mean1 = np.mean(abalone1_train[:, 1:], axis=0)
mean2 = np.mean(abalone2_train[:, 1:], axis=0)
cov1 = np.cov(abalone1_train[:, 1:], rowvar=False)
cov2 = np.cov(abalone2_train[:, 1:], rowvar=False)
dist1 = multivariate_normal(mean=mean1, cov=cov1)
dist2 = multivariate_normal(mean=mean2, cov=cov2)

# Classify each test sample by the larger class density.
p1 = dist1.pdf(abalone_test[:, 1:])
p2 = dist2.pdf(abalone_test[:, 1:])
p = np.vstack((p1, p2))
index = np.argmax(p, axis=0) + 1
plt.plot(index, "k.", ms=10)
correct = abalone_test[:, 0] == index
percent_correct = np.sum(correct) * 100.0 / index.shape[0]
print(percent_correct)

rowvar : bool, optional

If rowvar is True (the default), each row represents a variable, with observations in the columns. Otherwise the relationship is transposed: each column represents a variable, while the rows contain observations. (This is why the calls above pass rowvar=False.)
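A quick self-contained check of what rowvar does:

import numpy as np

data = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 6.0]])            # 3 observations of 2 variables

print(np.cov(data, rowvar=False).shape)  # (2, 2): columns treated as variables
print(np.cov(data).shape)                # (3, 3): rows treated as variables (default)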

Use an equal covariance matrix:

If you correctly followed the same steps as in the previous notebook, you will have estimated a separate covariance matrix for each class. These matrices will not be equal, so your system will not be a linear classifier (that is, its decision boundary will not be planar). To simplify it into a linear system, we need to ensure there is only a single covariance matrix. How might you do this?

There are several ways you might imagine doing it:

First, you might simply estimate a single covariance matrix from the complete training set before it is divided into classes. This would certainly produce a single matrix, but it is not the right thing to do: we want the matrix to represent the spread within each class, and a matrix estimated from the complete training set would also capture the spread between the classes.

Second, you might average the two class covariance matrices. This is closer to correct, but it does not take into account that the two classes may have unequal numbers of examples (a weighted average, sketched below, fixes this).
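As a minimal sketch of that sample-size-weighted average (reusing cov1, cov2 and the training arrays from the code above; note that another common convention weights by n - 1 and divides by n1 + n2 - 2):

n1 = abalone1_train.shape[0]
n2 = abalone2_train.shape[0]
# Weight each class covariance matrix by the size of its class.
cov_pooled = (n1 * cov1 + n2 * cov2) / (n1 + n2)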

The best way is to first move the centres of the two classes to the same point and then treat them as a single class. To move the class centres to the same point, simply subtract the class mean vector from every sample of that class.

def centre_data(data):
    # Shift the data so its mean lies at the origin.
    data_mean = np.mean(data, axis=0)
    return data - data_mean

# Centre each class separately, then estimate a single shared covariance.
abalone1_centred = centre_data(abalone1_train)
abalone2_centred = centre_data(abalone2_train)
abalone_centred = np.vstack((abalone1_centred, abalone2_centred))
cov_global = np.cov(abalone_centred[:, 1:], rowvar=False)

# Same classifier as before, but both classes share cov_global.
dist1 = multivariate_normal(mean=mean1, cov=cov_global)
dist2 = multivariate_normal(mean=mean2, cov=cov_global)
p1 = dist1.pdf(abalone_test[:, 1:])
p2 = dist2.pdf(abalone_test[:, 1:])
p = np.vstack((p1, p2))
index = np.argmax(p, axis=0) + 1
plt.plot(index, "k.", ms=10)
correct = abalone_test[:, 0] == index
percent_correct = np.sum(correct) * 100.0 / index.shape[0]
print(percent_correct)
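To see why the shared covariance matrix makes this a linear classifier: with equal covariances, the log-ratio of the two class densities reduces to a linear function of the features. A sketch that computes the weights explicitly (my own illustration, assuming equal class priors and reusing mean1, mean2, cov_global and abalone_test from above):

# For a shared covariance S, comparing the two pdfs is equivalent to
# thresholding the linear score w . x + b, with
#   w = S^{-1} (mu1 - mu2),   b = -0.5 * (mu1 + mu2) . w
S_inv = np.linalg.inv(cov_global)
w = S_inv @ (mean1 - mean2)
b = -0.5 * (mean1 + mean2) @ w
scores = abalone_test[:, 1:] @ w + b
index_linear = np.where(scores > 0, 1, 2)    # positive score -> class 1
print(np.mean(index_linear == index) * 100)  # % agreement with pdf-based labels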

Thank you for reading! That is the end of this article on what Python linear classification means. I hope the content above has been helpful and has taught you something new; if you found the article useful, feel free to share it so more people can see it.
