How to implement PCA in Numpy

This article explains how to implement PCA in Numpy. The method introduced here is simple, fast, and practical. Let's walk through it step by step.

Implementation of PCA in Numpy

from numpy import *

'''Sample rows from the input data file (two tab-separated columns per line):
10.235186  11.321997
10.122339  11.810993
 9.190236   8.904943
 9.306371   9.847394
 8.330131   8.340352
10.152785  10.123532
10.408540  10.821986
'''

def loadDataSet(fileName, delim='\t'):
    fr = open(fileName)
    stringArr = [line.strip().split(delim) for line in fr.readlines()]
    datArr = [list(map(float, line)) for line in stringArr]
    return mat(datArr)

def pca(dataMat, topNfeat=9999999):
    meanVals = mean(dataMat, axis=0)             # column means, e.g. [[9.06393644 9.09600218]]
    meanRemoved = dataMat - meanVals             # center the data: subtract the mean from each column
    covMat = cov(meanRemoved, rowvar=0)          # covariance matrix of the centered data,
                                                 # e.g. [[1.05198368 1.1246314 ] [1.1246314  2.21166499]]
    eigVals, eigVects = linalg.eig(mat(covMat))  # eigenvalues and eigenvectors of the covariance matrix,
                                                 # e.g. eigVals = [0.36651371 2.89713496],
                                                 # eigVects = [[-0.85389096 -0.52045195] [ 0.52045195 -0.85389096]]
    eigValInd = argsort(eigVals)                 # indices that sort the eigenvalues in ascending order
    eigValInd = eigValInd[:-(topNfeat+1):-1]     # reversed slice: keep the indices of the topNfeat largest
    redEigVects = eigVects[:, eigValInd]         # the corresponding eigenvectors, e.g. [[-0.52045195] [-0.85389096]]
    lowDDataMat = meanRemoved * redEigVects      # N x 2 * 2 x 1 => N x 1: the N x 2 matrix is projected onto
                                                 # the top component, reducing the dimension to 1
    reconMat = (lowDDataMat * redEigVects.T) + meanVals  # map the reduced data back to the original space
    return lowDDataMat, reconMat
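
A quick usage sketch, assuming the functions above have been defined; the file name testSet.txt is only an illustration and should be replaced with your own tab-separated data file:

dataMat = loadDataSet('testSet.txt')          # hypothetical file name, substitute your own data file
lowDMat, reconMat = pca(dataMat, topNfeat=1)  # keep only the first principal component
print(shape(lowDMat))                         # (N, 1): the data projected onto one dimension
print(shape(reconMat))                        # (N, 2): the reconstruction in the original coordinates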

Mean: mean(X) = (x1 + x2 + ... + xn) / n

Standard deviation: std(X) = sqrt( Σ [xi - mean(X)]^2 / (n - 1) )

Variance: var(X) = Σ [xi - mean(X)]^2 / (n - 1)

For example, the two sets [0, 8, 12, 20] and [8, 9, 11, 12] both have a mean of 10, but they differ considerably. Computing the two standard deviations gives about 8.3 for the former and 1.8 for the latter.

This shows that the latter set is more concentrated: the standard deviation describes the "spread" of the data. The reason for dividing by n - 1 instead of n is that it lets a small sample better approximate the population standard deviation, i.e. it gives an unbiased estimate.
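
As a quick check of this example in NumPy (a minimal sketch; ddof=1 selects the n - 1 denominator used above):

import numpy as np

a = np.array([0, 8, 12, 20])
b = np.array([8, 9, 11, 12])
print(a.mean(), b.mean())            # 10.0 10.0 -- both sets share the same mean
print(a.std(ddof=1), b.std(ddof=1))  # about 8.33 and 1.83 -- the sample standard deviations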

Why do we need covariance?

Standard deviation and variance describe one-dimensional data. In practice, however, we often encounter datasets with two or more dimensions; the simplest example is recording students' scores on several subjects and asking how those dimensions relate to one another. Covariance is the statistic that measures the relationship between two random variables.

Var(X) = Σ [xi - mean(X)]^2 / (n - 1) = Σ [xi - mean(X)][xi - mean(X)] / (n - 1)

By analogy with the definition of variance:

Cov(X, Y) = Σ [xi - mean(X)][yi - mean(Y)] / (n - 1)

It measures the extent to which the two dimensions deviate from their means together.
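
As a small sketch (the arrays x and y below are made-up illustration data), this definition can be checked against numpy.cov, which also divides by n - 1 by default:

import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])
n = len(x)

cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # covariance from the definition above
print(cov_xy)               # about 2.27
print(np.cov(x, y)[0, 1])   # the off-diagonal entry of numpy's covariance matrix matches it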

Interpreting the result of the covariance:

If it is positive, the two variables are positively correlated; if it is negative, they are negatively correlated; if it is 0, they are uncorrelated (there is no linear relationship between them).
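
A minimal illustration of the sign (all the data here is made up):

import numpy as np

t = np.arange(10.0)
rng = np.random.default_rng(0)
print(np.cov(t,  2 * t)[0, 1])   # positive: the two variables increase together
print(np.cov(t, -2 * t)[0, 1])   # negative: one increases while the other decreases
print(np.cov(rng.normal(size=10000), rng.normal(size=10000))[0, 1])  # close to 0: no linear relationship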

Multidimensional covariance is represented by a matrix. For three dimensions x, y and z:

C = | cov(x,x)  cov(x,y)  cov(x,z) |
    | cov(y,x)  cov(y,y)  cov(y,z) |
    | cov(z,x)  cov(z,y)  cov(z,z) |

This is a 3 x 3 matrix. The covariance matrix is symmetric, and its diagonal entries are the variances of the individual dimensions.
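
A small sketch of a 3-dimensional covariance matrix (the data matrix here is made up; rowvar=0 tells numpy.cov that each column is one dimension, as in the pca code above):

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 3))            # 100 samples of three dimensions x, y, z (one column each)
C = np.cov(data, rowvar=0)

print(C.shape)                              # (3, 3)
print(np.allclose(C, C.T))                  # True: the covariance matrix is symmetric
print(np.allclose(np.diag(C), data.var(axis=0, ddof=1)))  # True: the diagonal holds each dimension's variance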

At this point, you should have a deeper understanding of how to implement PCA in Numpy. Try it out in practice yourself.
