Today I would like to share with you some knowledge about how to implement 12 dimensionality reduction algorithms in Python. The content is detailed and the logic is clear. Since most readers may not know much about this topic, this article is shared for your reference; I hope you gain something from reading it.
Why data dimensionality reduction is needed
So-called dimensionality reduction means using a set of vectors Zi of dimension d to represent the useful information contained in vectors Xi of dimension D, where d < D.
Usually we find that the dimensionality of most datasets can reach hundreds or even thousands, while the classic MNIST handwritten-digit dataset (in its 8×8 form) has 64 dimensions.
MNIST handwritten digit dataset
However, in practical applications the useful information we rely on rarely requires such a high dimensionality, and the number of samples needed grows exponentially with each additional dimension, which can lead directly to the "curse of dimensionality". Dimensionality reduction can:
Make the dataset easier to use
Ensure that variables are independent of each other
Reduce the computational cost of the algorithm
If we can process the information properly and reduce the dimensionality correctly and effectively, it greatly helps to cut the amount of computation and improve runtime efficiency. Dimensionality reduction is also widely used in fields such as text processing, face recognition, image recognition, and natural language processing.
Principle of data dimensionality reduction
Typically, data in a high-dimensional space is sparsely distributed, so during dimensionality reduction we usually discard some of the data, including redundant data, invalid information, repeated expressions, and so on.
For example, in a 1024×1024 image where only a 50×50 region in the center contains non-zero values, the all-zero area can be treated as useless information; for a symmetric image, the information in the symmetric half can be treated as repeated information.
Most classical dimensionality reduction techniques are based on this idea. Dimensionality reduction methods are divided into linear and nonlinear methods, and nonlinear methods are further divided into kernel-function-based and eigenvalue-based (manifold learning) methods.
Linear dimensionality reduction methods: PCA, ICA, LDA, LFA, LPP (the linear version of LE)
Nonlinear dimensionality reduction methods:
Nonlinear dimensionality reduction methods based on kernel functions: KPCA, KICA, KDA
Nonlinear dimensionality reduction methods based on eigenvalues (manifold learning): ISOMAP, LLE, LE, LPP, LTSA, MVU
Heucoder, a master's student in computer technology at Harbin Institute of Technology, has compiled implementations of 12 classical dimensionality reduction algorithms, including PCA, KPCA, LDA, MDS, ISOMAP, LLE, TSNE, AutoEncoder, FastICA, SVD, LE, and LPP, and provides related materials, code, and demos. The following uses the PCA algorithm as an example to introduce how a dimensionality reduction algorithm works in practice.
Principal component Analysis (PCA) dimensionality reduction algorithm
PCA is a method that maps data from a high-dimensional space to a low-dimensional space, and it is the most basic unsupervised dimensionality reduction algorithm. Its goal is to project the data onto the directions in which it varies the most, or equivalently onto the directions that minimize the reconstruction error. It was proposed by Karl Pearson in 1901 and belongs to the family of linear dimensionality reduction methods. The principles behind PCA are usually described as the maximum-variance theory or the minimum-error theory; the two lead to the same result but emphasize different parts of the derivation.
Dimensionality reduction principle of maximum Variance Theory
To reduce a set of N-dimensional vectors to K dimensions (0 < K < N), the goal is to select K orthonormal basis vectors such that, after projection, the pairwise covariance COV(X, Y) between any two fields is 0 and the variance of each field is as large as possible. In other words, the maximum-variance view maximizes the variance of the projected data. In this process we need to find the best projection matrix W (of size n×k) and the covariance matrix of the dataset X (of size m×n). The algorithm flow is as follows:
Algorithm input: dataset X of size m×n
Compute the column-wise mean X_mean of dataset X, then let X_new = X − X_mean
Compute the covariance matrix of X_new and denote it Cov
Compute the eigenvalues and corresponding eigenvectors of the covariance matrix Cov
Sort the eigenvalues from largest to smallest, select the k largest ones, and form the n×k eigenvector matrix W with the corresponding k eigenvectors as its columns
Compute X_new · W, i.e. project the dataset X_new onto the selected eigenvectors; this yields the reduced-dimensional dataset X_new · W that we need
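To make the flow above concrete, here is a minimal NumPy sketch of the maximum-variance PCA steps (the function name and the random example data are my own illustrative choices, not part of the original code):

import numpy as np

def pca_max_variance(X, k):
    """Reduce the m x n dataset X to k dimensions following the steps above."""
    # Steps 1-2: center the data column-wise
    X_mean = X.mean(axis=0)
    X_new = X - X_mean
    # Step 3: covariance matrix of the centered data (n x n)
    cov = np.cov(X_new, rowvar=False)
    # Step 4: eigenvalues and eigenvectors (eigh is fine since cov is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 5: sort eigenvalues from largest to smallest, keep the top-k eigenvectors
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]          # n x k projection matrix
    # Step 6: project the centered data onto the selected eigenvectors
    return X_new.dot(W)

# Example: project random 10-dimensional data down to 2 dimensions
X = np.random.rand(100, 10)
X_reduced = pca_max_variance(X, k=2)
print(X_reduced.shape)   # (100, 2)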
Dimensionality reduction principle of minimum error theory
The minimum-error view seeks the linear projection that minimizes the average projection cost; in this process we need to find quantities such as the squared-error evaluation function J0(x0).
Principal component Analysis (PCA) Code implementation
The code for the PCA algorithm is as follows:
from __future__ import print_function
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np
# %matplotlib inline   (uncomment this line when running inside a Jupyter notebook)

def shuffle_data(X, y, seed=None):
    if seed:
        np.random.seed(seed)
    idx = np.arange(X.shape[0])
    np.random.shuffle(idx)
    return X[idx], y[idx]

# Normalize dataset X
def normalize(X, axis=-1, p=2):
    lp_norm = np.atleast_1d(np.linalg.norm(X, p, axis))
    lp_norm[lp_norm == 0] = 1
    return X / np.expand_dims(lp_norm, axis)

# Standardize dataset X
def standardize(X):
    X_std = np.zeros(X.shape)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Always keep in mind that the denominator cannot be zero when dividing
    # X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    for col in range(np.shape(X)[1]):
        if std[col]:
            X_std[:, col] = (X[:, col] - mean[col]) / std[col]
    return X_std

# Split the dataset into a training set and a test set
def train_test_split(X, y, test_size=0.2, shuffle=True, seed=None):
    if shuffle:
        X, y = shuffle_data(X, y, seed)
    n_train_samples = int(X.shape[0] * (1 - test_size))
    x_train, x_test = X[:n_train_samples], X[n_train_samples:]
    y_train, y_test = y[:n_train_samples], y[n_train_samples:]
    return x_train, x_test, y_train, y_test

# Compute the covariance matrix of matrix X
def calculate_covariance_matrix(X, Y=np.empty((0, 0))):
    if not Y.any():
        Y = X
    n_samples = np.shape(X)[0]
    covariance_matrix = (1 / (n_samples - 1)) * (X - X.mean(axis=0)).T.dot(Y - Y.mean(axis=0))
    return np.array(covariance_matrix, dtype=float)

# Compute the variance of each column of dataset X
def calculate_variance(X):
    n_samples = np.shape(X)[0]
    variance = (1 / n_samples) * np.diag((X - X.mean(axis=0)).T.dot(X - X.mean(axis=0)))
    return variance

# Compute the standard deviation of each column of dataset X
def calculate_std_dev(X):
    std_dev = np.sqrt(calculate_variance(X))
    return std_dev

# Compute the correlation coefficient matrix
def calculate_correlation_matrix(X, Y=np.empty([0])):
    # First compute the covariance matrix
    covariance_matrix = calculate_covariance_matrix(X, Y)
    # Compute the standard deviations of X and Y
    std_dev_X = np.expand_dims(calculate_std_dev(X), 1)
    std_dev_y = np.expand_dims(calculate_std_dev(Y), 1)
    correlation_matrix = np.divide(covariance_matrix, std_dev_X.dot(std_dev_y.T))
    return np.array(correlation_matrix, dtype=float)

class PCA():
    """Principal component analysis (PCA), an unsupervised learning algorithm."""
    def __init__(self):
        self.eigen_values = None
        self.eigen_vectors = None
        self.k = 2

    def transform(self, X):
        """Reduce the dimensionality of the original dataset X via PCA."""
        covariance = calculate_covariance_matrix(X)
        # Solve for the eigenvalues and eigenvectors
        self.eigen_values, self.eigen_vectors = np.linalg.eig(covariance)
        # Sort the eigenvalues from largest to smallest; note that the eigenvectors
        # are arranged column-wise, i.e. the k-th column of self.eigen_vectors is
        # the eigenvector corresponding to the k-th eigenvalue
        idx = self.eigen_values.argsort()[::-1]
        eigenvalues = self.eigen_values[idx][:self.k]
        eigenvectors = self.eigen_vectors[:, idx][:, :self.k]
        # Map the original dataset X to the low-dimensional space
        X_transformed = X.dot(eigenvectors)
        return X_transformed

def main():
    # Load the dataset
    data = datasets.load_iris()
    X = data.data
    y = data.target

    # Map dataset X to the low-dimensional space
    X_trans = PCA().transform(X)
    x1 = X_trans[:, 0]
    x2 = X_trans[:, 1]

    cmap = plt.get_cmap('viridis')
    colors = [cmap(i) for i in np.linspace(0, 1, len(np.unique(y)))]

    class_distr = []
    # Plot the different class distributions
    for i, l in enumerate(np.unique(y)):
        _x1 = x1[y == l]
        _x2 = x2[y == l]
        _y = y[y == l]
        class_distr.append(plt.scatter(_x1, _x2, color=colors[i]))

    # Add a legend
    plt.legend(class_distr, np.unique(y), loc=1)

    # Axis labels
    plt.xlabel('Principal Component 1')
    plt.ylabel('Principal Component 2')
    plt.show()

if __name__ == "__main__":
    main()
Finally, we obtain the dimensionality reduction result shown below. In addition, when the number of features (D) is much larger than the number of samples (N), a small trick can be used to reduce the complexity of the PCA algorithm: compute the eigendecomposition of the N×N matrix X·X^T instead of the D×D covariance matrix and recover the principal directions from its eigenvectors.
Display of PCA dimensionality reduction algorithm
Of course, although this algorithm is classical and widely used, its shortcomings are also obvious: it removes linear correlations very well but performs poorly in the presence of higher-order correlations. Moreover, PCA assumes that the principal features of the data lie along orthogonal directions, so its effectiveness drops sharply when several high-variance directions are not orthogonal to one another.
Other dimensionality reduction algorithms and code addresses
1.KPCA (kernel PCA)
KPCA is the product of combining the kernel trick with PCA. Its main difference from PCA is that a kernel function is used when computing the covariance matrix, i.e. the covariance matrix after the kernel mapping.
Introducing a kernel function solves the problem of mapping nonlinear data: KPCA maps the nonlinear data into a high-dimensional space and then uses standard PCA to map it from there into another, lower-dimensional space.
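Alongside the from-scratch code linked below, a minimal sketch using scikit-learn's KernelPCA might look like this (the Iris data and the RBF kernel are illustrative choices, not taken from the repository):

from sklearn import datasets
from sklearn.decomposition import KernelPCA

X, y = datasets.load_iris(return_X_y=True)
# Kernel PCA with an RBF kernel: PCA in the kernel-induced feature space
kpca = KernelPCA(n_components=2, kernel='rbf')
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (150, 2)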
Display of KPCA dimensionality reduction algorithm
Code address
2.LDA (Linear Discriminant Analysis)
LDA is a technique that can be used for feature extraction. Its goal is to project the data so that between-class differences are maximized and within-class differences are minimized, which makes it easier for classification and other tasks to separate samples of different classes effectively. LDA can improve computational efficiency during data analysis and, for models that cannot be regularized, it can reduce the over-fitting caused by the curse of dimensionality.
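As a quick reference beside the linked from-scratch code, here is a minimal scikit-learn sketch; note that LDA is supervised, so it also needs the class labels y (the Iris data is an illustrative choice):

from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = datasets.load_iris(return_X_y=True)
# LDA can produce at most (number of classes - 1) components, here 2 for Iris
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)  # (150, 2)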
Display of LDA dimensionality reduction algorithm
Code address
3.MDS (multidimensional scaling)
MDS, multidimensional scaling analysis, is a traditional dimensionality reduction method that expresses the perception of and preference for research objects through an intuitive spatial map. The method computes the distances between all pairs of sample points and then projects into a low-dimensional space in a way that preserves these relative distances as well as possible.
Because MDS in sklearn adopts iterative optimization, iterative and non-iterative methods are implemented below.
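For orientation, a minimal sketch using sklearn's iterative MDS might look like this (the Iris data is an illustrative choice; the repository's own iterative and non-iterative versions are in the linked code):

from sklearn import datasets
from sklearn.manifold import MDS

X, y = datasets.load_iris(return_X_y=True)
# Metric MDS: preserve pairwise Euclidean distances in 2 dimensions
mds = MDS(n_components=2, random_state=0)
X_mds = mds.fit_transform(X)
print(X_mds.shape)  # (150, 2)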
Display of MDS dimensionality reduction algorithm
Code address
4.ISOMAP
Isomap, the isometric mapping algorithm, addresses the shortcomings of the MDS algorithm on nonlinearly structured datasets.
The MDS algorithm keeps the distances between samples unchanged after dimensionality reduction, whereas Isomap introduces a neighborhood graph in which each sample is connected only to its neighbors; it computes the distances between neighboring points (distances between non-neighbors are measured along the graph, i.e. as geodesic distances) and then reduces the dimensionality while preserving those distances.
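A minimal scikit-learn sketch of Isomap, shown here only as a reference alongside the linked from-scratch code (the Iris data and the neighborhood size are illustrative choices):

from sklearn import datasets
from sklearn.manifold import Isomap

X, y = datasets.load_iris(return_X_y=True)
# Build the neighborhood graph from the 5 nearest neighbors, then embed in 2D
isomap = Isomap(n_neighbors=5, n_components=2)
X_iso = isomap.fit_transform(X)
print(X_iso.shape)  # (150, 2)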
Display of ISOMAP dimensionality reduction algorithm
Code address
5.LLE (locally linear embedding)
LLE, locally linear embedding, is a nonlinear dimensionality reduction algorithm. Its core idea is that each point can be approximately reconstructed as a linear combination of several of its neighbors; the high-dimensional data is then projected into a low-dimensional space in a way that preserves these local linear reconstruction relationships between data points, i.e. the same reconstruction coefficients. For so-called manifold dimensionality reduction, it works much better than PCA.
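A minimal scikit-learn sketch, given purely as a reference next to the linked from-scratch code (the Iris data and the neighborhood size are illustrative choices):

from sklearn import datasets
from sklearn.manifold import LocallyLinearEmbedding

X, y = datasets.load_iris(return_X_y=True)
# Reconstruct each point from its 10 nearest neighbors, then embed in 2D
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
X_lle = lle.fit_transform(X)
print(X_lle.shape)  # (150, 2)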
Display of LLE dimensionality reduction algorithm
Code address
6.t-SNE
t-SNE is also a nonlinear dimensionality reduction algorithm, and it is very well suited to reducing high-dimensional data to 2 or 3 dimensions for visualization. It is an unsupervised machine learning algorithm that reconstructs the original trend of the data in a low-dimensional space (2D or 3D).
The results shown below follow the linked source code; the algorithm can also be implemented in TensorFlow (so that parameters do not have to be updated manually).
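As a quick reference beside the linked implementation, a minimal scikit-learn sketch might look like this (the Iris data and the perplexity value are illustrative choices):

from sklearn import datasets
from sklearn.manifold import TSNE

X, y = datasets.load_iris(return_X_y=True)
# Perplexity roughly controls the effective number of neighbors considered
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)
print(X_tsne.shape)  # (150, 2)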
Display of t-SNE dimensionality reduction algorithm
Code address
7.LE (Laplacian Eigenmaps)
LE, Laplacian eigenmaps, is similar to the LLE algorithm in that it also constructs relationships between data points from a local perspective. Its intuitive idea is that points which are related (connected in the graph) should stay as close as possible in the reduced-dimensional space; in this way a solution that reflects the geometric structure of the manifold can be obtained.
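Besides the linked from-scratch code, scikit-learn's SpectralEmbedding provides a Laplacian-eigenmaps-style embedding; a minimal sketch (the Iris data and the neighborhood size are illustrative choices):

from sklearn import datasets
from sklearn.manifold import SpectralEmbedding

X, y = datasets.load_iris(return_X_y=True)
# Build a nearest-neighbor affinity graph and embed via its graph Laplacian
le = SpectralEmbedding(n_components=2, n_neighbors=10, random_state=0)
X_le = le.fit_transform(X)
print(X_le.shape)  # (150, 2)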
Display of LE dimensionality reduction algorithm
Code address
8.LPP (Locality Preserving Projections)
LPP, locality preserving projections, is similar in spirit to Laplacian eigenmaps: its core idea is to construct a projection that best preserves the neighborhood structure of the dataset. Unlike LE, which obtains the embedding directly, LPP needs to solve for a projection matrix.
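Since scikit-learn has no standard LPP implementation, here is a rough NumPy/SciPy sketch of the idea under common simplifying assumptions (a k-nearest-neighbor graph with heat-kernel weights and a small regularization term for numerical stability); it is illustrative only, not the repository's implementation:

import numpy as np
from scipy.linalg import eigh
from sklearn import datasets
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=2, n_neighbors=10, t=1.0, reg=1e-6):
    """Locality Preserving Projections: return the projected data and the projection matrix."""
    # Adjacency graph with heat-kernel weights on the k nearest neighbors
    dist = kneighbors_graph(X, n_neighbors, mode='distance', include_self=False).toarray()
    W = np.where(dist > 0, np.exp(-dist ** 2 / t), 0.0)
    W = np.maximum(W, W.T)                 # make the graph symmetric
    D = np.diag(W.sum(axis=1))             # degree matrix
    L = D - W                              # graph Laplacian
    # Generalized eigenproblem  X^T L X a = lambda X^T D X a ; keep the smallest eigenvalues
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])   # small ridge keeps B positive definite
    eigvals, eigvecs = eigh(A, B)          # eigenvalues returned in ascending order
    P = eigvecs[:, :n_components]          # projection matrix
    return X @ P, P

X, y = datasets.load_iris(return_X_y=True)
X_lpp, P = lpp(X, n_components=2)
print(X_lpp.shape)  # (150, 2)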
Display of LPP dimensionality reduction algorithm
These are all the contents of the article "How to implement 12 dimensionality reduction algorithms in Python". Thank you for reading! I hope you have gained something from it.