2025-02-14 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
For more on big data analysis and modeling, follow the official account "bigdatamodeling".
A common headache in classification is class imbalance, where the classes of the target variable differ greatly in size. Predictions then become biased toward the majority class: majority-class samples carry more weight in the loss function, so leaning toward the majority class makes the loss smaller.
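To make this bias concrete, here is a small sketch with hypothetical toy labels (a 950:50 split, not data from this article). A "model" that always predicts the majority class already reaches 95% accuracy, i.e. a small 0-1 loss, while never recognizing a single minority sample.

```python
import numpy as np

# Hypothetical toy labels: 950 majority-class (0) and 50 minority-class (1) samples.
y_true = np.array([0] * 950 + [1] * 50)

# A degenerate "classifier" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
# Fraction of true minority samples that were predicted as minority.
minority_recall = np.mean(y_pred[y_true == 1] == 1)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
# prints: accuracy=0.95, minority recall=0.00
```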
There are generally two ways to deal with class imbalance: undersampling and oversampling. Undersampling produces a smaller, balanced data set and lowers the training cost. However, it deletes samples that may be useful. When the imbalance ratio is large, many majority-class samples must be deleted, which can seriously distort the distribution of the original data and, in turn, hurt the classifier's ability to generalize.
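A minimal sketch of random undersampling, assuming a binary problem stored in NumPy arrays; `random_undersample` is a hypothetical helper, not a library function. On the toy data, 80 of the 90 majority rows are simply thrown away, which is the information loss described above.

```python
import numpy as np

def random_undersample(X, y, majority_label, rng=None):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper for illustration; assumes a binary problem.
    """
    rng = np.random.default_rng(rng)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    # Keep only as many majority rows as there are minority rows; the rest are discarded.
    kept_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([kept_maj, min_idx])
    return X[keep], y[keep]

# Toy data: 90 majority (0) vs 10 minority (1) samples with 2 features each.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = random_undersample(X, y, majority_label=0, rng=0)
print(np.bincount(y_bal))  # [10 10]
```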
Oversampling takes the opposite approach: it keeps every majority-class sample and addresses the imbalance by duplicating minority-class samples. However, randomly duplicating minority samples is equivalent to giving them a higher weight, which makes the model prone to overfitting.
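For comparison, a sketch of random oversampling under the same assumptions (`random_oversample` is likewise hypothetical). The minority class grows to 90 rows, but every added row is an exact copy of one of the original 10, so the model sees the same points again and again, which is where the overfitting risk comes from.

```python
import numpy as np

def random_oversample(X, y, minority_label, rng=None):
    """Duplicate randomly chosen minority rows (with replacement) until classes are balanced.

    Hypothetical helper for illustration; assumes a binary problem.
    """
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == minority_label)
    n_major = np.sum(y != minority_label)
    # Draw exact copies of existing minority rows; no new information is created.
    extra = rng.choice(min_idx, size=n_major - len(min_idx), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

# Toy data: 90 majority (0) vs 10 minority (1) samples with 2 features each.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = random_oversample(X, y, minority_label=1, rng=0)
print(np.bincount(y_bal))  # [90 90]
```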
In 2002, researchers proposed SMOTE (Synthetic Minority Oversampling Technique) as a replacement for standard random oversampling. It mitigates, to some extent, the overfitting caused by random oversampling and improves the classifier's ability to generalize. Instead of simply copying minority-class samples or assigning them weights, SMOTE creates new minority-class samples by interpolating between each minority sample and its nearest minority-class neighbors.
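Written as a formula (the notation is ours, not the original article's): for a minority sample x_i and a neighbor x_nn chosen at random from its k nearest minority-class neighbors, a synthetic sample is

```latex
x_{\text{new}} = x_i + \delta \,(x_{nn} - x_i), \qquad \delta \sim U(0, 1)
```

With a single δ shared by all attributes, the new point lies on the line segment between x_i and x_nn. The pseudocode that follows draws a separate gap for each attribute instead.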
The steps of the SMOTE algorithm are as follows:
(1) For the i-th minority-class sample, find its k nearest neighbors among all minority-class samples.
(2) Randomly select N of these k neighbors and interpolate between the sample and each selected neighbor to obtain N new samples.
(3) Repeat steps (1) and (2) until every minority-class sample has been processed.
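Step (1) can be sketched with scikit-learn's `NearestNeighbors`, assuming the minority-class samples are already collected in a NumPy array; `minority_neighbors` is a hypothetical helper name. Because each sample is its own nearest neighbor (distance 0), the sketch queries k + 1 neighbors and drops the first column, which assumes the minority samples contain no duplicate rows.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def minority_neighbors(minority, k=5):
    """Step (1): indices of the k nearest minority-class neighbors of every minority sample.

    Hypothetical helper. Each point is its own closest neighbor, so k + 1 neighbors
    are requested and column 0 (the point itself) is dropped; assumes no duplicate rows.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(minority)
    _, idx = nn.kneighbors(minority)
    return idx[:, 1:]

# Toy minority set: three points near the origin and one far away.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
idx = minority_neighbors(minority, k=2)
print(idx)
```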
See the following figure:
Pseudocode of the SMOTE algorithm:
# == SMOTE algorithm pseudocode ==
Algorithm 1 SMOTE algorithm
function SMOTE(T, N, k)
    Input: T; N; k
        # T: number of minority class examples
        # N: amount of oversampling (in %)
        # k: number of nearest neighbors
    Output: (N/100) * T synthetic minority class samples
    Variables:
        Sample[][]      # array for original minority class samples
        newindex        # keeps a count of number of synthetic samples generated, initialized to 0
        Synthetic[][]   # array for synthetic samples
    if N < 100 then
        Randomize the T minority class samples
        T = (N/100) * T
        N = 100
    end if
    N = (int)(N/100)    # the amount of SMOTE is assumed to be in integral multiples of 100
    for i = 1 to T do
        Compute k nearest neighbors for i, and save the indices in the nnarray
        POPULATE(N, i, nnarray)
    end for
end function

Algorithm 2 Function to generate synthetic samples
function POPULATE(N, i, nnarray)
    Input: N; i; nnarray
        # N: number of instances to create
        # i: original sample index
        # nnarray: array of nearest neighbors
    Output: N new synthetic samples in Synthetic array
    while N != 0 do
        nn = random(1, k)
        for attr = 1 to numattrs do     # numattrs: number of attributes
            Compute: dif = Sample[nnarray[nn]][attr] - Sample[i][attr]
            Compute: gap = random(0, 1)
            Synthetic[newindex][attr] = Sample[i][attr] + gap * dif
        end for
        newindex++
        N--
    end while
end function
Python implementation of the SMOTE algorithm:
Here is the simplest version of SMOTE, implemented in Python:
# python3.6
import random
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sampling(samples, N=100, k=5):
    n_samples, n_attrs = samples.shape
    if N
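Since the listing above stops partway, here is a complete sketch that follows the pseudocode. `smote_sketch` and its `rng` parameter are our own names, and two choices are assumptions: neighbors are found with scikit-learn's `NearestNeighbors`, and one gap is drawn per synthetic sample (as in common implementations) rather than one per attribute.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(samples, N=100, k=5, rng=None):
    """Generate about (N/100) * T synthetic samples from the T minority rows in `samples`.

    Hypothetical sketch of the pseudocode: N is a percentage. Below 100, only a random
    N% of the samples get one synthetic sample each; otherwise N is truncated to a
    multiple of 100. Requires more than k minority samples.
    """
    rng = np.random.default_rng(rng)
    samples = np.asarray(samples, dtype=float)
    T = samples.shape[0]
    if N < 100:
        # Randomize the samples and keep only the first N% of them.
        chosen = rng.permutation(T)[: int(N / 100 * T)]
        per_sample = 1
    else:
        chosen = np.arange(T)
        per_sample = int(N / 100)

    # Neighbors come from the minority class only; k + 1 because each point is its own neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(samples)
    _, nnarray = nn.kneighbors(samples)
    nnarray = nnarray[:, 1:]

    synthetic = []
    for i in chosen:
        for _ in range(per_sample):
            neighbor = samples[nnarray[i, rng.integers(k)]]  # random one of the k neighbors
            gap = rng.random()  # single gap in [0, 1) shared by all attributes
            synthetic.append(samples[i] + gap * (neighbor - samples[i]))
    return np.array(synthetic).reshape(-1, samples.shape[1])

# Usage on random toy data: 20 minority samples with 3 attributes.
minority = np.random.default_rng(0).normal(size=(20, 3))
new = smote_sketch(minority, N=200, k=5, rng=1)
print(new.shape)  # (40, 3)
```

With N = 200, each of the 20 samples yields two synthetic points. Because every synthetic point is a convex combination of two minority samples, all of them stay within the range of the original minority data in each attribute.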
© 2024 shulou.com SLNews company. All rights reserved.