What is the principle of outlier detection in data science?

2025-01-15 Update From: SLTechnology News&Howtos

What is the principle of outlier detection in data science? Outlier detection methods include statistics-based methods, clustering-based methods, and some specialized methods. With pandas, you can call describe() to get summary statistics of the data, or draw a scatter plot to see directly whether outliers are present. This article walks through the statistics-based approach.
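As a quick illustration of the describe() check mentioned above, here is a minimal sketch. The `orders` series and its values are made up for the example and do not come from the article.

```python
import pandas as pd

# Hypothetical sample: daily order counts with one suspicious value.
orders = pd.Series([52, 48, 50, 47, 53, 49, 51, 50, 48, 250], name="orders")

# describe() reports count, mean, std, min, quartiles, and max.
summary = orders.describe()
print(summary)

# A maximum far above the 75% quartile hints at a value worth inspecting.
gap = summary["max"] - summary["75%"]
print(f"max - 75% quartile = {gap}")
```

A large gap between the maximum (or minimum) and the nearest quartile is only a hint; the sections below describe a more explicit rule.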

1. Premise of outlier detection in data science

The method assumes that the data sample is approximately normally distributed. The justification for this assumption is the central limit theorem: if a quantity is the sum (or average) of many independent factors, then whatever the distribution of each individual factor (provided its variance is finite), the result is approximately normally distributed. Independence matters here. Factors that are not independent reinforce and influence each other, and the result may then fail to follow a normal distribution.
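The central limit theorem can be checked with a small simulation. The sketch below is an illustration under assumed settings (50 uniform factors, 20,000 repetitions, a fixed seed), none of which come from the article. It averages independent uniform draws and measures how much of the result falls within one standard deviation of the mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def average_of_factors(n_factors: int) -> float:
    """Average n_factors independent uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n_factors)) / n_factors


# Each result is driven by 50 independent, non-normal factors.
means = [average_of_factors(50) for _ in range(20_000)]

mu = statistics.fmean(means)
sigma = statistics.pstdev(means)

# Share of results within one standard deviation of the mean.
within_one_sigma = sum(abs(m - mu) <= sigma for m in means) / len(means)
print(f"mean={mu:.4f} sigma={sigma:.4f} within 1 sigma={within_one_sigma:.3f}")
```

A single uniform factor would put only about 58% of values within one standard deviation, while the averaged result should land close to the 68% expected of a normal distribution.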

2. Principle of outlier detection in data science

The density curve of a normal distribution is a bell curve: the expected value μ determines its position, and the standard deviation σ determines its spread. When μ = 0 and σ = 1, the normal distribution is the standard normal distribution. For a set of data that follows a normal distribution, outliers can therefore be detected with the empirical rule (the 68-95-99.7 rule): about 68.3% of values fall within one standard deviation σ of μ, about 95.4% fall within two standard deviations, and about 99.7% fall within three standard deviations. So for normally distributed data, a value that lies more than three standard deviations from μ can be judged to be abnormal.
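The percentages in the empirical rule can be reproduced from the normal distribution itself. The sketch below uses the standard identity P(|X - μ| ≤ kσ) = erf(k / √2); the helper name `coverage` is introduced here only for illustration.

```python
import math


def coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k):.4%}")
```

The result does not depend on μ or σ, which is why the same three thresholds apply to any normally distributed dataset.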

3. Calculation steps

μ value: μ is the mean of a random variable that follows the normal distribution. Since the premise is that the effects of the various factors on the result are additive, μ can be estimated by the arithmetic mean of the sample data.

Standard deviation σ: subtract the mean from each data point, square the differences, and sum them. Divide the sum by N, the number of data points, if the dataset is the whole population (common in big data algorithms), or by N - 1 if the dataset is only a sample drawn from a larger population (common in statistics). The square root of the result is the standard deviation of the data.
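The two variants of the standard deviation can be sketched as follows. The function name `std_dev`, its `sample` flag, and the data values are assumptions made for this example; Python's standard library offers the same calculations as statistics.pstdev (divide by N) and statistics.stdev (divide by N - 1).

```python
import math
import statistics


def std_dev(values: list[float], sample: bool = False) -> float:
    """Standard deviation: divide by N (population) or N - 1 (sample)."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared differences from the mean.
    squared_diffs = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_diffs / divisor)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

population_sd = std_dev(data)           # treats data as the whole population
sample_sd = std_dev(data, sample=True)  # treats data as a sample

print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

For large datasets the two results are nearly identical; the choice matters mostly when N is small.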

Judgment logic: calculate μ + 3σ and μ - 3σ. A data point greater than μ + 3σ or less than μ - 3σ is considered an outlier, because according to the 68-95-99.7 rule it lies outside the range that contains about 99.7% of normally distributed data.
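The three steps above can be combined into a short sketch. The function name `three_sigma_outliers` and the readings are hypothetical, and the population standard deviation (divide by N) is an assumed default.

```python
import statistics


def three_sigma_outliers(values: list[float]) -> tuple[float, float, list[float]]:
    """Return (lower bound, upper bound, outliers) using the mu +/- 3 sigma rule."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: data treated as the population
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical readings: 30 values near 50 plus one extreme value.
readings = [50, 51, 49, 52, 48] * 6 + [120]

lower, upper, outliers = three_sigma_outliers(readings)
print(f"bounds: [{lower:.2f}, {upper:.2f}]")
print("outliers:", outliers)
```

Note that an extreme value also inflates μ and σ. In very small datasets this can widen the bounds enough to hide the outlier, so the rule works best when there are enough data points.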

In short: first understand the principle of outlier detection, then master the calculation steps, and finally implement the detection yourself.

© 2024 shulou.com SLNews company. All rights reserved.