Example Analysis of Outlier Detection and Handling in Python Data

2025-01-17 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article walks through examples of outlier detection and handling for data in Python. It is fairly detailed and should be a useful reference for interested readers.

1 What is an outlier?

In machine learning, outlier detection and handling is a relatively small branch, or a by-product, of the field. In a typical prediction problem, a model is an expression of the structure of the overall sample data, and it usually captures the general properties of the whole sample. Points that are completely inconsistent with the overall sample in these properties are called outliers. Outliers are usually unwelcome in prediction problems: the task focuses on the properties of the whole sample, while outliers are generated by a mechanism that is inconsistent with it. If the algorithm is sensitive to outliers, the resulting model cannot represent the whole sample well, and its predictions will be inaccurate. On the other hand, in some scenarios outliers are exactly what analysts care about. In disease prediction, for example, the physical indicators of healthy people are usually similar in some dimensions. If a person's indicators are abnormal, that person's physical condition has changed in some way, although the change is not necessarily caused by disease (such points are often called noise points). Even so, detecting abnormalities is an important starting point for disease prediction. The same reasoning applies to credit fraud, cyber attacks, and similar scenarios.

2 Outlier detection methods

Common outlier detection methods include statistics-based methods, clustering-based methods, and some specialized outlier detection methods. They are introduced below.

1. Simple statistics

If we use pandas, we can call describe() directly to get a statistical summary of the data. This gives only a cursory look at a few statistics, and the summary covers only continuous (numeric) columns, as follows:

df.describe()
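
As a minimal sketch of the call above, the following builds a small DataFrame and inspects its summary. The column names and values are hypothetical and chosen only for illustration; they do not come from the original article.

```python
import pandas as pd

# Hypothetical sample data: one numeric column and one text column.
df = pd.DataFrame({
    "height_cm": [172.0, 168.5, 175.2, 170.1, 169.8, 250.0],
    "name": ["a", "b", "c", "d", "e", "f"],
})

# By default, describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)

# A maximum far above the 75% quantile hints at a possible outlier.
print(summary.loc["max", "height_cm"], summary.loc["75%", "height_cm"])
```

Comparing min and max against the quartiles is only a rough first check; it suggests where to look rather than confirming that a value is an outlier.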

Alternatively, a simple scatter plot often makes the presence of outliers clearly visible.
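
A scatter plot of this kind might be drawn as in the sketch below. It assumes matplotlib is installed, and the data values are hypothetical. The Agg backend is selected only so that the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook or GUI session
import matplotlib.pyplot as plt

# Hypothetical measurements; the last value is deliberately extreme.
values = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 9.6, 25.0]

fig, ax = plt.subplots()
# Plot each value against its position in the list.
ax.scatter(range(len(values)), values)
ax.set_xlabel("index")
ax.set_ylabel("value")
ax.set_title("Scatter plot for spotting outliers")

# To view the result, call fig.savefig("scatter.png"),
# or plt.show() in an interactive session.
```

In the resulting plot, the point at 25.0 sits well apart from the cluster around 10, which is the visual cue the text describes.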

2. 3σ principle

This principle has a precondition: the data should follow a normal distribution. Under the 3σ principle, a value that lies more than 3 standard deviations from the mean can be regarded as an outlier. The probability of a value falling within μ ± 3σ is about 99.7%, so the probability of a value lying farther than 3σ from the mean is P(|x − μ| > 3σ) ≈ 0.003.
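
The 3σ rule can be sketched with pandas as follows. The data is hypothetical, and the sketch assumes the values are roughly normally distributed apart from the extreme point. Note that pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data: 19 values close to 10 and one extreme value.
s = pd.Series([
    10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 9.6, 10.0,
    10.1, 9.9, 10.2, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 25.0,
])

mean = s.mean()
std = s.std()  # sample standard deviation (ddof=1)

# Flag values whose distance from the mean exceeds 3 standard deviations.
is_outlier = (s - mean).abs() > 3 * std
outliers = s[is_outlier]
cleaned = s[~is_outlier]

print(outliers)
print(len(cleaned))
```

One caveat with this approach: the outlier itself inflates both the mean and the standard deviation, so on very small samples an extreme value may fail to cross the 3σ threshold.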
