2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Many newcomers are unsure which methods are used for data cleaning. This article explains the main ones in detail.
Data cleaning commonly uses three methods: 1. binning (sometimes translated as the box-splitting method), which places the data to be processed into bins according to a rule and then examines each bin; 2. regression, which fits a function to the data and uses the fitted curve to smooth it; 3. clustering, which groups objects into sets so that isolated points falling outside every set can be found.
What are the methods of data cleaning?
Three methods are commonly used to clean data: binning, regression, and clustering.
1. Binning method
Binning is used frequently. The data to be processed is placed into bins according to a rule, the data in each bin is examined, and each bin is then handled according to what it actually contains.
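The binning (box-splitting) idea above can be sketched in Python. The article does not say which rule fills the bins or how each bin is handled, so this sketch assumes equal-size bins over the sorted values and smoothing by bin means, which is one common choice. The function name smooth_by_bin_means and the sample prices are illustrative only.

```python
from statistics import mean


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items, and
    replace every value with the mean of its bin (assumed rule)."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = mean(bin_values)
        # Every member of the bin takes the bin's mean value.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical sample data: nine prices, three per bin.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
```

With these sample values the three bins average to 9, 22, and 29, so each value is replaced by one of those numbers. Smoothing by bin medians or bin boundaries would follow the same structure with a different replacement rule.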
2. Regression method
The regression method fits a function to the data and uses the fitted curve to smooth the data. There are two kinds: simple linear regression and multiple linear regression. Simple linear regression finds the best-fitting straight line between two attributes, so that one attribute can be predicted from the other. Multiple linear regression involves more than two attributes and fits the data to a multidimensional surface. In both cases, replacing the observed values with the fitted values reduces noise.
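A minimal sketch of the simple (two-attribute) case follows, assuming an ordinary least-squares fit. The article does not name a fitting procedure, so least squares is an assumption, and the function names fit_line and smooth_with_line are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def smooth_with_line(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]


# Hypothetical noisy measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(fit_line(xs, ys))
print(smooth_with_line(xs, ys))
```

The multiple-regression case works the same way in principle, with the line replaced by a surface fitted over several attributes.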
3. Clustering method
The idea behind clustering is simple, but it can be complicated to apply in practice. Clustering groups objects into sets of similar items; points that fall outside every set are isolated points and are treated as noise. The noise can thus be identified directly and then removed.
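As a rough illustration of finding isolated points, the sketch below groups points that lie within a given distance of each other and flags any group smaller than a minimum size as noise. The article does not specify a clustering algorithm, so this simple distance-threshold grouping is an assumption, as are the names find_isolated_points, radius, and min_cluster_size.

```python
from math import dist


def find_isolated_points(points, radius, min_cluster_size=2):
    """Group points connected by hops of at most `radius`, then return
    the points whose group has fewer than `min_cluster_size` members."""
    unvisited = set(range(len(points)))
    isolated = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            # Collect every unvisited point close enough to the current one.
            neighbours = [i for i in unvisited
                          if dist(points[current], points[i]) <= radius]
            for i in neighbours:
                unvisited.remove(i)
            cluster.extend(neighbours)
            frontier.extend(neighbours)
        if len(cluster) < min_cluster_size:
            isolated.extend(cluster)
    return sorted(points[i] for i in isolated)


# Hypothetical 2-D data: two tight groups and one far-away point.
points = [(1, 1), (1.2, 0.9), (0.8, 1.1), (8, 8), (8.1, 7.9), (4, 15)]
print(find_isolated_points(points, radius=1.0))
```

In this sample only the point (4, 15) has no neighbour within the radius, so it is the one reported as noise. The choice of radius and minimum cluster size depends on the data and is where most of the practical difficulty lies.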
Background:
As the name suggests, data cleaning means "washing off" the "dirt". It is the final procedure for finding and correcting identifiable errors in a data file, including checking data consistency and handling invalid and missing values.
The data in a data warehouse is a subject-oriented collection, extracted from multiple business systems and including historical data. It is therefore inevitable that some of the data is wrong and that some records conflict with each other. Such wrong or conflicting data is unwanted and is called "dirty data".
Washing off the dirty data according to defined rules is data cleaning. Its task is to filter out data that does not meet requirements and to hand the filtered results to the responsible business department, which confirms whether the data should be discarded or corrected by the business unit before it is extracted.
Data that does not meet requirements falls into three main categories: incomplete data, wrong data, and duplicate data. Data cleaning differs from questionnaire review in that cleaning after data entry is generally done by computer rather than by hand.
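The three categories can be illustrated with a small filtering pass that sorts records into clean, incomplete, wrong, and duplicate groups, so that the flagged records can be reviewed before anything is discarded. This is only a sketch: the record layout, the field names name and age, and the validity rule are all hypothetical and not taken from the article.

```python
def classify_records(records, required_fields, is_valid):
    """Split records into clean, incomplete, wrong, and duplicate groups."""
    groups = {"clean": [], "incomplete": [], "wrong": [], "duplicate": []}
    seen = set()
    for record in records:
        key = tuple(record.get(field) for field in required_fields)
        if any(record.get(field) in (None, "") for field in required_fields):
            groups["incomplete"].append(record)   # a required value is missing
        elif not is_valid(record):
            groups["wrong"].append(record)        # a value breaks a rule
        elif key in seen:
            groups["duplicate"].append(record)    # same key seen earlier
        else:
            seen.add(key)
            groups["clean"].append(record)
    return groups


# Hypothetical records and rule: age must lie between 0 and 120.
records = [
    {"name": "Ann", "age": 30},
    {"name": "Bob", "age": None},
    {"name": "Cy", "age": -5},
    {"name": "Ann", "age": 30},
]
result = classify_records(records, ["name", "age"],
                          lambda r: 0 <= r["age"] <= 120)
for group, items in result.items():
    print(group, items)
```

Keeping the rejected records in separate groups, rather than deleting them immediately, matches the step described above in which the business department confirms whether each one is filtered out or corrected.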