2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces the common methods of data cleaning. It is fairly detailed and should be a useful reference for interested readers.
Data cleaning methods include:
1. Binning: place the data to be processed into bins according to a fixed rule, examine the data in each bin, and handle each bin according to what it contains.
2. Regression: fit a function to the data and use the fitted curve to smooth the data.
3. Clustering: group similar objects together and treat isolated points that fall outside every group as noise.
The operating environment of this tutorial: Windows 7, Dell G3 computer.
Science and technology have developed at an unprecedented pace, and many fields have made great progress as a result. In recent years many new terms have become common, such as big data, the Internet of Things, cloud computing, and artificial intelligence. Of these, big data is the most popular, because many industries have now accumulated huge volumes of raw data, and analyzing that data can yield information that supports business decisions. Big data technology can do this better than traditional data analysis techniques.
However, big data depends on data analysis, and data analysis depends on data. A massive data set contains much data that we need, but also much that we do not. Just as nothing in the world is completely pure, data contains impurities, so we need to clean it to ensure that it is reliable.
Data usually contains noise, so how is that noise cleaned? This article introduces the methods of data cleaning.
Generally speaking, there are three methods for cleaning data: binning, clustering, and regression. Each has its own advantages, and together they address noise from different angles.
Binning is a frequently used method. The data to be processed is placed into bins according to a fixed rule, the data in each bin is then examined, and each bin is handled according to what it contains. This raises the question of how to form the bins. One option is to bin by record count, so that each bin holds the same number of records (equal-frequency binning).
Another option is to give every bin the same constant interval width, so that records are assigned by value range (equal-width binning). A third is to define custom intervals. All three work. Once the data is binned, we can compute the mean or median of each bin, or take its boundary values, replace the values in the bin with that statistic, and plot the result as a line chart. Generally speaking, the wider the bins, the stronger the smoothing effect.
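The binning steps described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The function names (equal_frequency_bins, equal_width_bins, smooth_by_mean, and so on) and the sample values are assumptions made for this example and do not come from the original article.

```python
from statistics import mean, median


def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional record each.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def equal_width_bins(values, n_bins):
    """Split the value range into n_bins intervals of constant width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # fall back to 1 if all values are equal
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(v)
    return bins


def smooth_by_mean(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) if b else [] for b in bins]


def smooth_by_median(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) if b else [] for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's minimum or maximum."""
    smoothed = []
    for b in bins:
        if not b:
            smoothed.append([])
            continue
        lo, hi = b[0], b[-1]  # bins are built from sorted values
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Hypothetical sample data, chosen only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)
print(smooth_by_mean(bins))
print(smooth_by_boundaries(bins))
```

Custom intervals can be handled the same way as equal-width bins by replacing the computed interval edges with a hand-written list of edges.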
The regression method fits a function to the data and then uses the fitted curve to smooth the data. There are two kinds of regression: simple linear regression and multiple linear regression. Simple linear regression finds the best-fitting straight line between two attributes, so that one attribute can be predicted from the other. Multiple linear regression involves more than two attributes and fits the data to a multi-dimensional surface. In both cases, replacing observed values with fitted values removes noise.
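As a rough illustration of the simple linear regression case, the sketch below fits a straight line by ordinary least squares and replaces each observed value with the fitted one. The function names fit_line and smooth_with_line and the sample data are assumptions made for this example. Multiple linear regression is not covered here, since it would normally rely on a linear algebra library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def smooth_with_line(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]


# Hypothetical noisy measurements that roughly follow a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
print(fit_line(xs, ys))
print(smooth_with_line(xs, ys))
```

The difference between an observed value and its fitted value (the residual) can also be used to flag suspicious records instead of overwriting them.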
The idea behind the clustering method is relatively simple, although carrying it out is more involved. Clustering groups objects into sets of similar objects. Points that fall outside every set are isolated points, and these are treated as noise. In this way the noise can be identified directly and then removed.
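The article does not name a specific clustering algorithm, so the sketch below swaps in a deliberately simple one for one-dimensional data: sorted values are grouped whenever neighbours are close together, and groups that are too small are reported as noise. The names cluster_by_gap and split_noise, the parameters max_gap and min_size, and the sample data are all assumptions made for this example.

```python
def cluster_by_gap(values, max_gap):
    """Group sorted values; start a new cluster when the gap exceeds max_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)  # close to the previous value: same cluster
        else:
            clusters.append([v])  # too far away: open a new cluster
    return clusters


def split_noise(values, max_gap, min_size=2):
    """Return (kept, noise); clusters smaller than min_size count as noise."""
    clusters = cluster_by_gap(values, max_gap)
    kept = [v for c in clusters if len(c) >= min_size for v in c]
    noise = [v for c in clusters if len(c) < min_size for v in c]
    return kept, noise


# Hypothetical readings with one isolated value.
readings = [10, 11, 12, 50, 30, 31, 29, 13]
kept, noise = split_noise(readings, max_gap=3)
print(kept)
print(noise)
```

Real data sets with several attributes would need a proper clustering algorithm and a distance measure suited to the data. The gap threshold here only stands in for that choice.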
We have now introduced the three methods of data cleaning: binning, regression, and clustering. Each has its own strengths, and together they help data cleaning work go smoothly. Mastering these methods will help with later data analysis work.