
What does the data preprocessing in the computer include?

Updated 2025-01-15. Source: SLTechnology News&Howtos (shulou), Internet Technology


This article describes what data preprocessing includes, in enough detail to serve as a reference.

Data preprocessing covers three tasks: 1. Data audit, which has four aspects: accuracy, applicability, timeliness, and consistency. 2. Data screening, in which errors found during the audit are corrected as far as possible and data that cannot be corrected are removed. 3. Data sorting, in which the data are arranged in a certain order.

The operating environment of this tutorial: Windows 7, Dell G3 computer.

Data preprocessing refers to the processing applied to data before the main processing. For example, before most geophysical areal survey data are transformed or enhanced, the irregularly distributed survey grid is first converted to a regular grid by interpolation, which makes the computation easier. For profile measurements such as seismic data, preprocessing also includes vertical stacking, rearrangement, adding headers, editing, resampling, multi-channel editing and so on.
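The interpolation step can be illustrated in one dimension. The following is a minimal sketch, not a geophysical processing routine: it assumes strictly increasing sample positions and plain linear interpolation, and the function name `resample_regular` and the sample values are made up for the example.

```python
from bisect import bisect_right

def resample_regular(xs, ys, step):
    """Linearly interpolate irregular samples (xs, ys) onto a regular grid.

    Assumes xs is strictly increasing and has at least two points.
    """
    # Number of regular grid points that fit between the first and last sample.
    n = int((xs[-1] - xs[0]) / step + 1e-9) + 1
    grid = [xs[0] + i * step for i in range(n)]
    values = []
    for g in grid:
        # Index of the interval [xs[j], xs[j + 1]] that contains g.
        j = min(max(bisect_right(xs, g) - 1, 0), len(xs) - 2)
        x0, x1 = xs[j], xs[j + 1]
        t = (g - x0) / (x1 - x0)
        values.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return grid, values

# Hypothetical irregularly spaced measurements.
xs = [0.0, 0.7, 1.5, 3.0]
ys = [0.0, 7.0, 15.0, 30.0]
grid, values = resample_regular(xs, ys, 1.0)
print(grid)
print(values)
```

Real survey data are usually two-dimensional and would need a gridding method suited to the measurement; the sketch only shows the idea of moving from irregular to regular spacing.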

Data preprocessing refers to the necessary processing such as audit, screening and sorting before classifying or grouping the collected data.

Preprocessing content

1. Data audit

Statistical data from different sources call for different audit contents and methods.

Original (first-hand) data should be audited mainly for completeness and accuracy. The completeness audit checks whether any unit or individual that should have been surveyed is missing, and whether every survey item or indicator has been filled in. The accuracy audit covers two things: whether the data truthfully reflect the actual situation, and whether the data contain errors, including calculation errors. The main methods for checking accuracy are the logical check and the calculation check. The logical check examines whether the data make sense, whether the content is reasonable, and whether any items or figures contradict one another; it is mainly suited to qualitative (categorical) data. The calculation check examines whether the figures in the questionnaire were computed correctly, in both result and method; it is mainly used for quantitative (numerical) data.
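The completeness, logical, and calculation checks described above can be sketched as follows. The field names, the minimum working age of 14, and the sample records are all assumptions chosen for illustration; a real audit would take its rules from the survey design.

```python
# Hypothetical survey records; field names and values are illustrative only.
records = [
    {"id": 1, "age": 34, "employment": "employed", "unit_price": 2.5, "quantity": 4, "total": 10.0},
    {"id": 2, "age": 9, "employment": "employed", "unit_price": 1.0, "quantity": 3, "total": 3.0},
    {"id": 3, "age": None, "employment": "student", "unit_price": 2.0, "quantity": 1, "total": 2.0},
    {"id": 4, "age": 51, "employment": "retired", "unit_price": 4.0, "quantity": 5, "total": 25.0},
]
EXPECTED_IDS = {1, 2, 3, 4, 5}  # units that should have been surveyed (assumed)
REQUIRED = ("age", "employment", "unit_price", "quantity", "total")

def audit(record):
    """Return a list of problems found in one record."""
    problems = []
    # Completeness: every required item must be filled in.
    missing = [field for field in REQUIRED if record.get(field) is None]
    if missing:
        problems.append("incomplete: " + ", ".join(missing))
        return problems
    # Logical check: items must not contradict each other (assumed rule).
    if record["employment"] == "employed" and record["age"] < 14:
        problems.append("logical: employed but under 14")
    # Calculation check: the reported total must match price * quantity.
    if abs(record["unit_price"] * record["quantity"] - record["total"]) > 1e-9:
        problems.append("calculation: total does not match")
    return problems

# Completeness at the unit level: which expected units are missing entirely?
missing_units = EXPECTED_IDS - {r["id"] for r in records}
report = {r["id"]: audit(r) for r in records}
print(missing_units)
print(report)
```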

For secondary (second-hand) data obtained through other channels, the audit should cover not only completeness and accuracy but also applicability and timeliness. Secondary data can come from many sources; some were collected through special surveys for a specific purpose, and some have already been processed to suit that purpose. Users should first establish the source of the data, the definitions and scope behind the figures, and the relevant background, in order to decide whether the data meet the needs of their own analysis and whether they need to be reprocessed, rather than copying them blindly. The timeliness of the data must also be audited. For time-sensitive problems, data that lag too far behind may make the research meaningless, so in general the latest statistics should be used wherever possible. Only after the audit confirms that the data suit the actual need is further processing worthwhile.

The content of data audit mainly includes the following four aspects:

Accuracy audit. This checks the data for authenticity and accuracy, with the focus on errors introduced during the survey.

Applicability audit. This checks how well the data explain the problem, given their intended use. It covers whether the data match the survey topic, the definition of the target population, and the interpretation of the survey items.

Timeliness audit. This checks whether the data were submitted by the prescribed time and, if not, why they were late.

Consistency audit. This checks whether the data are comparable across regions or countries and across time periods.
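Two of the four aspects above, timeliness and consistency, lend themselves to a simple mechanical check. The sketch below assumes a hypothetical submission deadline and treats "comparable" narrowly as "reported in the same unit"; both are simplifications made for the example.

```python
from datetime import date

DEADLINE = date(2024, 3, 31)  # hypothetical prescribed submission date

# Hypothetical submissions from three regions.
submissions = [
    {"region": "A", "submitted": date(2024, 3, 20), "unit": "CNY"},
    {"region": "B", "submitted": date(2024, 4, 5), "unit": "CNY"},
    {"region": "C", "submitted": date(2024, 3, 30), "unit": "USD"},
]

# Timeliness: regions that submitted after the deadline need follow-up.
late = [s["region"] for s in submissions if s["submitted"] > DEADLINE]

# Consistency (simplified): figures are only comparable if they share a unit.
units = {s["unit"] for s in submissions}
comparable = len(units) == 1

print(late)
print(comparable)
```

Accuracy and applicability depend on knowledge of the survey itself and are harder to reduce to a rule.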

2. Data screening

Errors found during the audit should be corrected as far as possible. Once the survey has ended, if errors in the data cannot be corrected, or if some data fail to meet the survey requirements and cannot be supplemented, the data must be screened. Data screening has two parts: removing data that do not meet the requirements or contain obvious errors, and selecting the data that satisfy certain specific conditions while discarding the rest. Data screening is very important in market research, economic analysis, and management decision-making.
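The two parts of screening can be sketched as two successive filters. The rows, the validity rule (sales must be present and non-negative), and the threshold of 200 are assumptions for the example.

```python
# Hypothetical sales records; values are illustrative only.
rows = [
    {"name": "store_1", "sales": 120.0},
    {"name": "store_2", "sales": -5.0},   # obvious error: negative sales
    {"name": "store_3", "sales": 480.0},
    {"name": "store_4", "sales": None},   # missing and cannot be supplemented
]

def is_valid(row):
    """Assumed validity rule: sales must be present and non-negative."""
    return row["sales"] is not None and row["sales"] >= 0

# Part 1: remove data that do not meet the requirements or are obviously wrong.
cleaned = [row for row in rows if is_valid(row)]

# Part 2: keep only the data that satisfy a specific condition.
selected = [row for row in cleaned if row["sales"] >= 200]

print([row["name"] for row in cleaned])
print([row["name"] for row in selected])
```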

3. Data sorting

Data sorting arranges the data in a certain order so that researchers can spot obvious characteristics or trends by browsing the data and find clues for solving the problem. Sorting also helps in checking and correcting the data, and it provides a basis for regrouping or reclassifying them. In some cases sorting is itself one of the purposes of the analysis. Sorting is easy to do with a computer.

For categorical data made up of letters, the sort can be ascending or descending; ascending order is more common because it matches the natural order of the alphabet. For data in Chinese characters there are several ways to sort. One is by the first letter of the pinyin, which works exactly like sorting alphabetical data. Another is by stroke count, which can likewise be ascending or descending. Alternating between different sorting methods is very useful when checking and correcting Chinese-character data.
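A small sketch of these sorting options follows. The pinyin and stroke-count tables are hand-written for three characters purely for illustration; a real project would take them from a proper dictionary source rather than a hard-coded mapping.

```python
# Hand-written lookup tables for three surnames, for illustration only.
PINYIN = {"丁": "ding", "李": "li", "王": "wang"}
STROKES = {"丁": 2, "王": 4, "李": 7}

# Alphabetical data: ascending and descending order.
letters = ["pear", "apple", "mango"]
letters_ascending = sorted(letters)
letters_descending = sorted(letters, reverse=True)

# Chinese-character data: sort by pinyin or by stroke count.
names = ["王", "李", "丁"]
by_pinyin = sorted(names, key=PINYIN.get)
by_strokes = sorted(names, key=STROKES.get)
by_strokes_descending = sorted(names, key=STROKES.get, reverse=True)

print(letters_ascending, letters_descending)
print(by_pinyin, by_strokes, by_strokes_descending)
```

The two keys give different orders for the same names, which is what makes alternating between them useful for spotting entry errors.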

For numerical data there are only two kinds of sorting: increasing and decreasing. The sorted values are also called order statistics.
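As a minimal sketch with made-up values, sorting numerical data in both directions and reading a few order statistics off the sorted list looks like this:

```python
values = [17, 3, 42, 8, 25]  # hypothetical measurements

ascending = sorted(values)
descending = sorted(values, reverse=True)

# Order statistics are read directly from the ascending list.
minimum = ascending[0]
maximum = ascending[-1]
median = ascending[len(ascending) // 2]  # middle element; list length is odd

print(ascending, descending)
print(minimum, median, maximum)
```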
