In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article introduces the relevant knowledge of "the use of proc sql statements in data cleaning". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!
Libname clean "c:/books/clean"; / * define permanent database * /
* create a new sample data one
Data one
Input X Y Z
Datalines
1 2 3
101 202 303
44 55 66
444 555 666
Title "Values of X from data set ONE where X is greater than 100"
/ * conditional selection of observations * /
Proc sql
Select X
From one
Where X gt 100
Quit
* Program 8-2
* check values of invalid character types
Title "Checking for Invalid Character Data"
Proc sql
Select Patno
Gender
DX
AE
From clean.patients
Where Gender not in ('Mauremen.'') or
Notdigit (trim (DX)) and not missing (DX) or
AE not in ('014, 1', 1')
Quit
* check for invalid numeric values
Title "Checking for out-of-range numeric values"
Proc sql
Select Patno
HR
SBP
DBP
From clean.patients
Where HR not between 40 and 100 and not missing (HR) or
SBP not between 80 and 200 and not missing (SBP) or
DBP not between 60 and 120 and not missing (DBP)
Quit
* use a simple algorithm to check values based on standard deviation
Title "Data values beyond two standard deviations"
Proc sql
Select Patno
SBP
From clean.patients
Having SBP not between mean (SBP)-2 * std (SBP) and
Mean (SBP) + 2 * std (SBP) and
SBP is not missing
Quit
* check for missing values
Options linesize=84
Title "Observations with missing values"
Proc sql
Select *
From clean.patients
Where Patno is missing or
Gender is missing or
Visit is missing or
HR is missing or
SBP is missing or
DBP is missing or
DX is missing or
AE is missing
Quit
* date of inspection
Title "Dates before June 1, 1998 or after October 15, 1999"
Proc sql
Select Patno
Visit
From clean.patients
Where Visit not between '01jun1998'd and' 15oct1999'd and
Visit is not missing
Quit
* check for duplicate values
Title "Duplicate Patient Numbers"
Proc sql
Select Patno
Visit
From clean.patients
Group by Patno
Having count (Patno) gt 1
Quit
* identify variables corresponding to multiple observations
Title "Listing of patients who do not have two visits"
Proc sql
Select Patno
Visit
From clean.patients2
Group by Patno
Having count (Patno) ne 2
Quit
* check the required serial number ID in the two files
Data one
Input Patno X Y
Datalines
1 69 79
2 56.
3 66 99
5 98 87
12 13 14
Data two
Input Patno Z
Datalines
1 56
3 67
4 88
5 98
13 99
* ID that is not contained in both files
Title "Patient numbers not in both files"
Proc sql
Select One.patno as ID_one
Two.patno as ID_two
From one full join two
On One.patno eq Two.patno
Where One.patno is missing or Two.patno is missing
Quit
This is the end of the content of "the application of proc sql statement in data cleaning". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.