Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Application of proc sql statement in data cleaning

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "the use of proc sql statements in data cleaning". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Libname clean "c:/books/clean"; / * define permanent database * /

* create a new sample data one

Data one

Input X Y Z

Datalines

1 2 3

101 202 303

44 55 66

444 555 666

Title "Values of X from data set ONE where X is greater than 100"

/ * conditional selection of observations * /

Proc sql

Select X

From one

Where X gt 100

Quit

* Program 8-2

* check values of invalid character types

Title "Checking for Invalid Character Data"

Proc sql

Select Patno

Gender

DX

AE

From clean.patients

Where Gender not in ('Mauremen.'') or

Notdigit (trim (DX)) and not missing (DX) or

AE not in ('014, 1', 1')

Quit

* check for invalid numeric values

Title "Checking for out-of-range numeric values"

Proc sql

Select Patno

HR

SBP

DBP

From clean.patients

Where HR not between 40 and 100 and not missing (HR) or

SBP not between 80 and 200 and not missing (SBP) or

DBP not between 60 and 120 and not missing (DBP)

Quit

* use a simple algorithm to check values based on standard deviation

Title "Data values beyond two standard deviations"

Proc sql

Select Patno

SBP

From clean.patients

Having SBP not between mean (SBP)-2 * std (SBP) and

Mean (SBP) + 2 * std (SBP) and

SBP is not missing

Quit

* check for missing values

Options linesize=84

Title "Observations with missing values"

Proc sql

Select *

From clean.patients

Where Patno is missing or

Gender is missing or

Visit is missing or

HR is missing or

SBP is missing or

DBP is missing or

DX is missing or

AE is missing

Quit

* date of inspection

Title "Dates before June 1, 1998 or after October 15, 1999"

Proc sql

Select Patno

Visit

From clean.patients

Where Visit not between '01jun1998'd and' 15oct1999'd and

Visit is not missing

Quit

* check for duplicate values

Title "Duplicate Patient Numbers"

Proc sql

Select Patno

Visit

From clean.patients

Group by Patno

Having count (Patno) gt 1

Quit

* identify variables corresponding to multiple observations

Title "Listing of patients who do not have two visits"

Proc sql

Select Patno

Visit

From clean.patients2

Group by Patno

Having count (Patno) ne 2

Quit

* check the required serial number ID in the two files

Data one

Input Patno X Y

Datalines

1 69 79

2 56.

3 66 99

5 98 87

12 13 14

Data two

Input Patno Z

Datalines

1 56

3 67

4 88

5 98

13 99

* ID that is not contained in both files

Title "Patient numbers not in both files"

Proc sql

Select One.patno as ID_one

Two.patno as ID_two

From one full join two

On One.patno eq Two.patno

Where One.patno is missing or Two.patno is missing

Quit

This is the end of the content of "the application of proc sql statement in data cleaning". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report