3. Data Manipulation
3.1 Copying a SAS data set
Assume that data set DATA1 exists and you wish to create new variables, drop variables, subset the data set or perform some other manipulation of it. This requires a new DATA step in which you have to make available the information stored in DATA1 to SAS. There easiest way to do this is with the SET command.
data <<new data set name>>; run; |
example1
data weight3;
set weight; [1]
run;
proc print; [2]
run;
[1] create data set WEIGHT3 by copying all the variables from data set WEIGHT.
[2] print data set Weight3 to the OUT window. You can check if it is same as Weight.
Output of example1
The SAS System 10:14 Thursday, August 29, 2002 2
Obs name Week1 Week16
1 A 195 163
2 B 220 198
3 C 173 155
4 D 135 116
3.2.1. Transformations
There is no need to perform most data manipulations outside of The SAS System. Creation of new variables, calculations with existing variables, subsetting of data, sorting, etc. are best done within SAS.
example2
data weight3;
set weight;
logweek1=log(week1);[3]
rootwk16=sqrt(week16); [4]
nothing=week1*rootwk16;[5]
run;
proc print;run;
[3] create new variable logweek1 which is logarithm of week1.
[4] create new variable rootwk2 which is root of week2.
[5] create new variable nothing which is multiplication of week1 and rootwk2.
Output of example2
The SAS System 10:14 Thursday, August 29, 2002 3
Obs name Week1 Week16 logweek1 rootwk16 nothing
1 A 195 163 5.27300 12.7671 2489.59
2 B 220 198 5.39363 14.0712 3095.67
3 C 173 155 5.15329 12.4499 2153.83
4 D 135 116 4.90527 10.7703 1453.99
*For more information about mathematical, trigonometric, and other functions see the Help Files (go to Help-Extended Help, then select SAS System Help, select SAS Language, select SAS Functions, select Function Categories)
Logical comparisons in SAS return the numeric value 1 if true, 0 otherwise.
example3
data weight4;
set weight;
group=(week1<=150);[6]
run;
proc print;
run;
[6] create new variable group – if week1<=150 group=1, else group=0
output of example3
The SAS System 10:14 Thursday, August 29, 2002 4
Obs name Week1 Week16 group
1 A 195 163 0
2 B 220 198 0
3 C 173 155 0
4 D 135 116 1
example4
data weight4;
set weight;
if week1<=150 then group=1; [7]
else group=0;
run;
proc print; run;
[7] create new variable group – if week1<=150 group=1, else group=0. example3 and example 4 give same output.
3.3 Dropping and keeping variables from a data set
Variables are dropped from a data set with the DROP statement or the DROP= option in DATA step. To drop the variables week16 from the weight data the two syntax constructs are
example5
data weight5;
set weight;
drop week16; [8]
run;
proc print; run;
[8] drop variable week16 in the data set weight5.
Output of example5
The SAS System 10:14 Thursday, August 29, 2002 5
Obs name Week1
1 A 195
2 B 220
3 C 173
4 D 135
example6
data weight5;
set weight(drop=week16); [9]
run;
[9] drop week16, it gives same output as example5.
The complementary DATA step command and data set option to DROP (DROP=) are the KEEP statement and KEEP= option. Only the variables listed after KEEP (KEEP=) are being kept in the data set. All others are eliminated.
example7
data weight6; set weight(keep=Name Week1); run; proc print; run;
data weight6; set weight; keep name week1; run; proc print; run;
output of example7
The SAS System 10:14 Thursday, August 29, 2002 6
Obs name Week1
1 A 195
2 B 220
3 C 173
4 D 135
3.4. Dropping observations from a data set (Subsetting data)
Dropping observations (subsetting data) means to retain only those observations that satisfy a certain conditions. This is accomplished with IF and WHERE statements as well as the WHERE= data set option
example8
data weight;
set weight;
if week1>150; [10]
run; proc print; run;
example9;
data weight;
set weight;
if week1<=150 then delete; [10]
run; proc print; run;
[10] keep the observations whose week1 value is greater than 150.
Output of example8 and example9
The SAS System 10:14 Thursday, August 29, 2002 7
Obs name Week1 Week16
1 A 195 163
2 B 220 198
3 C 173 155
3.5. Setting and merging multiple data sets
Setting
data sets means concatenating their contents vertically. Merging data means combining two or more data sets horizontally.Data << new data set name>>; Set << dataset1 dataset2>>; Run; Data <<new dataset name>>; Merge <<dataset1 dataset2>>; Run; |
An example will make this clearer.
example10
data one;
input group age @@;
cards;
1 20 1 30 1 42 2 25 2 39 2 29
; run;
data two;
input group age @@;
cards;
1 33 1 41 2 21 2 26
;run;
data three; set one two; run;
proc print data=three; run;
output of example10
Obs group age
1 1 20
2 1 30
3 1 42
4 2 25
5 2 39
6 2 29
7 1 33
8 1 41
9 2 21
10 2 26
example11
Data one;
input id sex $ age;
cards;
1 F 35 17
input id x y;
cards;
1 12 50
17 11 43
33 19 33
49 22 42
65 16 45
;
data three; merge one two; run;
proc print data=three; run;
output of example11
The SAS System 10:14 Thursday, August 29, 2002 9
Obs id sex age x y
1 1 F 35 12 50
2 17 M 50 11 43
3 33 F 45 19 33
4 49 M 24 22 42
5 65 F 52 16 45