3. Data Manipulation

3.1 Copying a SAS data set

Assume that data set DATA1 exists and you wish to create new variables, drop variables, subset the data set or perform some other manipulation of it. This requires a new DATA step in which you have to make available the information stored in DATA1 to SAS. There easiest way to do this is with the SET command.

data <<new data set name>>;
   set DATA1;
  << more statements>>
run;

example1

data weight3;

set weight; [1]

run;

proc print; [2]

run;

[1] create data set WEIGHT3 by copying all the variables from data set WEIGHT.

[2] print data set Weight3 to the OUT window. You can check if it is same as Weight.

Output of example1

The SAS System 10:14 Thursday, August 29, 2002 2

Obs name Week1 Week16

1 A 195 163

2 B 220 198

3 C 173 155

4 D 135 116

 

3.2. Creating new variables
3.2.1. Transformations

There is no need to perform most data manipulations outside of The SAS System. Creation of new variables, calculations with existing variables, subsetting of data, sorting, etc. are best done within SAS.

example2

data weight3;

set weight;

logweek1=log(week1);[3]

rootwk16=sqrt(week16); [4]

nothing=week1*rootwk16;[5]

run;

proc print;run;

[3] create new variable logweek1 which is logarithm of week1.

[4] create new variable rootwk2 which is root of week2.

[5] create new variable nothing which is multiplication of week1 and rootwk2.

Output of example2

The SAS System 10:14 Thursday, August 29, 2002 3

Obs name Week1 Week16 logweek1 rootwk16 nothing

1 A 195 163 5.27300 12.7671 2489.59

2 B 220 198 5.39363 14.0712 3095.67

3 C 173 155 5.15329 12.4499 2153.83

4 D 135 116 4.90527 10.7703 1453.99

*For more information about mathematical, trigonometric, and other functions see the Help Files (go to Help-Extended Help, then select SAS System Help, select SAS Language, select SAS Functions, select Function Categories)

 

3.2.2. Logical expressions

Logical comparisons in SAS return the numeric value 1 if true, 0 otherwise.

example3

data weight4;

set weight;

group=(week1<=150);[6]

run;

proc print;

run;

[6] create new variable group – if week1<=150 group=1, else group=0

output of example3

The SAS System 10:14 Thursday, August 29, 2002 4

Obs name Week1 Week16 group

1 A 195 163 0

2 B 220 198 0

3 C 173 155 0

4 D 135 116 1

example4

data weight4;

set weight;

if week1<=150 then group=1; [7]

else group=0;

run;

proc print; run;

[7] create new variable group – if week1<=150 group=1, else group=0. example3 and example 4 give same output.

 

3.3 Dropping and keeping variables from a data set

Variables are dropped from a data set with the DROP statement or the DROP= option in DATA step. To drop the variables week16 from the weight data the two syntax constructs are

example5

data weight5;

set weight;
   drop week16; [8]
run;

proc print; run;

[8] drop variable week16 in the data set weight5.

Output of example5

The SAS System 10:14 Thursday, August 29, 2002 5

Obs name Week1

1 A 195

2 B 220

3 C 173

4 D 135

example6

data weight5;

set weight(drop=week16); [9]
run;

[9] drop week16, it gives same output as example5.

The complementary DATA step command and data set option to DROP (DROP=) are the KEEP statement and KEEP= option. Only the variables listed after KEEP (KEEP=) are being kept in the data set. All others are eliminated.

example7

data weight6; set weight(keep=Name Week1); run; proc print; run;

data weight6; set weight; keep name week1; run; proc print; run;

output of example7

The SAS System 10:14 Thursday, August 29, 2002 6

Obs name Week1

1 A 195

2 B 220

3 C 173

4 D 135

3.4. Dropping observations from a data set (Subsetting data)

Dropping observations (subsetting data) means to retain only those observations that satisfy a certain conditions. This is accomplished with IF and WHERE statements as well as the WHERE= data set option

example8

data weight;

set weight;

if week1>150; [10]

run; proc print; run;

example9;

data weight;

set weight;

if week1<=150 then delete; [10]

run; proc print; run;

[10] keep the observations whose week1 value is greater than 150.

Output of example8 and example9

The SAS System 10:14 Thursday, August 29, 2002 7

Obs name Week1 Week16

1 A 195 163

2 B 220 198

3 C 173 155

3.5. Setting and merging multiple data sets

Setting data sets means concatenating their contents vertically. Merging data means combining two or more data sets horizontally.

Data << new data set name>>;

Set << dataset1 dataset2>>;

Run;

Data <<new dataset name>>;

Merge <<dataset1 dataset2>>;

Run;

An example will make this clearer.

example10

data one;

input group age @@;

cards;

1 20 1 30 1 42 2 25 2 39 2 29

; run;

data two;

input group age @@;

cards;

1 33 1 41 2 21 2 26

;run;

data three; set one two; run;

proc print data=three; run;

output of example10

Obs group age

1 1 20

2 1 30

3 1 42

4 2 25

5 2 39

6 2 29

7 1 33

8 1 41

9 2 21

10 2 26

example11

Data one;

input id sex $ age;

cards;

 1  F  35 17
17  M  50 14
33  F  45  6
49  M  24 14
65  F  52  9
;
data two;

input id x y;

cards;

1 12 50

17 11 43

33 19 33

49 22 42

65 16 45

;

data three; merge one two; run;

proc print data=three; run;

output of example11

The SAS System 10:14 Thursday, August 29, 2002 9

Obs id sex age x y

1 1 F 35 12 50

2 17 M 50 11 43

3 33 F 45 19 33

4 49 M 24 22 42

5 65 F 52 16 45