stata - How do I remove one factor level in R?
I need to drop some rows from a data frame in R. The data has a factor column with 18 levels:
- agriculture
- fisheries ...
- unclassified
I need to remove level #18 before creating dummy variables of the form "person x works in industry y". That is, I need to keep only the first 17 levels (the classified ones).
In Stata I would remove the level with:
drop if rama1 == 99
(rama1 is the factor column, and 99 is the code for "unclassified".)
Then, to create the dummies in Stata (one binary variable per industry), I run:
quietly tabulate rama1, generate(rama1_)
The equivalent in R is:
for(i in unique(data$rama1)) { data[paste("type", i, sep="")] <- ifelse(data$rama1 == i, 1, 0) }
Any ideas are highly welcome.
To remove the level, either of the approaches by bondeddust or jlhoward works fine. For creating the dummy variables, it depends on what you want and how you want them formulated.
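Those two answers are not reproduced here. As a minimal sketch of one common way to drop a level (not necessarily what either answer did), you can subset away the unwanted rows and then call droplevels(). The toy data frame and its rama1 labels are made up for illustration:

```r
# Toy data: a hypothetical stand-in for the asker's data frame
data <- data.frame(
  id = 1:6,
  rama1 = factor(c("agriculture", "fisheries", "unclassified",
                   "agriculture", "unclassified", "fisheries"))
)

# Keep only the classified rows (the analogue of Stata's `drop if`)
data <- data[data$rama1 != "unclassified", ]

# Subsetting leaves the now-empty level attached to the factor;
# droplevels() removes levels that no longer occur
data <- droplevels(data)

levels(data$rama1)
nrow(data)
```

Without the droplevels() step, the loop in the question would still be fine (it iterates over unique values), but functions that work from levels(), such as model.matrix, would still create a column for the empty level.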
For example, once the level is removed, do you want those rows to show <NA> or 0?
Base R
The easiest way is to use model.matrix in base R. Building on the example from bondeddust:
df <- data.frame(x = as.factor(sample(letters[1:5], 100, replace = TRUE)), y = 1:100)
# Set the "e" rows to NA, then rebuild the factor to drop the unused level
is.na(df$x) <- df$x == "e"
df$x <- factor(df$x)
This yields:
> head(df)
     x y
1    d 1
2    c 2
3    a 3
4 <NA> 4
5    d 5
6    a 6
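Then you can run model.matrix to get dummy variables for the factor. Note that with the default na.action (na.omit), rows where x is NA are dropped rather than set to 0, which is why rows 4, 7, and 10 are missing from the output below.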
> model.matrix(~x, df)
   (Intercept) xb xc xd
1            1  0  0  1
2            1  0  1  0
3            1  0  0  0
5            1  0  0  1
6            1  0  0  0
8            1  1  0  0
9            1  0  0  0
11           1  0  0  0
12           1  0  1  0
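If the goal is one 0/1 column per remaining level, with the removed rows kept as all zeros (closer to what Stata's tabulate ... generate() produces in the question), one possible sketch is to pass the NAs through model.frame and drop the intercept. This is an illustration under those assumptions rather than part of the original answer, and the small df here is made up:

```r
# Small made-up example in which "e" has already been turned into NA
df <- data.frame(x = factor(c("d", "c", "a", NA, "d", "a", NA, "b")), y = 1:8)

# "- 1" removes the intercept, so every level gets its own indicator column
f <- ~ x - 1

# na.pass keeps the NA rows instead of dropping them (the default is na.omit)
mf <- model.frame(f, df, na.action = na.pass)
mm <- model.matrix(f, mf)

# Rows with a missing factor come back as NA; recode them to 0
mm[is.na(mm)] <- 0

dim(mm)
colnames(mm)
```

If you would rather keep the NAs (the first option mentioned above), simply skip the recoding step.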
caret
An alternative is to use the caret package, which may give you more power when applying these factors and relevelings across test/holdout sets.
It contains the dummyVars function for this.
> xx <- dummyVars(~x, df)
> predict(xx, df)
  x.a x.b x.c x.d
1   0   0   0   1
2   0   0   1   0
3   1   0   0   0
4  NA  NA  NA  NA
5   0   0   0   1
6   1   0   0   0
7  NA  NA  NA  NA