R - How to check the consistency of specific variables for different IDs recorded repeatedly in a data frame
I would like to know how I can detect whether different individuals captured repeatedly have the same value in specific variables across the different measures.
Specifically, I have repeated measures of individuals (column id) with values of different variables over time (e.g. sex, weight).
I want to check whether each individual has been assigned the same sex every time, taking the last measure as the reference because that measure is the most reliable.
Later, I want to store every row (register) that does not match the reference in one data frame.
    id <- c("1", "2", "3", "1", "2", "3", "1", "2", "3")
    sex <- c("m", "f", "m", "m", "m", "m", "f", "f", "m")
    weight <- c(20, 15, 30, 22, 18, 32, 26, 21, 36)
    time <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
    df <- data.frame(id, sex, weight, time)
    df
To do that, I have selected the last register of each id:
    library(data.table)
    dt <- as.data.table(df)
    # keep the last row (.N) of each id
    dt_last_register <- dt[, .SD[.N], by = id]
    dt_last_register
Then I want to create a loop that, for each id, selects the registers that do not match the reference, storing these registers in a new data frame (e.g. df_no_match):
    # create a vector of ids
    id_vector <- unique(df$id)
    # create the loop
    for (i in seq_along(id_vector)) {
      x <- id_vector[i]
      df_subset <- subset(df, id == x) # select the registers of one individual
      ...
      ...
    }
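I don't know how to continue from this step, that is, how to check the registers of each individual. Does anyone know how to do it?
Finally, I want to change the values of the variable sex in the registers that haven't matched the reference, and store the database with these changes in a new data frame, e.g. df_final: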
    id <- c("1", "2", "3", "1", "2", "3", "1", "2", "3")
    sex <- c("f", "f", "m", "f", "f", "m", "f", "f", "m")
    weight <- c(20, 15, 30, 22, 18, 32, 26, 21, 36)
    time <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
    df_final <- data.frame(id, sex, weight, time)
    df_final
Thanks in advance.
I'm not 100% clear on the goal, but this seems to work. The key is self-merging the data.table.
    library(data.table)
    setDT(df)
    # get the gender of the final observation for each id
    df[df[, sex[.N], by = id], recent_sex := (i.V1), on = "id"]
    # find whether there are any mismatches within each id
    df[, mismatch := any(recent_sex != sex), by = id]
    # overwrite the erroneous genders
    df[, sex_new := recent_sex]
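To see what the self-merge is doing, it may help to build the lookup table as a separate step. This is only a sketch on the question's example data; the names `ref` and `last_sex` are illustrative and not part of the original answer, and it assumes the rows are already ordered by time within each id.

```r
library(data.table)

# Example data from the question
df <- data.table(
  id     = c("1", "2", "3", "1", "2", "3", "1", "2", "3"),
  sex    = c("m", "f", "m", "m", "m", "m", "f", "f", "m"),
  weight = c(20, 15, 30, 22, 18, 32, 26, 21, 36),
  time   = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# Lookup table (illustrative name): one row per id, holding the sex
# recorded in that id's last row
ref <- df[, .(last_sex = sex[.N]), by = id]

# Update join: for every row of df whose id matches a row of ref,
# copy last_sex into a new column; the i. prefix refers to columns of ref
df[ref, recent_sex := i.last_sex, on = "id"]
```

Naming the column in the lookup table avoids relying on the automatic V1 name, which some readers find easier to follow.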
If you want to separate the mismatched observations, you can do:
    df_mismatches <- df[(mismatch)]
(Note that the parentheses are necessary to force [.data.table to interpret mismatch as a logical vector; otherwise it expects mismatch to be a data.table with which we're merging df.)
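Because mismatch is computed with any() per id, it is TRUE for every row of an id that has at least one inconsistent record. If only the individual rows that disagree with the reference are wanted (the df_no_match of the question), a row-level flag might be used instead. The following is a minimal sketch on the question's data; the column name `row_mismatch` is hypothetical, and the reference is taken as the sex at the largest time value, which is an assumption about what "last measure" means.

```r
library(data.table)

# Example data from the question
df <- data.table(
  id     = c("1", "2", "3", "1", "2", "3", "1", "2", "3"),
  sex    = c("m", "f", "m", "m", "m", "m", "f", "f", "m"),
  weight = c(20, 15, 30, 22, 18, 32, 26, 21, 36),
  time   = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# Reference value: the sex recorded at the latest time point of each id
df[, recent_sex := sex[which.max(time)], by = id]

# Row-level flag (hypothetical name): TRUE only where this row disagrees
df[, row_mismatch := sex != recent_sex]

# Only the rows that do not match the reference
df_no_match <- df[(row_mismatch)]

# Corrected copy with the original four columns
df_final <- df[, .(id, sex = recent_sex, weight, time)]
```

With the example data, df_no_match holds the two early records of id 1 and the time-2 record of id 2, and df_final has the same sex column as the df_final shown in the question.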