How to Cluster Sequential Categorical Data in R
Consider a data set in which users can choose among 3 activities, and I have data on each user's choice for their first 10 activities. Example data:
    for (i in 1:10) {
      # Draw 1000 activities (one per user) from 3 options with fixed probabilities
      x <- sample(c("a", "b", "c"), 1000, replace = TRUE, prob = c(0.5, 0.3, 0.2))
      # Create the variables cat1 ... cat10 on the fly
      assign(paste("cat", i, sep = ""), x)
    }
    # One row per user, one column per activity position
    first10 <- data.frame(cat1, cat2, cat3, cat4, cat5, cat6, cat7, cat8, cat9, cat10)
What's the best approach in R to cluster users according to their activity sequence?
I've looked around on Stack Overflow, and similar questions ask how to cluster categorical data in R (which is part of this analysis), but that in and of itself doesn't account for the sequential nature of the data. Are there R packages well-suited to this analysis?
Look at frequent itemset mining instead of clustering.
Most clustering methods are designed for continuous numerical data, and assume a vector field. They take every position into account.
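To see why position-wise comparison can be a poor fit here, consider a minimal sketch. It assumes a simple Hamming-style dissimilarity (the share of positions that differ), which is only one of many possible choices; the two users `u1` and `u2` and the helper `hamming` are hypothetical and not part of the question's data.

```r
# Two hypothetical users who follow the same a -> b -> c cycle,
# but the second user starts one step later
u1 <- c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a")
u2 <- c("b", "c", "a", "b", "c", "a", "b", "c", "a", "b")

# Position-wise (Hamming-style) dissimilarity: share of positions that differ
hamming <- function(x, y) mean(x != y)

# Every position differs, so the two users look maximally dissimilar
shifted_dissimilarity <- hamming(u1, u2)
```

Although both users show the same behaviour, a measure that compares position 1 with position 1, position 2 with position 2, and so on treats them as completely different.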
A frequent pattern, however, may be only part of a sequence, a sequence may exhibit multiple (or none) of these patterns, and patterns may have gaps in between. All of these properties are desirable.
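As a rough illustration of this idea, the following base-R sketch counts how many users exhibit each ordered pattern of three activities, allowing gaps. It is a simplified stand-in for real sequential pattern mining, not a full algorithm: the pattern length of 3, the helper `support`, the seed, and the 0.5 minimum-support threshold are all assumptions made for the example.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Simulated data in the same shape as the question: 1000 users x 10 steps
activities <- c("a", "b", "c")
first10 <- as.data.frame(
  replicate(10, sample(activities, 1000, replace = TRUE, prob = c(0.5, 0.3, 0.2))),
  stringsAsFactors = FALSE
)
names(first10) <- paste0("cat", 1:10)

# One string per user, e.g. "abacbaacab"
seqs <- apply(first10, 1, paste, collapse = "")

# All 27 candidate patterns of length 3 over the alphabet
grid <- expand.grid(activities, activities, activities, stringsAsFactors = FALSE)
patterns <- do.call(paste0, grid)

# Support = share of sequences that contain the pattern's activities in order,
# with any number of other activities allowed in between (gaps)
support <- function(pattern, seqs) {
  regex <- paste(strsplit(pattern, "")[[1]], collapse = ".*")
  mean(grepl(regex, seqs))
}

supp <- vapply(patterns, support, numeric(1), seqs = seqs)
supp <- sort(setNames(supp, patterns), decreasing = TRUE)

# Keep patterns shown by at least half of the users (hypothetical threshold)
frequent <- supp[supp >= 0.5]
```

Each user can then be described by which frequent patterns they exhibit, and a user may match several patterns or none. For real workloads, a dedicated package such as arulesSequences (its `cspade` function) is likely a better fit than this brute-force enumeration.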