R: fastest way to check presence of each element of a vector in each of the columns of a matrix


I have an integer vector a:

    a = function(l) as.integer(runif(l,1,600))
    a(100)
      [1] 414 476   6  58  74  76  45 359 482 340 103 575 494 323  74 347 157 503 385 518 547 192 149 222 152  67 497 588 388 140 457 429 353
     [34] 484  91 310 394 122 302 158 405  43 300 439 173 375 218 357  98 196 260 588 499 230  22 369  36 291 221 358 296 206  96 439 423 281
     [67] 581 127 178 330 403  91 297 341 280 164 442 114 234  36 257 307 320 307 222  53 327 394 467 480 323  97 109 564 258   2 355 253 596
    [100] 215

and an integer matrix b:

    b = function(c) matrix(as.integer(runif(5*c,1,600)),nrow=5)
    b(10)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]  250  411  181  345    4  519  167  395  130   388
    [2,]  383  377  555  304  119  317  586  351  136   528
    [3,]  238  262  513  476  579  145  461  191  262   302
    [4,]  428  467  217  590   50  171  450  189  140   158
    [5,]  178   14   31  148  285  365  515   64  166   584

and I want to make a new boolean l x c matrix that shows whether or not each element of vector a is present in each specific column of matrix b.
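To make the target concrete, here is a minimal sketch on a tiny hand-made input. The names a_small, b_small and want are made up for illustration and are not my real data. Entry [i, j] of the result should be TRUE exactly when a_small[i] occurs somewhere in column j of b_small.

```r
# Tiny illustrative input (hypothetical values, not the real data)
a_small <- c(1L, 5L, 9L)
b_small <- matrix(c(1L, 2L,
                    5L, 9L,
                    3L, 4L), nrow = 2)   # columns are (1,2), (5,9), (3,4)

# Naive reference: test every (element, column) pair explicitly
want <- matrix(FALSE, nrow = length(a_small), ncol = ncol(b_small))
for (i in seq_along(a_small)) {
  for (j in seq_len(ncol(b_small))) {
    want[i, j] <- any(b_small[, j] == a_small[i])
  }
}
want
```

This double loop is only a readable reference for small inputs, not a candidate for the real problem size.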

I tried with

    ispresent1 = function (a,b) {
      out = outer(a, b, FUN = "==" )
      apply(out,c(1,3),FUN="any")
    }

or with

    ispresent2 = function (a,b)
      t(sapply(1:length(a), function(i) apply(b,2,function(x) a[[i]] %in% x)))

but neither of these ways is fast:

    a1 = a(1000)
    b1 = b(20000)
    system.time(ispresent1(a1,b1))
       user  system elapsed
      76.63    1.08   77.84
    system.time(ispresent2(a1,b1))
       user  system elapsed
     218.10    0.00  230.00

(In my application, matrix b has 500,000 to 2 million columns.)
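A rough back-of-the-envelope sketch of why the outer() approach is unlikely to scale to that size: it materialises an l x 5 x c logical array, and R stores each logical in 4 bytes. The helper name outer_bytes is made up for illustration, and keeping l = 1000 for the 2 million column case is an assumption.

```r
# Approximate size in bytes of the intermediate array built by outer(a, b, "==")
# (hypothetical helper; ignores the small fixed object header)
outer_bytes <- function(l, c, rows = 5) l * rows * c * 4

outer_bytes(1000, 20000) / 1e6   # benchmark size above, in MB
outer_bytes(1000, 2e6) / 1e9     # assumed l = 1000 with 2 million columns, in GB
```

Under these assumptions the intermediate array alone goes from hundreds of megabytes at the benchmark size to tens of gigabytes at the application size, before apply() does any work.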

This is probably trivial, but what is the proper way to do this?

Edit: the proper syntax, as mentioned below, is ispresent = function (a,b) apply(b,2,function(x) { a %in% x } ), but the Rcpp solution below is still 2 times faster! Thanks for this!
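A minimal sketch of that column-wise version, cross-checked against the outer() formulation on a small random input. The names a_test, b_test, res and ref, the input sizes and the seed are arbitrary choices for illustration.

```r
# Column-wise membership test: one vectorised %in% call per column of b
ispresent <- function(a, b) apply(b, 2, function(x) a %in% x)

set.seed(1)  # arbitrary seed, only so this sketch is reproducible
a_test <- as.integer(runif(50, 1, 600))
b_test <- matrix(as.integer(runif(5 * 200, 1, 600)), nrow = 5)

res <- ispresent(a_test, b_test)
dim(res)     # length(a_test) rows, ncol(b_test) columns

# Cross-check against the outer() formulation on the same small input
ref <- apply(outer(a_test, b_test, FUN = "=="), c(1, 3), any)
all(res == ref)
```

Note that apply() only returns an l x c matrix here because a has more than one element; with a single-element a it would return a plain vector.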

Rcpp is awesome for problems like this. It is quite possible there is a way to do it with data.table or an existing function, but with the inline package it takes less time to write it than to find out.

    require(inline)

    ispresent.cpp <- cxxfunction(signature(a="integer", b="integer"),
                                 plugin="Rcpp", body='
        IntegerVector av(a);
        IntegerMatrix bm(b);
        int i,j,k;
        LogicalMatrix out(av.size(), bm.ncol());
        for(i = 0; i < av.size(); i++){
            for(j = 0; j < bm.ncol(); j++){
                // scan column j, stopping at the first match
                for(k = 0; k < bm.nrow() && av[i] != bm(k, j); k++);
                // k stopped early only if a match was found
                if(k < bm.nrow()) out(i, j) = true;
            }
        }
        return(out);
    ')

    set.seed(123)
    a1 <- a(1000)
    b1 <- b(20000)
    system.time(res.cpp <- ispresent.cpp(a1, b1))
       user  system elapsed
      0.442   0.005   0.446
    res1 <- ispresent1(a1,b1)
    identical(res1, res.cpp)
    [1] TRUE
