R: Fastest way to check presence of each element of a vector in each of the columns of a matrix
I have an integer vector a:
    a = function(l) as.integer(runif(l,1,600))
    a(100)
      [1] 414 476 6 58 74 76 45 359 482 340 103 575 494 323 74 347 157 503 385 518 547 192 149 222 152 67 497 588 388 140 457 429 353
     [34] 484 91 310 394 122 302 158 405 43 300 439 173 375 218 357 98 196 260 588 499 230 22 369 36 291 221 358 296 206 96 439 423 281
     [67] 581 127 178 330 403 91 297 341 280 164 442 114 234 36 257 307 320 307 222 53 327 394 467 480 323 97 109 564 258 2 355 253 596
    [100] 215
and an integer matrix b:
    b = function(c) matrix(as.integer(runif(5*c,1,600)),nrow=5)
    b(10)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]  250  411  181  345    4  519  167  395  130   388
    [2,]  383  377  555  304  119  317  586  351  136   528
    [3,]  238  262  513  476  579  145  461  191  262   302
    [4,]  428  467  217  590   50  171  450  189  140   158
    [5,]  178   14   31  148  285  365  515   64  166   584
I want to build a new boolean l x c matrix (l is the length of a, c is the number of columns of b) that shows whether or not each element of vector a is present in each specific column of matrix b.
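I tried this with
    ispresent1 = function (a,b) {
        # out is an l x 5 x c array of element-wise comparisons
        out = outer(a, b, FUN = "==")
        # collapse the row dimension of b: TRUE if any row in that column matches
        apply(out, c(1,3), FUN = "any")
    }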
or with
    ispresent2 = function (a,b)
        t(sapply(1:length(a), function(i) apply(b, 2, function(x) a[[i]] %in% x)))
but neither of these approaches is fast:
    a1 = a(1000)
    b1 = b(20000)
    system.time(ispresent1(a1,b1))
       user  system elapsed
      76.63    1.08   77.84
    system.time(ispresent2(a1,b1))
       user  system elapsed
     218.10    0.00  230.00
(In my application, matrix b will have 500,000 to 2 million columns.)
This is probably trivial, but what is the proper way to do this?
Edit: the proper syntax, as mentioned below, is
    ispresent = function (a,b) apply(b, 2, function(x) { a %in% x })
but the Rcpp solution below is still 2 times faster! Thanks for this!
Rcpp is awesome for problems like this. It is quite possible there is a way to do it with data.table or with an existing function, but with the inline package it takes less time to write it than to find out.
    require(inline)
    ispresent.cpp <- cxxfunction(signature(a="integer", b="integer"),
                                 plugin="Rcpp", body='
        IntegerVector av(a);
        IntegerMatrix bm(b);
        int i, j, k;
        // LogicalMatrix starts out all FALSE
        LogicalMatrix out(av.size(), bm.ncol());
        for(i = 0; i < av.size(); i++){
            for(j = 0; j < bm.ncol(); j++){
                // scan column j and stop at the first row that matches av[i]
                for(k = 0; k < bm.nrow() && av[i] != bm(k, j); k++);
                // k stopped early, so a match was found
                if(k < bm.nrow()) out(i, j) = true;
            }
        }
        return(out);
    ')
    set.seed(123)
    a1 <- a(1000)
    b1 <- b(20000)
    system.time(res.cpp <- ispresent.cpp(a1, b1))
       user  system elapsed
      0.442   0.005   0.446
    res1 <- ispresent1(a1,b1)
    identical(res1, res.cpp)
    [1] TRUE
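The same early-exit scan can also be sketched in plain C++ without Rcpp, which makes the memory layout explicit. This is only an illustrative sketch under stated assumptions, not part of the original answer: the function name is_present is hypothetical, b is assumed to be passed as a flat vector in column-major order (the way R stores matrices) together with its row count nrow, and the result is returned as a flat column-major l x c vector. The loops here run over columns first so that each column of b is read contiguously; whether that ordering helps in practice would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not from the original post).
// a    : values to look up (length l)
// b    : matrix entries in column-major order, nrow rows per column
// nrow : number of rows of b
// Returns a flat column-major l x c vector: out[i + j * l] is true
// when a[i] occurs somewhere in column j of b.
std::vector<bool> is_present(const std::vector<int>& a,
                             const std::vector<int>& b,
                             std::size_t nrow) {
    // Precondition: b holds a whole number of columns.
    assert(nrow > 0 && b.size() % nrow == 0);

    const std::size_t l = a.size();
    const std::size_t ncol = b.size() / nrow;
    std::vector<bool> out(l * ncol, false);

    for (std::size_t j = 0; j < ncol; ++j) {
        const int* col = b.data() + j * nrow;  // start of column j
        for (std::size_t i = 0; i < l; ++i) {
            // Scan the column and stop at the first match.
            std::size_t k = 0;
            while (k < nrow && col[k] != a[i]) ++k;
            if (k < nrow) out[i + j * l] = true;
        }
    }
    return out;
}
```

For example, with a = {3, 7, 9} and a 2 x 2 matrix whose columns are {1, 3} and {7, 8}, the result marks 3 as present only in the first column, 7 only in the second, and 9 in neither.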