In R, apply() is NOT faster than a loop!!
I don't know where I picked up the idea that apply() is faster than a loop in R. For a long time, I thought that apply() runs a function (for example, mean()) on every row or column of a data structure in a single shot, so the means of all rows are computed at the same time, whereas a loop does the work n.row or n.column times. So, assuming a single shot of apply() does not take much more computing power than one loop iteration, I thought apply() should be up to n.row or n.column times faster than a loop.
But I am WRONG! In R, apply() is NOT faster than a loop!!
Here is a small simulation I did to test the speed of the two approaches. Over 1000 simulations, apply() on average consumes about twice as much system time as a loop does. I thought maybe it was because my apply() code and my loop code were not equivalent, so I did some searching. Here is the result.
According to Professor Brian Ripley, "apply() is just a wrapper for a loop." The only advantage of using apply() is that it makes your code neater!
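To make the "wrapper for a loop" point concrete, here is a rough sketch, not the original simulation. The matrix `m` and the helper names `row_means_apply` and `row_means_loop` are made up for illustration, and the timings will depend on the machine and the R version.

```r
# Hypothetical test data: a 1000 x 100 matrix of random normals.
set.seed(1)
m <- matrix(rnorm(1000 * 100), nrow = 1000)

# Row means via apply().
row_means_apply <- function(x) apply(x, 1, mean)

# Row means via an explicit loop over the rows.
row_means_loop <- function(x) {
  out <- numeric(nrow(x))
  for (i in seq_len(nrow(x))) {
    out[i] <- mean(x[i, ])
  }
  out
}

# Both approaches compute the same values.
stopifnot(isTRUE(all.equal(row_means_apply(m), row_means_loop(m))))

# Time 100 repetitions of each; exact numbers depend on the machine.
print(system.time(for (k in 1:100) row_means_apply(m)))
print(system.time(for (k in 1:100) row_means_loop(m)))
```

If apply() really did all rows "in a single shot", the first timing would be far smaller than the second. Since it loops internally, the two should usually land in the same ballpark.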
Bummer! I had it wrong for a long time.
But if you try this, you will find that sapply() runs it a little faster.
> play <- function(x){
+ return(exp(x)+sin(x)-rnorm(1))
+ }
> b <- rnorm(1000)
> time <- proc.time()
> temp <- sapply(b,FUN="play")
> print(proc.time()-time)
user system elapsed
0.032 0.000 0.208
> time <- proc.time()
> temp <- numeric(length(b))
> for(i in 1:length(b)){
+ temp[i] <- play(b[i])
+ }
> print(proc.time()-time)
user system elapsed
0.048 0.000 0.519
I have compared apply() with sapply(), and sapply() is no faster than apply(). For your example, maybe you should run a Monte Carlo test (see my example) rather than rely on a single timing. My guess is that a loop is no slower than apply(), if not faster.
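A repeated-timing test along those lines might look like the sketch below. It reuses the `play` function from the comment above; the wrapper names `run_sapply`, `run_loop`, `time_once` and the repetition count `n_rep` are assumptions for illustration, not part of the original scripts.

```r
# Same function as in the comment above.
play <- function(x) exp(x) + sin(x) - rnorm(1)

b <- rnorm(1000)

# sapply() version.
run_sapply <- function(x) sapply(x, play)

# Explicit loop version with a preallocated result vector.
run_loop <- function(x) {
  temp <- numeric(length(x))
  for (i in seq_along(x)) {
    temp[i] <- play(x[i])
  }
  temp
}

# Elapsed time of one call, in seconds.
time_once <- function(f, x) system.time(f(x))[["elapsed"]]

# Assumed repetition count; raise it for steadier averages.
n_rep <- 200
t_sapply <- replicate(n_rep, time_once(run_sapply, b))
t_loop <- replicate(n_rep, time_once(run_loop, b))

# Average elapsed time per call for each approach.
print(c(sapply = mean(t_sapply), loop = mean(t_loop)))
```

Averaging over many runs smooths out the noise that makes a single proc.time() difference hard to trust, especially for the elapsed column.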
A more comprehensive study is done here.
I'm very interested in the problem. Could you please give me the link to what Professor Brian Ripley said in your original post? Thank you.
I don't know why, but I feel your examples are not comparable.
I revised your code as:
Now the loop is faster than apply().
As for Prof. Ripley's comment, see my original post and look for "Here is the result." The word "result" is a hyperlink.
I really wonder how applytest2.R can run correctly and generate any results without the "temp" variable being defined in FUN.loop.
Besides, please note that length(b) = 10,000, but your for loop runs only 1,000 iterations.
The performance will be at least equal after the update.
> play <- function(a){
+ exp(a)+sin(a)+rnorm(1)
+ }
> b <- rnorm(10000)
> FUN.apply <- function(x) sapply(x, FUN="play")
> FUN.loop <- function(x){
+ temp <- numeric(length(x))
+ for(j in 1:10000){
+ temp[j] <- play(x[j])
+ }
+ return(temp)
+ }
> system.time(FUN.apply(b))
user system elapsed
0.184 0.000 0.182
> system.time(FUN.loop(b))
user system elapsed
0.184 0.000 0.202
Oops, my fault!
But at least we have reached the conclusion that a loop is no slower than apply(), if not faster.
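The iteration-count mismatch pointed out above (1,000 loop iterations over a vector of length 10,000) is easy to reproduce. The sketch below is only an illustration: `loop_fixed` and `loop_full` are hypothetical names, and neither is the code from applytest2.R.

```r
play <- function(a) exp(a) + sin(a) + rnorm(1)
b <- rnorm(10000)

# Loop bound hard-coded at 1000: only the first 1000 slots are filled,
# so the loop does a tenth of the work that sapply(b, play) does.
loop_fixed <- function(x) {
  temp <- numeric(length(x))
  for (j in 1:1000) {
    temp[j] <- play(x[j])
  }
  temp
}

# Loop bound derived from the input, so it always matches length(x).
loop_full <- function(x) {
  temp <- numeric(length(x))
  for (j in seq_along(x)) {
    temp[j] <- play(x[j])
  }
  temp
}

# Count how many slots each version actually filled.
print(sum(loop_fixed(b) != 0))
print(sum(loop_full(b) != 0))
```

Using seq_along(x) instead of a literal bound keeps the loop and the sapply() call doing the same amount of work, which is what a fair timing comparison needs.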
There was a time when the apply family of functions was a lot faster, but that's no longer true. These days, performance is usually equivalent. Me, I use them no matter what, since the code is a lot cleaner.
That said, in the special case where apply works (lapply almost always works, but row- or columnwise seems a special case), it is faster in my experience. Try something real-world. Though maybe it's a platform or OS issue. We've already determined you're a 32-bit Windows person; 64-bit Debian maybe is different.