Thursday, April 24, 2008

speed issue in R computing: apply() vs a loop

In R, apply() is NOT faster than a loop!!

I don't know where I picked up the perception that apply() is faster than a loop in R. For a long time, I thought that apply() runs a function (for example, mean()) on a data structure (row or column) in a single shot, so it takes the mean of every row at the same time, while a loop does it n.row or n.column times. So, assuming a single shot of apply() does not take much more computing power, I thought apply() should be at most n.row or n.column times faster than a loop.

But I am WRONG! In R, apply() is NOT faster than a loop!!

Here is a small simulation I did to test the speed of the two approaches. After 1000 simulations, on average, apply() consumes twice as much system time as a loop does. I thought maybe this was because my apply() code and my loop code were not truly equivalent, so I did some searching on Rseek.org. Here is the result.

According to Professor Brian Ripley, "apply() is just a wrapper for a loop." The only advantage of using apply() is that it makes your code neater!
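To make the "wrapper for a loop" idea concrete, here is a minimal sketch of a row-wise stand-in for apply(X, 1, FUN). The name row_apply is hypothetical and this is not the actual source of apply(); it only assumes that FUN returns a single value per row.

```r
# Hypothetical helper (not part of base R): a stripped-down, row-wise
# stand-in for apply(X, 1, FUN), written as an explicit loop.
row_apply <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  out <- vector("list", nrow(X))
  for (i in seq_len(nrow(X))) {
    out[[i]] <- FUN(X[i, ], ...)   # one ordinary R-level call per row
  }
  unlist(out)                      # assumes FUN returns a single value
}

m <- matrix(1:6, nrow = 2)
row_apply(m, mean)   # same values as apply(m, 1, mean)
apply(m, 1, mean)
```

Either way, FUN is called once per row, so there is no reason to expect the wrapper to be n.row times faster than the loop it hides.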

Bummer! I had it wrong for a long time.
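For anyone who wants to repeat the comparison, the following is a rough sketch of this kind of timing test. It is not the code from my simulation: the matrix size, the seed, the number of repetitions, and the function names row_means_apply and row_means_loop are all assumptions made for illustration.

```r
set.seed(1)                                # arbitrary seed, for repeatability
x <- matrix(rnorm(200 * 50), nrow = 200)   # assumed size, not from the post

row_means_apply <- function(m) apply(m, 1, mean)

row_means_loop <- function(m) {
  out <- numeric(nrow(m))                  # preallocate the result
  for (i in seq_len(nrow(m))) {
    out[i] <- mean(m[i, ])
  }
  out
}

# Both versions must agree before their timings mean anything
stopifnot(isTRUE(all.equal(row_means_apply(x), row_means_loop(x))))

n_rep <- 100                               # the post used 1000 simulations
system.time(for (r in seq_len(n_rep)) row_means_apply(x))
system.time(for (r in seq_len(n_rep)) row_means_loop(x))
```

The timings will vary with the machine and the R version, so look at the "elapsed" column over many repetitions rather than at a single run.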

7 comments:

Hesen Peng said...

Hi,

But if you try this, you will find that sapply() runs a little faster.

> play <- function(x){
+ return(exp(x)+sin(x)-rnorm(1))
+ }
>
> b <- rnorm(1000)
>
> time <- proc.time()
> temp <- sapply(b,FUN="play")
> print(proc.time()-time)
user system elapsed
0.032 0.000 0.208
>
>
> time <- proc.time()
> temp <- numeric(length(b))
> for(i in 1:length(b)){
+ temp[i] <- play(b[i])
+ }
> print(proc.time()-time)
user system elapsed
0.048 0.000 0.519
>

Yu-Sung Su said...

Hi,

I have compared apply() vs sapply(). sapply() is no faster than apply(). For your example, maybe you should do an MC test (see my example). I guess a loop is no slower than apply(), if not faster.

Hesen Peng said...

Hey,

A more comprehensive study is done here. http://hesen.peng.googlepages.com/haha.r

I'm very interested in the problem. Could you please give me the link to what Professor Brian Ripley said in your original post? Thank you.

Yu-Sung Su said...

Hi,

I don't know why but I feel your examples are not comparable.

I revised your code as:
http://yusung.googlepages.com/applytest2.R

Now the loop is faster than apply().

As for Prof. Ripley's comment, see my original post and look for "Here is the result." The word "result" is a hyperlink.

Hesen Peng said...

I really wonder how applytest2.R can run correctly and generate any results without defining the "temp" variable in FUN.loop.

Besides, please note that length(b)=10,000 but you were using only 1,000 iterations in the for loop.

After the fix, the performance is at least equal.

> play <- function(a){
+ exp(a)+sin(a)+rnorm(1)
+ }
> b <- rnorm(10000)
>
>
> FUN.apply <- function(x) sapply(x, FUN="play")
> FUN.loop <- function(x){
+ temp <- numeric(length(x))
+ for(j in 1:10000){
+ temp[j] <- play(x[j])
+ }
+ return(temp)
+ }
>
> system.time(FUN.apply(b))
user system elapsed
0.184 0.000 0.182
> system.time(FUN.loop(b))
user system elapsed
0.184 0.000 0.202
>
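The length mismatch noted above can be avoided by letting the loop bound follow the input. Here is a minimal sketch that reuses play, FUN.apply, and FUN.loop from the session above; the seed value and the names res_apply and res_loop are assumptions added for illustration.

```r
play <- function(a) exp(a) + sin(a) + rnorm(1)

FUN.apply <- function(x) sapply(x, FUN = play)

FUN.loop <- function(x) {
  temp <- numeric(length(x))
  for (j in seq_along(x)) {    # follows length(x); no hard-coded bound
    temp[j] <- play(x[j])
  }
  temp
}

b <- rnorm(10000)

# play() draws a random number, so reset the seed before each call
set.seed(42); res_apply <- FUN.apply(b)
set.seed(42); res_loop  <- FUN.loop(b)
stopifnot(isTRUE(all.equal(res_apply, res_loop)))
```

Once both versions are confirmed to do the same work, system.time() can be used on each as before.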

Yu-Sung Su said...

Oops, my fault!

But at least we have reached the conclusion that a loop is no slower than apply(), if not faster.

wcw said...

There was a time when the apply family of functions was a lot faster, but that's no longer true. These days, performance is usually equivalent. Me, I use them no matter what, since the code is a lot cleaner.

That said, in the special case where apply works (lapply almost always works, but row- or columnwise seems a special case), it is faster in my experience. Try something real-world. Though maybe it's a platform or OS issue. We've already determined you're a 32-bit Windows person; 64-bit Debian maybe is different.