In R, apply() is NOT faster than a loop!!

I don't know where I picked up the perception that apply() is faster than a loop in R. For a long time, I thought that apply() runs a function (for example, mean()) over a data structure (its rows or columns) in a single shot, so that it takes the mean of every row at the same time, whereas a loop does it n.row or n.column times. So, assuming a single shot of apply() does not cost much computing power, I thought apply() should be up to n.row or n.column times faster than a loop.

But I was WRONG! In R, apply() is NOT faster than a loop!!

Here is a small simulation I did to test the speed of the two approaches. Over 1000 simulations, apply() consumes on average twice as much system time as a loop does. I thought maybe it was because my apply() code and my loop code were not truly equivalent, so I did some searching on Rseek.org. Here is the result.
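The simulation itself is not reproduced in this post, so the following is only a minimal sketch of this kind of test, not the original code. The matrix size, the number of replications (n_sim), and the helper names row_means_apply(), row_means_loop(), and time_one() are all assumptions made for illustration.

```r
# Assumed setup: a 100 x 50 matrix of random numbers (sizes are arbitrary).
set.seed(1)
m <- matrix(rnorm(100 * 50), nrow = 100)

# Row means via apply().
row_means_apply <- function(x) apply(x, 1, mean)

# Row means via an explicit loop with a preallocated result vector.
row_means_loop <- function(x) {
  out <- numeric(nrow(x))
  for (i in seq_len(nrow(x))) {
    out[i] <- mean(x[i, ])
  }
  out
}

# Both versions must return the same numbers before timing means anything.
stopifnot(isTRUE(all.equal(row_means_apply(m), row_means_loop(m))))

# Time each version n_sim times and average the elapsed time.
n_sim <- 200  # assumed value; the post used 1000
time_one <- function(f) system.time(f(m))[["elapsed"]]
t_apply <- mean(replicate(n_sim, time_one(row_means_apply)))
t_loop  <- mean(replicate(n_sim, time_one(row_means_loop)))
print(c(apply = t_apply, loop = t_loop))
```

The numbers will differ by machine and R version, so the sketch shows how to make the comparison rather than what the result should be.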

According to Professor Brian Ripley, "apply() is just a wrapper for a loop." The only advantage of using apply() is that it makes your code neater!
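To make the "wrapper for a loop" remark concrete, here is a deliberately simplified sketch of a row-wise apply written as an ordinary R loop. The name my_row_apply() is hypothetical and this is not the source of apply(); the real function also handles other margins, dimnames, and results of varying length. The sketch assumes FUN returns a single number per row.

```r
# Hypothetical, simplified stand-in for apply(x, 1, FUN).
# Assumes FUN returns exactly one numeric value per row.
my_row_apply <- function(x, FUN, ...) {
  FUN <- match.fun(FUN)            # accept a function or its name
  out <- numeric(nrow(x))          # preallocate the result
  for (i in seq_len(nrow(x))) {    # the loop that the wrapper hides
    out[i] <- FUN(x[i, ], ...)
  }
  out
}

m <- matrix(1:6, nrow = 2)
my_row_apply(m, mean)   # same values as apply(m, 1, mean)
```

Seen this way, the call still evaluates FUN once per row, which is why there is no reason to expect a speedup of n.row times over a hand-written loop.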

Bummer! I had it wrong for a long time.


## 7 comments:

Hi,

But if you try this, you will find that sapply() runs a little faster.

    > play <- function(x){
    +   return(exp(x)+sin(x)-rnorm(1))
    + }
    > b <- rnorm(1000)
    >
    > time <- proc.time()
    > temp <- sapply(b,FUN="play")
    > print(proc.time()-time)
       user  system elapsed
      0.032   0.000   0.208
    >
    > time <- proc.time()
    > temp <- numeric(length(b))
    > for(i in 1:length(b)){
    +   temp[i] <- play(b[i])
    + }
    > print(proc.time()-time)
       user  system elapsed
      0.048   0.000   0.519

Hi,

I have compared apply() with sapply(), and sapply() is no faster than apply(). For your example, maybe you should run an MC (Monte Carlo) test rather than a single timing (see my example). My guess is that a loop is no slower than apply(), if not faster.
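A repeated-timing version of the sapply() versus loop comparison might look like the sketch below. It reuses play() from the comment above. The names use_sapply(), use_loop(), and time_elapsed(), and the replication count n_rep, are assumptions for illustration, and this is not the code behind "my example".

```r
# play() as defined in the comment above.
play <- function(x) exp(x) + sin(x) - rnorm(1)

b <- rnorm(1000)

use_sapply <- function(x) sapply(x, play)

use_loop <- function(x) {
  temp <- numeric(length(x))
  for (i in seq_along(x)) {
    temp[i] <- play(x[i])
  }
  temp
}

# Repeat each timing n_rep times (assumed value) and compare the
# distributions instead of two single measurements.
n_rep <- 100
time_elapsed <- function(f) system.time(f(b))[["elapsed"]]
t_sapply <- replicate(n_rep, time_elapsed(use_sapply))
t_loop   <- replicate(n_rep, time_elapsed(use_loop))

print(summary(t_sapply))
print(summary(t_loop))
```

A single run, as in the transcript above, can be dominated by garbage collection or other activity on the machine, which is the reason for repeating the measurement.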

Hey,

A more comprehensive study is done here. http://hesen.peng.googlepages.com/haha.r

I'm very interested in the problem. Could you please give me the link to what Professor Brian Ripley said in your original post? Thank you.

Hi,

I don't know why, but I feel your examples are not comparable.

I revised your code as:

http://yusung.googlepages.com/applytest2.R

Now the loop is faster than apply().

As for Prof. Ripley's comment, see my original post and look for "Here is the result." The word "result" is a hyperlink.

I really wonder how applytest2.R can run correctly and generate any results without defining the "temp" variable inside FUN.loop.

Besides, please note that length(b) = 10,000, but your for loop runs only 1,000 iterations.

After the update, the two perform at least equally.

    > play <- function(a){
    +   exp(a)+sin(a)+rnorm(1)
    + }
    > b <- rnorm(10000)
    >
    > FUN.apply <- function(x) sapply(x, FUN="play")
    > FUN.loop <- function(x){
    +   temp <- numeric(length(x))
    +   for(j in 1:10000){
    +     temp[j] <- play(x[j])
    +   }
    +   return(temp)
    + }
    >
    > system.time(FUN.apply(b))
       user  system elapsed
      0.184   0.000   0.182
    > system.time(FUN.loop(b))
       user  system elapsed
      0.184   0.000   0.202
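The length mismatch mentioned in this comment is easy to reproduce. The sketch below assumes the earlier loop ran over 1:1000 on a vector of length 10,000, as described; the actual applytest2.R is not reproduced here, and loop_fixed() and loop_seq() are hypothetical names.

```r
play <- function(a) exp(a) + sin(a) + rnorm(1)
b <- rnorm(10000)

# Loop bound hard-coded to 1,000: only the first 1,000 slots are filled,
# so the loop does a tenth of the work that sapply(b, play) does.
loop_fixed <- function(x) {
  temp <- numeric(length(x))
  for (j in 1:1000) {
    temp[j] <- play(x[j])
  }
  temp
}

# Loop bound taken from the input: every element is processed.
loop_seq <- function(x) {
  temp <- numeric(length(x))
  for (j in seq_along(x)) {
    temp[j] <- play(x[j])
  }
  temp
}

# Slots that were never written keep their initial value of 0.
print(sum(loop_fixed(b) == 0))
print(sum(loop_seq(b) == 0))
```

Taking the bound from seq_along(x) rather than typing a number keeps the loop and the sapply() call working on the same amount of data, which is what makes the timings comparable.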

Oops, my fault!

But at least we have reached the conclusion that a loop is no slower than apply(), if not faster.

There was a time when the apply family of functions was a lot faster, but that's no longer true. These days, performance is usually equivalent. Me, I use them no matter what, since the code is a lot cleaner.

That said, in the special case where apply() works (lapply() almost always works, but row- or columnwise operation seems to be a special case), it is faster in my experience. Try something real-world. Though maybe it's a platform or OS issue. We've already determined you're a 32-bit Windows person; 64-bit Debian may be different.
