Monday, June 16, 2008

Speed up R! Make R Run Faster!

This is another example of why defaults matter a lot.

I got an email from Evan Cooch, forwarded by Matt, saying that there is a trick to speed up matrix calculation in R. He found that replacing the default Rblas.dll in R with one built for your processor can boost R's speed in matrix calculations.

The file is here (This file only works under Windows). For Mac and Linux users, see here.

Here are the steps to replace the Rblas.dll file (for Windows users):

1. Check what kind of processor (CPU) your PC or laptop is using (My Computer --> Properties). Download Rblas.dll from the corresponding directory under http://cran.r-project.org/bin/windows/contrib/ATLAS/.

2. Go to your R directory to locate the existing Rblas.dll, for example, c:/program files/R/R-2.7.0/bin. Rename it to Rblasold.dll so that if the new Rblas.dll doesn't work, you can restore the old one by renaming it back.

3. Copy the new Rblas.dll you just downloaded into this folder.

4. Restart R!
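
The manual steps above can also be scripted. What follows is a minimal sketch, not an official procedure: swap_rblas is a hypothetical helper name, bin_dir is assumed to be the folder that holds Rblas.dll, and the backup name Rblasold.dll follows step 2. Windows may refuse to rename or overwrite a DLL that a running R session has loaded, so treat this as an illustration of the steps rather than a guaranteed installer.

```r
# Hypothetical helper: back up the current Rblas.dll and copy a new one in.
# bin_dir : folder containing Rblas.dll (assumed to be the R bin directory)
# new_dll : path to the Rblas.dll downloaded in step 1
swap_rblas <- function(bin_dir, new_dll) {
  current <- file.path(bin_dir, "Rblas.dll")
  backup  <- file.path(bin_dir, "Rblasold.dll")
  stopifnot(file.exists(current), file.exists(new_dll))
  if (file.exists(backup)) stop("A backup already exists: ", backup)

  # Step 2: keep the old file so it can be restored by renaming it back.
  if (!file.rename(current, backup)) stop("Could not rename ", current)

  # Step 3: copy the downloaded file into place; undo the rename on failure.
  if (!file.copy(new_dll, current)) {
    file.rename(backup, current)
    stop("Could not copy ", new_dll)
  }
  invisible(backup)
}
```

Call it with the R bin folder and the path of the downloaded file, then restart R as in step 4. To go back, delete the new Rblas.dll and rename Rblasold.dll to Rblas.dll.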

Here is an example to test:

require(Matrix)
set.seed(123)
X <- Matrix(rnorm(1e6), 1000)
print(system.time(for(i in 1:25) X%*%X))
print(system.time(for(i in 1:25) solve(X)))
print(system.time(for(i in 1:10) svd(X)))
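
To compare runs before and after swapping the DLL, it can help to keep the timings rather than only print them. Here is a minimal sketch, assuming base R matrices instead of the Matrix package and a hypothetical helper name time_blas. The default sizes are deliberately smaller than in the test above so that it finishes quickly.

```r
# Hypothetical helper: time the three operations and return elapsed seconds.
time_blas <- function(n = 200, reps = 3, seed = 123) {
  set.seed(seed)
  # Base R dense matrix (an assumption; the test above uses Matrix()).
  X <- matrix(rnorm(n * n), n)
  c(
    multiply = system.time(for (i in seq_len(reps)) X %*% X)[["elapsed"]],
    solve    = system.time(for (i in seq_len(reps)) solve(X))[["elapsed"]],
    svd      = system.time(for (i in seq_len(reps)) svd(X))[["elapsed"]]
  )
}

timings <- time_blas()
print(timings)
```

Since R has to be restarted after the DLL is replaced, the returned vector would need to be saved to disk (for example with saveRDS()) if you want to compare it with a later run.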

Here is a test result on my machine (Intel Pentium(R) M processor, 1.73 GHz, with 1 GB RAM).

Default Rblas.dll

> print(system.time(for(i in 1:25) X%*%X))
user system elapsed
114.19 0.38 121.04
> print(system.time(for(i in 1:25) solve(X)))
user system elapsed
87.03 0.28 89.31
> print(system.time(for(i in 1:10) svd(X)))
user system elapsed
232.29 1.44 242.64

New Rblas.dll

> print(system.time(for(i in 1:25) X%*%X))
user system elapsed
37.18 0.36 37.89
> print(system.time(for(i in 1:25) solve(X)))
user system elapsed
30.62 0.56 31.78
> print(system.time(for(i in 1:10) svd(X)))
user system elapsed
102.89 2.17 107.17

Overall, R with the new Rblas.dll is two to three times as fast as R with the old one on these tests. Now we wonder why this isn't the default for R.
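
As a quick check of that claim, the speedup for each test can be computed from the elapsed times reported above. This is only arithmetic on the numbers in this post; the vector names are chosen here for illustration.

```r
# Elapsed times (seconds) copied from the results above.
old_blas <- c(multiply = 121.04, solve = 89.31, svd = 242.64)
new_blas <- c(multiply = 37.89,  solve = 31.78, svd = 107.17)

# Speedup factor = old elapsed time / new elapsed time.
speedup <- round(old_blas / new_blas, 2)
print(speedup)
# multiply 3.19, solve 2.81, svd 2.26
```

The gain is largest for matrix multiplication and smallest for svd.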

3 comments:

Yu-Sung Su said...

Indeed!
Maybe someone has to write a function to identify what kind of CPU you are using so that the right Rblas.dll can be incorporated into the installation process of R.

wcw said...

Of course it's real. Now, the fast library is the default if you run Debian, where your package maintainer cares. If you run Winders, you're on your own.

Remember, free software may not always be the best, but it more often cares about you. Winders only cares about Bill Gates.

Unknown said...

Thank you for this -
I've spammed everyone in my department using R with this.
I fully expect our output of publishable research to double now ;)