Yu-Sung Su's Blog: Smart programming saves computing times

Recently, I tried to learn C language so that I can write a C addon for some R rountines to redress the speed issue in R. In my last post here, I have documented how a simple trick can speed up R. Nonetheless, I had an example which shows that smart programming can really save computing times.

I was trying to calculate a distant matrix for the pairwise matching. That is, suppose I had vectors of X and Z, each with n number of elementes, says X=c(x1, x2, x3, ...,xn), Z=c(z1, z2, z3, ...,zn). The distant matrix is defined as: Dij = (xi-xj)^2/(zi-zj)^2, where i and j are 1, 2, ..., n. This will result in a n x n D matrix (or a traingle matrix since half of the matrix is just a repetition.)

A straightforward way to program this is to write a loop to caculate each element of the D matrix, which means this will consume Q(n^2) times.

D.mat <- matrix(NA, n, n) 
for (i in 1:n){   
  for (j in 1:n){      
    x.diff <- (x[i] - x[j])^2      
    z.diff <- (z[i] - z[j])^2      
    D.mat[i,j] <- x.diff/z.diff   
  } 
}

This could be very slow if you have more than 1000 elements in both your X and Z vectors (5 - 10 min in R). This is frustrating and inefficient. So I had a chat with Masanao, and asked if we can implement this in C. But he gave me another idea (Also thanks for chchou for a better code):

x.mat <- mattrix(x, n, n)
z.mat <- matrix(z, n, n)
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2

Basically, what this does is this:

     [,1] [,2] [,3]          [,1] [,2] [,3]
[1,]  D1   D1   D1     [1,]   D1   D2   D3
[2,]  D2   D2   D2  -  [2,]   D1   D2   D3
[3,]  D3   D3   D3     [3,]   D1   D2   D3

What R does in the backgroud is to do n times matrix substraction from row 1 to row n. So the total computing time is Q(n). So this is so much quicker! Here is the test result:


> system.time(test1())
   user  system elapsed 
  13.61    0.00   14.26 

> system.time(test2())
   user  system elapsed 
   0.40    0.00    0.42

This whole idea remind me of what I have learned in the past week in the C programming class. Sometimes, a careful, smart programming can be more efficiently saving computing times then replying on the hardware improvment.

4 comments:

Anonymous said...: mat <- matrix(1, n, n)
x.mat <- x * mat
z.mat <- x * mat # z here
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(x.mat) # z.mat here
D.mat <- x.mat^2/z.mat^2; 10:43 AM
Yu-Sung Su said...: Thanks. Corrected!; 10:47 AM
spoli said...: Use
x.mat <- matrix(x, ncol=n, nrow=n)
z.mat <- matrix(z, ncol=n, nrow=n)
will reduce 2 matrix computation.
should be more quick.
result:
> x <- rnorm(5000)
> y <- rnorm(5000)
> system.time(testch(x,y))
user system elapsed
5.34 0.83 6.22
> system.time(test2(x,y))
user system elapsed
5.07 1.18 16.64

test2 <- function(x, z){
n <- length(z)
mat <- matrix(1, n, n)
x.mat <- x * mat
z.mat <- z * mat
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2
return(D.mat)
}

testch <- function(x, z){
n <- length(x)
x.mat <- matrix(x, ncol=n, nrow=n, byrow=T)
z.mat <- matrix(z, ncol=n, nrow=n, byrow=T)
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2
return(D.mat)
}; 1:56 AM
Yu-Sung Su said...: Interesting!

Thanks for the input! Indeed, it is faster!; 3:10 AM

Yu-Sung Su's Blog

Sunday, June 22, 2008

Smart programming saves computing times

4 comments:

About Me

Labels

Blog Archive

Online

Followers

RSeek.org

My Blog List

Google Analytics

Statistics

Yu-Sung Su's Blog

Sunday, June 22, 2008

Smart programming saves computing times

4 comments:

About Me

Labels

Blog Archive

Online

Followers

Subscribe To

RSeek.org

My Blog List

Google Analytics

Statistics