Sunday, June 22, 2008

Smart programming saves computing times

Recently, I tried to learn C language so that I can write a C addon for some R rountines to redress the speed issue in R. In my last post here, I have documented how a simple trick can speed up R. Nonetheless, I had an example which shows that smart programming can really save computing times.

I was trying to calculate a distant matrix for the pairwise matching. That is, suppose I had vectors of X and Z, each with n number of elementes, says X=c(x1, x2, x3, ...,xn), Z=c(z1, z2, z3, ...,zn). The distant matrix is defined as: Dij = (xi-xj)^2/(zi-zj)^2, where i and j are 1, 2, ..., n. This will result in a n x n D matrix (or a traingle matrix since half of the matrix is just a repetition.)

A straightforward way to program this is to write a loop to caculate each element of the D matrix, which means this will consume Q(n^2) times.
`D.mat <- matrix(NA, n, n) for (i in 1:n){     for (j in 1:n){          x.diff <- (x[i] - x[j])^2          z.diff <- (z[i] - z[j])^2          D.mat[i,j] <- x.diff/z.diff     } }       `
This could be very slow if you have more than 1000 elements in both your X and Z vectors (5 - 10 min in R). This is frustrating and inefficient. So I had a chat with Masanao, and asked if we can implement this in C. But he gave me another idea (Also thanks for chchou for a better code):
`x.mat <- mattrix(x, n, n)z.mat <- matrix(z, n, n)x.mat <- x.mat - t(x.mat)z.mat <- z.mat - t(z.mat)D.mat <- x.mat^2/z.mat^2 `
Basically, what this does is this:
`     [,1] [,2] [,3]          [,1] [,2] [,3][1,]  D1   D1   D1     [1,]   D1   D2   D3[2,]  D2   D2   D2  -  [2,]   D1   D2   D3[3,]  D3   D3   D3     [3,]   D1   D2   D3`
What R does in the backgroud is to do n times matrix substraction from row 1 to row n. So the total computing time is Q(n). So this is so much quicker! Here is the test result:

`> system.time(test1())   user  system elapsed   13.61    0.00   14.26 > system.time(test2())   user  system elapsed    0.40    0.00    0.42 `

This whole idea remind me of what I have learned in the past week in the C programming class. Sometimes, a careful, smart programming can be more efficiently saving computing times then replying on the hardware improvment.

Yihui said...

mat <- matrix(1, n, n)
x.mat <- x * mat
z.mat <- x * mat # z here
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(x.mat) # z.mat here
D.mat <- x.mat^2/z.mat^2

Yu-Sung Su said...

Thanks. Corrected!

CHChou said...

Use
x.mat <- matrix(x, ncol=n, nrow=n)
z.mat <- matrix(z, ncol=n, nrow=n)
will reduce 2 matrix computation.
should be more quick.
result:
> x <- rnorm(5000)
> y <- rnorm(5000)
> system.time(testch(x,y))
user system elapsed
5.34 0.83 6.22
> system.time(test2(x,y))
user system elapsed
5.07 1.18 16.64

test2 <- function(x, z){
n <- length(z)
mat <- matrix(1, n, n)
x.mat <- x * mat
z.mat <- z * mat
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2
return(D.mat)
}

testch <- function(x, z){
n <- length(x)
x.mat <- matrix(x, ncol=n, nrow=n, byrow=T)
z.mat <- matrix(z, ncol=n, nrow=n, byrow=T)
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2
return(D.mat)
}

Yu-Sung Su said...

Interesting!

Thanks for the input! Indeed, it is faster!