I was trying to calculate a distant matrix for the pairwise matching. That is, suppose I had vectors of X and Z, each with n number of elementes, says X=c(x1, x2, x3, ...,xn), Z=c(z1, z2, z3, ...,zn). The distant matrix is defined as: Dij = (xi-xj)^2/(zi-zj)^2, where i and j are 1, 2, ..., n. This will result in a n x n D matrix (or a traingle matrix since half of the matrix is just a repetition.)

A straightforward way to program this is to write a loop to caculate each element of the D matrix, which means this will consume Q(n^2) times.

D.mat <- matrix(NA, n, n)This could be very slow if you have more than 1000 elements in both your X and Z vectors (5 - 10 min in R). This is frustrating and inefficient. So I had a chat with Masanao, and asked if we can implement this in C. But he gave me another idea (Also thanks for chchou for a better code):

for (i in 1:n){

for (j in 1:n){

x.diff <- (x[i] - x[j])^2

z.diff <- (z[i] - z[j])^2

D.mat[i,j] <- x.diff/z.diff

}

}

x.mat <- mattrix(x, n, n)Basically, what this does is this:

z.mat <- matrix(z, n, n)

x.mat <- x.mat - t(x.mat)

z.mat <- z.mat - t(z.mat)

D.mat <- x.mat^2/z.mat^2

[,1] [,2] [,3] [,1] [,2] [,3]What R does in the backgroud is to do n times matrix substraction from row 1 to row n. So the total computing time is Q(n). So this is so much quicker! Here is the test result:

[1,] D1 D1 D1 [1,] D1 D2 D3

[2,] D2 D2 D2 - [2,] D1 D2 D3

[3,] D3 D3 D3 [3,] D1 D2 D3

> system.time(test1())

user system elapsed

13.61 0.00 14.26

> system.time(test2())

user system elapsed

0.40 0.00 0.42

This whole idea remind me of what I have learned in the past week in the C programming class. Sometimes, a careful, smart programming can be more efficiently saving computing times then replying on the hardware improvment.

## 4 comments:

mat <- matrix(1, n, n)

x.mat <- x * mat

z.mat <- x * mat # z here

x.mat <- x.mat - t(x.mat)

z.mat <- z.mat - t(x.mat) # z.mat here

D.mat <- x.mat^2/z.mat^2

Thanks. Corrected!

Use

x.mat <- matrix(x, ncol=n, nrow=n)

z.mat <- matrix(z, ncol=n, nrow=n)

will reduce 2 matrix computation.

should be more quick.

result:

> x <- rnorm(5000)

> y <- rnorm(5000)

> system.time(testch(x,y))

user system elapsed

5.34 0.83 6.22

> system.time(test2(x,y))

user system elapsed

5.07 1.18 16.64

test2 <- function(x, z){

n <- length(z)

mat <- matrix(1, n, n)

x.mat <- x * mat

z.mat <- z * mat

x.mat <- x.mat - t(x.mat)

z.mat <- z.mat - t(z.mat)

D.mat <- x.mat^2/z.mat^2

return(D.mat)

}

testch <- function(x, z){

n <- length(x)

x.mat <- matrix(x, ncol=n, nrow=n, byrow=T)

z.mat <- matrix(z, ncol=n, nrow=n, byrow=T)

x.mat <- x.mat - t(x.mat)

z.mat <- z.mat - t(z.mat)

D.mat <- x.mat^2/z.mat^2

return(D.mat)

}

Interesting!

Thanks for the input! Indeed, it is faster!

Post a Comment