I was trying to calculate a distant matrix for the pairwise matching. That is, suppose I had vectors of X and Z, each with n number of elementes, says X=c(x1, x2, x3, ...,xn), Z=c(z1, z2, z3, ...,zn). The distant matrix is defined as: Dij = (xi-xj)^2/(zi-zj)^2, where i and j are 1, 2, ..., n. This will result in a n x n D matrix (or a traingle matrix since half of the matrix is just a repetition.)
A straightforward way to program this is to write a loop to caculate each element of the D matrix, which means this will consume Q(n^2) times.
D.mat <- matrix(NA, n, n)This could be very slow if you have more than 1000 elements in both your X and Z vectors (5 - 10 min in R). This is frustrating and inefficient. So I had a chat with Masanao, and asked if we can implement this in C. But he gave me another idea (Also thanks for chchou for a better code):
for (i in 1:n){
for (j in 1:n){
x.diff <- (x[i] - x[j])^2
z.diff <- (z[i] - z[j])^2
D.mat[i,j] <- x.diff/z.diff
}
}
x.mat <- mattrix(x, n, n)Basically, what this does is this:
z.mat <- matrix(z, n, n)
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2
[,1] [,2] [,3] [,1] [,2] [,3]What R does in the backgroud is to do n times matrix substraction from row 1 to row n. So the total computing time is Q(n). So this is so much quicker! Here is the test result:
[1,] D1 D1 D1 [1,] D1 D2 D3
[2,] D2 D2 D2 - [2,] D1 D2 D3
[3,] D3 D3 D3 [3,] D1 D2 D3
> system.time(test1())
user system elapsed
13.61 0.00 14.26
> system.time(test2())
user system elapsed
0.40 0.00 0.42
This whole idea remind me of what I have learned in the past week in the C programming class. Sometimes, a careful, smart programming can be more efficiently saving computing times then replying on the hardware improvment.
4 comments:
mat <- matrix(1, n, n)
x.mat <- x * mat
z.mat <- x * mat # z here
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(x.mat) # z.mat here
D.mat <- x.mat^2/z.mat^2
Thanks. Corrected!
Use
x.mat <- matrix(x, ncol=n, nrow=n)
z.mat <- matrix(z, ncol=n, nrow=n)
will reduce 2 matrix computation.
should be more quick.
result:
> x <- rnorm(5000)
> y <- rnorm(5000)
> system.time(testch(x,y))
user system elapsed
5.34 0.83 6.22
> system.time(test2(x,y))
user system elapsed
5.07 1.18 16.64
test2 <- function(x, z){
n <- length(z)
mat <- matrix(1, n, n)
x.mat <- x * mat
z.mat <- z * mat
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2
return(D.mat)
}
testch <- function(x, z){
n <- length(x)
x.mat <- matrix(x, ncol=n, nrow=n, byrow=T)
z.mat <- matrix(z, ncol=n, nrow=n, byrow=T)
x.mat <- x.mat - t(x.mat)
z.mat <- z.mat - t(z.mat)
D.mat <- x.mat^2/z.mat^2
return(D.mat)
}
Interesting!
Thanks for the input! Indeed, it is faster!
Post a Comment