Friday, August 08, 2008

Link C with R

I should have posted this earlier. In my previous post, I have discussed the speed issue in R briefly. The speed issue unfortunately are two of the shortcomings of R. The other one is that R does not handle big data set very well. So instead of waiting for R to improve, for those who cannot wait (me included), we have to go back to the source. By this I mean that we should program some heavy procedure in C/C++. Having said this, I still must say that C/C++ is not easy to learn. At least, I suffered a lot.

Nonetheless, I recalled a conversation with a Rochester Polisci Ph.D. student in PolMeth that they learn C/C++ in the advance programming course. They learn C so that they can program Bayesian MCMC in C. I was wondering at that time that there might be a trend of learning C/C++.

R provides some nice features that can ease some of the hassle in C programming. In particular, R provides some C header files that are heady in programming. For instance, Rmatch.h provides many common probability distribution functions, e.g, rnorm, runif, etc. So we don't need to programm this in C. Another good feature of R is that we can call C object in R and feed in the output from C to R. So we can do post estimation and plotting in R and leave heavy duty job to C.

Here is a example code I wrote in C. This example is a gibbs sampler of the 8-schools example in Gelman et al 2004. I am not going to talk about programming in C as I am no expert on it. I will just explain how to link C with R.

We need to use Rtools to compile the C code.
R CMD SHLIB 8schools.c
After compilation, it produces some executable objects. Under the Windows OS, we use 8schools.dll. To use it in R, we can can use .C or .Call.

dyn.load("8schools.dll")

a <- .C("gibbs",
"n.iter" = as.integer(n.iter),
"n.chain" = as.integer(n.chains),
"n.var" = as.integer(n.var),
"y" = as.double(y),
"sigma.y" = as.double(sigma.y),
"n.schools" = as.integer(J),
"ans" = double(n.var * n.iter * n.chains)
)
Here is the full R code that does the 8-schools example. I also include a R version of the gibbs sampling in the bottom of the code. You can alter n.iter in the code to see the speed difference between doing 8-schools in R and in C.

I am still new to C, so my code might not be perfect. I welcome your feedback on how to make it better. I might also need your help here. ( I found that I need to run rchisq() once in R before I call my C code (see line 18 in the code). Otherwise, the C procedure will stuck forever in the rchisq() part in C). # this problem has been solved by Fabian's suggestion. See his comment follows this post.

2 comments:

Unknown said...

Concerning your trouble with the RNG in C:

you need to do

getRNGstate();

once BEFORE you call the RNG in C for the first time and

putRNGstate();

before you return to R, that way continuity of the random number stream is ensured, AFAIK.

Yu-Sung Su said...

Fabian,

Thanks for your comment. Indeed, it works!