Tuesday, April 03, 2007

Averaging Over Unbalanced Time-Series Cross-Sectional Data

Averaging over unbalanced Time-Series Cross-Section (TSCS) data is a challenging task, at least for me. I have always avoided doing it, or just used MS Excel to do it section by section, which is very time-consuming. So I decided to sit down, think hard about it, and try to program it in R.

Here is some low-level sample code that averages TSCS data over 5-year periods.

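# country and year are not arguments; they are looked up in the workspace.
mean5yrs <- function(x){
  out <- NULL
  for (i in 1:199){  # one pass per country code
    # Mean of x within each 5-year window, ignoring missing values
    m.80 <- mean(x[country==i & year>1975 & year<=1980], na.rm=TRUE)
    m.85 <- mean(x[country==i & year>1980 & year<=1985], na.rm=TRUE)
    m.90 <- mean(x[country==i & year>1985 & year<=1990], na.rm=TRUE)
    m.95 <- mean(x[country==i & year>1990 & year<=1995], na.rm=TRUE)
    m.00 <- mean(x[country==i & year>1995 & year<=2000], na.rm=TRUE)
    temp <- rbind(m.80, m.85, m.90, m.95, m.00)
    out <- rbind(out, temp)  # stack this country's five period means
  }
  out
}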

For some reason, R returns an error message when I run the above function, but the answer is correct. Also, using tapply can be a lot faster than using a loop. See the discussion here.
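As a rough sketch of the tapply idea, the 5-year windows can be built with cut and the means computed in one call. The toy vectors below are made-up values for illustration only, and the variable names period, m and stacked are my own, not part of the original function.

```r
# Toy unbalanced panel (made-up values): country 1 has no data for 1986-1990.
country <- c(1, 1, 1, 1, 2, 2, 2, 2, 2)
year    <- c(1976, 1978, 1983, 1992, 1977, 1984, 1988, 1989, 1999)
x       <- c(1, 3, 5, 7, 2, 4, 6, NA, 10)

# Bin years into (1975,1980], (1980,1985], ..., (1995,2000],
# the same windows as year>1975 & year<=1980 and so on.
period <- cut(year, breaks = seq(1975, 2000, by = 5),
              labels = c("m.80", "m.85", "m.90", "m.95", "m.00"))

# Country-by-period matrix of means; missing values are dropped within a cell.
m <- tapply(x, list(country, period), mean, na.rm = TRUE)

# Stack the rows to get the same ordering as the loop:
# five periods for country 1, then five for country 2.
stacked <- as.vector(t(m))
```

One difference to keep in mind: a country-period cell with no observations comes back as NA from tapply, whereas the loop version returns NaN there, because the mean of an empty vector is NaN.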

We can also group all the variables in an X matrix and then use apply to process every variable at the same time.

XG <- cbind(odacon, stralag, odwplag, oil, brifracol, yg)
out <- apply(XG, 2, mean5yrs)
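To make the shape of the result concrete, here is a small self-contained sketch of the same apply pattern. It uses made-up data with two countries and two periods, and a hypothetical helper period_means (a tapply-based stand-in, not the mean5yrs function above).

```r
# Toy data (made-up values): two countries, two variables.
country <- c(1, 1, 1, 2, 2, 2)
year    <- c(1976, 1979, 1984, 1978, 1982, 1985)
XG <- cbind(oil = c(1, 3, 5, 2, 4, 6),
            yg  = c(10, NA, 30, 20, 40, 60))

# Two windows only, to keep the example short: (1975,1980] and (1980,1985].
period <- cut(year, breaks = c(1975, 1980, 1985), labels = c("m.80", "m.85"))

# Hypothetical helper: stacked country-by-period means for one variable.
period_means <- function(x) {
  m <- tapply(x, list(country, period), mean, na.rm = TRUE)
  as.vector(t(m))  # periods of country 1 first, then country 2
}

# One column of period means per variable in XG.
out <- apply(XG, 2, period_means)
rownames(out) <- paste(rep(c(1, 2), each = 2),
                       rep(levels(period), times = 2), sep = ".")
```

Each column of out holds the stacked period means for one variable, so the result has one row per country-period pair and one column per variable.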

This function is not perfect, but it does save me a tremendous amount of time!

1 comment:

Anonymous said...

This is Whitty, your favorite dog.
I have several questions for you.
First, why do u want to hide this blog from sister birdy and me?
Second, why is my picture not shown here?
You wrote a lot about statistics, which I totally don't understand, so I guess this blog is only for your friends from the academic field. However, you also wrote something about your trips. So please count me in by posting my cute pictures. I am sure that people would love to know your adorable pet.