Tuesday, 5 April 2005

HOWTO deal with panel data in R, part one

These are really just notes for myself, but they may be useful to others. I am learning as I go.

There aren't really any easy introductions online, as far as I can tell. A useful resource appears to be the S-plus users' guide to nlme.

First of all, load the "nlme" library.

> library(nlme)

Now you can create your panel data object. Assume your data is loaded in a data frame called "df"; the time variable is called "year"; the panel id variable is called, say, "country". You also have a dependent variable called "GDP".

> panel.data <- groupedData(GDP ~ year | country, df)

Note that nlme doesn't specifically know that this is time data. It is just grouped data. In the formula, "country" (the part after the "|") is the grouping factor, so nlme knows that observations with the same value of "country" have something in common. "year" is what nlme calls the primary covariate, and "GDP" is the response.
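
To check what ended up in the object, here is a minimal sketch. The data frame is made up purely for illustration (three countries, four years, invented GDP figures); the accessor functions are the ones nlme provides for grouped data, as far as I understand them.

```r
library(nlme)

# Made-up numbers, purely for illustration: 3 countries observed over 4 years.
df <- data.frame(
  country = factor(rep(c("A", "B", "C"), each = 4)),
  year    = rep(2000:2003, times = 3),
  GDP     = c(10, 11, 12, 13, 20, 22, 24, 26, 5, 5.5, 6, 6.5)
)

panel.data <- groupedData(GDP ~ year | country, data = df)

formula(panel.data)       # the formula stored with the object
getGroups(panel.data)     # the grouping factor (country)
getCovariate(panel.data)  # the primary covariate (year)
getResponse(panel.data)   # the dependent variable (GDP)
```

The object is still a data frame underneath, so the usual things like head(panel.data) keep working.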

You might also have multiple "grouping factors", nested inside one another. For example, if you have data for male and female test subjects, from different cities, tested at different time periods, you could do

> panel.data <- groupedData( dependent.variable ~ time.period | city/sex , data.frame )
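
A small runnable version of the same idea, as a sketch only. The data frame "scores" and its numbers are invented for illustration; "city/sex" is read as sex nested within city.

```r
library(nlme)

# Invented test-score data: 2 cities x 2 sexes x 3 time periods = 12 rows.
# expand.grid varies the first variable fastest.
scores <- expand.grid(
  time.period = 1:3,
  sex         = factor(c("F", "M")),
  city        = factor(c("Leeds", "York"))
)
scores$score <- c(50, 52, 55, 48, 49, 53, 60, 61, 65, 58, 62, 63)

# Two levels of grouping: sex within city.
nested.data <- groupedData(score ~ time.period | city/sex, data = scores)

formula(nested.data)           # the full formula stored with the object
getGroupsFormula(nested.data)  # just the grouping part of the formula
```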

Your cases may also have some other things in common. For example, population may vary with time in your observations, but presumably the country's physical area does not. You can tell R that as follows:

> panel.data <- groupedData(GDP ~ year | country, df, outer = ~ area)

I am not yet sure exactly how that information gets used, but it seems important. I am also not yet sure how you specify multiple "outer" variables; or when exactly a variable should be "outer" and when it should be a "grouping factor". (But often the latter is intuitive - countries are groups but country area is not a group.)
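
One rough way to check whether a variable qualifies as "outer" is to see whether it is constant within each group. The sketch below uses made-up data again (the "area" figures are invented) and assumes gsummary() with invariantsOnly = TRUE behaves as its help page describes, keeping only the variables that do not change within a group.

```r
library(nlme)

# Made-up numbers: area is fixed per country, GDP and year vary within it.
df <- data.frame(
  country = factor(rep(c("A", "B", "C"), each = 4)),
  year    = rep(2000:2003, times = 3),
  GDP     = c(10, 11, 12, 13, 20, 22, 24, 26, 5, 5.5, 6, 6.5),
  area    = rep(c(100, 250, 40), each = 4)
)

panel.data <- groupedData(GDP ~ year | country, data = df, outer = ~ area)

# One row per country, keeping only the variables that are constant
# within each country; area should be among them.
inv <- gsummary(panel.data, invariantsOnly = TRUE)
inv
```

As far as I can tell from the nlme help pages, one place the "outer" information is picked up is the plot method for grouped data, which can use it to arrange the panels; I have not tested that here.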