Wednesday, 5 July 2006

interpolating data in R

Here's a little function that does linear interpolation of data. For example, if you have a matrix like

[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] NA 1 NA NA 6 NA NA
[2,] NA 8 NA NA 5 NA NA
[3,] NA 1 NA NA 4 NA NA
[4,] NA 7 NA NA 7 NA NA
[5,] NA 9 NA NA 1 NA NA

it will be turned into

[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 2.666667 4.333333 6 6 6
[2,] 8 8 7.000000 6.000000 5 5 5
[3,] 1 1 2.000000 3.000000 4 4 4
[4,] 7 7 7.000000 7.000000 7 7 7
[5,] 9 9 6.333333 3.666667 1 1 1

Columns 1, 6 and 7 have just been copied from columns 2 and 5 - i.e. this doesn't do extrapolation. Columns 3 and 4 have been interpolated linearly.

The columns of the input matrix must be either all valid, or all NA. Otherwise it will get confused. Fixes welcome.



do.interpolate <- function (m) {
if (any(is.na(m[,1]))) m[] <- rep(m[,ncol(m)], ncol(m))
else if (any(is.na(m[,ncol(m)]))) m[] <- rep(m[,1], ncol(m))
else {
p <- ncol(m)-1
m <- as.matrix(m[,c(1, ncol(m))]) %*% matrix(c(p:0/p, 0:p/p), nrow=2,
byrow=T)
}

return (m)
}

# m is a matrix with missing data
mass.interpolate <- function (m) {
for (c in 1:ncol(m))
if (any(is.na(m[,c])) & ! all(is.na(m[,c])))
stop("column ", colnames(m)[c], " of matrix m is incomplete")
valid <- c(1, which(apply(m, 2, function (x) ! any(is.na(x)))))
rvalid <- c(valid[-1], ncol(m))

for (i in 1:length(valid)) {
if (valid[i] == rvalid[i]) next # if left or rightmost cols are valid
m[ ,valid[i]:rvalid[i] ] <- do.interpolate(m[ ,valid[i]:rvalid[i],
drop=F])
}

return(m)
}

Usage: mass.interpolate(my.matrix)