Wednesday, 29 April 2015

R tip: quick functions using dplyr


Here's how to find, for example, the proportion of NAs in each column of your data.

The old way is:

sapply(mydata, function (x) mean(is.na(x), na.rm = TRUE))

Typing out function (x) every time is tedious, and the nested calls are hard to read.

Here's a better way, using the wonderful dplyr:

library(dplyr)
sapply(mydata, . %>% is.na %>% mean(na.rm = TRUE))

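The magic is that starting a pipeline with the dot . on the left-hand side of the pipe operator %>% creates a function instead of evaluating anything straight away. (The pipe itself comes from the magrittr package, which dplyr re-exports, so library(dplyr) is enough to get it.)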
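
To see that a dot-first pipeline really is an ordinary function, here is a small sketch you can run. The data frame mydata and the name prop_na are made up for illustration and are not from any real dataset. I've also dropped na.rm = TRUE here, since is.na() never returns NA itself.

```r
library(dplyr)

# Hypothetical example data, invented for this sketch
mydata <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", NA, NA),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

# A pipeline that starts with the dot is a function, so it can be named
prop_na <- . %>% is.na %>% mean

# Check that it is a function
is_fun <- is.function(prop_na)

# Apply it by column, alongside the old anonymous-function version
new_way <- sapply(mydata, prop_na)
old_way <- sapply(mydata, function(x) mean(is.na(x)))

print(is_fun)
print(new_way)
print(all.equal(old_way, new_way))
```

With this toy data, column a has one NA out of four values, b has two and c has none, so the proportions come out as 0.25, 0.5 and 0. Giving the pipeline a name is handy if you want to reuse it in several places.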