Saturday, 12 September 2015

A bit more on ffplot

This week I spent a few evenings and train journeys developing ffplot. The ffplot README gives the technical details; here I'll talk about the motivation.

I spend a lot of time plotting my data, either to explore it or to create output for papers and presentations. I'm pretty experienced with basic R plotting, and I also use the excellent ggplot2. But I kept finding myself looking up solutions on Stack Overflow and mailing lists, and thinking "but what I want is so simple! Why can't I understand how to do it?"

ffplot is a simple frontend to ggplot2 - basically, a hack, born out of my own stupidity. It doesn't do anything new; it just makes plots easier for the user to think about.

For example, for my honesty paper, I had people from 15 countries who reported either heads or tails in a coin flip experiment. So, I wanted a bar chart of the proportions of people reporting heads in each country, with some confidence intervals.

In ffplot, this looks like:

ffplot(prop(heads) + ci(heads) ~ country, data)

This gets you:

I like this way of thinking, because what you say is what you get.
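
For comparison, a rough sketch of building a similar chart directly in ggplot2 might look like the following. The data frame here is made up, and the normal-approximation interval is an assumption: the post doesn't say how ffplot computes its confidence intervals, and this is not a description of what ffplot does internally.

```r
library(ggplot2)

# Hypothetical stand-in for the survey data: one row per respondent.
set.seed(1)
data <- data.frame(
  country = sample(c("A", "B", "C"), 300, replace = TRUE),
  heads   = rbinom(300, 1, 0.6)
)

# Summarise by hand: proportion reporting heads per country, plus an
# (assumed) normal-approximation 95% confidence interval.
summ <- do.call(rbind, lapply(split(data, data$country), function(d) {
  p  <- mean(d$heads)
  se <- sqrt(p * (1 - p) / nrow(d))
  data.frame(country = d$country[1], prop = p,
             lower = p - 1.96 * se, upper = p + 1.96 * se)
}))

# Bars for the proportions, error bars for the intervals.
plt <- ggplot(summ, aes(x = country, y = prop)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2)
print(plt)
```

The summarising step has to be written out separately before anything is drawn, which is the bookkeeping the one-line formula above hides.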

Another example: my data includes a test of "citizen integrity", as well as a quiz which respondents could cheat on. I'd like to know if there's a relationship between answers to the test, and scores on the quiz. I'd also like to spot patterns in the data.

So, let's plot integrity score and quiz score, and add a smooth mean. Again, what you say to ffplot corresponds to what you want:

ffplot(quizperf + smooth(mean(quizperf)) ~ integ, data)

Hmm, no obvious linear relationship, but people with very low integrity scores either got 0 or full marks in the quiz - probably because they either cheated, or didn't bother with it at all.

Lastly, let's look at the relationship between reporting heads on the coin flip – which gave respondents a cash reward – and scoring high on hard quiz questions. I hope this is positive, as that will support the idea that both the quiz and the coin flip are tapping the same underlying dimension of willingness to lie. In fact, I want to make sure it's positive in every age group.

So, I'll plot a bar chart of proportions reporting heads, with confidence intervals, split up by age. Again, the command corresponds to what you want:

ffplot(bar(prop(heads), fill = "red") + ci(heads) ~ quizp.hard | age, data)

This looks pretty reasonable - top scores on the hard questions are generally associated with more reporting of heads.

There's lots still to do, and this is definitely a hack, based on my own ignorance of ggplot2 – if I were smarter I could get it to work more easily. But I like it when things are simple.

Many more examples, and installation instructions, can be found at the GitHub page for ffplot.