Monday, 13 November 2017

IHYSP: Reuben et al. 2014 on gender stereotypes in maths

The I Hate Your Stupid Paper series returns with this 2014 PNAS paper by Reuben, Sapienza and Zingales. Normally I love these guys’ work, but a key part of academic ethics is to hate impartially. So. Here is the paper, with a few words changed:

Does discrimination contribute to the low percentage of dwarves in the high jump business? We designed an experiment to isolate discrimination’s potential effect. Without provision of information about candidates other than their appearance, those of full height are twice as likely to be hired for a high-jump task as dwarves…. We show that implicit stereotypes (as measured by the Implicit Association Test) predict not only the initial bias in beliefs but also the suboptimal updating of height-related expectations when performance-related information comes from the subjects themselves….  
... it remains important from a policy point of view to determine whether discrimination exists and, if it does, what can be done to reduce it. For this reason, we designed an experiment in which supply-side considerations did not apply (job candidates were chosen randomly and could not opt out), and thus possible differences in preference could not lead to differences in performance quality (and thus qualification).  
We used a laboratory experiment in which subjects were “hired” to perform a jumping task: jumping over as many six-inch poles as possible over a period of 4 min. We chose this task because of the strong evidence that it is performed equally well by dwarves and others. Nevertheless, it belongs to an area – high jumping – about which there is a pervasive stereotype that dwarves have inferior abilities….

Our results revealed a strong bias among subjects to hire tall people for the jumping task...

To clear something up straight away: no, I am not suggesting that women in maths are like dwarves in the high jump. The point of the experiment is to adjudicate whether there is really “unfair” or “irrational” bias against hiring women on the basis of maths competence. This is why the authors don’t just look at hiring rates in the real world. Instead they construct a task on which, by design, men and women perform equally. In the real world – this is the rhetoric – it would be hard to know whether employers are biased against women in science, or just have correct expectations about future performance. But in the lab we can conduct a fair test. Is not hiring women for maths like not hiring dwarves for the high jump? Or is it based on unfair prejudice? The latter, say the authors, because women perform equally well on this task and still get discriminated against. Quoting from the real paper:

The effect of this [i.e. gender] stereotype on the hiring of women has been shown to be important in at least one field experiment. However, that study was unable to rule out the possibility that the decision to hire fewer women is the rational response to the lower effective quality of women’s future performance because of underinvestment by women caused by inferior career prospects or stereotype threat. For this reason, we used a laboratory experiment in which we could ensure there was no quality difference between sexes, because women performed equally well on the task in question, whether or not they were hired.

The problem is fairly obvious. If you are told to hire for a high jump task, you ain’t going to hire dwarves. If you think that women perform worse than men at maths, you won’t hire women for a maths task. This will hold whether your belief is an irrational prejudice, a scientifically validated fact of brain development, a sad but contingent truth of our society, or anything in between. Unless you are certain that the particular maths task is one which men and women do equally well at, you may as well follow your priors. Thus, the experiment doesn’t tell us which world we live in: the prejudice world, or the short-person-high-jump world. All it tells us is that subjects’ own experience of the maths task (they all took part in it, which to be fair is a plus point) was not enough to override their prior beliefs. That is irrational only against the benchmark of a genius who is omniscient about human behaviour.

The experiment could be improved by proving to subjects that men and women perform equally well at the task in question, and then seeing whom they hire. But then it would become uninteresting for a different reason – the very probable null result would, again, be uninformative about what happens in real-world hiring committees.

Summary: lab experiments may yet teach us a lot about gender differences and gender discrimination. But not this one.