Tuesday, 5 December 2017

Scientism and my shrink

Some time ago I started seeing a psychotherapist, a Jungian whom a friend had recommended. My excellent research assistant, a psychology PhD, was surprised and scornful: “You realise that’s not real scientific psychology?”

Jung with pipe

She was right, of course. Jung is taken no more seriously than Freud by modern psychologists. There’s no evidence that Jungian psychology is practically effective either. Until the rise of cognitive behavioural therapy, no school of therapy did better than any other in scientific trials, or even better than just talking to a friend. With apologies to lay people, we can write this down in an equation:

ATE_Jung = E[x | J = 1] - E[x | J = 0] = 0    (1)

where x is mental health, E[x | J = 1] is the expected level of a person’s mental health given a spell of Jungian therapy, and E[x | J = 0] is the expected level of their health after no treatment (or, say, after some more reasonable control, like talking to a friend). ATE is the Average Treatment Effect, the average effect on someone of having a Jungian therapist; equivalently, the average difference between their health after Jungian therapy and after the alternative.

But I stayed with my therapist all the same. My RA was right to be shocked at such an unscientific attitude, no?

Some things about my guy seemed to differentiate him from the average therapist, Jungian or not. He was extremely intelligent, thoughtful and calm, and I’d developed a warm relationship with him. I felt that I’d learned some things about myself and perhaps this was helping with my problems.
Here’s an equation for what matters to me:

TE_i = (x_i | J = 1) - (x_i | J = 0)    (2)

where TE_i is the treatment effect on the psychological health of unit i, the difference between their psychological health after seeing my therapist and after the alternative; here i represents myself. Equation (1) is just the average of equation (2), taken over some appropriate population of patients and therapists. And if it is estimated right, then your best guess of (2) for a random individual and a random therapist is (1); in other words, by scientific standards Jungian therapy is useless.

But of course, I am not a random individual to myself, and my therapist is also not randomly chosen. I know or believe many things about me and him, which may lead me to a different estimate of (2). Some of these will be the data of my own experience, others will be intuitions, or perhaps what I’ve heard from my friend. It’s not obvious how I should deal with the scientific information embodied in equation (1). It is not something I should just ignore, and it certainly comes out of a more careful and objective process than my own scraps of intuition and gossip. But that does not mean those scraps are worthless. Very little of the knowledge we live by day-to-day is scientific, but we get by well enough.
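To make the gap between (1) and (2) concrete, here is a toy sketch in Python. Every number in it is invented, and the "warm_relationship" flag is just a hypothetical stand-in for the kind of private knowledge I described; it is not an estimate of anything.

```python
# Hypothetical treatment effects TE_i for eight patient-therapist pairs.
# Every number here is invented for illustration; none comes from a trial.
pairs = [
    {"warm_relationship": True,  "te": 1.5},
    {"warm_relationship": True,  "te": 0.5},
    {"warm_relationship": True,  "te": 1.0},
    {"warm_relationship": False, "te": -0.5},
    {"warm_relationship": False, "te": -1.0},
    {"warm_relationship": False, "te": 0.0},
    {"warm_relationship": False, "te": -1.0},
    {"warm_relationship": False, "te": -0.5},
]

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

# Equation (1): the average of equation (2) over the whole population.
ate = mean(p["te"] for p in pairs)

# Best guess for someone who knows they belong to the "warm" subgroup.
ate_given_warm = mean(p["te"] for p in pairs if p["warm_relationship"])

print(f"ATE over everyone:            {ate:+.2f}")
print(f"Average given warm relation:  {ate_given_warm:+.2f}")
```

In this made-up population the overall average is zero, yet the average within the subgroup is positive. Whether my own scraps of information really pick out such a subgroup is, of course, exactly the open question.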

These ideas are relevant to the debate on expertise. Here’s Simon Wren-Lewis on expertise:
In reality ignoring expertise means dismissing evidence, ignoring history and experience, and eventually denying straightforward facts.
With respect, this is one-sided, and even arrogant and dangerous [1]. For instance, a person who worries that their job may be taken by a migrant is not proved wrong by even theoretically perfect research showing that immigration on average does not reduce native employment [2]. Yes, people can be misled by xenophobia or biased newspaper reporting. They may also know specific things about their town, or their company, that researchers do not. Those pieces of knowledge will not have been reached by careful scientific experimentation. But decentralized, embodied information about specific particular conditions is, among other things, what makes free markets work [3]. If all knowledge were expert knowledge, socialism would have outrun capitalism.

Another point is specific to social science [4]. Humans live in history, which is a river that you cannot step in twice: conditions are always changing. What we are really interested in is the effect of certain policies in future. But the only data we have is from the present and the past. Statisticians understand the risk of extrapolating from the data – assuming that something’s behaviour will remain predictable in conditions beyond the boundaries of what one has so far observed [5]. Well, if time is a relevant variable, all social science is extrapolation from the past to the future, and sometimes it fails. Relationships that once held cease to do so, perhaps suddenly. To understand such a world, the observer often has to make a choice: gather a respectably-sized sample, perhaps reaching back far into the past [6]; or look at what’s happening now and make a risky but relevant guess. Past averages; or straws in the wind?

This often divides scientists from journalists. Social scientists want to make well-founded generalizations and are trained to pay little regard to journalists’ anecdotes. Journalists can legitimately retort that they have a better instinct for what matters today. Neither side is always right. I haven’t mentioned yet how little we truly know, perhaps how little there is to know, about many vital matters of macro social science. Put it this way: until they are a little better at predicting financial crises, or the short run effects of Brexit, economists will fulminate in vain against journalists who don’t take their other predictions seriously.

The idea that everything to be known must be known by scientific methods has a name: it is called scientism. But scientism is not scientific.

Notes and references

[1] Incidentally, Professor Wren-Lewis gave the choice of Corbyn as Labour leader as an example of ordinary people (Labour members) ignoring expertise. I also used to think that was a bad idea for Labour. Neither of us looks very expert now, do we?
[2] There’s a debate between George Borjas and others [1, 2] on migration, which hinges, among other things, on how much to "borrow strength" between different social groups, so as to predict one group's outcome from another's.
[3] Here is Hayek's classic argument about markets, "The Use of Knowledge in Society". It's short and easy to read.
[4] This is why Professor Wren-Lewis is wrong to argue that ignoring experts on Brexit is "exactly equivalent to giving considerable publicity to a report from some climate change denial outfit". The equivalence is a bit looser than that.
[6] A good example is the very interesting dataset of financial crises collected by Reinhart and Rogoff for their book This Time Is Different. As their subtitle boasts, it reaches back through Eight Centuries of Financial Folly. It was certainly wrong to think the noughties' boom economy was different from any previous period, but it might reasonably be different from the conditions of the fourteenth century.

The “river you cannot step in twice” line comes from the Ancient Greek philosopher Heraclitus, who is said to have remarked that you cannot step in the same river twice.

Monday, 13 November 2017

IHYSP: Reuben et al. 2014 on gender stereotypes in maths

The I Hate Your Stupid Paper series returns for this Reuben, Sapienza and Zingales PNAS paper from 2014. Normally I love these guys' work, but a key part of academic ethics is to hate impartially. So.

Does discrimination contribute to the low percentage of dwarves in the high jump business? We designed an experiment to isolate discrimination’s potential effect. Without provision of information about candidates other than their appearance, those of full height are twice as likely to be hired for a high-jump task as dwarves…. We show that implicit stereotypes (as measured by the Implicit Association Test) predict not only the initial bias in beliefs but also the suboptimal updating of height-related expectations when performance-related information comes from the subjects themselves….  
... it remains important from a policy point of view to determine whether discrimination exists and, if it does, what can be done to reduce it. For this reason, we designed an experiment in which supply-side considerations did not apply (job candidates were chosen randomly and could not opt out), and thus possible differences in preference could not lead to differences in performance quality (and thus qualification).  
We used a laboratory experiment in which subjects were “hired” to perform a jumping task: jumping over as many six inch poles as possible over a period of 4 min. We chose this task because of the strong evidence that it is performed equally well by dwarves and others. Nevertheless, it belongs to an area—high jumping — about which there is a pervasive stereotype that dwarves have inferior abilities….

Our results revealed a strong bias among subjects to hire tall people for the jumping task...

To clear something up straight away: no, I am not suggesting that women in maths are like dwarves in the high jump. The point of the experiment is to adjudicate whether there is really “unfair” or “irrational” bias against hiring women on the basis of maths competence. This is why the authors don’t just look at hiring rates in the real world. Instead they construct a task on which, by design, men and women perform equally. In the real world – this is the rhetoric – it would be hard to know whether employers are biased against women in science, or just have correct expectations about future performance. But in the lab we can conduct a fair test. Is not hiring women for maths like not hiring dwarves for the high jump? Or is it based on unfair prejudice? The latter, because women perform equally well on this task and still get discriminated against. Quoting from the real paper:
The effect of this [i.e. gender] stereotype on the hiring of women has been shown to be important in at least one field experiment. However, that study was unable to rule out the possibility that the decision to hire fewer women is the rational response to the lower effective quality of women’s future performance because of underinvestment by women caused by inferior career prospects or stereotype threat. For this reason, we used a laboratory experiment in which we could ensure there was no quality difference between sexes, because women performed equally well on the task in question, whether or not they were hired.

The problem is fairly obvious. If you are told to hire for a high jump task, you ain’t going to hire dwarves. If you think that men and women don’t perform equally well in maths, you won't hire women for a maths task. This will hold whether your belief is an irrational prejudice, a scientifically-validated fact of brain development, a sad but contingent truth of our society, or anything in between. Unless you are certain that the particular maths task is one which men and women do equally well at, you may as well follow your priors. Thus, the experiment doesn’t tell us which world we live in: the prejudice world, or the short-person-high-jump world. All it tells us is that subjects’ own experience of the maths task (they all took part in it, which to be fair is a plus point) was not enough to override their prior beliefs. That is irrational only against the benchmark of a genius who is omniscient about human behaviour.
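The "follow your priors" step can be put as a one-line calculation. This is only a sketch of the argument above, not anything from the paper: the two parameters are hypothetical, and the groups are left abstract because the logic is the same for high jumpers and for maths tasks.

```python
def expected_gap(prob_task_is_equal, believed_gap):
    """Expected performance gap between two groups, as the employer sees it.

    prob_task_is_equal: the employer's belief that this particular task is
        one where the two groups perform the same (hypothetical parameter).
    believed_gap: the gap the employer expects on tasks where the groups
        do differ (hypothetical parameter, in arbitrary score units).
    """
    return (1 - prob_task_is_equal) * believed_gap

for q in (0.0, 0.5, 0.9, 1.0):
    gap = expected_gap(q, believed_gap=2.0)
    print(f"P(task is equal) = {q:.1f} -> expected gap = {gap:.1f}")
```

The expected gap only reaches zero when the employer is certain the task is an equal one. The calculation is the same whether the belief behind `believed_gap` is a prejudice or a fact, which is why the hiring decision alone cannot tell the two apart.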

The experiment could be improved by proving to subjects that men and women perform equally at the task in question, and then seeing whom they hired. But then it would become uninteresting for a different reason – the very probable null result would, again, be uninformative about what happens in real world hiring committees.

Summary: lab experiments may yet teach us a lot about gender differences and gender discrimination. But not this one.

Tuesday, 24 October 2017

On the proportion of Black students at Oxford

So David Lammy has accused Oxford of not doing enough to recruit black students.

Is this true?

First of all, the 2011 UK census data shows the numbers of black 18 year olds, out of all 18 year olds. It's 57,428 out of 1,460,156, so about 4%. If only 1.5% of Oxford students are black, on the face of it we have a problem.

But whose problem is it? I googled for "A level results by ethnicity" and found this Freedom of Information request. (How cool is it, by the way, that these requests are available on an easy-to-find, functioning website?) The data here is a couple of years out of date, but it's a start.

Let's be realistic and assume you need 3 A grades to get to Oxford. 395,401 pupils took A levels. Of them, 12.5% got 3 A* or A grades. 8,532 of the pupils who took A levels were black. Of them, 4.9% got 3 A* or A grades. That means about 418 black students got these grades, out of about 49,425 students in all: 0.8%.
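If you want to check my arithmetic, here it is as a few lines of Python. The inputs are only the figures quoted in this post (the census counts and the Freedom of Information numbers); the variable names are mine.

```python
# Census figures quoted above: black 18 year olds out of all 18 year olds.
black_18 = 57_428
all_18 = 1_460_156
census_share = black_18 / all_18                 # about 4%

# A level figures quoted above.
takers_all = 395_401
share_all_3a = 0.125      # got 3 A* or A grades, all pupils
takers_black = 8_532
share_black_3a = 0.049    # got 3 A* or A grades, black pupils

qualified_all = takers_all * share_all_3a        # about 49,425
qualified_black = takers_black * share_black_3a  # about 418
qualified_share = qualified_black / qualified_all

print(f"Black share of 18 year olds:       {census_share:.1%}")
print(f"Pupils with 3 A*/A grades:         {qualified_all:,.0f}")
print(f"Black pupils with 3 A*/A grades:   {qualified_black:,.0f}")
print(f"Black share of 3 A*/A pupils:      {qualified_share:.1%}")
```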

From this five-minute analysis, if anything, Oxford is doing rather well in the numbers of black students it admits. The problem is that the UK school system is not turning out enough well-qualified black students.

For any journalists reading, I hope this demonstration shows how easy it is for you to check the claims politicians make. Go and do likewise!

Update: a picture is worth a thousand words.

Wednesday, 20 September 2017

Area genetics

Here is an interesting map of the UK.

The colours relate to the genetics of people born in each county. Specifically, they show you the average Educational Attainment Polygenic Score (EA PS) of residents from within our sample. EA PS is a DNA measure that can be used to predict a person's level of education (e.g. do they leave school at 16, or get a university degree). Red is the worst, pale yellow is the best.

The black outline shows areas of former coalmining. Coal employment has been declining since the 1920s, and by the 1970s, these areas were often socially deprived.
I won't say much more for now!

Tuesday, 19 September 2017

We preregistered an experiment and lived

For my school experiment with Jinnie, we decided to pre-register our analyses. That seemed like the modern and scientifically rigorous thing to do.

There are different preregistration venues for economists. osf.io is very complete and allows you to upload many kinds of resources, then "freeze" them as a public preregistration. aspredicted.org is at the other extreme: it just asks you 9 questions about your project. The AEA also runs a registry for randomized controlled trials at www.socialscienceregistry.org.

For this project, we decided to use osf.io. We were pretty serious. We uploaded not just a description of our plans, but exact computer code for what we wanted to do. Here's our preregistration on osf.io.

This was the first time I have preregistered a project. We ran into a few hurdles:
  • We preregistered too late, after we'd already collected data.
This was pure procrastination and lack of planning on our part. Of course it means that we could have run 100 analyses, then preregistered the analysis that worked.
  • Our preregistered code had bugs.
This was true even though it worked on the fake data we'd used to test it. Luckily we were able to upload a corrected version, but if you have already frozen the files you uploaded, this would be a problem.
  • Our analysis was not the right one.
The data looked odd and our results weren't significant! Now we faced a dilemma. The correct thing to do would be to admit defeat. You preregister, your results are insignificant... go home. However, it was also reasonably clear that we had assumed our dependent variable would look one way (a nice, normally-ish distributed variable), and in fact it looked completely different (huge spikes at certain values, some weird and very influential outliers).

We were sure that statistically, we should do a different analysis. But of course, then we were in the famous garden of forking paths. So we compromised: we changed the approach, but added an appendix that reports our initial analysis and reruns it with some fairly minimal changes (e.g. removing outliers). In fact, even just clustering our standard errors appropriately would give us a significant result, though again, that wasn't in the original plan.
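The worry behind the first hurdle, and behind the garden of forking paths generally, is easy to put a number on. Here is a rough sketch, under the simplifying assumption (mine, and not realistic for analyses of one dataset) that the 100 analyses are independent tests of true null hypotheses at the usual 5% level.

```python
ALPHA = 0.05        # conventional significance threshold
N_ANALYSES = 100    # the "100 analyses" mentioned in the first hurdle

# Chance that at least one of N independent tests of a true null
# comes out "significant" at level ALPHA.
p_any_false_positive = 1 - (1 - ALPHA) ** N_ANALYSES

print(f"P(at least one false positive) = {p_any_false_positive:.3f}")
```

Under those assumptions the chance of finding at least one "result" to preregister is about 0.994. Correlated analyses would lower that figure, but the direction of the problem is the same.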

Bottom line: you are an imperfect researcher. Your initial plan may just be mistaken, and as you think about your project, you may improve on it. Your code may fail. And the data may reveal that your assumptions were wrong. These can raise awkward choices. It is easy to convince yourself that your new analysis, which just happens to get that coveted significance star, is better than your original plan.

Despite these problems, I'm glad we preregistered. This did discipline our analysis. We've tried to keep a clear separation between questions in our analysis plan; and exploratory questions which we thought of later, or which seminar participants suggested to us. For example, we have a result where children are more influential on each other if they have many shared friends. Interesting, and it kind of makes sense among our adolescent subjects, but it is exploratory. So, I'd want to see it replicated elsewhere before being fully persuaded this was a real result. By contrast, I am quite confident in our main result, which follows the spirit though not the letter of our plan.

In many cases, preregistering one's code may be over the top. It's better to state clearly and accurately the specific hypotheses you're going to test. There's no way you can be fully specific, but that's fine – the goal is to reduce your degrees of freedom by a reasonable amount. So, I would probably favour the quick aspredicted.org style, over the more complex osf.io style, unless I was running a really involved project.

I've just preregistered an observational analysis of some genetic data. It's over at aspredicted.org, number 5584. Just waiting for my authors to approve...

Monday, 24 July 2017

New, new working paper just out

I've also got a new working paper out with Jinnie Ooi, my brilliant research assistant and co-author. It's about how norms of fairness spread among teenagers. I'll blog in more detail later, but here is just a nice picture to give a taste of the result:

New old paper just out

A very old paper of mine and Martin Leroch's has just got published in Homo Economicus. (Ungated, older version.) The topic is reciprocity between groups. Here's a quote from the intro, I've bolded the key point:
At first sight it appears straightforward that people take revenge against entire groups, not only against direct individual perpetrators, even in routine social and economic life. For instance, consumers buy fewer products from countries which they see as politically antagonistic (Klein and Ettensoe 1999, Leong et al. 2008). Further, on days after terrorist bombings in Israel, Jewish (Arab) judges become more likely to favor Jewish (Arab) plaintiffs in their decisions, and Israeli Arabs face higher prices for used cars (Shayo and Zussman 2011; Zussman 2012). On a political level, for instance, Keynes (1922) perceived the Treaty of Paris’ devastation of the German economy as an act of revenge, and quoted Thomas Hardy’s play The Dynasts: ‘‘Nought remains/But vindictiveness here amid the strong,/And there amid the weak an impotent rage.’’ In its most extreme case, revenge against groups may trigger violent intergroup conflict. After an argument between an Indian Dalit and an upper caste farmer, upper caste villagers attacked 80 Dalit families (Hoff et al. 2011). In Atlanta, 1906, after newspaper allegations of black attacks on white women, a group of white people rioted, killing 25 black men (Bauerlein 2001). In both cases, innocent people were made to suffer for the real or supposed crimes of others. Many field studies of intergroup violence report similar tit-for-tat processes, with harm to members of one group being avenged by attacks on previously uninvolved coethnics of the original attackers (Horowitz 1985, 2001; Chagnon 1988).

We started thinking about this back in 2009, I just looked up the email:
Reciprocity towards groups; that's a pretty important idea if it holds, right? (Think about wars, racial discrimination; patriotism...) I don't know if there's anything done in the area. But perhaps it's one for another experiment.
As well as seeming important, it turned out there was basically nothing out there in economics, and only a few papers in psychology.

We ran not one but several experiments, polishing the treatment and figuring out "what works". (There are issues of multiple testing here, but I'll ignore that.)

Our final experiment had some interesting results, and we sent it off to a top journal. It was rejected. Then we sent it off to another journal and... it was rejected. And another, and another.... I was annoyed by this because I felt that this was an important topic that nobody had written about! After all, Chen and Li (2009) had got into the AER by doing a basic group identity experiment, the same thing psychologists had done for decades, and adding incentives.

Yeah, I was naive! There are lots of reasons for the paper not doing well, some good, some bad:
  • The design was complex and hard to explain. We spent ages on multiple rewrites of our design section to make it clear what we had done.
  • In addition, the design and methodology weren't perfect - we were both quite inexperienced. There are things I'd do differently. Of course, reviewers picked these up.
  • Our topic fell between stools: it was an economic experiment on a fundamentally political topic. It is a sad reality that interdisciplinary work is not easy to publish.
  • Relatedly: referees and academics are conservative. It is easier to answer a question they already consider important, than to introduce a new question and persuade them it is important. That's probably reasonable. The dominant themes of any literature are dominant for a reason.
  • Chen and Li's AER paper did what I have since learned is important - it created a building block. It deserves its placement. I still think we were out there doing something quite new, but sometimes you have to lead the academic horse to water.
Anyway, for all that, I still think that intergroup dynamics are under-researched, given that they may be involved in the devastating phenomena we touch on in our intro. So, I'm glad it's finally out!

Here's a picture of the basic result, which I'm sure has been up on this blog before. The slope of the solid line shows subjects' "upstream reciprocity" towards a fellow group member of their most recent opponent in a public goods game. The dashed line is the control, showing reciprocity towards someone in a different group.

Friday, 9 June 2017

Whistling in the dark

I remember 1992.

Everyone expected Labour to win and kick out Major. I sat and watched it with a friend from school. I was very Left wing, and in 1992 almost everyone my age (even Etonians) wanted the Tories out.

By 2am, it was clear that Labour was not winning. I took out a tiny, tiny speck of dope that I had left over and ate it in a feeble attempt to get high. Then we went to bed.

Anyway. I need to find some positives in this situation:
  • We will get rid of May, who has shown zero talent and zero charisma. 
  • Corbyn probably will not form a government.
  • If he does, it will be a weak one, and as he has shown zero talent for organization and management – as opposed to his huge talent for campaigning and speechmaking, for which, full respect – it will probably lead to swift disillusionment for the kids who are now out celebrating.
  • The Lib Dems might be able to demand a referendum on PR, and the mood of the country is such that it might vote yes this time.
  • A lot of young people have been enthused by politics. I'm not sure this is a particularly good thing, but at least they will be enjoying themselves.
  • Ruth Davidson has done really well in Scotland. (I've often thought that it would be quite funny, and really wind up the Left, if the Conservative party could have the first Jewish, the first female, the first gay and the first black Prime Ministers.)
  • The SNP are one step further from breaking up my country.
I will try to think about the negatives in the morning. At the moment it is just too grim. Oh, one more:
  • There were some excellent dogs at the polling station where I was a teller.

Why did Corbyn do so well? A little bit of political economy

Let's assume the exit poll is about true, and that Jeremy Corbyn has done even better than the polls thought – and he was already pulling far ahead of what people, including me, expected.

There are lots of things to say about this: failures in polling (again); Theresa May's incompetent campaign and feeble personality; Jeremy Corbyn's quality as a campaigner; the role of the internet.

I think one dog that very importantly did not bark is the Labour manifesto. Remember, Jeremy Corbyn is a passionately ideological Leftwinger. But the manifesto was in many ways rather moderate. It did not, for example, aim to spend much more than the Conservatives. It did not set out to reverse many Conservative welfare cuts.

A classic model in political science explains why parties move to the centre. Suppose the two parties are concerned only to win the election. They will each promise a platform right in the middle of the electorate, at the famous "median voter" - the person in the middle, who has half the electorate to the left of her and half the electorate to the right. Why? Because if one candidate moves to the left of this person, then (at least!) the median voter and everyone to her right vote for the other candidate, who wins a majority. And if one candidate moves to the right, then the median voter and everyone to her left vote for the other candidate, who again wins.

Here's an ASCII art picture. It shows a line representing political preferences from Left to Right. The voters are at x. The median voter is marked with a *, with two voters on her left and two on her right. Voters vote for the party that is closest to them. Both parties will propose a platform at *. Any party that moved left or right would get three voters preferring the other candidate.


A natural response to this is "oh, but politicians are idealists! Or at least, Jeremy is. Jeremy cares about policy, not just getting elected." Well, stop swooning over Jeremy for a second, and suppose that is true. Suppose you are Jeremy Corbyn and deeply want policy to be as left wing as possible. You will still move to the centre. For, if you do not, and lose, then you will get the policies implemented by your right wing opponent.

This seems to be what has happened. Corbyn moderated his manifesto. That made Labour palatable to voters who would never have tolerated Corbyn's own ideal policies.

In a sense, you could say that despite appearances, the ghost of Blair still haunts the Labour party. Even with Corbyn as leader, they are forced to go along with a lot of the consensus of the past forty years.

(Thank God! ... But this is a post about the "horse race", not the outcome.)

The original model of the median voter is the "Downsian" model, made famous by Anthony Downs' An Economic Theory of Democracy (1957); but actually first suggested by Hotelling (1929) "Stability in competition". The point about "idealistic" politicians was first made, I think, by Donald Wittman (1977) "Candidates with policy preferences: A dynamic model" – sorry no ungated version.
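For anyone who wants to play with the logic, here is a minimal Python sketch of both arguments. It assumes five voters at made-up positions on a 0 (Left) to 100 (Right) line, and a rival platform at 60 in the second part. None of these numbers is an estimate of the real electorate, and the function names are my own.

```python
# Hypothetical ideal points on a Left (0) to Right (100) line, one per voter.
VOTERS = [10, 30, 50, 70, 90]
MEDIAN = sorted(VOTERS)[len(VOTERS) // 2]   # 50, with an odd number of voters

def tally(platform_a, platform_b):
    """Votes for A and B when each voter picks the closer platform.
    Exact ties are split half and half."""
    a = b = 0.0
    for v in VOTERS:
        dist_a, dist_b = abs(v - platform_a), abs(v - platform_b)
        if dist_a < dist_b:
            a += 1
        elif dist_b < dist_a:
            b += 1
        else:
            a += 0.5
            b += 0.5
    return a, b

def policy_outcome(platform_a, platform_b):
    """The winner's platform becomes policy (tied elections are not modelled)."""
    a, b = tally(platform_a, platform_b)
    return platform_a if a > b else platform_b

# Office-seeking parties: a party that leaves the median loses 2 votes to 3.
for deviation in (20, 40, 80):
    print(deviation, tally(deviation, MEDIAN))

# A policy-motivated leftwinger (ideal point 0) facing a rival at 60:
# a pure platform at 10 loses, so policy ends up at 60;
# a moderate platform at 50 wins, so policy ends up at 50.
print(policy_outcome(10, 60), policy_outcome(50, 60))
```

The second part is the point about idealists: judged from an ideal point of 0, a policy of 50 is better than a policy of 60, so even a candidate who cares only about policy has a reason to moderate.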