A common technique in analyses of education policies (one popularized in the book Freakonomics) has suffered a setback recently. The technique attempts to correct for endogeneity, which can occur when your dependent variable is causing one of your independent variables rather than simply the other way around.
It’s probably best to explain this with an example. Let’s say you want to know how the number of police officers in a city affects the crime rate. In this example the dependent variable is the crime rate and the independent variable is the number of police officers. That is, you are trying to explain how the size of the police force causes crime rates to be high or low.
The trouble is that the causal arrow also goes in the other direction. The crime rate affects the size of the police force because cities with a lot of crime may decide to hire a lot of police officers. So, the number of police officers is endogenous to the crime rate.
That endogeneity could produce some odd results if we didn't do anything to correct it. We might find that having more police officers is associated with higher crime rates, when in reality a larger police force reduces crime and it is high crime rates that lead cities to hire more officers.
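To see how reverse causality can flip the sign, here is a minimal Python simulation of the police example. Every number in it (the true effect of police on crime, how strongly cities hire in response to crime, the size of the random shocks) is a made-up assumption chosen for illustration, not an estimate from any real data.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var = sum((a - mx) ** 2 for a in x) / n
    return cov / var

def simulate_cities(n, beta=0.5, gamma=0.8, seed=0):
    """Hypothetical simultaneous system (all parameters are illustrative assumptions):
         crime  = 10 - beta * police + u    (more police -> less crime)
         police =  2 + gamma * crime + v    (more crime -> more police hired)
    Both equations are solved jointly so they hold for every simulated city."""
    rng = random.Random(seed)
    police, crime = [], []
    for _ in range(n):
        u = rng.gauss(0, 1.0)  # unobserved shocks to crime
        v = rng.gauss(0, 0.3)  # unobserved shocks to hiring
        p = (2 + gamma * (10 + u) + v) / (1 + gamma * beta)
        police.append(p)
        crime.append(10 - beta * p + u)
    return police, crime

police, crime = simulate_cities(10_000)
naive = ols_slope(police, crime)
print(f"true effect of police on crime: -0.50, naive OLS slope: {naive:.2f}")
```

With these assumed parameters the regression slope comes out positive (around +1) even though the effect built into the simulation is -0.5: cities with bad crime shocks hire more officers, and the regression mistakes that for police causing crime.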
This kind of problem comes up quite often in econometric analyses in general and in evaluations of education policies in particular. So it was a great thing that economists developed a technique, known as instrumental variables, for unravelling these circular relationships and correcting for endogeneity bias, and economists such as the University of Chicago's James Heckman, a Nobel laureate, have worked extensively on when it can be trusted. Basically, the technique uses some exogenous variable (the instrument) to predict the independent variable in a way that is not contaminated by the dependent variable.
Again, it's probably easiest to explain with an example. If we can find something that predicts the number of police officers but has nothing to do with the crime rate, then we can come up with an estimate of the number of police officers that is free of the crime rate's influence. We can then use that estimate of how many police officers there would be (independent of the crime rate) to predict the crime rate. In theory the technique works great.
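The two-step logic above is usually called two-stage least squares. Here is a minimal sketch that reuses a hypothetical police/crime setup and adds a made-up instrument z that shifts police hiring but has no direct effect on crime. That z, like every parameter below, is an assumption standing in for whatever real instrument an analyst might find.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)

def ols_fit(x, y):
    """Return (intercept, slope) from regressing y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

def simulate(n, beta=0.5, gamma=0.8, delta=0.6, seed=1):
    """Hypothetical system (all parameters are illustrative assumptions):
         crime  = 10 - beta * police + u
         police =  2 + gamma * crime + delta * z + v
    z is the instrument: it moves police but never enters the crime equation."""
    rng = random.Random(seed)
    zs, police, crime = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1.0)
        u = rng.gauss(0, 1.0)
        v = rng.gauss(0, 0.3)
        p = (2 + gamma * (10 + u) + delta * z + v) / (1 + gamma * beta)
        zs.append(z)
        police.append(p)
        crime.append(10 - beta * p + u)
    return zs, police, crime

z, police, crime = simulate(10_000)

# Stage 1: predict police using only the instrument.
a, b = ols_fit(z, police)
police_hat = [a + b * zi for zi in z]

# Stage 2: regress crime on the predicted police numbers, which carry no crime shocks.
_, iv_estimate = ols_fit(police_hat, crime)
_, naive_estimate = ols_fit(police, crime)

print(f"naive OLS: {naive_estimate:.2f}  2SLS: {iv_estimate:.2f}  (true: -0.50)")
```

Only the stage-2 point estimate is meaningful here. Standard errors from a literal second-stage regression are wrong, which is why real analyses use dedicated IV routines rather than two separate regressions.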
The tricky part is coming up with a truly exogenous instrument: something that predicts the independent variable but has no relationship with the dependent variable except through that independent variable. The only obviously exogenous instrument is chance itself. An example of that kind of instrument can be found in analyses of how using a voucher affects the achievement of students who actually attend a private school, when the vouchers are awarded by lottery. Those analyses use whether a student won the lottery to predict whether the student attended a private school, and then use that prediction to estimate the effect of private schooling on student achievement.
Whether a student won the lottery is purely a matter of chance, so it is unrelated to student achievement except through its effect on where the student goes to school, yet it does predict whether a student attends a private school. It is a perfectly exogenous instrument.
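With a single yes/no instrument like a lottery, the two stages collapse into a simple ratio, often called the Wald estimator: the difference in average achievement between lottery winners and losers, divided by the difference in their private-school attendance rates. A minimal sketch with simulated, hypothetical students; the attendance rates, the hidden "motivation" trait, and the 5-point true effect are all assumptions, and the effect is assumed to be the same for every student.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate_lottery(n, true_effect=5.0, seed=2):
    """Hypothetical voucher lottery. Returns (won, attended, score) lists."""
    rng = random.Random(seed)
    won, attended, score = [], [], []
    for _ in range(n):
        w = rng.random() < 0.5          # lottery outcome: pure chance
        motivated = rng.random() < 0.5  # hidden trait the analyst never observes
        # Winners usually use the voucher; motivated families are more likely
        # to end up in private school either way.
        p_attend = (0.7 if w else 0.05) + (0.2 if motivated else 0.0)
        a = rng.random() < p_attend
        s = 50 + true_effect * a + 10 * motivated + rng.gauss(0, 10)
        won.append(w)
        attended.append(a)
        score.append(s)
    return won, attended, score

def group_mean(values, flags, keep):
    """Mean of values for observations whose flag equals keep."""
    return mean([v for v, f in zip(values, flags) if f == keep])

won, attended, score = simulate_lottery(40_000)

# Naive comparison: private-school students vs. everyone else (picks up motivation).
naive = group_mean(score, attended, True) - group_mean(score, attended, False)

# Wald/IV estimate: effect of winning on scores, scaled by effect of winning on attendance.
itt = group_mean(score, won, True) - group_mean(score, won, False)
first_stage = group_mean(attended, won, True) - group_mean(attended, won, False)
wald = itt / first_stage

print(f"naive: {naive:.2f}  lottery IV: {wald:.2f}  (true: 5.00)")
```

Because winning is random, motivation is balanced across winners and losers and drops out of the ratio; the naive comparison of attenders and non-attenders absorbs it instead and overstates the effect.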
The problem is that, other than lotteries, it isn't always clear that the instruments used are truly exogenous. Even if we can't think of how an instrument might be related to the outcome, it may well be.
A perfect example of this, and one that raises questions about how exogenous any instrument other than a lottery truly is, was recently described in the Wall Street Journal: date of birth. The time of year when babies are born has long been thought to be essentially random and has been used as an exogenous instrument in a variety of important analyses, including a seminal 1991 paper by Josh Angrist and Alan Krueger on the effects of educational attainment on later life outcomes.
Because state compulsory education laws require students to stay in school until a certain age, babies born earlier in the year reach that age at a lower grade and can drop out having attained less education. By comparing people born earlier in the year with those born later, a difference they believed should have nothing else to do with later life outcomes, Angrist and Krueger were able to make claims about how staying in school longer affected income and other outcomes.
But new work by Kasey Buckles and Daniel Hungerman at the University of Notre Dame suggests that the month and day of birth is not really exogenous to life outcomes. As it turns out, babies born in January are more likely than babies born later in the year to have mothers who are unmarried, less educated, and low income. The difference is not huge, but it is statistically significant. And since birth date is not exogenous, some or all of the effect of attainment that Angrist and Krueger observed may reflect this relationship between date of birth and socioeconomic status (SES) rather than attainment itself.
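The Buckles and Hungerman finding amounts to a violation of what econometricians call the exclusion restriction: the instrument affects the outcome through a second channel besides the independent variable. A minimal sketch of why that matters, using entirely hypothetical numbers (not estimates from Angrist and Krueger or from Buckles and Hungerman): being born in the first quarter lowers schooling a little, first-quarter births are somewhat more likely to have low-SES mothers, and low SES lowers earnings directly. The same birth-quarter IV estimate is computed with and without that SES gap.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def birth_quarter_iv(n, ses_gap, true_return=0.08, seed=3):
    """Wald/IV estimate of the return to schooling, using 'born in Q1' as the instrument.
    ses_gap: hypothetical extra probability that a Q1 baby has a low-SES mother."""
    rng = random.Random(seed)
    q1_school, q1_wage, other_school, other_wage = [], [], [], []
    for _ in range(n):
        q1 = rng.random() < 0.25  # born January-March
        low_ses = rng.random() < 0.25 + (ses_gap if q1 else 0.0)
        # Compulsory schooling channel: Q1 births can leave school with less education.
        school = 12 - 0.3 * q1 - 1.0 * low_ses + rng.gauss(0, 1.5)
        # Log wage depends on schooling AND directly on SES (the second channel).
        log_wage = 1.0 + true_return * school - 0.3 * low_ses + rng.gauss(0, 0.2)
        if q1:
            q1_school.append(school)
            q1_wage.append(log_wage)
        else:
            other_school.append(school)
            other_wage.append(log_wage)
    d_wage = mean(q1_wage) - mean(other_wage)
    d_school = mean(q1_school) - mean(other_school)
    return d_wage / d_school

clean = birth_quarter_iv(100_000, ses_gap=0.0)
contaminated = birth_quarter_iv(100_000, ses_gap=0.10)
print(f"true return: 0.08  valid instrument: {clean:.3f}  SES-linked instrument: {contaminated:.3f}")
```

In this made-up setup the SES channel roughly doubles the estimated return to schooling, even though the birth-quarter gap in low-SES mothers is only ten percentage points. How large any such bias is in real data is an empirical question the sketch does not answer.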
And if birth date is not random when we all assumed it was, what other instruments in these analyses are also not truly exogenous in ways we just haven't discovered yet? It's a potentially serious problem for these analyses.