Simpson’s Paradox — D’oh!

When it is pointed out that NAEP scores for 17-year-olds or graduation rates have remained flat for roughly three decades despite a doubling in per-pupil spending (adjusted for inflation), I always brace myself for the Simpson’s Paradox response.  I particularly brace for it because its most active (and grating) purveyor is Gerald Bracey.  D’oh!

As Bracey explains it, “Simpson’s Paradox occurs whenever the whole group shows one pattern but subgroups show a different pattern.”  Test scores may rise over time for every ethnic/racial subgroup but the overall average may still decline or remain flat.  “The explanation lies,” Bracey argues, “in the changing makeup of the test taking groups. At Time 1, only 20% of the test takers were minorities. At Time 2, they make up 40% of the group. Their scores are improving, but they are still lower than whites’ so as they become a larger and larger proportion of the total sample of test takers, their improving-but-lower test scores attenuate the overall average or, in this case, actually cause it to fall.”
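
To make the arithmetic in Bracey’s account concrete, here is a minimal sketch in Python. The 20% and 40% minority shares come from his description; the scores themselves are invented purely for illustration and are not NAEP results, and the function and variable names are hypothetical.

```python
# Hypothetical scores chosen only to illustrate the arithmetic; NOT NAEP data.
# The 20% and 40% minority shares follow Bracey's description.

def overall_average(groups):
    """Share-weighted mean score across subgroups: sum of share * score."""
    return sum(share * score for share, score in groups.values())

# Each entry is (share of test takers, average score).
time1 = {"white": (0.80, 300.0), "minority": (0.20, 260.0)}
time2 = {"white": (0.60, 302.0), "minority": (0.40, 264.0)}

avg1 = overall_average(time1)
avg2 = overall_average(time2)

# Hold the Time 1 mix fixed and apply Time 2 scores, to separate
# score gains from the change in composition.
fixed_mix_avg2 = sum(time1[g][0] * time2[g][1] for g in time1)

print(f"Overall average, Time 1: {avg1:.1f}")
print(f"Overall average, Time 2: {avg2:.1f}")
print(f"Time 2 scores with Time 1 mix: {fixed_mix_avg2:.1f}")
```

With these made-up numbers, both subgroups gain (300 to 302 and 260 to 264) while the overall average drops from 292.0 to 286.8. Holding the Time 1 mix fixed, the average would have been 294.4, which is the sense in which the changing makeup, rather than falling scores, drives the decline.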

On the surface this story sounds very appealing.  Even sensible-sounding people like JPGB commenter Parry repeat the argument.  But on closer examination, Simpson’s Paradox does not explain away the frustrating lack of education productivity over the last few decades.

If we want to know whether we are receiving returns on our enormous additional investment in education, we want to see progress in the overall picture.  It would provide us with little comfort to see that our investments benefited some students but did not produce an aggregate gain, unless holding steady was actually a victory in the face of students who are significantly more difficult to educate.

And that is the unstated argument behind the use of Simpson’s Paradox to explain the lack of educational progress: minority students are more difficult to educate and we have more of them, so holding steady is really a gain.

The problem with this is that it only considers one dimension by which students may be more or less difficult to educate — race.  And it assumes that race has the same educational implications over time.  Unless one believes that minority students are more challenging because they are genetically different, which I do not imagine Bracey or Parry believe, we have to think about race/ethnicity differently over time as the host of social and economic factors that race represents changes.  Being African-American in 1975 is very different from being African-American in 2008.  (Was a black president even imaginable back then?)  So, the challenges associated with educating minority students three decades ago were almost certainly different from the challenges today.

If we want to see whether students are more difficult to educate over time, we’d have to consider more than just how many minority students we have.  We’d have to consider a large set of social and economic variables, many of which are associated with race.  Greg Forster and I did this in a report for the Manhattan Institute in which we tracked changes in 16 variables that are generally held to be related to the challenges that students bring to school.  We found that 10 of those 16 factors have improved, so that we would expect students generally to be less difficult to educate.  For example, we observed that students are significantly more likely to attend pre-school and come to the K-12 system with greater academic preparation.  Expansions in higher educational opportunities have significantly improved the average level of parental education, which should contribute to student readiness for K-12.  Median family incomes (adjusted for inflation) have improved and a smaller percentage of children live in poverty.  Children are more likely to come to school with better health and there are fewer teen moms.

Yes, some factors have made things more difficult.  There are more students from homes in which English is not the first language and more children in single-parent households.

And yes, there are more minority students, but those minority students have better incomes, better educated parents, more pre-school, and lower rates of crime in their communities.  Unless one wants to make a genetic argument, it is obviously misleading to say that students in general are more difficult to educate because there are more minority students.

But that is exactly what the purveyors of Simpson’s Paradox are doing.  They focus only on race and act as if it were an immutable influence on academic performance.  Many things have changed over the last few decades and most of them tend to make students better prepared for K-12 school.  Even if you are not completely persuaded by the report that Greg and I produced (and we make no claim to having a definitive analysis), it would be very difficult to argue that students have become twice as difficult to educate, which is what it would take to completely offset the doubling in resources we have devoted to their education.  Any reasonable examination of the evidence suggests that we have suffered from a serious decline in educational productivity, where we buy significantly less achievement for each additional dollar spent.
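
The productivity claim comes down to simple arithmetic, sketched below in Python with index values rather than real data. The variable names and the idea of a single “difficulty multiplier” are assumptions made only for illustration; the sketch takes the post’s premises (spending doubled in real terms, outcomes flat) as given.

```python
# Illustrative index values, NOT measured data: real per-pupil spending
# doubles while the achievement measure stays flat, as the post describes.
spending_then, spending_now = 1.0, 2.0
score_then, score_now = 1.0, 1.0

# Achievement bought per unit of spending in each period.
productivity_then = score_then / spending_then
productivity_now = score_now / spending_now
productivity_change = productivity_now / productivity_then - 1

# Hypothetical "difficulty multiplier": how much harder students would have
# to be to educate for flat scores to mean unchanged productivity.
required_difficulty_multiplier = (spending_now / spending_then) / (score_now / score_then)

print(f"Change in achievement per dollar: {productivity_change:.0%}")
print(f"Difficulty multiplier needed to offset it: {required_difficulty_multiplier:.1f}x")
```

Under these premises, achievement per dollar falls by half, and students would need to be twice as difficult to educate for that to count as holding steady.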

11 Responses to Simpson’s Paradox — D’oh!

  1. Matthewladner says:

    The solution to the “Simpson’s Paradox” is simply to have more than marginal gains by subgroups. That’s not too much to ask, is it?

  2. Parry says:

    I get my own post? Awesome!

    I don’t think I disagree with the premise of your central contention: increased investments in education don’t appear to have gotten us commensurate returns. (I know I’m couching my opinion in circumspect language, but that’s because I’m not as convinced as you are.)

    But I do still think you’re being a little sneaky with the math. You said “It would provide us with little comfort to see that our investments benefited some students but did not produce an aggregate gain.” You’re defining “aggregate gain” as improvements in the average NAEP test scores of the overall population. But you could define aggregate gain differently. You could instead take a variety of student sub-groups (I used racial sub-groups because they’re the ones that get the most attention, but you could also use different sub-groups) and calculate the percentage increases or decreases of their scores on the NAEP each year. Then you could add up all of those gains and losses and come up with a composite score (weighting for population size each year). That would represent an aggregate gain (or loss, depending on the way the math went), but would be different from overall population average. In fact, that’s kinda the approach that you used for your teachability index.

    Speaking of the teachability index, it would be interesting to take this approach using your teachability index as the lens. Look at each of the sub-groups created by your index (for example, students participating in preschool, students not participating in preschool, teen moms, teen non-moms, etc.), look at the NAEP scores for each of those sub-groups, and then see if there was a net gain or loss over time (not by averaging all the scores, but by adding up all of the net gains or losses, with weighting for sub-group size).

    And by the way, thanks for saying that I sound sensible. You clearly don’t know me well enough yet.


  3. Hey Parry,

    For you I’d write two blog posts. Honestly, I was concerned that you were being snookered by Simpson’s Paradox, so I thought I would address all of the sensible people out there like you who might be attracted to the paradox without really thinking it through.

    In the end I think we are mostly in agreement. We both agree that students have not become twice as difficult to educate, which is what it would take to offset the doubling in per-pupil spending.

    As to your specific suggestion of taking the weighted average of the percentage gain for each subgroup — I believe that is the mathematical equivalent of the aggregate average, if I understand you correctly.

    And I tried to keep my language circumspect as well. I’m not completely sold on the idea that kids are 8% easier to educate, but I am pretty convinced that they are not significantly more difficult to educate.

    I look forward to getting to know your less sensible side in the future. : )

  4. pm says:

    You seem to be expecting a linear relationship between inputs and outputs. For example, I can successively come up with $100, $1000, and $10000, but I still can’t afford that new car. What are your grounds for thinking that twice as much money should make any difference?

  5. If doubling per-pupil spending hasn’t done the trick, what makes us think that tripling or quadrupling (without fundamentally altering the incentives in the system) would produce any better results? Unless we are guided by faith on this matter, we need some evidence to suggest that the lack of resources is the major problem in K-12 education.

  6. pm says:

    I don’t have to do a study to know that doubling spending on schools hasn’t doubled the length of the school day or the school year. Yet every glimmer of hope that I see in education, whether charters or fanatic teachers, seems to rely on spending more time in the classroom. Combine this with the marginal utility of time, and one would expect a non-linear relationship between teacher salaries and time spent on the job. And since teachers’ salaries are the dominant cost in education, I would expect that I’d have to more than double costs to double outputs. But hey, teachers may be willing to give us a bargain 🙂

  7. Patrick says:

    I’ll sum up the best of all possible responses by quoting the great American sage, Homer Simpson,

    “Facts are meaningless; they can be used to prove anything.”

  8. Parry says:

    For me, the more interesting questions are: Where did the extra money go? Why hasn’t increased spending resulted in higher levels of improvement than have occurred? What changes might lead to greater-than-marginal improvements?

    Simple questions, no?


  9. When the measurement tool hasn’t registered a change in 30 years, is it a function of the condition being measured or the tool being used?

    Which is my fancy way of saying that the NAEP is ass.

  10. I see. And are flat grad rates also bogus because they haven’t changed in 30 years? Is the standard for judging whether a measure is bogus that it shows flat results?

  11. Larry Bernstein says:


    You seem to focus on the implications of the changing composition of minorities. What if we looked at just whites? Would we see any progress? Did spending go up 2x for white students as well?
