Over the last few years I have developed a deeper skepticism about relying on test scores for accountability purposes. I think tests have very limited potential to guide distant policymakers, regulators, portfolio managers, foundation officials, and other policy elites in identifying with confidence which schools are good or bad, which ought to be opened, expanded, or closed, and which programs are working or failing. The problem, as I’ve pointed out in several pieces now, is that in using tests for these purposes we are assuming that if we can change test scores, we will change later outcomes in life. We don’t really care about test scores per se; we care about them because we think they are near-term proxies for later life outcomes that we really do care about, like graduating from high school, going to college, getting a job, earning a good living, and staying out of jail.
But what if changing test scores does not regularly correspond with changing life outcomes? What if schools can do things to change scores without actually changing lives? What evidence do we actually have to support the assumption that changing test scores is a reliable indicator of changing later life outcomes?
This concern is similar to issues that have arisen in other fields about the reliability of near-term indicators as proxies for later outcomes. For example, as one of my colleagues noted to me, there are medicines that lower cholesterol levels but do not reduce, and may even increase, mortality from heart disease. It’s important that we think carefully about whether we are making the same type of mistake in education.
If increasing test scores is a good indicator of improving later life outcomes, then in most rigorously identified studies the changes in scores and the changes in later outcomes should have roughly the same direction and magnitude. They do not. I’m not saying we never see a connection between changing test scores and changing later life outcomes (e.g., Chetty et al.); I’m just saying that we do not regularly see that relationship. For an indicator to be reliable, it should yield accurate predictions nearly all, or at least most, of the time.
To illustrate the unreliability of test score changes, I’m going to focus on rigorously identified research on school choice programs for which we have later life outcomes. We could find plenty of examples of this disconnect in other policy interventions, such as pre-school programs, but I am focusing on school choice because I know this literature best. The fact that we can find a disconnect between test score changes and later life outcomes in any literature, let alone in several, should undermine our confidence in test scores as a reliable indicator.
I should also emphasize that by looking at rigorous research I am rigging things in favor of test scores. If we explored the most common use of test scores, examining the level of proficiency, we would find no credible researchers who believe that is a reliable indicator of school or program quality. Even measures of growth in test scores, or value-added models (VAM), are not rigorously identified indicators of school or program quality, because they do not reveal what the growth would have been in the absence of that school or program. So I think almost every credible researcher would agree that the vast majority of ways in which policymakers, regulators, portfolio managers, foundation officials, and other policy elites use test scores cannot reliably indicate whether schools or programs improve later life outcomes.
With the evidence below I am exploring the largely imaginary scenario in which test score changes can be attributed to schools or programs with confidence. Even then, the direction and magnitude of changes in test scores do not regularly correspond with changes in later life outcomes. I’ve identified 10 rigorously designed studies of charter and private school choice programs with later life outcomes. I’ve listed them below with a brief description of their findings and hyperlinks so you can read the results for yourself.
Notice any patterns? Other than the general disconnect between test scores and later life outcomes (in both directions), I notice that the No Excuses charter model, which is currently the darling of the ed reform movement and which New York Times columnists have declared the only type of “Schools that Work,” tends not to fare nearly as well on later outcomes as it does on test scores. Meanwhile, the unfashionable private choice schools and Mom and Pop charters seem to do much better at improving later life outcomes than at changing test scores. I don’t highlight this pattern as proof that we should shy away from No Excuses charters. I mention it only to suggest how over-relying on test scores and declaring with confidence that we know what works and what doesn’t can lead to big policy mistakes.
Here are the 10 studies:
- Boston charters (Angrist et al., 2014) – Huge test score gains, no increase in HS grad rate or postsecondary attendance; a shift from 2-year to 4-year institutions
- Harlem Promise Academy (Dobbie and Fryer, 2014) – Same as Boston charters
- KIPP (Tuttle et al., 2015) – Large test score gains, no or small effect on HS grad rate, depending on the analysis used
- High Tech High (Beauregard, 2015) – Widely praised for improving test scores, no increase in college enrollment
- SEED Boarding Charter (Unterman et al., 2016) – Same as Boston charters
- TX No Excuses charters (Dobbie and Fryer, 2016) – Increased test scores and college enrollment, but no effect on earnings
- Florida charters (Booker et al., 2014) – No test score gains but large increase in HS grad rate, college attendance, and earnings
- DC vouchers (Wolf et al., 2013) – Little or no test score gain but large increase in HS grad rate
- Milwaukee vouchers (Cowen et al., 2013) – Same as DC vouchers
- New York vouchers (Chingos and Peterson, 2013) – Modest test score gain, larger improvement in college enrollment
I’ve said this before, but will repeat it:
Our middle school has never had jaw-droppingly high test scores. Relative to similar schools (demographics, etc.) we’re good, but on an absolute scale we aren’t.
While we have never tracked the results officially, anecdotally the later outcomes of our students (HS graduation, college and career participation) align with the FL, DC, NYC, and Milwaukee findings. I’d guesstimate our HS graduation rate at 90%, and the share of students either attending college or employed is close to that.
Yes, good inquiry here.
We need more of these long-term follow-up studies, particularly ones that reach age 25+ and show what’s happening in the labor market, etc.
Small note: High Tech High is almost the opposite of a “no excuses” school. HTH is project-based, group work, personalized, anti-standardized test, etc. I wasn’t sure whether you were putting it in the NE category or just listing it as a type of charter.
Test scores seem to predict other test scores but not much else when controlling for other variables. It’s nice to know students’ academic proficiency and growth from one year to the next, but why base most of our accountability systems on such scores if they don’t predict life outcomes? I wonder which districts or charter schools have developed an alternative accountability system with measures derived from their state longitudinal data systems (SLDS; the U.S. Dept. of Ed has invested $750M in these since 2006). Or are the SLDS data still languishing in data warehouses?
Jay, I sure hope you get a grad student who wants an easy publication to write this up as a paper. It needs to get into the “citeable” bloodstream.
It’s in the works but they take forever.
In other news, this post needed a catchy picture.
Doesn’t it depend a bit on the test, though, Jay? And perhaps on the point that whoever is doing the research wants to make? I’m not as familiar with all the data in the learned papers as you are, but as a parent of fairly recent college grads, it’s hard to ignore the empirical evidence garnered from watching their peer groups. The kids who got the better SAT scores and passed the AP exams generally got into higher-profile colleges and emerged to either more interesting or better-remunerated careers. Don’t you think you would find a correlation between SAT scores, or certainly performance on STEM APs, and the starting salary of the first post-college job? If that’s the case, shouldn’t that inform policy, so that it might be a great idea for a governor or a secretary of Ed to set goals in those categories (e.g., “C’mon folks, let’s get half the kids to pass AP Calculus!”)?
There is no doubt that the LEVEL of test results is strongly correlated with later life outcomes. The question is whether CHANGING test scores produces roughly commensurate changes in later life outcomes.
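The distinction between levels and changes can be illustrated with a toy simulation. This is a hypothetical sketch, not a model drawn from any of the studies listed above: it assumes an unobserved factor (called `background` here) drives both test scores and later outcomes, and that a randomly assigned, hypothetical program raises test scores without touching that factor. Under those assumptions, score levels correlate strongly with outcomes across students, yet the program's score gain produces no outcome gain.

```python
import random
from statistics import fmean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (population formulas)."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


def simulate(n=10_000, boost=0.5, seed=0):
    """Toy model (an assumption, not real data): background drives both
    scores and outcomes; a hypothetical program moves scores only."""
    rng = random.Random(seed)
    # Hypothetical unobserved factor (family background, motivation, etc.).
    background = [rng.gauss(0, 1) for _ in range(n)]
    # Baseline scores and later outcomes both depend on it, plus noise.
    scores = [b + rng.gauss(0, 0.5) for b in background]
    outcomes = [b + rng.gauss(0, 0.5) for b in background]

    # LEVELS: cross-sectional correlation between scores and outcomes.
    level_corr = pearson(scores, outcomes)

    # CHANGES: random assignment; treated students' scores rise by `boost`,
    # but by assumption their later outcomes are left unchanged.
    treated = [rng.random() < 0.5 for _ in range(n)]
    post_scores = [s + boost if t else s for s, t in zip(scores, treated)]

    def gap(values):
        """Treated-minus-control difference in means."""
        t = [v for v, flag in zip(values, treated) if flag]
        c = [v for v, flag in zip(values, treated) if not flag]
        return fmean(t) - fmean(c)

    return level_corr, gap(post_scores), gap(outcomes)


level_corr, score_effect, outcome_effect = simulate()
print(f"correlation of score levels with outcomes: {level_corr:.2f}")
print(f"program effect on scores:   {score_effect:+.2f}")
print(f"program effect on outcomes: {outcome_effect:+.2f}")
```

Under this model the level correlation should land near 0.8 (its expected value is 1/1.25), the score effect near the assumed `boost` of 0.5, and the outcome effect near zero. Having the hypothetical program shift `outcomes` instead of `scores` would produce the reverse pattern, with outcome gains and no score gains.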
Maybe I’m missing something? If you change the test scores for the better, doesn’t that, by definition, improve the level and therefore the outcome? Seems to me that we spend billions searching for some magic curriculum fix, or grading cut-point terminology, that will effortlessly improve results, while ignoring the evidence staring us in the face: all the foreign systems that outperform us place the burden for performance squarely on the shoulders of the students in the form of very high-stakes testing (don’t pass your A-levels, Baccalaureate, Abitur, Board Exams, etc., and you don’t go to college). Foreign students prioritize differently from our kids; that’s why they do better. Foreign systems are set up accordingly.
[…] it is worth noting that there is a general disconnect between test scores and later life outcomes. It is highly reductionistic to measure the success of […]
[…] Greene argues. Greene, who heads the Department of Education Reform at the University of Arkansas, has written that test scores don’t capture long-term benefits from schools like graduation rates and […]
[…] rigorously conducted studies of charter schools, including those of the Harlem Promise Academy, KIPP, High Tech High, SEED boarding charter schools, and no excuses charters in Texas. While of course […]
[…] a copy of Jay Greene’s article, “Evidence for the Disconnect Between Changing Test Scores and Changing Later Life Outcomes.” […]
[…] in students’ life outcomes? University of Arkansas professor Jay Greene has written about the disconnect between test scores and lifelong outcomes. Greene […]
[…] 13. See http://educationnext.org/mostly-care-test-scores-private-school-choice-not/ and also https://jaypgreene.com/2016/11/05/evidence-for-the-disconnect-between-changing-test-scores-and-chang…. […]
[…] choice argue that requiring a state exam may drive away effective schools, and that test scores are poor measures of school […]
[…] late 2016, Jay P. Greene produced a short and brilliant paper that challenged that assumption. I have fallen into the habit of asking myself whether the young people who are super-stars in many […]
[…] Ravitch exhumed a blog post by Jay P. Greene, a charter school advocate, who begrudgingly acknowledged that there was no […]
We briefly met and spoke at the “Failures to Fixes” Show Me Institute conference in KC last May.
Using the results of onto-epistemologically challenged standardized tests for anything is, as Noel Wilson states, “vain and illusory.” To understand why he says so, I suggest that you read his 1997 dissertation, which totally destroys the concepts of educational standards and standardized testing. See: “Educational Standards and the Problem of Error” found at:
or for a shorter version on the invalidities involved in standardized testing read his “A Little Less than Valid: An Essay Review”
I sent you a copy of my book “Infidelity to Truth: Education Malpractice in American Public Education”* last summer. If you did not receive it please email me with your address and I’ll send you another. My email: email@example.com
*In it I discuss the purpose of American public education and of government in general, issues of truth in discourse, justice and ethics in teaching practices, the abuse and misuse of the terms standards and measurement which serve to provide an unwarranted pseudo-scientific validity/sheen to the standards and testing regime and how the inherent discrimination in that regime should be adjudicated to be unconstitutional state discrimination no different than discrimination via race, gender, disability, etc. . . .
[…] is not the first reform outfit to question the BS Tests’ value. Jay Greene was beating this drum a year and a half […]
[…] Jay Greene (no relation), head of the Department of Education Reform at the University of Arkansas, was writing about the disconnect in test scores— if test scores were going up, wasn’t that supposed to improve “life […]
[…] get to all the studies that show that the measure of choice– raised test scores– is bunk. We should believe that this is a myth because AFC says […]