Evidence for the Disconnect Between Changing Test Scores and Changing Later Life Outcomes

Over the last few years I have developed a deeper skepticism about relying on test scores for accountability purposes. I think tests have very limited potential in guiding distant policymakers, regulators, portfolio managers, foundation officials, and other policy elites in identifying with confidence which schools are good or bad, which ought to be opened, expanded, or closed, and which programs are working or failing. The problem, as I’ve pointed out in several pieces now, is that in using tests for these purposes we are assuming that if we can change test scores, we will change later outcomes in life. We don’t really care about test scores per se; we care about them because we think they are near-term proxies for later life outcomes that we really do care about, like graduating from high school, going to college, getting a job, earning a good living, and staying out of jail.

But what if changing test scores does not regularly correspond with changing life outcomes? What if schools can do things to change scores without actually changing lives? What evidence do we actually have to support the assumption that changing test scores is a reliable indicator of changing later life outcomes?

This concern is similar to issues that have arisen in other fields about the reliability of near-term indicators as proxies for later life outcomes. For example, as one of my colleagues noted to me, there are medicines that lower cholesterol levels but do not reduce, and may even increase, mortality from heart disease. It’s important that we think carefully about whether we are making the same type of mistake in education.

If increasing test scores is a good indicator of improving later life outcomes, changes in scores and changes in later outcomes should have roughly the same direction and magnitude in most rigorously identified studies. They do not. I’m not saying we never see a connection between changing test scores and changing later life outcomes (e.g., Chetty et al.); I’m just saying that we do not regularly see that relationship. For an indicator to be reliable, it should yield accurate predictions nearly all, or at least most, of the time.

To illustrate the unreliability of test score changes, I’m going to focus on rigorously identified research on school choice programs where we have later life outcomes. We could find plenty of examples of this disconnect from other policy interventions, such as preschool programs, but I am focusing on school choice because I know this literature best. The fact that we can find a disconnect between test score changes and later life outcomes in any literature, let alone in several, should undermine our confidence in test scores as a reliable indicator.

I should also emphasize that by looking at rigorous research I am rigging things in favor of test scores. Take the most common use of test scores, examining the level of proficiency: no credible researcher believes that is a reliable indicator of school or program quality. Even measures of growth in test scores, or VAM, are not rigorously identified indicators of school or program quality, as they do not reveal what the growth would have been in the absence of that school or program. So, I think almost every credible researcher would agree that the vast majority of ways in which test scores are used by policymakers, regulators, portfolio managers, foundation officials, and other policy elites cannot be reliable indicators of the ability of schools or programs to improve later life outcomes.

With the evidence below I am exploring the largely imaginary scenario in which test score changes can be attributed to schools or programs with confidence. Even then, the direction and magnitude of changing test scores do not regularly correspond with changing later life outcomes. I’ve identified 10 rigorously designed studies of charter and private school choice programs with later life outcomes. I’ve listed them below with a brief description of their findings and hyperlinks so you can read the results for yourself.

Notice any patterns? Other than the general disconnect between test scores and later life outcomes (in both directions), I notice that No Excuses charters, the model that is currently the darling of the ed reform movement and that New York Times columnists have declared the only type of “Schools that Work,” tend not to fare nearly as well on later outcomes as they do on test scores. Meanwhile the unfashionable private choice schools and Mom and Pop charters seem to do much better at improving later life outcomes than at changing test scores. I don’t highlight this pattern as proof that we should shy away from No Excuses charters. I only mention it to suggest ways in which over-relying on test scores and declaring with confidence that we know what works and what doesn’t can lead to big policy mistakes.

Here are the 10 studies:

Boston charters (Angrist et al., 2014) – Huge test score gains, no increase in HS grad rate or postsecondary attendance. Shift from 2-year to 4-year colleges
Harlem Promise Academy (Dobbie and Fryer, 2014) – Same as Boston charters
KIPP (Tuttle et al., 2015) – Large test score gains, no or small effect on HS grad rate, depending on analysis used
High Tech High (Beauregard, 2015) – Widely praised for improving test scores, no increase in college enrollment
SEED Boarding Charter (Unterman et al., 2016) – Same as Boston charters
TX No Excuses charters (Dobbie and Fryer, 2016) – Increase in test scores and college enrollment, but no effect on earnings
Florida charters (Booker et al., 2014) – No test score gains but large increase in HS grad rate, college attendance, and earnings
DC vouchers (Wolf et al., 2013) – Little or no test score gain but large increase in HS grad rate
Milwaukee vouchers (Cowen et al., 2013) – Same as DC vouchers
New York vouchers (Chingos and Peterson, 2013) – Modest test score gain, larger college enrollment improvement
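
For readers who want to tally the list rather than eyeball it, here is a rough sketch in Python. The coding is an assumption made for illustration: each study is hand-coded from its one-line summary above (not from the study itself) as "gain", "none", or "mixed" on test scores and on later outcomes. The names STUDIES and tally are hypothetical, and the coarse codes throw away magnitude entirely.

```python
from collections import Counter

# Hand-coded from the one-line summaries in this post (an assumption, not
# taken from the underlying studies). Each entry is
# (study, test-score effect, later-outcome effect).
# Judgment calls: Boston's shift from 2-year to 4-year colleges is not
# counted as a gain; KIPP's "no or small effect" is coded "none"; Texas is
# "mixed" because college enrollment rose while earnings did not.
STUDIES = [
    ("Boston charters", "gain", "none"),
    ("Harlem Promise Academy", "gain", "none"),
    ("KIPP", "gain", "none"),
    ("High Tech High", "gain", "none"),
    ("SEED Boarding Charter", "gain", "none"),
    ("TX No Excuses charters", "gain", "mixed"),
    ("Florida charters", "none", "gain"),
    ("DC vouchers", "none", "gain"),
    ("Milwaukee vouchers", "none", "gain"),
    ("New York vouchers", "gain", "gain"),
]


def tally(studies):
    """Count studies whose coarse test-score code matches the later-outcome code."""
    return Counter(
        "agree" if test == later else "disagree"
        for _, test, later in studies
    )


counts = tally(STUDIES)
agreeing = [name for name, test, later in STUDIES if test == later]
print(counts["agree"], "of", len(STUDIES), "studies agree in direction:", agreeing)
```

Under this coding only the New York voucher study lands in the "agree" bucket, and even there the summary above reports a larger improvement in college enrollment than in test scores. A different but still defensible coding of the borderline cases would shift the count somewhat, so treat the tally as a reading aid, not a finding.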

This entry was posted on Saturday, November 5th, 2016 at 8:30 am and is filed under Uncategorized.

28 Responses to Evidence for the Disconnect Between Changing Test Scores and Changing Later Life Outcomes

pdexiii says:

November 5, 2016 at 10:20 pm

I’ve said this before, but will repeat it:
Our middle school has never had jaw-droppingly high test scores. Relative to the similar schools (demographics, etc.) we’re good, but on an absolute scale we aren’t.
While we have never tracked the results officially, anecdotally the later outcomes of our students (HS graduation, college-career participation) align with FL, DC, NYC and Milwaukee. Our HS graduation rate I’d guesstimate is 90%, and students either attending college or employed are close to the same rate.

Mike G says:

November 6, 2016 at 2:15 pm

Yes, good inquiry here.

We need more of these long-term follow-up studies…particularly the ones that get to age 25+, see what’s happening in the labor market, etc.

Small note: High Tech High is almost the opposite of a “no excuses” school. HTH is project-based, group work, personalized, anti-standardized test, etc. I wasn’t clear if you were putting them in the NE category or just listing them as a type of charter.

Barry Stern says:

November 6, 2016 at 5:00 pm

Test scores seem to predict other test scores but not much else when controlling for other variables. Nice to know students’ academic proficiency and growth from one year to the next, but why base most of our accountability systems on such scores if they don’t predict life outcomes? I wonder which districts or charter schools have developed an alternative accountability system with measurements derived from their state longitudinal data systems (U.S. Dept of Ed invested $750 M on these since 2006)? Or are the SLDS data still languishing in data warehouses?

Greg Forster says:

November 7, 2016 at 6:32 am

Jay, I sure hope you get a grad student who wants an easy publication to write this up as a paper. It needs to get into the “citeable” bloodstream.

- Jay P. Greene says:
  
  November 7, 2016 at 7:52 am
  
  It’s in the works but they take forever.
  
  - Greg Forster says:
    
    November 7, 2016 at 6:41 pm
    
    In other news, this post needed a catchy picture.
Richard Evans says:

November 25, 2016 at 7:49 pm

Doesn’t it depend a bit on the test, though, Jay? And perhaps on the point that whoever is doing the research wants to make? I’m not as familiar with all the data in the learned papers as you are, but, as a parent of fairly recent college grads, it’s hard to ignore the empirical evidence garnered from watching their peer groups. The kids who got the better SAT scores, and passed the AP exams, in general got into higher profile colleges and emerged to either more interesting or better remunerated careers. Don’t you think you would find a correlation between SAT scores, or certainly performance on STEM APs, and starting salary of first post-college job? If that’s the case, shouldn’t that inform policy, so that it might be a great idea for a governor or a secretary of Ed to set goals in those categories (e.g., “C’mon folks, let’s get half the kids to pass AP Calculus!”)?

- Jay P. Greene says:
  
  November 26, 2016 at 8:14 am
  
  There is no doubt that the LEVEL of test results is strongly correlated with later life outcomes. The question is whether CHANGING test scores produces roughly commensurate changes in later life outcomes.
  
  - Richard Evans says:
    
    November 27, 2016 at 1:57 pm
    
    Maybe I’m missing something? If you change the test scores for the better, doesn’t that, by definition, improve the level and therefore the outcome? Seems to me that we spend billions on searching for some magic curriculum fix, or grading cut point terminology, that will effortlessly improve results, while ignoring the evidence, staring us in the face, that all the foreign systems that outperform us place the burden for performance squarely on the shoulders of the students in the form of very high stakes testing (don’t pass your A-levels, Baccalaureate, Abitur, Board Exams, etc., and you don’t go to college). Foreign students prioritize differently from our kids; that’s why they do better. Foreign systems are set up accordingly.
Setting the Record Straight on Detroit Charter Schools | What Did You Say? says:

December 1, 2016 at 5:55 pm

[…] it is worth noting that there is a general disconnect between test scores and later life outcomes. It is highly reductionistic to measure the success of […]

How vouchers transformed Indiana: Private schools now live or die by test scores, too | Chalkbeat says:

April 13, 2017 at 11:25 am

[…] Greene argues. Greene, who heads the Department of Education Reform at the University of Arkansas, has written that test scores don’t capture long-term benefits from schools like graduation rates and […]

If You Mostly Care About Test Scores, Private School Choice Is Not For You – by Jay P. Greene | #1 News Source For Teens says:

April 28, 2017 at 8:12 am

[…] rigorously conducted studies of charter schools, including those of the Harlem Promise Academy, KIPP, High Tech High, SEED boarding charter schools, and no excuses charters in Texas. While of course […]

A Better “Care Package” for Test Takers - The Locker Room says:

May 22, 2017 at 9:50 am

[…] a copy of Jay Greene’s article, “Evidence for the Disconnect Between Changing Test Scores and Changing Later Life Outcomes.” […]

School Voucher Programs in Indiana and Louisiana – by Marty Lueken | #1 News Source For Teens says:

June 27, 2017 at 10:51 pm

[…] in students’ life outcomes? University of Arkansas professor Jay Greene has written about the disconnect between test scores and lifelong outcomes. Greene […]

More Findings About School Vouchers and Test Scores, and They are Still Negative – by Mark Dynarski | #1 News Source For Teens says:

July 17, 2017 at 12:03 am

[…] 13. See http://educationnext.org/mostly-care-test-scores-private-school-choice-not/ and also https://jaypgreene.com/2016/11/05/evidence-for-the-disconnect-between-changing-test-scores-and-chang…. […]

As Tax Credit Scholarships Expand, Questions About Accountability and Outcomes | The 74 | Scholarship Database says:

August 3, 2017 at 10:38 pm

[…] choice argue that requiring a state exam may drive away effective schools, and that test scores are poor measures of school […]

To Test or Not to Test: As Tax Credit Scholarships Expand, Questions About Accountability and Outcomes | Scholarship Database says:

October 9, 2017 at 5:51 am

[…] choice argue that requiring a state exam may drive away effective schools, and that test scores are poor measures of school […]

Jay P. Greene: The Disconnect Between Changing Test Scores and Changing Life Outcomes | Diane Ravitch's blog says:

March 15, 2018 at 7:00 am

[…] late 2016, Jay P. Greene produced a short and brilliant paper that challenged that assumption. I have fallen into the habit of asking myself whether the young people who are super-stars in many […]

Studies Show No Connection Between Test Scores and Life Outcomes… But, Nevertheless, Testing Persists! | Network Schools - Wayne Gersen says:

March 16, 2018 at 4:56 am

[…] Ravitch exhumed a blog post by Jay P. Greene, a charter school advocate, who begrudgingly acknowledged that there was no […]

Duane E Swacker says:

March 19, 2018 at 5:15 pm

Jay,

We briefly met and spoke at the “Failures to Fixes” Show Me Institute conference in KC last May.

Using the results of onto-epistemologically challenged standardized tests for anything is, as Noel Wilson states, “vain and illusory”. To understand why he says so, I suggest that you read his 1997 dissertation, which totally destroys the concepts of educational standards and standardized testing. See: “Educational Standards and the Problem of Error” found at:

http://epaa.asu.edu/ojs/article/view/577/700

or for a shorter version on the invalidities involved in standardized testing read his “A Little Less than Valid: An Essay Review”

http://edrev.asu.edu/index.php/ER/article/view/1372/43

I sent you a copy of my book “Infidelity to Truth: Education Malpractice in American Public Education”* last summer. If you did not receive it please email me with your address and I’ll send you another. My email: duaneswacker@gmail.com

*In it I discuss the purpose of American public education and of government in general, issues of truth in discourse, justice and ethics in teaching practices, the abuse and misuse of the terms standards and measurement which serve to provide an unwarranted pseudo-scientific validity/sheen to the standards and testing regime and how the inherent discrimination in that regime should be adjudicated to be unconstitutional state discrimination no different than discrimination via race, gender, disability, etc. . . .

Even Big Edu-‘Reformers’ Can See That Their Plan Has Failed | GFBrandenburg's Blog says:

March 20, 2018 at 1:45 pm

[…] is not the first reform outfit to question the BS Tests’ value. Jay Greene was beating this drum a year and a half […]

There is a time and a place for summative annual assessments. We just have been using them inappropriately for decades now | Eslkevin's Blog says:

September 23, 2018 at 6:24 pm

[…] Jay Greene (no relation), head of the Department of Education Reform at the University of Arkansas, was writing about the disconnect in test scores— if test scores were going up, wasn’t that supposed to improve “life […]

Is the Big Standardized Test a Big Standardized Flop? | MetropolisCafé.US says:

September 23, 2018 at 9:07 pm

[…] Jay Greene (no relation), head of the Department of Education Reform at the University of Arkansas, was writing about the disconnect in test scores – if test scores were going up, wasn’t that supposed to improve “life […]

DeVos Organization Issues School Choice Guidebook - Garn Press says:

October 24, 2018 at 1:56 pm

[…] get to all the studies that show that the measure of choice– raised test scores– is bunk. We should believe that this is a myth because AFC says […]

Peter Greene: Petrilli Says, Let’s Do NCLB Again | Diane Ravitch's blog says:

September 6, 2023 at 9:00 am

[…] future real-life outcomes (absolutely not a shred of evidence–even reformster Jay Greene said as much).”It’s true that No Child Left Behind was imperfect,” says Petrilli. No. It stunk. […]

Is there another way to increase student achievement? - Jaspercommunityteam says:

November 30, 2024 at 10:30 am

[…] is not causation. There is still a huge gap in research surrounding the test; we are still lack of evidence that changing a student’s test score will change the outcome of the student’s […]

Research: Is There Another Way To Lift Student Achievement? - Vontanews says:

December 1, 2024 at 1:53 pm

[…] correlation is not causation. There’s still a huge gap in research around the test; we are still missing evidence that changing a student’s test score will change the student’s life […]

Research: Is There Another Way To Lift Student Achievement? - 25finz, L.L.C says:

December 1, 2024 at 2:15 pm

[…] correlation is not causation. There’s still a huge gap in research around the test; we are still missing evidence that changing a student’s test score will change the student’s life […]
