(Guest post by Greg Forster)
I’m already seeing this study discussed as if it debunked all use of test scores. Four researchers took statistical methods usually used to measure teacher effects on year-to-year test score gains and applied them to measuring teacher effects on student height. They found a substantial apparent teacher effect on year-to-year changes in height, which is obviously a false positive.
This definitely debunks one way of using test scores – the way commonly used by technocrats and central controllers of the Common Core type. If you use only one year’s worth of data (or, technically, use two years of data to track one year’s worth of change in the data) you are getting a lot of noise along with your signal. Multiple years of change must be tracked before you can sort out signal from noise to measure a teacher’s effectiveness.
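The signal-versus-noise point can be illustrated with a toy simulation (a sketch, not the study’s actual model — all numbers here are made up for illustration): give each simulated teacher a persistent “true effect,” add classroom-level noise each year, and compare how well a single year of gains tracks the true effect versus a multi-year average.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 500
true_effect = rng.normal(0, 1, n_teachers)   # persistent teacher effect (hypothetical scale)
noise_sd = 2.0                               # per-year classroom/sampling noise (assumed large)

# One year of observed gains vs. the average of five years of gains
one_year = true_effect + rng.normal(0, noise_sd, n_teachers)
five_year = true_effect + rng.normal(0, noise_sd, (5, n_teachers)).mean(axis=0)

r1 = np.corrcoef(true_effect, one_year)[0, 1]
r5 = np.corrcoef(true_effect, five_year)[0, 1]
print(f"correlation with true effect: 1 year {r1:.2f}, 5-year average {r5:.2f}")
```

With noise twice as large as the true-effect spread, the single-year estimate correlates only weakly with the true effect, while averaging five years recovers much more of the signal — the arithmetic behind insisting on multiple years of data.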
But serious scholarship had already long since debunked the one-year way of using test scores. This particular way of showing that technocratic abuse of test scores is absurd gains points for cleverness. However, the finding itself isn’t new. People who really care about measuring effective teaching have been complaining for years about technocratic abuse of test score data!
The technocrats and central controllers have done a lot to make the use of test scores look worthless and even counterproductive. If they don’t want to look ridiculous in the way this study makes them look ridiculous, maybe they should start listening to serious scholars about the responsible use of data. Of course, if they did, they’d have to give up being technocrats entirely, because technocracy always abuses data.
My thanks to Jay for helping me think this through before posting; thoughts here are my own.
As usual, I have to refer people back to the book “How to Lie with Statistics,” which, while it needs to be updated, probably says it all. UNDERSTANDING data is a key element, and so few people really understand statistics, data, and basals and ceilings (especially in regard to growth of gifted students and those at the other end of the bell-shaped curve). Measuring a teacher’s effectiveness is a great thing, but measuring teacher effectiveness with kids with special needs is still another issue, and measuring teacher effectiveness in Maine as opposed to Florida is still another issue.
VAM as a source of debate and discussion is the gift that keeps on giving.
Two things to note: (1) The paper’s finding that a teacher’s value-added on scores has a positive covariance between years, while a teacher’s value-added on student height does not, is evidence that value-added on scores is a meaningful signal, though, as the paper suggests, it appears to have a low signal-to-noise ratio; and (2) applying shrinkage methods did not alter the relative rankings of teachers, which means shrinkage only affected value-added by small amounts. A teacher ranked, say, third in her grade level is likely to still be ranked third after all the high-tech stuff is applied.
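The rankings point has a simple mechanical explanation worth making explicit: when every teacher’s estimate is shrunk toward the mean by the same factor (as happens when reliability is assumed equal across teachers — the paper’s actual procedure may allow it to vary), shrinkage is a monotone map and cannot reorder anyone. A minimal sketch, with all numbers hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
raw_va = rng.normal(0, 1, 20)    # raw value-added estimates for 20 hypothetical teachers
reliability = 0.4                # assumed signal share of variance, same for every teacher

# Empirical Bayes shrinkage: pull each estimate toward the grand mean
shrunk = reliability * raw_va + (1 - reliability) * raw_va.mean()

# Because the shrinkage factor is uniform, the ordering of teachers is preserved
ranks_unchanged = np.array_equal(np.argsort(raw_va), np.argsort(shrunk))
print(ranks_unchanged)
```

So unchanged rankings tell us the adjustment was order-preserving; they don’t by themselves tell us how much noise remains in the estimates.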
Critics of VAM like to portray VAM as a terribly unfair system imposed on teachers (class sizes are so small!), while failing to note that some system of measuring performance has to be imposed on teachers. Relying on classroom observations is problematic because they are biased too, as Whitehurst and Chingos showed. So, what measure of teacher performance has superior properties (both statistical and cost) and should instead be used by districts?
Searching for perfect solutions is a great way to ensure nothing happens.
Yes, I tend to agree with Mark. We are so preoccupied with searching for perfect solutions that we neglect to remember that not all teachers are perfect (some are experienced, some not) and that not all students are alike (one wishes they were). AND of course, when you link a valid statistical measure and cost, that too is a way to ensure (to use Mark’s words) that nothing happens.
I don’t think it’s much of a defense to point out that rankings are unaffected. We aren’t much interested in rankings if we don’t know where on the objective scale they fall. If all the teachers are good we don’t want to fire any of them, not even the lowest-ranked. If all the teachers are bad we want to fire more than just the lowest-ranked.
School choice is a preferable method of measuring teacher effectiveness because parents can observe their own children with minute precision, and are good at knowing when the problem is the teacher.
Greg, if I understand your view (I may not!), so long as parents have choice to be the final deciders of school quality, then if School A uses VAM, School B uses observations, and School C never evaluates teachers at all, you’re fine with that, right?
Of course! We won’t really know what standards *or* what metrics are most effective until they can all get a fair chance to be tried without arbitrary political control. Choice creates the conditions under which clarity about standards and metrics becomes possible.
I think Mark’s point about rankings remaining unchanged despite the use of advanced adjustments like shrinkage actually supports your argument that VAM has a lot of noise that hinders its use for technocratic management. Even the fancy adjustments that school systems are very likely to use fail to fix the problem, since they do not alter the information about which teachers are better or worse — their relative ranking.
Instead, I think Mark’s argument is that all metrics have defects and schools have to use something to manage, so maybe test-based VAM is the best of a bad bunch. I disagree, because I don’t think schools have to use technocratic management at all. They can just use judgment. It is also filled with error, but it does not exploit the false authority of science.
Judgment has the additional merit of being able to measure qualitatively as well as quantitatively, which means it can measure the aspects of education that most people think are most important.
But of course, once we have school choice, and arbitrary political interference in education has thus been removed, there’s no reason multiple methods can’t be used!
Who is going to be bold enough to tell us why VAM became so popular and what else could USED et al easily use to measure accountability (and with far less expense)?
VAM, like RTI (and a lot of other initials), becomes initially popular because people are looking for some magic wand, some magic bullet, magic concept, magic idea, that, like SOMPA (decades ago), gradually vanishes and evaporates. As for accountability, we just need to address two words (maybe more) called homogeneity and heterogeneity, and how to measure teacher impact on classes that are either homogeneous or, increasingly, heterogeneous. As for expense, well, there is the old adage that “you get what you pay for.”
Aside from test scores, what could schools assess/measure for accountability purposes? Are there no other ideas out there? Sandra
Sandra, as usual, you raise provocative questions that need answers, or at least deep philosophical discussion. I think we need to be discussing your question, “What could schools assess and measure for accountability purposes?” and also ask what exactly schools ARE accountable for, since we HAVE drifted from Horace Mann and reading, writing, and arithmetic, and delved into providing services for LD, ED/BD, ADD, VI, HI, ID, TBI, PDD, OHI, and as you know the list goes on.