(Guest post by Ryan Marsh)
Many reform strategies are predicated on the belief that teachers have the largest impact on student achievement and that we can measure a teacher's contribution with reasonable accuracy. Policies such as performance pay, or other efforts to recruit and retain effective teachers, require reasonably accurate identification of which teachers are the most effective, and which the least, at raising their students' achievement.
Value-added models, or VAMs, are the statistical models commonly used for this purpose. VAMs attempt to estimate teacher effectiveness by controlling for prior achievement and other student characteristics.
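To make the idea concrete, here is a toy sketch of a value-added model on simulated data. This is a hypothetical illustration, not the actual specification used in the papers discussed: we regress current-year scores on prior-year scores plus one indicator per teacher, and read the teacher coefficients as "value-added" estimates.

```python
# Toy value-added model on simulated data (hypothetical numbers, not the
# models from the papers discussed): regress current scores on prior
# scores plus teacher dummies; the dummy coefficients are the VAM
# estimates of each teacher's effect.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_teachers = 300, 10

teacher = rng.integers(0, n_teachers, n_students)   # teacher assignment
true_effect = rng.normal(0, 0.2, n_teachers)        # true teacher effects
prior = rng.normal(0, 1, n_students)                # last year's score
current = 0.7 * prior + true_effect[teacher] + rng.normal(0, 0.3, n_students)

# Design matrix: prior score plus one dummy column per teacher
dummies = (teacher[:, None] == np.arange(n_teachers)).astype(float)
X = np.column_stack([prior, dummies])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)
vam_estimates = coef[1:]                            # estimated teacher effects

# With random assignment, the estimates track the true effects closely
print(np.corrcoef(true_effect, vam_estimates)[0, 1])
```

With students assigned at random, as here, the estimated effects line up well with the true ones; the debate below is about what happens when assignment is not random.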
Two recent working papers have started a very important debate about the use of VAMs, a debate which will greatly influence future education policy and research. Economist Jesse Rothstein has a working paper in which he critically analyzes these VAMs and their ability to estimate teacher effectiveness. His analysis focuses on the question of whether students are randomly assigned to teachers. If they are not, then the results of a VAM should not necessarily be interpreted as causal estimates of teacher effectiveness. That is, if some teachers are non-randomly assigned students who will learn at a faster rate than others, then our estimates of who is an effective teacher could be biased.
Without getting too technical, Rothstein checks whether a student's future teacher can "predict" that student's past or present scores. If a teacher can predict growth in achievement for students before he or she becomes their teacher, then we have evidence of non-random assignment of students to teachers. After all, teachers could not have caused things that happened in the past.
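A simulated version of that falsification logic, again with made-up numbers: if students are tracked into classrooms by ability, next year's teacher assignment "explains" a large share of this year's scores, which the teacher cannot possibly have caused; under random assignment it explains essentially nothing.

```python
# Sketch of the falsification idea on simulated data (hypothetical):
# regress current scores on NEXT year's teacher dummies. Under random
# assignment the R^2 should be near zero; under ability tracking it
# will be large, signaling non-random sorting.
import numpy as np

rng = np.random.default_rng(1)
n, n_teachers = 600, 6
ability = rng.normal(0, 1, n)                 # unobserved student ability
score_now = ability + rng.normal(0, 0.5, n)   # this year's score

# Tracked assignment: students sorted into future classes by ability rank
rank = np.argsort(np.argsort(-ability))
future_teacher_tracked = rank * n_teachers // n
# Random assignment, for comparison
future_teacher_random = rng.integers(0, n_teachers, n)

def r_squared(teacher_ids, y):
    """R^2 from regressing y on future-teacher dummies."""
    X = (teacher_ids[:, None] == np.arange(n_teachers)).astype(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

print(r_squared(future_teacher_tracked, score_now))  # large: sorting detected
print(r_squared(future_teacher_random, score_now))   # near zero
```

The test does not say how biased the VAM estimates are, only that sorting exists; that is exactly the gap the next paragraphs address.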
But even if we have bias in VAMs from non-random assignment of students to teachers, the question becomes how seriously that bias distorts our assessments of who is an effective teacher. Many measures have biases and imperfections, but we still rely on them because the distortions are relatively minor. Rothstein recognizes this when he suggests on p. 32 a way of assessing the magnitude of the bias:
“An obvious first step is to compare non-experimental estimates of individual teachers’ effects in random assignment experiments with those based on pre- or post- experimental data (as in Cantrell, Fullerton, et al. 2007).”
The working paper he cites—by Steven Cantrell, Jon Fullerton, Thomas J. Kane, and Douglas O. Staiger—uses data from an experimental analysis of National Board for Professional Teaching Standards (NBPTS) certification. In the paper, the authors use a random assignment process: each NBPTS applicant teacher is paired with a non-applicant comparison teacher in the same school, and principals set up two classrooms, either of which they would be willing to assign to the NBPTS teacher. One of the two classes is randomly assigned to each teacher and compared with the class not chosen. The paper also uses VAMs to assess teacher effectiveness before the experiment was run. These prior estimates were then used to predict how much better a teacher's students performed during the experiment than students in the comparison classrooms. This allows the researchers to test how well the VAM estimates compare with a random assignment experiment.
That is, teacher effectiveness was measured using VAMs before students were randomly assigned to teachers and then teacher effectiveness was measured after students were randomly assigned, when no bias would be present. The two correlate well, suggesting little distortion from the non-random assignment. As the authors conclude, the VAM estimates have “considerable predictive power in predicting student achievement during the experiment.”
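A toy version of that validation logic, with made-up numbers rather than the Cantrell et al. data: estimate each teacher's effect once from a pre-period with mildly non-random assignment, then again from a randomized period, and see how well the two sets of estimates agree.

```python
# Simulated version of the validation design (hypothetical numbers):
# compare teacher-effect estimates from a sorted pre-period against
# estimates from a randomized period. If sorting bias is modest, the
# two sets of estimates correlate strongly.
import numpy as np

rng = np.random.default_rng(2)
n_teachers, class_size = 40, 25
true_effect = rng.normal(0, 0.2, n_teachers)
# Persistent sorting: some teachers habitually get stronger students
advantage = rng.normal(0, 0.5, n_teachers)

def estimated_effects(sorted_assignment):
    """Mean classroom gain per teacher, with or without student sorting."""
    est = np.empty(n_teachers)
    for t in range(n_teachers):
        mean_ability = advantage[t] if sorted_assignment else 0.0
        ability = rng.normal(mean_ability, 1, class_size)
        gains = true_effect[t] + 0.1 * ability + rng.normal(0, 0.3, class_size)
        est[t] = gains.mean()
    return est

pre = estimated_effects(sorted_assignment=True)    # non-random pre-period
exp = estimated_effects(sorted_assignment=False)   # randomized experiment

print(np.corrcoef(pre, exp)[0, 1])  # high: sorting bias distorts little here
```

In this sketch the pre-period estimates are biased, yet they still rank teachers much the way the randomized estimates do, which is the pattern the Cantrell et al. paper reports.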
In short, Rothstein raises a potentially lethal concern for policies based on value-added models, but the paper by Cantrell et al. suggests that the concern may be little more than a flesh wound.