ALEC Releases New Report Card on American Education

January 23, 2012

(Guest Post by Matthew Ladner)

The American Legislative Exchange Council has released the 17th edition of the Report Card on American Education: Ranking State K-12 Performance, Progress and Reform. This edition has a foreword by Indiana Governor Mitch Daniels and was coauthored by yours truly and Dan Lips.

Dan and I have updated the rankings of state academic performance, which are based upon the scores of general education low-income students, to reflect the 2011 NAEP, and have updated the rankings of state policy based upon the latest available data.

Dan and I build the case that the historic changes seen in K-12 reform in 2011 represent “the end of the beginning” in the battle for K-12 reform. Far, far, far more remains to be done than has been done, of course, but from tenure reform to parental choice, reformers began to hold their own in 2011 for the first time on a widespread basis.

Chapter 2 is a formalized thought experiment on state academic achievement. Loyal Jayblog readers will recognize it from prototype arguments that were tested here in the blogosphere proving grounds. Chapter 3 provides a detailed ranking of state learning gains by student subgroups on the combined NAEP exams.

The book features state pages providing a wealth of information on each state, like this:


Finally, the fourth chapter discusses trends in technology based learning. You can download your copy for free here. If you would like a paper copy, email me and I’ll see what I can do.


NCLB Blamed for Ruining Teen Oral Sex

January 18, 2012

HT to Sara Mead for finding this incredible piece of research.  It is written in a language other than English, but if I remember how to translate properly stupid BS, this study appears to be claiming that an emphasis on individual academic achievement in school “crowds out” “the pleasure, choice, and mutuality” of teen fellatio and replaces it with an emphasis on “competence and skill usually associated with achievement and schooling.”  They know this from interviews with 98 girls between the ages of 12 and 17.

Here’s the abstract:

Young women’s narratives of their sexual experiences occur amid conflicting cultural discourses of risk, abstinence, and moral panic. Yet young women, as social actors, find ways to make meaning of their experiences through narrative. In this study, we focused on adolescent girls’ (N=98, age 12–17 years) narratives of their first experiences with oral sex. We document our unexpected findings of persistent discourses of performance which echo newly emergent academic achievement discourses. Burns and Torre (Feminism & Psychology 15 (1):21–26, 2005) argue that an extreme and high stakes focus on individual academic achievement in schools impoverishes young minds through the “hollowing” of their sexualities. We present evidence that such influence also works in the opposite direction, with an achievement orientation invading girls’ discourses of sexuality, “crowding out” possible narratives of pleasure, choice, and mutuality with narratives of competence and skill usually associated with achievement and schooling. We conclude with policy implications for the future development of “positive” sexuality narratives.

UPDATE — This is the abstract of the article, “‘It’s Like Doing Homework’: Academic Achievement Discourse in Adolescent Girls’ Fellatio Narratives,” published in the journal Sexuality Research and Social Policy by professors from CUNY and the University of Virginia.


The Value-Add Map Is Not the Teaching Territory, But You’ll Still Get Lost without It

January 11, 2012

(Guest post by Greg Forster)

Since we’re so deep into the subject of value-added testing and the political pressures surrounding it, I thought I’d point out this recently published study tracking two and a half million students from a major urban district all the way to adulthood. (HT Whitney Tilson)

They compare teacher-specific value added on math and English scores with eventual life outcomes, and apply tests to determine whether the results are biased either by student sorting on observable variables (the life outcomes of their parents, obtained from the same life-outcome data) or unobserved variables (they use teacher switches to create a quasi-experimental approach).

Finding?

Students assigned to high-VA teachers [i.e. teachers who produce high “value added” on test scores] are more likely to attend college, attend higher-ranked colleges, earn higher salaries, live in higher SES neighborhoods, and save more for retirement. They are also less likely to have children as teenagers. Teachers have large impacts in all grades from 4 to 8.

Let’s bring that down to reality:

Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase students’ lifetime income by more than $250,000 for the average classroom in our sample.

But here’s what I want to pay the most attention to. Note the careful wording of the conclusion:

We conclude that good teachers create substantial economic value and that test score impacts are helpful in identifying such teachers.

Note what they don’t say. They don’t say that increasing math and English test scores by itself leads to improved life outcomes. They say good teachers lead to improved life outcomes, and value-add is one relatively good way to identify good teachers.

You’ve heard the saying that the map is not the territory? (If not, that means you haven’t seen Ronin, in which case shame on you.) Well, it’s true. What raises life outcomes is good teaching, and good teaching can’t be reduced to test scores. (See here, here, here, here, here, here, here and here.)

But if you want to find your way around the territory, you need a map. If you want to help those kids stuck with lousy teachers who are out a quarter million, you’re going to need a tool that identifies them. Value added analysis is the best tool we’ve come up with yet – other than parental choice, of course.

And where the tests are freely selected and voluntarily adopted by schools, the tests provide helpful data for parents, so parent choice is strengthened by voluntary testing. That’s why over 90% of private schools use testing in some form. On the other hand, forcing teachers to use a test they don’t believe in is a self-defeating proposal.

But how do you get schools to want to use a test? Parent choice, of course! Choice is what creates the external standard of performance that makes assessment tools seem legitimate rather than illegitimate. So testing and choice are like chocolate and peanut butter – they’re two great tastes that taste great together.


Anticipating Responses from Gates

January 9, 2012

Over the weekend I posted about how I thought the Gates Foundation was spinning the results of their Measuring Effective Teachers Project to suggest that the combination of student achievement gains, student surveys, and classroom observations was the best way to have a predictive measure of teacher effectiveness.  Let me anticipate some of the responses they may have:

1) They might say that they clearly admit the limitations of classroom observations and therefore are not guilty of spinning the results to inflate their importance.  They could point to p. 15 of the research paper in which they write: “When value-added data are available, classroom observations add little to the ability to predict value-added gains with other groups of students. Moreover, classroom observations are less reliable than student feedback, unless many different observations are added together.”

Response: I said in my post over the weekend that the Gates folks were careful so that nothing in the reports is technically incorrect.  The distortion of their findings comes from the emphasis and manner of presentation.  For example, the summary of findings in the research paper on p. 9 states: “Combining observation scores with evidence of student achievement gains and student feedback improved predictive power and reliability.”  And the “key findings” in the practitioner brief on p. 5 say: “Observations alone, even when scores from multiple observations were averaged together, were not as reliable or predictive of a teacher’s student achievement gains with another group of students as a measure that combined observations with student feedback and achievement gains on state tests.”  Notice that these summaries of the results fail to mention the most straightforward and obvious finding: classroom observations are really expensive and cumbersome and yet do almost nothing to improve the predictiveness of student achievement-based measures of teacher quality.

And the proof that the results are being spun is that the media coverage uniformly repeats the incorrect claim that multiple measures are an important improvement on test scores alone.  Either all of the reporters are lousy and don’t understand the reports or the reporters are accurately repeating what they are being told and what they overwhelmingly see in the reports.  My money is on the latter explanation.

And further proof that the reporters are being spun is that Vicki Phillips, the Gates education chief, is quoted in the LA Times coverage mis-characterizing the findings: “Using these methods to evaluate teachers is ‘more predictive and powerful in combination than anything we have used as a proxy in the past,’ said Vicki Phillips, who directs the Gates project.”  This is just wrong.  As I pointed out in my previous post, the combined measure is no more predictive than student achievement by itself.

Lastly, the standard for fair and accurate reporting of results is not whether one could find any way to show that technically the description of findings is not false.  We should expect the most straightforward and obvious description of findings emphasized.  With the Gates folks I feel like I am repeatedly parsing what the meaning of the word “is” is.  That’s political spin, not research.

2) They might say that classroom observations are an important addition because at least they provide diagnostic information about how teachers can improve, while test scores cannot.

Response:  This may be true, but it is not a claim supported by the Gates study.  They found that all of the different classroom observation methods they tried had very weak predictive power.  You can’t provide a lot of feedback about how to improve student achievement based on instruments that are barely correlated with gains in student achievement.  In addition, they were unable to find sub-components of the classroom observation methods that were more predictive, so they can’t tell teachers that certain practices matter most on the grounds that those practices are more strongly related to student learning gains.  Lastly, it is simply untrue that test scores cannot be diagnostic.  There are sub-components of the tests that measure learning in different aspects of the subject.  Teachers could be told to put more emphasis on those areas in which their students have lagged.

3) They may say that classroom observations and student surveys improve the reliability of a teacher quality measure when combined with test scores.

Response: An increase in reliability is cold comfort for a lack of predictive power.  Reliability is just an indicator of how consistent a measure is.  There are plenty of measures that are very consistent but not helpful in predicting teacher quality.  For example, if we asked students to rate how attractive their teacher was, we would probably get a very “reliable” (consistent) measure from year to year and section to section.  But that consistency would not make up for the fact that attractiveness is unlikely to help improve the prediction of effective teaching.  So, the student survey has a high amount of consistency, but who knows what it is really measuring, since it is only weakly related to student learning gains.  It is consistent, but consistently wrong.  Our focus should be on the predictive power of teacher evaluations, and classroom observations and student surveys don’t really do anything to help with that (at least, not according to the Gates study).

4) They may say that classroom observations and student surveys improve the prediction of student effort and classroom environment.

Response: As I mentioned in the post over the weekend, they don’t really have validated measures of student effort and classroom environment.  The Gates folks took a lot of flak last year for focusing on test-score gains, so they came up with some non-test-score outcome measures simply by taking some of the items from the student survey where students are asked about their effort or classroom environment.  We have no idea whether they have really measured the amount of effort students exert or the quality of the classroom environment; they are just using some survey answers on those items and claiming that they have measured those “outcomes.”  The only validated outcome measures we have in the Gates study are the test score gains, so we have to focus on those.

—————————————————————————————————

The good news is that my fears about the Gates study being used to dictate what teachers do have not been realized, at least not yet.  But it wasn’t for lack of trying.  If the classroom observations had worked a little better in predicting student learning gains, I’m sure we would have heard about how teachers should run their classrooms to produce greater gains.  But the classroom observations were so much of a dud that the Gates education chief, Vicki Phillips, didn’t even attempt to claim that they found that drill and kill is bad or that teachers should avoid teaching to the test.

But the inability to use the classroom observations to tell teachers the “right” way of teaching is another way of saying that the classroom observations cannot be used for diagnostic purposes.  The most straightforward reading of the Gates results is that classroom observations appear to be an expensive and ineffective dud.  But it’s hard for an organization that spends $45 million on a project to scientifically validate classroom observations to admit that it failed.  It’s hard enough for a third-party evaluator to say that, let alone an in-house study of a key aspect of the Gates policy agenda.


How the Gates Foundation Spins its Research

January 7, 2012

The Gates Foundation has released the next installment of reports in their Measuring Effective Teachers Project.  When the last report was released, I found myself in a tussle with the Gates folks and Sam Dillon at the New York Times because I noted that the study’s results didn’t actually support the finding attributed to it.  Vicki Phillips, the education chief at Gates,  told the NYT and LA Times that the study showed that “drill and kill” and “teaching to the test” hurt student achievement when the study actually found no such thing.

With the latest round of reports, the Gates folks are back to their old game of spinning their results to push policy recommendations that are actually unsupported by the data.  The main message emphasized in the new round of reports is that we need multiple measures of teacher effectiveness, not just value-added measures derived from student test scores, to make reliable and valid predictions about how effective different teachers are at improving student learning.

This is the clear thrust of the newly released Policy and Practice Brief and Research Paper and is obviously what the reporters are being told by the Gates media people.  For example, Education Week summarizes the report as follows:

…the study indicates that the gauges that appear to make the most finely grained distinctions of teacher performance are those that incorporate many different types of information, not those that are exclusively based on test scores.

And Ed Sector says:

The findings demonstrate the importance of multiple measures of teacher evaluation: combining observation scores, student achievement gains, and student feedback provided the most reliable and predictive assessment of a teacher’s effectiveness.

But buried away on p. 51 of the Research Paper in Table 16 we see that value-added measures based on student test results — by themselves — are essentially as good or better than the much more expensive and cumbersome method of combining them with student surveys and classroom observations when it comes to predicting the effectiveness of teachers.  That is, the new Gates study actually finds that multiple measures are largely a waste of time and money when it comes to predicting the effectiveness of teachers at raising student scores in math and reading.

According to Table 16, student achievement gains correlate with the underlying value-added by teachers at .69.  If the test scores are combined (with an equal weighting) with the results of a student survey and classroom observations that rate teachers according to a variety of commonly-used methods, the correlation to underlying value-added drops to between .57 and .61.  That is, combining test scores with other measures, where all measures are equally weighted, actually reduces the accuracy of the prediction.

The researchers also present the results of a criteria-weighted combination of student achievement gains, student surveys, and classroom observations, based on the regression coefficients of how predictive each is of student learning growth in other sections for the same teacher.  Based on this, the test score gains are weighted at .729, the student survey at .179, and the classroom observations at .092.  Those weights tell us how much more predictive test score gains are than student surveys or classroom observations.  Yet even when test score gains constitute 72.9% of the combined measure, the correlation to underlying teacher quality still ranges between .66 and .72, depending on which method is used for rating the classroom observations.  The criteria-weighted combined measure provides basically no improvement in predictive accuracy over using test score gains by themselves.

Nor does using multiple measures improve our ability to distinguish between effective and ineffective teachers.  Using test scores alone, the difference between the top-quartile and bottom-quartile teacher in producing student value-added is .24 standard deviations in math learning growth on the state test.  If we combine test scores with student surveys and classroom observations using an equal weighting, the difference between top- and bottom-quartile teachers shrinks to between .19 and .21.  If we use the criteria weights, where test scores are 72.9% of the combined measure, the gap between top and bottom teachers ranges between .22 and .25.  In short, multiple measures do nothing to sharpen the distinction.

The same basic pattern of results holds true for reading, as can be seen in Table 20 on p. 55 of the report.  Combining test score measures of teacher effectiveness with student surveys and classroom observations does slightly improve our ability to predict how students would answer survey items about their effort in school as well as how they felt about their classroom environment.  But unlike test scores, which have been shown to be strong predictors of later life outcomes, I have no idea whether these survey items accurately capture what they intend to or have any importance for students’ lives.

Adding the student surveys and classroom observation measures to test scores yields almost no benefit, but it adds an enormous amount of cost and effort to a system for measuring teacher effectiveness.  To get the classroom observations to be usable, the Gates researchers had to have four independent observations of each classroom by four separate people.  If put into practice in schools, that would consume an enormous amount of time and money.  In addition, administering, scoring, and combining the student survey also has real costs.

So, why are the Gates folks saying that their research shows the benefits of multiple measures of teacher effectiveness when their research actually suggests virtually no benefits to combining other measures with test scores and when there are significant costs to adding those other measures?  The simple answer is politics.  Large numbers of educators and a segment of the population find relying solely on test scores for measuring teacher effectiveness to be unpalatable, but they might tolerate a system that combined test scores with classroom observations and other measures.  Rather than using their research to explain that these common preferences for multiple measures are inconsistent with the evidence, the Gates folks want to appease this constituency so that they can put a formal system for measuring teacher effectiveness in place.  The research is being spun to serve a policy agenda.

This spinning of the findings is not just an accident or the result of a misunderstanding.  It is clearly deliberate.  Throughout the two reports Gates just released, they regularly engage in the same pattern of presenting the information.  They show that the classroom observation measures by themselves have weak reliability and validity in predicting effective teachers.  But if you add the student survey and then add the test score measures, you get much better measures of effective teachers.  This pattern of presentation suggests the importance of multiple measures, since the classroom observations are strengthened when other measures are added.  The only place you find the reliability and validity of test scores by themselves is at the bottom of the Research Paper in Tables 16 and 20.  If both the lay version and the technical report had always shown how little test scores are improved by adding student surveys and classroom observations, it would be plain that test scores alone are just about as good as multiple measures.

The Gates folks never actually inaccurately describe their results (as Vicki Phillips did with the previous report).  But they are careful to frame the findings as consistently as possible with the Gates policy agenda of pushing a formal system of measuring teacher effectiveness that involves multiple measures.  And it worked, since the reporters are repeating this inaccurate spin of their findings.

———————————————————————-

(UPDATE — For a post anticipating responses from Gates, see here.)


Something Rotten in the State of NAEP?

November 10, 2011

(Guest Post by Matthew Ladner)

So if you measure the learning gains for children with disabilities on the four main NAEP exams for the entire period in which all 50 states and DC have participated, you get the information in the above chart. Last week, the Bluegrass Institute’s Richard Innes alerted me in the comments and by email about fishy exclusion rates for children with disabilities and English Language Learners. I had only casually examined the exclusion rates before, but having examined them more closely, I’m concerned.

The 2011 NAEP included standards for inclusion, which call for testing 95% of all students selected, including at least 85% of students with disabilities or students classified as English Language Learners. One might infer that some states were playing games with excluding such students in the past, and that simply listing the rates wasn’t doing the trick. This year, NAEP listed the expected standards and provided the gory details in an appendix. On the conference call regarding the results, the NAEP team took pains to note this innovation.

So, as you can see, half of the states in the Top 10 gainers for children with disabilities just so happen to be states that violated the inclusion standards on one or more NAEP exam. Hmmm. Moreover, some of them didn’t just barely miss these standards, but instead chose to commit violence against them.

Maryland led the nation in gains among children with disabilities….or did they? Maryland’s inclusion rate for children with disabilities on the 4th grade reading test in 2011: 31%, which, though completely pathetic, actually beat the 30% rate for children with disabilities on the 8th grade reading test. The ELL rates were almost as bad.

The only other state to sink into the 30s? That would be second-place Kentucky, which also excluded an enormous number of ELL students from NAEP examination. The math exams were better than the reading, but, lo and behold, there is Maryland again falling below inclusion standards. Maryland failed to meet the 95% overall inclusion standard on 3 out of the 4 exams in 2011.

I have run the numbers for gains among children who are neither disabled nor ELL, and something real and positive is happening in Maryland: scores are up. It is, however, obvious that NAEP created these standards for a reason and has invited people to make up their own minds about whether to throw a skeptical flag in the air.

I’m throwing my flag. I don’t know if it explains all of the gains in Maryland and Kentucky, but it seems pretty obvious to me the results from those two states and perhaps others ought not to be considered comparable to the other states.

I’ve been told, and I find it credible, that these exclusions have only a small impact on the statewide numbers. Can we imagine, however, that very high exclusion rates for ELL students will not heavily bias the Hispanic numbers? Or that sky-high special ed exclusions won’t inflate a variety of subgroup scores? Or that excluding many students from both of these subgroups won’t affect your Free and Reduced lunch eligible sample?

So given that the Congress mandated participation in NAEP as a part of NCLB, a mandate which all the federalist bones in my body find quite reasonable, perhaps it would be a jolly good idea for Congress to mandate minimum inclusion rates along with participation when reauthorization finally rolls around. Caesar’s wife must be above suspicion.


Los Estados que no desea ser reencarnado en si viene como un niño pobre de los hispanos.

November 4, 2011

(Guest Post by Matthew Ladner)

So how did my English to Spanish translator website do? I studied French while a student, which has come in handy about three times in my life, and may never do so again.

But I digress. The chart here ranks states by the percentage of low-income Hispanic students scoring “Below Basic” on the 4th grade NAEP reading exam in 2011.

Like any of these reincarnation charts, there are any number of factors to bear in mind. Some states have more ELL students than others, generational effects are important, and Hispanics are far from monolithic.

Nevertheless, isn’t it interesting that Oregon yet again makes an appearance in the hall of shame? Last time I visited, Oregon was way up in the Pacific Northwest and far from the southern border.

Now that the mandatory Oregon mocking is complete, let’s talk serious business: California is a disaster. The sheer size and low scores of the California Hispanic population ought to be a national concern. While it is fun to poke at Oregon for being even worse than California, California’s Hispanic population is a sea to Oregon’s pond.

Matters are far better in Texas, home of the nation’s second-largest Hispanic population, but still very much in need of improvement.

California and Texas educate more than half the nation’s Hispanics, almost 5.5 million students. We need California to wake up, and for Texas to step up.


The New No Excuses: Where Not to be Reincarnated a Rich White Kid

November 3, 2011

(Guest Post by Matthew Ladner)

So the plot thickens, as many JPGB readers (including this author) were born as American White kids who were not eligible for a Free or Reduced Lunch. In the Great Reincarnation to Come, maybe that is always how it works out!

Or maybe not.

In any case, you ought not to feel overly reassured. Assuming again that you want to learn to read, the above chart shows achievement levels from the 2011 NAEP for non-FRL eligible White students.

Before proceeding to dwell on West Virginia and others, I should note that DC has finally come in first place in something! If you are an ultra-wealthy White student attending one of the highly exclusive public schools in Georgetown, your reading ability rocks. Congratulations to the portion of the DC school system into which few poor kids ever set foot, much less attend.

Something has been going wrong in West Virginia, as their NAEP scores have been declining. Alaska is a different sort of place that obviously needs to get their act together on K-12. Tennessee can’t be happy to see themselves near the top of this list, and Nevada needs to let go of the idea that you don’t need to be well-educated to deal blackjack.

And then, there’s Oregon. Someone please explain to me why 21% of middle and upper income Anglos in Oregon should be illiterate.



Gates Responds

October 26, 2011

Steve Cantrell, a senior researcher at Gates, sent me an email last night in response to my post from yesterday asking for the MET results to be released.  He said that I was right in suggesting that large, complicated projects sometimes take longer than originally planned.  He said that final scores for coding the videos had just been delivered to the research team and that the full results for the 2009-10 year were now scheduled to be released January 5, 2012.  It’s unclear whether that report will also contain information for the 2010-11 year.  The MET web site will be changed to reflect this new schedule.  (Update: According to another email from Steve Cantrell, the January release will only have the full 09-10 results.  The final results, including 10-11, are scheduled for release in early summer of 2012.)

Steve also clarified information on the cost of the project.  Last year I repeated the New York Times and LA Times description of the project costing $45 million.  More recently I’ve repeated the Wall Street Journal description of the project cost as $335 million.  Steve resolved the confusion by saying that the MET study costs about $50 million and the $335 million figure includes grants to the partner districts.

Let me be clear that I think Gates has a lot of good and smart people working on the MET project.  My concern is not that these are bad people.  My concern is that Gates has a flawed strategy based on centrally identifying what educators should do and then building a system of standards, curriculum, and assessments to impose those practices on the education system.  I don’t think this kind of centralized approach can work and I fear that it creates enormous pressure on good and smart researchers to toe the centralized line — even if it becomes obvious that it is not working.  Everyone at Gates can see what happened to the folks who pushed small schools when the Foundation decided that approach was not working.

And unlike Diane Ravitch, Valerie Strauss, and the Army of Angry Teachers, I am not criticizing the Gates Foundation because I think Bill Gates is in the “billionaire boys club” and therefore somehow disqualified from using his wealth to try to improve education.  I am critical of recent Gates Foundation efforts because I believe Gates can and should try to improve education by adopting a more fruitful strategy.

(corrected typos)


Buckle Up…2011 NAEP release on Nov. 1st

October 18, 2011

(Guest Post by Matthew Ladner)

NAEP is releasing 4th and 8th Grade Reading and Math results for 2011 on November 1st.

I’ll comb through the data and post the results here.