Robot Essay Grading

I received this amazing press release from Tom Vander Ark about how computer grading of essays may be as accurate as human grading. I’m not sure if this means that computer grading has really advanced or if human grading really stinks. Besides, I don’t even know why the scientists invented the robots.

In any event, here is the release:

A direct comparison between human graders and software designed to score student essays achieved virtually identical levels of accuracy, with the software in some cases proving to be more reliable, a groundbreaking study has found.

The study, which was underwritten by the William and Flora Hewlett Foundation and conducted by experts in educational measurement and assessment, will be released here on Monday, April 16th, at the annual conference of the National Council on Measurement in Education. An advance copy of the study is available today at http://bit.ly/HJWwdP.

“The demonstration showed conclusively that automated essay scoring systems are fast, accurate, and cost effective,” said Tom Vander Ark, CEO of Open Education Solutions, which provides consulting services related to digital learning, and co-director of the study.

That’s important because writing essays are one important way for students to learn critical reasoning, but teachers don’t assign them often enough because grading them is both expensive and time consuming. Automated scoring of essays holds the promise of lowering the cost and time of having students write so they can do it more often.

Education experts believe that critical reasoning and writing are part of a suite of skills that students need to be competitive in the 21st century. Others are working collaboratively, communicating effectively and learning how to learn, as well as mastering core academic content. The Hewlett Foundation calls this suite of skills Deeper Learning and is making grants to encourage its adoption at schools throughout the country.

“Better tests support better learning,” says Barbara Chow, Education Program Director at the Hewlett Foundation. “This demonstration of rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments. And, the more we can use essays to assess what students have learned, the greater the likelihood they’ll master important academic content, critical thinking, and effective communication.”

For more than 20 years, companies that provide automated essay scoring software have claimed that their systems can perform as effectively as, and more affordably and faster than, other available methods of essay scoring. The study was the first comprehensive multi-vendor trial to test those claims. The study challenged nine companies that constitute more than ninety-seven percent of the current market of commercial providers of automated essay scoring to compare capabilities. More than 16,000 essays were released from six participating state departments of education, with each set of essays varying in length, type, and grading protocols. The essays were already hand scored according to state standards. The challenge was for companies to approximate established scores by using software.

At a time when the U.S. Department of Education is funding states to design and develop new forms of high-stakes testing, the study introduces important data. Many states are limited to multiple-choice formats, because more sophisticated measures of academic performance cost too much to grade and take too long to process. Forty-five states are already actively overhauling testing standards, and many are considering the use of machine scoring systems.

The study grows from a contest called the Automated Student Assessment Prize, or ASAP, which the Hewlett Foundation is sponsoring to evaluate the current state of automated testing and to encourage further developments in the field.

In addition to looking at commercial vendors, the contest is offering $100,000 in cash prizes in a competition open to anyone to develop new automated essay scoring techniques. The open competition is underway now and scheduled to close on April 30th. The pool of $100,000 will be awarded to the best performers. Details of the public competition are available at www.kaggle.com/c/ASAP-AES. The open competition website includes an active leader board to document prize rules, regularly updated results, and discussion threads between competitors.

The goal of ASAP is to offer a series of impartial competitions in which a fair, open and transparent participation process will allow key participants in the world of education and testing to understand the value of automated student assessment technologies.

ASAP is being conducted with the support of the Partnership for Assessment of Readiness for College and Careers and Smarter Balanced Assessment Consortium, two multi-state consortia funded by the U.S. Department of Education to develop next-generation assessments. ASAP is aligned with the aspirations of the Common Core State Standards and seeks to accelerate assessment innovation to help more students graduate from college and to become career ready.

Jaison Morgan, CEO of The Common Pool, a consulting business that specializes in developing effective incentive models for solving problems, and co-director of the study, said the prize and studies will raise broader awareness of the current capabilities of automated scoring of essays.

“By offering a private demonstration of current capabilities, we can reveal to our state partners what is already commercially available,” Morgan said. “But, by complementing it with a public competition, we will attract new participants to the field and investment from new players. We believe that the public competition will trigger major breakthroughs.”

ASAP is preparing to introduce a second study, in which private providers and public competitors will be challenged to reveal the capabilities of automated scoring systems for grading short-answer questions. The second study will be conducted this summer. There are another three ASAP studies in development.

This entry was posted on Monday, April 16th, 2012 at 6:45 am and is filed under Uncategorized.

10 Responses to Robot Essay Grading

Matthew Ladner says:

April 16, 2012 at 9:35 am

Does the grading robot take steroids and speak with a heavy Austrian accent?

Greg Forster says:

April 16, 2012 at 10:37 am

Hasta la vista, split infinitive. [Blam blam blam!]

I wonder if robot grading would continue to be equally effective if it were regularly implemented – the old “high stakes testing” question. The results of standardized reading and math tests don’t seem to be affected by stakes, but I wonder if essay grading would be more vulnerable to the potential development of strategies for gaming the system.

I’m not saying it would, just wondering.

Ann in L.A. says:

April 16, 2012 at 11:44 am

I wonder thouh, if the computer can give constructive criticism. I can see using this on standardized tests, and as a overall-check even at the classroom level, but at some point a human is going to have to look it over and tell the student how to make it better, let them know what needs to be added, what isn’t essential, encourage more-complex sentences and thought, etc. My experience was always that I learned more from the corrections than anything else. I’d hate to see that disappear.

Teacher Joe, in Los Angeles says:

April 16, 2012 at 4:58 pm

Love Jay’s comment: “I don’t know why they even invented the robot.”

Jay, this is why you need to get an emergency credential somehow and teach for 6 months. Please do so.

I quit trying to teach writing years ago (I teach history). The paperwork and regulations increased the time it takes to do the essential requirements of my job by so much that I had to drop helping the English teachers teach writing.

Anything that speeds any kind of paper correcting adds efficiency. Six years ago a district administrator complained about student writing and said how it had deteriorated. BUT the DISTRICT was NEVER smart enough to find a way to give us the time to correct essays and paragraphs.

Ann, I’ll have time to tell you how to make it better if the robot can correct your spelling, typographical, and grammar errors (thouh)

- Jay P. Greene says:
  
  April 16, 2012 at 6:03 pm
  
  Hey Joe — My comment was just a random joke about robots. Click the link and see. You’ll enjoy it.
  
Niki Hayes says:

April 16, 2012 at 6:36 pm

There are a couple of issues that I see as separate from the purpose of this article, but I want to comment on them. The first one is picky, but there is a grammatical error in the sentence, “That’s important because writing essays are one important way for students…” It should be “…writing essays IS one important way…” Maybe they should use their robots to copyread their press releases.

Second, statements that keep making the rounds in education like this one drive me nuts: “…Others [21st century skills] are working collaboratively, communicating effectively and learning how to learn, as well as mastering core academic content.” This same stuff was laid on us as teachers in the early 1990’s. That’s when we were pushed to have “groups” in our rooms so students could learn to “work collaboratively with others.” Equity and effort were more important than accurate answers or grammar.

Then, in 1998-99, as a middle school counselor, I printed a handout for the school’s parents and students to show what was considered “poor” or “fair” in students, according to a survey of college professors and employers. I needed to rally parents to stay involved with their middle school children academic world. (Of course the middle school kids wanted their parents to disappear.) The source of this information was “Public Agenda, 1998,” Education Week, Vol XVII, no. 17, Jan. 8, 1998.

The professors / employers saw the following as “poor” or “fair”:

Grammar and spelling…77% / 77%
Ability to write clearly…81% / 73%
Basic math skills…65% / 62%
Being organized, good work habits…69% / 58%
Being motivated, conscientious…60% / 56%
Speaking English well…48% / 51%
Working with others effectively…31% / 35%
Honesty…27% / 33%

I would love to see a similar survey done today. Is the number one issue having employees be able to work collaboratively? Where do the academic skills rank in importance now with professors and employers?

Niki Hayes says:

April 16, 2012 at 6:39 pm

Well, egg is on my face for leaving an apostrophe off of “…middle school children’s academic world.” But, I’m just human. A robot would have caught that, I guess.

Monty Neill says:

April 17, 2012 at 5:03 pm

If the robot cannot give useful and accurate feedback to the student(s), its value is quite limited. One parent in VA took a sample computer-scored essay and got a high score with nonsense, while her child got back generic and essentially useless info (given to all those with a given score). A reporter from, I think, a Pittsburgh paper found similar flaws. That said, accuracy of robots with humans in giving a score to a relatively short response to a prompt (a form of writing that does not deserve to be termed an essay) has been established for a while. The questions that stem from that point are 1) can the computerized scorer do as well with more complex, sophisticated, lengthier writing (such as research papers for college); and 2) how much educational value is there in writing a page or so to a prompt? On the latter, common in many states, test prep (practice to beat the scoring rubric) is an epidemic. Silly to believe this will prepare people for college.

Greg Forster says:

April 18, 2012 at 10:06 am

A lot of the comments here seem to assume that the robots’ ability to grade essays is of no value unless they can also teach. I don’t see why. Couldn’t this just be a way of relieving human teachers from the drudgery of grading so they have more time to teach? I know that when I was teaching, I could have been a much more effective teacher if I hadn’t had to spend so much of my time grading.

I’m still not endorsing the idea, but it seems foolish to me to say that a grading robot would be of no value unless it could teach.

What Experts Realize About Common Core Standards: 2012 « COMMON CORE says:

September 11, 2012 at 1:04 pm

[…] with the software in some cases proving to be more reliable, a groundbreaking study has found.” >>read more>> Related article: April 30, 2012 ROBOT GRADERS […]
