麻豆影视


Texas Will Use Computers to Grade Written Answers on This Year’s STAAR Tests

The state will save more than $15 million by using technology similar to ChatGPT to give initial scores, reducing the number of human graders needed.

STAAR testing begins across Texas on Tuesday. (Eli Hartman/The Texas Tribune)


Students sitting for their STAAR exams this week will be part of a new method of evaluating Texas schools: Their written answers on the state’s standardized tests will be graded automatically by computers.

The Texas Education Agency is rolling out an “automated scoring engine” for open-ended questions on the State of Texas Assessment of Academic Readiness for reading, writing, science and social studies. The technology, which uses natural language processing similar to that behind artificial intelligence chatbots such as GPT-4, will save the state agency about $15 million to $20 million per year that it would otherwise have spent on hiring human scorers through a third-party contractor.

The change comes after the STAAR test, which measures students’ understanding of state-mandated core curriculum, was redesigned in 2023. The test now includes fewer multiple choice questions and more open-ended questions, known as constructed response items. After the redesign, there are six to seven times more constructed response items.

“We wanted to keep as many constructed open ended responses as we can, but they take an incredible amount of time to score,” said Jose Rios, director of student assessment at the Texas Education Agency.

Rios said TEA hired about 6,000 temporary scorers in 2023, but this year it will need fewer than 2,000.

To develop the scoring system, the TEA gathered 3,000 responses that went through two rounds of human scoring. From this field sample, the automated scoring engine learns the characteristics of responses, and it is programmed to assign the same scores a human would have given.

This spring, as students complete their tests, the computer will first grade all the constructed responses. Then, a quarter of the responses will be rescored by humans.

When the computer has “low confidence” in the score it assigned, those responses will be automatically reassigned to a human. The same thing will happen when the computer encounters a type of response that its programming does not recognize, such as one using lots of slang or words in a language other than English.

“We have always had very robust quality control processes with humans,” said Chris Rozunick, division director for assessment development at the Texas Education Agency. With a computer system, the quality control looks similar.

Every day, Rozunick and other testing administrators will review a summary of results to check that they match what is expected. In addition to “low confidence” scores and responses that do not fit in the computer’s programming, a random sample of responses will also be automatically handed off to humans to check the computer’s work.
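The three hand-off rules described above can be sketched in a few lines of Python. This is only an illustration of the routing as the article describes it, not TEA's actual system: the `MachineScore` fields, the function name, the confidence threshold and the sampling rate are all hypothetical, since the agency has not published those details.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineScore:
    response_id: str
    score: int
    confidence: float  # hypothetical 0.0-1.0 confidence value
    recognized: bool   # False when the response type is outside what the engine handles


def human_review_reason(
    result: MachineScore,
    confidence_threshold: float = 0.8,  # assumed value, not from TEA
    sample_rate: float = 0.1,           # assumed value, not from TEA
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return why a response goes to a human scorer, or None if it does not."""
    rng = rng or random.Random()
    # Rule 1: the engine reports low confidence in its own score.
    if result.confidence < confidence_threshold:
        return "low_confidence"
    # Rule 2: the response is a type the engine does not recognize
    # (for example, heavy slang or non-English words).
    if not result.recognized:
        return "unrecognized_response"
    # Rule 3: a random sample is checked by humans regardless.
    if rng.random() < sample_rate:
        return "random_sample"
    return None
```

For example, `human_review_reason(MachineScore("r1", 2, 0.4, True))` returns `"low_confidence"`, while a confident, recognized response is sent to a human only if it happens to fall into the random sample.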

TEA officials have been resistant to the suggestion that the scoring engine is artificial intelligence. It may use similar technology to chatbots such as GPT-4 or Google’s Gemini, but the agency has stressed that the process will have systematic oversight from humans. It won’t “learn” from one response to the next, but will always defer to its original programming set up by the state.

“We are way far away from anything that’s autonomous or can think on its own,” Rozunick said.

But the plan has still generated worry among educators and parents in a world wary of the influence of machine learning, automation and AI.

Some educators across the state said they were caught by surprise at TEA’s decision to use automated technology, also known as hybrid scoring, to score responses.

“There ought to be some consensus about, hey, this is a good thing, or not a good thing, a fair thing or not a fair thing,” said Kevin Brown, the executive director for the Texas Association of School Administrators and a former superintendent at Alamo Heights ISD.

Representatives from TEA first mentioned interest in automated scoring in testimony to the Texas House Public Education Committee in August 2022. In the fall of 2023, the agency announced the move to hybrid scoring at a conference and during test coordinator training before releasing details of the process in December.

The STAAR test results are a key part of the accountability system TEA uses to grade school districts and individual campuses on an A-F scale. Students take the test every year from third grade through high school. When campuses within a district are underperforming on the test, state law allows the Texas education commissioner to intervene.

The commissioner can appoint a conservator to oversee campuses and school districts. State law also allows the commissioner to suspend and replace elected school boards with an appointed board of managers. If a campus receives failing grades for five years in a row, the commissioner is required to appoint a board of managers or close that school.

With the stakes so high for campuses and districts, there is a sense of uneasiness about a computer’s ability to score responses as well as a human can.

“There’s always this sort of feeling that everything happens to students and to schools and to teachers and not for them or with them,” said Carrie Griffith, policy specialist for the Texas State Teachers Association.

A former teacher in the Austin Independent School District, Griffith added that even if the automated scoring engine works as intended, “it’s not something parents or teachers are going to trust.”

Superintendents are also uncertain.

“The automation is only as good as what is programmed,” said Lori Rapp, superintendent at Lewisville ISD. School districts have not been given a detailed enough look at how the programming works, Rapp said.

The hybrid scoring system was already used on a limited basis in December 2023. Most students who take the STAAR test in December are retaking it after a low score. That’s not the case for Lewisville ISD, where high school students on an altered schedule test for the first time in December, and Rapp said her district saw a “drastic increase” in zeroes on constructed responses.

“At this time, we are unable to determine if there is something wrong with the test question or if it is the new automated scoring system,” Rapp said.

The state overall saw an increase in zeroes on constructed responses in December 2023, but the TEA said there are other factors at play. In December 2022, the only way to score a zero was by not providing an answer at all. With the STAAR redesign in 2023, students can receive a zero for responses that may answer the question but lack any coherent structure or evidence.

The TEA also said that students who are retesting will perform at a different level than students taking the test for the first time. “Population difference is driving the difference in scores rather than the introduction of hybrid scoring,” a TEA spokesperson said in an email.

For $50, students and their parents can request a rescore if they think the computer or the human got it wrong. The fee is waived if the new score is higher than the initial score. For grades 3-8, there are no consequences on a student’s grades or academic progress if they receive a low score. For high school students, receiving a minimum STAAR test score is a common way to fulfill one of the state graduation requirements, but it is not the only way.

Even with layers of quality control, Round Rock ISD Superintendent Hafedh Azaiez said he worries a computer could “miss certain things that a human being may not be able to miss,” and that room for error will impact students who Azaiez said are “trying to do his or her best.”

Test results will impact “how they see themselves as a student,” Brown said, and it can be “humiliating” for students who receive low scores. With human graders, Brown said, “students were rewarded for having their own voice and originality in their writing,” and he is concerned that computers may not be as good at rewarding originality.

Julie Salinas, director of assessment, research and evaluation at Brownsville ISD, said she has concerns about whether hybrid scoring is “allowing the students the flexibility to respond” in a way that they can demonstrate their “full capability and thought process through expressive writing.”

Brownsville ISD is overwhelmingly Hispanic. Students taking an assessment entirely in Spanish will have their tests graded by a human. If the automated scoring engine works as intended, responses that include some Spanish words or colloquial, informal terms will be flagged by the computer and assigned to a human so that more creative writing can be assessed fairly.

The system is designed so that it “does not penalize students who answer differently, who are really giving unique answers,” Rozunick said.

With the computer scoring now a part of STAAR, Salinas is focused on adapting. The district is incorporating tools with automated scoring into how teachers prepare students for the STAAR test to make sure they are comfortable.

“Our district is on board and on top of the things that we need to do to ensure that our students are successful,” she said.

Disclosure: Google, the Texas Association of School Administrators and Texas State Teachers Association have been financial supporters of The Texas Tribune, a nonprofit, nonpartisan news organization that is funded in part by donations from members, foundations and corporate sponsors. Financial supporters play no role in the Tribune’s journalism.

This article originally appeared in The Texas Tribune.

The Texas Tribune is a member-supported, nonpartisan newsroom informing and engaging Texans on state politics and policy. Learn more at texastribune.org.
