
IZA Newsroom

IZA – Institute of Labor Economics

Research | December 11, 2024

Can AI match human educators?

Study explores the potential and pitfalls of AI in feedback and grading

© IZA, created with Midjourney

As artificial intelligence rapidly transforms industries, its role in education is under increasing scrutiny, with tools like ChatGPT promising to ease workloads and personalize learning. A new IZA discussion paper by Arnaud Chevalier, Jakub Orzech and Petar Stankov investigates whether AI-powered tools, specifically ChatGPT 3.5 and 4, could match human instructors in providing feedback and grading student work.

In a randomized controlled trial (RCT), undergraduate students were divided into three groups: one receiving feedback from human graders, one from ChatGPT 3.5, and one from ChatGPT 4. The quality of the feedback was evaluated based on the students’ performance on the subsequent assignment. The double-blind design ensured that neither students nor instructors knew the source of the feedback, isolating its effect on student outcomes.
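The paper itself does not publish code, so purely as an illustration, the sketch below shows one way such a three-arm random assignment could be set up in Python. The function name, group labels, and student count are hypothetical and not taken from the study.

# Illustrative sketch only; the study's actual assignment procedure is not published.
# Randomly splits students across the three feedback arms described above.
import random

def assign_arms(student_ids, seed=42):
    """Shuffle students and allocate them round-robin to the three feedback arms."""
    rng = random.Random(seed)               # fixed seed keeps the assignment reproducible
    ids = list(student_ids)
    rng.shuffle(ids)
    arms = ("human", "chatgpt_3.5", "chatgpt_4")
    return {sid: arms[i % len(arms)] for i, sid in enumerate(ids)}

if __name__ == "__main__":
    assignment = assign_arms(range(1, 91))  # 90 hypothetical students
    print(sum(1 for a in assignment.values() if a == "human"), "students in the human-feedback arm")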

Inconsistencies in grading reveal critical shortcomings

The results show that ChatGPT 4 can deliver feedback comparable to human instructors, with students receiving its guidance performing on par with those who received human feedback. In contrast, students who received feedback from ChatGPT 3.5 performed worse in subsequent assessments, suggesting that this earlier version of the AI struggled with providing actionable and effective insights.

When it came to grading, the study highlighted significant gaps. Both versions of ChatGPT tended to assign more generous grades than human graders, and their evaluations lacked consistency and contextual understanding. For example, ChatGPT 3.5 struggled with complex tasks such as assessing draft work or interpreting tables and empirical data. Even ChatGPT 4, while more capable, showed limitations: not only did the grade distributions differ from those of human graders, but students’ ranks within the distribution also varied considerably. Crucially, the same submission could receive drastically different scores when graded repeatedly, further underlining that AI is not yet suitable for grading.
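To make the rank-disagreement point concrete, here is a small, self-contained Python sketch that computes Spearman’s rank correlation between a human grader’s and an AI grader’s marks. The marks are made up for illustration and are not data from the study; a value near 1 would mean both graders rank students almost identically, while lower values reflect the kind of re-ranking the study reports.

# Illustrative sketch with invented marks: quantifies how differently two graders
# rank the same set of submissions, using Spearman's rank correlation.
def ranks(values):
    """Return the rank of each value (average rank for ties, 1-based)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1               # average rank assigned to tied values
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

human = [62, 70, 55, 68, 74, 58]            # hypothetical marks from a human grader
ai    = [71, 72, 66, 75, 70, 69]            # hypothetical marks from an AI grader
print(f"Spearman rank correlation: {spearman(human, ai):.2f}")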

AI shows potential to save educators time

AI tools like ChatGPT show promise in reducing the time educators spend on providing feedback and marking, freeing them to focus on teaching-oriented tasks. Nevertheless, the study concludes that these tools are not yet ready to fully replace human expertise in grading. As generative AI technology continues to improve, this research provides critical insights for educators and policymakers navigating its integration into the classroom.

[Editor’s note: In keeping with the focus of the study, this summary is based on a ChatGPT-generated draft, edited by a human.]

Featured Paper:

IZA Discussion Paper No. 17511: “Man vs Machine: Can AI Grade and Give Feedback Like a Human?” by Arnaud Chevalier, Jakub Orzech, and Petar Stankov
