The peer review process in economics suffers from persistent inefficiencies, including a shortage of qualified reviewers and long decision times, often exceeding six months. A new IZA Discussion Paper by Pat Pataranutaporn, Nattavudh Powdthavee, and Pattie Maes examines whether large language models (LLMs) can alleviate these challenges.
To rigorously assess AI's role, the study conducted a large-scale experiment using 9,030 submissions derived from 30 recently published economics papers. These comprised papers from top-five economics journals, from mid-tier and lower-ranked outlets, and AI-generated papers written to mimic top-tier research. The researchers systematically varied author attributes, such as institutional affiliation, reputation, and gender, to analyze the AI's decision-making patterns.
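How 30 papers become thousands of submissions is easiest to see as a factorial design: the same research text is crossed with every combination of author attributes, and each resulting variant is evaluated separately. The Python sketch below illustrates that idea only; the attribute levels, field names, and counts are placeholders, not the study's actual specification.

```python
from itertools import product

# Hypothetical illustration of a factorial design: each base paper is paired
# with every combination of author attributes, and each variant would then be
# sent to the LLM for evaluation. Attribute levels here are placeholders.
AFFILIATIONS = ["top-5 department", "mid-ranked department", "unknown institution"]
REPUTATIONS = ["prominent economist", "early-career researcher"]
GENDERS = ["male", "female"]

def build_submissions(base_papers):
    """Cross each paper's identical content with every author profile."""
    submissions = []
    for paper in base_papers:
        for affiliation, reputation, gender in product(AFFILIATIONS, REPUTATIONS, GENDERS):
            submissions.append({
                "content": paper,            # research text is held constant
                "affiliation": affiliation,  # only byline information varies
                "reputation": reputation,
                "gender": gender,
            })
    return submissions
```

Because the research content is identical across variants of the same paper, any difference in the model's verdicts can be attributed to the author attributes alone.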
Findings show that AI can reliably distinguish papers of different quality tiers, suggesting its potential to reduce editorial workload. However, AI evaluations exhibit systematic biases, favoring submissions from prestigious institutions, well-known economists, and male authors, even when the research content is identical. The AI also struggles to differentiate genuine top-tier research from high-quality AI-generated papers, raising concerns about its ability to assess novelty and originality.
The authors advocate a hybrid peer review model in which AI assists with initial screening but final decisions remain with human reviewers. To ensure fairness, they recommend bias mitigation strategies such as training AI on anonymized data and refining evaluation criteria. The study highlights both the promise and the risks of AI in academic publishing, emphasizing the need for careful integration to enhance efficiency without compromising research integrity.
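One way to picture the recommended setup is as a pre-processing and triage step: identifying details are stripped before the model scores a submission, and that score only prioritizes papers for human reviewers rather than deciding outcomes. The sketch below assumes a hypothetical llm_score callable and submission fields; it illustrates the idea, not the authors' implementation.

```python
# Minimal sketch of anonymized AI-assisted screening, under assumed field names.
# The `llm_score` callable stands in for whatever model interface a journal uses.

def anonymize(submission: dict) -> str:
    """Return only the research content, with byline details blanked out."""
    text = submission["content"]
    for identifying in (submission.get("authors", ""), submission.get("affiliation", "")):
        if identifying:
            text = text.replace(identifying, "[REDACTED]")
    return text

def screen_for_editors(submission: dict, llm_score) -> dict:
    """AI assigns a provisional score; the decision stays with human reviewers."""
    score = llm_score(anonymize(submission))
    return {"provisional_score": score, "decision": "refer to human reviewers"}
```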