AI-generated feedback on writing: insights into efficacy and ENL student preference
Escalante et al. Int J Educ Technol High Educ (2023) 20:57
https://doi.org/10.1186/s41239-023-00425-2
RESEARCH ARTICLE
International Journal of Educational
Technology in Higher Education
Open Access
AI‑generated feedback on writing: insights
into efficacy and ENL student preference
Juan Escalante1*, Austin Pack1 and Alex Barrett2
*Correspondence:
1 Faculty of Education and Social Work, Brigham Young University-Hawaii, 55‑220 Kulanui Street, Laie, HI 96762, USA
2 College of Education, Florida State University, Stone Building, 114 West Call Street, Tallahassee, FL 32306, USA
Abstract
The question of how generative AI tools, such as large language models and chatbots,
can be leveraged ethically and effectively in education is ongoing. Given the critical
role that writing plays in learning and assessment within educational institutions, it
is of growing importance for educators to make thoughtful and informed decisions
as to how and in what capacity generative AI tools should be leveraged to assist
in the development of students’ writing skills. This paper reports on two longitudinal studies. Study 1 examined the learning outcomes of 48 university English as a new
language (ENL) learners in a six-week repeated-measures quasi-experimental
design in which the experimental group received writing feedback generated by ChatGPT (GPT-4) and the control group received feedback from their human tutor. Study
2 analyzed the perceptions of a different group of 43 ENL learners who received feedback
from both ChatGPT and their tutor. Results of Study 1 showed no difference in learning outcomes between the two groups. Study 2 results revealed a near even split
in preference for AI-generated or human-generated feedback, with clear advantages
to both forms of feedback apparent from the data. The main implication of these studies is that the use of AI-generated feedback can likely be incorporated into ENL essay
evaluation without affecting learning outcomes, although we recommend a blended
approach that utilizes the strengths of both forms of feedback. The main contribution
of this paper is in addressing generative AI as an automatic essay evaluator while incorporating learner perspectives.
Keywords: Automated writing evaluation, ChatGPT, Artificial intelligence, Language
education
Introduction
Automated writing evaluation (AWE) systems such as Grammarly and Pigai assist learners and educators in the writing process by providing corrective feedback on learner
writing. These systems, and older tools such as spelling and grammar checkers, rely
on natural language processing to identify errors and infelicities in writing and suggest
improvements. However, with the recent unleashing of highly sophisticated generative
pretrained transformer (GPT) large language models (LLMs), such as GPT-4 by OpenAI
and PaLM 2 by Google, AWE may be entering a new era.
© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third
party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
As Godwin-Jones (2022) pointed out in his treatise on AWE tools in second language
writing, GPT-powered programs are capable not only of correcting errors in essays
but also of composing them. Given a simple prompt, generative artificial intelligence
(GenAI) LLMs and chatbots that allow users to interface with LLMs, such as ChatGPT
and Bard, can produce complete essays that are passable at the university level (Abd-Elaal et al., 2022; Herbold et al., 2023). It is also possible for English as a new language
(ENL) writers to use GPT-powered machine translation to turn essays written in
their first language (L1) into English essays (Godwin-Jones, 2022), or to take problematic
writing and correct any mistakes wholesale, change its tone from informal to academic,
or add cohesive elements like discourse markers (Tate et al., 2023). Educators have
begun to use AI-powered plagiarism detectors to identify student submissions that were
generated by AI, yet AI paraphrasing programs like Quillbot have been found to render
AI-generated text undetectable by such tools (Krishna et al., 2023). With millions of users
engaging with ChatGPT and other GenAI tools since ChatGPT’s debut in November of
2022, public discourse has speculated on the disruptive and problematic nature of these
tools for the field of education (Lampropoulos et al., 2023).
The public reaction to GenAI in education has been diverse. In Fütterer et al.’s (2023)
systematic review of popular publications across Australia, New Zealand, the U.K., and
the U.S., general sentiment appeared evenly split between positive and negative, but concerns about academic integrity have been raised (Sullivan, 2023), with some educational
institutions deciding to ban ChatGPT rather than allow its use (Yang, 2023). The disruption GenAI represents for language education has been likened to the pocket calculator’s
impact on math education (Urlaub & Dessein, 2022), when institutions debated whether to
prohibit the technology or to incorporate it by rethinking the educational objectives
of math education. The prevailing sentiment on GenAI seems to be that reforms are
needed to adapt educational practices to accommodate the technology (Fütterer
et al., 2023; Tseng & Warschauer, 2023). However, research is urgently needed so that
teachers, students, and instructional designers can appropriately apply GenAI in education (Chiu et al., 2023).
This article represents a step in the direction of better understanding how GenAI
might be used in language learning classrooms by examining how language teachers and
learners employ it in the writing process. Specifically, we investigate the
efficacy of using GPT-4 as an AWE tool for generating corrective feedback on student
writing and whether students prefer this feedback over that of a human tutor.
Overview of relevant literature
ChatGPT is a public-facing GenAI chatbot that allows users to interface with LLMs.
GenAI chatbots have been trained on a large corpus of language from the Internet to
statistically predict the next most probable word in response to a user prompt; these
responses are then put through an algorithm of reinforcement l (...truncated)