AI-generated feedback on writing: insights into efficacy and ENL student preference

International Journal of Educational Technology in Higher Education, Oct 2023

The question of how generative AI tools, such as large language models and chatbots, can be used ethically and effectively in education remains open. Writing plays a critical role in learning and assessment within educational institutions. It is therefore increasingly important for educators to make thoughtful, informed decisions about how, and in what capacity, generative AI tools should be used to help students develop their writing skills. This paper reports on two longitudinal studies. Study 1 examined the learning outcomes of 48 university English as a new language (ENL) learners in a six-week, repeated-measures, quasi-experimental design. The experimental group received writing feedback generated by ChatGPT (GPT-4), and the control group received feedback from their human tutor. Study 2 analyzed the perceptions of a different group of 43 ENL learners who received feedback from both ChatGPT and their tutor. The results of Study 1 showed no difference in learning outcomes between the two groups. The results of Study 2 revealed a near-even split in preference for AI-generated or human-generated feedback, with clear advantages to both forms of feedback apparent in the data. The main implication of these studies is that AI-generated feedback can likely be incorporated into ENL essay evaluation without affecting learning outcomes. Even so, we recommend a blended approach that draws on the strengths of both forms of feedback. The main contribution of this paper is that it addresses generative AI as an automatic essay evaluator while incorporating learner perspectives.

Escalante et al. Int J Educ Technol High Educ (2023) 20:57
https://doi.org/10.1186/s41239-023-00425-2
RESEARCH ARTICLE, Open Access

Juan Escalante1*, Austin Pack1 and Alex Barrett2
*Corresponding author
1 Faculty of Education and Social Work, Brigham Young University-Hawaii, 55-220 Kulanui Street, Laie, HI 96762, USA
2 College of Education, Florida State University, Stone Building, 114 West Call Street, Tallahassee, FL 32306, USA

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Keywords: Automated writing evaluation, ChatGPT, Artificial intelligence, Language education

Introduction

Automated writing evaluation (AWE) systems such as Grammarly and Pigai assist learners and educators in the writing process by providing corrective feedback on learner writing. These systems rely on natural language processing to identify errors and infelicities in writing and to suggest improvements, as do older tools such as spelling and grammar checkers. However, with the recent release of highly sophisticated generative pretrained transformer (GPT) large language models (LLMs), such as GPT-4 by OpenAI and PaLM 2 by Google, AWE may be entering a new era. Godwin-Jones (2022) made this point in his treatise on AWE tools in second language writing: GPT-powered programs are capable not only of correcting errors in essays but also of composing essays. Given a simple prompt, generative artificial intelligence (GenAI) LLMs can produce complete essays that are passable at the university level (AbdElaal et al., 2022; Herbold et al., 2023). So can the chatbots that let users interact with these models, such as ChatGPT and Bard. English as a new language (ENL) writers can also use GPT-powered machine translation to turn essays written in their first language (L1) into English essays (Godwin-Jones, 2022). Alternatively, they can use GenAI tools to take problematic writing and correct any mistakes wholesale, change its tone from informal to academic, or add cohesive elements such as discourse markers (Tate et al., 2023). Educators have begun to use AI-powered plagiarism detectors to identify student submissions generated by AI. Yet AI paraphrasing programs such as Quillbot have been found to render AI-generated text undetectable by such tools (Krishna et al., 2023). Millions of users have engaged with ChatGPT and other GenAI tools since ChatGPT's debut in November 2022. Over that period, public discourse has speculated on the disruptive and problematic nature of these tools for the field of education (Lampropoulos et al., 2023).

The public reaction to GenAI in education has been diverse. Fütterer et al. (2023) systematically reviewed popular publications across Australia, New Zealand, the U.K., and the U.S. and found general sentiment to be evenly split between positive and negative. Nevertheless, concerns about academic integrity have been raised (Sullivan, 2023), and some educational institutions have decided to ban ChatGPT rather than allow its use (Yang, 2023).
The disruption GenAI represents for language education has been likened to the pocket calculator's impact on math education (Urlaub & Dessein, 2022). At that time, institutions debated whether to prohibit the technology or to incorporate it by rethinking the objectives of math education. The prevailing sentiment on GenAI seems to be that educational practices must be reformed to accommodate the technology (Fütterer et al., 2023; Tseng & Warschauer, 2023). However, research is urgently needed so that teachers, students, and instructional designers can apply GenAI in education appropriately (Chiu et al., 2023). This article takes a step toward understanding how GenAI might be used in language learning classrooms by examining how language teachers and learners employ it in the writing process. Specifically, we investigate two questions. The first is how effective GPT-4 is as an AWE tool for generating corrective feedback on student writing. The second is whether students prefer this feedback over that of a human tutor.

Overview of relevant literature

ChatGPT is a public-facing GenAI chatbot that allows users to interface with LLMs. The LLMs underlying GenAI chatbots have been trained on a large corpus of language from the Internet to statistically predict the next most probable word in response to a user prompt; these responses are then put through an algorithm of reinforcement l (...truncated)
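The phrase "statistically predict the next most probable word" can be made concrete with a deliberately simplified sketch. The Python below is our own illustration, not how ChatGPT or GPT-4 works internally. It assumes a tiny hypothetical corpus and counts which word follows which (a bigram model), then greedily appends the most frequent follower. Real LLMs instead use transformer neural networks over subword tokens, trained on vastly larger corpora. The function names (`train_bigrams`, `most_probable_next`, `continue_text`) and the corpus are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Toy illustration only (hypothetical helpers): a word-level bigram model,
# NOT the transformer architecture that real LLMs such as GPT-4 use.


def train_bigrams(corpus: List[str]) -> Dict[str, Counter]:
    """Count how often each word is immediately followed by each other word."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def most_probable_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the word seen most often after `word`, or None if it never had a follower.

    Ties are broken by first occurrence, which is how Counter.most_common orders equal counts.
    """
    counts = follows.get(word.lower())  # .get() does not insert into the defaultdict
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def continue_text(follows: Dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedily extend the prompt one most-probable word at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        nxt = most_probable_next(follows, words[-1])
        if nxt is None:  # no recorded follower: stop generating
            break
        words.append(nxt)
    return " ".join(words)


# Hypothetical four-sentence corpus used only for this illustration.
corpus = [
    "students write essays",
    "students write essays in english",
    "tutors give feedback on essays",
    "students read feedback",
]
follows = train_bigrams(corpus)
print(continue_text(follows, "students"))  # students write essays in english
```

In this toy corpus, "students" is followed by "write" twice and by "read" once. Greedy continuation from "students" therefore produces "students write essays in english". Generation stops at "english" because that word never has a recorded follower.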



Escalante, Juan, Pack, Austin, Barrett, Alex. AI-generated feedback on writing: insights into efficacy and ENL student preference, International Journal of Educational Technology in Higher Education, 2023, pp. 1-20, Volume 20, Issue 1, DOI: 10.1186/s41239-023-00425-2