Artificial Intelligence-Generated Text in Higher Education – Usage and Detection in the Literature
Interdisciplinary Description of Complex Systems 22(3), 238-245, 2024
László Berek*
Óbuda University, University Library
Budapest, Hungary
DOI: 10.7906/indecs.22.3.1
Regular article
Received: 20 May 2024.
Accepted: 15 June 2024.
ABSTRACT
Since the launch of ChatGPT in November 2022, artificial intelligence has become increasingly widespread
in all areas of life, and generative applications of artificial intelligence are proliferating in a wide range of
fields. The technology has great potential for applications such as machine translation, voice
recognition, education, or content creation, but it also raises concerns about misuse, ethical use, and
plagiarism. As texts generated by artificial intelligence tools continue to improve, the detection tools on the
market will have to improve as well to keep pace. This article uses data from the Scopus and
Web of Science databases to map the current usability of detectors of AI-generated text
in higher education and academia. One of the aims of the article is to provide insight
into experiences with the detectors of AI-generated text currently available in
higher education.
KEY WORDS
artificial intelligence, AI-generated text detector, academic integrity, plagiarism, higher education
CLASSIFICATION
ACM: I.2.0, I.2.6, I.2.7
APA: 3550
JEL: I21
*Corresponding author, : ; +36 (1) 666-55976;
*Óbuda University, University Library, Bécsi út 96/B, H-1034 Budapest, Hungary
INTRODUCTION
Artificial intelligence (AI) and natural language processing (NLP) have made significant
progress over the past decade, and in the last few years the solutions and opportunities offered
by the new technology have spread to all areas of life. Generative AI for textual content, in particular,
is advancing rapidly. The new technology offers great potential for applications such as machine
translation, voice recognition, education, or content creation, but it also raises concerns about
misuse, ethical use, and plagiarism.
In recent decades, higher education institutions have made great strides in detecting
plagiarism by students and researchers, with the help of the increasingly sophisticated
plagiarism detection systems available on the market. In many universities, plagiarism checks
are a mandatory part of the assessment of students' midterm papers, theses, and
dissertations. At Óbuda University, for example, a plagiarism detector has been part of the
institutional repository, managed by the University Library, since 2011. It is used not
only to check students' theses but also to check for plagiarism in the university's journals and
other publications [1, 2].
In the literature, the use of AI-generated text is commonly conflated with plagiarism or treated
as a form of it. This is understandable to some extent: it can be seen as a kind of unconscious
plagiarism, in which generative AI creates text from available, previously published works, while
in most cases the sources it draws on are not referenced in the final result (there are, of course,
exceptions: platforms and systems in which inserting the appropriate reference is a built-in
function of the software). In research on AI-generated writing, the phenomenon is often referred
to as patchwriting or cryptomnesia, and this research focuses on defining the phenomenon
conceptually [3].
To create AI-generated text, systems are trained on huge amounts of text and other data available
online (online content, books, journals, web pages, etc.). By recognising and learning
language patterns, relations, and contexts, they can produce content similar to the
human-written texts in their training data. This is where the problem begins: the generated
texts are often difficult for human readers to identify as machine-generated. Given the rapid
advances in the technology and in the training process, these texts can be expected to become
even more convincing in the future.
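To make the idea of "learning language patterns from training text" concrete, the following is a deliberately tiny illustration: a word-bigram model that records which word follows which in a small corpus and then generates new sequences from those counts. This is an assumption-laden toy, not how modern generative systems work (they use large neural networks over subword tokens); the function names `train_bigrams` and `generate` and the sample corpus are hypothetical. It does, however, show the point made above about sources: the output is stitched together from fragments of the training text, and nothing in it records where those fragments came from.

```python
import random
from collections import defaultdict


def train_bigrams(corpus):
    """Map each word to the list of words observed directly after it.

    Repeated successors are kept, so frequent continuations appear more often.
    """
    model = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            model[current].append(nxt)
    return model


def generate(model, start, max_words=12, seed=None):
    """Walk the bigram table from `start`, sampling each next word in
    proportion to how often it followed the previous word in training.

    Stops at max_words or when the last word has no observed successor.
    """
    rng = random.Random(seed)
    words = [start]
    # model.get() avoids inserting new keys into the defaultdict
    while len(words) < max_words and model.get(words[-1]):
        words.append(rng.choice(model[words[-1]]))
    return " ".join(words)
```

For example, training on two short sentences and calling `generate(model, "students", seed=1)` yields a sequence in which every adjacent word pair was copied from the corpus, yet the output carries no reference to either source sentence.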
The growing use of generative AI in higher education is illustrated by the BestColleges survey
conducted in autumn 2023, which had 1,000 respondents currently studying
at a university or college in the US. Students were asked several questions related to
AI use: 56% reported that they had already used an AI tool to complete assignments,
and 54% agreed with the statement that using AI to complete
assignments is cheating or plagiarism [4]. The distribution of responses to the latter question is
shown in Figure 1. An earlier BestColleges survey, conducted six months before (March 2023), also
asked students whether they used AI tools to solve problems. The rapid spread of AI
tools is shown by the fact that at that time only 22% of students answered yes to this
question [5].
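Taken at face value, the two BestColleges figures imply the change below. The surveys used different samples and differently worded questions, so this is only an indicative comparison, not a measured growth rate:

```latex
\Delta = 56\,\% - 22\,\% = 34 \text{ percentage points},
\qquad
\frac{56\,\%}{22\,\%} \approx 2.5
```

In other words, the reported share of students using AI tools for their coursework was roughly two and a half times higher in autumn 2023 than in March 2023.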
Artificial intelligence, and generative AI in particular, is expected to keep developing over the
coming years and decades. Bloomberg's autumn 2023 forecast shows the evolution of the
generative AI market between 2020 and 2032: the market, worth US$14 billion in 2020, is
forecast to reach about US$900 billion before the end of the forecast period. The forecast is shown in Figure 2 [6].
Figure 1. Using AI Tools to Complete Assignments or Exams is Cheating or Plagiarism (pie chart of Yes/No/Neutral responses; segment values 54%, 25%, and 21%), BestColleges 2023 [4].
Figure 2. Generative AI revenue worldwide from 2020 with forecast until 2032 (in billion U.S.
dollars) [6].
The literature review focuses on the role of generative AI in higher education institutions and
academia. Published research results are reviewed to explore the effectiveness of AI-generated
text detectors. The review also examines the regulation of generative AI in higher
education.
MATERIALS AND METHODS
The two major scientific databases used for the bibliographic search were Scopus and Web of
Science. Zotero reference management software was used for data collection and further
processing. Rayyan software was used for the deduplication of publications and for the
screening and selection stage.
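Because records were merged from two databases, the same publication often appears twice. The sketch below illustrates one common way such duplicates can be matched, by normalised DOI or, failing a DOI match, by normalised title. It is a hypothetical illustration under stated assumptions, not Rayyan's actual algorithm: the record fields (`doi`, `title`, `source`) and all function names are invented for the example.

```python
import re
import unicodedata


def normalize_doi(doi):
    """Lower-case a DOI and strip common resolver prefixes; None if empty.

    DOIs are case-insensitive, so lower-casing is safe for matching.
    """
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi).strip()
    return doi or None


def normalize_title(title):
    """Fold accents and case, keep only ASCII letters/digits, collapse spaces."""
    text = unicodedata.normalize("NFKD", title or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records):
    """Return records in input order, dropping later duplicates.

    A record is a duplicate if its normalised DOI or its normalised title
    was already seen. Listing Scopus records first therefore keeps the
    Scopus copy, Scopus being the main data source (hypothetical setup).
    """
    seen_dois, seen_titles, unique = set(), set(), []
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        title = normalize_title(rec.get("title"))
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # already have this publication
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(rec)
    return unique
```

Title-only matching can wrongly merge distinct works with generic titles (e.g. editorials), which is why deduplication tools such as Rayyan typically present candidate duplicates for human review rather than deleting them automatically.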
CRITERIA AND LIMITATIONS
The main data source for the study was Scopus; the data collected was supplemented by the
results of a search of the Web of Science database. Additional data, mainly statistical, were
collected from the Statista database. Search queries were conducted in May 2024 in both
Scopus and Web of Science databases.
The search in Scopus and Web of Science did not exclude conference proceedings or book
chapters; all content indexed in these databases was included in the analysis.
Several keywords were specified in order to identify relev (...truncated)