EXTRACTION OF ANGLICISMS FROM A CORPUS OF MACEDONIAN MAGAZINE TEXTS

English Studies at NBU, 2025, Vol. 11, Issue 1, pp. 141-159
https://doi.org/10.33919/esnbu.25.1.8
pISSN 2367-5705 eISSN 2367-8704
www.esnbu.org

Lina Miloshevska
University of Information Science and Technology “St. Paul The Apostle”, Ohrid, North Macedonia

Abstract
The present article describes the stages involved in compiling a specialized corpus of Macedonian magazine texts and the software tools used to extract anglicisms from it. The texts were collected from the magazine Kapital and cover two distinct periods: the years 2000 and 2020. The corpus comprises about 2 million tokens and 141,852 types. The software produced word lists which, combined with other statistical techniques, yielded a refined anglicism headword list from which new anglicisms were extracted. In addition to the software tools, careful manual inspection was necessary in both the extraction and the analysis stages. As a result of the research, a total of 220 completely new anglicisms were identified, most of which are not yet included in existing Macedonian dictionaries.

Keywords: Anglicisms, anglicism extraction, corpus linguistics, corpus analysis tools

Article history: Received: 03 December 2024; Reviewed: 15 February 2025; Accepted: 08 March 2025; Published: 30 June 2025

Copyright © 2025 Lina Miloshevska. This is an Open Access article published and distributed under the terms of the CC BY 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Citation: Miloshevska, L. (2025). Extraction of anglicisms from a corpus of Macedonian magazine texts. English Studies at NBU, 11(1), 141-159. https://doi.org/10.33919/esnbu.25.1.8

Lina Miloshevska is a senior lecturer at the University of Information Science and Technology “St. Paul The Apostle”, North Macedonia.
Her research interests are in the field of corpus linguistics, discourse analysis, language contact and language change, CALL, and ESP.
ORCID: https://orcid.org/0000-0003-3856-5448

The present article describes a research project aimed at building a specialized corpus of Macedonian magazine texts for the identification and analysis of anglicisms. Previous research on anglicisms has provided a methodological background for the identification and extraction of these English loanwords. For identifying anglicisms, the language of the media, whether print or digital, is regarded as a very convenient source, since “it is representative of a wide range of registers and is highly receptive and open towards neologisms, loanwords and linguistic creativity in general” (Furiassi & Hofland, 2007, p. 347). Foreign words, anglicisms, and false anglicisms are often used for their positive connotations and their strategic communicative value, especially in eye-catching headlines (Furiassi & Hofland, 2007, p. 347).

The media plays a significant role as a primary source for introducing anglicisms into the Macedonian language. Written texts are particularly important in the study of new loans, as they give these items greater visibility amid the influx of newly introduced borrowings and coinages. When a new anglicism is accompanied by its English original in parentheses, placed in quotation marks, or explained, it is more likely to become fixed in both the passive and the active lexical repertoire of Macedonian readers. Likewise, the presence or absence of typographical resources (such as inverted commas or italics) can be interpreted as a mark of novelty or foreign character: it reveals the extent to which the writer considers that the word should be highlighted as foreign. The examples from the corpus in Appendix 1 illustrate this.
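Because parenthetical originals and quotation marks act as markers of novelty, they can in principle be used to surface candidate anglicisms automatically. The Python sketch below is illustrative only and is not the procedure used in this study: the regular expressions, the function name `flagged_candidates`, and the sample sentence are hypothetical. It assumes plain UTF-8 text, so italics (lost in plain-text extraction) cannot be detected, and it looks only for single-word items.

```python
import re
from collections import Counter

# Any letter in the Unicode Cyrillic block (U+0400-U+04FF); this also covers
# Macedonian-specific letters such as ѓ, ѕ, ј, љ, њ, ќ, џ.
CYR = "[\u0400-\u04FF]"

# Hypothetical pattern: one Cyrillic word followed by a Latin-script
# original in parentheses, e.g. "скрининг (screening)".
GLOSS_RE = re.compile("(" + CYR + r"+)\s*\(([A-Za-z][A-Za-z\- ]*)\)")

# Hypothetical pattern: one Cyrillic word enclosed in quotation marks,
# e.g. Macedonian-style „кобрандинг“, straight quotes, or guillemets.
QUOTE_RE = re.compile(r'[„"“«]\s*(' + CYR + r'+)\s*[“"”»]')


def flagged_candidates(text: str) -> dict[str, Counter]:
    """Count candidate anglicisms flagged by a parenthetical English
    original or by quotation marks (a hypothetical heuristic)."""
    glossed: Counter = Counter()
    quoted: Counter = Counter()
    for word, original in GLOSS_RE.findall(text):
        # Key on the (Macedonian form, English original) pair.
        glossed[(word.lower(), original.strip().lower())] += 1
    for word in QUOTE_RE.findall(text):
        quoted[word.lower()] += 1
    return {"glossed": glossed, "quoted": quoted}


# Constructed example sentence, not taken from the KAPITAL corpus.
sample = ("Компанијата воведе скрининг (screening) на кандидатите "
          "и нов „кобрандинг“ со партнерите.")
print(flagged_candidates(sample))
```

Hits from such patterns are only candidates: native words are also quoted for emphasis, and parentheses can hold acronyms or brand names. This is why, as the abstract notes, careful manual inspection remains necessary at the extraction stage.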
Applying corpus analysis tools to study anglicisms in Macedonian media texts is a robust approach to understanding how English loanwords influence Macedonian language use, particularly in journalism and media discourse. Attempts at automatic and semi-automatic retrieval of anglicisms, with varying degrees of success, are discussed by Andersen (2005, 2011, 2012), Furiassi & Hofland (2007), Furiassi (2008), and Losnegaard & Lyse (2012).

Identification of Anglicisms

The starting point for any study of anglicisms is a definition of these loanwords, that is, of what counts as an anglicism; this must be settled before the number and the impact (frequency) of English vocabulary in a language can be calculated. Definitions vary significantly in the literature and are usually adapted to the researcher’s interest. They can be quite restrictive, focusing only on the most recent anglicisms (cf. Görlach 2001), or more accommodating (cf. Gottlieb 2004), including both new anglicisms and older ones that have long been accepted into the recipient language.

For the purpose of this paper, the definition of what constitutes an anglicism focuses mainly on lexical items, without constraints on their degree of acceptance. Moreover, no distinction is made between anglicisms, Americanisms, and Briticisms. Consequently, the term anglicism in this paper covers any item of English origin, whether adapted or adopted (unadapted), and serves as an umbrella term. In other words, anglicisms in this study are adapted and unadapted lexical items that are clearly of English origin (i.e. attested in the source language) and bear English traits in their phonology, morphology, orthography, and semantics. Adapted anglicisms are words or compounds whose orthography and morphology are adapted to the recipient language system.
Such items often become a productive source of new terms in the recipient language; for example, финишира (from finish), стартува (from start), and инвестира (from invest) are clear loanwords adapted to the Macedonian phonological, morphological, and grammatical system. On the other hand, adopted (unadapted) loans are words or compounds borrowed from English “wholesale”, without much structural integration, so that the expression remains recognizably English, such as скрининг (screening), кобрандинг (co-branding), бот (bot), and бизнис (business). The only intervention of the recipient language is in the phonology of the term, given the differences between the English and the recipient language phonological systems. For practical reasons, the anglicisms discussed in this paper are restricted to one-word lexical units and unhyphenated single-unit compounds. Using this definition, a list of anglicisms for further analysis was extracted from the corpus, as explained in section 4.

The KAPITAL corpus

To extract and study anglicisms in Macedonian, a corpus of magazine articles was compiled and analyzed. The corpus was built from scratch specifically for this study, and its size is 2,288,999 tokens. The texts were drawn from two distinct time periods: the years 2000 and 2020. This proved to be crucial i (...truncated)