EXTRACTION OF ANGLICISMS FROM A CORPUS OF MACEDONIAN MAGAZINE TEXTS
English Studies at NBU, 2025
Vol. 11, Issue 1, pp. 141-159
https://doi.org/10.33919/esnbu.25.1.8
pISSN 2367-5705
eISSN 2367-8704
www.esnbu.org
Lina Miloshevska
University of Information Science and Technology
“St. Paul The Apostle”, Ohrid, North Macedonia
Abstract
This article describes the stages involved in compiling a specialized corpus of Macedonian
magazine texts and the software tools employed to extract anglicisms from the corpus. The texts were
collected from the magazine Kapital and cover two distinct periods: the years 2000 and 2020. The size of
the corpus is about 2 million tokens and 141,852 types. The software produced word lists which,
in combination with other statistical techniques, yielded a refined Anglicism headword list from
which new anglicisms were extracted. In addition to the software tools, careful manual inspection was
necessary in both the extraction and analysis stages. As a result of the research, a total of 220 completely
new anglicisms have been identified. Most of these new anglicisms are not yet included in existing
Macedonian dictionaries.
Keywords: anglicisms, anglicism extraction, corpus linguistics, corpus analysis tools
Article history:
Received: 03 December 2024
Reviewed: 15 February 2025
Accepted: 08 March 2025
Published: 30 June 2025
Copyright © 2025 Lina Miloshevska
This is an Open Access article published and distributed under the terms of the CC BY 4.0
International License which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
Citation: Miloshevska, L. (2025). Extraction of anglicisms from a corpus of Macedonian magazine texts.
English Studies at NBU, 11(1), 141-159. https://doi.org/10.33919/esnbu.25.1.8
Lina Miloshevska is a senior lecturer at the University of Information Science and Technology “St. Paul
The Apostle”, North Macedonia. Her research interests are in the field of corpus linguistics, discourse
analysis, language contact and language change, CALL, and ESP.
https://orcid.org/0000-0003-3856-5448
The present article describes a research project aimed at building a specialized
corpus of Macedonian magazine texts for the identification and analysis of
anglicisms. Previous research on anglicisms has provided the methodological background for
the identification and extraction of these English loanwords.
To identify anglicisms, the language of the media, whether print or digital, is
regarded as a very convenient source since “it is representative of a wide range of
registers and is highly receptive and open towards neologisms, loanwords and linguistic
creativity in general” (Furiassi & Hofland, 2007, p. 347). Foreign words, anglicisms, and
false anglicisms are often used for their positive connotation and their strategically
communicative features, especially in eye-catching headlines (p. 347).
The media plays a significant role as a primary source for introducing anglicisms
into the Macedonian language. Written texts are particularly important in the study of new
loans, as they give these items greater visibility amid the influx of newly introduced
borrowings and coinages. New anglicisms that are accompanied by their original form in
parentheses, set in quotation marks, or explained in the text are more likely to become
fixed in both the passive and the active lexical repertoire of Macedonian readers.
Additionally, the presence or absence of typographical devices (such as inverted commas
or italics) can be interpreted as a mark of novelty or foreign character: it reveals the
extent to which the writer considers that the word should be flagged as foreign. The
examples from the corpus in Appendix 1 illustrate this.
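Typographical cues of this kind lend themselves to simple pattern matching. The following is a minimal sketch, not the procedure used in this study. It assumes plain-text input and flags Cyrillic words that are immediately followed by a Latin-script form in parentheses, as well as Cyrillic words enclosed in quotation marks. The function name, the two patterns, and the sample sentence are hypothetical and serve only as an illustration.

```python
import re

# Cyrillic word followed by a Latin-script form in parentheses,
# e.g. "скрининг (screening)". Hypothetical pattern, for illustration only.
GLOSSED = re.compile(r'([\u0400-\u04FF]+)\s*\(([A-Za-z][A-Za-z \-]*)\)')

# Cyrillic word set off by quotation marks (low-high, curly, straight or guillemets).
QUOTED = re.compile(r'[„“"«]([\u0400-\u04FF]+)[“”"»]')


def find_marked_words(text):
    """Return (word, cue, gloss) tuples for typographically marked words."""
    hits = []
    for match in GLOSSED.finditer(text):
        hits.append((match.group(1), "gloss", match.group(2).strip()))
    for match in QUOTED.finditer(text):
        hits.append((match.group(1), "quotes", None))
    return hits


# Invented sample sentence, not taken from the corpus.
sample = "Банката воведе скрининг (screening) и нов „бот“ за клиенти."
for word, cue, gloss in find_marked_words(sample):
    print(word, cue, gloss)
```

Hits of this kind would still need manual inspection, since quotation marks and parentheses also mark titles, citations and abbreviations that are not anglicisms.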
Applying corpus analysis tools to study anglicisms in Macedonian media texts is a
robust approach to understanding how English loanwords influence Macedonian
language use, particularly in journalism and media discourse. Attempts at automatic and
semi-automatic retrieval of anglicisms, with varying degrees of success, are discussed by
Andersen (2005, 2011, 2012), Furiassi & Hofland (2007), Furiassi (2008), and
Losnegaard & Lyse (2012).
Identification of Anglicisms
The starting point for any study of anglicisms is the definition of these
loanwords, i.e. what counts as an Anglicism. This must be settled before the number
and the impact (frequency) of English vocabulary in a language can be calculated.
Definitions vary significantly in the literature and are usually adapted to the researcher’s
interest. Definitions can be quite restrictive, focusing only on the most recent anglicisms
(cf. Görlach 2001) or more accommodating (cf. Gottlieb 2004), including both new and
older anglicisms that have long been accepted into the recipient language.
For the purpose of this paper, the definition of what constitutes an Anglicism
focuses mainly on lexical items without constraints on their degree of acceptance.
Moreover, no limitations are placed on whether a word is an Anglicism, Americanism, or
Briticism. Consequently, the term Anglicism used in this paper covers any variant of
English origin adapted or adopted (unadapted) and serves as a portmanteau term. In
other words, anglicisms in this study are adapted lexical items and unadapted lexical
items that clearly have an English origin (are attested in the source language) and bear
English traits in their phonology, morphology, orthography, and semantics. Adapted
anglicisms are words or compounds whose orthography and morphology are adapted to
the recipient language system. Such items often become a productive source for new
terms in the recipient language system; for example, финишира, стартува, and инвестира
are clear loanwords adapted to the Macedonian phonological, morphological, and
grammatical system. On the other hand, adopted/unadapted loans are words or
compounds borrowed from English “wholesale” without much structural integration so
that the expression remains recognizably English, such as скрининг, кобрандинг, бот,
бизнис. The only intervention of the recipient language is in the phonology of the term,
given the difference between English and the recipient language phonological systems.
For practical reasons, the anglicisms discussed in this paper are one-word lexical units or
unhyphenated single-unit compounds. A list of anglicisms thus defined was extracted from
the corpus for further analysis, as explained in section 4.
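The one-word constraint can be expressed as a simple token filter applied while a frequency word list is built. The sketch below is an illustration under stated assumptions, not the software used in the study. It treats a candidate as an unbroken run of Cyrillic characters, discards hyphenated forms whole rather than splitting them, and reports token and type counts. The tokenisation rule, the function name, and the sample text are hypothetical.

```python
import re
from collections import Counter

# A run of word characters and hyphens; hyphenated forms are captured whole
# so that they can be excluded rather than split into their parts.
CHUNK = re.compile(r"[\w\-]+")
# A single unhyphenated unit written entirely in Cyrillic characters.
CYRILLIC_WORD = re.compile(r"[\u0400-\u04FF]+")


def build_word_list(text):
    """Count lower-cased one-word Cyrillic units, skipping hyphenated forms."""
    counts = Counter()
    for chunk in CHUNK.findall(text):
        if CYRILLIC_WORD.fullmatch(chunk):
            counts[chunk.lower()] += 1
    return counts


# Invented sample text, not taken from the corpus.
sample = "Бизнис секторот бара скрининг, а бизнис-план и скрининг има секаде."
word_list = build_word_list(sample)
tokens = sum(word_list.values())  # running words kept by the filter
types = len(word_list)            # distinct word forms among them
print(tokens, types)
print(word_list.most_common(3))
```

A word list produced this way contains native vocabulary as well as loans, so it is only a starting point from which candidate anglicisms have to be selected.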
The KAPITAL corpus
To extract and study anglicisms in Macedonian, a corpus of magazine articles was
compiled from scratch specifically for this study and then analyzed.
The corpus size is 2,288,999 tokens. The texts were
drawn from two distinct time periods: the years 2000 and 2020. This proved to be
crucial i (...truncated)