Extraction of anglicisms from a corpus of Macedonian magazine texts

Authors

  • Lina Miloshevska, University of Information Science and Technology St. Paul The Apostle

DOI:

https://doi.org/10.33919/esnbu.25.1.8

Keywords:

Anglicisms, anglicisms extraction, corpus linguistics, corpus analysis tools

Abstract

This article describes the stages involved in compiling a specialized corpus of Macedonian magazine texts and the software tools used to extract anglicisms from it. The texts were collected from the magazine Kapital and cover two distinct periods: the years 2000 and 2020. The corpus contains about 2 million tokens and 141,852 types. The software produced word lists which, in combination with other statistical techniques, yielded a refined anglicism headword list from which new anglicisms were extracted. In addition to the software tools, careful manual inspection was necessary in both the extraction and analysis stages. The research identified a total of 220 completely new anglicisms, most of which are not yet included in existing Macedonian dictionaries.
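
The word-list step mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's procedure, which relied on dedicated corpus software and manual inspection. The function names, the tokenization rule, and the two toy sentences below are assumptions made for illustration only. The sketch builds a frequency list, reports token and type counts, and lists the types that occur in a later sub-corpus but not in an earlier one, which could serve as raw candidates for manual review.

```python
import re
from collections import Counter

# Assumption: a "word" is any run of Unicode letters (Cyrillic or Latin);
# digits, underscores, and punctuation are skipped.
WORD_RE = re.compile(r"[^\W\d_]+")

def word_list(text):
    """Return a frequency list (Counter) of lowercased word forms."""
    return Counter(m.group(0).lower() for m in WORD_RE.finditer(text))

def corpus_size(freqs):
    """Return (tokens, types) for a frequency list."""
    return sum(freqs.values()), len(freqs)

def new_types(earlier, later, min_freq=1):
    """Types attested in the later list but absent from the earlier one."""
    return sorted(w for w, n in later.items()
                  if n >= min_freq and w not in earlier)

# Toy stand-ins for the two sub-corpora (invented, not taken from the corpus).
text_2000 = "Компанијата објави нов извештај. Извештај за пазарот."
text_2020 = "Компанијата објави нов подкаст. Стартап објави извештај."

freq_2000 = word_list(text_2000)
freq_2020 = word_list(text_2020)

print(corpus_size(freq_2000))           # (tokens, types) of the earlier sample
print(new_types(freq_2000, freq_2020))  # candidate forms for manual inspection
```

A list produced this way contains every new word form, not only anglicisms, so it would still need the filtering and manual inspection that the abstract describes.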

Author Biography

Lina Miloshevska, University of Information Science and Technology St. Paul The Apostle

Lina Miloshevska is a senior lecturer at the University of Information Science and Technology “St. Paul the Apostle”, North Macedonia. Her research interests are in the field of corpus linguistics, discourse analysis, language contact and language change, CALL, and ESP.

References

Anthony, L. (2024a). AntConc (Version 4.3.1) [Computer Software]. Waseda University. https://www.laurenceanthony.net/software/AntConc

Anthony, L. (2024b). TagAnt (Version 2.1.1) [Computer Software]. Waseda University. https://www.laurenceanthony.net/software/TagAnt

Andersen, G. (2005). Assessing algorithms for automatic extraction of anglicisms in Norwegian texts. Corpus Linguistics 2005.

Andersen, G. (2011). Corpora as lexicographical basis: the case of anglicisms in Norwegian. Methodological and Historical Dimensions of Corpus Linguistics (Studies in Variation, Contacts and Change in English 6), ed. by P. Rayson, S. Hoffmann & G. Leech. VARIENG. https://varieng.helsinki.fi/series/volumes/06/andersen

Andersen, G. (2012). Semi-automatic approaches to Anglicism detection in Norwegian corpus data. The anglicization of European lexis, 10, 111-130. https://doi.org/10.1075/z.174.09and

Andersen, G. (2021). On a daily basis… a comparative study of phraseological borrowing. In R. Marti Solano & P. Ruano San Segundo (Eds.), Anglicisms and Corpus Linguistics: Corpus-Aided Research into the Influence of English on European Languages (pp. 13-30). Peter Lang. https://www.peterlang.com/document/1184575

Furiassi, C. G. (2008). Non-adapted Anglicisms in Italian: Attitudes, frequency counts, and lexicographic implications. In R. Fischer, & H. Pulaczewska (Eds.), Anglicisms in Europe. Linguistic Diversity in a Global Context (pp. 313-327). Cambridge Scholars Publishing. https://hdl.handle.net/2318/100769

Furiassi, C., & Hofland, K. (2007). The retrieval of false anglicisms in newspaper texts. In R. Facchinetti (Ed.), Corpus linguistics 25 years on (pp. 347-363). Brill. https://doi.org/10.1163/9789401204347_020

Görlach, M. (Ed.). (2001). A dictionary of European anglicisms: A usage dictionary of anglicisms in sixteen European languages. Oxford University Press.

Gottlieb, H. (2004). Danish echoes of English. Nordic Journal of English Studies, 3(2), 39-65. https://doi.org/10.35360/njes.161

Khoutyz, I. (2010). The pragmatics of anglicisms in modern Russian discourse. In R. Facchinetti, D. Crystal, & B. Seidlhofer (Eds.), From international to local English – and back again (pp. 197-208). Peter Lang.

Losnegaard, G. S., & Lyse, G. I. (2012). A data-driven approach to anglicism identification in Norwegian. In G. Andersen (Ed.), Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian, Studies in Corpus Linguistics vol. 49 (pp. 131-154). John Benjamins. https://doi.org/10.1075/scl.49.07los

Noguerolez, E. E. N. (2017). The Use of Anglicisms in Various Thematic Fields: An Analysis Based on the Corpus de Referencia del Español Actual. ANGLICA-An International Journal of English Studies, 26(2), 123-149. https://doi.org/10.7311/0860-5734.26.2.08

Honnibal, M., & Montani, I. (2025). spaCy. https://spacy.io/models/mk

Winter-Froemel, E., & Onysko, A. (2012). Proposing a pragmatic distinction for lexical Anglicisms. In C. Furiassi, V. Pulcini, & F. R. González (Eds.), The anglicization of European lexis (pp. 43-64). John Benjamins. https://doi.org/10.1075/z.174.06win

Mańczak-Wohlfeld, E., & Witalisz, A. (2019). Anglicisms in the National Corpus of Polish: Assets and limitations of corpus tools. Studies in Polish Linguistics, 14(4), 171-190. https://doi.org/10.4467/23005920SPL.19.019.11337

Published

2025-06-18

How to Cite

Miloshevska, L. (2025). Extraction of anglicisms from a corpus of Macedonian magazine texts. English Studies at NBU, 11(1), 141–159. https://doi.org/10.33919/esnbu.25.1.8

Section

Doctoral Section