An alternative proposal for eliciting key words

Authors

Elena Tarasheva

DOI:

https://doi.org/10.33919/esnbu.15.2.1

Keywords:

corpora, key words, chi-square, log likelihood, lemmas, lemmatization

Abstract

The article reports research on the concept of key words as statistically significant items in a text or corpus. It reviews the approaches to eliciting key words used in various software products for language analysis and the rationale for adopting them. Based on empirical data, a new method is proposed and tested on an exploratory corpus. The motivation and arguments for the proposed procedure are presented using comparisons between different languages. The adequacy of the results yielded by the different methods is tested via a mechanism developed in this research.

Author Biography

Elena Tarasheva, New Bulgarian University

Elena Tarasheva is Associate Professor of Discourse Analysis (Media) at the English Studies Department, New Bulgarian University. She obtained her BA in English Philology from Veliko Turnovo University and specialized in Media and Culture Studies at the University of Strathclyde in Glasgow, UK. She holds a PhD in Mathematical Linguistics from the Bulgarian Academy of Sciences. She has published two monographs, Repetitions of Word Forms: an Approach to Text Structure (CSP, 2011) and The Image of a Country created by International Media: The Case of Bulgaria (CSP, 2014), as well as several articles on Cultural Studies, Corpus Linguistics and Political Linguistics.

Published

2015-12-31

How to Cite

Tarasheva, E. (2015). An alternative proposal for eliciting key words. English Studies at NBU, 1(2), 5–26. https://doi.org/10.33919/esnbu.15.2.1

Section

Articles