An alternative proposal for eliciting key words
DOI: https://doi.org/10.33919/esnbu.15.2.1
Keywords: corpora, key words, chi-square, log likelihood, lemmas, lemmatization
Abstract
The article reports research on the concept of key words as statistically significant items in a text or corpus. It reviews the approaches to eliciting key words implemented in various language-analysis software products and the rationale for adopting them. Based on empirical data, a new method is proposed and tested on an exploratory corpus. The motivation and arguments for the proposed procedure are set out through comparisons between different languages. The adequacy of the results yielded by the different methods is assessed with a mechanism developed for this research.
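To make concrete what "statistically significant" means for the two statistics named in the keywords, chi-square and log likelihood, here is a minimal Python sketch. It is not the new method the article proposes, nor a reproduction of any particular software package. It assumes whitespace tokenization, lowercased word forms as the counted unit, and a 2×2 contingency table per item: the item versus all other tokens, in the study corpus versus the reference corpus. The function names `g2`, `chi_square` and `keyness`, the `normalize` hook, and the toy sentences are all illustrative.

```python
import math
from collections import Counter


def _expected(table):
    """Expected counts E_ij = row_i * col_j / N for a 2x2 contingency table."""
    n = table[0][0] + table[0][1] + table[1][0] + table[1][1]
    rows = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def g2(table):
    """Log-likelihood G2 = 2 * sum(O * ln(O / E)); cells with O = 0 add nothing."""
    exp = _expected(table)
    total = 0.0
    for i in range(2):
        for j in range(2):
            if table[i][j] > 0:
                total += table[i][j] * math.log(table[i][j] / exp[i][j])
    return 2 * total


def chi_square(table):
    """Pearson chi-square, sum((O - E)^2 / E), without continuity correction."""
    exp = _expected(table)
    return sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))


def keyness(study_tokens, reference_tokens, normalize=str.lower):
    """Score every item of the study corpus against the reference corpus.

    Returns (item, G2, chi-square, direction) tuples, highest G2 first.
    Direction is '+' if the item is relatively more frequent in the study
    corpus, '-' if less frequent, '=' if the relative frequencies are equal.
    `normalize` (hypothetical hook) maps a token to the counted unit.
    """
    study = Counter(normalize(t) for t in study_tokens)
    ref = Counter(normalize(t) for t in reference_tokens)
    c, d = sum(study.values()), sum(ref.values())  # corpus sizes in tokens
    results = []
    for item, a in study.items():
        b = ref.get(item, 0)
        # rows: this item vs. all other tokens; columns: study vs. reference
        table = [[a, b], [c - a, d - b]]
        # compare a/c with b/d using integers to avoid float rounding
        direction = "+" if a * d > b * c else "-" if a * d < b * c else "="
        results.append((item, g2(table), chi_square(table), direction))
    # sorted() is stable, so tied scores keep first-seen order
    return sorted(results, key=lambda r: r[1], reverse=True)


study = "The cat sat on the mat the cat purred".split()
reference = "The dog sat on the log the dog barked at the cat".split()
for item, ll, chi2, sign in keyness(study, reference):
    print(f"{item:8} {sign}  LL={ll:.2f}  chi2={chi2:.2f}")
```

In this toy comparison, items absent from the reference corpus ("mat", "purred") score highest, while "the", whose relative frequency is identical in both corpora (3/9 and 4/12), scores zero. Passing a lemmatizer as `normalize`, for instance a lookup in a hypothetical dictionary such as `lambda t: LEMMAS.get(t.lower(), t.lower())`, would count lemmas rather than word forms. That change can alter both the frequencies and the resulting ranking.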
License
Access Policy and Content Licensing
All published articles on the ESNBU site are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. It allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, even for commercial purposes. The terms on which the article is published allow the posting of the published article (Version of Record) in any repository by the author(s) or with their consent.
Note that articles published up to and including Volume 10, Issue 2 (2024) were licensed under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. The transition to CC BY 4.0 took effect with Volume 11, Issue 1 (2025).
In other words, under the CC BY 4.0 license users are free to:
Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
Adapt: remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
Attribution (BY): You must give appropriate credit (Title, Author, Source, License), provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notice: No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
If the law requires that the article be published in the public domain, authors will notify ESNBU at the time of submission, and in such cases the article shall be released under the Creative Commons Public Domain Dedication waiver CC0 1.0 Universal.
Copyright
Copyright for articles published in ESNBU is retained by the authors, with first publication rights granted to the journal. Authors retain full publishing rights and are encouraged to upload their work to institutional repositories, academic social networking sites, etc. ESNBU is not responsible for subsequent uses of the work. It is the author's responsibility to bring an infringement action if they wish to do so.
Exceptions to copyright policy
Occasionally ESNBU may co-publish articles jointly with other publishers, and different licensing conditions may then apply.