Synonyms
Negative dictionary; Stopwords
Definition
Stoplists are lists of words, commonly called stopwords, that are not indexed in an information retrieval system and/or are not available for use as query terms. A stoplist can be created by sorting the terms in a document collection by frequency of occurrence and designating some number of high-frequency terms as stopwords, or alternatively by using one of the published stopword lists. Stoplists may be generic or domain specific, and are necessarily language specific. When a stoplist is used for indexing, each word in a document is checked against the stoplist as the document is added to the system (for example, through dictionary lookup or hashing), and words that match are eliminated from further processing. In some systems, stopwords are indexed, but the stoplist is used to remove them from processing when they appear as query terms.
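The frequency-based construction and the lookup step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names (`tokenize`, `build_stoplist`, `index_terms`), the simple alphabetic tokenizer, the toy collection, and the cutoff of two terms are all assumptions made for the example. The stoplist is held in a Python set, so each membership check is a hash lookup.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_stoplist(documents, size):
    """Return the `size` most frequent terms in the collection as a set."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return {term for term, _ in counts.most_common(size)}


def index_terms(text, stoplist):
    """Return the tokens of `text` that do not appear in the stoplist."""
    return [term for term in tokenize(text) if term not in stoplist]


# Toy collection (hypothetical data, for illustration only).
documents = [
    "the history of the printing press",
    "the theory of the firm",
    "a history of the theory of games",
]

# Designate the two most frequent terms as stopwords.
stoplist = build_stoplist(documents, size=2)
print(sorted(stoplist))                      # ['of', 'the']
print(index_terms(documents[0], stoplist))   # ['history', 'printing', 'press']
```

A system that indexes stopwords but excludes them from queries could apply the same `index_terms` filter to the query string rather than to the documents.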
Key Points
Hans Peter Luhn, in pioneering work on automatic abstracting, put forward...
Recommended Reading
Dialog online courses: glossary of search terms. Available at: http://training.dialog.com/onlinecourses/glossary/glossary_life.html.
Flood BJ. Historical note: the start of a stop list at Biological Abstracts. J Am Soc Inf Sci. 1999;50(12):1066.
Fox C. Lexical analysis and stoplists. In: Frakes WB, Baeza-Yates R, editors. Information retrieval: data structures and algorithms. Englewood Cliffs: Prentice-Hall; 1992. p. 102–30.
Google Web Search Help Center. Search basics: use of common words. Available at: http://www.google.com/support/bin/answer.py?answer=981.
Korfhage RR. Information storage and retrieval. New York: Wiley Computer Pub; 1997.
Luhn HP. The automatic creation of literature abstracts. IBM J Res Dev. 1958;2(2):157–65.
Luhn HP. Keyword-in-context index for technical literature. Am Doc. 1960;11(4):288–95.
Manning CD, Raghavan P, Schütze H. Introduction to information retrieval. Cambridge: Cambridge University Press; 2008.
Parkins PV. Approaches to vocabulary management in permuted-title indexing of Biological Abstracts. In: Proceedings of the 26th Annual Meeting on American Documentation Institute; 1963. p. 27–9.
Witten IH, Moffat A, Bell TC. Managing gigabytes: compressing and indexing documents and images. 2nd ed. San Francisco: Morgan Kaufmann; 1999.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Rasmussen, E. (2018). Stoplists. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_955
DOI: https://doi.org/10.1007/978-1-4614-8265-9_955
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering