Definition
Term weighting is the assignment of numerical values to terms to represent their importance in a document, with the aim of improving retrieval effectiveness [9]. It takes place during text indexing, where the value of each term to the document is assessed. Because not all terms in a given document collection are equally important, accounting for the relative importance of individual words can improve the effectiveness of an information retrieval system. Weighting is what enables the retrieval system to determine the importance of a given term in a particular document or query. It is a crucial component of any information retrieval system and has shown great potential for improving retrieval effectiveness [8].
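To make the definition concrete, the following is a minimal sketch of one classic weighting scheme, term frequency multiplied by inverse document frequency (TF-IDF). The scheme is chosen here purely for illustration and is not prescribed by the definition above. The function name `tf_idf_weights`, the whitespace tokenizer, the unsmoothed `log(N / df)` form, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Return one {term: weight} dict per document.

    Assumed scheme: weight = raw term frequency * log(N / document frequency).
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy collection, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
weights = tf_idf_weights(docs)
```

In this toy collection, a term that occurs in every document ("the") receives a weight of zero, while a term confined to a single document ("mat") receives the highest weight in that document, which reflects the point that not all terms are of equal importance.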
Historical Background
The use of word frequency dates back to G. K. Zipf and his well-known law [16] for word distribution. The law...
Recommended Reading
1. Hiemstra D, de Vries A. Relating the new language models of information retrieval to the traditional retrieval models (No. TR-CTIT-00-09). Amsterdam: Centre for Telematics and Information Technology (CTIT), University of Twente; 2000.
2. Korfhage RR. Information storage and retrieval. New York: Wiley; 1997.
3. Lancaster FW. Indexing and abstracting in theory and practice. 2nd ed. Champaign: University of Illinois, Graduate School of Library and Information Science; 1998.
4. Luhn HP. The automatic creation of literature abstracts. IBM J Res Dev. 1958;2(2):159–65.
5. Ponte JM, Croft WB. A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1998. p. 275–81.
6. Robertson SE, Sparck Jones K. Relevance weighting of search terms. J Am Soc Inf Sci. 1976;27(3):129–46.
7. Roelleke T. Information retrieval models: foundations & relationships. San Rafael: Morgan & Claypool Publishers; 2013.
8. Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Inf Process Manag. 1988;24(4):513–23.
9. Salton G, McGill M. Introduction to modern information retrieval. New York: McGraw-Hill Book Company; 1983.
10. Salton G, Yang CS, Yu CT. A theory of term importance in automatic text analysis. J Am Soc Inf Sci. 1975;26(1):33–44.
11. Singhal A. Modern information retrieval: a brief overview. Bull IEEE Comput Soc Tech Comm Data Eng. 2001;24(4):35–43.
12. Singhal A, Salton G, Mitra M, Buckley C. Document length normalization. Inf Process Manag. 1996;32(5):619–33.
13. Sparck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc. 1972;28(1):11–20.
14. Sparck Jones K, Walker S, Robertson SE. A probabilistic model of information retrieval: development and comparative experiments: part I. Inf Process Manag. 2000;36(6):779–808.
15. Zhai CX. Statistical language models for information retrieval. Synth Lect Hum Lang Technol. 2008;1(1):1–141.
16. Zipf GK. Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley; 1949.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Abu El-Khair, I. (2018). Term Weighting. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_943
DOI: https://doi.org/10.1007/978-1-4614-8265-9_943
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering