Synonyms
Term frequency by inverse document frequency
Definition
A weighting function that combines the frequency of a term in a given document (TF) with the term's relative collection frequency, expressed as its inverse document frequency (IDF). The weight is calculated as follows [1]. Assuming that term j occurs in at least one document ($d_j \neq 0$), the inverse document frequency (IDF) is

\[ \mathrm{IDF}_j = \log \frac{N}{d_j} \]

The ratio $d_j/N$ is the fraction of documents in the collection that contain the term. The term frequency-inverse document frequency weight (TF*IDF) of term j in document i is defined by multiplying the term frequency by the inverse document frequency:

\[ W_{ij} = f_{ij} \times \log \frac{N}{d_j} \]

where
- $N$: number of documents in the collection
- $d_j$: number of documents containing term j
- $f_{ij}$: frequency of term j in document i
- $W_{ij}$: weight of term j in document i
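The weighting defined above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original entry: the function name `tf_idf`, the toy documents, and the use of raw term counts for fij are assumptions, as is the natural logarithm, since the base of the logarithm is not specified here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Each weight is f_ij * log(N / d_j), following the definition above.
    `docs` is a list of documents, each given as a list of term strings.
    """
    n_docs = len(docs)  # N: number of documents in the collection

    # d_j: number of documents containing term j.
    # set(doc) ensures a document is counted once per term.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        term_freq = Counter(doc)  # f_ij: raw count of term j in document i
        weights.append({
            # Natural log is an assumption; another base only rescales the weights.
            term: f * math.log(n_docs / doc_freq[term])
            for term, f in term_freq.items()
        })
    return weights


# Hypothetical toy collection of three tokenized documents.
docs = [
    ["data", "base", "data"],
    ["data", "index"],
    ["query", "index", "index"],
]
weights = tf_idf(docs)
```

Only terms that actually occur in the collection are weighted, so dj is never zero and the division is safe. A term that appears in every document receives a weight of 0, because log(N/N) = 0.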
The use of the logarithm in the...
Recommended Reading
Korfhage RR. Information storage and retrieval. New York: Wiley; 1997.
Manning CD, Raghavan P, Schütze H. Introduction to information retrieval. Cambridge, UK: Cambridge University Press; 2008.
Roelleke T. Information retrieval models: foundations & relationships. Morgan & Claypool Publishers; 2013.
Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Inf Process Manag. 1988;24(4):513–23.
Singhal A, Salton G, Mitra M, Buckley C. Document length normalization. Inf Process Manag. 1996;32(5):619–33.
Sparck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc. 1972;28(1):11–20.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Abu El-Khair, I. (2018). TF*IDF. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_956
DOI: https://doi.org/10.1007/978-1-4614-8265-9_956
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering