TF–IDF

Encyclopedia of Machine Learning and Data Mining

TF–IDF (term frequency–inverse document frequency) is a term weighting scheme commonly used to represent textual documents as vectors (for purposes of classification, clustering, visualization, retrieval, etc.). Let \(T =\{ t_{1},\ldots,t_{n}\}\) be the set of all terms occurring in the document corpus under consideration. Then a document \(d_{i}\) is represented by an n-dimensional real-valued vector \(\mathbf{x}_{i} = (x_{i1},\ldots,x_{in})\) with one component for each possible term from T.
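As a minimal sketch of this representation, the Python snippet below builds the term set T from a toy corpus and maps each document to a vector with one component per term. The whitespace tokenization, the sorted term order, the sample sentences, and the helper names `build_vocabulary` and `count_vector` are illustrative assumptions, not part of the entry; the components here are raw occurrence counts, before any weighting is applied.

```python
from collections import Counter


def build_vocabulary(corpus):
    """Return the distinct terms of the corpus as a sorted list (the set T)."""
    # Assumption: terms are obtained by splitting on whitespace.
    return sorted({term for doc in corpus for term in doc.split()})


def count_vector(doc, vocabulary):
    """Return an n-dimensional vector with one raw count per term in T."""
    counts = Counter(doc.split())
    # Counter returns 0 for terms that do not occur in this document.
    return [counts[term] for term in vocabulary]


# Hypothetical toy corpus, used only for illustration.
corpus = ["the cat sat", "the dog sat on the mat"]
vocabulary = build_vocabulary(corpus)
vectors = [count_vector(doc, vocabulary) for doc in corpus]

print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

Every document gets a vector of the same length n, so terms that do not occur in a document simply contribute a zero component.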

The weight \(x_{ij}\) corresponding to term \(t_{j}\) in document \(d_{i}\) is usually a product of three parts: one which depends on the presence or frequency of \(t_{j}\) in \(d_{i}\), one which depends on \(t_{j}\)’s presence in the corpus as a whole, and a normalization part which depends on \(d_{i}\). The most common TF–IDF weighting is defined by \(x_{ij} =\mathrm{TF}_{ij} \cdot \mathrm{IDF}_{j} \cdot (\sum _{k}(\mathrm{TF}_{ik}\mathrm{IDF}_{k})^{2})^{-1/2}\), where \(\mathrm{TF}_{ij}\) is the term frequency (i.e., number of occurrences) of \(t_{j}\) in \(d_{i}\), and \(\mathrm{IDF}_{j}\) is...
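The three-part product can be sketched in Python as follows. The preview cuts off before the entry defines the IDF factor, so this sketch assumes the widely used form log(N / DF), where N is the number of documents and DF is the number of documents containing the term; that choice, the whitespace tokenization, the toy corpus, and the function name `tfidf_vectors` are assumptions made for illustration and may differ from the entry's own definition.

```python
import math
from collections import Counter


def tfidf_vectors(corpus):
    """Return (vocabulary, vectors) with length-normalized TF-IDF weights."""
    docs = [doc.split() for doc in corpus]  # assumption: whitespace tokens
    vocabulary = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)

    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))

    # ASSUMPTION: IDF = log(N / DF). The source text is truncated before
    # its IDF definition, so this common form is used as a stand-in.
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}

    vectors = []
    for doc in docs:
        tf = Counter(doc)  # term frequency: occurrences in this document
        raw = [tf[term] * idf[term] for term in vocabulary]
        # Normalization part: inverse Euclidean length of the raw vector.
        norm = math.sqrt(sum(w * w for w in raw))
        vectors.append([w / norm if norm > 0 else 0.0 for w in raw])
    return vocabulary, vectors


# Hypothetical toy corpus, used only for illustration.
corpus = ["the cat sat", "the dog sat on the mat", "the cat ran"]
vocabulary, vectors = tfidf_vectors(corpus)
for doc, vec in zip(corpus, vectors):
    print(doc, [round(w, 3) for w in vec])
```

Under the assumed IDF, a term that occurs in every document (here "the") receives weight zero, and each nonzero document vector has unit Euclidean length, which is what the normalization factor in the formula is for.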

Copyright information

© 2017 Springer Science+Business Media New York

Cite this entry

(2017). TF–IDF. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_832
