Term Statistics for Structured Text Retrieval

Kamps, Jaap; Lalmas, Mounia

doi:10.1007/978-1-4614-8265-9_412

Reference work entry
First Online: 01 January 2018

Synonyms

Inverse element frequency; Within-element term frequency

Definition

Classical ranking algorithms in information retrieval rely on term statistics, the most common (and basic) being within-document term frequency, tf, and document frequency, df. tf is the number of occurrences of a term in a document and reflects how well the term captures the topic of that document, whereas df is the number of documents in which a term appears and reflects how well the term discriminates between relevant and non-relevant documents. df is commonly applied in the form of inverse document frequency, idf, since the importance of a term is inversely related to the number of documents that contain it. Both tf and idf are obtained at indexing time. Ranking algorithms for structured text retrieval, and more precisely XML retrieval, require similar term statistics, but with respect to elements.
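As a minimal sketch of the document-level statistics defined above, the following Python computes tf and df at "indexing time" for a toy collection. The whitespace tokenizer, the function names, the sample documents, and the choice of log(N/df) as the idf variant are all assumptions made for illustration; the entry does not prescribe a particular formula.

```python
import math
from collections import Counter


def term_statistics(docs):
    """Return per-document tf counters and collection-wide df counts."""
    # tf: number of occurrences of each term within each document
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    # df: number of documents in which each term appears
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # a document counts once per term
    return tf, df


def idf(term, df, n_docs):
    """log(N / df), one common idf variant (an assumption here)."""
    return math.log(n_docs / df[term]) if df[term] else 0.0


# Hypothetical toy collection
docs = {
    "d1": "xml retrieval ranks xml elements",
    "d2": "document retrieval ranks documents",
    "d3": "term statistics for retrieval",
}
tf, df = term_statistics(docs)
print(tf["d1"]["xml"])                   # 2
print(df["retrieval"])                   # 3
print(idf("retrieval", df, len(docs)))   # 0.0
```

A term that occurs in every document, such as "retrieval" here, receives an idf of zero under this variant, which matches the intuition that it does not discriminate between documents.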

Key Points

To calculate term statistics for elements, one could simply replace documents by elements and calculate so-called...
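The synonyms listed for this entry are within-element term frequency and inverse element frequency. The sketch below is one literal reading of "replace documents by elements", not the method described in the full entry: it treats every XML element as a retrieval unit whose text is everything nested inside it. The names `element_statistics` and `ief`, the log(N/ef) formula, and the sample document are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def element_statistics(xml_text):
    """Return within-element term frequencies and element frequencies."""
    root = ET.fromstring(xml_text)
    etf = {}        # (position, tag) -> Counter of term occurrences
    ef = Counter()  # term -> number of elements containing it
    for i, elem in enumerate(root.iter()):
        # An element's text includes the text of all its descendants
        tokens = " ".join(elem.itertext()).lower().split()
        counts = Counter(tokens)
        etf[(i, elem.tag)] = counts
        ef.update(counts.keys())
    return etf, ef


def ief(term, ef, n_elements):
    """log(N / ef), mirroring a common idf variant (an assumption)."""
    return math.log(n_elements / ef[term]) if ef[term] else 0.0


# Hypothetical sample document
xml_text = (
    "<article><title>XML retrieval</title>"
    "<sec><p>XML elements nest</p><p>retrieval of elements</p></sec>"
    "</article>"
)
etf, ef = element_statistics(xml_text)
print(etf[(0, "article")]["xml"])  # 2
print(ef["xml"])                   # 4
print(ief("xml", ef, len(etf)))
```

Because elements nest, a term inside a paragraph in this sketch is also counted for the enclosing section and article, so "xml" appears in four of the five elements although it occurs only twice in the text.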

Author information

Authors and Affiliations

Jaap Kamps, University of Amsterdam, Amsterdam, The Netherlands
Mounia Lalmas, Yahoo! Inc., London, UK

Corresponding author

Correspondence to Jaap Kamps.

Editor information

Editors and Affiliations

Ling Liu, Georgia Institute of Technology College of Computing, Atlanta, GA, USA
M. Tamer Özsu, University of Waterloo School of Computer Science, Waterloo, ON, Canada

Section Editor information

Jaap Kamps, University of Amsterdam, Amsterdam, The Netherlands

About this entry

Cite this entry

Kamps, J., Lalmas, M. (2018). Term Statistics for Structured Text Retrieval. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_412

DOI: https://doi.org/10.1007/978-1-4614-8265-9_412
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
