Text Indexing Techniques

  • Reference work entry
Encyclopedia of Database Systems

Definition

Text indexing is the process of analyzing a text to extract statistics considered important for representing the information it contains and/or to allow fast search over its content. Text indexing can be applied not only to natural language texts but to virtually any type of textual information, such as the source code of computer programs, DNA or protein databases, and textual data stored in traditional database systems.
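As an illustration of the definition, the following is a minimal sketch, not a technique prescribed by this entry, of one common way to index text: an inverted index that records per-document term frequencies (the extracted statistics) and answers conjunctive keyword queries without rescanning the texts. The function names `build_index` and `search`, the tokenization rule, and the sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Assumed tokenization: lowercase runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    # Look up each term's posting set, then intersect them.
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))


# Illustrative documents only; any textual data could be indexed this way.
docs = [
    "Text indexing allows fast search",
    "DNA databases store textual data",
    "Fast search over source code text",
]
index = build_index(docs)
print(search(index, "fast text"))
```

The index is built once, so each query costs a few dictionary lookups and a set intersection rather than a scan of every document. Real systems typically add compression, ranking, and more careful tokenization.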

Historical Background

Efforts to index electronic texts have appeared in the literature since the beginning of computational systems. For example, descriptions of electronic information search systems able to index and search text date back to the early 1950s [3].

In 1968, Gerard Salton published a seminal book laying out the basis of modern information retrieval systems [5], including a description of a model for indexing texts that is still widely adopted today, known as the vector space model. Other successful models for indexing...

Recommended Reading

  1. Baeza-Yates R, Navarro G. Block-addressing indices for approximate text retrieval. J Am Soc Inf Sci. 2000;51(1):69–82.

  2. Baeza-Yates R, Ribeiro-Neto B. Modern information retrieval. 2nd ed. Reading: Addison Wesley; 2011.

  3. Luhn HP. A statistical approach to mechanized encoding and searching of literary information. IBM J Res Dev. 1957;1(4):309–17.

  4. Manber U, Wu S. Glimpse: a tool to search through entire file systems. In: Proceedings of the USENIX Winter 1994 Technical Conference; 1994. p. 23–32.

  5. Salton G. Automatic information organization and retrieval. New York: McGraw-Hill; 1968.

  6. Salton G, Wong A, Yang CS. A vector space model for automatic indexing. Commun ACM. 1975;18(11):613–20.

  7. Witten I, Moffat A, Bell T. Managing gigabytes. 2nd ed. Los Altos: Morgan Kaufmann; 1999.

  8. Zobel J, Moffat A, Ramamohanarao K. Inverted files versus signature files for text indexing. ACM Trans Database Syst. 1998;23(4):453–90.


Author information

Correspondence to Edleno Silva De Moura.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Silva De Moura, E. (2018). Text Indexing Techniques. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1135