Definition
Text indexing is the act of processing a text in order to extract statistics considered important for representing the information it contains and/or to allow fast search on its content. Text indexing operations can be performed not only on natural language texts, but on virtually any type of textual information, such as the source code of computer programs, DNA or protein databases, and textual data stored in traditional database systems.
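As an illustration of both halves of this definition, the following is a minimal sketch of an inverted index, one common indexing structure. It is not taken from this entry: the function names (`tokenize`, `build_index`, `search`), the tokenization rule, and the sample documents are all assumptions chosen for the example. The index records a simple statistic (term frequency per document) and supports fast conjunctive search.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simplistic rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: frequency of the term in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index

def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Look up each term's posting list, then intersect them.
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))

# Hypothetical sample collection, for illustration only.
docs = {
    1: "Text indexing allows fast search on text",
    2: "DNA databases store textual data",
    3: "Fast search over source code",
}
index = build_index(docs)
print(search(index, "fast search"))  # [1, 3]
print(index["text"])                 # {1: 2}
```

The search touches only the posting lists of the query terms rather than scanning every document, which is the sense in which an index allows fast search.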
Historical Background
Efforts to index electronic texts appear in the literature from the earliest days of computing systems. For example, descriptions of electronic information search systems able to index and search text can be found as early as the 1950s [3].
In 1968, Gerard Salton published a seminal book that laid the basis for modern information retrieval systems [5], including a description of a model for indexing texts that remains widely adopted today, known as the Vector Space Model. Other successful models for indexing...
Recommended Reading
[1] Baeza-Yates R. and Navarro G. Block-addressing indices for approximate text retrieval. J. American Soc. for Inf. Sci., 51(1):69–82, 2000.
[2] Baeza-Yates R. and Ribeiro-Neto B. Modern Information Retrieval. Addison Wesley, Reading, MA, 1999.
[3] Luhn H.P. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309–317, October 1957.
[4] Manber U. and Wu S. Glimpse: A tool to search through entire file systems. In Proc. USENIX Winter 1994 Technical Conf., 1994, pp. 23–32.
[5] Salton G. Automatic Information Organization and Retrieval. McGraw-Hill, New York, NY, 1968.
[6] Salton G., Wong A., and Yang C.S. A vector space model for automatic indexing. Commun. ACM, 18(11):613–620, November 1975.
[7] Witten I., Moffat A., and Bell T. Managing Gigabytes, 2nd edn. Morgan Kaufmann, Los Altos, CA, 1999.
[8] Zobel J., Moffat A., and Ramamohanarao K. Inverted files versus signature files for text indexing. ACM Trans. Database Syst., 23(4):453–490, December 1998.
© 2009 Springer Science+Business Media, LLC
De Moura, E. (2009). Text Indexing Techniques. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_1135
DOI: https://doi.org/10.1007/978-0-387-39940-9_1135
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-35544-3
Online ISBN: 978-0-387-39940-9