Synonyms
Definition
Text segmentation is a precursor to text retrieval, automatic summarization, information retrieval (IR), language modeling (LM), and natural language processing (NLP). In written texts, text segmentation is the process of identifying the boundaries between words, phrases, or other linguistically meaningful units, such as sentences or topics. The units separated by such processing can help humans read texts, but they are mainly used by computers as the fundamental units of automated processes such as NLP and IR.
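As a rough illustration of the definition above, the following Python sketch segments a short English text first into sentences and then into word-level tokens. It is a minimal example under stated assumptions, not a method taken from this entry: the function names `split_sentences` and `split_words` are hypothetical, and the rules (a sentence ends at `.`, `!` or `?` followed by whitespace; a word is a run of word characters) are deliberately naive.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation (assumed rule, for illustration only):
    split at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Naive word segmentation (assumed rule, for illustration only):
    each token is a run of word characters or one punctuation mark."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Text segmentation finds boundaries. It works on words, too!"
sentences = split_sentences(text)
tokens = [split_words(s) for s in sentences]

for sentence, words in zip(sentences, tokens):
    print(sentence, "->", words)
```

Rules this simple fail on abbreviations such as "Dr." and do not apply to writing systems that do not separate words with whitespace, which is one reason segmentation methods differ from language to language.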
Historical Background
Natural language processing (NLP) is an important research field, and one of its primary problems is how to segment text correctly. Various segmentation methods have emerged in the past decades for different kinds of languages and applications. Text segmentation is language dependent (each language has its own special problems, which are introduced later), corpus dependent, character-set...
Recommended Reading
Beeferman D., Berger A., and Lafferty J. Statistical models for text segmentation. Mach. Learn., 34(1–3):177–210, 1999.
Grefenstette G. and Tapanainen P. What is a word, what is a sentence? Problems of tokenization. In Proc. 3rd Conf. on Computational Lexicography and Text Research, 1994, pp. 7–10.
Mikheev A. Tagging sentence boundaries. In Proc. 1st Conf. on North American Chapter of the Association for Computational Linguistics, 2000, pp. 264–271.
Reynar J.C. and Marcus M.P. Topic segmentation: algorithms and applications. Ph.D. Thesis, University of Pennsylvania, Philadelphia, PA, 1998.
© 2009 Springer Science+Business Media, LLC
Cite this entry
Huang, H., Zhang, B. (2009). Text Segmentation. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_421
DOI: https://doi.org/10.1007/978-0-387-39940-9_421
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-35544-3
Online ISBN: 978-0-387-39940-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering