Synonyms
Text classification
Definition
Text classification is the task of automatically assigning textual documents (such as plain-text documents and Web pages) to predefined categories based on their content. Formally, text classification works on an instance space X, where each instance is a document d, and a fixed set of classes C = {C1, C2, …, C|C|}, where |C| is the number of classes. Given a training set Dl of labeled documents 〈d, Ci〉, where 〈d, Ci〉 ∈ X × C, the goal of text classification is to use a learning method (or learning algorithm) to learn a classifier, or classification function, γ that maps instances to classes: γ : X → C [7].
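To make the formal definition concrete, the following is a minimal sketch of learning a classification function γ from a training set of 〈d, Ci〉 pairs. It assumes a multinomial naive Bayes learner with add-one smoothing, which is only one of many possible learning methods and is not prescribed by the definition above. The names `tokenize`, `learn_classifier`, `TRAINING_SET`, and the two example classes are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(document):
    """Split a document into lowercase whitespace-delimited terms (an illustrative choice)."""
    return document.lower().split()


def learn_classifier(training_set):
    """Learn gamma: X -> C from (document, class) pairs.

    Assumption: multinomial naive Bayes with add-one (Laplace) smoothing.
    """
    class_doc_counts = Counter()        # number of training documents per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    vocabulary = set()

    for document, label in training_set:
        class_doc_counts[label] += 1
        for term in tokenize(document):
            term_counts[label][term] += 1
            vocabulary.add(term)

    total_docs = sum(class_doc_counts.values())

    def gamma(document):
        """Map a document d in X to the highest-scoring class in C."""
        best_label, best_score = None, -math.inf
        for label in sorted(class_doc_counts):
            # Log prior of the class.
            score = math.log(class_doc_counts[label] / total_docs)
            total_terms = sum(term_counts[label].values())
            for term in tokenize(document):
                if term not in vocabulary:
                    continue  # terms never seen in training carry no evidence
                # Smoothed log likelihood of the term given the class.
                score += math.log(
                    (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return gamma


# Hypothetical training set Dl of (d, Ci) pairs with C = {"finance", "sports"}.
TRAINING_SET = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the championship game", "sports"),
    ("the striker scored a late goal", "sports"),
]

gamma = learn_classifier(TRAINING_SET)
print(gamma("interest rates and markets"))
print(gamma("a late goal won the game"))
```

Here `learn_classifier` plays the role of the learning method and the returned `gamma` is the classifier γ : X → C. Any other learner, such as a support vector machine or a k-nearest-neighbor classifier, could be substituted without changing this interface.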
Historical Background
Text classification, the assignment of documents to predefined categories, provides an effective way to organize documents. It dates back to the early 1960s, but only in the early 1990s did it become a major subfield of the information systems discipline. Recently, with the explosive growth of...
Recommended Reading
Dumais S, Platt J, Heckerman D, Sahami M. Inductive learning algorithms and representations for text categorization. In: Proceedings of the 7th International Conference on Information and Knowledge Management; 1998. p. 148–55.
Glover EJ, Tsioutsiouliklis K, Lawrence S, Pennock DM, Flake GW. Using web structure for classifying and describing web pages. In: Proceedings of the 11th International World Wide Web Conference; 2002. p. 562–9.
Joachims T. Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the 10th European Conference on Machine Learning; 1998. p. 137–42.
Kehagias A, Petridis V, Kaburlasos VG, Fragkou P. A comparison of word- and sense-based text categorization using several classification algorithms. J Intell Inf Syst. 2003;21(3):227–47.
Kolcz A, Prabakarmurthi V, Kalita JK. String match and text extraction: summarization as feature selection for text categorization. In: Proceedings of the 10th International Conference on Information and Knowledge Management; 2001. p. 365–70.
Lewis DD. Representation quality in text classification: an introduction and experiment. In: Proceedings of the Workshop on Speech and Natural Language; 1990. p. 288–95.
Manning CD, Raghavan P, Schütze H. Introduction to information retrieval. Cambridge University Press; 2007.
McCallum A, Nigam K. A comparison of event models for naive Bayes text classification. In: Proceedings of the AAAI-98 Workshop on Learning for Text Categorization; 1998.
Peng F, Schuurmans D, Wang S. Augmenting naive Bayes classifiers with statistical language models. Inf Retr. 2004;7(3–4):317–45.
van Rijsbergen CJ. Information retrieval. 2nd ed. London: Butterworths; 1979.
Sebastiani F. Machine learning in automated text categorization. ACM Comput Surv. 2002;34(1):1–47.
Shen D, Chen Z, Yang Q, Zeng HJ, Zhang B, Lu Y, Ma WY. Web-page classification through summarization. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2004. p. 242–9.
Shen D, Sun JT, Yang Q, Chen Z. A comparison of implicit and explicit links for web page classification. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 643–50.
Yang Y. An evaluation of statistical approaches to text categorization. Inf Retr. 1999;1(1–2):69–90.
Yang Y, Pedersen JO. A comparative study on feature selection in text categorization. In: Proceedings of the 14th International Conference on Machine Learning; 1997. p. 412–20.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Shen, D. (2018). Text Categorization. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_414
DOI: https://doi.org/10.1007/978-1-4614-8265-9_414
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering