Abstract
A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require levels of computation that prevent them from making sufficiently fast decisions in some applied settings. Using insights gained from examining the way humans make fast decisions when classifying text documents, two new text classification algorithms are developed based on sequential sampling processes. These algorithms make extremely fast decisions because they need to examine only a small number of words in each text document. Evaluation against the Reuters-21578 collection shows that both techniques have levels of performance that approach benchmark methods, and the ability of one of the classifiers to produce realistic measures of confidence in its decisions is shown to be useful for prioritizing relevant documents.
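The general idea of a sequential sampling classifier can be illustrated with a minimal sketch. It is not the chapter's algorithm, whose details are not given in the abstract. It assumes a simple random-walk scheme: each word contributes a smoothed log-likelihood ratio as evidence, and reading stops as soon as the running total crosses a threshold. The function names, the smoothing scheme, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train_word_evidence(docs, labels, smoothing=1.0):
    """Estimate per-word evidence as a smoothed log-likelihood ratio.

    docs   : list of token lists
    labels : list of bools (True = document belongs to the category)
    Evidence is based on document frequency; this is an assumption of
    the sketch, not something stated in the abstract.
    """
    n_in = sum(1 for y in labels if y)
    n_out = len(labels) - n_in
    in_counts, out_counts = Counter(), Counter()
    for words, y in zip(docs, labels):
        # Count each word at most once per document.
        (in_counts if y else out_counts).update(set(words))
    vocab = set(in_counts) | set(out_counts)
    return {
        w: math.log((in_counts[w] + smoothing) / (n_in + 2 * smoothing))
        - math.log((out_counts[w] + smoothing) / (n_out + 2 * smoothing))
        for w in vocab
    }


def classify_sequential(words, evidence, threshold=2.0):
    """Read words in order, accumulating evidence until a bound is hit.

    Returns (decision, words_read, total_evidence). Unknown words add
    no evidence. If neither bound is reached, the sign of the final
    total decides (a total of zero counts as "not in category").
    """
    total = 0.0
    for n_read, word in enumerate(words, start=1):
        total += evidence.get(word, 0.0)
        if abs(total) >= threshold:
            return total > 0, n_read, total
    return total > 0, len(words), total
```

Under these assumptions, the number of words read before stopping is the source of the speed-up, and the magnitude of the accumulated total could serve as a rough confidence signal. A larger threshold trades speed for accuracy.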
This research was supported by the Australian Defence Science and Technology Organisation. The author wishes to thank Peter Bruza, Simon Dennis, Brandon Pincombe, Douglas Vickers, and Chris Woodruff. Correspondence should be addressed to: Michael D. Lee, Department of Psychology, University of Adelaide, SA 5005, AUSTRALIA.
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, M.D. (2001). Fast Text Classification Using Sequential Sampling Processes. In: Stumptner, M., Corbett, D., Brooks, M. (eds) AI 2001: Advances in Artificial Intelligence. AI 2001. Lecture Notes in Computer Science, vol 2256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45656-2_27
DOI: https://doi.org/10.1007/3-540-45656-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42960-9
Online ISBN: 978-3-540-45656-8
eBook Packages: Springer Book Archive