Fast Text Classification Using Sequential Sampling Processes

Lee, Michael D.

doi:10.1007/3-540-45656-2_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2256)

Included in the following conference series:

Australian Joint Conference on Artificial Intelligence

Abstract

A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require levels of computation that prevent them from making sufficiently fast decisions in some applied settings. Using insights gained from examining the way humans make fast decisions when classifying text documents, two new text classification algorithms are developed based on sequential sampling processes. These algorithms make extremely fast decisions, because they need to examine only a small number of words in each text document. Evaluation against the Reuters-21578 collection shows both techniques have levels of performance that approach benchmark methods, and the ability of one of the classifiers to produce realistic measures of confidence in its decisions is shown to be useful for prioritizing relevant documents.
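
The general idea of a sequential sampling decision can be illustrated with a minimal sketch. This is not the chapter's algorithm: it assumes a simple random-walk model in which each word contributes a fixed amount of evidence for or against a topic, and reading stops as soon as the running total crosses a threshold. The function name `sequential_classify`, the threshold value, and the per-word evidence values are all hypothetical and chosen only for illustration.

```python
def sequential_classify(words, log_odds, threshold=2.0):
    """Accumulate per-word evidence and stop once a decision bound is crossed.

    words     -- the document as a list of tokens, read in order
    log_odds  -- hypothetical evidence per word (positive favours the topic)
    threshold -- assumed decision bound, applied symmetrically
    Returns (decision, evidence, words_examined).
    """
    evidence = 0.0
    examined = 0
    for word in words:
        examined += 1
        # Words without an entry carry no evidence either way.
        evidence += log_odds.get(word, 0.0)
        if evidence >= threshold:
            return True, evidence, examined
        if evidence <= -threshold:
            return False, evidence, examined
    # No bound was reached: fall back on the sign of the accumulated evidence.
    return evidence > 0, evidence, examined


# Illustrative evidence values only; these are not estimated from any corpus.
log_odds = {"wheat": 1.5, "grain": 1.2, "bank": -1.0, "shares": -1.4}
doc = "wheat prices rose as grain exports climbed while bank shares fell".split()

decision, evidence, examined = sequential_classify(doc, log_odds)
print(decision, examined, len(doc))
```

In this toy run the decision is reached after five of the eleven words, which is the sense in which such a classifier needs to examine only part of a document. The size of the accumulated evidence at stopping time is one possible basis for a confidence measure, though how the chapter derives its confidence values is not stated in the abstract.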

This research was supported by the Australian Defence Science and Technology Organisation. The author wishes to thank Peter Bruza, Simon Dennis, Brandon Pincombe, Douglas Vickers, and Chris Woodruff. Correspondence should be addressed to: Michael D. Lee, Department of Psychology, University of Adelaide, SA 5005, AUSTRALIA.

Author information

Authors and Affiliations

Department of Psychology, University of Adelaide, Australia
Michael D. Lee

Editor information

Editors and Affiliations

School of Computer and Information Science, University of South Australia, Mawson Lakes, 5095, SA, Australia
Markus Stumptner & Dan Corbett
Department of Computer Science, University of Adelaide, 5001, Adelaide, SA, Australia
Mike Brooks

About this paper

Cite this paper

Lee, M.D. (2001). Fast Text Classification Using Sequential Sampling Processes. In: Stumptner, M., Corbett, D., Brooks, M. (eds) AI 2001: Advances in Artificial Intelligence. AI 2001. Lecture Notes in Computer Science, vol 2256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45656-2_27

DOI: https://doi.org/10.1007/3-540-45656-2_27
Published: 14 February 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42960-9
Online ISBN: 978-3-540-45656-8
eBook Packages: Springer Book Archive
