Fast Text Classification Using Sequential Sampling Processes

  • Conference paper
AI 2001: Advances in Artificial Intelligence (AI 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2256)

Abstract

A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require levels of computation that prevent them from making sufficiently fast decisions in some applied settings. Using insights gained from examining the way humans make fast decisions when classifying text documents, two new text classification algorithms are developed based on sequential sampling processes. These algorithms make extremely fast decisions, because they need to examine only a small number of words in each text document. Evaluation against the Reuters-21578 collection shows that both techniques have levels of performance that approach those of benchmark methods, and the ability of one of the classifiers to produce realistic measures of confidence in its decisions is shown to be useful for prioritizing relevant documents.

This research was supported by the Australian Defence Science and Technology Organisation. The author wishes to thank Peter Bruza, Simon Dennis, Brandon Pincombe, Douglas Vickers, and Chris Woodruff. Correspondence should be addressed to: Michael D. Lee, Department of Psychology, University of Adelaide, SA 5005, AUSTRALIA.
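
The abstract's central idea, deciding after examining only a few words, can be illustrated with a generic random-walk form of sequential sampling. The sketch below is not the paper's algorithm; it is a minimal illustration under stated assumptions: each word contributes a smoothed log-odds value estimated from document frequencies, evidence is summed word by word, and reading stops once the total crosses a symmetric threshold. The function names `word_log_odds` and `random_walk_classify`, the add-one smoothing, and the default threshold of 3.0 are all hypothetical choices made for this example.

```python
import math


def word_log_odds(relevant_docs, other_docs):
    """Estimate per-word evidence as a smoothed log-odds ratio.

    Each document is a list of words. Add-one smoothing (an assumption of
    this sketch) keeps the ratio finite for words seen in only one class.
    """
    vocab = {w for doc in relevant_docs + other_docs for w in doc}
    table = {}
    for w in vocab:
        p_rel = (sum(w in doc for doc in relevant_docs) + 1) / (len(relevant_docs) + 2)
        p_oth = (sum(w in doc for doc in other_docs) + 1) / (len(other_docs) + 2)
        table[w] = math.log(p_rel / p_oth)
    return table


def random_walk_classify(words, log_odds, threshold=3.0):
    """Sum word evidence in reading order and stop at the first threshold crossing.

    Returns (decision, words_examined, evidence). The decision is True for
    "relevant", False for "not relevant", and None when the words run out
    before either threshold is reached.
    """
    evidence = 0.0
    examined = 0
    for word in words:
        examined += 1
        evidence += log_odds.get(word, 0.0)  # unknown words carry no evidence
        if evidence >= threshold:
            return True, examined, evidence
        if evidence <= -threshold:
            return False, examined, evidence
    return None, examined, evidence


if __name__ == "__main__":
    # Hand-made evidence values, purely illustrative.
    toy_evidence = {"wheat": 2.0, "grain": 1.5, "bank": -2.0}
    print(random_walk_classify(["wheat", "grain", "the", "bank"], toy_evidence))
```

In this sketch a larger threshold trades speed for caution: more words are read before a decision is made. The size of the final evidence total could serve as a rough confidence signal, although how the paper derives its confidence measures is not stated in the abstract.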

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, M.D. (2001). Fast Text Classification Using Sequential Sampling Processes. In: Stumptner, M., Corbett, D., Brooks, M. (eds) AI 2001: Advances in Artificial Intelligence. AI 2001. Lecture Notes in Computer Science, vol 2256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45656-2_27

  • DOI: https://doi.org/10.1007/3-540-45656-2_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42960-9

  • Online ISBN: 978-3-540-45656-8

  • eBook Packages: Springer Book Archive
