DOI: 10.1145/564376.564413

Probabilistic combination of text classifiers using reliability indicators: models and results

Published: 11 August 2002

ABSTRACT

The intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build a better metaclassifier via some combination of classifiers. We introduce a probabilistic method for combining classifiers that considers the context-sensitive reliabilities of contributing classifiers. The method harnesses reliability indicators: variables that provide a valuable signal about the performance of classifiers in different situations. We provide background, present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs, and review a set of comparative studies undertaken to evaluate the methodology.
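
The idea of weighting each classifier by its reliability in the current context can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not specify. It assumes a single discrete reliability indicator (a hypothetical document-length bucket, "short" or "long"), two hypothetical base classifiers named "nb" and "svm" that emit binary labels, and a naive combination rule that treats the classifiers as conditionally independent with a uniform class prior. The function names `fit_reliability` and `combine` are likewise illustrative.

```python
import math
from collections import defaultdict


def fit_reliability(records, smoothing=1.0):
    """Estimate P(classifier is correct | indicator bucket) from validation data.

    Each record is (bucket, predictions, true_label), where predictions maps
    a classifier name to its predicted label (0 or 1). Laplace smoothing
    keeps the estimates strictly between 0 and 1.
    """
    correct = defaultdict(float)
    total = defaultdict(float)
    for bucket, predictions, truth in records:
        for name, pred in predictions.items():
            total[(bucket, name)] += 1
            correct[(bucket, name)] += (pred == truth)
    return {
        key: (correct[key] + smoothing) / (total[key] + 2 * smoothing)
        for key in total
    }


def combine(reliability, bucket, predictions):
    """Return P(label = 1) by summing reliability-weighted log-odds votes.

    Assumes (for illustration only) that classifiers err independently and
    that both classes are equally likely a priori. Unseen (bucket, name)
    pairs get reliability 0.5, which contributes a zero vote.
    """
    log_odds = 0.0
    for name, pred in predictions.items():
        r = reliability.get((bucket, name), 0.5)
        weight = math.log(r / (1 - r))
        log_odds += weight if pred == 1 else -weight
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical validation data: "nb" is reliable on short documents,
# "svm" is reliable on long ones.
validation = [
    ("short", {"nb": 1, "svm": 0}, 1),
    ("short", {"nb": 0, "svm": 1}, 0),
    ("short", {"nb": 1, "svm": 1}, 1),
    ("short", {"nb": 0, "svm": 1}, 0),
    ("long", {"nb": 0, "svm": 1}, 1),
    ("long", {"nb": 1, "svm": 0}, 0),
    ("long", {"nb": 1, "svm": 1}, 1),
    ("long", {"nb": 1, "svm": 0}, 0),
]
reliability = fit_reliability(validation)

# The same pair of base predictions yields opposite conclusions
# depending on the indicator value.
disagreement = {"nb": 1, "svm": 0}
print(round(combine(reliability, "short", disagreement), 3))  # 0.909
print(round(combine(reliability, "long", disagreement), 3))   # 0.091
```

In this toy setup the indicator decides which classifier to trust when the two disagree, which is the context-sensitive behavior the abstract describes. A learned metaclassifier over classifier outputs and indicator variables would replace the fixed independence rule used here.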

Published in

SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
August 2002
478 pages
ISBN: 1581135610
DOI: 10.1145/564376

Copyright © 2002 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

SIGIR '02 paper acceptance rate: 44 of 219 submissions, 20%. Overall acceptance rate: 792 of 3,983 submissions, 20%.
