
Scaling Up a Boosting-Based Learner via Adaptive Sampling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1805)

Abstract

In this paper we present an experimental evaluation of a boosting-based learning system and show that it can be run efficiently over a large dataset. The system uses decision stumps as its base learner: single-attribute decision trees with only two terminal nodes. To select the best decision stump at each iteration we use an adaptive sampling method. As the boosting algorithm, we use a modification of AdaBoost that is suitable for combination with a base learner that does not use the whole dataset. We provide experimental evidence that our method is as accurate as the equivalent algorithm that uses the whole dataset, but much faster.
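The following is a minimal, hypothetical sketch in Python of the kind of procedure the abstract describes: an AdaBoost-style loop whose base learner is a decision stump, with the stump at each round selected on a sample that is grown adaptively rather than on the full dataset. The helper names (`fit_stump`, `adaptive_best_stump`), the doubling schedule, and the stopping tolerance are illustrative assumptions only; the paper's actual method combines the authors' adaptive sampling procedure with their MadaBoost variant of AdaBoost.

```python
# Illustrative sketch only (not the authors' MadaBoost / adaptive-sampling code):
# boosting with decision stumps, where each stump is selected on a sample that
# is enlarged until its estimated advantage over random guessing looks reliable.
import numpy as np

def fit_stump(X, y, w):
    """Return (feature, threshold, polarity, error): the stump with the
    smallest weighted error on (X, y), with labels y in {-1, +1}."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum() / w.sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def stump_predict(stump, X):
    j, thr, pol, _ = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaptive_best_stump(X, y, w, start=100, grow=2.0, tol=0.05, rng=None):
    """Hypothetical adaptive sampling: draw a weighted sample and double its
    size until the sampled stump's advantage (1/2 - error) exceeds a
    tolerance, or the whole dataset has been used."""
    rng = rng or np.random.default_rng(0)
    m = start
    while True:
        idx = rng.choice(len(y), size=min(m, len(y)), p=w / w.sum())
        stump = fit_stump(X[idx], y[idx], np.ones(len(idx)))
        err = np.mean(stump_predict(stump, X[idx]) != y[idx])
        if 0.5 - err > tol or m >= len(y):
            return stump
        m = int(m * grow)

def boost(X, y, rounds=10):
    """AdaBoost-style loop; the paper instead uses MadaBoost's weighting."""
    w = np.ones(len(y)) / len(y)
    ensemble = []
    for _ in range(rounds):
        stump = adaptive_best_stump(X, y, w)
        pred = stump_predict(stump, X)
        err = np.clip(w[pred != y].sum(), 1e-10, 0.5 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)   # up-weight misclassified examples
        w = w / w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    return np.sign(sum(a * stump_predict(s, X) for a, s in ensemble))
```

Under these assumptions, the per-round cost depends on the sample size actually drawn rather than on the full dataset, which is the source of the speed-up the abstract claims.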

Thanks to the European Commission for their generous support via an EU S&T fellowship programme.

Supported in part by the Ministry of Education, Science, Sports and Culture of Japan, Grant-in-Aid for Scientific Research on Priority Areas (Discovery Science).

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Domingo, C., Watanabe, O. (2000). Scaling Up a Boosting-Based Learner via Adaptive Sampling. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_37

  • DOI: https://doi.org/10.1007/3-540-45571-X_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67382-8

  • Online ISBN: 978-3-540-45571-4

  • eBook Packages: Springer Book Archive
