Boosting Algorithms for Parallel and Distributed Learning

Abstract

The growing amount of available information and its distributed and heterogeneous nature have a major impact on the field of data mining. In this paper, we propose a framework of parallel and distributed boosting algorithms intended for the efficient integration of specialized classifiers learned over very large, distributed, and possibly heterogeneous databases that cannot fit into main computer memory. Boosting is a popular technique for constructing highly accurate classifier ensembles, where the classifiers are trained serially, with the weights on the training instances adaptively set according to the performance of previous classifiers. Our parallel boosting algorithm is designed for tightly coupled shared-memory systems with a small number of processors, with the objective of achieving maximal prediction accuracy in fewer iterations than boosting on a single processor. After all processors learn classifiers in parallel at each boosting round, the classifiers are combined according to the confidence of their predictions. Our distributed boosting algorithm is proposed primarily for learning from several disjoint data sites when the data cannot be merged together, although it can also be used for parallel learning where a massive data set is partitioned into several disjoint subsets for more efficient analysis. At each boosting round, the proposed method combines classifiers from all sites and creates a classifier ensemble on each site. The final classifier is constructed as an ensemble of all classifier ensembles built on the disjoint data sets. Experiments on several data sets show that parallel boosting can achieve the same or even better prediction accuracy considerably faster than standard sequential boosting. The results also indicate that distributed boosting achieves comparable or slightly better classification accuracy than standard boosting, while requiring much less memory and computational time since it works with smaller data sets.
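To make the round structure concrete, the sketch below illustrates the general idea in Python under several simplifying assumptions: binary labels in {-1, +1} stored as NumPy arrays, decision stumps from scikit-learn as the weak learners, and AdaBoost-style reweighting against the combined per-round ensemble. The function names distributed_boosting and predict are illustrative, and this is not the authors' exact algorithm; in the parallel shared-memory setting the inner per-site loop would simply run concurrently over partitions of one data set.

```python
# Minimal sketch (not the published algorithm): an AdaBoost-style distributed
# boosting round over disjoint data sites. Each site trains a weak learner
# locally, the per-round learners from all sites are merged into a
# weighted-vote ensemble, and each site reweights its own instances against
# that ensemble. No raw data ever leaves a site.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def distributed_boosting(sites, rounds=10):
    """sites: list of (X, y) NumPy pairs, one per disjoint site; labels in {-1, +1}."""
    weights = [np.full(len(y), 1.0 / len(y)) for _, y in sites]
    ensemble = []  # flat list of (alpha, classifier) accumulated over all rounds
    for _ in range(rounds):
        round_members = []
        for (X, y), w in zip(sites, weights):
            # Train a weak learner locally on the weighted site data.
            clf = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            err = np.clip(np.average(clf.predict(X) != y, weights=w), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)  # confidence of this site's learner
            round_members.append((alpha, clf))
        ensemble.extend(round_members)
        # Every site updates its instance weights against the combined round
        # ensemble, so examples misclassified by the joint vote gain weight.
        for i, ((X, y), w) in enumerate(zip(sites, weights)):
            vote = np.sign(sum(a * c.predict(X) for a, c in round_members))
            w = w * np.exp(-y * vote)
            weights[i] = w / w.sum()
    return ensemble


def predict(ensemble, X):
    """Confidence-weighted vote of all classifiers from all rounds and sites."""
    return np.sign(sum(alpha * clf.predict(X) for alpha, clf in ensemble))


# Tiny usage example: synthetic data split across two "sites".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
model = distributed_boosting([(X[:100], y[:100]), (X[100:], y[100:])], rounds=5)
print("training accuracy:", (predict(model, X) == y).mean())
```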


Cite this article

Lazarevic, A., Obradovic, Z. Boosting Algorithms for Parallel and Distributed Learning. Distributed and Parallel Databases 11, 203–229 (2002). https://doi.org/10.1023/A:1013992203485
