DOI: 10.1145/2983323.2983907
short-paper

A Theoretical Framework on the Ideal Number of Classifiers for Online Ensembles in Data Streams

Published: 24 October 2016

ABSTRACT

Determining the ideal number of component classifiers of an ensemble a priori is an important problem. The volume and velocity of big data streams make it even more crucial, both for prediction accuracy and for resource requirements. Only a limited number of studies address this problem for the batch setting, and none for online environments. Our theoretical framework shows that using the same number of independent component classifiers as there are class labels gives the highest accuracy. We prove the existence of an ideal number of classifiers for an ensemble under the weighted majority voting aggregation rule. In our experiments, we use two state-of-the-art online ensemble classifiers with six synthetic and six real-world data streams. Because the assumption of independent component classifiers is violated in practice, determining the exact ideal number of classifiers is nearly impossible; instead, we suggest upper bounds on the number of classifiers that give the highest accuracy. An important implication of our study is that online ensemble classifiers should be compared at these ideal values, since comparisons based on a fixed number of classifiers can be misleading.
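The aggregation rule the framework analyzes, weighted majority voting, can be sketched in a few lines. This is an illustrative implementation, not the paper's code; the function and parameter names (`weighted_majority_vote`, `predictions`, `weights`) are hypothetical, and the weights stand in for whatever per-classifier quality scores an online ensemble tracks.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Aggregate component-classifier outputs by weighted majority voting.

    predictions: predicted class label from each component classifier.
    weights: non-negative weight for each classifier (e.g. its tracked
             accuracy on the stream). Both are hypothetical names, used
             here only to illustrate the aggregation rule.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # The ensemble predicts the label with the largest total weight.
    return max(scores, key=scores.get)

# Example: three classifiers voting on one instance of a 3-class problem
# (matching the paper's result that the ideal ensemble size equals the
# number of class labels, assuming independent components).
print(weighted_majority_vote(["a", "b", "a"], [0.6, 0.9, 0.5]))  # → "a"
```

Note that with correlated (non-independent) classifiers, as the paper observes for real streams, the weights alone cannot recover this ideal behavior, which is why the authors report upper bounds rather than exact ideal sizes.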


Published in

CIKM '16: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, October 2016, 2566 pages. ISBN: 9781450340731. DOI: 10.1145/2983323.

                          Copyright © 2016 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '16 paper acceptance rate: 160 of 701 submissions, 23%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
