ABSTRACT
Determining the ideal number of component classifiers of an ensemble a priori is an important problem. The volume and velocity of big data streams make it even more crucial, in terms of both prediction accuracy and resource requirements. Only a few studies address this problem for the batch setting, and none address it for online environments. Our theoretical framework shows that using the same number of independent component classifiers as there are class labels gives the highest accuracy. Using the weighted majority voting aggregation rule, we prove the existence of an ideal number of classifiers for an ensemble. In our experiments, we use two state-of-the-art online ensemble classifiers with six synthetic and six real-world data streams. Because the independence assumption of our theoretical framework is violated in practice, determining the exact ideal number of classifiers is nearly impossible; instead, we suggest upper bounds on the number of classifiers that give the highest accuracy. An important implication of our study is that online ensemble classifiers should be compared at these ideal values, since comparisons based on a fixed number of classifiers can be misleading.
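As a concrete illustration of the aggregation rule the framework builds on, the following is a minimal Python sketch of weighted majority voting over an ensemble's predictions. The function name, the example labels, and the weight values are illustrative assumptions, not the paper's implementation; in an online setting the weights would typically be updated as each component's accuracy on the stream evolves.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Combine component predictions via weighted majority voting.

    predictions: list of class labels, one per component classifier.
    weights: list of non-negative classifier weights (e.g., derived
             from each component's observed accuracy on the stream).
    Returns the class label with the largest total weight.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Example: three classifiers voting over class labels {0, 1}.
# Class 0 gets weight 0.5; class 1 gets 0.3 + 0.4 = 0.7 and wins.
print(weighted_majority_vote([0, 1, 1], [0.5, 0.3, 0.4]))  # -> 1
```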