Creating ensemble classifiers through order and incremental data selection in a stream

Saunier, Nicolas; Midenet, Sophie

doi:10.1007/s10044-011-0263-5

Application to the online learning of road safety indicators

Theoretical Advances
Published: 21 January 2012

Volume 16, pages 333–347, (2013)
Pattern Analysis and Applications

Nicolas Saunier¹ &
Sophie Midenet²


Abstract

This paper presents an original time-sensitive traffic management application for road safety diagnosis at signalized intersections. Such applications must handle data streams that may be subject to concept drift over various time scales. The method for road safety analysis relies on estimating severity indicators for vehicle interactions from complex and noisy spatial occupancy information. An expert provides imprecise labels based on video recordings of the traffic scenes. To improve the performance, both overall and for each class, and the stability of learning in a stream, this paper presents new ensemble methods based on incremental algorithms that rely on their sensitivity to the processing order of instances. Different data selection criteria, many of them used in active learning methods, are studied in a comprehensive experimental evaluation that includes benchmark datasets from the UCI machine learning repository and the prediction of severity indicators. The best performance is obtained with a criterion that selects the instances misclassified by the current hypothesis. The proposed ensemble methods using this criterion have principles and performance similar to those of AdaBoost, at a smaller computational training cost.
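
The general idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy running-centroid learner, the random shuffling of a buffered batch to obtain different processing orders, and the majority vote are all assumptions made for illustration, and every name (`IncrementalCentroid`, `train_with_selection`, `build_ensemble`, `vote`) is hypothetical. The sketch only shows how training an incremental learner solely on the instances its current hypothesis misclassifies makes the result depend on the processing order, so that several orders yield several ensemble members.

```python
import random
from collections import Counter


class IncrementalCentroid:
    """Toy incremental learner (an assumption): one running mean per class."""

    def __init__(self):
        self.sums = {}    # class label -> per-feature sum of selected instances
        self.counts = {}  # class label -> number of selected instances

    def predict(self, x):
        # No hypothesis yet: return None so the first instance is always selected.
        if not self.counts:
            return None

        def squared_distance(label):
            n = self.counts[label]
            return sum((xi - s / n) ** 2 for xi, s in zip(x, self.sums[label]))

        return min(self.counts, key=squared_distance)

    def update(self, x, y):
        if y not in self.counts:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + xi for s, xi in zip(self.sums[y], x)]
        self.counts[y] += 1


def train_with_selection(stream):
    """Process instances one by one; learn only from misclassified ones."""
    model = IncrementalCentroid()
    for x, y in stream:
        if model.predict(x) != y:  # selection criterion: current hypothesis is wrong
            model.update(x, y)
    return model


def build_ensemble(data, n_members=5, seed=0):
    """Train one member per processing order (random orders are an assumption)."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        order = list(data)
        rng.shuffle(order)
        members.append(train_with_selection(order))
    return members


def vote(members, x):
    """Combine member predictions by simple majority vote."""
    votes = Counter(member.predict(x) for member in members)
    return votes.most_common(1)[0][0]
```

Because each member updates only on its own mistakes, two members that see the same instances in different orders generally retain different subsets of them, which is one simple source of diversity for the ensemble.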

Acknowledgments

The authors wish to thank the reviewers for their constructive comments that helped to significantly improve the paper. This work was supported by the French "Region Ile de France".

Author information

Authors and Affiliations

Civil, Mining and Geological Engineering Department, École Polytechnique de Montréal, succ. Centre-Ville, C.P. 6079, Montréal, QC, H3C 3A7, Canada
Nicolas Saunier
GRETTIA, Université Paris-Est, IFSTTAR, 93166, Noisy-le-Grand, France
Sophie Midenet

Corresponding author

Correspondence to Nicolas Saunier.

About this article

Cite this article

Saunier, N., Midenet, S. Creating ensemble classifiers through order and incremental data selection in a stream. Pattern Anal Applic 16, 333–347 (2013). https://doi.org/10.1007/s10044-011-0263-5

Received: 16 September 2010
Accepted: 30 December 2011
Published: 21 January 2012
Issue Date: August 2013
DOI: https://doi.org/10.1007/s10044-011-0263-5
