Abstract
This paper presents an original time-sensitive traffic management application for road safety diagnosis at signalized intersections. Such applications must handle data streams that may be subject to concept drift over various time scales. The road safety analysis method relies on estimating severity indicators for vehicle interactions from complex and noisy spatial occupancy information. An expert provides imprecise labels based on video recordings of the traffic scenes. To improve the performance, both overall and for each class, and the stability of learning in a stream, this paper presents new ensemble methods built on incremental algorithms that exploit the sensitivity of these algorithms to the processing order of instances. Different data selection criteria, many of them used in active learning methods, are studied in a comprehensive experimental evaluation that includes benchmark datasets from the UCI machine learning repository and the prediction of severity indicators. The best performance is obtained with a criterion that selects the instances misclassified by the current hypothesis. The proposed ensemble methods using this criterion are similar to AdaBoost in principle and performance, but have a smaller computational training cost.
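The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a count-based naive Bayes learner over binary features as the incremental algorithm, a selection criterion that keeps only the instances misclassified by the current hypothesis, ensemble members obtained by processing the same instances in different random orders, and a majority vote to combine them. All names (IncrementalNaiveBayes, train_with_selection, build_ensemble, vote) and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Count-based naive Bayes, updated one instance at a time (illustrative)."""

    def __init__(self):
        self.class_counts = Counter()
        # (class, feature index) -> counts of the values seen for that feature
        self.feature_counts = defaultdict(Counter)

    def update(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i)][v] += 1

    def predict(self, x):
        if not self.class_counts:
            return None  # no hypothesis yet
        total = sum(self.class_counts.values())
        best, best_score = None, -1.0
        for c in sorted(self.class_counts):
            n = self.class_counts[c]
            score = n / total
            for i, v in enumerate(x):
                # Laplace smoothing, assuming binary features (2 possible values)
                score *= (self.feature_counts[(c, i)][v] + 1) / (n + 2)
            if score > best_score:
                best, best_score = c, score
        return best


def train_with_selection(stream):
    """Learn only from instances the current hypothesis misclassifies."""
    model = IncrementalNaiveBayes()
    selected = 0
    for x, y in stream:
        if model.predict(x) != y:  # also true when there is no hypothesis yet
            model.update(x, y)
            selected += 1
    return model, selected


def build_ensemble(data, n_members, seed=0):
    """Train each member on a different processing order of the same data."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        order = data[:]
        rng.shuffle(order)  # assumption: diversity comes from the order alone
        model, _ = train_with_selection(order)
        members.append(model)
    return members


def vote(members, x):
    """Combine the members' predictions by simple majority vote."""
    votes = Counter(m.predict(x) for m in members)
    return votes.most_common(1)[0][0]


# Toy stream (hypothetical data): the label equals the first binary feature.
data = [((a, b), a) for a in (0, 1) for b in (0, 1)] * 5
members = build_ensemble(data, n_members=5)
prediction = vote(members, (1, 0))
```

Because each member updates only on its own errors, the instances it retains depend on the order in which they arrive, which is what makes the members differ. Training on the selected subset rather than on every instance is also where a lower training cost than reweighting the full set would come from in such a scheme.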
Acknowledgments
The authors wish to thank the reviewers for their constructive comments that helped to significantly improve the paper. This work was supported by the French "Region Ile de France".
Cite this article
Saunier, N., Midenet, S. Creating ensemble classifiers through order and incremental data selection in a stream. Pattern Anal Applic 16, 333–347 (2013). https://doi.org/10.1007/s10044-011-0263-5
DOI: https://doi.org/10.1007/s10044-011-0263-5