Creating ensemble classifiers through order and incremental data selection in a stream

Application to the online learning of road safety indicators

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

This paper presents an original time-sensitive traffic management application for road safety diagnosis at signalized intersections. Such applications must deal with data streams that may be subject to concept drift over various time scales. The road safety analysis method relies on estimating severity indicators for vehicle interactions from complex and noisy spatial occupancy information. An expert provides imprecise labels based on video recordings of the traffic scenes. To improve the performance, both overall and for each class, and the stability of learning in a stream, this paper presents new ensemble methods that exploit the sensitivity of incremental algorithms to the order in which instances are processed. Different data selection criteria, many of them used in active learning methods, are studied in a comprehensive experimental evaluation that includes benchmark datasets from the UCI machine learning repository and the prediction of severity indicators. The best performance is obtained with a criterion that selects instances misclassified by the current hypothesis. The proposed ensemble methods using this criterion have principles and performance similar to those of AdaBoost, but a smaller computational training cost.
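
The idea of building an ensemble from an order-sensitive incremental learner combined with a misclassification-based selection criterion can be illustrated with a minimal sketch. It is not the authors' algorithm: the nearest-centroid base learner, the use of seeded shuffles to obtain different processing orders, the majority vote, and all names (`CentroidLearner`, `train_member`, `ensemble_predict`) are assumptions made for illustration only.

```python
import random
from collections import Counter


class CentroidLearner:
    """Incremental nearest-centroid classifier (hypothetical stand-in base learner)."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sums
        self.counts = {}  # label -> number of instances absorbed

    def predict(self, x):
        if not self.counts:
            return None  # no hypothesis yet

        def dist(label):
            n = self.counts[label]
            return sum((xi - s / n) ** 2 for xi, s in zip(x, self.sums[label]))

        return min(self.counts, key=dist)

    def update(self, x, y):
        if y not in self.counts:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + xi for s, xi in zip(self.sums[y], x)]
        self.counts[y] += 1


def train_member(instances, seed):
    """Train one ensemble member on its own processing order (assumed: a seeded shuffle)."""
    order = list(instances)
    random.Random(seed).shuffle(order)
    learner = CentroidLearner()
    for x, y in order:
        # Selection criterion: learn only from instances that the
        # current hypothesis misclassifies.
        if learner.predict(x) != y:
            learner.update(x, y)
    return learner


def ensemble_predict(members, x):
    """Combine member predictions by unweighted majority vote (an assumption)."""
    votes = Counter(m.predict(x) for m in members)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated classes.
data = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"), ((1, 1), "a"),
        ((10, 10), "b"), ((9, 10), "b"), ((10, 9), "b"), ((9, 9), "b")]
members = [train_member(data, seed) for seed in range(5)]
label_near_a = ensemble_predict(members, (0.5, 0.5))
label_near_b = ensemble_predict(members, (9.5, 9.5))
```

On this toy data each member ends up learning from only two of the eight instances, because every other instance is already classified correctly when it arrives. This gives a rough intuition for why selecting misclassified instances can keep the training cost low.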

Acknowledgments

The authors wish to thank the reviewers for their constructive comments that helped to significantly improve the paper. This work was supported by the French "Region Ile de France".

Author information

Corresponding author

Correspondence to Nicolas Saunier.

About this article

Cite this article

Saunier, N., Midenet, S. Creating ensemble classifiers through order and incremental data selection in a stream. Pattern Anal Applic 16, 333–347 (2013). https://doi.org/10.1007/s10044-011-0263-5

  • DOI: https://doi.org/10.1007/s10044-011-0263-5
