Abstract
In New Zealand, road accident casualties have been increasing. Factor analyses and time series analyses show what types of accidents result in casualties, but their results can become outdated. We propose a stream classification approach with drift detection that signals and adapts when the factors associated with crash casualties change over time. We introduce a drift detection framework, G-mean Adaptive drift Detection (GAD), which adapts a classifier threshold to maximise G-mean. This metric rewards maximising accuracy on each class while keeping these accuracies balanced. As a result, GAD can make concept drift in the minority class easier to detect. We also propose a recurring concept classification framework, G-mean Concept Profiling Framework (GCPF), which reuses previously trained classifiers and uses GAD’s approach to drift detection. Through experimentation, we show that GAD improves G-mean without increasing false positive drift detections on imbalanced synthetic and real-world datasets. We also show that GCPF achieves better G-mean than other state-of-the-art stream classification approaches on the NZ crash dataset.
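To make the role of G-mean concrete, the following is a minimal sketch, not the authors' GAD implementation. It assumes binary labels (1 for the minority class), takes G-mean as the geometric mean of the per-class accuracies, and sweeps a fixed list of candidate thresholds over a static batch of scores. The function names `g_mean` and `best_threshold` and the toy data are hypothetical; how GAD adapts the threshold online is not specified in the abstract.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of per-class accuracy for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        # G-mean is undefined when a class is absent; return 0.0 by convention here.
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def best_threshold(y_true, scores, candidates):
    """Return the (threshold, g_mean) pair with the highest G-mean."""
    best = (None, -1.0)
    for th in candidates:
        preds = [1 if s >= th else 0 for s in scores]
        g = g_mean(y_true, preds)
        if g > best[1]:
            best = (th, g)
    return best


# Toy imbalanced batch (assumed data): 8 majority-class and 2 minority-class instances.
y_true = [0] * 8 + [1] * 2
scores = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.3, 0.45, 0.35, 0.4]

# A default threshold of 0.5 predicts the majority class for every instance.
default_preds = [1 if s >= 0.5 else 0 for s in scores]
default_g = g_mean(y_true, default_preds)

threshold, tuned_g = best_threshold(y_true, scores, [0.1, 0.2, 0.3, 0.35, 0.4, 0.5])
print(default_g, threshold, tuned_g)
```

On this toy batch the default threshold reaches 80% plain accuracy while never predicting the minority class, so its G-mean is zero. Lowering the threshold trades a little majority-class accuracy for minority-class accuracy, which is the balance the metric rewards.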
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Anderson, R., Koh, Y.S., Dobbie, G. (2019). Classifying Imbalanced Road Accident Data Using Recurring Concept Drift. In: Le, T., et al. Data Mining. AusDM 2019. Communications in Computer and Information Science, vol 1127. Springer, Singapore. https://doi.org/10.1007/978-981-15-1699-3_12
DOI: https://doi.org/10.1007/978-981-15-1699-3_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1698-6
Online ISBN: 978-981-15-1699-3
eBook Packages: Computer Science, Computer Science (R0)