Abstract
Road accidents are among the most life-threatening dangers in modern society, and traffic accidents that cause heavy damage occur everywhere. One of the most effective countermeasures is to predict accidents in advance, giving drivers a chance to avoid the danger or to reduce the damage by responding quickly. Accidents on the road can be predicted using classification analysis, a data mining procedure that requires enough data to build a learning model. However, building such a prediction system involves several problems. First, collecting and analyzing traffic data requires substantial hardware resources because the data are extremely large. Second, the data related to traffic accidents are fewer than the data not related to traffic accidents, so the two classes (the class to be predicted and the other class) differ in size and are imbalanced. The purpose of this paper is to build a prediction model that resolves both problems. This paper suggests using the Hadoop framework to process and analyze big traffic data efficiently, and a sampling method to resolve the data imbalance. On this basis, the prediction system first preprocesses and analyzes the big traffic data to create training data for the learning system. The imbalance in the created data is then corrected using a sampling method. Finally, to improve prediction accuracy, the corrected data are divided into several groups, and classification analysis is applied to each group.
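The imbalance-correction step can be illustrated with a minimal sketch. The abstract does not say which sampling method the paper uses, so random undersampling of the majority class is assumed here purely for illustration; the function name `undersample`, the record layout, and the toy values are all hypothetical.

```python
import random
from collections import Counter


def undersample(records, label_of, seed=0):
    """Randomly drop records so every class matches the smallest class.

    Assumption: plain random undersampling, chosen only for illustration;
    it is not necessarily the sampling method used in the paper.
    """
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)

    # Every class is cut down to the size of the rarest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: (speed_kmh, occupancy_pct, label).
# 95 "normal" records versus 5 "accident" records.
records = [(100 - i % 40, 5 + i % 30, "normal") for i in range(95)]
records += [(30 + i, 60 + i, "accident") for i in range(5)]

balanced = undersample(records, label_of=lambda r: r[2])
counts = Counter(r[2] for r in balanced)
print(counts)
```

Undersampling discards majority-class records, which keeps the training set small but loses information; oversampling the minority class is the usual alternative when accident records are too scarce to discard anything.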
Cite this article
Park, Sh., Kim, Sm. & Ha, Yg. Highway traffic accident prediction using VDS big data analysis. J Supercomput 72, 2815–2831 (2016). https://doi.org/10.1007/s11227-016-1624-z
DOI: https://doi.org/10.1007/s11227-016-1624-z