Skip to main content

Advertisement

Log in

Highway traffic accident prediction using VDS big data analysis

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

An Erratum to this article was published on 08 February 2016

Abstract

In modern society, accidents on the roads are one of the most life-threatening dangers to humans. Traffic accidents that cause a lot of damages are occurring all over the places. The most effective solution to these types of accidents can be to predict future accidents in advance, giving drivers chances to avoid the dangers or reduce the damage by responding quickly. Predicting accidents on the road can be achieved using classification analysis, a data mining procedure requiring enough data to build a learning model. However, building such a predicting system involves several problems. It requires many hardware resources to collect and analyze traffic data for predicting traffic accidents since the data are extremely large. Furthermore, the size of data related to traffic accidents is less than that not related to traffic accidents; the amounts of the two classes (classes to be predicted and other classes) of data differ and are thus imbalanced. The purpose of this paper is to build a predicting model that can resolve all these problems. This paper suggests using the Hadoop framework to process and analyze big traffic data efficiently and a sampling method to resolve the problem of data imbalance. Based on this, the predicting system first preprocesses the big traffic data and analyzes it to create data for the learning system. The imbalance of created data is corrected using a sampling method. To improve the predicting accuracy, corrected data are classified into several groups, to which classification analysis is applied.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Lv Y, Tang S, Zhao H (2009) Real-time highway traffic accident prediction based on the \(k\)-nearest neighbor method. In: International conference on measuring technology and mechatronics automation (ICMTMA), vol 3, pp 547–550

  2. Yu R, Liu X (2010) Study on traffic accidents prediction model based on RBF neural network. In: 2nd international conference on information engineering and computer science (ICIECS), pp 1–4

  3. Lv Y, Tang S, Zhao H (2010) Research on influence extention of two-lane highway intersections based on traffic accident database. In: International conference on optoelectronics and image processing (ICOIP), vol 2, pp 244–246

  4. Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto K-I (2007) The effects of over and under sampling on fault-prone module detection. In: Empirical software engineering and measurement, pp 196–204

  5. Gothenberg A, Tenhunen H (1998) Performance analysis of low oversampling ratio sigma-delta noise shapers for RF applications. In: Proceedings of the 1998 IEEE international symposium on circuits and systems (ISCAS’98), vol 1, pp 401–404

  6. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113

    Article  Google Scholar 

  7. Lee T, Kim H, Rhee K-H, Shin S-U (2013) Implementation and performance of distributed text processing system using Hadoop for e-discovery cloud service. Innov Inf Sci Technol Res Group (ISYOU) 4:12–24

    Google Scholar 

  8. Zhang F, Sakr M (2013) Dataset scaling and MapReduce performance. In: 2013 IEEE 27th international on parallel and distributed processing symposium workshops and PhD forum (IPDPSW), pp 1683–1690

  9. Guruzon. http://guruzon.com. Accessed 20 Nov 2013

  10. Chen T-S, Hu X-Q, Li S-A, Zhou C-L (2008) Multi-class diagnosis classification on high dimension data by logistic models. In: 2008 international conference on machine learning and cybernetics, vol 6, pp 3301–3306

  11. Seliya N, Xu Z, Khoshgoftaar TM (2008) Addressing class imbalance in non-binary classification problems. In: 20th IEEE international conference on tools with artificial intelligence (ICTAI’08), vol 1, pp 460–466

  12. Maithani S, Tyagi R (2008) Noise characterization and classification for background estimation. In: International conference on signal processing, communications and networking (ICSCN’08), pp 208–213

  13. Yan Z, Wang X, Du L (2011) Design method of highway traffic safety analysis model. In: International conference on transportation, mechanical, and electrical engineering (TMEE), pp 151–154

  14. Beshah T, Ejigu D, Abraham A, Snasel V, Kromer P (2011) Pattern recognition and knowledge discovery from road traffic accident data in Ethiopia: implications for improving road safety. In: 2011 world congress on information and communication technologies (WICT), pp 1241–1246

  15. Ramani RG, Shanthi S (2012) Classifier prediction evaluation in modeling road traffic accident data. In: 2012 IEEE international conference on computational intelligence and computing research (ICCIC), pp 1–4

  16. Ghimire B, Bhattacharjee S, Ghosh SK (2013) Analysis of spatial autocorrelation for traffic accident data based on spatial decision tree. In: 2013 fourth international conference on computing for geospatial research and application (COM.Geo), pp 111–115

  17. Apache, Apache Hadoop. https://hadoop.apache.org/. Accessed 20 Nov 2013

  18. Apache, Apache Hive. https://hive.apache.org/. Accessed 20 Nov 2013

  19. Apache, Apache Mahout. https://mahout.apache.org/. Accessed 13 Jan 2014

  20. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

    MATH  Google Scholar 

  21. Vapnik VN (1998) Statistical learning theory. Wiley, New York

    MATH  Google Scholar 

  22. Kukar M (2004) Transduction and typicalness for quality assessment of individual classifications in machine learning and data mining. In: Fourth IEEE international conference on data mining (ICDM’04), pp 146–153

  23. Raghavendra PS, Chowdhury SR, Kameswari SV (2010) Comparative study of neural networks and \(k\)-means classification in web usage mining. In: International conference on internet technology and secured transactions (ICITST)

  24. Rahayu SP, Purnami SW, Embong A (2008) Applying kernel logistic regression in data mining to classify credit risk. Inf Technol 2:1–6

    Google Scholar 

  25. Mountassir A, Benbrahim H, Berrada I (2010) An empirical study to address the problem of unbalanced data sets in sentiment classification. In: 2012 IEEE international conference on systems, man, and cybernetics (SMC), pp 3298–3303

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Young-guk Ha.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Park, Sh., Kim, Sm. & Ha, Yg. Highway traffic accident prediction using VDS big data analysis. J Supercomput 72, 2815–2831 (2016). https://doi.org/10.1007/s11227-016-1624-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-016-1624-z

Keywords

Navigation