Big data pre-processing methods with vehicle driving data using MapReduce techniques

The Journal of Supercomputing

Abstract

A huge amount of sensing data is generated by a large number of pervasive IoT devices. To find meaningful information in such big data, it is essential to perform pre-processing, in which many outlier data points need to be removed, because they deteriorate as time passes. Although pre-processing is essential in the big data field, there has been a significant lack of research with case studies. In this paper, we investigate and propose big data pre-processing methods. To evaluate the pre-processing methods for accurate analysis, we used a collection of digital tachograph (DTG) data, obtained from 6198 driving vehicles over a year. We studied five kinds of pre-processing methods: filtering ranges, excluding meaningless values, comparing filters from variables, applying statistical techniques, and finding driving patterns. In addition, we developed a MapReduce program using a Hadoop ecosystem and deployed big data to perform the pre-processing analysis. Through the pre-processing steps, we confirmed that the proportion of DTG sensing data points containing any error was up to 27.09%. Compared with the traditional brute-force detection approach, our methods had a 71.1% additional detection effect. In addition, we confirmed that outlier data points that are difficult to detect through simple range-error pre-processing could be detected well.
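
To make the first two pre-processing methods (filtering ranges and excluding meaningless values) more concrete, the following is a minimal sketch of how they might be expressed in a MapReduce style. It is plain Python that simulates the map, shuffle, and reduce phases in memory, not the authors' Hadoop program. The field names (`vehicle_id`, `speed_kmh`, `rpm`), the thresholds in `VALID_RANGES`, and the function names are all hypothetical, since the abstract does not give the DTG record layout or the actual limits.

```python
from collections import defaultdict

# Hypothetical valid ranges; the actual thresholds are not given in the abstract.
VALID_RANGES = {"speed_kmh": (0, 200), "rpm": (0, 8000)}


def map_record(record):
    """Map step: emit (vehicle_id, (record, errors)) for a single DTG record."""
    errors = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None:
            # Meaningless value: the sensor reported nothing usable.
            errors.append(f"{field}:missing")
        elif not low <= value <= high:
            # Range error: the value lies outside the assumed physical limits.
            errors.append(f"{field}:out_of_range")
    return record["vehicle_id"], (record, errors)


def reduce_vehicle(values):
    """Reduce step: summarise the records of one vehicle."""
    total = len(values)
    flagged = sum(1 for _, errors in values if errors)
    clean = [record for record, errors in values if not errors]
    return {
        "total": total,
        "flagged": flagged,
        "error_ratio": flagged / total,
        "clean": clean,
    }


def run_job(records):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        grouped[key].append(value)
    return {key: reduce_vehicle(values) for key, values in grouped.items()}
```

Calling `run_job` on a list of record dictionaries returns, per vehicle, the number of flagged records, the error ratio, and the records that passed both checks. On a real Hadoop cluster the grouping by `vehicle_id` would be done by the framework's shuffle phase rather than by a dictionary.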

Acknowledgements

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the IT/SW Creative research program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2013-H0502-13-1071).

Author information

Corresponding author

Correspondence to Eunmi Choi.

About this article

Cite this article

Cho, W., Choi, E. Big data pre-processing methods with vehicle driving data using MapReduce techniques. J Supercomput 73, 3179–3195 (2017). https://doi.org/10.1007/s11227-017-2014-x

  • DOI: https://doi.org/10.1007/s11227-017-2014-x

Keywords