An implementation of cloud-based platform with R packages for spatiotemporal analysis of air pollution

The Journal of Supercomputing

Abstract

R has recently become a popular tool for big data analysis, owing to its many mature packages for data analysis and visualization, including the analysis of air pollution. Air pollution is of increasing global concern because it greatly impacts the environment and human health. With the rapid development of the IoT and the increasing accuracy of geographical information collected by sensors, a huge amount of air pollution data has been generated. It is therefore difficult to analyze these data effectively and reliably on a single machine, given the inherent memory limitations of that environment. In this work, we construct a distributed computing environment based on RHadoop and SparkR so that air pollution data can be analyzed and visualized with R more reliably and effectively. We first use sensors called EdiGreen AirBox to collect air pollution data in Taichung, Taiwan. We then adopt the Inverse Distance Weighting (IDW) method to transform the sensor data into a density map. Finally, we present experimental results on the accuracy of short-term PM2.5 predictions made with the ARIMA model, and we verify the prediction accuracy using the MAPE measure.
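
The abstract names two standard techniques, Inverse Distance Weighting and MAPE, without stating their formulas. The following is a minimal sketch of both in plain Python, not the authors' R implementation. The distance exponent of 2, the function names, and the sample sensor readings are all assumptions made for illustration.

```python
import math


def idw(sensors, x, y, power=2.0):
    """Inverse Distance Weighting estimate at the point (x, y).

    sensors: list of (sx, sy, value) tuples (hypothetical sensor readings).
    power:   distance exponent; 2.0 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in sensors:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point sits exactly on a sensor: use its reading.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def mape(actual, predicted):
    """Mean absolute percentage error in percent.

    Every value in `actual` must be nonzero.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical PM2.5 readings at two sensor locations.
sensors = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
estimate = idw(sensors, 1.0, 0.0)          # point equidistant from both sensors
error = mape([20.0, 40.0], [18.0, 44.0])   # two forecasts, each off by 10 percent
print(estimate, error)
```

Evaluating `idw` over a regular grid of points would give the kind of density map the abstract describes; nearer sensors dominate each estimate because weights fall off with distance.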


Acknowledgements

This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant MOST 105-2634-E-029-001 and MOST 106-2621-M-029-001.

Author information

Corresponding author

Correspondence to Yu-Wei Chan.

Cite this article

Yang, CT., Chan, YW., Liu, JC. et al. An implementation of cloud-based platform with R packages for spatiotemporal analysis of air pollution. J Supercomput 76, 1416–1437 (2020). https://doi.org/10.1007/s11227-017-2189-1

  • DOI: https://doi.org/10.1007/s11227-017-2189-1
