
An innovative framework for supporting big atmospheric data analytics via clustering-based spatio-temporal analysis

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

In this paper, we provide the principles, models, and main architecture of an innovative framework for supporting intelligent analytics over big atmospheric data via clustering-based spatio-temporal analysis. In particular, we investigate the applicative setting of Greenhouse Gas Emissions (GGEs), a relevant instance of Big Data that emphasizes the Variety aspect of the well-known 3V Big Data axioms (Volume, Velocity, Variety). A relevant case study is also introduced and discussed in detail. We also provide a comprehensive experimental evaluation of the proposed framework, which confirms the benefits of our approach. The resulting Big Data Mining model turns out to be useful for decision support processes in both governmental and industrial contexts. We conclude with remarks on our work and a vision of future research efforts in the field.
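The abstract does not say which clustering algorithm the framework relies on. As a hedged illustration of what clustering-based temporal analysis of GGE data can look like, the minimal sketch below groups regions by the shape of their yearly emission trajectories with plain k-means (Lloyd's algorithm) and scores the resulting partition with the mean silhouette coefficient. Everything here is an assumption for illustration only: the region names and figures are hypothetical, and the baseline normalization (`relative_to_baseline`) and farthest-first seeding (`farthest_first_seeds`) are design choices of this sketch, not the authors' pipeline.

```python
import math
from typing import Dict, List, Sequence

Series = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_to_baseline(series: Sequence[float]) -> Series:
    """Divide each yearly value by the first year's, so clusters reflect trend, not size."""
    base = series[0]
    return [v / base for v in series]


def farthest_first_seeds(points: List[Series], k: int) -> List[Series]:
    """Deterministic seeding (assumption of this sketch): start from points[0], then
    repeatedly add the point farthest from every seed chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, s) for s in seeds))
        seeds.append(nxt)
    return [list(s) for s in seeds]


def kmeans(points: List[Series], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first_seeds(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: no series changed cluster
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(points: List[Series], labels: List[int]) -> float:
    """Average of s(i) = (b - a) / max(a, b) over all points; needs at least 2 clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [euclidean(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical yearly emissions for four regions (illustrative numbers, not real data).
emissions: Dict[str, Series] = {
    "region_A": [100, 105, 110, 116, 121],
    "region_B": [50, 53, 55, 58, 61],
    "region_C": [200, 190, 181, 172, 163],
    "region_D": [80, 76, 72, 69, 65],
}
names = list(emissions)
points = [relative_to_baseline(emissions[n]) for n in names]
labels = kmeans(points, k=2)
clusters = {n: lab for n, lab in zip(names, labels)}
score = mean_silhouette(points, labels)
print(clusters, round(score, 3))
```

With this toy input the two rising regions (A, B) and the two falling regions (C, D) land in separate clusters despite very different absolute levels, because dividing by the first year compares growth patterns rather than tonnage; clustering the raw values would instead group regions mostly by size. A real spatio-temporal setting would also bring in location or region adjacency, which this sketch leaves out.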


Figs. 1–10 (images not included in this text)



Acknowledgements

The authors are very grateful to Dr. Staci Lattimer, who contributed to early versions of this research.

Author information

Corresponding author

Correspondence to Edoardo Fadda.

About this article

Cite this article

Cuzzocrea, A., Gaber, M.M., Fadda, E. et al. An innovative framework for supporting big atmospheric data analytics via clustering-based spatio-temporal analysis. J Ambient Intell Human Comput 10, 3383–3398 (2019). https://doi.org/10.1007/s12652-018-0966-1

  • DOI: https://doi.org/10.1007/s12652-018-0966-1
