Abstract
In this paper, we present the principles, models, and main architecture of an innovative framework for supporting intelligent analytics over big atmospheric data via clustering-based spatio-temporal analysis. In particular, we investigate the application setting represented by Greenhouse Gas Emissions (GGEs), a relevant instance of Big Data that emphasizes the Variety aspect of the well-known 3V Big Data axioms. A relevant case study is also introduced and discussed in detail. We also provide a comprehensive experimental evaluation of the proposed framework, which confirms the benefits of our approach. The resulting Big Data Mining model turns out to be useful for decision support processes in both governmental and industrial contexts. We conclude with remarks on our work and a vision of future research efforts in the field.
Acknowledgements
The authors are very grateful to Dr. Staci Lattimer, who contributed to early versions of this research.
Cite this article
Cuzzocrea, A., Gaber, M.M., Fadda, E. et al. An innovative framework for supporting big atmospheric data analytics via clustering-based spatio-temporal analysis. J Ambient Intell Human Comput 10, 3383–3398 (2019). https://doi.org/10.1007/s12652-018-0966-1
DOI: https://doi.org/10.1007/s12652-018-0966-1