A visual big data system for the prediction of weather-related variables: Jordan-Spain case study

Multimedia Tools and Applications

Abstract

Meteorology is a field in which huge amounts of data are generated, mainly collected by sensors at weather stations, where different variables can be measured. These data have particular characteristics, such as high volume and dimensionality, frequent missing values at some stations, and high correlation between the collected variables. It is therefore crucial to use Big Data and Data Mining techniques to handle those data and extract useful knowledge from them, for instance to predict weather phenomena. In this paper, we propose a visual big data system designed to deal with large amounts of weather-related data and to let the user analyze those data in order to perform predictive tasks over the considered variables (temperature and rainfall). The proposed system collects open data and loads them into a local NoSQL database, fusing them at different levels of temporal and spatial aggregation, in order to perform predictive analysis using univariate and multivariate approaches, as well as forecasting based on training data from neighbor stations in cases with high rates of missing values. The system has been assessed in terms of usability and predictive performance, obtaining an overall normalized mean squared error of 0.00013 and an overall directional symmetry of nearly 0.84. The system was rated positively by a group of experts in the area (all aspects of the system except graphic design were rated 3 or above on a 1–5 scale). The promising preliminary results demonstrate the validity of our system and encourage us to keep working in this area.
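The abstract reports a normalized mean squared error (NMSE) and a directional symmetry (DS) value, but the formulas are not visible in this excerpt. The sketch below uses one common pair of definitions as an assumption: NMSE as the mean squared error divided by the variance of the observed series, and DS as the fraction of consecutive steps in which the observed and predicted series move in the same direction. The function names nmse and directional_symmetry are illustrative and do not come from the paper.

```python
from statistics import pvariance


def nmse(actual, predicted):
    """Mean squared error divided by the population variance of the actual series.

    Assumed definition; the paper may normalize differently.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("series must have equal length of at least 2")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return mse / pvariance(actual)


def directional_symmetry(actual, predicted):
    """Fraction of steps where actual and predicted change in the same direction.

    Assumed definition; a step counts as a hit when the product of the two
    consecutive differences is non-negative.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("series must have equal length of at least 2")
    hits = 0
    for i in range(1, len(actual)):
        delta_actual = actual[i] - actual[i - 1]
        delta_predicted = predicted[i] - predicted[i - 1]
        if delta_actual * delta_predicted >= 0:
            hits += 1
    return hits / (len(actual) - 1)


# Made-up values, only to exercise the two functions.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 1.0, 4.0]
print(nmse(observed, forecast), directional_symmetry(observed, forecast))
```

Under these assumed definitions, a lower NMSE is better (0 means a perfect fit), while DS ranges from 0 to 1 and higher is better.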

Notes

  1. https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-ghcn

  2. Check ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt for further information on the M-, Q-, and S-Flag.

  3. All that information has been obtained from https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt

  4. There are some Spanish words in the figure, whose meanings are: estación = station; fecha = date; valor_dato = datum_value

  5. https://weka.sourceforge.io/doc.packages/timeseriesForecasting/weka/classifiers/timeseries/WekaForecaster.html

  6. https://weka.sourceforge.io/doc.dev/weka/classifiers/meta/Bagging.html
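
Note 2 refers to the M-, Q-, and S-Flag fields of GHCN-Daily records. As a rough illustration of where those flags sit, the following sketch splits one fixed-width line of a .dly file as laid out in the GHCN-Daily readme: an 11-character station identifier, the year, the month, and a 4-character element code, followed by 31 groups made of a 5-character value and three 1-character flags, with -9999 marking a missing value. The function name parse_dly_line and the station identifier in the sample are hypothetical, and this is not necessarily how the described system ingests the data.

```python
MISSING = -9999  # sentinel used in GHCN-Daily for an absent daily value

# Elements stored in tenths of a unit (tenths of mm or tenths of degrees C).
TENTHS = {"PRCP", "TMAX", "TMIN"}


def parse_dly_line(line):
    """Split one fixed-width GHCN-Daily record into per-day observations."""
    element = line[17:21]
    scale = 10.0 if element in TENTHS else 1.0
    record = {
        "station": line[0:11],
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": element,
        "days": [],
    }
    for day in range(31):
        start = 21 + day * 8          # each day occupies 8 characters
        field = line[start:start + 8]
        if len(field) < 8:
            break                     # tolerate a truncated line
        raw = int(field[0:5])
        record["days"].append({
            "day": day + 1,
            "value": None if raw == MISSING else raw / scale,
            "mflag": field[5].strip(),   # measurement flag
            "qflag": field[6].strip(),   # quality flag
            "sflag": field[7].strip(),   # source flag
        })
    return record


# Hypothetical record: station id is a placeholder, values are made up.
sample = "XX000000001" + "2016" + "01" + "TMAX" + "  133  S" + "-9999   " * 30
parsed = parse_dly_line(sample)
print(parsed["element"], parsed["days"][0])
```

Days that do not exist in a given month (for example 31 February) also carry the missing-value sentinel, so they come out as None here.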

Acknowledgments

This paper was drafted as part of Juan A. Lara’s research stay during 2019-2020 at Jordan University of Science and Technology, JUST (Jordan), which partially sponsored this research. The authors would like to thank UDIMA’s and JUST’s students who took part in the design and implementation of the system, particularly Francisco Javier Moreno Hermosilla, Paulina Pyzel and Amnah Al-Abdi; and JUST’s experts for providing their feedback in order to assess this system.

Author information

Corresponding author

Correspondence to Juan A. Lara.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

APPENDIX

1.1 I – .ARFF files generated by the system

1.2 A. Excerpt of a particular minable view created for “standard” analysis (file .arff)

@relation weather-project
@attribute Date date 'yyyy-MM-dd'
@attribute rainfall numeric
@attribute tmin numeric
@attribute tmax numeric
@data
2016-1-01,40.90490992906111,3.125,13.33111111111111
2016-2-01,34.753053538158774,5.157777777777778,18.84
2016-3-01,48.504665419434346,7.76046511627907,21.05625
2016-4-01,42.176677782541,12.59375,28.476829268292683
2016-5-01,?,14.482608695652175,29.78192771084337
2016-6-01,?,19.125555555555557,36.2038961038961
2016-7-01,?,20.276767676767676,36.09493670886076
2016-8-01,?,21.55056179775281,36.88076923076923
2016-9-01,?,16.78426966292135,32.72894736842105
2016-10-01,48.04021044733257,13.712903225806452,29.78170731707317
2016-11-01,?,7.062637362637362,21.71772151898734
2016-12-01,44.539838581248475,2.833707865168539,13.484210526315788
2017-1-01,32.95836866004329,1.4148936170212765,13.410975609756099
2017-2-01,36.37586159726386,1.1903225806451612,15.182894736842105
2017-3-01,40.60443010546419,6.790425531914893,20.07
2017-4-01,39.17010546939185,12.114285714285714,27.001785714285717
2017-5-01,?,14.576842105263157,31.349
2017-6-01,?,18.812222222222225,34.6
2017-7-01,?,22.743478260869566,38.703947368421055
2017-8-01,?,20.94123711340206,37.10253164556962
2017-9-01,?,18.993269230769233,34.8725
2017-10-01,0,14.519847328244273,27.939772727272725
2017-11-01,34.965075614664805,8.31359223300971,21.822093023255814
2017-12-01,40.16383020752389,5.935294117647059,19.084883720930232
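
To make the role of the "?" entries concrete, here is a minimal sketch of reading a listing in this layout with the Python standard library only. It handles just the subset of ARFF seen in the excerpt (@relation, @attribute, @data, comma-separated rows, and "?" for a missing value) and is not a full ARFF reader; the function name parse_arff is illustrative, and the attribute names in the sample are those read from the listing.

```python
def parse_arff(text):
    """Parse a small ARFF subset into (attributes, rows).

    attributes is a list of (name, type) pairs; rows is a list of dicts in
    which numeric cells become floats and "?" becomes None.
    """
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            cells = [cell.strip() for cell in line.split(",")]
            row = {}
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row[name] = None           # missing value
                elif kind == "numeric":
                    row[name] = float(cell)
                else:
                    row[name] = cell           # e.g. the date, kept as text
            rows.append(row)
        elif lower.startswith("@attribute"):
            parts = line.split(None, 3)        # keyword, name, type, [format]
            attributes.append((parts[1], parts[2].lower()))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


SAMPLE = """@relation weather-project
@attribute Date date 'yyyy-MM-dd'
@attribute rainfall numeric
@attribute tmin numeric
@attribute tmax numeric
@data
2016-4-01,42.176677782541,12.59375,28.476829268292683
2016-5-01,?,14.482608695652175,29.78192771084337
"""

attributes, rows = parse_arff(SAMPLE)
missing = [row["Date"] for row in rows if row["rainfall"] is None]
print(missing)
```

Listing the dates with a missing rainfall value in this way gives a quick view of how sparse a station's series is.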

1.3 B. Excerpt of a particular minable view created for “neighbour-based” analysis (file .arff)

@relation weather-project
@attribute year numeric
@attribute month numeric
@attribute rainfall numeric
@attribute latitude numeric
@attribute longitude numeric
@attribute altitude numeric
@data
2016,1,3.258096528021482,32.5390,38.1950,686
2016,2,3.8213641296489095,32.5390,38.1950,686
2016,3,5.299971020274537,32.5390,38.1950,686
2016,4,2.4849066497880004,32.5390,38.1950,686
2016,5,2,32.5390,38.1950,686
2016,6,2,32.5390,38.1950,686
2016,7,2,32.5390,38.1950,686
2016,8,2,32.5390,38.1950,686
2016,9,2,32.5390,38.1950,686
2016,10,?,32.5390,38.1950,686
2016,11,?,32.5390,38.1950,686
2016,12,3.349904087274605,32.5390,38.1950,686
2017,1,?,32.5390,38.1950,686
2017,2,2,32.5390,38.1950,686
2017,3,4.762173924797756,32.5390,38.1950,686
2017,4,4.269697449699962,32.5390,38.1950,686
2017,5,2,32.5390,38.1950,686
2017,6,2,32.5390,38.1950,686
2017,7,2,32.5390,38.1950,686
2017,8,2,32.5390,38.1950,686
2017,9,2,32.5390,38.1950,686
2017,10,?,32.5390,38.1950,686
2017,11,2.4849066497880004,32.5390,38.1950,686
2017,12,2.0149020205422647,32.5390,38.1950,686
2016,1,2.7950615700918397,32.1610,37.1490,677
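
The neighbour-based view above stacks rows from more than one station and adds each station's coordinates as attributes. A minimal sketch of how such a pooled file could be assembled is given below. It is an assumption about the general idea, not the authors' implementation: the function name neighbour_view, the dictionary layout, and all station values are made up for illustration.

```python
ATTRIBUTES = ("year", "month", "rainfall", "latitude", "longitude", "altitude")


def neighbour_view(stations, relation="weather-project"):
    """Return ARFF text pooling the monthly rainfall rows of several stations.

    Each station is a dict with "lat", "lon", "alt" and a "rainfall" mapping
    from (year, month) to a value, where None stands for a missing value.
    """
    lines = [f"@relation {relation}"]
    lines += [f"@attribute {name} numeric" for name in ATTRIBUTES]
    lines.append("@data")
    for station in stations:
        for (year, month), rain in sorted(station["rainfall"].items()):
            cell = "?" if rain is None else repr(float(rain))
            lines.append(
                f"{year},{month},{cell},"
                f"{station['lat']},{station['lon']},{station['alt']}"
            )
    return "\n".join(lines)


# Hypothetical stations; coordinates and rainfall values are illustrative only.
stations = [
    {"lat": 32.5, "lon": 38.2, "alt": 700,
     "rainfall": {(2016, 1): 3.5, (2016, 2): None}},
    {"lat": 32.1, "lon": 37.1, "alt": 650,
     "rainfall": {(2016, 1): 2.75}},
]
arff_text = neighbour_view(stations)
print(arff_text)
```

Passing None for every month of a target year yields rows whose rainfall cell is "?", which is the shape a file of values still to be predicted would take.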

1.4 C. Excerpt of .ARFF test file

@relation weather-project
@attribute year numeric
@attribute month numeric
@attribute rainfall numeric
@attribute latitude numeric
@attribute longitude numeric
@attribute altitude numeric
@data
2018,1,?,32.5390,38.1950,686
2018,2,?,32.5390,38.1950,686
2018,3,?,32.5390,38.1950,686
2018,4,?,32.5390,38.1950,686
2018,5,?,32.5390,38.1950,686
2018,6,?,32.5390,38.1950,686
2018,7,?,32.5390,38.1950,686
2018,8,?,32.5390,38.1950,686
2018,9,?,32.5390,38.1950,686
2018,10,?,32.5390,38.1950,686
2018,11,?,32.5390,38.1950,686
2018,12,?,32.5390,38.1950,686

Cite this article

Aljawarneh, S., Lara, J.A. & Yassein, M.B. A visual big data system for the prediction of weather-related variables: Jordan-Spain case study. Multimed Tools Appl 82, 13103–13139 (2023). https://doi.org/10.1007/s11042-020-09848-9

  • DOI: https://doi.org/10.1007/s11042-020-09848-9
