
Cloud Computing for Enabling Big Data Analysis

  • Conference paper
  • Cloud Computing and Services Science (CLOSER 2020)

Abstract

Every day, billions of people access websites, blogs, and social media. They often use mobile devices and produce huge amounts of data that can be effectively exploited to extract valuable information about human dynamics and behaviors. Such data, commonly referred to as Big Data, contain rich information about user activities, interests, and behaviors, which makes them intrinsically suited to a very large set of applications. To obtain useful information and knowledge from such data in a reasonable time, novel scalable frameworks and data analysis techniques on Cloud systems have been developed. This paper describes some recent Cloud-based frameworks and methodologies for Big Data processing that can be used for developing and executing several data analysis applications, including trajectory mining and sentiment analysis. The paper is organized in two main parts. The first part focuses on tools for developing and executing scalable data analysis applications on Clouds. The second part presents data analysis methodologies for extracting knowledge from large datasets.

Author information

Correspondence to Domenico Talia.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Belcastro, L., Marozzo, F., Talia, D., Trunfio, P. (2021). Cloud Computing for Enabling Big Data Analysis. In: Ferguson, D., Pahl, C., Helfert, M. (eds) Cloud Computing and Services Science. CLOSER 2020. Communications in Computer and Information Science, vol 1399. Springer, Cham. https://doi.org/10.1007/978-3-030-72369-9_4

  • DOI: https://doi.org/10.1007/978-3-030-72369-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72368-2

  • Online ISBN: 978-3-030-72369-9

  • eBook Packages: Computer Science, Computer Science (R0)
