Abstract
Every day, billions of people access websites, blogs, and social media. They often do so from mobile devices, producing huge amounts of data that can be exploited to extract valuable information about human dynamics and behaviors. Such data, commonly referred to as Big Data, contains rich information about user activities, interests, and behaviors, which makes it intrinsically suited to a very large set of applications. To extract valuable information and knowledge from such data in a reasonable time, novel scalable frameworks and data analysis techniques have been developed on Cloud systems. This paper describes some recent Cloud-based frameworks and methodologies for Big Data processing that can be used to develop and execute a variety of data analysis applications, including trajectory mining and sentiment analysis. The paper is organized in two main parts. The first part focuses on tools for developing and executing scalable data analysis applications on Clouds. The second part presents data analysis methodologies for extracting knowledge from large datasets.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Belcastro, L., Marozzo, F., Talia, D., Trunfio, P. (2021). Cloud Computing for Enabling Big Data Analysis. In: Ferguson, D., Pahl, C., Helfert, M. (eds) Cloud Computing and Services Science. CLOSER 2020. Communications in Computer and Information Science, vol 1399. Springer, Cham. https://doi.org/10.1007/978-3-030-72369-9_4
DOI: https://doi.org/10.1007/978-3-030-72369-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72368-2
Online ISBN: 978-3-030-72369-9
eBook Packages: Computer Science (R0)