Abstract
Big data analytics has recently emerged as an important research area, driven by the rise of user-generated content platforms. In recent years, data production has increased exponentially all over the world, a phenomenon known as big data. Social media platforms such as Facebook, Twitter, and YouTube are among the most popular sources of big data. Data from social networking sites can be very useful to companies because it offers insights into consumer preferences and changing trends, and it presents significant opportunities for companies that have implemented big data management solutions. For this reason, how to store, manage, and transform social media posts into knowledge for decision-makers has become an important research problem. At the same time, the massive amount of data that users generate on social media platforms has led to the emergence and development of new data management technologies and techniques. The data lake is one of the most recent technologies introduced to address this challenge. In this paper, we propose a NoSQL data lake design approach. More precisely, we start by introducing the main concepts of data lakes. We also discuss the advantages of data lakes and their impact on big data analytics. Then, we review some of the recent literature on data lake design approaches. Finally, we present a NoSQL data lake built on MongoDB that stores big data collected from social networks such as Facebook, Twitter, and YouTube.
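To make the idea of storing heterogeneous social media posts in a document-oriented lake more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple metadata envelope (the field names `source`, `ingested_at`, and `raw` are hypothetical) that wraps each post while leaving its original payload untouched, which is the usual data lake practice of keeping raw data in its native form. The sample posts are invented for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


def to_lake_document(source: str, raw_post: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a raw post in a small metadata envelope (hypothetical schema).

    The payload is stored unchanged under "raw", so posts with different
    structures can live side by side in one schemaless collection.
    """
    return {
        "source": source,  # platform the post came from
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # load time, UTC
        "raw": raw_post,  # original payload, not transformed
    }


# Invented sample posts with deliberately different shapes.
posts: List[Tuple[str, Dict[str, Any]]] = [
    ("twitter", {"id": "1", "text": "hello", "retweets": 3}),
    ("youtube", {"videoId": "abc", "title": "demo", "views": 120}),
]

documents = [to_lake_document(source, post) for source, post in posts]
```

With a running MongoDB instance and the `pymongo` driver, such a list of dictionaries could be passed to a collection's `insert_many` method; the connection details and collection layout would depend on the deployment and are not specified in the abstract.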
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dabbèchi, H., Haddar, N.Z., Elghazel, H., Haddar, K. (2021). NoSQL Data Lake: A Big Data Source from Social Media. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_10
DOI: https://doi.org/10.1007/978-3-030-73050-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73049-9
Online ISBN: 978-3-030-73050-5
eBook Packages: Intelligent Technologies and Robotics (R0)