Social Media Data Integration: From Data Lake to NoSQL Data Warehouse

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2020)

Abstract

As social media platforms spread through our lives, the amount of data exchanged across them has risen sharply. Data from social networking sites can be immensely useful to companies for identifying customer trends and increasing operational efficiency to gain a competitive edge. At the same time, traditional decision support systems cannot meet the growing need of the modern enterprise to integrate and analyze the wide variety of data generated by social network platforms. Such large amounts of data require new data management techniques and data storage architectures able to find information quickly in a large volume of data. In this context, a data storage concept known as the data lake has appeared as one of the latest technologies introduced to address this challenge. A data lake is a large repository that stores and manages all company data in raw form before they are integrated into the data warehouse. In this paper, we propose a new approach to designing a NoSQL data warehouse from a data lake. More precisely, we start by reviewing recent literature on NoSQL data warehouse design approaches. Then, we describe the main concepts of a NoSQL data lake that stores the big data collected from social networks such as Facebook, Twitter, and YouTube. Finally, we define a set of mapping rules to integrate social media data from the data lake into the NoSQL data warehouse based on two NoSQL logical models: column-oriented and document-oriented.
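
To make the two logical models named in the abstract more concrete, the following is a minimal illustrative sketch, not the paper's actual mapping rules, which the abstract does not spell out. It assumes a hypothetical raw post record as it might sit in a data lake and shows one plausible way to shape it as a nested document (document-oriented) and as a row key with column families (column-oriented). All field names, the `raw_post` record, and both helper functions are assumptions made for illustration.

```python
# Hypothetical raw record as it might be stored in a data lake.
# Every field name here is an assumption, not taken from the paper.
raw_post = {
    "id": "p1",
    "platform": "Twitter",
    "user": {"id": "u42", "name": "alice"},
    "created_at": "2020-12-01",
    "likes": 10,
    "shares": 3,
}


def to_document(post):
    """Document-oriented shape: one nested document per fact,
    with measures and dimensions embedded as sub-documents."""
    return {
        "_id": post["id"],
        "measures": {"likes": post["likes"], "shares": post["shares"]},
        "dimensions": {
            "platform": {"name": post["platform"]},
            "user": dict(post["user"]),
            "date": {"day": post["created_at"]},
        },
    }


def to_column_families(post):
    """Column-oriented shape: a row key plus a mapping of
    column family -> column qualifier -> value."""
    row = {
        "measures": {"likes": post["likes"], "shares": post["shares"]},
        "platform": {"name": post["platform"]},
        "user": {"id": post["user"]["id"], "name": post["user"]["name"]},
        "date": {"day": post["created_at"]},
    }
    return post["id"], row


document = to_document(raw_post)
row_key, families = to_column_families(raw_post)
```

In this sketch the document form keeps a fact and its dimensions together in one nested structure, while the column-oriented form groups attributes into families addressed by a row key; a real design would follow the rules defined in the paper itself.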

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dabbèchi, H., Haddar, N.Z., Elghazel, H., Haddar, K. (2021). Social Media Data Integration: From Data Lake to NoSQL Data Warehouse. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_64