Design and Execution of ETL Process to Build Topic Dimension from User-Generated Content

  • Conference paper
Research Challenges in Information Science (RCIS 2021)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 415)

Abstract

Recent research on multi-dimensional design has combined business data with User-Generated Content (UGC). These studies integrate new analytical aspects, such as users’ behavior, sentiments, opinions, or topics of interest, to improve decisional analysis. In this paper, we address the complexity of designing a topic dimension schema, which stems from the dynamicity and heterogeneity of its hierarchies. Prior work has partially addressed this issue by offering technical solutions for topic detection, without focusing on the Extraction, Transformation and Loading (ETL) process that allows the detected topics to be integrated into a multi-dimensional schema. Our contribution consists in modeling the ETL steps that generate valid topic dimension hierarchies from informal UGC texts. We propose a generic ETL4SocialTopic process model that defines a set of operations executed in a specific order. The implementation of these steps offers a set of customized jobs that simplify the ETL designer’s work by automating a large part of the process. Experimental results show that ETL4SocialTopic consistently produces valid topic dimension schemas in several contexts.

Notes

  1. https://www.talend.com/products/data-integration/

  2. https://developer.twitter.com/en/products/tweets

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Walha, A., Ghozzi, F., Gargouri, F. (2021). Design and Execution of ETL Process to Build Topic Dimension from User-Generated Content. In: Cherfi, S., Perini, A., Nurcan, S. (eds) Research Challenges in Information Science. RCIS 2021. Lecture Notes in Business Information Processing, vol 415. Springer, Cham. https://doi.org/10.1007/978-3-030-75018-3_25

  • DOI: https://doi.org/10.1007/978-3-030-75018-3_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75017-6

  • Online ISBN: 978-3-030-75018-3

  • eBook Packages: Computer Science, Computer Science (R0)
