Conceptual modeling of big data SPJ operations with Twitter social medium

  • Original Article
Social Network Analysis and Mining

Abstract

The rapid growth of social networks such as Facebook, Twitter and Instagram has generated, and continues to generate, a large amount of data. This data can be regarded as a gold mine for business analysts and researchers, from which insights that are useful and essential for effective decision making can be derived. However, multiple problems and challenges affect decision support systems, especially at the level of the Extraction–Transformation–Loading (ETL) processes. These processes are responsible for selecting, filtering and normalizing data from the sources so that relevant decisions can be made. In this paper, we focus on adapting the transformation phase to the MapReduce paradigm in order to process data in a distributed and parallel environment. We then put forward a conceptual model of this second phase, composed of several operations that handle NoSQL structures, which are suitable for Big Data storage. Finally, we implement our new components in Talend for Big Data; these components help the designer apply selection, projection and join operations to the data extracted from social media.
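To make the selection, projection and join (SPJ) operations mentioned in the abstract more concrete, the following is a minimal sketch of how they could be phrased as map and reduce steps. It is not the authors' Talend components or their conceptual model: the record layout, the field names (`tweet_id`, `user_id`, `lang`, `text`, `name`) and all function names are hypothetical, and the shuffle phase that a framework such as Hadoop would perform is simulated in memory.

```python
from collections import defaultdict

# Hypothetical tweet-like and user-like records; field names are illustrative only.
tweets = [
    {"tweet_id": 1, "user_id": 10, "lang": "en", "text": "big data"},
    {"tweet_id": 2, "user_id": 11, "lang": "fr", "text": "bonjour"},
    {"tweet_id": 3, "user_id": 10, "lang": "en", "text": "etl"},
]
users = [
    {"user_id": 10, "name": "alice"},
    {"user_id": 11, "name": "bob"},
]


def map_select_project(record, predicate, fields):
    """Map step: selection drops records, projection keeps only the chosen fields."""
    if predicate(record):
        yield {f: record[f] for f in fields}


def map_tag(record, key, source):
    """Map step of a reduce-side join: emit (join key, (source tag, record))."""
    yield record[key], (source, record)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(tagged):
    """Reduce step: pair every tweet with every user that shares the join key."""
    left = [rec for src, rec in tagged if src == "tweets"]
    right = [rec for src, rec in tagged if src == "users"]
    for l_rec in left:
        for r_rec in right:
            yield {**l_rec, **r_rec}


def spj(tweets, users):
    """Select English tweets, project three fields, then join with users on user_id."""
    selected = [
        out
        for t in tweets
        for out in map_select_project(
            t, lambda r: r["lang"] == "en", ["tweet_id", "user_id", "text"]
        )
    ]
    pairs = [p for t in selected for p in map_tag(t, "user_id", "tweets")]
    pairs += [p for u in users for p in map_tag(u, "user_id", "users")]
    groups = shuffle(pairs)
    return [row for key in sorted(groups) for row in reduce_join(groups[key])]


if __name__ == "__main__":
    for row in spj(tweets, users):
        print(row)
```

Selection and projection need only a map step because each record is handled independently, whereas the join needs a reduce step so that records sharing a key meet in the same place. This sketch uses a reduce-side join for simplicity; other join strategies exist and the source does not state which one the authors adopt.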

Notes

  1. https://datareportal.com/ (accessed April 2022).

  2. https://www.internetlivestats.com/.

  3. https://hadoop.apache.org/.

  4. https://www.ibm.com/topics/mapreduce.

  5. https://spark.apache.org/.

  6. https://www.uml.org/.

  7. https://www.bpmn.org/.

  8. https://www.omg.org/.

  9. https://www.proinfluent.com/nombre-utilisateurs-twitter/.

  10. https://www.omg.org/.

  11. https://www.talend.com/.

Author information

Contributions

H. Mallek and F. Ghozzi wrote the main manuscript text, and F. Gargouri prepared all figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hana Mallek.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mallek, H., Ghozzi, F. & Gargouri, F. Conceptual modeling of big data SPJ operations with Twitter social medium. Soc. Netw. Anal. Min. 13, 105 (2023). https://doi.org/10.1007/s13278-023-01112-w

  • DOI: https://doi.org/10.1007/s13278-023-01112-w
