
A Variety-Sensitive ETL Processes

  • Conference paper
Database and Expert Systems Applications (DEXA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10439)

Abstract

Nowadays, small, medium, and large companies need advanced data integration techniques, supported by tools, to analyse data in order to deliver real-time alerts, trigger automated actions, and so on. In a context of rapid technological change, these techniques have to address two main issues: (a) the variety of the huge number of data sources (e.g., traditional, semantic, and graph databases) and (b) the variety of storage platforms, since a data integration system may have several stores, each hosting a particular type of data. These issues directly impact the efficiency and the deployment flexibility of ETL (Extract, Transform, Load) processes. In this paper, we address these issues. Firstly, thanks to Model Driven Engineering, we make the different types of data sources generic. This genericity allows the ETL operators to be overloaded. To show the benefit of this genericity, several examples of instantiation are described, covering relational, semantic, and graph databases. Secondly, a Web-service-driven approach for orchestrating the ETL flows is given. Thirdly, we present a fusion procedure that merges the set of heterogeneous instances and deploys them according to their favorite stores. Finally, our findings are validated through a proof-of-concept tool using the LUBM benchmark and the YAGO \(\mathcal{KB}\), deployed in Oracle RDF Semantic Graph 12c.
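The idea of overloading one ETL operator across relational, semantic, and graph sources can be illustrated with a minimal sketch. This is not the authors' implementation: the classes `RelationalSource`, `TripleSource`, and `GraphSource`, the `extract` operator, and the `run_flow` helper are all hypothetical names chosen for illustration, and the common (entity, attribute, value) output shape is an assumption.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class RelationalSource:
    # Hypothetical: a table held as a list of row dictionaries with an "id" key.
    table: str
    rows: list


@dataclass
class TripleSource:
    # Hypothetical: a semantic source held as (subject, predicate, object) triples.
    triples: list


@dataclass
class GraphSource:
    # Hypothetical: a property graph held as {node_id: {property: value}}.
    nodes: dict


@singledispatch
def extract(source):
    """Generic extract operator; each source type registers its own overload."""
    raise TypeError(f"unsupported source type: {type(source).__name__}")


@extract.register
def _(source: RelationalSource):
    # One fact per non-key column of every row.
    for row in source.rows:
        for column, value in row.items():
            if column != "id":
                yield (f"{source.table}/{row['id']}", column, value)


@extract.register
def _(source: TripleSource):
    # Triples already have the assumed common shape.
    yield from source.triples


@extract.register
def _(source: GraphSource):
    # One fact per node property.
    for node_id, props in source.nodes.items():
        for key, value in props.items():
            yield (node_id, key, value)


def run_flow(sources):
    """Apply the same operator to every source and collect the facts in order."""
    merged = []
    for source in sources:
        merged.extend(extract(source))
    return merged
```

Calling `run_flow` with one source of each kind returns a single list of facts, so later steps of a flow need not know which kind of store each fact came from. Dispatching on the source type keeps the operator name stable while each store keeps its own extraction logic.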


Notes

  1. http://www.omg.org/mof/.

  2. https://neo4j.com/product/.

  3. http://swat.cse.lehigh.edu/projects/lubm/.

  4. www.yago-knowledge.org/.

  5. http://www.oracle.com/technetwork/java/dataaccessobject-138824.html.

  6. http://www.cytoscape.org/.


Author information

Corresponding author

Correspondence to Nabila Berkani.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Berkani, N., Bellatreche, L. (2017). A Variety-Sensitive ETL Processes. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10439. Springer, Cham. https://doi.org/10.1007/978-3-319-64471-4_17

  • DOI: https://doi.org/10.1007/978-3-319-64471-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64470-7

  • Online ISBN: 978-3-319-64471-4

  • eBook Packages: Computer Science, Computer Science (R0)
