Ontology-Driven Conceptual Design of ETL Processes Using Graph Transformations

  • Chapter

Part of the book series: Lecture Notes in Computer Science (JODS, volume 5530)

Abstract

One of the main tasks during the early steps of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the source to the target data stores. This is a challenging task, requiring first the semantic and then the structural reconciliation of the information provided by the available sources. This task is part of the Extract-Transform-Load (ETL) process, which is responsible for populating the data warehouse. In this paper, we propose a customizable and extensible ontology-driven approach for the conceptual design of ETL processes. A graph-based representation is used as a conceptual model for the source and target data stores. We then present a method for devising flows of ETL operations by means of graph transformations. In particular, the operations comprising the ETL process are derived through graph transformation rules, whose choice and applicability are determined by the semantics of the data with respect to an attached domain ontology. Finally, we present experimental findings that demonstrate the applicability of our approach.
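
As a rough illustration of the idea in the abstract, the following minimal Python sketch picks an ETL operation for each pair of source and target attributes by comparing the ontology concepts that annotate them. Everything in it is a hypothetical assumption made for illustration: the toy ontology `SUBCLASS_OF`, the `Attribute` class, the operation names `LOAD`, `FILTER` and `CONVERT`, and the three rules themselves. It is not the rule set defined in the chapter.

```python
from dataclasses import dataclass

# Hypothetical toy ontology (an assumption): child concept -> parent concept.
SUBCLASS_OF = {
    "PriceInEUR": "Price",
    "PriceInUSD": "Price",
    "HighPrice": "PriceInEUR",
}


@dataclass(frozen=True)
class Attribute:
    """A node of a data store graph, annotated with an ontology concept."""
    store: str
    name: str
    concept: str


def ancestors(concept):
    """Return the superclasses of a concept, nearest first."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def derive_operation(src, tgt):
    """Choose an illustrative ETL operation for one source/target pair."""
    src_up, tgt_up = ancestors(src.concept), ancestors(tgt.concept)
    if src.concept == tgt.concept or tgt.concept in src_up:
        return "LOAD"      # source data already satisfies the target concept
    if src.concept in tgt_up:
        return "FILTER"    # target is more specific than the source
    if set(src_up) & set(tgt_up):
        return "CONVERT"   # shared superclass, different representation
    return None            # semantically unrelated: no rule applies


def apply_rules(sources, targets):
    """Add one operation edge to the graph for every pair where a rule applies."""
    edges = []
    for s in sources:
        for t in targets:
            op = derive_operation(s, t)
            if op is not None:
                edges.append((s.name, op, t.name))
    return edges


if __name__ == "__main__":
    sources = [Attribute("S1", "cost", "PriceInUSD"),
               Attribute("S2", "amount", "PriceInEUR")]
    targets = [Attribute("DW", "price", "PriceInEUR")]
    for edge in apply_rules(sources, targets):
        print(edge)
```

In this sketch the ontology alone decides which rule fires: the same pair of attributes would yield a different operation if their annotations changed, which is the sense in which the design is ontology-driven.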

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Skoutas, D., Simitsis, A., Sellis, T. (2009). Ontology-Driven Conceptual Design of ETL Processes Using Graph Transformations. In: Spaccapietra, S., Zimányi, E., Song, IY. (eds) Journal on Data Semantics XIII. Lecture Notes in Computer Science, vol 5530. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03098-7_5

  • DOI: https://doi.org/10.1007/978-3-642-03098-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03097-0

  • Online ISBN: 978-3-642-03098-7

  • eBook Packages: Computer Science, Computer Science (R0)
