Avoiding Ontology Confusion in ETL Processes

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 539)

Abstract

Extract-Transform-Load (\(\mathcal{ETL}\)) is a crucial phase in the Data Warehouse (\(\mathcal{DW}\)) design life cycle that must cope with many issues: data provenance, data heterogeneity, process automation, data refreshment, execution time, etc. Ontologies and Semantic Web technologies have been widely used in the \(\mathcal{ETL}\) phase. "Ontology" is a buzzword used by many research communities, such as Databases, Artificial Intelligence (AI), and Natural Language Processing (NLP), and each community has its own type of ontology: conceptual canonical ontologies (for databases), conceptual non-canonical ontologies (for AI), and linguistic ontologies (for NLP). Existing \(\mathcal{ETL}\) approaches use all three types. However, these studies do not take into account the type of the ontology used, which usually affects the quality of the managed data. In this paper, we propose a semantic \(\mathcal{ETL}\) approach that considers both the canonical and non-canonical layers. To evaluate the effectiveness of our approach, experiments are conducted using Oracle semantic databases referencing the LUBM benchmark ontology.

Author information

Corresponding author

Correspondence to Selma Khouri.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Khouri, S., Abdellaoui, S., Nader, F. (2015). Avoiding Ontology Confusion in ETL Processes. In: Morzy, T., Valduriez, P., Bellatreche, L. (eds) New Trends in Databases and Information Systems. ADBIS 2015. Communications in Computer and Information Science, vol 539. Springer, Cham. https://doi.org/10.1007/978-3-319-23201-0_14

  • DOI: https://doi.org/10.1007/978-3-319-23201-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23200-3

  • Online ISBN: 978-3-319-23201-0

  • eBook Packages: Computer Science, Computer Science (R0)
