Towards a conceptualization of ETL and physical storage of semantic data warehouses as a service

Cluster Computing

Abstract

Data warehouse technology has become an indispensable tool for businesses and organizations that need to make strategic decisions to remain competitive. The construction of a data warehouse (\(\mathcal{D}\mathcal{W}\)) proceeds by selecting relevant information sources, extracting the relevant data, and loading them into the \(\mathcal{D}\mathcal{W}\). These processes require designers to have precise expertise in the logical and physical implementations of the information sources, which is rarely an easy task. The diversity and heterogeneity of information sources make the construction of the \(\mathcal{D}\mathcal{W}\) complex and time-consuming. Domain ontologies have been proposed to reduce heterogeneity between sources, platforms, services, etc.; they resolve syntactic and semantic conflicts. The adoption of domain ontologies by organizations has created a new type of database, called semantic databases (\(\mathcal{S}\mathcal{D}\mathcal{B}\)). As a consequence, these databases become candidates for building the semantic \(\mathcal{D}\mathcal{W}\) (\(\mathcal{S}\mathcal{D}\mathcal{W}\)). To handle the diversity of information sources and hide their implementation aspects, a generic framework for constructing the \(\mathcal{D}\mathcal{W}\) becomes a necessity. In this paper, we first propose an ontology-based approach for designing \(\mathcal{S}\mathcal{D}\mathcal{B}\). Secondly, the ETL phases are defined at the ontological level to hide implementation details. Thirdly, a storage service for ontologies and their associated data is given. Finally, our proposal is validated through a case study and a tool.

Notes

  1. On-Line Analytical Processing.

  2. Business Process Modeling Notation.

  3. DLR is a subset of the Description Logics (DL) formalism.

  4. http://www.omg.org/cgi-bin/doc?dtc/10-06-02.

  5. http://www.w3.org/2007/OWL/wiki/OracleOwlPrime.

  6. http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl.

Author information

Corresponding author

Correspondence to Ladjel Bellatreche.

About this article

Cite this article

Berkani, N., Bellatreche, L. & Khouri, S. Towards a conceptualization of ETL and physical storage of semantic data warehouses as a service. Cluster Comput 16, 915–931 (2013). https://doi.org/10.1007/s10586-013-0266-7

  • DOI: https://doi.org/10.1007/s10586-013-0266-7
