DOI: 10.1145/1183512.1183526
Article

Designing ETL processes using semantic web technologies

Published: 10 November 2006

Abstract

One of the most important tasks performed in the early stages of a data warehouse project is the analysis of the structure and content of the existing data sources and their intentional mapping to a common data model. Establishing the appropriate mappings between the attributes of the data sources and the attributes of the data warehouse tables is critical in specifying the required transformations in an ETL workflow. The selected data model should be suitable for facilitating the redefinition and revision efforts, typically occurring during the early phases of a data warehouse project, and serve as the means of communication between the involved parties. In this paper, we argue that ontologies constitute a very suitable model for this purpose and show how the usage of ontologies can enable a high degree of automation regarding the construction of an ETL design.

    Published In

    DOLAP '06: Proceedings of the 9th ACM international workshop on Data warehousing and OLAP
    November 2006
    110 pages
    ISBN:1595935304
    DOI:10.1145/1183512
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ETL
    2. conceptual modeling
    3. data warehousing
    4. ontologies
    5. reasoning
    6. semantic web technology
    7. transformations

    Qualifiers

    • Article

    Conference

    CIKM06: Conference on Information and Knowledge Management
    November 10, 2006
    Arlington, Virginia, USA

    Acceptance Rates

    Overall Acceptance Rate 29 of 79 submissions, 37%

    Cited By

    • (2024) An efficient hybrid optimization of ETL process in data warehouse of cloud architecture. Journal of Cloud Computing 13:1. DOI: 10.1186/s13677-023-00571-y. Online publication date: 8-Jan-2024
    • (2023) Data Integration Process Automation Using Machine Learning: Issues and Solution. Machine Learning for Data Science Handbook (39-54). DOI: 10.1007/978-3-031-24628-9_3. Online publication date: 26-Feb-2023
    • (2022) Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons. Data 7:8 (113). DOI: 10.3390/data7080113. Online publication date: 12-Aug-2022
    • (2022) Intelligent Assistance with ML in Data Mapping ETL Processing. 2022 IEEE Information Technologies & Smart Industrial Systems (ITSIS) (01-04). DOI: 10.1109/ITSIS56166.2022.10118369. Online publication date: 15-Jul-2022
    • (2022) Automated credit assessment framework using ETL process and machine learning. Innovations in Systems and Software Engineering. DOI: 10.1007/s11334-022-00522-x. Online publication date: 31-Dec-2022
    • (2021) Knowledge Graph-based Data Transformation Recommendation Engine. 2021 IEEE International Conference on Big Data (Big Data) (4617-4623). DOI: 10.1109/BigData52589.2021.9671905. Online publication date: 15-Dec-2021
    • (2021) Decision Support on the Shop Floor Using Digital Twins. Advances in Production Management Systems. Artificial Intelligence for Sustainable and Resilient Production Systems (284-292). DOI: 10.1007/978-3-030-85874-2_30. Online publication date: 31-Aug-2021
    • (2020) Data Warehouses and Big Data. International Journal of Organizational and Collective Intelligence 10:3 (1-13). DOI: 10.4018/IJOCI.2020070101. Online publication date: Jul-2020
    • (2020) Role of Machine Learning in ETL Automation. Proceedings of the 21st International Conference on Distributed Computing and Networking (1-6). DOI: 10.1145/3369740.3372778. Online publication date: 4-Jan-2020
    • (2020) Healthcare Social Data Platform Based on Linked Data and Machine Learning. Embedded Systems and Artificial Intelligence (291-304). DOI: 10.1007/978-981-15-0947-6_28. Online publication date: 8-Apr-2020
