DOI: 10.1145/1317331.1317341
research-article

Deciding the physical implementation of ETL workflows

Published: 09 November 2007

Abstract

In this paper, we deal with the problem of determining the best possible physical implementation of an ETL workflow, given its logical-level description and an appropriate cost model as inputs. We formulate the problem as a state-space problem and provide a suitable solution for this task. We further extend this technique by intentionally introducing sorter activities in the workflow in order to search for alternative physical implementations with lower cost. We experimentally assess our method based on a principled organization of test suites.
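
The idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is an exhaustive enumeration over a toy linear workflow, written only to show how each logical activity can map to several physical implementations and how an inserted sorter can unlock cheaper order-dependent ones. The activity names, the `CANDIDATES` catalogue, the cost formulas, and the constant row count `n` are all assumptions made for illustration.

```python
from itertools import product
from math import log2

# Hypothetical catalogue (not from the paper). Each logical activity maps to
# candidate physical implementations given as:
#   (name, needs_sorted_input, output_order, cost_function)
# output_order is "keep" (preserves input order), True (sorted) or False.
CANDIDATES = {
    "filter": [("scan_filter", False, "keep", lambda n: n)],
    "aggregate": [
        ("hash_aggregate", False, False, lambda n: 8 * n),
        ("sort_based_aggregate", True, True, lambda n: n),
    ],
    "join": [
        ("nested_loops_join", False, False, lambda n: 20 * n),
        ("merge_join", True, True, lambda n: n),
    ],
}


def sorter_cost(n):
    """Assumed cost of an explicitly inserted sorter activity: n * log2(n)."""
    return n * log2(n)


def plan_cost(workflow, choice, sorters, n):
    """Cost of one physical plan, or None if an order requirement is violated."""
    total = 0.0
    is_sorted = False  # assumption: the source data arrives unsorted
    for activity, impl_idx, add_sorter in zip(workflow, choice, sorters):
        if add_sorter:  # sorter placed just before this activity
            total += sorter_cost(n)
            is_sorted = True
        _, needs_sorted, output_order, cost_fn = CANDIDATES[activity][impl_idx]
        if needs_sorted and not is_sorted:
            return None
        total += cost_fn(n)
        if output_order != "keep":
            is_sorted = output_order
    return total


def best_plan(workflow, n, allow_sorters=True):
    """Enumerate every state (implementation choice x sorter placement)."""
    impl_ranges = [range(len(CANDIDATES[a])) for a in workflow]
    sorter_opts = [(False, True) if allow_sorters else (False,)] * len(workflow)
    best = None
    for choice in product(*impl_ranges):
        for sorters in product(*sorter_opts):
            cost = plan_cost(workflow, choice, sorters, n)
            if cost is not None and (best is None or cost < best[0]):
                steps = []
                for activity, idx, add_sorter in zip(workflow, choice, sorters):
                    if add_sorter:
                        steps.append("sorter")
                    steps.append(CANDIDATES[activity][idx][0])
                best = (cost, steps)
    return best


workflow = ["filter", "aggregate", "join"]
print(best_plan(workflow, 1024, allow_sorters=False))
print(best_plan(workflow, 1024, allow_sorters=True))
```

Under these made-up costs, the plan without sorters is forced onto the order-independent implementations, while allowing one sorter makes the sort-based aggregate and the merge join admissible at a lower total cost. Exhaustive enumeration grows exponentially with the number of activities, so it is only practical for toy workflows like this one.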


    Published In

    DOLAP '07: Proceedings of the ACM tenth international workshop on Data warehousing and OLAP
    November 2007
    112 pages
    ISBN:9781595938275
    DOI:10.1145/1317331
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ETL
    2. data warehousing
    3. optimization
    4. physical design

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 29 of 79 submissions, 37%
