An Efficient Heuristic for Logical Optimization of ETL Workflows

  • Conference paper
Enabling Real-Time Business Intelligence (BIRTE 2010)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 84)

Abstract

An ETL process is used to extract data from various sources, transform it and load it into a Data Warehouse. In this paper, we analyse an ETL flow and observe that only some of the dependencies in an ETL flow are essential, while others merely represent the flow of data. For linear flows, we exploit the underlying dependency graph and develop a greedy heuristic technique to determine a reordering that significantly improves the quality of the flow. Rather than adopting a state-space search approach, we use the cost functions and selectivities to determine the best option at each position in a right-to-left manner. To deal with complex flows, we identify activities that can be transferred between their linear segments and position those activities appropriately. We then use the re-orderings of the linear segments to obtain a cost-optimal, semantically equivalent flow for a given complex flow. Experimental evaluation has shown that, with the proposed techniques, ETL flows can be optimized better and with much less effort than with existing methods.
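
The right-to-left greedy idea for a linear flow can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the paper's cost functions or selection rule, so the sketch assumes a simple cost model (rows entering an activity times a per-row cost) and swaps in the classic rank criterion from predicate ordering, (selectivity - 1) / cost. The names `Activity`, `flow_cost`, `greedy_right_to_left`, and the example activities and dependency are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    cost_per_row: float  # assumed cost of processing one input row (must be > 0)
    selectivity: float   # assumed ratio of output rows to input rows


def flow_cost(order, input_rows):
    """Cost of a linear flow under the assumed model: sum of rows_in * cost_per_row."""
    total, rows = 0.0, float(input_rows)
    for act in order:
        total += rows * act.cost_per_row
        rows *= act.selectivity
    return total


def greedy_right_to_left(activities, must_precede):
    """Fill positions from last to first, one greedy choice per position.

    must_precede is a set of (a, b) name pairs standing in for the essential
    dependencies: activity a has to run before activity b.
    """
    remaining = {a.name: a for a in activities}
    placed_from_right = []
    while remaining:
        # An activity may take the rightmost free position only if no
        # still-unplaced activity is required to run after it.
        candidates = [
            a for a in remaining.values()
            if not any(x == a.name and y in remaining for x, y in must_precede)
        ]
        if not candidates:
            raise ValueError("dependencies contain a cycle")
        # Assumed criterion: the highest rank goes last, since ascending rank
        # order is best for independent activities under this cost model.
        pick = max(
            candidates,
            key=lambda a: ((a.selectivity - 1.0) / a.cost_per_row, a.name),
        )
        placed_from_right.append(pick)
        del remaining[pick.name]
    return list(reversed(placed_from_right))


# Hypothetical linear flow: an expensive lookup followed by two filters.
flow = [
    Activity("lookup", cost_per_row=5.0, selectivity=1.0),
    Activity("filter_b", cost_per_row=2.0, selectivity=0.5),
    Activity("filter_a", cost_per_row=1.0, selectivity=0.1),
]
# Only one dependency is treated as essential: filter_b needs lookup's output.
dependencies = {("lookup", "filter_b")}

reordered = greedy_right_to_left(flow, dependencies)
print([a.name for a in reordered])
print(flow_cost(flow, 1000), flow_cost(reordered, 1000))
```

In this example the highly selective `filter_a` moves to the front because no essential dependency ties it to the other activities, while `lookup` stays ahead of `filter_b`. Each position is decided by one comparison over the current candidates, which is the contrast with state-space search that the abstract draws.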

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kumar, N., Kumar, P.S. (2011). An Efficient Heuristic for Logical Optimization of ETL Workflows. In: Castellanos, M., Dayal, U., Markl, V. (eds) Enabling Real-Time Business Intelligence. BIRTE 2010. Lecture Notes in Business Information Processing, vol 84. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22970-1_6

  • DOI: https://doi.org/10.1007/978-3-642-22970-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22969-5

  • Online ISBN: 978-3-642-22970-1

  • eBook Packages: Computer Science, Computer Science (R0)
