ETLDiff: A Semi-automatic Framework for Regression Test of ETL Software

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4081)

Abstract

Modern software development methods such as Extreme Programming (XP) favor frequently repeated tests, so-called regression tests, to catch new errors when software is updated or tuned, by checking that the software still produces the right results for a reference input. Regression testing is also very valuable for Extract–Transform–Load (ETL) software, as ETL software tends to be very complex and error-prone. However, regression testing of ETL software is currently cumbersome and requires considerable manual effort. In this paper, we describe a novel, easy-to-use, and efficient semi-automatic test framework for regression testing of ETL software. By automatically analyzing the schema, the tool detects how tables are related and uses this knowledge, along with optional user specifications, to determine exactly what data warehouse (DW) data should be identical across test ETL runs, leaving out change-prone values such as surrogate keys. The framework also provides tools for quickly detecting and displaying differences between the current ETL results and the reference results. In summary, manual work for test setup is reduced to a minimum, while still ensuring an efficient testing procedure.
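
The idea of comparing DW contents while ignoring surrogate keys can be illustrated with a small sketch. This is not ETLDiff's implementation, which the abstract does not detail; it is a minimal Python example under stated assumptions: a SQLite star schema with declared foreign keys, surrogate keys taken to be exactly the dimensions' primary keys and the fact table's foreign-key columns, and hypothetical table names (`product`, `sales`) and helper names (`comparable_rows`, `diff`). The fact table is joined to its dimensions via the foreign keys found in the schema, and the resulting key-free rows of two ETL runs are compared as multisets.

```python
import sqlite3
from collections import Counter


def columns(conn, table):
    """Return (name, is_primary_key) for each column of `table`."""
    return [(r[1], r[5] > 0) for r in conn.execute(f"PRAGMA table_info({table})")]


def foreign_keys(conn, table):
    """Return sorted (local_column, referenced_table, referenced_column) triples."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})")
    return sorted((r[3], r[2], r[4]) for r in rows)


def comparable_rows(conn, fact):
    """Multiset of fact rows with surrogate keys replaced by dimension attributes."""
    fks = foreign_keys(conn, fact)
    fk_cols = {local for local, _, _ in fks}
    # Keep only fact columns that are neither primary-key nor foreign-key columns.
    select = [f'f."{name}"' for name, is_pk in columns(conn, fact)
              if not is_pk and name not in fk_cols]
    joins = []
    for i, (local, dim, ref) in enumerate(fks):
        alias = f"d{i}"
        joins.append(f'JOIN "{dim}" {alias} ON f."{local}" = {alias}."{ref}"')
        # Assumption: a dimension's primary key is its surrogate key, so skip it.
        select += [f'{alias}."{name}"' for name, is_pk in columns(conn, dim) if not is_pk]
    sql = f'SELECT {", ".join(select)} FROM "{fact}" f {" ".join(joins)}'
    return Counter(conn.execute(sql).fetchall())


def diff(reference, current, fact):
    """Return (rows missing from current, rows unexpected in current)."""
    ref, cur = comparable_rows(reference, fact), comparable_rows(current, fact)
    return (sorted((ref - cur).elements(), key=repr),
            sorted((cur - ref).elements(), key=repr))


# Hypothetical one-dimension star schema used only for this illustration.
SCHEMA = """
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (product_id INTEGER REFERENCES product(product_id), amount INTEGER);
"""


def load(products, sales):
    """Build an in-memory DW populated with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO product VALUES (?, ?)", products)
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    return conn


reference = load([(1, "apple"), (2, "pear")], [(1, 10), (2, 5)])
# Same business data, but the ETL run assigned different surrogate keys.
same = load([(7, "pear"), (9, "apple")], [(9, 10), (7, 5)])
# One measure value differs from the reference.
changed = load([(7, "pear"), (9, "apple")], [(9, 10), (7, 6)])

print(diff(reference, same, "sales"))
print(diff(reference, changed, "sales"))
```

In this sketch the first comparison reports no differences even though every surrogate key changed, while the second reports the one row whose measure differs. Snowflaked dimensions and user-specified exclusions, both of which the abstract alludes to, are left out to keep the example short.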

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Thomsen, C., Pedersen, T.B. (2006). ETLDiff: A Semi-automatic Framework for Regression Test of ETL Software. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_1

  • DOI: https://doi.org/10.1007/11823728_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37736-8

  • Online ISBN: 978-3-540-37737-5

  • eBook Packages: Computer Science (R0)
