Abstract
Modern software development methods such as Extreme Programming (XP) favor frequently repeated tests, so-called regression tests, which catch new errors when software is updated or tuned by checking that it still produces the right results for a reference input. Regression testing is also very valuable for Extract-Transform-Load (ETL) software, which tends to be complex and error-prone. However, regression testing of ETL software is currently cumbersome and requires considerable manual effort. In this paper, we describe a novel, easy-to-use, and efficient semi-automatic framework for regression testing of ETL software. By automatically analyzing the schema, the tool detects how tables are related and uses this knowledge, along with optional user specifications, to determine exactly which data warehouse (DW) data should be identical across test ETL runs, leaving out change-prone values such as surrogate keys. The framework also provides tools for quickly detecting and displaying differences between the current ETL results and the reference results. In summary, manual work for test setup is reduced to a minimum, while still ensuring an efficient testing procedure.
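The idea of comparing DW contents while leaving out surrogate keys can be illustrated with a minimal sketch. It is not the ETLDiff implementation; it only assumes a one-level star schema in SQLite in which every foreign key names its referenced column explicitly and every dimension's primary key is a surrogate key. The helper names (`foreign_keys`, `columns`, `comparable_rows`, `diff`, `build`) and the `product`/`sales` tables are hypothetical and chosen for illustration.

```python
import sqlite3
from collections import Counter

def foreign_keys(conn, table):
    """Map each foreign-key column of `table` to (referenced table, referenced column)."""
    # PRAGMA row layout: (id, seq, table, from, to, on_update, on_delete, match)
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {r[3]: (r[2], r[4]) for r in rows}

def columns(conn, table):
    """Return (name, is_primary_key) for every column of `table`."""
    # PRAGMA row layout: (cid, name, type, notnull, dflt_value, pk)
    return [(r[1], r[5] > 0) for r in conn.execute(f"PRAGMA table_info({table})")]

def comparable_rows(conn, fact):
    """Rows of `fact` with each foreign key replaced by the referenced row's non-key attributes."""
    fks = foreign_keys(conn, fact)
    select, joins = [], []
    for i, (col, is_pk) in enumerate(columns(conn, fact)):
        if col in fks:
            # Assumption: the referenced primary key is a surrogate key, so it is left out.
            dim, dim_key = fks[col]
            alias = f"d{i}"
            joins.append(f"JOIN {dim} AS {alias} ON f.{col} = {alias}.{dim_key}")
            select += [f"{alias}.{c}" for c, pk in columns(conn, dim) if not pk]
        elif not is_pk:
            select.append(f"f.{col}")
    sql = f"SELECT {', '.join(select)} FROM {fact} AS f {' '.join(joins)}"
    # A multiset makes the comparison independent of row order.
    return Counter(conn.execute(sql).fetchall())

def diff(reference, current, fact):
    """Return (rows missing from current, rows unexpected in current)."""
    ref, cur = comparable_rows(reference, fact), comparable_rows(current, fact)
    return ref - cur, cur - ref

def build(sales, first_key):
    """Stand-in for an ETL run: load `sales`, assigning surrogate keys from `first_key`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales (
            product_id INTEGER REFERENCES product(product_id),
            amount INTEGER
        );
    """)
    keys = {}
    for name, amount in sales:
        if name not in keys:
            keys[name] = first_key + len(keys)
            conn.execute("INSERT INTO product VALUES (?, ?)", (keys[name], name))
        conn.execute("INSERT INTO sales VALUES (?, ?)", (keys[name], amount))
    return conn

reference = build([("apple", 10), ("pear", 5)], first_key=1)
same_data = build([("pear", 5), ("apple", 10)], first_key=100)  # other keys and load order
changed = build([("apple", 10), ("pear", 7)], first_key=1)      # one measure differs

print(diff(reference, same_data, "sales"))
print(diff(reference, changed, "sales"))
```

In this sketch the second run counts as identical to the reference even though its surrogate keys and load order differ, while the third run is reported with one missing and one unexpected row. A snowflake schema would need the join step applied recursively, which the sketch omits.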
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thomsen, C., Pedersen, T.B. (2006). ETLDiff: A Semi-automatic Framework for Regression Test of ETL Software. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_1
DOI: https://doi.org/10.1007/11823728_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer Science (R0)