Performance Tests in Data Warehousing ETLM Process for Detection of Changes in Data Origin

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2737)

Abstract

In a data warehouse (DW) environment, when the operational environment cannot or will not report the changes that occurred in its data, controls have to be implemented to detect these changes and reflect them in the DW environment. The main scenarios are: i) it is impossible to instrument the DBMS (triggers, transaction log, stored procedures, replication, materialized views, old and new versions of data, etc.) because of security policies, data ownership, or performance issues; ii) the DBMS lacks instrumentation resources; iii) legacy technologies such as file systems or semi-structured data are in use; iv) the data resides in proprietary application databases and ERP systems. In another article [1], we presented the development and implementation of a technique for comparing database snapshots, in which signatures are used to mark and detect changes. The technique is simple and can be applied to all four scenarios above. To demonstrate the efficiency of our technique, in this article we run comparative performance tests of these approaches. We performed two benchmarks: the first using synthetic data and the second using real data from a case study in the data warehouse project developed for Rio Sul Airlines, a regional aviation company belonging to the Brazil-based Varig group. We also describe the main approaches to detecting changes in data origin.
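The abstract does not spell out how the signatures are computed, so the following is only a minimal sketch of the general idea of signature-based snapshot comparison, not the authors' implementation. It assumes that each record has a unique key, that a signature is a SHA-256 digest over the non-key column values, and that the signatures from the previous load are kept on the DW side. The function names `row_signature`, `build_signatures`, and `detect_changes` are hypothetical.

```python
import hashlib


def row_signature(row, columns):
    """Return a hex digest over the values of the given columns of one row."""
    h = hashlib.sha256()
    for col in columns:
        value = row.get(col)
        # Assumption: NULLs are mapped to a marker string before hashing.
        text = "\x00NULL" if value is None else str(value)
        # Length-prefix each value so ("ab", "c") and ("a", "bc") differ.
        h.update(f"{len(text)}:".encode("utf-8"))
        h.update(text.encode("utf-8"))
    return h.hexdigest()


def build_signatures(snapshot, key, columns):
    """Map each record key in a snapshot to the signature of its row."""
    return {row[key]: row_signature(row, columns) for row in snapshot}


def detect_changes(old_signatures, snapshot, key, columns):
    """Compare a new snapshot against the signatures stored at the last load.

    Returns the detected changes and the signatures to store for next time.
    """
    new_signatures = build_signatures(snapshot, key, columns)
    inserted = sorted(k for k in new_signatures if k not in old_signatures)
    deleted = sorted(k for k in old_signatures if k not in new_signatures)
    updated = sorted(
        k for k, sig in new_signatures.items()
        if k in old_signatures and old_signatures[k] != sig
    )
    changes = {"inserted": inserted, "updated": updated, "deleted": deleted}
    return changes, new_signatures
```

Under these assumptions only the keys and signatures of the previous snapshot need to be retained, not the full rows, and the source system needs no triggers or log access, which is what makes an approach of this kind usable in the four scenarios listed in the abstract.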


References

  1. Rocha, R.L.A., Cardoso, L.F., Souza, J.M.: An Improved Approach in Data Warehousing ETLM Process for Detection of Changes in Data Origin. COPPE/UFRJ, Report No. ES-593/03 (2003), http://www.cos.ufrj.br/publicacoes/reltec/es59303.pdf

  2. Do, L., Drew, P., Jin, W., et al.: Issues in Developing Very Large Databases. In: Proceedings of the 24th VLDB Conference, New York, USA, pp. 633–636 (August 1998)

  3. Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 1st edn. Prentice Hall Inc., New Jersey (1991)

  4. Zhuge, Y., Garcia-Molina, H., Hammer, J., et al.: View Maintenance in a Warehousing Environment. In: Proceedings of ACM SIGMOD International Conference on Management of Data, San Jose, California, USA, pp. 316–327 (June 1995)

  5. Zhuge, Y., Garcia-Molina, H., Wiener, J.L.: The Strobe Algorithms for Multi-Source Warehouse Consistency. In: Proceedings on Parallel and Distributed Information Systems, Miami Beach, Florida, USA, pp. 146–157 (December 1996)

  6. Quass, D., Widom, J.: On-Line Warehouse View Maintenance. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, USA, pp. 405–416 (May 1997)

  7. Hull, R., Zhou, G.: Towards the Study of Performance Trade-offs Between Materialized and Virtual Integrated Views. In: Proc. Workshop on Materialized Views: Techniques and Applications (VIEWS 1996), Canada, pp. 91–102 (June 1996)

  8. Quass, D., Gupta, A., Mumick, I.S., et al.: Making Views Self-Maintainable for Data Warehousing. In: Proceedings on Parallel and Distributed Information Systems, Miami Beach, Florida, USA, pp. 158–169 (December 1996)

  9. Inmon, W.H., Kelley, C.: Rdb/VMS, developing the data warehouse. QED Pub. Group, Boston (1993)

  10. Labio, W.J., Yerneni, R., Garcia-Molina, H.: Shrinking the Warehouse Update Window. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Philadelphia, USA, pp. 383–394 (June 1999)

  11. Widom, J., Ceri, S.: Active Database Systems: Triggers and Rules for Advanced Database Processing, San Francisco, California, USA (1996)

  12. Craig, R.S., Vivona, J.A., Berkovitch, D.: Microsoft Data Warehousing: Building Distributed Decision Support Systems. Wiley, New York (1999)

  13. Widom, J.: Research Problems in Data Warehousing. In: Proceedings of ACM CIKM International Conference on Management Data, USA, pp. 25–30 (November 1995)

  14. Hammer, J., Garcia-Molina, H., Widom, J., et al.: The Stanford Data Warehousing Project. IEEE Quarterly Bulletin on Data Engineering; Special Issue on Materialized Views and Data Warehousing 18(2), 41–48 (1995)

  15. Chawathe, S.S., Garcia-Molina, H.: Meaningful Change Detection in Structured Data. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Arizona, USA, pp. 26–37 (May 1997)

  16. Kimball, R.: Data Warehouse Toolkit. John Wiley & Sons, Inc., New York (1996)

  17. Kimball, R.: The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. John Wiley & Sons, Inc., New York (1998)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rocha, R.L.A., Cardoso, L.F., de Souza, J.M. (2003). Performance Tests in Data Warehousing ETLM Process for Detection of Changes in Data Origin. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2003. Lecture Notes in Computer Science, vol 2737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45228-7_14

  • DOI: https://doi.org/10.1007/978-3-540-45228-7_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40807-9

  • Online ISBN: 978-3-540-45228-7

  • eBook Packages: Springer Book Archive
