Influence of Balancing Used in a Distributed Data Warehouse on the Extraction Process

  • Conference paper
Trends in Enterprise Application Architecture (TEAA 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3888)

Abstract

A data warehouse is filled with data during the extraction process. This process is sometimes interrupted by a failure, after which the warehouse contains an incomplete data set: part of the set is missing. To load the missing part, one of the interrupted-extraction resumption algorithms is usually used. In this paper we analyze the influence of data balancing used in a distributed data warehouse on the efficiency of the extraction and resumption processes. For resumption we rely on the Design-Resume algorithm, which imposes no additional overhead on an uninterrupted extraction process. We present how the balancing is done and examine its influence on the efficiency of the ETL process. Finally, based on the results of the tests performed, we discuss the advantages and disadvantages of balancing with respect to the ETL process.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gorawski, M., Marks, P. (2006). Influence of Balancing Used in a Distributed Data Warehouse on the Extraction Process. In: Draheim, D., Weber, G. (eds) Trends in Enterprise Application Architecture. TEAA 2005. Lecture Notes in Computer Science, vol 3888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11681885_8

  • DOI: https://doi.org/10.1007/11681885_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32734-9

  • Online ISBN: 978-3-540-32735-6

  • eBook Packages: Computer Science, Computer Science (R0)
