Update Propagation in a Streaming Warehouse

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6809)

Abstract

Streaming warehouses are used to monitor complex systems such as data centers, web site complexes, and world-wide networks, gathering and correlating rich collections of events and measurements. Ideally, a streaming warehouse provides both historical data for deep analysis and real-time data for rapid response to emerging opportunities or problems. The highly temporal nature of the data and the need to support parallel processing naturally lead to extensive use of horizontal partitioning to manage base tables and layers of materialized views. In this paper, we consider the problem of determining when to propagate updates from base tables to dependent views on a partition-wise basis using autonomous updates. We provide a correctness theory for propagating updates to materialized views, simple algorithms that correctly propagate updates, and examples of algorithms that do not. We extend these results to accommodate the needs of production warehouses: repartitioning of tables, mutual consistency, and merge tables. We measure the update propagation delays incurred by two different update propagation algorithms in test and production DataDepot warehouses, and find that only those update propagation algorithms which impose no scheduling restrictions are acceptable for use in a real-time streaming warehouse.
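
To make the partition-wise propagation problem concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes each base-table partition carries a logical update timestamp and each view partition records when it was last refreshed; a view partition is flagged for recomputation when any of its source partitions changed later. The function name, the partition labels, and the timestamp scheme are all hypothetical, and the abstract does not say whether a rule this simple is among the algorithms the paper shows to be correct.

```python
from typing import Dict, List, Set


def stale_view_partitions(
    base_updated: Dict[str, int],
    view_refreshed: Dict[str, int],
    depends_on: Dict[str, Set[str]],
) -> List[str]:
    """Return view partitions whose sources changed after their last refresh.

    All names and the integer timestamp scheme are assumptions made for
    illustration; they are not taken from the paper or from DataDepot.
    """
    stale = []
    for view_part, sources in depends_on.items():
        # -1 means the view partition has never been computed.
        refreshed_at = view_refreshed.get(view_part, -1)
        # Stale if any source partition was updated later than the
        # view partition was last refreshed.
        if any(base_updated.get(src, -1) > refreshed_at for src in sources):
            stale.append(view_part)
    return sorted(stale)


# Hypothetical example: two hourly view partitions, each built from
# two 30-minute base partitions.
base_updated = {"b_0000": 5, "b_0030": 9, "b_0100": 3, "b_0130": 4}
view_refreshed = {"v_00": 7, "v_01": 6}
depends_on = {
    "v_00": {"b_0000", "b_0030"},
    "v_01": {"b_0100", "b_0130"},
}

print(stale_view_partitions(base_updated, view_refreshed, depends_on))
```

Here only `v_00` is reported, because `b_0030` was updated at time 9, after `v_00` was refreshed at time 7. Working at partition granularity means late-arriving data triggers recomputation of only the affected view partitions rather than the whole view.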

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Johnson, T., Shkapenyuk, V. (2011). Update Propagation in a Streaming Warehouse. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_8

  • DOI: https://doi.org/10.1007/978-3-642-22351-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22350-1

  • Online ISBN: 978-3-642-22351-8

  • eBook Packages: Computer Science, Computer Science (R0)
