Abstract
Streaming warehouses are used to monitor complex systems such as data centers, website complexes, and worldwide networks, gathering and correlating rich collections of events and measurements. Ideally, a streaming warehouse provides both historical data, for deep analysis, and real-time data, for rapid response to emerging opportunities or problems. The highly temporal nature of the data and the need to support parallel processing naturally lead to extensive use of horizontal partitioning to manage base tables and layers of materialized views. In this paper, we consider the problem of determining when to propagate updates from base tables to dependent views on a partition-wise basis using autonomous updates. We provide a correctness theory for propagating updates to materialized views, simple algorithms that correctly propagate updates, and examples of algorithms that do not. We extend these results to accommodate the needs of production warehouses: repartitioning of tables, mutual consistency, and merge tables. We measure the update propagation delays incurred by two different update propagation algorithms in test and production DataDepot warehouses, and find that only algorithms that impose no scheduling restrictions are acceptable for use in a real-time streaming warehouse.
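To make partition-wise update propagation concrete, here is a minimal, make-style sketch for deciding which view partitions are out of date. It is an illustration only, not the algorithm from the paper. It assumes hourly base-table partitions that feed daily view partitions and integer logical timestamps. The names `Partitioned`, `View`, `view_partition_for`, `stale_view_partitions`, and `refresh` are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: base partitions are hour buckets, view partitions are day buckets.
HOURS_PER_DAY = 24


def view_partition_for(base_hour: int) -> int:
    """Map an hourly base partition to the daily view partition it feeds (hypothetical scheme)."""
    return base_hour // HOURS_PER_DAY


@dataclass
class Partitioned:
    # partition id -> logical timestamp of its last modification
    modified: dict[int, int] = field(default_factory=dict)


@dataclass
class View(Partitioned):
    # view partition id -> {base partition id: base timestamp consumed at last refresh}
    seen: dict[int, dict[int, int]] = field(default_factory=dict)


def stale_view_partitions(base: Partitioned, view: View) -> set[int]:
    """Return view partitions whose source partitions changed since they were last computed."""
    stale = set()
    for hour, ts in base.modified.items():
        vp = view_partition_for(hour)
        # A source partition never consumed counts as timestamp -1, so it is always newer.
        if view.seen.get(vp, {}).get(hour, -1) < ts:
            stale.add(vp)
    return stale


def refresh(base: Partitioned, view: View, vp: int, clock: int) -> None:
    """Recompute one view partition (actual recomputation elided) and record what it consumed."""
    view.seen[vp] = {h: ts for h, ts in base.modified.items() if view_partition_for(h) == vp}
    view.modified[vp] = clock


if __name__ == "__main__":
    base, view = Partitioned(), View()
    base.modified[5] = 1   # hour 5 (day 0) loaded at logical time 1
    base.modified[30] = 2  # hour 30 (day 1) loaded at logical time 2
    print(sorted(stale_view_partitions(base, view)))  # [0, 1]
    refresh(base, view, 0, clock=3)
    print(sorted(stale_view_partitions(base, view)))  # [1]
    base.modified[7] = 4   # late-arriving data for hour 7 (day 0)
    print(sorted(stale_view_partitions(base, view)))  # [0, 1]
```

Because each view partition records the source timestamps it consumed, partitions can be refreshed one at a time and in any order, and the check still identifies exactly those that need recomputing. That property is in the spirit of the autonomous, unrestricted scheduling the abstract favors. The sketch ignores deleted base partitions, multi-level view hierarchies, and concurrent refreshes.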
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Johnson, T., Shkapenyuk, V. (2011). Update Propagation in a Streaming Warehouse. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_8
DOI: https://doi.org/10.1007/978-3-642-22351-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8
eBook Packages: Computer Science, Computer Science (R0)