Abstract
Materialized views can be maintained by submitting maintenance queries to the data sources. However, the query results may be erroneous due to concurrent source updates. State-of-the-art maintenance strategies typically apply compensations to resolve such conflicts and assume that all source schemata remain stable over time. In a loosely coupled dynamic environment, however, the sources may autonomously change not only their data but also their schema or semantics. Consequently, either the maintenance queries or the compensation queries may break. Unlike the compensation-based approaches found in the literature, we model the complete materialized view maintenance process as a view maintenance transaction (VM_Transaction). The anomaly problem can then be rephrased as the problem of ensuring the serializability of VM_Transactions. To achieve VM_Transaction serializability, we propose a multiversion concurrency control algorithm, called TxnWrap, which is shown to be the appropriate design for loosely coupled environments with autonomous data sources. TxnWrap is complementary to the maintenance algorithms proposed in the literature, since it removes concurrency issues from consideration, allowing the designer to focus on the maintenance logic. We present several optimizations of TxnWrap, in particular (1) space optimizations on versioned data materialization and (2) parallel maintenance scheduling. With these optimizations, TxnWrap even outperforms state-of-the-art view maintenance solutions in terms of refresh time. Further, we study several design choices of TxnWrap, each of which has advantages in certain environmental settings. A correctness proof for TxnWrap, based on transaction theory, is also provided. Finally, we have implemented TxnWrap, and the experimental results confirm that it achieves predictable performance under a varying rate of concurrency.
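The abstract does not describe TxnWrap's internals, so the following is only a generic illustration of the multiversion idea it builds on, not the authors' algorithm. It is a minimal sketch, assuming a view defined as a join of two relations R and S and a wrapper-side versioned copy of S. The names `VersionedRelation`, `snapshot`, and `join_delta` are hypothetical. The sketch shows why a maintenance query that reads S as of the version of the update it is processing is unaffected by a later concurrent update, whereas a read of the latest state picks up a tuple that the later update's own maintenance would add again.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRelation:
    """Hypothetical versioned copy of one source relation (illustration only)."""
    # Each entry is [row, created_at, deleted_at]; deleted_at stays None while the row is live.
    rows: list = field(default_factory=list)

    def insert(self, row, version):
        self.rows.append([row, version, None])

    def delete(self, row, version):
        # Mark the first live copy of `row` as deleted at `version`.
        for entry in self.rows:
            if entry[0] == row and entry[2] is None:
                entry[2] = version
                return

    def snapshot(self, version):
        """Rows visible at `version`: created at or before it and not deleted by then."""
        return [row for row, created, deleted in self.rows
                if created <= version and (deleted is None or deleted > version)]


def join_delta(delta_r, s, version):
    """Maintenance query for rows inserted into R: join them with S as of `version`."""
    s_rows = s.snapshot(version)
    return [(a, b, c) for (a, b) in delta_r for (b2, c) in s_rows if b == b2]


# Assumed scenario: S holds ("x", 5) at version 0; update 1 inserts (1, "x") into R
# at version 1; update 2 inserts ("x", 10) into S at version 2, before update 1
# has been maintained.
s = VersionedRelation()
s.insert(("x", 5), version=0)
s.insert(("x", 10), version=2)
delta_r = [(1, "x")]

isolated = join_delta(delta_r, s, version=1)   # reads S as of update 1
anomalous = join_delta(delta_r, s, version=2)  # reads the latest state of S instead
```

Here `isolated` contains only the tuple derived from the state of S that update 1 actually saw, while `anomalous` also contains the tuple caused by update 2, which a non-versioned approach would have to compensate for.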
Supplemental Material
Online Appendix to: Multiversion-based view maintenance over distributed data sources