Abstract
In this paper we address the problem of integrating independent and possibly heterogeneous data warehouses, a problem that has received little attention so far, but that arises very often in practice.
We start by tackling the basic issue of matching heterogeneous dimensions and provide a number of general properties that a dimension matching should fulfill. We then propose two different approaches to the problem of integration that try to enforce matchings satisfying these properties. The first approach refers to a scenario of loosely coupled integration, in which we just need to identify the common information between data sources and perform join operations over the original sources. The goal of the second approach is the derivation of a materialized view built by merging the sources, and refers to a scenario of tightly coupled integration in which queries are performed against the view.
We also illustrate the architecture and functionality of a practical system that we have developed to demonstrate the effectiveness of our integration strategies.
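To make the distinction between the two scenarios concrete, the following is a minimal Python sketch on toy data. It is not the paper's algorithm: the fact tables, the member names, the `match_b_to_a` mapping and both helper functions are hypothetical, and the dimension matching is simply assumed to be given. The merged view also assumes that the two sources record disjoint facts, so their measures can be summed.

```python
from collections import defaultdict

# Hypothetical fact tables from two independent data marts.
# Each row is (city member, month, measure value).
sales_a = [("Rome", "2005-01", 100), ("Milan", "2005-01", 80), ("Rome", "2005-02", 120)]
sales_b = [("ROMA", "2005-01", 40), ("TORINO", "2005-01", 30), ("ROMA", "2005-02", 60)]

# Assumed dimension matching: members of source B mapped to members of source A.
match_b_to_a = {"ROMA": "Rome"}


def aggregate(rows):
    """Sum the measure for each (city, month) cell."""
    cells = defaultdict(int)
    for city, month, value in rows:
        cells[(city, month)] += value
    return dict(cells)


def loosely_coupled_join(rows_a, rows_b, matching):
    """Join the original sources on the cells they have in common.

    Only members covered by the matching take part; the result pairs
    the measure of source A with the measure of source B.
    """
    cells_a = aggregate(rows_a)
    cells_b = aggregate((matching[c], m, v) for c, m, v in rows_b if c in matching)
    common = sorted(cells_a.keys() & cells_b.keys())
    return {cell: (cells_a[cell], cells_b[cell]) for cell in common}


def tightly_coupled_view(rows_a, rows_b, matching):
    """Materialize a single merged view over both sources.

    Matched members of B are renamed to their A counterpart, unmatched
    members are kept as they are, and the facts are unioned and summed
    (assumption: the two sources hold disjoint facts).
    """
    renamed_b = [(matching.get(c, c), m, v) for c, m, v in rows_b]
    return aggregate(list(rows_a) + renamed_b)


joined = loosely_coupled_join(sales_a, sales_b, match_b_to_a)
view = tightly_coupled_view(sales_a, sales_b, match_b_to_a)
```

In this sketch `joined` contains only the cells for Rome, the one member both sources share, while `view` also retains Milan and TORINO, which appear in a single source. Queries in the first scenario go back to the original sources each time; in the second they run against `view`.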
References
Abelló, A., Samos, J., Saltor, F.: On relationships offering new drill-across possibilities. In: Proc. of 5th ACM Int. Workshop on Data Warehousing and OLAP (DOLAP 2002), pp. 7–13, 2002
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison–Wesley, Reading (1995)
Aho, A.V., Sagiv, Y., Ullman, J.D.: Efficient optimization of a class of relational expressions. ACM Trans. Database Syst. 4(4), 435–454 (1979)
Atzeni, P., Ceri, S., Paraboschi, S., Torlone, R.: Database Systems: Concepts, Languages and Architectures. McGraw–Hill, New York (1999)
Batini, C., Lenzerini, M., Navathe, S.B.: A comparative analysis of methodologies for database scheme integration. ACM Comput. Surv. 18(4), 323–364 (1986)
Cabibbo, L., Torlone, R.: A logical approach to multidimensional databases. In: Proc. of 6th Int. Conference on Extending Database Technology (EDBT’98), pp. 183–197. Springer, Berlin (1998)
Cabibbo, L., Torlone, R.: From a procedural to a visual query language for OLAP. In: Proc. of 10th Int. Conference on Scientific and Statistical Database Management (SSDBM’98), pp. 74–83, 1998
Cabibbo, L., Torlone, R.: On the integration of autonomous data marts. In: Proc. of 16th Int. Conference on Scientific and Statistical Database Management (SSDBM’04), pp. 223–234, 2004
Cabibbo, L., Torlone, R.: Integrating heterogeneous multidimensional databases. In: Proc. of 17th Int. Conference on Scientific and Statistical Database Management (SSDBM’05), pp. 205–214, 2005
Cabibbo, L., Panella, I., Torlone, R.: DaWaII: a tool for the integration of autonomous data marts. In: Proc. of 22nd Int. Conference on Data Engineering (ICDE’06), Demo session, 2006
Elmagarmid, A., Rusinkiewicz, M., Sheth, A.: Management of Heterogeneous and Autonomous Database Systems. Morgan Kaufmann, San Mateo (1999)
Fellbaum, C. (ed.): WordNet: a Lexical Database for the English Language. MIT Press, Cambridge (1998)
Gal, A., Anaby-Tavor, A., Trombetta, A., Montesi, D.: A framework for modeling and evaluating automatic semantic reconciliation. VLDB J. 14(1), 50–67 (2005)
Honeyman, P.: Testing satisfaction of functional dependencies. J. ACM 29(3), 668–677 (1982)
Hull, R.: Managing semantic heterogeneity in databases: a theoretical perspective. In: Proc. of 16th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems, pp. 51–61, 1997
Jensen, M.R., Møller, T.H., Pedersen, T.B.: Specifying OLAP Cubes on XML Data. J. Intell. Inf. Syst. 17(2-3), 255–280 (2001)
Kimball, R., Ross, M.: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2nd edn. Wiley, New York (2002)
Lenzerini, M.: Data integration: a theoretical perspective. In: Proc. of 21st ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems, pp. 233–246, 2002
Maier, D., Mendelzon, A.O., Sagiv, Y.: Testing implications of data dependencies. ACM Trans. Database Syst. 4(4), 455–468 (1979)
Malvestuto, F.M.: The classification problem with semantically heterogeneous data. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 157–176, 1988
Malvestuto, F.M., Zuffada, C.: The derivation problem for summary data. In: Proc. of 4th Int. Conference on Scientific and Statistical Database Management (SSDBM’88), pp. 82–89, 1988
Miller, R.J. (ed.): Special issue on integration management. IEEE Bull. Tech. Comm. Data Eng. 25(3) (2002)
Miller, R.J., Hernández, M.A., Haas, L.M., Yan, L., Ho, C.T.H., Fagin, R., Popa, L.: The Clio project: managing heterogeneity. SIGMOD Rec. 30(1), 78–83 (2001)
Pedersen, T.B., Shoshani, A., Gu, J., Jensen, C.S.: Extending OLAP querying to external object databases. In: Proc. of 9th Int. Conference on Information and Knowledge Management, pp. 405–413, 2000
Pedersen, D., Riis, K., Pedersen, T.B.: XML-Extended OLAP Querying. In: Proc. of 14th Int. Conference on Scientific and Statistical Database Management (SSDBM’02), pp. 195–206, 2002
Redner, R., Walker, H.: Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26(2), 195–239 (1984)
Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)
Sato, H.: Handling summary information in a database: derivability. In: Proc. of ACM SIGMOD International Conference on Management of Data, pp. 98–107, 1981
Torlone, R.: Conceptual models for multidimensional databases. In: M. Rafanelli (ed.) Multidimensional Databases, pp. 69–90, Idea Group Publ. (2002)
Torlone, R., Panella, I.: Design and development of a tool for integrating heterogeneous data warehouses. In: Proc. of 7th Int. Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), pp. 105–114, 2005
Yin, X., Pedersen, T.B.: Evaluating XML-extended OLAP queries based on a physical algebra. In: Proc. of 7th ACM Int. Workshop on Data Warehousing and OLAP (DOLAP’04), pp. 73–82, 2004
Additional information
A preliminary version of this paper appeared, under the title “Integrating Heterogeneous Multidimensional Databases” [9], in the 17th Int. Conference on Scientific and Statistical Database Management, 2005.
Cite this article
Torlone, R. Two approaches to the integration of heterogeneous data warehouses. Distrib Parallel Databases 23, 69–97 (2008). https://doi.org/10.1007/s10619-007-7022-z
DOI: https://doi.org/10.1007/s10619-007-7022-z