Abstract
In today’s integrating information systems, data fusion, i.e., the merging of multiple tuples about the same real-world object into a single tuple, is left to ETL tools and other specialized software. While much attention has been paid to architecture, query languages, and query execution, the final step of actually fusing data from multiple sources into a consistent and homogeneous set is often ignored.
This paper states the formal problem of data fusion in relational databases and discusses which parts of the problem can already be solved with standard SQL. To bridge the remaining gap, we propose the SQL Fuse By statement and define its syntax and semantics. A first implementation of the statement in a prototype database system shows the usefulness and feasibility of the new operator.
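To make concrete which part of the problem standard SQL can already cover, the following is a minimal sketch, not the Fuse By statement proposed in the paper. It assumes two hypothetical source tables (`source_a`, `source_b`) with an illustrative schema (`name`, `age`, `city`), assumes that `name` identifies the real-world object, and uses `MAX` as a stand-in conflict resolution function. All of these names and choices are assumptions made for illustration only.

```python
import sqlite3

def fuse_with_standard_sql(rows_a, rows_b):
    """Fuse tuples from two sources using only UNION ALL, GROUP BY, and MAX.

    Hypothetical schema: (name, age, city); 'name' is assumed to identify
    the real-world object. This is an approximation in plain SQL, not the
    paper's Fuse By operator.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_a (name TEXT, age INTEGER, city TEXT)")
    conn.execute("CREATE TABLE source_b (name TEXT, age INTEGER, city TEXT)")
    conn.executemany("INSERT INTO source_a VALUES (?, ?, ?)", rows_a)
    conn.executemany("INSERT INTO source_b VALUES (?, ?, ?)", rows_b)

    # UNION ALL collects every tuple from both sources. GROUP BY then
    # produces one tuple per object. MAX ignores NULLs, so a missing value
    # in one source is filled in from the other; if both sources hold a
    # value, the larger one is kept (an arbitrary resolution strategy).
    query = """
        SELECT name, MAX(age) AS age, MAX(city) AS city
        FROM (
            SELECT name, age, city FROM source_a
            UNION ALL
            SELECT name, age, city FROM source_b
        ) AS all_tuples
        GROUP BY name
        ORDER BY name
    """
    fused = conn.execute(query).fetchall()
    conn.close()
    return fused

# Illustrative data: Alice appears in both sources with conflicting ages.
ROWS_A = [("Alice", 23, None), ("Bob", None, "Berlin")]
ROWS_B = [("Alice", 24, "Tartu"), ("Carol", 30, None)]
FUSED = fuse_with_standard_sql(ROWS_A, ROWS_B)
```

With the sample rows, the two tuples about Alice collapse into one, with the missing city filled in and the age conflict settled by `MAX`. The sketch also hints at the limits of this approach: the resolution strategy is restricted to the built-in aggregates, and the sources must first be brought to a common schema by hand.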
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bleiholder, J., Naumann, F. (2005). Declarative Data Fusion – Syntax, Semantics, and Implementation. In: Eder, J., Haav, HM., Kalja, A., Penjam, J. (eds) Advances in Databases and Information Systems. ADBIS 2005. Lecture Notes in Computer Science, vol 3631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11547686_5
DOI: https://doi.org/10.1007/11547686_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28585-4
Online ISBN: 978-3-540-31895-8
eBook Packages: Computer Science, Computer Science (R0)