Data fusion

Published: 15 January 2009

Abstract

The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation.

This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.
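As a rough illustration of the goals named above, the following minimal sketch fuses records that are already known to describe the same object. It is not the article's method: it assumes duplicate detection has already been done and is reflected in a shared `id` field, it treats `None` as a missing (uncertain) value, and it resolves conflicting values with a simple majority vote. The names `fuse`, `most_frequent`, and the sample records are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


def most_frequent(values: List[str]) -> str:
    """Illustrative conflict resolution: majority vote, ties go to the first value seen."""
    counts: Dict[str, int] = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return max(counts, key=counts.get)


def fuse(records: List[Record], key: str,
         resolve: Callable[[List[str]], str] = most_frequent) -> List[Record]:
    """Fuse records sharing the same key value into one record per object.

    Assumption: records with equal `key` describe the same real-world object.
    """
    groups: Dict[Optional[str], List[Record]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    fused: List[Record] = []
    for object_id, group in groups.items():
        # Complete: collect every attribute that any source provides.
        attributes: List[str] = []
        for record in group:
            for attribute in record:
                if attribute != key and attribute not in attributes:
                    attributes.append(attribute)

        # Concise: exactly one output record per object.
        result: Record = {key: object_id}
        for attribute in attributes:
            # Uncertain values (None or absent) are ignored when another source has a value.
            values = [r[attribute] for r in group if r.get(attribute) is not None]
            # Consistent: conflicting values are reduced to a single one.
            result[attribute] = resolve(values) if values else None
        fused.append(result)
    return fused


records = [
    {"id": "p1", "name": "Ada Lovelace", "city": None, "phone": "555-0100"},
    {"id": "p1", "name": "Ada Lovelace", "city": "London"},
    {"id": "p1", "name": "A. Lovelace", "city": "London", "phone": "555-0100"},
    {"id": "p2", "name": "Alan Turing", "city": None},
]

result = fuse(records, key="id")
for row in result:
    print(row)
```

Passing a different `resolve` function (for example, one that prefers the longest value or a trusted source) changes the conflict-handling strategy without touching the grouping logic.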

References

  1. Adali, S., Candan, K. S., Papakonstantinou, Y., and Subrahmanian, V. S. 1996. Query caching and optimization in distributed mediator systems. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, 137--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agrawal, P., Benjelloun, O., Sarma, A. D., Hayworth, C., Nabar, S. U., Sugihara, T., and Widom, J. 2006. Trio: A system for data, uncertainty, and lineage. In Proceedings of the International Conference on Very Large Databases (VLDB), 1151--1154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Ahmed, R., De Smedt, P., Du, W., Kent, W., Ketabchi, M. A., Litwin, W. A., Rafii, A., and Shan, M.-C. 1991. The Pegasus heterogeneous multidatabase system. IEEE Comput. 24, 12, 19--27. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Ambite, J. L., Ashish, N., Barish, G., Knoblock, C. A., Minton, S., Modi, P. J., Muslea, I., Philpot, A., and Tejada, S. 1998. Ariadne: A system for constructing mediators for Internet sources. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, 561--563. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Ambite, J. L., Knoblock, C. A., Muslea, I., and Philpot, A. G. 2001. Compiling source descriptions for efficient and flexible information integration. J. Intell. Inf. Syst. 16, 2, 149--187. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Arenas, M., Bertossi, L. E., and Chomicki, J. 1999. Consistent query answers in inconsistent databases. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM Press, 68--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Arens, Y., Knoblock, C. A., and Shen, W.-M. 1996. Query reformulation for dynamic information integration. J. Intell. Inf. Syst. 6, 2-3 (June), 99--130. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Batini, C., Lenzerin, M., and Navathe, S. B. 1986. A comparative analysis of methodologies for database schema integration. ACM Comput. Surv. 18, 4, 323--364. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Bayardo, Jr., R. J., Bohrer, W., Brice, R., Cichocki, A., Fowler, J., Helal, A., Kashyap, V., Ksiezyk, T., Martin, G., Nodine, M., Rashid, M., Rusinkiewicz, M., Shea, R., Unnikrishnan, C., Unruh, A., and Woelk, D. 1997. InfoSleuth: Agent-Based semantic integration of information in open and dynamic environments. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, New York, 195--206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Belcastro, V., Dutkowski, A., Kaminski, W., Kowalewski, M., Mallamaci, C. L., Meszyk, S., Mostardi, T., Scrocco, F. P., Staniszkis, W., and Turco, G. 1988. An overview of the distributed query system DQS. In Proceedings of the International Conference on Extending Database Technology (EDBT). Springer, 170--189. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Benjelloun, O., Sarma, A. D., Hayworth, C., and Widom, J. 2006. An introduction to ULDBs and the Trio system. IEEE Data Eng. Bull. 29, 1, 5--16.Google ScholarGoogle Scholar
  12. Berlin, J. and Motro, A. 2006. Tuplerank: Ranking discovered content in virtual databases. In Proceedings of the International Workshop on Next Generation Information on Technology and Systems (NGITS), 13--25.Google ScholarGoogle Scholar
  13. Bertossi, L. E., Bravo, L., Franconi, E., and Lopatenko, A. 2005. Complexity and approximation of fixing numerical attributes in databases under integrity constraints. In Proceedings of the International Conference on Database Programming Languages (DBPL), 262--278.Google ScholarGoogle Scholar
  14. Bertossi, L. E. and Chomicki, J. 2003. Query answering in inconsistent databases. In Logics for Emerging Applications of Databases, 43--83.Google ScholarGoogle Scholar
  15. Bilke, A., Bleiholder, J., Böhm, C., Draba, K., Naumann, F., and Weis, M. 2005. Automatic data fusion with HumMer. In Proceedings of the International Conference on Very Large Databases (VLDB), 1251--1254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Bilke, A. and Naumann, F. 2005. Schema matching using duplicates. In Proceedings of the International Conference on Data Engineering (ICDE), 69--80. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Bleiholder, J. and Naumann, F. 2005. Declarative data fusion—Syntax, semantics, and implementation. In Proceedings of the East European Conference on Advances in Databases and Information Systems (ADBIS), 58--73.Google ScholarGoogle Scholar
  18. Bleiholder, J. and Naumann, F. 2006. Conflict handling strategies in an integrated information system. In Proceedings of the IJCAI Workshop on Information on the Web (IIWeb).Google ScholarGoogle Scholar
  19. Bohannon, P., Flaster, M., Fan, W., and Rastogi, R. 2005. A cost-based model and effective heuristic for repairing constraints by value modification. In Proceedings of the ACM International Conference on Management of Data SIGMOD, 143--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Brill, D., Templeton, M., and Yu, C. T. 1984. Distributed query processing strategies in Mermaid, a frontend to data management systems. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society, 211--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Brzezinski, Z., Getta, J. R., Rybnik, J., and Stepniewski, W. 1984. Unibase—An integrated access to databases. In Proceedings of the International Conference on Very Large Databases (VLDB). Morgan Kaufmann, San Francisco, CA, 388--396. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Burdick, D., Deshpande, P., Jayram, T. S., Ramakrishnan, R., and Vaithyanathan, S. 2005. OLAP over uncertain and imprecise data. In Proceedings of the International Conference on Very Large Databases (VLDB), 970--981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Calmet, J., Jekutsch, S., and Schü, J. 1997. A generic query-translation framework for a mediator architecture. In Proceedings of the International Conference on Data Engineering (ICDE), W. A. Gray and P.-Å. Larson, Eds. IEEE Computer Society, 434--443. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Calmet, J. and Kullmann, P. 1999. Meta Web search with KOMET. In Proceedings of the Workshop on Intelligent Information Integration.Google ScholarGoogle Scholar
  25. Calvanese, D., Giacomo, G. D., Lembo, D., Lenzerini, M., and Rosati, R. 2005. Inconsistency tolerance in P2P data integration: An epistemic logic approach. In Proceedings of the International Conference on Database Programming Languages (DBPL).Google ScholarGoogle Scholar
  26. Caroprese, L., Greco, S., Trubitsyna, I., and Zumpano, E. 2006. Preferred generalized answers for inconsistent databases. In Proceedings of the Internation Symposium on Methodologies for Information Systems (ISMIS), 344--349.Google ScholarGoogle Scholar
  27. Caroprese, L. and Zumpano, E. 2006. A framework for merging, repairing and querying inconsistent databases. In Proceedings of the East European Conference on Advances in Databases and Information Systems (ADBIS), 383--398.Google ScholarGoogle Scholar
  28. Chaudhuri, S., Ganjam, K., Ganti, V., Kapoor, R., Narasayya, V., and Vassilakis, T. 2005. Data cleaning in Microsoft SQL Server 2005. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, New York, 918--920. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Chomicki, J., Marcinkowski, J., and Staworko, S. 2004a. Computing consistent query answers using conflict hypergraphs. In Proceedings of the Internation Conference on Information and Knowledge Management (CIKM). ACM Press, New York, 417--426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Chomicki, J., Marcinkowski, J., and Staworko, S. 2004b. Hippo: A system for computing consistent answers to a class of SQL queries. In Proceedings of the International Conference on Extending Database Technology (EDBT), 841--844.Google ScholarGoogle Scholar
  31. Cody, W. F., Haas, L. M., Niblack, W., Arya, M., Carey, M. J., Fagin, R., Flickner, M., Lee, D., Petkovic, D., Schwarz, P. M., Thomas, J., Roth, M. T., Williams, J. H., and Wimmers, E. L. 1995. Querying multimedia data from multiple repositories by content: The Garlic project. In Proceedings of the IFIP Working Conference on Visual Database Systems (VDB-3). Chapman & Hall, Ltd., 17--35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Cohen, S. and Sagiv, Y. 2005. An incremental algorithm for computing ranked full disjunctions. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM Press, New York, 98--107. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Collet, C., Huhns, M. N., and Shen, W.-M. 1991. Resource integration using a large knowledge base in Carnot. IEEE Comput. 24, 12, 55--62. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Connors, T., Hasan, W., Kolovson, C., Neimat, M.-A., Schneider, D., and Wilkinson, K. 1991. The Papyrus integrated data server. In Proceedings of the International Conference on Parallel and Distributed Information Systems. IEEE Computer Society Press, 139--141. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Dayal, U. 1983. Processing queries over generalization hierarchies in a multidatabase system. In Proceedings of the International Conference on Very Large Databases (VLDB), 342--353. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Dayal, U. and Hwang, H.-Y. 1984. View definition and generalization for database system integration in a multidatabase system. IEEE Trans. Softw. Eng. 10, 6 (Nov.), 628--645.Google ScholarGoogle Scholar
  37. DeMichiel, L. G. 1989. Resolving database incompatibility: An approach to performing relational operations over mismatched domains. IEEE Trans. Knowl. Data Eng. 1, 4, 485--493. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Dittrich, K. R. and Domenig, R. 1999. Towards exploitation of the data universe: Database technology for comprehensive query services. In Proceedings of the International Conference on Business Infromation Systems (BIS).Google ScholarGoogle Scholar
  39. Domenig, R. and Dittrich, K. R. 1999. An overview and classification of mediated query systems. SIGMOD Rec. 28, 3, 63--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Draper, D., Halevy, A. Y., and Weld, D. S. 2001a. The Nimble integration engine. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, New York, 567--568. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Draper, D., Halevy, A. Y., and Weld, D. S. 2001b. The Nimble XML data integration system. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society, 155--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Dwyer, P. and Larson, J. 1987. Some experiences with a distributed database testbed system. Proc. IEEE 75, 5 (May), 633--648.Google ScholarGoogle ScholarCross RefCross Ref
  43. Eiter, T., Fink, M., Greco, G., and Lembo, D. 2003. Efficient evaluation of logic programs for querying data integration systems. In Proceedings of the International Conference on Logic Programming (ICLP), 163--177.Google ScholarGoogle Scholar
  44. Fagin, R., Kolaitis, P. G., and Popa, L. 2005. Data exchange: Getting to the core. Trans. Dat. Syst. 30, 1, 174--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Flesca, S., Furfaro, F., and Parisi, F. 2005. Consistent query answers on numerical databases under aggregate constraints. In Proceedings of the International Conference on Database Programming Languages (DBPL), 279--294.Google ScholarGoogle Scholar
  46. Fuxman, A., Fazli, E., and Miller, R. J. 2005a. ConQuer: Efficient management of inconsistent databases. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, New York, 155--166. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Fuxman, A., Fuxman, D., and Miller, R. J. 2005b. ConQuer: A system for efficient querying over inconsistent databases. In Proceedings of the International Conference on Very Large Databases (VLDB), 1354--1357. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Galhardas, H., Florescu, D., Shasha, D., and Simon, E. 2000a. AJAX: An extensible data cleaning tool. In Proceedings of the ACM International Conference on Management of Data SIGMOD, W. Chen et al., 590. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Galhardas, H., Florescu, D., Shasha, D., and Simon, E. 2000b. An extensible framework for data cleaning. In Proceedings of the International Conference on Data Engineering (ICDE), 312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Galhardas, H., Florescu, D., Shasha, D., Simon, E., and Saita, C.-A. 2001. Declarative data cleaning: Language, model, and algorithms. In Proceedings of the International Conference on Very Large Databases (VLDB), 371--380. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Galindo-Legaria, C. A. 1994. Outerjoins as disjunctions. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, 348--358. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Garcia-Molina, H., Papakonstantinou, Y., Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., Vassalos, V., and Widom, J. 1997. The TSIMMIS approach to mediation: Data models and languages. J. Intell. Inf. Syst. 8, 2, 117--132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Genesereth, M. R., Keller, A. M., and Duschka, O. M. 1997. Infomaster: An information integration system. In Proceedings of the ACM International Conference on Management of Data SIGMOD, 539--542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Greco, S., Pontieri, L., and Zumpano, E. 2001. Integrating and managing conflicting data. In Revised Papers from the 4th International Andrei Ershov Memorial Conference on Perspectives of System Informatics. Springer, 349--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Haas, L. M., Kodali, P., Rice, J. E., Schwarz, P. M., and Swope, W. C. 2000. Integrating life sciences data-with a little Garlic. In Proceedings of the IEEE International Conference on Bioinformatics and Bio Engineering (BIBE). IEEE Computer Society, 5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Halevy, A. Y., Ashish, N., Bitton, D., Carey, M. J., Draper, D., Pollock, J., Rosenthal, A., and Sikka, V. 2005. Enterprise information integration: Successes, challenges and controversies. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, New York, 778--787. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Hammer, J., McHugh, J., and Garcia-Molina, H. 1997. Semistructured data: The TSIMMIS experience. In Proceedings of the East European Conference on Advances in Databases and Information Systems (ADBIS), 1--8.Google ScholarGoogle Scholar
  58. Hernández, M. A. and Stolfo, S. J. 1998. Real-World data is dirty: Data cleansing and the merge/purge problem. Data Mining Knowl. Discov. 2, 1, 9--37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Ives, Z. G., Florescu, D., Friedman, M., Levy, A. Y., and Weld, D. S. 1999. An adaptive query execution system for data integration. In Proceedings of the ACM International Conference on Management of Data SIGMOD, 299--310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Ives, Z. G., Khandelwal, N., Kapur, A., and Cakir, M. 2005. ORCHESTRA: Rapid, collaborative sharing of dynamic data. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 107--118.Google ScholarGoogle Scholar
  61. Jakobson, G., Piatetsky-Shapiro, G., Lafond, C., Rajinikanth, M., and Hernandez, J. 1988. CALIDA: A knowledge-based system for integrating multiple heterogeneous databases. In Proceedings of the 3rd International Conference on Data and Knowledge Bases: Improving Usability and Responsiveness, 3--18.Google ScholarGoogle Scholar
  62. Josifovski, V., Schwarz, P., Haas, L., and Lin, E. 2002. Garlic: A new flavor of federated query processing for DB2. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, 524--532. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Kent, W., Ahmed, R., Albert, J., Ketabchi, M. A., and Shan, M.-C. 1992. Object identification in multidatabase systems. In Proceedings of the IFIP WG 2.6 Database Semantics Conference on Interoperable Database Systems (DS-5), 313--330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Kim, W., Choi, B.-J., Hong, E.-K., Kim, S.-K., and Lee, D. 2003. A taxonomy of dirty data. Data Mining Knowl. Discov. 7, 1, 81--99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Knoblock, C. A. 1995. Planning, executing, sensing, and replanning for information gathering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), C. Mellish, ed. Morgan Kaufmann, San Francisco, CA, 1686--1693.Google ScholarGoogle Scholar
  66. Knoblock, C. A., Minton, S., Ambite, J. L., Ashish, N., Modi, P. J., Muslea, I., Philpot, A. G., and Tejada, S. 1998. Modeling Web sources for information integration. In Proceedings of the National Conference on Artificial Intelligence (AAAI). American Association for Artificial Intelligence, Menlo Park, CA, 211--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Kwok, C. T. and Weld, D. S. 1996. Planning to gather information. In Proceedings of the National Conference on Artificial Intelligence (AAAI). AAAI/MIT Press, Portland, 32--39.Google ScholarGoogle Scholar
  68. Landers, T. and Rosenberg, R. L. 1982. An overview of MULTIBASE. In Proceedings of the 2nd International Symposium on Distributed Data Bases, H. J. Schneider, ed. North Holland, Berlin.Google ScholarGoogle Scholar
  69. Lembo, D., Lenzerini, M., and Rosati, R. 2002. Source inconsistency and incompleteness in data integration. In Proceedings of the International Workshop on Knowledge Representation Meets Databases (KRDB).Google ScholarGoogle Scholar
  70. Lenat, D. B., Guha, R. V., Pittman, K., Pratt, D., and Shepherd, M. 1990. CYC: Toward programs with common sense. Commun. ACM 33, 8, 30--49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Leone, N., Greco, G., Ianni, G., Lio, V., Terracina, G., Eiter, T., Faber, W., Fink, M., Gottlob, G., Rosati, R., Lembo, D., Lenzerini, M., Ruzzi, M., Kalka, E., Nowicki, B., and Staniszkis, W. 2005. The INFOMIX system for advanced integration of incomplete and inconsistent data. In Proceedings of the ACM International Conference on Management of Data SIGMOD, 915--917. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Levenshtein, V. 1965. Binary codes capable of correcting spurious insertions and deletions of ones. Problems Inf. Transm. 1, 8--17.Google ScholarGoogle Scholar
  73. Levy, A. Y., Rajaraman, A., and Ordille, J. J. 1996a. Querying heterogeneous information sources using source descriptions. In Proceedings of the International Conference on Very Large Databases (VLDB). Morgan Kaufmann, 251--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Levy, A. Y., Rajaraman, A., and Ordille, J. J. 1996b. The World Wide Web as a collection of views: Query processing in the information manifold. In Proceedings of the SIGMOD Workshop on Materialized Views: Techniques and Applications (VIEW), 43--55.Google ScholarGoogle Scholar
  75. Lim, E.-P., Cao, Y., and Chiang, R. H. L. 1997. Source-Aware multidatabase query processing. In Proceedings of the Workshop on Engineering Federated Information Database Systems (EFDBS), 69--80.Google ScholarGoogle Scholar
  76. Lim, E.-P., Srivastava, J., and Hwang, S.-Y. 1995. An algebraic transformation framework for multidatabase queries. Distrib. Parallel Databases 3, 3, 273--307. Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Lim, E.-P., Srivastava, J., and Shekhar, S. 1994. Resolving attribute incompatibility in database integration: An evidential reasoning approach. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society, 154--163. Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Litwin, W. 1985. An overview of the multidatabase system MRDSM. In Proceedings of the ACM Annual Conference on the Range of Computing: Mid-80's Perspective. ACM Press, New York, 524--533. Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Litwin, W. and Abdellatif, A. 1987. An overview of the multi-database manipulation language MDSL. Proc. IEEE 75, 5 (May), 621--632.Google ScholarGoogle ScholarCross RefCross Ref
  80. Litwin, W., Boudenant, J., Esculier, C., Ferrier, A., Glorieux, A. M., Chimia, J. L., Kabbaj, K., Moulinoux, C., Rolin, P., and Stangret, C. 1982. SIRIUS system for distributed data management. In Distributed Databases. North-Holland, Amsterdam, The Netherlands, 311--343.Google ScholarGoogle Scholar
  81. Liu, L. and Pu, C. 1995. The distributed interoperable object model and its application to large-scale interoperable database systems. In Proceedings of the Internation Conference on Information and Knowledge Management (CIKM). ACM Press, New York, 105--112. Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. McHugh, J., Abiteboul, S., Goldman, R., Quass, D., and Widom, J. 1997. Lore: A database management system for semistructured data. SIGMOD Rec. 26, 3, 54--66. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Melnik, S., Bernstein, P. A., Halevy, A., and Rahm, E. 2005. Supporting executable mappings in model management. In Proceedings of the ACM International Conference on Management of Data SIGMOD, 167--178. Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Mena, E., Kashyap, V., Sheth, A. P., and Illarramendi, A. 1996. OBSERVER: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. In Proceedings of the IFCIS Conference on Cooperative Information Systems (CoopIS), 14--25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Miller, R. J., Ioannidis, Y. E., and Ramakrishnan, R. 1993. The use of information capacity in schema integration and translation. In Proceedings of the International Conference on Very Large Databases (VLDB), R. Agrawal et al., eds. Morgan Kaufmann, 120--133. Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Motro, A. 1986. Completeness information and its application to query processing. In Proceedings of the International Conference on Very Large Databases (VLDB), 170--178. Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. Motro, A. 1999. Multiplex: A formal model for multidatabases and its implementation. In Proceedings of the International Workshop on Next Generation Information on Technology and Systems (NGITS). Springer, 138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. Motro, A. and Anokhin, P. 2006. Fusionplex: Resolution of data inconsistencies in the integration of heterogeneous information sources. Inf. Fusion 7, 2, 176--196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. Motro, A., Anokhin, P., and Acar, A. C. 2004. Utility-Based resolution of data inconsistencies. In Proceedings of the International Workshop on Information Qualities in Information Systems (IQIS). ACM Press, 35--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. Motro, A., Berlin, J., and Anokhin, P. 2004. Multiplex, Fusionplex, and Autoplex—Three generations of information integration. SIGMOD Rec. 33, 4, 51--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Naumann, F., Bilke, A., Bleiholder, J., and Weis, M. 2006. Data fusion in three steps: Resolving schema, tuple, and value inconsistencies. IEEE Data Eng. Bull. 29, 2, 21--31.Google ScholarGoogle Scholar
  92. Naumann, F., Freytag, J.-C., and Leser, U. 2004. Completeness of integrated information sources. Inf. Syst. 29, 7, 583--615. Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Nodine, M. H., Fowler, J., and Perry, B. 1999. Active information gathering in InfoSleuth. In Proceedings of the International Symposium on Cooperative Database Systems for Advanced Applications (CODAS), 15--26.Google ScholarGoogle Scholar
  94. Ordille, J. J. and Miller, B. P. 1993. Distributed active catalogs and meta-data caching in descriptive name services. In Proceedings of the International Conference on Distributed Computing Systems, 120--129.Google ScholarGoogle Scholar
  95. Papakonstantinou, Y., Abiteboul, S., and Garcia-Molina, H. 1996. Object fusion in mediator systems. In Proceedings of the International Conference on Very Large Databases (VLDB). Morgan Kaufmann, 413--424. Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. Parsons, S. 1996. Current approaches to handling imperfect information in data and knowledge bases. IEEE Trans. Knowl. Data Eng. 8, 3 (Jun.), 353--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Popa, L., Velegrakis, Y., Miller, R. J., Hernández, M. A., and Fagin, R. 2002. Translating Web data. In Proceedings of the International Conference on Very Large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. Rahm, E. and Bernstein, P. A. 2001. On matching schemas automatically. Tech. Rep. MSR-TR-2001-17, Microsoft Research, Redmond, Washington. February.Google ScholarGoogle Scholar
  99. Rahm, E. and Do, H. H. 2000. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull. 23, 4, 3--13.Google ScholarGoogle Scholar
  100. Rajaraman, A. and Ullman, J. D. 1996. Integrating information by outerjoins and full disjunctions (extended abstract). In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM Press, 238--248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. Rajinikanth, M., Jakobson, G., Lafond, C., Papp, W., and Piatetsky-Shapiro, G. 1990. Multiple database integration in CALIDA: Design and implementation. In Proceedings of the International Conference on Systems Integration (ICSI). IEEE Press, 378--384. Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. Raman, V., Chou, A., and Hellerstein, J. M. 1999. Scalable spreadsheets for interactive data analysis. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.Google ScholarGoogle Scholar
  103. Raman, V. and Hellerstein, J. M. 2001. Potter's Wheel: An interactive data cleaning system. In Proceedings of the International Conference on Very Large Databases (VLDB). Morgan Kaufmann, 381--390. Google ScholarGoogle ScholarDigital LibraryDigital Library
  104. Rao, J., Pirahesh, H., and Zuzarte, C. 2004. Canonical abstraction for outerjoin optimization. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, 671--682. Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. Reck, C. and König-Ries, B. 1997. An architecture for transparent access to semantically heterogeneous information sources. In Proceedings of the International Workshop on Cooperative Information Agents (CIA). Springer, 260--271. Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Rusinkiewicz, M., Elmasri, R., Czejdo, B., Georgakopoulos, D., Karabatis, G., Jamoussi, A., Loa, K., and Li, Y. 1989. Omnibase: Design and implementation of a multidatabase system. In Proceedings of the 1st Annual Symposium in Parallel and Distributed Processing, 162--169.Google ScholarGoogle Scholar
  107. Sarma, A. D., Benjelloun, O., Halevy, A. Y., and Widom, J. 2006. Working models for uncertain data. In Proceedings of the International Conference on Data Engineering (ICDE), 7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. Sattler, K., Conrad, S., and Saake, G. 2000. Adding conflict resolution features to a query language for database federations. In Proceedings of the Workshop on Engineering Federated Information System (EFIS), M. Roantree et al., eds, 41--52.Google ScholarGoogle Scholar
  109. Scannapieco, M., Virgillito, A., Marchetti, C., Mecella, M., and Baldoni, R. 2004. The DaQuinCIS architecture: A platform for exchanging and improving data quality in cooperative information systems. Inf. Syst. 29, 7, 551--582. Google ScholarGoogle ScholarDigital LibraryDigital Library
  110. Schallehn, E. and Sattler, K.-U. 2003. Using similarity-based operations for resolving data-level conflicts. In Proceedings of the British National Conference on Databases (BNCOD), 172--189.Google ScholarGoogle Scholar
  111. Schallehn, E., Sattler, K.-U., and Saake, G. 2004. Efficient similarity-based operations for data integration. Data Knowl. Eng. 48, 3, 361--387. Google ScholarGoogle ScholarDigital LibraryDigital Library
  112. Sheth, A. P. and Larson, J. A. 1990. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Comput. Surv. 22, 3, 183--236. Google ScholarGoogle ScholarDigital LibraryDigital Library
  113. Shipman, D. W. 1981. The functional data model and the data languages DAPLEX. Trans. Dat. Syst. 6, 1, 140--173. Google ScholarGoogle ScholarDigital LibraryDigital Library
  114. Shoens, K. A., Luniewski, A., Schwarz, P. M., Stamos, J. W., and II, J. T. 1993. The Rufus system: Information organization for semi-structured data. In Proceedings of the International Conference on Very Large Databases (VLDB), R. Agrawal et al., eds. Morgan Kaufmann, 97--107. Google ScholarGoogle ScholarDigital LibraryDigital Library
  115. Singh, M. P., Cannata, P., Huhns, M. N., Jacobs, N., Ksiezyk, T., Ong, K., Sheth, A. P., Tomlinson, C., and Woelk, D. 1997. The Carnot heterogeneous database project: Implemented applications. Distrib. Parallel Databases 5, 2, 207--225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  116. Staworko, S., Chomicki, J., and Marcinkowski, J. 2006. Preference-Driven querying of inconsistent relational databases. In Proceedings of the International Workshop on Inconsistency and Incompleteness in Databases (IIDB).Google ScholarGoogle Scholar
  117. Subrahmanian, V. S., Adali, S., Brink, A., Emery, R., Lu, J., Rajput, A., Rogers, T., Ross, R., and Ward, C. 1995. Hermes: A heterogeneous reasoning and mediator system. Tech. Rep., University of Maryland.
  118. Templeton, M., Brill, D., Dao, S., Lund, E., Ward, P., Chen, A., and MacGregor, R. 1987. Mermaid—A front-end to distributed heterogeneous databases. Proc. IEEE 75, 5 (May), 695--708.
  119. Tomasic, A., Amouroux, R., Bonnet, P., Kapitskaia, O., Naacke, H., and Raschid, L. 1997. The distributed information search component (Disco) and the World Wide Web. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, 546--548.
  120. Tomasic, A., Raschid, L., and Valduriez, P. 1998. Scaling access to heterogeneous data sources with Disco. IEEE Trans. Knowl. Data Eng. 10, 5, 808--823.
  121. Tsai, P. S. M. and Chen, A. L. P. 2000. Partial natural outerjoin—An operation for interoperability in a multidatabase environment. J. Inf. Sci. Eng. 16, 4 (Jul.), 593--617.
  122. Tseng, F. S.-C., Chen, A. L. P., and Yang, W.-P. 1993. Answering heterogeneous database queries with degrees of uncertainty. Distrib. Parallel Databases 1, 3, 281--302.
  123. Ullman, J. D., Garcia-Molina, H., and Widom, J. 2001. Database Systems: The Complete Book. Prentice Hall PTR.
  124. Wang, H. and Zaniolo, C. 2000. Using SQL to build new aggregates and extenders for object-relational systems. In Proceedings of the International Conference on Very Large Databases (VLDB), A. E. Abbadi et al., eds. Morgan Kaufmann, 166--175.
  125. Weis, M. and Naumann, F. 2004. Detecting duplicate objects in XML documents. In Proceedings of the International Workshop on Information Quality in Information Systems (IQIS).
  126. Weis, M. and Naumann, F. 2005. DogmatiX tracks down duplicates in XML. In Proceedings of the ACM International Conference on Management of Data SIGMOD. ACM Press, New York, 431--442.
  127. Widom, J. 2005. Trio: A system for integrated management of data, accuracy, and lineage. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 262--276.
  128. Wiederhold, G. 1992. Mediators in the architecture of future information systems. IEEE Comput. 25, 3 (Mar.), 38--49.
  129. Wijsen, J. 2003. Condensed representation of database repairs for consistent query answering. In Proceedings of the International Conference on Database Theory (ICDT), 378--393.
  130. Yan, L. L. and Özsu, M. T. 1999. Conflict tolerant queries in AURORA. In Proc. of CoopIS. IEEE Computer Society, 279.
  131. Yerneni, R., Papakonstantinou, Y., Abiteboul, S., and Garcia-Molina, H. 1998. Fusion queries over Internet databases. In Proceedings of the International Conference on Extending Database Technology (EDBT), 57--71.

Index Terms

  1. Data fusion

            Published in

            ACM Computing Surveys, Volume 41, Issue 1
            January 2009
            281 pages
            ISSN: 0360-0300
            EISSN: 1557-7341
            DOI: 10.1145/1456650

            Copyright © 2009 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 15 January 2009
            • Accepted: 1 December 2007
            • Revised: 1 September 2007
            • Received: 1 May 2007