
System support for exploration and expert feedback in resolving conflicts during integration of metadata

  • Special Issue Paper
  • The VLDB Journal

Abstract

A critical reality in integration is that knowledge obtained from different sources may often be conflicting. Conflict resolution, whether performed during the design phase or at run-time, can be costly and, if done without a proper understanding of the usage context, ineffective. In this paper, we propose FICSR (pronounced “fixer”), a novel exploration- and feedback-based approach to conflict resolution when integrating metadata from different sources. Rather than relying on purely automated conflict-resolution mechanisms, FICSR brings the domain expert into the conflict-resolution process and informs the integration based on the expert’s feedback. In particular, instead of relying on the traditional model-based definition of consistency (which, whenever there are conflicts, picks one possible world among many), we introduce a ranked interpretation of the metadata and of statements about the metadata. This not only enables FICSR to avoid committing to an interpretation too early, but also helps achieve a more direct correspondence between the expert’s (subjective) interpretation of the data and the system’s (objective) treatment of the available alternatives. Consequently, the ranked interpretation leads to new opportunities for exploratory feedback for conflict resolution: within the context of a given statement of interest, (a) a preliminary ranking of candidate matches, representing different resolutions of the conflicts, informs the user about the alternative interpretations of the metadata, while (b) user feedback regarding the preferences among alternatives is exploited to inform the system about the expert’s relevant domain knowledge. The expert’s feedback is then used to resolve not only the conflicts among different sources, but also possible misalignments introduced by the initial matching phase.
To enable this \(\text{system} \stackrel{\text{informs}}{\longleftrightarrow} \text{user}\) feedback process, we develop data structures and algorithms for efficient off-line conflict/agreement analysis of the integrated metadata. We also develop algorithms for efficient on-line query processing, candidate result enumeration, validity analysis, and system feedback. The results are brought together and evaluated in the Feedback-based InConSistency Resolution (FICSR) system.
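The two-way loop described in the abstract (the system ranks alternative resolutions, the expert states a preference, the ranking is revised) can be illustrated with a toy sketch. This is not the FICSR algorithm: the `Candidate` class, the additive scoring, the `apply_feedback` update rule, the `step` parameter, and the statement identifiers are all hypothetical choices made only to show the idea of a ranked, rather than single-world, interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical resolution of a conflict: the statements it accepts."""
    name: str
    accepted: frozenset  # ids of the source statements kept under this resolution


def rank(candidates, weights):
    """Order candidates by the total weight of their accepted statements (best first)."""
    def score(c):
        return sum(weights.get(s, 0.0) for s in c.accepted)
    # Ties are broken by name so the ordering is deterministic.
    return sorted(candidates, key=lambda c: (-score(c), c.name))


def apply_feedback(weights, preferred, rejected, step=0.5):
    """Return new weights after the expert prefers `preferred` over `rejected`.

    Assumed update rule: statements accepted only by the preferred candidate
    gain `step`; statements accepted only by the rejected one lose `step`.
    """
    updated = dict(weights)
    for s in preferred.accepted - rejected.accepted:
        updated[s] = updated.get(s, 0.0) + step
    for s in rejected.accepted - preferred.accepted:
        updated[s] = updated.get(s, 0.0) - step
    return updated


# Two sources disagree about the parent of concept x; both agree on a third statement.
weights = {"A:parent(x,y)": 0.6, "B:parent(x,z)": 0.5, "shared:parent(y,root)": 1.0}
follow_a = Candidate("follow_A", frozenset({"A:parent(x,y)", "shared:parent(y,root)"}))
follow_b = Candidate("follow_B", frozenset({"B:parent(x,z)", "shared:parent(y,root)"}))

# (a) The system informs the user: a preliminary ranking of the alternatives.
initial = rank([follow_a, follow_b], weights)

# (b) The user informs the system: the expert prefers source B's interpretation.
weights = apply_feedback(weights, preferred=follow_b, rejected=follow_a)
revised = rank([follow_a, follow_b], weights)

print([c.name for c in initial])
print([c.name for c in revised])
```

Both candidates stay available after the feedback step; only their order changes, which mirrors the abstract's point about not committing to a single interpretation too early.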



Author information

Corresponding author

Correspondence to K. Selçuk Candan.

Additional information

This research was funded by the NSF grant “AOC: Archaeological Data Integration for the Study of Long-Term Human and Social Dynamics”, 2007–2009.

This work was done while M. L. Sapino was at ASU on sabbatical.

About this article

Cite this article

Candan, K.S., Cao, H., Qi, Y. et al. System support for exploration and expert feedback in resolving conflicts during integration of metadata. The VLDB Journal 17, 1407–1444 (2008). https://doi.org/10.1007/s00778-008-0109-y

  • DOI: https://doi.org/10.1007/s00778-008-0109-y
