DOI: 10.1145/2488388.2488422
research-article

Compact explanation of data fusion decisions

Published: 13 May 2013

Abstract

Despite the abundance of useful information on the Web, different Web sources often provide conflicting data, some being out-of-date, inaccurate, or erroneous. Data fusion aims at resolving conflicts and finding the truth. Advanced fusion techniques apply iterative MAP (Maximum A Posteriori) analysis that reasons about trustworthiness of sources and copying relationships between them. Providing explanations for such decisions is important for a better understanding, but can be extremely challenging because of the complexity of the analysis during decision making.
This paper proposes two types of explanations for data-fusion results: snapshot explanations take the provided data and any other decision inferred from the data as evidence and provide a high-level understanding of a fusion decision; comprehensive explanations take only the data as evidence and provide an in-depth understanding of a fusion decision. We propose techniques that can efficiently generate correct and compact explanations. Experimental results show that (1) we generate correct explanations, (2) our techniques can significantly reduce the sizes of the explanations, and (3) we can generate the explanations efficiently.
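To make the kind of decision being explained more concrete, the following is a minimal sketch of iterative, accuracy-weighted voting. It is an illustration under stated assumptions, not the analysis used in the paper: it ignores copying relationships entirely, and the function name `fuse`, the parameters `n_false`, `rounds`, and `init_accuracy`, and the log-odds vote weight are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def fuse(claims, n_false=10, rounds=20, init_accuracy=0.8):
    """Toy fusion by accuracy-weighted voting (copying is NOT modeled).

    claims: dict mapping source name -> {data item: claimed value}.
    n_false: assumed number of wrong values per item (an assumption).
    Returns (truth, accuracy): chosen value per item, accuracy per source.
    """
    accuracy = {source: init_accuracy for source in claims}
    truth = {}
    for _ in range(rounds):
        # Step 1: each source votes for its values with a log-odds weight.
        scores = defaultdict(lambda: defaultdict(float))
        for source, observed in claims.items():
            a = min(max(accuracy[source], 0.01), 0.99)  # avoid log(0), div by 0
            weight = math.log(n_false * a / (1.0 - a))
            for item, value in observed.items():
                scores[item][value] += weight
        new_truth = {item: max(votes, key=votes.get)
                     for item, votes in scores.items()}

        # Step 2: re-estimate each source's accuracy against the chosen values.
        for source, observed in claims.items():
            if observed:
                agree = sum(1 for item, value in observed.items()
                            if new_truth[item] == value)
                accuracy[source] = agree / len(observed)

        # Stop once the chosen values no longer change.
        if new_truth == truth:
            break
        truth = new_truth
    return truth, accuracy


if __name__ == "__main__":
    example = {
        "s1": {"a": "x", "b": "y", "c": "z"},
        "s2": {"a": "x", "b": "y", "c": "w"},
        "s3": {"a": "q", "b": "y", "c": "w"},
    }
    print(fuse(example))
```

Even in this toy version, the chosen value for one item depends on source accuracies that were themselves estimated from the values chosen for other items. That circular dependence is one reason an explanation of a single fusion decision can become large.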

Published In

WWW '13: Proceedings of the 22nd international conference on World Wide Web
May 2013
1628 pages
ISBN:9781450320351
DOI:10.1145/2488388

Sponsors

  • NICBR: Núcleo de Informação e Coordenação do Ponto BR
  • CGIBR: Comitê Gestor da Internet no Brasil

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. copy detection
  2. data fusion
  3. explanation

Qualifiers

  • Research-article

Conference

WWW '13
WWW '13: 22nd International World Wide Web Conference
May 13 - 17, 2013
Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
