DOI: 10.1145/3209978.3210021
research-article

Automated Comparative Table Generation for Facilitating Human Intervention in Multi-Entity Resolution

Published: 27 June 2018

Abstract

Entity resolution (ER), the process of identifying entities that refer to the same real-world object, has long been studied in the knowledge graph (KG) community, among many others. Humans, as a valuable source of background knowledge, are increasingly involved in this loop through crowdsourcing and active learning, where presenting condensed and easily compared information is vital to helping them intervene in an ER task. However, current methods for single-entity or pairwise summarization do not adequately support humans in observing and comparing multiple entities simultaneously, which impairs the efficiency and accuracy of human intervention. In this paper, we propose an automated approach that selects a few important properties and values for a set of entities and assembles them into a comparative table. We formulate several optimization problems for generating an optimal comparative table according to intuitive goodness measures and various constraints. Our experiments on real-world datasets, comparison with related work, and user study demonstrate the superior efficiency, precision, and user satisfaction of our approach in multi-entity resolution (MER).
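
To make the idea of a comparative table concrete, the following is a minimal sketch and not the paper's method. It assumes each entity is a flat mapping from property name to a single value, and it replaces the paper's optimization problems with a simple greedy ranking. The function names (`property_score`, `build_comparative_table`) and the goodness measure (coverage multiplied by value diversity) are hypothetical choices made only for illustration.

```python
def property_score(entities, prop):
    """Hypothetical goodness measure (an assumption, not from the paper):
    coverage (share of entities having the property) times
    diversity (share of distinct values among the values present)."""
    values = [e[prop] for e in entities if prop in e]
    if not values:
        return 0.0
    coverage = len(values) / len(entities)
    diversity = len(set(values)) / len(values)
    return coverage * diversity


def build_comparative_table(entities, k):
    """Pick the k best-scoring properties and lay the entities out row by row."""
    props = sorted({p for e in entities for p in e})
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(props, key=lambda p: (-property_score(entities, p), p))
    chosen = ranked[:k]
    # Missing values become empty cells so every row has the same width.
    rows = [[e.get(p, "") for p in chosen] for e in entities]
    return chosen, rows


# Made-up sample entities, for illustration only.
sample = [
    {"name": "A", "country": "China", "founded": "1902"},
    {"name": "B", "country": "China", "founded": "1902", "motto": "X"},
    {"name": "C", "country": "China", "founded": "1888"},
]
header, table = build_comparative_table(sample, 2)
print(header)
for row in table:
    print(row)
```

With this scoring, a property that every entity has but that never varies (here, `country`) ranks low because it does not help a human tell the entities apart. The actual approach formulates table generation as constrained optimization, which this greedy ranking does not attempt to reproduce.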

      Published In

      SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
      June 2018
      1509 pages
      ISBN: 9781450356572
      DOI: 10.1145/3209978
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. comparative table
      2. entity resolution
      3. holistic property matching
      4. knowledge graph
      5. multi-entity summarization

      Qualifiers

      • Research-article

      Acceptance Rates

      SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Article Metrics

      • Downloads (Last 12 months): 8
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 17 Feb 2025

      Cited By

      • (2024) Linguistic summarisation of multiple entities in RDF graphs. Applied Computing and Intelligence 4:1 (1-18). DOI: 10.3934/aci.2024001. Online publication date: 2024
      • (2023) Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (174-184). DOI: 10.1145/3539618.3591708. Online publication date: 19-Jul-2023
      • (2022) Impact of the Characteristics of Multi-source Entity Matching Tasks on the Performance of Active Learning Methods. The Semantic Web (113-129). DOI: 10.1007/978-3-031-06981-9_7. Online publication date: 31-May-2022
      • (2021) Entity summarization: State of the art and future challenges. Journal of Web Semantics 69 (100647). DOI: 10.1016/j.websem.2021.100647. Online publication date: May-2021
      • Entity Summarization: State of the Art and Future Challenges. SSRN Electronic Journal. DOI: 10.2139/ssrn.3945397
      • Entity Summarization: State of the Art and Future Challenges. SSRN Electronic Journal. DOI: 10.2139/ssrn.3944540
