research-article

FusE: Entity-Centric Data Fusion on Linked Data

Published: 17 February 2019

Abstract

Many current web pages include structured data that can be processed and used directly. Search engines, in particular, gather this structured data and provide question-answering capabilities over the integrated data, presenting the results in an entity-centric way. Due to the decentralized nature of the web, multiple structured data sources can provide similar information about an entity. However, data from different sources may use different vocabularies and modeling granularities, which makes integration difficult. We present FusE, an approach that identifies similar entity-specific data across sources, independent of vocabulary and data modeling choices. We apply our method to the scenario of a trustable knowledge panel, conduct experiments in which we identify and process entity data from web sources, and compare the output to a competing system. The results underline the advantages of the presented entity-centric data fusion approach.

      • Published in

        ACM Transactions on the Web, Volume 13, Issue 2
        May 2019
        156 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/3313948

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 February 2019
        • Accepted: 1 January 2019
        • Revised: 1 November 2018
        • Received: 1 January 2018

        Qualifiers

        • research-article
        • Research
        • Refereed
