Abstract
Many current web pages include structured data that can be processed and used directly. Search engines, in particular, gather this structured data and provide question-answering capabilities over the integrated data, presenting the results in an entity-centric way. Because the web is decentralized, multiple structured data sources can provide similar information about an entity. However, data from different sources may use different vocabularies and modeling granularities, which makes integration difficult. We present FusE, an approach that identifies similar entity-specific data across sources, independent of vocabulary and data modeling choices. We apply our method to the scenario of a trustable knowledge panel, conduct experiments in which we identify and process entity data from web sources, and compare the output to a competing system. The results underline the advantages of the presented entity-centric data fusion approach.
Index Terms
- FusE: Entity-Centric Data Fusion on Linked Data