research-article

Profiling the Potential of Web Tables for Augmenting Cross-domain Knowledge Bases

Authors:
Dominique Ritze, University of Mannheim, Mannheim, Germany
Oliver Lehmberg, University of Mannheim, Mannheim, Germany
Yaser Oulabi, University of Mannheim, Mannheim, Germany
Christian Bizer, University of Mannheim, Mannheim, Germany

WWW '16: Proceedings of the 25th International Conference on World Wide Web, April 2016, Pages 251–261
https://doi.org/10.1145/2872427.2883017

Published: 11 April 2016

ABSTRACT

Cross-domain knowledge bases such as DBpedia, YAGO, or the Google Knowledge Graph have gained increasing attention in recent years and are starting to be deployed in various use cases. However, the content of such knowledge bases is far from complete, is not always correct, and becomes outdated over time (e.g., population numbers go stale). Hence, there are efforts to leverage various types of Web data to complement, update, and extend such knowledge bases. One source of Web data that potentially provides very wide coverage is the millions of relational HTML tables found on the Web. Existing work on using data from Web tables to augment cross-domain knowledge bases reports only aggregated performance numbers. The actual content of the Web tables and the topical areas of the knowledge bases that can be complemented using the tables remain unclear. In this paper, we match a large, publicly available Web table corpus to the DBpedia knowledge base. Based on the matching results, we profile the potential of Web tables for augmenting different parts of cross-domain knowledge bases and report detailed statistics about the classes, properties, and instances for which missing values can be filled using Web table data as evidence. To estimate the potential quality of the new values, we empirically examine the Local Closed World Assumption and use it to determine the maximum number of correct facts that an ideal data fusion strategy could generate. Using this as ground truth, we compare three data fusion strategies and conclude that knowledge-based trust outperforms PageRank-based and voting-based fusion.
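To make the evaluation idea in the abstract concrete, the following is a minimal sketch of voting-based fusion scored under the Local Closed World Assumption (LCWA). It is not the authors' implementation. The function names (`vote`, `lcwa_label`, `ideal_correct`), the triple layout, and the toy data are all hypothetical. The sketch assumes the common reading of the LCWA: if the knowledge base already holds at least one value for a (subject, property) slot, a candidate value is judged correct only if it is among those values; if the slot is empty, the candidate is left unjudged.

```python
from collections import Counter, defaultdict

def vote(candidates):
    """Voting-based fusion: keep the most frequent value per (subject, property) slot."""
    by_slot = defaultdict(Counter)
    for subject, prop, value in candidates:
        by_slot[(subject, prop)][value] += 1
    return {slot: counts.most_common(1)[0][0] for slot, counts in by_slot.items()}

def lcwa_label(kb, slot, value):
    """LCWA judgment: True/False if the KB has values for the slot, None if unknown."""
    known = kb.get(slot)
    if not known:
        return None  # the KB is silent on this slot, so the value cannot be judged
    return value in known

def ideal_correct(kb, candidates):
    """Slots for which at least one candidate is correct under the LCWA.

    This is the number of correct facts a perfect fusion strategy could reach.
    """
    return len({(s, p) for s, p, v in candidates if lcwa_label(kb, (s, p), v)})

# Hypothetical toy knowledge base: slot -> set of known values.
kb = {
    ("Mannheim", "country"): {"Germany"},
    ("Berlin", "country"): {"Germany"},
}

# Hypothetical values extracted from matched web tables (one triple per table cell).
candidates = [
    ("Mannheim", "country", "Germany"),
    ("Mannheim", "country", "Germany"),
    ("Mannheim", "country", "France"),
    ("Berlin", "country", "Poland"),
    ("Berlin", "country", "Poland"),
    ("Berlin", "country", "Germany"),
    ("Heidelberg", "country", "Germany"),  # slot missing from the KB
]

fused = vote(candidates)
labels = {slot: lcwa_label(kb, slot, value) for slot, value in fused.items()}

# Precision is computed only over slots the LCWA can judge.
judged = [ok for ok in labels.values() if ok is not None]
precision = sum(judged) / len(judged)
upper_bound = ideal_correct(kb, candidates)

print(fused)
print(labels)
print(precision, upper_bound)
```

In this toy data, voting picks the wrong value for Berlin even though the correct one appears among the candidates, so the fused result falls short of the ideal count. A trust-weighted strategy would differ only in how `vote` weights each source; that weighting is not shown here.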


Index Terms

- Information systems
  - Data management systems
    - Information integration


Published in
WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431
General Chairs:
Jacqueline Bourdeau, Tele-university (TELUQ), Montreal, QC, Canada
Jim A. Hendler, Rensselaer Polytechnic Institute, Troy, NY, USA
Roger Nkambou, Université du Québec à Montréal, Montreal, QC, Canada
Program Chairs:
Ian Horrocks, University of Oxford, UK
Ben Y. Zhao, University of California at Santa Barbara, CA, USA
Copyright © 2016 held by the International World Wide Web Conference Committee (IW3C2)
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
data fusion
data profiling
knowledge base augmentation
schema and data matching
slot filling
web tables

Acceptance Rates
WWW '16 Paper Acceptance Rate: 115 of 727 submissions, 16%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%