DOI: 10.1145/2872427.2883017
research-article

Profiling the Potential of Web Tables for Augmenting Cross-domain Knowledge Bases

Published: 11 April 2016

ABSTRACT

Cross-domain knowledge bases such as DBpedia, YAGO, or the Google Knowledge Graph have gained increasing attention in recent years and are starting to be deployed in various use cases. However, the content of such knowledge bases is far from complete, is not always correct, and becomes outdated over time (e.g., population numbers go stale). Hence, there are efforts to leverage various types of Web data to complement, update, and extend such knowledge bases. One source of Web data that potentially provides very wide coverage is the millions of relational HTML tables found on the Web. Existing work on using data from Web tables to augment cross-domain knowledge bases reports only aggregated performance numbers. The actual content of the Web tables, and the topical areas of the knowledge bases that can be complemented using them, remain unclear. In this paper, we match a large, publicly available Web table corpus to the DBpedia knowledge base. Based on the matching results, we profile the potential of Web tables for augmenting different parts of cross-domain knowledge bases and report detailed statistics about the classes, properties, and instances for which missing values can be filled using Web table data as evidence. To estimate the potential quality of the new values, we empirically examine the Local Closed World Assumption and use it to determine the maximal number of correct facts that an ideal data fusion strategy could generate. Using this as ground truth, we compare three data fusion strategies and conclude that knowledge-based trust outperforms PageRank- and voting-based fusion.
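
As a rough illustration of two ideas named in the abstract, the sketch below shows a simple voting-based fusion step and a labeling step under the Local Closed World Assumption (LCWA). It is not the authors' implementation. The function names, the tuple layout `(subject, property, value, table_id)`, and the toy data are all assumptions made for this example. Under the LCWA as it is commonly stated, a candidate value is treated as correct if the knowledge base already holds it for that subject and property, as incorrect if the knowledge base holds only other values, and as unknown if the knowledge base has no value at all for that slot.

```python
from collections import Counter, defaultdict


def fuse_by_voting(candidates):
    """For each (subject, property) slot, keep the value asserted most often.

    `candidates` is an iterable of (subject, property, value, table_id)
    tuples; this layout is a hypothetical choice for the sketch.
    """
    votes = defaultdict(Counter)
    for subject, prop, value, _table_id in candidates:
        votes[(subject, prop)][value] += 1
    return {slot: counts.most_common(1)[0][0] for slot, counts in votes.items()}


def lcwa_label(kb, subject, prop, value):
    """Label a candidate fact under the Local Closed World Assumption.

    Returns True (correct), False (incorrect), or None (unknown because the
    knowledge base has no value for this subject and property).
    """
    known_values = kb.get((subject, prop))
    if not known_values:
        return None
    return value in known_values


def ideal_upper_bound(kb, candidates):
    """Count slots where at least one candidate value matches the KB.

    This is the most that any fusion strategy could get right on slots
    that the LCWA can judge.
    """
    hit_slots = set()
    for subject, prop, value, _table_id in candidates:
        if lcwa_label(kb, subject, prop, value) is True:
            hit_slots.add((subject, prop))
    return len(hit_slots)


# Toy data, invented purely for illustration.
kb = {("city_a", "population"): {"100"}}
candidates = [
    ("city_a", "population", "100", "t1"),
    ("city_a", "population", "100", "t2"),
    ("city_a", "population", "90", "t3"),
    ("city_b", "population", "50", "t4"),
]

fused = fuse_by_voting(candidates)
labels = {slot: lcwa_label(kb, slot[0], slot[1], value) for slot, value in fused.items()}
```

In this toy run the slot for `city_a` can be judged against the knowledge base, while the slot for `city_b` is labeled unknown: it is a candidate new fact that the LCWA cannot evaluate. Knowledge-based trust and PageRank-based fusion would replace the plain vote count with a per-source weight; they are not shown here.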

Published in

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431

Copyright © 2016. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '16 paper acceptance rate: 115 of 727 submissions, 16%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
