A Graph-Based Approach for Inferring Semantic Descriptions of Wikipedia Tables

Vu, Binh; Knoblock, Craig A.; Szekely, Pedro; Pham, Minh; Pujara, Jay

doi:10.1007/978-3-030-88361-4_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12922)

Included in the following conference series:

International Semantic Web Conference

Abstract

There are millions of high-quality tables available in Wikipedia. These tables cover many domains and contain useful information. To make use of these tables for data discovery or data integration, we need precise descriptions of the concepts and relationships in the data, known as semantic descriptions. However, creating semantic descriptions is a complex process requiring considerable manual effort and can be error prone. In this paper, we present a novel probabilistic approach for automatically building semantic descriptions of Wikipedia tables. Our approach leverages hyperlinks in a Wikipedia table and existing knowledge in Wikidata to construct a graph of possible relationships in the table and its context, and then it uses collective inference to distinguish genuine and spurious relationships to form the final semantic description. In contrast to existing methods, our solution can handle tables that require complex semantic descriptions of n-ary relations (e.g., the population of a country in a particular year) or implicit contextual values to describe the data accurately. In our empirical evaluation, our approach outperforms state-of-the-art systems on the SemTab2020 dataset and outperforms those systems by as much as 28% in F1 score on a large set of Wikipedia tables.
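
The idea of building a graph of candidate relationships from linked cells and then pruning spurious ones can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny triple set, the `Q_`/`P_` identifiers, the function names, and the 0.8 threshold are all hypothetical, and a plain row-support threshold stands in for the collective inference step described above.

```python
from collections import Counter, defaultdict

# Hypothetical miniature knowledge graph: (subject, property, object) triples.
# The Q_/P_ identifiers are placeholders for illustration, not real Wikidata IDs.
TRIPLES = {
    ("Q_paris", "P_country", "Q_france"),
    ("Q_lyon", "P_country", "Q_france"),
    ("Q_berlin", "P_country", "Q_germany"),
    ("Q_paris", "P_capital_of", "Q_france"),
    ("Q_berlin", "P_capital_of", "Q_germany"),
}

# Each cell holds the entity its hyperlink resolves to (None if unlinked).
TABLE = {
    "columns": ["City", "Country"],
    "rows": [
        ["Q_paris", "Q_france"],
        ["Q_lyon", "Q_france"],
        ["Q_berlin", "Q_germany"],
    ],
}


def candidate_edges(table, triples):
    """Count, per (source column, property, target column), the supporting rows."""
    # Index properties by (subject, object) so each cell pair is a dict lookup.
    by_pair = defaultdict(set)
    for s, p, o in triples:
        by_pair[(s, o)].add(p)

    support = Counter()
    columns = table["columns"]
    for row in table["rows"]:
        for i in range(len(columns)):
            for j in range(len(columns)):
                if i == j or row[i] is None or row[j] is None:
                    continue
                for prop in by_pair.get((row[i], row[j]), ()):
                    support[(columns[i], prop, columns[j])] += 1
    return support


def select_edges(support, n_rows, threshold=0.8):
    """Keep candidates supported by at least `threshold` of the rows.

    Assumption: this simple cutoff replaces the paper's collective inference.
    """
    return {
        edge: count / n_rows
        for edge, count in support.items()
        if count / n_rows >= threshold
    }


support = candidate_edges(TABLE, TRIPLES)
description = select_edges(support, len(TABLE["rows"]))
print(description)
```

In this toy table, `P_country` is supported by all three rows while `P_capital_of` holds for only two, so the threshold keeps the former and discards the latter as spurious. A real system would also have to handle literal values, n-ary relations, and contextual values, which this sketch ignores.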

Notes

1. This requirement is to help reduce ambiguity and speed up the annotation process.
2. We could not evaluate the other winning systems as we were unable to get access to their code and the papers do not describe them precisely.
3. https://github.com/usc-isi-i2/GRAMS/releases/tag/iswc-2021.

Acknowledgements

This research was sponsored by the Army Research Office and the Defense Advanced Research Projects Agency and was accomplished under Grant Number W911NF-18-1-0027. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office, the Defense Advanced Research Projects Agency, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

Author information

Authors and Affiliations

USC Information Sciences Institute, Marina Del Rey, CA, 90292, USA
Binh Vu, Craig A. Knoblock, Pedro Szekely, Minh Pham & Jay Pujara

Corresponding author

Correspondence to Binh Vu.

Editor information

Editors and Affiliations

University of Würzburg, Würzburg, Germany
Andreas Hotho
Linköping University, Linköping, Sweden
Eva Blomqvist
University of Düsseldorf, Düsseldorf, Germany
Stefan Dietze
IBM Research - Thomas J. Watson Research, Hawthorne, CA, USA
Achille Fokoue
University of Texas, Austin, TX, USA
Ying Ding
Imperial College, London, UK
Payam Barnaghi
Australian National University, Canberra, ACT, Australia
Armin Haller
Fondazione Bruno Kessler, Povo, Trento, Italy
Mauro Dragoni
The Open University Walton Hall, Milton Keynes, UK
Harith Alani

About this paper

Cite this paper

Vu, B., Knoblock, C.A., Szekely, P., Pham, M., Pujara, J. (2021). A Graph-Based Approach for Inferring Semantic Descriptions of Wikipedia Tables. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_18

DOI: https://doi.org/10.1007/978-3-030-88361-4_18
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88360-7
Online ISBN: 978-3-030-88361-4
eBook Packages: Computer Science, Computer Science (R0)