From Big Data to Big Knowledge

Theobald, Martin

doi:10.1007/978-3-030-10801-4_5

Martin Theobald¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11376))

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Informatics

751 Accesses

Abstract

Today’s knowledge bases (KBs) capture facts about the world’s entities, their properties, and their semantic relationships in the form of subject-predicate-object (SPO) triples. Domain-oriented KBs, such as DBpedia, Yago, Wikidata or Freebase, capture billions of facts that have been (semi-)automatically extracted from Wikipedia articles. Their commercial counterparts at Google, Bing or Baidu provide back-end support for search engines, online recommendations, and various knowledge-centric services.

This invited talk provides an overview of our recent contributions—and also highlights a number of open research challenges—in the context of extracting, managing, and reasoning with large semantic KBs. Compared to domain-oriented extraction techniques, we aim to acquire facts for a much broader set of predicates. Compared to open-domain extraction methods, the SPO arguments of our facts are canonicalized, thus referring to unique entities with semantically typed predicates. A core part of our work focuses on developing scalable inference techniques for querying an uncertain KB in the form of a probabilistic database. A further, very recent research focus lies also in scaling out these techniques to a distributed setting. Here, we aim to process declarative queries, posed in either SQL or logical query languages such as Datalog, via a proprietary, asynchronous communication protocol based on the Message Passing Interface.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. PVLDB 10(13), 2049–2060 (2017)
Google Scholar
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Ives, Z.: DBpedia: a nucleus for a web of open data. In: ISWC, pp. 11–15 (2007)
Chapter Google Scholar
Benjelloun, O., Sarma, A.D., Halevy, A.Y., Theobald, M., Widom, J.: Databases with uncertainty and lineage. VLDB J. 17(2), 243–264 (2008)
Article Google Scholar
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD, pp. 1247–1250 (2008)
Google Scholar
Dylla, M., Miliaraki, I., Theobald, M.: A temporal-probabilistic database model for information extraction. PVLDB 6(14), 1810–1821 (2013a)
Google Scholar
Dylla, M., Miliaraki, I., Theobald, M.: Top-k query processing in probabilistic databases with non-materialized views. In: ICDE, pp. 122–133 (2013b)
Google Scholar
Dylla, M., Sozio, M., Theobald, M.: Resolving temporal conflicts in inconsistent RDF knowledge bases. In: BTW, pp. 474–493 (2011)
Google Scholar
Dylla, M., Theobald, M., Miliaraki, I.: Querying and learning in probabilistic databases. In: Reasoning Web, pp. 313–368 (2014)
Chapter Google Scholar
Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing. In: SIGMOD, pp. 289–300 (2014a)
Google Scholar
Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: Using graph summarization for join-ahead pruning in a distributed RDF engine. In: SWIM, pp. 41:1–41:4 (2014b)
Google Scholar
Gurajada, S., Theobald, M.: Distributed processing of generalized graph-pattern queries in SPARQL 1.1. CoRR, abs/1609.05293 (2016a)
Google Scholar
Gurajada, S., Theobald, M.: Distributed set reachability. In: SIGMOD, pp. 1247–1261 (2016b)
Google Scholar
Hoffart, J., Seufert, S., Nguyen, D.B., Theobald, M., Weikum, G.: KORE: keyphrase overlap relatedness for entity disambiguation. In: CIKM, pp. 545–554 (2012)
Google Scholar
Mutsuzaki, M., et al.: Trio-one: layering uncertainty and lineage on a conventional DBMS. In: CIDR, pp. 269–274 (2007)
Google Scholar
Nakashole, N., Theobald, M., Weikum, G.: Scalable knowledge harvesting with high precision and high recall. In: WSDM, pp. 227–236 (2011)
Google Scholar
Nakashole, N., Weikum, G., Suchanek, F.M.: PATTY: a taxonomy of relational patterns with semantic types. In: EMNLP-CoNLL, pp. 1135–1145 (2012)
Google Scholar
Nguyen, D.B., Abujabal, A., Tran, K., Theobald, M., Weikum, G.: Query-driven on-the-fly knowledge base construction. PVLDB 11(1), 66–79 (2017a)
Article Google Scholar
Nguyen, D.B., Hoffart, J., Theobald, M., Weikum, G.: AIDA-light: high-throughput named-entity disambiguation. In: LDOW (2014)
Google Scholar
Nguyen, D.B., Theobald, M., Weikum, G.: J-NERD: joint named entity recognition and disambiguation with rich linguistic features. TACL 4, 215–229 (2016)
Google Scholar
Nguyen, D.B., Theobald, M., Weikum, G.: J-REED: joint relation extraction and entity disambiguation. In: CIKM, pp. 2227–2230 (2017b)
Google Scholar
Papaioannou, K., Theobald, M., Böhlen, M.H.: Supporting set operations in temporal-probabilistic databases. In: ICDE, pp. 1180–1191 (2018)
Google Scholar
Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)
Google Scholar
Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic databases. Synth. Lect. Data Manag. 3(2), 1–180 (2011)
Article Google Scholar
Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Comm. of the ACM 57(10), 78–85 (2014)
Article Google Scholar

Download references

Author information

Authors and Affiliations

University of Luxembourg, Esch-sur-Alzette, Luxembourg
Martin Theobald

Authors

Martin Theobald
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Martin Theobald .

Editor information

Editors and Affiliations

University of Genoa, Genoa, Italy
Barbara Catania
Comenius University, Bratislava, Slovakia
Rastislav Královič
Poznań University of Technology, Poznań, Poland
Jerzy Nawrocki
Università degli Studi di Milano, Milan, Italy
Giovanni Pighizzini

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Theobald, M. (2019). From Big Data to Big Knowledge. In: Catania, B., Královič, R., Nawrocki, J., Pighizzini, G. (eds) SOFSEM 2019: Theory and Practice of Computer Science. SOFSEM 2019. Lecture Notes in Computer Science(), vol 11376. Springer, Cham. https://doi.org/10.1007/978-3-030-10801-4_5

Download citation

DOI: https://doi.org/10.1007/978-3-030-10801-4_5
Published: 11 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10800-7
Online ISBN: 978-3-030-10801-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics