A Quality Evaluation of Combined Search on a Knowledge Base and Text

Bast, Hannah; Buchhold, Björn; Haussmann, Elmar

doi:10.1007/s13218-017-0513-9

Technical Contribution
Published: 06 October 2017

Volume 32, pages 19–26, (2018)

KI - Künstliche Intelligenz

Hannah Bast¹,
Björn Buchhold¹ &
Elmar Haussmann¹

Abstract

We provide a quality evaluation of KB+Text search, a deep integration of knowledge base search and standard full-text search. A knowledge base (KB) is a set of subject–predicate–object triples with a common naming scheme. The standard query language is SPARQL, where queries are essentially lists of triples with variables. KB+Text search extends this by a special occurs-with predicate, which can be used to express the co-occurrence of words in the text with mentions of entities from the knowledge base. Both pure KB search and standard full-text search are included as special cases. We evaluate the result quality of KB+Text search on three different query sets. The corpus is the full version of the English Wikipedia (2.4 billion word occurrences) combined with the YAGO knowledge base (26 million triples). We provide a web application to reproduce our evaluation, which is accessible via http://ad.informatik.uni-freiburg.de/publications.
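The idea of combining triple patterns with an occurs-with condition can be illustrated with a small sketch. This is not the authors' system or index: the toy triples, the toy contexts, and the helper names `match_triples`, `occurs_with`, and `kb_text_search` are all assumptions made up for illustration. It only shows how variable bindings from the knowledge base might be filtered by co-occurrence of words and entity mentions in a text context.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy knowledge base of subject-predicate-object triples (invented data).
KB: Set[Triple] = {
    ("Neil_Armstrong", "is-a", "Astronaut"),
    ("Buzz_Aldrin", "is-a", "Astronaut"),
    ("Albert_Einstein", "is-a", "Physicist"),
}

# Toy text contexts: the words of a context plus the KB entities mentioned in it.
CONTEXTS: List[Dict[str, Set[str]]] = [
    {"words": {"walked", "on", "the", "moon"},
     "entities": {"Neil_Armstrong", "Buzz_Aldrin"}},
    {"words": {"developed", "relativity"},
     "entities": {"Albert_Einstein"}},
]


def match_triples(patterns: List[Triple], kb: Set[Triple]) -> List[Dict[str, str]]:
    """Return all variable bindings that satisfy every triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended: List[Dict[str, str]] = []
        for binding in bindings:
            for triple in kb:
                candidate = dict(binding)
                consistent = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            consistent = False
                            break
                    elif term != value:
                        consistent = False
                        break
                if consistent:
                    extended.append(candidate)
        bindings = extended
    return bindings


def occurs_with(entity: str, words: Set[str],
                contexts: List[Dict[str, Set[str]]]) -> bool:
    """True if some context mentions the entity and contains all given words."""
    return any(entity in c["entities"] and words <= c["words"] for c in contexts)


def kb_text_search(patterns: List[Triple], var: str, words: Set[str]) -> List[str]:
    """Entities bound to `var` that also co-occur with `words` in the text.

    With no words, this degenerates to a pure KB query.
    """
    hits = {b[var] for b in match_triples(patterns, KB)}
    if words:
        hits = {e for e in hits if occurs_with(e, words, CONTEXTS)}
    return sorted(hits)
```

For example, `kb_text_search([("?x", "is-a", "Astronaut")], "?x", {"walked", "moon"})` asks for astronauts that are mentioned in a context together with the words "walked" and "moon". Passing an empty word set leaves only the triple patterns, which mirrors the remark that pure KB search is a special case.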

Notes

http://lemurproject.org/clueweb09/.
Billion Triple Challenge (BTC), https://km.aifb.kit.edu/projects/btc-2009/.
The choice of this outdated version has no significant impact on the insights from our evaluation: the corresponding Wikipedia data from 2017 is only about 50% larger, has otherwise the same characteristics, and would not lead to fundamentally different results.
There is a more recent version, called YAGO2, but most of the additions from YAGO to YAGO2 (spatial and temporal information) are not interesting for our search.
http://en.wikipedia.org/wiki/Wikipedia:Featured_lists.
For the TREC benchmark even the number of false-negatives decreases. This is because when segmenting into contexts the document parser pre-processes Wikipedia lists by appending each list item to the preceding sentence. These are the only types of contexts that cross sentence boundaries and a rare exception. For the Wikipedia list benchmark we verified that this technique does not include results from the lists from which we created the ground truth.
This means that the words occur in the context, but with a meaning different from what was intended by the query.
The sentence parses are required to compute contexts.
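
The list pre-processing mentioned in the notes above (appending each list item to the preceding sentence) could look roughly like the following sketch. The document parser itself is not shown on this page, so the function name `contexts_from_list` and the choice of one context per list item are assumptions, not the authors' implementation.

```python
from typing import List


def contexts_from_list(preceding_sentence: str, list_items: List[str]) -> List[str]:
    """Build one context per list item by appending it to the preceding sentence.

    Assumption: each non-empty list item yields its own context, so the
    words of the lead-in sentence can co-occur with the entities in the item.
    """
    lead = preceding_sentence.strip()
    contexts = []
    for item in list_items:
        item = item.strip()
        if item:  # skip empty list entries
            contexts.append(f"{lead} {item}")
    return contexts
```

Under this reading, a lead-in such as "Crew of Apollo 11:" followed by the items "Neil Armstrong" and "Buzz Aldrin" produces two contexts, each of which crosses the boundary between the sentence and one list item.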

References

Bast H, Bäurle F, Buchhold B, Haußmann E (2014) Semantic full-text search with broccoli. In: SIGIR, ACM, pp 1265–1266
Mihalcea R, Csomai A (2007) Wikify! Linking documents to encyclopedic knowledge. In: CIKM, pp 233–242
Bast H, Haussmann E (2013) Open information extraction via contextual sentence decomposition. In: ICSC
Bast H, Buchhold B (2013) An index for efficient semantic full-text search. In: CIKM
Bast H, Buchhold B, Haussmann E (2016) Semantic search on text and knowledge bases. Found Trends Inf Retr 10(2–3):119–271. doi:10.1561/1500000032
Balog K, de Vries AP, Serdyukov P, Thomas P, Westerveld T (2009) Overview of the TREC 2009 entity track. In: TREC
Bron M, Balog K, de Rijke M (2010) Ranking related entities: components and analyses. In: CIKM, pp 1079–1088
Balog K, Serdyukov P, de Vries AP (2010) Overview of the TREC 2010 entity track. In: TREC
Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van Kleef P, Auer S, Bizer C (2015) DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia. Sem Web 6(2):167–195
Balog K, Serdyukov P, de Vries AP (2011) Overview of the TREC 2011 entity track. In: TREC
Campinas S, Ceccarelli D, Perry TE, Delbru R, Balog K, Tummarello G (2011) The sindice-2011 dataset for entity-oriented search in the web of data. In: Workshop on entity-oriented search (EOS), pp 26–32
Halpin H, Herzig DM, Mika P, Blanco R, Pound J, Thompson HS, Tran DT (2010) Evaluating ad-hoc object retrieval. In: Workshop on evaluation of semantic technologies (WEST)
Blanco R, Halpin H, Herzig DM, Mika P, Pound J, Thompson HS, Duc TT (2011) Entity search evaluation over structured web data. In: SIGIR workshop on entity-oriented search (JIWES)
Dang HT, Kelly D, Lin JJ (2007) Overview of the TREC 2007 question answering track. In: TREC
Lopez V, Unger C, Cimiano P, Motta E (2013) Evaluating question answering over linked data. J Web Sem 21:3–13
Cimiano P, Lopez V, Unger C, Cabrio E, Ngomo ACN, Walter S (2013) Multilingual question answering over linked data (QALD-3): lab overview. In: CLEF, pp 321–332
Unger C, Forascu C, López V, Ngomo AN, Cabrio E, Cimiano P, Walter S (2014) Question answering over linked data (QALD-4). In: Working notes for CLEF 2014 conference, Sheffield, 15–18 Sept 2014, pp 1172–1180
Unger C, Forascu C, López V, Ngomo AN, Cabrio E, Cimiano P, Walter S (2015) Question answering over linked data (QALD-5). In: Working notes of CLEF 2015—conference and labs of the evaluation forum, Toulouse, 8–11 Sept 2015
Bast H, Chitea A, Suchanek FM, Weber I (2007) Ester: efficient search on text, entities, and relations. In: SIGIR, pp 671–678
Bhagdev R, Chapman S, Ciravegna F, Lanfranchi V, Petrelli D (2008) Hybrid search: effectively combining keywords and semantic searches. In: ESWC, pp 554–568
Tablan V, Bontcheva K, Roberts I, Cunningham H (2015) Mímir: an open-source semantic search framework for interactive information seeking and discovery. J Web Sem 30:52–68
Wang H, Liu Q, Penin T, Fu L, Zhang L, Tran T, Yu Y, Pan Y (2009) Semplore: a scalable IR approach to search the web of data. J Web Sem 7(3):177–188
Giunchiglia F, Kharkevich U, Zaihrayeu I (2009) Concept search. In: ESWC, pp 429–444
Tran T, Mika P, Wang H, Grobelnik M (2011) SemSearch’11: the 4th semantic search workshop. In: WWW (companion volume)
Bollacker KD, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD, pp 1247–1250
Sanderson M (2010) Test collection based evaluation of information retrieval systems. Found Trends Inf Retr 4(4):247–375

Author information

Authors and Affiliations

Department of Computer Science, University of Freiburg, 79110, Freiburg, Germany
Hannah Bast, Björn Buchhold & Elmar Haussmann

Corresponding author

Correspondence to Hannah Bast.

About this article

Cite this article

Bast, H., Buchhold, B. & Haussmann, E. A Quality Evaluation of Combined Search on a Knowledge Base and Text. Künstl Intell 32, 19–26 (2018). https://doi.org/10.1007/s13218-017-0513-9

Received: 21 June 2017
Accepted: 14 September 2017
Published: 06 October 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s13218-017-0513-9
