A Quality Evaluation of Combined Search on a Knowledge Base and Text

  • Technical Contribution
KI - Künstliche Intelligenz

Abstract

We provide a quality evaluation of KB+Text search, a deep integration of knowledge base search and standard full-text search. A knowledge base (KB) is a set of subject–predicate–object triples with a common naming scheme. The standard query language is SPARQL, where queries are essentially lists of triples with variables. KB+Text search extends this by a special occurs-with predicate, which can be used to express the co-occurrence of words in the text with mentions of entities from the knowledge base. Both pure KB search and standard full-text search are included as special cases. We evaluate the result quality of KB+Text search on three different query sets. The corpus is the full version of the English Wikipedia (2.4 billion word occurrences) combined with the YAGO knowledge base (26 million triples). We provide a web application to reproduce our evaluation, which is accessible via http://ad.informatik.uni-freiburg.de/publications.
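
To make the query model concrete, here is a minimal sketch of how a KB triple pattern and an occurs-with condition could be combined. The toy triples, the toy contexts, and the function names (`kb_match`, `occurs_with`, `kb_text_query`) are assumptions made up for illustration; they are not the authors' system, data, or query syntax.

```python
from typing import FrozenSet, NamedTuple


class Context(NamedTuple):
    words: FrozenSet[str]     # lower-cased words occurring in one text context
    entities: FrozenSet[str]  # KB entities mentioned in the same context


# Toy data invented for illustration (not from Wikipedia or YAGO).
TRIPLES = {
    ("Rhubarb", "is-a", "Plant"),
    ("Broccoli", "is-a", "Plant"),
    ("Salmon", "is-a", "Fish"),
}
CONTEXTS = [
    Context(frozenset({"the", "edible", "leaves", "of", "broccoli"}),
            frozenset({"Broccoli"})),
    Context(frozenset({"rhubarb", "leaves", "are", "toxic"}),
            frozenset({"Rhubarb"})),
    Context(frozenset({"salmon", "is", "edible"}),
            frozenset({"Salmon"})),
]


def kb_match(predicate, obj):
    """Pure KB search: all subjects ?x with a triple (?x, predicate, obj)."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def occurs_with(words):
    """Entities mentioned in at least one context that contains all words."""
    wanted = {w.lower() for w in words}
    return {e for c in CONTEXTS if wanted <= c.words for e in c.entities}


def kb_text_query(predicate, obj, words):
    """Conjunction of: ?x predicate obj . ?x occurs-with words"""
    return kb_match(predicate, obj) & occurs_with(words)


result = kb_text_query("is-a", "Plant", ["edible", "leaves"])
print(result)
```

In this sketch, calling `kb_match` alone corresponds to the pure KB special case, while `occurs_with` alone returns the entities that co-occur with the given words, regardless of any KB constraint.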

Notes

  1. http://lemurproject.org/clueweb09/.

  2. Billion Triple Challenge (BTC), https://km.aifb.kit.edu/projects/btc-2009/.

  3. The choice of this outdated version has no significant impact on the insights from our evaluation: the corresponding Wikipedia data from 2017 is only about 50% larger but otherwise has the same characteristics and would not lead to fundamentally different results.

  4. There is a more recent version, called YAGO2, but most of the additions from YAGO to YAGO2 (spatial and temporal information) are not relevant to our search.

  5. http://en.wikipedia.org/wiki/Wikipedia:Featured_lists.

  6. For the TREC benchmark, even the number of false negatives decreases. This is because, when segmenting into contexts, the document parser pre-processes Wikipedia lists by appending each list item to the preceding sentence. These are the only contexts that cross sentence boundaries, and they are a rare exception. For the Wikipedia list benchmark, we verified that this technique does not include results from the lists from which we created the ground truth.

  7. This means that the words occur in the context, but with a meaning different from what was intended by the query.

  8. The sentence parses are required to compute contexts.
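
The list pre-processing described in note 6 can be sketched roughly as follows. The function name `list_contexts` and the sample sentence are hypothetical, and the exact behavior of the actual document parser is not specified here, so this is only one plausible reading of "appending each list item to the preceding sentence".

```python
def list_contexts(preceding_sentence, list_items):
    """Build one context per list item by appending the item
    to the sentence that precedes the list (blank items are skipped)."""
    lead = preceding_sentence.strip()
    return [f"{lead} {item.strip()}" for item in list_items if item.strip()]


# Hypothetical input: an introductory sentence followed by list items.
contexts = list_contexts(
    "The following plants have edible leaves:",
    ["Broccoli", "Spinach", " "],
)
for context in contexts:
    print(context)
```

Each resulting context spans the introductory sentence and exactly one list item, which is why such contexts cross a sentence boundary while staying small.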

Author information

Corresponding author

Correspondence to Hannah Bast.

About this article

Cite this article

Bast, H., Buchhold, B. & Haussmann, E. A Quality Evaluation of Combined Search on a Knowledge Base and Text. Künstl Intell 32, 19–26 (2018). https://doi.org/10.1007/s13218-017-0513-9

  • DOI: https://doi.org/10.1007/s13218-017-0513-9
