Abstract
There is currently a clear trend toward augmenting document collections with entity-centric knowledge provided by knowledge graphs, especially in scientific digital libraries. Entity facts are either manually curated or, for higher scalability, automatically harvested from large volumes of text documents. The often-claimed benefit is that collection-wide fact extraction combines information from huge numbers of documents into a single database. However, even if the extraction process were 100% correct, the promise of pervasive information fusion within retrieval tasks poses serious threats to the validity of the results. This is because important contextual information provided by each document is often lost in the process and cannot be readily restored at retrieval time. In this paper, we quantify the consequences of uncontrolled knowledge graph evolution in real-world scientific libraries using NLM’s PubMed corpus vs. the SemMedDB knowledge base. Moreover, we operationalise the notion of implicit context as a viable solution to gain a sense of context compatibility for all extracted facts, based on the pair-wise coherence of all documents used for extraction. Our derived measures for context compatibility determine which facts are relatively safe to combine. They also allow precision to be balanced against recall. Our practical experiments extensively evaluate context compatibility based on implicit contexts for typical digital library tasks. The results show that our implicit notion of context compatibility is superior to existing methods in terms of both simplicity and retrieval quality.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kroll, H., Kalo, JC., Nagel, D., Mennicke, S., Balke, WT. (2020). Context-Compatible Information Fusion for Scientific Knowledge Graphs. In: Hall, M., Merčun, T., Risse, T., Duchateau, F. (eds) Digital Libraries for Open Knowledge. TPDL 2020. Lecture Notes in Computer Science, vol 12246. Springer, Cham. https://doi.org/10.1007/978-3-030-54956-5_3
DOI: https://doi.org/10.1007/978-3-030-54956-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54955-8
Online ISBN: 978-3-030-54956-5
eBook Packages: Computer Science, Computer Science (R0)