Constructing biomedical domain-specific knowledge graph with minimum supervision

  • Regular Paper
Knowledge and Information Systems

Abstract

A domain-specific knowledge graph is an effective way to represent complex domain knowledge in a structured format and has shown great success in real-world applications. Most existing work on knowledge graph construction and completion shares several limitations; in particular, it requires sufficient external resources, such as large-scale knowledge graphs and concept ontologies, as a starting point. However, building such resources through extensive domain-specific labeling is highly time-consuming and requires special expertise, especially in biomedical domains. Knowledge extraction from unstructured text with minimum supervision is therefore crucial in biomedical fields. In this paper, we propose a versatile approach for constructing a knowledge graph with minimum supervision from unstructured biomedical domain-specific text. The approach consists of entity recognition, unsupervised entity and relation embedding, latent relation generation via clustering, relation refinement, and relation assignment, which assigns cluster-level labels. Experimental results on 24,687 unstructured biomedical science abstracts show that the proposed framework can effectively extract 16,192 structured facts with high precision. Moreover, we demonstrate that the constructed knowledge graph is a sufficient resource for knowledge graph completion and for inferring new knowledge from unseen contexts.
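The abstract lists the pipeline stages without their internals. The Python sketch below illustrates only the idea behind latent relation generation via clustering and cluster-level relation assignment, under assumptions that are not taken from the paper: entity embeddings are already available, the relation vector of an entity pair is the offset between the tail and head embeddings, clustering uses plain k-means with a fixed k and deterministic seeding, and a person names each cluster once. All entity names, vectors, and relation labels are hypothetical toy values.

```python
from math import dist

def relation_vector(head, tail):
    # ASSUMPTION: a pair's relation embedding is the offset tail - head.
    return [t - h for h, t in zip(head, tail)]

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = updated
    return labels, centroids

# HYPOTHETICAL toy entity embeddings (2-D for readability).
entities = {
    "aspirin": [0.0, 0.0], "headache": [1.0, 0.1],
    "ibuprofen": [0.2, 0.0], "fever": [1.2, 0.1],
    "BRCA1": [5.0, 5.0], "breast cancer": [5.1, 6.0],
    "TP53": [6.0, 5.0], "lung cancer": [6.1, 6.0],
}
# HYPOTHETICAL entity pairs, assumed to co-occur in some abstract.
pairs = [("aspirin", "headache"), ("ibuprofen", "fever"),
         ("BRCA1", "breast cancer"), ("TP53", "lung cancer")]

vectors = [relation_vector(entities[h], entities[t]) for h, t in pairs]
labels, _ = kmeans(vectors, k=2)

# Minimal supervision: a person names each cluster once (HYPOTHETICAL labels).
cluster_names = {labels[0]: "treats", labels[2]: "associated_with"}
facts = [(h, cluster_names[c], t) for (h, t), c in zip(pairs, labels)]
for fact in facts:
    print(fact)
```

Because labels attach to clusters rather than to individual entity pairs, the labeling effort in this toy setting grows with the number of clusters, not with the number of extracted facts. Relation refinement and the choice of k are omitted here.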

Fig. 1
Fig. 2

Acknowledgements

This work is supported in part by New York State through the Goergen Institute for Data Science, by our corporate sponsors (Carestream Health), and by NSF awards #1704309 and #1722847.

Author information

Corresponding author

Correspondence to Jianbo Yuan.

About this article

Cite this article

Yuan, J., Jin, Z., Guo, H. et al. Constructing biomedical domain-specific knowledge graph with minimum supervision. Knowl Inf Syst 62, 317–336 (2020). https://doi.org/10.1007/s10115-019-01351-4

  • DOI: https://doi.org/10.1007/s10115-019-01351-4