DD-RDL: Drug-Disease Relation Discovery and Labeling

  • Conference paper
ICT Innovations 2021. Digital Transformation (ICT Innovations 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1521)

Abstract

Drug repurposing, the study of the effectiveness of existing drugs against new diseases, has grown in importance in recent years. One of its core methodologies is text mining, in which novel relationships between biological entities are extracted from the existing biomedical literature, whose volume has risen sharply in the last couple of years. This paper proposes an NLP approach for drug-disease relation discovery and labeling (DD-RDL), which applies a series of steps to analyze a corpus of abstracts of biomedical research papers. The proposed ML pipeline restructures the free text from a set of words into drug-disease pairs using state-of-the-art text mining methodologies and natural language processing tools. The model's output is a set of triples of the form (drug, verb, disease), where each triple describes a relationship between a drug and a disease detected in the corpus. We evaluate the model against a gold standard dataset for drug-disease relationships and demonstrate that it is possible to achieve similar results without requiring a large amount of annotated biological data or predefined semantic rules. Additionally, as an experimental case, we analyze the research papers published as part of the COVID-19 Open Research Dataset (CORD-19) to extract and identify relations between drugs and diseases related to the ongoing pandemic.
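To make the (drug, verb, disease) output format concrete, the following is a minimal sketch and not the DD-RDL implementation. It assumes that an upstream step (for example, a parser or semantic role labeler) has already split each sentence into a (subject phrase, verb, object phrase) frame, and it uses naive substring matching against two tiny lexicons. The names `Triple`, `DRUGS`, `DISEASES`, `find_term` and `extract_triples` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Triple:
    """One extracted relation in the (drug, verb, disease) form."""
    drug: str
    verb: str
    disease: str


# Hypothetical toy lexicons (assumption); a real system would rely on
# large curated drug and disease dictionaries.
DRUGS: Set[str] = {"vitamin d"}
DISEASES: Set[str] = {"osteoporosis"}


def find_term(text: str, lexicon: Set[str]) -> Optional[str]:
    """Return the first lexicon term contained in text, or None.

    Matching is a naive case-insensitive substring test, so it can
    over-match (e.g. "vitamin d" inside "vitamin deficiency").
    """
    lowered = text.lower()
    for term in sorted(lexicon):
        if term in lowered:
            return term
    return None


def extract_triples(frames: Iterable[Tuple[str, str, str]]) -> List[Triple]:
    """Turn (subject, verb, object) frames into drug-disease triples.

    The frames are assumed to come from an upstream parsing step.
    """
    triples: List[Triple] = []
    for subject, verb, obj in frames:
        # The drug may sit on either side of the verb, so try both orders.
        for left, right in ((subject, obj), (obj, subject)):
            drug = find_term(left, DRUGS)
            disease = find_term(right, DISEASES)
            if drug and disease:
                triples.append(Triple(drug, verb.lower(), disease))
                break
    return triples


if __name__ == "__main__":
    toy_frames = [
        ("Patients with osteoporosis", "receive", "vitamin D supplements"),
        ("The study", "enrolled", "200 participants"),
    ]
    for triple in extract_triples(toy_frames):
        print(triple)
    # prints: Triple(drug='vitamin d', verb='receive', disease='osteoporosis')
```

Only the first toy frame yields a triple, because the second mentions neither a drug nor a disease from the lexicons. Keeping the verb as the relation label leaves the decision about what the relation means (treatment, side effect, and so on) to a later labeling step.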

Notes

  1. https://semrep.nlm.nih.gov/GoldStandard.html.

  2. https://doi.org/10.6084/m9.figshare.6389870.v1.

  3. http://db.idrblab.net/ttd/.

  4. https://gitlab.com/jovana.dobreva16/dd-rdl-model.git.

  5. https://www.kaggle.com/arpikr/uci-drug.

  6. https://www.kaggle.com/priya1207/diseases-dataset.

  7. https://gitlab.com/jovana.dobreva16/dd-rdl-model/-/tree/master/data_storage.

  8. https://pubmed.ncbi.nlm.nih.gov/?term=%28%28Vitamin+D%29+AND+%28Osteoporosis%29%29+AND+%28receive%29.

Acknowledgement

The work in this paper was partially financed by the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje.

Author information

Corresponding author

Correspondence to Milos Jovanovik.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Dobreva, J., Jovanovik, M., Trajanov, D. (2022). DD-RDL: Drug-Disease Relation Discovery and Labeling. In: Antovski, L., Armenski, G. (eds) ICT Innovations 2021. Digital Transformation. ICT Innovations 2021. Communications in Computer and Information Science, vol 1521. Springer, Cham. https://doi.org/10.1007/978-3-031-04206-5_8

  • DOI: https://doi.org/10.1007/978-3-031-04206-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04205-8

  • Online ISBN: 978-3-031-04206-5

  • eBook Packages: Computer Science, Computer Science (R0)
