On Graph Construction for Classification of Clinical Trials Protocols Using Graph Neural Networks

  • Conference paper
Artificial Intelligence in Medicine (AIME 2022)

Abstract

A recent trend in health-related machine learning proposes the use of Graph Neural Networks (GNNs) to model biomedical data, motivated by the complexity of healthcare data and the modelling power of graph abstractions. GNNs thus emerge as a natural choice for learning from the increasing amounts of healthcare data. When formulating the problem, however, there are usually multiple design choices that can affect the final performance. In this work, we focus on Clinical Trial (CT) protocols, which are hierarchical documents containing free text as well as medical codes and terms, and design a classifier that predicts the termination risk of each CT protocol as “low” or “high”. We show that, while GNNs solve this classification task very successfully, the way the graph is constructed also matters, and one can benefit from making useful a priori information more explicit. A natural choice is to consider each CT protocol as an independent graph and pose the problem as graph classification. However, consistent performance improvements can be achieved by treating the protocols as super-nodes in one unified graph, connecting them according to metadata such as a similar medical condition or intervention, and approaching the problem as node classification instead. We validate this hypothesis experimentally on a large-scale, manually labeled CT database. These results provide useful insights into the flexibility of graph-based modeling for machine learning in the healthcare domain.
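The unified-graph construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy protocols, the field names `condition` and `intervention`, and the linking rule (an edge whenever two protocols share an exact metadata value) are all assumptions made for illustration, and the internal representation of each super-node is omitted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy protocols; IDs, field names, and values are illustrative only.
protocols = [
    {"id": "CT-A", "condition": "diabetes", "intervention": "metformin"},
    {"id": "CT-B", "condition": "diabetes", "intervention": "insulin"},
    {"id": "CT-C", "condition": "asthma", "intervention": "insulin"},
    {"id": "CT-D", "condition": "migraine", "intervention": "placebo"},
]


def build_unified_graph(protocols, keys=("condition", "intervention")):
    """Link protocols (super-nodes) that share a value for any metadata key.

    Assumed rule: exact match on a metadata value creates an undirected edge.
    Returns a set of (u, v) index pairs with u < v.
    """
    buckets = defaultdict(list)
    for idx, protocol in enumerate(protocols):
        for key in keys:
            buckets[(key, protocol[key])].append(idx)

    edges = set()
    for members in buckets.values():
        # Indices were appended in increasing order, so u < v in every pair.
        for u, v in combinations(members, 2):
            edges.add((u, v))
    return edges


def adjacency(edges, num_nodes):
    """Turn the edge set into a neighbour lookup, one entry per super-node."""
    adj = {i: set() for i in range(num_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


edges = build_unified_graph(protocols)
adj = adjacency(edges, len(protocols))
```

In this toy setting, each protocol becomes one node whose label ("low" or "high" risk) would be predicted by a node classifier, and a protocol that shares no metadata with the others simply remains an isolated node.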

Notes

  1. More technically, they should be either “permutation invariant” or “permutation equivariant” to the order of nodes.

  2. which resembles semi-supervised classification in some sense.

  3. https://ClinicalTrials.gov/AllAPIJSON.zip.

Author information

Corresponding author

Correspondence to Sohrab Ferdowsi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Ferdowsi, S. et al. (2022). On Graph Construction for Classification of Clinical Trials Protocols Using Graph Neural Networks. In: Michalowski, M., Abidi, S.S.R., Abidi, S. (eds) Artificial Intelligence in Medicine. AIME 2022. Lecture Notes in Computer Science, vol 13263. Springer, Cham. https://doi.org/10.1007/978-3-031-09342-5_24

  • DOI: https://doi.org/10.1007/978-3-031-09342-5_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09341-8

  • Online ISBN: 978-3-031-09342-5

  • eBook Packages: Computer Science, Computer Science (R0)
