Abstract
A recent trend in health-related machine learning proposes the use of Graph Neural Networks (GNNs) to model biomedical data. This is motivated by the complexity of healthcare data and the modelling power of graph abstractions, which make GNNs a natural choice for learning from increasing amounts of healthcare data. When formulating the problem, however, there are usually multiple design choices that can affect the final performance. In this work, we focus on Clinical Trial (CT) protocols, which are hierarchical documents containing free text as well as medical codes and terms, and design a classifier to predict the termination risk of each CT protocol as “low” or “high”. We show that, while GNNs solve this classification task very successfully, the way the graph is constructed also matters, and one can benefit from making useful a priori information more explicit. A natural choice is to consider each CT protocol as an independent graph and pose the problem as graph classification. However, consistent performance improvements can be achieved by considering the protocols as super-nodes in one unified graph, connecting them according to metadata such as similar medical condition or intervention, and approaching the problem as a node classification task instead. We validate this hypothesis experimentally on a large-scale manually labeled CT database. This provides useful insights into the flexibility of graph-based modeling for machine learning in the healthcare domain.
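The unified-graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the protocol identifiers, the metadata fields, and the rule "connect two protocols if they share at least one identical condition or intervention" are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy protocols; the IDs and metadata values are invented.
protocols = {
    "CT-A": {"conditions": {"diabetes"}, "interventions": {"metformin"}},
    "CT-B": {"conditions": {"diabetes"}, "interventions": {"insulin"}},
    "CT-C": {"conditions": {"asthma"}, "interventions": {"insulin"}},
    "CT-D": {"conditions": {"migraine"}, "interventions": {"placebo"}},
}


def build_unified_graph(protocols, fields=("conditions", "interventions")):
    """Return undirected edges between protocols sharing a metadata value.

    Each protocol acts as one super-node. The sharing rule is an assumed
    stand-in for "similar medical condition or intervention".
    """
    # Inverted index: (field, value) -> set of protocol IDs carrying it.
    index = defaultdict(set)
    for pid, meta in protocols.items():
        for field in fields:
            for value in meta.get(field, ()):
                index[(field, value)].add(pid)

    # Link every pair of protocols that appears under the same key.
    edges = set()
    for members in index.values():
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges


edges = build_unified_graph(protocols)
nodes = sorted(protocols)  # one super-node per protocol, each to be labeled
```

In this toy setting, CT-A and CT-B are linked through a shared condition, CT-B and CT-C through a shared intervention, and CT-D stays isolated. A node classifier would then predict "low" or "high" risk per super-node, whereas the graph classification formulation would treat each protocol as its own separate graph.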
Notes
1. More technically, they should be either “permutation invariant” or “permutation equivariant” to the order of nodes.
2. which resembles semi-supervised classification in some sense.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ferdowsi, S. et al. (2022). On Graph Construction for Classification of Clinical Trials Protocols Using Graph Neural Networks. In: Michalowski, M., Abidi, S.S.R., Abidi, S. (eds) Artificial Intelligence in Medicine. AIME 2022. Lecture Notes in Computer Science, vol 13263. Springer, Cham. https://doi.org/10.1007/978-3-031-09342-5_24
DOI: https://doi.org/10.1007/978-3-031-09342-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09341-8
Online ISBN: 978-3-031-09342-5
eBook Packages: Computer Science (R0)