
A deep neural network for operator learning enhanced by attention and gating mechanisms for long-time forecasting of tumor growth

  • Original Article
  • Published in Engineering with Computers

Abstract

Forecasting tumor progression and assessing the uncertainty of predictions play a crucial role in clinical settings, especially for determining disease outlook and making informed decisions about treatment approaches. In this work, we propose TGM-ONets, a computational framework based on physics-informed deep operator networks (PI-DeepONets) that combines bioimaging and tumor growth modeling (TGM) for enhanced prediction of tumor growth. Deep neural operators have recently emerged as a powerful tool for learning solution maps between function spaces, and once trained they generalize to make predictions for unseen input instances. Incorporating physical laws into the loss function of a deep neural operator can significantly reduce the amount of training data required. The novelty of TGM-ONets lies in the use of a convolutional block attention module (CBAM) and a gating mechanism, i.e., a mixture of experts (MoE), to extract features from the input images. Our results show that TGM-ONets not only captures the detailed morphological characteristics of mild and aggressive tumors within and outside the training domain but also predicts the long-term dynamics of both mild and aggressive tumor growth for up to 6 months, with a maximum error below 6.7 \(\times 10^{-2}\) for unseen input instances when two or three snapshots are added. We also systematically study the effects of the number of training snapshots and of noisy data on the performance of TGM-ONets, and we quantify the uncertainty of the model predictions. We demonstrate the efficiency and accuracy of the framework by comparing its performance with three state-of-the-art (SOTA) baseline models.
In summary, we propose a new deep learning model that integrates the TGM with sequential observations of tumor morphology to improve current approaches for predicting tumor growth, thus providing an advanced computational tool for patient-specific tumor prognosis.
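The core mechanics named in the abstract, a DeepONet-style branch–trunk combination and a mixture-of-experts gate in the feature extractor, can be illustrated compactly. The following is a minimal NumPy sketch, not the authors' TGM-ONets architecture: all layer sizes, weight matrices, and function names are illustrative assumptions, and the real model uses trained convolutional branch networks with CBAM rather than random linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: m sensor points sampling the input function, p latent basis terms.
m, p = 50, 20

# Branch net (sketch): encodes the input function u sampled at m sensors into p coefficients.
W_b = rng.standard_normal((p, m)) / np.sqrt(m)
# Trunk net (sketch): encodes a 2D space-time query coordinate y into p basis values.
W_t = rng.standard_normal((p, 2)) / np.sqrt(2)

def relu(x):
    return np.maximum(x, 0.0)

def deeponet(u_sensors, y):
    """DeepONet evaluation: G(u)(y) ~ sum_k b_k(u) * t_k(y)."""
    b = relu(W_b @ u_sensors)  # branch coefficients b_k(u)
    t = relu(W_t @ y)          # trunk basis functions t_k(y)
    return float(b @ t)        # inner product gives the operator output at y

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# MoE gate (sketch): two branch "experts" blended by input-dependent gate weights.
W_g = rng.standard_normal((2, m)) / np.sqrt(m)
W_b2 = rng.standard_normal((p, m)) / np.sqrt(m)

def moe_branch(u_sensors):
    g = softmax(W_g @ u_sensors)  # gate decides how much each expert contributes
    return g[0] * relu(W_b @ u_sensors) + g[1] * relu(W_b2 @ u_sensors)

u = np.sin(np.linspace(0, np.pi, m))  # one input-function snapshot
y = np.array([0.3, 0.7])              # a space-time query point
pred = deeponet(u, y)
```

In the physics-informed variant, the residual of the governing tumor-growth PDE evaluated at collocation points is added to the training loss, which is what lets the operator be trained with few labeled snapshots.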




Data availability statement

The data supporting this study's findings are available from the corresponding author upon reasonable request.

References

  1. Lorenzo G, Heiselman J S, Liss M A, Miga M I, Gomez H, Yankeelov T E, Reali A, Hughes T J. Patient-specific computational forecasting of prostate cancer growth during active surveillance using an imaging-informed biomechanistic model, arXiv preprint arXiv:2310.00060

  2. Xu J, Wang Y, Gomez H, Feng X-Q. Biomechanical modelling of tumor growth with chemotherapeutic treatment: A review, Smart Materials and Structures https://doi.org/10.1088/1361-665X/acf79a

  3. Lorenzo G, Ahmed S R, Hormuth II D A, Vaughn B, Kalpathy-Cramer J, Solorio L, Yankeelov T E, Gomez H. Patient-specific, mechanistic models of tumor growth incorporating artificial intelligence and big data, arXiv preprint arXiv:2308.14925

  4. Yankeelov TE, Atuegwu N, Hormuth D, Weis JA, Barnes SL, Miga MI, Rericha EC, Quaranta V (2013) Clinically relevant modeling of tumor growth and treatment response. Science Translational Medicine 5(187):187ps9-187ps9. https://doi.org/10.1126/scitranslmed.3005686

    Article  MATH  Google Scholar 

  5. Lorenzo G, Scott MA, Tew K, Hughes TJ, Zhang YJ, Liu L, Vilanova G, Gomez H (2016) Tissue-scale, personalized modeling and simulation of prostate cancer growth. Proc Natl Acad Sci 113(48):E7663–E7671. https://doi.org/10.1073/pnas.1615791113

    Article  MATH  Google Scholar 

  6. Lorenzo G, Scott M, Tew K, Hughes T, Gomez H (2017) Hierarchically refined and coarsened splines for moving interface problems, with particular application to phase-field models of prostate tumor growth. Comput Methods Appl Mech Eng 319:515–548. https://doi.org/10.1016/j.cma.2017.03.009

    Article  MathSciNet  MATH  Google Scholar 

  7. Lorenzo G, Hughes TJ, Dominguez-Frojan P, Reali A, Gomez H (2019) Computer simulations suggest that prostate enlargement due to benign prostatic hyperplasia mechanically impedes prostate cancer growth. Proc Natl Acad Sci 116(4):1152–1161. https://doi.org/10.1073/pnas.1815735116

    Article  MATH  Google Scholar 

  8. Colli P, Gomez H, Lorenzo G, Marinoschi G, Reali A, Rocca E (2020) Mathematical analysis and simulation study of a phase-field model of prostate cancer growth with chemotherapy and antiangiogenic therapy effects. Math Models Methods Appl Sci 30(07):1253–1295. https://doi.org/10.1142/S0218202520500220

    Article  MathSciNet  MATH  Google Scholar 

  9. Benítez JM, García-Mozos L, Santos A, Montáns FJ, Saucedo-Mora L (2022) A simple agent-based model to simulate 3D tumor-induced angiogenesis considering the evolution of the hypoxic conditions of the cells. Engineering with Computers 38(5):4115–4133. https://doi.org/10.1007/s00366-022-01625-6

    Article  Google Scholar 

  10. Feng Y, Fuentes D, Hawkins A, Bass J, Rylander MN, Elliott A, Shetty A, Stafford RJ, Oden JT (2009) Nanoshell-mediated laser surgery simulation for prostate cancer treatment. Engineering with Computers 25:3–13. https://doi.org/10.1007/s00366-008-0109-y

    Article  MATH  Google Scholar 

  11. Srinivasan A, Moure A, Gomez H (2023) Computational modeling of flow-mediated angiogenesis: Stokes–Darcy flow on a growing vessel network, Engineering with Computers 1–19 https://doi.org/10.1007/s00366-023-01889-6

  12. Lagergren JH, Nardini JT, Baker RE, Simpson MJ, Flores KB (2020) Biologically-informed neural networks guide mechanistic modeling from sparse experimental data. PLoS Comput Biol 16(12):e1008462. https://doi.org/10.1371/journal.pcbi.1008462

    Article  MATH  Google Scholar 

  13. Oden JT, Lima EA, Almeida RC, Feng Y, Rylander MN, Fuentes D, Faghihi D, Rahman MM, DeWitt M, Gadde M et al (2016) Toward predictive multiscale modeling of vascular tumor growth. Archives of Computational Methods in Engineering 23(4):735–779. https://doi.org/10.1007/s11831-015-9156-x

    Article  MathSciNet  MATH  Google Scholar 

  14. Fritz M, Jha PK, Köppl T, Oden JT, Wagner A, Wohlmuth B (2021) Modeling and simulation of vascular tumors embedded in evolving capillary networks. Comput Methods Appl Mech Eng 384:113975. https://doi.org/10.1016/j.cma.2021.113975

    Article  MathSciNet  MATH  Google Scholar 

  15. Wise SM, Lowengrub JS, Frieboes HB, Cristini V (2008) Three-dimensional multispecies nonlinear tumor growth-I: model and numerical method. J Theor Biol 253(3):524–543. https://doi.org/10.1016/j.jtbi.2008.03.027

    Article  MathSciNet  MATH  Google Scholar 

  16. Frieboes HB, Jin F, Chuang Y-L, Wise SM, Lowengrub JS, Cristini V (2010) Three-dimensional multispecies nonlinear tumor growth-II: tumor invasion and angiogenesis. J Theor Biol 264(4):1254–1278. https://doi.org/10.1016/j.jtbi.2010.02.036

    Article  MathSciNet  MATH  Google Scholar 

  17. Macklin P, McDougall S, Anderson AR, Chaplain MA, Cristini V, Lowengrub J (2009) Multiscale modelling and nonlinear simulation of vascular tumour growth. J Math Biol 58(4):765–798. https://doi.org/10.1007/s00285-008-0216-9

    Article  MathSciNet  MATH  Google Scholar 

  18. Anderson AR, Quaranta V (2008) Integrative mathematical oncology. Nat Rev Cancer 8(3):227–234. https://doi.org/10.1038/nrc2329

    Article  MATH  Google Scholar 

  19. Cristini V, Lowengrub J (2010) Multiscale modeling of cancer: An integrated experimental and mathematical modeling approach. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  20. Oden JT (2018) Adaptive multiscale predictive modelling. Acta Numer 27:353–450. https://doi.org/10.1017/S096249291800003X

    Article  MathSciNet  MATH  Google Scholar 

  21. Rahman MM, Feng Y, Yankeelov TE, Oden JT (2017) A fully coupled space-time multiscale modeling framework for predicting tumor growth. Comput Methods Appl Mech Eng 320:261–286. https://doi.org/10.1016/j.cma.2017.03.021

    Article  MathSciNet  MATH  Google Scholar 

  22. Rocha H, Almeida R, Lima E, Resende A, Oden J, Yankeelov T (2018) A hybrid three-scale model of tumor growth. Math Models Methods Appl Sci 28(01):61–93. https://doi.org/10.1142/S0218202518500021

    Article  MathSciNet  MATH  Google Scholar 

  23. Lima E, Oden J, Almeida R (2014) A hybrid ten-species phase-field model of tumor growth. Math Models Methods Appl Sci 24(13):2569–2599. https://doi.org/10.1142/S0218202514500304

    Article  MathSciNet  MATH  Google Scholar 

  24. Shen D, Wu G, Suk H-I (2017) Deep learning in medical image analysis. Annu Rev Biomed Eng 19:221–248. https://doi.org/10.1146/annurev-bioeng-071516-044442

    Article  MATH  Google Scholar 

  25. Haque IRI, Neubert J (2020) Deep learning approaches to biomedical image segmentation. Informatics in Medicine Unlocked 18:100297. https://doi.org/10.1016/j.imu.2020.100297

    Article  Google Scholar 

  26. Zhang Q, Sampani K, Xu M, Cai S, Deng Y, Li H, Sun JK, Karniadakis GE (2022) AOSLO-net: a deep learning-based method for automatic segmentation of retinal microaneurysms from adaptive optics scanning laser ophthalmoscopy images. Translational Vision Science & Technology 11(8):7–7. https://doi.org/10.1167/tvst.11.8.7

    Article  Google Scholar 

  27. Pereira SP, Oldfield L, Ney A, Hart PA, Keane MG, Pandol SJ, Li D, Greenhalf W, Jeon CY, Koay EJ et al (2020) Early detection of pancreatic cancer. The Lancet Gastroenterology & Hepatology 5(7):698–710. https://doi.org/10.1016/S2468-1253(19)30416-9

    Article  Google Scholar 

  28. Giampaolo F, De Rosa M, Qi P, Izzo S, Cuomo S (2022) Physics-informed neural networks approach for 1D and 2D Gray-Scott systems. Advanced Modeling and Simulation in Engineering Sciences 9(1):1–17. https://doi.org/10.1186/s40323-022-00219-7

    Article  MATH  Google Scholar 

  29. Weng Y, Zhou D (2022) Multiscale physics-informed neural networks for stiff chemical kinetics. J Phys Chem A 126(45):8534–8543. https://doi.org/10.1021/acs.jpca.2c06513

    Article  MATH  Google Scholar 

  30. Colin T, Iollo A, Lagaert J-B, Saut O (2014) An inverse problem for the recovery of the vascularization of a tumor. Journal of Inverse and Ill-posed Problems 22(6):759–786. https://doi.org/10.1515/jip-2013-0009

    Article  MathSciNet  MATH  Google Scholar 

  31. Feng X, Hormuth DA, Yankeelov TE (2019) An adjoint-based method for a linear mechanically-coupled tumor model: Application to estimate the spatial variation of murine glioma growth based on diffusion weighted magnetic resonance imaging. Comput Mech 63:159–180. https://doi.org/10.1007/s00466-018-1589-2

    Article  MathSciNet  MATH  Google Scholar 

  32. Gholami A, Mang A, Biros G (2016) An inverse problem formulation for parameter estimation of a reaction-diffusion model of low grade gliomas. J Math Biol 72(1):409–433. https://doi.org/10.1007/s00285-015-0888-x

    Article  MathSciNet  MATH  Google Scholar 

  33. Hogea C, Davatzikos C, Biros G (2008) An image-driven parameter estimation problem for a reaction-diffusion glioma growth model with mass effects. J Math Biol 56(6):793–825. https://doi.org/10.1007/s00285-007-0139-x

    Article  MathSciNet  MATH  Google Scholar 

  34. Knopoff DA, Fernández DR, Torres GA, Turner CV (2013) Adjoint method for a tumor growth pde-constrained optimization problem. Computers & Mathematics with Applications 66(6):1104–1119. https://doi.org/10.1016/j.camwa.2013.05.028

    Article  MathSciNet  MATH  Google Scholar 

  35. Subramanian S, Scheufele K, Mehl M, Biros G (2020) Where did the tumor start? An inverse solver with sparse localization for tumor growth models. Inverse Prob 36(4):045006. https://doi.org/10.1088/1361-6420/ab649c

    Article  MathSciNet  MATH  Google Scholar 

  36. Chen X, Summers RM, Yao J (2012) Kidney tumor growth prediction by coupling reaction-diffusion and biomechanical model. IEEE Trans Biomed Eng 60(1):169–173

    Article  MATH  Google Scholar 

  37. Konukoglu E, Clatz O, Menze BH, Stieltjes B, Weber M-A, Mandonnet E, Delingette H, Ayache N (2009) Image guided personalization of reaction-diffusion type tumor growth models using modified anisotropic eikonal equations. IEEE Trans Med Imaging 29(1):77–95

    Article  MATH  Google Scholar 

  38. Mi H, Petitjean C, Dubray B, Vera P, Ruan S (2014) Prediction of lung tumor evolution during radiotherapy in individual patients with PET. IEEE Trans Med Imaging 33(4):995–1003

    Article  MATH  Google Scholar 

  39. Wong KC, Summers RM, Kebebew E, Yao J (2016) Pancreatic tumor growth prediction with elastic-growth decomposition, image-derived motion, and FDM-FEM coupling. IEEE Trans Med Imaging 36(1):111–123

    Article  Google Scholar 

  40. Hormuth DA II, Weis JA, Barnes SL, Miga MI, Rericha EC, Quaranta V, Yankeelov TE (2015) Predicting in vivo glioma growth with the reaction diffusion equation constrained by quantitative magnetic resonance imaging data. Phys Biol 12(4):046006. https://doi.org/10.1088/1478-3975/12/4/046006

    Article  MATH  Google Scholar 

  41. Scheufele K, Mang A, Gholami A, Davatzikos C, Biros G, Mehl M (2019) Coupling brain-tumor biophysical models and diffeomorphic image registration. Comput Methods Appl Mech Eng 347:533–567. https://doi.org/10.1016/j.cma.2018.12.008

    Article  MathSciNet  MATH  Google Scholar 

  42. Raissi M (2018) Deep hidden physics models: Deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research 19(1):932–955

    MathSciNet  MATH  Google Scholar 

  43. Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707. https://doi.org/10.1016/j.jcp.2018.10.045

    Article  MathSciNet  MATH  Google Scholar 

  44. Li S, Wang G, Di Y, Wang L, Wang H, Zhou Q (2023) A physics-informed neural network framework to predict 3D temperature field without labeled data in process of laser metal deposition. Eng Appl Artif Intell 120:105908. https://doi.org/10.1016/j.engappai.2023.105908

    Article  MATH  Google Scholar 

  45. Cai S, Li H, Zheng F, Kong F, Dao M, Karniadakis GE, Suresh S (2021) Artificial intelligence velocimetry and microaneurysm-on-a-chip for three-dimensional analysis of blood flow in physiology and disease. Proc Natl Acad Sci 118(13):e2100697118. https://doi.org/10.1073/pnas.2100697118

    Article  Google Scholar 

  46. Kissas G, Yang Y, Hwuang E, Witschey WR, Detre JA, Perdikaris P (2020) Machine learning in cardiovascular flows modeling: Predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks. Comput Methods Appl Mech Eng 358:112623. https://doi.org/10.1016/j.cma.2019.112623

    Article  MathSciNet  MATH  Google Scholar 

  47. Sahli Costabal F, Yang Y, Perdikaris P, Hurtado DE, Kuhl E (2020) Physics-informed neural networks for cardiac activation mapping. Frontiers in Physics 8:42. https://doi.org/10.3389/fphy.2020.00042

    Article  MATH  Google Scholar 

  48. Lei J, Liu Q, Wang X (2022) Physics-informed multi-fidelity learning-driven imaging method for electrical capacitance tomography. Eng Appl Artif Intell 116:105467. https://doi.org/10.1016/j.engappai.2022.105467

    Article  MATH  Google Scholar 

  49. Ouyang H, Zhu Z, Chen K, Tian B, Huang B, Hao J (2023) Reconstruction of hydrofoil cavitation flow based on the chain-style physics-informed neural network. Eng Appl Artif Intell 119:105724. https://doi.org/10.1016/j.engappai.2022.105724

    Article  MATH  Google Scholar 

  50. Nguyen TNK, Dairay T, Meunier R, Mougeot M (2022) Physics-informed neural networks for non-Newtonian fluid thermo-mechanical problems: An application to rubber calendering process. Eng Appl Artif Intell 114:105176. https://doi.org/10.1016/j.engappai.2022.105176

    Article  Google Scholar 

  51. Ren P, Rao C, Sun H, Liu Y. SeismicNet: Physics-informed neural networks for seismic wave modeling in semi-infinite domain, arXiv preprint arXiv:2210.14044

  52. Lorenzo G, Hormuth DA II, Jarrett AM, Lima EA, Subramanian S, Biros G, Oden JT, Hughes TJ, Yankeelov TE (2022) Quantitative in vivo imaging to enable tumour forecasting and treatment optimization. In: Cancer Complexity (ed) Computation. New York, Springer, pp 55–97

  53. Zhang E, Dao M, Karniadakis GE, Suresh S (2022) Analyses of internal structures and defects in materials using physics-informed neural networks. Sci Adv 8(7):eabk0644. https://doi.org/10.1126/sciadv.abk0644

    Article  MATH  Google Scholar 

  54. Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L (2021) Physics-informed machine learning. Nature Reviews Physics 3(6):422–440. https://doi.org/10.1038/s42254-021-00314-5

    Article  MATH  Google Scholar 

  55. Cai S, Mao Z, Wang Z, Yin M, Karniadakis G E (2022) Physics-informed neural networks (PINNs) for fluid mechanics: A review, Acta Mechanica Sinica 1–12 https://doi.org/10.1007/s10409-021-01148-1

  56. Jagtap AD, Kharazmi E, Karniadakis GE (2020) Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Comput Methods Appl Mech Eng 365:113028. https://doi.org/10.1016/j.cma.2020.113028

    Article  MathSciNet  MATH  Google Scholar 

  57. Yang L, Meng X, Karniadakis GE (2021) B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J Comput Phys 425:109913. https://doi.org/10.1016/j.jcp.2020.109913

    Article  MathSciNet  MATH  Google Scholar 

  58. Du P, Zhu X, Wang J-X (2022) Deep learning-based surrogate model for three-dimensional patient-specific computational fluid dynamics. Phys Fluids 34(8):081906. https://doi.org/10.1063/5.0101128

    Article  MATH  Google Scholar 

  59. Chen Q, Ye Q, Zhang W, Li H, Zheng X (2023) TGM-Nets: A deep learning framework for enhanced forecasting of tumor growth by integrating imaging and modeling. Eng Appl Artif Intell 126:106867. https://doi.org/10.1016/j.engappai.2023.106867

    Article  MATH  Google Scholar 

  60. Ruiz Herrera C, Grandits T, Plank G, Perdikaris P, Sahli Costabal F, Pezzuto S (2022) Physics-informed neural networks to learn cardiac fiber orientation from multiple electroanatomical maps, Engineering with Computers 38(5), 3957–3973. https://doi.org/10.1007/s00366-022-01709-3

  61. Tajdari M, Tajdari F, Shirzadian P, Pawar A, Wardak M, Saha S, Park C, Huysmans T, Song Y, Zhang YJ et al (2022) Next-generation prognosis framework for pediatric spinal deformities using bio-informed deep learning networks. Engineering with Computers 38(5):4061–4084. https://doi.org/10.1007/s00366-022-01742-2

    Article  MATH  Google Scholar 

  62. Lee SY, Park C-S, Park K, Lee HJ, Lee S (2023) A physics-informed and data-driven deep learning approach for wave propagation and its scattering characteristics. Engineering with Computers 39(4):2609–2625. https://doi.org/10.1007/s00366-022-01640-7

    Article  MATH  Google Scholar 

  63. Fallah A, Aghdam M M (2023) Physics-informed neural network for bending and free vibration analysis of three-dimensional functionally graded porous beam resting on elastic foundation, Engineering with Computers 1–18 https://doi.org/10.1007/s00366-023-01799-7

  64. Mai H T, Mai D D, Kang J, Lee J, Lee J (2023) Physics-informed neural energy-force network: a unified solver-free numerical simulation for structural optimization, Engineering with Computers 1–24 https://doi.org/10.1007/s00366-022-01760-0

  65. Wang S, Wang H, Perdikaris P (2021) Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Sci Adv 7(40):eabi8605. https://doi.org/10.1126/sciadv.abi8605

    Article  MATH  Google Scholar 

  66. Koric S, Viswantah A, Abueidda D W, Sobh N A, Khan K (2023) Deep learning operator network for plastic deformation with variable loads and material properties, Engineering with Computers 1–13 https://doi.org/10.1007/s00366-023-01822-x

  67. Linka K, Schäfer A, Meng X, Zou Z, Karniadakis GE, Kuhl E (2022) Bayesian physics informed neural networks for real-world nonlinear dynamical systems. Comput Methods Appl Mech Eng 402:115346. https://doi.org/10.1016/j.cma.2022.115346

    Article  MathSciNet  MATH  Google Scholar 

  68. Zakir Ullah M, Zheng Y, Song J, Aslam S, Xu C, Kiazolu GD, Wang L (2021) An attention-based convolutional neural network for acute lymphoblastic leukemia classification. Appl Sci 11(22):10662. https://doi.org/10.3390/app112210662

    Article  Google Scholar 

  69. Yin W, Schütze H, Xiang B, Zhou B (2016) Abcnn: Attention-based convolutional neural network for modeling sentence pairs. Transactions of the Association for computational linguistics 4:259–272. https://doi.org/10.1162/tacl_a_00097

    Article  Google Scholar 

  70. Ling H, Wu J, Huang J, Chen J, Li P (2020) Attention-based convolutional neural network for deep face recognition. Multimedia Tools and Applications 79:5595–5616. https://doi.org/10.1007/s11042-019-08422-2

    Article  MATH  Google Scholar 

  71. Shen Y, Huang X-J (2016) Attention-based convolutional neural network for semantic relation extraction, in: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 2526–2536

  72. Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3(1):79–87

    Article  MATH  Google Scholar 

  73. Wang S, Perdikaris P (2023) Long-time integration of parametric evolution equations with physics-informed deeponets. J Comput Phys 475:111855. https://doi.org/10.1016/j.jcp.2022.111855

    Article  MathSciNet  MATH  Google Scholar 

  74. Michałowska K, Goswami S, Karniadakis G E, Riemer-Sørensen S. Neural operator learning for long-time integration in dynamical systems with recurrent neural networks, arXiv preprint arXiv:2303.02243

  75. Zhu M, Zhang H, Jiao A, Karniadakis GE, Lu L (2023) Reliable extrapolation of deep neural operators informed by physics or sparse observations. Comput Methods Appl Mech Eng 412:116064. https://doi.org/10.1016/j.cma.2023.116064

    Article  MathSciNet  MATH  Google Scholar 

  76. Osband I, Aslanides J, Cassirer A. Randomized prior functions for deep reinforcement learning, Advances in Neural Information Processing Systems 31

  77. Xu J, Vilanova G, Gomez H (2016) A mathematical model coupling tumor growth and angiogenesis. PLoS ONE 11(2):e0149422. https://doi.org/10.1371/journal.pone.0149422

    Article  MATH  Google Scholar 

  78. Xu S, Xu Z, Kim OV, Litvinov RI, Weisel JW, Alber M (2017) Model predictions of deformation, embolization and permeability of partially obstructive blood clots under variable shear flow. J R Soc Interface 14(136):20170441. https://doi.org/10.1098/rsif.2017.0441

    Article  MATH  Google Scholar 

  79. Xu J, Vilanova G, Gomez H (2020) Phase-field model of vascular tumor growth: Three-dimensional geometry of the vascular network and integration with imaging data. Comput Methods Appl Mech Eng 359:112648. https://doi.org/10.1016/j.cma.2019.112648

    Article  MathSciNet  MATH  Google Scholar 

  80. Kobayashi R (2010) A brief introduction to phase field method, in: AIP Conference Proceedings, Vol. 1270, American Institute of Physics, 282–291. https://doi.org/10.1063/1.3476232

  81. Lu L, Jin P, Pang G, Zhang Z, Karniadakis GE (2021) Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3(3):218–229

    Article  MATH  Google Scholar 

  82. Chen T, Chen H (1995) Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans Neural Networks 6(4):911–917

    Article  MATH  Google Scholar 

  83. Deng B, Shin Y, Lu L, Zhang Z, Karniadakis GE (2022) Approximation rates of DeepONets for learning operators arising from advection-diffusion equations. Neural Netw 153:411–426. https://doi.org/10.1016/j.neunet.2022.06.019

    Article  MATH  Google Scholar 

  84. Lu L, Jin P, Karniadakis G E. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators, arXiv preprint arXiv:1910.03193

  85. Lu L, Meng X, Cai S, Mao Z, Goswami S, Zhang Z, Karniadakis GE (2022) A comprehensive and fair comparison of two neural operators (with practical extensions) based on fair data. Comput Methods Appl Mech Eng 393:114778. https://doi.org/10.1016/j.cma.2022.114778

    Article  MathSciNet  MATH  Google Scholar 

  86. He J, Kushwaha S, Park J, Koric S, Abueidda D, Jasiuk I (2024) Sequential Deep Operator networks (S-DeepONet) for predicting full-field solutions under time-dependent loads. Eng Appl Artif Intell 127:107258. https://doi.org/10.1016/j.engappai.2023.107258

    Article  MATH  Google Scholar 

  87. Sun Y, Moya C, Lin G, Yue M, Deepgraphonet: A deep graph operator network to learn and zero-shot transfer the dynamic response of networked systems, IEEE Systems Journal

  88. Goswami S, Yin M, Yu Y, Karniadakis GE (2022) A physics-informed variational deeponet for predicting crack path in quasi-brittle materials. Comput Methods Appl Mech Eng 391:114587. https://doi.org/10.1016/j.cma.2022.114587

    Article  MathSciNet  MATH  Google Scholar 

  89. Goswami S, Bora A, Yu Y, E G (2023) Karniadakis, Physics-informed deep neural operator networks, in: Machine Learning in Modeling and Simulation: Methods and Applications, Springer, New York, pp. 219–254

  90. Koric S, Abueidda DW (2023) Data-driven and physics-informed deep learning operators for solution of heat conduction equation with parametric heat source. Int J Heat Mass Transf 203:123809. https://doi.org/10.1016/j.ijheatmasstransfer.2022.123809

    Article  MATH  Google Scholar 

  91. Hao Y, Di Leoni PC, Marxen O, Meneveau C, Karniadakis GE, Zaki TA (2023) Instability-wave prediction in hypersonic boundary layers with physics-informed neural operators. Journal of Computational Science 73:102120. https://doi.org/10.1016/j.jocs.2023.102120

    Article  MATH  Google Scholar 

  92. Iqbal S, Ghani MU, Saba T, Rehman A (2018) Brain tumor segmentation in multi-spectral MRI using convolutional neural networks (CNN). Microsc Res Tech 81(4):419–427. https://doi.org/10.1002/jemt.22994

    Article  Google Scholar 

  93. Chen L, Wu Y, DSouza A M, Abidin A Z, Wismüller A, Xu C (2018) MRI tumor segmentation with densely connected 3D CNN, in: Medical Imaging 2018: Image Processing, Vol. 10574, SPIE, pp. 357–364. https://doi.org/10.1117/12.2293394

  94. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 35(5):1240–1251

    Article  MATH  Google Scholar 

  95. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, Pal C, Jodoin P-M, Larochelle H (2017) Brain tumor segmentation with deep neural networks. Med Image Anal 35:18–31. https://doi.org/10.1016/j.media.2016.05.004

    Article  Google Scholar 

  96. Havaei M, Dutil F, Pal C, Larochelle H, Jodoin P-M (2016) A convolutional neural network approach to brain tumor segmentation, in: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: First International Workshop, Brainles 2015, Held in Conjunction with MICCAI 2015, Munich, Germany, October 5, 2015, Revised Selected Papers 1, Springer, pp. 195–208. https://doi.org/10.1007/978-3-319-30858-6_17

  97. Woo S, Park J, Lee J-Y, Kweon I S (2018) CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19

  98. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: International Conference on Machine Learning, PMLR, pp. 448–456

  99. Zhou Y, Li D, Huo S, Kung S-Y (2021) Shape autotuning activation function. Expert Syst Appl 171:114534. https://doi.org/10.1016/j.eswa.2020.114534

    Article  Google Scholar 

  100. Wang S, Wang H, Perdikaris P (2022) Improved architectures and training algorithms for deep operator networks. J Sci Comput 92(2):35. https://doi.org/10.1007/s10915-022-01881-0

    Article  MathSciNet  MATH  Google Scholar 

  101. Waterhouse S, Cook G, Ensemble methods for phoneme classification, Advances in Neural Information Processing Systems 9

  102. Nguyen MH, Abbass HA, Mckay RI (2006) A novel mixture of experts model based on cooperative coevolution. Neurocomputing 70(1–3):155–163. https://doi.org/10.1016/j.neucom.2006.04.009

    Article  MATH  Google Scholar 

  103. Ebrahimpour R, Kabir E, Yousefi MR (2007) Face detection using mixture of MLP experts. Neural Process Lett 26:69–82. https://doi.org/10.1007/s11063-007-9043-z

    Article  MATH  Google Scholar 

  104. Übeyli ED, Ilbay K, Ilbay G, Sahin D, Akansel G (2010) Differentiation of two subtypes of adult hydrocephalus by mixture of experts. J Med Syst 34:281–290. https://doi.org/10.1007/s10916-008-9239-4

    Article  Google Scholar 

  105. Ebrahimpour R, Nikoo H, Masoudnia S, Yousefi MR, Ghaemi MS (2011) Mixture of MLP-experts for trend forecasting of time series: A case study of the tehran stock exchange. Int J Forecast 27(3):804–816. https://doi.org/10.1016/j.ijforecast.2010.02.015

    Article  MATH  Google Scholar 

  106. Kingma D P, Ba J, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980

  107. Raissi M, Yazdani A, Karniadakis GE (2020) Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 367(6481):1026–1030. https://doi.org/10.1126/science.aaw4741

    Article  MathSciNet  MATH  Google Scholar 

  108. Yin M, Zheng X, Humphrey JD, Karniadakis GE (2021) Non-invasive inference of thrombus material properties with physics-informed neural networks. Comput Methods Appl Mech Eng 375:113603. https://doi.org/10.1016/j.cma.2020.113603

    Article  MathSciNet  MATH  Google Scholar 

  109. Shi X, Chen Z, Wang H, Yeung D-Y, Wong W-K, Woo W-c, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, Advances in Neural Information Processing Systems 28

  110. Kirby R M, Karniadakis G E, Spectral element and hp methods, Encyclopedia of Computational Mechanics

  111. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144(5):646–674. https://doi.org/10.1016/j.cell.2011.02.013

    Article  MATH  Google Scholar 

  112. Lu L, Dao M, Kumar P, Ramamurty U, Karniadakis GE, Suresh S (2020) Extraction of mechanical properties of materials through deep learning from instrumented indentation. Proc Natl Acad Sci 117(13):7052–7062. https://doi.org/10.1073/pnas.1922210117

    Article  Google Scholar 

  113. Sanga S, Sinek JP, Frieboes HB, Ferrari M, Fruehauf JP, Cristini V (2006) Mathematical modeling of cancer progression and response to chemotherapy. Expert Rev Anticancer Ther 6(10):1361–1376. https://doi.org/10.1586/14737140.6.10.1361

  114. Ayensa-Jiménez J, Doweidar MH, Sanz-Herrera JA, Doblare M (2022) Understanding glioblastoma invasion using physically-guided neural networks with internal variables. PLoS Comput Biol 18(4):e1010019. https://doi.org/10.1371/journal.pcbi.1010019

  115. Gao Q, Lin H, Qian J, Liu X, Cai S, Li H, Fan H, Zheng Z (2023) A deep learning model for efficient end-to-end stratification of thrombotic risk in left atrial appendage. Eng Appl Artif Intell 126:107187. https://doi.org/10.1016/j.engappai.2023.107187

  116. Qi C R, Su H, Mo K, Guibas L J (2017) Pointnet: Deep learning on point sets for 3D classification and segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 652–660

  117. Garcia-Garcia A, Gomez-Donoso F, Garcia-Rodriguez J, Orts-Escolano S, Cazorla M, Azorin-Lopez J (2016) PointNet: a 3D convolutional neural network for real-time object class recognition. In: 2016 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1578–1584

  118. Aoki Y, Goforth H, Srivatsan R A, Lucey S (2019) Pointnetlk: Robust & efficient point cloud registration using pointnet, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7163–7172

Acknowledgements

Q.C. and X.Z. gratefully acknowledge support from the starting fund of Jinan University, Guangzhou, Guangdong Province, China.

Author information

Contributions

Qijing Chen: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Writing-original draft (equal); Writing-review & editing (equal). He Li: Conceptualization (equal); Writing-original draft (equal); Writing-review & editing (equal). Xiaoning Zheng: Conceptualization (equal); Funding acquisition (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Supervision (equal); Writing-original draft (equal); Writing-review & editing (equal).

Corresponding author

Correspondence to Xiaoning Zheng.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Convergence of spectral/hp element (Nektar) results for TGMs

We give a brief introduction to the spectral/hp element method, which we use to solve the PDEs for tumor growth and generate the synthetic data; more details can be found in [110]. We first define the weak form of the PDE and impose the boundary conditions, and then discretize the computational domain into subdomains. Below we use the one-dimensional Poisson equation on the interval \(0< x \le 1\) for illustration: \(\Delta u + f = 0\), with the Dirichlet condition \(u(x = 0) = g_D = 1\) and the Neumann condition \(\frac{\partial u}{\partial x}(x = 1) = g_N = 1\).

  1.

    We obtain the weak form by multiplying the equation by a test function from a discrete test space and integrating the second-order derivative by parts: \(\int _{0}^{1}\frac{\partial v^{\delta }}{\partial x}\frac{\partial u^{\delta }}{\partial x}\,dx = \int _{0}^{1} v^{\delta }f \,dx + v^{\delta }(1)g_{N}.\)

  2.

    We lift the known Dirichlet data from the problem by decomposing the solution into a part satisfying the Dirichlet boundary condition and a homogeneous part, \(u^{\delta } = u^{D}+u^{H}\), so that the weak form becomes \(\int _{0}^{1}\frac{\partial v^{\delta }}{\partial x}\frac{\partial u^{H}}{\partial x}\,dx = \int _{0}^{1} v^{\delta }f \,dx + v^{\delta }(1)g_{N}-\int _{0}^{1}\frac{\partial v^{\delta }}{\partial x}\frac{\partial u^{D}}{\partial x}\,dx.\)

We use piecewise linear basis functions and decompose the domain into two subdomains. A finer mesh yields h-convergence, while higher-order polynomial basis functions yield p-type convergence. For the linear two-subdomain case, the approximate expansion has the form \(u^{\delta } = \sum _{i = 0}^{2} \hat{u}_{i}\Phi _i(x)\), where the \(\Phi _i(x)\) are piecewise linear functions. We then represent f in terms of the basis functions, \(f(x) = \sum _{i = 0}^2\hat{f}_i\Phi _i(x)\). Finally, we solve the resulting linear system of equations for the numerical solution u, which for this example coincides with a finite element approximation.
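The two-step procedure above can be sketched numerically for this model problem. Below is a minimal linear finite element solve of the Poisson example (constant \(f = 1\), \(g_D = g_N = 1\)), assuming exact integration of the load; the function name and element count are illustrative and not part of the Nektar implementation.

```python
import numpy as np

def solve_poisson_1d(n_elem, f=1.0, g_D=1.0, g_N=1.0):
    """Linear FEM solve of u'' + f = 0 on (0, 1) with
    u(0) = g_D and u'(1) = g_N, assuming constant f."""
    h = 1.0 / n_elem
    n = n_elem + 1                              # number of nodes
    A = np.zeros((n, n))                        # global stiffness matrix
    F = np.zeros(n)                             # global load vector
    for e in range(n_elem):                     # element-by-element assembly
        A[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        F[e:e + 2] += f * h / 2.0               # exact load for constant f
    F[-1] += g_N                                # Neumann term v(1) * g_N
    u = np.empty(n)
    u[0] = g_D                                  # lift the Dirichlet value
    u[1:] = np.linalg.solve(A[1:, 1:], F[1:] - A[1:, 0] * g_D)
    return u

u = solve_poisson_1d(n_elem=2)                  # two subdomains, as in the text
# Exact solution for f = 1: u(x) = -x**2/2 + 2*x + 1
```

For this 1D problem with exact load integration, the linear FEM solution is exact at the nodes, which makes the sketch easy to check against the closed-form solution.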

We solve the governing equations for tumor growth in Sect. 2.1 using a spectral/hp element Nektar solver with \(\Delta t\) = \(1.0\,\times \,10^{-2}\), \(1.0\,\times \,10^{-3}\), and \(1.0\,\times \,10^{-4}\), together with a reference run at \(\Delta t\) = \(1.0\,\times \,10^{-5}\). The characteristic length and time scales are 1 mm and 1 day. We found that for \(\Delta t \le 1.0\,\times \,10^{-3}\), the differences between the results at different \(\Delta t\) are marginal. For aggressive tumors, the maximum difference (i.e., the maximum pointwise absolute difference in \(\phi\) between two simulation runs) between \(\Delta t = 1.0\,\times \,10^{-2}\) and \(\Delta t = 1.0\,\times \,10^{-5}\) is \(2.08\,\times \,10^{-2}\); between \(\Delta t = 1.0\,\times \,10^{-3}\) and \(\Delta t = 1.0\,\times \,10^{-5}\) it is \(2.99\,\times \,10^{-3}\); and between \(\Delta t = 1.0\,\times \,10^{-4}\) and \(\Delta t = 1.0\,\times \,10^{-5}\) it is \(1.32\,\times \,10^{-3}\). For all numerical simulations conducted with Nektar, we use polynomial order 3, time step size \(\Delta t\) = \(1.0\,\times \,10^{-3}\), and mesh size \(6.67\,\times \,10^{-3}\) in both the x- and y-directions, which results in 22,500 quadrilateral elements, and we run the solver in parallel on 256 CPU nodes. Table 43 shows the parameters used in the simulations. It takes about 1.1 h to run one mild tumor case up to 80 days and 3.9 h to run one aggressive tumor case up to 200 days.
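As a small illustration, the convergence metric described in the text (the maximum pointwise absolute difference between \(\phi\) fields from two runs on the same grid) can be computed as follows; the toy field values are placeholders, not simulation output.

```python
import numpy as np

def max_pointwise_diff(phi_a, phi_b):
    """Maximum pointwise absolute difference between two fields
    sampled on the same grid (the time-step convergence metric)."""
    return float(np.max(np.abs(np.asarray(phi_a) - np.asarray(phi_b))))

# Toy fields: identical except one grid point perturbed by 0.01
phi_coarse = np.ones((4, 4))
phi_fine = np.ones((4, 4))
phi_fine[2, 2] = 0.99
```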

Table 43 Parameters used for mechanistic simulations using phase-field model

B Forecast the tumor growth using the initial density of nutrients as the input for the branch net

Fig. 32

Prediction for tumor cells and nutrient dynamics for mild tumor cases mapping from the initial density of nutrients. a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: R = 0.07 mm; right: R = 0.21 mm (R: the length of the minor axis of the initial ellipsoidal tumor)

Fig. 33

Prediction for tumor cells and nutrient dynamics for aggressive tumor cases mapping from the initial density of nutrients. a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: R = 0.07 mm; right: R = 0.23 mm (R: the length of the minor axis of the initial ellipsoidal tumor)

We also test the performance of TGM-ONets in forecasting tumor growth using the initial density of nutrients as the input for the branch net. The hyper-parameters (i.e., the initial learning rate, the decay step, \(\omega _{PDE}\), and \(\omega _{data}\)) are the same as in Sect. 3.1.1. We parameterize the initial density of nutrients for both mild and aggressive tumors as:

$$\begin{aligned} \sigma (0, x, y) = 1 - 0.8\phi (0, x, y), \end{aligned}$$
(24)

which represents an ellipsoidal nutrient field corresponding to an ellipsoidal tumor in the computational domain whose y-semiaxis is twice its x-semiaxis. We use the same training datasets as in Sect. 3.1.1 to train TGM-ONets.
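As an illustration of Eq. (24), a minimal construction of the initial fields on a uniform grid is sketched below. Only the relation \(\sigma = 1 - 0.8\phi\) comes from Eq. (24); the tanh interface profile, the interface width `eps`, and the grid resolution are assumptions made for illustration.

```python
import numpy as np

# Illustrative construction of the initial fields on a unit grid
nx = ny = 101
x, y = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny))

R, eps = 0.05, 0.01          # minor (x) semi-axis and assumed interface width
# Ellipse whose y-semiaxis is twice the x-semiaxis, centered at (0.5, 0.5)
r = np.sqrt(((x - 0.5) / R)**2 + ((y - 0.5) / (2 * R))**2)
phi0 = 0.5 * (1.0 - np.tanh((r - 1.0) / eps))   # ~1 inside tumor, ~0 outside

sigma0 = 1.0 - 0.8 * phi0                        # Eq. (24)
```

Inside the tumor the nutrient density is depleted to 0.2, and far from the tumor it approaches 1, consistent with the ellipsoidal nutrient field described above.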

For mild tumor cases, the prediction errors for all training and testing cases given by TGM-ONets are presented in Fig. 32a, b, which show that the average prediction errors for both \(\phi\) and \(\sigma\) are under \(1.0\,\times \,10^{-3}\) in the training datasets and \(2.0\,\times \,10^{-2}\) in the testing datasets. The maximum prediction errors are around \(2.0\,\times \,10^{-3}\) for \(\phi\) and \(5.0\,\times \,10^{-4}\) for \(\sigma\) in the training datasets, and \(6.0\,\times \,10^{-2}\) for \(\phi\) and \(5.0\,\times \,10^{-2}\) for \(\sigma\) in the testing datasets. Predictions for two specific cases, R = 0.07 mm (in-distribution) and R = 0.21 mm (out-of-distribution), are illustrated in Fig. 32c, d.

For aggressive tumor cases, the prediction errors for all training and testing cases given by TGM-ONets are presented in Fig. 33a, b, which show that the average prediction errors for both \(\phi\) and \(\sigma\) are under \(5.0\,\times \,10^{-4}\) in the training datasets and \(2.0\,\times \,10^{-2}\) in the testing datasets. The maximum prediction errors are under \(2.0\,\times \,10^{-3}\) for both \(\phi\) and \(\sigma\) in the training datasets and \(4.0\,\times \,10^{-2}\) for both in the testing datasets. Predictions for two specific cases, R = 0.07 mm (in-distribution) and R = 0.23 mm (out-of-distribution), are illustrated in Fig. 33c, d.

C Forecast the tumor growth using the initial density of tumor cells with varying shapes as the input for the branch net

We evaluate the performance of TGM-ONets in forecasting tumor growth using the initial density of tumor cells with varying shapes as the input for the branch net. For mild tumors, we vary the ratio of the y-semiaxis to the x-semiaxis (\(\delta\)) (C.1), the position within the domain (C.2), the length of the minor axis of the initial ellipsoidal tumor centered at (0.5, 0.5) (C.3) and not centered at (0.5, 0.5) (C.4), circular shapes (C.5), and oblique ellipsoidal tumors (C.6). For aggressive tumors, we vary the ratio of the y-semiaxis to the x-semiaxis (\(\delta\)) (C.7) and the position (C.8). The hyper-parameters (i.e., the initial learning rate, the decay step, \(\omega _{PDE}\), and \(\omega _{data}\)) are the same as in Sect. 3.1.1.

1.1 C.1 Forecast the mild tumor growth using the initial density of tumor cells with varying ratio (\(\delta\)) of y-semiaxis to the x-semiaxis

For mild tumor cases, we use TGM-ONets to learn the mapping from the initial density of tumor cells with varying \(\delta\) to the solutions for tumor cells and nutrients on the entire computational domain. The growth rate and the length of the minor axis of the initial ellipsoidal tumor R remain 1.5 1/day and 0.05 mm. We sample 1000 values of \(\delta\) from a uniform distribution U(1.0, 2.6). Assuming we have 8 cases of data recording the density of tumor cells and nutrients every 0.5 days up to 70.5 days with different values of \(\delta\) sampled from U(1.0, 2.6), we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of \(\delta\) sampled from U(1.0, 2.9). The prediction accuracy for both the training and testing datasets is presented in Fig. 34a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are bounded by \(6.0\,\times \,10^{-4}\) and \(3.0\,\times \,10^{-4}\) in the training datasets, and by \(8.0\,\times \,10^{-4}\) and \(2.0\,\times \,10^{-3}\) in the testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(3.0\,\times \,10^{-4}\) in the training datasets and \(5.0\,\times \,10^{-4}\) in the testing datasets. Predictions for two specific cases, \(\delta\) = 1.4 (in-distribution) and \(\delta\) = 2.9 (out-of-distribution), are illustrated in Fig. 34b.
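A minimal sketch of this sampling setup is given below: aspect ratios \(\delta\) are drawn from U(1.0, 2.6) and mapped to sharp-interface ellipse images that stand in for the branch-net inputs. The grid size, random seed, indicator-function representation, and helper name are illustrative assumptions, not the paper's preprocessing pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training aspect ratios delta ~ U(1.0, 2.6); testing would use U(1.0, 2.9)
deltas_train = rng.uniform(1.0, 2.6, size=1000)

def initial_phi(delta, R=0.05, n=64):
    """Sharp-interface ellipse indicator: an illustrative stand-in for
    the initial tumor-cell density (y-semiaxis = delta * R)."""
    x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
    inside = ((x - 0.5) / R)**2 + ((y - 0.5) / (delta * R))**2 <= 1.0
    return inside.astype(float)

# Assemble a small batch of branch-net inputs from the first 8 samples
branch_inputs = np.stack([initial_phi(d) for d in deltas_train[:8]])
```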

Fig. 34

Prediction for tumor cells and nutrient dynamics for mild tumor cases mapping from the initial density of tumor cells with varying \(\delta\). a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: \(\delta\) = 1.4; right: \(\delta\) = 2.9 (\(\delta\): the ratio of the y-semiaxis to the x-semiaxis)

1.2 C.2 Forecast the mild tumor growth using the initial density of tumor cells with varying positions within the domain

In this subsection, we use TGM-ONets to learn the mapping from the initial density of tumor cells with varying positions within the domain to the solutions for tumor cells and nutrients on the entire computational domain for mild tumor cases. The growth rate and the length of the minor axis of the initial ellipsoidal tumor R remain 1.5 1/day and 0.05 mm. Let \((x^{*}, y^{*})\) denote the center position of the tumor cells and nutrients; we sample 1000 values of \(x^{*}\) and \(y^{*}\) from a uniform distribution U(0.4, 0.6). Assuming we have 8 cases of data recording the density of tumor cells and nutrients every 0.5 days up to 70.5 days with different values of \(x^{*}\) and \(y^{*}\) sampled from U(0.4, 0.6), we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of \(x^{*}\) and \(y^{*}\) sampled from U(0.4, 0.6). The prediction accuracy for both the training and testing datasets is presented in Fig. 35a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are bounded by \(1.0\,\times \,10^{-3}\) in the training datasets and \(1.5\,\times \,10^{-1}\) in the testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(1.0\,\times \,10^{-2}\) in the training datasets and \(8.0\,\times \,10^{-2}\) in the testing datasets. Predictions for two specific cases, \((x^{*}, y^{*})\) = (0.58, 0.42) (in-distribution) and \((x^{*}, y^{*})\) = (0.42, 0.58) (in-distribution), are illustrated in Fig. 35b.

Fig. 35

Prediction for tumor cells and nutrient dynamics for mild tumor cases mapping from the initial density of tumor cells with varying positions within the computation domain. a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: \((x^{*}, y^{*})\) = (0.58, 0.42); right: \((x^{*}, y^{*})\) = (0.42, 0.58) (\((x^{*}, y^{*})\): the center position of tumor cells and nutrients within the computation domain)

1.3 C.3 Forecast the mild tumor growth using the initial density of tumor cells with varying length of the minor axis of the initial ellipsoidal tumor centered at (0.5,0.5)

In this subsection, we use TGM-ONets to learn the mapping from the initial density of tumor cells with varying lengths of the minor axis of the initial ellipsoidal tumor R and a larger ratio of the y-semiaxis to the x-semiaxis (\(\delta\) = 3) to the solutions for tumor cells and nutrients on the entire computational domain. The growth rate remains 1.5 1/day. We sample 1000 values of R from a uniform distribution U(0.06, 0.20). Assuming we have 9 cases of data with different values of R sampled from U(0.06, 0.20), plus an additional case with R = 0.22 mm, recording the density of tumor cells and nutrients every 0.5 days up to 70.5 days, we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of R sampled from U(0.06, 0.23). The prediction accuracy for both the training and testing datasets is presented in Fig. 36a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are bounded by \(2.0\,\times \,10^{-3}\) and \(6.0\,\times \,10^{-4}\) in the training datasets, and by \(1.3\,\times \,10^{-2}\) and \(1.0\,\times \,10^{-2}\) in the testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(1.0\,\times \,10^{-3}\) in the training datasets and \(5.0\,\times \,10^{-3}\) in the testing datasets. Predictions for two specific cases, R = 0.07 mm (in-distribution) and R = 0.23 mm (out-of-distribution), are illustrated in Fig. 36b.

Fig. 36

Prediction for tumor cells and nutrient dynamics for mild tumor cases mapping from the initial density of tumor cells with varying lengths of the minor axis of the initial ellipsoidal tumor R and a larger \(\delta\) = 3. a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: R = 0.07 mm; right: R = 0.23 mm. (R: the length of the minor axis of the initial ellipsoidal tumor)

1.4 C.4 Forecast the mild tumor growth using the initial density of tumor cells with varying length of the minor axis of the initial ellipsoidal tumor not centered at (0.5,0.5)

In this subsection, we use TGM-ONets to learn the mapping from the off-center initial density of tumor cells with varying lengths of the minor axis of the initial ellipsoidal tumor R to the solutions for tumor cells and nutrients on the entire computational domain. The growth rate remains 1.5 1/day. We sample 1000 values of R from a uniform distribution U(0.06, 0.20). Assuming we have 9 cases of data with different values of R sampled from U(0.06, 0.20), plus an additional case with R = 0.22 mm, recording the density of tumor cells and nutrients every 0.5 days up to 70.5 days, we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of R sampled from U(0.06, 0.23). The prediction accuracy for both the training and testing datasets is presented in Fig. 37a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are bounded by \(3.0\,\times \,10^{-3}\) in the training datasets and \(5.0\,\times \,10^{-2}\) in the testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(1.0\,\times \,10^{-3}\) in the training datasets and \(2.0\,\times \,10^{-2}\) in the testing datasets. Predictions for two specific cases, R = 0.07 mm (in-distribution) and R = 0.23 mm (out-of-distribution), are illustrated in Fig. 37b.

Fig. 37

Prediction for tumor cells and nutrient dynamics for mild tumor cases mapping from the off-center initial density of tumor cells with varying scaling factors of R. a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: R = 0.07 mm; right: R = 0.23 mm. (R: the length of the minor axis of the initial ellipsoidal tumor)

1.5 C.5 Forecast the mild tumor growth using the initial density of tumor cells with varying radii of the initial circular tumor

In this subsection, we use TGM-ONets to learn the mapping from the initial density of tumor cells with varying radii of the initial circular tumor R to the solutions for tumor cells and nutrients on the entire computational domain for mild tumor cases. The growth rate remains 1.5 1/day. We sample 1000 values of R from a uniform distribution U(0.06, 0.20). Assuming we have 9 cases of data with different values of R sampled from U(0.06, 0.20), plus an additional case with R = 0.22 mm, recording the density of tumor cells and nutrients every 0.5 days up to 70.5 days, we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of R sampled from U(0.06, 0.23). The prediction accuracy for both the training and testing datasets is presented in Fig. 38a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are bounded by \(8.0\,\times \,10^{-4}\) and \(5.0\,\times \,10^{-4}\) in the training datasets, and by \(1.0\,\times \,10^{-2}\) in the testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(4.0\,\times \,10^{-4}\) in the training datasets and \(5.0\,\times \,10^{-3}\) in the testing datasets. Predictions for two specific cases, R = 0.07 mm (in-distribution) and R = 0.23 mm (out-of-distribution), are illustrated in Fig. 38b.

Fig. 38

Prediction for tumor cells and nutrient dynamics for mild tumor cases mapping from the circular initial density of tumor cells with varying scaling factors of R. a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: R = 0.07 mm; right: R = 0.23 mm. (R: the radius of the initial circular tumor)

1.6 C.6 Forecast the mild tumor growth using the initial density of tumor cells with varying oblique angle (\(\theta\)) and ratios of the y-semiaxes to the x-semiaxes (\(\delta\) = 2 or 4)

In this subsection, we use TGM-ONets to learn the mapping from the initial density of tumor cells with varying oblique angle (\(\theta\)) and ratios of the y-semiaxes to the x-semiaxes (\(\delta\) = 2 or 4) to the solutions for tumor cells and nutrients on the entire computational domain. The growth rate remains 1.5 1/day. We sample 1000 values of \(\theta\) from a uniform distribution \(U(0, 2\pi )\). Assuming we have 8 cases of data with different values of \(\theta\) sampled from \(U(0, 2\pi )\), we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of \(\theta\) sampled from \(U(0, 2\pi )\). The prediction accuracy for both the training and testing datasets is presented in Fig. 39a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are around \(3.0\,\times \,10^{-3}\) and \(1.5\,\times \,10^{-3}\) in the training datasets, and \(2.0\,\times \,10^{-1}\) and \(8.0\,\times \,10^{-2}\) in the testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(2.0\,\times \,10^{-3}\) in the training datasets and \(1.0\,\times \,10^{-1}\) in the testing datasets. Predictions for two specific cases, (\(\theta\) = 0.3\(\pi\), \(\delta\) = 4) (in-distribution) and (\(\theta\) = 0.1\(\pi\), \(\delta\) = 2) (in-distribution), are illustrated in Fig. 39b.
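The oblique initial condition can be sketched by rotating the ellipse coordinates by \(\theta\) before evaluating the indicator, as below; the sharp-interface representation, grid size, and random seed are illustrative assumptions rather than the paper's data pipeline.

```python
import numpy as np

def rotated_ellipse(theta, delta, R=0.05, n=64):
    """Indicator of an ellipse with minor semi-axis R and aspect ratio
    delta, rotated by angle theta about the domain center (a sharp-
    interface illustration of the oblique initial condition)."""
    x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
    dx, dy = x - 0.5, y - 0.5
    # Rotate coordinates by -theta so the ellipse axes align with x/y
    xr = np.cos(theta) * dx + np.sin(theta) * dy
    yr = -np.sin(theta) * dx + np.cos(theta) * dy
    return ((xr / R)**2 + (yr / (delta * R))**2 <= 1.0).astype(float)

rng = np.random.default_rng(1)
thetas = rng.uniform(0.0, 2 * np.pi, size=8)   # theta ~ U(0, 2*pi)
samples = [rotated_ellipse(t, delta=4.0) for t in thetas]
```

Since the rotation only reorients the ellipse, the enclosed area (up to grid discretization) is the same for every \(\theta\), which gives a quick sanity check on the construction.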

Fig. 39

Prediction for tumor cells and nutrient dynamics for mild tumor cases mapping from the initial density of tumor cells with varying oblique angle \(\theta\) and ratios of the y-semiaxes to the x-semiaxes (\(\delta\) = 2 or 4). a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: \(\theta\) = 0.3\(\pi\), \(\delta\) = 4; right: \(\theta\) = 0.1\(\pi\), \(\delta\) = 2. (\(\theta\): oblique angle of the initial density of tumor cells, \(\delta\): the ratio of the y-semiaxis to the x-semiaxis)

1.7 C.7 Forecast the aggressive tumor growth using the initial density of tumor cells with varying ratio of the y-semiaxis to the x-semiaxis (\(\delta\))

For aggressive tumor cases, here we use TGM-ONets to learn the mapping from the ellipsoidal initial density of tumor cells with varying \(\delta\) to the solutions for tumor cells and nutrients on the entire computational domain. The growth rate, the length of the minor axis of the initial ellipsoidal tumor R, and \(\gamma _{c}\) remain 1.0 1/day, 0.05 mm, and 17.5, respectively. We sample 1000 values of \(\delta\) from a uniform distribution U(1.0, 3.0). Assuming we have 8 cases of data recording the density of tumor cells and nutrients every 0.5 days up to 200.5 days with different values of \(\delta\) sampled from U(1.0, 3.0), we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of \(\delta\) sampled from U(1.0, 3.0). The prediction accuracy for both the training and testing datasets is presented in Fig. 40a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are bounded by \(4.0\,\times \,10^{-3}\) and \(8.0\,\times \,10^{-3}\) in both the training and testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(5.0\,\times \,10^{-3}\) in both the training and testing datasets. Predictions for two specific cases, \(\delta\) = 1.4 (in-distribution) and \(\delta\) = 2.7 (in-distribution), are illustrated in Fig. 40b.

Fig. 40

Prediction for tumor cells and nutrient dynamics for aggressive tumor cases mapping from the ellipsoidal initial density of tumor cells with varying \(\delta\). a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: \(\delta\) = 1.4; right: \(\delta\) = 2.7. (\(\delta\): the ratio of the y-semiaxis to the x-semiaxis)

1.8 C.8 Forecast the aggressive tumor growth using the initial density of tumor cells with varying positions within the domain

In this subsection, we use TGM-ONets to learn the mapping from the initial density of tumor cells with varying positions within the domain to the solutions for tumor cells and nutrients on the entire computational domain for aggressive tumor cases. The growth rate, the length of the minor axis of the initial ellipsoidal tumor R, and \(\gamma _{c}\) remain 1.0 1/day, 0.05 mm, and 17.5, respectively. Let \((x^{*}, y^{*})\) denote the center position of the tumor cells and nutrients; we sample 1000 values of \(x^{*}\) and \(y^{*}\) from a uniform distribution U(0.4, 0.6). Assuming we have 10 cases of data recording the density of tumor cells and nutrients every 0.5 days up to 200.5 days with different values of \(x^{*}\) and \(y^{*}\) sampled from U(0.4, 0.6), we follow the same training procedure as in Sect. 3.1.1. We evaluate the performance of TGM-ONets on testing datasets with different values of \(x^{*}\) and \(y^{*}\) sampled from U(0.4, 0.6). The prediction accuracy for both the training and testing datasets is presented in Fig. 41a, from which we can see that the maximum prediction errors for \(\phi\) and \(\sigma\) are bounded by \(3.0\,\times \,10^{-3}\) in the training datasets and \(3.0\,\times \,10^{-2}\) in the testing datasets. The average prediction errors for \(\phi\) and \(\sigma\) are under \(2.0\,\times \,10^{-3}\) in the training datasets and \(2.0\,\times \,10^{-2}\) in the testing datasets. Predictions for two specific cases, \((x^{*}, y^{*})\) = (0.54, 0.54) (in-distribution) and \((x^{*}, y^{*})\) = (0.54, 0.46) (in-distribution), are illustrated in Fig. 41b.

Fig. 41

Prediction for tumor cells and nutrient dynamics for aggressive tumor cases mapping from the ellipsoidal initial density of tumor cells with varying positions within the computation domain. a Prediction errors for training and testing datasets. The blue lines represent the mean of prediction errors in training datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors. b Predictions of the tumor morphologies \(\phi\) at different times. left: (\(x^{*}, y^{*}\)) = (0.54, 0.54); right: (\(x^{*}, y^{*}\)) = (0.54, 0.46). ((\(x^{*}, y^{*}\)): the center position of tumor cells and nutrients within the computation domain)

D Significance tests for the difference of prediction errors obtained by TGM-ONets using different input functions for the branch net

In this section, we conduct the Kruskal–Wallis and Dunn's tests to check whether the differences in prediction errors obtained by TGM-ONets with varying input functions are statistically significant. We average the prediction errors over time to compute the statistics for the Kruskal–Wallis and Dunn's tests.
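For reference, the Kruskal–Wallis H statistic used here can be sketched from pooled ranks as below (the tie correction factor is omitted for brevity); in practice one would use an established implementation such as scipy.stats.kruskal, with Dunn's test as the post-hoc comparison. The toy error groups are illustrative, not data from the paper.

```python
import numpy as np

def kruskal_wallis_H(*groups):
    """Kruskal-Wallis H statistic computed from pooled ranks,
    using midranks for ties (tie correction factor omitted)."""
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    order = np.argsort(pooled, kind="mergesort")
    ranks = np.empty(pooled.size)
    ranks[order] = np.arange(1, pooled.size + 1)
    for v in np.unique(pooled):                 # assign midranks to ties
        tie = pooled == v
        ranks[tie] = ranks[tie].mean()
    N = pooled.size
    H, start = 0.0, 0
    for n_i in sizes:
        R_i = ranks[start:start + n_i].sum()    # rank sum of group i
        H += R_i**2 / n_i
        start += n_i
    return 12.0 / (N * (N + 1)) * H - 3.0 * (N + 1)

# Toy groups of time-averaged prediction errors for three input functions
errors_a = [0.011, 0.013, 0.012, 0.010]
errors_b = [0.012, 0.014, 0.011, 0.013]
errors_c = [0.031, 0.033, 0.030, 0.032]
H = kruskal_wallis_H(errors_a, errors_b, errors_c)
```

H near zero indicates the groups share a common distribution of ranks; a large H (compared against a chi-squared threshold) flags a significant difference, which the post-hoc Dunn's test then localizes to specific pairs.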

For mild tumors, we vary the input function from the initial density of tumor cells to the growth rate. The results of the Kruskal–Wallis test are summarized in the first two rows of Table 44, from which we can see that the p-values are below \(1.0\,\times \,10^{-2}\) only in the training datasets, indicating that the \(\epsilon _{\phi }\) and \(\epsilon _{\sigma }\) obtained by TGM-ONets in the training datasets using the initial density of tumor cells and the growth rate as inputs for the branch net are significantly different. However, the generalization ability of TGM-ONets on unseen datasets of mild tumors is consistent, irrespective of the input function. The medians of \(\epsilon _{\phi }\) and \(\epsilon _{\sigma }\) in the training datasets are summarized in Table 45. These results suggest that TGM-ONets fit the training datasets for mild tumors better when mapping from the initial density of tumor cells.

For aggressive tumor cases, we vary the input function from the initial density of tumor cells to the nutrient uptake. The results of the Kruskal–Wallis test are summarized in the last two rows of Table 44, and the medians of \(\epsilon _{\phi }\) and \(\epsilon _{\sigma }\) in the training datasets are summarized in Table 46. These results also suggest that the performance of TGM-ONets on unseen datasets is consistent irrespective of the input function, while the fit to the training datasets is significantly better for aggressive tumors mapping from the initial density of tumor cells. We infer that the CBAM blocks used in the CNN-based branch nets likely account for the better fit to the training datasets for mild and aggressive tumors mapping from the initial density of tumor cells.

Table 44 Results of Kruskal–Wallis test for comparisons between TGM-ONets with varying input functions
Table 45 Median of prediction errors in training datasets for mild tumors with varying input functions
Table 46 Median of prediction errors in training datasets for aggressive tumors with varying input functions

E Long-time predictions using TGM-ONets

In this appendix, we present additional results on long-time prediction using TGM-ONets. The three scenarios are the same as in Sect. 3.2: (1) sparse observations are available at the last time step in the testing domain, (2) sparse observations are available at the early stage in the testing domain, and (3) no additional observations are available in the testing domain.

Tables 47 and 48 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time predictions for mild tumors mapping from the initial density of tumor cells in scenario 1, respectively. Tables 49 and 50 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time predictions for aggressive tumors mapping from the initial density of tumor cells in scenario 1, respectively. We see that as the prediction time and the initial tumor size increase, the mean relative \(L^2\) errors increase slightly. However, even at T = 270 with R > 0.2 mm, the maximum mean relative \(L^2\) errors for \(\phi\) and \(\sigma\) remain below approximately \(7\,\times \,10^{-2}\).
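Assuming the standard definition of the relative \(L^2\) error, \(\Vert \hat{u} - u \Vert _2 / \Vert u \Vert _2\), the tabulated metric can be sketched as follows; the sample fields are illustrative placeholders.

```python
import numpy as np

def relative_l2_error(pred, true):
    """Relative L2 error ||pred - true||_2 / ||true||_2, the metric
    assumed for the phi and sigma entries in the tables."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.linalg.norm(pred - true) / np.linalg.norm(true))

# A prediction that is uniformly 1% high has relative L2 error 0.01
```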

Tables 51 and 52 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\), respectively, in long-time predictions for mild tumors mapping from the initial density of tumor cells in scenario 2. Tables 53 and 54 show the corresponding errors for aggressive tumors mapping from the initial density of tumor cells in scenario 2.

Tables 55 and 56 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time predictions for mild tumors mapping from the initial density of tumor cells in scenario 3, respectively. Tables 57 and 58 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time predictions for aggressive tumors mapping from the initial density of tumor cells in scenario 3, respectively.

Tables 59 and 60 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time prediction for mild tumors mapping from the concomitant changes in the initial density of tumor cells and growth rate in scenario 1, respectively. Tables 61 and 62 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time prediction for aggressive tumors mapping from the concomitant changes in the initial density of tumor cells and nutrient uptake in scenario 1, respectively.

Tables 63 and 64 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time prediction for mild tumors mapping from the concomitant changes in the initial density of tumor cells and growth rate in scenario 2, respectively. Tables 65 and 66 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time prediction for aggressive tumors mapping from the concomitant changes in the initial density of tumor cells and nutrient uptake in scenario 2, respectively.

Tables 67 and 68 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time prediction for mild tumors mapping from the concomitant changes in the initial density of tumor cells and growth rate in scenario 3, respectively. Tables 69 and 70 show the mean relative \(L^2\) error for \(\phi\) and \(\sigma\) in long-time prediction for aggressive tumors mapping from the concomitant changes in the initial density of tumor cells and nutrient uptake in scenario 3, respectively.

Table 47 Long-time prediction for mild tumor \(\phi\) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells using additional snapshots at T = 200 and T = 270
Table 48 Long-time prediction for mild tumor \(\sigma\) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells using additional snapshots at T = 200 and T = 270
Table 49 Long-time prediction for aggressive tumor \(\phi\) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells using additional snapshots at T = 200, 300 and 400
Table 50 Long-time prediction for aggressive tumor \(\sigma\) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells using additional snapshots at T = 200, 300 and 400
Table 51 Long-time prediction for mild tumor \(\phi\) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells using an additional snapshot at T = 80
Table 52 Long-time prediction for mild tumor \(\sigma\) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells using an additional snapshot at T = 80
Table 53 Long-time prediction for aggressive tumor \(\phi\) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells using additional snapshots at T = 230 and T = 280
Table 54 Long-time prediction for aggressive tumor \(\sigma\) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells using additional snapshots at T = 230 and T = 280
Table 55 Long-time prediction for mild tumor \(\phi\) in scenario 3: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells using no additional snapshots in testing domain
Table 56 Long-time prediction for mild tumor \(\sigma\) in scenario 3: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells using no additional snapshots in testing domain
Table 57 Long-time prediction for aggressive tumor \(\phi\) in scenario 3: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells using no additional snapshots in testing domain
Table 58 Long-time prediction for aggressive tumor \(\sigma\) in scenario 3: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells using no additional snapshots in testing domain
Table 59 Long-time prediction for mild tumor \(\phi\) accommodating multiple input functions in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells and growth rate using additional snapshots at T = 200 and T = 270
Table 60 Long-time prediction for mild tumor \(\sigma\) accommodating multiple input functions in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells and growth rate using additional snapshots at T = 200 and T = 270
Table 61 Long-time prediction for aggressive tumor \(\phi\) accommodating multiple input functions in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells and nutrient uptake using additional snapshots at T = 200, 300 and 400
Table 62 Long-time prediction for aggressive tumor \(\sigma\) accommodating multiple input functions in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells and nutrient uptake using 3 additional snapshots at T = 200, 300 and 400
Table 63 Long-time prediction for mild tumor \(\phi\) accommodating multiple input functions in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells and growth rate using an additional snapshot at T = 80
Table 64 Long-time prediction for mild tumor \(\sigma\) accommodating multiple input functions in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells and growth rate using an additional snapshot at T = 80
Table 65 Long-time prediction for aggressive tumor \(\phi\) accommodating multiple input functions in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells and nutrient uptake using 2 additional snapshots at T = 230 and T = 280
Table 66 Long-time prediction for aggressive tumor \(\sigma\) accommodating multiple input functions in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells and nutrient uptake using 2 additional snapshots at T = 230 and T = 280
Table 67 Long-time prediction for mild tumor \(\phi\) accommodating multiple input functions in scenario 3: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells and growth rate using no additional snapshots in testing domain
Table 68 Long-time prediction for mild tumor \(\sigma\) accommodating multiple input functions in scenario 3: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells and growth rate using no additional snapshots in testing domain
Table 69 Long-time prediction for aggressive tumor \(\phi\) accommodating multiple input functions in scenario 3: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells and nutrient uptake \(\gamma _{c}\) using no additional snapshots in testing domain
Table 70 Long-time prediction for aggressive tumor \(\sigma\) accommodating multiple input functions in scenario 3: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells and nutrient uptake using no additional snapshots in testing domain

F Ranges of prediction errors in training datasets for examining the robustness of TGM-ONets

In this subsection, we provide the ranges of prediction errors in training datasets for examining the robustness of TGM-ONets.

For the results of examining the effects of the number of training snapshots, Fig. 42 shows the ranges of prediction errors in training datasets for mild tumor cases mapping from the initial density of tumor cells, Fig. 43 shows the ranges of prediction errors in training datasets for aggressive tumor cases mapping from the initial density of tumor cells, and Fig. 44 shows the ranges of prediction errors in training datasets for aggressive tumor cases mapping from the initial density of tumor cells using sparser data.

For the results of examining the effects of the noisy measurements, Fig. 45 shows the ranges of prediction errors in training datasets for mild tumor cases mapping from the initial density of tumor cells, and Fig. 46 shows the ranges of prediction errors in training datasets for aggressive tumor cases mapping from the initial density of tumor cells.

Fig. 42
figure 42

Prediction errors in training datasets for examining the effects of the number of training points for mild tumor cases mapping from the initial density of tumor cells. The blue lines represent the mean of prediction errors in training datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 43
figure 43

Prediction errors in training datasets for examining the effects of the number of training points for aggressive tumor cases mapping from the initial density of tumor cells. The blue lines represent the mean of prediction errors in training datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 44
figure 44

Prediction errors in training datasets for examining the effects of the number of training points for aggressive tumor cases mapping from the initial density of tumor cells using sparser data. The blue lines represent the mean of prediction errors in training datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 45
figure 45

Prediction errors in training datasets for examining the effects of the noisy measurements for mild tumor cases mapping from the initial density of tumor cells. The blue lines represent the mean of prediction errors in training datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 46
figure 46

Prediction errors in training datasets for examining the effects of the noisy measurements for aggressive tumor cases mapping from the initial density of tumor cells. The blue lines represent the mean of prediction errors in training datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 47
figure 47

Ablation study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the initial density of tumor cells with varying network structures. a The average of prediction errors for training datasets. b The average of prediction errors for testing datasets

Fig. 48
figure 48

Ablation study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate with varying network structures. a The average of prediction errors for training datasets. b The average of prediction errors for testing datasets

Fig. 49
figure 49

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the nutrient uptake \(\gamma _{c}\) with varying number of total hidden layers. a The average of prediction errors for training datasets. b The average of prediction errors for testing datasets

Fig. 50
figure 50

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the initial density of tumor cells with varying number of total hidden layers. a The average of prediction errors for training datasets. b The average of prediction errors for testing datasets

Fig. 51
figure 51

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate \(\rho\) with varying number of total hidden layers. a The average of prediction errors for training datasets. b The average of prediction errors for testing datasets

Fig. 52
figure 52

Grid study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the initial density of tumor cells with varying number of total hidden layers. a The average of prediction errors for training datasets. b The average of prediction errors for testing datasets

G Ablation & grid studies of TGM-ONets

For all cases considered in Sect. 4 and in this subsection, the distributions for the parameters of the mechanistic model for training and testing are the same as in Sect. 3.

In this subsection, we first consider the contributions of the CBAM and MoE blocks for aggressive tumor cases mapping from the initial density of tumor cells, as well as for mild tumor cases mapping from the growth rate \(\rho\). We use the same settings as in the first ablation study in Sect. 4. Model performances are summarized in Figs. 47 and 48. We also conduct one-sided Wilcoxon tests to check whether the prediction errors given by the vanilla PI-DeepONet are significantly greater than those given by our proposed methods. The null hypothesis and the alternative hypothesis are the same as in Sect. 4. We average the prediction errors over time for each training and test sample to compute the statistics for the one-sided Wilcoxon tests. The results are summarized in Tables 71 and 72, from which we can see that the p-values are lower than \(1.0\,\times \,10^{-2}\) for \(\epsilon _{\phi }\) and \(\epsilon _{\sigma }\) on the testing datasets for both mild and aggressive tumor cases. These results further showcase the improved generalization of TGM-ONets to unseen datasets achieved by our proposed methods.
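A paired one-sided Wilcoxon signed-rank comparison of this kind can be sketched with SciPy. The arrays below are hypothetical stand-ins in which the baseline errors are systematically inflated relative to the proposed model; they are not the paper's measurements.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)
# Hypothetical paired per-sample errors (averaged over time), with the
# baseline made ~35% worse on average than TGM-ONets for illustration.
err_tgm = rng.lognormal(mean=-4.0, sigma=0.2, size=40)
err_vanilla = err_tgm * rng.lognormal(mean=0.3, sigma=0.1, size=40)

# One-sided test. H0: the median of the paired differences is <= 0;
# H1: vanilla PI-DeepONet errors are greater than TGM-ONets errors.
stat, p_value = wilcoxon(err_vanilla, err_tgm, alternative="greater")
print(f"W = {stat:.1f}, p = {p_value:.2e}")
# A p-value below 1e-2 rejects H0 in favor of the proposed model.
```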

Additionally, we investigate the effect of the total number of hidden layers used in TGM-ONets for mild tumors mapping from the nutrient uptake \(\gamma _{c}\), mild tumors mapping from the initial density of tumor cells, mild tumors mapping from the growth rate \(\rho\), and aggressive tumors mapping from the initial density of tumor cells. Model performances are summarized in Figs. 49, 50, 51 and 52. We also conduct the Kruskal–Wallis test and Dunn's test to check whether the differences in the prediction errors given by TGM-ONets with varying numbers of total hidden layers are statistically significant. We average the prediction errors over time for each training and test sample to compute the statistics for the Kruskal–Wallis and Dunn's tests. The results of the Kruskal–Wallis test are summarized in Tables 73–76, from which we can see that the effect of the total number of hidden layers varies from case to case. For mild tumors mapping from the nutrient uptake \(\gamma _{c}\), the p-values are lower than \(1.0\,\times \,10^{-2}\) on the training and testing datasets. For mild tumors mapping from the initial density of tumor cells, the p-values are lower than \(1.0\,\times \,10^{-2}\) on the training datasets. For mild tumors mapping from the growth rate \(\rho\), the p-values are greater than \(1.0\,\times \,10^{-2}\) on the training and testing datasets. For aggressive tumors mapping from the initial density of tumor cells, the p-values are lower than \(1.0\,\times \,10^{-2}\) only on the training datasets. These results indicate that increasing the total number of hidden layers does enhance performance, but the enhancement can sometimes be relatively minor and may not offset the increased computational cost associated with the larger number of model parameters. Further results of Dunn's test for each case are summarized in Tables 77–83.
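Dunn's post hoc test compares pairwise mean ranks after a significant Kruskal–Wallis result. Below is a simplified, self-contained sketch (no tie correction, Bonferroni adjustment) on synthetic stand-in error groups; in practice a library implementation such as `posthoc_dunn` from scikit-posthocs would typically be used instead.

```python
import itertools
import numpy as np
from scipy.stats import norm

def dunn_posthoc(groups):
    """Pairwise Dunn z-tests on pooled ranks (simplified: no tie correction,
    Bonferroni adjustment). groups: list of 1-D arrays of per-sample errors.
    Returns {(i, j): adjusted p-value}.
    """
    data = np.concatenate(groups)
    ranks = data.argsort().argsort() + 1.0  # ranks in the pooled sample
    n_total = len(data)
    sizes, mean_ranks, start = [], [], 0
    for g in groups:
        sizes.append(len(g))
        mean_ranks.append(ranks[start:start + len(g)].mean())
        start += len(g)
    n_pairs = len(groups) * (len(groups) - 1) // 2
    pvals = {}
    for i, j in itertools.combinations(range(len(groups)), 2):
        se = np.sqrt(n_total * (n_total + 1) / 12.0
                     * (1.0 / sizes[i] + 1.0 / sizes[j]))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = 2.0 * norm.sf(abs(z))           # two-sided p-value
        pvals[(i, j)] = min(1.0, p * n_pairs)  # Bonferroni correction
    return pvals

rng = np.random.default_rng(3)
# Four hypothetical depth settings with progressively smaller errors.
groups = [rng.lognormal(-4.0 - 0.2 * k, 0.2, size=25) for k in range(4)]
pvals = dunn_posthoc(groups)
for pair, p in pvals.items():
    print(pair, f"{p:.3f}")
```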

Quantitative results of the enhancement derived from utilizing the CBAM and MoE blocks are also provided in this section. Tables 84–89 show the mean relative \(L^{2}\) errors for \(\phi\) and \(\sigma\) for mild tumors mapping from the initial density of tumor cells with R = 0.05 mm, 0.18 mm and 0.23 mm using varying architectures of deep operator networks. Tables 90–95 show the mean relative \(L^{2}\) errors for \(\phi\) and \(\sigma\) for aggressive tumors mapping from the nutrient uptake by tumor cells with \(\gamma _{c}\) = 16.1 g/L/day, 16.9 g/L/day and 18.9 g/L/day using varying architectures of deep operator networks. Tables 96–101 show the mean relative \(L^{2}\) errors for \(\phi\) and \(\sigma\) for aggressive tumors mapping from the initial density of tumor cells with \(R = 0.09\) mm, 0.16 mm and 0.21 mm using varying architectures of deep operator networks. Tables 102–107 show the mean relative \(L^{2}\) errors for \(\phi\) and \(\sigma\) for mild tumors mapping from the growth rate with \(\rho\) = 1.2 1/day, 2.1 1/day and 2.7 1/day using varying architectures of deep operator networks. In all the tables, the best results are bolded and the second-best are underlined.

We also provide the ranges of the prediction errors for each case considered in Sect. 4 and in this subsection. From Figs. 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68 and 69, we can see that the ranges of the training errors are roughly no larger than the ranges of the testing errors for all the cases considered in the ablation and grid studies. In Figs. 53, 54 and 65, we can see that TGM-ONets with CBAM and MoE blocks have smaller ranges of prediction errors compared with the vanilla PI-DeepONets. In Figs. 55 and 56, we can see that \(w_{data} = 100\) yields smaller ranges of prediction errors than the other values of \(w_{data}\). In Figs. 57, 66, 67, 68 and 69, we can see that 8 or 9 hidden layers yield smaller ranges of prediction errors than the other settings. In Fig. 58, we can see that 2 experts in the MoE block yield smaller ranges of testing errors, whereas 3 experts yield smaller ranges of training errors. In Fig. 59, we can see that initial learning rates of \(1\,\times \,10^{-3}\), \(6\,\times \,10^{-4}\), and \(2\,\times \,10^{-4}\) yield smaller ranges of prediction errors. In Fig. 60, 1000 decay steps yield smaller ranges of prediction errors. Figure 61 shows that continuous activation functions yield smaller ranges of prediction errors. Figure 62 shows that different numbers of training datasets (i.e., 7, 8, 9, 10) yield roughly the same ranges of prediction errors. Figures 63 and 64 show that the ranges of prediction errors do not change much with or without the boundary loss.

Table 71 Results of Wilcoxon test for comparisons between vanilla PI-DeepONet and TGM-ONets with CBAM and MoE block for aggressive tumors mapping from the initial density of tumor cells
Table 72 Results of Wilcoxon test for comparisons between vanilla PI-DeepONet and TGM-ONets with MoE block for mild tumors mapping from the growth rate \(\rho\)
Table 73 Results of Kruskal–Wallis test for examining the effectiveness of the number of hidden layers for mild tumors mapping from the nutrient uptake \(\gamma _{c}\)
Table 74 Results of Kruskal–Wallis test for examining the effectiveness of the number of hidden layers for mild tumors mapping from the initial density of tumor cells
Table 75 Results of Kruskal–Wallis test for examining the effectiveness of the number of hidden layers for mild tumors mapping from the growth rate \(\rho\)
Table 76 Results of Kruskal–Wallis test for examining the effectiveness of the number of hidden layers for aggressive tumors mapping from the initial density of tumor cells
Table 77 Results of Dunn's test for \(\epsilon _{\phi }\) in training datasets for mild tumors mapping from the nutrient uptake \(\gamma _{c}\) with varying number of total hidden layers. Each element represents the p-value of the corresponding post hoc pairwise test for multiple comparisons
Table 78 Results of Dunn's test for \(\epsilon _{\sigma }\) in training datasets for mild tumors mapping from the nutrient uptake \(\gamma _{c}\) with varying number of total hidden layers. Each element represents the p-value of the corresponding post hoc pairwise test for multiple comparisons
Table 79 Results of Dunn's test for \(\epsilon _{\phi }\) in testing datasets for mild tumors mapping from the nutrient uptake \(\gamma _{c}\) with varying number of total hidden layers. Each element represents the p-value of the corresponding post hoc pairwise test for multiple comparisons
Table 80 Results of Dunn's test for \(\epsilon _{\sigma }\) in testing datasets for mild tumors mapping from the nutrient uptake \(\gamma _{c}\) with varying number of total hidden layers. Each element represents the p-value of the corresponding post hoc pairwise test for multiple comparisons
Table 81 Results of Dunn's test for \(\epsilon _{\phi }\) in training datasets for mild tumor cases mapping from the initial density of tumor cells with varying number of total hidden layers. Each element represents the p-value of the corresponding post hoc pairwise test for multiple comparisons
Table 82 Results of Dunn's test for \(\epsilon _{\phi }\) in training datasets for aggressive tumor cases mapping from the initial density of tumor cells with varying number of total hidden layers. Each element represents the p-value of the corresponding post hoc pairwise test for multiple comparisons
Table 83 Results of Dunn's test for \(\epsilon _{\sigma }\) in training datasets for aggressive tumor cases mapping from the initial density of tumor cells with varying number of total hidden layers. Each element represents the p-value of the corresponding post hoc pairwise test for multiple comparisons
Table 84 Ablation study for mild tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with R = 0.05 mm (test case)
Table 85 Ablation study for mild tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with R = 0.05 mm (test case)
Table 86 Ablation study for mild tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with R = 0.18 mm (test case)
Table 87 Ablation study for mild tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with R = 0.18 mm (test case)
Table 88 Ablation study for mild tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with R = 0.23 mm (test case)
Table 89 Ablation study for mild tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with R = 0.23 mm (test case)
Table 90 Ablation study for aggressive tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the nutrient uptake by tumor cells with \(\gamma _{c}\) = 16.1 g/L/day (test case)
Table 91 Ablation study for aggressive tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the nutrient uptake by tumor cells with \(\gamma _{c}\) = 16.1 g/L/day (test case)
Table 92 Ablation study for aggressive tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the nutrient uptake by tumor cells with \(\gamma _{c}\) = 16.9 g/L/day (test case)
Table 93 Ablation study for aggressive tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the nutrient uptake by tumor cells with \(\gamma _{c}\) = 16.9 g/L/day (test case)
Table 94 Ablation study for aggressive tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the nutrient uptake by tumor cells with \(\gamma _{c}\) = 18.9 g/L/day (test case)
Table 95 Ablation study for aggressive tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the nutrient uptake by tumor cells with \(\gamma _{c}\) = 18.9 g/L/day (test case)
Table 96 Ablation study for aggressive tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with R = 0.09 mm (test case)
Table 97 Ablation study for aggressive tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with R = 0.09 mm (test case)
Table 98 Ablation study for aggressive tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with R = 0.16 mm (test case)
Table 99 Ablation study for aggressive tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with R = 0.16 mm (test case)
Table 100 Ablation study for aggressive tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with R = 0.21 mm (test case)
Table 101 Ablation study for aggressive tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with R = 0.21 mm (test case)
Table 102 Ablation study for mild tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the growth rate with \(\rho\) = 1.2 1/day (test case)
Table 103 Ablation study for mild tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the growth rate with \(\rho\) = 1.2 1/day (test case)
Table 104 Ablation study for mild tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the growth rate with \(\rho\) = 2.1 1/day (test case)
Table 105 Ablation study for mild tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the growth rate with \(\rho\) = 2.1 1/day (test case)
Table 106 Ablation study for mild tumor \(\phi\): Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the growth rate with \(\rho\) = 2.7 1/day (test case)
Table 107 Ablation study for mild tumor \(\sigma\): Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the growth rate with \(\rho\) = 2.7 1/day (test case)
Fig. 53
figure 53

Ablation study: Prediction errors for inferring state variables for mild tumor cases mapping from the initial density of tumor cells with varying network structures. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 54
figure 54

Ablation study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the nutrient uptake by tumor cells with varying network structures. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 55
figure 55

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the initial density of tumor cells with varying \(\omega _{data}\). a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 56
figure 56

Grid study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the initial density of tumor cells with varying \(\omega _{data}\). a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 57
figure 57

Grid study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the nutrient uptake by tumor cells with varying number of total hidden layers. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 58
figure 58

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the initial density of tumor cells with varying number of expert networks in MoE block. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 59
figure 59

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate with varying initial learning rate. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 60
figure 60

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate with varying decay steps for the optimizer. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 61
figure 61

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate with varying activation functions. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 62

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate with varying number of training datasets. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 63

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate with or without boundary loss. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 64

Grid study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the nutrient uptake by tumor cells with or without boundary loss. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 65

Ablation study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the initial density of tumor cells with varying network structures. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 66

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the nutrient uptake \(\gamma _{c}\) with varying number of total hidden layers. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 67

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the initial density of tumor cells with varying number of total hidden layers. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 68

Grid study: Prediction errors for inferring state variables for mild tumor cases mapping from the growth rate \(\rho\) with varying number of total hidden layers. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors

Fig. 69

Grid study: Prediction errors for inferring state variables for aggressive tumor cases mapping from the initial density of tumor cells with varying number of total hidden layers. a Prediction errors for training datasets. The blue lines represent the mean of prediction errors in training datasets. b Prediction errors for testing datasets. The red lines represent the mean of prediction errors in testing datasets. The shaded region represents the region encompassed by the maximum and minimum of the prediction errors
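Each grid-study panel above uses the same layout: a line for the mean prediction error across runs, with a shaded band spanning the per-run minimum and maximum. A minimal matplotlib sketch of this layout follows; the error data here are synthetic placeholders, not results from the paper.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for scripted figure export
import matplotlib.pyplot as plt

# Hypothetical errors: rows = independent training runs, columns = time snapshots
rng = np.random.default_rng(1)
t = np.linspace(0, 270, 28)
errors = 0.01 + 0.005 * rng.random((10, t.size))

mean_err = errors.mean(axis=0)
lo, hi = errors.min(axis=0), errors.max(axis=0)

plt.plot(t, mean_err, color="tab:blue", label="mean training error")
# Shaded region bounded by the minimum and maximum of the prediction errors
plt.fill_between(t, lo, hi, color="tab:blue", alpha=0.2, label="min-max range")
plt.xlabel("T")
plt.ylabel(r"relative $L^2$ error")
plt.legend()
plt.savefig("grid_study_errors.png")
```

Here `fill_between` draws the min-max envelope; substituting per-snapshot errors from repeated training runs for the synthetic `errors` array reproduces the style of the panels above.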

Fig. 70

Comparison with SOTA models. a Prediction errors for mild tumor case with R = 0.16 mm in scenario 1 (i.e., using a sparse measurement at T = 270). b Prediction errors for aggressive tumor case with R = 0.18 mm in scenario 1 (i.e., using sparse measurements at T = 200, 300 and 400). c Prediction errors for mild tumor case with R = 0.16 mm in scenario 2 (i.e., using no additional measurements in testing domain). d Prediction errors for aggressive tumor case with R = 0.18 mm in scenario 2 (i.e., using no additional measurements in testing domain). In scenario 1, the prediction range is T\(\in\)[0, 270] for mild tumors while T\(\in\)[0, 400] for aggressive tumors. In scenario 2, the prediction range is T\(\in\)[0, 130] for mild tumors while T\(\in\)[0, 230] for aggressive tumors

Fig. 71

Comparison with SOTA models. a Prediction errors for mild tumor case with R = 0.21 mm in scenario 1 (i.e., using 1 snapshot at T = 270). b Prediction errors for aggressive tumor case with R = 0.21 mm in scenario 1 (i.e., using 3 snapshots at T = 200, 300 and 400). c Prediction errors for mild tumor case with R = 0.21 mm in scenario 2 (i.e., using no additional snapshots in testing domain). d Prediction errors for aggressive tumor case with R = 0.21 mm in scenario 2 (i.e., using no additional snapshots in testing domain). In scenario 1, the prediction range is T\(\in\)[0, 270] for mild tumors while T\(\in\)[0, 400] for aggressive tumors. In scenario 2, the prediction range is T\(\in\)[0, 130] for mild tumors while T\(\in\)[0, 230] for aggressive tumors

H. Comparison with three state-of-the-art (SOTA) models

In this subsection, we further compare the ability of TGM-ONets to predict mild tumor growth with R = 0.16 mm and 0.21 mm and aggressive tumor growth with R = 0.18 mm and 0.21 mm against three SOTA models. We consider the same two scenarios as in Sect. 5.

For the first scenario, prediction errors for the mild tumor cases with R = 0.16 mm and 0.21 mm are displayed in Figs. 70a and 71a, and prediction errors for the aggressive tumor cases with R = 0.18 mm and 0.21 mm are displayed in Figs. 70b and 71b.

For the second scenario, prediction errors for the mild tumor cases with R = 0.16 mm and 0.21 mm are displayed in Figs. 70c and 71c, and prediction errors for the aggressive tumor cases with R = 0.18 mm and 0.21 mm are displayed in Figs. 70d and 71d.

From the results presented above, our fine-tuning method provides more stable and accurate predictions than the other three SOTA models, which demonstrates the effectiveness and efficiency of the proposed method.

Quantitative results are also provided for all cases considered in Sect. 5 and in this subsection. Tables 108 and 109 show the mean relative \(L^2\) errors for the mild tumor case with R = 0.08 mm in scenario 1. Tables 110 and 111 show the mean relative \(L^2\) errors for the mild tumor case with R = 0.16 mm in scenario 1. Tables 112 and 113 show the mean relative \(L^2\) errors for the mild tumor case with R = 0.21 mm in scenario 1.

Tables 114 and 115 show the mean relative \(L^2\) errors for the aggressive tumor case with R = 0.05 mm in scenario 1. Tables 116 and 117 show the mean relative \(L^2\) errors for the aggressive tumor case with R = 0.18 mm in scenario 1. Tables 118 and 119 show the mean relative \(L^2\) errors for the aggressive tumor case with R = 0.21 mm in scenario 1.

Tables 120 and 121 show the mean relative \(L^2\) errors for the mild tumor case with R = 0.08 mm in scenario 2. Tables 122 and 123 show the mean relative \(L^2\) errors for the mild tumor case with R = 0.16 mm in scenario 2. Tables 124 and 125 show the mean relative \(L^2\) errors for the mild tumor case with R = 0.21 mm in scenario 2.

Tables 126 and 127 show the mean relative \(L^2\) errors for the aggressive tumor case with R = 0.05 mm in scenario 2. Tables 128 and 129 show the mean relative \(L^2\) errors for the aggressive tumor case with R = 0.18 mm in scenario 2. Tables 130 and 131 show the mean relative \(L^2\) errors for the aggressive tumor case with R = 0.21 mm in scenario 2.
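The tables above all report mean relative \(L^2\) errors between predicted and reference fields. For reference, a minimal NumPy sketch of this metric is given below; the function name and data are illustrative, not taken from the paper's code.

```python
import numpy as np

def mean_relative_l2_error(pred, true):
    """Mean relative L2 error over a batch of predicted fields.

    pred, true: arrays of shape (n_samples, n_points); each row is one
    predicted/reference snapshot of a state variable (e.g. phi or sigma).
    """
    num = np.linalg.norm(pred - true, axis=1)  # ||pred - true||_2 per sample
    den = np.linalg.norm(true, axis=1)         # ||true||_2 per sample
    return float(np.mean(num / den))

# Example: a perfect prediction gives zero error
true = np.random.default_rng(0).random((4, 100))
print(mean_relative_l2_error(true, true))  # → 0.0
```

Averaging over samples and normalizing by the reference norm makes errors comparable across tumor cases whose state variables differ in magnitude.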

Table 108 Comparison with SOTA for mild tumor \(\phi\) (R = 0.08 mm) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using an additional snapshot at T = 270. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 270]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 109 Comparison with SOTA for mild tumor \(\sigma\) (R = 0.08 mm) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using an additional snapshot at T = 270. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 270]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 110 Comparison with SOTA for mild tumor \(\phi\) (R = 0.16 mm) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using an additional snapshot at T = 270. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 270]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 111 Comparison with SOTA for mild tumor \(\sigma\) (R = 0.16 mm) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using an additional snapshot at T = 270. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 270]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 112 Comparison with SOTA for mild tumor \(\phi\) (R = 0.21 mm) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using an additional snapshot at T = 270. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 270]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 113 Comparison with SOTA for mild tumor \(\sigma\) (R = 0.21 mm) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using an additional snapshot at T = 270. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 270]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 114 Comparison with SOTA for aggressive tumor \(\phi\) (R = 0.05 mm) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using additional snapshots at T = 200, 300 and 400. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 400]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 115 Comparison with SOTA for aggressive tumor \(\sigma\) (R = 0.05 mm) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using 3 additional snapshots at T = 200, 300 and 400. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 400]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 116 Comparison with SOTA for aggressive tumor \(\phi\) (R = 0.18 mm) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using 3 additional snapshots at T = 200, 300, 400. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 400]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 117 Comparison with SOTA for aggressive tumor \(\sigma\) (R = 0.18 mm) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using 3 additional snapshots at T = 200, 300, 400. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 400]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 118 Comparison with SOTA for aggressive tumor \(\phi\) (R = 0.21 mm) in scenario 1: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using 3 additional snapshots at T = 200, 300, 400. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 400]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 119 Comparison with SOTA for aggressive tumor \(\sigma\) (R = 0.21 mm) in scenario 1: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using 3 additional snapshots at T = 200, 300, 400. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 400]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 120 Comparison with SOTA for mild tumor \(\phi\) (R = 0.08 mm) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 130]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 121 Comparison with SOTA for mild tumor \(\sigma\) (R = 0.08 mm) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 130]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 122 Comparison with SOTA for mild tumor \(\phi\) (R = 0.16 mm) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 130]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 123 Comparison with SOTA for mild tumor \(\sigma\) (R = 0.16 mm) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 130]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 124 Comparison with SOTA for mild tumor \(\phi\) (R = 0.21 mm) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 130]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 125 Comparison with SOTA for mild tumor \(\sigma\) (R = 0.21 mm) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the mild tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 70.5] while tested in T\(\in\)[0, 130]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 70.5]
Table 126 Comparison with SOTA for aggressive tumor \(\phi\) (R = 0.05 mm) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 230]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 127 Comparison with SOTA for aggressive tumor \(\sigma\) (R = 0.05 mm) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 230]. \(\phi\) and \(\sigma\) are seen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 128 Comparison with SOTA for aggressive tumor \(\phi\) (R = 0.18 mm) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 230]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 129 Comparison with SOTA for aggressive tumor \(\sigma\) (R = 0.18 mm) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 230]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 130 Comparison with SOTA for aggressive tumor \(\phi\) (R = 0.21 mm) in scenario 2: Mean relative \(L^2\) error for \(\phi\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 230]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]
Table 131 Comparison with SOTA for aggressive tumor \(\sigma\) (R = 0.21 mm) in scenario 2: Mean relative \(L^2\) error for \(\sigma\) of the aggressive tumor growth mapping from the initial density of tumor cells with different SOTA methods using no additional snapshots in testing domain. TGM-ONets are trained in T\(\in\)[0, 200] while tested in T\(\in\)[0, 230]. \(\phi\) and \(\sigma\) are unseen during the training of TGM-ONets in T\(\in\)[0, 200]

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Q., Li, H. & Zheng, X. A deep neural network for operator learning enhanced by attention and gating mechanisms for long-time forecasting of tumor growth. Engineering with Computers 41, 423–533 (2025). https://doi.org/10.1007/s00366-024-02003-0
