Abstract
The introduction in 2015 of Residual Neural Networks (RNN) and ResNet led to outstanding improvements in the performance of learning algorithms for evolution problems involving a “large” number of layers. Continuous-depth RNN-like models called Neural Ordinary Differential Equations (NODE) were then introduced in 2019. NODEs have a memory cost that is constant with respect to depth, and they do not require the number of hidden layers to be specified a priori. In this paper, we derive and analyze a version of the NODE that is parallel both in parameter and in time. This version potentially allows for a more efficient implementation than a standard (naive) parallelization of NODEs with respect to the parameters only. We expect this approach to be relevant whenever a very large number of processors is available, or when dealing with high-dimensional ODE systems. Moreover, implicit ODE solvers require the solution of nonlinear systems, for instance by Newton’s method, and each Newton iteration in turn requires solving a linear system whose cost can grow up to cubically with the system dimension. The proposed approach reduces the overall number of time steps by iteratively increasing the order of accuracy of the ODE solvers. It therefore also reduces the number of linear systems to solve, and hence benefits from a scaling effect.
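Two ideas from the abstract can be illustrated together: a NODE as a continuous-depth ResNet, and parallel-in-time integration. The sketch below uses the classical parareal iteration of Lions, Maday and Turinici (cited in the references). It is a minimal illustration under stated assumptions, not the paper's parallel-in-parameter-and-time scheme. The vector field f(y) = tanh(Wy + B), the weights W and B, and all function names are hypothetical. Both propagators are explicit Euler: the coarse one takes one step per time slice, and the fine one takes `fine_steps` steps. The fine solves are independent across slices, but they run sequentially here rather than on separate processors.

```python
import math

# Hypothetical 2-D NODE vector field f(y) = tanh(W y + B); W and B are
# illustrative values, not trained parameters from the paper.
W = [[-0.5, 1.0], [-1.0, -0.5]]
B = [0.1, -0.2]


def f(y):
    """Right-hand side of the NODE dy/dt = f(y)."""
    return [math.tanh(W[i][0] * y[0] + W[i][1] * y[1] + B[i]) for i in range(2)]


def euler(y, span, steps):
    """Explicit Euler over a time slice of length `span`.

    Each step y <- y + h * f(y) has the form of one residual (ResNet) block.
    """
    h = span / steps
    for _ in range(steps):
        fy = f(y)
        y = [y[i] + h * fy[i] for i in range(2)]
    return y


def fine_reference(y0, T, n_slices, fine_steps):
    """Sequential fine solution at the slice boundaries (the parareal target)."""
    dt = T / n_slices
    U = [y0]
    for n in range(n_slices):
        U.append(euler(U[n], dt, fine_steps))
    return U


def parareal(y0, T, n_slices, fine_steps, iterations):
    """Classical parareal: U[n+1] <- G(U_new[n]) + F(U_old[n]) - G(U_old[n])."""
    dt = T / n_slices

    def coarse(y):  # G: one Euler step per slice (cheap, sequential)
        return euler(y, dt, 1)

    def fine(y):  # F: many Euler steps per slice (expensive, parallelizable)
        return euler(y, dt, fine_steps)

    # Iteration 0: a cheap sequential coarse sweep.
    U = [y0]
    for n in range(n_slices):
        U.append(coarse(U[n]))

    for _ in range(iterations):
        # Fine solves are independent across slices, so they could run on
        # n_slices processors; they run sequentially here for simplicity.
        F_old = [fine(U[n]) for n in range(n_slices)]
        G_old = [coarse(U[n]) for n in range(n_slices)]
        # Sequential coarse correction sweep.
        U_new = [y0]
        for n in range(n_slices):
            g = coarse(U_new[n])
            U_new.append([g[i] + F_old[n][i] - G_old[n][i] for i in range(2)])
        U = U_new
    return U


def max_error(U, V):
    """Largest componentwise difference between two lists of 2-D states."""
    return max(abs(u[i] - v[i]) for u, v in zip(U, V) for i in range(2))


if __name__ == "__main__":
    y0 = [1.0, 0.0]
    ref = fine_reference(y0, 1.0, 8, 100)
    for k in range(9):
        # The error shrinks with k and reaches rounding level by k = n_slices.
        print(k, max_error(parareal(y0, 1.0, 8, 100, k), ref))
```

After k iterations, the first k + 1 slice boundaries coincide with the sequential fine solution up to rounding, so parareal converges in at most `n_slices` iterations. A wall-clock gain is only possible when far fewer iterations suffice. Each iteration costs one fine solve per slice, run in parallel, plus a sequential coarse sweep.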
References
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Chen, R.T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.: Neural ordinary differential equations. arXiv:1806.07366v4 (2019)
Bertsekas, D.P., Tsitsiklis, J.N.: Gradient convergence in gradient methods with errors. SIAM J. Optim. 10(3), 627–642 (2000)
Anthony, M., Bartlett, P.L.: Neural network learning: theoretical foundations. Cambridge University Press, Cambridge (1999)
White, H.: Artificial Neural Networks: Approximation and Learning Theory. With contributions by A. R. Gallant, K. Hornik, M. Stinchcombe and J. Wooldridge. Blackwell Publishers, Oxford (1992)
Lions, J.-L., Maday, Y., Turinici, G.: Résolution d’EDP par un schéma en temps “pararéel”. C. R. Acad. Sci. Paris Sér. I Math. 332(7), 661–668 (2001)
Maday, Y.: Symposium: Recent advances on the parareal in time algorithms. 1168, 1515–1516 (2009)
Gander, M.J., Jiang, Y.-L., Li, R.-J.: Parareal Schwarz waveform relaxation methods. Lect. Notes Comput. Sci. Eng. 91, 451–458 (2013)
Fischer, P.F., Hecht, F., Maday, Y.: A parareal in time semi-implicit approximation of the Navier-Stokes equations. Lect. Notes Comput. Sci. Eng. 40, 433–440 (2005)
Falgout, R.D., Friedhoff, S., Kolev, T.V., MacLachlan, S.P., Schroder, J.B.: Parallel time integration with multigrid. SIAM J. Sci. Comput. 36(6) (2014)
Giannakoglou, K.C., Papadimitriou, D.I.: Adjoint methods for shape optimization. In: Thévenin, D., Janiga, G. (eds.), pp. 79–108. Springer, Berlin (2008)
Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics. Texts in Applied Mathematics, vol. 37. Springer, New York (2000)
Parpas, P., Muir, C.: Predict globally, correct locally: Parallel-in-time optimal control of neural networks. arXiv:1902.02542 (2019)
Günther, S., Ruthotto, L., Schroder, J.B., Cyr, E.C., Gauger, N.R.: Layer-parallel training of deep residual neural networks (2019)
Gander, M.J., Vandewalle, S.: Analysis of the parareal time-parallel time-integration method. SIAM J. Sci. Comput. 29(2), 556–578 (2007)
Saad, Y., Schultz, M.H.: GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7(3), 856–869 (1986)
Acknowledgments
The author would like to thank Prof. D. Duvenaud and Dr. R. Chen from the University of Toronto for enlightening discussions about NODEs.
Cite this article
Lorin, E. Derivation and analysis of parallel-in-time neural ordinary differential equations. Ann Math Artif Intell 88, 1035–1059 (2020). https://doi.org/10.1007/s10472-020-09702-6