Efficient sparse matrix-delayed vector multiplication for discretized neural field model

Fousek, Jan

doi:10.1007/s11227-017-2194-4

Published: 15 December 2017

Volume 74, pages 1863–1884, (2018)

The Journal of Supercomputing

Jan Fousek ORCID: orcid.org/0000-0002-8371-2956¹

Abstract

Computational models of the human brain provide an important tool for studying the principles behind brain function and disease. To achieve whole-brain simulation, models are formulated at the level of neuronal populations as systems of delayed differential equations. In this paper, we show that the integration of large systems of sparsely connected neural masses is similar to the well-studied sparse matrix-vector multiplication; however, due to delayed contributions, it differs in the data access pattern to the vectors. To improve data locality, we propose a combination of node reordering and tiled schedules derived from the connectivity matrix of the particular system, which allows multiple integration steps to be performed within a tile. We present two schedules: one with serial processing of the tiles and one allowing parallel processing of the tiles. We evaluate the presented schedules, showing speedups of up to \(2\times\) on a single-socket CPU and \(1.25\times\) on a Xeon Phi accelerator.
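The difference from ordinary sparse matrix-vector multiplication can be illustrated with a minimal sketch. It is not the implementation from the paper: the CSR-style arrays (`indptr`, `indices`, `weights`), the per-connection integer `delays`, the ring buffer `history`, and the function name `delayed_spmv` are all assumptions chosen for illustration. Each nonzero reads its source value from a different past time step, so one row touches several time slices of the state vector instead of a single vector.

```python
def delayed_spmv(indptr, indices, weights, delays, history, step):
    """Return y with y[i] = sum over nonzeros k of row i of
    weights[k] * x[indices[k]] taken at time (step - delays[k]).

    history is a ring buffer (hypothetical layout): history[s % len(history)][j]
    holds the state of node j at integration step s. Every delay must be
    smaller than len(history).
    """
    horizon = len(history)
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            # Unlike plain SpMV, the source vector depends on the nonzero:
            # each connection selects its own past time slice.
            past_state = history[(step - delays[k]) % horizon]
            acc += weights[k] * past_state[indices[k]]
        y[i] = acc
    return y


# Toy system with 3 nodes; the state of node j at step s is 10*s + j,
# which makes it easy to see which time slice each term came from.
history = [[10.0 * s + j for j in range(3)] for s in range(4)]
indptr = [0, 2, 3, 3]        # row 0 has two inputs, row 1 one, row 2 none
indices = [1, 2, 0]          # source node of each connection
weights = [0.5, 2.0, 1.0]    # coupling weight of each connection
delays = [1, 3, 2]           # delay of each connection, in integration steps

y = delayed_spmv(indptr, indices, weights, delays, history, step=3)
print(y)  # [14.5, 10.0, 0.0]
```

In this toy case, row 0 combines node 1 at step 2 with node 2 at step 0, so a single row already reads two different time slices. Scattered reads of this kind are what a reordering or tiling scheme would try to keep close together in memory.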


Notes

For anatomical and functional atlases, the number of areas ranges typically from 80 to 200, e.g., [6, 34].

References

Bojak I, Oostendorp TF, Reid AT, Kötter R (2011) Towards a model-based integration of co-registered electroencephalography/functional magnetic resonance imaging data with realistic neural population meshes. Philos Trans R Soc A Math Phys Eng Sci 369(1952):3785–3801
Bressloff PC (2011) Spatiotemporal dynamics of continuum neural fields. J Phys A Math Theor 45(3):033001
Byun JH, Lin R, Yelick KA, Demmel J (2012) Autotuning sparse matrix-vector multiplication for multicore. Technical report UCB/EECS-2012-215, EECS Department, University of California, Berkeley
Cacciola F (2016) Triangulated surface mesh simplification. In: CGAL User and Reference Manual, 4.9 edn. CGAL Editorial Board. http://doc.cgal.org/4.9/Manual/packages.html#PkgSurfaceMeshSimplificationSummary. Accessed 03 Apr 2017
Coombes S, beim Graben P, Potthast R, Wright J (2014) Neural fields. Springer, Berlin
Craddock RC, James GA, Holtzheimer PE, Hu XP, Mayberg HS (2012) A whole brain fMRI atlas generated via spatially constrained spectral clustering. Hum Brain Mapp 33(8):1914–1928
Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 1969 24th National Conference. ACM, pp 157–172
Datta K, Kamil S, Williams S, Oliker L, Shalf J, Yelick K (2009) Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Rev 51(1):129–159
Demmel J, Hoemmen M, Mohiyuddin M, Yelick K (2008) Avoiding communication in sparse matrix computations. In: IEEE International Symposium on Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE, pp 1–12
Douglas CC, Hu J, Kowarschik M, Rüde U, Weiß C (2000) Cache optimization for structured and unstructured grid multigrid. Electron Trans Numer Anal 10:21–40
Geuzaine C, Remacle JF (2009) Gmsh: a 3-D finite element mesh generator with built-in pre-and post-processing facilities. Int J Numer Methods Eng 79(11):1309–1331
Green KR, van Veen L (2014) Open-source tools for dynamical analysis of Liley’s mean-field cortex model. J Comput Sci 5(3):507–516
Grosser T, Cohen A, Holewinski J, Sadayappan P, Verdoolaege S (2014) Hybrid hexagonal/classical tiling for GPUs. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization. ACM, p 66
Jirsa VK (2009) Neural field dynamics with local and global connectivity and time delay. Philos Trans R Soc A Math Phys Eng Sci 367(1891):1131–1143
Korch M, Rauber T (2010) Parallel low-storage Runge–Kutta solvers for ODE systems with limited access distance. Int J High Perform Comput Appl 25(2):236–255
L’Ecuyer P, Munger D, Oreshkin B, Simard R (2017) Random numbers for parallel computers: requirements and methods, with emphasis on GPUs. Math Comput Simul 135:3–17
Leon PS, Knock SA, Woodman MM, Domide L, Mersmann J, McIntosh AR, Jirsa V (2013) The Virtual Brain: a simulator of primate brain network dynamics. Front Neuroinform 7:10
Liu X, Chow E, Vaidyanathan K, Smelyanskiy M (2012) Improving the performance of dynamical simulations via multiple right-hand sides. In: 2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS). IEEE, pp 36–47
Malas T, Hager G, Ltaief H, Keyes D (2015) Multi-dimensional intra-tile parallelization for memory-starved stencil computations. arXiv preprint arXiv:1510.04995
Mitchell JS, Mount DM, Papadimitriou CH (1987) The discrete geodesic problem. SIAM J Comput 16(4):647–668
Morlan J, Kamil S, Fox A (2012) Auto-tuning the matrix powers kernel with SEJITS. In: Daydé M, Marques O, Nakajima K (eds) High performance computing for computational science-VECPAR 2012. Springer, pp 391–403
Orozco D, Garcia E, Gao G (2010) Locality optimization of stencil applications using data dependency graphs. In: International Workshop on Languages and Compilers for Parallel Computing. Springer, pp 77–91
Proix T, Spiegler A, Schirner M, Rothmeier S, Ritter P, Jirsa VK (2016) How do parcellation size and short-range connectivity affect dynamics in large-scale brain network models? NeuroImage 142:135–149
Rafique A, Constantinides GA, Kapre N (2015) Communication optimization of iterative sparse matrix-vector multiply on GPUs and FPGAs. IEEE Trans Parallel Distrib Syst 26(1):24–34
Sanz-Leon P, Knock SA, Spiegler A, Jirsa VK (2015) Mathematical framework for large-scale brain network modeling in The Virtual Brain. Neuroimage 111:385–430
Spiegler A, Jirsa V (2013) Systematic approximations of neural fields through networks of neural masses in The Virtual Brain. NeuroImage 83:704–725
Strout M, Carter L, Ferrante J (2001) Rescheduling for locality in sparse matrix computations. In: Computational Science—ICCS 2001. pp 137–146
Strout MM, Carter L, Ferrante J, Kreaseck B (2004) Sparse tiling for stationary iterative methods. Int J High Perform Comput Appl 18(1):95–113
Strout MM, LaMielle A, Carter L, Ferrante J, Kreaseck B, Olschanowsky C (2016) An approach for code generation in the sparse polyhedral framework. Parallel Comput 53:32–57
Thapliyal H, Arabnia HR (2006) A reversible programmable logic array (RPLA) using Fredkin and Feynman gates for industrial electronics and applications. In: Proceedings of the 2006 International Conference on Computer Design & Conference on Computing in Nanotechnology, CDES 2006, Las Vegas, 26–29 June 2006. pp 70–76
Thapliyal H, Arabnia HR, Bajpai R, Sharma KK (2007) Combined integer and variable precision (CIVP) floating point multiplication architecture for FPGAs. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2007, Las Vegas, 25–28 June 2007, Vol 1. pp 449–452
Thapliyal H, Jayashree HV, Nagamani AN, Arabnia HR (2013) Progress in reversible processor design: a novel methodology for reversible carry look-ahead adder. Trans Comput Sci 17:73–97. https://doi.org/10.1007/978-3-642-35840-1_4
Treibig J, Hager G, Wellein G (2010) LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In: Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego
Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M (2002) Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15(1):273–289
Venkat A, Shantharam M, Hall M, Strout MM (2014) Non-affine extensions to polyhedral code generation. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization. ACM, p 185
Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J (2009) Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Comput 35(3):178–194
Wulf WA, McKee SA (1995) Hitting the memory wall: implications of the obvious. ACM SIGARCH Comput Archit News 23(1):20–24
Yzelman AJN, Roose D (2014) High-level strategies for parallel shared-memory sparse matrix-vector multiplication. IEEE Trans Parallel Distrib Syst 25(1):116–125

Acknowledgements

The work was supported by the European Regional Development Fund, project “CERIT Scientific Cloud” (No. CZ.02.1.01/0.0/0.0/16_013/0001802). The author would like to thank Jiří Filipovič for constructive criticism of the manuscript.

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Jan Fousek

Corresponding author

Correspondence to Jan Fousek.

About this article

Cite this article

Fousek, J. Efficient sparse matrix-delayed vector multiplication for discretized neural field model. J Supercomput 74, 1863–1884 (2018). https://doi.org/10.1007/s11227-017-2194-4

Published: 15 December 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s11227-017-2194-4
