Performance analysis and comparison of cellular automata GPU implementations

Millán, Emmanuel N.; Wolovick, Nicolás; Piccoli, María Fabiana; Garino, Carlos García; Bringa, Eduardo M.

doi:10.1007/s10586-017-0850-3

Published: 07 April 2017

Volume 20, pages 2763–2777, (2017)
Cluster Computing

Abstract

Cellular automata (CA) models are of interest to several scientific areas, and there is growing interest in exploring large systems that require high performance computing. This work presents a CA implementation that performs well on five different NVIDIA GPU architectures, from Tesla to Maxwell, simulating systems with up to a billion cells. Using the game of life (GoL) and a more complex variation of GoL as examples, a performance of 5.58e6 evaluated cells/s is achieved. The two optimizations most often used in previous studies are shared memory and Multicell algorithms. Here, these optimizations do not improve performance on Fermi or newer architectures. The GoL CA code running on an NVIDIA Titan X obtained a speedup of up to \(\sim 85\times\), and up to \(\sim 230\times\) for a more complex CA, compared to an optimized serial CPU implementation. Finally, the efficiency of each GPU is analyzed in terms of cell performance/transistors and cell performance/bandwidth, showing how the architectures improved for this particular problem.
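
To make the quantities in the abstract concrete, the following is a minimal serial sketch of one GoL generation and of the "evaluated cells per second" metric. It is not the authors' GPU code: the function names `gol_step` and `evaluated_cells_per_second` are hypothetical, periodic boundaries are an assumption, and the metric is taken here to mean cells times generations divided by wall-clock time.

```python
import time


def gol_step(grid):
    """Advance a 2D Game of Life grid by one generation.

    Assumption: periodic (wrap-around) boundaries; the paper's boundary
    handling is not stated in the abstract.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight Moore neighbours, wrapping at the edges.
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Conway's rules: birth on 3 neighbours, survival on 2 or 3.
            new[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return new


def evaluated_cells_per_second(grid, steps):
    """Run `steps` generations and return (final grid, cells evaluated per second)."""
    start = time.perf_counter()
    for _ in range(steps):
        grid = gol_step(grid)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return grid, len(grid) * len(grid[0]) * steps / elapsed
```

Under these assumptions, a speedup figure such as those quoted above would be the ratio of the parallel rate to the rate of a serial baseline measured on the same grid size and number of generations.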

Acknowledgements

The authors acknowledge support from CONICET, ANPCyT Grant PICT-2014-0696, and a SeCTyP UNCuyo Grant. This work used the clusters Mendieta from CCAD-UNC and ICB-ITIC from UNCuyo, which are part of the SNCAD network.

Author information

Authors and Affiliations

CONICET and FCEN, Universidad Nacional de Cuyo, Mendoza, Argentina
Emmanuel N. Millán & Eduardo M. Bringa
Universidad Nacional de Córdoba, Córdoba, Argentina
Nicolás Wolovick
Universidad Nacional de San Luis, San Luis, Argentina
María Fabiana Piccoli
ITIC and FING, Universidad Nacional de Cuyo, Mendoza, Argentina
Carlos García Garino

Corresponding author

Correspondence to Emmanuel N. Millán.

About this article

Cite this article

Millán, E.N., Wolovick, N., Piccoli, M.F. et al. Performance analysis and comparison of cellular automata GPU implementations. Cluster Comput 20, 2763–2777 (2017). https://doi.org/10.1007/s10586-017-0850-3

Received: 11 November 2016
Revised: 28 March 2017
Accepted: 29 March 2017
Published: 07 April 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10586-017-0850-3
