MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

  • Conference paper
  • In: High Performance Computing (ISC High Performance 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Abstract

In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the needed functionalities are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance from the MAGMA library. This close integration makes the fundamental, scalable, high-performance LA routines available in MAGMA the backend of MagmaDNN. We present design issues for performance and scalability that are specific to ML with deep neural networks (DNNs), as well as the MagmaDNN designs for overcoming them. In particular, MagmaDNN uses well-established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and how MagmaDNN incorporates and uses them to outperform other currently available frameworks.
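The mixed-precision algorithms mentioned in the abstract follow a standard dense-LA idea: do the bulk of the arithmetic in a cheap, low precision and recover full accuracy with a few high-precision correction steps. The sketch below is a minimal illustration of that idea only; it is not MagmaDNN or MAGMA code, and the small 3x3 system, the pivot-free Gaussian elimination, and the fixed iteration count are assumptions chosen to keep the example self-contained.

```cpp
// Illustrative sketch (not the MagmaDNN/MAGMA API): mixed-precision
// iterative refinement. The low-precision "solve" is plain single-precision
// Gaussian elimination; an optimized library would use GPU kernels instead.
#include <cmath>
#include <cstdio>
#include <vector>

// Solve A x = b in single precision with Gaussian elimination (no pivoting,
// which is fine for the diagonally dominant demo matrix below).
static std::vector<float> solve_fp32(std::vector<float> A, std::vector<float> b, int n) {
    for (int k = 0; k < n; ++k) {
        for (int i = k + 1; i < n; ++i) {
            float m = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; ++j) A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }
    std::vector<float> x(n);
    for (int i = n - 1; i >= 0; --i) {
        float s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
    return x;
}

int main() {
    const int n = 3;
    // Double-precision problem data.
    std::vector<double> A = {4, 1, 0,
                             1, 4, 1,
                             0, 1, 4};
    std::vector<double> b = {1, 2, 3};

    // Single-precision copy used for the cheap correction solves.
    std::vector<float> As(A.begin(), A.end());

    std::vector<double> x(n, 0.0);
    for (int iter = 0; iter < 5; ++iter) {
        // Residual r = b - A x, accumulated in double precision.
        std::vector<double> r(n);
        for (int i = 0; i < n; ++i) {
            double s = b[i];
            for (int j = 0; j < n; ++j) s -= A[i * n + j] * x[j];
            r[i] = s;
        }
        // Correction solved in single precision, applied in double precision.
        std::vector<float> rf(r.begin(), r.end());
        std::vector<float> d = solve_fp32(As, rf, n);
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) { x[i] += d[i]; rnorm += r[i] * r[i]; }
        std::printf("iter %d  ||r|| = %.3e\n", iter, std::sqrt(rnorm));
    }
    return 0;
}
```

In a MAGMA-style solver the same refinement loop is applied to large systems, with the single-precision solve replaced by GPU-accelerated kernels (e.g., Tensor Core GEMMs) while the double-precision residual keeps the final answer accurate.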

Notes

  1. https://bitbucket.org/icl/magmadnn/.

Acknowledgments

This work was conducted at the Joint Institute for Computational Sciences (JICS) and the Innovative Computing Laboratory (ICL). This work is sponsored by the National Science Foundation (NSF) through NSF REU Award #1659502, with additional support from the University of Tennessee, Knoxville (UTK), the National Institute for Computational Sciences (NICS), and NSF Awards #1740250 and #1709069. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF grant #ACI-1548562. Computational resources were available through XSEDE education allocation awards TG-ASC170031 and TG-ASC190013. In addition, computing work was performed on technical workstations donated by the BP High Performance Computing Team, as well as on GPUs donated by NVIDIA.

Author information

Corresponding author: Stanimire Tomov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Nichols, D., Tomov, N.S., Betancourt, F., Tomov, S., Wong, K., Dongarra, J. (2019). MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_37

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science (R0)
