MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

  • Conference paper
  • In: High Performance Computing (ISC High Performance 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Abstract

In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the needed functionalities are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance from the MAGMA library. This close integration makes the fundamental, scalable, high-performance LA routines available in MAGMA the backend of MagmaDNN. We present design issues for performance and scalability that are specific to ML with deep neural networks (DNNs), as well as the MagmaDNN designs for overcoming them. In particular, MagmaDNN uses well-established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and how MagmaDNN incorporates and uses them to outperform other currently available frameworks.
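The mixed-precision algorithms mentioned in the abstract follow a standard dense-LA idea: do the bulk of the arithmetic in a cheap, low precision and recover full accuracy with a few high-precision correction steps. The sketch below is a minimal illustration of that idea only; it is not MagmaDNN or MAGMA code, and the small 3x3 system, the pivot-free Gaussian elimination, and the fixed iteration count are assumptions chosen to keep the example self-contained.

```cpp
// Illustrative sketch (not the MagmaDNN/MAGMA API): mixed-precision
// iterative refinement. The low-precision "solve" is plain single-precision
// Gaussian elimination; an optimized library would use GPU kernels instead.
#include <cmath>
#include <cstdio>
#include <vector>

// Solve A x = b in single precision with Gaussian elimination (no pivoting,
// which is fine for the diagonally dominant demo matrix below).
static std::vector<float> solve_fp32(std::vector<float> A, std::vector<float> b, int n) {
    for (int k = 0; k < n; ++k) {
        for (int i = k + 1; i < n; ++i) {
            float m = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; ++j) A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }
    std::vector<float> x(n);
    for (int i = n - 1; i >= 0; --i) {
        float s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
    return x;
}

int main() {
    const int n = 3;
    // Double-precision problem data.
    std::vector<double> A = {4, 1, 0,
                             1, 4, 1,
                             0, 1, 4};
    std::vector<double> b = {1, 2, 3};

    // Single-precision copy used for the cheap correction solves.
    std::vector<float> As(A.begin(), A.end());

    std::vector<double> x(n, 0.0);
    for (int iter = 0; iter < 5; ++iter) {
        // Residual r = b - A x, accumulated in double precision.
        std::vector<double> r(n);
        for (int i = 0; i < n; ++i) {
            double s = b[i];
            for (int j = 0; j < n; ++j) s -= A[i * n + j] * x[j];
            r[i] = s;
        }
        // Correction solved in single precision, applied in double precision.
        std::vector<float> rf(r.begin(), r.end());
        std::vector<float> d = solve_fp32(As, rf, n);
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) { x[i] += d[i]; rnorm += r[i] * r[i]; }
        std::printf("iter %d  ||r|| = %.3e\n", iter, std::sqrt(rnorm));
    }
    return 0;
}
```

In a MAGMA-style solver the same refinement loop is applied to large systems, with the single-precision solve replaced by GPU-accelerated kernels (e.g., Tensor Core GEMMs) while the double-precision residual keeps the final answer accurate.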

Notes

  1. https://bitbucket.org/icl/magmadnn/.

Acknowledgments

This work was conducted at the Joint Institute for Computational Sciences (JICS) and the Innovative Computing Laboratory (ICL). This work is sponsored by the National Science Foundation (NSF) through NSF REU Award #1659502, with additional support from the University of Tennessee, Knoxville (UTK), the National Institute for Computational Sciences (NICS), and NSF Awards #1740250 and #1709069. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF grant #ACI-1548562. Computational resources were available through XSEDE education allocation awards TG-ASC170031 and TG-ASC190013. In addition, computing work was performed on technical workstations donated by the BP High Performance Computing Team, as well as on GPUs donated by NVIDIA.

Author information

Corresponding author: Stanimire Tomov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Nichols, D., Tomov, N.S., Betancourt, F., Tomov, S., Wong, K., Dongarra, J. (2019). MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_37

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science (R0)
