DOI: 10.1145/3330345.3330360
research-article

AMPT-GA: automatic mixed precision floating point tuning for GPU applications

Published: 26 June 2019

Abstract

Mixed precision computations improve high-performance computing throughput for applications that can tolerate decreased mathematical precision in their computations. Native mixed precision computation is commonplace in today's GPGPU accelerators, where it is applied to applications with well-known tolerances for reduced mathematical precision. Applications with stricter accuracy needs lack support for selecting precisions that both improve performance and satisfy these accuracy requirements. Prior works have focused primarily on accuracy, leaving performance concerns such as the overhead of casting unanswered in GPGPU contexts. In this paper, we present a system called AMPT-GA that selects application-level data precisions to maximize performance while satisfying accuracy constraints. We combine static analysis for casting-aware performance modeling with dynamic analysis for modeling and enforcing precision constraints. We further improve our optimizations with application-aware mutations in our genetic-algorithm-based search function. AMPT-GA improves the performance efficiency of our target applications more than the prior state-of-the-art approach, Precimonious, outperforming it in efficiency by 14–63%.
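The search described in the abstract can be pictured with a small toy example. The sketch below is not the AMPT-GA implementation: the cost table, the cast penalty, the additive error model, the neighbour-copying mutation, and every name in it (`OP_COST`, `CAST_COST`, `VAR_ERROR`, `search`, and so on) are assumptions made up for illustration. It only shows the general shape of a genetic search that minimizes a casting-aware cost over per-variable precisions while rejecting assignments that violate an accuracy threshold.

```python
import random

# Hypothetical toy model, NOT taken from the paper.
PRECISIONS = (16, 32, 64)                  # candidate bit widths per variable
OP_COST = {16: 1.0, 32: 2.0, 64: 4.0}      # assumed cost of using a variable
CAST_COST = 1.5                            # assumed cost of one precision change
VAR_ERROR = {16: 1e-3, 32: 1e-7, 64: 1e-16}  # assumed error per variable


def cost(assignment):
    """Casting-aware cost: operation cost plus a penalty per cast
    between neighbouring variables in a chain of computations."""
    ops = sum(OP_COST[p] for p in assignment)
    casts = sum(1 for a, b in zip(assignment, assignment[1:]) if a != b)
    return ops + CAST_COST * casts


def error(assignment):
    """Toy accuracy model: per-variable errors simply add up."""
    return sum(VAR_ERROR[p] for p in assignment)


def fitness(assignment, threshold):
    """Assignments that break the accuracy constraint are never preferred."""
    if error(assignment) > threshold:
        return float("inf")
    return cost(assignment)


def mutate(assignment, rng):
    """Either copy a neighbour's precision (which tends to remove a cast)
    or pick a random precision. A stand-in for an application-aware mutation."""
    child = list(assignment)
    i = rng.randrange(len(child))
    if rng.random() < 0.5:
        j = i - 1 if i > 0 else i + 1
        child[i] = child[j]
    else:
        child[i] = rng.choice(PRECISIONS)
    return tuple(child)


def crossover(a, b, rng):
    """Single-point crossover of two precision assignments."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def search(n_vars, threshold, generations=50, pop_size=20, seed=0):
    """Elitist genetic search; requires n_vars >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    # Start from the all-64-bit baseline so a feasible candidate exists.
    population = [tuple([64] * n_vars)]
    while len(population) < pop_size:
        population.append(tuple(rng.choice(PRECISIONS) for _ in range(n_vars)))
    for _ in range(generations):
        population.sort(key=lambda a: fitness(a, threshold))
        survivors = population[: pop_size // 2]   # best half always survives
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    return min(population, key=lambda a: fitness(a, threshold))


if __name__ == "__main__":
    best = search(n_vars=6, threshold=1e-5)
    print(best, cost(best), error(best))
```

Because the best half of each generation is carried over unchanged, the returned assignment is never worse than the all-64-bit baseline under this toy cost, and it always satisfies the threshold as long as the baseline does.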

Published In

ICS '19: Proceedings of the ACM International Conference on Supercomputing
June 2019
533 pages
ISBN: 9781450360791
DOI: 10.1145/3330345

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 57
  • Downloads (last 6 weeks): 6
Reflects downloads up to 18 Jan 2025

Cited By

  • (2024) Can AI Replace Stock Analysts? Evidence from Deep Learning Financial Statements. SSRN Electronic Journal. DOI: 10.2139/ssrn.4813310. Online publication date: 2024
  • (2024) SeTHet - Sending Tuned numbers over DMA onto Heterogeneous clusters: an automated precision tuning story. Proceedings of the 21st ACM International Conference on Computing Frontiers, pp. 258-266. DOI: 10.1145/3649153.3649203. Online publication date: 7-May-2024
  • (2024) A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 55-67. DOI: 10.1145/3627535.3638484. Online publication date: 2-Mar-2024
  • (2024) Predicting Performance and Accuracy of Mixed-Precision Programs for Precision Tuning. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3623338. Online publication date: 20-May-2024
  • (2024) Toward Automated Precision Tuning of Weather and Climate Models: A Case Study. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 148-159. DOI: 10.1109/SCW63240.2024.00026. Online publication date: 17-Nov-2024
  • (2024) Interleaved Execution of Approximated CUDA Kernels in Iterative Applications. 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 60-67. DOI: 10.1109/PDP62718.2024.00017. Online publication date: 20-Mar-2024
  • (2024) FBTuner: A Feedback-Directed Approach for Safe Mixed-Precision Tuning. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 1-2. DOI: 10.1109/CCGrid59990.2024.00077. Online publication date: 6-May-2024
  • (2024) Approximate Computing: Concepts, Architectures, Challenges, Applications, and Future Directions. IEEE Access 12, pp. 146022-146088. DOI: 10.1109/ACCESS.2024.3467375. Online publication date: 2024
  • (2024) Convergence-aware operator-wise mixed-precision training. CCF Transactions on High Performance Computing. DOI: 10.1007/s42514-024-00208-9. Online publication date: 31-Dec-2024
  • (2024) Auto‐Tuning Mixed‐Precision Computation by Specifying Multiple Regions. Concurrency and Computation: Practice and Experience 37:2. DOI: 10.1002/cpe.8326. Online publication date: 7-Nov-2024
