FPVM: Towards a Floating Point Virtual Machine

Authors: Nick Wanninger, Charles Bernat, Souradip Ghosh, Christopher Kraemer, Yehya Elmasry

HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing

Pages 16 - 29

https://doi.org/10.1145/3502181.3531469

Published: 27 June 2022

Abstract

Alternatives to IEEE floating point arithmetic have become all the rage. Some extract more representational power out of the available bits. Others offer the potential for lower or higher precision than is available in IEEE-compatible hardware. Even an "interface to the real numbers" has recently been proposed. Using such alternative arithmetic systems within an existing scientific or other significant codebase is a major challenge, however. We explore how to address this challenge through virtualizing the IEEE floating point hardware, specifically on x64. The goal of the floating point virtual machine (FPVM) is to allow an existing application binary to be seamlessly extended to support the desired alternative arithmetic system with overheads determined by that system and not the virtualization mechanisms. We describe the prospects, issues, and tradeoffs for four different approaches for building FPVM: trap-and-emulate, trap-and-patch, binary transformation, and IR transformation. We then describe the design and implementation of our current design, which combines static binary analysis/translation and trap-and-emulate execution. We evaluate our FPVM implementation on several benchmarks, virtualizing them to use posits and MPFR. Finally, we comment on kernel- and hardware-level innovations that could further reduce overheads for floating point virtualization.
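
To make the trap-and-emulate idea from the abstract concrete, here is a minimal toy sketch in Python. It is not FPVM's implementation: the names `hardware_op`, `emulate_op`, `run`, and `FPTrap` are hypothetical, the "hardware" is just Python's native `float`, and exact rationals from `fractions.Fraction` stand in for an alternative arithmetic system such as posits or MPFR. The sketch assumes a trap fires whenever the native result would be rounded, or whenever an operand already lives in the alternative system.

```python
from fractions import Fraction
import operator

# Supported binary operations (illustrative subset).
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}


class FPTrap(Exception):
    """Raised by the toy 'hardware' when it cannot finish an operation itself."""


def hardware_op(op, a, b):
    """Toy stand-in for an IEEE FPU (assumption: finite floats, no division by zero)."""
    if not (isinstance(a, float) and isinstance(b, float)):
        raise FPTrap  # an operand already lives in the alternative system
    result = OPS[op](a, b)
    # Fraction(float) converts exactly, so this detects whether rounding occurred.
    if Fraction(result) != OPS[op](Fraction(a), Fraction(b)):
        raise FPTrap  # inexact result: hand the operation to the emulator
    return result


def emulate_op(op, a, b):
    """Trap handler: redo the operation in the alternative arithmetic."""
    return OPS[op](Fraction(a), Fraction(b))


def run(program, regs):
    """Execute (op, dst, src1, src2) tuples, emulating only when the hardware traps."""
    for op, dst, x, y in program:
        try:
            regs[dst] = hardware_op(op, regs[x], regs[y])
        except FPTrap:
            regs[dst] = emulate_op(op, regs[x], regs[y])
    return regs


if __name__ == "__main__":
    program = [("add", "t", "a", "b"), ("sub", "r", "t", "c")]
    regs = run(program, {"a": 0.1, "b": 0.2, "c": 0.3})
    print("virtualized:", regs["r"], "native:", (0.1 + 0.2) - 0.3)
```

In this sketch the unmodified "program" only pays the emulation cost on operations that trap; exact operations such as 0.5 + 0.25 stay on the native path. How a real system detects such events and stores alternative values inside an existing binary is the subject of the paper itself and is not modeled here.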

Cited By

Filipiuk T, Wanninger N, Dhiantravan N, Surmeier C, Bernat A, Dinda P (2023). CARAT KOP: Towards Protecting the Core HPC Kernel from Linux Kernel Modules. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1596-1605. DOI: 10.1145/3624062.3624237. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624237

Published In

HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing

June 2022

314 pages

ISBN: 9781450391993

DOI: 10.1145/3502181

General Chairs: Jon Weissman (University of Minnesota, MN, USA), Abhishek Chandra (University of Minnesota, MN, USA)

Program Chairs: Ada Gavrilovska (Georgia Institute of Technology, GA, USA), Devesh Tiwari (Northeastern University, MA, USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

HPDC '22

HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing

June 27 - July 1, 2022

Minneapolis, MN, USA

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
