Improving Performance of Floating Point Division on GPU and MIC

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9529)

Abstract

Floating-point computing capability is an important concern in high-performance scientific applications and engineering computing. Although floating-point division (or reciprocal) is a fundamental operation, it has long been much less efficient than addition and multiplication, and architectures such as GPU and MIC do not even provide a division instruction at the instruction-set level. This paper proposes a fast approximation algorithm, built only on existing instructions, that estimates the quotient of floating-point numbers in IEEE 754 format and is accurate enough for practical computing in most cases. Like most iterative numerical algorithms, it consists of a predicting step and an iterating step. The predicting step exploits a property of the IEEE 754 format to compute an initial estimate with a single integer subtraction instruction. The iterating step then improves the accuracy through fast iterations that take about ten instructions. The new algorithm is extremely easy to implement and performs well in practical experiments.
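The abstract does not spell out how the predicting step works. One common way to realize the idea it describes: for a positive normal x, the IEEE 754 binary32 bit pattern read as an integer is roughly 2^23 * (log2 x + 127), so bits(1/x) is roughly 2^23 * 254 - bits(x) = 0x7F000000 - bits(x). The following is a minimal C sketch of that trick, assuming single precision and the commonly quoted, slightly reduced constant 0x7EF311C7. Neither the constant nor the helper names (`as_bits`, `as_float`, `recip_seed`) come from the paper; they are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit integer (memcpy avoids
   strict-aliasing problems; no numeric conversion takes place). */
static uint32_t as_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit integer's bits as a float. */
static float as_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Predicting step (sketch): one integer subtraction estimates 1/b.
   ASSUMPTION: 0x7EF311C7 is a widely quoted tuned constant, not
   necessarily the one used in the paper. The untuned constant 0x7F000000
   (254 << 23) negates the unbiased exponent exactly; lowering it slightly
   balances the error over the mantissa range.
   Intended for positive, normal, moderate-magnitude b only: zero,
   subnormals, infinities, NaNs and b near FLT_MAX are not handled. */
static float recip_seed(float b) {
    return as_float(0x7EF311C7u - as_bits(b));
}
```

With this constant, recip_seed(1.0f) returns about 0.9495, and for positive normal b of moderate size the product b * recip_seed(b) stays within roughly 5% of 1. That residual error is what the iterating step has to remove.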

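The iterating step can be sketched in the same way. The abstract does not name the iteration, so this C sketch assumes the classic Newton-Raphson recurrence for the reciprocal, y_{n+1} = y_n * (2 - b * y_n). This is a standard functional-iteration choice and not necessarily the authors' exact scheme. If b * y_n = 1 - e_n, one step gives b * y_{n+1} = (1 - e_n)(1 + e_n) = 1 - e_n^2. A seed that is off by about 5% therefore shrinks to roughly 2.5e-3, 6e-6 and 4e-11 (before rounding) after one, two and three steps. The seed constant and the function name `approx_div` are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Bit reinterpretation helpers (memcpy avoids strict-aliasing issues). */
static uint32_t as_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float as_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Approximate a / b for positive, normal, moderate-magnitude b.
   ASSUMPTIONS (not taken from the paper): the seed constant 0x7EF311C7 and
   Newton-Raphson refinement of the reciprocal. Each pass roughly squares
   the relative error of y, so three passes reach float precision. */
static float approx_div(float a, float b, int iterations) {
    /* Predicting step: a single integer subtraction on the bit pattern. */
    float y = as_float(0x7EF311C7u - as_bits(b));

    /* Iterating step: y <- y * (2 - b*y), two multiplies and one subtract. */
    for (int i = 0; i < iterations; ++i) {
        y = y * (2.0f - b * y);
    }

    /* a / b computed as a * (1 / b); the result may differ from the
       correctly rounded quotient by a few units in the last place. */
    return a * y;
}
```

For instance, approx_div(1.0f, 3.0f, 3) agrees with 1.0f / 3.0f to within a few units in the last place, while approx_div(1.0f, 1.0f, 0) returns only the raw seed, about 0.9495. Unlike IEEE 754 division, a * (1/b) is not guaranteed to be correctly rounded. A production version would also need separate handling for zero, subnormal, infinite and NaN operands.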
Acknowledgments

We thank the anonymous reviewers for their comments and suggestions on the submitted version of this paper. Special thanks go to the members of the Parallel Software Group of EECS, Peking University, for their suggestions.

This research is supported by the National HTRD 863 Plan under Grants No. 2012AA010902, 2012AA010903; and NSFC Grants No. 61170053, 61432018, 61379048.

Author information

Corresponding author

Correspondence to Kun Huang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Huang, K., Chen, Y. (2015). Improving Performance of Floating Point Division on GPU and MIC. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_48

  • DOI: https://doi.org/10.1007/978-3-319-27122-4_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27121-7

  • Online ISBN: 978-3-319-27122-4

  • eBook Packages: Computer Science, Computer Science (R0)
