Reducing division latency with reciprocal caches

Oberman, Stuart F.; Flynn, Michael J.

doi:10.1007/BF02425917

Russian title (translated): Speeding up division by caching reciprocal values

Published: June 1996

Volume 2, pages 147–153 (1996)

Reliable Computing

Abstract

Floating-point division is generally regarded as a high-latency operation in typical floating-point applications. Many techniques exist for increasing division performance, often at the cost of increased chip area, a longer cycle time, or both. This paper presents two methods for reducing the latency of division. Both methods are evaluated on applications from the SPECfp92 and NAS benchmark suites to determine their effect on overall system performance. The notion of recurring computation is introduced, and it is shown how recurring division can be exploited with an additional, dedicated division cache. For multiplication-based division algorithms, reciprocal caches can be used to store recurring reciprocals. Results show that reciprocal caches can achieve nearly a twofold speedup in division performance for reasonable cache sizes.
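The reciprocal-cache idea in the abstract can be illustrated with a small software model. The sketch below is not the authors' design: the class name `ReciprocalCache`, the direct-mapped organization, the default of 128 entries, and the index function are all assumptions made for illustration. It also uses Python's `1.0 / b` as a stand-in for the slow iterative reciprocal computation, and it does not reproduce the rounding that a hardware multiplication-based divider would perform, so `a * (1/b)` may differ from `a / b` in the last bit. Nonzero divisors are assumed.

```python
import struct


class ReciprocalCache:
    """Direct-mapped cache from a divisor to its reciprocal (illustrative only)."""

    def __init__(self, entries=128):
        # `entries` is a hypothetical size; it must be a power of two
        # so that the mask below selects a valid slot.
        assert entries > 0 and entries & (entries - 1) == 0
        self.mask = entries - 1
        self.tags = [None] * entries    # full divisor bit pattern per slot
        self.recips = [0.0] * entries   # cached reciprocal per slot
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _bits(x):
        # Raw IEEE 754 double bit pattern of x as an unsigned 64-bit integer.
        return struct.unpack("<Q", struct.pack("<d", x))[0]

    def _index(self, bits):
        # Assumed index function: fold high mantissa bits with exponent bits.
        return ((bits >> 40) ^ (bits >> 52)) & self.mask

    def divide(self, a, b):
        bits = self._bits(b)
        i = self._index(bits)
        if self.tags[i] == bits:
            # Recurring divisor: reuse the stored reciprocal.
            self.hits += 1
        else:
            # New divisor: compute the reciprocal and replace the slot.
            self.misses += 1
            self.tags[i] = bits
            self.recips[i] = 1.0 / b  # stands in for the slow reciprocal step
        # On a hit, the division costs only this multiplication.
        return a * self.recips[i]


cache = ReciprocalCache(entries=8)
for a, b in [(1.0, 4.0), (3.0, 4.0), (10.0, 4.0), (1.0, 8.0)]:
    cache.divide(a, b)
print(cache.hits, cache.misses)  # prints: 2 2
```

In this model a hit replaces the reciprocal computation with a single multiplication, which is where the latency saving would come from when divisors recur. The hit rate a real program would see depends on its divisor stream and on the cache size, neither of which this sketch attempts to reproduce.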

Abstract (translated from the Russian)

Floating-point division in applications that use floating-point arithmetic usually takes a long time. Many methods have been proposed to make division more efficient, and many of them require a larger chip area, a lower clock frequency, or both. Two methods for speeding up the division operation are presented. Data on the effect of these methods on overall system performance, obtained with test programs from the SPECfp92 and NAS suites, are given. The notion of recurring computation is introduced, and a way of implementing recurring division with an additional cache dedicated to this operation is proposed. In multiplication-based division algorithms, a cache can be used to store recurring reciprocals. The results indicate that a reciprocal cache can provide an almost twofold increase in division speed at a comparatively small cache size.

Author information

Authors and Affiliations

Computer Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9030, USA
Stuart F. Oberman & Michael J. Flynn

About this article

Cite this article

Oberman, S.F., Flynn, M.J. Reducing division latency with reciprocal caches. Reliable Comput 2, 147–153 (1996). https://doi.org/10.1007/BF02425917

Received: 20 October 1995
Revised: 29 November 1995
Issue Date: June 1996
DOI: https://doi.org/10.1007/BF02425917
