Computer performance analysis and the Pi Theorem

Numrich, Robert W.

doi:10.1007/s00450-010-0147-8

Regular Paper
Published: 29 December 2010

Volume 29, pages 45–71 (2014)

Computer Science - Research and Development

Robert W. Numrich¹

Abstract

This paper applies the Pi Theorem of dimensional analysis to a representative set of examples from computer performance analysis. It is a survey paper that takes a different look at problems involving latency, bandwidth, cache-miss ratios, and the efficiency of parallel numerical algorithms. The Pi Theorem is the fundamental tool of dimensional analysis, and it applies to problems in computer performance analysis just as well as it does to problems in other sciences. Applying it requires the definition of a system of measurement appropriate for computer performance analysis, with a consistent set of units and dimensions. A straightforward recipe then reduces the independent variables of each specific problem to a smaller number of dimensionless parameters. Two machines with the same values of these parameters are self-similar and behave the same way. Self-similarity relationships emphasize how machines are the same rather than how they are different. The Pi Theorem is simple to state and simple to prove using purely algebraic methods, but the results that follow from it are often unexpected, far from simple, and almost always reveal something new about the problem at hand.
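As an illustration of the recipe the abstract describes, the following is a minimal sketch, not the paper's own worked example. It assumes a hypothetical message-transfer problem with four variables: transfer time t, startup latency t0, bandwidth r, and message length n, measured in two dimensions (time and bytes). The dimensionless parameters are read off from the null space of the dimension matrix. The variable names, the choice of problem, and the linear model t = t0 + n/r used at the end are all assumptions made for this example.

```python
from fractions import Fraction


def null_space(matrix):
    """Basis of the null space of `matrix` (a list of rows), in exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # pivot column of each reduced row
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # One basis vector per free (non-pivot) column.
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][f]
        basis.append(vec)
    return basis


# Hypothetical variables, in column order: latency t0, bandwidth r, time t, length n.
names = ["t0", "r", "t", "n"]
dimension_matrix = [
    [1, -1, 1, 0],  # exponent of time in each variable
    [0, 1, 0, 1],   # exponent of bytes in each variable
]

# Each null-space vector lists the exponents of one dimensionless parameter.
pi_exponents = null_space(dimension_matrix)
for vec in pi_exponents:
    print(" * ".join(f"{name}^{e}" for name, e in zip(names, vec) if e != 0))


def transfer_time(n, t0, r):
    # Assumed linear latency/bandwidth model, used only for illustration.
    return t0 + n / r


def pi_parameters(n, t0, r):
    """Return (t / t0, n / (r * t0)) for one hypothetical machine."""
    return transfer_time(n, t0, r) / t0, n / (r * t0)


# Two hypothetical machines with different hardware parameters.
machine_a = pi_parameters(n=4000, t0=1e-6, r=1e9)
machine_b = pi_parameters(n=2000, t0=1e-6, r=5e8)
print(machine_a, machine_b)
```

Four variables in two dimensions leave two dimensionless parameters, t/t0 and n/(r t0). The two hypothetical machines differ in bandwidth and message length but share the same value of n/(r t0), so under the assumed model they also share the same value of t/t0, which is the sense in which the abstract calls such machines self-similar.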

Author information

Authors and Affiliations

Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
Robert W. Numrich

Corresponding author

Correspondence to Robert W. Numrich.

About this article

Cite this article

Numrich, R.W. Computer performance analysis and the Pi Theorem. Comput Sci Res Dev 29, 45–71 (2014). https://doi.org/10.1007/s00450-010-0147-8

Received: 30 May 2009
Accepted: 09 December 2010
Published: 29 December 2010
Issue Date: February 2014
DOI: https://doi.org/10.1007/s00450-010-0147-8
