Computer performance analysis and the Pi Theorem

  • Regular Paper
Computer Science - Research and Development

Abstract

This survey paper applies the Pi Theorem of dimensional analysis to a representative set of examples from computer performance analysis. It takes a different look at problems involving latency, bandwidth, cache-miss ratios, and the efficiency of parallel numerical algorithms. The Pi Theorem is the fundamental tool of dimensional analysis, and it applies to problems in computer performance analysis just as well as it does to problems in other sciences. Applying it first requires a system of measurement appropriate for computer performance analysis, with a consistent set of units and dimensions. A straightforward recipe then reduces the independent variables of each specific problem to a smaller number of dimensionless parameters. Two machines with the same values of these parameters are self-similar and behave the same way. Self-similarity relationships emphasize how machines are the same rather than how they are different. The Pi Theorem is simple to state and simple to prove using purely algebraic methods, but the results that follow from it are often unexpected and far from simple, and they almost always reveal something new about the problem at hand.
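The reduction described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the choice of variables (transfer time `t`, message size `n`, startup latency `t0`, bandwidth `r`), the two base dimensions (time and data), and all function names are assumptions made for this example. The Pi Theorem says that the number of independent dimensionless parameters equals the number of variables minus the rank of the dimension matrix, which here gives 4 - 2 = 2.

```python
from fractions import Fraction

# Assumed variables for a message-transfer problem (illustrative only).
VARIABLES = ["t", "n", "t0", "r"]

# Assumed dimension matrix: one row per base dimension, one column per variable.
DIMENSIONS = [
    [1, 0, 1, -1],  # exponent of time in t, n, t0, r
    [0, 1, 0, 1],   # exponent of data in t, n, t0, r
]

def rank(matrix):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def is_dimensionless(exponents):
    """True if the product of the variables raised to these exponents has no dimension."""
    return all(sum(d * e for d, e in zip(row, exponents)) == 0 for row in DIMENSIONS)

def pi_groups(t, n, t0, r):
    """One possible pair of dimensionless parameters: t/t0 and n/(r*t0)."""
    return t / t0, n / (r * t0)

# Number of independent dimensionless parameters: variables minus rank.
num_groups = len(VARIABLES) - rank(DIMENSIONS)
```

Under these assumptions, two machines whose measurements give the same pair from `pi_groups` are self-similar in the sense of the abstract, even when their latencies and bandwidths differ. For example, `pi_groups(4, 1000, 2, 500)` and `pi_groups(8, 4000, 4, 1000)` return the same values.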

Author information

Corresponding author

Correspondence to Robert W. Numrich.

About this article

Cite this article

Numrich, R.W. Computer performance analysis and the Pi Theorem. Comput Sci Res Dev 29, 45–71 (2014). https://doi.org/10.1007/s00450-010-0147-8

  • DOI: https://doi.org/10.1007/s00450-010-0147-8
