Abstract
This survey applies the Pi Theorem of dimensional analysis to a representative set of problems from computer performance analysis: latency, bandwidth, cache-miss ratios, and the efficiency of parallel numerical algorithms. The Pi Theorem is the fundamental tool of dimensional analysis, and it applies to computer performance analysis just as well as it does to other sciences. Applying it first requires a system of measurement appropriate for computer performance analysis, with a consistent set of units and dimensions. A straightforward recipe then reduces the independent variables of each specific problem to a smaller number of dimensionless parameters. Two machines with the same values of these parameters are self-similar and behave the same way. Self-similarity relationships emphasize how machines are the same rather than how they are different. The Pi Theorem is simple to state and simple to prove by purely algebraic methods, but the results that follow from it are often surprising, and they almost always reveal something new about the problem at hand.
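As an illustration of the recipe, the following minimal sketch finds dimensionless parameters as a basis for the null space of a dimension matrix. The example problem is an assumption made here for illustration and is not taken from the paper: a message transfer described by a startup latency t0, a bandwidth r, a transfer time t, and a message size n, measured in two base dimensions (time and data). The function name `nullspace` and the variable ordering are likewise hypothetical choices.

```python
from fractions import Fraction

def nullspace(matrix):
    """Return a basis of the null space of `matrix` by exact row reduction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0])
    pivots = []  # column index of the pivot in each reduced row
    r = 0        # index of the next row that needs a pivot
    for c in range(ncols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # Each non-pivot (free) column yields one null-space basis vector.
    free = [c for c in range(ncols) if c not in pivots]
    basis = []
    for f in free:
        vec = [Fraction(0)] * ncols
        vec[f] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][f]
        basis.append(vec)
    return basis

# Assumed illustrative variables (not from the paper):
#   t0 = startup latency [time], r = bandwidth [data/time],
#   t  = transfer time  [time],  n = message size [data]
NAMES = ["t0", "r", "t", "n"]
DIMENSION_MATRIX = [
    [1, -1, 1, 0],  # exponent of time in each variable
    [0,  1, 0, 1],  # exponent of data in each variable
]

# Each basis vector holds the exponents of one dimensionless product.
groups = nullspace(DIMENSION_MATRIX)
for g in groups:
    terms = [f"{name}^{exp}" for name, exp in zip(NAMES, g) if exp != 0]
    print(" * ".join(terms))
```

With four variables and a dimension matrix of rank two, the sketch yields two dimensionless products, so a relationship among four dimensional quantities collapses to a relationship between two pure numbers. Listing t0 and r first makes them the scales against which t and n are measured; a different ordering gives an equivalent but differently expressed pair.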
Cite this article
Numrich, R.W. Computer performance analysis and the Pi Theorem. Comput Sci Res Dev 29, 45–71 (2014). https://doi.org/10.1007/s00450-010-0147-8