Computer comparisons in the presence of performance variation

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Performance variability, stemming from non-deterministic hardware and software behaviors or from deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems. It increases the difficulty of comparing computer performance metrics and is slated to become an even greater concern as interest in Big Data analytics grows. Conventional methods use various measures (such as the geometric mean) to quantify the performance of different benchmarks and to compare computers without considering this variability, which may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution with a five-number summary for performance evaluation. The results show that, for both PARSEC and high-variance BigDataBench benchmarks, 1) the randomization test substantially improves our ability to identify a performance difference between two computers when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g., the ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of computer performance because of the variability of computer systems. We further propose using an empirical distribution to evaluate computer performance and a five-number summary to summarize it. We use published SPEC 2006 results to investigate the sources of performance variation by predicting performance and relative variation for 8,236 machines, achieving a correlation of 0.992 for predicted performance and a correlation of 0.5 between predicted and measured relative variation. Finally, we propose a novel biplotting technique to visualize the effectiveness of benchmarks and to cluster machines by behavior. We illustrate the results and conclusions through detailed Monte Carlo simulation studies and real examples.
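To make the three resampling ideas concrete, the sketch below (in Python with NumPy; it is not the authors' code) shows one plausible implementation under assumed inputs: hypothetical timing arrays times_a and times_b (one row per benchmark, one column per repeated run), a randomization (permutation) test on the log ratio of geometric means, a percentile bootstrap confidence interval for that ratio, and a five-number summary of per-benchmark speedups. All names and data here are illustrative assumptions, not artifacts of the paper.

```python
# Minimal sketch of the resampling methods described in the abstract.
# Assumptions (not from the paper): times_a and times_b hold execution times,
# one row per benchmark and one column per repeated run, for machines A and B.
import numpy as np

rng = np.random.default_rng(0)

def geo_mean(x):
    # Geometric mean along the last axis.
    return np.exp(np.mean(np.log(x), axis=-1))

def gm_ratio(times_a, times_b):
    # Ratio of geometric means of per-benchmark mean times (A over B).
    return geo_mean(times_a.mean(axis=1)) / geo_mean(times_b.mean(axis=1))

def randomization_test(times_a, times_b, n_perm=2000):
    # Permutation p-value for H0: the two machines perform the same.
    # Runs of each benchmark are pooled and randomly reassigned to A or B.
    observed = abs(np.log(gm_ratio(times_a, times_b)))
    pooled = np.concatenate([times_a, times_b], axis=1)
    n_a = times_a.shape[1]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permuted(pooled, axis=1)  # shuffle runs within each benchmark
        stat = abs(np.log(gm_ratio(perm[:, :n_a], perm[:, n_a:])))
        hits += stat >= observed
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(times_a, times_b, n_boot=2000, level=0.95):
    # Percentile bootstrap interval for the geometric-mean ratio,
    # resampling runs with replacement within each benchmark.
    n_bench, n_a = times_a.shape
    n_b = times_b.shape[1]
    stats = np.empty(n_boot)
    for i in range(n_boot):
        ia = rng.integers(0, n_a, size=(n_bench, n_a))
        ib = rng.integers(0, n_b, size=(n_bench, n_b))
        stats[i] = gm_ratio(np.take_along_axis(times_a, ia, axis=1),
                            np.take_along_axis(times_b, ib, axis=1))
    return np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])

def five_number_summary(x):
    # Minimum, lower quartile, median, upper quartile, maximum.
    return np.quantile(x, [0.0, 0.25, 0.5, 0.75, 1.0])

# Hypothetical data: 10 benchmarks, 20 runs per machine, with run-to-run noise.
times_a = rng.lognormal(mean=1.00, sigma=0.05, size=(10, 20))
times_b = rng.lognormal(mean=1.03, sigma=0.05, size=(10, 20))
print("p-value:", randomization_test(times_a, times_b))
print("95% CI for geometric-mean ratio:", bootstrap_ci(times_a, times_b))
print("five-number summary of per-benchmark speedups:",
      five_number_summary(times_b.mean(axis=1) / times_a.mean(axis=1)))
```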



Acknowledgements

This work was supported in part by the National High Technology Research and Development Program of China (2015AA015303), the National Natural Science Foundation of China (Grant No. 61672160), the Shanghai Science and Technology Development Funds (17511102200), and the National Science Foundation (NSF) (CCF-1017961, CCF-1422408, and CNS-1527318). We acknowledge the computing resources provided by the Louisiana Optical Network Initiative (LONI) HPC team. Finally, we appreciate the invaluable comments from the anonymous reviewers.

Author information

Corresponding author

Correspondence to Weihua Zhang.

Additional information

Samuel Irving received the bachelor’s degrees in both computer science and electrical engineering from Louisiana State University (LSU), USA in December 2011. He is currently enrolled in the Computer Engineering PhD program and received the Donald W. Clayton PhD Assistantship at LSU. His research interests include machine learning, big data analytics, and heterogeneous architecture design.

Bin Li received his Bachelor’s degree in Biophysics from Fudan University, China. He obtained his Master’s degree in Biometrics (08/2002) and PhD degree in Statistics (08/2006) from The Ohio State University, USA. He is an associate professor with the Experimental Statistics department at Louisiana State University, USA. His research interests include statistical learning & data mining, statistical modeling on massive and complex data, and Bayesian statistics. He received the Ransom Marian Whitney Research Award in 2006 and a Student Paper Competition Award from ASA on Bayesian Statistical Science in 2005. He is a member of the Institute of Mathematical Statistics (IMS) and the American Statistical Association (ASA).

Shaoming Chen received the bachelor’s and master’s degrees in electronics and information engineering from the Huazhong University of Science and Technology, China in 2008 and 2011, respectively. He is currently working at AMD after receiving the PhD degree in electrical and computer engineering from Louisiana State University, USA in August 2016. His research interests include memory sub-system design and cost optimization of data centers.

Lu Peng received the bachelor’s and master’s degrees in computer science and engineering from Shanghai Jiao Tong University, China, and the PhD degree in computer engineering from the University of Florida, Gainesville, in April 2005. He is currently the Gerard L. “Jerry” Rispone Professor with the Division of Electrical and Computer Engineering at Louisiana State University, USA. His research focuses on memory hierarchy systems, reliability, power efficiency, and other issues in processor design. He received an ORAU Ralph E. Powe Junior Faculty Enhancement Award in 2007 and the Best Paper Award (processor architecture track) from the IEEE International Conference on Computer Design in 2001. He is on the editorial board of Microprocessors and Microsystems.

Weihua Zhang received the PhD degree in computer science from Fudan University in 2007. He is currently an associate professor of Parallel Processing Institute, Fudan University, China. His research interests are in compilers, computer architecture, parallelization and systems software.

Lide Duan is currently an assistant professor in the Department of Electrical and Computer Engineering at The University of Texas at San Antonio, USA. Prior to joining UTSA, he worked as a senior CPU design engineer at AMD, working on future x86-based high-performance and low-power CPU microarchitecture design and performance modeling. He received a PhD in Computer Engineering from Louisiana State University, USA in 2011. His PhD research focused on soft error reliability analysis and prediction for processors at the computer architecture level. He also received a bachelor’s degree in Computer Science from Shanghai Jiao Tong University, China in 2006.

About this article

Cite this article

Irving, S., Li, B., Chen, S. et al. Computer comparisons in the presence of performance variation. Front. Comput. Sci. 14, 21–41 (2020). https://doi.org/10.1007/s11704-018-7319-2

