DOI: 10.1145/2597917.2597928

SKOPE: a framework for modeling and exploring workload behavior

Published: 20 May 2014

Abstract

Understanding workload behavior plays an important role in performance studies. The growing complexity of applications and architectures has widened the gap among application developers, performance engineers, and hardware designers. To reduce this gap, we propose SKOPE, a SKeleton framework for Performance Exploration, which produces a descriptive model of a workload's semantic behavior. From this model, potential transformations can be inferred, and users can understand how workloads may interact with and adapt to emerging hardware. SKOPE models can be shared, annotated, and studied by a community of performance engineers and system designers; they offer readability in the frontend and versatility in the backend. SKOPE can be used for performance analysis, tuning, and projection. We provide two example use cases. First, we project GPU performance from CPU code without GPU programming or access to the hardware; we automatically explore transformations, and the projected best-achievable performance deviates from the measured performance by 18% on average. Second, we project the multi-node scaling trends of two scientific workloads with a projection accuracy of 95%.

Published In

CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers
May 2014
305 pages
ISBN:9781450328708
DOI:10.1145/2597917
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

CF'14
CF'14: Computing Frontiers Conference
May 20 - 22, 2014
Cagliari, Italy

Acceptance Rates

CF '14 Paper Acceptance Rate 28 of 62 submissions, 45%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Citations

Cited By

  • (2022) Cost-Efficient Workflow Scheduling Algorithm for Applications With Deadline Constraint on Heterogeneous Clouds. IEEE Transactions on Parallel and Distributed Systems, 33:9 (2079-2092). DOI: 10.1109/TPDS.2021.3134247. Online publication date: 1-Sep-2022
  • (2022) Workflow simulation and multi-threading aware task scheduling for heterogeneous computing. Journal of Parallel and Distributed Computing, 168:C (17-32). DOI: 10.1016/j.jpdc.2022.05.011. Online publication date: 1-Oct-2022
  • (2019) A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (444-455). DOI: 10.1109/PACT.2019.00042. Online publication date: 23-Sep-2019
  • (2018) Prometheus: Coherent Exploration of Hardware and Software Optimizations Using Aspen. 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (244-250). DOI: 10.1109/MASCOTS.2018.00032. Online publication date: Sep-2018
  • (2018) A Performance Prediction Framework for Irregular Applications. 2018 IEEE 25th International Conference on High Performance Computing (HiPC) (304-313). DOI: 10.1109/HiPC.2018.00042. Online publication date: Dec-2018
  • (2017) Analytical Performance Modeling and Validation of Intel's Xeon Phi Architecture. Proceedings of the Computing Frontiers Conference (247-250). DOI: 10.1145/3075564.3075593. Online publication date: 15-May-2017
  • (2017) A systematic approach to developing near real-time performance predictions based on physiological measures. 2017 IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA) (1-7). DOI: 10.1109/COGSIMA.2017.7929601. Online publication date: Mar-2017
  • (2016) Exploiting Performance Portability in Search Algorithms for Autotuning. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (1535-1544). DOI: 10.1109/IPDPSW.2016.85. Online publication date: May-2016
  • (2016) SPEC-AX and PARSEC-AX: extracting accelerator benchmarks from microprocessor benchmarks. 2016 IEEE International Symposium on Workload Characterization (IISWC) (1-11). DOI: 10.1109/IISWC.2016.7581272. Online publication date: Sep-2016
  • (2016) Sensing and Assessing Cognitive Workload Across Multiple Tasks. Proceedings, Part I, 10th International Conference on Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience - Volume 9743 (440-450). DOI: 10.1007/978-3-319-39955-3_41. Online publication date: 17-Jul-2016
