research-article

SKOPE: a framework for modeling and exploring workload behavior

Authors:

Vitali Morozov,

Venkatram Vishwanath,

Kalyan Kumaran,

Valerie Taylor

CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers

Article No.: 6, Pages 1 - 10

https://doi.org/10.1145/2597917.2597928

Published: 20 May 2014

Abstract

Understanding workload behavior plays an important role in performance studies. The growing complexity of applications and architectures has widened the gap among application developers, performance engineers, and hardware designers. To narrow this gap, we propose SKOPE, a SKeleton framework for Performance Exploration. SKOPE produces a descriptive model of a workload's semantic behavior; the model can be used to infer potential transformations and helps users understand how workloads may interact with and adapt to emerging hardware. SKOPE models can be shared, annotated, and studied by a community of performance engineers and system designers; they offer readability in the frontend and versatility in the backend. SKOPE can be used for performance analysis, tuning, and projection. We provide two example use cases. First, we project GPU performance from CPU code without GPU programming or access to the hardware; SKOPE automatically explores transformations, and the projected best-achievable performance deviates from the measured performance by 18% on average. Second, we project the multi-node scaling trends of two scientific workloads and achieve a projection accuracy of 95%.
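The two figures quoted in the abstract (an 18% average deviation and a 95% projection accuracy) can be read as relative-error summaries. The abstract does not define either metric, so the sketch below is only one plausible reading: it assumes the deviation is the mean absolute relative difference between projected and measured values, and that accuracy is one minus that deviation. The function names and sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def relative_deviation(projected: float, measured: float) -> float:
    """Absolute gap between projection and measurement, as a fraction of the measurement."""
    if measured <= 0:
        raise ValueError("measured value must be positive")
    return abs(projected - measured) / measured


def mean_relative_deviation(pairs):
    """Average the per-sample relative deviations over (projected, measured) pairs."""
    return mean(relative_deviation(p, m) for p, m in pairs)


# Hypothetical (projected, measured) runtimes in seconds; not data from the paper.
samples = [(1.2, 1.0), (0.9, 1.0), (2.2, 2.0)]

error = mean_relative_deviation(samples)
accuracy = 1.0 - error  # assumed definition: accuracy = 1 - mean relative deviation

print(f"mean deviation: {error:.1%}, accuracy: {accuracy:.1%}")
```

Under this reading, a 95% accuracy corresponds to a 5% mean relative deviation; other definitions (for example, per-point maximum error) would give different numbers for the same data.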

Index Terms

1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Model verification and validation
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        1. Frameworks

Published In

CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers

May 2014

305 pages

ISBN: 9781450328708

DOI: 10.1145/2597917

General Chair: Pedro Trancoso (University of Cyprus, CY)

Program Chairs: Diana Franklin (University of California at Santa Barbara), Sally A. McKee (Chalmers University of Technology, SE)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CF'14

Sponsor:

SIGMICRO

CF'14: Computing Frontiers Conference

May 20 - 22, 2014

Cagliari, Italy

Acceptance Rates

CF '14 Paper Acceptance Rate 28 of 62 submissions, 45%;

Overall Acceptance Rate 273 of 785 submissions, 35%
