Abstract
To deliver the energy efficiency and raw compute throughput necessary to realize exascale systems, projected designs call for massive numbers of (simple) cores per processor. An unfortunate consequence of such designs is that memory bandwidth per core will be significantly reduced, which can severely degrade the performance of many memory-intensive HPC workloads. To identify the code regions that are most affected and to guide the development of mitigating solutions, system designers and application developers alike would benefit immensely from a systematic framework that allows them to characterize the types of computations that are sensitive to reduced memory bandwidth and to pinpoint the regions in their code that exhibit this sensitivity. This paper introduces a framework for identifying the properties of computations that are associated with memory bandwidth sensitivity, for extracting those properties from HPC applications, and for associating bandwidth sensitivity with specific structures in the application source code. We apply our framework to a number of large-scale HPC applications and observe that the bandwidth sensitivity model shows an absolute mean error averaging less than 5%.
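As a rough illustration of the kind of bandwidth sensitivity model the abstract describes, and of how its absolute mean error could be evaluated, the following is a minimal sketch rather than the authors' implementation: it fits a random forest regressor (random forests appear in the paper's bibliography) to synthetic per-phase features and reports the error of its slowdown predictions. The feature names, the synthetic data, and the model choice are illustrative assumptions only.

```python
# Minimal sketch (not the paper's actual model): predict slowdown under
# reduced memory bandwidth from per-phase features and report mean absolute
# error, the metric quoted in the abstract. All features and data are
# hypothetical stand-ins for quantities one might gather via instrumentation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_phases = 500

# Hypothetical per-phase features: cache miss ratio, memory operations per
# instruction, and a stride-regularity score.
X = rng.random((n_phases, 3))

# Synthetic target: slowdown factor under reduced bandwidth (relative to the
# full-bandwidth baseline), loosely tied to the features plus noise. In the
# real framework this would come from measurements of application phases.
y = 1.0 + 0.5 * X[:, 0] * X[:, 1] + 0.05 * rng.standard_normal(n_phases)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"absolute mean error: {100 * mae / y_test.mean():.1f}% of mean slowdown")
```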
The rights of this work are transferred to the extent transferable according to 17 U.S.C. §105.
Keywords
- Memory Bandwidth
- Lawrence Livermore National Laboratory
- Memory Access Pattern
- Computational Phase
- Binary Instrumentation
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Tiwari, A., Gamst, A., Laurenzano, M.A., Schulz, M., Carrington, L. (2014). Modeling the Impact of Reduced Memory Bandwidth on HPC Applications. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_6
DOI: https://doi.org/10.1007/978-3-319-09873-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9