Abstract
In this paper, we present a statistical model that predicts the off-chip memory bandwidth a parallel loop requires during its execution. It is a compile-time modeling technique that derives the correlation between the memory bandwidth requirement and the data access patterns of multithreaded applications. Compilers and performance tools can use the model to predict when an application will reach the sustainable memory bandwidth of the system during execution, and to determine an optimal number of threads for executing a specific parallel loop according to its memory reference patterns. Awareness of the performance impact of oversubscribed memory bandwidth can also help programmers account for the additional latency caused by contention and minimize that overhead by tuning the memory access behavior of their applications. We evaluated the model in terms of both technical accuracy and prediction accuracy by comparing the modeling results with measured results. The evaluation demonstrates its accuracy in both system bandwidth modeling and application bandwidth modeling.
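As a rough illustration of the thread-count decision described above, the following sketch picks the largest thread count whose combined bandwidth demand stays within the sustainable system bandwidth. It is not the statistical model of this paper: the function name `max_threads_within_bandwidth`, its parameters, and the assumption that demand grows linearly with the number of threads are all hypothetical simplifications chosen for illustration.

```python
def max_threads_within_bandwidth(per_thread_gbs, sustainable_gbs, max_threads):
    """Return the largest thread count whose total demand fits the bandwidth.

    Assumption (not from the paper): each thread demands the same bandwidth,
    so total demand is per_thread_gbs * threads.
    """
    if per_thread_gbs <= 0:
        # A loop with no off-chip traffic is never bandwidth bound.
        return max_threads
    # Number of threads that fit under the sustainable bandwidth.
    fit = int(sustainable_gbs // per_thread_gbs)
    # Always run at least one thread, and never more than the machine offers.
    return max(1, min(max_threads, fit))


# Hypothetical numbers: 2.5 GB/s per thread, 12 GB/s sustainable, 16 cores.
threads = max_threads_within_bandwidth(2.5, 12.0, 16)
```

With these made-up inputs the loop would be limited to four threads, since a fifth would push the estimated demand past the sustainable bandwidth. A real predictor would replace the linear assumption with demand estimates derived from the loop's memory reference patterns.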
Acknowledgement
This work was supported in part by the National Science Foundation’s Computer Systems Research program under Award No. CCF-0833201 and by the Department of Energy under Award Agreement No. DE-FC02-12ER26099. The evaluation platform used for this work was supported by the National Science Foundation’s Computer Systems Research program under Award No. CNS-0833201 and CRI-0958464.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tolubaeva, M., Yan, Y., Chapman, B. (2014). Compile Time Modeling of Off-Chip Memory Bandwidth for Parallel Loops. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_17
DOI: https://doi.org/10.1007/978-3-319-09967-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09966-8
Online ISBN: 978-3-319-09967-5
eBook Packages: Computer Science, Computer Science (R0)