Abstract
In a multi-core system, the processor core pipelines and local caches are replicated in each core, while other resources, such as the shared cache and the memory bus, are shared by all cores in a processor. Running multiple copies of memory-intensive applications on different cores often leads to poor scaling, because these shared resources can become throughput bottlenecks. Such issues have traditionally been studied in the design and evaluation of processors, platforms, and operating systems. We have identified a set of compiler optimizations that have a measurable impact on the scaling of applications on multi-core systems and evaluated them using the standard rate run of the SPEC CPU2006 benchmark suite, in which throughput is measured by running multiple copies of a program on a multi-core, multi-processor system. We have also collected data and analyzed how these compiler optimizations affect the utilization and behavior of the shared resources. Our experimental results show that conventional compiler optimizations can play an important role in improving the scaling of memory-intensive application threads on multi-core systems.
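To make "scaling" in a rate run concrete, the sketch below computes a throughput ratio in the style of the SPECrate metric (copies × reference time ÷ wall-clock time for all copies to finish) and a simple scaling efficiency (measured n-copy throughput divided by n times the single-copy throughput). The function names and all timings are hypothetical, chosen only for illustration; they are not data from this study or SPEC's own tooling.

```python
def rate_ratio(copies: int, ref_time_s: float, elapsed_s: float) -> float:
    """Throughput ratio for one benchmark in the style of SPECrate.

    All copies run concurrently; elapsed_s is the wall-clock time until
    every copy has finished. ref_time_s is a per-benchmark reference time.
    """
    return copies * ref_time_s / elapsed_s


def scaling_efficiency(single_copy_s: float, n_copy_elapsed_s: float) -> float:
    """Measured n-copy throughput divided by ideal throughput.

    Ideal throughput is n times the single-copy throughput, so the copy
    count cancels and the result is t1 / tn. A value of 1.0 means perfect
    scaling; lower values suggest contention for shared resources such as
    the shared cache or the memory bus.
    """
    return single_copy_s / n_copy_elapsed_s


if __name__ == "__main__":
    # Hypothetical timings for illustration only, not measured data.
    ref = 1000.0  # hypothetical reference time in seconds
    t1 = 100.0    # one copy running alone
    t8 = 160.0    # eight concurrent copies, wall-clock time
    print(rate_ratio(1, ref, t1))      # 10.0
    print(rate_ratio(8, ref, t8))      # 50.0
    print(scaling_efficiency(t1, t8))  # 0.625
```

In this made-up example, eight copies deliver five times the single-copy throughput rather than eight, the kind of sub-linear scaling that shared-resource contention produces and that the compiler optimizations studied here aim to reduce.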
Cite this article
Suresh, D.C., Ju, R., Lai, M. et al. Multi-core scalability-impacting compiler optimizations. Comput Sci Res Dev 25, 15–24 (2010). https://doi.org/10.1007/s00450-010-0113-5