Multi-core scalability-impacting compiler optimizations

Suresh, Dinesh C.; Ju, Roy; Lai, Michael; Ye, Mei

doi:10.1007/s00450-010-0113-5

Special Issue Paper
Published: 03 April 2010

Volume 25, pages 15–24 (2010)

Computer Science - Research and Development

Dinesh C. Suresh¹, Roy Ju¹, Michael Lai¹ & Mei Ye¹

Abstract

In a multi-core system, the pipelines and local caches are replicated in each core, while other resources, such as the shared cache and memory bus, are shared across all cores in a processor. Running multiple copies of memory-intensive applications on different cores often leads to poor scaling, because these shared resources can become throughput bottlenecks. Such issues have traditionally been studied in the design and evaluation of processors, platforms, and operating systems. We have identified a set of compiler optimizations that have a measurable impact on the scaling of applications on multi-core systems and evaluated them using the standard rate run of the SPEC CPU2006 benchmark suite, where throughput is measured by running multiple copies of a program on a multi-core, multi-processor system. We have also collected data and analyzed how these compiler optimizations affect the utilization and behavior of the shared resources. Our experimental results show that conventional compiler optimizations can play an important role in improving the scaling of memory-intensive application threads on multi-core systems.
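
As a rough illustration of what "scaling" means for a throughput run, the sketch below compares the throughput of N concurrent copies against N times the throughput of a single copy. The function name `scaling_efficiency`, its parameters, and the timings in the usage note are hypothetical and are not taken from the article; this is also not SPEC's official rate formula, only a simple ratio assumed here for illustration.

```python
def scaling_efficiency(single_copy_seconds: float,
                       n_copies: int,
                       n_copy_seconds: float) -> float:
    """Return throughput scaling efficiency in the range (0, 1] for ideal-or-worse scaling.

    single_copy_seconds: wall time of one copy running alone (hypothetical input).
    n_copies:            number of copies run concurrently, one per core.
    n_copy_seconds:      wall time until all concurrent copies finish.

    A value of 1.0 means N copies finish as fast as one copy (perfect scaling);
    lower values suggest contention for shared resources such as the shared
    cache or memory bus.
    """
    if single_copy_seconds <= 0 or n_copy_seconds <= 0:
        raise ValueError("run times must be positive")
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")

    # Throughput speedup: work completed per unit time, relative to one copy.
    speedup = n_copies * single_copy_seconds / n_copy_seconds
    # Normalize by the ideal speedup, which equals the number of copies.
    return speedup / n_copies
```

For example, with made-up timings of 100 s for one copy and 160 s for eight concurrent copies, `scaling_efficiency(100.0, 8, 160.0)` gives 0.625, meaning the eight-copy run delivers 62.5% of ideal throughput under this assumed definition.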

Author information

Authors and Affiliations

Advanced Micro Devices Inc, One AMD place, Sunnyvale, CA, 94086, USA
Dinesh C. Suresh, Roy Ju, Michael Lai & Mei Ye

Corresponding author

Correspondence to Dinesh C. Suresh.

About this article

Cite this article

Suresh, D.C., Ju, R., Lai, M. et al. Multi-core scalability-impacting compiler optimizations. Comput Sci Res Dev 25, 15–24 (2010). https://doi.org/10.1007/s00450-010-0113-5

Published: 03 April 2010
Issue Date: May 2010
DOI: https://doi.org/10.1007/s00450-010-0113-5
