Multi-core scalability-impacting compiler optimizations

  • Special Issue Paper
Computer Science - Research and Development

Abstract

In a multi-core system, the pipelines and local caches are replicated in each core, whereas other resources, such as the shared cache and the memory bus, are shared across all cores in a processor. Running multiple copies of memory-intensive applications on different cores often scales poorly because these shared resources can become throughput bottlenecks. Such issues have traditionally been studied in the design and evaluation of processors, platforms, and operating systems. We have identified a set of compiler optimizations that measurably affect the scaling of applications on multi-core systems and evaluated them using the standard rate run of the SPEC CPU2006 benchmark suite, in which throughput is measured by running multiple copies of a program on a multi-core, multi-processor system. We have also collected data and analyzed how these compiler optimizations affect the utilization and behavior of the shared resources. Our experimental results show that conventional compiler optimizations can play an important role in improving the scaling of memory-intensive application threads on multi-core systems.
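To make the notion of scaling in a rate-style run concrete, the following is a minimal sketch of how throughput with several concurrent copies might be compared against ideal linear scaling. It is not the paper's methodology or the official SPEC metric; the function names `throughput` and `scaling_efficiency` and the timings are hypothetical and chosen only for illustration.

```python
def throughput(copies: int, elapsed_seconds: float) -> float:
    """Completed program runs per second when `copies` run concurrently."""
    if copies <= 0 or elapsed_seconds <= 0:
        raise ValueError("copies and elapsed_seconds must be positive")
    return copies / elapsed_seconds


def scaling_efficiency(single_copy_seconds: float, copies: int,
                       elapsed_seconds: float) -> float:
    """Measured throughput relative to ideal linear scaling (1.0 = perfect)."""
    # Ideal case: every copy runs as fast as a single copy running alone.
    ideal = copies * throughput(1, single_copy_seconds)
    return throughput(copies, elapsed_seconds) / ideal


# Hypothetical timings, for illustration only (not measurements from the paper).
single = 100.0  # seconds for one copy running alone
loaded = 160.0  # seconds for each of 8 copies running concurrently
efficiency = scaling_efficiency(single, 8, loaded)
print(f"8 copies reach {efficiency:.1%} of ideal throughput")
```

Under this simplified model, an efficiency well below 1.0 suggests contention for shared resources such as the shared cache or memory bus, which is the kind of effect the abstract says compiler optimizations can influence.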

Author information

Corresponding author

Correspondence to Dinesh C. Suresh.

About this article

Cite this article

Suresh, D.C., Ju, R., Lai, M. et al. Multi-core scalability-impacting compiler optimizations. Comput Sci Res Dev 25, 15–24 (2010). https://doi.org/10.1007/s00450-010-0113-5

  • DOI: https://doi.org/10.1007/s00450-010-0113-5
