Array Regrouping on CMP with Non-uniform Cache Sharing

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6548)

Abstract

Array regrouping improves a program's spatial locality by interleaving the elements of multiple arrays that tend to be accessed close together in time. Its effectiveness has been studied systematically for sequential programs running on single-core processors, but not for multithreaded programs on modern Chip Multiprocessor (CMP) machines.
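
To make the transformation concrete, the following is a minimal sketch of regrouping two arrays and of why it helps spatial locality. It is an illustration under stated assumptions, not the analysis or implementation from the paper: the 64-byte cache line, the 8-byte element size, and the function names `regroup`, `lines_touched_separate`, and `lines_touched_regrouped` are all hypothetical.

```python
# Assumed machine parameters (hypothetical, chosen only for illustration).
LINE_BYTES = 64   # cache line size in bytes
ELEM_BYTES = 8    # size of one array element in bytes


def regroup(a, b):
    """Interleave two equal-length arrays: [a0, b0, a1, b1, ...]."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    merged = [None] * (2 * len(a))
    merged[0::2] = a   # even slots hold elements of a
    merged[1::2] = b   # odd slots hold elements of b
    return merged


def lines_touched_separate(i, n, base=0):
    """Cache lines touched by reading A[i] and B[i] when A and B are
    separate n-element arrays stored back to back starting at `base`."""
    addr_a = base + i * ELEM_BYTES
    addr_b = base + (n + i) * ELEM_BYTES
    return {addr_a // LINE_BYTES, addr_b // LINE_BYTES}


def lines_touched_regrouped(i, base=0):
    """Cache lines touched by reading A[i] and B[i] after regrouping,
    where A[i] sits at slot 2*i and B[i] at slot 2*i + 1."""
    addr_a = base + (2 * i) * ELEM_BYTES
    addr_b = base + (2 * i + 1) * ELEM_BYTES
    return {addr_a // LINE_BYTES, addr_b // LINE_BYTES}
```

Under these assumptions, reading `A[i]` and `B[i]` from two separate 1024-element arrays touches two cache lines, whereas the regrouped layout serves both reads from a single line.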

On one hand, the processor-level parallelism of CMP intensifies memory bandwidth pressure, which suggests that array regrouping could benefit CMP computing. On the other hand, CMP architectures introduce extra complexities, especially the hierarchical, heterogeneous cache sharing among hyperthreads, cores, and processors, that pose new challenges for array regrouping.

In this work, we begin to explore these new opportunities and challenges. We propose cache-sharing-aware reference affinity analysis for identifying data affinity in multithreaded applications. The analysis consists of affinity-guided thread scheduling and hierarchical reference-vector merging; it handles cache sharing among both hyperthreads and cores, and offers hints for array regrouping and for avoiding false sharing. Preliminary experiments demonstrate the potential of the techniques to improve the locality of multithreaded applications on CMP while avoiding various pitfalls.
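
One pitfall mentioned above, false sharing, can be illustrated with a small sketch. This is not the paper's analysis; it only shows, under assumed parameters, how interleaving arrays that are written by different threads can place their elements on the same cache line. The 64-byte line, the 8-byte elements, the function name `falsely_shared_lines`, and the assumption that threads write disjoint addresses are all hypothetical.

```python
from collections import defaultdict

# Assumed machine parameters (hypothetical, chosen only for illustration).
LINE_BYTES = 64   # cache line size in bytes
ELEM_BYTES = 8    # size of one array element in bytes


def falsely_shared_lines(writes):
    """Return the cache lines written by more than one thread.

    `writes` maps a thread id to the byte addresses that thread writes.
    Assumption: threads write disjoint addresses, so any line with two
    or more writers is a false-sharing candidate.
    """
    writers = defaultdict(set)
    for tid, addrs in writes.items():
        for addr in addrs:
            writers[addr // LINE_BYTES].add(tid)
    return {line for line, tids in writers.items() if len(tids) > 1}


n = 1024  # elements per array

# Separate layout: A at offset 0, B right after it.
separate = {
    0: [i * ELEM_BYTES for i in range(4)],          # thread 0 writes A[0..3]
    1: [(n + i) * ELEM_BYTES for i in range(4)],    # thread 1 writes B[0..3]
}

# Regrouped layout: A[i] at slot 2*i, B[i] at slot 2*i + 1.
regrouped = {
    0: [(2 * i) * ELEM_BYTES for i in range(4)],      # thread 0 writes A[0..3]
    1: [(2 * i + 1) * ELEM_BYTES for i in range(4)],  # thread 1 writes B[0..3]
}
```

With these inputs, `falsely_shared_lines(separate)` is empty, while `falsely_shared_lines(regrouped)` reports line 0: regrouping arrays that different threads write can introduce false sharing, which is why the regrouping hints need to take thread behavior into account.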

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, Y., Zhang, E.Z., Shen, X., Gao, Y., Archambault, R. (2011). Array Regrouping on CMP with Non-uniform Cache Sharing. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_7

  • DOI: https://doi.org/10.1007/978-3-642-19595-2_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19594-5

  • Online ISBN: 978-3-642-19595-2

  • eBook Packages: Computer Science, Computer Science (R0)
