Array Regrouping on CMP with Non-uniform Cache Sharing

Jiang, Yunlian; Zhang, Eddy Z.; Shen, Xipeng; Gao, Yaoqing; Archambault, Roch

doi:10.1007/978-3-642-19595-2_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6548)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Array regrouping enhances program spatial locality by interleaving the elements of multiple arrays that tend to be accessed close together in time. Its effectiveness has been systematically studied for sequential programs running on unicore processors, but not for multithreaded programs on modern Chip Multiprocessor (CMP) machines.
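As a rough illustration of the transformation described above (not the authors' implementation), the following C sketch interleaves two arrays that are always read together. The names `PairAB`, `regroup`, `dot_separate`, and `dot_regrouped` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Regrouped layout: a[i] and b[i] are stored next to each other,
 * so they usually fall in the same cache line. */
typedef struct {
    double a;
    double b;
} PairAB;

/* Interleave the elements of a[] and b[] into out[] (capacity >= n). */
void regroup(const double *a, const double *b, PairAB *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i].a = a[i];
        out[i].b = b[i];
    }
}

/* Original layout: the loop walks two separate memory streams. */
double dot_separate(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Regrouped layout: the same computation walks a single stream. */
double dot_regrouped(const PairAB *ab, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += ab[i].a * ab[i].b;
    }
    return sum;
}
```

Both functions compute the same result; only the memory layout differs. Whether the regrouped form is actually faster depends on the access pattern and the cache hierarchy, which is the kind of question an affinity analysis is meant to answer.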

On one hand, the processor-level parallelism on CMP intensifies memory bandwidth pressure, suggesting potential benefits of array regrouping for CMP computing. On the other hand, CMP architectures exhibit extra complexities, especially the hierarchical, heterogeneous cache sharing among hyperthreads, cores, and processors, that pose new challenges for array regrouping.

In this work, we begin to explore these new opportunities and challenges. We propose cache-sharing-aware reference affinity analysis for identifying data affinity in multithreaded applications. The analysis consists of affinity-guided thread scheduling and hierarchical reference-vector merging, handles cache sharing among both hyperthreads and cores, and offers hints for array regrouping and for avoiding false sharing. Preliminary experiments demonstrate the potential of the techniques to improve the locality of multithreaded applications on CMP while avoiding various pitfalls.
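To make the false-sharing concern concrete, here is a minimal C sketch of the general problem, not of the paper's analysis. It assumes a 64-byte cache line (`CACHE_LINE` is an assumption; the real value depends on the hardware) and an array that starts on a line boundary. The type names and the helper `lines_spanned` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed line size in bytes; check the target CPU */

/* Packed layout: counters updated by different threads sit in the
 * same cache line, so a write by one thread invalidates the others. */
typedef struct {
    long count;
} PackedCounter;

/* Padded layout: each counter fills a full line's worth of bytes, so
 * with a line-aligned array no two counters share a cache line. */
typedef struct {
    long count;
    char pad[CACHE_LINE - sizeof(long)];
} PaddedCounter;

/* Number of cache lines covered by n consecutive elements of the
 * given size, assuming the array begins on a line boundary. */
size_t lines_spanned(size_t elem_size, size_t n) {
    return (elem_size * n + CACHE_LINE - 1) / CACHE_LINE;
}
```

With four per-thread counters, the packed array fits in one line while the padded array spreads over four. Padding trades memory for isolation, so it is only worthwhile for data that different threads write concurrently.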

Author information

Authors and Affiliations

Computer Science Department, The College of William and Mary, Williamsburg, VA, USA
Yunlian Jiang, Eddy Z. Zhang & Xipeng Shen
IBM Toronto Software Lab, Toronto, Canada
Yaoqing Gao & Roch Archambault

Editor information

Editors and Affiliations

Department of Computer Science, Rice University, 6100 Main Street, 77005-1892, Houston, TX, USA
Keith Cooper, John Mellor-Crummey & Vivek Sarkar

About this paper

Cite this paper

Jiang, Y., Zhang, E.Z., Shen, X., Gao, Y., Archambault, R. (2011). Array Regrouping on CMP with Non-uniform Cache Sharing. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_7

DOI: https://doi.org/10.1007/978-3-642-19595-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19594-5
Online ISBN: 978-3-642-19595-2
eBook Packages: Computer Science, Computer Science (R0)