Abstract
Array regrouping enhances a program's spatial locality by interleaving the elements of multiple arrays that tend to be accessed close together. Its effectiveness has been systematically studied for sequential programs running on unicore processors, but not for multithreaded programs on modern Chip Multiprocessor (CMP) machines.
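To make the idea of interleaving concrete, the following is a minimal sketch that counts how many cache lines a loop touches when it reads A[i] and B[i] together, first with the two arrays stored back to back and then with their elements interleaved. The 64-byte line size, the 8-byte element size, the stride of 16, and all function names are assumptions chosen for illustration; none of them come from the paper.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)
ELEM_SIZE = 8   # bytes per array element (assumed for illustration)


def lines_touched(addresses, line_size=LINE_SIZE):
    """Number of distinct cache lines covering the given byte addresses."""
    return len({addr // line_size for addr in addresses})


def separate_layout(i, n, num_arrays=2):
    """Byte addresses of element i in each array when arrays are stored back to back."""
    return [k * n * ELEM_SIZE + i * ELEM_SIZE for k in range(num_arrays)]


def regrouped_layout(i, n, num_arrays=2):
    """Byte addresses of element i in each array after interleaving:
    A[0], B[0], A[1], B[1], ..."""
    return [(i * num_arrays + k) * ELEM_SIZE for k in range(num_arrays)]


def lines_for_loop(layout, n, stride):
    """Cache lines touched by a loop that reads A[i] and B[i] for every stride-th i."""
    addresses = []
    for i in range(0, n, stride):
        addresses.extend(layout(i, n))
    return lines_touched(addresses)


n = 1024
separate = lines_for_loop(separate_layout, n, stride=16)
regrouped = lines_for_loop(regrouped_layout, n, stride=16)
print(separate, regrouped)
```

Under these assumptions the interleaved layout places A[i] and B[i] on the same cache line, so the strided loop touches half as many lines. The benefit depends on the two arrays really being accessed together, which is what an affinity analysis is meant to establish.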
On one hand, the processor-level parallelism on CMP intensifies memory bandwidth pressure, suggesting potential benefits of array regrouping for CMP computing. On the other hand, CMP architectures exhibit extra complexities, especially the hierarchical, heterogeneous cache sharing among hyperthreads, cores, and processors, that pose new challenges for array regrouping.
In this work, we begin to explore these new opportunities and challenges. We propose cache-sharing-aware reference affinity analysis for identifying data affinity in multithreaded applications. The analysis consists of affinity-guided thread scheduling and hierarchical reference-vector merging, handles cache sharing among both hyperthreads and cores, and offers hints for array regrouping and for avoiding false sharing. Preliminary experiments demonstrate the potential of the techniques to improve the locality of multithreaded applications on CMP while avoiding various pitfalls.
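The false-sharing concern can be illustrated with a small sketch. It is not the paper's analysis; it only shows why regrouping needs to know which thread writes which array. It assumes two threads on cores with private caches, thread 0 writing every element of A and thread 1 writing every element of B, with a 64-byte line and 8-byte elements. All names and sizes are hypothetical.

```python
from collections import defaultdict

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)
ELEM_SIZE = 8   # bytes per array element (assumed for illustration)


def falsely_shared_lines(writes, line_size=LINE_SIZE):
    """Return the cache lines on which two different threads write
    two different addresses. `writes` is an iterable of (thread_id, byte_address)."""
    by_line = defaultdict(set)
    for thread_id, addr in writes:
        by_line[addr // line_size].add((thread_id, addr))
    shared = set()
    for line, entries in by_line.items():
        for t1, a1 in entries:
            if any(t2 != t1 and a2 != a1 for t2, a2 in entries):
                shared.add(line)
                break
    return shared


n = 16

# Arrays stored back to back: A occupies the first n elements, B the next n.
separate_writes = [(0, i * ELEM_SIZE) for i in range(n)]
separate_writes += [(1, (n + i) * ELEM_SIZE) for i in range(n)]

# Arrays interleaved: A[0], B[0], A[1], B[1], ...
regrouped_writes = [(0, (2 * i) * ELEM_SIZE) for i in range(n)]
regrouped_writes += [(1, (2 * i + 1) * ELEM_SIZE) for i in range(n)]

print(len(falsely_shared_lines(separate_writes)))
print(len(falsely_shared_lines(regrouped_writes)))
```

In this setup the back-to-back layout keeps each thread's writes on its own lines, while the interleaved layout makes every line written by both threads. If the two threads were hyperthreads sharing one cache, the same interleaving would not cause coherence traffic, which is one reason the sharing hierarchy matters for the regrouping decision.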
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, Y., Zhang, E.Z., Shen, X., Gao, Y., Archambault, R. (2011). Array Regrouping on CMP with Non-uniform Cache Sharing. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_7
DOI: https://doi.org/10.1007/978-3-642-19595-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19594-5
Online ISBN: 978-3-642-19595-2
eBook Packages: Computer Science, Computer Science (R0)