Abstract
High performance embedded architectures will in some cases combine simple caches and multithreading, two techniques that increase energy efficiency and performance at the same time. However, that combination can produce high and unpredictable cache miss rates, even when the compiler optimizes the data layout of each program for the cache.
This paper examines data-cache aware compilation for multithreaded architectures. Data-cache aware compilation finds a layout for data objects which minimizes inter-object conflict misses. This research extends and adapts prior cache-conscious data layout optimizations to the much more difficult environment of multithreaded architectures. Solutions are presented for two computing scenarios: (1) the more general case where any application can be scheduled along with other applications, and (2) the case where the co-scheduled working set is more precisely known.
It is shown that these techniques reduce data cache misses for a variety of cache architectures, multithreading environments, and cache latencies.
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Calder, B., Krintz, C., John, S., Austin, T.: Cache-conscious data placement. In: Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (1998)
Tullsen, D.M., Eggers, S., Levy, H.M.: Simultaneous multithreading: Maximizing on-chip parallelism. In: Proceedings of the 22nd Annual International Symposium on Computer Architecture (1995)
Tullsen, D.M., Eggers, S.J., Emer, J.S., Levy, H.M., Lo, J.L., Stamm, R.L.: Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. In: Proceedings of the 23rd Annual International Symposium on Computer Architecture (1996)
Li, Y., Brooks, D., Hu, Z., Skadron, K., Bose, P.: Understanding the energy efficiency of simultaneous multithreading. In: Intl Symposium on Low Power Electronics and Design (2004)
Seng, J., Tullsen, D., Cai, G.: Power-sensitive multithreaded architecture. In: International Conference on Computer Design (September 2000)
Kumar, R., Jouppi, N., Tullsen, D.M.: Conjoined-core chip multiprocessing. In: 37th International Symposium on Microarchitecture (December 2004)
Dolbeau, R., Seznec, A.: Cash: Revisiting hardware sharing in single-chip parallel processor. In: IRISA Report 1491 (November 2002)
Agarwal, A., Pudar, S.: Column-associative caches: A technique for reducing the miss rate of direct-mapped caches. In: International Symposium on Computer Architecture (1993)
Topham, N., González, A.: Randomized cache placement for eliminating conflicts. IEEE Transactions on Computer 48(2) (1999)
Seznec, A., Bodin, F.: Skewed-associative caches. In: International Conference on Parallel Architectures and Languages, pp. 305–316 (1993)
Lynch, W.L., Bray, B.K., Flynn, M.J.: The effect of page allocation on caches. In: 25th Annual International Symposium on Microarchitecture (1992)
Rivera, G., Tseng, C.W.: Data transformations for eliminating conflict misses. In: SIGPLAN Conference on Programming Language Design and Implementation, pp. 38–49 (1998)
Mueller, F.: Compiler support for software-based cache partitioning. In: Workshop on Languages, Compilers and Tools for Real-Time Systems, pp. 125–133 (1995)
Juan, T., Royo, D.: Dynamic cache splitting. In: XV International Confernce of the Chilean Computational Society (1995)
Bershad, B.N., Lee, D., Romer, T.H., Chen, J.B.: Avoiding conflict misses dynamically in large direct-mapped caches. In: Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA, October 5–7, pp. 158–170 (1994)
Sherwood, T., Calder, B., Emer, J.S.: Reducing cache misses using hardware and software page placement. In: International Conference on Supercomputing, pp. 155–164 (1999)
Nemirovsky, M., Yamamoto, W.: Quantitative study on data caches on a multistreamed architecture. In: Workshop on Multithreaded Execution, Architecture and Compilation (1998)
Hily, S., Seznec, A.: Standard memory hierarchy does not fit simultaneous multithreading. In: Proceedings of the Workshop on Multithreaded Execution Architecture and Compilation, with HPCA-4 (1998)
Jos, M.G.: Data caches for multithreaded processors. In: Workshop on Multithreaded Execution, Architecture and Compilation (2000)
May, D., Irwin, J., Muller, H.L., Page, D.: Effective caching for multithreaded processors. In: Communicating Process Architectures, pp. 145–154. IOS Press, Amsterdam (2000)
Nikolopoulos, D.S.: Code and data transformations for improving shared cache performance on SMT processors. In: International Symposium on High Performance Computing, pp. 54–69 (2003)
Lopez, S., Dropsho, S., Albonesi, D.H., Garnica, O., Lanchares, J.: Dynamic capacity-speed tradeoffs in smt processor caches. In: Intl Conference on High Performance Embedded Architectures & Compilers (January 2007)
Kumar, R., Tullsen, D.M.: Compiling for instruction cache performance on a multithreaded architecture. In: 35th Annual International Symposium on Microarchitecture (2002)
Sarkar, S., Tullsen, D.M.: Compiler techniques for reducing data cache miss rate on a multithreaded architecture. In: Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (2008)
Tullsen, D.M.: Simulation and modeling of a simultaneous multithreading processor. In: 22nd Annual Computer Measurement Group Conference (December 1996)
Tullsen, D.M., Brown, J.: Handling long-latency loads in a simultaneous multithreaded processor. In: 34th International Symposium on Microarchitecture (December 2001)
Srivastava, A., Eustace, A.: Atom: a system for building customized program analysis tools. SIGPLAN Notices 39, 528–539 (2004)
Grunwald, D., Zorn, B.G., Henderson, R.: Improving the cache locality of memory allocation. In: SIGPLAN Conference on Programming Language Design and Implementation (1993)
Robson, J.M.: Worst case fragmentation of first fit and best fit storage allocation strategies. The Computer Journal 20(3) (1977)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sarkar, S., Tullsen, D.M. (2011). Data Layout for Cache Performance on a Multithreaded Architecture. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers III. Lecture Notes in Computer Science, vol 6590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19448-1_3
Download citation
DOI: https://doi.org/10.1007/978-3-642-19448-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19447-4
Online ISBN: 978-3-642-19448-1
eBook Packages: Computer ScienceComputer Science (R0)