A Data Cache with Dynamic Mapping

D’Alberto, Paolo; Nicolau, Alexandru; Veidenbaum, Alexander

doi:10.1007/978-3-540-24644-2_28

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2958)

Abstract

Dynamic Mapping is an approach to mitigating the performance loss caused by cache interference and to improving the performance predictability of blocked algorithms on modern architectures. Matrix multiply is an example: tiling matrix multiply for a 16 KB data cache with the optimal tile size achieves an average data-cache miss rate of 3%, but with peaks of 16% due to interference.
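As a reminder of what tiling means here, the following sketch (Python, chosen only for illustration) computes C = A·B tile by tile. The helper `max_tile` is a hypothetical name that applies a common capacity heuristic (three T × T tiles of 8-byte elements must fit in the cache). It is an assumption, not the paper's definition of the optimal tile size, and it deliberately ignores interference between tiles, which is why miss-rate peaks can still occur with a "well-sized" tile.

```python
from math import isqrt


def max_tile(cache_bytes: int, elem_bytes: int) -> int:
    """Largest T such that three T x T tiles (of A, B and C) fit in the cache.
    Hypothetical capacity heuristic: it ignores conflicts between tiles."""
    return isqrt(cache_bytes // (3 * elem_bytes))


def matmul_tiled(a, b, tile):
    """Return C = A * B for square n x n matrices (lists of lists),
    computing the product tile by tile so each tile's working set is small."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Multiply the (ii, kk) tile of A by the (kk, jj) tile of B
                # and accumulate into the (ii, jj) tile of C.
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    row_a = a[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


print(max_tile(16 * 1024, 8))  # 26
```

The edge tiles are handled with `min(...)`, so the tile size does not need to divide the matrix dimension.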

Dynamic Mapping is a combined software-hardware approach in which the cache mapping is determined at compile time by manipulating the address used by the data cache. Because it eliminates data-cache miss spikes, the resulting reduction in cache misses translates into a 2-fold speed-up for matrix multiply and FFT.
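The effect of interference, and the general idea of choosing a mapping by changing the address the cache indexes with, can be sketched with a toy direct-mapped cache model. Everything here is an assumption for illustration: the 16 KB capacity matches the abstract, but the 32-byte line size, the array layout, and the `set_offset` parameter (a hypothetical per-array displacement that a compiler might choose) are not taken from the paper and do not describe its hardware.

```python
CACHE_BYTES = 16 * 1024   # 16 KB data cache, as in the abstract
LINE_BYTES = 32           # assumed line size
NUM_SETS = CACHE_BYTES // LINE_BYTES  # direct-mapped: one line per set


class DirectMappedCache:
    """Direct-mapped cache that stores the full line address as the tag,
    so any set-mapping function still identifies lines correctly."""

    def __init__(self) -> None:
        self.lines = [None] * NUM_SETS
        self.accesses = 0
        self.misses = 0

    def access(self, addr: int, set_offset: int = 0) -> None:
        # set_offset: hypothetical compile-time displacement (in lines)
        # applied only to the index computation; the line address that
        # identifies the data is left unchanged.
        line = addr // LINE_BYTES
        s = (line + set_offset) % NUM_SETS
        self.accesses += 1
        if self.lines[s] != line:
            self.misses += 1
            self.lines[s] = line


def sweep_miss_rate(base_a, base_b, offset_b=0, n=1024, elem=8, sweeps=2):
    """Alternately read A[i] and B[i] (8 KB arrays); return the miss rate."""
    cache = DirectMappedCache()
    for _ in range(sweeps):
        for i in range(n):
            cache.access(base_a + i * elem)
            cache.access(base_b + i * elem, offset_b)
    return cache.misses / cache.accesses


# B starts exactly one cache size after A: with the conventional mapping,
# A[i] and B[i] land in the same set and evict each other on every access.
conflicting = sweep_miss_rate(0, CACHE_BYTES)
# Shifting B's index by half the sets places A and B in disjoint halves.
remapped = sweep_miss_rate(0, CACHE_BYTES, offset_b=NUM_SETS // 2)
print(conflicting, remapped)  # 1.0 0.125
```

Both arrays together fit exactly in the cache, so the only difference between the two runs is interference: after remapping, only the first touch of each line misses.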

Dynamic Mapping shares its goal with other proposed approaches, but it determines the cache mapping before a load is issued. It relies on the computational power of the processor, rather than on the memory controller or on the data-cache mapping, and it does not affect the access time of memory or cache. It combines several concepts, such as non-standard cache mapping functions and data layout reorganization, potentially without any overhead.
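The two concepts named above, non-standard cache mapping functions and data layout reorganization, can be illustrated with two standard techniques: XOR-based placement and array padding. Neither is claimed to be the paper's mechanism. The cache geometry (512 sets of 32-byte lines, i.e. a 16 KB direct-mapped cache) and the 512 × 512 row-major matrix of doubles are assumptions; the sketch counts how many distinct sets a walk down one column touches.

```python
LINE_BYTES = 32                          # assumed line size
NUM_SETS = 512                           # assumed: 16 KB / 32-byte lines
INDEX_BITS = NUM_SETS.bit_length() - 1   # 9 index bits


def modulo_index(addr: int) -> int:
    """Conventional mapping: low-order bits of the line address."""
    return (addr // LINE_BYTES) % NUM_SETS


def xor_index(addr: int) -> int:
    """XOR-based placement: fold the next INDEX_BITS bits of the line
    address into the low-order index bits."""
    line = addr // LINE_BYTES
    low = line & (NUM_SETS - 1)
    high = (line >> INDEX_BITS) & (NUM_SETS - 1)
    return low ^ high


def column_addresses(leading_dim, rows=32, elem=8):
    """Byte addresses of element [r][0] for r in range(rows), row-major."""
    return [r * leading_dim * elem for r in range(rows)]


unpadded = column_addresses(512)   # row stride 4096 bytes = 128 lines
padded = column_addresses(516)     # each row padded by one 32-byte line

print(len({modulo_index(a) for a in unpadded}))  # 4: heavy conflicts
print(len({xor_index(a) for a in unpadded}))     # 32: XOR spreads the column
print(len({modulo_index(a) for a in padded}))    # 32: padding spreads it too
```

Padding changes the data layout and costs extra memory, while the XOR function leaves the layout untouched but requires a different index computation.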

This work is supported in part by NSF, Contract Number ACI 0204028.

Author information

Authors and Affiliations

Department of Computer Science, University of California, Irvine
Paolo D’Alberto, Alexandru Nicolau & Alexander Veidenbaum

Editor information

Editors and Affiliations

Parasol Lab, Dept. of Computer Science, Texas A&M University, USA
Lawrence Rauchwerger

About this paper

Cite this paper

D’Alberto, P., Nicolau, A., Veidenbaum, A. (2004). A Data Cache with Dynamic Mapping. In: Rauchwerger, L. (ed.) Languages and Compilers for Parallel Computing. LCPC 2003. Lecture Notes in Computer Science, vol 2958. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24644-2_28

DOI: https://doi.org/10.1007/978-3-540-24644-2_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21199-0
Online ISBN: 978-3-540-24644-2
eBook Packages: Springer Book Archive
