Abstract
Dynamic Mapping is an approach that copes with the performance loss caused by cache interference and improves the performance predictability of blocked algorithms on modern architectures. Matrix multiply is an example: tiled for a 16 KB data cache with the optimal tile size, it achieves an average data-cache miss rate of 3%, but with peaks of 16% due to interference.
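To make this interference concrete, the Python sketch below counts how many cache sets a square tile of a row-major, double-precision matrix occupies. It is our illustration, not the authors' code or experimental setup. We assume a 16 KB direct-mapped cache with 32-byte lines and 8-byte elements, and the helper names are hypothetical. When the row length is a power of two, the rows of the tile start in only a few sets and evict each other, even though the tile itself would fit in the cache.

```python
# Illustrative cache-interference check (all parameters are assumptions).
CACHE_BYTES = 16 * 1024                  # assumed 16 KB data cache
LINE_BYTES = 32                          # assumed 32-byte lines
NUM_SETS = CACHE_BYTES // LINE_BYTES     # 512 sets when direct-mapped
ELEM_BYTES = 8                           # double-precision elements


def set_index(addr: int) -> int:
    """Set selected in a direct-mapped cache by the given byte address."""
    return (addr // LINE_BYTES) % NUM_SETS


def tile_footprint(n_cols: int, tile: int, base: int = 0) -> tuple[int, int]:
    """Return (lines needed, distinct sets used) for a tile x tile block
    starting at byte address `base` in a row-major matrix whose rows
    hold n_cols doubles."""
    lines, sets = set(), set()
    for i in range(tile):
        for j in range(tile):
            addr = base + (i * n_cols + j) * ELEM_BYTES
            lines.add(addr // LINE_BYTES)   # distinct cache lines touched
            sets.add(set_index(addr))       # distinct sets those lines use
    return len(lines), len(sets)


if __name__ == "__main__":
    for n in (1000, 1024):
        needed, used = tile_footprint(n_cols=n, tile=32)
        print(f"{n} columns: tile needs {needed} lines, maps to {used} sets")
```

With 1024 columns, the 32x32 tile needs 256 lines but maps onto only 16 sets. With 1000 columns, the same tile spreads over 256 sets. This sensitivity to the leading dimension is one source of the kind of miss-rate peaks described above.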
Dynamic Mapping is a software-hardware approach in which the cache mapping is determined at compile time by manipulating the address used by the data cache. By eliminating data-cache miss spikes, the reduction in cache misses translates into a 2-fold speed-up for matrix multiply and FFT.
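The idea of indexing the cache with an address other than the memory address can be sketched as follows. This is a hypothetical model, not the authors' hardware interface. We assume the compiler computes, for each access, a separate "cache address" in which the tile is laid out contiguously, while the data stay at their row-major memory location. Only set selection is modeled; we assume a real design would still tag lines with the memory address so that lookups remain correct. The parameters (16 KB direct-mapped, 32-byte lines, 8-byte elements) and all function names are illustrative.

```python
# Hypothetical model of compiler-directed cache mapping (illustration only).
CACHE_BYTES = 16 * 1024                  # assumed 16 KB data cache
LINE_BYTES = 32                          # assumed 32-byte lines
NUM_SETS = CACHE_BYTES // LINE_BYTES     # 512 sets when direct-mapped
ELEM_BYTES = 8                           # double-precision elements


def set_index(addr: int) -> int:
    """Set selected in a direct-mapped cache by the given byte address."""
    return (addr // LINE_BYTES) % NUM_SETS


def memory_address(i: int, j: int, n_cols: int, base: int = 0) -> int:
    """Row-major memory location of A[i][j]; unchanged by the remapping."""
    return base + (i * n_cols + j) * ELEM_BYTES


def cache_address(i: int, j: int, tile: int, tile_base: int = 0) -> int:
    """Hypothetical compiler-computed address used only to pick the set.

    Element (i, j) of the tile is indexed as if the tile were stored
    contiguously. The tag would still come from the memory address;
    this function models set selection only.
    """
    return tile_base + (i * tile + j) * ELEM_BYTES


def sets_used(addr_of, tile: int) -> int:
    """Number of distinct sets touched by a tile x tile block."""
    return len({set_index(addr_of(i, j))
                for i in range(tile) for j in range(tile)})


if __name__ == "__main__":
    n, t = 1024, 32
    plain = sets_used(lambda i, j: memory_address(i, j, n), t)
    remapped = sets_used(lambda i, j: cache_address(i, j, t), t)
    print(f"indexed by memory address: {plain} sets")
    print(f"indexed by cache address:  {remapped} sets")
```

For a 1024-column matrix and a 32x32 tile, indexing by the memory address uses 16 sets, whereas indexing by the computed cache address uses 256 distinct sets, so the 8 KB tile no longer interferes with itself. In this sketch the effect resembles copying the tile into a contiguous buffer, but no data are moved; the extra work is only the integer arithmetic that computes the cache address.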
Dynamic Mapping shares its goal with other proposed approaches, but it determines the cache mapping before a load is issued. It uses the computational power of the processor, rather than the memory controller or the data-cache mapping, and it has no effect on memory or cache access time. It combines several concepts, such as non-standard cache mapping functions and data layout reorganization, potentially without any overhead.
This work is supported in part by NSF, Contract Number ACI 0204028.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
D’Alberto, P., Nicolau, A., Veidenbaum, A. (2004). A Data Cache with Dynamic Mapping. In: Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2003. Lecture Notes in Computer Science, vol 2958. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24644-2_28
DOI: https://doi.org/10.1007/978-3-540-24644-2_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21199-0
Online ISBN: 978-3-540-24644-2
eBook Packages: Springer Book Archive