
A Data Cache with Dynamic Mapping

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2958))

Abstract

Dynamic Mapping is an approach to coping with the performance loss caused by cache interference and to improving the performance predictability of blocked algorithms on modern architectures. Matrix multiply is an example: tiling matrix multiply for a 16 KB data cache with the optimal tile size achieves an average data-cache miss rate of 3%, but with peaks of 16% due to interference.
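To make the tiling scenario concrete, the following sketch (not code from the paper) blocks a matrix multiply so that three tiles fit in a 16 KB data cache. The matrix size N and tile size T are illustrative assumptions, not the paper's experimental parameters.

```c
#include <assert.h>

/* Blocked (tiled) matrix multiply: C += A * B for N x N doubles.
 * Illustrative sizes: a 16 KB data cache holds 2048 doubles, and with
 * T = 24 three T x T tiles occupy 3 * 24 * 24 * 8 = 13824 bytes.
 * Even so, on a direct-mapped cache, rows of different tiles can map to
 * the same cache sets and evict each other -- the interference misses
 * the paper targets. */
enum { N = 48, T = 24 };

static void matmul_naive(const double A[N][N], const double B[N][N],
                         double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];
}

static void matmul_tiled(const double A[N][N], const double B[N][N],
                         double C[N][N]) {
    /* Iterate over T x T tiles; the inner three loops work on one
     * tile of A, B, and C at a time, reusing cached data. */
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++) {
                        double a = A[i][k];
                        for (int j = jj; j < jj + T; j++)
                            C[i][j] += a * B[k][j];
                    }
}
```

For each output element the tiled version accumulates the k terms in the same order as the naive version, so both produce identical results; only the memory access pattern changes.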

Dynamic Mapping is a combined software-hardware approach in which the mapping into the cache is determined at compile time by manipulating the address used by the data cache. The resulting reduction in cache misses eliminates the data-cache miss spikes and translates into a 2-fold speed-up for matrix multiply and FFT.

Dynamic Mapping shares its goal with other proposed approaches, but it determines the cache mapping before issuing a load: it uses the computational power of the processor, rather than the memory controller or the data-cache mapping hardware, and it does not affect the access time of memory or cache. It combines several concepts, such as non-standard cache mapping functions and data-layout reorganization, potentially without any overhead.
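The non-standard cache mapping functions mentioned above can be illustrated with an XOR-based placement function, a known technique for spreading power-of-two-strided addresses across different cache sets. The sketch below only models such a mapping in software; the cache parameters (16 KB direct-mapped, 32-byte lines, hence 512 sets) are assumptions for illustration, not the paper's configuration or its exact scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache geometry: 16 KB direct-mapped, 32-byte lines ->
 * 512 sets, 5 offset bits, 9 index bits. */
enum { LINE_BITS = 5, INDEX_BITS = 9 };

/* Conventional mapping: the set is the index bits of the address. */
static uint32_t cache_set_standard(uint32_t addr) {
    return (addr >> LINE_BITS) & ((1u << INDEX_BITS) - 1);
}

/* XOR-based mapping: fold the low tag bits into the index, so that
 * addresses a cache-size stride apart land in different sets.  A
 * compiler could emit this kind of address arithmetic before a load. */
static uint32_t cache_set_xor(uint32_t addr) {
    uint32_t index = (addr >> LINE_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag   = addr >> (LINE_BITS + INDEX_BITS);
    return index ^ (tag & ((1u << INDEX_BITS) - 1));
}
```

For example, two addresses 16 KB apart (such as rows of a large power-of-two-sized matrix) collide in the same set under the standard mapping but occupy distinct sets under the XOR mapping, removing the conflict misses behind the 16% spikes.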

This work is supported in part by NSF, Contract Number ACI 0204028.






Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

D’Alberto, P., Nicolau, A., Veidenbaum, A. (2004). A Data Cache with Dynamic Mapping. In: Rauchwerger, L. (ed.) Languages and Compilers for Parallel Computing. LCPC 2003. Lecture Notes in Computer Science, vol 2958. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24644-2_28


  • Print ISBN: 978-3-540-21199-0

  • Online ISBN: 978-3-540-24644-2
