Performance Advantage of Reconfigurable Cache Design on Multicore Processor Systems

International Journal of Parallel Programming

Abstract

As microprocessor design trends toward multicore, cache performance becomes more important because off-chip accesses grow increasingly expensive due to competition among the processor cores. A question arises: how should the cache architecture be designed to prevent a performance bottleneck caused by data accesses? This work studies a reconfigurable cache architecture that can be dynamically configured to meet the individual demands of running applications. Using a self-developed cache simulator, we first examined how different cache organizations and configurations influence the parallel execution of OpenMP applications. The experimental results show that applications benefit from a flexible, reconfigurable cache. This motivated us to go a step further and develop a hardware prototype of this novel architecture.
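To illustrate why the best cache configuration can depend on the application, here is a minimal sketch of a trace-driven cache simulation. It is not the authors' simulator: the class `SetAssociativeCache`, the helper `run_trace`, the LRU replacement policy, the cache sizes, and the synthetic address trace are all assumptions chosen for illustration. The trace alternates between two hypothetical arrays whose lines collide in a direct-mapped cache but coexist once the same capacity is organized as two ways.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU set-associative cache that only counts hits and misses."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def run_trace(trace, size_bytes, line_bytes, ways):
    """Replay an address trace on a fresh cache and return its miss rate."""
    cache = SetAssociativeCache(size_bytes, line_bytes, ways)
    for address in trace:
        cache.access(address)
    return cache.miss_rate()


# Hypothetical trace: two arrays placed 4 KiB apart, read alternately
# one cache line at a time, for four passes.
trace = []
for _ in range(4):
    for offset in range(0, 1024, 64):
        trace.append(offset)          # array A
        trace.append(offset + 4096)   # array B

# Same capacity and line size, different associativity.
direct_mapped = run_trace(trace, size_bytes=4096, line_bytes=64, ways=1)
two_way = run_trace(trace, size_bytes=4096, line_bytes=64, ways=2)
print(f"direct-mapped miss rate: {direct_mapped:.2f}")
print(f"two-way miss rate:       {two_way:.2f}")
```

With this particular trace, the direct-mapped organization misses on every access because the two arrays keep evicting each other, while the two-way organization misses only during the first pass. A different access pattern could favor a different organization, which is the kind of per-application variation that motivates a configurable cache.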


Author information

Corresponding author

Correspondence to Jie Tao.

Additional information

This work was conducted while Dr. Tao worked at the Institut für Technische Informatik, Universität Karlsruhe.

About this article

Cite this article

Tao, J., Kunze, M., Nowak, F. et al. Performance Advantage of Reconfigurable Cache Design on Multicore Processor Systems. Int J Parallel Prog 36, 347–360 (2008). https://doi.org/10.1007/s10766-008-0075-4

  • DOI: https://doi.org/10.1007/s10766-008-0075-4
