Abstract
As microprocessor design trends toward multicore, cache performance becomes more important, because off-chip accesses grow increasingly expensive as the processor cores compete for them. This raises a question: how should the cache architecture be designed so that data accesses do not become a performance bottleneck? This work studies a reconfigurable cache architecture that can be dynamically configured to meet the individual demands of running applications. Using a self-developed cache simulator, we first examined how different cache organizations and configurations influence the parallel execution of OpenMP applications. The experimental results show that applications benefit from a flexible, reconfigurable cache. This motivated us to go a step further and develop a hardware prototype of this novel architecture.
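To make concrete how a cache's organization can change an application's miss behavior, the following is a minimal sketch of a set-associative cache model with LRU replacement. It is not the simulator used in this work. The class `SetAssociativeCache`, the helpers `miss_rate` and `conflict_trace`, and all sizes are hypothetical choices made only for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set: keys are tags, order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access; return True on a hit, False on a miss."""
        block = address // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False


def miss_rate(trace, size_bytes, line_bytes, ways):
    """Replay an address trace on a fresh cache and return its miss rate."""
    cache = SetAssociativeCache(size_bytes, line_bytes, ways)
    for address in trace:
        cache.access(address)
    return cache.misses / len(trace)


def conflict_trace(offset_bytes=4096, elements=64, repeats=4):
    """Hypothetical trace: alternate between a[i] and b[i], where the two
    arrays of 4-byte elements lie offset_bytes apart in memory."""
    trace = []
    for _ in range(repeats):
        for i in range(elements):
            trace.append(i * 4)                 # a[i]
            trace.append(offset_bytes + i * 4)  # b[i]
    return trace


if __name__ == "__main__":
    trace = conflict_trace()
    for ways in (1, 2):
        rate = miss_rate(trace, size_bytes=4096, line_bytes=32, ways=ways)
        print(f"{ways}-way, 4 KiB, 32 B lines: miss rate = {rate:.4f}")
```

With this synthetic trace and a 4 KiB cache, the direct-mapped configuration misses on every access because the two arrays evict each other, whereas the 2-way configuration of the same capacity misses only on the first touch of each line. An access pattern of this kind is one case where a cache that can change its associativity at run time would be expected to help.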
Additional information
This work was conducted while Dr. Tao was working at the Institut für Technische Informatik, Universität Karlsruhe.
About this article
Cite this article
Tao, J., Kunze, M., Nowak, F. et al. Performance Advantage of Reconfigurable Cache Design on Multicore Processor Systems. Int J Parallel Prog 36, 347–360 (2008). https://doi.org/10.1007/s10766-008-0075-4
DOI: https://doi.org/10.1007/s10766-008-0075-4