Abstract
An on-chip multiprocessor can be an alternative to a wide-issue superscalar processor for exploiting the increasing number of transistors on a silicon chip. Cache utilization has a greater performance impact in an on-chip multiprocessor than in a board-level implementation because the penalty for remote (off-chip) communication is higher. We examine two options for making better use of the cache resource: (1) caching private data only in L1 and using L2 only for shared data, and (2) dividing the cache area between an L2 cache and a remote victim cache, as opposed to devoting it all to a single large L2 cache. Execution-driven simulations showed that the first option improved performance by up to 10%. For the second option, four of the six benchmark programs showed that a large L2 cache is more effective than the combination of an L2 cache and a remote victim cache.
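As a rough illustration of option (1), the following Python sketch models one processor's L1 and L2 as small fully associative LRU caches and counts accesses that miss in both levels. It is not the simulator used in the paper: the cache organization, the sizes, the synthetic trace, and the per-access private/shared flag are all assumptions made for illustration, and coherence traffic is ignored.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of `capacity` lines with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and mark the line most recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        """Insert a line, evicting the least recently used one if full."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[addr] = True


def simulate(trace, l1_lines, l2_lines, l2_shared_only):
    """Count accesses that miss in both L1 and L2, split by data class.

    `trace` is a list of (address, is_shared) pairs (hypothetical format).
    """
    l1, l2 = LRUCache(l1_lines), LRUCache(l2_lines)
    misses = {"private": 0, "shared": 0}
    for addr, is_shared in trace:
        if l1.lookup(addr):
            continue
        if not l2.lookup(addr):
            misses["shared" if is_shared else "private"] += 1
            # Option (1): private lines bypass L2 and are cached only in L1.
            if is_shared or not l2_shared_only:
                l2.fill(addr)
        l1.fill(addr)
    return misses


# Synthetic trace: touch 4 shared lines, stream 8 private lines, then
# revisit the shared lines.
shared = [("S%d" % i, True) for i in range(4)]
private = [("P%d" % i, False) for i in range(8)]
trace = shared + private + shared

baseline = simulate(trace, l1_lines=2, l2_lines=4, l2_shared_only=False)
option1 = simulate(trace, l1_lines=2, l2_lines=4, l2_shared_only=True)
print(baseline)  # {'private': 8, 'shared': 8}
print(option1)   # {'private': 8, 'shared': 4}
```

In this toy trace the streamed private lines evict every shared line from a conventional L2, so the shared lines miss again on reuse. With the shared-only policy they stay in L2 and only the cold misses remain. Real workloads also reuse private data, which this trace deliberately does not, so the sketch shows the mechanism and says nothing about the size of the benefit.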
Supported in part by National Science Foundation Grant No. MIPS 9522265.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oi, H., Ranganathan, N. (1999). Utilization of cache area in on-chip multiprocessor. In: Polychronopoulos, C., Fukuda, K.J.A., Tomita, S. (eds) High Performance Computing. ISHPC 1999. Lecture Notes in Computer Science, vol 1615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094939
DOI: https://doi.org/10.1007/BFb0094939
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65969-3
Online ISBN: 978-3-540-48821-7
eBook Packages: Springer Book Archive