Abstract
This paper presents an exhaustive evaluation of the memory subsystem of a 16-core chip-multiprocessor (CMP) architecture. The characterization uses a new simulator, DCMPSIM, which extends the Rice Simulator for ILP Multiprocessors (RSIM) with the functionality required to model a contemporary CMP in great detail.
To better understand the behavior of the memory subsystem, we propose a taxonomy of the L1 cache misses found in CMPs. We then use this taxonomy to locate the hot spots of the memory hierarchy and, thus, to identify where computer architects must place special emphasis to improve the performance of future dense single-chip multiprocessors, which will integrate 16 or more processor cores.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Villa, F.J., Acacio, M.E., García, J.M. (2005). Memory Subsystem Characterization in a 16-Core Snoop-Based Chip-Multiprocessor Architecture. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_27
DOI: https://doi.org/10.1007/11557654_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29031-5
Online ISBN: 978-3-540-32079-1
eBook Packages: Computer Science