DOI: 10.1145/1120725.1121008
Article

Customized on-chip memories for embedded chip multiprocessors

Published: 18 January 2005

Abstract

Ensuring that most data accesses are satisfied from on-chip memories is a critical problem for chip multiprocessors, as the cost of an off-chip access can be very high. In particular, multiple cores that need to access the off-chip memory system may contend with each other for the same buses and pins. While the on-chip memory space can be structured as either shared memory or private memory, each organization has its own drawbacks. To achieve lower power consumption than these conventional memory architectures, this paper proposes and evaluates an application-specific hybrid memory architecture that has both shared and private components. The approach uses a polyhedral tool to capture the amount of privately accessed and shared data across processors, and uses this information to guide memory space partitioning along two dimensions: across parallel processors, and across shared and private memory components. We evaluate the resulting memory configurations using a set of benchmarks and compare them to pure private and pure shared architectures. When running the same set of applications with the same code optimizations, our results indicate that the proposed hybrid memory design methodology consumes much less power than the conventional architectures.
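The partitioning idea in the abstract can be illustrated with a small sketch. This is not the authors' method: the paper derives access information with a polyhedral tool, whereas the code below simply enumerates explicit sets of data elements per processor. The function names `classify_accesses` and `partition_capacity`, and the proportional sizing heuristic, are assumptions made for illustration only.

```python
from collections import Counter


def classify_accesses(accesses):
    """Split data elements into per-processor private sets and one shared set.

    accesses: dict mapping a processor id to the set of data element ids
    that processor touches (a stand-in for polyhedral analysis results).
    """
    # Count how many processors touch each element.
    counts = Counter(e for elems in accesses.values() for e in elems)
    # An element is shared if two or more processors touch it.
    shared = {e for e, c in counts.items() if c > 1}
    # Everything else a processor touches is private to it.
    private = {p: elems - shared for p, elems in accesses.items()}
    return private, shared


def partition_capacity(private, shared, total_capacity):
    """Size private and shared memories in proportion to their footprints.

    Hypothetical heuristic: if the whole footprint fits, each component
    gets exactly its footprint; otherwise every component is scaled down
    by the same factor (rounded down).
    """
    footprint = len(shared) + sum(len(s) for s in private.values())
    if footprint == 0:
        return {p: 0 for p in private}, 0
    scale = min(1.0, total_capacity / footprint)
    private_sizes = {p: int(len(s) * scale) for p, s in private.items()}
    shared_size = int(len(shared) * scale)
    return private_sizes, shared_size


# Two processors split a 9-element array and both touch element 4.
accesses = {0: set(range(0, 5)), 1: set(range(4, 9))}
private, shared = classify_accesses(accesses)
private_sizes, shared_size = partition_capacity(private, shared, total_capacity=9)
```

In this toy case only the boundary element is shared, so most of the capacity goes to the two private memories. Applications with more inter-processor sharing would shift capacity toward the shared component.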

Published In

ASP-DAC '05: Proceedings of the 2005 Asia and South Pacific Design Automation Conference
January 2005
1495 pages
ISBN: 0780387376
DOI: 10.1145/1120725
General Chair: Ting-Ao Tang

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 466 of 1,454 submissions, 32%

Cited By

  • (2015) Dynamic Shared SPM Reuse for Real-Time Multicore Embedded Systems. ACM Transactions on Architecture and Code Optimization 12:2 (1-25). DOI: 10.1145/2738051. Online publication date: 11-May-2015
  • (2011) Design and Implement of Sharable Multi-Channel On-Chip Memory for Embedded CMP System. Advanced Materials Research 217-218 (1147-1152). DOI: 10.4028/www.scientific.net/AMR.217-218.1147. Online publication date: Mar-2011
  • (2011) Bibliography. Real-Time Embedded Systems (187-207). DOI: 10.1201/b10935-12. Online publication date: 7-Jun-2011
  • (2010) Algorithms for optimally arranging multicore memory structures. EURASIP Journal on Embedded Systems 2010 (1-16). DOI: 10.1155/2010/871510. Online publication date: 1-Jan-2010
  • (2009) Global Variable Partition with Virtually Shared Scratch Pad Memory to Minimize Schedule Length. Proceedings of the 2009 International Conference on Parallel Processing Workshops (478-483). DOI: 10.1109/ICPPW.2009.22. Online publication date: 22-Sep-2009
  • (2009) Efficient Scratchpad Memory Management Based on Multi-thread for MPSoC Architecture. Proceedings of the 2009 International Conference on Scalable Computing and Communications; Eighth International Conference on Embedded Computing (429-434). DOI: 10.1109/EmbeddedCom-ScalCom.2009.83. Online publication date: 25-Sep-2009
  • (2006) Integrated scratchpad memory optimization and task scheduling for MPSoC architectures. Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems (401-410). DOI: 10.1145/1176760.1176809. Online publication date: 22-Oct-2006
  • (2006) Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies. Proceedings of the 43rd annual Design Automation Conference (49-52). DOI: 10.1145/1146909.1146925. Online publication date: 24-Jul-2006
