Parallel Memory Architecture for Application-Specific Instruction-Set Processors

Journal of Signal Processing Systems

Abstract

Many current applications on battery-powered devices come from the digital signal processing, telecommunication, and multimedia domains. These applications typically place high demands on computational performance, and parallelism is often the key to meeting them. To exploit parallel processing units, the memory must be able to feed the data path with data, which calls for a memory organization that supports parallel memory accesses. This paper describes a conflict-resolving parallel data memory system for application-specific instruction-set processors. The memory structure is generic and reusable, so it can support various application-specific designs. The proposed memory system does not use any predefined access format signals for memory addressing. The parallel memory system is attached to an application-specific instruction-set processor core, and comparisons of area, power, and critical path are presented. The experiments show that significant power savings can be obtained by using the parallel memory system instead of a multi-port memory.
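To make the idea of conflict resolution concrete, the following is a minimal sketch and not the design described in the paper. It assumes low-order interleaving (module = address mod N) and a simple first-come policy that defers colliding requests to later cycles; the names `module_of` and `schedule_accesses` are hypothetical and chosen only for illustration.

```python
from typing import List


def module_of(address: int, num_modules: int) -> int:
    """Assumed mapping: low-order interleaving, module = address mod N."""
    return address % num_modules


def schedule_accesses(addresses: List[int], num_modules: int) -> List[List[int]]:
    """Split one batch of parallel requests into conflict-free cycles.

    Each cycle contains at most one address per memory module. A request
    that collides with an earlier one on the same module is deferred to
    a later cycle (an assumed, illustrative resolution policy).
    """
    cycles: List[List[int]] = []
    pending = list(addresses)
    while pending:
        busy = set()          # modules already claimed in this cycle
        this_cycle: List[int] = []
        deferred: List[int] = []
        for addr in pending:
            m = module_of(addr, num_modules)
            if m in busy:
                deferred.append(addr)   # conflict: retry in a later cycle
            else:
                busy.add(m)
                this_cycle.append(addr)
        cycles.append(this_cycle)
        pending = deferred
    return cycles
```

Under these assumptions, four requests to consecutive addresses on four modules complete in one cycle, while four requests with stride 4 all map to the same module and are serialized over four cycles. This illustrates why the storage scheme and the access pattern together determine the achievable memory bandwidth.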

Acknowledgements

This work has been supported in part by the Academy of Finland under project 205743 and the Finnish Funding Agency for Technology and Innovation under research funding decision 40441/05.

Author information

Corresponding author

Correspondence to Teemu Pitkänen.

About this article

Cite this article

Pitkänen, T., Tanskanen, J.K., Mäkinen, R. et al. Parallel Memory Architecture for Application-Specific Instruction-Set Processors. J Sign Process Syst Sign Image Video Technol 57, 21–32 (2009). https://doi.org/10.1007/s11265-008-0173-y

  • DOI: https://doi.org/10.1007/s11265-008-0173-y