Abstract
Many current applications in battery-powered devices come from the digital signal processing, telecommunication, and multimedia domains. These applications typically place high demands on computational performance, and parallelism is often the key to meeting them. To exploit parallel processing units, the memory must be able to keep the data path supplied with data, which calls for a memory organization that supports parallel memory accesses. This paper describes a conflict-resolving parallel data memory system for application-specific instruction-set processors. The memory structure is generic and reusable, so it can support various application-specific designs. The proposed memory system does not employ any predefined access format signals for memory addressing. The parallel memory system is attached to an application-specific instruction-set processor core, and comparisons of area, power, and critical path are presented. The experiments show that significant power savings can be obtained by using the parallel memory system instead of a multi-port memory.
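As a rough illustration of what conflict resolution means in a multi-bank parallel memory, the sketch below groups a set of simultaneous accesses into conflict-free cycles. It is not the scheme proposed in the paper: the bank mapping (`address % num_banks`, i.e. low-order interleaving), the function name `schedule_accesses`, and the serialization policy are all assumptions made for the example.

```python
from collections import defaultdict


def schedule_accesses(addresses, num_banks):
    """Group parallel accesses into conflict-free cycles.

    Assumption (not from the paper): low-order interleaving,
    so the bank index is address % num_banks. Accesses that map
    to the same bank are serialized over consecutive cycles.
    Returns a list of cycles; each cycle is a list of
    (port, address) pairs that touch distinct banks.
    """
    # One queue of pending accesses per memory bank.
    queues = defaultdict(list)
    for port, addr in enumerate(addresses):
        queues[addr % num_banks].append((port, addr))

    # The most heavily loaded bank determines the cycle count.
    depth = max((len(q) for q in queues.values()), default=0)

    cycles = []
    for i in range(depth):
        # Each bank serves at most one access per cycle.
        cycle = [q[i] for _, q in sorted(queues.items()) if i < len(q)]
        cycles.append(cycle)
    return cycles


if __name__ == "__main__":
    # Four accesses to four distinct banks: one cycle.
    print(schedule_accesses([0, 1, 2, 3], 4))
    # Three accesses collide on bank 0: three cycles.
    print(schedule_accesses([0, 4, 8, 1], 4))
```

Under this assumed mapping, an access pattern that spreads over all banks completes in a single cycle, while accesses that collide on one bank are spread over as many cycles as the most heavily loaded bank requires.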
Acknowledgements
This work has been supported in part by the Academy of Finland under project 205743 and the Finnish Funding Agency for Technology and Innovation under research funding decision 40441/05.
Cite this article
Pitkänen, T., Tanskanen, J.K., Mäkinen, R. et al. Parallel Memory Architecture for Application-Specific Instruction-Set Processors. J Sign Process Syst Sign Image Video Technol 57, 21–32 (2009). https://doi.org/10.1007/s11265-008-0173-y
DOI: https://doi.org/10.1007/s11265-008-0173-y