Configurable Data Memory for Multimedia Processing

Aho, Eero; Vanne, Jarno; Hämäläinen, Timo D.

doi:10.1007/s11265-007-0126-x

Published: 16 August 2007

Volume 50, pages 231–249, (2008)

Journal of Signal Processing Systems

Abstract

In modern multimedia applications, the memory bottleneck can be alleviated with special stride data accesses. The data elements of a stride access can be retrieved in parallel from parallel memories, where the idea is to increase memory bandwidth with several memory modules working in parallel and to feed the processor with only the necessary data. Arbitrary stride access with interleaved memories has been described in previous research, where the skewing scheme is changed at run time according to the stride currently in use. This paper presents improved schemes adapted to parallel memories. The proposed novel parallel memory implementation allows conflict-free access with all constant strides, which has not been possible in prior application-specific parallel memories. Moreover, the access locations are unrestricted, and the number of accessed data elements equals the number of memory modules. Timing and area estimates are given for an Altera Stratix FPGA and a 0.18 µm CMOS process with 2 to 32 memory modules. The FPGA results show a 129 MHz clock frequency for a system with 16 memory modules when the read and write latencies are 3 and 2 clock cycles, respectively. The complexity of the proposed system is shown to be a trade-off between application-specific and highly configurable parallel memory systems.
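To make the stride conflict problem concrete, the sketch below contrasts conventional low-order interleaving (module = address mod N) with a toy run-time skewing that chooses, from the current stride, which address bits select the module. It is not the scheme proposed in the paper; the function names, the power-of-two module count N, and the bit-field mapping are assumptions chosen for illustration. With a stride s = σ·2^x (σ odd), taking the log2(N) address bits starting at bit x sends the N elements of one access to N distinct modules, because those bits then step by the odd value σ modulo N.

```python
from typing import Callable, List


def low_order_module(addr: int, n_modules: int) -> int:
    """Conventional low-order interleaving: module = address mod N."""
    return addr % n_modules


def stride_exponent(stride: int) -> int:
    """Return x such that stride = sigma * 2**x with sigma odd (stride > 0)."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return (stride & -stride).bit_length() - 1


def skewed_module(addr: int, n_modules: int, x: int) -> int:
    """Hypothetical run-time scheme: log2(N) address bits starting at bit x pick the module."""
    return (addr >> x) % n_modules


def skewed_row(addr: int, n_modules: int, x: int) -> int:
    """Address inside the module: all remaining bits, so (module, row) is one-to-one."""
    n = n_modules.bit_length() - 1          # log2(N); N assumed a power of two
    low = addr & ((1 << x) - 1)             # bits below the module-select field
    high = addr >> (x + n)                  # bits above the module-select field
    return (high << x) | low


def modules_touched(base: int, stride: int, n_modules: int,
                    module_fn: Callable[[int], int]) -> List[int]:
    """Modules hit by one parallel access of N elements base + i*stride."""
    return [module_fn(base + i * stride) for i in range(n_modules)]


def is_conflict_free(modules: List[int]) -> bool:
    """An access is conflict-free when every element lands in a different module."""
    return len(set(modules)) == len(modules)


if __name__ == "__main__":
    N, base, stride = 8, 3, 4
    assert N & (N - 1) == 0, "this sketch assumes a power-of-two module count"
    x = stride_exponent(stride)             # selected at run time from the stride
    plain = modules_touched(base, stride, N, lambda a: low_order_module(a, N))
    skewed = modules_touched(base, stride, N, lambda a: skewed_module(a, N, x))
    print("low-order:", plain, is_conflict_free(plain))
    print("skewed:   ", skewed, is_conflict_free(skewed))
```

For N = 8, base 3 and stride 4, low-order interleaving maps the eight elements to modules [3, 7, 3, 7, 3, 7, 3, 7], so modules 3 and 7 each serve four requests and the access takes four memory cycles instead of one, while the skewed mapping yields modules 0 through 7. A limitation of this toy mapping is that data written under one value of x is not laid out for another x; it illustrates only the module-selection idea.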

Author information

Authors and Affiliations

Institute of Digital and Computer Systems, Tampere University of Technology, Tampere, Finland
Eero Aho, Jarno Vanne & Timo D. Hämäläinen

Corresponding author

Correspondence to Eero Aho.

About this article

Cite this article

Aho, E., Vanne, J. & Hämäläinen, T.D. Configurable Data Memory for Multimedia Processing. J Sign Process Syst Sign Image 50, 231–249 (2008). https://doi.org/10.1007/s11265-007-0126-x

Received: 15 February 2007
Revised: 16 May 2007
Accepted: 21 June 2007
Published: 16 August 2007
Issue Date: February 2008
DOI: https://doi.org/10.1007/s11265-007-0126-x
