Configurable Data Memory for Multimedia Processing

Journal of Signal Processing Systems

Abstract

In modern multimedia applications, the memory bottleneck can be alleviated with special stride data accesses. The data elements of a stride access can be retrieved in parallel from parallel memories, where the idea is to increase memory bandwidth by operating several memory modules in parallel and to feed the processor with only the necessary data. Previous research describes arbitrary stride access capability with interleaved memories, where the skewing scheme is changed at run time according to the stride currently in use. This paper presents improved schemes adapted to parallel memories. The proposed novel parallel memory implementation allows conflict-free accesses with all constant strides, which has not been possible in prior application-specific parallel memories. Moreover, the possible access locations are unrestricted, and the number of accessed data elements equals the number of memory modules. Timing and area estimates are given for an Altera Stratix FPGA and a 0.18-micrometer CMOS process with memory module counts from 2 to 32. The FPGA results show a 129 MHz clock frequency for a system with 16 memory modules when the read and write latencies are 3 and 2 clock cycles, respectively. The complexity of the proposed system is shown to be a trade-off between application-specific and highly configurable parallel memory systems.
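To illustrate why a fixed module assignment is not enough for arbitrary strides, the following is a minimal sketch that assumes plain low-order interleaving (module = address mod N). This is not the scheme proposed in the paper; the function names `module_of`, `stride_access`, and `max_conflicts` are hypothetical and chosen only for this illustration. The sketch counts how many elements of one parallel stride access fall into the same memory module.

```python
from collections import Counter


def module_of(address: int, num_modules: int) -> int:
    """Assumed low-order interleaving: consecutive addresses go to consecutive modules."""
    return address % num_modules


def stride_access(base: int, stride: int, num_modules: int) -> list[int]:
    """Addresses of one parallel access: one element per memory module."""
    return [base + i * stride for i in range(num_modules)]


def max_conflicts(base: int, stride: int, num_modules: int) -> int:
    """Largest number of elements of one access that map to the same module.

    A value of 1 means the access is conflict free under this assumed mapping.
    """
    modules = [module_of(a, num_modules) for a in stride_access(base, stride, num_modules)]
    return max(Counter(modules).values())
```

With 8 modules under this assumed mapping, strides 1 and 3 give a worst-case module load of 1 (conflict free), while strides 2, 4, and 8 give loads of 2, 4, and 8, that is, gcd(stride, N). Changing the skewing scheme according to the stride, as the abstract describes, is one way to avoid such conflicts.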

Author information

Corresponding author

Correspondence to Eero Aho.

About this article

Cite this article

Aho, E., Vanne, J. & Hämäläinen, T.D. Configurable Data Memory for Multimedia Processing. J Sign Process Syst Sign Image 50, 231–249 (2008). https://doi.org/10.1007/s11265-007-0126-x

  • DOI: https://doi.org/10.1007/s11265-007-0126-x
