Using Planar Embedded DRAM in Memory Intensive Signal Processing Circuits: Case Studies on LDPC Decoding and Motion Estimation

Journal of Signal Processing Systems

Abstract

This paper studies the feasibility and potential of using planar embedded DRAM (eDRAM), which is fully compatible with the CMOS logic process, to improve the circuit implementation efficiency of memory-hungry signal processing algorithms. Despite its clear cell-area advantage over SRAM, planar eDRAM is not widely used in practice, mainly because of its very short retention time (e.g., a few \(\upmu\)s or even a few hundred ns). In this work, we contend that short retention time need not be a fundamental obstacle to implementing signal processing algorithms, because such algorithms typically handle streaming data, exhibit regular and predictable data access patterns, and have a large algorithm/architecture design space. This study elaborates on the rationale for and application of planar eDRAM in memory-hungry signal processing circuits, and discusses possible algorithm and architecture design strategies that better accommodate planar eDRAM. For demonstration, we use low-density parity-check (LDPC) code decoding and motion estimation in video encoding as test vehicles. Beyond a straightforward SRAM replacement, we propose an interleaved read/write page-mode DRAM operation that reduces planar eDRAM energy consumption by exploiting the data access pattern of LDPC code decoding. We also investigate the potential of planar eDRAM to enable a higher degree of image data reuse in motion estimation, and propose a folded scan structure to further improve its effectiveness. We carried out detailed planar eDRAM SPICE simulations at the 45 nm node to obtain its characteristics, and used them to quantitatively evaluate the effectiveness of planar eDRAM in these two case studies.
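The retention-time argument above can be illustrated with a minimal sketch: streaming data only needs to survive from the moment it is written until its last read, so no refresh is required when that lifetime is shorter than the cell retention time. The function names, the 400 MHz clock, the 1 \(\upmu\)s retention time, and the cycle counts below are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def data_lifetime_s(lifetime_cycles: int, clock_hz: float) -> float:
    """Time between writing a word and its last read, in seconds."""
    return lifetime_cycles / clock_hz


def needs_refresh(lifetime_cycles: int, clock_hz: float, retention_s: float) -> bool:
    """True if the data must outlive the cell's retention time."""
    return data_lifetime_s(lifetime_cycles, clock_hz) > retention_s


# Hypothetical numbers, for illustration only (not from the paper).
CLOCK_HZ = 400e6      # assumed 400 MHz datapath clock
RETENTION_S = 1e-6    # assumed 1 us planar eDRAM retention time

# A word consumed 100 cycles after it is written lives for 250 ns.
short_lived = needs_refresh(100, CLOCK_HZ, RETENTION_S)
# A word held for 4000 cycles lives for 10 us and would need refresh.
long_lived = needs_refresh(4000, CLOCK_HZ, RETENTION_S)

print(short_lived, long_lived)
```

Under these assumed numbers, the short-lived buffer fits within the retention window while the long-lived one does not, which is the kind of distinction an algorithm or architecture redesign would aim to exploit.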


Notes

  1. In comparison, eDRAM with explicitly fabricated capacitors, which incur extra fabrication cost, can achieve a much longer retention time; e.g., the eDRAM used in IBM server processors has a retention time of 40 \(\upmu\)s [2].

References

  1. Balasubramonian, R., Muralimanohar, N., Jouppi, N. (2009). Cacti: A tool to model large caches. http://www.hpl.hp.com/techreports/2009/HPL-2009-85.html.

  2. Barth, J., Reohr, W., Parries, P., Fredeman, G., Golz, J., Schuster, S., Matick, R., Hunter, H.I., C.T., Harig, J., Kim, H., Khan, B., Griesemer, J., Havreluk, R., Yanagisawa, K., Kirihata, T., Iyer, S. (2008). A 500 MHz random cycle, 1.5 ns latency, SOI embedded DRAM macro featuring a three-transistor micro sense amplifier. IEEE Journal of Solid-State Circuits, 43(1), 86–95.


  3. Chen, C.Y., Chien, S.Y., Huang, Y.W., Chen, T.C., Wang, T.C., Chen, L.G. (2006). Analysis and architecture design of variable block-size motion estimation for H.264/AVC. IEEE Transactions on Circuits and Systems I: Regular Papers, 53(6), 578–593.


  4. Chen, C.Y., Huang, C.T., Chen, Y.H., Chen, L.-G. (2006). Level C+ data reuse scheme for motion estimation with corresponding coding orders. IEEE Transactions on Circuits and Systems for Video Technology, 16(4), 553–558.


  5. Cho, H.J., Nemati, F., Roy, R., Gupta, R., Yang, K., Ershov, M., Banna, S., Tarabbia, M., Sailing, C., Hayes, D., Mittal, A., Robins, S. (2005). A novel capacitor-less DRAM cell using thin capacitively-coupled thyristor (TCCT). In Proc. of IEEE international electron devices meeting (IEDM) (pp. 311–314).

  6. Gallager, R.G. (1962). Low-density parity-check codes. IRE Transactions on Information Theory, IT-8, 21–28.


  7. Kim, J., & Park, T. (2007). A novel VLSI architecture for full-search variable block-size motion estimation. In Proc. of IEEE TENCON, Taipei.

  8. Leung, W., Hsu, F., Jones, M.E. (2000). The ideal SoC memory: 1T-SRAM. In Proc. of IEEE international ASIC/SOC conference (pp. 32–36).

  9. Li, P., & Tang, H. (2010). A low power VLSI Implementation for variable block size motion estimation in H.264/AVC. In Proc. of ISCAS: circuits and systems conf, Paris, France.

  10. Li, Z., Chen, L., Zeng, L., Lin, S., Fong, W. (2006). Efficient encoding of quasi-cyclic low-density parity-check codes. IEEE Transactions on Communications, 54(1), 71–81.


  11. MacKay, D.J.C., & Neal, R.M. (1996). Near Shannon limit performance of low density parity check codes. Electronics Letters, 32, 1645–1646.


  12. Matick, R., & Schuster, S. (2005). Logic-based eDRAM: Origins and rationale for use. IBM Journal of Research and Development, 49, 145–165.


  13. Miles, L., Gambles, J., Maki, G., Ryan, W., Whitaker, S. (2006). An 860-Mb/s (8158,7136) low-density parity-check encoder. IEEE Journal of Solid-State Circuits, 41(8), 1686–1691.


  14. MoSys Inc. http://www.mosys.com/. Accessed 10 Oct 2010.

  15. Natarajan, S., Chung, S., Paris, L., Keshavarzi, A. (2009). Searching for the dream embedded memory. IEEE Solid-State Circuits Magazine, 1, 34–44.


  16. Okhonin, S., Nagoga, M., Carman, E., Beffa, R., Faraoni, E. (2007). New generation of Z-RAM. In Proc. of IEEE international electron devices meeting (IEDM) (pp. 925–928).

  17. Somasekhar, D., Lu, S.L., Bloechel, B., Lai, K., Borkar, S., De, V. (2002). Planar 1T-cell DRAM with MOS storage capacitors in a 130 nm logic technology for high density microprocessor caches. In Proc. of IEEE solid state circuits conf., ESSCIRC, Firenze, Italy.

  18. Somasekhar, D., Yibin, Y., Aseron, P., Lu, S.L., Khellah, M., Howard, J., Ruhl, G., Karnik, T., Borkar, S., De, V., Keshavarzi, A. (2009). 2GHz 2MB 2T gain cell memory macro with 128 GBytes/s bandwidth in a 65 nm logic process technology. IEEE Journal of Solid-State Circuits, 44(1), 174–185.


  19. Song, Y., Liu, Z., Ikenaga, T., Goto, S. (2006). VLSI architecture for variable block size motion estimation in H.264/AVC with low cost memory organization. In Proc. of VLSI design automation and test conf, Hsinchu, Taiwan.

  20. Su, Y., & Sun, M.T. (2006). Fast multiple reference frame motion estimation for H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology, 16(3), 447–452.


  21. Sveriges Television (SVT). Video-sequence. http://www.svt.se. Accessed 10 Oct 2010.

  22. Tuan, J.C., Chang, T.S., Jen, C.W. (2002). On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture. IEEE Transactions on Circuits and Systems for Video Technology, 12(1), 61–72.


  23. Wang, G., Ho, K.C.H., Faltermeier, J., Kong, W., Kim, H., Cai, J. (2006). A 0.127\(\mu \)m\(^2\) high performance 65 nm SOI based embedded DRAM for on-processor applications. In Proc. of international electron devices meeting (IEDM) (pp. 1–4).

  24. Wang, Z., & Cui, Z. (2007). A memory efficient partially parallel decoder architecture for quasi-cyclic LDPC codes. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 15(4), 483–488.


  25. Wiberg, N. (1996). Codes and decoding on general graphs. PhD Dissertation, Linköping University, Sweden.

  26. Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A. (2003). Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 560–576.


  27. Xiang, B., Shen, R., Pan, A., Bao, D., Zeng, X. (2010). An area-efficient and low-power multirate decoder for quasi-cyclic low-density parity-check codes. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(10), 1447–1460.


  28. Yap, S., & McCanny, J. (2004). A VLSI architecture for variable block size video motion estimation. IEEE Transactions on Circuits and Systems II: Express Briefs, 51(7), 384–389.


  29. Zhang, K., Huang, X., Wang, Z. (2009). High-throughput layered decoder implementation for quasi-cyclic LDPC codes. IEEE Journal on Selected Areas in Communications, 27(6), 985–994.


  30. Zhong, H., Zhang, T., Haratsch, E.F. (2007). Quasi-cyclic LDPC codes for the magnetic recording channel: code design and VLSI implementation. IEEE Transactions on Magnetics, 43(3), 1118–1123.


  31. Zhang, Z., Anantharam, V., Wainwright, M., Nikolic, B. (2010). An efficient 10GBASE-T Ethernet LDPC decoder design with low error floors. IEEE Journal of Solid-State Circuits, 45(4), 843–855.


  32. Ndili, O., & Ogunfunmi, T. (2011). Algorithm and architecture co-design of hardware-oriented, modified diamond search for fast motion estimation in H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology, 21(9), 1214–1227.


  33. Chatterjee, S.K., & Chakrabarti, I. (2010). Low power VLSI architectures for one bit transformation based fast motion estimation. IEEE Transactions on Consumer Electronics, 56(4), 2652–2660.


  34. Murugappa, P., Al-Khayat, R., Baghdadi, A., Jezequel, M. (2011). A flexible high throughput multi-ASIP architecture for LDPC and turbo decoding. In Proc. of design, automation and test in Europe conference and exhibition (DATE) (pp. 1–6).


Author information


Correspondence to Kalyana Sundaram Venkataraman.


About this article

Cite this article

Venkataraman, K.S., Li, Y., Wu, Q. et al. Using Planar Embedded DRAM in Memory Intensive Signal Processing Circuits: Case Studies on LDPC Decoding and Motion Estimation. J Sign Process Syst 73, 11–24 (2013). https://doi.org/10.1007/s11265-012-0724-0


  • DOI: https://doi.org/10.1007/s11265-012-0724-0
