SoC Memory Hierarchy Derivation from Dataflow Graphs

Journal of Signal Processing Systems

Abstract

Hardware synthesis from dataflow graphs of signal processing systems is a growing research area as the focus shifts to high-level design methodologies. For data-intensive systems, dataflow-based synthesis can lead to inefficient use of memory because synchronous dataflow is restrictive and cannot easily model data reuse. This paper explores how changes to the dataflow graph can be used to drive both the on-chip and off-chip memory organisation, and how the resulting memory architectures can be mapped to a hardware implementation. By exploiting the data reuse inherent in many image processing algorithms and by creating memory hierarchies, off-chip memory bandwidth can be reduced by a factor of a thousand relative to the original dataflow-graph-level specification of a motion estimation algorithm, with a minimal increase in memory size. This analysis is verified using results gathered from an implementation of the motion estimation algorithm on a Xilinx Virtex-4 FPGA, where the delay between the memories and processing elements drops from 14.2 ns to 1.878 ns through refinement of the memory architecture. Care must be taken when modelling these algorithms, however, as inefficiencies in the models can easily translate into overuse of hardware resources.
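To make the data-reuse argument concrete, the following is a minimal sketch (not taken from the paper) that counts off-chip pixel reads for full-search block-matching motion estimation in two cases: every candidate comparison fetches both blocks from off-chip memory, versus the current block and its search window being loaded once into an on-chip buffer. The function name, the CIF frame size, the 16x16 block and the ±16 search range are all illustrative assumptions, frame-boundary effects are ignored, and the resulting ratio is not meant to reproduce the thousand-fold figure reported above.

```python
def offchip_reads_per_frame(width, height, block, search_range):
    """Count off-chip pixel reads per frame for full-search block matching.

    Hypothetical helper for illustration only; frame-boundary effects are
    ignored and every block is assumed to have a full search window.
    """
    blocks = (width // block) * (height // block)
    candidates = (2 * search_range + 1) ** 2

    # No reuse: each candidate comparison re-reads the current block and
    # the candidate reference block from off-chip memory.
    no_reuse = blocks * candidates * 2 * block * block

    # Search-window buffering: the current block and its search window are
    # each read once from off-chip memory and then served from on-chip RAM.
    window = (block + 2 * search_range) ** 2
    buffered = blocks * (block * block + window)

    # On-chip storage (in pixels) needed to hold one block plus one window.
    onchip_pixels = block * block + window
    return no_reuse, buffered, onchip_pixels


if __name__ == "__main__":
    # Assumed parameters: CIF frame, 16x16 blocks, +/-16 pixel search range.
    no_reuse, buffered, onchip = offchip_reads_per_frame(352, 288, 16, 16)
    print("reads without reuse:", no_reuse)
    print("reads with window buffer:", buffered)
    print("reduction factor:", no_reuse / buffered)
    print("on-chip buffer (pixels):", onchip)
```

Under these assumed parameters, a buffer of a few thousand pixels removes most of the off-chip traffic. Further levels of reuse, such as sharing the overlapping parts of neighbouring search windows, would reduce it again at the cost of a larger or more complex on-chip hierarchy.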

Acknowledgements

This work was carried out using the support of the Engineering and Physical Sciences Research Council ICT grant EP/C000676/1.

Author information

Corresponding author

Correspondence to Scott Fischaber.

About this article

Cite this article

Fischaber, S., Woods, R. & McAllister, J. SoC Memory Hierarchy Derivation from Dataflow Graphs. J Sign Process Syst 60, 345–361 (2010). https://doi.org/10.1007/s11265-009-0380-1

  • DOI: https://doi.org/10.1007/s11265-009-0380-1