Properties of Self-Timed Ring Architectures for Deadlock-Free and Consistent Configuration Reaching Maximum Throughput

Journal of Signal Processing Systems

Abstract

Multiprocessor System-on-Chip designs with self-timed operation are becoming increasingly attractive because they can exploit the high parallelism of applications. Previous research on self-timed techniques has mostly focused on the hardware layer. However, correctly synthesizing self-timed systems remains difficult. In particular, the problem of how to configure a self-timed ring structure to achieve maximum throughput without deadlock is still unsolved. A self-timed ring (STR) is composed of a ring of connected “stages”, each consisting of a processing element, communication units, and its current state. The correct configuration of an STR is determined by the initial state of each stage and the number of buffers inserted into the ring to maintain correct application behavior. This paper establishes a series of theorems based on the properties of self-timed structures. Based on these theorems, the initial states and buffers can be set to guarantee a correct configuration. The theorems also provide formulas for calculating the throughput of an STR. The algorithms presented in the paper find the optimal initial configuration of an STR, achieving the maximum throughput with the minimum number of inserted buffers. Experimental results show that the throughput of applications mapped onto an STR with the optimal configuration improves by 64.99 % on average compared with a synchronous system.
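To make the configuration problem concrete, the following is a minimal sketch of the classic token/bubble view of a self-timed ring found in the general self-timed ring literature. It is not the paper's theorems or algorithms, which are not reproduced in this preview. It assumes identical stages, a fixed integer forward latency (time for a token to advance one stage), and a fixed integer reverse latency (time for a bubble to move back one stage). The function names and parameters are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def ring_throughput(n_stages, n_tokens, fwd_latency, rev_latency):
    """Steady-state throughput of a simplified, homogeneous ring.

    Assumption (not from the paper): every stage holds either a token
    or a bubble, and all stages share the same integer latencies.
    Returns tokens per time unit passing any one stage.
    """
    if n_stages < 1 or not 0 <= n_tokens <= n_stages:
        raise ValueError("need 0 <= n_tokens <= n_stages and n_stages >= 1")
    n_bubbles = n_stages - n_tokens
    # With no tokens nothing moves; with no bubbles nothing can move.
    if n_tokens == 0 or n_bubbles == 0:
        return Fraction(0)
    # Too few tokens: each token needs n_stages * fwd_latency per lap.
    data_limited = Fraction(n_tokens, n_stages * fwd_latency)
    # Too few bubbles: each bubble needs n_stages * rev_latency per lap.
    bubble_limited = Fraction(n_bubbles, n_stages * rev_latency)
    return min(data_limited, bubble_limited)


def best_token_count(n_stages, fwd_latency, rev_latency):
    """Exhaustively pick the initial token count with the highest throughput."""
    return max(
        range(n_stages + 1),
        key=lambda k: ring_throughput(n_stages, k, fwd_latency, rev_latency),
    )


if __name__ == "__main__":
    for k in range(9):
        print(k, ring_throughput(8, k, 1, 1))
    print("best initial token count:", best_token_count(8, 1, 1))
```

In this simplified model, an initial state with all tokens or all bubbles stalls the ring, and throughput peaks where the token-limited and bubble-limited bounds meet. This is one way to see why both the initial states and the number of inserted buffers matter.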

Acknowledgments

This work is partially supported by Chongqing High-Tech Research Program csct2012ggC40005, National 863 Program 2013AA013202, NSFC 61173014, NSFC 61472052.

Author information

Corresponding author

Correspondence to Qingfeng Zhuge.

About this article

Cite this article

Jiang, W., Zhuge, Q., Chen, X. et al. Properties of Self-Timed Ring Architectures for Deadlock-Free and Consistent Configuration Reaching Maximum Throughput. J Sign Process Syst 84, 123–137 (2016). https://doi.org/10.1007/s11265-015-0984-6

  • DOI: https://doi.org/10.1007/s11265-015-0984-6
