Properties of Self-Timed Ring Architectures for Deadlock-Free and Consistent Configuration Reaching Maximum Throughput

Jiang, Weiwen; Zhuge, Qingfeng; Chen, Xianzhang; Yang, Lei; Yi, Juan; Sha, Edwin H.-M.

doi:10.1007/s11265-015-0984-6

Published: 06 March 2015

Volume 84, pages 123–137, (2016)

Journal of Signal Processing Systems

Weiwen Jiang¹,
Qingfeng Zhuge^1,2,
Xianzhang Chen¹,
Lei Yang¹,
Juan Yi¹ &
…
Edwin H.-M. Sha^1,2

Abstract

Multiprocessor System-on-Chip (MPSoC) with self-timed design is becoming increasingly attractive because it can exploit the high parallelism of applications. Previous research on self-timed techniques has mostly focused on the hardware layer, and correctly synthesizing self-timed systems remains difficult. In particular, the problem of how to configure a self-timed ring structure to achieve the maximal throughput with no deadlock is still unsolved. A self-timed ring (STR) is composed of a ring of connected “stages”, each consisting of a processing element, communication units, and its current state. The configuration of an STR is determined by the initial state of each stage and the number of buffers inserted into the ring; a correct configuration maintains the correct behavior of applications on the STR. This paper establishes a series of theorems based on the properties of self-timed structures. Based on these theorems, the initial states and buffers can be set so that a correct configuration is guaranteed. The theorems also yield mathematical formulas for calculating the throughput of an STR. The algorithms presented in the paper find the optimal initial configuration of an STR, one that achieves the maximum throughput with the minimum number of inserted buffers. The experimental results show that the throughput of applications mapped onto an STR with the optimal configuration is improved by 64.99% on average compared with a synchronous system.
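The dependence of throughput and deadlock on the initial state can be illustrated with a deliberately simplified toy model. The sketch below is not the stage model, the theorems, or the algorithms of the paper: it assumes each stage holds either a data token or an empty slot (a "bubble"), that a token advances one stage per step only when the next stage is empty, and that all enabled moves happen at once. The function names `step` and `measure_throughput` are hypothetical.

```python
def step(ring):
    """Advance every token whose successor stage is empty.

    `ring` is a list of booleans: True = token, False = bubble.
    Returns the new ring and the number of tokens that moved.
    """
    n = len(ring)
    new_ring = list(ring)
    moves = 0
    for i in range(n):
        j = (i + 1) % n  # successor stage, wrapping around the ring
        if ring[i] and not ring[j]:
            new_ring[i] = False
            new_ring[j] = True
            moves += 1
    return new_ring, moves


def measure_throughput(ring, steps=100):
    """Average number of token moves per stage per step (toy metric)."""
    total = 0
    for _ in range(steps):
        ring, moves = step(ring)
        total += moves
    return total / (steps * len(ring))


if __name__ == "__main__":
    full = [True, True, True, True]            # no bubble: nothing can move
    sparse = [True, False, False, False]       # one token circulating
    balanced = [True, False, True, False]      # tokens and bubbles alternate
    for name, ring in [("full", full), ("sparse", sparse), ("balanced", balanced)]:
        print(name, measure_throughput(ring))
```

In this toy model a ring with no empty stage never makes progress, while the rate of the other two rings differs only because of how many tokens the initial state places in the ring. It says nothing about processing delays or inserted buffers, which the paper's analysis accounts for.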

Acknowledgments

This work is partially supported by Chongqing High-Tech Research Program csct2012ggC40005, National 863 Program 2013AA013202, NSFC 61173014, NSFC 61472052.

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, China
Weiwen Jiang, Qingfeng Zhuge, Xianzhang Chen, Lei Yang, Juan Yi & Edwin H.-M. Sha
Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing, China
Qingfeng Zhuge & Edwin H.-M. Sha

Corresponding author

Correspondence to Qingfeng Zhuge.

About this article

Cite this article

Jiang, W., Zhuge, Q., Chen, X. et al. Properties of Self-Timed Ring Architectures for Deadlock-Free and Consistent Configuration Reaching Maximum Throughput. J Sign Process Syst 84, 123–137 (2016). https://doi.org/10.1007/s11265-015-0984-6

Received: 15 October 2014
Revised: 01 February 2015
Accepted: 16 February 2015
Published: 06 March 2015
Issue Date: July 2016
DOI: https://doi.org/10.1007/s11265-015-0984-6
