Abstract
Multiprocessor System-on-Chip designs with self-timed operation are becoming increasingly attractive because they can exploit the high parallelism of applications. Previous research on self-timed techniques has focused mostly on the hardware layer. However, correctly synthesizing self-timed systems remains difficult. In particular, the problem of how to configure a self-timed ring structure to achieve maximal throughput without deadlock is still unsolved. A self-timed ring (STR) is composed of a ring of connected “stages”, each consisting of a processing element, communication units, and its current state. The correct configuration of an STR is determined by the initial state of each stage and the number of buffers inserted into the ring to maintain the correct behavior of applications on the STR. This paper establishes a series of theorems based on the properties of self-timed structures. Based on these theorems, the initial states and buffers can be set so as to guarantee a correct configuration. The theorems also yield formulas for calculating the throughput of an STR. The algorithms presented in the paper find the optimal initial configuration of an STR, one that achieves the maximum throughput with the minimum number of inserted buffers. Experimental results show that the throughput of applications mapped onto an STR with the optimal configuration is improved by 64.99% on average compared with a synchronous system.
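To give a feel for why the initial state of each stage matters, the following is a minimal toy sketch and not the model, theorems, or formulas of the paper. It assumes a unit-delay ring in which each stage either holds a data token (`True`) or is empty (`False`), and a token advances only when the next stage is empty. The function names `step` and `moves_per_step` are hypothetical and introduced here only for illustration.

```python
def step(ring):
    """Advance the toy ring by one unit-delay step.

    ring[i] is True if stage i holds a token, False if it is empty.
    A token moves from stage i to stage (i + 1) % n only if that
    next stage was empty at the start of the step (assumed rule).
    Returns the new ring and the number of tokens that moved.
    """
    n = len(ring)
    moves = [ring[i] and not ring[(i + 1) % n] for i in range(n)]
    nxt = list(ring)
    for i, moved in enumerate(moves):
        if moved:
            nxt[i] = False            # the source stage becomes empty
            nxt[(i + 1) % n] = True   # the next stage receives the token
    return nxt, sum(moves)


def moves_per_step(ring, steps=100):
    """Average number of token moves per step over a fixed horizon."""
    total = 0
    for _ in range(steps):
        ring, moved = step(ring)
        total += moved
    return total / steps


if __name__ == "__main__":
    full = [True, True, True, True]           # no empty stage: nothing can move
    one_gap = [True, True, True, False]       # one empty stage: one move per step
    alternating = [True, False, True, False]  # every token can move each step
    for name, ring in [("full", full), ("one_gap", one_gap),
                       ("alternating", alternating)]:
        print(name, moves_per_step(ring))
```

In this toy model a ring whose stages all start full never makes progress, while the same four stages with alternating initial states move two tokens every step. That is only meant to illustrate the kind of dependence on the initial configuration that the abstract describes.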










Acknowledgments
This work is partially supported by Chongqing High-Tech Research Program csct2012ggC40005, National 863 Program 2013AA013202, NSFC 61173014, NSFC 61472052.
Cite this article
Jiang, W., Zhuge, Q., Chen, X. et al. Properties of Self-Timed Ring Architectures for Deadlock-Free and Consistent Configuration Reaching Maximum Throughput. J Sign Process Syst 84, 123–137 (2016). https://doi.org/10.1007/s11265-015-0984-6
DOI: https://doi.org/10.1007/s11265-015-0984-6