Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism

Feng, Yu-Jing; Li, De-Jian; Tan, Xu; Ye, Xiao-Chun; Fan, Dong-Rui; Li, Wen-Ming; Wang, Da; Zhang, Hao; Tang, Zhi-Min

doi:10.1007/s11390-020-0555-6

Regular Paper
Published: 30 July 2022

Volume 37, pages 942–959 (2022)

Journal of Computer Science and Technology

Yu-Jing Feng¹,
De-Jian Li²,
Xu Tan¹,
Xiao-Chun Ye¹,
Dong-Rui Fan¹,³,
Wen-Ming Li¹,
Da Wang¹,
Hao Zhang¹ &
Zhi-Min Tang¹

Abstract

The dataflow architecture, which lacks redundant unified control logic, has been shown to have an advantage over the control-flow architecture in computational performance and power efficiency, especially for applications used in high-performance computing (HPC). The high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated simultaneously. A proper acknowledgment mechanism is therefore required to distinguish data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism, in which data is sent before acknowledgments are received and is retried after rejection, and the handshake mechanism, in which data is sent only after acknowledgments are received. However, both mechanisms suffer from inefficient data transfer and increased area cost, while good performance of the dataflow architecture depends on the efficiency of data transfer. To improve the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that LAA increases the average utilization of computational units by 23.9%, reduces the average execution time by 17.4%, and increases the average power efficiency of dataflow processors by 22.4%. The increase in the area and power consumption of the on-chip logic is less than 0.9%. These evaluation results suggest that LAA is an effective improvement for data transfer in existing dataflow architectures.
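As a rough illustration of why acknowledging ahead can shorten data transfer relative to a plain handshake, the following Python sketch compares the two under a toy timing model. It is not the authors' design or simulator: the single-slot receive buffer, the fixed one-way link latency, the fixed per-token consume time, and both function names are assumptions introduced here for illustration only.

```python
def handshake_cycles(n_tokens, link_latency, consume_cycles):
    """Toy model (assumption): the consumer acknowledges only once its
    single buffer slot is actually free, then the producer sends."""
    t = link_latency  # arrival time of the first token (slot starts empty)
    for _ in range(n_tokens - 1):
        slot_free = t + consume_cycles
        ack_sent = slot_free
        # The ack travels back, then the next token travels forward.
        t = ack_sent + 2 * link_latency
    return t + consume_cycles  # time at which the last token is consumed


def look_ahead_cycles(n_tokens, link_latency, consume_cycles):
    """Toy model (assumption): the consumer knows when its slot will be
    free and acknowledges one round trip early, never earlier than the
    arrival of the current token."""
    t = link_latency
    for _ in range(n_tokens - 1):
        slot_free = t + consume_cycles
        ack_sent = max(t, slot_free - 2 * link_latency)
        # The next token arrives at max(t + round trip, slot_free),
        # so it never lands in an occupied slot.
        t = ack_sent + 2 * link_latency
    return t + consume_cycles


if __name__ == "__main__":
    print(handshake_cycles(100, 2, 4))   # 798
    print(look_ahead_cycles(100, 2, 4))  # 402
```

In this toy model the handshake period per token is the consume time plus one round trip, while the look-ahead period is the larger of the two, so the round trip is hidden whenever the consumer is busy long enough. The figures printed above come from the toy model only and are unrelated to the 17.4% average reduction reported in the abstract.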

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Yu-Jing Feng, Xu Tan, Xiao-Chun Ye, Dong-Rui Fan, Wen-Ming Li, Da Wang, Hao Zhang & Zhi-Min Tang
Beijing Smartchip Microelectronics Technology Company Limited, Beijing, 100000, China
De-Jian Li
School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, 100190, China
Dong-Rui Fan

Corresponding author

Correspondence to Xiao-Chun Ye.

Supplementary Information

ESM 1

(PDF 126 kb)

About this article

Cite this article

Feng, YJ., Li, DJ., Tan, X. et al. Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism. J. Comput. Sci. Technol. 37, 942–959 (2022). https://doi.org/10.1007/s11390-020-0555-6

Received: 15 April 2020
Accepted: 17 December 2020
Published: 30 July 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11390-020-0555-6
