Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The dataflow architecture, which lacks redundant unified control logic, has been shown to outperform the control-flow architecture in both computational performance and power efficiency, especially for applications used in high-performance computing (HPC). Systems using the dataflow architecture achieve their high computational efficiency by allowing program kernels to be activated simultaneously. A proper acknowledgment mechanism is therefore required to distinguish data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism, in which data is sent before acknowledgments are received and retried after rejection, and the handshake mechanism, in which data is sent only after acknowledgments are received. However, both mechanisms suffer from inefficient data transfer and increased area cost, and good performance of the dataflow architecture depends on the efficiency of data transfer. To optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that LAA increases the average utilization of computational units by 23.9%, reduces the average execution time by 17.4%, and increases the average power efficiency of dataflow processors by 22.4%. Crucially, our approach increases the area and power consumption of the on-chip logic by less than 0.9%. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.
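
The abstract does not describe how LAA is implemented, so the following is only a toy timing model of the general idea, not the authors' design. It assumes one producer and one consumer with a single-entry operand buffer, a fixed one-way link latency, and a fixed compute time per operand; the function name `finish_time` and the parameters `link`, `compute`, and `lead` are all hypothetical. With `lead=0` the consumer acknowledges only after finishing an operand (a plain handshake); with `lead>0` it acknowledges that many cycles early, clamped so the next operand can never arrive before the buffer is free.

```python
def finish_time(n_items, link, compute, lead=0):
    """Cycle at which the consumer finishes the last of n_items operands.

    link    -- one-way latency between producer and consumer (assumed fixed)
    compute -- cycles the consumer spends on each operand (assumed fixed)
    lead    -- cycles before finishing at which the ack is issued
               (0 models a plain handshake; >0 models acknowledging ahead)
    """
    if n_items <= 0:
        return 0
    # Clamp the lead: the ack cannot precede the operand's arrival (lead <= compute),
    # and the next operand must not arrive before the buffer is free (lead <= 2 * link).
    lead = max(0, min(lead, compute, 2 * link))
    send = 0      # cycle at which the producer sends the current operand
    finish = 0    # cycle at which the consumer finished the previous operand
    for _ in range(n_items):
        arrive = send + link
        start = max(arrive, finish)
        finish = start + compute
        ack_sent = finish - lead
        send = ack_sent + link  # producer sends the next operand on receiving the ack
    return finish


if __name__ == "__main__":
    n, link, compute = 4, 2, 4
    for lead in (0, 4):
        total = finish_time(n, link, compute, lead)
        utilization = n * compute / total
        print(f"lead={lead}: finish at cycle {total}, utilization {utilization:.2f}")
```

In this toy setting the handshake pays a full round trip of link latency between consecutive operands, while acknowledging ahead overlaps that round trip with computation. The figures it prints are artifacts of the assumed parameters and are unrelated to the results reported in the abstract.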



Author information

Corresponding author

Correspondence to Xiao-Chun Ye.

About this article

Cite this article

Feng, YJ., Li, DJ., Tan, X. et al. Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism. J. Comput. Sci. Technol. 37, 942–959 (2022). https://doi.org/10.1007/s11390-020-0555-6

  • DOI: https://doi.org/10.1007/s11390-020-0555-6
