Abstract
Dataflow architectures are a promising parallel computing platform offering high performance, efficiency, and flexibility. A dataflow mapping algorithm offloads programs onto dataflow hardware and has a significant impact on the performance of the architecture. Mapping methods in previous studies are rarely efficient because they seldom consider the requirements of routing resources. In this paper, we propose a routing-aware mapping algorithm that combines hardware resource and dataflow graph characteristics to explore better mapping schemes. When mapping a node, our method first considers the influence of its predecessor and successor nodes, and then jointly considers the competition for computing resources and the routing cost to find the mapping solution with the lowest overhead. Experiments demonstrate that our method achieves up to 2.06\(\times\) performance improvement and a 12.8% reduction in energy consumption compared with state-of-the-art methods.
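To make the idea concrete, the following is a minimal illustrative sketch of a greedy routing-aware placement, not the algorithm from the paper. It assumes a 2D mesh of processing elements (PEs), Manhattan distance as a stand-in for routing cost, and a per-PE node count as a stand-in for competition over computing resources. The function names, the `capacity` limit, and the `load_weight` parameter are all hypothetical.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two PEs on a 2D mesh (assumed routing-cost proxy)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_dataflow_graph(nodes, edges, width, height, capacity=1, load_weight=1.0):
    """Greedily place each node on the PE with the lowest estimated overhead.

    Nodes are placed in the order given. Every predecessor or successor that
    is already placed contributes to the routing term of a candidate PE.
    """
    # Collect predecessors and successors of every node in one adjacency map.
    neighbors = {n: [] for n in nodes}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    pes = list(product(range(width), range(height)))
    load = {pe: 0 for pe in pes}  # nodes already assigned to each PE
    placement = {}

    for node in nodes:
        placed = [placement[m] for m in neighbors[node] if m in placement]
        best_pe, best_cost = None, None
        for pe in pes:
            if load[pe] >= capacity:
                continue  # PE has no free slot (hypothetical capacity model)
            routing = sum(manhattan(pe, q) for q in placed)
            cost = routing + load_weight * load[pe]
            if best_cost is None or cost < best_cost:
                best_pe, best_cost = pe, cost
        if best_pe is None:
            raise ValueError("not enough PE capacity for the graph")
        placement[node] = best_pe
        load[best_pe] += 1
    return placement


def total_routing_cost(placement, edges):
    """Sum of hop counts over all dataflow edges for a given placement."""
    return sum(manhattan(placement[s], placement[d]) for s, d in edges)
```

For a small diamond-shaped graph (`a` feeding `b` and `c`, which both feed `d`) on a 2x2 mesh with one node per PE, this sketch places every producer and consumer on adjacent PEs. A real mapper would also need to model link contention and instruction timing, which this proxy ignores.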
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61732018 and 61872335), the Austrian-Chinese Cooperative R&D Project (FFG and CAS) (Grant No. 171111KYSB20200002), the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-029), the CAS Project for Youth Innovation Promotion Association, Zhejiang Lab (Grant No. 2022PB0AB01), and the 2020 State Grid Corporation of China project “Integration Technology Research and Prototype Development for High End Controller Chip” (Grant No. 5700-202041264A-0-0-00).
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Fan, Z., Li, W., Liu, T., An, X., Ye, X., Fan, D. (2022). A Routing-Aware Mapping Method for Dataflow Architectures. In: Liu, S., Wei, X. (eds) Network and Parallel Computing. NPC 2022. Lecture Notes in Computer Science, vol 13615. Springer, Cham. https://doi.org/10.1007/978-3-031-21395-3_1
DOI: https://doi.org/10.1007/978-3-031-21395-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21394-6
Online ISBN: 978-3-031-21395-3
eBook Packages: Computer Science (R0)