Abstract
Probabilistic models are receiving increasing attention as a complement to more widespread machine learning approaches such as neural networks. One particularly interesting class of models, so-called Sum-Product Networks (SPNs), combines the expressiveness of probabilistic models with tractable inference, making it a promising candidate for use in real-world applications.
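To make "tractable inference" concrete, the following is a minimal sketch of bottom-up SPN evaluation: leaves return a probability for the observed value, product nodes multiply their children, and sum nodes take a weighted sum. The tuple-based node encoding, the `evaluate` function, and the two-variable example network are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import prod

def evaluate(node, evidence):
    """Evaluate an SPN bottom-up for one fully observed sample.

    Node encoding (an assumption made for this sketch):
      ("leaf", var, probs)         -> probs[evidence[var]]
      ("product", [children])      -> product of the child values
      ("sum", [(weight, child)])   -> weighted sum of the child values
    """
    kind = node[0]
    if kind == "leaf":
        _, var, probs = node
        return probs[evidence[var]]
    if kind == "product":
        return prod(evaluate(child, evidence) for child in node[1])
    if kind == "sum":
        return sum(w * evaluate(child, evidence) for w, child in node[1])
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical SPN over two binary variables X0 and X1:
# a mixture (sum node) of two factorized components (product nodes).
spn = ("sum", [
    (0.6, ("product", [("leaf", 0, [0.9, 0.1]), ("leaf", 1, [0.2, 0.8])])),
    (0.4, ("product", [("leaf", 0, [0.3, 0.7]), ("leaf", 1, [0.5, 0.5])])),
])

# Joint probability P(X0=0, X1=1), computed in a single pass over the graph.
joint = evaluate(spn, {0: 0, 1: 1})
```

Each node is a single multiplication or weighted addition, so one inference pass costs time linear in the number of edges of the graph. These per-node arithmetic operations are what a hardware accelerator has to map to operators.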
Previously, inference in SPNs has successfully been accelerated by fully pipelined FPGA-based hardware. However, with these approaches, the maximum size of the SPN for FPGA acceleration has effectively been limited by the fully spatial mapping of arithmetic operations into hardware and the number of available resources in the FPGA.
In this work, we present an extended and specialized modulo scheduling algorithm based on Integer Linear Programming (ILP) for time-multiplexed sharing of hardware arithmetic operators in the SPN inference accelerator. To scale the scheduling to large SPN graphs, we additionally combine the scheduling algorithm with a graph-partitioning heuristic that exploits the graph structure of SPNs.
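The core resource constraint behind time-multiplexed operator sharing can be sketched as a check on a given modulo schedule: with initiation interval II, operations of the same type whose start times are congruent modulo II compete for the same operator instances. The sketch below is not the ILP formulation from the paper. It assumes fully pipelined operators that accept one new input per cycle, and the function and variable names are hypothetical.

```python
from collections import Counter

def respects_operator_limits(start, op_type, units, ii):
    """Check the modulo resource constraint of a schedule.

    start:   dict operation -> start cycle
    op_type: dict operation -> operator type (e.g. "mul", "add")
    units:   dict operator type -> number of available hardware operators
    ii:      initiation interval in cycles

    Assumption: every operator is fully pipelined, so an operation
    occupies its operator only in the cycle in which it starts.
    """
    # Count operations per (operator type, start cycle modulo II).
    usage = Counter((op_type[op], t % ii) for op, t in start.items())
    return all(count <= units[kind] for (kind, _slot), count in usage.items())

# Hypothetical example: four multiplications share two multipliers, II = 2.
op_type = {"m0": "mul", "m1": "mul", "m2": "mul", "m3": "mul"}
units = {"mul": 2}

ok = respects_operator_limits({"m0": 0, "m1": 0, "m2": 1, "m3": 1}, op_type, units, 2)
# Cycles 0 and 2 fall into the same modulo slot, so all four operations collide.
bad = respects_operator_limits({"m0": 0, "m1": 0, "m2": 2, "m3": 2}, op_type, units, 2)
```

An ILP-based scheduler would encode this condition as linear constraints and search over start times, rather than checking a fixed schedule after the fact.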
The combination of heuristic graph partitioning and ILP-based scheduling allows generating pipelined accelerators with the best possible initiation interval, while limiting the resource utilization to pre-set bounds. The evaluation discusses the effect that different parameters have on convergence time and solution quality. A performance comparison shows that the FPGA improves the inference throughput over comparable CPU and GPU platforms by geometric-mean factors of 4.4x and 1.7x, respectively.
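The trade-off between initiation interval and resource bounds can be illustrated with the standard resource-constrained lower bound from the modulo scheduling literature: with n operations of a type and k operators of that type, at least ceil(n / k) cycles must pass between consecutive samples. The sketch assumes fully pipelined operators and no loop-carried dependences between samples. The operation counts and operator bounds are made-up numbers, not results from the paper.

```python
from math import ceil

def resource_min_ii(op_counts, units):
    """Lower bound on the initiation interval imposed by operator sharing.

    op_counts: dict operator type -> number of operations in the graph
    units:     dict operator type -> pre-set bound on hardware operators
    """
    return max(ceil(op_counts[kind] / units[kind]) for kind in op_counts)

# Hypothetical SPN with 24 multiplications and 10 additions,
# limited to 8 multipliers and 4 adders.
min_ii = resource_min_ii({"mul": 24, "add": 10}, {"mul": 8, "add": 4})
```

Under these assumptions the accelerator can accept a new sample every `min_ii` cycles at best. Tightening the operator bounds raises this value, while a fully spatial mapping (one operator per operation) gives an initiation interval of 1.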
The authors would like to thank Xilinx Inc. for supporting their work by donations of hardware and software. Calculations for this research were conducted on the Lichtenberg high performance computer of TU Darmstadt. This research was partially funded by the German Federal Ministry for Education and Research (BMBF) with the funding ID ZN 01|S17050.
Notes
- 1. Specifically, variables \(\hat{x}_{ij}, \hat{y}_{ij}, \hat{z}_{iv}\) and all constraints mentioning them.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Kruppe, H., Sommer, L., Weber, L., Oppermann, J., Axenie, C., Koch, A. (2022). Efficient Operator Sharing Modulo Scheduling for Sum-Product Network Inference on FPGAs. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2021. Lecture Notes in Computer Science, vol 13227. Springer, Cham. https://doi.org/10.1007/978-3-031-04580-6_16
DOI: https://doi.org/10.1007/978-3-031-04580-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04579-0
Online ISBN: 978-3-031-04580-6
eBook Packages: Computer Science, Computer Science (R0)