Efficient Operator Sharing Modulo Scheduling for Sum-Product Network Inference on FPGAs

Kruppe, Hanna; Sommer, Lukas; Weber, Lukas; Oppermann, Julian; Axenie, Cristian; Koch, Andreas

doi:10.1007/978-3-031-04580-6_16

Conference paper
First Online: 27 April 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13227)

Abstract

Probabilistic models are receiving increasing attention as a complementary alternative to more widespread machine learning approaches such as neural networks. One particularly interesting class of models, so-called Sum-Product Networks (SPNs), combines the expressiveness of probabilistic models with tractable inference, making them a promising candidate for use in real-world applications.

Previously, inference in SPNs has successfully been accelerated by fully pipelined FPGA-based hardware. However, with these approaches, the maximum size of the SPN for FPGA acceleration has effectively been limited by the fully spatial mapping of arithmetic operations into hardware and the number of available resources in the FPGA.

In this work, we present an extended and specialized modulo scheduling algorithm based on Integer Linear Programming (ILP) for time-multiplexed sharing of hardware arithmetic operators in the SPN inference accelerator. In addition, to scale the scheduling to large SPN graphs, we combine the scheduling algorithm with a graph-partitioning heuristic that exploits the graph structure of SPNs.
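
To make the idea of time-multiplexed operator sharing concrete, the following is a minimal sketch of the two conditions a modulo schedule has to satisfy: data dependencies and the modulo resource constraint. It is not the ILP formulation from the paper. The function name, the toy graph, the latencies, and the assumption that every shared operator is fully pipelined (it can accept one new operation per cycle) are all illustrative assumptions.

```python
from collections import Counter


def is_valid_modulo_schedule(ops, deps, start, latency, units, ii):
    """Check a candidate modulo schedule (hypothetical helper, not from the paper).

    ops:     dict mapping operation name -> operator type
    deps:    list of (producer, consumer) edges of the dataflow graph
    start:   dict mapping operation name -> start cycle
    latency: dict mapping operator type -> latency in cycles
    units:   dict mapping operator type -> number of shared hardware units
    ii:      initiation interval (a new sample enters every ii cycles)
    """
    # Data dependencies: a consumer may not start before its producer is done.
    for src, dst in deps:
        if start[dst] < start[src] + latency[ops[src]]:
            return False
    # Modulo resource constraint: operations of one type whose start cycles
    # are congruent modulo ii compete for the same units in the same cycle,
    # because the schedule repeats every ii cycles.
    usage = Counter((ops[o], start[o] % ii) for o in ops)
    return all(count <= units[typ] for (typ, _slot), count in usage.items())


# Toy SPN fragment (assumed): two products feed a sum, which feeds a product.
ops = {"m1": "mul", "m2": "mul", "s": "add", "m3": "mul"}
deps = [("m1", "s"), ("m2", "s"), ("s", "m3")]
latency = {"mul": 2, "add": 3}   # assumed latencies
units = {"mul": 1, "add": 1}     # one shared multiplier, one shared adder

good = {"m1": 0, "m2": 1, "s": 3, "m3": 8}
print(is_valid_modulo_schedule(ops, deps, good, latency, units, ii=3))
```

In this toy case the three multiplications occupy the slots 0, 1 and 2 modulo 3 on the single multiplier. Moving `m3` to cycle 6 would satisfy the dependencies but collide with `m1` in slot 0, so the check would fail.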

The combination of heuristic graph partitioning and ILP-based scheduling allows generating pipelined accelerators with the best possible initiation interval, while limiting the resource utilization to pre-set bounds. The evaluation discusses the effect that different parameters have on convergence time and solution quality. A performance comparison shows that the FPGA improves the inference throughput over a comparable CPU and GPU platform by a geometric-mean factor of 4.4x and 1.7x, respectively.
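
The trade-off between resource bounds and initiation interval can be illustrated with the standard resource-constrained lower bound from the modulo scheduling literature: with a given number of units per operator type, the initiation interval can be no smaller than the largest ratio of operations to units, rounded up. The sketch below assumes fully pipelined operators and a feed-forward graph without loop-carried dependencies; the function name and the operation counts are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import ceil


def resource_min_ii(op_types, units):
    """Lower bound on the initiation interval imposed by resource limits.

    op_types: iterable with one operator type per operation in the graph
    units:    dict mapping operator type -> allowed number of hardware units
    """
    counts = Counter(op_types)
    # Each unit can start at most one operation per cycle, so an operator
    # type with n operations and u units needs at least ceil(n / u) cycles.
    return max(ceil(n / units[t]) for t, n in counts.items())


# Assumed example: an SPN with 6 multiplications and 3 additions.
spn_ops = ["mul"] * 6 + ["add"] * 3
print(resource_min_ii(spn_ops, {"mul": 2, "add": 1}))  # 3
print(resource_min_ii(spn_ops, {"mul": 4, "add": 3}))  # 2
```

Raising the resource bounds lowers the achievable initiation interval and therefore raises throughput. A scheduler then has to find start times that actually meet this bound.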

The authors would like to thank Xilinx Inc. for supporting their work by donations of hard- and software. Calculations for this research were conducted on the Lichtenberg high performance computer of TU Darmstadt. This research was partially funded by the German Federal Ministry for Education and Research (BMBF) with the funding ID ZN 01|S17050.

Notes

1. Specifically, variables \(\hat{x}_{i j}, \hat{y}_{i j}, \hat{z}_{i v}\) and all constraints mentioning them.
2. https://github.com/esa-tu-darmstadt/tapasco

Author information

Authors and Affiliations

Embedded Systems and Applications Group, Technical University Darmstadt, Darmstadt, Germany
Hanna Kruppe, Lukas Sommer, Lukas Weber, Julian Oppermann & Andreas Koch
Intelligent Cloud Technologies Laboratory, Huawei Munich Research Center, Munich, Germany
Cristian Axenie

Corresponding author

Correspondence to Hanna Kruppe.

Editor information

Editors and Affiliations

University of California, San Diego, La Jolla, CA, USA
Alex Orailoglu
Fraunhofer IESE, Kaiserslautern, Germany
Matthias Jung
Brandenburg University of Technology, Cottbus, Germany
Marc Reichenbach

About this paper

Cite this paper

Kruppe, H., Sommer, L., Weber, L., Oppermann, J., Axenie, C., Koch, A. (2022). Efficient Operator Sharing Modulo Scheduling for Sum-Product Network Inference on FPGAs. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2021. Lecture Notes in Computer Science, vol 13227. Springer, Cham. https://doi.org/10.1007/978-3-031-04580-6_16

DOI: https://doi.org/10.1007/978-3-031-04580-6_16
Published: 27 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04579-0
Online ISBN: 978-3-031-04580-6
eBook Packages: Computer Science (R0)
