DOI: 10.1145/3240765.3240834
research-article

Optimizing Data Layout and System Configuration on FPGA-based Heterogeneous Platforms

Published: 05 November 2018

Abstract

The most attractive feature of field-programmable gate arrays (FPGAs) is their configuration flexibility. However, if configuration is performed manually, this flexibility places a heavy burden on system designers, who must choose among a vast number of configuration parameters and program transformations. In this paper, we advance the state of the art with two main innovations. First, we apply compiler-automated transformations to the data layout and program statements to create streaming accesses. When the kernels are implemented in hardware, these accesses become streaming interfaces, which allow the kernels to run efficiently. Second, we use a two-step mixed integer programming formulation that first minimizes execution time and then minimizes energy dissipation. Configuration parameters are chosen automatically, including several important ones that existing models omit. Experimental results show that these techniques yield significant performance gains and energy savings.
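The two-step optimization described above is a lexicographic scheme: find the minimum achievable execution time, then pick the lowest-energy configuration among those that achieve it. The paper formulates both steps as mixed integer programs; the sketch below instead uses brute-force enumeration over a tiny design space so that it runs without a MIP solver. Everything concrete in it is a hypothetical placeholder, not the authors' model: the parameters `unroll` and `freq_mhz`, the toy time and energy models, the resource limit `MAX_UNROLL`, and the optional `slack` tolerance.

```python
from itertools import product
from typing import Callable, Iterable, NamedTuple, Tuple


class Config(NamedTuple):
    unroll: int    # hypothetical loop-unroll factor for the FPGA kernel
    freq_mhz: int  # hypothetical fabric clock frequency in MHz


# --- Toy cost models (hypothetical placeholders, not the paper's models) ---
WORK_ITERS = 1_200_000   # loop iterations the kernel must execute
MEM_TIME_US = 2000.0     # lower bound on run time imposed by memory bandwidth
MAX_UNROLL = 4           # stand-in for an FPGA resource limit (e.g., DSP count)


def exec_time_us(c: Config) -> float:
    # One iteration per cycle per unrolled lane; cycles / MHz = microseconds.
    compute_us = WORK_ITERS / c.unroll / c.freq_mhz
    # The kernel cannot finish faster than memory can stream its data.
    return max(compute_us, MEM_TIME_US)


def energy_uj(c: Config) -> float:
    # Static power plus dynamic power growing with lanes x frequency (watts);
    # watts x microseconds = microjoules.
    power_w = 0.5 + 0.001 * c.unroll * c.freq_mhz
    return power_w * exec_time_us(c)


def feasible(c: Config) -> bool:
    return c.unroll <= MAX_UNROLL


def two_step(
    configs: Iterable[Config],
    time_fn: Callable[[Config], float],
    energy_fn: Callable[[Config], float],
    is_feasible: Callable[[Config], bool],
    slack: float = 0.0,
) -> Tuple[Config, float]:
    candidates = [c for c in configs if is_feasible(c)]
    if not candidates:
        raise ValueError("no feasible configuration")
    # Step 1: find the minimum achievable execution time.
    t_star = min(time_fn(c) for c in candidates)
    # Step 2: among configurations within (1 + slack) of that time,
    # choose the one with the lowest energy.
    near_optimal = [c for c in candidates if time_fn(c) <= t_star * (1.0 + slack)]
    return min(near_optimal, key=energy_fn), t_star


space = [Config(u, f) for u, f in product([1, 2, 4, 8], [100, 150, 200])]
best, t_star = two_step(space, exec_time_us, energy_uj, feasible)
print(best, t_star)  # Config(unroll=4, freq_mhz=150) 2000.0
```

In this toy space the kernel is memory-bound at the optimum, so raising the clock from 150 MHz to 200 MHz does not shorten execution but does increase power; the second step therefore keeps the lower frequency. With an actual MIP solver, one common way to realize the second step is to re-solve the model with energy as the objective and an added constraint such as T <= (1 + slack) * T*, where T* is the optimum from the first step.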


Cited By

  • (2022) FPGA sharing in the cloud: a comprehensive analysis. Frontiers of Computer Science 17, 5. DOI: 10.1007/s11704-022-2127-0. Online publication date: 24 Dec 2022.


Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2018
939 pages

Publisher

IEEE Press


Article Metrics

• Downloads (last 12 months): 0
• Downloads (last 6 weeks): 0
• Reflects downloads up to 25 Feb 2025
