research-article

Combining computation and communication optimizations in system synthesis for streaming applications

Authors:

Jason Cong,

Muhuan Huang,

Peng ZhangAuthors Info & Claims

FPGA '14: Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays

Pages 213 - 222

https://doi.org/10.1145/2554688.2554771

Published: 26 February 2014 Publication History

Get Access

Abstract

Data streaming is a widely-used technique to exploit task-level parallelism in many application domains such as video processing, signal processing and wireless communication. In this paper we propose an efficient system-level synthesis flow to map streaming applications onto FPGAs with consideration of simultaneous computation and communication optimizations. The throughput of a streaming system is significantly impacted by not only the performance and number of replicas of the computation kernels, but also the buffer size allocated for the communications between kernels. In general, module selection/replication and buffer size optimization were addressed separately in previous work. Our approach combines these optimizations together in system scheduling which minimizes the area cost for both logic and memory under the required throughput constraint. We first propose an integer linear program (ILP) based solution to the combined problem which has the optimal quality of results. Then we propose an iterative algorithm which can achieve the near-optimal quality of results but has a significant improvement on the algorithm scalability for large and complex designs. The key contribution is that we have a polynomial-time algorithm for an exact schedulability checking problem and a polynomial-time algorithm to improve the system performance with better module implementation and buffer size optimization. Experimental results show that compared to the separate scheme of module select/replication and buffer size optimization, the combined optimization scheme can gain 62% area saving on average under the same performance requirements. Moreover, our heuristic can achieve 2 to 3 orders of magnitude of speed-up in runtime, with less than 10% area overhead compared to the optimal solution by ILP.

References

[1]

E. Lee and D. Messerschmitt, "Synchronous data flow," Proceedings of the IEEE, vol. 75, no. 9, pp. 1235--1245, 1987.

Abstract

References

Cited By

Index Terms

Recommendations

Combining module selection and resource sharing for efficient FPGA pipeline synthesis

SystemCoDesigner—an automatic ESL synthesis approach by design space exploration and behavioral synthesis for streaming applications

ILP-based cost-optimal DSP synthesis with module selection and data format conversion

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations