Abstract
We describe an automated pipelining approach that produces optimally balanced pipeline implementations, achieving low area cost while meeting timing requirements. Most previous automatic pipelining methods have focused on Instruction Set Architecture (ISA)-based designs, and their main goal has generally been to maximize performance as measured in instructions per clock (IPC). By contrast, we focus on datapath-oriented designs (e.g., DSP filters for image or communication processing applications) in ASIC design flows. The goal of the proposed approach is to find the pipelined design that meets the user-specified target clock frequency while minimizing the area cost of the given design. Unlike most previous approaches, the proposed methods use accurate area and timing information, obtained by synthesizing every interim pipelined design, to achieve higher accuracy during design exploration. Compared with exhaustive design exploration that considers all possible pipeline patterns, the two heuristic pipelining methods presented here incur only a small area penalty (typically under 5%) while dramatically reducing computational complexity. The methods are validated experimentally with commercial ASIC design tools and a 90 nm standard cell library on applications including polynomial function evaluation, FIR filters, matrix multiplication, and discrete wavelet transform filters.
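To make the trade-off between exhaustive and heuristic exploration concrete, the following is a minimal sketch, not the paper's actual method. It assumes a toy linear datapath in which a pipeline register can be placed at any point between blocks. The function `synthesize` is a hypothetical stand-in for a real synthesis run, and all delays, register widths, and area constants are invented for illustration. The greedy routine is one plausible heuristic; the abstract does not specify the two heuristics it refers to.

```python
from itertools import combinations

# Hypothetical toy datapath: a linear chain of combinational blocks.
DELAYS = [2.0, 3.0, 1.5, 2.5, 3.0]   # delay of each block in ns (assumed)
CUT_WIDTHS = [16, 32, 32, 24]        # register bits needed at each cut point (assumed)
REG_DELAY = 0.2                      # register timing overhead in ns (assumed)
BASE_AREA = 1000.0                   # area of the unpipelined logic (assumed)
AREA_PER_BIT = 5.0                   # area of one register bit (assumed)


def synthesize(cuts):
    """Stand-in for a synthesis run: return (clock_period, area) for a cut set.

    Cut point i places a pipeline register between block i and block i + 1.
    """
    cuts = sorted(cuts)
    bounds = [0] + [c + 1 for c in cuts] + [len(DELAYS)]
    stage_delays = [sum(DELAYS[a:b]) for a, b in zip(bounds, bounds[1:])]
    period = max(stage_delays) + REG_DELAY
    area = BASE_AREA + AREA_PER_BIT * sum(CUT_WIDTHS[c] for c in cuts)
    return period, area


def exhaustive(target):
    """Evaluate every pipeline pattern; return the min-area one meeting timing."""
    best = None
    points = range(len(CUT_WIDTHS))
    for k in range(len(CUT_WIDTHS) + 1):
        for cuts in combinations(points, k):
            period, area = synthesize(cuts)
            if period <= target and (best is None or area < best[1]):
                best = (cuts, area)
    return best


def greedy(target):
    """Add one register stage at a time, re-evaluating each interim design."""
    cuts = []
    period, area = synthesize(cuts)
    while period > target:
        candidates = [c for c in range(len(CUT_WIDTHS)) if c not in cuts]
        if not candidates:
            return None  # target clock period is not reachable
        # Choose the cut giving the shortest period, breaking ties by area.
        best_cut = min(candidates, key=lambda c: synthesize(cuts + [c]))
        cuts.append(best_cut)
        period, area = synthesize(cuts)
    return tuple(sorted(cuts)), area


if __name__ == "__main__":
    print("exhaustive:", exhaustive(6.0))
    print("greedy:    ", greedy(6.0))
```

In this toy model the exhaustive search evaluates all 2^4 cut patterns, whereas the greedy search evaluates only a few candidates per added stage. With a real synthesis tool behind `synthesize`, each evaluation is expensive, which is why reducing the number of evaluated designs matters.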