ABSTRACT
This paper describes architectural enhancements in the Altera Stratix™ 10 HyperFlex™ FPGA architecture, fabricated in the Intel 14nm FinFET process. Stratix 10 includes ubiquitous flip-flops in the routing to enable a high degree of pipelining. In contrast to the earlier architectural exploration of pipelining in pass-transistor-based architectures, the direct-drive routing fabric in Stratix-style FPGAs enables an extremely low-cost pipeline register. The presence of ubiquitous flip-flops simplifies circuit retiming and improves performance. The availability of predictable retiming affects all stages of the cluster, place, and route flow. Ubiquitous flip-flops require a low-cost clock network with sufficient flexibility to enable pipelining of dozens of clock domains. Different cost/performance tradeoffs in a pipelined fabric, together with the use of a 14nm process, lead to other modifications to the routing fabric and the logic element. User modification of the design enables even higher performance, averaging 2.3X faster in a small set of designs.
Index Terms
- The Stratix™ 10 Highly Pipelined FPGA Architecture