Abstract
In this chapter, we propose Fluid Pipelines, an evolution of the chip design process that allows efficient late pipeline transformations. In a conventional chip design flow, pipeline depth and cycle time are fixed early. However, their impact can only be assessed once the implementation is mostly complete, at which point any change to the pipeline design is impractical. Elastic Systems are latency insensitive and allow the pipeline depth to be changed late in the design process with little design effort, but they incur a significant throughput penalty when new stages are added in the presence of pipeline loops. Fluid Pipelines allow pipeline transformations without this throughput penalty. Formally, we introduce “or-causality” in addition to the “and-causality” already present in Elastic Systems. This gives more flexibility than was previously possible, at the cost of requiring the designer to specify the intended behavior of the circuit. In an Out-of-Order core benchmark, Fluid Pipelines improve the optimal energy-delay point, improving both performance (by 17%) and energy (by 13%). We envision a scenario in which tools generate different pipeline configurations, e.g., low power or high performance, from the same RTL.
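The two ideas in the abstract, the throughput penalty of adding stages to a loop and the difference between and-causality and or-causality, can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names are hypothetical, the throughput formula is the standard tokens-per-stage bound for a single loop in a latency-insensitive system, and the firing rules are the generic definitions of and-/or-causality rather than the Fluid Pipelines implementation.

```python
from fractions import Fraction
from typing import Sequence


def loop_throughput_bound(tokens: int, stages: int) -> Fraction:
    """Upper bound on the sustained throughput (tokens per cycle) of one loop.

    Assumption: a loop holding `tokens` valid data items spread over `stages`
    registers cannot complete more than tokens/stages iterations per cycle,
    and never more than one per cycle.
    """
    if stages <= 0:
        raise ValueError("a loop needs at least one stage")
    if tokens < 0:
        raise ValueError("token count cannot be negative")
    return min(Fraction(1), Fraction(tokens, stages))


def can_fire_and(inputs_valid: Sequence[bool]) -> bool:
    """And-causality: the block fires only when every input holds valid data."""
    return all(inputs_valid)


def can_fire_or(inputs_valid: Sequence[bool]) -> bool:
    """Or-causality: the block may fire as soon as any one input holds valid data.

    Which input is sufficient in a given situation is part of the intended
    behavior that the designer has to specify.
    """
    return any(inputs_valid)


if __name__ == "__main__":
    # One token circulating in a one-stage loop: full throughput.
    print(loop_throughput_bound(tokens=1, stages=1))
    # Adding one empty stage to the same loop halves the bound.
    print(loop_throughput_bound(tokens=1, stages=2))
    # A block with one valid and one pending input.
    print(can_fire_and([True, False]), can_fire_or([True, False]))
```

The sketch only shows why an extra stage in a loop costs throughput under and-causality and what the relaxed firing condition looks like; how Fluid Pipelines use or-causality to avoid the penalty is the subject of the chapter itself.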
Notes
1. Cycles in the graph representing the connections between registers, not to be confused with program loops.
2. In practice, a circuit implementing elasticity will most likely be deterministic depending on the input set, but this is not a formal requirement of the Elastic Systems specification.
3. Other equivalent naming conventions have been used, e.g., Elasticity has been expressed in terms of FIFO operation [23].
4. We note that inserting pipeline stages was proposed in synchronous circuits [12], but breaks the cycle accuracy of the circuit and should be used with care.
5.
6. Only the benchmarks that do not require Fortran were used.
References
1. D. Baudisch, K. Schneider, Evaluation of speculation in out-of-order execution of synchronous dataflow networks. Int. J. Parallel Program. 43(1), 86–129 (2015). doi:10.1007/s10766-013-0277-2
2. D. Bufistov, J. Cortadella, M. Galceran-Oms, J. Julvez, M. Kishinevsky, Retiming and recycling for elastic systems with early evaluation, in 46th Design Automation Conference (2009), pp. 288–291
3. B. Cao, K. Ross, M. Kim, S. Edwards, Implementing latency-insensitive dataflow blocks, in Proceedings of the 13th ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE’15 (2015)
4. L.P. Carloni, A.L. Sangiovanni-Vincentelli, Performance analysis and optimization of latency insensitive systems, in Proceedings of the 37th Design Automation Conference (ACM, New York, NY, 2000), pp. 361–367. doi:10.1145/337292.337441
5. L.P. Carloni, K. McMillan, A. Saldanha, A. Sangiovanni-Vincentelli, A methodology for correct-by-construction latency-insensitive design, in International Conference on Computer-Aided Design (1999), pp. 309–315
6. L.F. Chao, A. LaPaugh, E.M. Sha, Rotation scheduling: a loop pipelining algorithm. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 16(3), 229–239 (1997). doi:10.1109/43.594829
7. N. Choudhary, B. Dwiel, E. Rotenberg, A physical design study of FabScalar-generated superscalar cores, in 2012 IEEE/IFIP 20th International Conference on VLSI and System-on-Chip (VLSI-SoC) (2012), pp. 165–170. doi:10.1109/VLSI-SoC.2012.6379024
8. J. Cortadella, M. Kishinevsky, B. Grundmann, SELF: specification and design of synchronous elastic circuits, in Proceedings of the ACM/IEEE International Workshop on Timing Issues, TAU 06 (2006)
9. J. Cortadella, M. Galceran-Oms, M. Kishinevsky, Elastic systems, in Proceedings of the 8th ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE ’10 (2010), pp. 149–158
10. J. Cortadella, M. Galceran-Oms, M. Kishinevsky, S.S. Sapatnekar, RTL synthesis: from logic synthesis to automatic pipelining. Proc. IEEE 103(11), 2061–2075 (2015). doi:10.1109/JPROC.2015.2456189
11. G. Dimitrakopoulos, I. Seitanidis, A. Psarras, K. Tsiouris, P.M. Mattheakis, J. Cortadella, Hardware primitives for the synthesis of multithreaded elastic systems, in 2014 Design, Automation Test in Europe Conference Exhibition (DATE) (2014), pp. 1–4. doi:10.7873/DATE.2014.314
12. I. Ganusov, H. Fraisse, A. Ng, R.T. Possignolo, S. Das, Automated extra pipeline analysis of applications mapped to Xilinx UltraScale+ FPGAs, in Proceedings of the 26th Conference on Field Programmable Logic and Applications (FPL) (2016)
13. M. Hrishikesh, D. Burger, N.P. Jouppi, K.I. Farkas, P. Shivakumar, The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, in Proceedings of the 29th International Symposium on Computer Architecture (2002)
14. Y. Huang, P. Ienne, O. Temam, Y. Chen, C. Wu, Elastic CGRAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (ACM, New York, NY, 2013), pp. 171–180. doi:10.1145/2435264.2435296
15. K. Jensen, L.M. Kristensen, Coloured Petri Nets Modelling and Validation of Concurrent Systems (Springer, Berlin, Heidelberg, 2009)
16. J. Julvez, J. Cortadella, M. Kishinevsky, Performance analysis of concurrent systems with early evaluation, in International Conference on Computer-Aided Design (2006), pp. 448–455. doi:10.1109/ICCAD.2006.320155
17. K.E. Ardestani, J. Renau, ESESC: a fast multicore simulator using time-based sampling, in International Symposium on High Performance Computer Architecture, HPCA’19 (2013)
18. C.E. Leiserson, J.B. Saxe, Retiming synchronous circuitry. Algorithmica 6, 5–35 (1991)
19. S. Li, J. Ahn, R. Strong, J. Brockman, D. Tullsen, N. Jouppi, McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures, in 42nd IEEE/ACM Int’l Symp. on Microarchitecture (IEEE, New York, 2009), pp. 469–480
20. M. Oskin, F. Chong, M. Farrens, HLS: combining statistical and symbolic simulation to guide microprocessor designs, in International Symposium on Computer Architecture, Vancouver (2000), pp. 71–82
21. R.T. Possignolo, E. Ebrahimi, H. Skinner, J. Renau, FluidPipelines: elastic circuitry meets out-of-order execution, in Proceedings of the 34th International Conference on Computer Design (ICCD) (2016)
22. R.T. Possignolo, E. Ebrahimi, H. Skinner, J. Renau, FluidPipelines: elastic circuitry without throughput penalty, in Proceedings of the 2016 International Workshop on Logic Synthesis (IWLS) (2016)
23. M. Vijayaraghavan, A. Arvind, Bounded dataflow networks and latency-insensitive circuits, in Proceedings of the 7th IEEE/ACM International Conference on Formal Methods and Models for Codesign (IEEE, Piscataway, NJ, 2009), pp. 171–180
Acknowledgements
This work was supported in part by the National Science Foundation under grants CNS-1059442-003, CNS-1318943-001, CCF-1337278, and CCF-1514284. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the NSF.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Possignolo, R.T., Ebrahimi, E., Skinner, H., Renau, J. (2018). Automated Pipeline Transformations with Fluid Pipelines. In: Reis, A., Drechsler, R. (eds) Advanced Logic Synthesis. Springer, Cham. https://doi.org/10.1007/978-3-319-67295-3_6
DOI: https://doi.org/10.1007/978-3-319-67295-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67294-6
Online ISBN: 978-3-319-67295-3
eBook Packages: Engineering (R0)