Automated Pipeline Transformations with Fluid Pipelines

Chapter in Advanced Logic Synthesis

Abstract

In this chapter, we propose Fluid Pipelines, an evolution of the chip design process that allows efficient late pipeline transformations. In a conventional chip design process, pipeline depth and cycle time are fixed early in the design flow. Their impact, however, can only be assessed once the implementation is mostly done, at which point any change to the pipeline design is impractical. Elastic Systems are latency insensitive and allow the pipeline depth to be changed late in the design process with little design effort, but they incur a significant throughput penalty when new stages are added in the presence of pipeline loops. Fluid Pipelines allow pipeline transformations without this throughput penalty. Formally, we introduce “or-causality” in addition to the “and-causality” already present in Elastic Systems. This gives more flexibility than was previously possible, at the cost of requiring the designer to specify the intended behavior of the circuit. In an Out-of-Order core benchmark, Fluid Pipelines improve the optimal energy-delay point, with gains in both performance (17%) and energy (13%). We envision a scenario in which tools can generate different pipeline configurations, e.g., low power or high performance, from the same RTL.
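The throughput penalty of adding stages inside a pipeline loop can be made concrete with a standard bound from marked-graph analysis. The sketch below is an illustration under that assumption, not the chapter's own model: a loop that holds `tokens` valid data items spread over `stages` single-cycle elastic stages sustains at most `tokens / stages` results per cycle, and the system is limited by its slowest loop. The function names are hypothetical.

```python
from fractions import Fraction


def loop_throughput_bound(tokens: int, stages: int) -> Fraction:
    """Upper bound on results per cycle for one pipeline loop.

    Assumes and-causality (every block waits for all of its inputs) and
    single-cycle elastic stages, so the loop can deliver at most
    tokens / stages results per cycle, capped at one per cycle.
    """
    if stages <= 0:
        raise ValueError("a loop needs at least one stage")
    if tokens < 0:
        raise ValueError("token count cannot be negative")
    return min(Fraction(1), Fraction(tokens, stages))


def system_throughput_bound(loops) -> Fraction:
    """Bound for a design: the minimum over all of its loops.

    `loops` is an iterable of (tokens, stages) pairs; a design with no
    loops is only limited by the one-result-per-cycle cap.
    """
    return min(
        (loop_throughput_bound(t, s) for t, s in loops),
        default=Fraction(1),
    )


if __name__ == "__main__":
    # A one-token loop with a single stage runs at full rate; each stage
    # added to the loop without adding a token lowers the bound.
    for stages in (1, 2, 3):
        print(stages, loop_throughput_bound(1, stages))
```

Under these assumptions, inserting one extra stage into a single-token loop halves the bound, which is the kind of penalty the abstract attributes to Elastic Systems.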

Notes

  1. Cycles in the graph representing the connections between registers, not to be confused with program loops.
  2. In practice, a circuit implementing elasticity will most likely be deterministic depending on the input set, but this is not a formal requirement of the Elastic Systems specification.
  3. Other equivalent naming conventions have been used, e.g., Elasticity has been expressed in terms of FIFO operation [23].
  4. We note that inserting pipeline stages was proposed in synchronous circuits [12], but breaks the cycle accuracy of the circuit and should be used with care.
  5. http://www.opencores.org.
  6. Only the benchmarks that do not require Fortran were used.

References

  1. D. Baudisch, K. Schneider, Evaluation of speculation in out-of-order execution of synchronous dataflow networks. Int. J. Parallel Program. 43(1), 86–129 (2015). doi:10.1007/s10766-013-0277-2
  2. D. Bufistov, J. Cortadella, M. Galceran-Oms, J. Julvez, M. Kishinevsky, Retiming and recycling for elastic systems with early evaluation, in 46th Design Automation Conference (2009), pp. 288–291
  3. B. Cao, K. Ross, M. Kim, S. Edwards, Implementing latency-insensitive dataflow blocks, in Proceedings of the 13th ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE’15 (2015)
  4. L.P. Carloni, A.L. Sangiovanni-Vincentelli, Performance analysis and optimization of latency insensitive systems, in Proceedings of the 37th Design Automation Conference (ACM, New York, NY, 2000), pp. 361–367. doi:10.1145/337292.337441
  5. L.P. Carloni, K. McMillan, A. Saldanha, A. Sangiovanni-Vincentelli, A methodology for correct-by-construction latency-insensitive design, in International Conference on Computer-Aided Design (1999), pp. 309–315
  6. L.F. Chao, A. LaPaugh, E.M. Sha, Rotation scheduling: a loop pipelining algorithm. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 16(3), 229–239 (1997). doi:10.1109/43.594829
  7. N. Choudhary, B. Dwiel, E. Rotenberg, A physical design study of FabScalar-generated superscalar cores, in 2012 IEEE/IFIP 20th International Conference on VLSI and System-on-Chip (VLSI-SoC) (2012), pp. 165–170. doi:10.1109/VLSI-SoC.2012.6379024
  8. J. Cortadella, M. Kishinevsky, B. Grundmann, SELF: specification and design of synchronous elastic circuits, in Proceedings of the ACM/IEEE International Workshop on Timing Issues, TAU 06 (2006)
  9. J. Cortadella, M. Galceran-Oms, M. Kishinevsky, Elastic systems, in Proceedings of the 8th ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE ’10 (2010), pp. 149–158
  10. J. Cortadella, M. Galceran-Oms, M. Kishinevsky, S.S. Sapatnekar, RTL synthesis: from logic synthesis to automatic pipelining. Proc. IEEE 103(11), 2061–2075 (2015). doi:10.1109/JPROC.2015.2456189
  11. G. Dimitrakopoulos, I. Seitanidis, A. Psarras, K. Tsiouris, P.M. Mattheakis, J. Cortadella, Hardware primitives for the synthesis of multithreaded elastic systems, in 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2014), pp. 1–4. doi:10.7873/DATE.2014.314
  12. I. Ganusov, H. Fraisse, A. Ng, R.T. Possignolo, S. Das, Automated extra pipeline analysis of applications mapped to Xilinx UltraScale+ FPGAs, in Proceedings of the 26th Conference on Field Programmable Logic and Applications (FPL) (2016)
  13. M. Hrishikesh, D. Burger, N.P. Jouppi, K.I. Farkas, P. Shivakumar, The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, in Proceedings of the 29th International Symposium on Computer Architecture (2002)
  14. Y. Huang, P. Ienne, O. Temam, Y. Chen, C. Wu, Elastic CGRAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (ACM, New York, NY, 2013), pp. 171–180. doi:10.1145/2435264.2435296
  15. K. Jensen, L.M. Kristensen, Coloured Petri Nets: Modelling and Validation of Concurrent Systems (Springer, Berlin, Heidelberg, 2009)
  16. J. Julvez, J. Cortadella, M. Kishinevsky, Performance analysis of concurrent systems with early evaluation, in International Conference on Computer-Aided Design (2006), pp. 448–455. doi:10.1109/ICCAD.2006.320155
  17. K.E. Ardestani, J. Renau, ESESC: a fast multicore simulator using time-based sampling, in International Symposium on High Performance Computer Architecture, HPCA’19 (2013)
  18. C.E. Leiserson, J.B. Saxe, Retiming synchronous circuitry. Algorithmica 6, 5–35 (1991)
  19. S. Li, J. Ahn, R. Strong, J. Brockman, D. Tullsen, N. Jouppi, McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures, in 42nd IEEE/ACM Int’l Symp. on Microarchitecture (IEEE, New York, 2009), pp. 469–480
  20. M. Oskin, F. Chong, M. Farrens, HLS: combining statistical and symbolic simulation to guide microprocessor designs, in International Symposium on Computer Architecture, Vancouver (2000), pp. 71–82
  21. R.T. Possignolo, E. Ebrahimi, H. Skinner, J. Renau, FluidPipelines: elastic circuitry meets out-of-order execution, in Proceedings of the 34th International Conference on Computer Design (ICCD) (2016)
  22. R.T. Possignolo, E. Ebrahimi, H. Skinner, J. Renau, FluidPipelines: elastic circuitry without throughput penalty, in Proceedings of the 2016 International Workshop on Logic Synthesis (IWLS) (2016)
  23. M. Vijayaraghavan, A. Arvind, Bounded dataflow networks and latency-insensitive circuits, in Proceedings of the 7th IEEE/ACM International Conference on Formal Methods and Models for Codesign (IEEE, Piscataway, NJ, 2009), pp. 171–180

Acknowledgements

This work was supported in part by the National Science Foundation under grants CNS-1059442-003, CNS-1318943-001, CCF-1337278, and CCF-1514284. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the NSF.

Author information

Corresponding author

Correspondence to Rafael T. Possignolo.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Possignolo, R.T., Ebrahimi, E., Skinner, H., Renau, J. (2018). Automated Pipeline Transformations with Fluid Pipelines. In: Reis, A., Drechsler, R. (eds) Advanced Logic Synthesis. Springer, Cham. https://doi.org/10.1007/978-3-319-67295-3_6

  • DOI: https://doi.org/10.1007/978-3-319-67295-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67294-6

  • Online ISBN: 978-3-319-67295-3

  • eBook Packages: Engineering, Engineering (R0)
