Automated Pipeline Transformations with Fluid Pipelines

Possignolo, Rafael T.; Ebrahimi, Elnaz; Skinner, Haven; Renau, Jose

doi:10.1007/978-3-319-67295-3_6


Abstract

In this chapter, we propose the concept of Fluid Pipelines, an evolution in the chip design process that allows for efficient late pipeline transformations. In a regular chip design process, pipeline depth and cycle time are fixed early in the design flow. However, their impact can only be assessed once the implementation is mostly done, by which point any change to the pipeline design is impractical. Although Elastic Systems are latency insensitive and allow changes to pipeline depth late in the design process with little design effort, they incur a significant throughput penalty when new stages are added in the presence of pipeline loops. Fluid Pipelines allow pipeline transformations without a throughput penalty. Formally, we introduce “or-causality” in addition to the “and-causality” already present in Elastic Systems. This gives more flexibility than was previously possible, at the cost of requiring the designer to specify the intended behavior of the circuit. In an Out-of-Order core benchmark, Fluid Pipelines improve the optimal energy-delay point, shifting both performance (by 17%) and energy (by 13%). We envision a scenario in which tools could generate different pipeline configurations (e.g., low power or high performance) from the same RTL.
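The throughput penalty and the two causality modes can be illustrated with a small Python sketch. It is a simplified model, not the chapter's formalism: `loop_throughput` applies the standard marked-graph bound for elastic loops (tokens divided by elastic buffers in the cycle, capped at one transfer per cycle), and `and_join` and `or_join` are hypothetical helpers in which a missing input is modeled as `None`. The priority rule in `or_join` (the lowest-indexed valid input wins) is an assumption standing in for the designer-specified behavior described above.

```python
from fractions import Fraction
from typing import Optional, Sequence, Tuple


def loop_throughput(tokens: int, stages: int) -> Fraction:
    """Upper bound on transfers per cycle for a single elastic loop.

    Assumption: standard marked-graph bound, i.e. a cycle holding `tokens`
    valid items spread over `stages` elastic buffers cannot sustain more
    than tokens/stages transfers per cycle, and never more than 1.
    """
    if stages <= 0 or tokens < 0:
        raise ValueError("need stages > 0 and tokens >= 0")
    return min(Fraction(tokens, stages), Fraction(1))


def and_join(inputs: Sequence[Optional[int]]) -> Optional[Tuple[int, ...]]:
    """And-causality: fire only when every input channel holds valid data."""
    if all(x is not None for x in inputs):
        return tuple(inputs)
    return None  # stall: at least one input has not arrived yet


def or_join(inputs: Sequence[Optional[int]]) -> Optional[int]:
    """Or-causality (simplified): fire as soon as any input is valid.

    Hypothetical priority rule: the lowest-indexed valid input wins; in a
    real design the designer would specify which input is used.
    """
    for x in inputs:
        if x is not None:
            return x
    return None  # stall: no input is valid


if __name__ == "__main__":
    # One token in a 3-stage loop, then the same loop with an extra stage.
    print(loop_throughput(1, 3), loop_throughput(1, 4))  # 1/3 1/4
    print(and_join([5, None]))  # None: waits for the second input
    print(or_join([None, 7]))   # 7: proceeds with the valid input
```

With one token circulating through three elastic buffers, the loop sustains at most 1/3 transfer per cycle; inserting a fourth, empty buffer lowers the bound to 1/4, which is the kind of penalty attributed above to adding stages inside pipeline loops.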


Notes

1. Cycles in the graph representing the connections between registers, not to be confused with program loops.
2. In practice, a circuit implementing elasticity will most likely behave deterministically for a given input set, but this is not a formal requirement of the Elastic Systems specification.
3. Other equivalent naming conventions have been used; e.g., elasticity has been expressed in terms of FIFO operation [23].
4. We note that inserting pipeline stages into synchronous circuits has been proposed [12], but doing so breaks the cycle accuracy of the circuit and should be used with care.
5. http://www.opencores.org.
6. Only the benchmarks that do not require Fortran were used.

References

[1] D. Baudisch, K. Schneider, Evaluation of speculation in out-of-order execution of synchronous dataflow networks. Int. J. Parallel Program. 43(1), 86–129 (2015). doi:10.1007/s10766-013-0277-2
[2] D. Bufistov, J. Cortadella, M. Galceran-Oms, J. Julvez, M. Kishinevsky, Retiming and recycling for elastic systems with early evaluation, in 46th Design Automation Conference (2009), pp. 288–291
[3] B. Cao, K. Ross, M. Kim, S. Edwards, Implementing latency-insensitive dataflow blocks, in Proceedings of the 13th ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE’15 (2015)
[4] L.P. Carloni, A.L. Sangiovanni-Vincentelli, Performance analysis and optimization of latency insensitive systems, in Proceedings of the 37th Design Automation Conference (ACM, New York, NY, 2000), pp. 361–367. doi:10.1145/337292.337441
[5] L.P. Carloni, K. McMillan, A. Saldanha, A. Sangiovanni-Vincentelli, A methodology for correct-by-construction latency-insensitive design, in International Conference on Computer-Aided Design (1999), pp. 309–315
[6] L.F. Chao, A. LaPaugh, E.M. Sha, Rotation scheduling: a loop pipelining algorithm. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 16(3), 229–239 (1997). doi:10.1109/43.594829
[7] N. Choudhary, B. Dwiel, E. Rotenberg, A physical design study of FabScalar-generated superscalar cores, in 2012 IEEE/IFIP 20th International Conference on VLSI and System-on-Chip (VLSI-SoC) (2012), pp. 165–170. doi:10.1109/VLSI-SoC.2012.6379024
[8] J. Cortadella, M. Kishinevsky, B. Grundmann, SELF: specification and design of synchronous elastic circuits, in Proceedings of the ACM/IEEE International Workshop on Timing Issues, TAU’06 (2006)
[9] J. Cortadella, M. Galceran-Oms, M. Kishinevsky, Elastic systems, in Proceedings of the 8th ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE’10 (2010), pp. 149–158
[10] J. Cortadella, M. Galceran-Oms, M. Kishinevsky, S.S. Sapatnekar, RTL synthesis: from logic synthesis to automatic pipelining. Proc. IEEE 103(11), 2061–2075 (2015). doi:10.1109/JPROC.2015.2456189
[11] G. Dimitrakopoulos, I. Seitanidis, A. Psarras, K. Tsiouris, P.M. Mattheakis, J. Cortadella, Hardware primitives for the synthesis of multithreaded elastic systems, in 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2014), pp. 1–4. doi:10.7873/DATE.2014.314
[12] I. Ganusov, H. Fraisse, A. Ng, R.T. Possignolo, S. Das, Automated extra pipeline analysis of applications mapped to Xilinx UltraScale+ FPGAs, in Proceedings of the 26th Conference on Field Programmable Logic and Applications (FPL) (2016)
[13] M. Hrishikesh, D. Burger, N.P. Jouppi, K.I. Farkas, P. Shivakumar, The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, in Proceedings of the 29th International Symposium on Computer Architecture (2002)
[14] Y. Huang, P. Ienne, O. Temam, Y. Chen, C. Wu, Elastic CGRAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (ACM, New York, NY, 2013), pp. 171–180. doi:10.1145/2435264.2435296
[15] K. Jensen, L.M. Kristensen, Coloured Petri Nets: Modelling and Validation of Concurrent Systems (Springer, Berlin, Heidelberg, 2009)
[16] J. Julvez, J. Cortadella, M. Kishinevsky, Performance analysis of concurrent systems with early evaluation, in International Conference on Computer-Aided Design (2006), pp. 448–455. doi:10.1109/ICCAD.2006.320155
[17] K.E. Ardestani, J. Renau, ESESC: a fast multicore simulator using time-based sampling, in International Symposium on High Performance Computer Architecture, HPCA-19 (2013)
[18] C.E. Leiserson, J.B. Saxe, Retiming synchronous circuitry. Algorithmica 6, 5–35 (1991)
[19] S. Li, J. Ahn, R. Strong, J. Brockman, D. Tullsen, N. Jouppi, McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures, in 42nd IEEE/ACM Int’l Symp. on Microarchitecture (IEEE, New York, 2009), pp. 469–480
[20] M. Oskin, F. Chong, M. Farrens, HLS: combining statistical and symbolic simulation to guide microprocessor designs, in International Symposium on Computer Architecture, Vancouver (2000), pp. 71–82
[21] R.T. Possignolo, E. Ebrahimi, H. Skinner, J. Renau, FluidPipelines: elastic circuitry meets out-of-order execution, in Proceedings of the 34th International Conference on Computer Design (ICCD) (2016)
[22] R.T. Possignolo, E. Ebrahimi, H. Skinner, J. Renau, FluidPipelines: elastic circuitry without throughput penalty, in Proceedings of the 2016 International Workshop on Logic Synthesis (IWLS) (2016)
[23] M. Vijayaraghavan, A. Arvind, Bounded dataflow networks and latency-insensitive circuits, in Proceedings of the 7th IEEE/ACM International Conference on Formal Methods and Models for Codesign (IEEE, Piscataway, NJ, 2009), pp. 171–180


Acknowledgements

This work was supported in part by the National Science Foundation under grants CNS-1059442-003, CNS-1318943-001, CCF-1337278, and CCF-1514284. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the NSF.

Author information

Authors and Affiliations

Department of Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
Rafael T. Possignolo, Elnaz Ebrahimi, Haven Skinner & Jose Renau


Corresponding author

Correspondence to Rafael T. Possignolo.

Editor information

Editors and Affiliations

PPGC/PGMICRO, Institute of Informatics, UFRGS, Porto Alegre, Rio Grande do Sul, Brazil
André Inácio Reis
Group for Computer Architecture, University of Bremen, Bremen, Germany
Rolf Drechsler

About this chapter

Cite this chapter

Possignolo, R.T., Ebrahimi, E., Skinner, H., Renau, J. (2018). Automated Pipeline Transformations with Fluid Pipelines. In: Reis, A., Drechsler, R. (eds) Advanced Logic Synthesis. Springer, Cham. https://doi.org/10.1007/978-3-319-67295-3_6


DOI: https://doi.org/10.1007/978-3-319-67295-3_6
Published: 16 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67294-6
Online ISBN: 978-3-319-67295-3
eBook Packages: Engineering (R0)
