Abstract
Embedded systems have strict timing and code size requirements. Retiming is one of the most important optimization techniques for improving the execution time of loops, because it increases the parallelism among successive loop iterations. Traditionally, retiming has been applied at the instruction level to reduce the cycle period of single loops. Multi-dimensional (MD) retiming can exploit outer-loop parallelism, but the loop transformation it requires introduces large overheads in loop index generation and code size. In this paper, we propose a novel approach that combines iterational retiming with instructional retiming to satisfy any given timing constraint, achieving full parallelism for the iterations in a partition with minimal code size. The experimental results show that combining iterational and instructional retiming achieves a 37% code size reduction compared with applying iterational retiming alone.
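The approach builds on classic instruction-level retiming of a loop's data-flow graph (DFG). The Python below is a minimal sketch of that baseline only, not of the iterational retiming proposed in this paper. It assumes each DFG node has a computation time and each edge carries a delay count (its iteration distance). It applies a retiming function r using the convention d_r(u → v) = d(u → v) + r(u) − r(v). Leiserson and Saxe write the same transformation with the sign of r reversed. The sketch then computes the cycle period as the longest computation path through zero-delay edges. The four-node loop body and its unit computation times are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical DFG encoding: edges are (u, v, d), d = number of delays on u -> v.
Edge = tuple[str, str, int]


def retime(edges: list[Edge], r: dict[str, int]) -> list[Edge]:
    """Apply retiming r with d_r(u -> v) = d(u -> v) + r[u] - r[v]."""
    retimed = [(u, v, d + r.get(u, 0) - r.get(v, 0)) for u, v, d in edges]
    if any(d < 0 for _, _, d in retimed):
        raise ValueError("illegal retiming: negative delay count on an edge")
    return retimed


def cycle_period(times: dict[str, int], edges: list[Edge]) -> int:
    """Longest computation path through zero-delay edges within one iteration."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in times}
    for u, v, d in edges:
        if d == 0:  # only zero-delay edges constrain a single iteration
            succ[u].append(v)
            indeg[v] += 1
    finish = dict(times)  # longest path (in time units) ending at each node
    ready = deque(v for v in times if indeg[v] == 0)
    visited = 0
    while ready:  # Kahn's topological order over the zero-delay subgraph
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + times[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(times):
        raise ValueError("zero-delay cycle: the DFG is not executable")
    return max(finish.values())


if __name__ == "__main__":
    # Hypothetical loop body A -> B -> C -> D, with D feeding A two iterations later.
    times = {"A": 1, "B": 1, "C": 1, "D": 1}
    edges = [("A", "B", 0), ("B", "C", 0), ("C", "D", 0), ("D", "A", 2)]
    print(cycle_period(times, edges))  # 4
    # Retiming A and B by 1 moves one delay from D -> A onto B -> C.
    retimed = retime(edges, {"A": 1, "B": 1})
    print(cycle_period(times, retimed))  # 2
```

Retiming only redistributes delays: the cycle A → B → C → D → A still carries two delays in total. The zero-delay chain, however, shrinks from four nodes to two, so each iteration can be scheduled in half the time.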
This work is partially supported by the TI University Program, NSF EIA-0103709, Texas ARP 009741-0028-2001, NSF CCR-0309461, NSF IIS-0513669, and Microsoft, USA.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xue, C., Shao, Z., Liu, M., Qiu, M.K., Sha, E.H.M. (2005). Optimizing Nested Loops with Iterational and Instructional Retiming. In: Yang, L.T., Amamiya, M., Liu, Z., Guo, M., Rammig, F.J. (eds) Embedded and Ubiquitous Computing – EUC 2005. EUC 2005. Lecture Notes in Computer Science, vol 3824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596356_19
DOI: https://doi.org/10.1007/11596356_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30807-2
Online ISBN: 978-3-540-32295-5
eBook Packages: Computer Science, Computer Science (R0)