Optimal Loop Unrolling and Shifting for Reconfigurable Architectures

Published: 01 September 2009
Abstract

In this article, we present a new technique for optimizing loops that contain kernels mapped onto a reconfigurable fabric, using the Molen machine organization as our framework. We propose combining loop unrolling with loop shifting. Shifting relocates the function calls in the loop body so that, in every iteration of the transformed loop, software functions (running on the general-purpose processor, GPP) execute in parallel with multiple instances of the kernel (running on the FPGA). The algorithm computes the optimal unroll factor and selects the most appropriate transformation: unrolling combined with shifting, or either one alone. The method relies on profiling information about the kernel’s execution times on the GPP and the FPGA, memory transfers, and area utilization. In the experimental part, we apply the method to several kernels from loop nests extracted from real-life applications (DCT and SAD from an MPEG2 encoder, the Quantizer from JPEG, and Sobel’s Convolution). We then analyze the results, comparing them with the theoretical maximum speedup given by Amdahl’s Law and showing when and how our transformations are beneficial.
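
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm or cost model, which this page does not reproduce. It assumes a deliberately simplified timing model: software functions run sequentially on the GPP, the unrolled kernel instances run fully in parallel on the FPGA with transfers folded into the kernel time, and the unroll factor is limited only by area. The `Profile` fields and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical profiling data; all field names are illustrative."""
    n_iter: int         # loop trip count (assumed >= 1)
    t_sw: float         # software function time per iteration on the GPP
    t_k_gpp: float      # kernel time per iteration on the GPP
    t_k_fpga: float     # kernel time per instance on the FPGA, transfers included
    area_kernel: float  # area of one kernel instance
    area_total: float   # area available on the fabric


def batch_sizes(n_iter, u):
    """Split n_iter iterations into batches of u, plus a smaller remainder batch."""
    full, rem = divmod(n_iter, u)
    return [u] * full + ([rem] if rem else [])


def time_unrolled(p, u):
    """Unrolling only: per batch, software runs sequentially, then kernels in parallel."""
    return sum(b * p.t_sw + p.t_k_fpga for b in batch_sizes(p.n_iter, u))


def time_unrolled_shifted(p, u):
    """Unrolling plus shifting: software of the next batch overlaps the current kernels."""
    sizes = batch_sizes(p.n_iter, u)
    total = sizes[0] * p.t_sw                   # prologue: software of the first batch
    for nxt in sizes[1:]:
        total += max(nxt * p.t_sw, p.t_k_fpga)  # overlapped steady state
    return total + p.t_k_fpga                   # epilogue: kernels of the last batch


def choose_transformation(p):
    """Exhaustively search the unroll factors that fit in the available area."""
    max_u = min(p.n_iter, int(p.area_total // p.area_kernel))
    best = None
    for u in range(1, max_u + 1):
        candidates = ((False, time_unrolled(p, u)), (True, time_unrolled_shifted(p, u)))
        for shifted, t in candidates:
            if best is None or t < best[0]:     # ties keep the smaller unroll factor
                best = (t, u, shifted)
    return best  # (estimated time, unroll factor, shifting applied?), None if nothing fits


def amdahl_bound(p):
    """Speedup limit if the kernel time dropped to zero and only software remained."""
    return (p.t_sw + p.t_k_gpp) / p.t_sw


# Made-up numbers, for illustration only.
p = Profile(n_iter=64, t_sw=100, t_k_gpp=1000, t_k_fpga=250, area_kernel=10, area_total=45)
t, u, shifted = choose_transformation(p)
speedup = p.n_iter * (p.t_sw + p.t_k_gpp) / t
print(u, shifted, round(speedup, 2), amdahl_bound(p))
```

With these made-up numbers the sketch selects an unroll factor of 4 with shifting, for an estimated speedup of about 10.59 against an Amdahl bound of 11. In this simplified model, an unroll factor of 1 with shifting corresponds to shifting alone, and a factor above 1 without shifting corresponds to unrolling alone.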

    • Published in

      ACM Transactions on Reconfigurable Technology and Systems, Volume 2, Issue 4
      September 2009
      134 pages
      ISSN: 1936-7406
      EISSN: 1936-7414
      DOI: 10.1145/1575779

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2009
      • Accepted: 1 January 2009
      • Revised: 1 September 2008
      • Received: 1 May 2008

      Qualifiers

      • research-article
      • Research
      • Refereed