DOI: 10.1145/3359986.3361205

Polyhedral fragments: an efficient representation for symbolically generating code for processor arrays

Published: 9 October 2019

ABSTRACT

To leverage the vast parallelism of loops, embedded loop accelerators often take the form of processor arrays with many, yet simple, processing elements. Each processing element executes a subset of a loop's iterations in parallel, exploiting instruction- and data-level parallelism by tightly scheduling iterations via software pipelining and by packing instructions into compact, individual programs. However, loop bounds are often unknown until runtime, which complicates the static generation of these programs because the bounds influence each program's control flow.

Existing solutions, such as generating and storing all possible programs or full just-in-time compilation, are prohibitively expensive, especially in embedded systems. As a remedy, we propose a hybrid approach based on a tree-like program representation whose generation front-loads all intractable sub-problems to compile time, and from which all concrete program variants can be stitched together efficiently at runtime. The tree consists of so-called polyhedral fragments, which represent concrete program parts and are annotated with iteration-dependent conditions.
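
To make this concrete, the following is a minimal illustrative sketch in Python, not the paper's implementation; the names Fragment and stitch, and the encoding of conditions as Python predicates over runtime parameters, are assumptions made here for illustration. It models a polyhedral fragment as an instruction sequence guarded by a condition over runtime loop bounds, and flattens a tree of such fragments into one concrete program once those bounds are known.

    # Illustrative sketch only: hypothetical data structures, not the paper's implementation.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Fragment:
        # Condition over runtime parameters (e.g., loop bounds); the fragment's
        # instructions are emitted only for parameter values that satisfy it.
        condition: Callable[[Dict[str, int]], bool]
        instructions: List[str]
        children: List["Fragment"] = field(default_factory=list)

    def stitch(fragment: Fragment, params: Dict[str, int]) -> List[str]:
        """Walk the fragment tree once and concatenate all applicable fragments."""
        if not fragment.condition(params):
            return []
        program = list(fragment.instructions)
        for child in fragment.children:
            program.extend(stitch(child, params))
        return program

    # Example: prologue and epilogue are only needed when the runtime tile count N > 1.
    root = Fragment(
        condition=lambda p: True,
        instructions=["init"],
        children=[
            Fragment(condition=lambda p: p["N"] > 1, instructions=["prologue"]),
            Fragment(condition=lambda p: True, instructions=["kernel"]),
            Fragment(condition=lambda p: p["N"] > 1, instructions=["epilogue"]),
        ],
    )
    print(stitch(root, {"N": 4}))  # ['init', 'prologue', 'kernel', 'epilogue']
    print(stitch(root, {"N": 1}))  # ['init', 'kernel']

In this sketch the tree is built once at compile time; at runtime, stitching is a single walk over the tree, so no scheduling or instruction-packing decisions have to be revisited.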

We show that this representation is both space- and time-efficient: it requires only polynomial space to store, whereas storing all possibly generated programs requires non-polynomial space, and only polynomial time to evaluate, whereas just-in-time compilation requires solving NP-hard problems. In a case study, we show for a representative loop program that using a tree of polyhedral fragments saves 98.88% of space compared to storing all program variants.
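
For intuition about the space argument, here is a back-of-the-envelope comparison under an assumed, simplified model (our assumption for illustration, not the paper's analysis): if the control flow depends on d independent runtime conditions, enumerating complete programs yields one program per combination of conditions, whereas a fragment tree stores each guarded part only once.

    # Assumed, simplified model for intuition only; not the paper's numbers.
    d = 10                                # independent runtime conditions
    variants_if_enumerated = 2 ** d       # one complete program per combination of conditions
    nodes_in_fragment_tree = 2 * d + 1    # shared parts plus one guarded fragment per condition (assumed shape)
    print(variants_if_enumerated, nodes_in_fragment_tree)  # 1024 vs. 21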


  • Published in

    MEMOCODE '19: Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design
    October 2019, 160 pages
    ISBN: 9781450369978
    DOI: 10.1145/3359986

    Publisher: Association for Computing Machinery, New York, NY, United States

    Acceptance Rates: MEMOCODE '19 paper acceptance rate: 12 of 34 submissions (35%); overall acceptance rate: 34 of 82 submissions (41%)
