DOI:10.1145/2636228.2636238
research-article

Size slicing: a hybrid approach to size inference in Futhark

Published: 03 September 2014

Abstract

We present a shape inference analysis for a purely functional language, named Futhark, that supports nested parallelism via array combinators such as map, reduce, filter, and scan. Our approach is to infer code for computing precise shape information at run-time, which in the most common cases can be effectively optimized by standard compiler optimizations. Instead of restricting the language or sacrificing ease of use, the language allows the occasional shape-dynamic, and even shape-misbehaving, constructs. Inherently shape-dynamic code is treated with a fall-back technique that preserves, asymptotically, the number of operations of the program and that computes and returns the array's shape alongside its value. This approach leads to a shape-dependent system with existentially quantified types, where static shape inference corresponds to eliminating existential quantifications from the types of program expressions.
We optimize the common case to negligible overhead via size slicing: a technique that separates the computation of the array's shape from its values. This allows the shape to be calculated in advance and to be used to instantiate the previously existentially quantified shapes of the value slice. We report negligible overhead on several mini-benchmarks and three real-world applications.
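The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not Futhark and not the paper's implementation; all function names (`filter_with_size`, `pad_iota`, `pad_iota_shape`, `pad_iota_value`) are hypothetical and chosen only for illustration. The first function mimics the fall-back for shape-dynamic code, returning the size together with the value in a single pass. The remaining functions mimic size slicing on a shape-static program: a shape slice that computes only the result length, and a value slice that fills a buffer whose size is already known.

```python
from typing import Callable, List, Tuple


def filter_with_size(pred: Callable[[int], bool],
                     xs: List[int]) -> Tuple[int, List[int]]:
    """Fall-back for shape-dynamic code (illustrative assumption):
    a single pass returns the result's size alongside its value."""
    kept: List[int] = []
    size = 0
    for x in xs:
        if pred(x):
            kept.append(x)
            size += 1
    return size, kept


def pad_iota(n: int, m: int) -> List[int]:
    """Hypothetical original program: n zeros followed by 0, 1, ..., m-1."""
    return [0] * n + list(range(m))


def pad_iota_shape(n: int, m: int) -> int:
    """Shape slice: computes only the result length, touching no elements."""
    return n + m


def pad_iota_value(n: int, m: int, size: int) -> List[int]:
    """Value slice: fills a buffer whose size was computed in advance."""
    out = [0] * size
    for i in range(m):
        out[n + i] = i
    return out


# The shape is computed first, then used to instantiate the value slice.
size = pad_iota_shape(3, 4)
result = pad_iota_value(3, 4, size)
```

In this sketch the shape slice costs a single addition, which is the kind of cheap, hoistable computation the abstract expects standard optimizations to handle; the filter case has no such cheap slice, so it falls back to carrying the size with the value.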

    Published In

    FHPC '14: Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing
    September 2014
    116 pages
    ISBN:9781450330404
    DOI:10.1145/2636228
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dependent types
    2. functional language
    3. size analysis

    Qualifiers

    • Research-article

    Conference

    ICFP'14
    Acceptance Rates

    FHPC '14 Paper Acceptance Rate 10 of 11 submissions, 91%;
    Overall Acceptance Rate 18 of 25 submissions, 72%

    Cited By

    • (2021) Towards size-dependent types for array programming. Proceedings of the 7th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, pages 1-14. DOI:10.1145/3460944.3464310. Online publication date: 17-Jun-2021
    • (2020) Accelerating Nested Data Parallelism: Preserving Regularity. Euro-Par 2020: Parallel Processing, pages 426-442. DOI:10.1007/978-3-030-57675-2_27. Online publication date: 18-Aug-2020
    • (2019) Compositional deep learning in Futhark. Proceedings of the 8th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing, pages 47-59. DOI:10.1145/3331553.3342617. Online publication date: 18-Aug-2019
    • (2019) Incremental flattening for nested data parallelism. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pages 53-67. DOI:10.1145/3293883.3295707. Online publication date: 16-Feb-2019
    • (2019) High-Performance Defunctionalisation in Futhark. Trends in Functional Programming, pages 136-156. DOI:10.1007/978-3-030-18506-0_7. Online publication date: 24-Apr-2019
    • (2018) Modular acceleration: tricky cases of functional high-performance computing. Proceedings of the 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing, pages 10-21. DOI:10.1145/3264738.3264740. Online publication date: 17-Sep-2018
    • (2018) Certified Compilation of Financial Contracts. Proceedings of the 20th International Symposium on Principles and Practice of Declarative Programming, pages 1-13. DOI:10.1145/3236950.3236955. Online publication date: 3-Sep-2018
    • (2018) Static interpretation of higher-order modules in Futhark: functional GPU programming in the large. Proceedings of the ACM on Programming Languages 2:ICFP, pages 1-30. DOI:10.1145/3236792. Online publication date: 30-Jul-2018
    • (2017) Lift: a functional data-parallel IR for high-performance GPU code generation. Proceedings of the 2017 International Symposium on Code Generation and Optimization, pages 74-85. DOI:10.5555/3049832.3049841. Online publication date: 4-Feb-2017
    • (2017) Futhark: purely functional GPU-programming with nested parallelism and in-place array updates. ACM SIGPLAN Notices 52:6, pages 556-571. DOI:10.1145/3140587.3062354. Online publication date: 14-Jun-2017
