ABSTRACT
Stream fusion is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all.
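For readers new to the technique, the standard framework can be sketched in a few lines of Haskell. This is a minimal illustration on lists, following the usual step-function encoding of streams; the names `mapS`, `filterS`, `sumS`, and `sumSquaresOfEvens` are chosen here for illustration and are not taken from any particular library.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: finished, advance without an element, or yield one.
data Step s a = Done | Skip s | Yield a s

-- A stream is a non-recursive step function plus a hidden state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list into a stream; the state is the remaining list.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []     = Done
    next (x:xs) = Yield x xs

-- Transformers only wrap the step function; they build no intermediate list.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'   -- Skip keeps the step function non-recursive

-- The consumer is the only loop in the pipeline.
sumS :: Num a => Stream a -> a
sumS (Stream next s0) = go 0 s0
  where
    go acc s = case next s of
      Done       -> acc
      Skip s'    -> go acc s'
      Yield x s' -> let acc' = acc + x in acc' `seq` go acc' s'

-- A composed pipeline: after inlining, an optimizer can reduce this to one loop.
sumSquaresOfEvens :: [Int] -> Int
sumSquaresOfEvens = sumS . mapS (\x -> x * x) . filterS even . stream
```

Because every transformer is non-recursive, the compiler can inline the whole pipeline into the single loop in `sumS`. Whether it actually does so depends on optimization settings and inlining decisions, which this sketch does not control.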
In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of GHC and the vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can compile to code that is faster than both compiler- and hand-vectorized C.
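The bundling idea can be illustrated with a deliberately simplified sketch. The `Bundle` type below is hypothetical and is not the representation used in the paper or in the vector library: it pairs an element-at-a-time stream with a chunk-at-a-time stream over the same sequence, with plain lists standing in for blocks of memory. All names here are assumptions made for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

-- Hypothetical bundle: one sequence, two views of it.
data Bundle a = Bundle
  { bElems  :: Stream a    -- view for element-wise consumers (e.g. folds)
  , bChunks :: Stream [a]  -- view for bulk consumers (lists stand in for blocks)
  }

listStream :: [a] -> Stream a
listStream = Stream next
  where
    next []     = Done
    next (y:ys) = Yield y ys

fromList :: [a] -> Bundle a
fromList xs = Bundle (listStream xs) (listStream [xs | not (null xs)])

mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- Appending streams: the state records which side is being drained.
appendS :: Stream a -> Stream a -> Stream a
appendS (Stream nextA sa0) (Stream nextB sb0) = Stream next (Left sa0)
  where
    next (Left sa) = case nextA sa of
      Done        -> Skip (Right sb0)
      Skip sa'    -> Skip (Left sa')
      Yield x sa' -> Yield x (Left sa')
    next (Right sb) = case nextB sb of
      Done        -> Done
      Skip sb'    -> Skip (Right sb')
      Yield x sb' -> Yield x (Right sb')

-- Every producer and transformer maintains both views.
mapB :: (a -> b) -> Bundle a -> Bundle b
mapB f (Bundle es cs) = Bundle (mapS f es) (mapS (map f) cs)

appendB :: Bundle a -> Bundle a -> Bundle a
appendB (Bundle ea ca) (Bundle eb cb) = Bundle (appendS ea eb) (appendS ca cb)

foldlS :: (b -> a -> b) -> b -> Stream a -> b
foldlS f z0 (Stream next s0) = go z0 s0
  where
    go z s = case next s of
      Done       -> z
      Skip s'    -> go z s'
      Yield x s' -> let z' = f z x in z' `seq` go z' s'

-- Each consumer picks the view that suits it; the other view is never forced.
sumB :: Num a => Bundle a -> a
sumB = foldlS (+) 0 . bElems

toListB :: Bundle a -> [a]
toListB = concat . reverse . foldlS (flip (:)) [] . bChunks
```

In this sketch, `sumB` reads only the element view and `toListB` only the chunk view. Because Haskell is lazy, the view a consumer ignores is never evaluated. With the chunk view, an append moves whole blocks rather than testing, for every element, which input is being drained.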
Exploiting vector instructions with generalized stream fusion (ICFP '13)