DOI: 10.1145/2500365.2500601
Research article

Exploiting vector instructions with generalized stream fusion

Published: 25 September 2013

ABSTRACT

Stream fusion is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all.
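To make the starting point concrete, here is a minimal sketch of the standard stream fusion formulation on lists. It is not the code of any particular library: the names `Step`, `Stream`, `stream`, `mapS`, `filterS`, `foldlS`, and `sumSquaresOfEvens` are chosen for illustration. The idea shown is that each sequence operation becomes a non-recursive step function over a hidden state, so a pipeline of such operations can in principle be inlined by the compiler into a single loop with no intermediate structures.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: yield a value, skip (no value), or finish.
data Step s a = Yield a s | Skip s | Done

-- A stream is a non-recursive step function plus a hidden seed state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Turn a list into a stream; the list itself is the state.
stream :: [a] -> Stream a
stream = Stream next
  where
    next []     = Done
    next (x:xs) = Yield x xs

-- Map a function over a stream without recursion in the step function.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Yield x s' -> Yield (f x) s'
      Skip s'    -> Skip s'
      Done       -> Done

-- Filtering uses Skip, so the step function stays non-recursive.
filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Yield x s' | p x       -> Yield x s'
                 | otherwise -> Skip s'
      Skip s'    -> Skip s'
      Done       -> Done

-- The only recursive function: a strict left fold that consumes the stream.
foldlS :: (b -> a -> b) -> b -> Stream a -> b
foldlS f z0 (Stream next s0) = go z0 s0
  where
    go z s = case next s of
      Yield x s' -> let z' = f z x in z' `seq` go z' s'
      Skip s'    -> go z s'
      Done       -> z

-- A pipeline written as a composition of stream operations.
sumSquaresOfEvens :: [Int] -> Int
sumSquaresOfEvens = foldlS (+) 0 . mapS (\x -> x * x) . filterS even . stream

main :: IO ()
main = print (sumSquaresOfEvens [1 .. 10])  -- prints 220
```

In this representation a consumer sees exactly one element per step, which hints at why operations that want to move many elements at once, such as append or SIMD computation, are awkward to express.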

In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler- and hand-vectorized C.
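The bundling idea can be sketched as follows. This is an illustration under stated assumptions, not the representation used in the authors' modified vector library: the `Bundle` record, its two fields, and every function name below are hypothetical, and plain lists stand in for chunks of memory. Producers and transformers fill in every representation, and each consumer picks the one that suits it.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Minimal stream type, simplified for illustration.
data Step s a = Yield a s | Skip s | Done
data Stream a = forall s. Stream (s -> Step s a) s

-- Hypothetical bundle: the same sequence in two representations.
data Bundle a = Bundle
  { bElems  :: Stream a    -- one element per step
  , bChunks :: Stream [a]  -- one whole chunk per step
  }

fromListS :: [a] -> Stream a
fromListS = Stream next
  where
    next []     = Done
    next (x:xs) = Yield x xs

mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Yield x s' -> Yield (f x) s'
      Skip s'    -> Skip s'
      Done       -> Done

-- Append tracks which input is active in an Either state.
appendS :: Stream a -> Stream a -> Stream a
appendS (Stream nextA sA0) (Stream nextB sB0) = Stream next (Left sA0)
  where
    next (Left sA) = case nextA sA of
      Yield x sA' -> Yield x (Left sA')
      Skip sA'    -> Skip (Left sA')
      Done        -> Skip (Right sB0)
    next (Right sB) = case nextB sB of
      Yield x sB' -> Yield x (Right sB')
      Skip sB'    -> Skip (Right sB')
      Done        -> Done

foldlS :: (b -> a -> b) -> b -> Stream a -> b
foldlS f z0 (Stream next s0) = go z0 s0
  where
    go z s = case next s of
      Yield x s' -> let z' = f z x in z' `seq` go z' s'
      Skip s'    -> go z s'
      Done       -> z

-- A producer fills in every representation of the bundle.
fromChunks :: [[a]] -> Bundle a
fromChunks css = Bundle (fromListS (concat css)) (fromListS css)

-- Transformers update each representation in step.
mapB :: (a -> b) -> Bundle a -> Bundle b
mapB f (Bundle es cs) = Bundle (mapS f es) (mapS (map f) cs)

appendB :: Bundle a -> Bundle a -> Bundle a
appendB (Bundle es1 cs1) (Bundle es2 cs2) =
  Bundle (appendS es1 es2) (appendS cs1 cs2)

-- An element-wise consumer reads bElems.
sumB :: Num a => Bundle a -> a
sumB = foldlS (+) 0 . bElems

-- A bulk consumer reads bChunks and handles a whole chunk per step.
toListB :: Bundle a -> [a]
toListB = concat . reverse . foldlS (flip (:)) [] . bChunks

example :: Bundle Int
example = mapB (* 2) (fromChunks [[1, 2], [3]] `appendB` fromChunks [[4, 5, 6]])

main :: IO ()
main = do
  print (sumB example)     -- prints 42
  print (toListB example)  -- prints [2,4,6,8,10,12]
```

Because record fields are lazy in Haskell, a consumer that reads only `bElems` never forces `bChunks`, and vice versa, so in this sketch carrying an unused representation costs nothing at run time beyond an unevaluated thunk.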


Published in

ICFP '13: Proceedings of the 18th ACM SIGPLAN international conference on Functional programming
September 2013
484 pages
ISBN: 9781450323260
DOI: 10.1145/2500365

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICFP '13 paper acceptance rate: 40 of 133 submissions (30%). Overall acceptance rate: 333 of 1,064 submissions (31%).
