Research article, DOI: 10.1145/2370816.2370825

Riposte: a trace-driven compiler and parallel VM for vector code in R

Published: 19 September 2012

ABSTRACT

There is a growing utilization gap between modern hardware and modern programming languages for data analysis. Due to power and other constraints, recent processor design has sought improved performance through increased SIMD and multi-core parallelism. At the same time, high-level, dynamically-typed languages for data analysis have become popular. These languages emphasize ease of use and high productivity but generally have low performance and limited support for exploiting hardware parallelism.

In this paper, we describe Riposte, a new runtime for the R language, which bridges this gap. Riposte uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code. Once extracted, we can fuse traces to eliminate unnecessary memory traffic, compile them to use hardware SIMD units, and schedule them to run across multiple cores, allowing us to fully utilize the available parallelism on modern shared-memory machines. Our evaluation shows that Riposte can run vector R code near the speed of hand-optimized C, 5 to 50x faster than the open source implementation of R, and can also linearly scale to 32 cores for some tasks. Across 12 different workloads we achieve an overall average speed-up of over 150x without explicit programmer parallelization.
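
The idea of recording vector operations and fusing them can be illustrated with a small sketch. This is not Riposte's implementation. It is a toy deferred-evaluation scheme in Python, and the names `Lazy`, `force`, and `eager` are hypothetical. The sketch assumes equal-length vectors plus scalar broadcasting and does not model R's general recycling rules, SIMD code generation, or multi-core scheduling. It only shows how deferring elementwise operations allows a single pass over the data with no intermediate vectors.

```python
import operator


class Lazy:
    """A deferred vector expression: a leaf holding data, or an operator node."""

    def __init__(self, op=None, args=(), data=None):
        self.op = op        # None for leaves, a binary function for nodes
        self.args = args    # child expressions (empty for leaves)
        self.data = data    # a list (vector) or a scalar, for leaves only

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Lazy) else Lazy(data=x)

    def __add__(self, other):
        # Record the operation instead of computing it.
        return Lazy(operator.add, (self, Lazy._wrap(other)))

    def __mul__(self, other):
        return Lazy(operator.mul, (self, Lazy._wrap(other)))

    def _length(self):
        if self.op is None:
            return len(self.data) if isinstance(self.data, list) else 1
        return max(a._length() for a in self.args)

    def _at(self, i):
        # Evaluate the whole recorded expression for element i.
        if self.op is None:
            return self.data[i] if isinstance(self.data, list) else self.data
        return self.op(*(a._at(i) for a in self.args))

    def force(self):
        # Fused evaluation: one pass, no temporary vectors between operators.
        return [self._at(i) for i in range(self._length())]


def eager(x, y):
    # Unfused baseline: each operator materialises a full temporary vector.
    t1 = [a * 2.0 for a in x]
    t2 = [a + b for a, b in zip(t1, y)]
    return t2


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
fused = (Lazy(data=x) * 2.0 + Lazy(data=y)).force()
```

Both `fused` and `eager(x, y)` hold the same values. The difference is that the eager version writes and rereads the temporary `t1`, which is the kind of memory traffic that fusion is meant to avoid.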

        • Published in

          PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
          September 2012
          512 pages
          ISBN: 9781450311823
          DOI: 10.1145/2370816

          Copyright © 2012 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 121 of 471 submissions, 26%
