Riposte: a trace-driven compiler and parallel VM for vector code in R

ABSTRACT
There is a growing utilization gap between modern hardware and modern programming languages for data analysis. Due to power and other constraints, recent processor designs have sought improved performance through increased SIMD and multi-core parallelism. At the same time, high-level, dynamically typed languages for data analysis have become popular. These languages emphasize ease of use and programmer productivity, but generally offer low performance and limited support for exploiting hardware parallelism.
In this paper, we describe Riposte, a new runtime for the R language, which bridges this gap. Riposte uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code. Once extracted, we can fuse traces to eliminate unnecessary memory traffic, compile them to use hardware SIMD units, and schedule them to run across multiple cores, allowing us to fully utilize the available parallelism on modern shared-memory machines. Our evaluation shows that Riposte can run vector R code near the speed of hand-optimized C, 5--50x faster than the open source implementation of R, and can also linearly scale to 32 cores for some tasks. Across 12 different workloads we achieve an overall average speed-up of over 150x without explicit programmer parallelization.
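To make the memory-traffic argument behind trace fusion concrete, the sketch below (in Python with made-up function names; Riposte itself operates on traces extracted from R code) contrasts naive evaluation of a vector expression, which materializes one temporary array per operator, with a fused single-pass evaluation of the same expression:

```python
def eval_naive(a, b, c):
    """Evaluate (a + b) * c one operator at a time.

    The intermediate result of a + b is written to memory in full,
    then read back for the multiply -- extra memory traffic that
    grows with vector length.
    """
    t = [ai + bi for ai, bi in zip(a, b)]       # temporary vector materialized
    return [ti * ci for ti, ci in zip(t, c)]    # temporary read back


def eval_fused(a, b, c):
    """Evaluate (a + b) * c in a single fused pass.

    Each element is loaded once, combined, and stored once; no
    intermediate vector ever touches memory. This per-element loop is
    also the form a compiler can map onto hardware SIMD lanes and
    partition across cores.
    """
    return [(ai + bi) * ci for ai, bi, ci in zip(a, b, c)]


a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
assert eval_naive(a, b, c) == eval_fused(a, b, c)  # → [20.0, 36.0]
```

Both versions compute the same result; the fused form simply avoids the intermediate array, which is the traffic that trace fusion eliminates before SIMD compilation and multi-core scheduling.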