Abstract
R is a statistical programming language widely used in the data science community. In the big data era, however, R is challenged by large-scale data analysis tasks: its local interactive shell cannot perform distributed linear algebra computation. In this paper, we propose iPLAR, a system that runs in the interactive R environment, wraps a high-performance parallel linear algebra library, and provides a set of easy-to-use interfaces. iPLAR adopts a client-server model to decouple the interactive shell from the ScaLAPACK/MPI distributed computing backend. Its interfaces hide parallelization details from R users and resemble R's native linear algebra interfaces. We evaluate the efficiency of iPLAR with representative basic matrix operations and two widely used machine learning algorithms. Experimental results show that iPLAR achieves near-linear data scalability and extends R's interactive processing capability to large problem scales.
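The client-server split described above can be illustrated with a small sketch. The interactive session holds only lightweight handles to matrices that live on a backend, and overloaded operators make remote operations look like native ones. This is a minimal Python sketch under explicit assumptions. `MatrixServer` and `DistMatrix` are hypothetical names. The "server" runs in-process rather than over a network. It computes with plain nested lists rather than ScaLAPACK/MPI. It is not iPLAR's actual protocol or API.

```python
import itertools


def _matmul(a, b):
    # Dense product of two nested-list matrices (stand-in for a parallel routine).
    if len(a[0]) != len(b):
        raise ValueError("non-conformable matrices")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def _add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _transpose(a):
    return [list(col) for col in zip(*a)]


class MatrixServer:
    """Hypothetical backend: stores matrices by handle and runs operations on them.

    A real backend would keep block-distributed data on MPI ranks; here all
    data lives in one process (assumption made for illustration).
    """

    _OPS = {"matmul": _matmul, "add": _add, "transpose": _transpose}

    def __init__(self):
        self._store = {}
        self._ids = itertools.count()

    def put(self, matrix):
        handle = next(self._ids)
        self._store[handle] = [list(row) for row in matrix]
        return handle

    def execute(self, op, *handles):
        # Operands are looked up server-side; only a new handle goes back to the client.
        args = [self._store[h] for h in handles]
        return self.put(self._OPS[op](*args))

    def get(self, handle):
        # The only call that ships matrix data back to the client.
        return [list(row) for row in self._store[handle]]


class DistMatrix:
    """Hypothetical client-side proxy whose operators mimic native matrix syntax."""

    def __init__(self, server, handle):
        self._server = server
        self._handle = handle

    @classmethod
    def from_array(cls, server, matrix):
        return cls(server, server.put(matrix))

    def _remote(self, op, *others):
        handles = [self._handle] + [o._handle for o in others]
        return DistMatrix(self._server, self._server.execute(op, *handles))

    def __matmul__(self, other):
        return self._remote("matmul", other)

    def __add__(self, other):
        return self._remote("add", other)

    @property
    def T(self):
        return self._remote("transpose")

    def collect(self):
        return self._server.get(self._handle)


if __name__ == "__main__":
    server = MatrixServer()
    a = DistMatrix.from_array(server, [[1, 2], [3, 4]])
    b = DistMatrix.from_array(server, [[5, 6], [7, 8]])
    # Reads like local algebra, but every step runs on the server.
    print((a @ b + a.T).collect())
```

In this design, intermediate results such as `a @ b` stay on the backend, and data moves to the client only when `collect()` is called. That is the property that lets an interactive front end drive computations on matrices too large to hold locally.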
Acknowledgments.
This work is funded in part by China NSF Grants (No. 61572250 and No. 61223003) and the Jiangsu Province Industry Support Program (BE2014131).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, Z., Fan, S., Gu, R., Yuan, C., Huang, Y. (2015). iPLAR: Towards Interactive Programming with Parallel Linear Algebra in R. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_8
DOI: https://doi.org/10.1007/978-3-319-27140-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27139-2
Online ISBN: 978-3-319-27140-8
eBook Packages: Computer Science, Computer Science (R0)