
iPLAR: Towards Interactive Programming with Parallel Linear Algebra in R

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9531)

Abstract

R is a widely used statistical programming language in the data science community. In the big data era, however, R faces challenges from large-scale data analysis tasks: it lacks support for distributed linear algebra computation in its local interactive shell. In this paper, we propose iPLAR, a system that runs in the interactive R environment, wraps a high-performance parallel linear algebra library, and provides a set of easy-to-use interfaces. iPLAR adopts the client-server model to decouple the interactive shell from the ScaLAPACK/MPI distributed computing backend. In addition, it provides R users with a set of interfaces that hide the parallel details and resemble the native R linear algebra interfaces. We evaluate the efficiency of iPLAR with representative basic matrix operations and two widely used machine learning algorithms. Experimental results show that iPLAR achieves near-linear data scalability and extends the interactive processing capability of R to large problem scales.
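The client-server decoupling described in the abstract can be illustrated with a small sketch. This is not iPLAR's interface, which the abstract does not specify: the names `MatrixServer`, `RemoteMatrix`, `from_rows`, and `collect` are hypothetical, the "server" is an ordinary in-process object rather than a ScaLAPACK/MPI backend, and Python stands in for R. The sketch only shows the general idea of a client-side handle that keeps a native-looking operator while the data and the computation stay on the server side.

```python
import itertools


class MatrixServer:
    """Hypothetical stand-in for a compute backend: it owns all matrix data."""

    def __init__(self):
        self._store = {}
        self._ids = itertools.count(1)

    def put(self, rows):
        # Store a copy of the matrix and return an opaque identifier for it.
        matrix_id = next(self._ids)
        self._store[matrix_id] = [list(row) for row in rows]
        return matrix_id

    def matmul(self, a_id, b_id):
        # Multiply two stored matrices; the result also stays on the server.
        a, b = self._store[a_id], self._store[b_id]
        if len(a[0]) != len(b):
            raise ValueError("inner dimensions do not match")
        b_columns = list(zip(*b))
        product = [
            [sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a
        ]
        return self.put(product)

    def fetch(self, matrix_id):
        # Copy a stored matrix back to the caller.
        return [list(row) for row in self._store[matrix_id]]


class RemoteMatrix:
    """Hypothetical client-side handle: holds only an id, never the data."""

    def __init__(self, server, matrix_id):
        self._server = server
        self._matrix_id = matrix_id

    @classmethod
    def from_rows(cls, server, rows):
        return cls(server, server.put(rows))

    def __matmul__(self, other):
        # The familiar operator is kept; the work is forwarded to the server.
        result_id = self._server.matmul(self._matrix_id, other._matrix_id)
        return RemoteMatrix(self._server, result_id)

    def collect(self):
        # Explicitly pull the result into the interactive session.
        return self._server.fetch(self._matrix_id)


server = MatrixServer()
a = RemoteMatrix.from_rows(server, [[1, 2], [3, 4]])
b = RemoteMatrix.from_rows(server, [[5, 6], [7, 8]])
c = a @ b
print(c.collect())
```

In this sketch the intermediate result of `a @ b` never leaves the server until `collect()` is called, which is one common way to keep an interactive shell responsive when the matrices are large.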



Acknowledgments.

This work is funded in part by China NSF Grants (No. 61572250), Jiangsu Province Industry Support Program (BE2014131) and China NSF Grants (No. 61223003).

Author information


Corresponding author

Correspondence to Yihua Huang.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, Z., Fan, S., Gu, R., Yuan, C., Huang, Y. (2015). iPLAR: Towards Interactive Programming with Parallel Linear Algebra in R. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_8


  • DOI: https://doi.org/10.1007/978-3-319-27140-8_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27139-2

  • Online ISBN: 978-3-319-27140-8

  • eBook Packages: Computer Science, Computer Science (R0)
