Size Oblivious Programming with InfiniMem

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9519)

Abstract

Many recently proposed BigData processing frameworks make programming easier, but they typically expect datasets to fit in the memory of either a single multicore machine or a cluster of multicore machines. When this assumption does not hold, these frameworks fail. We introduce the InfiniMem framework, which enables size oblivious processing of large collections of objects that do not fit in memory by making them disk-resident. InfiniMem is easy to program with: the user just indicates the large collections of objects that are to be made disk-resident, while InfiniMem transparently handles their I/O management. The InfiniMem library can manage a very large number of objects in a uniform manner, even though the objects have different characteristics and relationships which, when processed, give rise to a wide range of access patterns requiring different organizations of data on the disk. We demonstrate the ease of programming and versatility of InfiniMem with 3 different probabilistic analytics algorithms and 3 different size oblivious graph processing frameworks; each requires minimal effort, 6–9 additional lines of code. We show that InfiniMem can successfully generate a mesh with 7.5 million nodes and 300 million edges (4.5 GB on disk) in 40 min, and that it performs the PageRank computation on a 14 GB graph with 134 million vertices and 805 million edges at 14 min per iteration on an 8-core machine with 8 GB RAM. Many graph generators and processing frameworks cannot handle such large graphs. We also exploit InfiniMem on a cluster to scale up an object-based DSM.
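
To make the idea of a disk-resident collection concrete, the following is a minimal sketch in Python. It is not InfiniMem's API, which this page does not show; the class `DiskBackedRecords`, its methods, the file name, and the fixed-size record layout are all hypothetical choices made for illustration. The sketch assumes every record has the same binary size, so that a record's byte offset can be computed directly from its ID and the collection never has to be held in memory as a whole.

```python
import os
import struct
import tempfile


class DiskBackedRecords:
    """Hypothetical fixed-size record store (illustration only, not InfiniMem)."""

    def __init__(self, path, fmt):
        self._codec = struct.Struct(fmt)   # fixed-size binary layout per record
        self._file = open(path, "w+b")     # create/truncate, allow read and write
        self._count = 0

    def __len__(self):
        return self._count

    def _check(self, record_id):
        if not 0 <= record_id < self._count:
            raise IndexError(record_id)

    def append(self, *fields):
        """Write one record at the end of the file and return its ID."""
        self._file.seek(self._count * self._codec.size)
        self._file.write(self._codec.pack(*fields))
        self._count += 1
        return self._count - 1

    def __getitem__(self, record_id):
        """Random access: the byte offset is derived from the record ID."""
        self._check(record_id)
        self._file.seek(record_id * self._codec.size)
        return self._codec.unpack(self._file.read(self._codec.size))

    def __setitem__(self, record_id, fields):
        """Overwrite one record in place on disk."""
        self._check(record_id)
        self._file.seek(record_id * self._codec.size)
        self._file.write(self._codec.pack(*fields))

    def close(self):
        self._file.close()


def demo():
    """Store 1000 (vertex_id, rank) records on disk and update one of them."""
    with tempfile.TemporaryDirectory() as tmp:
        # "<qd" = little-endian 64-bit integer followed by a 64-bit float
        ranks = DiskBackedRecords(os.path.join(tmp, "vertices.bin"), "<qd")
        for vertex_id in range(1000):
            ranks.append(vertex_id, 1.0 / 1000)
        ranks[42] = (42, 0.5)
        result = (len(ranks), ranks[42], ranks[999])
        ranks.close()
        return result


if __name__ == "__main__":
    print(demo())
```

Calling code indexes the collection like a list while each access turns into a seek and a read or write on the backing file. A real system would also need caching, batching, and support for variable-sized objects, none of which this sketch attempts.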

This work was supported by NSF Grants CCF-1524852, CCF-1318103, CNS-1157377, CCF-0963996, and CCF-0905509, and by a Google Research Award.

Author information

Corresponding author

Correspondence to Sai Charan Koduru.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Koduru, S.C., Gupta, R., Neamtiu, I. (2016). Size Oblivious Programming with InfiniMem. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science, vol 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_1

  • DOI: https://doi.org/10.1007/978-3-319-29778-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29777-4

  • Online ISBN: 978-3-319-29778-1

  • eBook Packages: Computer Science, Computer Science (R0)
