Abstract
Many recently proposed big data processing frameworks make programming easier, but they typically expect datasets to fit in the memory of either a single multicore machine or a cluster of multicore machines; when this assumption does not hold, these frameworks fail. We introduce the InfiniMem framework, which enables size-oblivious processing of large collections of objects that do not fit in memory by making them disk-resident. InfiniMem is easy to program with: the user merely indicates which large collections of objects are to be made disk-resident, and InfiniMem transparently handles their I/O management. The InfiniMem library manages a very large number of objects in a uniform manner, even though the objects have different characteristics and relationships that, when processed, give rise to a wide range of access patterns requiring different organizations of data on disk. We demonstrate the ease of programming and the versatility of InfiniMem with three probabilistic analytics algorithms and three size-oblivious graph processing frameworks; each requires minimal effort, just 6–9 additional lines of code. We show that InfiniMem can generate a mesh with 7.5 million nodes and 300 million edges (4.5 GB on disk) in 40 minutes, and that it performs the PageRank computation on a 14 GB graph with 134 million vertices and 805 million edges at 14 minutes per iteration on an 8-core machine with 8 GB RAM; many graph generators and processing frameworks cannot handle graphs of this size. We also exploit InfiniMem on a cluster to scale up an object-based DSM.
This work was supported by NSF Grants CCF-1524852, CCF-1318103, CNS-1157377, CCF-0963996, and CCF-0905509, and by a Google Research Award.
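The abstract describes the core programming model: the user only marks which object collections may outgrow memory, and the library transparently spills them to disk and fetches them on access. The following C++ sketch illustrates that idea in miniature; the container name DiskBackedVector, its put/get methods, and the Vertex record are hypothetical illustrations and are not the InfiniMem API.

```cpp
// Minimal sketch of size-oblivious storage, assuming fixed-size,
// trivially copyable records. Hypothetical names; not the InfiniMem API.
#include <cstdio>
#include <cstdint>
#include <string>
#include <stdexcept>
#include <type_traits>

template <typename T>
class DiskBackedVector {
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch assumes fixed-size, trivially copyable records");
public:
  explicit DiskBackedVector(const std::string& path) {
    file_ = std::fopen(path.c_str(), "w+b");   // backing file on disk
    if (!file_) throw std::runtime_error("cannot open backing file");
  }
  ~DiskBackedVector() { if (file_) std::fclose(file_); }

  // Write record i directly at its byte offset in the backing file.
  void put(uint64_t i, const T& value) {
    std::fseek(file_, static_cast<long>(i * sizeof(T)), SEEK_SET);
    std::fwrite(&value, sizeof(T), 1, file_);
  }

  // Read record i back from disk; only one record is resident at a time.
  T get(uint64_t i) const {
    T value{};
    std::fseek(file_, static_cast<long>(i * sizeof(T)), SEEK_SET);
    if (std::fread(&value, sizeof(T), 1, file_) != 1)
      throw std::out_of_range("record not present");
    return value;
  }

private:
  std::FILE* file_ = nullptr;
};

struct Vertex { uint64_t id; double rank; };   // fixed-size example record

int main() {
  DiskBackedVector<Vertex> vertices("/tmp/vertices.bin");
  for (uint64_t i = 0; i < 1000; ++i)
    vertices.put(i, Vertex{i, 1.0});           // spill records to disk
  Vertex v = vertices.get(42);                 // fetch one record on demand
  std::printf("vertex %llu rank %.2f\n",
              static_cast<unsigned long long>(v.id), v.rank);
  return 0;
}
```

A production design would batch I/O, cache hot records in memory, and choose the on-disk layout per access pattern (e.g., sequential for fixed-size vertex data versus indexed for variable-size adjacency lists); this sketch only shows the disk-resident collection abstraction the abstract refers to.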
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Koduru, S.C., Gupta, R., Neamtiu, I. (2016). Size Oblivious Programming with InfiniMem. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science, vol. 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_1
DOI: https://doi.org/10.1007/978-3-319-29778-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29777-4
Online ISBN: 978-3-319-29778-1