DOI: 10.1145/2038916.2038929
research-article

PrIter: a distributed framework for prioritized iterative computations

Published: 26 October 2011

Abstract

Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, and recommendation systems. These cloud applications typically involve massive data sets, so fast convergence of the iterative computation is essential. In this paper, we explore opportunities for accelerating iterative computations and propose PrIter, a distributed computing framework that enables fast iterative computation by supporting prioritized iteration. Instead of performing computations on all data records indiscriminately, PrIter prioritizes the computations that contribute most to convergence, which significantly improves the convergence speed of the iterative process. We evaluate PrIter on a local cluster of machines as well as on the Amazon EC2 cloud. The results show that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.
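The idea of prioritizing the computations that help convergence most can be illustrated with a minimal single-machine sketch. It assumes an accumulative (delta-based) formulation of PageRank, in which each node holds a pending delta, absorbs it into its rank, and forwards a damped share to its out-neighbours; the size of the pending delta is used as the priority. The function names `delta_pagerank` and `power_pagerank`, the damping factor d = 0.8, the adjacency-list input, and the "top-k nodes per round" selection are all illustrative assumptions, not PrIter's programming interface or its distributed engine.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical adjacency-list representation: node -> list of out-neighbours.
# Every out-neighbour must also appear as a key.
Graph = Dict[int, List[int]]


def delta_pagerank(graph: Graph, d: float = 0.8, tol: float = 1e-10,
                   k: Optional[int] = None) -> Tuple[Dict[int, float], int]:
    """Accumulative (delta-based) PageRank for R = (1 - d) + d * sum(R_u / outdeg(u)).

    k=None -> uniform: every node with a pending delta is processed each round.
    k=int  -> prioritized: only the k nodes with the largest pending delta are processed.
    Returns the ranks and the total number of node updates performed.
    """
    rank = {v: 0.0 for v in graph}
    delta = {v: 1.0 - d for v in graph}  # pending contribution not yet absorbed
    updates = 0
    while sum(delta.values()) > tol:
        pending = [v for v in graph if delta[v] > 0.0]
        if k is None:
            batch = pending
        else:
            # Priority = size of the pending delta (an assumption of this sketch).
            batch = heapq.nlargest(k, pending, key=lambda v: delta[v])
        for v in batch:
            dv, delta[v] = delta[v], 0.0
            rank[v] += dv  # absorb the pending change into the rank
            out = graph[v]
            for w in out:  # forward a damped share to each out-neighbour
                delta[w] += d * dv / len(out)
            updates += 1
    return rank, updates


def power_pagerank(graph: Graph, d: float = 0.8, iters: int = 200) -> Dict[int, float]:
    """Plain synchronous iteration of the same equation, used as a reference."""
    rank = {v: 1.0 - d for v in graph}
    for _ in range(iters):
        nxt = {v: 1.0 - d for v in graph}
        for u, out in graph.items():
            for w in out:
                nxt[w] += d * rank[u] / len(out)
        rank = nxt
    return rank


if __name__ == "__main__":
    g = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    ref = power_pagerank(g)
    for name, k in (("uniform", None), ("prioritized k=1", 1)):
        ranks, n = delta_pagerank(g, k=k)
        err = max(abs(ranks[v] - ref[v]) for v in g)
        print(f"{name}: {n} node updates, max error {err:.2e}")
```

Because every delta is non-negative and is absorbed exactly once, the processing order does not change the fixed point; prioritization only changes how quickly the pending mass drains. The update counts printed for this four-node toy graph say nothing about PrIter's distributed performance.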

Information

Published In

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
October 2011
377 pages
ISBN: 9781450309769
DOI: 10.1145/2038916
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MapReduce
  2. PrIter
  3. distributed framework
  4. iterative algorithms
  5. prioritized iteration

Qualifiers

  • Research-article

Conference

SOCC '11

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Cited By

  • (2024) RAGraph: A Region-Aware Framework for Geo-Distributed Graph Processing. Proceedings of the VLDB Endowment 17:3 (264-277). DOI: 10.14778/3632093.3632094. Online publication date: 20-Jan-2024.
  • (2024) Towards Efficient Graph Processing in Geo-Distributed Data Centers. IEEE Transactions on Parallel and Distributed Systems 35:11 (2147-2160). DOI: 10.1109/TPDS.2024.3453872. Online publication date: Nov-2024.
  • (2024) Fast Iterative Graph Computing with Updated Neighbor States. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2449-2462). DOI: 10.1109/ICDE60146.2024.00193. Online publication date: 13-May-2024.
  • (2024) Ingress: an automated incremental graph processing system. The VLDB Journal 33:3 (781-806). DOI: 10.1007/s00778-024-00838-z. Online publication date: 20-Feb-2024.
  • (2023) An efficient hardware accelerator for monotonic graph algorithms on dynamic directed graphs. SCIENTIA SINICA Informationis 53:8 (1575). DOI: 10.1360/SSI-2022-0191. Online publication date: 15-Aug-2023.
  • (2022) CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs. IEEE Transactions on Knowledge and Data Engineering 34:9 (4119-4132). DOI: 10.1109/TKDE.2020.3038109. Online publication date: 1-Sep-2022.
  • (2022) Indexing hs code- a hybrid indexer for an optimized search of geotagged data. Spatial Information Research 31:1 (1-13). DOI: 10.1007/s41324-022-00473-2. Online publication date: 20-Aug-2022.
  • (2022) Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach. Journal of Computer Science and Technology 37:4 (797-813). DOI: 10.1007/s11390-022-2101-1. Online publication date: 30-Jul-2022.
  • (2021) Automating incremental graph processing with flexible memoization. Proceedings of the VLDB Endowment 14:9 (1613-1625). DOI: 10.14778/3461535.3461550. Online publication date: 22-Oct-2021.
  • (2021) Handling Iterations in Distributed Dataflow Systems. ACM Computing Surveys 54:9 (1-38). DOI: 10.1145/3477602. Online publication date: 8-Oct-2021.
