research-article

PrIter: a distributed framework for prioritized iterative computations

Authors:

Cuirong Wang

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

Article No.: 13, Pages 1 - 14

https://doi.org/10.1145/2038916.2038929

Published: 26 October 2011

Abstract

Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, and recommendation systems. These cloud applications typically involve data sets of massive scale, so fast convergence of the iterative computation is essential. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by supporting prioritized iteration. Instead of performing computations on all data records without discrimination, PrIter prioritizes the computations that help convergence the most, so that the convergence speed of the iterative process is significantly improved. We evaluate PrIter on a local cluster of machines as well as on the Amazon EC2 cloud. The results show that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.
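To make the idea of prioritized iteration concrete, the following is a minimal single-machine sketch, not PrIter's actual API or implementation. It assumes a delta-based formulation of PageRank as the example algorithm and uses each node's pending change as its priority; both choices are illustrative assumptions, as are the names `prioritized_pagerank`, `damping`, `batch_size`, and `tol`. In each round only the few nodes with the largest pending change are updated, rather than every record.

```python
import heapq


def prioritized_pagerank(graph, damping=0.8, batch_size=2, tol=1e-6,
                         max_rounds=100_000):
    """Delta-based PageRank that updates only the highest-priority nodes.

    `graph` maps each node to a list of its out-neighbors; every neighbor
    must also appear as a key. Scores are unnormalized. All names and the
    priority rule are hypothetical, chosen for illustration only.
    """
    rank = {v: 0.0 for v in graph}             # accumulated score so far
    delta = {v: 1.0 - damping for v in graph}  # pending, not yet applied change

    for _ in range(max_rounds):
        # Priority = size of the pending change; take the top few nodes.
        top = heapq.nlargest(batch_size, delta, key=delta.get)
        if not top or delta[top[0]] < tol:
            break  # even the largest pending change is negligible
        for v in top:
            d = delta[v]
            delta[v] = 0.0
            rank[v] += d                       # apply the pending change
            out = graph[v]
            if out:
                share = damping * d / len(out)
                for u in out:
                    delta[u] += share          # propagate to out-neighbors
    return rank


if __name__ == "__main__":
    toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for node, score in sorted(prioritized_pagerank(toy).items()):
        print(node, round(score, 4))
```

In this sketch, nodes whose pending change is small are simply skipped until their accumulated change becomes large enough to matter, which is one way to read "prioritizing the computations that help convergence the most". A distributed framework would additionally have to partition the state and exchange the propagated changes between workers, which is omitted here.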

Index Terms

PrIter: a distributed framework for prioritized iterative computations
1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage

Published In

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

October 2011

377 pages

ISBN: 9781450309769

DOI: 10.1145/2038916

Program Chairs: Jeffrey S. Chase (Duke University), Amr El Abbadi (Univ of California, Santa Barbara)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Division of Computing and Communication Foundations

Conference

SOCC '11: ACM Symposium on Cloud Computing in conjunction with SOSP 2011

October 26 - 28, 2011

Cascais, Portugal

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Cited By

Yao F, Tao Q, Yu W, Zhang Y, Gong S, Wang Q, Yu G, Zhou J (2024). RAGraph: A Region-Aware Framework for Geo-Distributed Graph Processing. Proceedings of the VLDB Endowment, 17:3 (264-277). Online publication date: 20-Jan-2024.
https://doi.org/10.14778/3632093.3632094

Yao F, Tao Q, Lin S, Zhang Y, Yu W, Gong S, Wang Q, Yu G, Zhou J (2024). Towards Efficient Graph Processing in Geo-Distributed Data Centers. IEEE Transactions on Parallel and Distributed Systems, 35:11 (2147-2160). Online publication date: Nov-2024.
https://doi.org/10.1109/TPDS.2024.3453872

Zhou Y, Gong S, Yao F, Chen H, Yu S, Liu P, Zhang Y, Yu G, Yu J (2024). Fast Iterative Graph Computing with Updated Neighbor States. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2449-2462). Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00193

Gong S, Tian C, Yin Q, Wang Z, Yu S, Zhang Y, Yu W, Geng L, Fu C, Yu G, Zhou J (2024). Ingress: an automated incremental graph processing system. The VLDB Journal, 33:3 (781-806). Online publication date: 20-Feb-2024.
https://doi.org/10.1007/s00778-024-00838-z

Yang Y, Yu H, Zhao J, Zhang Y, Liao X, Jiang X, Jin H, Liu H, Mao F, Zhang J, Wang B (2023). An efficient hardware accelerator for monotonic graph algorithms on dynamic directed graphs. SCIENTIA SINICA Informationis, 53:8 (1575). Online publication date: 15-Aug-2023.
https://doi.org/10.1360/SSI-2022-0191

Miao X, Ma L, Yang Z, Shao Y, Cui B, Yu L, Jiang J (2022). CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs. IEEE Transactions on Knowledge and Data Engineering, 34:9 (4119-4132). Online publication date: 1-Sep-2022.
https://doi.org/10.1109/TKDE.2020.3038109

Singh B, Srivastava K, Gupta D, Choudhury T, Um J (2022). Indexing hs code- a hybrid indexer for an optimized search of geotagged data. Spatial Information Research, 31:1 (1-13). Online publication date: 20-Aug-2022.
https://doi.org/10.1007/s41324-022-00473-2

Yu H, Jiang X, Zhao J, Qi H, Zhang Y, Liao X, Liu H, Mao F, Jin H (2022). Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach. Journal of Computer Science and Technology, 37:4 (797-813). Online publication date: 30-Jul-2022.
https://doi.org/10.1007/s11390-022-2101-1

Gong S, Tian C, Yin Q, Yu W, Zhang Y, Geng L, Yu S, Yu G, Zhou J (2021). Automating incremental graph processing with flexible memoization. Proceedings of the VLDB Endowment, 14:9 (1613-1625). Online publication date: 22-Oct-2021.
https://dl.acm.org/doi/10.14778/3461535.3461550

Gévay G, Soto J, Markl V (2021). Handling Iterations in Distributed Dataflow Systems. ACM Computing Surveys, 54:9 (1-38). Online publication date: 8-Oct-2021.
https://dl.acm.org/doi/10.1145/3477602
