Abstract
Iterative computing is pervasive in web applications, data mining, and scientific computing. Many parallel algorithms for such applications are synchronous: they require strict synchronization between iterations to ensure correctness, which makes their performance sensitive to computational skew in each iteration. Current load-balancing approaches can alleviate the effect of computational skew but cannot eliminate it. As a result, for many applications the skew in each iteration persists and accumulates across iterations, seriously lengthening completion time. In this paper, we propose an effective approach that enables synchronous iterative applications themselves to tolerate the negative effects of unresolved computational skew. The approach divides the large computational task on a computing node or worker into a number of sub-tasks, each of which depends only on the states of a few objects from the previous iteration. This allows sub-tasks of subsequent iterations to proceed in advance, as soon as the states of the data objects they depend on are available. Consequently, the idle time caused by strict synchronization is reduced and overall performance improves. Experimental results show that this approach improves overall performance by up to \(2.45\times\) in comparison with state-of-the-art approaches.
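The effect of decomposition can be illustrated with a toy timing model. The sketch below is not the AsyIter implementation; it is a minimal simulation under stated assumptions: objects form a 1-D chain, each sub-task needs only the previous-iteration states of its object and its immediate neighbours, every worker processes its own objects in index order, and the skew alternates between two workers from one iteration to the next. All names (`neighbours`, `barrier_makespan`, `decomposed_makespan`, `example_owner`, `example_cost`) are hypothetical.

```python
from typing import Callable, List


def neighbours(i: int, n: int) -> List[int]:
    """Objects whose previous-iteration state sub-task i needs (assumed 1-D chain)."""
    return [j for j in (i - 1, i, i + 1) if 0 <= j < n]


def barrier_makespan(owner: Callable[[int], int], cost: Callable[[int, int], float],
                     n: int, workers: int, iterations: int) -> float:
    """Strict synchronization: every iteration lasts as long as its slowest worker."""
    total = 0.0
    for t in range(1, iterations + 1):
        loads = [0.0] * workers
        for i in range(n):
            loads[owner(i)] += cost(i, t)
        total += max(loads)
    return total


def decomposed_makespan(owner: Callable[[int], int], cost: Callable[[int, int], float],
                        n: int, workers: int, iterations: int) -> float:
    """Per-object sub-tasks: a sub-task starts once its worker is free and the
    previous-iteration states it depends on have been produced."""
    prev_finish = [0.0] * n        # iteration-0 states are available at time 0
    free_at = [0.0] * workers      # time at which each worker becomes idle
    for t in range(1, iterations + 1):
        finish = [0.0] * n
        for i in range(n):         # each worker sees its objects in index order
            w = owner(i)
            ready = max(prev_finish[j] for j in neighbours(i, n))
            start = max(free_at[w], ready)
            finish[i] = start + cost(i, t)
            free_at[w] = finish[i]
        prev_finish = finish
    return max(free_at)


def example_owner(i: int) -> int:
    """Assumed partitioning: objects 0-3 on worker 0, objects 4-7 on worker 1."""
    return i // 4


def example_cost(i: int, t: int) -> float:
    """Assumed skew: the slow worker alternates from one iteration to the next."""
    return 2.0 if (example_owner(i) + t) % 2 == 0 else 1.0


if __name__ == "__main__":
    print(barrier_makespan(example_owner, example_cost, 8, 2, 2))     # 16.0
    print(decomposed_makespan(example_owner, example_cost, 8, 2, 2))  # 12.0
```

In this example the barrier version pays for the slowest worker in every iteration (8 + 8 time units), whereas the decomposed version lets the lightly loaded worker begin its second-iteration sub-tasks while the other worker is still finishing the first iteration. The fixed per-worker processing order is a simplification; a scheduler that picks any ready sub-task could behave differently.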













Acknowledgments
This work was supported by the National High-tech Research and Development Program of China (863 Program) under Grant No. 2012AA010905, the China National Natural Science Foundation under Grant Nos. 61322210 and 61272408, the Doctoral Fund of the Ministry of Education of China under Grant No. 20130142110048, and the Natural Science Foundation of Hubei under Grant No. 2012FFA007.
Cite this article
Zhang, Y., Liao, X., Jin, H. et al. AsyIter: tolerating computational skew of synchronous iterative applications via computing decomposition. Knowl Inf Syst 41, 379–400 (2014). https://doi.org/10.1007/s10115-014-0748-9
DOI: https://doi.org/10.1007/s10115-014-0748-9