Load Balancing in MapReduce Based on Data Locality

Chen, Yi; Liu, Zhaobin; Wang, Tingting; Wang, Lu

doi:10.1007/978-3-319-11197-1_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8630)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

With the explosive growth of data in the information era, MapReduce, a programming model that processes data in parallel, has been widely adopted. However, the original system has gradually exposed some shortcomings. For example, handling skewed data can unbalance the system load. After a mapper processes its data, the results are sent to reducers according to a partition function. An inappropriate partition algorithm may degrade network performance, overload some reducers, and lengthen job execution time. In short, processing skewed data with an unsuitable partition algorithm harms system performance. To solve the load imbalance problem and improve cluster performance, we design an effective partition algorithm to guide the assignment of data. The algorithm, named CLP (Cluster Locality Partition), consists of three parts: the Preprocess part, the Data-Cluster part, and the Locality-Partition part. Experimental results show that the proposed algorithm outperforms the default partition algorithm in both execution time and load balancing.
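The load imbalance described in the abstract can be illustrated with a small sketch. It is not the CLP algorithm, whose details the abstract does not give; it only mimics a default-style hash partitioner (key hash modulo the number of reducers) on a skewed key distribution. The function names, the use of CRC32 as the hash, and the sample data are all assumptions made for illustration.

```python
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Default-style partitioner: deterministic hash of the key, modulo reducers.

    CRC32 is an illustrative stand-in for the framework's own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records, num_reducers: int):
    """Count how many (key, value) records each reducer would receive."""
    loads = [0] * num_reducers
    for key, _value in records:
        loads[hash_partition(key, num_reducers)] += 1
    return loads


def imbalance(loads) -> float:
    """Ratio of the heaviest reducer's load to the mean load (1.0 is perfect)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical skewed map output: one "hot" key carries 90% of the records.
skewed = [("hot", 1)] * 900 + [(f"k{i}", 1) for i in range(100)]
loads = reducer_loads(skewed, num_reducers=4)
print("records per reducer:", loads)
print("imbalance factor:", round(imbalance(loads), 2))
```

Because every record with the same key must go to the same reducer, the reducer that receives the hot key handles at least 900 of the 1000 records regardless of how the remaining keys are spread. This is the kind of skew a partitioner that accounts for key frequency and data locality aims to reduce.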

Author information

Authors and Affiliations

School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, P.R. China
Yi Chen, Zhaobin Liu & Tingting Wang
China Academy of Civil Aviation Science and Technology, Beijing, 100028, P.R. China
Lu Wang

Editor information

Editors and Affiliations

Department of Computer Science, Illinois Institute of Technology, 60616-3793, Chicago, IL, USA
Xian-he Sun
School of Computer Science and Technology, Dalian Maritime University, 1 Linghai Road, 116026, Dalian, China
Wenyu Qu
University of Ottawa, SEECS, 8, King Edward Ave, K1N 6N5, Ottawa, ON, Canada
Ivan Stojmenovic
Deakin University, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Wanlei Zhou
Dalian Maritime University, No. 1 Linghai Road, 116026, Dalian, China
Zhiyang Li & Tingting Yang
BeiHang University, XueYuan Road No. 37, HaiDian District, Beijing, China
Hua Guo
University of Bradford, BD7 1DP, Bradford, West Yorkshire, United Kingdom
Geyong Min
Computer Network Information Center, Chinese Academy of Sciences, 100190, Beijing, China
Yulei Wu
27 Shanda Nanlu, 250100, Jinan City, Shandong Province, China
Lei Liu

About this paper

Cite this paper

Chen, Y., Liu, Z., Wang, T., Wang, L. (2014). Load Balancing in MapReduce Based on Data Locality. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8630. Springer, Cham. https://doi.org/10.1007/978-3-319-11197-1_18

DOI: https://doi.org/10.1007/978-3-319-11197-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11196-4
Online ISBN: 978-3-319-11197-1
eBook Packages: Computer Science, Computer Science (R0)