Skip to main content
Log in

SMART: Speedup Job Completion Time by Scheduling Reduce Tasks

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Distributed computing systems have been widely used as the amount of data grows exponentially in the era of information explosion. Job completion time (JCT) is a major metric for assessing their effectiveness. How to reduce the JCT for these systems through reasonable scheduling has become a hot issue in both industry and academia. Data skew is a common phenomenon that can compromise the performance of such distributed computing systems. This paper proposes SMART, which can effectively reduce the JCT through handling the data skew during the reducing phase. SMART predicts the size of reduce tasks based on part of the completed map tasks and then enforces largest-first scheduling in the reducing phase according to the predicted reduce task size. SMART makes minimal modifications to the original Hadoop with only 20 additional lines of code and is readily deployable. The robustness and the effectiveness of SMART have been evaluated with a real-world cluster against a large number of datasets. Experiments show that SMART reduces JCT by up to 6.47%, 9.26%, and 13.66% for Terasort, WordCount and InvertedIndex respectively with the Purdue MapReduce benchmarks suite (PUMA) dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Grandl R, Kandula S, Rao S, Akella A, Kulkarni J. Graphene: Packing and dependency-aware scheduling for data-parallel clusters. In Proc. the 12th USENIX Conference on Operating Systems Design and Implementation, Nov. 2016, pp.81-97.

  2. Chang H, Kodialam M, Kompella R R, Lakshman T V, Lee M, Mukherjee S. Scheduling in MapReduce-like systems for fast completion time. In Proc. the 30th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, Apr. 2011, pp.3074-3082. DOI: 10.1109/IN-FCOM.2011.5935152.

  3. Peng Y, Chen K, Wang G, Bai W, Zhao Y, Wang H, Geng Y, Ma Z, Gu L. Towards comprehensive traffic forecasting in cloud computing: Design and application. IEEE/ACM Transactions on Networking, 2016, 24(4): 2210-2222. DOI: https://doi.org/10.1109/TNET.2015.2458892.

    Article  Google Scholar 

  4. Ullah I, Khan M S, Amir M, Kim J, Kim S M. LSTPD: Least slack time-based preemptive deadline constraint scheduler for Hadoop clusters. IEEE Access, 2020, 8: 111751-111762. DOI: https://doi.org/10.1109/ACCESS.2020.3002565.

    Article  Google Scholar 

  5. Gao Y, Zhou Y, Zhou B, Shi L, Zhang J. Handling data skew in MapReduce cluster by using partition tuning. Journal of Healthcare Engineering, 2017, 2017: Article No. 1425102. DOI: https://doi.org/10.1155/2017/1425102.

  6. Hammoud M, Sakr M F. Locality-aware reduce task scheduling for MapReduce. In Proc. the 3rd IEEE International Conference on Cloud Computing Technology and Science, Nov. 29-Dec. 1, 2011, pp.570-576. DOI: 10.1109/CloudCom.2011.87.

  7. Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Commun. ACM, 2008, 51(1): 107-113. DOI: https://doi.org/10.1145/1327452.1327492.

    Article  Google Scholar 

  8. Ahmad F, Lee S, Thottethodi M, Vijaykumar T. PUMA: Purdue MapReduce benchmarks suite. Technical Report, Purdue University, 2012. https://engineering.purdue.edu/~puma/puma.pdf, May 2022.

  9. Graham R L. Bounds for certain multiprocessing anomalies. The Bell System Technical Journal, 1966, 45(9): 1563-1581. DOI: https://doi.org/10.1002/j.1538-7305.1966.tb01709.x.

    Article  MATH  Google Scholar 

  10. Mosharaf C, Ion S. Coow: A networking abstraction for cluster applications. In Proc. the 11th ACM Workshop on Hot Topics in Networks, Oct. 2012, pp.31-36. DOI: 10.1145/2390231.2390237.

  11. Kwon Y, Balazinska M, Howe B, Rolia J. Skew-Tune: Mitigating skew in MapReduce applications. In Proc. the 2012 ACM SIGMOD International Conference on Management of Data, May 2012, pp.25-36. DOI: 10.1145/2213836.2213840.

  12. Chowdhury M, Zhong Y, Stoica I. Efficient coow scheduling with Varys. ACM SIGCOMM Comput. Commun. Rev., 2014, 44(4): 443-454. DOI: https://doi.org/10.1145/2740070.2626315.

    Article  Google Scholar 

  13. Chowdhury M, Stoica I. Efficient coow scheduling without prior knowledge. In Proc. the 2015 ACM Conference on Special Interest Group on Data Communication, Aug. 2015, pp.393-406. DOI: 10.1145/2785956.2787480.

  14. Mao H, Schwarzkopf M, Venkatakrishnan S B, Meng Z, Alizadeh M. Learning scheduling algorithms for data processing clusters. In Proc. the ACM Special Interest Group on Data Communication, Aug. 2019, pp.270-288. DOI: https://doi.org/10.1145/3341302.3342080.

  15. Nguyen K, Wang K, Bu Y, Fang L, Hu J, Xu G. FACADE: A compiler and runtime for (almost) object-bounded big data applications. In Proc. the 20th International Conference on Architectural Support for Programming Languages and Operating Systems, Mar. 2015, pp.675-690. DOI: 10.1145/2694344.2694345.

  16. Nguyen K, Fang L, Xu G, Demsky B, Lu S, Alamian S, Mutlu O. Yak: A high-performance big-data-friendly garbage collector. In Proc. the 12th USENIX Symposium on Operating Systems Design and Implementation, November 2016, pp.349-365.

  17. Rasmussen A, Lam V T, Conley M, Porter G, Kapoor R, Vahdat A. Themis: An I/O-efficient MapReduce. In Proc. the 3rd ACM Symposium on Cloud Computing, Oct. 2012, Article No. 13. DOI: https://doi.org/10.1145/2391229.2391242.

  18. Rao S, Ramakrishnan R, Silberstein A, Ovsiannikov M, Reeves D. Sailfish: A framework for large scale data processing. In Proc. the 3rd ACM Symposium on Cloud Computing, Oct. 2012, Article No. 4. DOI: https://doi.org/10.1145/2391229.2391233.

  19. Zhang H, Cho B, Seyfe E, Ching A, Freedman M J. Riffle: Optimized shuffle service for large-scale data analytics. In Proc. the 13th EuroSys Conference, Apr. 2018, Article No. 43. DOI: https://doi.org/10.1145/3190508.3190534.

  20. Zaharia M, Borthakur D, Sarma S J, Elmeleegy K, Shenker S, Stoica I. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proc. the 5th European Conference on Computer Systems, Apr. 2010, pp.265-278. DOI: 10.1145/1755913.1755940.

  21. Ibrahim S, Jin H, Lu L, He B, Antoniu G, Wu S. Maestro: Replica-aware map scheduling for MapReduce. In Proc. the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2012, pp.435-442. DOI: 10.1109/CCGrid.2012.122.

  22. Tang Z, Jiang L, Zhou J, Li K, Li K. A self-adaptive scheduling algorithm for reduce start time. Future Generation Computer Systems, 2015, 43/44: 51-60. DOI: https://doi.org/10.1016/j.future.2014.08.011.

    Article  Google Scholar 

  23. Ibrahim S, Jin H, Lu L, Wu S, He B, Qi L. LEEN: Locality/fairness-aware key partitioning for MapReduce in the cloud. In Proc. the 2nd IEEE International Conference on Cloud Computing Technology and Science, Nov. 30-Dec. 3, 2010, pp.17-24. DOI: 10.1109/CloudCom.2010.25.

  24. Tan J, Meng X, Zhang L. Coupling task progress for MapReduce resource-aware scheduling. In Proc. the 2013 IEEE INFOCOM, Apr. 2013, pp.1618-1626. DOI: 10.1109/INFCOM.2013.6566958.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wan-Chun Dou.

Supplementary Information

ESM 1

(PDF 155 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Dong, JQ., He, ZH., Gong, YY. et al. SMART: Speedup Job Completion Time by Scheduling Reduce Tasks. J. Comput. Sci. Technol. 37, 763–778 (2022). https://doi.org/10.1007/s11390-022-2118-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-022-2118-5

Keywords

Navigation