DMR: A Deterministic MapReduce for Multicore Systems

Zhang, Yu; Cao, Huifang

doi:10.1007/s10766-015-0390-5

Published: 06 October 2015

International Journal of Parallel Programming
Volume 45, pages 128–141 (2017)

Abstract

MapReduce has shown promise for harnessing multicore platforms. Existing MapReduce libraries for multicore systems are written with shared-memory Pthreads, which introduce pervasive nondeterminism and may produce nondeterministic results if the user-provided map or reduce functions are sensitive to input order. We propose DMR, a deterministic MapReduce library that ensures deterministic program behavior regardless of whether the map or reduce functions are sensitive to input order. DMR schedules map tasks round-robin and reduce tasks by partition to ensure deterministic scheduling. DMR is written with a deterministic message-passing multithreaded model (DetMP) and provides a Phoenix-like API, so Phoenix workloads can be built and run on DMR with little or no change. An evaluation on seven Phoenix workloads shows that DMR runs slower than Phoenix only on kmeans, an iterative MapReduce application; runs 1.42X to 3.33X faster than Phoenix on pca and word_count; and scales better than Phoenix on 3 of the remaining 4 workloads.
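
The scheduling idea in the abstract can be illustrated with a small sketch. This is not DMR's implementation or API: the function names (`assign_round_robin`, `partition_of`, `run`), the byte-sum partition function, and the sequential execution are all assumptions made for illustration. It shows only how a fixed task-to-worker assignment and a fixed key-to-reducer mapping can make the result independent of the worker count, even when the reduce function is order-sensitive.

```python
from collections import defaultdict


def assign_round_robin(num_tasks, num_workers):
    """Map task i always goes to worker i % num_workers, regardless of timing."""
    plan = [[] for _ in range(num_workers)]
    for task_id in range(num_tasks):
        plan[task_id % num_workers].append(task_id)
    return plan


def partition_of(key, num_reducers):
    """Fixed key-to-reducer mapping (hypothetical choice: sum of UTF-8 bytes)."""
    return sum(key.encode("utf-8")) % num_reducers


def run(chunks, map_fn, reduce_fn, num_workers=2, num_reducers=2):
    """Sequentially simulate a MapReduce job with a deterministic schedule."""
    plan = assign_round_robin(len(chunks), num_workers)

    # Each worker processes its own tasks; output is stored per task id.
    emitted = {}
    for worker_tasks in plan:
        for task_id in worker_tasks:
            emitted[task_id] = list(map_fn(chunks[task_id]))

    # Merge in task-id order so every reducer sees values in a fixed order.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for task_id in range(len(chunks)):
        for key, value in emitted[task_id]:
            partitions[partition_of(key, num_reducers)][key].append(value)

    result = {}
    for part in partitions:
        for key in sorted(part):
            result[key] = reduce_fn(key, part[key])
    return result


def tag_words(chunk):
    """Example map function: emit (word, chunk tag) for each word."""
    tag, text = chunk
    for word in text.split():
        yield word, tag


def concat(key, values):
    """Example order-sensitive reduce function: string concatenation."""
    return "".join(values)


if __name__ == "__main__":
    chunks = [("a", "x y"), ("b", "y z"), ("c", "x z")]
    print(run(chunks, tag_words, concat))
```

The byte-sum partition function is used instead of Python's built-in `hash()` because string hashes are randomized per process by default, which would itself make the partitioning nondeterministic across runs.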

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61170018 and 61229201, and by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, China
Yu Zhang & Huifang Cao

Corresponding author

Correspondence to Yu Zhang.

About this article

Cite this article

Zhang, Y., Cao, H. DMR: A Deterministic MapReduce for Multicore Systems. Int J Parallel Prog 45, 128–141 (2017). https://doi.org/10.1007/s10766-015-0390-5

Received: 30 March 2015
Accepted: 20 May 2015
Published: 06 October 2015
Issue Date: February 2017
DOI: https://doi.org/10.1007/s10766-015-0390-5
