DMR: A Deterministic MapReduce for Multicore Systems

International Journal of Parallel Programming

Abstract

MapReduce has shown promise for harnessing multicore platforms. Existing MapReduce libraries for multicore systems are written with shared-memory Pthreads, which introduce pervasive nondeterminism and can produce nondeterministic results if the user-provided map or reduce functions are sensitive to input order. We propose DMR, a deterministic MapReduce library that ensures deterministic program behavior regardless of whether the map or reduce function is sensitive to input order. DMR schedules map tasks round-robin and reduce tasks by partition to ensure deterministic scheduling. DMR is written on a deterministic message-passing multithreaded model (DetMP) and provides a Phoenix-like API, so Phoenix workloads can be built and run on DMR with little or no change. An evaluation on seven Phoenix workloads shows that DMR runs slower than Phoenix only on kmeans, an iterative MapReduce application; is between 1.42X and 3.33X faster than Phoenix on pca and word_count; and scales better than Phoenix on 3 of the remaining 4 workloads.
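To make the scheduling idea concrete, the following is a minimal sketch of how a fixed round-robin assignment of map tasks and a fixed key-to-partition mapping for reduce tasks can give an order-sensitive reduce function the same input order on every run. It is not DMR's implementation and does not use DetMP or the Phoenix API: it runs sequentially, and every name in it (`round_robin_assign`, `partition_of`, `deterministic_mapreduce`, `emit_words`, `join_in_order`) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from zlib import crc32


def round_robin_assign(num_tasks, num_workers):
    """Map task i always goes to worker i % num_workers (assumed policy)."""
    return [list(range(w, num_tasks, num_workers)) for w in range(num_workers)]


def partition_of(key, num_reducers):
    """Stable key-to-reducer mapping; crc32 is used because Python's
    built-in hash() for str can differ between processes."""
    return crc32(key.encode("utf-8")) % num_reducers


def deterministic_mapreduce(chunks, map_fn, reduce_fn,
                            num_workers=2, num_reducers=2):
    assignment = round_robin_assign(len(chunks), num_workers)

    # Map phase: outputs are stored by task index, not by completion order.
    task_outputs = [None] * len(chunks)
    for worker_tasks in assignment:
        for t in worker_tasks:
            task_outputs[t] = list(map_fn(chunks[t]))

    # Shuffle: merge in task-index order so each key sees its values
    # in the same order regardless of how many workers were used.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in task_outputs:
        for key, value in pairs:
            partitions[partition_of(key, num_reducers)][key].append(value)

    # Reduce phase: each partition is handled by exactly one reducer.
    result = {}
    for part in partitions:
        for key in sorted(part):
            result[key] = reduce_fn(key, part[key])
    return result


def emit_words(chunk):
    """Example map function: emit (word, source chunk) pairs."""
    for word in chunk.split():
        yield word, chunk


def join_in_order(key, values):
    """Example order-sensitive (non-commutative) reduce function."""
    return "|".join(values)
```

Calling `deterministic_mapreduce(["x y", "y z", "x"], emit_words, join_in_order)` with different `num_workers` or `num_reducers` values returns the same dictionary, because the value order for each key depends only on the task index. A real multithreaded library must additionally make the communication between threads deterministic, which this sketch does not attempt to model.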

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61170018 and 61229201, and by the National High Technology Research and Development (863) Program of China under Grant No. 2012AA010901.

Author information

Corresponding author

Correspondence to Yu Zhang.

About this article

Cite this article

Zhang, Y., Cao, H. DMR: A Deterministic MapReduce for Multicore Systems. Int J Parallel Prog 45, 128–141 (2017). https://doi.org/10.1007/s10766-015-0390-5

  • DOI: https://doi.org/10.1007/s10766-015-0390-5
