Peacock: a customizable MapReduce for multicore platform

Published in The Journal of Supercomputing

Abstract

MapReduce has been demonstrated to be a promising alternative for simplifying parallel programming with high performance on a single multicore machine. Compared to the cluster version, MapReduce on a single multicore machine does not suffer from disk and network I/O bottlenecks, and it is more sensitive to workload characteristics. A single execution flow may be inefficient for many classes of workloads. For example, the fixed execution flow of the MapReduce program structure can impose significant overheads, mainly caused by the unnecessary reduce phase, on workloads that inherently emit only one value per key. In this paper, we refine the workload characterization from Phoenix++ according to the attributes of key-value pairs, and demonstrate that the refined workload characterization model covers all classes of MapReduce workloads. Based on the model, we propose a new MapReduce system with a workload-customizable execution flow. The system, named Peacock, is implemented on top of Phoenix++. Experiments with four different classes of benchmarks on a 16-core Intel-based server show that Peacock achieves better performance than Phoenix++ for workloads that inherently emit only one value per key (up to a speedup of \(3.6\times \)) while performing identically for the other classes of workloads.
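The core idea of a workload-customizable execution flow can be illustrated with a minimal sketch. The code below is hypothetical and is not the Peacock or Phoenix++ API; it only contrasts a generic flow, which buffers all values per key and then runs a reduce phase, with a map-only flow for workloads that are guaranteed to emit exactly one value per key, where each emitted pair is already final and the reduce phase can be dropped.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Generic flow: group every emitted value by key, then run a reduce
// phase over each group (here the reduce operation is a sum).
std::map<std::string, int>
generic_flow(const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, std::vector<int>> grouped;
    for (const auto& kv : pairs)
        grouped[kv.first].push_back(kv.second);   // buffer all values
    std::map<std::string, int> out;
    for (const auto& g : grouped) {               // reduce phase
        int sum = 0;
        for (int v : g.second) sum += v;
        out[g.first] = sum;
    }
    return out;
}

// Customized flow: the workload guarantees one value per key, so each
// emitted pair is written directly as the final result; the grouping
// buffers and the reduce phase disappear entirely.
std::map<std::string, int>
map_only_flow(const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, int> out;
    for (const auto& kv : pairs)
        out[kv.first] = kv.second;                // final write, no reduce
    return out;
}
```

For one-value-per-key input the two flows produce identical results, which is why skipping the reduce phase is a pure win for that workload class: it removes the intermediate buffering and the second pass without changing the output.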



References

1. The Apache Software Foundation. Hadoop. http://hadoop.apache.org

  2. Intel Corporation. Threading building blocks. http://www.threadingbuildingblocks.org

3. Stanford University. The Phoenix system for MapReduce programming. http://mapreduce.stanford.edu

4. Aviram A, Weng SC, Hu S, Ford B (2010) Efficient system-enforced deterministic parallelism. In: Proceedings of the 9th USENIX conference on operating systems design and implementation, OSDI'10, USENIX Association, Berkeley, CA, USA, pp 1–16

5. Bergan T, Anderson O, Devietti J, Ceze L, Grossman D (2010) CoreDet: a compiler and runtime system for deterministic multithreaded execution. In: Proceedings of the fifteenth edition of ASPLOS on architectural support for programming languages and operating systems, ASPLOS XV, ACM, New York, NY, USA, pp 53–64

6. Borkar S (2007) Thousand core chips: a technology perspective. In: Proceedings of the 44th annual design automation conference, DAC '07, ACM, New York, NY, USA, pp 746–749

7. Chen R, Chen H, Zang B (2010) Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. In: Proceedings of the 19th international conference on parallel architectures and compilation techniques, PACT '10, ACM, New York, NY, USA, pp 523–534

  8. Coplien JO (1995) Curiously recurring template patterns. C++ Rep 7(2):24–27

9. Dagum L, Menon R (1998) OpenMP: an industry-standard API for shared-memory programming. IEEE Comput Sci Eng 5(1):46–55

10. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

11. Feng M, Gupta R, Hu Y (2011) SpiceC: scalable parallelism via implicit copying and explicit commit. SIGPLAN Not 46(8):69–80

12. He B, Fang W, Luo Q, Govindaraju NK, Wang T (2008) Mars: a MapReduce framework on graphics processors. In: Proceedings of the 17th international conference on parallel architectures and compilation techniques, PACT '08, ACM, New York, NY, USA, pp 260–269

13. Jiang W, Ravi VT, Agrawal G (2010) A map-reduce system with an alternate API for multi-core environments. In: Proceedings of the 2010 10th IEEE/ACM international conference on cluster, cloud and grid computing, CCGRID '10, IEEE Computer Society, Washington, DC, USA, pp 84–93

14. Gray J. Sort benchmark home page. http://sortbenchmark.org

15. Jin G, Zhang W, Deng D, Liblit B, Lu S (2012) Automated concurrency-bug fixing. In: Proceedings of the 10th USENIX conference on operating systems design and implementation, OSDI'12, USENIX Association, Berkeley, CA, USA, pp 221–236

16. Liu T, Curtsinger C, Berger ED (2011) Dthreads: efficient deterministic multithreading. In: Proceedings of the twenty-third ACM symposium on operating systems principles, SOSP '11, ACM, New York, NY, USA, pp 327–336

  17. Mao Y, Morris R, Kaashoek MF (2010) Optimizing mapreduce for multicore architectures. Technical report, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology

18. Ranger C, Raghuraman R, Penmetsa A, Bradski G, Kozyrakis C (2007) Evaluating MapReduce for multi-core and multiprocessor systems. In: Proceedings of the 2007 IEEE 13th international symposium on high performance computer architecture, HPCA '07, IEEE Computer Society, Washington, DC, USA, pp 13–24

19. Talbot J, Yoo RM, Kozyrakis C (2011) Phoenix++: modular MapReduce for shared-memory systems. In: Proceedings of the second international workshop on MapReduce and its applications, MapReduce '11, ACM, New York, NY, USA, pp 9–16

20. Yoo RM, Romano A, Kozyrakis C (2009) Phoenix rebirth: scalable MapReduce on a large-scale shared-memory system. In: Proceedings of the 2009 IEEE international symposium on workload characterization (IISWC), IISWC '09, IEEE Computer Society, Washington, DC, USA, pp 198–207

  21. Yuan D, Zheng J, Park S, Zhou Y, Savage S (2012) Improving software diagnosability via log enhancement. ACM Trans Comput Syst 30(1):4:1–4:28

22. Zhang W, Lim J, Olichandran R, Scherpelz J, Jin G, Lu S, Reps T (2011) ConSeq: detecting concurrency bugs through sequential errors. In: Proceedings of the sixteenth international conference on architectural support for programming languages and operating systems, ASPLOS XVI, ACM, New York, NY, USA, pp 251–264

Acknowledgments

The research is supported by National Science Foundation of China under Grant No. 61232008, National 863 Hi-Tech Research and Development Program under Grant No. 2013AA01A213, Guangzhou Science and Technology Program under Grant 2012Y2-00040, Chinese Universities Scientific Fund under Grant No. 2013TS094, and Research Fund for the Doctoral Program of MOE under Grant No. 20110142130005.

Author information

Corresponding author

Correspondence to Song Wu.

Additional information

Note that Phoenix++ is the best available implementation of MapReduce on shared-memory multicore platforms.

About this article

Cite this article

Wu, S., Peng, Y., Jin, H. et al. Peacock: a customizable MapReduce for multicore platform. J Supercomput 70, 1496–1513 (2014). https://doi.org/10.1007/s11227-014-1238-2
