ABSTRACT
The development of automatic parallelization techniques has fascinated researchers for decades. This has resulted in a large number of tools intended to relieve the designer of the burden of manually parallelizing an application. However, most of these tools focus only on minimizing execution time, which drastically reduces their applicability to embedded devices. When applications are parallelized for embedded multiprocessor system-on-chip (MPSoC) devices, it is essential to find good trade-offs between different objectives such as execution time, energy consumption, or communication overhead. Another important aspect that has to be taken into account is the streaming-based structure found in many embedded applications, such as multimedia and network services. The best way to parallelize these applications is to extract pipeline parallelism. Therefore, this paper presents the first multi-objective aware approach that exploits pipeline parallelism automatically, making it well suited to resource-restricted embedded devices. We have compared the new pipeline parallelization approach to an existing task-level extraction technique. The evaluation shows that the new approach extracts very efficient multi-objective aware parallelism. In addition, the two approaches have been combined, and the results show that they complement each other perfectly.
Index Terms
- Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms