DOI: 10.1145/2380445.2380463
research-article

Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms

Published: 07 October 2012

ABSTRACT

The development of automatic parallelization techniques has fascinated researchers for decades. This has resulted in a significant number of tools intended to relieve the designer of the burden of manually parallelizing an application. However, most of these tools focus only on minimizing execution time, which drastically reduces their applicability to embedded devices. If applications are to be parallelized for embedded multiprocessor system-on-chip (MPSoC) devices, it is essential to find good trade-offs between different objectives such as execution time, energy consumption, and communication overhead. Another important aspect to take into account is the streaming-based structure found in many embedded applications, such as multimedia and network services. These applications are best parallelized by extracting pipeline parallelism. Therefore, this paper presents the first multi-objective aware approach that automatically exploits pipeline parallelism, making it well suited to resource-restricted embedded devices. We have compared the new pipeline parallelization approach to an existing task-level extraction technique. The evaluation shows that the new approach extracts very efficient multi-objective aware parallelism. In addition, the two approaches have been combined, and the results show that they complement each other well.
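The abstract does not describe the genome encoding, objectives, or selection scheme of the genetic algorithm, so the following is only a minimal sketch under our own assumptions, not the authors' method. A hypothetical loop body is modeled as a sequence of statements with illustrative costs (`COSTS`) and forward data dependences (`DEPS`); a genome marks where one pipeline stage ends and the next begins; and each candidate is scored on three minimized objectives: the bottleneck stage time (a throughput proxy), the bytes communicated between stages, and the number of cores used (a crude energy proxy). Survival keeps the Pareto front first, a heavily simplified elitist scheme, so the result is a set of trade-offs rather than a single fastest partition. All names (`COSTS`, `DEPS`, `MAX_CORES`, `evaluate`, `evolve`, and so on) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical loop body (illustrative numbers, not taken from the paper):
# cost of each statement in cycles per iteration, listed in program order.
COSTS = [40, 10, 30, 25, 15, 30]
# Hypothetical data dependences (producer, consumer, bytes per iteration).
# All point forward in program order; loop-carried dependences are ignored.
DEPS = [(0, 1, 8), (1, 2, 8), (2, 3, 16), (3, 4, 4), (4, 5, 4), (0, 5, 32)]
MAX_CORES = 4  # assumed number of cores on the target MPSoC

Genome = Tuple[int, ...]  # genome[i] == 1: start a new stage after statement i
Objectives = Tuple[int, int, int]


def stages_of(genome: Genome) -> List[int]:
    """Map each statement to its pipeline stage by counting the cuts before it."""
    stages, current = [0], 0
    for cut in genome:
        current += cut
        stages.append(current)
    return stages


def evaluate(genome: Genome) -> Objectives:
    """Return (bottleneck stage cycles, bytes sent between stages, cores used).

    All objectives are minimized; 'cores used' is only a crude energy proxy.
    """
    stages = stages_of(genome)
    load = [0] * (stages[-1] + 1)
    for stmt, stage in enumerate(stages):
        load[stage] += COSTS[stmt]
    comm = sum(size for src, dst, size in DEPS if stages[src] != stages[dst])
    return (max(load), comm, len(load))


def repair(genome: List[int], rng: random.Random) -> Genome:
    """Remove random cuts until the pipeline fits on MAX_CORES cores."""
    cuts = [i for i, bit in enumerate(genome) if bit]
    while len(cuts) > MAX_CORES - 1:
        genome[cuts.pop(rng.randrange(len(cuts)))] = 0
    return tuple(genome)


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(genomes: Sequence[Genome]) -> List[Genome]:
    """Keep the distinct genomes whose objective vectors no other genome dominates."""
    scored = {g: evaluate(g) for g in set(genomes)}
    return [g for g, s in scored.items()
            if not any(dominates(t, s) for t in scored.values())]


def evolve(pop_size: int = 12, generations: int = 30, seed: int = 0) -> List[Genome]:
    """Tiny elitist GA: one-point crossover, bit-flip mutation, Pareto-front survival."""
    rng = random.Random(seed)
    n_genes = len(COSTS) - 1
    pop = [repair([rng.randint(0, 1) for _ in range(n_genes)], rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.choice(pop), rng.choice(pop)
            point = rng.randrange(1, n_genes)   # one-point crossover
            child = list(a[:point] + b[point:])
            child[rng.randrange(n_genes)] ^= 1  # flip one random bit
            children.append(repair(child, rng))
        merged = list(set(pop + children))
        front = pareto_front(merged)
        rest = [g for g in merged if g not in front]
        rng.shuffle(rest)
        pop = (front + rest)[:pop_size]  # non-dominated genomes survive first
    return sorted(pareto_front(pop), key=evaluate)


if __name__ == "__main__":
    for genome in evolve():
        print(genome, evaluate(genome))
```

For example, `evaluate((0, 1, 0, 1, 0))` splits the six statements into three stages of 50, 55, and 45 cycles and returns `(55, 44, 3)`, while the single-stage genome `(0, 0, 0, 0, 0)` returns `(150, 0, 1)`; neither dominates the other, which is exactly the kind of trade-off a multi-objective search is meant to expose rather than resolve.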

          Published in

          ACM Conferences
          CODES+ISSS '12: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
          October 2012
          596 pages
          ISBN: 9781450314268
          DOI: 10.1145/2380445

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States




          Acceptance Rates

          CODES+ISSS '12 paper acceptance rate: 48 of 163 submissions (29%). Overall acceptance rate: 280 of 864 submissions (32%).

