Parallel programming patterns for multi-processor SoC: Application to video processing

Published: 21 March 2013

Abstract

Efficient, scalable and productive parallel programming is a major challenge in exploiting future multi-processor SoC platforms. This article presents the MultiFlex programming environment, which has been developed to address this challenge. It is targeted for use on Platform 2012, a scalable multi-processor fabric. The MultiFlex environment supports high-level simulation and iterative platform mapping, and includes tools for programming-model-aware debugging, tracing, visualization and analysis.

This article focuses on the two classes of programming abstractions supported in MultiFlex. The first is a set of Parallel Programming Patterns (PPP) which offer a rich set of programming abstractions for implementing efficient data- and task-level parallel applications. The second is a Reactive Task Management (RTM) abstraction, which offers a lightweight C-based API to support dynamic dispatching of small grain tasks on tightly coupled parallel processing resources.
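As a rough illustration of the difference between the two styles, the Python sketch below contrasts a pattern-style data-parallel map with dynamic dispatch of small tasks to a pool of workers. It is not the MultiFlex API, which this abstract does not show; the names `process_row`, `data_parallel` and `dynamic_dispatch` are hypothetical, and Python threads merely stand in for processing elements.

```python
from concurrent.futures import ThreadPoolExecutor
import queue
import threading


def process_row(row):
    """Hypothetical stand-in for a per-row video kernel: doubles each pixel."""
    return [2 * p for p in row]


def data_parallel(frame, workers=4):
    """Pattern-style data parallelism: the same kernel is applied to every row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so rows come back in frame order.
        return list(pool.map(process_row, frame))


def dynamic_dispatch(tasks, workers=4):
    """Reactive-style dispatch: idle workers pull small tasks from a shared queue."""
    pending = queue.Queue()
    for index, (func, arg) in enumerate(tasks):
        pending.put((index, func, arg))
    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, func, arg = pending.get_nowait()
            except queue.Empty:
                return  # all tasks were queued up front, so empty means done
            results[index] = func(arg)  # each slot is written by exactly one worker

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In the first function the work split is fixed by the pattern; in the second, whichever worker becomes free takes the next task, which is the property that matters when task sizes are small and uneven.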

The use of the MultiFlex native programming model is illustrated through the capture and mapping of two representative video applications. The first is a high-quality rescaling (HQR) application on a multi-processor platform. We present the details of the optimization process that was required to map the HQR application, whose reference code requires 350 GIPS (giga instructions per second), onto a 16-processor cluster. Our results show that the parallel implementation using the PPP model offers almost linear acceleration with respect to the number of processing elements.
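To put the 350 GIPS figure in perspective, the short Python sketch below works out the per-processor load under an ideal, overhead-free even split, and shows how speedup and parallel efficiency would be computed from measured run times. This is a back-of-the-envelope calculation, not taken from the article: the helper names are hypothetical, and the abstract reports no per-processor rates or timings.

```python
def per_processor_load(total_gips, processors):
    """Load per processor if the work divides evenly with no overhead (an assumption)."""
    return total_gips / processors


def speedup(t_serial, t_parallel):
    """Classic speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Fraction of ideal linear speedup achieved; 1.0 means perfectly linear."""
    return speedup(t_serial, t_parallel) / processors


# 350 GIPS spread evenly over 16 processors.
print(per_processor_load(350, 16))  # 21.875
```

An even split of the unoptimized reference code would still leave roughly 21.9 GIPS per processor, which suggests why the mapping needed an optimization pass in addition to parallelization. "Almost linear acceleration" corresponds to an efficiency close to 1.0 as the number of processing elements grows.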

The second application is a high-definition VC-1 decoder. For this application, we illustrate two different parallel programming model variants, one using PPPs, the other based on RTM. These two versions are mapped onto two variants of a homogeneous version of the Platform 2012 multi-core fabric.

      Reviews

      Denis John Reilly

      Parallel computing emerged during the 1980s. Although multiprocessor and multicomputer architectures offered the potential for dramatic increases in processing power, that potential was rarely realized because of a lack of programming tool support. Fast-forward 25 years, and we see multiprocessor architectures based on multicore processors, combining system-on-a-chip (SoC) and network-on-a-chip (NoC). Such architectures have tremendous computational capabilities, but, as you will learn from this paper, the key to unleashing these capabilities lies in the programming abstractions and tool support.

      This paper essentially describes the MultiFlex programming environment, which is aimed at multiprocessor SoC platforms. It is an enlightening read for anyone interested in current state-of-the-art developments in parallel programming applications. The application area considered is video processing; some knowledge of this area is required for a full understanding of the paper.

      The MultiFlex native programming model is described as including reactive task management (RTM), which provides a programming abstraction for fine-grained task parallelism, and parallel programming patterns (PPP), which provide a library for fine-tuning the mapping of components onto physical resources using patterns. The rest of the paper then considers two realistic video processing applications to illustrate the use of MultiFlex on the Platform 2012 (P2012) computing fabric. According to the authors, "the first application is a high-quality rescaling (HQR) application [that is] used in most TV appliances." The second application is a decoder for VC-1, a proprietary video format developed by Microsoft. The RTM and PPP programming abstractions are used to implement different instances of each of the video processing applications. The pros and cons of each abstraction are discussed in relation to performance and tuning, with the aid of the optimization and trace tool support provided through MultiFlex.

      Overall, this is a very good paper that is thorough and complete in every way. It is a must-read for anyone interested in parallel programming tools. It certainly rekindled my interest in parallel programming tool support.

      Online Computing Reviews Service

      • Published in

        ACM Transactions on Embedded Computing Systems  Volume 12, Issue 1s
        Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
        March 2013
        701 pages
        ISSN:1539-9087
        EISSN:1558-3465
        DOI:10.1145/2435227
        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 21 March 2013
        • Accepted: 1 May 2012
        • Revised: 1 February 2012
        • Received: 1 October 2011

        Qualifiers

        • research-article
        • Research
        • Refereed