Abstract
Process Networks (PNs) are a suitable parallel model of computation (MoC) for specifying embedded streaming applications in a parallel form, which facilitates efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a difficult and highly error-prone task. To overcome these difficulties, we have developed the pn compiler, which derives specific Polyhedral Process Network (PPN) parallel specifications from sequential static affine nested loop programs (SANLPs). However, many applications, for example multimedia applications (MPEG coders/decoders, smart cameras, etc.), have adaptive and dynamic behavior that cannot be expressed as SANLPs. Therefore, in order to handle dynamic multimedia applications, in this article we address the important question of whether we can relax some of the restrictions of SANLPs while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way.
The main contribution of this article is a first approach for automated translation of affine nested loop programs with dynamic loop bounds into input-output equivalent Polyhedral Process Networks. In addition, we present a method for analyzing the execution overhead introduced in the PPNs derived from programs with dynamic loop bounds. The presented automated translation approach has been evaluated by deriving a PPN parallel specification from a real-life application called Low Speed Obstacle Detection (LSOD) used in the smart cameras domain. By executing the derived PPN, we have obtained results which indicate that the approach we present in this article facilitates efficient parallel implementations of sequential nested loop programs with dynamic loop bounds. That is, our approach reveals the possible parallelism available in such applications, which allows for the utilization of multiple cores in an efficient way.
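To make the notions of a dynamic loop bound and of input-output equivalence concrete, the following minimal Python sketch compares a sequential nested loop whose inner bound is only known at run time with a hand-written two-process network that communicates over bounded FIFO channels. It is an illustration only and not the translation performed by the pn compiler: the names `dynamic_bound`, `MAX_M`, `sequential`, and `process_network`, as well as the extra control channel that forwards the run-time bound, are assumptions introduced here.

```python
import queue
import threading

MAX_M = 4  # assumed static upper bound on the dynamic inner-loop bound


def dynamic_bound(i):
    # Hypothetical stand-in for a data-dependent bound, always in 0..MAX_M.
    return (i * 3) % (MAX_M + 1)


def sequential(n):
    """Nested loop whose inner bound m is known only at run time."""
    out = []
    for i in range(n):
        m = dynamic_bound(i)       # dynamic loop bound
        for j in range(m):
            x = i * 10 + j         # first statement (produces x)
            out.append(x * x)      # second statement (consumes x)
    return out


def process_network(n):
    """Same computation as two processes joined by bounded FIFO channels."""
    ctrl = queue.Queue(maxsize=2)  # assumed control channel carrying m
    data = queue.Queue(maxsize=2)  # data channel carrying x
    out = []

    def producer():
        for i in range(n):
            m = dynamic_bound(i)
            ctrl.put(m)            # blocking write of the run-time bound
            for j in range(m):
                data.put(i * 10 + j)

    def consumer():
        for _ in range(n):
            m = ctrl.get()         # blocking read of the run-time bound
            for _ in range(m):
                x = data.get()
                out.append(x * x)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    # Input-output equivalence check on a small instance.
    print(sequential(5) == process_network(5))
```

Because each process reads and writes its channels in a fixed order with blocking semantics, the consumer sees the tokens in the same order as the sequential program produces them, so both versions return the same list. The control channel is one simple way to let the consumer know how many tokens to expect per outer iteration; other schemes are possible.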
Index Terms
- Automated generation of polyhedral process networks from affine nested-loop programs with dynamic loop bounds