Abstract
This article presents a MATLAB-to-C compiler that exploits custom instructions present in state-of-the-art processor architectures and supports semi-automatic vectorization. To make retargeting user-friendly, a parameterized processor model describes the target instruction set architecture. Custom instructions are represented in the generated code as specialized intrinsic functions, so the output can be passed to any C/C++ compiler that supports the target processor. The compiler also generates data-parallel (vectorized) code by introducing data packing and unpacking statements. It has been used to generate code for ARM and x86 architectures for several benchmarks. Compared to scalarized code, the vectorized code achieves an average speedup of 4.1× for packed fixed-point data and 2.7× for packed floating-point data on ARM, and 3.1× and 1.5×, respectively, on x86. Implementing data-parallel instructions directly in assembly code would have required considerable design effort and would not have been sustainable across evolving platform variants. The compiler can thus be employed to efficiently speed up critical sections of the target application, and potentially to raise the design abstraction and reduce development time for both embedded and general-purpose applications.
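As a rough illustration of the packing/unpacking idea described above, the following C sketch contrasts a scalar element-wise addition with a packed version. It is not the compiler's actual output: the type `vec4_i16` and the functions `pack4_i16`, `unpack4_i16`, and `vadd4_i16` are hypothetical stand-ins for target intrinsics, emulated here in portable C so the example compiles anywhere. The lane width of 4 is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-lane packed type; on a real target this would map
 * to a SIMD register type supplied by the vendor's intrinsics header. */
typedef struct { int16_t lane[4]; } vec4_i16;

/* Hypothetical "pack" statement: gather 4 scalars into one packed value. */
static vec4_i16 pack4_i16(const int16_t *src) {
    vec4_i16 v;
    for (int k = 0; k < 4; ++k) v.lane[k] = src[k];
    return v;
}

/* Hypothetical "unpack" statement: scatter a packed value back to scalars. */
static void unpack4_i16(int16_t *dst, vec4_i16 v) {
    for (int k = 0; k < 4; ++k) dst[k] = v.lane[k];
}

/* Hypothetical intrinsic for a packed add; emulated lane by lane here,
 * whereas a real target would execute it as a single instruction. */
static vec4_i16 vadd4_i16(vec4_i16 a, vec4_i16 b) {
    vec4_i16 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = (int16_t)(a.lane[k] + b.lane[k]);
    return r;
}

/* Scalar reference: c = a + b, one element per iteration. */
void add_scalar(const int16_t *a, const int16_t *b, int16_t *c, size_t n) {
    for (size_t i = 0; i < n; ++i) c[i] = (int16_t)(a[i] + b[i]);
}

/* Packed version: 4 elements per iteration, then a scalar loop
 * for the remaining n % 4 elements. */
void add_packed(const int16_t *a, const int16_t *b, int16_t *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        vec4_i16 va = pack4_i16(a + i);
        vec4_i16 vb = pack4_i16(b + i);
        unpack4_i16(c + i, vadd4_i16(va, vb));
    }
    for (; i < n; ++i) c[i] = (int16_t)(a[i] + b[i]);
}
```

Because the packed operation is expressed as a function call rather than inline assembly, retargeting would in principle only require swapping the three helper definitions for the intrinsics of the chosen processor, which is the portability argument the abstract makes.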
Index Terms
- A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism