research-article

A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism

Published: 03 October 2020

Abstract

This article presents a MATLAB-to-C compiler that exploits custom instructions present in state-of-the-art processor architectures and supports semi-automatic vectorization. A parameterized processor model describes the target instruction set architecture, which makes the compiler retargetable in a user-friendly way. Custom instructions are represented via specialized intrinsic functions in the generated code, which can then be used as input to any C/C++ compiler supporting the target processor. In addition, the compiler supports the generation of data-parallel (vectorized) code through the introduction of data packing and unpacking statements. The compiler has been used for code generation targeting ARM and x86 architectures for several benchmarks. Compared to scalarized code, the vectorized code generated by the compiler achieves an average speedup of 4.1× and 2.7× for packed fixed-point and floating-point data, respectively, on the ARM architecture, and an average speedup of 3.1× and 1.5× for packed fixed-point and floating-point data, respectively, on the x86 architecture. Implementing data-parallel instructions directly in assembly code would have required considerable design effort and would not have been sustainable across evolving platform variants. Thus, the compiler can be employed to efficiently speed up critical sections of the target application. The compiler is therefore potentially employable to raise the design abstraction and reduce development time for both embedded and general-purpose applications.
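
To make the idea of packing/unpacking statements and intrinsic functions more concrete, the following is a minimal sketch in portable C. It is not the output of the compiler described above. The names `vec4_i16`, `pack4_i16`, `unpack4_i16`, and `custom_mac4_i16` are hypothetical, and the "intrinsic" is emulated with a plain loop so the example compiles anywhere. On a real target, such a function would be expected to map to a single data-parallel or custom instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed type: four 16-bit fixed-point lanes,
 * standing in for one vector register of the target. */
#define LANES 4
typedef struct { int16_t lane[LANES]; } vec4_i16;

/* Packing statement: gather LANES consecutive scalars into one packed value. */
static vec4_i16 pack4_i16(const int16_t *src) {
    vec4_i16 v;
    for (int i = 0; i < LANES; ++i) v.lane[i] = src[i];
    return v;
}

/* Unpacking statement: scatter the lanes back to scalar memory. */
static void unpack4_i16(int16_t *dst, vec4_i16 v) {
    for (int i = 0; i < LANES; ++i) dst[i] = v.lane[i];
}

/* Hypothetical intrinsic for a lane-wise multiply-accumulate.
 * Emulated in C here; no saturation or rounding is modelled. */
static vec4_i16 custom_mac4_i16(vec4_i16 acc, vec4_i16 a, vec4_i16 b) {
    for (int i = 0; i < LANES; ++i)
        acc.lane[i] = (int16_t)(acc.lane[i] + a.lane[i] * b.lane[i]);
    return acc;
}

/* Scalarized reference: out[i] += a[i] * b[i]. */
void mac_scalar(int16_t *out, const int16_t *a, const int16_t *b, size_t n) {
    assert(out && a && b);
    for (size_t i = 0; i < n; ++i)
        out[i] = (int16_t)(out[i] + a[i] * b[i]);
}

/* Packed version: process LANES elements per iteration, then
 * finish any leftover elements with a scalar epilogue. */
void mac_packed(int16_t *out, const int16_t *a, const int16_t *b, size_t n) {
    assert(out && a && b);
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        vec4_i16 acc = pack4_i16(out + i);
        acc = custom_mac4_i16(acc, pack4_i16(a + i), pack4_i16(b + i));
        unpack4_i16(out + i, acc);
    }
    for (; i < n; ++i)
        out[i] = (int16_t)(out[i] + a[i] * b[i]);
}
```

Calling `mac_scalar` and `mac_packed` on the same inputs should give identical results. The only structural difference is that the packed loop advances four elements at a time between explicit pack and unpack steps, which is the kind of transformation the abstract refers to.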

    • Published in

      ACM Transactions on Embedded Computing Systems  Volume 19, Issue 6
      Special Issue on LCETES, Part 2, Learning, Distributed, and Optimizing Compilers
      November 2020
      271 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/3427195

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 October 2020
      • Online AM: 7 May 2020
      • Revised: 1 March 2020
      • Accepted: 1 March 2020
      • Received: 1 November 2019
      Qualifiers

      • research-article
      • Research
      • Refereed