Towards Parallelism Extraction for Heterogeneous Multicore Android Devices

International Journal of Parallel Programming

Abstract

Modern Android mobile devices are built on complex heterogeneous MPSoC platforms. To exploit the full potential of these hardware platforms, the computationally intensive parts of applications have to be properly parallelized. However, current practice involves several manual steps, which makes parallelization cumbersome for programmers. In this paper, we present an automated approach to extract multiple forms of parallelism from native C code within Android applications, targeting heterogeneous multicore devices. We show the effectiveness of our approach by parallelizing a set of benchmarks on a Nexus 7 tablet, which is based on a Snapdragon MPSoC that features a quad-core Krait CPU cluster and an Adreno 320 GPU.

Corresponding author

Correspondence to Miguel Angel Aguilar.

Cite this article

Aguilar, M.A., Eusse, J.F., Ray, P. et al. Towards Parallelism Extraction for Heterogeneous Multicore Android Devices. Int J Parallel Prog 45, 1592–1624 (2017). https://doi.org/10.1007/s10766-016-0479-5

  • DOI: https://doi.org/10.1007/s10766-016-0479-5
