Abstract
The widespread adoption of handheld devices has driven a demand for higher performance, which has required integrating several distinct kinds of processors on a single chip. As a result, current Systems on Chip have become heterogeneous platforms. Stencil codes are a family of algorithms that appear in many relevant scientific and image processing applications. Accelerators are key to improving the performance of these algorithms on heterogeneous platforms but, for a mobile application developer, the development cost is very high. We propose a methodology, based on our framework Paralldroid, for automatically generating accelerated implementations of several well-known, representative stencil codes. We also measure the performance of these codes to demonstrate that Paralldroid can accelerate code without extensive or complex modifications. The results show large performance improvements in exchange for few code modifications.
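To make the term concrete, the following is a minimal sketch of a stencil code in plain, sequential Java. It is not taken from the paper and does not use Paralldroid; the class name `JacobiStencil`, the method `step`, and the choice of a 5-point averaging rule are assumptions made purely for illustration. It shows the defining pattern: every output cell is computed from a fixed neighbourhood of input cells.

```java
import java.util.Arrays;

public class JacobiStencil {
    // One sweep of a 5-point stencil (illustrative assumption, not the
    // paper's code): each interior cell becomes the average of its four
    // direct neighbours; boundary cells are copied unchanged.
    public static double[][] step(double[][] in) {
        int rows = in.length;
        int cols = in[0].length;
        double[][] out = new double[rows][cols];
        // Copy the input first so that boundary values are preserved.
        for (int i = 0; i < rows; i++) {
            out[i] = Arrays.copyOf(in[i], cols);
        }
        // Update interior cells, reading only from the input grid.
        for (int i = 1; i < rows - 1; i++) {
            for (int j = 1; j < cols - 1; j++) {
                out[i][j] = 0.25 * (in[i - 1][j] + in[i + 1][j]
                                  + in[i][j - 1] + in[i][j + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] grid = new double[4][4];
        Arrays.fill(grid[0], 100.0); // fixed value along the top edge
        double[][] next = step(grid);
        System.out.println(Arrays.toString(next[1]));
    }
}
```

Because each interior cell depends only on the input grid, the iterations of the two inner loops are independent of one another, which is what makes this family of algorithms a natural candidate for accelerators.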
Acknowledgement
This work was supported by the EC (ERDF), the NESUS IC1315 COST Action, the Spanish Ministry of Economy, Industry and Competitiveness through the TIN2016-78919-R project, and the CAPAP-H network.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Afonso, S., Acosta, A., Almeida, F. (2017). Automatic Acceleration of Stencil Codes in Android Devices. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science(), vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_6
DOI: https://doi.org/10.1007/978-3-319-65482-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65481-2
Online ISBN: 978-3-319-65482-9
eBook Packages: Computer Science (R0)