Adaptive Modular Mapping to Reduce Shared Memory Bank Conflicts on GPUs

Mungiello, Innocenzo; De Rosa, Francesco

doi:10.1007/978-3-319-49109-7_34

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 1)

Included in the following conference series:

International Conference on P2P, Parallel, Grid, Cloud and Internet Computing

Abstract

This paper presents the experimental evaluation of a new data mapping technique for GPU shared memory, called Adaptive Modular Mapping (AMM). The technique remaps data across the physical banks of shared memory to increase the number of accesses that can be served in parallel, yielding appreciable performance gains. Unlike previous techniques described in the literature, AMM does not increase shared memory size as a side effect of conflict avoidance. The paper also presents the experimental setup used to validate the proposed memory mapping methodology.
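
The contrast the abstract draws between remapping and size-increasing conflict avoidance can be illustrated with a small host-side simulation. The sketch below is not the authors' AMM algorithm, whose details are not given in this abstract. It assumes 32 banks of 4-byte words and a 32x32 tile, and it compares three layouts introduced here only for illustration: plain row-major, row padding, and a simple skewed modular mapping. All function names are hypothetical.

```python
NUM_BANKS = 32  # assumption: 32 shared memory banks, one 4-byte word wide


def bank(word_index):
    """Bank that holds a given word index under the assumed bank layout."""
    return word_index % NUM_BANKS


def max_conflict_degree(word_indices):
    """Largest number of distinct words that one warp sends to a single bank."""
    per_bank = {}
    for w in set(word_indices):
        per_bank.setdefault(bank(w), set()).add(w)
    return max(len(words) for words in per_bank.values())


def row_major(row, col, width=32):
    """Plain layout: element (row, col) sits at row * width + col."""
    return row * width + col


def padded(row, col, width=32):
    """Padding: one unused word per row shifts each row by one bank."""
    return row * (width + 1) + col


def skewed(row, col, width=32):
    """Illustrative modular remapping: rotate each row by its row index."""
    return row * width + (col + row) % width


def column_access(layout, col, rows=32):
    """Word indices touched when 32 threads read one column of the tile."""
    return [layout(r, col) for r in range(rows)]


def footprint(layout, rows=32, cols=32):
    """Highest word index used by the layout, plus one."""
    return max(layout(r, c) for r in range(rows) for c in range(cols)) + 1


if __name__ == "__main__":
    for name, layout in [("row_major", row_major),
                         ("padded", padded),
                         ("skewed", skewed)]:
        degree = max_conflict_degree(column_access(layout, 0))
        print(name, "conflict degree:", degree, "words used:", footprint(layout))
```

Under these assumptions, a column read in the row-major layout sends all 32 threads to the same bank. Padding removes the conflict but enlarges the footprint, while the skewed mapping removes it inside the original 1024 words. This is the kind of trade-off the abstract refers to, shown here on a toy access pattern only.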

Author information

Authors and Affiliations

University of Naples Federico II and Centro Regionale ICT (CeRICT), Naples, Italy
Innocenzo Mungiello
University of Naples Federico II, Naples, Italy
Francesco De Rosa

Corresponding author

Correspondence to Innocenzo Mungiello.

Editor information

Editors and Affiliations

Technical University of Catalonia, Campus Nord, Ed. Omega (Room 109), Barcelona, Spain
Fatos Xhafa
Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Università degli Studi di Napoli Federico II, Napoli, Italy
Flora Amato

Cite this paper

Mungiello, I., De Rosa, F. (2017). Adaptive Modular Mapping to Reduce Shared Memory Bank Conflicts on GPUs. In: Xhafa, F., Barolli, L., Amato, F. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2016. Lecture Notes on Data Engineering and Communications Technologies, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-49109-7_34

DOI: https://doi.org/10.1007/978-3-319-49109-7_34
Published: 22 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49108-0
Online ISBN: 978-3-319-49109-7
eBook Packages: Engineering, Engineering (R0)
