Skip to main content

Adaptive Modular Mapping to Reduce Shared Memory Bank Conflicts on GPUs

  • Conference paper
  • First Online:
Advances on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2016)

Abstract

This paper presents the experimental evaluation of a new data mapping technique for the GPU shared memory, called Adaptive Modular Mapping (AMM). The evaluated technique aims to remap data across the shared memory physical banks, so as to increase parallel accesses, resulting in appreciable gains in terms of performance. Unless previous techniques described in literature, AMM does not increase shared memory size as a side effect of the conflict-avoidance technique. The paper also presents the experimental set-up used for the validation of the proposed memory mapping methodology.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. CUDA C Programming Guide

    Google Scholar 

  2. Amato, F., Fasolino, A., Mazzeo, A., Moscato, V., Picariello, A., Romano, S., Tramontana, P.: Ensuring semantic interoperability for e-health applications. In: Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2011, pp. 315–320 (2011)

    Google Scholar 

  3. Amato, F., Mazzeo, A., Penta, A., Picariello, A.: Building RDF ontologies from semistructured legal documents. pp. 997–1002 (2008)

    Google Scholar 

  4. Amato, F., Moscato, F.: A model driven approach to data privacy verification in e-health systems. Transactions on Data Privacy 8(3), 273–296 (2015)

    Google Scholar 

  5. Barbareschi, M.: Implementing hardware decision tree prediction: a scalable approach. In: 2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA), pp. 87–92. IEEE (2016)

    Google Scholar 

  6. Barbareschi, M., Battista, E., Mazzocca, N., Venkatesan, S.: A hardware accelerator for data classification within the sensing infrastructure. In: Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on, pp. 400–405. IEEE (2014)

    Google Scholar 

  7. Barbareschi, M., De Benedictis, A., Mazzeo, A., Vespoli, A.: Providing mobile traffic analysis as-a-service: Design of a service-based infrastructure to offer high-accuracy traffic classifiers based on hardware accelerators. Journal of Digital Information Management 13(4), 257 (2015)

    Google Scholar 

  8. Che, S., Sheaffer, J.W., Skadron, K.: Dymaxion: optimizing memory access patterns for heterogeneous systems. In: Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, p. 13. ACM (2011)

    Google Scholar 

  9. Cheng, J., Grossman, M., McKercher, T.: Professional Cuda C Programming. John Wiley & Sons (2014)

    Google Scholar 

  10. Cilardo, A.: Efficient bit-parallel GF(2m) multiplier for a large class of irreducible pentanomials. IEEE Transactions on Computers 58(7), 1001–1008 (2009)

    Google Scholar 

  11. Cilardo, A.: Exploring the potential of threshold logic for cryptography-related operations. IEEE Transactions on Computers 60(4), 452–462 (2011)

    Google Scholar 

  12. Cilardo, A., Fusella, E., Gallo, L., Mazzeo, A.: Exploiting concurrency for the automated synthesis of MPSoC interconnects. ACM Transactions on Embedded Computing Systems 14(3) (2015)

    Google Scholar 

  13. Cilardo, A., Gallo, L.: Improving multibank memory access parallelism with lattice-based partitioning. ACM Transactions on Architecture and Code Optimization 11(4) (2014)

    Google Scholar 

  14. Darte, A., Dion, M., Robert, Y.: A characterization of one-to-one modular mappings. Parallel Processing Letters 6(01), 145–157 (1996)

    Google Scholar 

  15. Darte, A., Schreiber, R., Villard, G.: Lattice-based memory allocation. IEEE Transactions on Computers 54(10), 1242–1257 (2005)

    Google Scholar 

  16. Escobar, F.A., Chang, X., Valderrama, C.: Suitability analysis of fpgas for heterogeneous platforms in hpc. IEEE Transactions on Parallel and Distributed Systems 27(2), 600–612 (2016)

    Google Scholar 

  17. Fusella, E., Cilardo, A.: H2ONoC: A hybrid optical-electronic NoC based on hybrid topology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2016)

    Google Scholar 

  18. Fusella, E., Cilardo, A.: Minimizing power loss in optical networks-on-chip through application-specific mapping. Microprocessors and Microsystems (2016)

    Google Scholar 

  19. Gao, S., Peterson, G.D.: Optimizing cuda shared memory usage

    Google Scholar 

  20. Grun, P., Dutt, N., Nicolau, A.: Apex: access pattern based memory architecture exploration. In: Proceedings of the 14th international symposium on Systems synthesis, pp. 25–32. ACM (2001)

    Google Scholar 

  21. Hallmans, D., A˚ sberg, M., Nolte, T.: Towards using the graphics processing unit (gpu) for embedded systems. In: Proceedings of 2012 IEEE 17th International Conference on Emerging Technologies & Factory Automation (ETFA 2012), pp. 1–4. IEEE (2012)

    Google Scholar 

  22. Khan, A., Al-Mouhamed, M., Fatayar, A., Almousa, A., Baqais, A., Assayony, M.: Padding free bank conflict resolution for cuda-based matrix transpose algorithm. In: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2014 15th IEEE/ACIS International Conference on, pp. 1–6. IEEE (2014)

    Google Scholar 

  23. Kim, Y., Shrivastava, A.: Cumapz: a tool to analyze memory access patterns in cuda. In: Proceedings of the 48th Design Automation Conference, pp. 128–133. ACM (2011)

    Google Scholar 

  24. Kirk, D.B., Wen-mei, W.H.: Programming massively parallel processors: a hands-on approach. Newnes (2012)

    Google Scholar 

  25. Luebke, D.: Cuda: Scalable parallel programming for high-performance scientific computing. In: 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 836–838. IEEE (2008)

    Google Scholar 

  26. Lustig, D., Martonosi, M.: Reducing gpu offload latency via fine-grained cpu-gpu synchronization. In: HPCA, vol. 13, pp. 354–365 (2013)

    Google Scholar 

  27. Mungiello, I.: Experimental evaluation of memory optimizations on an embedded gpu platform. In: 2015 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), pp. 169–174. IEEE (2015)

    Google Scholar 

  28. Sung, I.J., Liu, G.D., Hwu, W.M.W.: Dl: A data layout transformation system for heterogeneous computing. In: Innovative Parallel Computing (InPar), 2012, pp. 1–11. IEEE (2012)

    Google Scholar 

  29. Ueng, S.Z., Lathara, M., Baghsorkhi, S.S., Wen-mei, W.H.: Cuda-lite: Reducing gpu programming complexity. In: International Workshop on Languages and Compilers for Parallel Computing, pp. 1–15. Springer (2008)

    Google Scholar 

  30. Wang, Z., Grewe, D., Oboyle, M.F.: Automatic and portable mapping of data parallel programs to opencl for gpu-based heterogeneous systems. ACM Transactions on Architecture and Code Optimization (TACO) 11(4), 42 (2015)

    Google Scholar 

  31. Zhang, Z., Fan, Y., Jiang, W., Han, G., Yang, C., Cong, J.: High-level synthesis: From algorithm to digital circuit (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Innocenzo Mungiello .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mungiello, I., De Rosa, F. (2017). Adaptive Modular Mapping to Reduce Shared Memory Bank Conflicts on GPUs. In: Xhafa, F., Barolli, L., Amato, F. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2016. Lecture Notes on Data Engineering and Communications Technologies, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-49109-7_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-49109-7_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49108-0

  • Online ISBN: 978-3-319-49109-7

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics