research-article

High performance computing of fiber scattering simulation

Authors:

David Kaeli

GPGPU-8: Proceedings of the 8th Workshop on General Purpose Processing using GPUs

Pages 90 - 98

https://doi.org/10.1145/2716282.2716285

Published: 07 February 2015

Abstract

Cellulose is one of the most promising untapped energy resources. Harvesting energy from cellulose requires decoding its atomic structure, and some of this structural information can be exposed by modeling data produced by X-ray scattering. Forward simulation can be used to explore structural parameters of cellulose, including the diameter, twist and coiling, but modeling fiber scattering is computationally challenging. In this paper, we explore how to accelerate a molecular scattering algorithm by leveraging a modern high-end Graphics Processing Unit (GPU). We describe a step-wise optimization approach that considers memory utilization, math intrinsics, concurrent kernel execution and workload partitioning, and we evaluate different caching strategies for managing the state of the atom volume in memory. We have developed optimized cluster solutions for both CPUs and GPUs, and we have investigated different workload distribution schemes and concurrent execution approaches on both. Leveraging accelerators hosted on a cluster, we have reduced simulations that took days or weeks of intensive computation to parallel runs of just a few minutes or seconds. Our GPU-integrated cluster solution can potentially support concurrent modeling of hundreds of cellulose fibril structures, opening up new avenues for energy research.
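The abstract does not spell out the scattering kernel itself. As a rough illustration of why forward simulation is expensive and why the work partitions naturally, the sketch below assumes the textbook kinematic (first Born approximation) model, in which the intensity at each reciprocal-space point q is |Σ_j f_j exp(i q·r_j)|². The cost therefore grows as (number of atoms) × (number of q points), and every q point is independent. The function names, the constant scattering factors f_j, and the contiguous chunking scheme are hypothetical choices for illustration; this is not the authors' GPU implementation.

```python
import cmath
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def intensity(q: Vec3, atoms: Sequence[Vec3], f: Sequence[float]) -> float:
    """Kinematic intensity |sum_j f_j * exp(i q . r_j)|^2 at one q point.

    Assumption: f_j are constant per atom (no q dependence), for illustration only.
    """
    amp = 0j
    for (x, y, z), fj in zip(atoms, f):
        phase = q[0] * x + q[1] * y + q[2] * z  # q . r_j
        amp += fj * cmath.exp(1j * phase)
    return abs(amp) ** 2


def simulate(q_points: Sequence[Vec3], atoms: Sequence[Vec3],
             f: Sequence[float]) -> List[float]:
    """Serial reference: O(len(q_points) * len(atoms)) work."""
    return [intensity(q, atoms, f) for q in q_points]


def partition(n: int, workers: int) -> List[Tuple[int, int]]:
    """Split indices 0..n-1 into `workers` contiguous, near-equal [lo, hi) chunks."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # first `extra` chunks get one more
        bounds.append((start, start + size))
        start += size
    return bounds


def simulate_partitioned(q_points: Sequence[Vec3], atoms: Sequence[Vec3],
                         f: Sequence[float], workers: int) -> List[float]:
    """Same result as simulate(), computed chunk by chunk.

    Each chunk is independent, so in a cluster setting (hypothetically) each
    could be dispatched to a separate GPU or node; here they run sequentially.
    """
    result: List[float] = []
    for lo, hi in partition(len(q_points), workers):
        result.extend(simulate(q_points[lo:hi], atoms, f))
    return result
```

Because no q point depends on another, splitting the q grid this way needs no communication until the final gather; balancing chunk sizes (here, differing by at most one point) is the only coordination required in this simplified model.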


Cited By

Li G, Pattabiraman K, Cher C, Bose P, West J (2016). Understanding error propagation in GPGPU applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12. Online publication date: 13-Nov-2016.
https://dl.acm.org/doi/10.5555/3014904.3014932

Li G, Pattabiraman K, Cher C, Bose P (2016). Understanding Error Propagation in GPGPU Applications. SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 240–251. Online publication date: Nov-2016.
https://doi.org/10.1109/SC.2016.20


Information

Published In

ACM Other conferences

GPGPU-8: Proceedings of the 8th Workshop on General Purpose Processing using GPUs

February 2015

120 pages

ISBN:9781450334075

DOI:10.1145/2716282

Program Chairs:
David Kaeli, Northeastern University, USA
John Cavazos, University of Delaware, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

In-Cooperation

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

GPGPU-8

GPGPU-8: General-purpose Processing with Graphics Processing Units 8

February 7, 2015

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%


Bibliometrics

Total Citations: 2
Total Downloads: 147
Downloads (last 12 months): 6
Downloads (last 6 weeks): 0

Reflects downloads up to 18 Feb 2025
