DOI: 10.1145/2716282.2716285
research-article

High performance computing of fiber scattering simulation

Published: 07 February 2015

Abstract

Cellulose is one of the most promising untapped energy resources. Harvesting energy from cellulose requires decoding its atomic structure, and some of this structural information can be exposed by modeling data produced by X-ray scattering. Forward simulation can be used to explore structural parameters of cellulose, including diameter, twist and coiling, but modeling fiber scattering is computationally challenging. In this paper, we explore how to accelerate a molecular scattering algorithm by leveraging a modern high-end Graphics Processing Unit (GPU). We describe a step-wise optimization approach that considers memory utilization, math intrinsics, concurrent kernel execution and workload partitioning, and we evaluate different caching strategies for managing the state of the atom volume in memory. We have developed optimized cluster solutions for both CPUs and GPUs, and investigated different workload distribution schemes and concurrent execution approaches for each. By leveraging accelerators hosted on a cluster, we reduced simulations that previously required days or weeks of intensive computation to parallel runs of just a few minutes or seconds. Our GPU-integrated cluster solution can potentially support concurrent modeling of hundreds of cellulose fibril structures, opening up new avenues for energy research.
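The abstract names the computational pattern without spelling it out. As a rough illustration only, the sketch below assumes the simplest kinematic (first Born approximation) model, I(q) = |Σ_j f_j exp(i q·r_j)|², evaluated by direct summation over atoms for each detector q-vector, together with a contiguous block partition of q-points across workers. This is not the authors' algorithm. The function names (`scattered_intensity`, `partition`, `simulate`), the constant form factors, and the serial loop standing in for GPUs or cluster nodes are all hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def scattered_intensity(atoms: Sequence[Vec3], form_factors: Sequence[float], q: Vec3) -> float:
    """Kinematic intensity |sum_j f_j * exp(i q.r_j)|^2 at a single q-vector.

    Assumption: constant (q-independent) form factors; real codes use tabulated f_j(|q|).
    """
    amplitude = 0j
    for (x, y, z), f in zip(atoms, form_factors):
        phase = q[0] * x + q[1] * y + q[2] * z  # q . r_j
        amplitude += f * cmath.exp(1j * phase)
    return abs(amplitude) ** 2


def partition(n_items: int, n_workers: int) -> List[range]:
    """Split indices 0..n_items-1 into n_workers contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_workers)
    blocks: List[range] = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # the first `extra` workers take one extra item
        blocks.append(range(start, start + size))
        start += size
    return blocks


def simulate(atoms: Sequence[Vec3], form_factors: Sequence[float],
             q_points: Sequence[Vec3], n_workers: int) -> List[float]:
    """Compute I(q) for every q-point, one block of q-points per (hypothetical) worker.

    Here the blocks run one after another; on a cluster, each block would be
    dispatched to a separate GPU or node and run concurrently.
    """
    intensities = [0.0] * len(q_points)
    for block in partition(len(q_points), n_workers):
        for i in block:
            intensities[i] = scattered_intensity(atoms, form_factors, q_points[i])
    return intensities
```

For example, two unit scatterers at (0, 0, 0) and (1, 0, 0) give I = 4 at q = (0, 0, 0), where they scatter in phase, and I ≈ 0 at q = (π, 0, 0), where they cancel. Because each q-point depends only on the (read-only) atom list, the blocks need no communication with one another until the final gather. This independence is what makes this kind of partitioning attractive for GPUs and clusters, although the cost of the inner atom loop is what a real implementation would optimize with caching and math intrinsics.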

Cited By

  • (2016) Understanding error propagation in GPGPU applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12. DOI: 10.5555/3014904.3014932. Online publication date: 13-Nov-2016
  • (2016) Understanding Error Propagation in GPGPU Applications. SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 240–251. DOI: 10.1109/SC.2016.20. Online publication date: Nov-2016

Published In

ACM Other conferences
GPGPU-8: Proceedings of the 8th Workshop on General Purpose Processing using GPUs
February 2015
120 pages
ISBN:9781450334075
DOI:10.1145/2716282
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cluster
  2. Fiber Scattering Simulation
  3. GPU

Conference

GPGPU-8

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0
Reflects downloads up to 18 Feb 2025
