Accelerating NWChem Coupled Cluster Through Dataflow-Based Execution

  • Conference paper
  • First Online:
Parallel Processing and Applied Mathematics (PPAM 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9573)

Abstract

Numerical techniques used for describing many-body systems, such as the Coupled Cluster (CC) methods of the quantum chemistry package NWChem, are of great interest to the computational chemistry community in fields such as catalytic reactions, solar energy, and biomass conversion. In spite of their importance, many of these computationally intensive algorithms have traditionally been designed in a fairly linear fashion or parallelised only in coarse chunks.

In this paper, we present our effort to convert NWChem’s CC code into a dataflow-based form that can use the task scheduling system PaRSEC (Parallel Runtime Scheduling and Execution Controller), a software package designed to enable high performance computing at scale. We discuss the modularity of our approach and explain how the PaRSEC-enabled dataflow versions of the subroutines integrate seamlessly into the NWChem codebase. Furthermore, we argue that the CC algorithms can be easily decomposed into finer-grained tasks than in the original version of NWChem, and that data distribution and load balancing are decoupled and can be tuned independently. We demonstrate a speedup of more than a factor of two in the execution of the entire CC component of NWChem, concluding that dataflow-based execution of CC methods enables more efficient and scalable computation.
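The two ideas in the abstract, splitting a contraction into many fine-grained tasks that fire as soon as their inputs exist, and keeping data distribution separate from the choice of where tasks run, can be illustrated with a small sketch. Everything below is an assumption for illustration: the toy engine is not PaRSEC's API and not the authors' implementation, and the contraction C = Σ_k A_k B_k, the task names, the block-cyclic owner map, and both scheduling policies are hypothetical.

```python
from collections import defaultdict, deque
from itertools import count


def matmul(a, b):
    """Dense product of two small matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class DataflowGraph:
    """Hypothetical toy dataflow engine (not PaRSEC): a task becomes ready
    as soon as every one of its inputs has been produced."""

    def __init__(self):
        self.tasks = {}                     # task name -> (function, input names)
        self.consumers = defaultdict(list)  # data name -> tasks that read it

    def add_task(self, name, fn, inputs):
        self.tasks[name] = (fn, list(inputs))
        for dep in inputs:
            self.consumers[dep].append(name)

    def run(self, initial_data, pick_worker):
        values = dict(initial_data)
        # Number of inputs each task is still waiting for.
        missing = {name: sum(d not in values for d in deps)
                   for name, (_, deps) in self.tasks.items()}
        ready = deque(n for n, m in missing.items() if m == 0)
        trace = []                          # (task name, worker) in execution order
        while ready:
            name = ready.popleft()
            fn, deps = self.tasks[name]
            trace.append((name, pick_worker(name, deps)))
            values[name] = fn(*(values[d] for d in deps))
            for consumer in self.consumers[name]:
                missing[consumer] -= 1
                if missing[consumer] == 0:
                    ready.append(consumer)
        return values, trace


def build_contraction(num_blocks):
    """Split C = sum_k A_k B_k into one GEMM task per k and one reduction task."""
    graph = DataflowGraph()
    for k in range(num_blocks):
        graph.add_task(f"gemm{k}", matmul, [f"A{k}", f"B{k}"])

    def reduce_all(*partials):
        total = partials[0]
        for part in partials[1:]:
            total = matadd(total, part)
        return total

    graph.add_task("C", reduce_all, [f"gemm{k}" for k in range(num_blocks)])
    return graph


def block_cyclic_owners(num_blocks, num_nodes):
    """Data distribution: input blocks A_k and B_k live on node k mod num_nodes."""
    owners = {}
    for k in range(num_blocks):
        owners[f"A{k}"] = owners[f"B{k}"] = k % num_nodes
    return owners


def owner_affinity(owners):
    """Scheduling policy: run a task where its first input lives; tasks whose
    first input has no recorded owner (the reduction) fall back to node 0."""
    return lambda name, deps: owners.get(deps[0], 0)


def round_robin(num_workers):
    """Alternative scheduling policy that ignores data placement entirely."""
    counter = count()
    return lambda name, deps: next(counter) % num_workers


if __name__ == "__main__":
    A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, 2]]]
    B = [[[1, 0], [0, 1]], [[5, 6], [7, 8]], [[1, 1], [1, 1]]]
    data = {f"A{k}": A[k] for k in range(3)}
    data.update({f"B{k}": B[k] for k in range(3)})
    graph = build_contraction(3)
    owners = block_cyclic_owners(3, num_nodes=2)
    for policy in (owner_affinity(owners), round_robin(4)):
        values, trace = graph.run(data, policy)
        print(values["C"], trace)
```

Because the owner map and the scheduling policy are separate arguments, swapping `owner_affinity` for `round_robin` changes only which worker runs each task, not the result; the three GEMM tasks are independent and only the reduction waits on them.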

Notes

  1. All subroutines with the prefix “icsd_t2_” and the suffixes: 2_2_2_2(), 2_2_3(), 2_4_2(), 2_5_2(), 2_6(), lt2_3x(), 4_2_2(), 4_3(), 4_4(), 5_2(), 5_3(), 6_2_2(), 6_3(), 7_2(), 7_3(), vt1ic_1_2(), 8(), 2_2_2(), 2_4(), 2_5(), 4_2(), 5(), 6_2(), vt1ic_1(), 7(), 2_2(), 4(), 6(), 2().

Acknowledgment

This material is based upon work supported in part by the Air Force Office of Scientific Research under AFOSR Award No. FA9550-12-1-0476, and the DOE Office of Science, Advanced Scientific Computing Research, under award No. DE-SC0006733 “SUPER - Institute for Sustained Performance, Energy and Resilience”. A portion of this research was performed using EMSL, a DOE Office of Science User Facility sponsored by the Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory.

Author information

Corresponding author

Correspondence to Heike Jagode.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jagode, H., Danalis, A., Bosilca, G., Dongarra, J. (2016). Accelerating NWChem Coupled Cluster Through Dataflow-Based Execution. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_35

  • DOI: https://doi.org/10.1007/978-3-319-32149-3_35

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32148-6

  • Online ISBN: 978-3-319-32149-3

  • eBook Packages: Computer Science, Computer Science (R0)