Abstract
The ever-growing number of pre-trained large language models (LLMs) across scientific domains presents a challenge for application developers. While these models offer vast potential, fine-tuning them with custom data, aligning them for specific tasks, and evaluating their performance remain crucial steps for effective utilization. However, applying these techniques to models with tens of billions of parameters can take days or even weeks on modern workstations, making the cumulative cost of model comparison and evaluation a significant barrier to LLM-based application development. To address this challenge, we introduce an end-to-end pipeline specifically designed for building conversational and programmable AI agents on high performance computing (HPC) platforms. Our comprehensive pipeline encompasses: model pre-training, fine-tuning, web and API service deployment, along with crucial evaluations for lexical coherence, semantic accuracy, hallucination detection, and privacy considerations. We demonstrate our pipeline through the development of chatHPC, a chatbot for HPC question answering and script generation. Leveraging our scalable pipeline, we achieve end-to-end LLM alignment in under an hour on the Frontier supercomputer. We propose a novel self-improved, self-instruction method for instruction set generation, investigate scaling and fine-tuning strategies, and conduct a systematic evaluation of model performance. The practices established in chatHPC will serve as valuable guidance for future LLM-based application development on HPC platforms.












Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility at the Oak Ridge National Laboratory supported by the US Department of Energy under Contract No. DE-AC05-00OR22725.
Funding
This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility at the Oak Ridge National Laboratory supported by the US Department of Energy under Contract No. DE-AC05-00OR22725.
Author information
Authors and Affiliations
Contributions
JY performed the model alignment and analysis, and wrote the main manuscript text. JH developed the frontend and deployed the model. EH and TG designed the evaluation. HL, SP, and VL contributed to the preparation of the instruction set. The project was advised by FW. All authors discussed the results and reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The publisher, by accepting the article for publication, acknowledges that the US Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for US Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Appendices
Appendix 1: Example prompt for SI² on help tickets
The following is an example prompt that uses the documents returned by the fine-tuned UAE retriever over the facility documentation, via retrieved_docs = retriever.similarity_search_with_score(question, k=3), as context to guide the generation of Q&A pairs from help tickets.
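Because the original listing is not reproduced here, the snippet below is a minimal sketch of how such a prompt can be assembled, assuming LangChain with a FAISS vector store and a locally fine-tuned UAE embedding model. The model path, index name, ticket text, and prompt wording are illustrative placeholders rather than the exact values used in chatHPC.

# Minimal sketch: build an SI^2 prompt for a help ticket from retrieved facility documentation.
# Assumptions (not from the paper): FAISS index "faiss_index_facility_docs" and a fine-tuned
# UAE embedding model stored at "models/uae-finetuned".
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="models/uae-finetuned")  # hypothetical path
# "retriever" here is the FAISS vector store, mirroring the naming in the text above.
retriever = FAISS.load_local("faiss_index_facility_docs", embeddings,
                             allow_dangerous_deserialization=True)

question = "How do I request more walltime for a batch job?"  # example help-ticket text
retrieved_docs = retriever.similarity_search_with_score(question, k=3)

# Concatenate the retrieved passages and wrap them in an instruction-generation prompt.
context = "\n\n".join(doc.page_content for doc, _score in retrieved_docs)
prompt = (
    "You are an HPC support assistant. Using only the facility documentation below, "
    "write a question-and-answer pair that resolves the help ticket.\n\n"
    f"Documentation:\n{context}\n\n"
    f"Help ticket:\n{question}\n\n"
    "Q&A pair:"
)
print(prompt)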

Appendix 2: Example chatHPC API for script generation
The chatHPC API is compatible with the OpenAI API; the following is an example of using it to generate a job script.
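Since the original listing is not reproduced here, the snippet below is a minimal sketch using the OpenAI Python client pointed at an OpenAI-compatible endpoint. The service URL, model name, and token handling are placeholders, not the actual chatHPC deployment values.

# Minimal sketch: call an OpenAI-compatible chatHPC endpoint to generate a job script.
# The base_url, api_key, and model name below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://chathpc.example.org/v1",  # hypothetical chatHPC endpoint
    api_key="YOUR_CHATHPC_TOKEN",
)

response = client.chat.completions.create(
    model="chathpc",  # hypothetical model name exposed by the service
    messages=[
        {"role": "system", "content": "You are chatHPC, an assistant for HPC users."},
        {"role": "user", "content": "Write a Slurm batch script that runs an MPI job "
                                    "on 2 nodes with 8 ranks per node."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)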

Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yin, J., Hines, J., Herron, E. et al. chatHPC: Empowering HPC users with large language models. J Supercomput 81, 194 (2025). https://doi.org/10.1007/s11227-024-06637-1
DOI: https://doi.org/10.1007/s11227-024-06637-1