Abstract
This paper explores the enhancement of small language models through strategic dataset augmentation via ChatGPT-3.5-Turbo in the domain of Natural Language Inference (NLI). By employing knowledge distillation-based techniques and synthetic dataset augmentation, we aim to bridge the performance gap between large language models (LLMs) and small language models (SLMs) without the immense cost of human annotation. Our method uses two forms of rationale generation, information extraction and informed reasoning, to enrich the ANLI dataset. We then fine-tune T5-Small on these augmented datasets and evaluate its performance against an established benchmark. Our findings show that incorporating synthetic rationales significantly improves the model's ability to comprehend natural language, yielding 1.3% and 2.3% higher classification accuracy on ANLI for the two rationale types, respectively, and demonstrating the potential of leveraging LLMs for dataset augmentation. This approach not only enhances the performance of smaller models on complex tasks but also offers a cost-effective path to fine-tuning smaller language models. By advancing our understanding of knowledge distillation and fine-tuning strategies, this work contributes to the ongoing effort to create more capable and efficient NLP systems.
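To make the described pipeline concrete, below is a minimal sketch of the two-step procedure the abstract outlines: prompting GPT-3.5-Turbo for a rationale on an ANLI premise/hypothesis pair, then packing that rationale into a text-to-text training example for T5-Small. It assumes the OpenAI Python client and the Hugging Face datasets library; the prompt wording, the informed/uninformed switch, and the input/output formatting are illustrative assumptions, not the authors' exact setup.

from datasets import load_dataset
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def generate_rationale(premise, hypothesis, label=None):
    # label=None  ~ "information extraction": the model sees only premise and hypothesis.
    # label given ~ "informed reasoning": the gold label is revealed in the prompt.
    prompt = (
        f"Premise: {premise}\nHypothesis: {hypothesis}\n"
        + (f"Gold label: {LABELS[label]}\n" if label is not None else "")
        + "In one or two sentences, state the facts that determine whether the "
          "hypothesis is entailed by, neutral to, or contradicted by the premise."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
    )
    return resp.choices[0].message.content.strip()

def to_t5_example(premise, hypothesis, label, rationale):
    # The rationale is appended to the input so T5-Small can condition on it;
    # the target is the plain label word, as in a standard text-to-text NLI setup.
    source = f"nli premise: {premise} hypothesis: {hypothesis} rationale: {rationale}"
    return {"input_text": source, "target_text": LABELS[label]}

# Round 1 of ANLI from the Hugging Face hub (fields: premise, hypothesis, label).
anli_r1 = load_dataset("anli", split="train_r1")
ex = anli_r1[0]
rationale = generate_rationale(ex["premise"], ex["hypothesis"], label=ex["label"])
print(to_t5_example(ex["premise"], ex["hypothesis"], ex["label"], rationale))

Fine-tuning T5-Small on the resulting input/target pairs then proceeds as ordinary sequence-to-sequence training, for example with Hugging Face transformers.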
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pieper, T., Ballout, M., Krumnack, U., Heidemann, G., Kühnberger, K.U. (2024). Enhancing Small Language Models via ChatGPT and Dataset Augmentation. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds.) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol. 14763. Springer, Cham. https://doi.org/10.1007/978-3-031-70242-6_26
DOI: https://doi.org/10.1007/978-3-031-70242-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70241-9
Online ISBN: 978-3-031-70242-6
eBook Packages: Computer Science, Computer Science (R0)