Abstract
This paper explores the enhancement of small language models through strategic dataset augmentation via ChatGPT-3.5-Turbo in the domain of Natural Language Inference (NLI). By employing knowledge distillation-based techniques and synthetic dataset augmentation, we aim to bridge the performance gap between large language models (LLMs) and small language models (SLMs) without the immense cost of human annotation. Our method uses two forms of rationale generation, information extraction and informed reasoning, to enrich the ANLI dataset. We then fine-tune T5-Small on these augmented datasets and evaluate its performance against an established benchmark. Our findings show that incorporating synthetic rationales significantly improves the model's ability to comprehend natural language, yielding 1.3% and 2.3% higher classification accuracy on ANLI for the two rationale types, respectively, and demonstrating the potential of leveraging LLMs for dataset augmentation. This approach not only enhances the performance of smaller models on complex tasks but also offers a cost-effective path to fine-tuning smaller language models. By advancing our understanding of knowledge distillation and fine-tuning strategies, this work contributes to the ongoing effort to create more capable and efficient NLP systems.
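To make the described pipeline concrete, below is a minimal sketch of the two-step procedure the abstract outlines: prompting GPT-3.5-Turbo for a rationale on an ANLI premise/hypothesis pair, then packing that rationale into a text-to-text training example for T5-Small. It assumes the OpenAI Python client and the Hugging Face datasets library; the prompt wording, the informed/uninformed switch, and the input/output formatting are illustrative assumptions, not the authors' exact setup.

from datasets import load_dataset
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def generate_rationale(premise, hypothesis, label=None):
    # label=None  ~ "information extraction": the model sees only premise and hypothesis.
    # label given ~ "informed reasoning": the gold label is revealed in the prompt.
    prompt = (
        f"Premise: {premise}\nHypothesis: {hypothesis}\n"
        + (f"Gold label: {LABELS[label]}\n" if label is not None else "")
        + "In one or two sentences, state the facts that determine whether the "
          "hypothesis is entailed by, neutral to, or contradicted by the premise."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
    )
    return resp.choices[0].message.content.strip()

def to_t5_example(premise, hypothesis, label, rationale):
    # The rationale is appended to the input so T5-Small can condition on it;
    # the target is the plain label word, as in a standard text-to-text NLI setup.
    source = f"nli premise: {premise} hypothesis: {hypothesis} rationale: {rationale}"
    return {"input_text": source, "target_text": LABELS[label]}

# Round 1 of ANLI from the Hugging Face hub (fields: premise, hypothesis, label).
anli_r1 = load_dataset("anli", split="train_r1")
ex = anli_r1[0]
rationale = generate_rationale(ex["premise"], ex["hypothesis"], label=ex["label"])
print(to_t5_example(ex["premise"], ex["hypothesis"], ex["label"], rationale))

Fine-tuning T5-Small on the resulting input/target pairs then proceeds as ordinary sequence-to-sequence training, for example with Hugging Face transformers.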
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pieper, T., Ballout, M., Krumnack, U., Heidemann, G., Kühnberger, K.U. (2024). Enhancing Small Language Models via ChatGPT and Dataset Augmentation. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds.) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol. 14763. Springer, Cham. https://doi.org/10.1007/978-3-031-70242-6_26
DOI: https://doi.org/10.1007/978-3-031-70242-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70241-9
Online ISBN: 978-3-031-70242-6
eBook Packages: Computer Science, Computer Science (R0)