Generating Better Responses from User Feedback via Reinforcement Learning and Commonsense Inference

  • Conference paper

Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14304)


Abstract

Dialogue generation is a popular research topic in natural language processing, yet improving the quality of model-generated responses using user feedback remains an open challenge. In this paper, we propose a dialogue generation method that models the likeability of user feedback and optimizes the generator with Reinforcement Learning from Human Feedback (RLHF) so that it produces responses users find more likeable. We also introduce commonsense inference to help the model better understand the dialogue context and the user's intent. Finally, we apply contrastive search at the decoding stage to make the generated responses more diverse. To verify the effectiveness of the approach, we conducted experiments comparing our model with several baselines; the results show that it outperforms the baseline models on automatic evaluation metrics. In the final evaluation, our model ranked 2nd in NLPCC 2023 Shared Task 9 Track 2.
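The abstract names three concrete ingredients: a learned likeability reward, RLHF-style optimization, and contrastive search at decoding time. As a rough illustration of how reward scoring and contrastive decoding fit together, here is a minimal sketch using Hugging Face transformers. The two model names and the use of a sentiment classifier as a stand-in likeability scorer are assumptions for illustration, not the authors' released setup, and the PPO update that would consume the reward is left out.

```python
# A minimal sketch of two ingredients named in the abstract: scoring a
# reply's "likeability" with a reward model, and decoding with contrastive
# search. Both model names are public stand-ins, not the authors' checkpoints.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

POLICY = "uer/gpt2-chinese-cluecorpussmall"              # stand-in dialogue model
REWARD = "uer/roberta-base-finetuned-jd-binary-chinese"  # stand-in likeability scorer

tok = AutoTokenizer.from_pretrained(POLICY)
policy = AutoModelForCausalLM.from_pretrained(POLICY)
r_tok = AutoTokenizer.from_pretrained(REWARD)
scorer = AutoModelForSequenceClassification.from_pretrained(REWARD)

context = "今天心情不太好。"  # "I'm not feeling great today."
inputs = tok(context, return_tensors="pt")

# Contrastive search is enabled in `generate` by combining penalty_alpha > 0
# (the degeneration penalty) with a small top_k candidate pool.
reply_ids = policy.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    penalty_alpha=0.6,
    top_k=4,
    max_new_tokens=48,
)
reply = tok.decode(
    reply_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Score the (context, reply) pair; index 1 is this classifier's positive
# class, used here as a crude likeability proxy.
with torch.no_grad():
    logits = scorer(**r_tok(context, reply, return_tensors="pt")).logits
reward = logits.softmax(dim=-1)[0, 1].item()
print(reply, reward)
```

In the full method sketched by the abstract, this scalar reward would drive a PPO-style policy update (the RLHF step) rather than simply being printed; libraries such as trl provide that training loop.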


Notes

  1. https://huggingface.co/svjack/comet-atomic-zh.
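For reference, below is a minimal sketch of querying this checkpoint for ATOMIC-style commonsense inferences. It assumes the model loads as a text-to-text LM through transformers and accepts "event + relation" prompts; treat both the Auto class and the prompt format as assumptions to be checked against the model card.

```python
# Hypothetical usage of the footnoted COMET checkpoint. Assumptions: the
# model loads via AutoModelForSeq2SeqLM (use AutoModelForCausalLM if the
# model card indicates a causal LM) and accepts ATOMIC-style
# "event + relation" prompts; consult the model card for the real format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

NAME = "svjack/comet-atomic-zh"
tok = AutoTokenizer.from_pretrained(NAME)
comet = AutoModelForSeq2SeqLM.from_pretrained(NAME)

event = "他今天被老板批评了。"  # "His boss criticized him today."
for rel in ("xReact", "xWant", "xEffect"):  # ATOMIC if-then relations
    ids = tok(f"{event} {rel}", return_tensors="pt")
    out = comet.generate(**ids, num_beams=4, max_new_tokens=24)
    print(rel, tok.decode(out[0], skip_special_tokens=True))
```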


Acknowledgments

This work was supported by the National Natural Science Foundation of China (62172086, 62272092).

Author information

Corresponding author

Correspondence to Daling Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Cai, M., Wang, D., Feng, S., Zhang, Y. (2023). Generating Better Responses from User Feedback via Reinforcement Learning and Commonsense Inference. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14304. Springer, Cham. https://doi.org/10.1007/978-3-031-44699-3_34


  • DOI: https://doi.org/10.1007/978-3-031-44699-3_34


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44698-6

  • Online ISBN: 978-3-031-44699-3

  • eBook Packages: Computer Science (R0)
