Generating Better Responses from User Feedback via Reinforcement Learning and Commonsense Inference

  • Conference paper

Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14304)


Abstract

Dialogue generation is a popular research topic in natural language processing, yet improving the quality of model-generated responses using user feedback remains an open challenge. In this paper, we propose a dialogue generation method that models the likeability of user feedback and optimizes the generator with Reinforcement Learning from Human Feedback (RLHF) so that it produces responses users find more likeable. We also introduce commonsense inference to help the model better understand the dialogue context and the user's intent. Finally, we apply contrastive search at the decoding stage to make the generated responses more diverse. To verify the effectiveness of the approach, we conducted experiments comparing our model with several baselines; the results show that it outperforms the baseline models on automatic evaluation metrics. In the final evaluation, our model ranked 2nd in NLPCC 2023 Shared Task 9 Track 2.
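The abstract names three concrete ingredients: a learned likeability reward, RLHF-style optimization, and contrastive search at decoding time. As a rough illustration of how reward scoring and contrastive decoding fit together, here is a minimal sketch using Hugging Face transformers. The two model names and the use of a sentiment classifier as a stand-in likeability scorer are assumptions for illustration, not the authors' released setup, and the PPO update that would consume the reward is left out.

```python
# A minimal sketch of two ingredients named in the abstract: scoring a
# reply's "likeability" with a reward model, and decoding with contrastive
# search. Both model names are public stand-ins, not the authors' checkpoints.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

POLICY = "uer/gpt2-chinese-cluecorpussmall"              # stand-in dialogue model
REWARD = "uer/roberta-base-finetuned-jd-binary-chinese"  # stand-in likeability scorer

tok = AutoTokenizer.from_pretrained(POLICY)
policy = AutoModelForCausalLM.from_pretrained(POLICY)
r_tok = AutoTokenizer.from_pretrained(REWARD)
scorer = AutoModelForSequenceClassification.from_pretrained(REWARD)

context = "今天心情不太好。"  # "I'm not feeling great today."
inputs = tok(context, return_tensors="pt")

# Contrastive search is enabled in `generate` by combining penalty_alpha > 0
# (the degeneration penalty) with a small top_k candidate pool.
reply_ids = policy.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    penalty_alpha=0.6,
    top_k=4,
    max_new_tokens=48,
)
reply = tok.decode(
    reply_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Score the (context, reply) pair; index 1 is this classifier's positive
# class, used here as a crude likeability proxy.
with torch.no_grad():
    logits = scorer(**r_tok(context, reply, return_tensors="pt")).logits
reward = logits.softmax(dim=-1)[0, 1].item()
print(reply, reward)
```

In the full method sketched by the abstract, this scalar reward would drive a PPO-style policy update (the RLHF step) rather than simply being printed; libraries such as trl provide that training loop.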


Notes

  1. https://huggingface.co/svjack/comet-atomic-zh.
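For reference, below is a minimal sketch of querying this checkpoint for ATOMIC-style commonsense inferences. It assumes the model loads as a text-to-text LM through transformers and accepts "event + relation" prompts; treat both the Auto class and the prompt format as assumptions to be checked against the model card.

```python
# Hypothetical usage of the footnoted COMET checkpoint. Assumptions: the
# model loads via AutoModelForSeq2SeqLM (use AutoModelForCausalLM if the
# model card indicates a causal LM) and accepts ATOMIC-style
# "event + relation" prompts; consult the model card for the real format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

NAME = "svjack/comet-atomic-zh"
tok = AutoTokenizer.from_pretrained(NAME)
comet = AutoModelForSeq2SeqLM.from_pretrained(NAME)

event = "他今天被老板批评了。"  # "His boss criticized him today."
for rel in ("xReact", "xWant", "xEffect"):  # ATOMIC if-then relations
    ids = tok(f"{event} {rel}", return_tensors="pt")
    out = comet.generate(**ids, num_beams=4, max_new_tokens=24)
    print(rel, tok.decode(out[0], skip_special_tokens=True))
```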


Acknowledgments

This work was supported by the National Natural Science Foundation of China (62172086, 62272092).

Author information

Corresponding author

Correspondence to Daling Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Cai, M., Wang, D., Feng, S., Zhang, Y. (2023). Generating Better Responses from User Feedback via Reinforcement Learning and Commonsense Inference. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14304. Springer, Cham. https://doi.org/10.1007/978-3-031-44699-3_34


  • DOI: https://doi.org/10.1007/978-3-031-44699-3_34


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44698-6

  • Online ISBN: 978-3-031-44699-3

  • eBook Packages: Computer Science (R0)
