Skip to main content

Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation

  • Conference paper
  • First Online:
Database Systems for Advanced Applications (DASFAA 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14854))

Included in the following conference series:

  • 368 Accesses

Abstract

Conversational Recommender Systems (CRS) illuminate user preferences via multi-round interactive dialogues, ultimately navigating towards precise and satisfactory recommendations. However, contemporary CRS are limited to inquiring binary or multi-choice questions based on a single attribute type (e.g., color) per round, which causes excessive rounds of interaction and diminishes the user’s experience. To address this, we propose a more realistic and efficient conversational recommendation problem setting, called Multi-Type-Attribute Multi-round Conversational Recommendation (MTAMCR), which enables CRS to inquire about multi-choice questions covering multiple types of attributes in each round, thereby improving interactive efficiency. Moreover, by formulating MTAMCR as a hierarchical reinforcement learning task, we propose a Chain-of-Choice Hierarchical Policy Learning (CoCHPL) framework to enhance both the questioning efficiency and recommendation effectiveness in MTAMCR. Specifically, a long-term policy over options (i.e., ask or recommend) determines the action type, while two short-term intra-option policies sequentially generate the chain of attributes or items through multi-step reasoning and selection, optimizing the diversity and interdependence of questioning attributes. Finally, extensive experiments on four benchmarks demonstrate the superior performance of CoCHPL over prevailing state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Bacon, P.L., Harb, J., Precup, D.: The option-critic architecture. In: AAAI. p. 1726–1734 (2017)

    Google Scholar 

  2. Bordes, A., Usunier, N., Garcia-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NeurIPS. pp. 2787–2795 (2013)

    Google Scholar 

  3. Christakopoulou, K., Radlinski, F., Hofmann, K.: Towards conversational recommender systems. In: SIGKDD. pp. 815–824 (2016)

    Google Scholar 

  4. Deng, Y., Li, Y., Sun, F., Ding, B., Lam, W.: Unified conversational recommendation policy learning via graph-based reinforcement learning. In: SIGIR. pp. 1431–1441 (2021)

    Google Scholar 

  5. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017)

    Google Scholar 

  6. Lei, W., He, X., Miao, Y., Wu, Q., Hong, R., Kan, M.Y., Chua, T.S.: Estimation-action-reflection: Towards deep interaction between conversational and recommender systems. In: WSDM. pp. 304–312 (2020)

    Google Scholar 

  7. Lei, W., Zhang, G., He, X., Miao, Y., Wang, X., Chen, L., Chua, T.S.: Interactive path reasoning on graph for conversational recommendation. In: SIGKDD. pp. 2073–2083 (2020)

    Google Scholar 

  8. Lin, A., Zhu, Z., Wang, J., Caverlee, J.: Enhancing user personalization in conversational recommenders. In: WWW. pp. 770–778 (2023)

    Google Scholar 

  9. Sun, Y., Zhang, Y.: Conversational recommender system. In: SIGIR. pp. 235–244 (2018)

    Google Scholar 

  10. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: NeurIPS (1999)

    Google Scholar 

  11. Sutton, R.S., Precup, D., Singh, S.: Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning. Artificial intelligence 112(1-2), 181–211 (1999)

    Google Scholar 

  12. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS. pp. 6000–6010 (2017)

    Google Scholar 

  13. Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M., Freitas, N.: Dueling network architectures for deep reinforcement learning. In: ICML (2016)

    Google Scholar 

  14. Xie, Z., Yu, T., Zhao, C., Li, S.: Comparison-based conversational recommender system with relative bandit feedback. In: SIGIR. pp. 1400–1409 (2021)

    Google Scholar 

  15. Xu, K., Yang, J., Xu, J., Gao, S., Guo, J., Wen, J.R.: Adapting user preference to online feedback in multi-round conversational recommendation. In: WSDM. pp. 364–372 (2021)

    Google Scholar 

  16. Zhang, W., Liu, H., Han, J., Ge, Y., Xiong, H.: Multi-agent graph convolutional reinforcement learning for dynamic electric vehicle charging pricing. In: SIGKDD. pp. 2471–2481 (2022)

    Google Scholar 

  17. Zhang, W., Liu, H., Liu, Y., Zhou, J., Xiong, H.: Semi-supervised hierarchical recurrent graph neural network for city-wide parking availability prediction. In: AAAI. pp. 1186–1193 (2020)

    Google Scholar 

  18. Zhang, Y., Wu, L., Shen, Q., Pang, Y., Wei, Z., Xu, F., Long, B., Pei, J.: Multiple choice questions based multi-interest policy learning for conversational recommendation. In: WWW. pp. 2153–2162 (2022)

    Google Scholar 

  19. Zhang, Y., Chen, X., Ai, Q., Yang, L., Croft, W.B.: Towards conversational search and recommendation: System ask, user respond. In: CIKM. pp. 177–186 (2018)

    Google Scholar 

  20. Zhao, S., Wei, W., Liu, Y., Wang, Z., Li, W., Mao, X.L., Zhu, S., Yang, M., Wen, Z.: Towards hierarchical policy learning for conversational recommendation with hypergraph-based reinforcement learning. In: IJCAI. pp. 2459–2467 (2023)

    Google Scholar 

Download references

Acknowledgement

This research was supported in part by the National Natural Science Foundation of China under Grant No. 62102110, 92370204, the Guangzhou Basic and Applied Basic Research Program under Grant No. 2024A04J3279, and the Education Bureau of Guangzhou Municipality.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hao Liu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Fan, W., Zhang, W., Wang, W., Song, Y., Liu, H. (2024). Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_8

Download citation

  • DOI: https://doi.org/10.1007/978-981-97-5569-1_8

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5568-4

  • Online ISBN: 978-981-97-5569-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics