Abstract
Conversational Recommender Systems (CRS) elicit user preferences through multi-round interactive dialogues, ultimately navigating towards precise and satisfactory recommendations. However, contemporary CRS are limited to asking binary or multi-choice questions based on a single attribute type (e.g., color) per round, which causes excessive rounds of interaction and degrades the user experience. To address this, we propose a more realistic and efficient conversational recommendation problem setting, called Multi-Type-Attribute Multi-round Conversational Recommendation (MTAMCR), which enables CRS to ask multi-choice questions covering multiple types of attributes in each round, thereby improving interactive efficiency. Moreover, by formulating MTAMCR as a hierarchical reinforcement learning task, we propose a Chain-of-Choice Hierarchical Policy Learning (CoCHPL) framework to enhance both the questioning efficiency and recommendation effectiveness in MTAMCR. Specifically, a long-term policy over options (i.e., ask or recommend) determines the action type, while two short-term intra-option policies sequentially generate the chain of attributes or items through multi-step reasoning and selection, optimizing the diversity and interdependence of questioning attributes. Finally, extensive experiments on four benchmarks demonstrate the superior performance of CoCHPL over prevailing state-of-the-art methods.
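The two-level control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the learned policies are replaced by hand-written stand-ins, and every name here (ATTRIBUTE_TYPE, choose_option, build_chain, attribute_score, next_action), the candidate threshold, and all score values are hypothetical. It only shows how a choice between "ask" and "recommend" can be followed by a chain built one element at a time, with each step conditioned on the elements already chosen.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy catalogue: attribute instance -> attribute type.
ATTRIBUTE_TYPE: Dict[str, str] = {
    "red": "color", "blue": "color",
    "nike": "brand", "adidas": "brand",
    "running": "style",
}

# Hypothetical base preference scores (a learned model would supply these).
BASE_SCORE: Dict[str, float] = {
    "red": 0.9, "blue": 0.8, "nike": 0.7, "adidas": 0.4, "running": 0.6,
}


def choose_option(num_candidates: int, threshold: int = 3) -> str:
    """Stand-in for the policy over options: ask until few candidates remain."""
    return "recommend" if num_candidates <= threshold else "ask"


def build_chain(
    choices: Sequence[str],
    score: Callable[[str, List[str]], float],
    length: int,
) -> List[str]:
    """Stand-in for an intra-option policy: pick one element per step,
    scoring each remaining choice given the chain built so far."""
    chain: List[str] = []
    remaining = list(choices)
    for _ in range(min(length, len(remaining))):
        best = max(remaining, key=lambda c: score(c, chain))
        chain.append(best)
        remaining.remove(best)
    return chain


def attribute_score(attr: str, chain: List[str]) -> float:
    """Assumed scoring rule: base preference minus a penalty when the
    attribute's type already appears in the chain, so that one question
    tends to span several attribute types."""
    used_types = {ATTRIBUTE_TYPE[a] for a in chain}
    penalty = 0.5 if ATTRIBUTE_TYPE[attr] in used_types else 0.0
    return BASE_SCORE[attr] - penalty


def next_action(num_candidates: int, item_scores: Dict[str, float]) -> Dict[str, object]:
    """Pick the option first, then build the matching chain."""
    option = choose_option(num_candidates)
    if option == "ask":
        chain = build_chain(list(ATTRIBUTE_TYPE), attribute_score, length=3)
    else:
        # Items are ranked by their (hypothetical) scores, ignoring the chain.
        chain = build_chain(list(item_scores), lambda i, _c: item_scores[i], length=2)
    return {"option": option, "chain": chain}
```

With many candidates left, next_action chooses "ask" and the type penalty steers the chain away from a second color once "red" has been selected. With few candidates left, it chooses "recommend" and returns the top-scored items.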
Acknowledgement
This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 62102110 and 92370204, the Guangzhou Basic and Applied Basic Research Program under Grant No. 2024A04J3279, and the Education Bureau of Guangzhou Municipality.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Fan, W., Zhang, W., Wang, W., Song, Y., Liu, H. (2024). Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_8
DOI: https://doi.org/10.1007/978-981-97-5569-1_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5568-4
Online ISBN: 978-981-97-5569-1
eBook Packages: Computer Science, Computer Science (R0)