
Dynamic personalization in conversational recommender systems

  • Original Article
  • Information Systems and e-Business Management

Abstract

Conversational recommender systems are e-commerce applications that interactively assist online users in achieving their interaction goals during their sessions. In our previous work, we proposed and validated a methodology for conversational systems that autonomously learns which web page to display to the user at each step of the session. We employed reinforcement learning to learn an optimal strategy, i.e., one that is personalized for a real user population. In this paper, we extend our methodology so that it autonomously learns and updates the optimal strategy dynamically (at run-time), and individually for each user. This learning continues after every session, for as long as the user keeps interacting with the system. We evaluate our approach in an off-line simulation with four simulated users, as well as in an online evaluation with thirteen real users. The results show that an optimal strategy is learnt and updated for each real and simulated user. For each simulated user, the optimal behavior is reasonably well adapted to that user's characteristics, but converges only after several hundred sessions. For each real user, the optimal behavior converges within just a few sessions. It provides assistance only in certain situations, allowing many users to buy several products together in less time, with more page views and fewer query executions. We argue that our approach is novel and show how its current limitations can be addressed.
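
As a minimal illustration of the tabular Q-learning update this kind of per-user strategy learning relies on (Watkins and Dayan 1992, cited in the references), the following Python sketch keeps one Q-table per user and backs it up after every observed session step. The function names and the ALPHA/GAMMA values are illustrative assumptions, not the authors' settings.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value, not from the paper)
GAMMA = 0.9   # discount factor (assumed value, not from the paper)

Q = defaultdict(float)   # per-user table: (state, action) -> Q-value

def update(state, action, reward, next_state, next_actions):
    """One Q-learning backup, applied after each observed session step."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def greedy_action(state, actions):
    """Greedy policy over the learnt table (exploration omitted)."""
    return max(actions, key=lambda a: Q[(state, a)])
```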



Notes

  1. The concept paper of this methodology appeared at the Malaysian Joint Conference on Artificial Intelligence (MJCAI) (Mahmood et al. 2010a).

  2. We are not computing the state transition probability model of the user’s behavior in advance.

  3. We acquired NutKing’s data from eCTRL Solutions, an Italian company offering tourism-based technologies for conversational recommender systems.

  4. We set these values after analyzing some simulated sessions with our user models.

  5. http://www.useit.com/alertbox/ecommerce.html.

  6. The example is given only for the 3 states with UR = SelectPromotion, but the same explanation applies to the corresponding states with UR = SelectTop10.

  7. http://www.richrelevance.com.

  8. http://www.intershop.com.

  9. http://www.oracle.com/us/products/applications/siebel/index.html.

  10. http://www.omniture.com/en/products/conversion/recommendations.

  11. http://www.locayta.com/.

  12. Suggesting items bought by users with similar preferences; see Resnick and Varian (1997) for more details on using these preference levels to make recommendations.

  13. These two actions are representative of real user behaviors; users rejected tightening for result sizes close to 100.

References

  • Aha D, Breslow L (1997) Refining conversational case libraries. In: Case-based reasoning research and development, proceedings of the 2nd international conference on case-based reasoning (ICCBR-97), Springer, pp 267–278

  • Anderson CR, Domingos P, Weld DS (2001) Adaptive web navigation for wireless devices. In: Proceedings of the 17th international joint conference on artificial intelligence, vol 2, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI’01, pp 879–884. http://dl.acm.org/citation.cfm?id=1642194.1642211

  • Brusilovsky P (2001) Adaptive hypermedia. User Model User Adapt Interact 11(1–2):87–110

  • Brusilovsky P, Kobsa A, Nejdl W (2007) The adaptive web: methods and strategies of web personalization, 1st edn. Lecture notes in computer science, Springer, Berlin

  • Ceaparu I, Lazar J, Bessiere K, Robinson J, Shneiderman B (2004) Determining causes and severity of end-user frustration. Int J Hum Comput Interaction. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.157.8407

  • Cheng Y (2009) Real time demand learning-based Q-learning approach for dynamic pricing in e-retailing setting. In: IEEC ’09: Proceedings of the 2009 international symposium on information engineering and electronic commerce, IEEE Computer Society, Washington, DC, USA, pp 594–598. doi:10.1109/IEEC.2009.131

  • De Meo P, Rosaci D, Sarnè GM, Ursino D, Terracina G (2007) EC-XAMAS: supporting e-commerce activities by an XML-based adaptive multi-agent system. Appl Artif Intell 21(6):529–562. doi:10.1080/08839510701409052

  • Golovin N, Rahm E (2004) Reinforcement learning architecture for web recommendations. In: International conference on information technology: coding and computing (ITCC’04), vol 1, April 5–7, 2004, Las Vegas, Nevada, USA, pp 398–402

  • Goy A, Ardissono L, Petrone G (2007) Personalization in e-commerce applications. In: The adaptive web: methods and strategies of web personalization, chap 16, pp 485–520

  • Hirohiko Morita EU, Yamakawa T (2009) Markov model based adaptive web advertisement system by tracking a user’s taste. Int J Innov Comput Info Control 5(3):811–819

  • Kazienko P, Kolodziejski P (2006) Personalized integration of recommendation methods for e-commerce. IJCSA 3(3):12–26

  • Kim Y, Yum BJ, Song J, Kim SM (2005) Development of a recommender system based on navigational and behavioral patterns of customers in e-commerce sites. Expert Syst Appl 28(2):381–393

  • Kobsa A, Koenemann J, Pohl W (2001) Personalized hypermedia presentation techniques for improving online customer relationships. Knowl Eng Rev 16:111–155

  • Li L, Yang Z, Wang B, Kitsuregawa M (2007) Dynamic adaptation strategies for long-term and short-term user profile to personalize search. In: APWeb/WAIM’07: proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on advances in data and web management, Springer, Berlin, Heidelberg, pp 228–240

  • Mahmood T, Ahmed SH, Mahmood S (2010a) Optimal dynamic personalization in conversational recommender systems. In: MJCAI: Malaysian joint conference on artificial intelligence, Malaysia

  • Mahmood T, Ricci F, Venturini A (2010b) Improving recommendation effectiveness by adapting the dialogue strategy in online travel planning. J Info Technol Tour 11(3):285–302

  • Mirzadeh N, Ricci F (2007) Cooperative query rewriting for decision making support and recommender systems. Appl Artif Intell 21:1–38

  • Nakada T, Kanai H, Kunifuji S (2007) Dynamic book recommendation model for real bookstores. In: The 5th international conference on pervasive computing (Pervasive 2007), Canada

  • Nielsen J, Molich R, Snyder C, Farrell S (2001) E-commerce user experience. Nielsen Norman Group

  • Peterson ET (2011) The big book of key performance indicators, 1st edn. No. 2 in Web Analytics Demystified, webanalyticsdemystified.com

  • Resnick P, Varian HR (1997) Recommender systems. Commun ACM 40(3):56–58

  • Rojanavasu P, Srinil P, Pinngern O (2005) New recommendation system using reinforcement learning. In: Proceedings of the fourth international conference on eBusiness, Bangkok, Thailand

  • Rosaci D, Sarné GM (2012) A multi-agent recommender system for supporting device adaptivity in e-commerce. J Intell Inf Syst 38(2):393–418. doi:10.1007/s10844-011-0160-9

  • Schwartz B (2005) The paradox of choice: why more is less. Harper Perennial, New York

  • Smith M, Lee-Urban S, Muñoz-Avila H (2007) Retaliate: learning winning policies in first-person shooter games. In: AAAI, pp 1801–1806

  • Song X, Lin CY, Tseng BL, Sun MT (2006) Modeling evolutionary behaviors for community-based dynamic recommendation. In: SDM

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge. http://www.cs.ualberta.ca/~sutton/book/the-book.html

  • Tsang SL, Clarke S (2007) Mining user models for effective adaptation of context-aware applications. In: IPC ’07: proceedings of the 2007 international conference on intelligent pervasive computing, IEEE, Washington, DC, USA, pp 178–187. doi:10.1109/IPC.2007.76

  • Watkins C, Dayan P (1992) Technical note: Q-learning. Mach Learn 8(3):279–292. doi:10.1023/A:1022676722315


Author information


Corresponding author

Correspondence to Tariq Mahmood.

Appendices

Appendix 1: Details of user models

  1. WillUM: This user accepts tightening if any of the suggested attributes has a non-NULL value in the test item. The user accepts this attribute even if it is not her next preferred attribute, according to the sorted order of the attributes' frequency of usage. This allows us to model the "willingness" of the user, because the user accepts this attribute although she does not really prefer it. If none of the suggested attributes has a non-NULL value, then acceptance cannot be simulated. In this situation, WillUM acts as follows:

     • if the result size is smaller than 100, i.e., when CRS = small or CRS = medium, the user rejects tightening and executes the original query, and

     • if the result size is larger than (or equal to) 100, i.e., when CRS is either large or very large, the user rejects tightening and autonomously modifies her query (as in Case 1) (T-modq) (Note 13).

  2. ModwillUM: The user considers accepting tightening only 26 % of the times that Sugg is executed during a session. In doing so, ModwillUM simulates the real users' response to Sugg (Mirzadeh and Ricci 2007). We make a random selection (from a uniform distribution) of the situations in which the user will accept tightening. If the user considers accepting tightening, acceptance is simulated as in WillUM. If acceptance cannot be simulated in this case, or if the user does not consider accepting tightening (74 % of the time), then the user either rejects tightening or manually modifies her query, as in WillUM.

  3. UnwillUM: The user never accepts tightening; if \(CRS \in \{small, medium\}\), the user rejects tightening and executes the query. Otherwise, if \(CRS \in \{large, very\ large\}\), the user modifies her query as in WillUM.

  4. PopRandomUM: This model represents the behavior of a user population: each time Sugg is executed, we randomly select (from a uniform distribution) and simulate one of the above three user behaviors. A code sketch of all four models follows this list.
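
For concreteness, the following Python sketch implements the four user models as decision functions. The dict-based test-item representation and the function names are illustrative assumptions; the returned PUR labels (T-acct, T-rejt, T-modq) match those used in Appendix 2.

```python
import random

def willum(suggested_attrs, test_item, crs):
    """WillUM: accept tightening on any suggested attribute with a non-NULL
    value in the test item, even if it is not the next preferred attribute."""
    for attr in suggested_attrs:
        if test_item.get(attr) is not None:
            return ("T-acct", attr)            # accept tightening
    # Acceptance cannot be simulated: fall back on the result-set size (CRS).
    if crs in ("small", "medium"):             # result size below 100
        return ("T-rejt", None)                # reject, execute original query
    return ("T-modq", None)                    # reject, modify the query

def modwillum(suggested_attrs, test_item, crs, p_consider=0.26):
    """ModwillUM: consider accepting on only 26 % of Sugg executions
    (Mirzadeh and Ricci 2007); otherwise reject or modify as WillUM does."""
    if random.random() < p_consider:
        return willum(suggested_attrs, test_item, crs)
    return ("T-rejt", None) if crs in ("small", "medium") else ("T-modq", None)

def unwillum(crs):
    """UnwillUM: never accept tightening."""
    return ("T-rejt", None) if crs in ("small", "medium") else ("T-modq", None)

def poprandomum(suggested_attrs, test_item, crs):
    """PopRandomUM: uniformly simulate one of the three behaviors above."""
    behavior = random.choice(["will", "modwill", "unwill"])
    if behavior == "will":
        return willum(suggested_attrs, test_item, crs)
    if behavior == "modwill":
        return modwillum(suggested_attrs, test_item, crs)
    return unwillum(crs)
```

For example, willum(["price", "category"], {"price": 20, "category": None}, "large") returns ("T-acct", "price"), since the first suggested attribute has a non-NULL value in the test item.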

Appendix 2: Q-values logged in off-line evaluation

The Q-values for all pairs in which PUR = G are always 100, since G is a goal state; hence, we do not log the Q-values for such pairs. We now count the number of remaining pairs. The pair associated with the initial state is {PUR = S-go, CRS = very large}_ShowQF. There are also four states in which PUR = QF-execq (as mentioned above), and in each of these the Agent can execute either Sugg or Exec, which gives \(4\times2=8\) pairs. For each of the remaining 5 PUR values, there are 4 possible states, one for each possible value of CRS. Moreover, when PUR = T-acct, PUR = T-rejt, PUR = T-modq, PUR = R-modq, and PUR = R-add, the Agent can only take the actions Exec, Exec, Modify, Modify and Add, respectively. This gives \(5\times4\times1=20\) state-action pairs, so the total number of pairs for which we log the Q-values is \(1+8+20=29\); the sketch below reproduces this enumeration.
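
A short Python sketch that reproduces the count of logged (state, action) pairs derived above; the CRS values and the one-admissible-action constraints are taken directly from the text, while the data layout is an assumption.

```python
CRS_VALUES = ["small", "medium", "large", "very large"]

# 1 pair for the initial state
pairs = [({"PUR": "S-go", "CRS": "very large"}, "ShowQF")]

# PUR = QF-execq: 4 states x 2 actions (Sugg or Exec) = 8 pairs
for crs in CRS_VALUES:
    for action in ("Sugg", "Exec"):
        pairs.append(({"PUR": "QF-execq", "CRS": crs}, action))

# Remaining 5 PUR values: 4 states each, exactly one admissible action
ONLY_ACTION = {"T-acct": "Exec", "T-rejt": "Exec", "T-modq": "Modify",
               "R-modq": "Modify", "R-add": "Add"}
for pur, action in ONLY_ACTION.items():
    for crs in CRS_VALUES:
        pairs.append(({"PUR": pur, "CRS": crs}, action))

assert len(pairs) == 1 + 8 + 20 == 29
```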

Appendix 3: Q-values logged in online evaluation

The 23 Q-values for our online evaluation are listed below; the original list contained one duplicated entry, which has been removed here (a code rendering follows the list):

  1. {UR = Login, PB = False, TE = Less}_SWP
  2. {UR = BuyPromotion, PB = True, TE = Less}_ATC
  3. {UR = BuyPromotion, PB = True, TE = More}_ATC
  4. {UR = BuyPromotion, PB = False, TE = Less}_ATC
  5. {UR = BuyPromotion, PB = False, TE = More}_ATC
  6. {UR = BuyTop10, PB = True, TE = Less}_ATC
  7. {UR = BuyTop10, PB = True, TE = More}_ATC
  8. {UR = BuyTop10, PB = False, TE = Less}_ATC
  9. {UR = BuyTop10, PB = False, TE = More}_ATC
  10. {UR = SelectPromotion, PB = True, TE = Less}_SPP
  11. {UR = SelectPromotion, PB = True, TE = More}_STP
  12. {UR = SelectPromotion, PB = True, TE = More}_SPP
  13. {UR = SelectPromotion, PB = False, TE = Less}_SPP
  14. {UR = SelectPromotion, PB = False, TE = Less}_STP
  15. {UR = SelectPromotion, PB = False, TE = More}_STP
  16. {UR = SelectPromotion, PB = False, TE = More}_SPP
  17. {UR = SelectTop10, PB = True, TE = Less}_STP
  18. {UR = SelectTop10, PB = True, TE = More}_STP
  19. {UR = SelectTop10, PB = True, TE = More}_SPP
  20. {UR = SelectTop10, PB = False, TE = Less}_STP
  21. {UR = SelectTop10, PB = False, TE = Less}_SPP
  22. {UR = SelectTop10, PB = False, TE = More}_STP
  23. {UR = SelectTop10, PB = False, TE = More}_SPP
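
As a compact, machine-checkable rendering of the list above, the following Python sketch encodes each logged pair as a (UR, PB, TE, action) tuple; the tuple encoding itself is an assumption made for illustration.

```python
LOGGED_PAIRS = [
    ("Login", False, "Less", "SWP"),
    # BuyPromotion and BuyTop10: all four (PB, TE) states, action ATC
    *[("BuyPromotion", pb, te, "ATC") for pb in (True, False) for te in ("Less", "More")],
    *[("BuyTop10", pb, te, "ATC") for pb in (True, False) for te in ("Less", "More")],
    # SelectPromotion: seven logged (state, action) combinations
    ("SelectPromotion", True, "Less", "SPP"),
    ("SelectPromotion", True, "More", "STP"),
    ("SelectPromotion", True, "More", "SPP"),
    ("SelectPromotion", False, "Less", "SPP"),
    ("SelectPromotion", False, "Less", "STP"),
    ("SelectPromotion", False, "More", "STP"),
    ("SelectPromotion", False, "More", "SPP"),
    # SelectTop10: seven logged (state, action) combinations
    ("SelectTop10", True, "Less", "STP"),
    ("SelectTop10", True, "More", "STP"),
    ("SelectTop10", True, "More", "SPP"),
    ("SelectTop10", False, "Less", "STP"),
    ("SelectTop10", False, "Less", "SPP"),
    ("SelectTop10", False, "More", "STP"),
    ("SelectTop10", False, "More", "SPP"),
]
# 23 pairs in total, all distinct
assert len(LOGGED_PAIRS) == 23 and len(set(LOGGED_PAIRS)) == 23
```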


Cite this article

Mahmood, T., Mujtaba, G. & Venturini, A. Dynamic personalization in conversational recommender systems. Inf Syst E-Bus Manage 12, 213–238 (2014). https://doi.org/10.1007/s10257-013-0222-3
