Abstract
Improved search quality enhances user satisfaction, which directly impacts the sales growth of an E-Commerce (E-Com) platform. Traditional Learning to Rank (LTR) algorithms require relevance judgments on products. In E-Com, obtaining such judgments poses an immense challenge. In the literature, it has been proposed to employ user feedback (such as clicks, add-to-basket (AtB) clicks and orders) to generate relevance judgments. This is done in two steps: first, query-product pair data are aggregated from the logs, and then metrics such as order rate are calculated for each pair. In this paper, we advocate a counterfactual risk minimization (CRM) approach which circumvents the need for relevance judgments and data aggregation, and is better suited for learning from logged data, i.e. contextual bandit feedback. Due to the unavailability of a public E-Com LTR dataset, we provide the Mercateo dataset from our platform. It contains more than 10 million AtB click logs and 1 million order logs from a catalogue of about 3.5 million products associated with 3060 queries. To the best of our knowledge, this is the first work which examines the effectiveness of the CRM approach in learning a ranking model from real-world logged data. Our empirical evaluation shows that our CRM approach learns effectively from logged data and beats a strong baseline ranker (\(\lambda \)-MART) by a large margin. Our method also outperforms full-information losses (e.g. cross-entropy) on various deep neural network models. These findings demonstrate that by adopting the CRM approach, E-Com platforms can achieve better product search quality compared to the full-information approach.
Notes
- 1.
- 2. Available at: https://github.com/ecom-research/CRM-LTR.
- 3. Available at: https://github.com/usnistgov/trec_eval.
- 4. Available at: https://sourceforge.net/p/lemur/wiki/RankLib/.
References
Agrawal, R., Halverson, A., Kenthapadi, K., Mishra, N., Tsaparas, P.: Generating labels from clicks. In: WSDM 2009, pp. 172–181. ACM (2009). https://doi.org/10.1145/1498759.1498824
Bendersky, M., Wang, X., Najork, M., Metzler, D.: Learning with sparse and biased feedback for personal search. In: IJCAI 2018, pp. 5219–5223. AAAI Press (2018)
Bi, K., Teo, C.H., Dattatreya, Y., Mohan, V., Croft, W.B.: Leverage implicit feedback for context-aware product search. In: eCOM@SIGIR (2019)
Borisov, A., Kiseleva, J., Markov, I., de Rijke, M.: Calibration: a simple way to improve click models. In: CIKM 2018 (2018)
Brenner, E.P., Zhao, J., Kutiyanawala, A., Yan, Z.: End-to-end neural ranking for ecommerce product search. In: SIGIR eCom, vol. 18 (2018)
Chapelle, O., Chang, Y.: Yahoo! Learning to rank challenge overview. In: Proceedings of the Learning to Rank Challenge, pp. 1–24 (2011)
Chen, D.: Data mining for the online retail industry: a case study of RFM model-based customer segmentation using data mining. J. Database Market. Customer Strategy Manag. 19(3), 197–208 (2012). https://doi.org/10.1057/dbm.2012.17
Dai, Z., Xiong, C., Callan, J., Liu, Z.: Convolutional neural networks for soft-matching N-grams in ad-hoc search. In: WSDM 2018, pp. 126–134. ACM, New York (2018). https://doi.org/10.1145/3159652.3159659
Dheeru, D., Taniskidou, E.: UCI machine learning repository (2017)
Alonso, O., et al.: Relevance criteria for e-commerce: a crowdsourcing-based experimental analysis. In: SIGIR 2009, pp. 760–761. ACM (2009)
Guo, J., Fan, Y., Ji, X., Cheng, X.: MatchZoo: a learning, practicing, and developing system for neural text matching. In: SIGIR 2019 (2019). https://doi.org/10.1145/3331184.3331403
Hu, Y., Da, Q., Zeng, A., Yu, Y., Xu, Y.: Reinforcement learning to rank in e-commerce search engine: formalization, analysis, and application. In: KDD 2018, NY, USA (2018). https://doi.org/10.1145/3219819.3219846
Jiang, S., et al.: Learning query and document relevance from a web-scale click graph. In: SIGIR 2016 (2016)
Joachims, T.: Optimizing search engines using clickthrough data. In: KDD 2002. ACM (2002). https://doi.org/10.1145/775047.775067
Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., Gay, G.: Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans. Inf. Syst. 25(2), 7-es (2007). https://doi.org/10.1145/1229179.1229181
Joachims, T., Swaminathan, A., de Rijke, M.: Deep learning with logged bandit feedback. In: ICLR 2018 (2018)
Joachims, T., Swaminathan, A., Schnabel, T.: Unbiased learning-to-rank with biased feedback. In: WSDM 2017. ACM (2017). https://doi.org/10.1145/3018661.3018699
Lucchese, C., Nardini, F.M., Orlando, S., Perego, R., Tonellotto, N.: Speeding up document ranking with rank-based features. In: SIGIR 2015, NY, USA (2015). https://doi.org/10.1145/2766462.2767776
Mitra, B., Diaz, F., Craswell, N.: Learning to match using local and distributed representations of text for web search. CoRR (2016)
Pang, L., Lan, Y., Guo, J., Xu, J., Xu, J., Cheng, X.: DeepRank: a new deep architecture for relevance ranking in information retrieval. CoRR abs/1710.05649 (2017)
Qi, Y., Wu, Q., Wang, H., Tang, J., Sun, M.: Bandit learning with implicit feedback. In: NIPS 2018, pp. 7287–7297. Curran Associates Inc., Red Hook (2018)
Qin, T., Liu, T.Y., Xu, J., Li, H.: LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf. Retrieval 13(4), 346–374 (2010). https://doi.org/10.1007/s10791-009-9123-y
Santu, S.K.K., Sondhi, P., Zhai, C.: On application of learning to rank for e-commerce search. In: SIGIR 2017 (2017)
Schuth, A., Hofmann, K., Whiteson, S., de Rijke, M.: Lerot: an online learning to rank framework. In: Proceedings of the 2013 Workshop on Living Labs for Information Retrieval Evaluation, pp. 23–26. ACM (2013)
Severyn, A., Moschitti, A.: Learning to rank short text pairs with convolutional deep neural networks. In: SIGIR 2015, pp. 373–382. ACM, New York (2015)
Sidana, S., Laclau, C., Amini, M.R., Vandelle, G., Bois-Crettez, A.: KASANDR: a large-scale dataset with implicit feedback for recommendation. In: SIGIR 2017, pp. 1245–1248 (2017)
Swaminathan, A., Joachims, T.: Batch learning from logged bandit feedback through counterfactual risk minimization. JMLR 16, 1731–1755 (2015)
Swaminathan, A., Joachims, T.: The self-normalized estimator for counterfactual learning. In: NIPS 2015, pp. 3231–3239. MIT Press, Cambridge (2015)
Wan, S., Lan, Y., Guo, J., Xu, J., Pang, L., Cheng, X.: A deep architecture for semantic matching with multiple positional sentence representations. CoRR abs/1511.08277 (2015). http://arxiv.org/abs/1511.08277
Wu, Q., Burges, C.J., Svore, K.M., Gao, J.: Adapting boosting for information retrieval measures. Inf. Retrieval 13(3), 254–270 (2010)
Xu, J., Li, H.: AdaRank: a boosting algorithm for information retrieval. In: SIGIR 2007, pp. 391–398. ACM, New York (2007)
Yang, Z., et al.: A deep top-K relevance matching model for ad-hoc retrieval. In: Zhang, S., Liu, T.-Y., Li, X., Guo, J., Li, C. (eds.) CCIR 2018. LNCS, vol. 11168, pp. 16–27. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01012-6_2
Acknowledgments
We would like to thank Alan Schelten, Till Brychcy and Rudolf Sailer for insightful discussions which helped improve the quality of this work. This work has been supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy through the WoWNet project IUK-1902-003// IUK625/002.
Appendices
A Comparison of Counterfactual Risk Estimators
We compare the performance of the SNIPS estimator with two baseline estimators for counterfactual risk. We conduct the experiments on the AtB click training data of the Mercateo dataset. The inverse propensity scoring (IPS) estimator is calculated by:

\(\hat{R}_{\text{IPS}}(h) = \frac{1}{n} \sum _{i=1}^{n} \delta (c_i, a_i)\, \frac{h(a_i \mid c_i)}{\pi _0(a_i \mid c_i)}\)
The second estimator is an empirical average (EA) estimator, defined as follows:
where \(\overline{\delta } (c,a)\) denotes the empirical average of the losses for a given context–action pair. The results for these estimators are provided in Table 5. Compared to SNIPS, both IPS and EA perform significantly worse on all evaluated metrics. These results confirm the importance of the equivariance property of the counterfactual estimator and show the advantages of the SNIPS estimator.
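The contrast between the two importance-weighted estimators can be made concrete with a minimal sketch; the function names and array layout below are our own illustration, not part of the released code:

```python
import numpy as np

def ips_estimate(delta, h_prob, pi0_prob):
    """Inverse propensity scoring (IPS) estimate of the risk of policy h,
    from losses `delta` logged under the logging policy pi0."""
    weights = h_prob / pi0_prob            # importance weights h(a|c) / pi0(a|c)
    return np.mean(delta * weights)

def snips_estimate(delta, h_prob, pi0_prob):
    """Self-normalized IPS (SNIPS): normalize by the sum of importance
    weights instead of n, which makes the estimate equivariant to
    additive translations of the loss."""
    weights = h_prob / pi0_prob
    return np.sum(delta * weights) / np.sum(weights)
```

Adding a constant \(c\) to every loss shifts the SNIPS estimate by exactly \(c\), while the IPS estimate shifts by \(c\) times the average importance weight; this equivariance is what the comparison in Table 5 probes.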
B Choosing Hyperparameter \(\lambda \)
One major drawback of the SNIPS estimator is that, being a ratio estimator, it does not admit direct stochastic optimization [16]. Given the success of stochastic gradient descent (SGD) in training deep neural networks for related applications, this is quite disadvantageous, as one cannot employ SGD for training.
To overcome this limitation, Joachims et al. [16] fix the value of the denominator in Eq. 3. They denote the denominator by S and solve multiple constrained optimization problems for different values of S. Each of these problems can be reformulated via the Lagrangian of the constrained optimization problem as:

\(\hat{h}_j = \mathop {\mathrm {argmin}}\limits _{h} \; \frac{1}{n} \sum _{i=1}^{n} \bigl (\delta _i - \lambda _j\bigr )\, \frac{h(a_i \mid c_i)}{\pi _0(a_i \mid c_i)}\)

where \(\lambda _j\) corresponds to a fixed denominator \(S_j\).
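The Lagrangian reformulation turns the ratio into a plain sum that SGD can handle: each sample contributes the translated loss \((\delta _i - \lambda _j)\) times its importance weight. A minimal sketch of this surrogate objective, with hypothetical function and argument names of our own choosing:

```python
import numpy as np

def lambda_translated_loss(delta, h_prob, pi0_prob, lam):
    """Surrogate objective from the Lagrangian reformulation: subtract a
    fixed translation `lam` from each logged loss, importance-weight the
    result, and average. Unlike the SNIPS ratio, this is a sum over
    samples and can be minimized with SGD."""
    weights = h_prob / pi0_prob
    return np.mean((delta - lam) * weights)
```

Each fixed \(\lambda _j\) yields one such objective; training is repeated for several values and the best model is picked afterwards, as Appendix B describes.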
The main difficulty in applying the CRM method to learn from logged data is the need to choose the hyperparameter \(\lambda \). Below we discuss our heuristic for selecting it. We also evaluate the dependence of \(\lambda \) on the SNIPS denominator S, which can be used to guide the search for \(\lambda \). To achieve good performance with the CRM loss, one has to tune the hyperparameter \(\lambda \in [0, 1]\). Instead of doing a grid search, we follow a smarter way to find a suitable \(\lambda \). Building on the observations in [16], we guide the search for \(\lambda \) based on the value of the SNIPS denominator S. It was shown in [16] that S increases monotonically with \(\lambda \). Secondly, it is straightforward to note that the expectation of S is 1. This implies that, as the number of bandit-feedback samples grows, the optimal \(\lambda \) should be selected such that its corresponding S value concentrates around 1. In our experiments, we first select a random \(\lambda \in [0, 1]\) and train the model for two epochs with this \(\lambda \). We then calculate S for the trained model; if S is greater than 1, we decrease \(\lambda \) by 10%, otherwise we increase it by 10%. The final value of \(\lambda \) is chosen based on the best performance on the validation set.
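The search heuristic above can be sketched as a short loop. The callbacks `train_two_epochs` and `compute_S` are hypothetical stand-ins for the actual training pipeline and the SNIPS-denominator computation:

```python
import random

def search_lambda(train_two_epochs, compute_S, lam=None, steps=5):
    """Heuristic lambda search: start from a random lambda in [0, 1],
    train briefly, then nudge lambda down by 10% if the SNIPS denominator
    S exceeds 1 (its expectation), and up by 10% otherwise.

    Returns the (lambda, model) candidates tried; the final choice is
    made by validation-set performance, as in the text."""
    lam = random.uniform(0.0, 1.0) if lam is None else lam
    candidates = []
    for _ in range(steps):
        model = train_two_epochs(lam)       # two epochs with current lambda
        S = compute_S(model)                # SNIPS denominator on training logs
        candidates.append((lam, model))
        lam = lam * 0.9 if S > 1.0 else lam * 1.1
    return candidates
```

The loop exploits the monotonic relation between \(\lambda \) and S: because S should concentrate around 1, its sign relative to 1 tells us in which direction to move \(\lambda \).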
In Fig. 3, we plot the values of the denominator S on the order logs (training set) of the Mercateo dataset for different values of the hyperparameter \(\lambda \). In Fig. 4, we also plot the performance on the orders test set, in terms of MAP and NDCG@5 scores, of rankers trained with these values of \(\lambda \). Note that the SNIPS denominator S increases monotonically with \(\lambda \). MAP and NDCG@5 reach their highest values at \(\lambda = 0.4\), and decrease only slightly for larger values of \(\lambda \). Furthermore, these two figures show that the \(\lambda \) values with good test-set performance have corresponding SNIPS denominator values close to 1.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Anwaar, M.U., Rybalko, D., Kleinsteuber, M. (2021). Mend the Learning Approach, Not the Data: Insights for Ranking E-Commerce Products. In: Dong, Y., Ifrim, G., Mladenić, D., Saunders, C., Van Hoecke, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12461. Springer, Cham. https://doi.org/10.1007/978-3-030-67670-4_16
Print ISBN: 978-3-030-67669-8
Online ISBN: 978-3-030-67670-4