Abstract
Improved search quality enhances user satisfaction, which directly impacts the sales growth of an E-Commerce (E-Com) platform. Traditional Learning to Rank (LTR) algorithms require relevance judgments on products. In E-Com, obtaining such judgments poses an immense challenge. In the literature, it has been proposed to employ user feedback (such as clicks, add-to-basket (AtB) clicks and orders) to generate relevance judgments. This is done in two steps: first, query-product pair data are aggregated from the logs, and then metrics such as order rate are calculated for each pair. In this paper, we advocate a counterfactual risk minimization (CRM) approach which circumvents the need for relevance judgments and data aggregation, and is better suited for learning from logged data, i.e. contextual bandit feedback. Due to the unavailability of a public E-Com LTR dataset, we provide the Mercateo dataset from our platform. It contains more than 10 million AtB click logs and 1 million order logs from a catalogue of about 3.5 million products associated with 3060 queries. To the best of our knowledge, this is the first work which examines the effectiveness of the CRM approach in learning a ranking model from real-world logged data. Our empirical evaluation shows that our CRM approach learns effectively from logged data and beats a strong baseline ranker (\(\lambda \)-MART) by a large margin. Our method also outperforms full-information losses (e.g. cross-entropy) on various deep neural network models. These findings demonstrate that by adopting the CRM approach, E-Com platforms can achieve better product search quality compared to the full-information approach.
Notes
- 1.
- 2. Available at: https://github.com/ecom-research/CRM-LTR.
- 3. Available at: https://github.com/usnistgov/trec_eval.
- 4. Available at: https://sourceforge.net/p/lemur/wiki/RankLib/.
References
Agrawal, R., Halverson, A., Kenthapadi, K., Mishra, N., Tsaparas, P.: Generating labels from clicks. In: WSDM 2009, pp. 172–181. ACM (2009). https://doi.org/10.1145/1498759.1498824
Bendersky, M., Wang, X., Najork, M., Metzler, D.: Learning with sparse and biased feedback for personal search. In: IJCAI 2018, pp. 5219–5223. AAAI Press (2018)
Bi, K., Teo, C.H., Dattatreya, Y., Mohan, V., Croft, W.B.: Leverage implicit feedback for context-aware product search. In: eCOM@SIGIR (2019)
Borisov, A., Kiseleva, J., Markov, I., de Rijke, M.: Calibration: a simple way to improve click models. In: CIKM 2018 (2018)
Brenner, E.P., Zhao, J., Kutiyanawala, A., Yan, Z.: End-to-end neural ranking for ecommerce product search. In: SIGIR eCom, vol. 18 (2018)
Chapelle, O., Chang, Y.: Yahoo! Learning to rank challenge overview. In: Proceedings of the Learning to Rank Challenge, pp. 1–24 (2011)
Chen, D.: Data mining for the online retail industry: a case study of RFM model-based customer segmentation using data mining. J. Database Market. Customer Strategy Manag. 19(3), 197–208 (2012). https://doi.org/10.1057/dbm.2012.17
Dai, Z., Xiong, C., Callan, J., Liu, Z.: Convolutional neural networks for soft-matching N-grams in ad-hoc search. In: WSDM 2018, pp. 126–134. ACM, New York (2018). https://doi.org/10.1145/3159652.3159659
Dheeru, D., Taniskidou, E.: UCI machine learning repository (2017)
Alonso, O., et al.: Relevance criteria for e-commerce: a crowdsourcing-based experimental analysis. In: SIGIR 2009, pp. 760–761. ACM (2009)
Guo, J., Fan, Y., Ji, X., Cheng, X.: MatchZoo: a learning, practicing, and developing system for neural text matching. In: SIGIR 2019 (2019). https://doi.org/10.1145/3331184.3331403
Hu, Y., Da, Q., Zeng, A., Yu, Y., Xu, Y.: Reinforcement learning to rank in e-commerce search engine: formalization, analysis, and application. In: KDD 2018, NY, USA (2018). https://doi.org/10.1145/3219819.3219846
Jiang, S., et al.: Learning query and document relevance from a web-scale click graph. In: SIGIR 2016 (2016)
Joachims, T.: Optimizing search engines using clickthrough data. In: KDD 2002. ACM (2002). https://doi.org/10.1145/775047.775067
Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., Gay, G.: Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans. Inf. Syst. 25(2), 7-es (2007). https://doi.org/10.1145/1229179.1229181
Joachims, T., Swaminathan, A., de Rijke, M.: Deep learning with logged bandit feedback. In: ICLR 2018 (2018)
Joachims, T., Swaminathan, A., Schnabel, T.: Unbiased learning-to-rank with biased feedback. In: WSDM 2017. ACM (2017). https://doi.org/10.1145/3018661.3018699
Lucchese, C., Nardini, F.M., Orlando, S., Perego, R., Tonellotto, N.: Speeding up document ranking with rank-based features. In: SIGIR 2015, NY, USA (2015). https://doi.org/10.1145/2766462.2767776
Mitra, B., Diaz, F., Craswell, N.: Learning to match using local and distributed representations of text for web search. CoRR (2016)
Pang, L., Lan, Y., Guo, J., Xu, J., Xu, J., Cheng, X.: DeepRank: a new deep architecture for relevance ranking in information retrieval. CoRR abs/1710.05649 (2017)
Qi, Y., Wu, Q., Wang, H., Tang, J., Sun, M.: Bandit learning with implicit feedback. In: NIPS 2018, pp. 7287–7297. Curran Associates Inc., Red Hook (2018)
Qin, T., Liu, T.Y., Xu, J., Li, H.: LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf. Retrieval 13(4), 346–374 (2010). https://doi.org/10.1007/s10791-009-9123-y
Santu, S.K.K., Sondhi, P., Zhai, C.: On application of learning to rank for e-commerce search. In: SIGIR 2017 (2017)
Schuth, A., Hofmann, K., Whiteson, S., de Rijke, M.: Lerot: an online learning to rank framework. In: Proceedings of the 2013 Workshop on Living Labs for Information Retrieval Evaluation, pp. 23–26. ACM (2013)
Severyn, A., Moschitti, A.: Learning to rank short text pairs with convolutional deep neural networks. In: SIGIR 2015, pp. 373–382. ACM, New York (2015)
Sidana, S., Laclau, C., Amini, M.R., Vandelle, G., Bois-Crettez, A.: KASANDR: a large-scale dataset with implicit feedback for recommendation. In: SIGIR 2017, pp. 1245–1248 (2017)
Swaminathan, A., Joachims, T.: Batch learning from logged bandit feedback through counterfactual risk minimization. JMLR 16, 1731–1755 (2015)
Swaminathan, A., Joachims, T.: The self-normalized estimator for counterfactual learning. In: NIPS 2015, pp. 3231–3239. MIT Press, Cambridge (2015)
Wan, S., Lan, Y., Guo, J., Xu, J., Pang, L., Cheng, X.: A deep architecture for semantic matching with multiple positional sentence representations. CoRR abs/1511.08277 (2015). http://arxiv.org/abs/1511.08277
Wu, Q., Burges, C.J., Svore, K.M., Gao, J.: Adapting boosting for information retrieval measures. Inf. Retrieval 13(3), 254–270 (2010)
Xu, J., Li, H.: AdaRank: a boosting algorithm for information retrieval. In: SIGIR 2007, pp. 391–398. ACM, New York (2007)
Yang, Z., et al.: A deep top-K relevance matching model for ad-hoc retrieval. In: Zhang, S., Liu, T.-Y., Li, X., Guo, J., Li, C. (eds.) CCIR 2018. LNCS, vol. 11168, pp. 16–27. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01012-6_2
Acknowledgments
We would like to thank Alan Schelten, Till Brychcy and Rudolf Sailer for insightful discussions which helped improve the quality of this work. This work has been supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy through the WoWNet project IUK-1902-003// IUK625/002.
Appendices
A Comparison of Counterfactual Risk Estimators
We compare the performance of the SNIPS estimator with two baseline estimators for counterfactual risk. We conduct the experiments on the AtB click training data of the Mercateo dataset. The inverse propensity scoring (IPS) estimator is calculated by:

\(\hat{R}_{\text{IPS}}(h) = \frac{1}{n} \sum _{i=1}^{n} \delta (c_i, a_i)\, \frac{h(a_i \mid c_i)}{\pi _0(a_i \mid c_i)}\)
The second estimator is an empirical average (EA) estimator, defined as follows:
where \(\overline{\delta } (c,a)\) denotes the empirical average of the losses for a given context–action pair. The results for these estimators are provided in Table 5. Compared to SNIPS, both IPS and EA perform significantly worse on all evaluated metrics. These results confirm the importance of the equivariance property of the counterfactual estimator and show the advantages of the SNIPS estimator.
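The contrast between the two importance-weighted estimators can be made concrete with a minimal sketch; the function names and array layout below are our own illustration, not part of the released code:

```python
import numpy as np

def ips_estimate(delta, h_prob, pi0_prob):
    """Inverse propensity scoring (IPS) estimate of the risk of policy h,
    from losses `delta` logged under the logging policy pi0."""
    weights = h_prob / pi0_prob            # importance weights h(a|c) / pi0(a|c)
    return np.mean(delta * weights)

def snips_estimate(delta, h_prob, pi0_prob):
    """Self-normalized IPS (SNIPS): normalize by the sum of importance
    weights instead of n, which makes the estimate equivariant to
    additive translations of the loss."""
    weights = h_prob / pi0_prob
    return np.sum(delta * weights) / np.sum(weights)
```

Adding a constant \(c\) to every loss shifts the SNIPS estimate by exactly \(c\), while the IPS estimate shifts by \(c\) times the average importance weight; this equivariance is what the comparison in Table 5 probes.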
B Choosing Hyperparameter \(\lambda \)
One major drawback of the SNIPS estimator is that, being a ratio estimator, it does not admit direct stochastic optimization [16]. Given the success of stochastic gradient descent (SGD) in training deep neural networks for related applications, this is quite disadvantageous, as one cannot employ SGD for training.
To overcome this limitation, Joachims et al. [16] fix the value of the denominator in Eq. 3. They denote the denominator by S and solve multiple constrained optimization problems for different values of S. Each of these problems can be reformulated via the Lagrangian of the constrained optimization problem as:

\(\hat{h}_j = \mathop {\mathrm {argmin}}\limits _{h} \; \frac{1}{n} \sum _{i=1}^{n} \bigl (\delta _i - \lambda _j\bigr )\, \frac{h(a_i \mid c_i)}{\pi _0(a_i \mid c_i)}\)

where \(\lambda _j\) corresponds to a fixed denominator \(S_j\).
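The Lagrangian reformulation turns the ratio into a plain sum that SGD can handle: each sample contributes the translated loss \((\delta _i - \lambda _j)\) times its importance weight. A minimal sketch of this surrogate objective, with hypothetical function and argument names of our own choosing:

```python
import numpy as np

def lambda_translated_loss(delta, h_prob, pi0_prob, lam):
    """Surrogate objective from the Lagrangian reformulation: subtract a
    fixed translation `lam` from each logged loss, importance-weight the
    result, and average. Unlike the SNIPS ratio, this is a sum over
    samples and can be minimized with SGD."""
    weights = h_prob / pi0_prob
    return np.mean((delta - lam) * weights)
```

Each fixed \(\lambda _j\) yields one such objective; training is repeated for several values and the best model is picked afterwards, as Appendix B describes.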
The main difficulty in applying the CRM method to learn from logged data is the need to choose the hyperparameter \(\lambda \). Below we discuss our heuristic for selecting it. We also evaluate the dependence of \(\lambda \) on the SNIPS denominator S, which can be used to guide the search for \(\lambda \). To achieve good performance with the CRM loss, one has to tune the hyperparameter \(\lambda \in [0, 1]\). Instead of doing a grid search, we follow a smarter way to find a suitable \(\lambda \). Building on the observations in [16], we guide the search for \(\lambda \) based on the value of the SNIPS denominator S. It was shown in [16] that S increases monotonically with \(\lambda \). Secondly, it is straightforward to note that the expectation of S is 1. This implies that, as the number of bandit-feedback samples grows, the optimal \(\lambda \) should be selected such that its corresponding S value concentrates around 1. In our experiments, we first select a random \(\lambda \in [0, 1]\) and train the model for two epochs with this \(\lambda \). We then calculate S for the trained model; if S is greater than 1, we decrease \(\lambda \) by 10%, otherwise we increase it by 10%. The final value of \(\lambda \) is chosen based on the best performance on the validation set.
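The search heuristic above can be sketched as a short loop. The callbacks `train_two_epochs` and `compute_S` are hypothetical stand-ins for the actual training pipeline and the SNIPS-denominator computation:

```python
import random

def search_lambda(train_two_epochs, compute_S, lam=None, steps=5):
    """Heuristic lambda search: start from a random lambda in [0, 1],
    train briefly, then nudge lambda down by 10% if the SNIPS denominator
    S exceeds 1 (its expectation), and up by 10% otherwise.

    Returns the (lambda, model) candidates tried; the final choice is
    made by validation-set performance, as in the text."""
    lam = random.uniform(0.0, 1.0) if lam is None else lam
    candidates = []
    for _ in range(steps):
        model = train_two_epochs(lam)       # two epochs with current lambda
        S = compute_S(model)                # SNIPS denominator on training logs
        candidates.append((lam, model))
        lam = lam * 0.9 if S > 1.0 else lam * 1.1
    return candidates
```

The loop exploits the monotonic relation between \(\lambda \) and S: because S should concentrate around 1, its sign relative to 1 tells us in which direction to move \(\lambda \).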
In Fig. 3, we plot the values of the denominator S on the order logs (training set) of the Mercateo dataset for different values of the hyperparameter \(\lambda \). In Fig. 4, we also plot the performance on the orders test set, in terms of MAP and NDCG@5 scores, of rankers trained with these values of \(\lambda \). Note that the SNIPS denominator S increases monotonically with \(\lambda \). MAP and NDCG@5 reach their highest values at \(\lambda = 0.4\), and decrease only slightly for larger values of \(\lambda \). Furthermore, these two figures show that the \(\lambda \) values with good test-set performance have corresponding SNIPS denominator values close to 1.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Anwaar, M.U., Rybalko, D., Kleinsteuber, M. (2021). Mend the Learning Approach, Not the Data: Insights for Ranking E-Commerce Products. In: Dong, Y., Ifrim, G., Mladenić, D., Saunders, C., Van Hoecke, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12461. Springer, Cham. https://doi.org/10.1007/978-3-030-67670-4_16
Print ISBN: 978-3-030-67669-8
Online ISBN: 978-3-030-67670-4