Abstract
While traditional Learning to Rank (LTR) models use query-webpage pairs to perform regression tasks that predict ranking scores, they usually fail to capture the structure of interactions between queries and webpages over an extremely large bipartite graph. In recent years, Graph Convolutional Neural Networks (GCNs) have demonstrated unique advantages in link prediction over bipartite graphs and have been used successfully for user-item recommendation. However, it remains difficult to scale up GCNs for web search, due to (1) the extreme sparsity of links in query-webpage bipartite graphs, caused by the expense of annotating ranking scores, and (2) the imbalance between queries (billions) and webpages (trillions) in web-scale search, as well as the imbalance in annotations. In this work, we introduce the Q-subgraph and W-subgraph to represent every query and webpage with the structure of interactions preserved, and then propose LtrGCN, an LTR pipeline that samples Q-subgraphs and W-subgraphs from all query-webpage pairs, learns to extract features from these subgraphs, and predicts ranking scores in an end-to-end manner. We carried out extensive experiments to evaluate LtrGCN on two real-world datasets, as well as online experiments based on A/B tests at a large-scale search engine. The offline results show that LtrGCN achieves \(\varDelta \)NDCG\(_{5}\) = 2.89%–3.97% over baselines. Deployed with realistic traffic at a large-scale search engine, LtrGCN still yields significant improvements, performing consistently in both offline and online experiments.
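The abstract reports gains in NDCG\(_{5}\), the standard graded-relevance ranking metric at cutoff 5. As a point of reference for how that metric is computed (this is a generic sketch of NDCG@k, not code from the paper; the exponential-gain formulation shown here is one common convention):

```python
import numpy as np

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k positions,
    using exponential gains (2^rel - 1) and log2 position discounts."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..k -> log2(2..k+1)
    return float(np.sum((2.0 ** rel - 1.0) / discounts))

def ndcg_at_k(relevances, k=5):
    """NDCG@k: DCG of the ranker's ordering, normalized by the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Webpages in the order a ranker returned them, with graded relevance labels.
ranked_labels = [3, 2, 3, 0, 1, 2]
score = ndcg_at_k(ranked_labels, k=5)
```

A perfectly ordered list scores 1.0, so the reported \(\varDelta \)NDCG\(_{5}\) of 2.89%–3.97% measures how much closer LtrGCN's rankings come to the ideal ordering than the baselines'.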
This work was supported in part by the National Key R&D Program of China (No. 2021ZD0110303); NSFC grants 62141220, 61972253, U1908212, 62172276, and 61972254; the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning; Shanghai Science and Technology Development Funds 23YF1420500; and Open Research Projects of Zhejiang Lab No. 2022NL0AB01.
Ethics declarations
Ethical Statement. The authors declare that they have listed all conflicts of interest. This article does not contain any studies with human participants or animals performed by any of the authors. All research and analysis presented in this paper adhere to ethical principles of honesty, integrity, and respect for human dignity. Sources of information are cited accurately and fully, and any potential conflicts of interest are disclosed. The data used, as well as the data processing and inference phases, do not contain any user personal information. This work does not have the potential to be used for policing or the military. The rights and welfare of all individuals involved in this research are respected, and no harm or discomfort is inflicted upon them. This paper strives to maintain high ethical standards and promote the advancement of knowledge in an ethical and responsible manner.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Li, Y. et al. (2023). LtrGCN: Large-Scale Graph Convolutional Networks-Based Learning to Rank for Web Search. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer Science (R0)