DOI: 10.1145/3580305.3599822

Extreme Multi-Label Classification for Ad Targeting using Factorization Machines

Published: 04 August 2023


Applications involving Extreme Multi-Label Classification (XMLC) face several practical challenges in scale, model size, and prediction latency, all while needing to maintain satisfactory predictive accuracy. In this paper, we propose a Multi-Label Factorization Machine (MLFM) model that addresses some of these challenges. We use behavioral ad targeting as a case study to illustrate the benefits of the MLFM model. Predicting which targeting segments a user qualifies for plays a major role in both personalization and real-time bidding. Given the large number of segments and the prediction-time requirements of real-world production systems, building scalable models is often difficult and computationally burdensome. To cope with these challenges, we (1) reformulate the problem of assigning users to segments as an extreme multi-label classification (XMLC) problem, and (2) build on the conventional Factorization Machine (FM) model, generalizing it to joint prediction across a large number of targeting segments. We show that the MLFM model is both effective and computationally efficient relative to several baseline models, on publicly available datasets as well as in the targeting use case.
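
To make the idea of joint prediction concrete, the following is a minimal sketch. It is not the MLFM architecture from the paper, whose details the abstract does not give. It assumes the simplest way to extend a standard second-order FM to many labels: treat each segment as one extra one-hot feature with its own bias and embedding. Under that assumption the user-only FM terms are computed once and reused for every segment, so scoring L segments costs O(k·nnz + k·L) instead of L separate FM evaluations. All function names, parameter names, and numeric values below are hypothetical.

```python
import math


def pooled_embedding(x, V):
    """Return z with z[f] = sum_i V[i][f] * x_i over the non-zero features of x."""
    k = len(V[0])
    z = [0.0] * k
    for i, xi in x.items():
        for f in range(k):
            z[f] += V[i][f] * xi
    return z


def shared_fm_part(x, w0, w, V, z):
    """Label-independent FM terms (bias, linear, pairwise), computed once per user.

    The pairwise term uses the usual FM identity
      sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * (||z||^2 - sum_i ||v_i x_i||^2).
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    squares = sum((V[i][f] * xi) ** 2 for i, xi in x.items() for f in range(len(z)))
    pairwise = 0.5 * (sum(zf * zf for zf in z) - squares)
    return w0 + linear + pairwise


def score_all_labels(x, w0, w, V, label_bias, label_emb):
    """Sigmoid score for every label, treating the label as a one-hot FM feature.

    Assumption (not from the paper): label l contributes its bias plus the
    interaction <u_l, z> between its embedding and the pooled user embedding.
    """
    z = pooled_embedding(x, V)
    base = shared_fm_part(x, w0, w, V, z)
    probs = []
    for b, u in zip(label_bias, label_emb):
        logit = base + b + sum(uf * zf for uf, zf in zip(u, z))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Toy example: 3 user features, 2 latent factors, 2 segments (all values made up).
w0 = 0.0
w = [0.5, -0.2, 0.1]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
x = {0: 1.0, 2: 2.0}              # sparse user feature vector
label_bias = [0.0, -1.0]
label_emb = [[1.0, 0.0], [0.0, 1.0]]

probs = score_all_labels(x, w0, w, V, label_bias, label_emb)
print(probs)
```

The design point is that `pooled_embedding` and `shared_fm_part` do not depend on the segment, so only a bias lookup and one k-dimensional dot product are repeated per segment.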

Supplementary Material

MP4 File (adfp606-2min-promo.mp4)
Short promotional video for the paper "Extreme Multi-Label Classification for Ad Targeting using Factorization Machines", published at KDD 2023. The paper tackles extreme multi-label classification for personalization and recommendation applications, aiming for high accuracy and low latency.







Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ad targeting
  2. extreme multi-label classification
  3. factorization machines
  4. user modeling


  • Research-article


KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 415
  • Downloads (Last 12 months): 216
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 17 Jan 2025
