research-article

Extreme Multi-Label Classification for Ad Targeting using Factorization Machines

Authors:

Martin Pavlovski,

Srinath Ravindran,

Djordje Gligorijevic,

Shubham Agrawal,

Ivan Stojkovic,

Nelson Segura-Nunez,

Jelena Gligorijevic

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 4705 - 4716

https://doi.org/10.1145/3580305.3599822

Published: 04 August 2023

Abstract

Applications of Extreme Multi-Label Classification (XMLC) face practical challenges of scale, model size, and prediction latency, and must still maintain satisfactory predictive accuracy. In this paper, we propose a Multi-Label Factorization Machine (MLFM) model that addresses some of these challenges. We use behavioral ad targeting as a case study to illustrate its benefits. Predicting user qualifications for targeting segments plays a major role in both personalization and real-time bidding. Given the large number of segments and the prediction-time requirements of real-world production systems, building scalable models is often difficult and computationally burdensome. To cope with these challenges, we (1) reformulate the problem of assigning users to segments as an extreme multi-label classification (XMLC) problem, and (2) build on the conventional Factorization Machine (FM) model, generalizing it to joint prediction across a large number of targeting segments. We show that the MLFM model is both effective and computationally efficient compared to several baseline models on publicly available datasets as well as in the targeting use case.
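To make the idea of extending an FM to many labels concrete, the following is a minimal sketch and not the MLFM architecture from the paper, which the abstract does not specify. It computes the standard second-order FM interaction terms with the usual linear-time identity, then scores every label jointly from those shared terms. The function names, the per-label weight matrix `P`, and the choice to share the factor matrix `V` across labels are all assumptions made for illustration.

```python
import math
from itertools import combinations


def fm_interactions(x, V):
    """Per-factor pairwise interaction terms of a second-order FM.

    Uses the identity
        sum_{i<j} v_if * v_jf * x_i * x_j
            = 0.5 * ((sum_i v_if * x_i)^2 - sum_i (v_if * x_i)^2),
    which costs O(n * k) instead of O(n^2 * k).
    """
    n, k = len(x), len(V[0])
    terms = []
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        terms.append(0.5 * (s * s - sq))
    return terms


def brute_force_interactions(x, V):
    """Reference O(n^2 * k) computation, used only to check the identity."""
    n, k = len(x), len(V[0])
    return [
        sum(V[i][f] * V[j][f] * x[i] * x[j] for i, j in combinations(range(n), 2))
        for f in range(k)
    ]


def multilabel_fm_scores(x, b, W, V, P):
    """Hypothetical joint scorer: one sigmoid score per label.

    b[l]    : bias of label l
    W[l][i] : linear weight of feature i for label l
    V[i][f] : factor matrix shared by all labels (an assumption)
    P[l][f] : per-label weight on factor term f (an assumption)
    """
    z = fm_interactions(x, V)  # computed once, reused by every label
    scores = []
    for l in range(len(b)):
        logit = b[l]
        logit += sum(W[l][i] * x[i] for i in range(len(x)))
        logit += sum(P[l][f] * z[f] for f in range(len(z)))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Toy example: 3 features, 2 factors, 2 labels (all values made up).
x = [1.0, 0.0, 2.0]
V = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
b = [0.0, -1.0]
W = [[0.2, 0.0, -0.1], [0.0, 0.5, 0.3]]
P = [[1.0, 1.0], [0.5, -2.0]]
scores = multilabel_fm_scores(x, b, W, V, P)
```

In this sketch the interaction terms are computed once per user and reused for every label, so the per-label cost is a pair of dot products. Setting a row of `P` to all ones recovers the conventional single-output FM for that label.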

Supplementary Material

MP4 File (adfp606-2min-promo.mp4)

Short promotional video for the paper "Extreme Multi-Label Classification for Ad Targeting using Factorization Machines", published at KDD 2023. The paper tackles the problem of extreme multi-label classification in applications related to personalization and recommendation with high accuracy and low latency.

Index Terms

Extreme Multi-Label Classification for Ad Targeting using Factorization Machines
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Factorization methods
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems
  2. World Wide Web
    1. Online advertising
      1. Display advertising

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2023

5996 pages

ISBN:9798400701030

DOI:10.1145/3580305

General Chairs: Ambuj Singh (UC Santa Barbara, USA), Yizhou Sun (UC Los Angeles, USA)

Program Chairs: Leman Akoglu (Carnegie Mellon University, USA), Dimitrios Gunopulos (University of Athens, Greece), Xifeng Yan (UC Santa Barbara, USA), Ravi Kumar (Google, USA), Fatma Ozcan (Google, USA), Jieping Ye (Alibaba DAMO Academy)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '23

KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 6 - 10, 2023

Long Beach, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
