DOI: 10.1145/3539618.3591780
research-article
Open Access

Uncertainty Quantification for Extreme Classification

Published: 18 July 2023

ABSTRACT

Uncertainty quantification is one of the most crucial tasks for obtaining trustworthy and reliable machine learning models for decision making. However, most research in this domain has focused only on problems with small label spaces and has ignored eXtreme Multi-label Classification (XMC), an essential task in the era of big data for web-scale machine learning applications. Moreover, enormous label spaces can lead to noisy retrieval results and make uncertainty quantification computationally intractable. In this paper, we investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework. In particular, we analyze label-level and instance-level uncertainty in XMC and propose a general approximation framework based on beam search to efficiently estimate the uncertainty with a theoretical guarantee under long-tail XMC predictions. Empirical studies on six large-scale real-world datasets show that our framework not only outperforms single models in predictive performance but can also provide strong uncertainty-based baselines for label misclassification and out-of-distribution detection, with significant speedup. In addition, our framework can further improve state-of-the-art results when applied to deep XMC models with uncertainty quantification.
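To make the idea of label-level uncertainty from an ensemble concrete, the following is a minimal sketch, not the paper's implementation. The abstract gives no formulas, so the sketch assumes the common entropy-based decomposition used with probabilistic ensembles: total uncertainty is the entropy of the averaged prediction, data uncertainty is the average entropy of the members, and knowledge uncertainty is their difference. The function names `bernoulli_entropy` and `label_uncertainty` and the example probabilities are hypothetical.

```python
import math


def bernoulli_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def label_uncertainty(member_probs):
    """Decompose the uncertainty of one label across ensemble members.

    member_probs holds the probability each ensemble member assigns to the
    label being relevant (assumed input format, for illustration only).
    """
    mean_p = sum(member_probs) / len(member_probs)
    # Total uncertainty: entropy of the ensemble-averaged prediction.
    total = bernoulli_entropy(mean_p)
    # Data uncertainty: average entropy of the individual members.
    data = sum(bernoulli_entropy(p) for p in member_probs) / len(member_probs)
    # Knowledge uncertainty: the gap between the two, driven by disagreement.
    knowledge = total - data
    return {"total": total, "data": data, "knowledge": knowledge}


# Members that agree have (near) zero knowledge uncertainty;
# members that disagree have a clearly positive one.
agree = label_uncertainty([0.9, 0.9, 0.9])
disagree = label_uncertainty([0.1, 0.5, 0.9])
```

In an XMC setting this computation would only be applied to the labels that survive retrieval, since evaluating it for every label in an enormous label space is exactly the cost the abstract describes as intractable.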

      • Published in

        SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2023
        3567 pages
        ISBN: 9781450394086
        DOI: 10.1145/3539618

        Copyright © 2023 Owner/Author

        This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%