ABSTRACT
Uncertainty quantification is crucial for obtaining trustworthy and reliable machine learning models for decision making. However, most research in this domain has focused on problems with small label spaces and has ignored eXtreme Multi-label Classification (XMC), an essential task for web-scale machine learning applications in the era of big data. Moreover, enormous label spaces can lead to noisy retrieval results and make uncertainty quantification computationally intractable. In this paper, we investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework. In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general approximation framework based on beam search that efficiently estimates the uncertainty with a theoretical guarantee under long-tail XMC predictions. Empirical studies on six large-scale real-world datasets show that our framework not only outperforms single models in predictive performance, but also serves as a strong uncertainty-based baseline for label misclassification and out-of-distribution detection, with significant speedup. In addition, when built on deep XMC models, our framework with uncertainty quantification further improves on state-of-the-art results.
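To make the idea of label-level uncertainty in an ensemble concrete, the sketch below uses a common entropy-based decomposition: total uncertainty is the entropy of the averaged prediction, data uncertainty is the average entropy of the individual members, and knowledge uncertainty is their difference. This is an illustrative assumption, not necessarily the formulation used in the paper; the function names, the label names, and the scores are hypothetical, and the candidate labels stand in for the short list that a beam search over a label tree might keep.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) variable; defined as 0 at p in {0, 1}."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def label_uncertainty(member_probs):
    """Decompose uncertainty for one label from ensemble member probabilities.

    Returns (total, data, knowledge):
      total     = entropy of the averaged prediction
      data      = average entropy of the individual members
      knowledge = total - data, clipped at 0 to absorb rounding error
    """
    mean_p = sum(member_probs) / len(member_probs)
    total = binary_entropy(mean_p)
    data = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    return total, data, max(total - data, 0.0)


# Hypothetical scores: 3 ensemble members, 3 candidate labels.
ensemble_scores = {
    "label_a": [0.9, 0.9, 0.9],  # members agree and are confident
    "label_b": [0.1, 0.9, 0.5],  # members disagree
    "label_c": [0.5, 0.5, 0.5],  # members agree, but each is unsure
}

for label, probs in ensemble_scores.items():
    total, data, knowledge = label_uncertainty(probs)
    print(f"{label}: total={total:.3f} data={data:.3f} knowledge={knowledge:.3f}")
```

In this toy setting, disagreement among members shows up as knowledge uncertainty (label_b), while members that agree on an uninformative score produce high total uncertainty with essentially no knowledge uncertainty (label_c).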
Index Terms
- Uncertainty Quantification for Extreme Classification