ABSTRACT
We study the problem of active learning for multi-class classification on large-scale datasets. In this setting, existing active learning approaches built upon uncertainty measures are ineffective at discovering unknown regions, and those based on expected error reduction are inefficient owing to their high time cost. To overcome these issues, this paper proposes a novel query selection criterion called approximated error reduction (AER). In AER, the error reduction of each candidate is estimated from its expected impact over all datapoints and an approximated ratio between the error reduction and the impact over its nearby datapoints. In particular, we use hierarchical anchor graphs to construct the candidate set as well as the nearby-datapoint sets of these candidates. This strategy enables a hierarchical expansion of the candidate set as the number of labels increases, and allows us to further accelerate the AER estimation. Finally, we introduce AER into an efficient semi-supervised classifier for scalable active learning. Experiments on publicly available datasets with sizes ranging from thousands to millions demonstrate the effectiveness of our approach.
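As a rough illustration of the criterion described in the abstract, the Python sketch below scores each candidate by multiplying its expected impact over all datapoints by the ratio of error reduction to impact measured on its nearby datapoints. The function names, the use of the L1 change in predicted class probabilities as the impact measure, and the input arrays are all assumptions made for illustration. The abstract does not specify these details, so this should be read as one plausible reading and not as the paper's implementation.

```python
import numpy as np


def expected_impact(label_probs, preds_before, preds_after_per_label):
    """Expected total change in predictions if one candidate were labeled.

    ASSUMPTION: impact is measured as the L1 change in predicted class
    probabilities, averaged over the candidate's possible labels.
    label_probs:            (c,)      current class probabilities of the candidate
    preds_before:           (n, c)    current predictions for n datapoints
    preds_after_per_label:  (c, n, c) predictions after assuming each label
    """
    label_probs = np.asarray(label_probs, dtype=float)
    preds_before = np.asarray(preds_before, dtype=float)
    preds_after_per_label = np.asarray(preds_after_per_label, dtype=float)
    # Total absolute change in predictions for each hypothetical label.
    change = np.abs(preds_after_per_label - preds_before[None, :, :]).sum(axis=(1, 2))
    return float(label_probs @ change)


def aer_scores(impact_all, error_reduction_nearby, impact_nearby, eps=1e-12):
    """Approximate error reduction per candidate (hypothetical formulation).

    score = impact over all datapoints * (error reduction / impact) on nearby datapoints
    eps guards against division by zero when a candidate has no nearby impact.
    """
    impact_all = np.asarray(impact_all, dtype=float)
    error_reduction_nearby = np.asarray(error_reduction_nearby, dtype=float)
    impact_nearby = np.asarray(impact_nearby, dtype=float)
    ratio = error_reduction_nearby / (impact_nearby + eps)
    return impact_all * ratio


def select_query(impact_all, error_reduction_nearby, impact_nearby):
    """Return the index of the highest-scoring candidate and all scores."""
    scores = aer_scores(impact_all, error_reduction_nearby, impact_nearby)
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    # Toy values for three candidates (made up for this example).
    best, scores = select_query(
        impact_all=[4.0, 2.0, 1.0],
        error_reduction_nearby=[0.1, 0.3, 0.2],
        impact_nearby=[1.0, 1.0, 0.5],
    )
    print(best, scores)  # candidate 1 has the highest score
```

The point of the factorization is cost: the ratio is computed only on a small neighborhood of each candidate, so the expensive error-reduction estimate never has to be evaluated over the full dataset.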
Index Terms
- Scalable Active Learning by Approximated Error Reduction