Abstract
Most existing active learning studies focus on designing sample selection algorithms. However, several fundamental problems deserve investigation to provide deeper insight into active learning. In this article, we conduct an in-depth investigation of active learning for classification from the perspective of model change. We derive a general active learning framework for classification called maximum model change (MMC), which aims to query the most influential examples. The model change is quantified as the difference between the model parameters before and after training with the expanded training set. Inspired by the stochastic gradient update rule, the gradient of the loss with respect to a given candidate example is adopted to approximate the model change. This framework is applied to two popular classifiers: support vector machines and logistic regression. We analyze the convergence property of MMC and justify it theoretically. We explore the connection between MMC and uncertainty-based sampling to provide a unified view. In addition, we discuss its potential applicability to other learning models and show that it applies to a wide range of applications. We validate the MMC strategy on two kinds of benchmark datasets, the UCI repository and ImageNet, and show that it outperforms many state-of-the-art methods.
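The abstract does not state the exact scoring formula, so the following is only a minimal sketch of the gradient-based idea for binary logistic regression. It assumes labels in {-1, +1}, the loss log(1 + exp(-y θ·x)), and that the unknown label of a candidate is handled by weighting each possible label by the current model's predicted probability. That weighting, and the names `expected_gradient_norm`, `theta`, and `pool`, are assumptions made for illustration and are not taken from the article.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def expected_gradient_norm(theta, X):
    """Score each row of X by the expected norm of the logistic-loss gradient.

    Assumption (not from the article): the expectation over the unknown
    label y in {-1, +1} uses the current model's own probabilities.
    """
    f = X @ theta                          # decision values theta . x
    x_norm = np.linalg.norm(X, axis=1)     # ||x|| for every candidate
    scores = np.zeros(X.shape[0])
    for y in (-1.0, 1.0):
        p_y = sigmoid(y * f)               # model probability of label y
        # Gradient of log(1 + exp(-y f)) w.r.t. theta is -y * sigmoid(-y f) * x,
        # so its norm is sigmoid(-y f) * ||x||.
        grad_norm = sigmoid(-y * f) * x_norm
        scores += p_y * grad_norm
    return scores


# Toy pool: one candidate on the decision boundary, one far from it.
theta = np.array([1.0, 0.0])
pool = np.array([[0.0, 1.0], [5.0, 0.0]])
scores = expected_gradient_norm(theta, pool)
query_index = int(np.argmax(scores))       # candidate to send for labeling
```

Under these assumptions the score simplifies to 2p(1 - p)‖x‖, where p is the predicted probability of the positive class. For a fixed ‖x‖ this is largest at p = 0.5, which is one way to see the link to uncertainty-based sampling that the abstract mentions.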
Index Terms
- Active Learning for Classification with Maximum Model Change